Many data are arranged by a major
classification such as CASE, zone, activity, etc. The information for each CASE
is a set of variable values arranged in an ordered or unordered sequence along a
row. Each row is then a list of data about one CASE. These kinds of data arrays
may be directories or indices of other data arrays, files or libraries.
The NUNET index is an
example of this type of file. Such a
data array can be called a flat file. In the Engineering Library INDEX ENLIST,
the flat file is a list of row identification numbers (document record numbers),
titles, authors and library identifiers of the documents (document file numbers).
If the data in a row appear unordered, there
may be some underlying structure that can be discovered by subsequent analysis.
The first approach is to assume that they are unordered and use a technique
such as file inversion to interpret the data in the file. The ENLIST system also
uses an inverted file that consists of words as row identifiers and
lists of document record numbers that identify the documents that use the word
as a descriptor.
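File inversion as described above can be sketched in a few lines of Python. This is only an illustration, not the ENLIST implementation: the record numbers, titles, descriptor words and the function name `invert` are all invented for the example.

```python
from collections import defaultdict

# Hypothetical flat file: one row per document record.
# Record numbers, titles and descriptors are invented for illustration.
flat_file = [
    {"record": 1, "title": "Transit Demand Models",
     "descriptors": ["transit", "demand", "models"]},
    {"record": 2, "title": "Highway Demand Forecasts",
     "descriptors": ["highway", "demand"]},
    {"record": 3, "title": "Transit Scheduling",
     "descriptors": ["transit", "scheduling"]},
]

def invert(rows):
    """Map each descriptor word to a sorted list of record numbers."""
    inverted = defaultdict(set)
    for row in rows:
        for word in row["descriptors"]:
            inverted[word.lower()].add(row["record"])
    return {word: sorted(records) for word, records in inverted.items()}

inverted_file = invert(flat_file)

# Each row of the inverted file is a word followed by its record numbers.
for word in sorted(inverted_file):
    print(word, inverted_file[word])
```

Each row of the flat file is keyed by a document, while each row of the inverted file is keyed by a word, so the same data can be read in either direction.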
The arrays of unordered data and their
matching inverted arrays provide the means of identifying the frequency of a
particular data element or of logical combinations of data elements. They can
be used in much the same way as the NUNET index is used to find a file, except that
all the files that are identified by a keyword are listed in a row. The initial
flat file and its inverted counterpart allow detailed individual or group data
examination and analysis.
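As a rough sketch of how an inverted array supports frequency counts and logical combinations, the rows can be held as sets of record numbers and combined with set operations. The keywords, record numbers and helper names (`frequency`, `all_of`, `any_of`, `without`) are assumptions made up for this example.

```python
# Hypothetical inverted file: keyword -> set of record numbers.
inverted_file = {
    "transit": {1, 3, 5},
    "demand": {1, 2, 5},
    "highway": {2, 4},
}

def frequency(word):
    """Number of records that use the word as a descriptor."""
    return len(inverted_file.get(word, set()))

def all_of(*words):
    """Records that use every one of the words (logical AND)."""
    sets = [inverted_file.get(w, set()) for w in words]
    return set.intersection(*sets) if sets else set()

def any_of(*words):
    """Records that use at least one of the words (logical OR)."""
    return set().union(*(inverted_file.get(w, set()) for w in words))

def without(word, excluded):
    """Records that use `word` but not `excluded` (logical AND NOT)."""
    return inverted_file.get(word, set()) - inverted_file.get(excluded, set())

print(frequency("transit"))
print(sorted(all_of("transit", "demand")))
print(sorted(any_of("demand", "highway")))
print(sorted(without("demand", "highway")))
```

The record numbers returned by a query can then be looked up in the flat file to examine the individual rows.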
If the data in a row follow a predetermined
sequence, the columns are lists of the values of each variable. The rows or
columns can then be compared with one another according to the
classification that structured the table. This is a common form for recording
classified information about a case.
If the values of a variable represent
categories, then an inverted file can be constructed for each category of each
variable. An inverted file of this type can give frequency counts and identify
cases that have desired combinations of variable values. If the data are not
categorical then other analytical techniques may be used. The nature of the
data values should be determined before attempting any type of analysis.
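For categorical data in an ordered table, one possible sketch is to build an inverted entry for every (variable, category) pair. The case numbers, the variables `zone` and `activity`, and their category values are hypothetical and serve only to show the idea.

```python
from collections import defaultdict

# Hypothetical ordered flat file: each row is (case id, zone, activity).
cases = [
    (101, "north", "retail"),
    (102, "south", "retail"),
    (103, "north", "office"),
    (104, "north", "retail"),
]
variables = ("zone", "activity")

def invert_by_category(rows, names):
    """Map each (variable, category) pair to the set of case ids having it."""
    index = defaultdict(set)
    for case_id, *values in rows:
        for name, value in zip(names, values):
            index[(name, value)].add(case_id)
    return dict(index)

index = invert_by_category(cases, variables)

# Frequency count for every category of every variable.
counts = {key: len(ids) for key, ids in index.items()}

# Cases with a desired combination of variable values.
north_retail = index[("zone", "north")] & index[("activity", "retail")]

print(counts)
print(sorted(north_retail))
```

This only makes sense when the values are categories. A continuous variable would first have to be grouped into classes, or handled by other techniques.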
Cross section or snapshot information for a
particular time period may form the basis for a two-dimensional table. To
portray a situation over time it may be necessary to have an ordered set of
tables, i.e. a three-dimensional time series array. The notes below deal with
two-dimensional ordered arrays, but can be extended to three dimensions.
Models can be developed using one or more
columns. The data can be from all or selected rows. Time series data are saved
as the sequence of values that each cell takes across a series of tables. Placing
data in such arrays requires consistency in the definition of position and scaling.
Such arrays are used by many data analysis packages such as spreadsheets, SAS, SPSS, and STATGRAF. Spreadsheet packages are particularly convenient for data entry and simple manipulation. Even some word processors have built-in spreadsheet capabilities.
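The ordered set of tables described above can be pictured as a list of two-dimensional tables that share the same row and column positions. The sketch below assumes invented zone names, variables, years and values. It pulls out the time series for one cell and checks that every table has the same layout.

```python
# Hypothetical data: one table per period, rows = zones, columns = variables.
row_labels = ["zone A", "zone B"]
column_labels = ["population", "employment"]
periods = [1990, 1995, 2000]
tables = [
    [[1000, 400], [2500, 900]],   # 1990
    [[1100, 450], [2600, 950]],   # 1995
    [[1250, 500], [2650, 1000]],  # 2000
]

def has_consistent_shape(tables):
    """True if every table has the same rows and columns in the same positions."""
    n_rows, n_cols = len(row_labels), len(column_labels)
    return all(
        len(table) == n_rows and all(len(row) == n_cols for row in table)
        for table in tables
    )

def cell_series(tables, row, column):
    """Values of one cell across the ordered series of tables."""
    r = row_labels.index(row)
    c = column_labels.index(column)
    return [table[r][c] for table in tables]

assert has_consistent_shape(tables)
series = cell_series(tables, "zone A", "population")
for period, value in zip(periods, series):
    print(period, value)
```

The shape check reflects the requirement for consistent positions. Consistent scaling (for example, the same units in every period) still has to be ensured when the data are entered.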
End to date 060215, ams