Flat Files, Inverted Files, Techniques, Analyzing Case by Row;

Variables by column DATA arrays, Statistics,

Fit Multivariate models, correlation, regression

Many data are arranged by a major classification such as CASE, zone, or activity. The information for each CASE consists of variable values arranged in an ordered or unordered sequence along a row. Each row is then a list of data about the CASE. These kinds of data arrays may be directories or indices of other data arrays, files or libraries.

The NUNET index is an example of this type of file. Such a data array can be called a flat file. In the Engineering Library INDEX ENLIST, the flat file is a list of row ID numbers (document record numbers), titles, authors, and library identifiers of the documents (document file numbers).

If the data in a row appear unordered, there may be some underlying structure that can be discovered by subsequent analysis. The first approach is to assume that they are unordered and to use a technique such as file inversion to interpret the data in the file. The ENLIST system also uses an inverted file, which consists of words as row identifiers, each followed by a list of the document record numbers of the documents that use that word as a descriptor.
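File inversion of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: the records, titles, authors, file numbers and descriptor words are invented, and the row layout is a guess at what a flat file of this type might hold, not the actual ENLIST format.

```python
from collections import defaultdict

# Hypothetical flat file: one row per document.
# (record number, title, author, document file number, descriptor words)
flat_file = [
    (1, "Traffic Flow Basics", "A. Smith", "TF-001", ["traffic", "flow"]),
    (2, "Transit Planning", "B. Jones", "TP-014", ["transit", "planning"]),
    (3, "Traffic Signal Planning", "C. Lee", "TS-203",
     ["traffic", "signal", "planning"]),
]


def invert(rows):
    """Build an inverted file: descriptor word -> sorted record numbers."""
    inverted = defaultdict(set)
    for record_no, _title, _author, _file_no, descriptors in rows:
        for word in descriptors:
            inverted[word].add(record_no)
    # Sort the words and the record lists so each row reads like an index entry.
    return {word: sorted(records) for word, records in sorted(inverted.items())}


inverted_file = invert(flat_file)
for word, records in inverted_file.items():
    print(word, records)
```

Each printed line is one row of the inverted file: a word followed by the record numbers of the documents that use it as a descriptor.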

The arrays of unordered data and their matching inverted arrays provide the means of identifying the frequency of a particular data element or of logical combinations of data elements. They can be used in much the same way as the NUNET index is used to find a file, except that all the files identified by a keyword are listed in a single row. The initial flat file and its inverted counterpart allow detailed examination and analysis of individual or grouped data.
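As a rough sketch of how frequencies and logical combinations might be read off an inverted array, the example below uses set operations. The words, record numbers and function names (`frequency`, `all_of`, `any_of`) are hypothetical and chosen only for illustration.

```python
# Hypothetical inverted file: descriptor word -> set of document record numbers.
inverted_file = {
    "traffic": {1, 3, 4},
    "planning": {2, 3},
    "transit": {2, 4},
}


def frequency(word):
    """Number of documents that use the word as a descriptor."""
    return len(inverted_file.get(word, set()))


def all_of(*words):
    """Records listed under every given word (logical AND)."""
    sets = [inverted_file.get(w, set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []


def any_of(*words):
    """Records listed under at least one given word (logical OR)."""
    return sorted(set().union(*(inverted_file.get(w, set()) for w in words)))


print(frequency("traffic"))           # 3
print(all_of("traffic", "planning"))  # [3]
print(any_of("planning", "transit"))  # [2, 3, 4]
```

The record numbers returned by a query can then be looked up in the flat file to retrieve the full rows.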

If the data in a row follow a predetermined sequence, the columns are lists of the values of each variable. Rows or columns can then be compared with one another according to the classification that structured the table. This is a common form for recording classified information about a case.

If the values of a variable represent categories then an inverted file can be constructed for each category of each variable. An inverted file of this type can give frequency counts and identify cases that have desired combinations of variable values. If the data are not categorical then other analytical techniques may be used. The nature of the data values should be determined before attempting any type of analysis.
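For categorical variables in an ordered table, one inverted list per category of each variable could be built as sketched below. The variables, categories and case numbers are invented, and `count` and `select` are hypothetical helper names.

```python
from collections import defaultdict

# Hypothetical ordered table: each row is a case, each column a variable.
columns = ("zone", "mode", "purpose")
cases = {
    101: ("north", "bus", "work"),
    102: ("south", "car", "work"),
    103: ("north", "car", "shop"),
    104: ("north", "bus", "shop"),
}

# One inverted list per (variable, category) pair.
inverted = defaultdict(set)
for case_id, values in cases.items():
    for variable, category in zip(columns, values):
        inverted[(variable, category)].add(case_id)


def count(variable, category):
    """Frequency count: how many cases fall in this category."""
    return len(inverted.get((variable, category), set()))


def select(**wanted):
    """Cases whose values match every variable=category pair given."""
    matches = set(cases)
    for variable, category in wanted.items():
        matches &= inverted.get((variable, category), set())
    return sorted(matches)


print(count("mode", "car"))              # 2
print(select(zone="north", mode="bus"))  # [101, 104]
```

This approach only makes sense when the values are categories; continuous values would first have to be grouped into classes or analyzed by other means.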

Cross-section or snapshot information for a particular time period may form the basis for a two-dimensional table. To portray a situation over time it may be necessary to have an ordered set of tables, i.e. a three-dimensional time series array. The notes below deal with two-dimensional ordered arrays, but can be extended to three dimensions.
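An ordered set of tables can be held as a list of two-dimensional tables, one per period. The sketch below assumes invented periods, zones and values, and requires every table to use the same row and column positions.

```python
# Hypothetical data: one table per time period,
# rows = zones, columns = variables, in the same order in every table.
periods = [1990, 1995, 2000]
row_labels = ["zone A", "zone B"]
col_labels = ["population", "trips"]

tables = [
    [[1000, 2500], [800, 1900]],  # 1990
    [[1100, 2900], [780, 2000]],  # 1995
    [[1250, 3300], [760, 2050]],  # 2000
]


def snapshot(period):
    """The two-dimensional table for one time period."""
    return tables[periods.index(period)]


def cell_series(row, col):
    """Time series for one cell: the same position read from every table."""
    r = row_labels.index(row)
    c = col_labels.index(col)
    return [table[r][c] for table in tables]


print(snapshot(1995))                 # [[1100, 2900], [780, 2000]]
print(cell_series("zone A", "trips")) # [2500, 2900, 3300]
```

Reading one cell across all tables gives a time series, while reading one table gives the cross section for that period.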

Models can be developed using one or more columns. The data can be from all or selected rows. Time series data are saved by the sequence of values for the cells of a series of tables. The placing of data in such arrays requires consistency in definition of position and scaling.
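As a minimal sketch of developing a model from two columns and selected rows, the example below fits a straight line by ordinary least squares and computes the correlation coefficient. The table, its column meanings and the function name `fit_line` are assumptions made for illustration; the data are invented and chosen to be exactly linear.

```python
from math import sqrt

# Hypothetical table: rows are cases, columns are (zone type, households, trips).
table = [
    ("urban", 100, 310),
    ("urban", 150, 460),
    ("rural", 80, 150),
    ("urban", 200, 610),
    ("rural", 120, 230),
]


def fit_line(xs, ys):
    """Least-squares line y = intercept + slope * x, plus correlation r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return slope, intercept, r


# Select rows (urban cases only), then pick the two columns to model.
urban = [row for row in table if row[0] == "urban"]
households = [row[1] for row in urban]
trips = [row[2] for row in urban]

slope, intercept, r = fit_line(households, trips)
print(slope, intercept, r)
```

The same function could be applied to all rows, or to any other pair of numeric columns, provided each column is defined and scaled consistently across rows.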

Such arrays are used by many data analysis packages, such as spreadsheets, SAS, SPSS, and STATGRAF.
Spreadsheet packages are particularly convenient for data entry and simple manipulation.
Even some word processors have built-in spreadsheet capabilities.
 

End to date 060215, ams