This note was prepared for use in an APL workspace. For full use of the material enter ROAD.aws, or another APL workspace. Then go to APLie., enter: LESSON'x9', and follow the instructions. Read the text material below for information and then decide whether you need to use APL.
Properly calibrated mathematical models are the basic tools used for most quantitative prediction. Data from measurements, accounts, samples, etc., are used to calibrate models of known form, or to determine an appropriate model and its calibration.
The models may describe the variability of one phenomenon or the relationship between two or more phenomena. There are a number of fitting techniques and a multitude of model forms. These notes use APL to describe the techniques and the models, and to perform the necessary computation, plots, text preparation, etc.
The following is based on the work of many. Where specific material or techniques are used, they are credited. The work of the many others, not mentioned, who evolved these concepts is gratefully acknowledged.
The concepts used are discussed in many texts on Probability and Statistics, Numerical Analysis, etc. These notes are not intended to be a substitute for such texts, but only a summary of some of the techniques which may prove useful.
The following data are from "Probability Concepts in Engineering Planning and Design" by A.H-S. Ang and W.H. Tang, p. 5, and will be used to illustrate a number of basic techniques:
Mean rainfall from 1918 to 1946. Enter, beginning at the bottom line:
Y←Y,54.49 47.38 40.78 45.05 50.37 54.91 51.28 39.91 53.29 67.59
Y←Y,58.71 42.96 55.77 41.31 58.83 48.21 44.67 67.72 43.11
Y←43.3 53.02 63.52 45.93 48.26 50.51 49.47 43.93 46.77 59.12
Graphs of the data are used to determine the shape of their distribution. Sorting the data from low to high and plotting them according to position gives a plot similar to a cumulative distribution function.
Plot' Unsorted data' vs Y
Plot' Sorted data' vs Y←Y[⍋Y]
These data represent measurements of a natural phenomenon which has inherent variability. They are arranged as a list (vector) of amounts and can be summarized by computing the mean, range, variance, standard deviation, etc.
MEAN←(+/Y)÷⍴Y ⍝ Usual APL idiom for computing the mean
RANGE←(⌈/Y),⌊/Y ⍝ Range: high and low
VAR←(+/(Y-MEAN)*2)÷⍴Y ⍝ Variance
SD←VAR*÷2 ⍝ Standard deviation
The data list is a tabular function which can be used for simulation purposes. E.g., there are 29 values, and a random selection of 10 values can be made from the table as follows:
Y[?10⍴29] ⍝ Selection of 10 values from the table at random
This technique is useful for selecting random values for Monte Carlo simulation.
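For readers without an APL workspace, the summary statistics and the random selection can be sketched in Python. This is a minimal sketch, not the notes' own code: the variance divides by the number of items n, as the VAR line does, and the fixed random seed is an assumption added only so that runs repeat. Like the APL selection, `choices` draws with replacement, so a value may appear more than once.

```python
import math
import random

# Rainfall data from the text (1918-1946), 29 values.
Y = [54.49, 47.38, 40.78, 45.05, 50.37, 54.91, 51.28, 39.91, 53.29, 67.59,
     58.71, 42.96, 55.77, 41.31, 58.83, 48.21, 44.67, 67.72, 43.11,
     43.3, 53.02, 63.52, 45.93, 48.26, 50.51, 49.47, 43.93, 46.77, 59.12]

n = len(Y)
mean = sum(Y) / n                                # sum divided by count
value_range = (max(Y), min(Y))                   # high and low
variance = sum((y - mean) ** 2 for y in Y) / n   # divides by n, not n-1
sd = math.sqrt(variance)                         # square root of the variance

# Random selection of 10 values from the table, with replacement.
rng = random.Random(1)  # seed is an assumption, for repeatable runs only
sample = rng.choices(Y, k=10)

print(f"mean={mean:.4f} range={value_range} var={variance:.4f} sd={sd:.4f}")
print(sample)
```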
Plot' Sorted data against index' vs Y
If the plot approximates a straight line, the data could be
represented by a uniform distribution. Ang and Tang discuss the
technique of using plots on probability paper to empirically
determine an appropriate distribution model.
The technique transforms the X values to a scale which is a function of a distribution model. The Y and the transformed X values can then be plotted as if they were linear.
The X values are first transformed from positions to a 0-1 probability scale: position ÷ (1 + number of items). These values are then scaled for the distribution to be tested.
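A minimal Python sketch of the plot-position step, i/(n+1), where i is an item's position in the sorted list and n is the number of items. The function name `plot_positions` is hypothetical.

```python
def plot_positions(n):
    """Return i/(n+1) for i = 1..n: the 0-1 probability scale for n sorted items."""
    return [i / (n + 1) for i in range(1, n + 1)]

# For the 29 rainfall values the positions run from 1/30 to 29/30.
p = plot_positions(29)
print(p[0], p[14], p[-1])
```

Dividing by n+1 rather than n keeps every position strictly inside (0, 1), so a distribution's inverse transform can be applied to all of them, including the largest item.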
A function in the form of a 13th-degree polynomial has been calibrated to estimate cumulative normal probability. The development of this function and the importance of polynomials are discussed elsewhere.
A polynomial can be represented by a vector of coefficients and evaluated by the decode function ⊥.
The coefficient vector for the above is item 7 of the UVARS file.
CN←get 'v7'
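Decode with a single number x on the left evaluates a polynomial whose coefficients are listed highest power first. The Python sketch below does the same with Horner's rule; the name `decode` is hypothetical, and the sketch does not reproduce CN itself, whose coefficients are stored in the workspace and are not listed in these notes.

```python
def decode(x, coeffs):
    """Evaluate a polynomial at x by Horner's rule.

    coeffs are ordered highest power first, the same ordering decode uses,
    so decode(10, [1, 2, 3]) is 123 and decode(2, [1, 2, 3]) is 1*4 + 2*2 + 3.
    """
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

# A hypothetical straight line 2x + 1, stored as [2, 1],
# evaluated at several points one at a time.
line = [2, 1]
print([decode(x, line) for x in (0, 0.5, 1)])
```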
Development of transformed X values for a normal probability
plot involves the following steps:
X←⍋Y ⍝ Index numbers of sorted Y
X←X÷1+⍴Y ⍝ Plot positions (i÷1+n)
6 4 X←(X∘.+,0)⊥CN ⍝ Display and transformation
The plot expression below incorporates all the steps described above. The Y's are sorted and plotted against the computed X's.
Plot' Transformed X plot' Y vs X←(((⍋Y)÷1+⍴Y)∘.+,0)⊥CN
While this plot is not exactly a straight line, it is closer to one than the plot against the untransformed plot positions, indicating that the data approximately fit a normal distribution.
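The straightness of such a plot can also be checked numerically. This Python sketch swaps in the exact inverse normal CDF from the standard library's `statistics.NormalDist` for the polynomial CN, and uses the correlation coefficient r between the sorted data and the plot positions as a rough measure of straightness; that measure and the name `pearson_r` are assumptions of the sketch, not methods from the text. The closer r is to 1, the straighter the plot.

```python
from statistics import NormalDist

Y = [54.49, 47.38, 40.78, 45.05, 50.37, 54.91, 51.28, 39.91, 53.29, 67.59,
     58.71, 42.96, 55.77, 41.31, 58.83, 48.21, 44.67, 67.72, 43.11,
     43.3, 53.02, 63.52, 45.93, 48.26, 50.51, 49.47, 43.93, 46.77, 59.12]

def pearson_r(xs, ys):
    """Pearson correlation; 1.0 means the points lie on a rising straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

ys = sorted(Y)
n = len(ys)
p = [i / (n + 1) for i in range(1, n + 1)]  # untransformed plot positions
z = [NormalDist().inv_cdf(q) for q in p]     # positions on a normal-probability scale

r_untransformed = pearson_r(p, ys)
r_normal = pearson_r(z, ys)
print(f"untransformed r={r_untransformed:.4f}  normal-scale r={r_normal:.4f}")
```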
If a straight line is fitted to the Y data and the transformed plot positions, the parameters of the distribution can be computed from the line. The technique used to fit the line will be described later; only the code is listed below:
A←Y[⍋Y]⌹((((⍋Y)÷1+⍴Y)∘.+,0)⊥CN)∘.*1 0
Plot' Transformed X and straight line plot' Y, vs X←(((⍋Y)÷1+⍴Y)∘.+,0)⊥CN
The vector A holds the coefficients of a first-degree polynomial, i.e. a straight line, which can be evaluated by the decode function ⊥. The mean should be at the centre of the distribution, i.e. at 0.5, and the estimated value of Y at this point is shown below:
.5⊥A
(+/Y)÷⍴,Y ⍝ Another technique for computing the mean
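The line fit can be sketched in Python as well. Two substitutions are assumptions of this sketch: the exact inverse normal CDF replaces CN, so the centre of the distribution falls at z = 0 rather than at the 0.5 used with CN in the text; and closed-form least-squares formulas replace the APL expression for A. For a normal model, Y = mean + SD × z, so the value of the line at the centre estimates the mean and the slope estimates the standard deviation.

```python
from statistics import NormalDist

Y = [54.49, 47.38, 40.78, 45.05, 50.37, 54.91, 51.28, 39.91, 53.29, 67.59,
     58.71, 42.96, 55.77, 41.31, 58.83, 48.21, 44.67, 67.72, 43.11,
     43.3, 53.02, 63.52, 45.93, 48.26, 50.51, 49.47, 43.93, 46.77, 59.12]

ys = sorted(Y)
n = len(ys)
z = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]

# Least-squares straight line ys ~ a*z + b from the closed-form normal equations.
mz, my = sum(z) / n, sum(ys) / n
a = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, ys))
     / sum((zi - mz) ** 2 for zi in z))
b = my - a * mz
A = [a, b]  # highest power first, the ordering decode expects

# Evaluate the line at the centre of the distribution, z = 0.
centre_estimate = A[0] * 0 + A[1]
print(f"slope (SD estimate)={a:.3f}  centre (mean estimate)={centre_estimate:.3f}  "
      f"direct mean={my:.3f}")
```

Because the positions i/(n+1) are symmetric about 0.5, the z values sum to (almost exactly) zero, which is why the line's value at the centre agrees with the directly computed mean.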
A function for transforming the plot positions for an
exponential distribution is defined below:
fn 'ed:r(1+*5w)1+*5'
Some sorted, exponentially distributed data from p. 271 of Ang & Tang are listed below:
SVX←200 201 203 208 212 226 248 254 274 289 306 308 332 343 360
SVX←SVX,389 408 460 531 543 559 611 772 774 787 791 842 909 946
SVX←SVX,952 952 981 1031 1122 1331 1427 1635 1844 2497 2781
An untransformed plot of these data is as follows:
SVX vs ⍋SVX
The above plot is clearly exponential in shape. A plot of the
same data transformed by the function 'ed' shows a much better
approximation to a straight line.
SVX vs ed (⍋SVX)÷1+⍴SVX ⍝ Plot vs transformed plot positions
Plot transform functions for other distributions can be developed using a technique for calibrating curvilinear models. The fit of the data is then tested by viewing a plot as described above.
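The same comparison can be made numerically in Python. This sketch computes plot positions i/(n+1) for the sorted SVX data and compares the correlation with the untransformed positions against the correlation with -ln(1-p). That transform, the standard exponential probability-paper scale (the inverse of the unit-rate exponential CDF), is swapped in for 'ed'; the correlation measure and the name `pearson_r` are assumptions of this sketch.

```python
import math

# Sorted exponentially distributed data from the text, 40 values.
SVX = [200, 201, 203, 208, 212, 226, 248, 254, 274, 289, 306, 308, 332, 343, 360,
       389, 408, 460, 531, 543, 559, 611, 772, 774, 787, 791, 842, 909, 946,
       952, 952, 981, 1031, 1122, 1331, 1427, 1635, 1844, 2497, 2781]

def pearson_r(xs, ys):
    """Pearson correlation; 1.0 means the points lie on a rising straight line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

n = len(SVX)
p = [i / (n + 1) for i in range(1, n + 1)]  # untransformed plot positions
e = [-math.log(1 - q) for q in p]            # exponential-probability scale

r_untransformed = pearson_r(p, SVX)
r_exponential = pearson_r(e, SVX)
print(f"untransformed r={r_untransformed:.4f}  exponential-scale r={r_exponential:.4f}")
```

For a (possibly shifted) exponential distribution the quantile is a straight-line function of -ln(1-p), which is why this axis should straighten the plot.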
Quantitative estimates of fit are given by the Chi-square and/or Kolmogorov-Smirnov tests.
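A minimal Python sketch of the Kolmogorov-Smirnov statistic D: the largest vertical gap between the empirical cumulative distribution of the data and a candidate model CDF. The normal model here uses the mean and population standard deviation of the rainfall data, which is an assumption of the sketch. Because those parameters are estimated from the same data, comparing D with standard K-S tables gives a conservative test. The function name `ks_statistic` is hypothetical.

```python
from statistics import NormalDist, fmean, pstdev

Y = [54.49, 47.38, 40.78, 45.05, 50.37, 54.91, 51.28, 39.91, 53.29, 67.59,
     58.71, 42.96, 55.77, 41.31, 58.83, 48.21, 44.67, 67.72, 43.11,
     43.3, 53.02, 63.52, 45.93, 48.26, 50.51, 49.47, 43.93, 46.77, 59.12]

def ks_statistic(data, cdf):
    """Kolmogorov-Smirnov D: largest gap between the empirical CDF and a model CDF."""
    xs = sorted(data)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF steps from (i-1)/n up to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

model = NormalDist(fmean(Y), pstdev(Y))
D = ks_statistic(Y, model.cdf)
print(f"D = {D:.4f} for n = {len(Y)}")
```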
The technique essentially duplicates probability paper plotting by using a transform as the horizontal axis to model the candidate probability distribution. The vertical axis can also be transformed to model the effect of transforming the data.
The technique is described in Ang and Tang Chapter 6 (DR 72669).
)erase SVX ed Y X A CN MEAN RANGE VAR SD LSN