Chapter 6: Histo(gram)  


The histo application is used to calculate and display univariate statistics for different data sets. For a sample population, histo can calculate basic statistics such as the mean and variance. It can also display the behavior of several populations at once using stacked histograms and box and whisker plots. Histo can also be used to infer information about the population (for example, whether the population is normal) using a χ² (chi-squared) test or by plotting the frequency distribution as a probability plot.

The histo application is composed of three sections (Figure 6.1): the main menu bar, the status and log text area, and the drawing or graph area. The menu bar is used to select all histo commands, the log/status area is used by the program to report important messages or results, and the drawing area is the display area for the histograms.

Figure 6.1


Menu Items
Examples
Command Line Arguments
File Formats
Mathematics
Bibliography

The Main Menu:

The main menu controls nearly all the program operations; files can be opened and saved, graphics can be plotted, the appearance of the graphic can be modified, help can be requested, and the results can be sent to the printer. For histo there are eight items on the main menu: File, Data, Style, Statistics, Graph, Plot, Log, and Help (Figure 6.1). File controls file handling (opening, saving, naming files), directs printing, and allows the user to quit the application. Data defines which columns (when appropriate) the X and Y data will be read from. Style defines the parameters associated with the appearance of each line. Statistics displays some calculated statistical values of potential interest. Graph is used to define details about the graph border, fonts, label, mesh, and line styles. Log allows the user to save, print, or view any text which has been written to the status/log window. Help gives the user a selection of pop-up help topics. Each menu item is fully described below with all the available options.



File:

The File sub-menu options control file and print handling, and exiting the program. The options include Open, View, Save, Save as, Save Preferences, Print Setup, Print, and Quit.

Open:

Selecting File:Open generates a pop-up dialog which allows the user to select an existing data file. This dialog functions exactly as the dialog in Figure 5.2 (plotgraph - Chapter 5). As with plotgraph files, the default data file name extension is "*.dat".

View:

File:View pops up a simple screen editor with the current data file.

Save Preferences:

When using programs with many user options, it is not possible for the program to always pick reasonable default values for each parameter or input variable. For this reason preference files were created (see Appendix C). These allow the user to define a unique set of "defaults" applicable to the particular project. When File:Save Preferences is selected, histo determines how all the input variables are currently defined and writes them to the file "histo.prf".

WARNING: if "histo.prf" already exists, you will be warned that it is about to be overwritten. When you press OK, the old version will be overwritten! If you do not want the old version destroyed, you must first move it to a new file. This cannot be done from within the application; execute the UNIX mv command from a UNIX prompt in another window (e.g. mv histo.prf histo.old.prf would be sufficient).

If "histo.prf" does not exist in the current directory, it is created. This is an ASCII file and can be edited by the user. See Appendix C for details.

Print Setup:

File:Print Setup works exactly as explained in Chapter 5.

Print:

File:Print generates a PostScript file of the graph and, depending on how the print options are defined in Print Setup, directs this file to the specified print queue or to the specified file.

Quit:

File:Quit terminates the program.



Data:

When the appropriate file type is being read, selecting the Data:Modify menu option will pop up the dialog shown in Figure 6.2. This dialog allows the user to select which Data Columns in the data file will be evaluated (up to 20 columns can be selected; the number of toggles reflects the number of columns in the data file). It also allows the user to specify the histogram Sizing Rule, which is used for sizing histogram bars. The options are Division Width, Number of Divisions, or Equal Percent. With the first two methods, the divisions are equally spaced; with the third method, spacing is a function of the data distribution. For Division Width, the user must specify the desired bar width (the default is 1/10 of the data range). For Number of Divisions, the user specifies how many equal divisions to divide the data range into (the default is 10). If the Equal Percent Divisions rule is selected, the Number of Divisions text field is again used to enter the number of divisions. Instead of dividing the range of the data, though, this rule divides the sorted data points into groups of equal size. For example, if the data file has 100 points and the number of divisions is 20, each histogram bar spans the extent of one group of 5 consecutive sorted points. The Starting and Ending Locations, by default, are the minimum and maximum extents of the data file. These may be redefined to more appropriate values. If the values have been reset, or are set to the bounds of a previous data set, pressing the Maximize Data Range button will reset the Starting and Ending Locations to the minimum and maximum extents of the current data set.
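The three sizing rules can be illustrated with a short Python sketch. This is not histo's source code; the function names are hypothetical, and the way histo handles a range that is not an exact multiple of the division width, or a point count that is not a multiple of the number of divisions, is an assumption here.

```python
import math

def equal_count_edges(start, end, n_div=10):
    """Number of Divisions rule: n_div equally wide bars between start and end."""
    step = (end - start) / n_div
    return [start + i * step for i in range(n_div + 1)]

def fixed_width_edges(start, end, width=None):
    """Division Width rule: bars of a fixed width (default 1/10 of the range)."""
    if width is None:
        width = (end - start) / 10.0
    # Assumption: a final partial bar is extended to a full width.
    count = math.ceil((end - start) / width - 1e-9)
    return [start + i * width for i in range(count + 1)]

def equal_percent_groups(data, n_div=10):
    """Equal Percent rule: split the sorted points into n_div groups and
    return the (minimum, maximum) extent of each group."""
    values = sorted(data)
    n = len(values)
    groups = []
    for k in range(n_div):
        first = k * n // n_div
        last = (k + 1) * n // n_div
        if last > first:  # skip empty groups when n < n_div
            groups.append((values[first], values[last - 1]))
    return groups

if __name__ == "__main__":
    print(equal_count_edges(0.0, 10.0, 5))
    print(equal_percent_groups(range(1, 101), 20)[:2])
```

With 100 points and 20 divisions, each group in the last function holds 5 consecutive sorted points, which matches the example in the text.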

Figure 6.2



Style:

The Style menu option allows the user to specify various attributes that control the appearance of the histogram graph. These are divided into three sub-menus: Plot Type, Y-Axis Type, and Transform Type.

Plot Type:

Style:Plot Type allows the user to specify the type of graph that will be plotted. Histo supports five basic plot types, some with additional options: Histograms, Box and Whisker Plots, Cumulative Distributions, 1.0 - Cumulative Distributions, and Probability Plots.

Histograms:

There are two types of histograms, Single and Stacked. The Single histogram is the default and will plot all the data onto a single graph. When just one data column has been selected for plotting (see Data above), a plot similar to that in Figure 6.3a will be drawn. If more than one data column has been selected, the histogram division width is divided by the number of active data columns. The histogram bars for each range from the different data columns are then plotted side by side (Figure 6.3b). This kind of graph can become very busy and difficult to read if more than a few data columns are plotted together. Instead of using the Single option, the histograms can be Stacked (Figure 6.4). This style will generally be easier to interpret. If only one data column has been selected, Single or Stacked will generate the same plot.

Figure 6.3a and Figure 6.3b

Figure 6.4

Box and Whisker Plots:

Box and Whisker plots are used to quickly show the mean, median, standard deviation, 25-75 percentiles, 10-90 percentiles, and the full range of the data. These plots can give the user a quick feel for the distribution of the data and whether the data is skewed. An example plot is shown in Figure 6.5a. A key to the different symbols is shown in Figure 6.5b.

Figure 6.5a and Figure 6.5b

Cumulative Distributions:

A cumulative distribution is similar to the histogram, but it starts at 0.0% on the left (the data minimum value) and increases to 100.0% on the right (the data maximum value). At any point in between, the percent (or number) of data values less than the X-axis value is plotted (Figure 6.6a). Again Single or Stacked plots can be used.

Figure 6.6a

1.0 - Cumulative Distributions:

For the 1.0 minus the cumulative distribution, the histogram represents the percent (or number) of data values greater than the X-axis value (Figure 6.6b). Like the histogram and the cumulative distribution plots, Single or Stacked plots can be used.

Figure 6.6b
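The relationship between the two cumulative plot types can be sketched in Python as follows. The function names are hypothetical, and how histo counts values exactly equal to the X-axis value is an assumption; here the complement is taken literally as 100% minus the cumulative value.

```python
from bisect import bisect_left

def cumulative_percent(data, x):
    """Percent of data values less than x."""
    values = sorted(data)
    return 100.0 * bisect_left(values, x) / len(values)

def one_minus_cumulative_percent(data, x):
    """Complement of the cumulative distribution at x.
    Assumption: values equal to x are counted on this side."""
    return 100.0 - cumulative_percent(data, x)

if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(cumulative_percent(sample, 3.0))
    print(one_minus_cumulative_percent(sample, 3.0))
```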

Probability Plots:

If a Probability Plot option is desired, select Set, or one of the Exceedence or Rank Order options (Set is just a menu short-cut). The Exceedence Type may be specified, and the Rank Order Method used to determine the frequency of occurrence of a variable value can be specified.

The Exceedence Type only affects the labeling on the X-axis. An Exceedence plot indicates the percentage of points which exceed a specified value. A Nonexceedence plot indicates the percentage of points which do not exceed a specified value. The appearance of the graphs otherwise is identical.

The Hazen and Weibull methods are two methods for determining the Rank Order of one data value within the data set. For further details see the histo Mathematics section (Equations 6-12 and 6-13), or refer to McCuen (1989).
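As a rough illustration, the two rank order methods can be sketched in Python. Equations 6-12 and 6-13 are not reproduced in this part of the chapter, so the sketch assumes the Hazen position (2i - 1) / 2n and the standard Weibull position i / (n + 1), where i is the rank of a value in the sorted data set and n is the number of values. The function name is hypothetical.

```python
def plotting_positions(data, method="hazen"):
    """Pair each sorted value with its estimated cumulative frequency."""
    values = sorted(data)
    n = len(values)
    if method == "hazen":
        probs = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]
    elif method == "weibull":
        probs = [i / (n + 1) for i in range(1, n + 1)]
    else:
        raise ValueError("method must be 'hazen' or 'weibull'")
    return list(zip(values, probs))

if __name__ == "__main__":
    for value, p in plotting_positions([12.3, 15.5, 1.3, 99.5], "weibull"):
        print(value, p)
```

Neither method assigns a frequency of exactly 0 or 1, which is what allows every point to be placed on a probability axis.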

It is common in nature for a measured parameter to have a log-normal distribution. If this is the case, a Normal probability plot will show a curved line (Figure 6.7a). From the curved line one can say that the data are not normally distributed, but little more. If Log transforming the data (set with the Style:Transform Type:Log menu option discussed below) makes the line "straight," the probability plot suggests that the data are log-normally distributed (Figure 6.7b).

Figure 6.7a and Figure 6.7b

Y-Axis Type:

Style:Y-Axis Type allows the user to specify how the frequency distribution is presented on the Y-axis. It can be specified by Count or by Percentage.

Transform Type:

Style:Transform Type allows for either Normal or Log (Base 10) transforms (Normal implies the data is unaltered). Transformed histograms are shown in Figure 6.3a (Normal) and Figure 6.8 (Log). Transformed probability plots are shown in Figures 6.7a and 6.7b.

NOTE: The transforms use the log base 10, not natural log.

Figure 6.8


Statistics:

There are two ways to display statistical information about data. One method is graphically, and the other is numerically.

Display:

The plotting routines in histo display various statistical information. Some of the items for different style plots can be selected using the Statistics:Display menu option which creates the pop-up dialog in Figure 6.9. Note, these items do not apply to all plot styles. For histograms and cumulative histograms, the Mean, Median, and Normal Distribution Curve can be turned on or off. In a box and whisker plot, the Mean, Median, 25-75 Percentiles, 10-90 Percentiles, and the Minimum and Maximum Extents can be turned on or off. In addition zero to three Standard Deviations can be displayed.

Figure 6.9

Tabulated:

Statistics:Tabulated will display the informational dialog shown in Figure 6.10. This dialog gives statistical information about the data set and the histogram: the data set minimum and maximum, mean, median, variance, standard deviation, average deviation, skew, kurtosis, χ² test result, and the 10th, 25th, 75th, and 90th percentiles (see the histo Mathematics section on the calculation of these values). The dialog also displays the number of data points and the extents of the data set. Note that if the data have been log transformed, these values represent the appropriate statistics or range based on the log value of each data point. A copy of the statistics can be printed to the log/status window by pressing the Post Statistics to Log/Status Window button. Pressing the Post All Columns button will print the statistics for all the selected data columns (see the Data section) to the log/status window. To view different columns or data sets from the data file, press the Previous or Next Active Data Set buttons.

Figure 6.10


Graph:

Graph allows the user to specify various attributes of the graph's appearance: the Border, Fonts, Labels, Mesh, and Line Styles.

Border:

Graph:Border is described in Chapter 5 in the Graph:Border section (Figure 5.9).

Fonts:

Graph:Fonts is described in Chapter 5 in the Graph:Fonts section (Figures 5.10 and 5.11).

Labels:

Graph:Labels is described in Chapter 5 in the Graph:Labels section (Figure 5.12).

Line Styles:

This option is similar to that used in plotgraph but instead of changing the attributes associated with a line, attributes are changed with regard to the histogram bars, the mean data value line, and the standard deviation bars. They are not truly lines but they are treated as such:

Line #1 = Mean Value Data Line
Line #2 = Standard Deviation Error-Bars
Line #3+ = Histogram bars

This dialog is described in the Graph:Style section of Chapter 5 (Figure 5.14).

Mesh:

Graph:Mesh is described in Chapter 5 in the Graph:Mesh section (Figure 5.13).



Plot:

Plot is described in Chapter 5 in the Plot section.



Log:

The Log menu option is supplied to allow the user to save, view, or print all text which has been written to the log/status window by the program or added by the user (The log window is also a simple text editor). The options include View Log, Save, Save as, Clear, and Print. View Log, Save, and Save as are similar in operation to the menu options under File described above.

When calculating the frequency distribution for the histogram, the column number (Pos.), X position, and frequency of occurrence within each histogram bar are reported to the log window. NOTE: All calculations are maintained in the log window, and the most recent are presented at the top of the log. An example is shown below for data1.dat using 10 divisions:

	Calculation #6
	   Number of Divisions = 10
	 Pos.       X       Frequency
	-------------------------------------
	     1:      -4.24          9
	     2:      -3.69         29
	     3:      -3.14         82
	     4:      -2.6         132
	     5:      -2.05        138
	     6:      -1.51         89
	     7:      -0.959        56
	     8:      -0.413        27
	     9:       0.133         9
	    10:       0.68          5



Help:

Help works exactly as explained in the Help section of Chapter 5 (plotgraph, Figure 5.15).


Zoom and Mouse Control:

Using the mouse in histo is much the same as described for plotgraph (Chapter 5). The mouse can be used to refresh the plot display and zoom in exactly the same manner. In addition to the mouse position on the plot, which is shown in the upper left of the drawing area, the value of the histogram bar (if appropriate) relative to the size of the data set is displayed. This is expressed as the number of points in the histogram bar over the total number of points in the data set (depending on how the Style:Y-Axis Type option is set, it may instead be expressed as a percentage).


Example of Using Histo:

Using histo is quite straightforward. Once a file has been loaded a graph is generated; most of the program options control only the appearance of the graph.

There are three methods to load a file in histo. The first is to execute histo from the UNIX prompt and open the file from the menu, the second is to pass the file as a command line argument, and the third is to define the file name in the program preference file. To open a file from the main menu, execute histo from the UNIX prompt:

> histo

Once in the application, select the File:Open menu option. The pop-up dialog shown in Figure 5.2 will appear. Select the desired file. Once a file has been selected the graph of the data will be drawn. To open a file from the command line, enter at the UNIX prompt:

> histo [optional arguments] filename

For example:

> histo data1.dat

will open the graph file shown in Figure 6.8, and

> histo -tt 1 data1.dat

will open the same data file, but will specify that the x-axis is a log axis. NOTE: in both Figures 6.4 and 6.8, variables other than those passed on the command line were defined. These variables could have been set using the menus, or using a preference file (Appendix C). A preference file is used to define user-preferred default values for variables. Every time histo runs, it searches the current working directory for the file histo.prf. If it exists, histo reads the file and sets the variables as specified. This is the third way to open a file, because one of the arguments in the preference file is the name of the graph file.


Running From the Command Line:

In many cases it is more convenient to run the application completely from the command line, or at least to pass some parameter values in from the command line. The options listed below allow the user to accomplish from the command line almost anything that is possible from within the X-windows application (adding lines from different files is not currently supported). This feature can be useful when the user does not have an X-windows/Motif terminal available, or when many graphs need to be processed quickly and the operation can be completed in batch mode without user interaction.

Syntax:

histo [-d1090 #] [-d2575 #] [-dc1 to -d20 #] [-dive #] [-divn #] [-divr #] [-dm #] [-dme #] [-dmm #] [-dsd #] [-esp #] [-exceed #] [-fnt1 " "] [-fnt2 " "] [-fnt3 " "] [-fnt4 " "] [-fnt5 " "] [-fnt6 " "] [-fnts1 #.#] [-fnts2 #.#] [-fnts3 #.#] [-fnts4 #.#] [-fnts5 #.#] [-fnts6 #.#] [-ft #] [-gst #] [-help] [-lc {#}] [-lgf " "] [-lpbm #.#] [-lpc #] [-lpd #] [-lpf " "] [-lph #] [-lplm #.#] [-lppsext " "] [-lpo #] [-lpq " "] [-lpr] [-lprm #.#] [-lps #] [-lptm #.#] [-lty {#}] [-ltk {#.#}] [-md #] [-mox #.#] [-moy #.#] [-ms #] [-mx #.#] [-my #.#] [-nt #] [-prf " "] [-pt #] [-rfh #] [-ro #] [-sfl {#}] [-ssz {#.#}] [-sttl " "] [-sty {#}] [-tt #] [-ttl " "] [-xfmt " "] [-xlabel " "] [-xmax #.#] [-xmin #.#] [-xMt #.#] [-xmt #] [-xto #.#] [-xy #.#] [-yfmt " "] [-ylabel " "] [-ymax #.#] [-ymin #.#] [-yMt #.#] [-ymt #] [-ys #.#] [-yto #.#] [filename]

Meaning of flag symbols:

# = integer
#.# = float
" " = character string.
{} = variable is an array. Values must be separated by a ',' and no spaces are allowed. Do not use the "{ }" symbols on the command line.

NOTES:

1). All parameters in [] brackets are optional.
2). Quotes must be used around character strings.
3). Filename, if given, must be listed last.
4). If no default is given, the feature is not currently supported on command line.

If a flag requires no entry, giving the flag alone executes its command.
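For batch processing, the syntax rules above can be applied when assembling a command from a script. The following Python sketch is only an illustration: the helper name histo_command is hypothetical, it uses only flags that appear in the Syntax listing, and it builds the command without running it.

```python
import shlex

def histo_command(filename, **flags):
    """Build an argument list for histo from flag names and values."""
    args = ["histo"]
    for name, value in flags.items():
        args.append("-" + name)
        if isinstance(value, (list, tuple)):
            # Array flags: comma-separated, no spaces, no braces.
            args.append(",".join(str(v) for v in value))
        elif value is not None:
            # A value of None marks a flag that takes no entry (e.g. -lpr).
            args.append(str(value))
    args.append(filename)  # the file name must be listed last
    return args

if __name__ == "__main__":
    argv = histo_command("data1.dat", divn=20, lc=[2, 3, 4], ttl="Sample Data")
    # shlex.join adds the quotes needed around strings containing spaces.
    print(shlex.join(argv))
```

Passing the list to a process launcher avoids shell quoting problems; the joined string is what would be typed at the UNIX prompt.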

Flag Definitions:

-d1090 = draw 10-90 percentile default = 1
    0 = False
    1 = True
-d2575 = draw 25-75 percentile default = 1
    0 = False
    1 = True
-dc1 to -d20 = active data column default = 1 (1), 0 (2-20)
    0 = False
    1 = True
-dive = histogram bar ending location default = data maximum
-divn = number of divisions (histogram bars) default = 10
-divr = division method default = 0
    0 = number of divisions
    1 = equal division width
    2 = equal percentage divisions
-divs = histogram bar starting location default = data minimum
-divw = division width (histogram bars) default = data range / 10.0
-dm = draw mean default = 1
    0 = False
    1 = True
-dme = draw median default = 1
    0 = False
    1 = True
-dmm = draw minimum and maximum data extents default = 1
    0 = False
    1 = True
-dsd = draw standard deviation default = 1
    0 = False
    1 = True
-esp = exaggeration scale priority default = 0
    0 = favor y-exaggeration scale (-ys)
    1 = favor x/y ratio
-exceed = exceedence or nonexceedence switch default = 0
    0 = Exceedence
    1 = Nonexceedence
-fnt1 = main title font default = Helvetica-Bold
-fnt2 = secondary title font default = Helvetica-Bold
-fnt3 = axes label font default = Helvetica
-fnt4 = division font default = Helvetica
-fnt5 = annotation font default = Helvetica
-fnt6 = mouse position font default = Helvetica
-fnts1 = main title font size default = 24.0
-fnts2 = secondary title font size default = 15.0
-fnts3 = axes label font size default = 15.0
-fnts4 = division font size default = 12.0
-fnts5 = annotation font size default = 10.0
-fnts6 = mouse position font size default = 12.0
-ft = frequency type default = 0
    0 = Count
    1 = Percentage
-gst = graph style (single or stacked) default = 0
    0 = single
    1 = stacked
-help = give this help menu
-lc {} = line color default = variable
    0 = Black
    1 = White
    2 = Red
    3 = Green
    4 = Blue
    5 = Magenta
    6 = Yellow
    7 = Cyan
-lgf = log file name default = "log.dat"
-lglp = line legend position default = 1
    0 = Top left
    1 = Top right
    2 = Bottom left
    3 = Bottom right
-lgmw = maximum line legend width default = 200
-lpbm = page bottom margin default = 1.5
-lpc = number of copies to print default = 1
-lpd = print destination default = 0
    0 = Printer
    1 = File
-lpf = print filename default = "junk.ps"
-lph = print header page default = 0
    0 = False
    1 = True
-lplm = page left margin default = 1.5
-lpo = print orientation default = 0
    0 = Portrait
    1 = Landscape
-lppsext = search extension for postscript files default = "*.ps"
-lpq = print queue default = "ps"
-lpr = print file at specified orientations
-lprm = page right margin default = 1.0
-lps = print output default = 0
    0 = Black & white
    1 = Color
-lptm = page top margin default = 1.5
-lsfl {} = fill line symbol default = 0
    0 = False
    1 = True
-lsc {} = line symbol color default = variable
    0 = Black
    1 = White
    2 = Red
    3 = Green
    4 = Blue
    5 = Magenta
    6 = Yellow
    7 = Cyan
-lssz {} = line symbol size default = 9.0
-lsty {} = line symbol type default = 0
    -1 = No Symbol
    0 = Circle
    1 = Cross
    2 = Diamond
    3 = Square
    4 = X
-ltk {} = line thickness default = 1.0
-lty {} = line type default = 0
    -1 = No Line
    0 = Solid
    1 = Dashed
    2 = Double Dashed
-md = dash mesh default = 0
    0 = False
    1 = True
-mox = X mesh origin default = 0.0
-moy = Y mesh origin default = 0.0
-ms = use mesh default = 0
    0 = False
    1 = True
-mx = X mesh frequency default = 1/10 DX
-my = Y mesh frequency default = 1/10 DY
-nt = show normal curve(s) default = 1
    0 = Show
    1 = Hide
-prf = preference file name default = "histo.prf"
-pt = plot type default = 0
    0 = Histogram
    1 = Box and Whisker Plot
    2 = Cumulative Distribution Function
    3 = 1 - CDF
    4 = Probability
-rfh = screen refresh default = 0
    0 = On exposure
    1 = On update
-ro = rank order option default = 1
    0 = Hazen (2i - 1) / 2n
    1 = Weibull i / (n + 1)
-se = series file ending ID default = last series ID
-ss = series file starting ID default = 1
-sttl = Secondary title default = " "
-tt = transform type default = 0
    0 = Normal
    1 = Log (Base 10)
-ttl = Main title default = Filename
-xfmt = Number of decimal places for X-axis default = ".2f"
-xlabel = X-axis label default = "X"
-xmax = Graph X-maximum default = Data Maximum
-xmin = Graph X-minimum default = Data Minimum
-xMt = X main tic frequency default = 1/10 DX
-xmt = Number of minor X tics default = 5
-xto = X axis label origin default = 0.0
-xy = xy ratio default = 1.5
-yfmt = Number of decimal places for Y-axis default = ".2f"
-ylabel = Y-axis label default = "Y"
-ymax = Graph Y-maximum default = Data Maximum
-ymin = Graph Y-minimum default = Data Minimum
-yMt = Y main tic frequency default = 1/10 DY
-ymt = Number of minor Y tics default = 5
-ys = Y-axis exaggeration relative to X-axis default = Calculated
-yto = Y axis label origin default = 0.0

An example command might be (typed on one line):

histo -lpr -xmin 0.0 -ymin 0.0 -ymax 12.0 -md 1 -ms 1 -mx 1.0 -my 1.0 -xMt 1.0 -yMt 1.0 -ttl "Semivariogram of Elevation Data" -sttl "UNCERT histo module" -xlabel "distance (feet)" -ylabel "gamma h" -xfmt ".1f" -yfmt ".1f" -esp 1 -xy 1.0 water.out


Setting up the Input Data File:

Two basic types of files can be read by histo. The first type is simple column data (*.dat); the second is gridded data files (Chapters 11 and 13).

Column:

No Header Column Data:

Data files using the column format may have one or more columns of data. The first line in the data file determines how many columns there will be throughout the file. Each following line must have at least as many columns as the first line (extra columns are ignored). On each line, there must be an entry for every column; there is no accommodation for NULL or blank values. Each value on a line must be separated by a space; the file is unformatted. A sample data set might look like this:

		1.0	23.23	0.123	1.45
		2.21	12.34	0.00123	1.56
		3.31	12.98	0.231	2.34
		4.56	8.21	0.345	1.76
		5.12	10.92	0.456	1.43
		.	.	.	.
		.	.	.	.
		.	.	.	.

This data set has four columns. Each column could be analyzed by selecting and plotting the data using the Data Column option under Data:Modify.
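The column rules described above can be sketched as a small Python reader. This is an illustrative sketch rather than histo's own parser: the function name is hypothetical, and it assumes that every entry is numeric and that blank lines may be skipped.

```python
def read_columns(lines):
    """Read whitespace-separated column data into a list of columns."""
    columns = None
    for line_no, line in enumerate(lines, start=1):
        fields = line.split()
        if not fields:
            continue  # assumption: blank lines are skipped
        if columns is None:
            # The first data line fixes the number of columns.
            columns = [[] for _ in fields]
        if len(fields) < len(columns):
            raise ValueError(f"line {line_no}: too few columns")
        # zip stops at the column count, so extra entries are ignored.
        for column, field in zip(columns, fields):
            column.append(float(field))
    return columns or []

if __name__ == "__main__":
    sample = [
        "1.0   23.23  0.123    1.45",
        "2.21  12.34  0.00123  1.56",
        "3.31  12.98  0.231    2.34",
    ]
    print(read_columns(sample)[1])
```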

NOTE: This is the only file format available in histo which allows character data. See the notes in Chapter 5 about file format restrictions.

Variable Length Columns:

Normally a file must have an entry for every column in the data file. Using the VARIABLE LENGTH SETS format, this restriction can be bypassed. An example file would look like:

	VARIABLE LENGTH SETS
	2		Number of data sets
	4		Points in data set #1
	12.3
	15.5
	1.3
	99.5
	3		Points in data set #2
	0.012
	0.435
	0.098

The format is:

LINE 1 : VARIABLE LENGTH SETS
LINE 2 : Number of data sets

For each set:

LINE S1 : Number of points in set
LINE S2-... : Sample number.
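A minimal Python sketch of a reader for this layout is shown below. The function name is hypothetical, and the sketch assumes that only the first token on each count or value line is significant, so descriptive text such as "Number of data sets" after a count is ignored.

```python
def read_variable_length_sets(lines):
    """Parse a VARIABLE LENGTH SETS file into a list of data sets."""
    rows = [line.split() for line in lines if line.strip()]
    if not rows or " ".join(rows[0]) != "VARIABLE LENGTH SETS":
        raise ValueError("missing VARIABLE LENGTH SETS header")
    n_sets = int(rows[1][0])
    pos = 2
    data_sets = []
    for _ in range(n_sets):
        n_points = int(rows[pos][0])  # LINE S1: number of points in set
        pos += 1
        values = [float(rows[pos + k][0]) for k in range(n_points)]
        data_sets.append(values)
        pos += n_points
    return data_sets

if __name__ == "__main__":
    example = """VARIABLE LENGTH SETS
2   Number of data sets
4   Points in data set #1
12.3
15.5
1.3
99.5
3   Points in data set #2
0.012
0.435
0.098
""".splitlines()
    print(read_variable_length_sets(example))
```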

GEO-EAS/GSLIB:

As described in Chapter 5, histo will support GEO-EAS file formats.

Gridded:

In UNCERT many gridded data sets are generated and used in modeling. Often it is important to examine the statistics of the data sets or grids. These data sets are directly readable by histo. For a complete description of the file formats for *.srf and *.bck files see Chapters 11 and 13.


Histo Mathematics:

Classical Statistics:

In examining the raw data there are a number of values which are of interest in this type of analysis. These are the mean, variance, skew, and kurtosis (these are the 1st, 2nd, 3rd, and 4th order moments of the data). Also of interest is whether the data are normally distributed, or can be transformed into a normal distribution. This is a fundamental assumption of kriging, and significant violations may lead to unreasonable results. To check that the data are normally distributed, this package supports probability plots and the chi-squared (χ²) test. These tools allow the user to determine if the data are likely to be normally distributed. If they appear not to be normally distributed, a logarithmic transform or another type of transform may be appropriate. By viewing histograms of the data, a bimodal distribution may be identified, which suggests there is more than one population of data in the data set (i.e. more than one process controlled the values sampled). If the data are bimodally distributed, it may be possible to separate the populations and check the normality of each population.

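For a data set with n samples, the sample mean ($\bar{x}$) is calculated as:

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \qquad (6\text{-}1)$$

where $x_i$ = an individual sample value. The median (M) is calculated by (Press et al, 1992):

$$M = \begin{cases} x_{(n+1)/2} & n \text{ odd} \\ \frac{1}{2}\left(x_{n/2} + x_{n/2+1}\right) & n \text{ even} \end{cases} \qquad (6\text{-}2)$$

with the samples sorted in ascending order.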

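The unbiased sample variance ($s^2$) may be calculated as (McCuen, 1989):

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2 \qquad (6\text{-}3)$$

and the standard deviation (s) is defined as:

$$s = \sqrt{s^2} \qquad (6\text{-}4)$$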

The standardized skew (g) is defined as (Press et al, 1992):

$$g = \frac{1}{n} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^3 \qquad (6\text{-}5)$$

where the skew is a measure of symmetry. A symmetric distribution will have a skew of zero, and non-symmetric skews will be positive or negative, as shown in Figure 6.11a (McCuen, 1989). The 4th order moment, kurtosis (k), is defined (Press et al, 1992):

$$k = \left[\frac{1}{n} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4\right] - 3 \qquad (6\text{-}6)$$

The kurtosis is a measure of the peakedness or flatness of the distribution relative to a normal distribution. A positive kurtosis reflects a peaked distribution (leptokurtic) and a negative kurtosis reflects a relatively flat distribution (platykurtic) (Figure 6.11b, Press et al, 1992).

Figure 6.11a and Figure 6.11b
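The moment calculations described above can be sketched in a few lines of Python. This is an illustrative sketch, not histo's source: the function name is hypothetical, and the skew and kurtosis follow the Press et al. (1992) forms, with the kurtosis taken relative to a normal distribution (3 subtracted) so that its sign matches the description above. It assumes at least two samples that are not all equal.

```python
import math

def summary_statistics(data):
    """Return the mean, median, variance, standard deviation, skew and kurtosis."""
    x = sorted(data)
    n = len(x)
    mean = sum(x) / n
    if n % 2:  # odd count: middle value of the sorted data
        median = x[n // 2]
    else:      # even count: average of the two middle values
        median = 0.5 * (x[n // 2 - 1] + x[n // 2])
    variance = sum((v - mean) ** 2 for v in x) / (n - 1)  # unbiased
    sd = math.sqrt(variance)
    skew = sum(((v - mean) / sd) ** 3 for v in x) / n
    kurtosis = sum(((v - mean) / sd) ** 4 for v in x) / n - 3.0
    return {"mean": mean, "median": median, "variance": variance,
            "sd": sd, "skew": skew, "kurtosis": kurtosis}

if __name__ == "__main__":
    for name, value in summary_statistics([1.0, 2.0, 3.0, 4.0, 5.0]).items():
        print(f"{name:>9s} = {value:.4f}")
```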

In addition to these statistical terms, the 10th, 25th, 75th, and 90th percentiles are often calculated. These are simply read from the sorted data set.

10th = $x_{0.1n}$

25th = $x_{0.25n}$

75th = $x_{0.75n}$

90th = $x_{0.9n}$
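Reading a percentile from the sorted data can be sketched in Python as follows. The function name is hypothetical, and the handling of a position n*fraction that is not a whole number (truncated here, with the position counted from 1) is an assumption; the text does not say how histo rounds it.

```python
def percentile_from_sorted(data, fraction):
    """Return the value at position n*fraction in the sorted data set."""
    values = sorted(data)
    n = len(values)
    position = int(n * fraction)         # assumption: truncate to a whole position
    position = min(max(position, 1), n)  # keep the position inside the data set
    return values[position - 1]          # positions count from 1

if __name__ == "__main__":
    sample = list(range(1, 101))
    for fraction in (0.10, 0.25, 0.75, 0.90):
        print(fraction, percentile_from_sorted(sample, fraction))
```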

Calculation of Normal-Distribution and the Development of a Probability Distribution Graph Axis:

Normal-distribution with associated z values:

Once the moments for the data have been calculated, it is important to determine whether the data are normally distributed, or whether they must be transformed or split so that they may be treated as normally distributed. One test for this is the χ² test; it is calculated as:

$$\chi^2 = \sum_{i} \frac{\left(O_i - E_i\right)^2}{E_i} \qquad (6\text{-}7)$$

where $O_i$ = the observed frequency of values over a range, and $E_i$ = the expected frequency of values over a range. To determine the expected frequency, the normal distribution itself must be evaluated. The expected frequency is based on the probability that samples will occur between two values. It is calculated by:

(6-8)(6-8)

(6-9)(6-9)

(6-10)(6-10)
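As an illustration of the procedure, the Python sketch below evaluates the χ² statistic for a set of histogram bars. It does not reproduce Equations 6-8 to 6-10; it assumes the standard approach of standardizing each bar boundary as z = (x - mean) / s, evaluating the cumulative normal probability with the error function instead of a table, and multiplying the probability between the two boundaries by the number of samples. The function names are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative probability of the standard normal distribution at z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def chi_squared(observed, edges, mean, sd):
    """Chi-squared statistic for observed bar counts against a normal
    distribution with the given mean and standard deviation.
    edges holds len(observed) + 1 bar boundaries in ascending order."""
    n = sum(observed)
    total = 0.0
    for count, lower, upper in zip(observed, edges[:-1], edges[1:]):
        p = normal_cdf((upper - mean) / sd) - normal_cdf((lower - mean) / sd)
        expected = n * p
        if expected > 0.0:  # skip bars with no expected samples
            total += (count - expected) ** 2 / expected
    return total

if __name__ == "__main__":
    print(round(normal_cdf(-1.0), 4))
    print(chi_squared([16, 34, 34, 16], [-10.0, -1.0, 0.0, 1.0, 10.0], 0.0, 1.0))
```

A small statistic indicates that the observed counts are close to those expected from a normal distribution.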

Note that Equation 6-9 cannot be directly integrated and must be estimated numerically or evaluated from tabulated values. Below is a table of calculated probabilities for given z values. The table used by the software holds double precision values that are not truncated at the fourth decimal place. These values agree with tables by McCuen (1989) to the fourth significant digit; the differences at the fifth place and beyond may only be in rounding.

------+-------------------------------------------------------------------------------
  Z   |  0.00    0.01    0.02    0.03    0.04    0.05    0.06    0.07    0.08    0.09
------+-------------------------------------------------------------------------------
-3.90 | 0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000 
-3.80 | 0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000  0.0000   
-3.70 | 0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0000  0.0000  
-3.60 | 0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  0.0001  
-3.50 | 0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0002  0.0001  0.0001  0.0001  
      |
-3.40 | 0.0003  0.0003  0.0003  0.0003  0.0003  0.0002  0.0002  0.0002  0.0002  0.0002  
-3.30 | 0.0005  0.0004  0.0004  0.0004  0.0004  0.0004  0.0004  0.0003  0.0003  0.0003  
-3.20 | 0.0007  0.0006  0.0006  0.0006  0.0006  0.0005  0.0005  0.0005  0.0005  0.0005  
-3.10 | 0.0009  0.0009  0.0009  0.0008  0.0008  0.0008  0.0008  0.0007  0.0007  0.0007  
-3.00 | 0.0013  0.0013  0.0012  0.0012  0.0012  0.0011  0.0011  0.0010  0.0010  0.0010  
      |
-2.90 | 0.0018  0.0018  0.0017  0.0017  0.0016  0.0016  0.0015  0.0015  0.0014  0.0014  
-2.80 | 0.0025  0.0024  0.0024  0.0023  0.0022  0.0022  0.0021  0.0020  0.0020  0.0019  
-2.70 | 0.0034  0.0033  0.0032  0.0031  0.0030  0.0030  0.0029  0.0028  0.0027  0.0026  
-2.60 | 0.0046  0.0045  0.0044  0.0042  0.0041  0.0040  0.0039  0.0038  0.0037  0.0035  
-2.50 | 0.0062  0.0060  0.0058  0.0057  0.0055  0.0054  0.0052  0.0051  0.0049  0.0048  
      |
-2.40 | 0.0082  0.0080  0.0077  0.0075  0.0073  0.0071  0.0069  0.0067  0.0065  0.0064  
-2.30 | 0.0107  0.0104  0.0102  0.0099  0.0096  0.0094  0.0091  0.0089  0.0086  0.0084  
-2.20 | 0.0139  0.0135  0.0132  0.0129  0.0125  0.0122  0.0119  0.0116  0.0113  0.0110  
-2.10 | 0.0179  0.0174  0.0170  0.0166  0.0162  0.0158  0.0154  0.0150  0.0146  0.0142  
-2.00 | 0.0227  0.0222  0.0217  0.0212  0.0207  0.0202  0.0197  0.0192  0.0188  0.0183  
      |
-1.90 | 0.0287  0.0281  0.0274  0.0268  0.0262  0.0256  0.0250  0.0244  0.0238  0.0233  
-1.80 | 0.0359  0.0351  0.0344  0.0336  0.0329  0.0322  0.0314  0.0307  0.0301  0.0294  
-1.70 | 0.0446  0.0436  0.0427  0.0418  0.0409  0.0401  0.0392  0.0384  0.0375  0.0367  
-1.60 | 0.0548  0.0537  0.0526  0.0516  0.0505  0.0495  0.0485  0.0475  0.0465  0.0455  
-1.50 | 0.0668  0.0655  0.0643  0.0630  0.0618  0.0606  0.0594  0.0582  0.0571  0.0559  
      |
-1.40 | 0.0808  0.0793  0.0778  0.0764  0.0750  0.0736  0.0722  0.0708  0.0695  0.0681  
-1.30 | 0.0968  0.0951  0.0935  0.0918  0.0902  0.0885  0.0869  0.0854  0.0838  0.0823  
-1.20 | 0.1151  0.1132  0.1113  0.1094  0.1075  0.1057  0.1039  0.1021  0.1003  0.0986  
-1.10 | 0.1357  0.1336  0.1314  0.1293  0.1272  0.1251  0.1231  0.1211  0.1191  0.1171  
-1.00 | 0.1587  0.1563  0.1539  0.1516  0.1492  0.1469  0.1446  0.1424  0.1401  0.1379  
      |
-0.90 | 0.1841  0.1815  0.1789  0.1763  0.1737  0.1711  0.1686  0.1661  0.1636  0.1612  
-0.80 | 0.2119  0.2091  0.2062  0.2033  0.2005  0.1977  0.1950  0.1922  0.1895  0.1868  
-0.70 | 0.2421  0.2389  0.2359  0.2328  0.2297  0.2267  0.2237  0.2207  0.2178  0.2148  
-0.60 | 0.2743  0.2710  0.2677  0.2644  0.2612  0.2579  0.2547  0.2515  0.2483  0.2452  
-0.50 | 0.3086  0.3051  0.3016  0.2982  0.2947  0.2913  0.2878  0.2844  0.2811  0.2777  
      |
-0.40 | 0.3447  0.3410  0.3373  0.3337  0.3301  0.3265  0.3229  0.3193  0.3157  0.3122  
-0.30 | 0.3822  0.3784  0.3746  0.3708  0.3670  0.3633  0.3595  0.3558  0.3521  0.3484  
-0.20 | 0.4208  0.4169  0.4130  0.4091  0.4053  0.4014  0.3975  0.3937  0.3898  0.3860  
-0.10 | 0.4603  0.4563  0.4523  0.4484  0.4444  0.4405  0.4365  0.4326  0.4287  0.4248  
 0.00 | 0.5000  0.4961  0.4921  0.4881  0.4841  0.4802  0.4762  0.4722  0.4682  0.4642  
      |
 0.00 | 0.5000  0.5039  0.5079  0.5119  0.5159  0.5198  0.5238  0.5278  0.5318  0.5358  
 0.10 | 0.5397  0.5437  0.5477  0.5516  0.5556  0.5595  0.5635  0.5674  0.5713  0.5752  
 0.20 | 0.5792  0.5831  0.5870  0.5909  0.5947  0.5986  0.6025  0.6063  0.6102  0.6140  
 0.30 | 0.6178  0.6216  0.6254  0.6292  0.6330  0.6367  0.6405  0.6442  0.6479  0.6516  
 0.40 | 0.6553  0.6590  0.6627  0.6663  0.6699  0.6735  0.6771  0.6807  0.6843  0.6878  
      |
 0.50 | 0.6914  0.6949  0.6984  0.7018  0.7053  0.7087  0.7122  0.7156  0.7189  0.7223  
 0.60 | 0.7257  0.7290  0.7323  0.7356  0.7388  0.7421  0.7453  0.7485  0.7517  0.7548  
 0.70 | 0.7579  0.7611  0.7641  0.7672  0.7703  0.7733  0.7763  0.7793  0.7822  0.7852  
 0.80 | 0.7881  0.7909  0.7938  0.7967  0.7995  0.8023  0.8050  0.8078  0.8105  0.8132  
 0.90 | 0.8159  0.8185  0.8211  0.8237  0.8263  0.8289  0.8314  0.8339  0.8364  0.8388  
      |
 1.00 | 0.8413  0.8437  0.8461  0.8484  0.8508  0.8531  0.8554  0.8576  0.8599  0.8621  
 1.10 | 0.8643  0.8664  0.8686  0.8707  0.8728  0.8749  0.8769  0.8789  0.8809  0.8829  
 1.20 | 0.8849  0.8868  0.8887  0.8906  0.8925  0.8943  0.8961  0.8979  0.8997  0.9014  
 1.30 | 0.9032  0.9049  0.9065  0.9082  0.9098  0.9115  0.9131  0.9146  0.9162  0.9177  
 1.40 | 0.9192  0.9207  0.9222  0.9236  0.9250  0.9264  0.9278  0.9292  0.9305  0.9319  
      |
 1.50 | 0.9332  0.9345  0.9357  0.9370  0.9382  0.9394  0.9406  0.9418  0.9429  0.9441  
 1.60 | 0.9452  0.9463  0.9474  0.9484  0.9495  0.9505  0.9515  0.9525  0.9535  0.9545  
 1.70 | 0.9554  0.9564  0.9573  0.9582  0.9591  0.9599  0.9608  0.9616  0.9625  0.9633  
 1.80 | 0.9641  0.9649  0.9656  0.9664  0.9671  0.9678  0.9686  0.9693  0.9699  0.9706  
 1.90 | 0.9713  0.9719  0.9726  0.9732  0.9738  0.9744  0.9750  0.9756  0.9762  0.9767  
      |
 2.00 | 0.9773  0.9778  0.9783  0.9788  0.9793  0.9798  0.9803  0.9808  0.9812  0.9817  
 2.10 | 0.9821  0.9826  0.9830  0.9834  0.9838  0.9842  0.9846  0.9850  0.9854  0.9858  
 2.20 | 0.9861  0.9865  0.9868  0.9871  0.9875  0.9878  0.9881  0.9884  0.9887  0.9890  
 2.30 | 0.9893  0.9896  0.9898  0.9901  0.9904  0.9906  0.9909  0.9911  0.9914  0.9916  
 2.40 | 0.9918  0.9920  0.9923  0.9925  0.9927  0.9929  0.9931  0.9933  0.9935  0.9936  
      |
 2.50 | 0.9938  0.9940  0.9942  0.9943  0.9945  0.9946  0.9948  0.9949  0.9951  0.9952  
 2.60 | 0.9954  0.9955  0.9956  0.9958  0.9959  0.9960  0.9961  0.9962  0.9963  0.9965  
 2.70 | 0.9966  0.9967  0.9968  0.9969  0.9970  0.9970  0.9971  0.9972  0.9973  0.9974  
 2.80 | 0.9975  0.9976  0.9976  0.9977  0.9978  0.9978  0.9979  0.9980  0.9980  0.9981  
 2.90 | 0.9982  0.9982  0.9983  0.9983  0.9984  0.9984  0.9985  0.9985  0.9986  0.9986  
      |
 3.00 | 0.9987  0.9987  0.9988  0.9988  0.9988  0.9989  0.9989  0.9990  0.9990  0.9990  
 3.10 | 0.9991  0.9991  0.9991  0.9992  0.9992  0.9992  0.9992  0.9993  0.9993  0.9993  
 3.20 | 0.9993  0.9994  0.9994  0.9994  0.9994  0.9995  0.9995  0.9995  0.9995  0.9995  
 3.30 | 0.9995  0.9996  0.9996  0.9996  0.9996  0.9996  0.9996  0.9997  0.9997  0.9997  
 3.40 | 0.9997  0.9997  0.9997  0.9997  0.9997  0.9998  0.9998  0.9998  0.9998  0.9998  
      |
 3.50 | 0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9998  0.9999  0.9999  0.9999  
 3.60 | 0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  
 3.70 | 0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  0.9999  1.0000  1.0000  
 3.80 | 1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  
 3.90 | 1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000  1.0000 

The probability that y of n samples occur in a given range is calculated by (McCuen, 1989):

[Equation 6-11]

From this relationship, normality can be evaluated with the χ² (chi-squared) test, and the distribution can also be used to develop probability plots. (NOTE: On probability paper the probability axis is neither linear nor logarithmic. Its scale is determined as a function of z and p(z).)
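The relationship between z and p(z) tabulated above can be sketched in a few lines of Python. This is an illustration only, not the code histo uses: the function names normal_cdf and probability_axis_position are hypothetical, the cumulative probability is computed from the error function, and the axis position is taken to be the standard normal deviate z at which the cumulative probability equals p. Values computed this way may differ from the table in the fourth decimal place.

```python
import math
from statistics import NormalDist


def normal_cdf(z: float) -> float:
    """Cumulative probability p(z) of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probability_axis_position(p: float) -> float:
    """Standard normal deviate z whose cumulative probability is p.

    Assumption: a probability axis places the tick for probability p
    at this z value, which is why the axis is neither linear nor
    logarithmic in p.
    """
    return NormalDist().inv_cdf(p)


if __name__ == "__main__":
    # Forward direction: z to cumulative probability (compare with the table).
    for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
        print(f"z = {z:5.2f}   p(z) = {normal_cdf(z):.4f}")

    # Reverse direction: where common probability ticks fall on the axis.
    for p in (0.01, 0.10, 0.50, 0.90, 0.99):
        print(f"p = {p:4.2f}   z = {probability_axis_position(p):7.4f}")
```

Equal steps in probability map to unequal steps in z, so ticks near 0.01 and 0.99 are spread out while ticks near 0.50 are compressed. Normally distributed data therefore plot as a straight line on such an axis.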

To develop the probability plot, the data must first be rank-ordered (sorted). Two common methods for assigning a probability to each rank are those of Weibull (p_w) and Hazen (p_h) (McCuen, 1989). The expected value for a given rank is calculated as:

$$p_w = \frac{i}{n + 1} \qquad (6\text{-}12)$$

$$p_h = \frac{2i - 1}{2n} \qquad (6\text{-}13)$$

where n is the number of samples and i is the rank-order of the given sample. These methods generate slightly different results, but either method is valid. Which method is used is largely a matter of user preference, and the user's impression of what works best for a particular data set.
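The ranking step can be sketched as follows. This is a minimal illustration, not histo's own implementation. It assumes the usual forms of the two formulas, i/(n+1) for Weibull and (2i-1)/(2n) for Hazen, and it assumes the samples are ranked in ascending order so that each value is paired with a cumulative (non-exceedance) probability. The function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple


def weibull_position(i: int, n: int) -> float:
    """Weibull plotting position for rank i (1-based) of n samples."""
    return i / (n + 1)


def hazen_position(i: int, n: int) -> float:
    """Hazen plotting position for rank i (1-based) of n samples."""
    return (2 * i - 1) / (2 * n)


def plotting_positions(
    samples: Sequence[float],
    method: Callable[[int, int], float] = weibull_position,
) -> List[Tuple[float, float]]:
    """Pair each sample with its plotting position.

    Assumption: ranks are assigned in ascending order, so the smallest
    sample has rank 1 and the lowest probability.
    """
    ordered = sorted(samples)
    n = len(ordered)
    return [(x, method(i, n)) for i, x in enumerate(ordered, start=1)]


if __name__ == "__main__":
    data = [12.1, 9.4, 15.3, 10.8, 11.7]  # made-up values for illustration
    for name, method in (("Weibull", weibull_position), ("Hazen", hazen_position)):
        print(name)
        for value, p in plotting_positions(data, method):
            print(f"  {value:6.2f}  {p:.4f}")
```

Neither method assigns a probability of exactly 0 or 1, so every sample can be placed on a probability axis. The two methods differ most at the extreme ranks, and the difference shrinks as n grows.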

Bibliography (histo):

McCuen, R.H., 1989, Hydrologic Analysis and Design, Prentice-Hall, Englewood Cliffs, New Jersey.

Press, W.H., S.A. Teukolsky, W.T. Vetterling, and B.P. Flannery, 1992, Numerical Recipes in C: The Art of Scientific Computing, Second Edition, Cambridge University Press, New York, pp. 612-614.
