Chapter 7: Distribution Comparison (distcomp)


The distcomp application calculates and displays the differences between two data sets, or between one data set and a series of other data sets. The application window has three sections (Figure 7.1): the main menu bar, the status and log text area, and the drawing or graph area. The menu bar is used to select all distcomp commands, the log/status area is used by the program to report important messages or results, and the drawing area displays the histograms and probability plots.

(Fig. 7-1) Figure 7.1


Menu Items
Examples
Command Line Arguments
File Formats
Mathematics
Bibliography

The Main Menu:

The main menu controls nearly all the program operations; files can be opened and saved, graphics can be plotted, the appearance of the graphic can be modified, help can be requested, and the results can be sent to the printer. For distcomp there are eight items on the main menu: File, Data, Style, Statistics, Graph, Plot, Log, and Help (Figure 7.1). File controls file handling (opening, saving, naming files), directs printing, and allows the user to quit the application. Data defines which columns (when appropriate) the X and Y data will be read from. Style defines the parameters associated with the appearance of each line. Statistics displays some calculated statistical values of potential interest. Graph is used to define details about the graph border, fonts, label, mesh, and line styles. Log allows the user to save, print, or view any text which has been written to the status/log window. Help gives the user a selection of pop-up help topics. Each menu item is fully described below with all the available options.

File:

The File sub-menu options control file and print handling, and exiting the program. The options include Open, View, Save, Save as, Save Preferences, Print Setup, Print, and Quit.

Open:

Selecting File:Open generates a pop-up dialog which allows the user to select an existing data file. This dialog functions exactly as the dialog in Figure 5.2 (plotgraph - Chapter 5). As with plotgraph files, the default data file name extension is "*.dat".

(7-2) Figure 7.2

View:

File:View pops up a simple screen editor with the current data file.

Save Preferences:

When using programs with many user options, it is not possible for the program to always pick reasonable default values for each parameter or input variable. For this reason preference files were created (see Appendix C). These allow the user to define a unique set of "defaults" applicable to a particular project. When File:Save Preferences is selected, distcomp determines how all the input variables are currently defined and writes them to the file "distcomp.prf".

WARNING: if "distcomp.prf" already exists, you will be warned that it is about to be overwritten. When you press OK, the old version will be overwritten! If you do not want the old version destroyed, you must first move it to a new file. This cannot be done from within the application; execute the UNIX mv command from a UNIX prompt in another window (e.g. mv distcomp.prf distcomp.old.prf would be sufficient).

If "distcomp.prf" does not exist in the current directory, it is created. This is an ASCII file and can be edited by the user. See Appendix C for details.

Print Setup:

File:Print Setup works exactly as explained in Chapter 5.

Print:

File:Print generates a PostScript file of the graph and, depending on how the print options are defined in Print Setup, directs this file to the specified print queue or to the specified file.

Quit:

File:Quit terminates the program.



Data:

When the appropriate file type is being read, selecting the Data:Modify menu option will pop up the dialog shown in Figure 7.2. This dialog allows the user to select which Data Columns in the data file will be evaluated (up to 20 columns can be selected; the number of toggles reflects the number of columns in the data file). It also allows the user to specify the histogram Sizing Rule, which is used for sizing histogram bars. The options are Division Width, Number of Divisions, or Equal Percent. With the first two methods, the divisions are equally spaced; with the third method, spacing is a function of the data distribution.

For Division Width, the user must specify the desired bar width (the default is 1/10 the data range). For Number of Divisions, the user specifies how many equal divisions to divide the data range into (the default is 10). If the Equal Percent Divisions rule is selected, the Number of Divisions text field is used again to enter how many divisions to divide the data into. Instead of dividing the data by the range of the data, though, the data is divided by the number of points in the file. For example, if the data file has 100 points and the number of divisions is 20, the histogram will show the extents of 20 groups of 5 sorted data points each.

The Starting and Ending Locations, by default, are the minimum and maximum extents of the data file. These may be redefined to more appropriate values. If the values have been reset, or are set to the bounds of a previous data set, pressing the Maximize Data Range button will reset the Starting and Ending Locations to the minimum and maximum extents of the current data set.
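The difference between equally spaced divisions and Equal Percent divisions can be illustrated with a short Python sketch. This is not distcomp's own code: the function names are hypothetical, and the way points are assigned to groups when the count does not divide evenly is an assumption.

```python
def equal_width_edges(start, end, n_div=10):
    """Number of Divisions rule: n_div equally spaced bars from start to end."""
    width = (end - start) / n_div
    return [start + i * width for i in range(n_div + 1)]


def equal_percent_groups(values, n_div):
    """Equal Percent rule: split the sorted data into n_div groups that hold
    (nearly) the same number of points; return each group's (min, max) extent."""
    data = sorted(values)
    n = len(data)
    groups = []
    for k in range(n_div):
        lo = k * n // n_div          # index of the first point in group k
        hi = (k + 1) * n // n_div    # one past the last point in group k
        if hi > lo:                  # skip empty groups when n < n_div
            groups.append((data[lo], data[hi - 1]))
    return groups


points = list(range(1, 101))                  # 100 data points
edges = equal_width_edges(0.0, 100.0, 10)     # 11 edges, bars 10.0 wide
groups = equal_percent_groups(points, 20)     # 20 groups of 5 points each
print(len(groups), groups[0], groups[-1])     # 20 (1, 5) (96, 100)
```

With equal widths the bar boundaries depend only on the data range; with equal percent the boundaries follow the data, so bars are narrow where the data are dense and wide where they are sparse.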

(7-3) Figure 7.3



Style:

The Style menu option allows the user to specify various attributes that control the appearance of the histogram graph. These are divided into three sub-menus: Plot Type, Y-Axis Type, and Transform Type.

Plot Type:

Style:Plot Type allows the user to specify the type of graph that will be plotted. Several different types of plots are supported by distcomp. There are five basic options, some with additional options. The available graph types are: Histograms, Box and Whisker Plots, Cumulative Distributions, 1.0 - Cumulative Distributions, and Probability Plots.

Histograms:

There are two types of histograms, Single and Stacked. The Single histogram is the default and will plot all the data onto a single graph. When just one data column has been selected for plotting (see Data above), a plot similar to that in Figure 7.3 will be drawn. If more than one data column has been selected, the histogram division width is divided by the number of active data columns. The histogram bars for each range from the different data columns are then plotted side by side (Figure 6.3b). This kind of graph can become very busy and difficult to read if more than a few data columns are plotted together. Instead of using the Single option, the histograms can be Stacked (Figure 6.4). This style will generally be easier to interpret. If only one data column has been selected, Single or Stacked will generate the same plot.
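The side-by-side layout of a Single histogram with several data columns amounts to simple arithmetic, sketched below in Python. The function name is hypothetical, and the left-to-right ordering of the columns within a division is an assumption.

```python
def side_by_side_bars(division_start, division_width, n_columns):
    """Return (left_edge, bar_width) for each active data column in one division."""
    bar_width = division_width / n_columns   # division width shared by all columns
    return [(division_start + c * bar_width, bar_width) for c in range(n_columns)]


# One division starting at 10.0, 2.0 units wide, with four active data columns.
bars = side_by_side_bars(10.0, 2.0, 4)
print(bars)   # [(10.0, 0.5), (10.5, 0.5), (11.0, 0.5), (11.5, 0.5)]
```

Each added column narrows every bar, which is why this style becomes hard to read with more than a few columns.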

(7-4) Figure 7.4 and

Cumulative Distributions:

A cumulative distribution is similar to the histogram, but it starts at 0.0% on the left (the data minimum value) and increases to 100.0% on the right (the data maximum value). At any point in between, the percent (or number) of data values less than the X-axis value is plotted (Figure 7.4). Again, Single or Stacked plots can be used.

(7-5) Figure 7.5

1.0 - Cumulative Distributions:

For the 1.0 minus the cumulative distribution, the histogram represents the percent (or number) of data values greater than the X-axis value (Figure 7.5). Like the histogram and the cumulative distribution plots, Single or Stacked plots can be used.
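The two cumulative plot types can be expressed as simple counts, as in the following Python sketch. It follows the wording above (values less than, and values greater than, the X-axis value); the function names are hypothetical, and how distcomp treats data values exactly equal to the X-axis value is not stated here, so that detail is an assumption.

```python
def cumulative_percent(values, x):
    """Percent of data values less than x (cumulative distribution)."""
    below = sum(1 for v in values if v < x)
    return 100.0 * below / len(values)


def exceed_percent(values, x):
    """Percent of data values greater than x (1.0 - cumulative distribution)."""
    above = sum(1 for v in values if v > x)
    return 100.0 * above / len(values)


data = [2.0, 4.0, 4.0, 7.0, 9.0]
print(cumulative_percent(data, 5.0))   # 60.0
print(exceed_percent(data, 5.0))       # 40.0
```

When x does not coincide with a data value the two percentages sum to 100, which is the sense in which the second plot is "1.0 minus" the first.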

(7-6) Figure 7.6

Probability Plots:

If a Probability Plot option is desired, select Set, or one of the Exceedence or Rank Order options (Set is just a menu short-cut). The Exceedence Type may be specified, and the Rank Order Method used to determine the frequency of occurrence of a variable value can be specified.

The Exceedence Type only affects the labeling on the X-axis. An Exceedence plot indicates the percentage of points which exceed a specified value. A Nonexceedence plot indicates the percentage of points which do not exceed a specified value. The appearance of the graphs otherwise is identical.

The Hazen and Weibull methods are two methods for determining the Rank Order of one data value within the data set. For further details see the histo Mathematics section (Equations 6-12 and 6-13), or refer to McCuen (1989).
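As a rough illustration, the widely used textbook forms of the two plotting positions are sketched below in Python, where i is the rank of a value in the sorted data (1 = smallest) and n is the number of values. Whether distcomp uses exactly these forms, and whether it ranks in ascending order, are assumptions that should be checked against the histo Mathematics section.

```python
def hazen(i, n):
    """Hazen plotting position: (2i - 1) / (2n)."""
    return (2 * i - 1) / (2 * n)


def weibull(i, n):
    """Weibull plotting position: i / (n + 1)."""
    return i / (n + 1)


def plotting_positions(values, method=weibull):
    """Pair each sorted data value with its plotting position."""
    data = sorted(values)
    n = len(data)
    return [(v, method(i, n)) for i, v in enumerate(data, start=1)]


sample = [3.0, 1.0, 2.0, 4.0]
for value, p in plotting_positions(sample, hazen):
    print(value, p)
```

Both methods keep the positions strictly between 0 and 1, so the smallest and largest values can still be placed on a probability axis.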

It is common in nature for a measured parameter to be log-normally distributed. If this is the case, a Normal probability plot will show a curved line (Figure 6.7a). From the curved line one can say that the data is not normally distributed, but little more. If the line becomes "straight" after Log transforming the data (set with the Style:Transform Type:Log menu option discussed below), the probability plot suggests that the data is log-normally distributed (Figure 6.7b).
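The idea of a probability plot becoming "straight" after a Log transform can be checked numerically. The Python sketch below is an illustration only and not part of distcomp: it measures straightness as the correlation between the sorted data and standard normal scores at Hazen positions, both of which are choices made here for the example.

```python
import math
from statistics import NormalDist


def normal_scores(n):
    """Standard normal quantiles at the Hazen positions (2i - 1) / (2n)."""
    return [NormalDist().inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]


def straightness(values, log_transform=False):
    """Correlation between sorted data and normal scores; values near 1.0
    correspond to a nearly straight probability plot."""
    data = sorted(math.log10(v) for v in values) if log_transform else sorted(values)
    scores = normal_scores(len(data))
    mean_d = sum(data) / len(data)
    mean_s = sum(scores) / len(scores)
    cov = sum((d - mean_d) * (s - mean_s) for d, s in zip(data, scores))
    var_d = sum((d - mean_d) ** 2 for d in data)
    var_s = sum((s - mean_s) ** 2 for s in scores)
    return cov / math.sqrt(var_d * var_s)


# Synthetic log-normal sample: powers of ten of evenly spread normal scores.
sample = [10 ** z for z in normal_scores(50)]
r_raw = straightness(sample)                       # clearly below 1: curved plot
r_log = straightness(sample, log_transform=True)   # very close to 1: straight plot
print(round(r_raw, 3), round(r_log, 3))
```

A large gain in straightness after the base 10 Log transform is the numerical counterpart of the visual test described above.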

(7-7a) Figure 7.7a and (7-7b) Figure 7.7b

P-P Plots:

(7-8a) Figure 7.8a (7-8b) Figure 7.8b (7-8c) Figure 7.8c

Q-Q Plots:

(7-9a) Figure 7.9a (7-9b) Figure 7.9b (7-9c) Figure 7.9c

Y-Axis Type:

Style:Y-Axis Type allows the user to specify how the frequency distribution is presented on the Y-axis. It can be specified by Count or by Percentage.

Transform Type:

Style:Transform Type allows for either Normal or Log (Base 10) transforms (Normal implies the data is unaltered). Transformed histograms are shown in Figure 6.3a (Normal) and Figure 6.8 (Log). Transformed probability plots are shown in Figures 6.7a and 6.7b.

NOTE: The transforms use the log base 10, not natural log.



Graph:

Graph allows the user to specify various attributes that control the appearance of the graph. These include the graph Border, Fonts, Labels, Mesh, and Line Styles.

Border:

Graph:Border is described in Chapter 5 in the Graph:Border section (Figure 5.9).

Fonts:

Graph:Fonts is described in Chapter 5 in the Graph:Fonts section (Figures 5.10 and 5.11).

Labels:

Graph:Labels is described in Chapter 5 in the Graph:Labels section (Figure 5.12).

Line Styles:

This option is similar to that used in plotgraph but instead of changing the attributes associated with a line, attributes are changed with regard to the histogram bars, the mean data value line, and the standard deviation bars. They are not truly lines but they are treated as such:

Line #1 = Mean Value Data Line
Line #2 = Standard Deviation Error-Bars
Line #3+ = Histogram bars

This dialog is described in the Graph:Style section of Chapter 5 (Figure 5.14).

Mesh:

Graph:Mesh is described in Chapter 5 in the Graph:Mesh section (Figure 5.13).



Plot:

Plot is described in Chapter 5 in the Plot section.



Log:

The Log menu option is supplied to allow the user to save, view, or print all text which has been written to the log/status window by the program or added by the user (The log window is also a simple text editor). The options include View Log, Save, Save as, Clear, and Print. View Log, Save, and Save as are similar in operation to the menu options under File described above.



Help:

Help works exactly as explained in the Help section of Chapter 5 (plotgraph, Figure 5.15).



Zoom and Mouse Control:

Using the mouse in distcomp is much the same as described for plotgraph (Chapter 5). The mouse can be used to refresh the plot display and zoom in exactly the same manner. In addition to the position of the mouse on the plot being shown in the upper left of the drawing area, the value of the histogram bar (if appropriate) relative to the size of the data set will be displayed. This is given as the number of points in the histogram bar relative to the total number of points in the data set (depending on how the Style:Y-Axis Type option is set, this may be expressed as a percentage).



Example of Using Distcomp:

Using distcomp is quite straightforward. Once a file has been loaded, a graph is generated; most of the program options control only the appearance of the graph.

Running From the Command Line:

In many cases it is more convenient to run the application completely from the command line, or at least to pass some parameter values in from the command line. The options listed below allow the user to accomplish from the command line almost anything that is possible from within the X-windows application (adding lines from different files is not currently supported). This feature can be useful when the user does not have an X-windows/Motif terminal available, or when many graphs need to be processed quickly and the operation can be completed in batch mode without user interaction.

Syntax:

distcomp [-dive #.#] [-divn #] [-divr #] [-divs #.#] [-divw #.#] [-dm #] [-dme #] [-dsd #] [-esp #] [-exceed #] [-ft #] [-help] [-lc #] [-leg #] [-lgf " "] [-lglp #] [-lgmw #] [-lpbm #.#] [-lpc #] [-lpd #] [-lpf " "] [-lph #] [-lplm #.#] [-lppsext " "] [-lpo #] [-lpq " "] [-lpr] [-lprm #.#] [-lps #] [-lptm #.#] [-lsc #] [-lsfl #] [-lssz #.#] [-lsty #] [-ltk #.#] [-lty #] [-md #.#] [-moy #.#] [-ms #] [-mx #.#] [-my #.#] [-nt #] [-prf " "] [-pt #] [-rfh #] [-ro #] [-se #.#] [-ss #.#] [-sttl " "] [-tt #] [-ttl " "] [-xfmt " "] [-xlabel " "] [-xmax #.#] [-xmin #.#] [-xMt #.#] [-xmt #] [-xto #.#] [-xy #.#] [-yfmt " "] [-ylabel " "] [-ymax #.#] [-ymin #.#] [-yMt #.#] [-ymt #] [-ys #] [-yto #.#] [filename]

Meaning of flag symbols:

# = integer
#.# = float
" " = character string.
{} = variable is an array. Values must be separated by a ',' and no spaces are allowed. Do not use the "{ }" symbols on the command line.

NOTES:

1). All parameters in [] brackets are optional.
2). Quotes must be used around character strings.
3). Filename, if given, must be listed last.
4). If no default is given, the feature is not currently supported on command line.

If no entry is required for a flag (for example -help or -lpr), the flag's command is simply executed.

Flag Definitions:

-dive = histogram bar ending location default = data maximum
-divn = number of divisions (histogram bars) default = 10
-divr = division method default = 0
    0 = number of divisions
    1 = equal division width
    2 = equal percentage divisions
-divs = histogram bar starting location default = data minimum
-divw = division width (histogram bars) default = data range / 10.0
-dm = draw mean default = 1
    0 = False
    1 = True
-dme = draw median default = 1
    0 = False
    1 = True
-dsd = draw standard deviation default = 1
    0 = False
    1 = True
-esp = exaggeration scale priority default = 0
    0 = favor y-exaggeration scale (-ys)
    1 = favor x/y ratio
-exceed = exceedence or nonexceedence switch default = 0
    0 = Exceedence
    1 = Nonexceedence
-fnt1 = main title font default = Helvetica-Bold
-fnt2 = secondary title font default = Helvetica-Bold
-fnt3 = axes label font default = Helvetica
-fnt4 = division font default = Helvetica
-fnt5 = annotation font default = Helvetica
-fnt6 = mouse position font default = Helvetica
-fnts1 = main title font size default = 24.0
-fnts2 = secondary title font size default = 15.0
-fnts3 = axes label font size default = 15.0
-fnts4 = division font size default = 12.0
-fnts5 = annotation font size default = 10.0
-fnts6 = mouse position font size default = 12.0
-ft = frequency type default = 0
    0 = Count
    1 = Percentage
-help = give this help menu
-lc {} = line color default = variable
    0 = Black
    1 = White
    2 = Red
    3 = Green
    4 = Blue
    5 = Magenta
    6 = Yellow
    7 = Cyan
-leg = draw line legend default = 1
    0 = False
    1 = True
-lgf = log file name default = "log.dat"
-lglp = line legend position default = 1
    0 = Top left
    1 = Top right
    2 = Bottom left
    3 = Bottom right
-lgmw = maximum line legend width default = 200
-lpbm = page bottom margin default = 1.5
-lpc = number of copies to print default = 1
-lpd = print destination default = 0
    0 = Printer
    1 = File
-lpf = print filename default = "junk.ps"
-lph = print header page default = 0
    0 = False
    1 = True
-lplm = page left margin default = 1.5
-lpo = print orientation default = 0
    0 = Portrait
    1 = Landscape
-lppsext = search extension for PostScript files default = "*.ps"
-lpq = print queue default = "ps"
-lpr = print file at specified orientations
-lprm = page right margin default = 1.0
-lps = print output default = 0
    0 = Black & white
    1 = Color
-lptm = page top margin default = 1.5
-lsfl {} = fill line symbol default = 0
    0 = False
    1 = True
-lsc {} = line symbol color default = variable
    0 = Black
    1 = White
    2 = Red
    3 = Green
    4 = Blue
    5 = Magenta
    6 = Yellow
    7 = Cyan
-lssz {} = line symbol size default = 9.0
-lsty {} = line symbol type default = 0
    -1 = No Symbol
    0 = Circle
    1 = Cross
    2 = Diamond
    3 = Square
    4 = X
-ltk {} = line thickness default = 1.0
-lty {} = line type default = 0
    -1 = No Line
    0 = Solid
    1 = Dashed
    2 = Double Dashed
-md = dash mesh default = 0
    0 = False
    1 = True
-mox = X mesh origin default = 0.0
-moy = Y mesh origin default = 0.0
-ms = use mesh default = 0
    0 = False
    1 = True
-mx = X mesh frequency default = 1/10 DX
-my = Y mesh frequency default = 1/10 DY
-nt = show normal curve(s) default = 1
    0 = Show
    1 = Hide
-prf = preference file name default = "distcomp.prf"
-pt = plot type default = 0
0
1
2
3
4
5
6
=
=
=
=
=
=
=
Histogram
Cumulative Distribution Function
1 - CDF
Probability
P-P Plot
Q-Q Plot
-rfh = screen refresh default = 0
    0 = On exposure
    1 = On update
-ro = rank order option default = 1
    0 = Hazen (2i - 1) / 2n
    1 = Weibull i / (n + 1)
-se = series file ending ID default = last series ID
-ss = series file starting ID default = 1
-sttl = Secondary title default = " "
-tt = transform type default = 0
    0 = Normal
    1 = Log (Base 10)
-ttl = Main title default = Filename
-xMt = X main tic frequency default = 1/10 DX
-xfmt = Number of decimal places for X-axis default = ".2f"
-xlabel = X-axis label default = "X"
-xmax = Graph X-maximum default = Data Maximum
-xmin = Graph X-minimum default = Data Minimum
-xmt = Number of minor X tics default = 5
-xto = X axis label origin default = 0.0
-xy = xy ratio default = 1.5
-yMt = Y main tic frequency default = 1/10 DY
-yfmt = Number of decimal places for Y-axis default = ".2f"
-ylabel = Y-axis label default = "Y"
-ymax = Graph Y-maximum default = Data Maximum
-ymin = Graph Y-minimum default = Data Minimum
-ymt = Number of minor Y tics default = 5
-ys = Y-axis exaggeration relative to X-axis default = Calculated
-yto = Y axis label origin default = 0.0
An example command might be (typed on one line):

distcomp -lpr -xmin 0.0 -ymin 0.0 -ymax 12.0 -md 1 -ms 1 -mx 1.0 -my 1.0 -xMt 1.0 -yMt 1.0 -ttl "Semivariogram of Elevation Data" -sttl "UNCERT histo module" -xlabel "distance (feet)" -ylabel "gamma h" -xfmt ".1f" -yfmt ".1f" -esp 1 -xy 1.0 finfac.dat
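For batch work, command lines like the one above can be assembled by a small script. The Python sketch below only builds and prints the commands. It uses flags from the Flag Definitions list, but the data file names are hypothetical, the particular combination of flags is illustrative, and the assumption that -lpr together with -lpd 1 writes the PostScript file without user interaction has not been verified.

```python
import shlex


def array_value(values):
    """Join array values with ',' and no spaces, as required for {} flags."""
    return ",".join(str(v) for v in values)


def build_command(data_file, ps_file, title, line_colors=(2, 4)):
    """Assemble a distcomp argument list; the data file name must come last."""
    return [
        "distcomp",
        "-pt", "0",                        # plot type: histogram
        "-divn", "20",                     # number of divisions
        "-lpd", "1",                       # print destination: file
        "-lpf", ps_file,                   # print file name
        "-lc", array_value(line_colors),   # array flag: comma separated, no spaces
        "-ttl", title,                     # main title (quoted by shlex.join below)
        "-lpr",                            # print (assumed to need no interaction)
        data_file,
    ]


for name in ["site_a.dat", "site_b.dat"]:  # hypothetical data files
    cmd = build_command(name, name.replace(".dat", ".ps"), "Histogram of " + name)
    print(shlex.join(cmd))                 # pass cmd to subprocess.run(cmd) to execute
```

Building the command as a list keeps quoted strings such as the title intact and makes it easy to keep the file name in the last position.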



Setting up the Input Data File:

Two basic types of files can be read by distcomp. The first type is simple column data (*.dat); the second is gridded data (Chapters 10 and 12).

