Chapter 7: Distribution Comparison (distcomp)


The distcomp application is used to calculate and display the differences between two data sets, or one data
set and a series of other data sets.
The distcomp application is composed of three sections (Figure 7.1): the main menu-bar,
the status and log text area, and the drawing or graph area. The menu-bar is used
to select all distcomp commands, the log/status area is used by the program to report
important messages or results, and the drawing area is the display area for the
histograms and probability plots.
Figure 7.1
The main menu controls nearly all the program operations; files can be opened
and saved, graphics can be plotted, the appearance of the graphic can be modified, help
can be requested, and the results can be sent to the printer. For distcomp there are eight
items on the main menu: File, Data, Style, Statistics, Graph, Plot, Log, and Help (Figure
7.1). File controls file handling (opening, saving, naming files), directs printing, and
allows the user to quit the application. Data defines which columns (when appropriate)
the X and Y data will be read from. Style defines the parameters associated with the
appearance of each line. Statistics displays some calculated statistical values of
potential interest. Graph is used to define details about the graph border, fonts, label,
mesh, and line styles. Log allows the user to save, print, or view any text which has
been written to the status/log window. Help gives the user a selection of pop-up help
topics. Each menu item is fully described below with all the available options.
The File sub-menu options control file and print handling, and exiting the
program. The options include Open, View, Save, Save as, Save Preferences, Print Setup,
Print, and Quit.
Open:
Selecting File:Open generates a pop-up dialog which allows the user to select an
existing data file. This dialog functions exactly as the dialog in Figure 5.2 (plotgraph -
Chapter 5). As with plotgraph files, the default data file name extension is "*.dat".
Figure 7.2
View:
File:View pops up a simple screen editor with the current data file.
Save Preferences:
When using programs with many user options, it is not possible for the program
to always pick reasonable default values for each parameter or input variable. For this
reason preference files were created (See Appendix C). These allow the user to define a
unique set of "defaults" applicable to the particular project. When File:Save Preferences
is selected, distcomp determines how all the input variables are currently defined and writes
them to the file "distcomp.prf".
WARNING: if "distcomp.prf" already exists, you will be warned that it is about to be
over-written. When you press OK, the old version will be over-written! If you do not
want the old version destroyed, you must first move it to a new file (e.g. the UNIX
command mv distcomp.prf distcomp.old.prf would be sufficient). This cannot be done from
within the application; you will have to execute the UNIX mv command from a UNIX
prompt in another window.
If "distcomp.prf" does not exist in the current directory, it is created. This is an ASCII file
and can be edited by the user. See Appendix C for details.
Print Setup:
File:Print Setup works exactly as explained in Chapter 5.
Print:
File:Print generates a PostScript file of the graph and, depending on how the
print options are defined in Print Setup, directs this file to the specified print queue or to
the specified file.
Quit:
File:Quit terminates the program.
Data:
When the appropriate file type is being read, selecting the Data:Modify menu
option will pop up the dialog shown in Figure 7.2. This dialog allows the user to select
which Data Columns in the data file will be evaluated (up to 20 columns can be
selected; the number of toggles reflects the number of columns in the data file). It
also allows the user to specify the histogram Sizing Rule, which is used for sizing the
histogram bars. The options are Division Width, Number of Divisions, or Equal Percent. With the
first two methods, the divisions are equally spaced; with the third method, spacing is a
function of the data distribution. For Division Width, the user must specify the
desired bar width (the default is 1/10 the data range). For Number of Divisions, the user
specifies how many equal divisions to divide the data range into (the default is 10). If
the Equal Percent rule is selected, the Number of Divisions text field is used
again to enter how many divisions to divide the data into. Instead of dividing the
range of the data equally, though, this rule divides the number of points in the file equally. For
example, if the data file has 100 points and the number of divisions is 20, each
histogram bar will show the extent of one group of 5 consecutive sorted data points. The
Starting and Ending Locations, by default, are the minimum and maximum extents of the data file.
These may be redefined to more appropriate values. If the values have been reset, or
are set to the bounds of a previous data set, pressing the Maximize Data Range button
will reset the Starting and Ending Locations values to the minimum and maximum
extents of the current data set.
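The difference between the equally spaced rules and the Equal Percent rule can be illustrated with a short Python sketch. This is not distcomp's own code; the function names are hypothetical, and how distcomp treats a point count that is not a multiple of the number of divisions is not stated here, so the sketch simply assumes group sizes may differ by one.

```python
def equal_width_edges(lo, hi, n_div=10):
    """Edges of n_div equally spaced divisions between lo and hi."""
    step = (hi - lo) / n_div
    return [lo + i * step for i in range(n_div)] + [hi]


def equal_percent_extents(values, n_div):
    """(min, max) extent of each of n_div groups holding equal point counts."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_div <= n:
        raise ValueError("n_div must be between 1 and the number of points")
    # Assumption: when n is not a multiple of n_div, group sizes differ by one.
    cuts = [(i * n) // n_div for i in range(n_div + 1)]
    return [(data[cuts[i]], data[cuts[i + 1] - 1]) for i in range(n_div)]


points = list(range(1, 101))             # 100 data points: 1, 2, ..., 100
extents = equal_percent_extents(points, 20)
print(extents[0], extents[-1])           # extents of the first and last 5-point groups
print(equal_width_edges(0.0, 100.0, 10))
```

With 100 points and 20 divisions, every group holds 5 sorted points, which matches the example in the text; the width of each bar then depends on how closely those 5 values are spaced.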
Figure 7.3
Style:
The Style menu option allows the user to specify various attributes that control
the appearance of the histogram graph. These are divided into three sub-menus: Plot
Type, Y-Axis Type, and Transform Type.
Plot Type:
Style:Plot Type allows the user to specify the type of graph that will be plotted.
There are five basic options supported by distcomp, some with additional options.
The available graph types are: Histograms, Box
and Whisker Plots, Cumulative Distributions, 1.0 - Cumulative Distributions, and
Probability Plots.
Histograms:
There are two types of histograms, Single and Stacked. The Single histogram is
the default and will plot all the data onto a single graph. When just one data column
has been selected for plotting (See Data above) a plot similar to that in Figure 7.3 will
be drawn. If more than one data column has been selected, the histogram division
width is divided by the number of active data columns. The histogram bars for each
range from the different data columns are then plotted side by side (Figure 6.3b). This
kind of graph can become very busy and difficult to read if more than a few data
columns are plotted together. Instead of using the Single option, the histograms can be
Stacked (Figure 6.4). This style will generally be easier to interpret. If only one data
column has been selected, Single or Stacked will generate the same plot.
Figure 7.4 and
Cumulative Distributions:
A cumulative distribution is similar to the histogram, but it starts at 0.0% on
the left (the data minimum value), and increases to 100.0% on the right (the data
maximum value). At any point in-between, the percent (or number) of data values less
than the X-axis value is plotted (Figure 7.4). Again Single or Stacked plots can be
used.
Figure 7.5
1.0 - Cumulative Distributions:
For the 1.0 minus the cumulative distribution, the histogram represents the
percent (or number) of data values greater than the X-axis value (Figure 7.5). Like the
histogram and the cumulative distribution plots, Single or Stacked plots can be used.
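The two cumulative plot types can be summarized in a few lines of Python. This is only a sketch of the definitions given above (percent of values less than, or greater than, an X-axis value); the function names are hypothetical and it does not reproduce how distcomp evaluates the curves at each division.

```python
from bisect import bisect_left, bisect_right


def percent_less_than(values, x):
    """Cumulative distribution: percent of data values less than x."""
    data = sorted(values)
    return 100.0 * bisect_left(data, x) / len(data)


def percent_greater_than(values, x):
    """1.0 - cumulative distribution: percent of data values greater than x."""
    data = sorted(values)
    return 100.0 * (len(data) - bisect_right(data, x)) / len(data)


sample = [1.0, 2.0, 3.0, 4.0]
print(percent_less_than(sample, 3.0))     # 50.0
print(percent_greater_than(sample, 3.0))  # 25.0
```

Values exactly equal to x are counted in neither percentage here, which is why the two results do not sum to 100 in this example.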
Figure 7.6
Probability Plots:
If a Probability Plot option is desired, select Set, or one of the Exceedence or
Rank Order options (Set is just a menu short-cut). The Exceedence Type may be
specified, and the Rank Order Method used to determine the frequency of occurrence of
a variable value can be specified.
The Exceedence Type only affects the labeling on the X-axis. An Exceedence plot
indicates the percentage of points which exceed a specified value. A Nonexceedence plot
indicates the percentage of points which do not exceed a specified value. The
appearance of the graphs otherwise is identical.
The Hazen and Weibull methods are two methods for determining the Rank
Order of one data value within the data set. For further details see the histo
Mathematics section (Equations 6-12 and 6-13), or refer to McCuen (1989).
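As a rough illustration of what a Rank Order method does, the Python sketch below assigns a plotting position to each sorted data value. It uses the commonly cited textbook forms, Hazen (2i - 1) / 2n and Weibull i / (n + 1), where i is the rank and n the number of points; these are assumed to correspond to the equations referenced above, which are not reproduced in this chapter. The function name is hypothetical.

```python
def plotting_positions(values, method="weibull"):
    """Pair each sorted value with its rank-order plotting position (0 to 1)."""
    data = sorted(values)
    n = len(data)
    if method == "hazen":
        probs = [(2 * i - 1) / (2 * n) for i in range(1, n + 1)]
    elif method == "weibull":
        probs = [i / (n + 1) for i in range(1, n + 1)]
    else:
        raise ValueError("method must be 'hazen' or 'weibull'")
    return list(zip(data, probs))


print(plotting_positions([30, 10, 20], "weibull"))
print(plotting_positions([30, 10, 20], "hazen"))
```

Both methods keep the positions strictly between 0 and 1, so the smallest and largest values can still be placed on a probability axis.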
It is common in nature that the distribution of a measured parameter is
log-normal (the Log transform is set with the Style:Transform Type:Log menu option discussed
below). If this is the case, a Normal probability plot will show a curved line (Figure
6.7a). From the curved line, one can say that the data is not normally distributed,
but little more. If the line becomes "straight" after Log transforming the data, the
probability plot suggests that the data is log-normally distributed (Figure 6.7b).
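The idea of a line becoming "straight" after the Log transform can be checked numerically. The Python sketch below is an illustration only: it measures straightness with a correlation coefficient between the sorted data and normal quantiles at Hazen plotting positions, which is an assumption of this example and not a value distcomp is known to report. The function name is hypothetical.

```python
import math
from statistics import NormalDist, mean


def probability_plot_r(values, log_transform=False):
    """Correlation between sorted data and normal quantiles (1.0 = straight line)."""
    data = sorted(math.log10(v) for v in values) if log_transform else sorted(values)
    n = len(data)
    # Normal quantiles at Hazen plotting positions (2i - 1) / 2n.
    z = [NormalDist().inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
    mz, md = mean(z), mean(data)
    cov = sum((a - mz) * (b - md) for a, b in zip(z, data))
    var_z = sum((a - mz) ** 2 for a in z)
    var_d = sum((b - md) ** 2 for b in data)
    return cov / math.sqrt(var_z * var_d)


# A synthetic, exactly log-normal sample of 20 points.
sample = [10 ** NormalDist().inv_cdf((2 * i - 1) / 40) for i in range(1, 21)]
print(probability_plot_r(sample, log_transform=False))  # noticeably below 1
print(probability_plot_r(sample, log_transform=True))   # very close to 1
```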
Figure 7.7a
and
Figure 7.7b
P-P Plots:
Figure 7.8a
Figure 7.8b
Figure 7.8c
Q-Q Plots:
Figure 7.9a
Figure 7.9b
Figure 7.9c
Y-Axis Type:
Style:Y-Axis Type allows the user to specify how the frequency distribution is
presented on the Y-axis. It can be specified by Count or by Percentage.
Transform Type:
Style:Transform Type allows for either Normal or Log (Base 10) transforms
(Normal implies the data is unaltered). Transformed histograms are shown in Figure
6.3a (Normal) and Figure 6.8 (Log). Transformed probability plots are shown in Figures
6.7a and 6.7b.
NOTE: The transforms use the log base 10, not natural log.
Graph:
Graph allows the user to specify various attributes about the appearance of the
graph: its Border, Fonts, Labels, Mesh, and Line Styles.
Border:
Graph:Border is described in Chapter 5 in the Graph:Border section (Figure 5.9).
Fonts:
Graph:Fonts is described in Chapter 5 in the Graph:Fonts section (Figures 5.10
and 5.11).
Labels:
Graph:Labels is described in Chapter 5 in the Graph:Labels section (Figure 5.12).
Line Styles:
This option is similar to that used in plotgraph but instead of changing the
attributes associated with a line, attributes are changed with regard to the histogram
bars, the mean data value line, and the standard deviation bars. They are not truly
lines but they are treated as such:
Line #1 = Mean Value Data Line
Line #2 = Standard Deviation Error-Bars
Line #3+ = Histogram bars
This dialog is described in the Graph:Style section of Chapter 5 (Figure 5.14).
Mesh:
Graph:Mesh is described in Chapter 5 in the Graph:Mesh section (Figure 5.13).
Plot:
Plot is described in Chapter 5 in the Plot section.
Log:
The Log menu option is supplied to allow the user to save, view, or print all text
which has been written to the log/status window by the program or added by the user
(The log window is also a simple text editor). The options include View Log, Save, Save
as, Clear, and Print. View Log, Save, and Save as are similar in operation to the menu
options under File described above.
Help:
Help works exactly as explained in the Help section of Chapter 5 (plotgraph,
Figure 5.15).
Zoom and Mouse Control:
Using the mouse in distcomp is much the same as described for plotgraph (Chapter
5). The mouse can be used to refresh the plot display and zoom in exactly the same
manner. In addition to the position of the mouse on the plot being shown in the
upper left of the drawing area, the value of the histogram bar (if appropriate) relative to
the size of the data set will be displayed. This is expressed as the number of points in
the histogram bar relative to the total number of points in the data set (depending on how the
Style:Y-Axis Type option is set, this term may be expressed as a percentage).
Example of Using Distcomp:
Using distcomp is quite straightforward. Once a file has been loaded a graph is
generated; most of the program options control only the appearance of the graph.
Running From the Command Line:
In many cases it is more convenient to run the application completely from the
command line, or at least pass some parameter values in from the command line. The
options listed below allow the user to accomplish, from the command line, almost
anything that is possible within the X-windows application (adding lines from different
files is not currently supported). This feature can be useful when the user does not
have an X-windows/Motif terminal available, or when many graphs need to be processed
quickly and the operation can be completed in batch mode without user interaction.
Syntax:
distcomp
[-dive #.#]
[-divn #]
[-divr #]
[-divs #.#]
[-divw #.#]
[-dm #]
[-dme #]
[-dsd #]
[-esp #]
[-exceed #]
[-ft #]
[-help]
[-lc #]
[-leg #]
[-lgf " "]
[-lglp #]
[-lgmw #]
[-lpbm #.#]
[-lpc #]
[-lpd #]
[-lpf " "]
[-lph #]
[-lplm #.#]
[-lppsext " "]
[-lpo #]
[-lpq " "]
[-lpr]
[-lprm #.#]
[-lps #]
[-lptm #.#]
[-lsc #]
[-lsfl #]
[-lssz #.#]
[-lsty #]
[-ltk #.#]
[-lty #]
[-md #.#]
[-moy #.#]
[-ms #]
[-mx #.#]
[-my #.#]
[-nt #]
[-prf " "]
[-pt #]
[-rfh #]
[-ro #]
[-se #.#]
[-ss #.#]
[-sttl " "]
[-tt #]
[-ttl " "]
[-xfmt " "]
[-xlabel " "]
[-xmax #.#]
[-xmin #.#]
[-xMt #.#]
[-xmt #]
[-xto #.#]
[-xy #.#]
[-yfmt " "]
[-ylabel " "]
[-ymax #.#]
[-ymin #.#]
[-yMt #.#]
[-ymt #]
[-ys #]
[-yto #.#]
[filename]
Meaning of flag symbols:
# = integer
#.# = float
" " = character string
{} = variable is an array. Values must be separated by a ',' and no spaces
are allowed. Do not use the "{ }" symbols on the command line.
NOTES:
1). All parameters in [] brackets are optional.
2). Quotes must be used around character strings.
3). Filename, if given, must be listed last.
4). If no default is given, the feature is not currently supported on the
command line.
If no entry is required for a flag, the flag's command is simply executed.
Flag Definitions:
| -dive | = | histogram bar ending location | default = data maximum |
| -divn | = | number of divisions (histogram bars) | default = 10 |
| -divr | = | division method | default = 0 |
| | | 0 = number of divisions | |
| | | 1 = equal division width | |
| | | 2 = equal percentage divisions | |
| -divs | = | histogram bar starting location | default = data minimum |
| -divw | = | division width (histogram bars) | default = data range / 10.0 |
| -dm | = | draw mean | default = 1 |
| -dme | = | draw median | default = 1 |
| -dsd | = | draw standard deviation | default = 1 |
| -esp | = | exaggeration scale priority | default = 0 |
| | | 0 = favor y-exaggeration scale (-ys) | |
| | | 1 = favor x/y ratio | |
| -exceed | = | exceedence or nonexceedence switch | default = 0 |
| | | 0 = Exceedence | |
| | | 1 = Nonexceedence | |
| -fnt1 | = | main title font | default = Helvetica-Bold |
| -fnt2 | = | secondary title font | default = Helvetica-Bold |
| -fnt3 | = | axes label font | default = Helvetica |
| -fnt4 | = | division font | default = Helvetica |
| -fnt5 | = | annotation font | default = Helvetica |
| -fnt6 | = | mouse position font | default = Helvetica |
| -fnts1 | = | main title font size | default = 24.0 |
| -fnts2 | = | secondary title font size | default = 15.0 |
| -fnts3 | = | axes label font size | default = 15.0 |
| -fnts4 | = | division font size | default = 12.0 |
| -fnts5 | = | annotation font size | default = 10.0 |
| -fnts6 | = | mouse position font size | default = 12.0 |
| -ft | = | frequency type | default = 0 |
| -help | = | give this help menu | |
| -lc {} | = | line color | default = variable |
| | | 0 = Black | |
| | | 1 = White | |
| | | 2 = Red | |
| | | 3 = Green | |
| | | 4 = Blue | |
| | | 5 = Magenta | |
| | | 6 = Yellow | |
| | | 7 = Cyan | |
| -leg | = | draw line legend | default = 1 |
| -lgf | = | log file name | default = "log.dat" |
| -lglp | = | line legend position | default = 1 |
| | | 0 = Top left | |
| | | 1 = Top right | |
| | | 2 = Bottom left | |
| | | 3 = Bottom right | |
| -lgmw | = | maximum line legend width | default = 200 |
| -lpbm | = | page bottom margin | default = 1.5 |
| -lpc | = | number of copies to print | default = 1 |
| -lpd | = | print destination | default = 0 |
| -lpf | = | print filename | default = "junk.ps" |
| -lph | = | print header page | default = 0 |
| -lplm | = | page left margin | default = 1.5 |
| -lpo | = | print orientation | default = 0 |
| | | 0 = Portrait | |
| | | 1 = Landscape | |
| -lppsext | = | search extension for PostScript files | default = "*.ps" |
| -lpq | = | print queue | default = "ps" |
| -lpr | = | print file at specified orientations | |
| -lprm | = | page right margin | default = 1.0 |
| -lps | = | print output | default = 0 |
| | | 0 = Black & white | |
| | | 1 = Color | |
| -lptm | = | page top margin | default = 1.5 |
| -lsfl {} | = | fill line symbol | default = 0 |
| -lsc {} | = | line symbol color | default = variable |
| | | 0 = Black | |
| | | 1 = White | |
| | | 2 = Red | |
| | | 3 = Green | |
| | | 4 = Blue | |
| | | 5 = Magenta | |
| | | 6 = Yellow | |
| | | 7 = Cyan | |
| -lssz {} | = | line symbol size | default = 9.0 |
| -lsty {} | = | line symbol type | default = 0 |
| | | -1 = No Symbol | |
| | | 0 = Circle | |
| | | 1 = Cross | |
| | | 2 = Diamond | |
| | | 3 = Square | |
| | | 4 = X | |
| -ltk {} | = | line thickness | default = 1.0 |
| -lty {} | = | line type | default = 0 |
| | | -1 = No Line | |
| | | 0 = Solid | |
| | | 1 = Dashed | |
| | | 2 = Double Dashed | |
| -md | = | dash mesh | default = 0 |
| -mox | = | X mesh origin | default = 0.0 |
| -moy | = | Y mesh origin | default = 0.0 |
| -ms | = | use mesh | default = 0 |
| -mx | = | X mesh frequency | default = 1/10 DX |
| -my | = | Y mesh frequency | default = 1/10 DY |
| -nt | = | show normal curve(s) | default = 1 |
| -prf | = | preference file name | default = "distcomp.prf" |
| -pt | = | plot type | default = 0 |
| | | 0 1 2 3 4 5 6 = Histogram, Cumulative Distribution Function, 1 - CDF, Probability, P-P Plot, Q-Q Plot | |
| -rfh | = | screen refresh | default = 0 |
| | | 0 = On exposure | |
| | | 1 = On update | |
| -ro | = | rank order option | default = 1 |
| | | 0 = Hazen (2i - 1) / 2n | |
| | | 1 = Weibull i / (n + 1) | |
| -se | = | series file ending ID | default = last series ID |
| -ss | = | series file starting ID | default = 1 |
| -sttl | = | Secondary title | default = " " |
| -tt | = | transform type | default = 0 |
| | | 0 = Normal | |
| | | 1 = Log (Base 10) | |
| -ttl | = | Main title | default = Filename |
| -xMt | = | X main tic frequency | default = 1/10 DX |
| -xfmt | = | Number of decimal places for X-axis | default = ".2f" |
| -xlabel | = | X-axis label | default = "X" |
| -xmax | = | Graph X-maximum | default = Data Maximum |
| -xmin | = | Graph X-minimum | default = Data Minimum |
| -xmt | = | Number of minor X tics | default = 5 |
| -xto | = | X axis label origin | default = 0.0 |
| -xy | = | xy ratio | default = 1.5 |
| -yMt | = | Y main tic frequency | default = 1/10 DY |
| -yfmt | = | Number of decimal places for Y-axis | default = ".2f" |
| -ylabel | = | Y-axis label | default = "Y" |
| -ymax | = | Graph Y-maximum | default = Data Maximum |
| -ymin | = | Graph Y-minimum | default = Data Minimum |
| -ymt | = | Number of minor Y tics | default = 5 |
| -ys | = | Y-axis exaggeration relative to X-axis | default = Calculated |
| -yto | = | Y axis label origin | default = 0.0 |
An example command might be (typed on one line):
distcomp -lpr -xmin 0.0 -ymin 0.0 -ymax 12.0 -md 1 -ms 1 -mx 1.0 -my 1.0 -xMt 1.0
-yMt 1.0 -ttl "Semivariogram of Elevation Data" -sttl "UNCERT histo module"
-xlabel "distance (feet)" -ylabel "gamma h" -xfmt ".1f" -yfmt ".1f" -esp 1 -xy 1.0
finfac.dat
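When many graphs are processed in batch mode, it may be convenient to assemble such a command in a script. The Python sketch below only builds and prints the argument list; it follows the notes above (filename last, array values joined with ',' and no spaces, flags without an entry passed alone). The helper name distcomp_command is hypothetical and is not part of distcomp.

```python
import shlex


def distcomp_command(filename, **flags):
    """Build a distcomp argument list; the data file name is always placed last."""
    args = ["distcomp"]
    for name, value in flags.items():
        args.append(f"-{name}")
        if value is True:
            continue                      # flags such as -lpr take no entry
        if isinstance(value, (list, tuple)):
            # Array flags: comma-separated values, no spaces, no braces.
            value = ",".join(str(v) for v in value)
        args.append(str(value))
    args.append(filename)
    return args


cmd = distcomp_command(
    "finfac.dat",
    lpr=True,
    xmin=0.0,
    ymin=0.0,
    ttl="Semivariogram of Elevation Data",
    lc=[2, 4],
)
print(shlex.join(cmd))  # a shell-quoted, single-line version of the command
```

The resulting list could be handed to subprocess.run(cmd) on a system where distcomp is installed; passing a list avoids having to quote the title strings by hand.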
Setting up the Input Data File:
Two basic types of files can be read by distcomp. The first type is simply column data
(*.dat); the second type is gridded data files (Chapters 10 and 12).