Chapter 8: Vario  


The vario application is used for calculating one- and two-dimensional experimental semivariograms. The package is not limited to the classical semivariogram; it will also calculate covariances, madograms, rodograms, cross-semivariograms, etc. Analysis of the calculated values through jackknifing is also an option. Three types of soft indicator data can be used with hard data to calculate the spatial continuity. The application displays the spatial measure γ(h) versus lag.

The vario application is composed of three sections (Figure 8.1): the main menu-bar, the status and log text area, and the drawing or graph area. The menu-bar is used to select all vario commands, the log/status area is used by the program to report important messages or results, and the drawing area is the display area for the graphs.

(8-1)Figure 8.1


Menu Items
Examples
Command Line Arguments
File Formats
Mathematics
Bibliography

The Main Menu:

The main menu controls nearly all the program operations; files can be opened and saved, graphics can be plotted, the appearance of the graphic can be modified, help can be requested, and the results can be sent to the printer. For vario there are eight items on the main menu: File, Experimental, Jackknife, Graph, Log, Plot, Model, and Help (Figure 8.1). File controls file handling (opening, saving, naming files), directs printing, and allows the user to quit the application. Experimental defines which columns (when appropriate) the X, Y, Z, and Value data will be read from, allows the user to define search parameters for the calculation and to choose a spatial equation. Jackknife controls the jackknifing calculation parameters. Graph is used to define details about the graph border, fonts, label, mesh, line styles, and error-bar styles. Log is used to save, print or view any message information printed within the log/status window. Plot plots the graph. Model allows the user to launch the Variofit package and it can also be used to save the display for comparison with the current calculated spatial measure. Help gives the user a selection of pop-up help topics. Each menu item is fully described below with all the available options.

[TOP] [SYNTAX]


File:

The File sub-menu options control file and print handling, and exiting the program. The options include Open, View Data, View Results, Save, Save as, Save Preferences, Print Setup, Print, Quit, and Quit Without Saving.

Open:

Selecting File:Open generates a pop-up dialog which allows the user to select an existing data file. This dialog operates exactly as the File:Open dialog in Chapter 5 (plotgraph, Figure 5.2). As with plotgraph files, the default data file extension is *.dat.

View Data:

File:View Data pops up a simple screen editor containing the last saved version of the data file being used.

View Results:

File:View Results pops up a simple screen editor which allows the user to view the results of the latest calculation.

Save:

File:Save saves the results of the latest calculation to a file. If a save file has already been opened, the data are simply saved. If a save file has not been selected yet, a pop-up dialog similar to that used in File:Open (Figure 5.2) is created. The main difference between the Open and the Save dialog is that to save a file, the file does not have to pre-exist. For a description of how the dialog works, see the Open section above and substitute Save for Open wherever appropriate.

The data from the calculation will be saved in a *.gam file. This file is ready to be used as input to the Variofit application. Simple editing of the header in this file allows it to be viewed in the Plotgraph application.

Save as:

File:Save as is identical to File:Save described above, when a file has not been selected yet. This option can be used to save the file for the first time, or save results to a new file.

Save Preferences:

When using programs with many user options, it is not possible for the program to always pick acceptable default values for each parameter or input variable. For this reason preference files were created (See Appendix C). These allow the user to define a unique set of "defaults" applicable to the particular project. When File:Save Preferences is selected, vario determines how all the input variables are currently defined and writes them to the file "vario.prf".

WARNING: if "vario.prf" already exists, you will be warned that it is about to be over-written. If you do not want the old version destroyed you must move it to a new file (e.g. the UNIX command mv vario.prf vario.old.prf would be sufficient). When you press OK the old version will be over-written! This cannot be done currently from within the application. To rename the file you will have to execute the UNIX mv command from a UNIX prompt in another window.

If "vario.prf" does not exist in the current directory, it is created. This is an ASCII file and can be edited by the user. See Appendix C for details.

Print Setup:

File:Print Setup works exactly as explained in Chapter 5.

Print:

File:Print generates a Postscript file of the calculated spatial measure and, depending on how the print options are defined in Print Setup, directs this file to the specified print queue or to the specified file.

Quit:

File:Quit terminates the program, but if additions have been made to the graph, the user will first be queried to supply a file to save the changes in.

Quit Without Saving:

File:Quit Without Saving terminates the program regardless of any additions to the graph. Once pressed there is no option to change your mind.



Experimental:

The Experimental menu options allow the user to specify everything regarding what data in the data file will be evaluated, what spatial equation will be used, and all the relevant search parameters used to calculate a spatial continuity equation (e.g. an experimental semivariogram).

Data:

The Experimental:Hard Data menu option allows the user to specify which columns will define the X, Y, Z, and head and tail data (Figure 8.2). The tail column is only relevant if covariances are being calculated (this is used with the Cross-Semivariogram, Covariance, Correlogram, and Soft Indicator Covariance spatial equations described below). The head column is equivalent to an individual data value column. Once the columns are defined, the minimum and maximum value in each column, the total number of data points, and the maximum diagonal distance across the data set are displayed. It is possible to read in two-dimensional data by placing a "0" in the field for the missing data column (any other column can be used too, but you must remember to turn off the Active Z Dimension Data toggle in the Data:Search Parameters dialog described below).

(8-2)Figure 8.2

Soft Data:

The Vario program allows the user to incorporate three types of soft data in calculations of measures of spatial continuity. Soft data are defined as data which contain non-negligible uncertainty on the parameter being studied. Vario uses three distinct types of soft data: imprecise measurements (Type A soft data) which have been calibrated through misclassification probabilities as defined by Alabert (1987), interval bounds on a data value (Type B soft data) as defined by Journel (1986), and prior probability distributions (Type C soft data) on a value as described by Journel (1986). This user's manual does not provide a tutorial on these types of data and it is necessary that the user be familiar with the concepts of soft data prior to using this package.

To enter soft data information, two dialogs are used. The first is created by the Experimental:Soft Data menu option and is shown in Figure 8.3. The second (Figure 8.4) is generated by pressing the Data Options button on the Experimental:Soft Data dialog (Figure 8.3). All soft data calculations are based on an indicator transform of the data. It is necessary to enter the number of indicator classes prior to making any soft data calculations. For both type A and type C soft data, it is necessary to open a file which contains necessary information on the soft data (See the "Setting Up the Input Data File" section).

(8-3)Figure 8.3

The Uncertainty File Select button will display a listing of files in the directory similar to that shown in the original file open command under the File dialog (Figure 5.2). The uncertainty files generally have the suffix *.unc.

The Type A Data Flag and Threshold Number fields describe locations within the *.unc file. The Type A Data Flag is an index which defines the level of soft data being used. The Threshold Number is the threshold level (1 is lowest, n is highest) which is being calculated.

Type A soft data calculations can be weighted in several different ways. Each type A soft data calculation is a linear combination of the covariance calculated from the hard data, the hard-soft cross-covariance and the soft data auto-covariance. Each portion of the calculation is weighted separately and the three weights add up to one. Straight Pair Weighting weights the gamma calculation towards the most numerous type of data within the lag. Generally, soft data outnumber hard data and this type of weighting will favor the soft information. p1-p2 Scaled Pair Weighting takes into account the quality, as well as the quantity, of the soft information when calculating the weights. If the imprecision level of the soft data is high, the scaled weighting option is perhaps more prudent. The equations used to calculate the weights are discussed in the Vario math section.
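As a rough illustration of Straight Pair Weighting, the combination for a single lag can be sketched as below. The function name and arguments are hypothetical (the actual equations are given in the Vario math section); the point is that each weight is the fraction of pairs contributed by that component, so the three weights sum to one and the most numerous data type dominates.

```python
# Hypothetical sketch of "Straight Pair Weighting" for one lag of a type A
# soft data calculation: the combined value is a pair-count-weighted average
# of the hard auto-, hard-soft cross-, and soft auto-components. Names are
# illustrative only; see the Vario math section for the actual equations.

def straight_pair_weighting(gam_hard, n_hard, gam_cross, n_cross, gam_soft, n_soft):
    """Combine the three components for one lag, weighting by pair counts."""
    total = n_hard + n_cross + n_soft
    if total == 0:
        raise ValueError("no pairs in this lag")
    w_hard, w_cross, w_soft = n_hard / total, n_cross / total, n_soft / total
    # The three weights sum to one; abundant soft data dominates the result.
    return w_hard * gam_hard + w_cross * gam_cross + w_soft * gam_soft
```

With 70 of 100 pairs coming from the soft auto-component, for example, the soft value receives 70% of the weight, which is why the p1-p2 scaled option is suggested when the soft data are of poor quality.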

The Data Options dialog (Figure 8.4) allows the user to set the column numbers for the soft data parameters. These parameters are the lower bound column for the data value, the upper bound column for the data value, and the soft data index column. Type A soft data have the lower and upper bounds set equal to each other, as do hard data. The uncertainty file which was opened above contains the indicator thresholds and the misclassification probabilities p1 and p2. Type B and C soft data (interval and prior pdf) have unequal upper and lower bounds. For type B soft data these bounds define the interval in which the actual data value is believed to lie. For type C data, the lower and upper bounds define an interval within which the prior probability is defined at every indicator threshold. The previously opened uncertainty file contains the indicator thresholds and the prior probabilities at each threshold. Summary information on the soft data is displayed.

(8-4)Figure 8.4

It is possible to display each component of the spatial measure by clicking on one of the Soft Data Print Options. The default option is the combined calculation.

Indicator Threshold:

The Experimental:Indicator Threshold menu option is used to define the indicator classes (Figure 8.5). It is generally only necessary to set the upper threshold to something other than the default maximum. By doing this, values greater than the upper threshold are set to zero and those less than or equal to the upper threshold are set to one. Further discretization is possible by using the lower threshold. All values between the upper and lower thresholds are set to one; all others are set to zero.

(8-5)Figure 8.5

NOTE: If the default minimum and maximum are used, there will be an error. There is no variance if the data set is converted completely to 1's.
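The thresholding rules above can be summarized in a small sketch (illustrative only; this is not vario's internal code, and the exact handling of values falling exactly on a threshold is an assumption here):

```python
# Illustrative indicator transform: values between the lower and upper
# thresholds become 1, all others become 0. Whether each boundary is
# inclusive or exclusive is an assumption made for this sketch.

def indicator_transform(values, lower, upper):
    return [1 if lower < v <= upper else 0 for v in values]
```

Note that if the thresholds are left wide enough that essentially every value maps to 1, the transformed data have no variance, which is the error condition described in the NOTE above.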

Search Parameters:

There are two methods for calculating experimental semivariograms. Which you use depends on the nature of the data set and the desired results. One is designed for irregularly spaced (non-gridded) data, and the other is designed for gridded data. The method used for gridded data is very fast, but cannot be used with irregularly spaced data, and can only evaluate one direction (zero half-angles, zero bandwidths). The method for irregularly spaced data (this can be used for gridded data too) is much slower, but half-angles and bandwidths (discussed below) can be evaluated.

NOTE: The "Gridded" method currently does not support the Cross-Semivariogram, Covariance, Correlogram, and Soft Indicator Covariance spatial variability models.

Point (Irregular) Data:

The Experimental:Search Parameters:Point Data menu creates the dialog shown in Figure 8.6. This dialog is used to define how many spatial equations will be calculated, whether a two- or three-dimensional method will be used, and how the first lag will be calculated, and to display the data extents.

(8-6)Figure 8.6

The user is allowed two choices as to where the first lag should be calculated. Depressing the First Lag = 1/2 Lag Distance button will make the first calculation from all pairs of points within the distance 0.0 to 1/2 the lag distance value. If the First Lag = 1/2 Lag Distance button is not depressed, the first lag will be calculated from 0.0 to the lag distance value. If the Z-dimension is to be used in the calculation, the Activate Z Dimension button must be depressed. To define the search parameters for each spatial equation, define the number of models/directions desired, then press the Define Directions button. This will create the dialog shown in Figure 8.7.
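The difference between the two first-lag choices amounts to how pair separations are assigned to lag bins. A hypothetical helper (an interpretation of the dialog's behavior, not vario's code) might look like:

```python
# Assign a pair separation h to a lag bin index. With the half-lag option,
# the first bin spans [0, lag/2) and later bins are assumed to be centered
# on multiples of the lag distance; that centering is an interpretation
# made for this sketch. Without it, every bin spans a full lag distance.

def lag_bin(h, lag, half_first_lag):
    if half_first_lag:
        return 0 if h < lag / 2 else int((h - lag / 2) / lag) + 1
    # Full-lag bins: [0, lag), [lag, 2*lag), ...
    return int(h / lag)
```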

(8-7)Figure 8.7

From the search directions dialog, the lag distance, maximum search distance, direction bandwidth, plunge bandwidth, horizontal and vertical search directions, and horizontal and vertical half-angles can be defined. The Plot toggle is used to turn on and off plotting for each line. The graph can get very busy and it can become difficult to determine which line is associated with each model/direction otherwise. The search directions, half angles and bandwidths are shown schematically in Figure 8.8.

(8-8)Figure 8.8

Gridded Data:

The Experimental:Search Parameters:Gridded Data menu creates the dialog shown in Figure 8.9. This dialog is used to define how many spatial equations will be calculated and to display the grid dimensions of the data.

(8-9)Figure 8.9

To define the search parameters for each spatial equation, define the number of models/directions desired, then press the Define Directions button. This will create the dialog shown in Figure 8.10.

(8-10)Figure 8.10

From the search directions dialog, the search direction and lag spacing are defined by specifying the X, Y, and Z step size (these are integer values in units of rows, columns, and layers). The maximum lag is specified by defining the maximum number of steps. The Plot toggle is used to turn on and off plotting for each line. Otherwise, the graph can get very busy and it can become difficult to determine which line is associated with each model/direction.
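The speed of the gridded method comes from forming pairs by integer index offsets rather than by distance and angle searches. A minimal sketch of the idea (assumed, not taken from vario's source; non-negative step sizes and a semivariogram measure are assumed):

```python
# Sketch of a gridded experimental semivariogram: head and tail values are
# paired by offsetting array indices by k * (dx, dy, dz) cells, so the
# search direction and lag spacing fall directly out of the grid geometry
# and no distance or angle tests are needed -- hence the speed.
import numpy as np

def gridded_semivariogram(grid, step, max_steps):
    """grid: 3D array; step: (dx, dy, dz) in cells; returns gamma per lag."""
    grid = np.asarray(grid, dtype=float)
    nx, ny, nz = grid.shape
    gammas = []
    for k in range(1, max_steps + 1):
        dx, dy, dz = (k * s for s in step)
        if dx >= nx or dy >= ny or dz >= nz:
            break  # offset walks off the grid; no more lags possible
        head = grid[dx:, dy:, dz:]
        tail = grid[:nx - dx, :ny - dy, :nz - dz]
        gammas.append(0.5 * np.mean((head - tail) ** 2))
    return gammas
```

For example, on a 4x1x1 grid holding 0, 1, 2, 3 along X with step (1, 0, 0), the lag-1 and lag-2 values are 0.5 and 2.0.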

Spatial Equations:

The Experimental:Spatial Equation dialog (Figure 8.11) is used to define which spatial continuity equation will be used by vario. These spatial measures are calculated by the algorithms obtained from the geostatistical package GSLIB (Deutsch and Journel, 1992). See the Vario Mathematics section of this chapter. For a full understanding of the strengths and limitations of the different spatial measures, the reader is strongly referred to the original source by Deutsch and Journel (1992).

(8-11)Figure 8.11

Eleven different measures of spatial continuity are available (Figure 8.11). The standard Semivariogram is the default method. For the Semivariogram, General Relative Semivariogram, Pairwise Relative Semivariogram, Semivariogram of Logarithms, Semirodogram, Semimadogram, and Indicator Semivariogram spatial equations, the head (or value) data column in the Experimental:Hard Data dialog (Figure 8.2) must be defined. For the Cross-Semivariogram, Covariance, Correlogram, and Soft Indicator Covariance spatial equations, the head and tail variables must be defined. For the Indicator Semivariogram and the Soft Indicator Covariance spatial equations, the minimum and maximum indicator cutoffs must be defined (Figure 8.5, discussed above). For the Soft Indicator Covariance spatial equation, the parameters in the Experimental:Soft Data dialogs must be defined.

The Calculate button calculates the specified spatial measure and plots the resulting values at the appropriate lag value.

NOTE: Not all spatial equations can be used with the gridded data solution method. See comment above.
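For reference, the default measure is the classical experimental semivariogram, gamma(h) = (1/2N(h)) * sum of squared head-tail differences over the N(h) pairs falling in each lag. A minimal omni-directional sketch for irregularly spaced data (no half-angle or bandwidth tests, full-lag bins; illustrative only, not vario's implementation):

```python
# Classical experimental semivariogram for scattered points, computed by
# brute-force pairing: each pair's separation selects a lag bin, and each
# bin accumulates squared differences. gamma = sum / (2 * pair count).
import math

def semivariogram(points, values, lag, nlags):
    sums = [0.0] * nlags
    counts = [0] * nlags
    n = len(points)
    for i in range(n):
        for j in range(i + 1, n):
            h = math.dist(points[i], points[j])
            k = int(h / lag)          # full-lag bin index
            if k < nlags:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]
```

Empty bins return None rather than a value, mirroring the problem of too few pairs per lag that motivates the Jackknife option described later.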

Covariance View:

For aesthetic reasons, covariance calculations can be plotted as standard semivariograms using the relationship formulated by Isaaks and Srivastava (1988). The Experimental:Covariance View menu option allows the results to be plotted in the normal Covariance form or as a Semivariogram. This option is available for the Cross-Semivariogram, Covariance, and Soft Indicator Covariance spatial equations only. This relationship does not require any assumption of ergodicity as is required in the more common relationship between the semivariogram and covariance (see Vario Mathematics section).

2D Semivariogram:

Described thus far were tools for calculating one-dimensional experimental semivariograms (for example, Figure 8.12, a set of three models based on the same data and search direction, but with different half-angles). One-dimensional models are good for describing the spatial variation in a particular direction, but it can be difficult and time consuming to determine from them the principal anisotropies of the data set. This process is simplified by calculating two-dimensional experimental semivariograms. There are some limitations with the method though (discussed below), so these models should be used to get an overall feel for the spatial variability of the data, while one-dimensional models should be calculated along the principal anisotropic axes as input for the kriging solutions.

(8-12)Figure 8.12

Search Parameters:

The Experimental:2D Semivariogram:Search Parameters dialog (Figure 8.13) is used to define the Lag Distance and the Maximum Search Distance used in calculating the 2D semivariogram. Since many data sets are actually three-dimensional, the orientation of the 2D plane must also be specified. This is done with degree rotations around the X, Y, and Z axes. The rotation directions are defined in Figure 8.14.

(8-13)Figure 8.13

(8-14)Figure 8.14

NOTE and WARNING: These are not the same rotation directions as described for sisim (Chapter 15). This discrepancy will be corrected in a future release.

When all the parameters are defined, press Calculate on this dialog. This calculation may take some time, so be patient. The calculation is basically computing eighteen experimental semivariogram models with a five degree half-angle. When the calculation is complete, the pop-up dialog shown in Figure 8.15 will be displayed.

(8-15)Figure 8.15

WARNING: Do not close this dialog until you have created all the 2D half-angle maps you want. If you do, you need to do the previous calculation again. Once the base calculation is made, alternative half-angles can be produced very quickly.

Once the base calculation is complete, the data set can be evaluated using a limited number of half-angles (5°, 10°, 15°, ..., 75°, 85°, 90°; bandwidths are not installed). This limitation is a consequence of how the base calculation was made, but these choices should be adequate for most purposes. Once a half-angle has been selected, press Calculate on THIS dialog. When the calculation finishes, eighteen experimental semivariograms will be displayed in the graph area (Figures 8.16a and 8.16b). This is a very busy and not very useful way to examine the results. There are two other methods of viewing the results that are much more useful. The first step is to select a file name (a new data file is going to be created). The next step is to press the Grid or 3D Post button. If you press 3D Post, the data set will be passed to the program block (Chapter 12), and you will see a display similar to Figure 8.17. In block you can tilt, rotate, and print the results (refer to Chapter 12 for more details). If you press Grid, the data will be passed to grid (for details see Chapter 9). Once in grid, you can select the Method:Calculate menu option, then View:Contour Map. It will ask you if you want to save the results; say Save & Plot (this will save the results to a file called junk.srf). You will then see a plot similar to Figures 8.18a and 8.18b (see contour, Chapter 10 for more details).

(8-16a)Figure 8.16a and (8-16b)Figure 8.16b

(8-17)Figure 8.17

(8-18a)Figure 8.18a and (8-18b)Figure 8.18b

These steps can quickly be done for each half-angle desired. Using this method, the principal anisotropies (if present) can be quickly determined.

Calculate:

The Experimental:2D Semivariogram:Calculate button serves the same purpose as the Calculate button on the Experimental:2D Semivariogram:Search Parameters dialog.

Calculate:

The Experimental:Calculate option calculates the specified spatial equation and plots the resulting values based on the parameters definable under the Experimental menu option. This calculates one-dimensional semivariogram models only.



Jackknifing:

Jackknifing is a technique that can be used with very small data sets (generally fewer than 50 to 100 data points) to evaluate the uncertainty in the semivariogram. With small data sets, there are often too few pairs in each lag to generate meaningful results, and minor changes in the search angles, half-angles, bandwidths, or lag spacing can significantly change the resultant semivariogram. In the simple case, this technique removes one data point at a time from the data set and calculates a semivariogram for each reduced set. For n data points, this method generates n + 1 experimental semivariograms (including the semivariogram of the full data set). At each lag distance there will be a scatter of calculated values. From these values, a mean, variance, and standard deviation can be calculated for each lag. By plotting each value for each experimental semivariogram at each lag, along with 95% confidence error bars, much of the uncertainty of the semivariogram can be described (Figure 8.19).

(8-19)Figure 8.19

NOTE: This method can be used on any data set, but because it generates n + 1 experimental semivariograms, it can be computationally very expensive on large data sets. Also, in general, when an experimental semivariogram is well behaved (little variation when only the lag is changed), the uncertainty will be relatively small and the jackknifing method will add little information.
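In the simple leave-one-out case, the whole procedure can be sketched as follows, where gamma_fn stands for any hypothetical routine returning the gamma value per lag for a data set (vario's own calculation is more elaborate):

```python
# Leave-one-out jackknife sketch: compute the full-data semivariogram plus
# one semivariogram per point removed (n + 1 curves total), then summarize
# the scatter at each lag with a mean and standard deviation.
import statistics

def jackknife(data, gamma_fn):
    results = [gamma_fn(data)]                       # full data set
    for i in range(len(data)):                       # n leave-one-out sets
        results.append(gamma_fn(data[:i] + data[i + 1:]))
    per_lag = list(zip(*results))                    # transpose: values per lag
    means = [statistics.mean(g) for g in per_lag]
    stdevs = [statistics.stdev(g) for g in per_lag]
    return results, means, stdevs
```

The per-lag standard deviations are the raw material for the confidence error bars plotted in Figure 8.19; the n + 1 factor is also why the method becomes expensive on large data sets.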

Modify:

The Jackknifing dialog is shown in Figure 8.20. The jackknifing option allows for recalculation of the spatial measure after removing a number of data points. The number of points removed can be set as a fixed number or as a percentage of the total data set. Display options for the jackknifed solution can be set by the user.

(8-20)Figure 8.20

Individual Display:

The Individual Display option allows the user to look at a single jackknifed solution in the display portion of the vario package. The dialog is shown in Figure 8.21.

(8-21)Figure 8.21

Calculate:

The Jackknife:Calculate option calculates and jackknifes the specified spatial equation and plots the resulting values based on the parameters definable under the Jackknife menu option.



Graph:

Graph allows the user to specify various attributes of the graph's appearance. These include the graph Border, Error-Bar Styles, Fonts, Labels, Legends, Mesh, and Line Styles.

Border:

Graph:Border is described in Chapter 5 in the Graph:Border section (Figure 5.9).

Error-Bar Style:

Error-Bar Style controls the display characteristics of the error bars. Error bars only exist when a jackknifed calculation has been done. The Error-Bar Style dialog is shown in Figure 8.22. The Error-Bar Style settings are very similar to the settings available in the Style: Set Line Attributes dialog as described in Chapter 5 (Figure 5.14). The only difference in the dialogs is the option to set the extent of the cross-width on the error-bars.

(8-22)Figure 8.22

Fonts:

Graph:Fonts is described in Chapter 5 in the Graph:Fonts section (Figures 5.10 and 5.11).

Labels:

Graph:Labels is described in Chapter 5 in the Graph:Labels section (Figure 5.12).

Legends:

Graph:Legends:Data Parameters creates the pop-up dialog shown in Figure 8.23. The dialog controls whether or not (Compare Figures 8.1, 8.12, and 8.16) the semivariogram Data Parameters are shown on the graph, and if they are displayed, in which corner, and for which semivariogram model.

(8-23)Figure 8.23

Mesh:

Graph:Mesh is described in Chapter 5 in the Graph:Mesh section (Figure 5.13).

Style:

Graph:Line Styles is described in Chapter 5 in the Graph:Style section (Figure 5.14).



Log:

The Log menu option is supplied to allow the user to save, view, or print all text which has been written to the log/status window by the program or added by the user (The log window is also a simple text editor). The options include View Log, Save, Save as, Clear, and Print. View Log, Save, and Save as are similar in operation to the menu options under File described above.



Plot:

Plot is described in Chapter 5 in the Plot section.



Model:

The model menu option allows the user to launch the variofit package from within the vario package. The model dialog is displayed in Figure 8.24. The dialog allows the option of placing the launched variofit package in any one of the four corners of the screen. The user may decide to just launch the variofit package several times in order to save displays of a spatial measure while recalculating. In this fashion, it is possible to simultaneously compare the graphics from several calculations.

(8-24)Figure 8.24



Help:

Help works exactly as explained in the Help section of Chapter 5 (plotgraph, Figure 5.15).



Example of Using Vario:

There are three methods to load a file in vario. The first is to execute vario from the UNIX prompt and open the file from the menu, the second is to pass the file as a command line argument, and the third is to define the file name in the program preference file. To open a file from the main menu, execute vario from the UNIX prompt:

> vario

Once in the application, select the File:Open menu option. The pop-up dialog shown in Figure 5.2 will appear. Select the desired file. To open a file from the command line, enter at the UNIX prompt:

> vario [optional arguments] filename

NOTE: if other command line arguments are used (See Running From the Command Line Section) the file name must be specified last. Also, if any command line arguments are used, you must specify a file name. For example:

> vario gs.water.dat

will open the data file named "gs.water.dat", and

> vario -hbw 6.0 gs.water.dat

will open the same data file, but it will specify the bandwidth in the horizontal search direction to be 6.0. This file consists of a GEO-EAS header (see the Setting Up the Input Data File section) and 659 data locations. The file has three columns: X, Y, Value. The values are all integers and range from 1 to 13. The vario package is called by typing "vario". Once the display appears on the screen, the Experimental:Data dialog is brought up and edited. The X and Y columns are set to 1 and 2 respectively, the Z column is set to zero, and the Value column is set to 3. Once this is done, the file can be opened through the File:Open dialog as discussed previously.

The next step is to set the desired parameters which control the calculation of the spatial measure. The Experimental:Search Parameters:Point dialog is opened and modified. Press the Define Directions button. For this example, the lag distance is set to 0.25 units and the directional bandwidth to 6.0 units. The horizontal search direction is 0 degrees and the horizontal 1/2 angle is 90.0 degrees (an omni-directional calculation). Since this spatial measure is being calculated on a two-dimensional data set, the plunge bandwidth, search direction, and 1/2 angle are ignored in the calculation and can be left at the default values.

Next, a spatial measure is chosen by going to the Experimental:Spatial Equations dialog. For this example an indicator semivariogram was chosen and the upper threshold is set to 8.5. At this point, all that remains is to calculate the spatial measure (in this case, the indicator semivariogram). This can be done by clicking on the Experimental:Calculate button, or if the Experimental:Search Parameters:Points:Define Directions dialog remains on the screen the Calculate button at the bottom of this dialog can be used. Using the Calculate button in the Experimental:Search Parameters dialog, allows the user to adjust calculation parameters and recalculate the spatial measure without having to close the dialog. Results of the calculation of the indicator semivariogram are shown in Figure 8.25. The display parameters can be adjusted within the Border dialog.

(8-25)Figure 8.25



Running From the Command Line:

In many cases it is more convenient to run the application completely from the command line, or at least pass some parameter values in from the command line. The options listed below allow the user to accomplish almost anything from the command line that is possible from within the X-windows application (adding lines from different files is not currently supported). This feature can be useful when the user does not have an X-windows/Motif terminal available, or when many graphs need to be processed quickly and the operation can be completed in batch mode without user interaction.

Syntax:

vario [-3f " "] [-3haz #] [-3rx #.#] [-3ry #.#] [-3rz #.#] [-calct #] [-ch #] [-ct #] [-dd #] [-dip {#.#}] [-dir {#.#}] [-dsd #] [-esp #] [-fl #] [-fnt1 " "] [-fnt2 " "] [-fnt3 " "] [-fnt4 " "] [-fnt5 " "] [-fnt6 " "] [-fnts1 #.#] [-fnts2 #.#] [-fnts3 #.#] [-fnts4 #.#] [-fnts5 #.#] [-fnts6 #.#] [-gml {#}] [-gxi {#}] [-gyi {#}] [-gzi {#}] [-gnd #] [-hbw #.#] [-help] [-hi #.#] [-hw #.#] [-il #] [-ind #] [-jack] [-jcl #.#] [-jebd #] [-jebr #] [-jev #] [-jir #] [-jjr #] [-jrp #] [-jrpc #.#] [-jrt #] [-lag {#.#}] [-lbc #] [-lc {#}] [-lgf " "] [-lgpa #] [-lgpp #] [-li #.#] [-lpbm #.#] [-lpc #] [-lpd #] [-lpf " "] [-lph #] [-lplm #.#] [-lpo #] [-lppsext " "] [-lpq " "] [-lpr] [-lprm #.#] [-lps #] [-lptm #.#] [-lty {#}] [-ltk {#.#}] [-md #] [-mg #.#] [-mox #.#] [-moy #.#] [-ms #] [-mx #.#] [-my #.#] [-nsi #] [-out " "] [-prf " "] [-rfh #] [-run] [-set #] [-sfa #] [-sic #] [-sfl {#}] [-sp #] [-spo #] [-ssz {#.#}] [-sttl " "] [-sty {#}] [-swo #] [-ttl " "] [-ubc #] [-unf " "] [-vbw {#.#}] [-vc #] [-vw {#.#}] [-xc #] [-xfmt " "] [-xlabel " "] [-xmax #.#] [-xmin #.#] [-xMt #.#] [-xmt #] [-xy #.#] [-yc #] [-yfmt " "] [-ylabel " "] [-ymax #.#] [-ymin #.#] [-yMt #.#] [-ymt #] [-ys #.#] [-zc #] [filename]

Meaning of flag symbols:

# = integer
#.# = float
" " = character string.
{} = variable is an array. Values must be separated by a ',' and no spaces are allowed. Do not use the "{ }" symbols on the command line.

NOTES:

1). All parameters in [] brackets are optional.
2). Quotes must be used around character strings.

3). If no entry is required for a flag (e.g. -help, -run), giving the flag alone executes its command.

Flag Definitions:

-3f = 2D experimental semivariogram filename default = "junk.exp.dat"
-3haz = 2D half-angle default = 0
    0 = 0 degrees
    1 = 15 degrees
    2 = 25 degrees
    3 = 35 degrees
    4 = 45 degrees
    5 = 55 degrees
    6 = 65 degrees
    7 = 75 degrees
    8 = 85 degrees
    9 = 90 degrees
-3rx = 2D rotation around X-axis default = 0.0
-3ry = 2D rotation around Y-axis default = 0.0
-3rz = 2D rotation around Z-axis default = 0.0
-calct = calculation type default = 0
    0 = Irregular (point)
    1 = Gridded
-ch = head variable column default = 4
-ct = tail variable column default = 4
-dip {} = dip (plunge) angle default = 0.0
-dir {} = search direction default = 0.0
-dsd = data set dimension (activate z dimen.) default = 1
    0 = False
    1 = True
-esp = exaggeration scale priority default = 0
    0 = favor y-exaggeration scale (-ys)
    1 = favor x/y ratio
-fl = first lag setting default = 0
    0 = half lag
    1 = full lag
-fnt1 = main title font default = Helvetica-Bold
-fnt2 = secondary title font default = Helvetica-Bold
-fnt3 = axes label font default = Helvetica
-fnt4 = division font default = Helvetica
-fnt5 = annotation font default = Helvetica
-fnt6 = mouse position font default = Helvetica
-fnts1 = main title font size default = 24.0
-fnts2 = secondary title font size default = 15.0
-fnts3 = axes label font size default = 15.0
-fnts4 = division font size default = 12.0
-fnts5 = annotation font size default = 10.0
-fnts6 = mouse position font size default = 12.0
-gml = gridded data: maximum lags default = 10
-gnd = number of grid search directions default = 1
-gxi = gridded data X step increment default = 1
-gyi = gridded data Y step increment default = 1
-gzi = gridded data Z step increment default = 1
-hbw {} = horizontal bandwidth default = 1/2 max diagonal
-help = give this help menu
-hi = high indicator cutoff default = data max.
-hw {} = horizontal 1/2 angle default = 90.0
-il = soft indicator index level default = 1
-ind = irregular number of search directions default = 1
-jack = run, jackknife, and calculate vario without X-interface
-jcl = jackknife error-bar/band confidence level default = 90.0%
-jebd = plot jackknife error-bands (lag variance) default = 1
        0 = False
        1 = True
-jebr = plot jackknife error-bars (g(h) variance) default = 1
        0 = False
        1 = True
-jev = plot jackknife error variance bars default = 1
        0 = False
        1 = True
-jir = plot intermediate jackknife results default = 1
        0 = False
        1 = True
-jjr = plot full unjackknifed semivariogram default = 1
        0 = False
        1 = True
-jrp = jackknife number of points removed default = 1
-jrpc = jackknife percentage of points removed default = 10%
-jrt = jackknife point removal protocol default = 0
        0 = Fixed Number
        1 = Percentage
        2 = Well ID
-lag {} = lag spacing default = 1/10th max diag
-lbc = lower bound column (soft data) default = 4
-lc {} = line color default = variable
        0 = Black
        1 = White
        2 = Red
        3 = Green
        4 = Blue
        5 = Magenta
        6 = Yellow
        7 = Cyan
-lgf = log file name default = "log.dat"
-lgpp = legend parameter position default = 1
        0 = Top left
        1 = Top right
        2 = Bottom left
        3 = Bottom right
-lgpa = legend model displayed default = 1
        0 = False
        1 = True
-li = low indicator cutoff default = data min.
-lpbm = page bottom margin default = 1.5
-lpc = number of copies to print default = 1
-lpd = print destination default = 0
        0 = Printer
        1 = File
-lpf = print filename default = "junk.ps"
-lph = print header page default = 0
        0 = False
        1 = True
-lplm = page left margin default = 1.5
-lpo = print orientation default = 0
        0 = Portrait
        1 = Landscape
-lppsext = search extension for postscript files default = "*.ps"
-lpq = print queue default = "ps"
-lpr = print file at specified orientations
-lprm = page right margin default = 1.0
-lps = print output default = 0
        0 = Black & white
        1 = Color
-lptm = page top margin default = 1.5
-lsfl {} = fill line symbol default = 0
        0 = False
        1 = True
-lsc {} = line symbol color default = variable
        0 = Black
        1 = White
        2 = Red
        3 = Green
        4 = Blue
        5 = Magenta
        6 = Yellow
        7 = Cyan
-lssz {} = line symbol size default = 9.0
-lsty {} = line symbol type default = 0
        -1 = No Symbol
         0 = Circle
         1 = Cross
         2 = Diamond
         3 = Square
         4 = X
-ltk {} = line thickness default = 1.0
-lty {} = line type default = 0
        -1 = No Line
         0 = Solid
         1 = Dashed
         2 = Double Dashed
-md = dash mesh default = 0
        0 = False
        1 = True
-mg {} = max lag default = max diagonal
-mox = X mesh origin default = 0.0
-moy = Y mesh origin default = 0.0
-ms = use mesh default = 0
        0 = False
        1 = True
-mx = X mesh frequency default = 1/10 DX
-my = Y mesh frequency default = 1/10 DY
-nsi = number of soft indicators default = 8
-out = output *.gam filename default = "junk.gam"
-prf = preference file name default = "vario.prf"
-rfh = screen refresh default = 0
        0 = On exposure
        1 = On update
-run = run and calculate vario without X-interface
-set = spatial equation type default = 0
         0 = Semivariogram
         1 = Cross-Semivariogram
         2 = Covariance
         3 = Correlogram
         4 = General Relative Semivariogram
         5 = Pairwise Semivariogram
         6 = Semivariogram of Logarithms
         7 = Semirodogram
         8 = Semimadogram
         9 = Indicator Semivariogram
        10 = Soft Indicator Covariance
-sfa = soft flag A type data default = 2
-sic = soft index column default = 6
-sp = plot as semivariogram default = 1
        0 = False
        1 = True
-spo = soft print option default = 0
        0 = Hard-Hard
        1 = Hard-Soft
        2 = Soft-Soft
-sttl = Secondary title default = " "
-swo = soft weighting option default = 0
        0 = straight pair weighting
        1 = p1-p2 scaled pair weighting
-ttl = Main title default = Filename
-ubc = upper bound column (soft data) default = 5
-unf = soft data uncertainty definition file default = Undefined
-vbw {} = vertical band width default = 1/50th max lag
-vw {} = vertical 1/2 angle default = 90.0
-xc = X data input column default = 1
-xfmt = Number of decimal places for X-axis default = ".2f"
-xlabel = X-axis label default = "X"
-xmax = Graph X-maximum default = Data Maximum
-xmin = Graph X-minimum default = Data Minimum
-xMt = X main tic frequency default = 1/10 DX
-xmt = Number of minor X tics default = 5
-xto = X axis label origin default = 0.0
-xy = xy ratio default = 1.5
-yc = Y data input column default = 2
-yfmt = Number of decimal places for Y-axis default = ".2f"
-ylabel = Y-axis label default = "Y"
-ymax = Graph Y-maximum default = Data Maximum
-ymin = Graph Y-minimum default = Data Minimum
-yMt = Y main tic frequency default = 1/10 DY
-ymt = Number of minor Y tics default = 5
-ys = Y-axis exaggeration relative to X-axis default = Calculated
-yto = Y axis label origin default = 0.0
-zc = Z data input column default = 3

[TOP]


Setting up the Input Data File:

Two data file formats are readable by vario. The user does not have to explicitly tell the program what format the data file is; to some extent the program can determine this based on the file format, or on header lines within the data file.

The basic file consists of columns of data. The vario package will read up to 10 columns in the data file. The number of rows of data is limited only by machine memory. This type of data file requires no header lines. Once inside the program, the columns denoting the X and/or Y and/or Z coordinates, as well as the column containing the data values, are specified within the input dialog. If a soft data calculation is to be done, the columns containing the upper bound, lower bound, and the soft data index are defined in the soft data dialog by pressing the data options button and filling in the dialog.

The second data format that vario will read is the "GEO-EAS" format. This is described in Chapter 5.

The following section explains the format for soft data entry. Table 8.1 displays the input format for hard data, the three types of soft data and locations in the domain where there are no conditioning data. In the case of a hard data conditioning value, the coordinates of the well are given and the class to which the attribute belongs is both the upper and lower bound of an interval. Making the upper and lower bounds of the interval equal denotes that there is negligible error in assigning this measure of the attribute to class 3.

X    Y    Z   Index   Bound 1   Bound 2   Comments
23   104  1      1       3         3      Hard Data
27   279  1      2       4         4      Soft Data A
39   340  1      1       3         5      Soft Data B
55   412  1     -2       2         4      Soft Data C
44    85  1     -1       1         5      No Conditioning Data

Table 8.1: Format for input data files. X, Y, and Z are coordinates.

Type A soft data are entered in a manner similar to hard data in that a single value is assigned to both ends of the interval. However, there is uncertainty in assigning this location to class 4. As was seen in the discussion above, this uncertainty is quantified by the values p1 and p2. In the case of type A data, the index value is a flag telling the simulation software where the values of p1 and p2 for each threshold are located. This index corresponds to the index in the uncertainty file. These values of p1 and p2 will be entered in the uncertainty file as shown above. If p1 and p2 do not vary spatially, every location conditioned with Type A soft data will have the same index. If p1 and p2 do vary spatially, different locations will be characterized by different values of p1 and p2 and each set of p1 and p2 values will have a different index. The index numbers can range from 2 to infinity.

Type B soft data are entered with the lower and upper bounds of the interval. Through some technique, it is possible to determine that the location has a class value of 3, 4 or 5.

Type C soft data are entered in a manner similar to those of type B. The bounds define an interval, but in the case of type C data the shape of the distribution between the interval bounds is known. The index, from -2 to -infinity, is a flag telling the simulation software where to locate the distribution within the uncertainty file that belongs within the interval bounds for this location.

Uninformed locations are given the maximum bounds of the observed data. For example, at each location in the domain where there is no well, nor any soft data measurement, the simulated class must be within the extremes of the classes observed throughout the site. Assigning the uninformed locations to be within the maximum and minimum values observed during the site investigation relies on the assumption that the maximum and minimum of the attribute within the domain have been sampled. This may not be the case, and it may be reasonable to set the bounds on the uninformed locations to define a greater interval than the observed interval.

Uncertainty File:

The format of this file is given below by example:

		2  1
		2
		0.876   0.123
		0.921   0.147
		0.806   0.095
		3
		0.798   0.078
		0.932   0.134
		0.902   0.201
		-2
		0.145
		0.433
		0.780

The first line contains the number of sets of soft data type A (imprecise data) calibration sets and the number of type C (prior probability) soft data cumulative probability distributions. For this example file, there are two type A data sets and 1 type C prior cdf. The second line contains the index for the first set of type A calibration probabilities. Type A indices are equal to or greater than 2. In this example file there are three indicator thresholds and the next three lines hold the p1 and p2 values for each threshold. Lines 6 through 10 are the index and p1 and p2 values for the second set of type A calibration. Line 11 holds the index for the only type C prior cdf. Type C indices are equal to or less than -2. The last three lines of this example file are the values of the prior cdf at each of the indicator thresholds.
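The block structure above is simple enough to parse mechanically. The following Python sketch reads the example file; the function name, the fixed number of thresholds, and the dictionary layout are illustrative assumptions, not part of vario.

```python
# Hypothetical reader for the uncertainty file; not part of vario itself.
# Assumes a known, fixed number of indicator thresholds per block.

def parse_uncertainty_file(text, n_thresholds=3):
    """Return {index: [(p1, p2), ...]} for type A calibration sets and
    {index: [cdf values]} for type C prior cdfs."""
    tokens = text.split()
    n_type_a, n_type_c = int(tokens[0]), int(tokens[1])
    pos = 2
    type_a, type_c = {}, {}
    for _ in range(n_type_a):
        index = int(tokens[pos]); pos += 1        # type A indices are >= 2
        pairs = []
        for _ in range(n_thresholds):             # one (p1, p2) pair per threshold
            pairs.append((float(tokens[pos]), float(tokens[pos + 1])))
            pos += 2
        type_a[index] = pairs
    for _ in range(n_type_c):
        index = int(tokens[pos]); pos += 1        # type C indices are <= -2
        type_c[index] = [float(t) for t in tokens[pos:pos + n_thresholds]]
        pos += n_thresholds
    return type_a, type_c

example = """2 1
2
0.876 0.123
0.921 0.147
0.806 0.095
3
0.798 0.078
0.932 0.134
0.902 0.201
-2
0.145
0.433
0.780"""
type_a, type_c = parse_uncertainty_file(example)
```

Running the sketch on the example file yields two type A sets (indices 2 and 3) and one type C prior cdf (index -2).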

[TOP]


Vario Mathematics:

The following section presents the mathematics behind the calculations performed within the vario package. The equations used to calculate the different measures of spatial continuity are presented first. These equations are taken from the GSLIB Software Library (Deutsch and Journel, 1992). The following description is paraphrased from Deutsch and Journel (1992).

Semivariogram:

This is the traditional semivariogram measure. The gamma value is one-half the average squared difference between two variables separated by a vector h.

(8-1)    \gamma(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} (x_i - y_i)^2

N(h) is the number of pairs of data points. xi is the value at the start or "tail" of the pair and yi is the value at the end or "head" of the pair (Figure 8.26). Calculations with the semivariogram should be limited to cases where the head and tail refer to the same variable (attribute). For different variables, the cross-semivariogram should be employed.
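A minimal numerical sketch of equation (8-1), assuming regularly spaced 1-D data; vario's own calculation additionally handles irregular coordinates, direction tolerances, and bandwidths.

```python
# Experimental semivariogram of equation (8-1) for a 1-D transect with
# unit sample spacing; the lag is given in numbers of samples.

def semivariogram(values, lag):
    """gamma(h) = 1/(2 N(h)) * sum over the N(h) pairs of (tail - head)^2."""
    pairs = [(values[i], values[i + lag]) for i in range(len(values) - lag)]
    return sum((x - y) ** 2 for x, y in pairs) / (2.0 * len(pairs))

z = [1.0, 3.0, 2.0, 4.0, 3.0]
print(semivariogram(z, 1))   # 4 pairs, sum of squares 10, so 10/(2*4) = 1.25
```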

Cross-Semivariogram:

A measure of cross variability defined as half the average product of h- increments corresponding to two different variables (attributes).

(8-2)    \gamma_{zy}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} (z_i - z_i')(y_i - y_i')

zi and zi' are the tail and head values of attribute z respectively. Similarly, yi and yi' are the tail and head values of the second attribute. The head and tail for both attributes are separated by the vector h.
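Under the same 1-D assumptions, equation (8-2) differs from the plain semivariogram only in forming the product of the h-increments of the two attributes:

```python
# Cross-semivariogram of equation (8-2): half the average product of the
# h-increments of two co-located attributes z and y.

def cross_semivariogram(z, y, lag):
    n = len(z) - lag
    return sum((z[i] - z[i + lag]) * (y[i] - y[i + lag]) for i in range(n)) / (2.0 * n)
```

For example, cross_semivariogram([1.0, 2.0, 4.0], [2.0, 1.0, 3.0], 1) evaluates to 0.75.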

Covariance:

This spatial measure is the covariance measure used in traditional statistics. Written in spatial notation this calculation is referred to as the "non-ergodic covariance." The covariance measure does not explicitly assume that the means of the head variable and the means of the tail variables are equal.

The means of the head and tail variables are denoted by m+h and m-h respectively and are calculated as:

(8-3)    C(h) = \frac{1}{N(h)} \sum_{i=1}^{N(h)} x_i y_i - m_{-h} m_{+h}

where

(8-4)    m_{-h} = \frac{1}{N(h)} \sum_{i=1}^{N(h)} x_i

(8-5)    m_{+h} = \frac{1}{N(h)} \sum_{i=1}^{N(h)} y_i

If x and y refer to different variables, the covariance calculation determines the cross- covariance. This calculation is used to determine the cross-covariance between hard and soft data.
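Equations (8-3) through (8-5) can be sketched for the same 1-D case; note that the lag means of the tails and heads are computed separately rather than assumed equal:

```python
# Non-ergodic covariance of equations (8-3) to (8-5) on a 1-D transect.

def nonergodic_covariance(values, lag):
    tails = values[:len(values) - lag]    # the x_i (tail) values
    heads = values[lag:]                  # the y_i (head) values
    n = len(tails)
    m_minus = sum(tails) / n              # m-h, lag mean of the tails (8-4)
    m_plus = sum(heads) / n               # m+h, lag mean of the heads (8-5)
    return sum(x * y for x, y in zip(tails, heads)) / n - m_minus * m_plus
```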

Correlogram:

The correlogram is the covariance calculation standardized by the respective tail and head standard deviations:

(8-6)    \rho(h) = \frac{C(h)}{\sigma_{-h} \, \sigma_{+h}}

where s-h and s+h refer to the standard deviation of the tail and head values respectively. The standard deviations are calculated by:

(8-7)    \sigma_{-h}^2 = \frac{1}{N(h)} \sum_{i=1}^{N(h)} x_i^2 - m_{-h}^2

(8-8)    \sigma_{+h}^2 = \frac{1}{N(h)} \sum_{i=1}^{N(h)} y_i^2 - m_{+h}^2

When x and y refer to two different variables, this calculation becomes the cross- correlogram.

General Relative Semivariogram:

The traditional semivariogram measure is standardized by the squared mean of the data for each lag:

(8-9)    \gamma_{GR}(h) = \frac{\gamma(h)}{\left[ (m_{-h} + m_{+h})/2 \right]^2}

Pairwise Relative Semivariogram:

The traditional semivariogram calculation where each pair is normalized by the average of the tail and head values:

(8-10)    \gamma_{PR}(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ \frac{x_i - y_i}{(x_i + y_i)/2} \right]^2

Note: both the general relative and the pairwise relative semivariograms have been shown to be resistant to data sparsity and outliers when applied to positively skewed data sets. Because of the denominators in the calculations, the general and pairwise relative semivariograms should be used only with positive variables.

Semivariograms of Logarithms:

The traditional semivariogram calculated on the natural logarithms of the original variables:

(8-11)    \gamma_L(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \left[ \ln(x_i) - \ln(y_i) \right]^2

Semirodogram:

A measure of spatial variability similar to the traditional semivariogram, but using the square root of the absolute difference between variables separated by a vector h.

(8-12)    \gamma_R(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} \sqrt{|x_i - y_i|}

Note: rodograms and madograms are useful for determining the large scale spatial structure but should not be used for modeling the nugget value of spatial continuity.

Semimadogram:

A measure of spatial variability similar to the traditional semivariogram, but using the absolute difference of the two variables separated by a vector h.

(8-13)    \gamma_M(h) = \frac{1}{2N(h)} \sum_{i=1}^{N(h)} |x_i - y_i|

Indicator Semivariogram:

The semivariogram is constructed on an indicator variable. The indicator classes are constructed within the program and an indicator threshold must be supplied. The indicator classification is done as:

(8-14)    i(x) = \begin{cases} 1, & z(x) \le \mathrm{cut}_i \\ 0, & \text{otherwise} \end{cases}

Where cuti is the indicator threshold.
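The classification of equation (8-14) is a simple thresholding; this sketch uses the convention that the indicator is 1 when the datum is at or below the cutoff, matching the cdf-style coding used for soft data later in this chapter.

```python
# Indicator transform of equation (8-14): i(x) = 1 if z(x) <= cut_i, else 0.

def indicator_transform(values, cutoff):
    return [1 if z <= cutoff else 0 for z in values]

print(indicator_transform([2.0, 5.0, 3.5, 7.1], 4.0))   # [1, 0, 1, 0]
```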

Indicator Covariance:

Indicator covariance is calculated by doing an indicator transform on the data and then using the covariance equation. The advantage over the indicator semivariogram is that the assumption of ergodicity need not be met when using covariance.

Soft Data Theory:

What Are Soft Data?:

The most comprehensive overview of soft data and their application to problems in the earth sciences is given by Alabert (1987). Much of the following section is a distillation of Alabert (1987), and the notation in this chapter will follow Alabert's as much as possible.

Hard data are values which are measured with no, or negligible, uncertainty (e.g., the shear strength of a soil sample as measured in a laboratory). Hard data are considered to be quantitative data and ideally, there will be a large amount of hard data collected during site investigations. Hard data will be denoted as z(x): the value of attribute z as measured at location x.

Soft data are data which contain nonnegligible uncertainty (Alabert, 1987). These qualitative data will be written as z^(x): an estimate of the attribute z at location x. Three categories of soft data are generally recognized. Type A soft data are values, or value classes, assigned at a location based on an imprecise measurement or an expert guess (e.g., lithologic facies based on a measurement of seismic velocity; estimates of porosity and/or permeability from geophysical well logs; length of a channel sand deposit based on an expert opinion). Type B soft data consist of recognized bounds, or a single bound, on a value without information on the distribution of the value between the upper and lower bounds (e.g., from previous exploration, an aquitard is believed to be between 100m and 150m below the ground surface at a given location; by observing a house which is still standing but suffered structural damage, an interval bounding the seismic intensity of an earthquake at that location can be determined). Type C soft data consist of a prior probability distribution on the variable of interest (e.g., it is known from previous studies that the distribution of hydraulic conductivity values for a sandstone aquifer is log-normal with a given mean and variance). Table 8.2 summarizes the three types of soft data. These three types of soft data are discussed below in terms of Bayesian statistics.

TYPE OF DATA        FORMAT                          UNCERTAINTY MEASURE
HARD DATA           single value z(x)               no uncertainty
SOFT DATA
  Type-A            imprecise single value z^(x)    quality index, probability level
  Type-B            [zmin(x), zmax(x)]              interval width
  Type-C            probability distribution        probability distribution

Table 8.2: Types of information (after Alabert, 1987).

Bayesian Statistics:

Bayesian statistics can be used to evaluate the three types of soft data. Bayes Theorem states:

(8-15)    P(A \mid B) = \frac{P(B \mid A) \, P(A)}{P(B)}

where A and B are discrete events. For continuous random variables X and Y, Bayes Theorem is expressed in terms of probability density functions (pdf's):

(8-16)    f(x \mid y) = \frac{f(y \mid x) \, f_X(x)}{f_Y(y)}

Where fx(x) and fy(y) are the marginal pdf's of random variables X and Y respectively, and f(x|y) is the pdf of random variable X given that Y = y and vice versa for f(y|x) (x and y are the specific values of X and Y in this instance) (Alabert, 1987).

For a random variable Z corresponding to an unknown attribute z at a given location, the prior marginal pdf on z is fz(z). This pdf is derived from any knowledge of the variable in the area and is a prior pdf as no experiment has yet been performed at this location to gain more knowledge of the attribute z. This pdf, fz(z), summarizes the uncertainty on z at this location. An experiment performed to gain knowledge of variable z at a given location will update the pdf on z:

(8-17)    f(z \mid e) = \frac{f(e \mid z) \, f_Z(z)}{f(e)}

f(z|e) is a posterior distribution, and it is a measure of the uncertainty on z after the experiment E. f(e|z) is a likelihood function: it measures the likelihood of the outcome of the experiment, e, given z, and thus quantifies the informative quality of the experiment E (Alabert, 1987).
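A small discrete illustration of the update in equation (8-17); the class names and probabilities are invented for the example.

```python
# Bayes update over discrete classes: the posterior is proportional to
# likelihood * prior, normalized so the posterior sums to 1.

def bayes_update(prior, likelihood):
    joint = {z: likelihood[z] * prior[z] for z in prior}
    total = sum(joint.values())
    return {z: p / total for z, p in joint.items()}

prior = {"sand": 0.5, "clay": 0.5}            # f_Z(z), before the experiment
likelihood = {"sand": 0.9, "clay": 0.3}       # f(e|z), quality of the experiment
posterior = bayes_update(prior, likelihood)   # {'sand': 0.75, 'clay': 0.25}
```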

Bayesian Analysis of Hard Data:

In the case of hard data, the outcome of an experiment, e, is a direct and unambiguous measure of the attribute z. The prior uncertainty on z has been reduced to essentially zero (Figure 8.27).

(8-27)Figure 8.27

Bayesian Analysis of Soft Data (A):

The precision of the measurement of attribute z by the experiment can range from perfect (hard data) to poor enough such that no information on z is gained by performing the experiment. In the latter case, the imprecisely measured value, z^(x), is independent of the actual value of the attribute, z, and the prior distribution fz(z) equals the posterior distribution, f(z|e): "the experiment did not improve the prior beliefs on z" (Alabert, 1987, p. 18).

In practice, the development of a reasonable likelihood function is difficult. In the best situations, both hard and soft data can be sampled in the same area and the quality of the experiment, or updating, can be assessed via calibration samples. Each calibration sample is a location where both hard and soft data measurements of the same attribute have been obtained. For example, if lithology is to be determined from seismic velocity measurements (soft data), the accuracy of this method can be determined by examining the results of the experiment (interpreting lithology from velocity) at locations where hard data are available (the wells). From the calibration samples, the probabilities of misclassifying the attribute via the soft data are determined. These probabilities are capable of fully characterizing the indicator likelihood function for each indicator cutoff, as will be discussed in the "Estimating the Z-cdf With Soft Data" section. This calibration is essential, otherwise it would be necessary to rely entirely on models of uncertainty which could be woefully inadequate. Alabert (1987) discusses several models that could be used as likelihood functions when it is impossible to quantify the precision of the experiment.

Bayesian Analysis of Soft Data (B):

Soft data of type B are those which are defined by the upper and lower bounds of an interval, or a single bound of an interval. The exact value within the interval remains unknown as does the distribution of the attribute within the interval. In Bayesian terms, the posterior distribution is considered zero outside the bounds of the interval and its form within the bounds is not known. If the upper and lower bounds are equal, the value is considered to be a hard datum. As the bounds on the interval increase to the physical limits of the attribute, the interval datum contains no additional information and is independent of the actual value z (Alabert, 1987).

Bayesian Analysis of Soft Data (C):

Soft data type C concerns the case where a distribution of an attribute exists prior to an experiment. This prior distribution is distinct from a prior distribution which may be generated from hard data obtained at the site. The prior distribution of type C soft data is derived without experimentation at the site. For example, this prior distribution could be derived from physical knowledge and/or expert opinions of the attributes behavior or from experiments carried out previously at an analogous site.

Indicator Coding of Soft Data:

The background of indicator geostatistics has been discussed previously. The methods of encoding soft data into the indicator formalism are discussed below, and are again a distillation of the work of Alabert (1987).

The range of the spatial attribute z(x) is discretized into Nc+1 classes described by Nc thresholds, or cutoffs, z1, z2, ... zNc. The indicator coding of the information does not call for a priori knowledge of prior and posterior distributions; hard and soft data are coded similarly.

Hard Data:

For hard data, each discrete class will contain a 0 or 1. This complete vector of 0's and 1's is the discretized version of the step posterior cdf corresponding to the precise information z(x) (Alabert, 1987).

Soft Data, Type A:

Two formalisms have been developed for coding type A soft data. The first calls for indicators with values between 0 and 1. The second contains indicators with values of only 0 or 1, and a recognition that there is uncertainty associated with each 0 or 1. For reasons discussed by Alabert (1987, pp. 30-31), the latter formalism will be employed in this study. This indicator coding formalism is shown in Table 8.3. The major advantage of this approach is that the soft data can be used consistently with hard data indicators in the kriging procedure (Alabert, 1987).

Cutoffs   Hard Datum   Type-A Soft Data   Type-B Soft Data   Type-C Soft Data
ZNc          1.0            1.0                1.0                1.0
  .          1.0            1.0                1.0                0.8
  .          1.0            0.0                 ?                 0.5
  .          0.0            0.0                 ?                 0.2
Z1           0.0            0.0                0.0                0.2

Table 8.3: Indicator coding of hard and soft data (after Alabert, 1987).

As shown in Table 8.3, the type A soft data are encoded with a 0 or 1 at each discrete class. The resulting vector of 0's and 1's contains imprecise information. For example, each indicator "1" has a nonnegligible probability that the true corresponding indicator is actually a 0. Thus the imprecision of the type A indicators must be quantified. This quantification is determined by covariances between imprecise indicators and covariances between imprecise indicators and hard indicators. "Under some assumptions, those covariances are shown to be related to hard indicator covariances through a scaling factor which depends only on the misclassification probabilities" p1 and p2 (Alabert, 1987, p. 31):

(8-18)    p_1 = \mathrm{Prob}\left[ \hat{Z}(x) = 1 \mid Z(x) = 1 \right]

(8-19)    p_2 = \mathrm{Prob}\left[ \hat{Z}(x) = 1 \mid Z(x) = 0 \right]

where Z(x) is a binary function defined for a cutoff zc. The probabilities p1 and p2 are easily estimated for each cutoff if calibration samples are available to determine the precision of the experiment generating the soft data. If z^(x) is a good measure of z(x), p1 approaches 1 and p2 approaches 0. If z^(x) contains no information on z(x), p1 = p2 = P[z^(x) <= zc]. This method is essentially cokriging of hard and imprecise indicator data to derive an estimation of spatial probabilities conditioned to hard and soft data (Alabert, 1987).

The fact that p1 and p2 do not sum to 1.0 is a source of confusion. An examination of p1 and p2 in terms of statistical hypothesis testing is given in order to clarify their definitions. The null hypothesis, H0, is defined here to be the case of Z(x) <= zc (the actual datum is less than or equal to the threshold). The possible cases corresponding to acceptance and rejection of the null hypothesis are given below:

There are two cases where H0 is rejected and two cases where it is accepted:

               H0 is TRUE          H0 is FALSE
Accept H0      Correct Decision    Type II Error
               (case 1a)           (case 2a)
Reject H0      Type I Error        Correct Decision
               (case 1b)           (case 2b)
Thus, p1 is the probability of accepting the null hypothesis based on the soft data estimate when that is the correct course of action. p2 is the probability of committing a type II error: accepting the null hypothesis based on the soft data when that is an incorrect choice.
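Given calibration samples where the attribute was measured both ways, p1 and p2 can be estimated by counting, as in this sketch (the function and variable names are illustrative):

```python
# Estimate p1 and p2 at one cutoff from calibration samples holding both a
# hard measurement (hard) and a soft estimate (soft) of the same attribute.

def misclassification_probs(hard, soft, cutoff):
    """p1 = P[soft <= cutoff | hard <= cutoff]; p2 = P[soft <= cutoff | hard > cutoff]."""
    below = [s for z, s in zip(hard, soft) if z <= cutoff]
    above = [s for z, s in zip(hard, soft) if z > cutoff]
    p1 = sum(1 for s in below if s <= cutoff) / len(below)
    p2 = sum(1 for s in above if s <= cutoff) / len(above)
    return p1, p2
```

With six calibration samples, hard = [1, 2, 3, 6, 7, 8] and soft = [2, 1, 5, 4, 8, 9] at a cutoff of 4, this gives p1 = 2/3 and p2 = 1/3.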

Soft Data, Type B:

In the case of a value bounded by an interval, the vector of indicators describing the soft datum is an incomplete series of 0's and 1's (Table 8.3). The value of the attribute is known, or assumed, to be within the interval. So, all indicators at cutoffs below zmin(x) are 0 and all indicators greater than or equal to zmax(x) are 1's. The indicator values are unknown between zmin(x) and zmax(x) (Alabert, 1987).

Soft Data, Type C:

Type C soft data cover the situation where a prior distribution is known. This prior distribution could be thought of as the posterior distribution derived from experiments carried out at another site. The prior cumulative distribution is discretized according to the chosen cutoffs. The coded vectors are complete and contain indicators with values other than 0 or 1 (Table 8.3). Indicator coding of Type C soft data is similar to that developed by Journel (1986) in the original version of the soft kriging method.
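Discretizing a type C prior at the chosen cutoffs can be sketched as follows; the lognormal prior is an arbitrary illustration, not a vario default.

```python
# Discretize a prior cdf at the indicator cutoffs (the type C column of
# Table 8.3). A lognormal prior is assumed purely for illustration.

import math

def prior_cdf_vector(cutoffs, mu, sigma):
    """Value of a lognormal(mu, sigma) cdf at each indicator cutoff."""
    def lognormal_cdf(z):
        return 0.5 * (1.0 + math.erf((math.log(z) - mu) / (sigma * math.sqrt(2.0))))
    return [lognormal_cdf(z) for z in cutoffs]

vec = prior_cdf_vector([0.5, 1.0, 2.0], mu=0.0, sigma=1.0)
```

The resulting vector is complete and monotone nondecreasing across the cutoffs, as a discretized cdf must be.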

Improving Semivariogram Estimates With Soft Data:

A major benefit derived from soft data is better estimation of the spatial covariance. In many subsurface studies, all hard data come from wells. These data are plentiful in the vertical dimension, but usually sparse in the horizontal due to the well spacing. This shortcoming can be overcome by using soft data, which are often plentiful in the horizontal dimension (e.g., geophysical data collected in surface surveys). This section addresses methods of estimating spatial covariance with both hard and soft data.

Alabert (1987) uses co-semivariograms, not semivariograms, to develop his theory of improving spatial estimation with soft data. Generally, the covariance can be obtained from the semivariogram by the relationship: C(h) = s^2 - g(h). However, this relation has been shown to be valid only in cases where stationarity and ergodicity can be assumed. In most situations, the experimental covariance is a better estimate of the exhaustive covariance than the covariance derived from the semivariogram (Alabert, 1987; Isaaks and Srivastava, 1988). All soft data covariance measures calculated with this software can be displayed as semivariograms using a relationship developed by Isaaks and Srivastava (1988) which does not require implicit assumptions of ergodicity:

(8-20)

Improving Covariance Estimates with Type A Soft Data:

Hard and soft data are coded into vectors as discussed in the previous section, and at this point it is assumed that the soft data all came from a single source of information. The error on the hard data is assumed to be negligible; however, the possible error on the soft data must be addressed. As discussed earlier, at each cutoff zc and each location x, the quality of the soft indicator datum i^(x,zc) is described by the classification probabilities p1(x,zc) and p2(x,zc) (Alabert, 1987). These probabilities are defined:

(8-21)    p_1(x, z_c) = \mathrm{Prob}\left[ \hat{i}(x, z_c) = 1 \mid i(x, z_c) = 1 \right]

(8-22)    p_2(x, z_c) = \mathrm{Prob}\left[ \hat{i}(x, z_c) = 1 \mid i(x, z_c) = 0 \right]

Assuming the error on the soft indicators is stationary, p1 and p2 are independent of x and can be written p1(zc) and p2(zc). These probabilities fully characterize the indicator likelihood function at cutoff zc defined as:

(8-23)

Alabert (1987) makes the point that knowledge of the Nc indicator likelihood functions is not equivalent to knowledge of the full likelihood function f(z^|z); however, determination of the full likelihood function is not practical, nor is it necessary to fully account for the quality of the indicator information. Therefore, in practice, the likelihood function is estimated through estimates of p1(zc) and p2(zc). A graphical method of determining p1 and p2 is shown in Figure 8.28. The misclassification probabilities are calculated from the equations at the top of the figure. Regions A and D are inclusive of any points lying on the vertical Zc. Region A is also inclusive of any values lying directly on the horizontal Zc. The case of Type A soft data without calibration samples will not be discussed here.

(8-28)Figure 8.28

Two relations will be developed: the first is between hard indicator covariances and hard-soft indicator cross-covariances, and the second is between hard indicator covariances and soft indicator covariances. Alabert (1987) derives a simple relationship between the hard indicator covariance C(h,zc) and the hard-soft indicator cross-covariance CI(h,zc). Assuming stationarity of I and î and that p1(zc) does not equal p2(zc):

(8-24)

An estimate of the hard indicator covariance can be derived from the hard-soft indicator covariance:

(8-25)

The subscript Nh refers to that portion of the estimate derived from the available hard indicators. Nh,Ns denotes the estimate derived from the experimental hard-soft indicator cross-covariance. The weight, w, can be adjusted to account for both the number of pairs involved in each covariance estimate, as well as the quality of the soft indicators (Alabert, 1987).

The second case considers estimating the hard data covariance from the covariance between soft data locations. When considering two soft indicators at two different locations, î(x,zc) and î(y,zc), Alabert (1987) shows that the covariance between the two is a scaled version of the hard data covariance between them:

(8-26)

and assuming stationarity of I and î and rearranging:

(8-27)

Note that the quality of the soft information at both locations is taken into account. If one of the two soft data values provides no information on its corresponding hard attribute (p1(zc) approximately equals p2(zc)), then C (x,y,zc) = 0. This model will break down when the errors have a strong spatial correlation (Alabert, 1987). So, assuming stationarity of I and î and if p1(zc) does not equal p2(zc), CI(h,zc) can be estimated:

(8-28)

where Ns denotes the estimate of covariance derived from the experimental soft indicator covariance. The weights should again be chosen to reflect the quality of the soft data and the number of pairs involved in each of the experimental covariances. The three weights must sum to 1.0. The two weighting options within the software are described below.

The "Straight Pair Weighting Option" calculates the omega weights strictly by the quantity of hard and soft data pairs within the lag spacing.

(8-29)

(8-30)

(8-31)

where Nh and Ns denote the number of hard and soft data within the lag spacing, and Ntotal equals the total number of pairs within the lag spacing. The use of soft data to estimate the covariance, or semivariogram, increases the maximum number of pairs available for estimation to Nh² + Nh·Ns + Ns², rather than just Nh² if only hard data were available.
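The straight pair weighting can be sketched from the pair counts alone. Equations 8-29 through 8-31 are not reproduced in this text, so the exact form below, which takes Nh and Ns as data counts and forms the three pair-count fractions, is a hypothetical interpretation consistent with the Nh² + Nh·Ns + Ns² total given above.

```python
def straight_pair_weights(n_hard, n_soft):
    """Pair-count weights for the hard-hard, hard-soft, and soft-soft
    covariance estimates within one lag spacing (sketch).

    n_hard, n_soft -- number of hard and soft data contributing to the lag
    """
    n_hh = n_hard * n_hard        # hard-hard pairs
    n_hs = n_hard * n_soft        # hard-soft pairs
    n_ss = n_soft * n_soft        # soft-soft pairs
    n_total = n_hh + n_hs + n_ss  # Ntotal = Nh^2 + Nh*Ns + Ns^2
    return (n_hh / n_total, n_hs / n_total, n_ss / n_total)
```

By construction the three weights sum to 1.0, as the manual requires.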

For a full theoretical development of these relationships the reader is referred to Alabert (1987). Alabert (1987) performed experimental checks for the relationships presented in this section and found that even small amounts of soft data can improve the estimation of the spatial correlation relative to using only hard data.

Improving Semivariogram Estimates With Type B Soft Data:

The impact of type B soft data is simply to provide more hard data locations with which to estimate the semivariogram. A location with type B soft data is treated as a hard datum for cutoffs outside its interval and as uninformed for cutoffs within it. Thus, for a cutoff below the lower bound of the interval, the location has an indicator value of zero and is treated as a hard datum. For cutoffs within the interval, the location is considered uninformed. For cutoffs above the upper bound of the interval, the indicator value is one and the location is again treated as a hard datum.
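The indicator coding above can be expressed as a small function. This is a sketch; the treatment of a cutoff falling exactly on an interval bound is not specified in the text and is assumed here to leave the location uninformed.

```python
def type_b_indicator(lower, upper, cutoff):
    """Indicator coding for a type B soft datum known only to lie in the
    interval [lower, upper].

    Returns 0 or 1 where the location acts as a hard indicator datum,
    or None where the cutoff falls inside the interval (uninformed).
    """
    if cutoff < lower:
        return 0      # the value certainly exceeds the cutoff
    if cutoff > upper:
        return 1      # the value is certainly below the cutoff
    return None       # cutoff inside the interval: location is uninformed
```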

Improving Semivariogram Estimates With Type C Soft Data:

Type C information provides a prior cumulative probability distribution (pcdf) for every point. This prior distribution can be thought of as defining the probability that the actual value lies below the current indicator threshold at each location. For example, suppose a point is classified with an indicator value of 1.0 at the current threshold and the corresponding pcdf gives a value of 0.45 at that threshold. This translates into a 45 percent chance that the point is actually a one and a 55 percent chance that it is a zero. Unless the pcdf is 0.0 at that threshold, the data location is classified as a 1.0. The pcdf value for each threshold is then used to determine how that data location is weighted in the semivariogram calculations.

Kulkarni (1984) devised a semivariogram equation that takes into account different weights for points at different locations.

(8-32)

where the quantity K is the weight of the data pair, calculated as the product of the pcdf values of the two points at the current indicator threshold. The number of pairs contributing to the gamma calculation at each lag is no longer necessarily an integer, since it is the sum of the K weights. The previous equation can be written in terms of a covariance calculation:

(8-33)

where the head and tail means are given by:

(8-34)

(8-35)
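The weighted semivariogram form described above can be sketched as follows. Since equations 8-32 through 8-35 are not reproduced in this text, the function name, the pair layout, and the use of indicator values are illustrative assumptions; only the K = product-of-pcdf-weights rule and the sum-of-K denominator come from the description.

```python
def weighted_semivariogram(pairs):
    """Weighted experimental semivariogram for one lag, in the spirit of
    Kulkarni's weighted estimator (sketch).

    pairs -- iterable of (i_head, i_tail, p_head, p_tail) tuples, where
             i_* are the indicator values at the two ends of a pair and
             p_* are the pcdf-derived weights of the two points
    """
    num = 0.0
    den = 0.0
    for i_head, i_tail, p_head, p_tail in pairs:
        k = p_head * p_tail           # pair weight K: product of point weights
        num += k * (i_head - i_tail) ** 2
        den += k
    # the "number of pairs" is the (generally non-integer) sum of K weights
    return num / (2.0 * den)
```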

Jackknifing:

A problem with experimental semivariograms is that there is no method for directly measuring their uncertainty, error, or confidence limits. This is because, for each lag, there is only a single calculable mean. It may be the mean of numerous calculations at the given lag, but in trying to measure the variance of the deviations around that mean, one is in effect calculating the variance of the data set's variance at that lag. As a result, as the range increases, so typically does the variance, and the results are essentially useless. The calculated variance is generally equal to or greater than the mean value for the lag (i.e., at a given lag, the 95% confidence interval for the mean generally extends from below zero (effectively zero) to more than twice the calculated value, which is not useful information).

To sidestep this problem, a process called jackknifing (cross-validation) is used (Shafer and Varljen, 1990; Davis, 1987). Jackknifing is a procedure in which one (or more) data points are removed from the data set and the experimental semivariogram is then calculated. By repeating this procedure for every point in the data set, a series of n (n = number of samples) experimental semivariograms is calculated. For each lag distance there are now n mean γ(h) values, from which it is possible to approximately determine, for example, the 95% confidence limits on the mean γ(h) value for a particular lag. When these are plotted, the error bars define the possible range of the modeled semivariogram. There is a problem with this method: each mean value calculated is correlated with the other mean values at that lag (the same data, except for one point, are being used), so the variance calculations are not strictly correct (Davis, 1987). As will be explained, this technique is not being used to prove that a particular semivariogram model is correct, which it cannot do (Davis, 1987), but to guide the modeler in collecting further data or in identifying a likely range of reasonable model semivariograms.
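The confidence-limit step can be sketched as follows, given the n leave-one-out γ(h) values already computed for one lag. The normal-theory 1.96 half-width is an assumption for illustration, and, as the Davis (1987) caveat above notes, the leave-one-out values are correlated, so these limits are only indicative.

```python
import math
import statistics

def jackknife_interval(gammas):
    """Approximate 95% confidence limits for the mean gamma(h) at one lag.

    gammas -- the n experimental semivariogram values at this lag, one per
              leave-one-out pass over the data set
    """
    n = len(gammas)
    mean = statistics.mean(gammas)
    se = statistics.stdev(gammas) / math.sqrt(n)  # standard error of the mean
    half = 1.96 * se                              # normal-theory 95% half-width
    return (mean - half, mean + half)
```

Plotting (lower, upper) as error bars at each lag gives the band within which reasonable model semivariograms should fall.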

Two other concepts should be considered when jackknifing. First, because data points are being removed from the data set to calculate the experimental semivariogram, the variance, and therefore the calculated sill, will generally increase slightly; with more data the population is better defined and the variance is lower. Second, when a single experimental semivariogram based on all the data is calculated, the results may appear easy to model. The problem with a single experimental semivariogram is that it is difficult to determine whether it represents the true nature of the site or whether the modeler was simply fortunate in selecting lags. By jackknifing the data, the error bars let the modeler determine how much confidence can be attributed to the modeled semivariogram.



Bibliography (vario):

Alabert, F., 1987, Stochastic Imaging of Spatial Distributions Using Hard and Soft Information, M.S. Thesis, Stanford University, Stanford, California.

Davis, B.M., 1987, Uses and Abuses of Cross-Validation in Geostatistics, Mathematical Geology, Vol. 19, No. 3, pp. 241-248.

Deutsch, C.V. and A.G. Journel, 1992, GSLIB: Geostatistical Software Library and User's Guide, Oxford University Press, New York, 340 pp.

Englund, E. and A. Sparks, 1988, GEO-EAS, U.S. Environmental Protection Agency, Environmental Monitoring Systems Laboratory, Las Vegas, Nevada, EPA/600/4-88/033.

Isaaks, E., and R.M. Srivastava, 1988, Spatial Continuity Measures for Probabilistic and Deterministic Geostatistics, Mathematical Geology, Vol. 20, No. 4, pp. 313-341.

Journel, A., 1986, Constrained Interpolation and Qualitative Information, Mathematical Geology, Vol. 18, No. 3, pp. 269-286.

Shafer, J.M. and M.D. Varljen, 1990, Approximation of Confidence Limits on Sample Semivariograms From Single Realizations of Spatially Correlated Random Fields, Water Resources Research, Vol. 26, No. 8, pp. 1787-1802.


