
New technology and soil landscape mapping in NSW

Geoff Goldrick, Greg Chapman, Alex McGaw, Nicole Simons, Humphrey Milford, Casey Murphy, Jennifer Edye and Andrew Macleod

NSW Dept of Land & Water Conservation
PO Box 149, Kempsey, NSW 2441
Ph: +61-2-6563-1212, Fax: +61-2-6562-8728
gggoldrick@dlwc.nsw.gov.au

Abstract

Soil landscape mapping involves the identification of relationships between soil properties and distribution and environmental attributes including lithology, terrain, biota and climate. Traditionally these relationships have been elucidated through interpretation of aerial photographs, geological and climatic data, field observation, and more recently remotely sensed data. The assumption of mappable relationships between soil landscapes and environmental variables raises the possibility of automated mapping by developing decision rules that define these relationships and yield predictive models of soil landscape class. The decision rules may be “imposed” on the soil landscape model as a result of expert knowledge of the relationships between soil landscapes and the environmental variables, or the rules may be derived by statistical exploration of these relationships. One tool for deriving these relationships is recursive partitioning of the data to give classification trees.

Classification trees are being used by the NSW Department of Land and Water Conservation to facilitate the mapping of the Southern Brigalow Interim Bioregion. Preliminary results suggest that this approach is suitable for at least reconnaissance scale mapping providing the model can be trained on data that adequately describe the range of environmental variation. Methods are being developed to assess the degree of environmental variation to evaluate the coverage of the training data and guide future sampling. If classification trees can be successfully adapted to this purpose the time necessary for reconnaissance scale regional assessment of soil landscapes should be greatly reduced.

Introduction

Soil characteristics are important parameters for land use and land management. The geographic distribution of soil characteristics is a fundamental consideration for natural resource management of any area. McKenzie (1991) highlights the Australian problem for soil resource inventory: large areas and a comparatively small resource base. The New South Wales Department of Land and Water Conservation (DLWC) is responsible for compiling a soil resource inventory for NSW. Given the area involved, it is not practicable to undertake routine mapping at scales more detailed than 1:100,000. At this scale in most landscapes it is not feasible to map individual soil types, but repeating patterns of soil can be presented on maps. DLWC maps soil landscapes, which are areas of land that have ‘recognisable and specifiable topographies and soils, that are capable of presentation on maps, and can be described by concise statements’ (Northcote, 1978). The concept of soil landscapes relies on the assumption that there are mappable, deterministic relationships between patterns of soil distribution and biophysical environmental attributes. “Landscapes can be used to identify mappable areas of soils because similar causal factors are involved in the formation of both landscapes and soils. Similarly constraints to rural and urban development of land are related to both landscape and soil limitations” (Chapman and Murphy, 1989). Specifically, soil landscape mapping assumes that the pattern of soil distribution is related to parent material, landform attributes such as slope and elevation, soil hydrology and other climate variables, geomorphological processes, and the impact of vegetation and other organisms interacting over long periods.
While such an assumption is undoubtedly a simplification, the models of soil distribution produced by this approach have proved useful for a variety of purposes including land use and farm planning (Murphy, 2001), preparation of Environmental Impact Statements (Department of Land and Water Conservation, 2000), crop and native plant species distribution (Austin et al., 2000) and soil structure decline (Keady et al., 1998), effluent disposal facility designs (Department of Local Government, 1998), salinity (Banks, 2000) and acid soil management.

Traditionally, soil landscape models have been elucidated through aerial photograph interpretation (API) and field observation/sampling combined with geological, topographical and vegetation data. Recently new digital tools have been added to the soil surveyor’s kit, including remotely sensed data (Wilford et al., 1997) and good quality digital elevation models (DEMs) (Dobos et al., 2000). These tools have the potential to improve the estimation and quantification of landform, lithology, vegetation and other biophysical parameters across continuous surfaces and thereby improve models of soil distribution. In concert with these improvements in data availability and quality there have been a number of advances in analytical tools that can be used to model environmental attributes such as climate (Hutchinson, 1991; Houlder et al., 1999) and terrain (Gallant and Wilson, 1996; Wilson and Gallant, 2000) and build predictive models of soil distribution (Odeh et al., 1994; Bui et al., 1999; McKenzie and Ryan, 1999; McBratney et al., 2000).

These data can be used within a geographic information system (GIS) to automatically map soil landscape relationships using imposed or derived models.

Imposed models

This approach involves using expert knowledge to impose a model on the landscape. A soil surveyor derives an expected relationship between soil attributes and environmental attributes based on observation and previous experience. This relationship is expressed as explicit rules that can be interpreted by a Geographic Information System (GIS) to predict the distribution of soil attributes.

Derived models

A more sophisticated approach is the application of statistical techniques such as classification trees in order to analyse the relationship between soil attributes and environmental attributes (McKenzie and Ryan, 1999; Bui et al., 1998; Bui et al., 1999). In this case the model is derived by the GIS rather than imposed upon it. Note that in the context of this paper GIS includes both spatial and non-spatial analytical tools from a variety of software packages.

Imposed and derived models are not mutually exclusive; indeed, the best results are usually achieved when they are used in combination.

This paper illustrates an approach to derived modelling based on classification trees (Therneau and Atkinson, 1997; Venables and Ripley, 1999; Atkinson and Therneau, 2000) using examples from a study of soil landscapes of the Southern Brigalow Interim Bioregion (Figure 1) currently being undertaken by the NSW Dept of Land and Water Conservation as part of a Western Region Assessment.

Figure 1: Shaded relief model of the Southern Brigalow Interim Bioregion showing major towns and location (inset)

Soil Landscape Mapping of the Southern Brigalow Interim Bioregion

The Southern Brigalow Interim Bioregion

The Southern Brigalow Interim Bioregion (SBIB) covers over 52,400 square kilometres or 6.2% of NSW. It extends from the Queensland border south to Narromine and east to Merriwa. The area is extensively cleared Eucalypt woodland in the subhumid summer dominated rainfall zone. Land use includes extensive grazing, cropping and forestry. Gunnedah, Coonabarabran, Dubbo and Narrabri are amongst the major town centres. The landscape consists of basalt plateaux in the south with rolling hills and low hills in the east and north extending to subdued rises and outwash fans of the Pilliga in the west. Geology includes a variety of Jurassic to recent sediments along with the dramatic volcanic ranges of the Warrumbungles and Nandewar Range. It includes highly fertile basalt derived soils of the Liverpool Ranges and very poor soils of the Pilliga sandstones. Soil erosion, salinity and soil structure decline are among the soils issues of the SBIB.

Background to the study

The Resource And Conservation Assessment Council (RACAC) of the New South Wales Government has commissioned a series of studies in the SBIB in order to make the best possible decision on the long term use of state-owned land for both conservation and production purposes. Information will be collected for regional land use planning, conservation management and land use management. The studies include geological, mineral, soils, vegetation, fauna, socio-economic and wood production related assessments. The soils inventory is important for helping to model individual plant and animal species distributions, modelling of extant and pre-European distribution of vegetation communities, and modelling of site quality for wood resource productivity. Mapped soil attributes (including depth, fertility, water holding capacity and stability) are important for the vegetation mapping project, particularly in the SBIB because the terrain is mostly subdued and soils assume a greater influence in determining plant species distribution. Soils information is a fundamental precursor to subsequent vegetation mapping so it is essential that soils information be provided as soon as practicable.

The SBIB is in a zone of salinity hazard. Matching soils with public land uses which will minimise recharge to water tables will improve water quality in the Murray Darling river system.

The spatial and temporal scales of the SBIB Soil Landscape mapping project make it impracticable to use conventional soil landscape mapping techniques. The constraints of the project demanded the use of rapid, automated techniques with only limited reliance on time intensive techniques such as API and field sampling and observation. To meet these constraints it was decided to derive preliminary soil landscape models using GIS and existing data and then test and refine these models through API and field sampling and observation.

The Geographic Information System and Methods

The GIS for this study (Table 1) consisted of ArcView 3.2 plus Spatial Analyst and Image Analyst extensions (www.esri.com), teamed with RPART (Therneau and Atkinson, 1997; Venables and Ripley, 1999; Atkinson and Therneau, 2000; http://lib.stat.cmu.edu/DOS/S/Swin/Rpart.zip) a set of routines for recursive partitioning within S-Plus (www.cmis.csiro.au/S-Plus). In addition, DLWC have developed an ArcView extension, the Soil Landscape Analysis Package (SLAP) for terrain analysis and a suite of other functions relevant to soil survey.

Table 1: The GIS software packages and their functions

| Software | Function | Source |
| --- | --- | --- |
| ArcView | Desktop vector GIS | www.esri.com |
| Spatial Analyst | Extension for ArcView enabling analysis of raster data (grids) | www.esri.com |
| Image Analyst | Extension for ArcView enabling analysis of image data | www.esri.com |
| S-Plus | Statistical analysis | www.cmis.csiro.au/S-Plus |
| RPART | Routines for the construction of classification and regression trees based on recursive partitioning within S-Plus | http://lib.stat.cmu.edu/DOS/S/Swin/Rpart.zip |
| SLAP | Extension for ArcView for terrain analysis, coordinate transformation, derivation of erosion indices and various raster and vector manipulations | Soils Information and Planning Unit, DLWC |

Available Data Sets

The base data for the project consisted of Digital Elevation Models (DEM) and derived attributes, airborne gamma radiometrics, interpolated climate surfaces, interim lithology coverage and existing soil landscape maps at scales of 1:100,000 and 1:250,000 (Figure 2). Table 2 outlines the coverage and potential uses of these data sets for digital soil landscape modelling.

These base data were used to produce the environmental parameters (predictor variables) listed in Table 3. Table 3 also gives a brief description of each parameter, the source and method (usually the software) used to derive it, and where appropriate a reference to a more detailed discussion on the nature and derivation of the parameter.

Table 2: Base data sets used in the SBIB study and their potential uses

| Data Set (Coverage) | Uses | References |
| --- | --- | --- |
| Gamma Radiometrics (missing from north of study area and from Warrumbungles) | Signal taken from dry soils to depths of up to 50 cm. Discrimination of soil parent materials and interpretation of landforming processes; interpretation of soil depth and veneers of aeolian materials | Wilford et al., 1997; Pickup and Marks, 2000 |
| Digital Elevation Models (9 sec DEM complete; 25 m DEM complete except northwestern portion of Narrabri 1:250,000 sheet) | Derivation of terrain attributes as predictors of soil landscape class; important predictor for drainage and depositional and erosional processes | Moore et al., 1993; Wilson and Gallant, 1996; Wilson and Gallant, 2000 |
| Interpolated Climate Surfaces (complete but accuracy of interpolations between surfaces is dependent on sparse data sets) | Climate as a predictor of biological activity, soil moisture regime and hence soil type; fire regime influenced by climate influences soil parameters | Hutchinson, 1991; McKenzie and Ryan, 1999 |
| Lithology (complete but composed of tiles of maps of different styles and purposes - not continuous) | Major determinant of soil mineralogy | Gray and Murphy, 1999; Paton et al., 1995 |
| Soil Landscapes and other Soil Maps (limited, mostly in south; mapped at 1:100,000 and 1:250,000 scales with edge mismatches between mapping scales and different unit names - not continuous) | Tested model of soil distribution in the landscape | Banks, 1998 |
| Soil Profile Description Points (limited, mostly distributed throughout areas of previous soil mapping) | Point locations of soil types | |

All base data and predictor variables were converted to ArcView grids if they were not already in this form, mosaiced if they consisted of several tiles, and then reprojected to latitude/longitude if they were in AMG. All of this was achieved using the routines in SLAP. The resultant surfaces were grids with a 0.0005 degree (approximately 50 m) cell size.

Table 3: The environmental parameters (predictor variables) used for modelling and prediction of soil landscapes in the SBIB

| Parameter | Description (units) | Source/Method | Reference |
| --- | --- | --- | --- |
| elevation | height above sea level (m) | DEM | |
| slope | slope gradient (degrees) | DEM + ArcView + SLAP | [Wilson et al., 2000] |
| compound topographic index (CTI) | measure of topographic control on soil moisture based on slope and downslope flow accumulation | DEM + ArcView + SLAP | [Wilson et al., 2000] |
| profile curvature | measure of the downslope rate of change of slope gradient | DEM + ArcView + SLAP | [Wilson et al., 2000] |
| tangential curvature | a measure of topographic convergence and divergence | DEM + ArcView + SLAP | [Wilson et al., 2000] |
| relative elevation index (REI) | a local (300 m radius) measure of position in the landscape | DEM + ArcView + SLAP | |
| temperature (mean annual) | °C | DEM + ANUCLIM | [Houlder et al., 1999] |
| precipitation (mean annual) | mm | DEM + ANUCLIM | [Houlder et al., 1999] |
| moisture index (mean annual) | measure of the balance between precipitation and evaporation | DEM + ANUCLIM | [Houlder et al., 1999] |
| moisture index (coefficient of variation) | measure of the seasonality of the moisture index | DEM + ANUCLIM | |
| radiation index | relative index of the total annual amount of radiation modified for slope and aspect | DEM + ANUCLIM + ArcView + SLAP | |
| Potassium gamma ray count | | | [Wilford et al., 1997] |
| Thorium gamma ray count | | | [Wilford et al., 1997] |
| Uranium gamma ray count | | | [Wilford et al., 1997] |
| parent material lithology | a reclassification of geological units | DMR Geology | [Gray et al., 1999] |

To “train” the soil landscape model 25,000 sample points were randomly generated for the area covered by existing 1:100,000 scale soil landscape data (Figure 2). The 1:250,000 scale soil landscape data were excluded from this preliminary analysis due to incompatibilities in the soil landscape classes, but these are being resolved for future analyses. 25,000 sample points is arguably excessive because if meaningful relationships do exist between soil landscapes and environmental parameters they should become evident at much lower sampling densities. However, it was considered preferable to over sample the training data and then reduce tree complexity (by the methods described below) rather than run the risk of under sampling and missing important relationships.

Figure 2: Extent of existing 1:100,000 (striped) and 1:250,000 (stippled) soil landscape mapping in the SBIB. Note that only the 1:100,000 scale data were used to train the model for these preliminary analyses.

A training data matrix was constructed from the sample points with soil landscape class as the dependent (y) variable and the environmental variables in Table 3 as the predictor (x) variables. Because radiometric coverage was incomplete two training data matrices were used, one including radiometric data but omitting lithology, the other including lithology but omitting radiometric data. Each of these two data sets was analysed using RPART to create a classification tree (Figure 3). A classification tree is a method of hierarchically splitting a data set into increasingly homogeneous subsets. The criterion for each split is to minimise some measure of the impurity of the resultant classes, in this case the Gini index (see Therneau and Atkinson, 1997 and Venables and Ripley, 1999 for details).
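The splitting criterion can be illustrated with a minimal Python sketch. It is not RPART code: the function names, the toy slope values and the two class labels are hypothetical, and only a single numeric predictor is searched. It computes the Gini index of a set of class labels and finds the threshold that minimises the size-weighted impurity of the two resulting subsets.

```python
from collections import Counter


def gini(labels):
    """Gini index of a list of class labels: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Search one numeric predictor for the binary split with the lowest
    size-weighted Gini index. Returns (threshold, weighted_impurity);
    threshold is None if no split improves on the unsplit node."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lower, upper = pairs[i - 1][0], pairs[i][0]
        if lower == upper:
            continue  # identical values cannot be separated by a threshold
        threshold = (lower + upper) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Hypothetical training sample: slope (degrees) and soil landscape class.
slope = [2.0, 3.5, 5.0, 18.0, 24.0, 30.0]
landscape = ["plain", "plain", "plain", "hills", "hills", "hills"]

threshold, impurity = best_split(slope, landscape)
print(gini(landscape), threshold, impurity)
```

A full tree repeats this search over every predictor at every node and recurses into the two subsets, which is why an unconstrained tree can grow to thousands of leaves.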

Figure 3: A partial classification tree (upper nodes and branches) from the RPART analyses of the training data including radiometrics for the SBIB indicating the decision criteria (the final pruned tree has 451 nodes and is too complex to illustrate).

Two important controls on RPART are “minsplit” and “cp”. The former determines the minimum number of observations in a node for which the routine will try to compute a split. The latter is a measure of tree complexity (Venables and Ripley, 1999) with values ranging from 0 to 1 and lower values indicating greater complexity. For the SBIB data sets cp was initially set to 0.000001 and minsplit set to 2 to allow a very complex tree to be generated.

The classification trees were constructed with 10 fold cross validation. Cross validation involves splitting the training data into roughly equal sized subsets, growing the tree on all but one of these subsets and testing it with the remaining one (Venables and Ripley, 1999). In this case, the training data were split into 10 subsets, then 9 were used to grow a tree and the tenth used to test it. This was repeated 10 times (holding out each subset in turn) and the resulting error estimates were averaged to give the cross validation error used to prune the final tree.
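The mechanics of k fold cross validation can be sketched in a few lines of Python. This is an illustration only, not the RPART implementation: the helper names are hypothetical, and a simple nearest neighbour classifier on invented data stands in for the classification tree so that the example stays self-contained.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validation_error(xs, ys, fit, predict, k=10, seed=0):
    """Hold out each fold in turn, fit on the rest, and average the
    misclassification rates of the held-out folds (folds are assumed non-empty)."""
    fold_errors = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        wrong = sum(predict(model, xs[i]) != ys[i] for i in held_out)
        fold_errors.append(wrong / len(held_out))
    return sum(fold_errors) / k, fold_errors


# Stand-in classifier (an assumption for illustration, not a tree):
# predict the class of the nearest training value.
def fit_nearest(xs, ys):
    return list(zip(xs, ys))


def predict_nearest(model, x):
    return min(model, key=lambda pair: abs(pair[0] - x))[1]


# Hypothetical one-predictor data set with two well separated classes.
xs = list(range(10)) + list(range(20, 30))
ys = ["plain"] * 10 + ["hills"] * 10

mean_error, fold_errors = cross_validation_error(xs, ys, fit_nearest, predict_nearest)
print(mean_error, fold_errors)
```

Because every observation is held out exactly once, the averaged error is an estimate of how the model performs on data it was not trained on.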

A tree with few “leaves” (terminal nodes) is likely to “under fit” the data. That is, it is unlikely to adequately reflect the complexity of the relationship between the predictor variables and the dependent variable. On the other hand, a tree with very many leaves may “over fit” the data. That is, it reflects noise within the data rather than meaningful relationships. Cross validation provides a measure for “pruning” the tree in order to reduce its complexity. As the size of a tree increases the cross validation error tends to decrease until it reaches a minimum. After this point the cross validation error may remain steady or even increase with increasing tree size (Figure 4). In other words, beyond this minimum increasing the size of the tree does not increase the information about the relationship between the predictor and dependent variables. This minimum therefore provides an objective basis for tree pruning. Venables and Ripley (1999) suggest applying the 1-SE rule, that is, selecting the smallest tree with a cross validation error within one standard error of the minimum cross validation error. The unpruned tree for the SBIB training matrix with radiometrics had 3794 leaves, but a plot of tree size versus cross validation error indicates that if the 1-SE rule is applied the tree may be pruned to 451 leaves with no meaningful loss of information (Figure 4). At this level of complexity (cp = 0.0002) the misclassification rate is approximately 26 per cent. For the training matrix with lithology the initial tree had 6958 leaves but this was pruned to 895 leaves with cp = 0.0001 and a resultant misclassification rate of 28 per cent. It is important to re-emphasise that a “perfect” fit could be achieved by setting cp = 0, but this would merely result in an exceedingly complex tree reflecting noise that is inherent in any natural data set such as soil landscapes rather than describing meaningful relationships between the predictor variables and the soil landscapes.
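The 1-SE rule itself is simple to state in code. The Python sketch below assumes a table of candidate trees, each with its number of leaves, cp value, cross validation error and the standard error of that estimate. The row layout and all of the numbers are hypothetical and are not results from this study.

```python
def one_se_rule(cp_table):
    """Select the smallest tree whose cross validation error is within one
    standard error of the minimum cross validation error.

    Each row is (n_leaves, cp, xerror, xstd)."""
    best = min(cp_table, key=lambda row: row[2])
    limit = best[2] + best[3]  # minimum error plus its standard error
    eligible = [row for row in cp_table if row[2] <= limit]
    return min(eligible, key=lambda row: row[0])


# Illustrative values only: error falls, flattens, then rises slightly.
cp_table = [
    (1, 0.30, 1.00, 0.020),
    (5, 0.05, 0.62, 0.018),
    (20, 0.01, 0.45, 0.016),
    (60, 0.002, 0.41, 0.015),
    (150, 0.0005, 0.40, 0.015),
    (400, 0.0001, 0.42, 0.015),
]

chosen = one_se_rule(cp_table)
print(chosen)
```

In this invented table the minimum error belongs to the 150 leaf tree, but the 60 leaf tree lies within one standard error of it and is therefore preferred as the simpler model.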

Figure 4: The relationship between tree complexity and cross validation error for the classification trees of the SBIB based on radiometrics.

The pruned trees were then translated into an Avenue script for generating a grid of predicted soil landscape classes. The two grids were combined such that the values from the radiometric tree took precedence over values from the lithology tree. Finally, the combined grid was simplified by passing a modal filter over the grid and then removing regions less than 40 cells in size (approximately 10 ha) by absorbing them into adjacent regions. The grid of predicted soil landscape class for the whole SBIB is shown in Figure 5.
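The combination and smoothing steps can be sketched in Python as follows. This is not the Avenue script used in the study: grids are represented as plain nested lists, None marks cells with no prediction, the 3 x 3 window size is an assumption (the text does not state the filter size), and the removal of regions smaller than 40 cells is omitted for brevity.

```python
from collections import Counter

NODATA = None  # cells for which a tree made no prediction


def combine_with_precedence(primary, secondary):
    """Cell by cell, take the primary grid's value where it exists,
    otherwise fall back to the secondary grid."""
    return [
        [p if p is not NODATA else s for p, s in zip(primary_row, secondary_row)]
        for primary_row, secondary_row in zip(primary, secondary)
    ]


def modal_filter(grid):
    """Replace each cell with the most common class in its 3 x 3 neighbourhood.
    The centre value is kept when it is one of the tied most common classes."""
    rows, cols = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] is NODATA:
                continue
            window = [
                grid[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if grid[rr][cc] is not NODATA
            ]
            counts = Counter(window)
            top = max(counts.values())
            modes = [value for value, count in counts.items() if count == top]
            out[r][c] = grid[r][c] if grid[r][c] in modes else modes[0]
    return out


# Hypothetical 3 x 3 grids of predicted classes "A", "B" and "C".
radiometric = [["A", "A", NODATA], ["A", "B", NODATA], ["A", "A", NODATA]]
lithology = [["C", "C", "A"], ["C", "C", "A"], ["C", "C", "A"]]

combined = combine_with_precedence(radiometric, lithology)
smoothed = modal_filter(combined)
print(combined)
print(smoothed)
```

Giving one grid precedence means the second grid only fills gaps in the first, so the lithology based predictions appear only where radiometric coverage is missing.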

Figure 5: Grid of predicted soil landscapes for the SBIB (a) and detail of a section with overlay of the training data line work (b).

Evaluating the model

The most important question to be asked of any predictive model is “how good are the predictions resulting from the model?” One obvious approach to answering this question is to look at misclassification. In the case of the SBIB trees the misclassification rates were 26 and 29 per cent for the radiometric and lithological trees, respectively. Given the noise inherent in any natural dataset and expected variations within the soil landscapes (see below) this result seems acceptable.

Another option is to visually compare a map of predicted soil landscapes to the pre-existing map used to train the model. A visual comparison of the predicted and training maps for the SBIB appears good; however, such a comparison is not conclusive. Even a poor quality predicted map, for example one that has only about 50% agreement with the training map, can look acceptable in general. On the other hand, even a very good model may produce a predicted map which differs from the training map in detail (Figure 5b).

There are several reasons to expect such discrepancies between the modelled and pre-existing maps even in the best of circumstances. In most cases a soil surveyor and a classification tree will express the conceptual basis for each soil landscape in slightly different terms. For example, a soil surveyor might define a soil landscape as those areas with granitic parent material, steep to very steep slopes and low mean annual rainfall. The same soil landscape may be defined by the classification tree as those areas with a certain radiometric signal, slopes greater than 22 degrees and mean annual precipitation below 587 mm. While these differences may seem trivial they will lead to differences in the resulting maps, most notably in the placement of boundaries.
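The effect of such crisp thresholds on boundary placement can be seen in a small Python sketch of the tree style rule described above. The slope and precipitation thresholds come from the example in the text; the potassium threshold, the dictionary keys and the two sample cells are hypothetical, since the text does not specify the radiometric criterion.

```python
# Hypothetical threshold standing in for "a certain radiometric signal";
# the text gives no value for it.
POTASSIUM_MIN = 1.5


def tree_rule(cell):
    """Crisp membership rule of the kind a classification tree produces."""
    return (
        cell["potassium"] >= POTASSIUM_MIN
        and cell["slope_deg"] > 22.0
        and cell["precip_mm"] < 587.0
    )


# Two neighbouring cells that a surveyor would probably both call "steep".
just_inside = {"potassium": 2.0, "slope_deg": 22.5, "precip_mm": 580.0}
just_outside = {"potassium": 2.0, "slope_deg": 21.5, "precip_mm": 580.0}

print(tree_rule(just_inside), tree_rule(just_outside))
```

The two cells differ by one degree of slope yet fall on opposite sides of the modelled boundary, whereas a surveyor working from a qualitative description of steepness might place both in the same unit.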

Further, in delineating the boundaries and extent of this soil landscape the classification tree follows precise, objective rules. The soil surveyor, on the other hand, is more likely to delineate the soil boundary through the subjective assessment of information from API, geology maps, topographic maps and climate maps. Moreover, it is often the case that two experienced soil surveyors will make slightly different assessments of the data and thus delineate the boundaries differently. This is not to say that the boundaries resulting from the classification tree are necessarily more accurate or better than those drawn by surveyors, but they are more precise and objective. On the other hand, it is also possible that, where discrepancies exist, the model has more accurately characterised the nature of a soil landscape than the surveyor.

Given these expected differences between the training and predicted maps it seems better to test the predictions through field observation and API. The question to be asked for each observation is “are the characteristics of the site consistent with the predicted soil landscape?” In evaluating this question it needs to be recognised that within each soil landscape there will be variations in most of its characteristics. For example, a single soil type may occur in a number of soil landscapes as either a dominant or a minor component. The test of prediction must therefore be “is it reasonable that the observed soil type should occur in this predicted soil landscape?” not “is the observed soil type the dominant type for this predicted soil landscape?”. The latter criterion, used in a similar study by Bui et al. (1999), is too restrictive given the heterogeneity of soil landscapes.

The soil landscape mapping of the SBIB is a work in progress. The predictive maps presented here are a first pass evaluation of the technique based on presently available data. These predictive maps have yet to be evaluated against field data, but initial examination by soil surveyors familiar with the area suggests they are of a quality suitable for reconnaissance soil landscape mapping in and around the area of the training data (Banks, R., Senior Soil Surveyor, DLWC Gunnedah, pers. comm.). This raises the question of how far their predictions should be extrapolated.

Employing the model predictions

No matter how well a model predicts soil classes in the vicinity of the training data one can expect that the accuracy of the model predictions will decrease in environments dissimilar to that in which it was trained. This raises the problem of defining the degree of dissimilarity between the environment of the training area and the environment of the areas for which we wish to make predictions.

To assess dissimilarity we can define the concept of environmental space by analogy with geographic space. Figure 6 illustrates the case of three data points in two-dimensional environmental space defined by elevation and precipitation. We can observe that the distance between points A and B is small relative to the distance between points A and C. We can measure the distance, d, between points A and B as the Euclidean distance, the square root of the sum of the squares of the distance in both dimensions:

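$$d = \sqrt{(E_A - E_B)^2 + (P_A - P_B)^2} \qquad (1)$$

where $E_A$ and $E_B$ are the elevations, and $P_A$ and $P_B$ the precipitation values, of points A and B.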

For the soil landscape model the environmental space is n dimensional, where n is the number of environmental parameters (12 for the lithology based tree and 14 for the radiometric based tree), and we can derive an analogous measure based on Euclidean distance where d is the square root of the sum of the squares of the distance in each of the n dimensions. Unlike geographic space, each dimension of environmental space is expressed in different units and each may vary in its importance to the soil landscape model. This means that environmental variables may need to be normalised and weighted before the calculation of environmental distance. We are currently looking at various methods of evaluating the distance in environmental space between the training data and the rest of the SBIB. Figure 7 illustrates one such measure where all variables have equal weighting and are normalised by subtracting the mean and dividing by the standard deviation. Figure 7 suggests that for much of the SBIB the training data do not adequately describe the environmental space.
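A minimal Python sketch of such a measure is given below. It is an illustration under stated assumptions, not the procedure used to produce Figure 7: the function names and sample values are hypothetical, only two variables (elevation and precipitation) are used, and the mean and standard deviation are computed over the training points and the target cells together, which the text does not specify.

```python
import math
from statistics import mean, pstdev


def column_stats(rows):
    """Mean and (population) standard deviation of each variable."""
    return [(mean(col), pstdev(col)) for col in zip(*rows)]


def standardise(row, stats):
    """Subtract the mean and divide by the standard deviation, per variable."""
    return [(v - m) / s if s > 0 else 0.0 for v, (m, s) in zip(row, stats)]


def env_distance(a, b, weights=None):
    """Weighted Euclidean distance between two points in n dimensions."""
    weights = weights or [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def mean_distance_to_training(cell, training, stats, weights=None):
    """Average environmental distance from one cell to all training points."""
    z_cell = standardise(cell, stats)
    return mean(
        env_distance(z_cell, standardise(point, stats), weights)
        for point in training
    )


# Hypothetical (elevation in m, mean annual precipitation in mm) values.
training = [(200, 600), (220, 620), (240, 640)]
near_cell = (225, 615)
far_cell = (900, 1100)

stats = column_stats(training + [near_cell, far_cell])
print(mean_distance_to_training(near_cell, training, stats))
print(mean_distance_to_training(far_cell, training, stats))
```

Standardising first prevents a variable with large numeric values, such as precipitation in millimetres, from dominating the distance simply because of its units. Unequal weights can be passed to `env_distance` where some variables are judged more important to the model.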

Figure 6: Three data points in the two dimensional environmental space defined by precipitation and elevation.

Figure 7: Estimates of the average distance in environmental space of all grid cells in the SBIB from the points in the training data (excluding lithology and radiometric data). Darker tone indicates greater environmental distance.

To overcome the problem of the environmental distance between the existing training data and the areas for which we wish to make predictions we are undertaking additional mapping using API and field observations for several small areas scattered throughout the SBIB. The aim is to create training data sets with a broad coverage of the environmental space of the SBIB and then interpolate between these training datasets rather than extrapolate models outwards from a single training set. We will be using measures of environmental distance to locate our additional training areas to ensure that these are placed so as to provide the best possible coverage of environmental space. We are also exploring distance measures that can accommodate categorical data such as lithological classes.

Incorporating expert knowledge

Expert knowledge can be incorporated into the tree based soil landscape models at several stages including in the initial choice and derivation of predictor variables, pre-stratification, and assessment and modification of the decision rules produced by the model.

The success of the model is fundamentally dependent on the choice of appropriate predictor variables and good quality data sets that describe them. The range of possible predictor variables is very large and expert knowledge is required to determine which of these are most likely to be good predictors of soil landscape class in particular areas. It may also be desirable to manipulate base environmental data to develop new predictor variables. For example, in the broad alluvial plains of the SBIB it is known that the soils of elevated areas such as levees and dunes are different from the soils of the surrounding plain (Banks, R., pers. comm.). The difference in elevation that defines such features is often subtle and they are hard to identify reliably from raw elevation data. This association led to the development of the relative elevation index (REI, see Table 3) designed specifically to identify such features. Though the usefulness of the REI has not yet been evaluated in the field, there appears to be a good correlation between values of the REI and levees and depressions (Banks, R., pers. comm., Figure 8). Another example of the use of expert knowledge to manipulate base data sets is the reclassification of geology into lithological parent material, because geological classes may be differentiated on a variety of characteristics unrelated to their role as soil parent material. Reclassification of the geological database allowed the number of classes to be reduced from 409 to 24 in a way that emphasised this role.

Figure 8: Detail of area around Narrabri illustrating the usefulness of the REI for identifying levees (red) and depressions (blue).

Expert knowledge may also be incorporated by stratifying the training data before constructing classification trees where it is known that the data are best divided according to some criteria for which suitable environmental parameters are not available.

Finally, expert knowledge may be incorporated in the post-classification phase. Evaluation of the data through field observation and other methods may indicate that some branches of the tree lead to erroneous predictions or that, given the available predictor variables, the model is unable to make suitably accurate predictions for some areas. In this case, one or more branches may need to be pruned and replaced by traditional methods in the areas to which they apply. Less drastically, it may be possible to prune off the offending branches of the tree, stratify the areas involved according to some imposed criterion, and grow separate trees to describe those areas. How extensively such measures need to be employed for the SBIB will depend on the results from field evaluation of the predictions. However, it is clear from the preliminary assessment that the classification tree fails to identify the salt pelletised clay lunettes around Lake Goran represented by the Lochaber soil landscape (Banks, R., pers. comm.). It is probable that the predictor environmental variables are unable to “capture” this concept and the model will need to be altered or, if no suitable decision rules can be found, such features will need to be mapped manually.

Discussion and Summary

The results of the tree classification modelling for the SBIB are very preliminary. It is clear from an evaluation of environmental space that additional training data are needed. Further, it is obvious that no final evaluation of the method can be made until the predictions have been thoroughly tested through additional sampling. However, an assessment of the preliminary results by experienced soil surveyors familiar with the area suggests that, with an appropriate choice of environmental predictor variables and good quality data, tree classification models are useful for at least reconnaissance scale soil landscape mapping. The principal advantage of this approach is that it allows the rapid assessment of large areas. Providing timely soil landscape data for regional assessments such as that being undertaken in the SBIB would simply not be possible using conventional techniques. A secondary advantage is that the process is objective, explicit and repeatable. Mapping by traditional methods is subjective. Even when given identical guidelines and base data, no two soil surveyors are likely to produce identical maps. In contrast, models based on classification trees using the same guidelines and data will produce identical outputs regardless of the operator. Moreover, the rules by which soil surveyors delineate their soil landscape units are often implicit, whereas for classification trees the rules for assigning a cell to a particular class are clear and explicit and can therefore be subject to scrutiny.

Automated methods do not usurp the role of the soil surveyor. Expert knowledge is required at each phase of the modelling process if the models are to be successful. Thus soil surveyors should: inform the choice, development and assessment of the predictor variables; influence the development of the model, for example by pre-classification; and assess the quality of the model.

Automated models, such as those based on classification trees, should not be used far from the environmental space for which they were developed; where possible, interpolation is preferable to extrapolation.
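One simple way to flag extrapolation is to compare each cell with the range of every predictor in the training data. The sketch below is a crude, assumed proxy for environmental space and is not the evaluation method used in this study; the variable names and values are hypothetical.

```python
def training_envelope(training):
    """Minimum and maximum of each predictor over the training sites."""
    variables = training[0].keys()
    return {
        v: (min(site[v] for site in training), max(site[v] for site in training))
        for v in variables
    }


def outside_envelope(cell, envelope):
    """List the predictors for which a cell lies outside the training range."""
    return [v for v, (lo, hi) in envelope.items() if not lo <= cell[v] <= hi]


# Hypothetical training sites described by elevation (m) and annual rainfall (mm).
training = [
    {"elevation": 200.0, "rainfall": 550.0},
    {"elevation": 350.0, "rainfall": 700.0},
]
envelope = training_envelope(training)
flagged = outside_envelope({"elevation": 300.0, "rainfall": 900.0}, envelope)
```

A cell with an empty list lies within the sampled range of every predictor. This test does not detect unsampled combinations of values inside those ranges, so it can only indicate where predictions are clearly extrapolated.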

Finally, it should be recognised that automated techniques may not be universally applicable. The expert knowledge encapsulated in the definitions of some soil landscape classes may not be able to be captured by available environmental data or expressed as objective decision rules. In such cases it will be necessary to use more traditional methods. Indeed, the best outcomes are likely to occur where automated and traditional methods are used in concert.

References

Atkinson, E.J. and Therneau, T.M. (2000). An Introduction to Recursive Partitioning Using RPART Routines, Mayo Foundation.

Austin, M.P., Cawsey, E.M., Baker, B.L., Yialeloglou, M.M., Grice, D.J. and Briggs, S.V. (2000). Predicting Vegetation Cover in the Central Lachlan Region, CSIRO Division of Wildlife and Ecology.

Banks, R.G. (2000). Forest Plantation for Saline Recharge Control: Blackville 1:100,000 Sheet, Barwon Region Soil Survey Unit.

Banks, R.G. (1998). Soil Landscapes of the Blackville 1:100,000 Sheet Report, Department of Land and Water Conservation, Sydney.

Bui, E.N., Moran, C.J. and Simon, D.A.P. (1998). New geotechnical maps for the Murray-Darling basin. Technical Report 42/98, CSIRO Land and Water, Canberra.

Bui, E.N., Loughhead, A. and Corner, R. (1999). Extracting soil-landscape rules from previous soil surveys. Australian Journal of Soil Research, 37: 495-508.

Department of Land and Water Conservation (2000). Soil and Landscape Issues in Environmental Impact Assessment, Technical Report No. 34, 2nd edition, Department of Land and Water Conservation.

Department of Local Government, NSW Environmental Protection Agency, NSW Health, NSW Department of Land & Water Conservation, and Department of Urban Affairs & Planning (1998). On-site Sewage Management for Single Households. February 1998.

Dobos, E., Micheli, E., Baumgardner, M.F., Biehl, L. and Helt, T. (2000). Use of combined digital elevation model and satellite radiometric data for regional soil mapping. Geoderma, 97: 367-391.

Gallant, J.C. and Wilson, J.P. (1996). TAPES-G: A grid-based terrain analysis program for environmental sciences. Computers & Geosciences, 22: 713-722.

Gray, J.M. and Murphy, B.W. (1999). Parent Material and Soils - A Guide to the Influence of Parent Material on Soil Distribution in Eastern Australia. Technical Report No. 45, NSW Dept of Land and Water Conservation, Sydney.

Houlder, D., Hutchinson, M., Nix, H. and McMahon, J. (1999). ANUCLIM Version 5.0, http://cres.anu.edu.au, 73 pp.

Hutchinson, M.F. (1991). The application of thin plate smoothing splines to continent-wide data assimilation. In: J.D. Jasper (Editor), Data Assimilation Systems. Bureau of Meteorology Research Report No. 27. Bureau of Meteorology, Melbourne, pp. 104-113.

Keady, L.C. and Banks, R.G. (1998). Field Guide to Soils of the Western Barwon Region Floodplains. Department of Land and Water Conservation, Sydney.

McBratney, A.B., Odeh, I.O.A., Bishop, T.F.A., Dunbar, M.S. and Shatar, T.M. (2000). An overview of pedometric techniques for use in soil survey. Geoderma, 97: 293-327.

McKenzie, N.J. (1991). A strategy for coordinating soil survey and land evaluation in Australia, CSIRO, Division of Soils, Divisional Report No. 114.

McKenzie, N.J. and Ryan, P.J. (1999). Spatial prediction of soil properties using environmental correlation. Geoderma, 89: 67-94.

Moore, I.D., Gessler, P.E., Nielsen, G.A. and Peterson, G.A. (1993). Soil attribute prediction using terrain analysis. Soil Science Society of America Journal, 57: 443-452.

Murphy, C.L., Macleod, A.P., Chapman, G.A., Milford, H.B., McGaw, A.J.E., Edye, J.A. and Simons, N.A. (2001). NSW State Soil Landscape Mapping Program and Derivative Products. In Proceedings of the Geospatial Information and Agriculture Symposium, Sydney 2001.

Odeh, I.O.A., McBratney, A.B. and Chittleborough, D.J. (1994). Spatial prediction of soil properties from landform attributes from a digital elevation model. Geoderma, 63: 197-214.

Paton, T.R., Humphreys, G.S. and Mitchell, P.B. (1995). Soils - a new global view. UCL Press, London.

Pickup, G. and Marks, A. (2000). Identifying large-scale erosion and deposition processes from airborne gamma radiometrics and digital elevation models in a weathered landscape. Earth Surface Processes and Landforms, 25: 535-557.

Therneau, T.M. and Atkinson, E.J. (1997). An introduction to recursive partitioning using the RPART routine. Technical Report 61, Mayo Clinic, Section of Statistics.

Venables, W.N. and Ripley, B.D. (1999). Modern Applied Statistics with S-PLUS. Springer-Verlag, New York, 501 pp.

Wilford, J., Bierwirth, P.N. and Craig, M.A. (1997). Application of airborne gamma ray spectrometry in soil/regolith mapping. Australian Geological Survey Organisation Journal of Geology and Geophysics, 17: 201-216.

Wilson, J.P. and Gallant, J.C. (1996). EROS: A grid based program for estimating spatially-distributed erosion indices. Computers & Geosciences, 22: 707-712.

Wilson, J.P. and Gallant, J.C. (2000). Terrain Analysis: Principles and Applications. John Wiley & Sons, 479 pp.
