* How well do spatial objects represent the real world?
* How well do GIS functional algorithms compute the true values of products?
* Accuracy of data is among the most important of technical issues in GIS
* Data quality, error, uncertainty, scale, resolution, and precision all affect the ways in which data can be used & interpreted
All spatial data are inaccurate to some degree, BUT data are generally represented in the computer to high precision
Accuracy: the closeness of results, computations, or estimates to TRUE values (or values accepted to be true)
since spatial data are usually a generalization of the real world, it is often difficult to identify a TRUE value
e.g., in measuring the accuracy of a digitized contour, we compare it to the contour as
drawn on the source map, since the contour does not exist as a real line on the
surface of the earth
accuracy of the database may have little relationship to the accuracy of the products computed from the database
e.g., accuracy of a slope, aspect, or watershed computed from a DEM is not
easily related to the accuracy of the elevations in the DEM itself
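For instance, here is a minimal sketch (not from the lecture) of why the two accuracies are hard to relate: it computes slope at the center of a 3x3 DEM window with a simple central-difference formula and a hypothetical 30 m grid spacing, then perturbs one elevation by a hypothetical 1 m error.

```python
import math

# Minimal sketch: slope at the center of a 3x3 DEM window using a
# simple central-difference formula.  Grid spacing and elevations are
# hypothetical values, not from the lecture.

def slope_degrees(window, cellsize):
    """window: 3x3 nested list of elevations (m); cellsize: grid spacing (m)."""
    dzdx = (window[1][2] - window[1][0]) / (2 * cellsize)
    dzdy = (window[2][1] - window[0][1]) / (2 * cellsize)
    return math.degrees(math.atan(math.hypot(dzdx, dzdy)))

dem = [[105.0, 106.0, 107.0],
       [104.0, 105.0, 106.0],
       [103.0, 104.0, 105.0]]
print(slope_degrees(dem, 30.0))   # slope from the original elevations (~2.7 deg)

dem[1][2] += 1.0                  # perturb one neighboring cell by 1 m
print(slope_degrees(dem, 30.0))   # slope becomes ~3.4 deg; the size of the change
                                  # depends on cell size and local relief, not
                                  # simply on the 1 m elevation error
```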
* Precision: not necessarily the closeness of results, but the number of decimal places or
significant digits in a measurement
* Precision is not the same as accuracy!
* Repeatability vs. "truth"
* A large number of significant digits doesn't necessarily indicate that the
measurement is accurate
* A GIS works at high precision, usually much higher than the accuracy of the
data themselves
since all spatial data are of limited accuracy, the important questions are:
how to measure accuracy
how to track the way errors are propagated through GIS operations
how to ensure that users don't ascribe greater accuracy to data than they deserve
Alternative definitions of accuracy:
conformance to expectations: fulfilling arbitrary thresholds, or following established procedures, as with geodetic standards
fitness for use: truth in labelling (distinct roles of producer and consumer)
A standard for describing digital cartographic data quality was recently developed by the
National Committee for Digital Cartographic Data Standards (NCDCDS) through a
coordinated national effort in the U.S.
1982-1988: Members of the American Congress on Surveying and Mapping met to produce a draft standard
Later became the Spatial Data Transfer Standard
standard model to be used for describing digital data accuracy
similar standards being adopted in other countries
National Committee for Digital Cartographic Data Standards (NCDCDS) identifies several components of data quality:
positional accuracy
attribute accuracy
logical consistency
completeness
lineage
Positional accuracy: defined as the closeness of locational information (usually coordinates) to the true position
maps are accurate to roughly one line width or 0.5 mm
equivalent to 12 m on 1:24,000 or 125 m on 1:250,000 maps
within a database a typical UTM coordinate pair might be:
Easting 579124.349 m
Northing 5194732.247 m
If the database was digitized from a 1:24,000 map sheet, the last four
digits in each coordinate (units, tenths, hundredths, thousandths) would be
questionable
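A quick check of those figures, using only the numbers above (the conversion itself is just arithmetic):

```python
# 0.5 mm of map error, multiplied by the scale denominator, gives ground error.
for denom in (24_000, 250_000):
    ground_error_m = 0.5e-3 * denom          # 0.5 mm expressed in metres
    print(f"1:{denom}: about {ground_error_m:.0f} m on the ground")
# 1:24000  -> about 12 m
# 1:250000 -> about 125 m

# So an easting stored as 579124.349 m from a 1:24,000 source carries
# nine digits of precision but only ~12 m of accuracy: the units place
# and everything below it is not meaningful.
```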
* Use an independent source of higher accuracy:
* find a larger scale map
* use GPS
* use raw survey data
* Use internal evidence:
* digitized polygons that are unclosed, lines that overshoot or undershoot
nodes, etc. are indications of inaccuracy
* sizes of gaps, overshoots, etc. may be a measure of positional accuracy
Compute accuracy from knowledge of the errors introduced by different sources
e.g., 1 mm in source document
0.5 mm in map registration for digitizing
0.2 mm in digitizing
if sources combine independently, we can get an estimate of overall accuracy
by summing the squares of each component and taking the square root of the
sum
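Using the example figures just listed, a minimal sketch of that root-sum-of-squares combination (the 1:24,000 conversion at the end is only an illustration):

```python
import math

# Root-sum-of-squares combination of independent error sources (mm),
# using the example figures listed above.
sources_mm = {"source document": 1.0,
              "map registration": 0.5,
              "digitizing": 0.2}

overall_mm = math.sqrt(sum(v ** 2 for v in sources_mm.values()))
print(f"overall error ~ {overall_mm:.2f} mm on the map")        # ~1.14 mm

# At a hypothetical 1:24,000 scale, that map error corresponds to:
print(f"~ {overall_mm * 1e-3 * 24_000:.0f} m on the ground")    # ~27 m
```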
* Attribute accuracy: defined as the closeness of attribute values to their true value
* while location may not change with time, attributes often do
* attribute accuracy must be analyzed in different ways depending on the nature
of the data
* for continuous attributes (surfaces) such as DEMs or TINs:
* accuracy is expressed as a measurement of error: e.g., elevation to +/- 1 m (see the RMSE sketch after this list)
* for categorical attributes such as classified polygons:
* are the categories appropriate, sufficiently detailed and defined?
* gross errors, such as a polygon classified as A (shopping center) when it
should have been B (golf course), are possible
* more likely the polygon will be heterogeneous, i.e. vegetation zones where
the area may be 70% A and 30% B
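One common way to arrive at an error figure like "elevation to +/- 1 m" (not prescribed by these notes) is a root-mean-square error against independently surveyed check points; the values below are hypothetical:

```python
import math

# Hypothetical DEM elevations and independently surveyed check-point
# elevations (m) at the same locations.
dem_z    = [101.2, 98.7, 105.4, 110.1, 99.8]
survey_z = [100.9, 99.5, 104.8, 110.6, 99.1]

# Root-mean-square error gives a single "+/- x m" style accuracy figure.
rmse = math.sqrt(sum((d - s) ** 2 for d, s in zip(dem_z, survey_z)) / len(dem_z))
print(f"DEM vertical RMSE ~ {rmse:.2f} m")    # ~0.60 m for these sample values
```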
* Logical consistency: refers to the internal consistency of the data structure, particularly as it
applies to topological consistency
* is the database consistent with its definitions?
* if there are polygons, do they close?
* is there exactly one label within each polygon?
* are there nodes wherever arcs cross, or do arcs sometimes cross without
forming nodes?
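A minimal sketch of what such checks might look like (hypothetical data structures, not any particular GIS package):

```python
import math

# Hypothetical, minimal versions of the consistency checks listed above.

def polygon_is_closed(ring, tol=1e-6):
    """A digitized polygon boundary should end where it starts (within a tolerance)."""
    return math.dist(ring[0], ring[-1]) <= tol

def has_exactly_one_label(polygon_id, label_points):
    """Exactly one label point should be assigned to each polygon."""
    return sum(1 for lp in label_points if lp["polygon_id"] == polygon_id) == 1

ring = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0), (0.02, 0.0)]  # small undershoot
print(polygon_is_closed(ring))                   # False: a 0.02-unit gap remains

labels = [{"polygon_id": 1}, {"polygon_id": 1}]  # two labels in one polygon
print(has_exactly_one_label(1, labels))          # False
```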
Completeness: concerns the degree to which the data exhaust the universe of possible items
e.g., are all possible objects included within the database?
affected by rules of selection
by generalization
by scale
* Lineage: a record of the data sources and of the operations which created the database
* how was it digitized, from what documents?
* when were the data collected?
* what agency collected the data?
* what steps were used to process the data?
* what was the precision of the computational results?
* Lineage is often useful as an indicator of accuracy
Additional material not covered in the video lecture:
error is introduced at almost every step of database creation
What are these steps, and what kinds of error are introduced?
POSITIONAL MEASUREMENT ERROR
the most accurate basis of absolute positional data is the geodetic
control network = a series of points whose positions are known with high
precision
GPS is a powerful way of augmenting this network
Geodetic control points correlate lat/long, height, scale, and orientation throughout the U.S., based on the geoid
The geoid is the shape that would be approximated by an undisturbed mean sea level, a surface to which gravity is everywhere perpendicular
must correct for the deflections of the vertical so that measurements of
distance on the earth's surface will be consistent with those determined by
astronomic observations
Geoid heights range from lows in the Atlantic to highs in the Rockies
most positional data on land are derived from air photos
here accuracy depends on the establishment of good control points
data from remote sensing are more difficult to position accurately because of the size and number of pixels
some positional data come from text descriptions, e.g., old surveys tied in
to marks on trees or boundary following a watershed or midline of a river
* digitizers encode manuscript lines as sets of x-y coordinate pairs
* resolution of coordinate data depends on the mode of digitizing:
* point-mode - selecting & encoding only those points deemed critical to
truly representing a line
* stream-mode - digitizing device automatically selects points based on a distance or
time parameter (see the sketch after this list)
* a high density of coordinate pairs is selected
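A minimal sketch of distance-based point selection in stream mode (hypothetical cursor positions; real digitizing software varies):

```python
import math

# Hypothetical stream of cursor positions as the operator traces a line.
cursor_stream = [(0.0, 0.0), (0.3, 0.1), (0.7, 0.2), (1.4, 0.3),
                 (2.6, 0.2), (2.7, 0.2), (4.1, 0.0)]

def stream_mode(points, min_dist=1.0):
    """Keep a point only when it is at least min_dist from the last kept point."""
    kept = [points[0]]
    for p in points[1:]:
        if math.dist(p, kept[-1]) >= min_dist:
            kept.append(p)
    return kept

print(stream_mode(cursor_stream))
# A smaller min_dist records more points (higher density, larger files);
# a larger min_dist generalizes the line more heavily.
```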
2 types of errors normally occur in stream-mode digitizing:
physiological - hitting the button twice or involuntary muscle spasms that tend to
produce spikes, switchbacks, or loops
psychomotor - digitizing operator can't see the line or can't properly move
the cross-hairs along the line
* may also involve misinterpretation or too much generalization
* not easy to remove these automatically
* in spite of physiological and psychomotor errors, digitizing itself is not a
major source of positional error
* also errors in registration & control points, as well as shrinkage or
stretching of the paper
* attributes usually obtained through a combination of field collection and
interpretation
* categories may be subjective (e.g., "diversity," or "old growth" used in
forest mgmt.)
* attributes such as these may not be easy to check in the field
* for social data, a major source of inaccuracy is undercounting, e.g., missing
certain social groups in a Census
* common practices in map compilation introduce further inaccuracies
* generalization - practice of reducing the # of points on a line while still
keeping the line's appearance (see the sketch after this list)
* line smoothing - reducing the # of points on a line & also changing the
line's appearance
* separation of features - e.g., railroad moved on a map so as not to overlap
adjacent road
* these compilation practices affect the usefulness & meaning of the data as well
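A minimal sketch of line generalization using the Douglas-Peucker algorithm, one common point-reduction method (the tolerance and coordinates are hypothetical):

```python
import math

def _point_line_dist(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    if (ax, ay) == (bx, by):
        return math.dist(p, a)
    # |cross product| / base length gives the perpendicular distance.
    return abs((bx - ax) * (ay - py) - (ax - px) * (by - ay)) / math.dist(a, b)

def douglas_peucker(points, tol):
    """Drop points that deviate from the simplified line by no more than tol."""
    if len(points) < 3:
        return list(points)
    # Find the point farthest from the line joining the endpoints.
    i_max, d_max = 0, 0.0
    for i in range(1, len(points) - 1):
        d = _point_line_dist(points[i], points[0], points[-1])
        if d > d_max:
            i_max, d_max = i, d
    if d_max <= tol:
        return [points[0], points[-1]]            # everything in between is dropped
    left = douglas_peucker(points[:i_max + 1], tol)
    right = douglas_peucker(points[i_max:], tol)
    return left[:-1] + right                      # avoid duplicating the split point

line = [(0, 0), (1, 0.1), (2, -0.1), (3, 5), (4, 6), (5, 7), (6, 8.1), (7, 9)]
print(douglas_peucker(line, tol=0.5))             # fewer points, similar shape
```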
errors are also introduced during data processing:
mathematical errors
accuracy lost due to low precision computations
rasterization of vector data
misuse of logic
generalization/smoothing and related problems of interpretation
Reports issued under U.S. national standards:
standards developed by the NCDCDS (National Committee for Digital Cartographic
Data Standards), which set standards for data quality documentation as well as
the components of data quality
http://dusk.geo.orst.edu/gis/Old/lec12.html