Spatial Uncertainty Management Based on a Bayesian Network Model
Jong-hoon Lee
Department of Information Science and Telecommunication, University
of Pittsburgh, 135 North Bellefield Avenue, Pittsburgh, PA 15260
Tel: (412) 361-2657 Email: jhlee@sis.pitt.edu
Abstract: When solving real-world problems, spatial decision support
systems must take into account uncertainty in data and models. We need to
understand how uncertainty in spatial data leads to misinformed decisions.
Positional data uncertainty plays a major role in making
decisions and assessing risk when spatial decision support systems are
used. We use a Bayesian network model for probabilistic reasoning in a
geographic information system for the purpose of assessing the risk involved
in decision-making. In the Bayesian network, the main cause nodes of the
positional data uncertainty are horizontal and vertical positional uncertainties.
To quantify positional data uncertainty, we choose a statistical method
of comparison between spatial data sets and high accuracy maps. In our
method, based on the National Map Accuracy Standards, we place buffers of
varying width around a line. We overlay the maps and compute statistics
for each width. We calculate the probability values by counting the random
sample points on the line that fall within the buffer zone of acceptable accuracy. We demonstrate
the usefulness of our approach by plotting the results with a Bayesian
network. Results of our study can be used to improve the quality of spatial
decision-making and to help in managing the total quality control of spatial
decision support systems.
1. Introduction
Spatial data uncertainty and quality control are significant
issues in Geographic Information Systems and have been studied by several
researchers. Spatial data uncertainty arises from many different sources,
such as the understanding and modeling of reality, the source data, and data
encoding, editing, conversion, processing, analysis, and output. Many researchers
have measured and visualized spatial data uncertainty in order to control spatial
data quality.
Geographic Information Systems are now
widely implemented and used in the public and private sectors. The major
goal of using Geographic Information Systems is to support decision making
by top management and other users. Reliable spatial data quality is one
of the most essential properties of a Geographic Information System because,
in general, the poorer the quality of the data, the poorer the decision. Misinformed
decisions can have harsh consequences, as when firefighters are sent to
the wrong place or a house is built in a landslide area.
We use a Bayesian network for probabilistic
reasoning in a geographic information system for the purpose of assessing
the risk involved in decision-making. In the Bayesian network, the cause
nodes of the positional data uncertainty are horizontal and vertical positional
uncertainties. The probability values are calculated using the confidence
levels and standard deviations of the horizontal and vertical differences
between random samples from line and polygon data sets and a high accuracy
map. To quantify positional data uncertainty, we choose a statistical method
of comparison between spatial data sets and high accuracy maps. Buffers
of varying width are placed around a line, and for each
width the maps are overlaid and statistics are computed. By plotting the
results with a Bayesian network graphical tool, we demonstrate the usefulness of the method.
Results of the study can be used to improve the quality of spatial decision-making
and to help manage the total quality control of spatial decision support
systems.
In the following section, we present an overview
of spatial data uncertainty and positional accuracy. In Section 3, we present
a brief overview of Bayesian networks and construct a Bayesian network
for our uncertainty problem. In Sections 4 and 5, we demonstrate the
approach by presenting results of the method applied to a
real data set produced by the Link to Learn Project team at the University
of Pittsburgh. We conclude in Section 6.
2. Spatial data uncertainty and positional accuracy
Issues of uncertainty in Geographic Information
Systems concern all sources of incorrectness and incompleteness in the
measurement, analysis, and interpretation of digitally represented Earth-referenced
phenomena. Spatial data uncertainty is often grouped into positional,
topological, attribute, and temporal uncertainty (Goodchild, 1997).
In this paper, to investigate the application of a decision support system
model based on Bayesian networks, we assume that the Geographic
Information System data have only positional uncertainty.
Positional accuracy of a spatial object, or a digital representation of
a feature, can be defined through measures of the difference between the
apparent location of the feature as recorded in a database and its true
location. Unfortunately, measuring true locations may be impractical due to
time and economic constraints.
We compare Geographic Information System data sets with
high accuracy paper maps to estimate
horizontal and vertical positional uncertainty. In the test, we choose
random points on polyline and polygon data from the tables in the
main Geographic Information System database. We then calculate the probabilities
of data uncertainty and use these values in a Bayesian network, an
inference and reasoning model.
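The paper does not spell out how the random points are drawn. A minimal sketch, assuming points are sampled uniformly by length along each polyline (a polygon boundary is treated as a closed polyline), could look like this; `sample_points_on_polyline` is a hypothetical helper, not a function from any GIS package.

```python
import math
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sample_points_on_polyline(vertices: Sequence[Point], n: int,
                              rng: random.Random) -> List[Point]:
    """Draw n points uniformly by arc length along a polyline (hypothetical helper)."""
    if len(vertices) < 2:
        raise ValueError("a polyline needs at least two vertices")
    segments = list(zip(vertices, vertices[1:]))
    lengths = [math.dist(a, b) for a, b in segments]
    total = sum(lengths)
    if total == 0.0:
        raise ValueError("polyline has zero length")
    samples: List[Point] = []
    for _ in range(n):
        target = rng.uniform(0.0, total)  # distance travelled along the line
        for (a, b), length in zip(segments, lengths):
            if length > 0.0 and target <= length:
                t = target / length  # fractional position within this segment
                samples.append((a[0] + t * (b[0] - a[0]),
                                a[1] + t * (b[1] - a[1])))
                break
            target -= length
        else:
            # Guard against floating-point round-off at the far end of the line.
            samples.append((float(vertices[-1][0]), float(vertices[-1][1])))
    return samples


# Example: 20 points on a three-vertex line. A polygon boundary would be
# passed as a closed ring, e.g. ring + [ring[0]].
rng = random.Random(42)
points = sample_points_on_polyline([(0.0, 0.0), (10.0, 0.0), (10.0, 5.0)], 20, rng)
```

Sampling by length rather than by vertex keeps long, sparsely digitized segments from being under-represented.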
3. Bayesian networks and problem formulation
Bayesian networks are used for reasoning
under uncertainty (Pearl, 1988). Bayesian networks are also known as causal
networks or belief networks. Probability theory and graph theory form their
basis: random variables are nodes and conditional dependencies are edges
in a directed acyclic graph. Edges typically point from cause to effect.
Temporal Bayesian networks can be used in dynamic environments (Mengshoel
& Wilkins, 1997).
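What makes the graph useful is the standard factorization: the joint probability of all nodes is the product, over every node Xi, of P(Xi | parents(Xi)). The sketch below applies it to a two-node graph A → B like the one in Figure 1; the probabilities are illustrative placeholders, not values from the paper, and `joint_probability` is a hypothetical helper.

```python
from itertools import product

# Hypothetical two-node network A -> B (structure as in Figure 1).
PARENTS = {"A": (), "B": ("A",)}
# Each CPT maps a tuple of parent values to P(node = True | parents).
# The numbers are illustrative only.
CPT = {
    "A": {(): 0.3},
    "B": {(True,): 0.8, (False,): 0.1},
}


def joint_probability(assignment, parents, cpt):
    """Product of P(X_i | parents(X_i)) over all nodes in the assignment."""
    p = 1.0
    for node, value in assignment.items():
        parent_values = tuple(assignment[q] for q in parents[node])
        p_true = cpt[node][parent_values]
        p *= p_true if value else 1.0 - p_true
    return p


# Enumerate every joint assignment of the binary variables.
nodes = list(PARENTS)
for values in product((True, False), repeat=len(nodes)):
    assignment = dict(zip(nodes, values))
    print(assignment, round(joint_probability(assignment, PARENTS, CPT), 4))
```

Each node only needs a table conditioned on its own parents, which is where the compactness of the representation comes from.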
Figure 1: A Bayesian network with a causal
relationship between A and B
To explain the process, we use a
simple example.
Given a situation where it might rain today
and might rain tomorrow, what is the probability that it will rain on both
days? Rain on two consecutive days does not involve independent events with isolated
probabilities: if it rains on one day, it is more likely to rain the next.
Solving such a problem involves determining the chance that it will rain
today, and then determining the chance that it will rain tomorrow given
that it rains today. The probability that both events occur is known as a "joint probability."
Let E1 denote rain today and E2 denote rain tomorrow, and suppose that P(E1) = 0.20
and P(E2|E1) = 0.70. The probability of the joint event is given by:
P (E1, E2) = P (E1) P (E2|E1)
which can also be expressed as:
P (E2|E1) = P (E1, E2) / P (E1)
Working out the joint probabilities for all
eventualities, we can express the results in a table.
Table 1. Marginal and joint probabilities for rain today and tomorrow
| | Rain Tomorrow | No Rain Tomorrow | Marginal Probability of Rain Today |
| Rain Today | 0.14 | 0.06 | 0.20 |
| No Rain Today | | | 0.80 |
| Marginal Probability of Rain Tomorrow | | | 1.00 |
From the table, it is evident that the joint probability of rain over
both days is 0.14, but there is a great deal of other information that
had to be brought into the calculations before such a determination was
possible. With only two discrete, binary variables, four calculations were
required.
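The Rain Today row of Table 1 follows directly from the two stated numbers, but the No Rain Today row needs P(E2|!E1), which the text does not give. The sketch below fills in all four joint probabilities under a hypothetical P(E2|!E1) = 0.10, chosen only for illustration; `joint_table` is a hypothetical helper name.

```python
# P(E1) and P(E2|E1) come from the text; P(E2|!E1) is a HYPOTHETICAL value
# chosen only so that the full table can be filled in.
P_E1 = 0.20
P_E2_GIVEN_E1 = 0.70
P_E2_GIVEN_NOT_E1 = 0.10  # assumption, not stated in the text


def joint_table(p_e1: float, p_e2_given_e1: float, p_e2_given_not_e1: float) -> dict:
    """Return P(E1 = a, E2 = b) for all four combinations, keyed by (a, b)."""
    return {
        (True, True): p_e1 * p_e2_given_e1,
        (True, False): p_e1 * (1.0 - p_e2_given_e1),
        (False, True): (1.0 - p_e1) * p_e2_given_not_e1,
        (False, False): (1.0 - p_e1) * (1.0 - p_e2_given_not_e1),
    }


table = joint_table(P_E1, P_E2_GIVEN_E1, P_E2_GIVEN_NOT_E1)
# Marginals are row and column sums of the joint table.
p_rain_tomorrow = table[(True, True)] + table[(False, True)]
print(f"P(E1, E2) = {table[(True, True)]:.2f}")
print(f"P(E2) under the assumed P(E2|!E1) = {p_rain_tomorrow:.2f}")
```

Only the last two entries and the marginal P(E2) depend on the assumed value; P(E1, E2) = 0.20 × 0.70 = 0.14 does not.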
The same scenario can be expressed as
a Bayesian network diagram, as shown in Figure 2 ("!" denotes "not").
Figure 2: A Bayesian Network showing the probability of rain
One attraction of Bayesian networks is
efficiency: only one branch of the tree needs to be traversed. To answer the
question about rain on both days, we need only P(E1), P(E2|E1), and P(E2, E1).
We can also use the graph, both visually
and algorithmically, to determine which parameters are independent of each
other. Instead of calculating four joint probabilities, we can use the
independence of the parameters to limit our calculations to two. The
probabilities of rain on the second day given rain on the first are specified
independently of the probabilities of rain on the second day given no rain
on the first.
Besides emphasizing parameter
independence, Bayesian networks also provide a parsimonious representation
of the conditional relationships among parameters. Although rain today and
rain tomorrow are two distinct events (they occur on different days), there is a conditional
relationship between them (if it rains today, the lingering weather systems
and residual moisture make rain tomorrow more likely). For
this reason, a directed edge connects the two nodes in the graph to show this
dependency.
A Bayesian method in a Geographic Information
System was used by Aspinall (1992) to predict the distribution of red deer
in the Grampian Region, northeast Scotland, by combining a number of data
sets. The inference method presented in that paper is based on Bayes' theorem.
Stassopoulou, Petrou and Kittler (1998) used a Bayesian network for probabilistic
reasoning with a Geographic Information System to assess
the risk of desertification after a forest fire. They showed how uncertainty
in the input data can be incorporated in the network and presented various
methods by which the conditional probability matrices used by the network
can be constructed. We also use a flexible model to combine data, namely
the Bayesian network, which is not only a tool for structuring expert knowledge
in a graph but also allows bidirectional inference: it
propagates information causally, from causes to effects, as well as diagnostically,
from effects to possible causes.
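The two directions of inference can be illustrated on the two-node rain network. Causal (predictive) inference computes P(E2) by summing over the states of E1; diagnostic inference inverts the edge with Bayes' theorem to get P(E1|E2). The text does not give P(E2|!E1), so the value 0.10 below is a hypothetical placeholder, and `predictive` and `diagnostic` are illustrative names.

```python
def predictive(p_cause: float, p_effect_given_cause: float,
               p_effect_given_not_cause: float) -> float:
    """Causal direction: P(effect), summing over both states of the cause."""
    return (p_cause * p_effect_given_cause
            + (1.0 - p_cause) * p_effect_given_not_cause)


def diagnostic(p_cause: float, p_effect_given_cause: float,
               p_effect_given_not_cause: float) -> float:
    """Diagnostic direction: P(cause | effect) by Bayes' theorem."""
    p_effect = predictive(p_cause, p_effect_given_cause, p_effect_given_not_cause)
    return p_cause * p_effect_given_cause / p_effect


# Rain example: P(E1) = 0.20 and P(E2|E1) = 0.70 from the text;
# P(E2|!E1) = 0.10 is a HYPOTHETICAL value.
print(predictive(0.20, 0.70, 0.10))  # P(rain tomorrow)
print(diagnostic(0.20, 0.70, 0.10))  # P(rain today | rain tomorrow)
```

Observing rain tomorrow raises the belief in rain today from the prior 0.20, which is the diagnostic flow described above.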
We use a Bayesian network for probabilistic
reasoning with a Geographic Information System to assess
the risk of a misinformed decision after estimating positional uncertainty.
In this paper, the risk of misinformed decision-making
originates from positional uncertainties: horizontal and vertical positional
uncertainties are the causes of total positional uncertainty.
Figure 3: Bayesian network
4. Assessment of positional uncertainty
First, we need two spatial data sets
to investigate positional uncertainty. We use the Technology Atlas data as the
dependent data set. As part of the Pennsylvania Education Network Project
(Link to Learn), the Technology Atlas contains various geographic data about technology
facilities in Pennsylvania, such as fiber optic lines and Internet service
providers. These data were digitized and geocoded from survey data
and paper maps. We use high-resolution paper maps as the independent data set.
After scanning the maps, we overlay the two images to calculate positional
uncertainty. We then apply buffer zones based on the National Map Accuracy
Standards. Because we use a 1:12000 scale map as the independent data set,
we allow an error zone of ±0.846 mm (measured on the map) along the data lines.
We quantify positional uncertainty by counting how many random
points fall within the accuracy zone.
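The buffer test is not spelled out in code in the paper. A minimal sketch, assuming the sample points and the reference line are already in the same ground coordinate system in metres and that the buffer is drawn around the reference (high accuracy) line: a point lies inside a buffer of half-width r exactly when its distance to the line is at most r, so counting in-buffer points reduces to a point-to-polyline distance. At 1:12000, 0.846 mm on the map corresponds to about 10.15 m on the ground. All helper names are hypothetical; a GIS workflow would more likely use its own buffer and overlay operations.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def ground_tolerance_m(tol_mm: float, scale_denominator: int) -> float:
    """Convert a tolerance measured on the map (mm) to ground metres."""
    return tol_mm * scale_denominator / 1000.0


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.dist(p, a)
    # Project p onto the segment and clamp the projection to its end points.
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.dist(p, (a[0] + t * dx, a[1] + t * dy))


def distance_to_polyline(p: Point, line: Sequence[Point]) -> float:
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def count_within_buffer(points: Sequence[Point], reference_line: Sequence[Point],
                        tolerance: float) -> int:
    """Count points inside a buffer of the given half-width around the line."""
    return sum(1 for p in points if distance_to_polyline(p, reference_line) <= tolerance)


tol = ground_tolerance_m(0.846, 12_000)  # about 10.15 m on the ground at 1:12000
```

The proportion of accurate points is then `count_within_buffer(samples, reference_line, tol) / len(samples)`.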
The following tables show sample results.
Table 2. Vertical statistic worksheet
Average: 3.25
Points within the accurate buffer zone: 8/160
Points within the inaccurate buffer zone: 152/160
Table 3. Horizontal statistic worksheet
Average: 2.24
Points within the accurate buffer zone: 34/200
Points within the inaccurate buffer zone: 166/200
5. Bayesian network and decision making
By counting the points within the buffer zones, we can set the probabilities of the Bayesian
network nodes as follows.
Table 4. Probability of each node
| | Horizontal | Vertical |
| Accuracy | 0.17 | 0.05 |
| Inaccuracy | 0.83 | 0.95 |
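The values in Table 4 are relative frequencies of the counts reported with Tables 2 and 3 (34/200 = 0.17 and 8/160 = 0.05). The sketch below reproduces that step and, for comparison, checks each node against the National Map Accuracy Standards rule that no more than 10 percent of tested points may fall outside the tolerance. `node_distribution` and `meets_nmas` are hypothetical helper names.

```python
# Counts reported with Tables 2 and 3: (points in accurate zone, points sampled).
COUNTS = {"Horizontal": (34, 200), "Vertical": (8, 160)}

# NMAS: at most 10 percent of tested points may exceed the tolerance.
NMAS_MIN_ACCURATE_FRACTION = 0.90


def node_distribution(accurate: int, total: int) -> dict:
    """Relative frequencies used as the node's Accuracy/Inaccuracy probabilities."""
    if total <= 0:
        raise ValueError("total must be positive")
    p_accurate = accurate / total
    return {"Accuracy": p_accurate, "Inaccuracy": 1.0 - p_accurate}


def meets_nmas(distribution: dict) -> bool:
    return distribution["Accuracy"] >= NMAS_MIN_ACCURATE_FRACTION


for node, (accurate, total) in COUNTS.items():
    dist = node_distribution(accurate, total)
    rounded = {state: round(p, 2) for state, p in dist.items()}
    print(node, rounded, "meets NMAS:", meets_nmas(dist))
```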
Decision-making under positional uncertainty
follows rule-based reasoning. An example of such a rule is:
IF the horizontal data are inaccurate
AND the vertical data are inaccurate
THEN the risk of a misinformed decision is high
We have 166 inaccurate points out of
200 horizontal points and 152 inaccurate points out of 160 vertical points.
These error rates exceed what the National Map Accuracy Standards allow. Treating
horizontal and vertical inaccuracy as independent, the combined
probability of both inaccuracies is 0.83 × 0.95 = 0.7885. We therefore suggest
that the risk of a misinformed decision caused by positional inaccuracy is more
than 78% if a decision is based on the Technology Atlas data. The
model becomes more useful as more nodes that affect spatial
data quality are added.
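The 0.7885 figure equals 0.83 × 0.95, which is the probability that the rule fires only if horizontal and vertical inaccuracy are treated as independent. The sketch below makes that assumption explicit: it enumerates the four state combinations, weights each by the product of the node probabilities, and sums the weights of those for which the IF-THEN rule fires. The function names are hypothetical.

```python
# Node probabilities of "Inaccuracy" from the counts (Table 4).
P_INACCURATE = {"Horizontal": 166 / 200, "Vertical": 152 / 160}
STATES = ("Accuracy", "Inaccuracy")


def rule_fires(horizontal_state: str, vertical_state: str) -> bool:
    """IF horizontal is Inaccuracy AND vertical is Inaccuracy THEN risk is high."""
    return horizontal_state == "Inaccuracy" and vertical_state == "Inaccuracy"


def probability_high_risk(p_h_inacc: float, p_v_inacc: float) -> float:
    """P(rule fires), ASSUMING horizontal and vertical states are independent."""
    p_h = {"Accuracy": 1.0 - p_h_inacc, "Inaccuracy": p_h_inacc}
    p_v = {"Accuracy": 1.0 - p_v_inacc, "Inaccuracy": p_v_inacc}
    return sum(p_h[h] * p_v[v]
               for h in STATES for v in STATES if rule_fires(h, v))


risk = probability_high_risk(P_INACCURATE["Horizontal"], P_INACCURATE["Vertical"])
print(f"P(high risk) = {risk:.4f}")
```

If the two error sources were correlated, the joint probability would have to come from a conditional table rather than this product.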
References:
Aspinall, R. (1994). The design of belief network-based systems for
price forecasting. Computer and Electronic Engineering, 20, 163-180.
Goodchild, M. F. (1997). Uncertainty in Geospatial Information Representation,
Analysis and Decision Support. FY97 NURI Research Proposal.
Mengshoel, O. J. & Wilkins, D. C. (1997). Abstraction and Aggregation
in Bayesian Networks. Proc. 1997 AAAI Workshop on Abstractions, Decisions,
and Uncertainty, July 1997, Providence.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks
of Plausible Inference. Morgan Kaufmann, San Mateo, California.
Stassopoulou, A., Petrou, M., & Kittler, J. (1998). Application
of Bayesian network in Geographic Information System based decision making
system. International Journal of Geographical Information Science, 12,
23-45.