- This work is ongoing. The paper will be completed in the future. -
 
 

Spatial Uncertainty Management Based on a Bayesian Network Model

Jong-hoon Lee
Department of Information Science and Telecommunication, University of Pittsburgh, 135 North Bellefield Avenue Pittsburgh, PA 15260
Tel: (412) 361-2657 Email: jhlee@sis.pitt.edu


http://www.ucgis.org/oregon/papers/lee.htm




Abstract: When solving real-world problems, spatial decision support systems must take into account uncertainty in data and models. The problem of how uncertain spatial data leads to misinformed decisions needs to be solved. Positional data uncertainty plays a major role in decision-making and risk assessment when spatial decision support systems are used. We use a Bayesian network model for probabilistic reasoning in a geographic information system for the purpose of assessing the risk involved in decision-making. In the Bayesian network, the main cause nodes of the positional data uncertainty are horizontal and vertical positional uncertainties. To quantify positional data uncertainty, we choose a statistical method of comparison between spatial data sets and high-accuracy maps. In our method, based on the National Map Accuracy Standards, we use a number of buffers of varying width around a line. We overlay the maps and compute statistics for each width. We calculate the probability values by counting the random sample points on the line that fall within the acceptable accuracy buffer zone. We demonstrate the usefulness of our approach by plotting the results with a Bayesian network. The results of our study can be used to improve the quality of spatial decision-making and to help manage the total quality control of spatial decision support systems.
 
 

1. Introduction
Spatial data uncertainty and quality control are significant issues in Geographic Information Systems and have been studied by several researchers. Spatial data uncertainty arises from many different sources, such as the understanding and modeling of reality, source data, and data encoding, editing, conversion, processing, analysis, and output. Many researchers have measured and visualized spatial data uncertainty in order to control spatial data quality.
     Geographic Information Systems are now widely implemented and used in the public and private sectors. The major goal of using a Geographic Information System is to support decision-making by top management and others. Reliable spatial data quality is an essential property of a Geographic Information System because, in general, the poorer the quality of the data, the poorer the decision. Misinformed decisions can have harsh consequences, as when fire fighters are sent to the wrong place or a house is built on a landslide area.
     We use a Bayesian network for probabilistic reasoning in a geographic information system for the purpose of assessing the risk involved in decision-making. In the Bayesian network, the cause nodes of the positional data uncertainty are horizontal and vertical positional uncertainties. The probability values are calculated using the confidence levels and standard deviations of the horizontal and vertical differences between a random sample of line and polygon data sets and a high accuracy map. To quantify positional data uncertainty, we choose a statistical method of comparison between spatial data sets and high accuracy maps. A number of varying width buffers around a line are used in the method. For each width, the maps are overlaid and statistics are computed. By plotting the results with a Bayesian graphic tool, the usefulness of the method is demonstrated. Results of the study can be used to improve the quality of spatial decision-making and to help manage the total quality control of spatial decision support systems.
     In the following section, we present an overview of spatial data uncertainty and positional accuracy. In Section 3, we present a brief overview of Bayesian networks and construct a Bayesian network to solve our uncertainty problem. In Sections 4 and 5, we demonstrate the approach by presenting some results of the method on a real data set built by the Link to Learn Project team at the University of Pittsburgh. We conclude in Section 6.
 

2. Spatial data uncertainty and Positional accuracy
     Issues of uncertainty in Geographic Information Systems concern all sources of incorrectness and incompleteness in the measurement, analysis, and interpretation of digitally represented, Earth-referenced phenomena. Spatial data uncertainty is often grouped into positional, topological, attribute, and temporal uncertainty (Goodchild, 1997).
     In this paper, we assume that the Geographic Information System data have only positional uncertainty, in order to investigate the application of a decision support system model based on Bayesian networks. The positional accuracy of a spatial object, or of a digital representation of a feature, can be defined through measures of the difference between the apparent location of the feature as recorded in a database and its true location. Unfortunately, determining the true location may be impractical due to time and economic constraints.
     To estimate horizontal and vertical positional uncertainty, we compare the Geographic Information System data sets with a high-accuracy paper map. In the test, we choose several random points from the polyline and polygon data in the tables of the main Geographic Information System database. We then calculate the probabilities of data uncertainty and use these values in a Bayesian network, an inference and reasoning model.
 

3. Bayesian network and problem formulation
     Bayesian networks are used for reasoning under uncertainty (Pearl, 1988). Bayesian networks are also known as causal networks or belief networks. Probability theory and graph theory form their basis: random variables are nodes and conditional dependencies are edges in a directed acyclic graph. Edges typically point from cause to effect. Temporal Bayesian networks can be used in dynamic environments (Mengshoel & Wilkins, 1997).


     Figure 1: A Bayesian Network, with a causal relationship between A and B

     To explain the process better we will use a simple example.
     Given a situation where it might rain today and might rain tomorrow, what is the probability that it will rain on both days? Rain on two consecutive days does not constitute independent events with isolated probabilities: if it rains on one day, it is more likely to rain the next. Solving such a problem involves determining the chance that it will rain today, and then determining the chance that it will rain tomorrow conditional on rain today. The probability that both events occur is known as a "joint probability." Suppose that P (rain today) = 0.20 and P (rain tomorrow given that it rains today) = 0.70. The probability of such joint events is determined by:
P (E1, E2) = P (E1) P (E2|E1)
    which can also be expressed as:
P (E2|E1) = P (E1, E2) / P (E1)
     Working out the joint probabilities for all eventualities gives results that can be expressed in table format.

Table 1. Marginal and Joint Probabilities for rain both today and tomorrow

                                        Rain Tomorrow   No Rain Tomorrow   Marginal Probability of Rain Today
Rain Today                              0.14            0.06               0.20
No Rain Today                           0.08            0.72               0.80
Marginal Probability of Rain Tomorrow   0.22            0.78               1.0

From the table, it is evident that the joint probability of rain over both days is 0.14, but there is a great deal of other information that had to be brought into the calculations before such a determination was possible. With only two discrete, binary variables, four calculations were required.
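     The entries of Table 1 can be reproduced with a few lines of Python. This is a minimal sketch rather than part of the original study: the variable names are ours, and the value P (rain tomorrow given no rain today) = 0.10 is not stated in the text but is implied by the table (0.08 / 0.80).

```python
# Sketch: rebuild Table 1 from the chain rule P(E1, E2) = P(E1) * P(E2 | E1).
p_rain_today = 0.20               # P(E1), from the text
p_tomorrow_given_rain = 0.70      # P(E2 | E1), from the text
p_tomorrow_given_no_rain = 0.10   # P(E2 | !E1): assumption implied by Table 1 (0.08 / 0.80)

p_today = {True: p_rain_today, False: 1.0 - p_rain_today}
p_tomorrow_given = {True: p_tomorrow_given_rain, False: p_tomorrow_given_no_rain}

# joint[(today, tomorrow)] holds P(E1 = today, E2 = tomorrow).
joint = {}
for today in (True, False):
    for tomorrow in (True, False):
        p_cond = p_tomorrow_given[today] if tomorrow else 1.0 - p_tomorrow_given[today]
        joint[(today, tomorrow)] = p_today[today] * p_cond

# Marginal probability of rain tomorrow: sum over both states of today.
p_rain_tomorrow = joint[(True, True)] + joint[(False, True)]

for (today, tomorrow), p in joint.items():
    print(f"today={today!s:5} tomorrow={tomorrow!s:5} P={p:.2f}")
print(f"P(rain tomorrow) = {p_rain_tomorrow:.2f}")
```

     Only two conditional probabilities and one prior are supplied; the four joint entries and both marginals follow from them.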
     This same scenario can be expressed using a Bayesian Network Diagram as shown ("!" is used to denote "not").


Figure 2: A Bayesian Network showing the probability of rain

     One attraction of Bayesian Networks is their efficiency: only one branch of the tree needs to be traversed. We are really only concerned with P(E1), P(E2|E1) and P(E2,E1).
     We can also utilize the graph both visually and algorithmically to determine which parameters are independent of each other. Instead of calculating four joint probabilities, we can use the independence of the parameters to limit our calculations to two. It is evident that the probabilities of rain on the second day given rain on the first are specified independently of the probabilities of rain on the second day given no rain on the first.
     As well as emphasizing parametric independence, Bayesian Networks provide a parsimonious representation of conditionality among parametric relationships. While rain today and rain tomorrow are two distinct events (today's rain and tomorrow's rain cannot occur at the same time), there is a conditional relationship between them (if it rains today, the lingering weather systems and residual moisture are more likely to result in rain tomorrow). For this reason, the nodes of the graph are connected by a directed edge to show this dependency.
     A Bayesian method was used in a Geographic Information System by Aspinall (1992) to predict the distribution of red deer in the Grampian Region, northeast Scotland, by combining a number of data sets. The inference method presented in that paper is based on Bayes' theorem. Stassopoulou, Petrou and Kittler (1998) used a Bayesian network for probabilistic reasoning with a Geographic Information System for the purpose of assessing the risk of desertification after a forest fire. They showed how uncertainty in the input data can be incorporated in the network and presented various methods by which the conditional probability matrices used by the network can be constructed. However, we use a flexible model to combine data, namely the Bayesian network, which is not only a tool for structuring expert knowledge by the use of a graph but also allows bi-directional inference: it propagates information causally, from causes to effects, as well as diagnostically, from effects to possible causes.
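     The diagnostic direction of inference (from an effect back to a possible cause) can be illustrated with the rain example of Table 1. The sketch below is ours and simply applies Bayes' theorem to the values in that table; the function name `posterior` is hypothetical, and the value 0.10 for P (rain tomorrow given no rain today) is implied by the table rather than stated in the text.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Return P(cause | effect) via Bayes' theorem for a binary cause."""
    evidence = prior * likelihood + (1.0 - prior) * likelihood_if_not  # P(effect)
    return prior * likelihood / evidence

# Causal direction (given in the text): P(rain tomorrow | rain today) = 0.70.
# Diagnostic direction: having observed rain tomorrow, how likely was rain today?
p_today_given_tomorrow = posterior(
    prior=0.20,              # P(rain today)
    likelihood=0.70,         # P(rain tomorrow | rain today)
    likelihood_if_not=0.10,  # P(rain tomorrow | no rain today), implied by Table 1
)
print(round(p_today_given_tomorrow, 3))  # equals 0.14 / 0.22
```

     Observing the effect raises the belief in the cause from the prior of 0.20 to 0.14 / 0.22, or roughly 0.64.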
     We use a Bayesian network for probabilistic reasoning with a Geographic Information System for the purpose of assessing the risk of a misinformed decision after estimating positional uncertainty.
     In this paper, the risk of misinformed decision-making originates from positional uncertainties. Horizontal and vertical positional uncertainties are the causes of total positional uncertainty.


Figure 3: Bayesian Network

4. Assessment of positional uncertainty
     First, we need two spatial data sets to investigate positional uncertainty. We use the technology atlas data as the dependent data set. As part of the Pennsylvania Education Network Project (Link to Learn), the technology atlas contains various geographic data about technology facilities in Pennsylvania, such as fiber optic lines and Internet service providers. Those data were digitized and geocoded based on survey data and paper maps. We use high-resolution paper maps as the independent data set. After scanning the maps, we overlay the two images to calculate positional uncertainty. We then apply buffer zones based on the National Map Accuracy Standard. In this case, we allow an error zone of ±0.846 mm along the data lines because we use 1:12000 scale maps as the independent data set.
     We quantify the uncertainty by counting how many random points fall within the accuracy zone.
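     The counting step can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used in the study: the function names are hypothetical, the differences are assumed to be measured in millimetres at map scale (the worksheets do not state a unit), and the four sample values are only the rows visible in the horizontal worksheet (Table 3).

```python
MAP_SCALE_DENOMINATOR = 12_000   # 1:12000 independent maps, from the text
TOLERANCE_MM = 0.846             # allowed error at map scale, from the text

def ground_tolerance_m(tolerance_mm, scale_denominator):
    """Convert a map-scale tolerance in millimetres to ground metres."""
    return tolerance_mm * scale_denominator / 1000.0

def fraction_within(differences_mm, tolerance_mm):
    """Return the share of sample points whose absolute difference is within tolerance."""
    inside = sum(1 for d in differences_mm if abs(d) <= tolerance_mm)
    return inside / len(differences_mm)

# Illustrative mini-sample: the four rows shown in Table 3 (unit assumed to be mm).
sample = [0.25, 4.5, 3.5, 0.75]

print(ground_tolerance_m(TOLERANCE_MM, MAP_SCALE_DENOMINATOR))  # metres on the ground
print(fraction_within(sample, TOLERANCE_MM))
```

     Under these assumptions the buffer corresponds to roughly 10 m on the ground, and two of the four illustrative points fall inside it.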
     The following tables show samples of the results.

Table 2. Vertical statistic worksheet

Point Number   Name            Layer            Difference
1              Scranton        Outside Plant    0.5
2              Lilly           Regeneration     7.5
...            ...             ...              ...
159            Connellsville   Store Room       1.0
160            Philadelphia    Central Office   3.5

Average: 3.25
Points within the accurate buffer zone: 8/160
Points within the inaccurate buffer zone: 152/160
 

Table 3. Horizontal statistic worksheet

Point Number   Name     Layer                Difference
1              I-279    Interstate Highway   0.25
2              HWY28    State Highway        4.5
...            ...      ...                  ...
199            I-376    Interstate Highway   3.5
200            HWY422   US Highway           0.75

Average: 2.24
Points within the accurate buffer zone: 34/200
Points within the inaccurate buffer zone: 166/200
 

5. Bayesian Network and Decision making
By counting the points within the buffer zones, we can set the node probabilities of the Bayesian Network from the results of the calculation, as follows.

Table 4. Probability of each node

             Horizontal   Vertical
Accuracy     0.17         0.05
Inaccuracy   0.83         0.95
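
     The values in Table 4 follow directly from the counts reported in Tables 2 and 3. A minimal sketch of that calculation (the dictionary layout and names are ours, not part of the original study):

```python
# (points inside the accurate buffer zone, total sampled points), from Tables 2 and 3
counts = {
    "horizontal": (34, 200),
    "vertical": (8, 160),
}

node_probabilities = {}
for node, (accurate, total) in counts.items():
    p_accurate = accurate / total
    node_probabilities[node] = {
        "accuracy": p_accurate,
        "inaccuracy": 1.0 - p_accurate,  # complement: points outside the accurate zone
    }

for node, probs in node_probabilities.items():
    print(node, {state: round(p, 2) for state, p in probs.items()})
```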

     Decision-making under positional uncertainty follows rule-based reasoning. An example of such a rule is
IF the horizontal data is inaccurate
AND the vertical data is inaccurate
THEN the risk of a misinformed decision is high
     We have 166 inaccurate points out of 200 horizontal points and 152 inaccurate points out of 160 vertical points. These error rates exceed what the National Map Accuracy Standard allows. The combined probability of vertical and horizontal inaccuracies, taken as the product of the two inaccuracy values in Table 4, is 0.7885. We therefore suggest that the risk of a misinformed decision caused by positional inaccuracy is more than 78% if a decision is based on the technology atlas data. The usefulness of the model increases when we have more nodes that affect spatial data quality.
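     The figure of 0.7885 is the product of the two inaccuracy probabilities in Table 4. That product equals the joint probability only if horizontal and vertical errors are treated as independent; the text does not state this assumption explicitly, so it is flagged here. A sketch with hypothetical names:

```python
p_inaccurate = {"horizontal": 0.83, "vertical": 0.95}  # from Table 4

# Rule: IF horizontal is inaccurate AND vertical is inaccurate THEN risk is high.
# Assumption (ours): the two error sources are independent, so the probability
# of the rule's premise is the product of the two marginal probabilities.
p_high_risk = p_inaccurate["horizontal"] * p_inaccurate["vertical"]

print(f"P(both inaccurate) = {p_high_risk:.4f}")
```

     If the two error sources were correlated, the product would no longer give the joint probability, and a conditional probability table for the pair would be needed instead.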
 

References:
Aspinall, R. (1994). The design of belief network-based systems for price forecasting. Computer and Electronic Engineering, 20, 163-180.
Goodchild, M. F. (1997). Uncertainty in Geospatial Information Representation, Analysis and Decision Support. FY97 NURI Research Proposal.
Mengshoel, O. J. & Wilkins, D. C. (1997). Abstraction and Aggregation in Bayesian Networks. Proc. 1997 AAAI Workshop on Abstractions, Decisions, and Uncertainty, July 1997, Providence.
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Mateo, California.
Stassopoulou, A., Petrou, M., Kittler, J. (1998). Application of Bayesian network in Geographic Information System based decision making system. International Journal of Geographical Information Science, 12, 23-45.