Objectives
The objectives of this research priority are (1) to extend the capabilities of current data representations and data models so that they can more effectively represent volumetric and dynamic phenomena (i.e., those that change over time and space) and (2) to develop analytical approaches that support these extensions, particularly for very large, distributed databases.
Background
The manner in which geographic information is represented, both conceptually and physically as stored data observations, is a central issue for any field that studies phenomena on, over, or under the surface of the Earth. A scheme for representing data is required and is, in fact, inextricably linked with spatial analysis and the modeling of geographic phenomena. In studies of routing problems, for example, spatial information is typically represented as links between places, which are denoted as points. In studies of environmental problems, the pollutants in air, water, or soil tend to be represented simply as grids, but in other studies, these entities may be represented as polygonal objects that are defined by explicit boundaries. These two representations have become known as location-based and feature-based representations, respectively.
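The distinction between location-based and feature-based representations can be illustrated with a minimal sketch. The grid values, the feature name "plume-A", and the helper functions below are hypothetical and chosen only for illustration; they are not drawn from any particular GIS.

```python
from dataclasses import dataclass

# Location-based view: a grid in which each cell stores a value (here, a
# hypothetical pollutant concentration). Location is implicit in the
# row/column index of the cell.
grid = [
    [0.0, 0.2, 0.1],
    [0.3, 0.9, 0.4],
    [0.1, 0.5, 0.0],
]


@dataclass
class Feature:
    """Feature-based view: a named entity with an explicit polygon boundary."""
    name: str
    boundary: list    # ordered (x, y) vertices of the polygon
    attributes: dict  # properties attached to the feature as a whole


plume = Feature(
    name="plume-A",
    boundary=[(1.0, 0.0), (3.0, 0.0), (3.0, 3.0), (1.0, 3.0)],
    attributes={"mean_concentration": 0.6},
)


def value_at(grid, row, col):
    """Location-based query: what is present at this place?"""
    return grid[row][col]


def polygon_area(boundary):
    """Feature-based query: how large is this object? (shoelace formula)"""
    total = 0.0
    n = len(boundary)
    for i in range(n):
        x1, y1 = boundary[i]
        x2, y2 = boundary[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


center_value = value_at(grid, 1, 1)
plume_area = polygon_area(plume.boundary)
```

The grid answers "what is here?" directly, whereas the feature answers "where is this object and how large is it?" directly; each view makes the other kind of question more expensive to answer.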
The selection of information to be represented and the choice of representational scheme are thus often driven by the purpose of the analyses. In turn, the results of an analysis can be greatly influenced by the way in which the phenomena under study are viewed. When driving from one place to another, for example, we can follow a strip map or route map more easily than an overall areal map, but a route map is virtually useless for showing the overall distribution of geographic features within a given area.
The limitations of current data models in GIS are due in large part to the continuing use of the (traditional) cartographic paradigm. With some ingenious exceptions, maps historically have been limited predominantly to a flat two-dimensional static view of the Earth. This view is also at a single scale, with assumed exactitude and with no capability for dynamic interaction by the user. These limitations in large part resulted from the limitations of paper (or parchment, etc.) as a cartographic medium.
In geographic information systems (GISs), current techniques for representing data are capable of recording complex associations among multiple variables. Nevertheless, these techniques generally depict static situations on a plane surface at one specific scale. Many of these two-dimensional representations can be extended conceptually to accommodate volumetric applications, but operational capabilities for representing and analyzing three-dimensional data have been integrated only recently into general-purpose, commercially available geographic information systems.
Current two-dimensional approaches can be extended to three dimensions for volumetric applications (e.g., representing pollutant concentrations in air and groundwater). Some volumetric geographic data handling systems already are in use for graphical and specialized analytical applications. Such systems, however, do not have the representational flexibility and power that are needed for addressing complex, global-scale analyses. Nor can they handle 3D topological relationships that are critical for true 3D applications.
Similarly, current spatial data storage and access techniques are also not designed to handle the increased complexity and representational robustness needed for representing heterogeneous types of data for a wide range of analytical and application contexts, as is currently envisioned for handling these same earth-related problems. Earth-related data are being collected in digital form at a phenomenal rate, far beyond anything we have experienced before. The Earth's surface measures nearly 1.5 x 10^15 square meters. For Spot Image satellite data to provide a single, complete coverage of the Earth's surface at 10-meter pixel resolution (a grid cell of 100 square meters) would require approximately 1.5 x 10^13 pixels. If we assume that a single data value for a single pixel can be stored in one byte, then 1.5 x 10^13 bytes (or 15 terabytes) of storage would be required for that single, complete coverage.
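The storage estimate above can be reproduced with a few lines of arithmetic. The sketch below takes the surface area, pixel size, and bytes per pixel quoted in the text as inputs; the function names are illustrative, and a terabyte is assumed to be 10^12 bytes.

```python
def pixels_for_coverage(surface_area_m2, pixel_size_m):
    """Number of square pixels needed to cover an area once."""
    cell_area_m2 = pixel_size_m ** 2  # a 10 m pixel covers 100 square meters
    return surface_area_m2 / cell_area_m2


def storage_bytes(n_pixels, bytes_per_pixel=1):
    """Storage for one data value per pixel."""
    return n_pixels * bytes_per_pixel


# Figures as quoted in the text above.
surface_area_m2 = 1.5e15
n_pixels = pixels_for_coverage(surface_area_m2, pixel_size_m=10)
terabytes = storage_bytes(n_pixels) / 1e12  # assuming 1 TB = 10^12 bytes

print(f"{n_pixels:.2e} pixels, {terabytes:.1f} TB")
```

Because the pixel count scales with the inverse square of the pixel size, halving the pixel size to 5 meters would quadruple the storage required for the same coverage.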
To complicate matters further, satellite imagery data is normally represented as a gridded array, or matrix, of cells. It is geometrically impossible to represent the spheroidal Earth with a single mesh of uniform, rectangular cells (Dutton 1983). Other geometries, particularly the triangular mesh, do not exhibit this problem and have other well-known favorable properties (Peuquet 1984).
To cope with the vast influx of data, various Federal agencies are cooperating in the development of a "global spatial data infrastructure." The infrastructure includes the agreements, materials, technology, and people necessary to acquire, process, store, maintain, and provide access to most of the Earth-related data collected and maintained by the Federal government. Without significant extensions to current representational techniques, however, Earth scientists will not be able to access the data in any usable manner. We therefore need to develop highly flexible, yet highly efficient data models (i.e., concepts addressing structure and format) for handling Earth-related data of this range and magnitude. Severe tradeoffs in the capabilities of representational techniques exist, usually between representational power and efficiency. We therefore plan to extend representational techniques for GISs in order to handle complex, multi-scale volumetric data for interactive analytical and modeling applications.
Although many efforts have been made to integrate GISs with dynamic modeling, most are limited to the development of an interface between two separate types of software systems. Modeling software tends to operate within very narrowly defined domains and to use mathematical simulation, whereas GISs are employed primarily for preprocessing observational data and post-processing data for comparative display. The ability to represent and examine the dynamics of observed geographic phenomena within a GIS context, except in the most rudimentary fashion, is not yet available. We urgently need this capability so we can more effectively analyze an increasing variety of problems at local, regional, and global scales. The careful analysis of change through time and patterns of change is vital to understanding a range of problems, from urban growth and agricultural impacts to global warming. Thus research to improve representational schemes is a high priority.
Building dynamic processing within a GIS is difficult because the current GIS data model is geared toward static situations. A number of characteristics of space-time data make the development of space-time representations more difficult than that of volumetric representations. First, unlike volume or geographic area, time cannot be measured in spatial units (e.g., feet or meters). Second, the nature of time itself differs from that of space. At a given moment, everything everywhere is at the same point in time. Furthermore, unlike space, time is unidirectional, progressing only forward. For example, July 4, 1996, occurs only once. Moreover, interactions between space-time processes are complex and need to be represented in some way.
A GIS can act as a microscope to geographic worlds (Alber 1987). Looking at a phenomenon through a GIS microscope means that many properties of the phenomenon will not be seen, while others will be enhanced. Hence, the information obtainable from a GIS is a function of the chosen representational scheme. Geographic representation has to agree with the information needs of the sciences and applications to ensure the usefulness of GIS data for modeling and decision making. Because the spatial characteristics of geographic phenomena may vary with the scale of observation, and their interactions can occur across scales, geographic representation has to be dynamic, interconnecting geographic phenomena at multiple scales in space and time to support dynamic modeling.
Information contained within a geographic database may be augmented or modified over time, but successive change or dynamics through time cannot be represented, except through some extremely simplistic methods (the most frequently used method is equivalent to a series of still snapshots). Data representation techniques also need to accommodate more dynamic interaction so that the user can conduct exploratory data analysis and examine multiple "what if" scenarios. Given the rapidly increasing use of GISs for policy analysis and decision making, another urgent issue is how to represent data of varying exactness and varying degrees of reliability and then convey this additional information to the user. The importance of this issue is underscored by the fact that the National Center for Geographic Information and Analysis designated accuracy in spatial databases as Initiative 1. Much work remains to be done on developing methods to handle fuzziness and imprecision, which are inherent in geographic observational data, within a digital database. Such methods are crucial for combining multiple layers of data from varying sources (Goodchild and Gopal 1989).
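The limitation of the snapshot method can be seen in a small sketch. The land-cover grids, years, and function names below are hypothetical. Each snapshot stores the full state at one date, so change is never recorded explicitly and has to be derived by comparing successive snapshots.

```python
# Hypothetical land-cover snapshots of the same 2 x 2 area at three dates.
snapshots = {
    1990: [["forest", "forest"], ["forest", "urban"]],
    1995: [["forest", "urban"], ["forest", "urban"]],
    2000: [["urban", "urban"], ["forest", "urban"]],
}


def changes_between(earlier, later):
    """Return (row, col, old, new) for every cell that differs."""
    diffs = []
    for r, (row_a, row_b) in enumerate(zip(earlier, later)):
        for c, (old, new) in enumerate(zip(row_a, row_b)):
            if old != new:
                diffs.append((r, c, old, new))
    return diffs


def change_log(snapshots):
    """Derive a list of change events from a series of snapshots.

    Each event is stamped with the year of the later snapshot, because the
    true time of change is known only to lie within the interval.
    """
    years = sorted(snapshots)
    log = []
    for y0, y1 in zip(years, years[1:]):
        for r, c, old, new in changes_between(snapshots[y0], snapshots[y1]):
            log.append((y1, r, c, old, new))
    return log


events = change_log(snapshots)
```

Two weaknesses follow directly from this structure: unchanged cells are stored redundantly in every snapshot, and any change that occurs and reverses between two snapshot dates is lost entirely.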
In addition to analysis and modeling, query support is one of the most important capabilities of any information system. The faithfulness of a representation to the processes it attempts to represent is critical in assessing the effectiveness of that representation. Other factors include the amount of data that needs to be stored to comply with the representation, the degree to which data can be associated to resolve patterns and relationships, and the amount of information that can be computed from the representation. As mentioned above, current GIS representations have limited capabilities to depict volumetric and dynamic processes and their interactions at multiple scales. They also offer limited support for queries about spatial relationships in a 3D environment, the spatiotemporal behavior of a process, and interactions among processes across scales.
From a human standpoint, spatial relationships between geographical entities (cities, etc.) are often expressed in an imprecise manner that can be interpreted only within a specific context (e.g., Is New York near Washington, D.C.?). Current methods of data representation and query, however, are limited to absolute and exact values and cannot handle inexact terms, such as "near" (Beard 1994). Yet inexactness and context dependency are integral components of human cognition and of the human decision-making process. In order for GISs to become truly useful and user-friendly tools, whether for addressing complex analytical issues such as global change or urban crime or for making day-to-day decisions, the data model used by GISs needs to accommodate such cognitive issues.
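One way to make an inexact term such as "near" computable is to replace a yes/no answer with a graded score that depends on context. The sketch below is only an illustration of that idea under stated assumptions: the linear membership function, the two contexts, their distance thresholds, and the approximate city coordinates are all hypothetical and are not taken from Beard (1994) or from any existing GIS.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two latitude/longitude points."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearness(distance_km, surely_near_km, surely_far_km):
    """Graded score: 1.0 is clearly near, 0.0 is clearly not near."""
    if distance_km <= surely_near_km:
        return 1.0
    if distance_km >= surely_far_km:
        return 0.0
    return (surely_far_km - distance_km) / (surely_far_km - surely_near_km)


# Hypothetical contexts: (surely near, surely far) thresholds in kilometers.
CONTEXTS = {
    "walking": (0.5, 2.0),
    "continental": (200.0, 1000.0),
}


def is_near(distance_km, context):
    near_km, far_km = CONTEXTS[context]
    return nearness(distance_km, near_km, far_km)


# Approximate coordinates for New York and Washington, D.C.
d = great_circle_km(40.71, -74.01, 38.91, -77.04)
walking_score = is_near(d, "walking")
continental_score = is_near(d, "continental")
```

Under these assumed thresholds the same pair of cities is clearly not near for a pedestrian but fairly near on a continental scale, which is the context dependency the text describes.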
Although many studies have sought to improve geographic representation (cf. Langran 1992, Peuquet 1994), further research is needed to examine the essence of geographic processes in order to extend geographic representation. The extended representation needs to provide a framework that holds data in ways that allow information to be inferred about the geographic processes that generate these data. Many research questions remain unanswered. How should a GIS handle processes of various kinds to ensure that salient characteristics and behaviors of these processes are represented? How does identification of entities and relationships relate to different problem domains? How can a representation accommodate different ways that geographic entities and relationships are identified to allow interoperability of data among applications? Research on extensions to geographic representation attempts to provide both conceptual and practical frameworks within which geographic phenomena are represented in optimal ways to support GIS data analysis, dynamic modeling, and information query in multiple dimensions across various scales in space and time.
The UCGIS Approach
A primary theoretical issue we plan to address is to develop a new representational approach for GIS that optimizes both the capabilities of the modern computing environment and representational techniques recently developed in a number of fields, including GIS, and that incorporates human cognition of geographic space. Addressing this issue involves decisions that range from the most philosophical (e.g., determining how time differs from space and how those differences can be represented) to the most practical (e.g., choosing high performance computing techniques for handling vastly increased data volumes).
To provide sophisticated capabilities for temporal analysis and the ability to effectively answer a wide range of spatiotemporal queries, we must adopt an approach that uses multiple representations, including combined geometries (rectangular, triangular and hexagonal) for location-based representations, and combinations of location-based, feature-based, and time-based representations. Researchers within the GIS community as well as developers of commercial GISs generally recognize this multi-representational approach as the best method, although its deliberate use as a long-term solution for designing geographic databases is a recent development. We have only begun to explore how tools such as computational geometry and object-oriented design can be used to achieve the representational capabilities required for volumetric and dynamic geographic data. Recent attempts at extending current representational techniques to include time have served mostly to demonstrate the complexity of the problem (Peuquet 1994). Several worldwide efforts are addressing the representation of geographic data, and separately, the representation of dynamics within database management systems (DBMS) (Tansel et al. 1993). We need to continue these efforts, and explore how temporal DBMS techniques can be applied to combined space-time representation.
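A combination of location-based, feature-based, and time-based representations can be sketched as three indexes built over the same observations. The records, parcel identifiers, and function name below are hypothetical; the sketch only illustrates why each view answers a different kind of query directly and is not a proposed data model.

```python
from collections import defaultdict

# Hypothetical observations: (year, grid cell, feature id, land cover).
observations = [
    (1990, (0, 0), "parcel-1", "forest"),
    (1990, (0, 1), "parcel-2", "forest"),
    (1995, (0, 1), "parcel-2", "urban"),
    (2000, (0, 0), "parcel-1", "urban"),
]


def build_indexes(observations):
    """Index the same records by location, by feature, and by time."""
    by_location = defaultdict(list)
    by_feature = defaultdict(list)
    by_time = defaultdict(list)
    for year, cell, feature_id, cover in observations:
        by_location[cell].append((year, feature_id, cover))
        by_feature[feature_id].append((year, cell, cover))
        by_time[year].append((cell, feature_id, cover))
    return by_location, by_feature, by_time


by_location, by_feature, by_time = build_indexes(observations)

# Location-based query: what has happened at this cell?
history_of_cell = by_location[(0, 1)]
# Feature-based query: what has happened to this parcel?
history_of_parcel = by_feature["parcel-1"]
# Time-based query: what was recorded in this year?
recorded_in_1995 = by_time[1995]
```

The cost of this flexibility is redundancy: every record appears in all three indexes, which is one instance of the tradeoff between representational power and efficiency.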
Data mining and knowledge discovery in databases (KDD) are emerging fields that incorporate techniques from statistics, database management, artificial intelligence, information science, and scientific data visualization. Data mining and KDD technologies aim to extract higher level information and facilitate knowledge formulation based on data records in large, heterogeneous databases. Important concepts being developed in this emerging area can be valuable to the study of geographic representation in support of query and information computing. As already mentioned, modern GIS applications usually deal with large data sets from various sources. Advancing acquisition technologies have accelerated the growth of geographic data. The capability to extract meaningful, previously unknown information from vast amounts of observational data is important for advancing the usefulness of GIS technologies in both scientific inquiries and practical applications. Geographic representation and subsequent query and retrieval techniques play crucial roles in enabling this capability. Extensions to geographic representation need to consider incorporating parallel and distributed database storage and query techniques in order to allow efficient data access for such "data sifting" operations.
Developing new ways of representing geographic data requires an interdisciplinary effort involving geographers, computer scientists (particularly those currently involved in database management or high-performance computing), applied mathematicians, cognitive scientists, and experts from the application domains.
Although much conceptual work is required to extend methods for geographic representation, the proof and practical refinement of any new data model lies in its implementation and empirical testing on real-world data. Such activities require significant investment in programming time and computing resources. Because methods of data representation are so fundamental to software design, data models can rarely be replaced within existing software. Instead, software for testing new data models needs to be custom made.
There are numerous GIS data models proposed in the literature with an increasing number in feature-based and object-oriented representations. A primary theoretical issue we need to address is how to utilize this approach within a broader perspective that incorporates an integrated theory for geographic representation. With such a theory, we can assess geographic representations in terms of correspondence to human cognition, information capacity, computability, and data model interoperability. A theory of geographic representation is necessary to provide a common framework to examine what kinds of geographic phenomena or processes are best represented by data models with a certain set of characteristics so that attributes and behaviors of these phenomena or processes can be fully represented, analyzed, and modeled in a GIS environment.
Importance to National Research Needs/Benefits
We have an increasingly urgent need to better understand the effects of human activities on the natural environment at all geographic scales. In natural resource management within the developed world, emphasis is shifting from inventory and exploitation to maintenance of the long-term productivity of the environment. Such maintenance requires interactive space-time analysis at multiple scales to clarify the complex interrelationships of environmental systems. As only one component of this analysis, Global Circulation Models (GCMs) are used to study climate dynamics, ocean dynamics, and global warming. We need to verify and refine these models. To do so requires sophisticated analysis of large volumes of multidimensional data, particularly the study of change through time and patterns of change through time around the entire Earth, including the oceans and the atmosphere.
In an urban context, we need an interactive, real-time means for solving problems in emergencies (e.g., floods, wildfires) to preserve life and property. As populations and development have increased, the need for predicting human/environmental interactions through the use of multiple "what if" scenarios has become recognized. For all of these diverse uses of GISs, we must have the ability to perform interactive space-time analysis at multiple scales and to have data of known reliability.
Enormous amounts of data, already in digital form, are being collected for studies of a diverse range of urgent environmental, economic, and social problems. GIS can provide an integral platform to spatially and temporally correlate data from multiple sources for a better understanding of the dynamics of and interplay between global and regional processes and their interactions with human environments. Nevertheless, current representational techniques for storing and accessing these data within GISs are not adequate. We need significant advancements in representational methods in order to access these data in forms that are useful for analysis and improve the science being done. Extension to geographic representation aims to provide such an integral platform to hold data in space and time to facilitate computing important information, formulating hypotheses, and developing knowledge.
Priority Areas for Research
Short term
Apply new DBMS techniques, particularly temporal DBMS, to the geographic context: examine alternative ways of representing the temporal component, evaluate alternative space-time database designs, and identify aspects of time in geographic data that cannot be represented in existing DBMS.
Apply high-performance computing techniques to the geographic context, examining methodologies for distributed databases and distributed processing that accommodate the spatial nature of both the data and potential retrieval queries.
Apply data mining and knowledge discovery (KDD) to the geographic context, examining methodologies for complex spatiotemporal query support and information inference that will further lead to hypothesis formation and knowledge development.
Medium term
I. Develop new strategies and techniques that combine current approaches, such as the use of object-oriented programming techniques and the use of computational geometry for multi-representations.
II. Develop a space-time data model that can represent dynamic processes and spatial interactions in an effective manner, using a multi-representation approach.
III. Develop new graphical interface (visualization) techniques that utilize the increased capabilities needed for representing large, multi-scale, heterogeneous data.
IV. Develop new query language capabilities for handling the increased dimensionality of spatiotemporal data (e.g., although standard query languages have been extended to handle spatial queries, research is still needed to make appropriate extensions to accommodate space-time phenomena).
V. Examine geographic information components and structures from a cognitive perspective: how humans learn about and recognize geographic knowledge, and how the semantics of geographic entities, locations, and relationships can be most appropriately represented in digital form and maintained when we transform them from one representation to another.
Long term
VI. Develop a new, multidimensional representational theory that more closely reflects human cognition yet is also highly efficient and minimally complex from a computing standpoint.
VII. Develop characteristics based upon the new representational approach that allow geographic databases and associated analytical capabilities to be implemented with predictable characteristics.
In terms of research, the highest priority should be placed on the long-term efforts and the second highest priority on the medium-term efforts because many private GIS providers and government agencies are already funding or directly participating in the areas identified as short-term efforts. Areas identified as medium- and long-term efforts are areas in which the most work needs to be done and where the highest benefit will be derived. These areas are also least likely to gain support of private GIS providers or government agencies because of the length of time that sustained support is needed before concrete benefits to GIS would be realized.
Each of these priority areas also requires the context of an example problem. The test problem should include multiple scales, multiple dimensions, and a diverse range of data types so that the research will focus on solutions that are directly useful and applicable. Possible test problems include the global water cycle, global carbon cycle, Central American forests, land-use change and social impacts, crime, dynamic changes in urban neighborhoods, and emergency response.
References
Alber, R. F., 1987. "The National Science Foundation National Center for Geographic Information and Analysis," International Journal of Geographical Information Systems 3:117-136.
Beard, K., 1994. Accommodating uncertainty in query response. Proceedings, Sixth International Symposium on Spatial Data Handling, Edinburgh, Scotland: International Geographical Union.
Dutton, J., 1983. Geodesic modelling of planetary relief. Proceedings, AutoCarto VI, Ottawa.
Goodchild, M. F., and S. Gopal, editors, 1989. Accuracy of Spatial Databases. London: Taylor and Francis.
Langran, G., 1992. Time in Geographic Information Systems. London: Taylor and Francis.
Peuquet, D. J., 1984. A conceptual framework and comparison of spatial data models. Cartographica 21(4):66-113.
Peuquet, D. J., 1994. It's about time: A conceptual framework for the representation of spatiotemporal dynamics in geographic information systems. Annals of the Association of American Geographers 84: 441-461.
Tansel, A. U., J. Clifford, S. Gadia, S. Jajodia, A. Segev, and R. Snodgrass, 1993. Temporal Databases. Redwood City, CA: Benjamin/Cummings Publishing Co.