SPATIAL DATA ACQUISITION AND INTEGRATION

Objective

This research objective calls for efforts to extend and improve existing technologies for capturing, integrating, and consolidating multiple spatial data resources, including maps, remotely sensed imagery, and topographic measurements. It also calls for methods that automate the acquisition of new high-precision spatial data and integrate those data into existing spatial databases through consistent georeferencing based on image analysis and understanding, suitable geometric transformations, and appropriate algorithms and statistical methods.

Background

Geographic information provides the basis for many types of decisions, in areas ranging from simple wayfinding to management of complex networks of facilities and the sustainable management of natural resources. In all of these areas, better data should lead to better conclusions and better decisions. According to several standards groups and user groups, better data would include greater positional accuracy, logical consistency, and completeness. Technological advances are making it possible to capture positional information with ever-improving accuracy and precision. Commercial remotely sensed images from space will soon offer a resolution of one meter or better. Satellite telemetry using the Global Positioning System can now achieve accuracies well within one centimeter. But each new data set, each new data item that is collected, however accurate it may be, can be fully utilized only if it can be placed correctly into the context of other available data and information.

Because of the very high costs of human expertise, more and more geographic data sets are being built by automated processes of data capture and integration with other data. Digital Orthophoto Quadrangles (DOQs), a comparatively new form of geographic information, provide a good example. According to the National Mapping Division of the U.S. Geological Survey, "Digital orthophotos require several types of inputs to produce an orthogonally rectified image from the original perspective image captured by the sensor. Chief among these are: 1) the unrectified raster image file acquired from the scanning of the diapositive image or directly from the sensor, 2) a digital elevation model with the same area of coverage as the digital orthophoto, 3) the photo-identifiable image and ground coordinates of ground control positions (a minimum of four) acquired from ground surveys or aerotriangulation, and 4) calibration information about the sensor collector device. These four inputs are used collectively to register the raw image file mathematically to the scanner or to the sensor platform, to determine the orientation and location of the sensor platform with respect to the ground, and to remove the relief displacement from the image file." [from http://wwwnmd.usgs.gov/www/ti/DOQ/spec_processing.html]

Geographic data sets show two clear trends: they are becoming increasingly abundant and they are growing ever more precise. Remote sensing technology alone generates a vast amount of raw spatial data daily, much of which is redundant, information poor, or too complex for adequate analysis with current technology before the next batch of raw data arrives. Remote sensing technology also promises and delivers continuous precision improvements in image resolution and data capture methods. Data assimilation strategies and methodologies have not kept pace with these advances in data generation, however.

One approach to facilitating the integration of spatial data is to mandate uniformity through standardization and agreed-upon formats and requirements. While this may work in the short term for a temporal cross-section of similar data, it cannot fully address the ever-evolving character of the captured spatial data. Historical databases from a pre-standards era must coexist with current standardized products; and future fully three-dimensional spatial datasets must be reconciled with other data within a context of current two-dimensional spatial data standards.

While remotely sensed and imaged data are becoming available in greater and greater quantities and at higher resolutions, integrating data from different sources is not yet easy because of variations in resolution, registration, and sensor characteristics. Without the ability to integrate data from different sources, we are faced with extensive duplication of effort and unnecessary cost. Imagery can play a very valuable role in updating old data sets, but this process is similarly impeded by the problems of integration.

The term "conflation" is often used to refer to the integration of data from different sources. It may apply to the transfer of attributes from old versions of feature geometry to new, more accurate versions; or to the detection of changes by comparing images of an area from two different dates; or to the automatic registration of one data set to another through the recognition of common features. Too often, however, methods of conflation and integration have been ad hoc, designed for specific purposes and of no general value. For example, much effort in the past was directed toward updating the relatively low-accuracy TIGER database from the U.S. Bureau of the Census with more accurate topographic data.

Technological advances, including vastly greater computing speeds, larger storage volumes, better human-computer interaction, better algorithms, and better database tools, are making conflation and integration more feasible than ever before. A general theoretical and conceptual framework would address at least five distinct forms of integration, all residing in a common database: map to map (different scales, different coverages, etc.), image(s) to map (elevation mapping, map revision, etc.), image to image (different resolutions, wavelengths, etc.), map to measurement (verification, registration, etc.), and measurement to measurement (adjustment, variance, etc.).
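
As one illustration of registering one data set to another through common features, the sketch below fits a six-parameter affine transformation to matched control points by least squares. It is a minimal example under stated assumptions, not a method prescribed by this document: the function names (solve_3x3, fit_affine, apply_affine) are hypothetical, the matched point pairs are assumed to be known already, and an affine model is only one of several geometric transformations that might be suitable.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Affine = Tuple[List[float], List[float]]


def solve_3x3(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented copy; inputs untouched
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("control points are collinear or otherwise degenerate")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def fit_affine(src: Sequence[Point], dst: Sequence[Point]) -> Affine:
    """Least-squares fit of X = a*x + b*y + c and Y = d*x + e*y + f."""
    if len(src) != len(dst) or len(src) < 3:
        raise ValueError("need at least three matched control points")
    # Build the normal equations (A^T A) p = A^T t, where each row of A is (x, y, 1).
    ata = [[0.0] * 3 for _ in range(3)]
    atx = [0.0] * 3
    aty = [0.0] * 3
    for (x, y), (tx, ty) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atx[i] += row[i] * tx
            aty[i] += row[i] * ty
    return solve_3x3(ata, atx), solve_3x3(ata, aty)


def apply_affine(params: Affine, p: Point) -> Point:
    """Map a source point into the target coordinate system."""
    (a, b, c), (d, e, f) = params
    return (a * p[0] + b * p[1] + c, d * p[0] + e * p[1] + f)


# Illustrative control points (made up): target = 2 * source + (10, 20).
source_pts = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
target_pts = [(10.0, 20.0), (12.0, 20.0), (10.0, 22.0), (12.0, 22.0)]
params = fit_affine(source_pts, target_pts)
registered = apply_affine(params, (0.5, 0.5))
```

With more than three control points the fit is overdetermined, so the residuals at the control points give a rough check on registration quality. For large coordinate values, such as projected map coordinates, it would be prudent to subtract a local origin first, since forming the normal equations directly can lose precision.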

The UCGIS Approach

"Interdisciplinary" is the watchword for University Consortium for Geographic Information Science (UCGIS) research and associated technology development. The capture and integration of spatial data requires the collaboration of many participating disciplines, including cartography, computer science, photogrammetry, geodesy, mathematics, remote sensing, statistics, modeling, geography, and various physical, social, and behavioral sciences with spatial analysis applications. We will solve key problems of capturing the right data and relating diverse data sources to each other by involving participants from all specialty areas, including the traditional data collectors, the applications users, and the computer scientists and statisticians who optimize data management and analysis for all types of data sets. We will develop mathematical and statistical models for integrating spatial data at different scales and different resolutions. We will especially focus on developing tools for identifying, quantifying, and dealing with imperfections and imprecision in the data throughout every phase of building a comprehensive spatial database.

Many organizations and data users have developed and promoted standards for spatial data collection and representation. By adhering to these standards, data collectors and data integrators will improve the consistency and overall quality of their products. The standards alone, while facilitating the sound construction of multifaceted spatial (and spatiotemporal) databases, do not, in and of themselves, offer the means by which to integrate fully all types of spatial data efficiently and consistently. Different standards exist for imagery at different scales, for maps at different scales, and for adjustment of measurements taken with instruments of different precisions. A single common framework is needed for the diverse types of spatial data. Spatial data integration permits the coexistence of multiple spatially coherent inputs. Spatial data integration must include horizontal integration (merging adjacent data sets) and vertical data integration (operations involving the overlaying of maps); handling differences in spatial data content, scales, data acquisition methods, standards, definitions, and practices; managing uncertainty and representational differences; and detecting and removing redundancy and ambiguity of representation.
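
To make the adjustment of measurements taken with instruments of different precisions more concrete, the following sketch combines independent observations of the same quantity by inverse-variance weighting. This is a standard statistical technique offered here only as an illustration; the function name combine_measurements and the sample values are hypothetical, and the observations are assumed to be independent and unbiased with known variances.

```python
from typing import Iterable, Tuple


def combine_measurements(obs: Iterable[Tuple[float, float]]) -> Tuple[float, float]:
    """Combine independent (value, variance) observations of one quantity.

    Returns the inverse-variance weighted mean and the variance of that mean.
    """
    weight_sum = 0.0
    weighted_total = 0.0
    for value, variance in obs:
        if variance <= 0:
            raise ValueError("variance must be positive")
        weight = 1.0 / variance  # more precise observations get more weight
        weight_sum += weight
        weighted_total += weight * value
    if weight_sum == 0.0:
        raise ValueError("at least one observation is required")
    return weighted_total / weight_sum, 1.0 / weight_sum


# Illustrative values (made up): an older survey elevation with variance 4.0
# and a newer, more precise one with variance 1.0, both in the same units.
estimate, estimate_variance = combine_measurements([(100.0, 4.0), (102.0, 1.0)])
```

The combined variance is never larger than the smallest input variance, which is one simple way to express how integrating a new measurement can add value to existing data rather than merely replace it.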

Importance to National Research Needs

The National Spatial Data Infrastructure consists of the collective spatial databases of the country and the people and mechanisms for fostering better use of these resources. Integration of information is essential to any information system. Users hope the integration step will be easy, but this happens only rarely, typically when data providers have invested considerable effort to ease the user's burden. Spatial data imposes the additional requirement of correct (or at least consistent) assignment of position to spatial features. Merging spatial data sets under that constraint is uniquely characteristic of a conflation or spatial data integration process.

The benefits of efficient and effective spatial data integration include reduction or elimination of some costs associated with new data collection; quality improvement of data through added value and greater accuracy, resulting in better decision-making, reduced risk, and increased options for use; and better opportunities to update maps through spatial database maintenance.

High Priority Activities

We need to inventory existing and evolving methodologies for spatial data capture and integration. With the inventory in hand, the UCGIS institutions can develop a conceptual framework for integrating diverse data sets based on actual content and quality and on current practices and capabilities.

Several areas of basic research in data acquisition and integration promise significant payoffs in terms of reduced costs and better spatial data products. The basic research areas include image processing and analysis, computer vision, geometry of imagery, and other methods of remote sensing; feature recognition, feature matching, feature classification in spatial data sets; algorithm development and data structures for matching and merging spatial data; analysis of impediments to and limitations of spatial data capture and spatial data integration; and development of map update and maintenance methodologies based on data integration practices.
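
As a small example of what feature matching between two spatial data sets can involve, the sketch below pairs point features from two sources when they fall within a distance tolerance, taking the closest pairs first. It is a deliberately simple illustration and not an algorithm proposed by this document: the function name match_features is hypothetical, both data sets are assumed to share a coordinate system already, and only position is compared, whereas practical conflation would also use attributes and topology.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def match_features(
    a: Sequence[Point], b: Sequence[Point], tolerance: float
) -> List[Tuple[int, int]]:
    """Greedy one-to-one matching of points in `a` to points in `b`.

    Returns (index_in_a, index_in_b) pairs whose distance is within tolerance.
    """
    candidates = []
    for i, p in enumerate(a):
        for j, q in enumerate(b):
            d = math.dist(p, q)
            if d <= tolerance:
                candidates.append((d, i, j))
    candidates.sort()  # closest candidate pairs are accepted first
    used_a, used_b = set(), set()
    pairs = []
    for _, i, j in candidates:
        if i not in used_a and j not in used_b:
            used_a.add(i)
            used_b.add(j)
            pairs.append((i, j))
    return sorted(pairs)


# Illustrative coordinates (made up) for the same features in two data sets.
old_points = [(0.0, 0.0), (10.0, 10.0), (50.0, 50.0)]
new_points = [(10.2, 9.9), (0.1, -0.1), (80.0, 80.0)]
matches = match_features(old_points, new_points, tolerance=1.0)
unmatched_old = [i for i in range(len(old_points)) if i not in {m[0] for m in matches}]
```

Features left unmatched on either side are candidates for change detection or manual review. The pairwise comparison here grows with the product of the two data set sizes, so a spatial index would likely be needed for data sets of realistic size.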

Several areas of applied research have also been identified and in some cases begun by UCGIS member institutions:

Relevant Literature

Abel, D. J., and M. A. Wilson, 1990. A systems approach to integration of raster and vector data and operations. In K. Brassel and H. Kishimoto, editors, Proceedings of the 4th International Symposium on Spatial Data Handling, Zurich, Switzerland, Vol. 2, pp. 559-566.

Department of Commerce, 1992. Spatial Data Transfer Standard (SDTS) (Federal Information Processing Standard 173). Washington, DC: Department of Commerce, National Institute of Standards and Technology.

Federal Geographic Data Committee, 1995. Development of a National Digital Geospatial Data Framework. Washington, DC: Federal Geographic Data Committee, Department of Interior. (ftp://www.fgdc.gov/pub/standards/refmod.txt)

Lillesand, T. M., and R. W. Kiefer, 1994. Remote Sensing and Image Interpretation, Third Edition. New York: John Wiley and Sons.

National Research Council, Mapping Science Committee, 1993. Towards a Spatial Data Infrastructure for the Nation. Washington, DC: National Academy Press.

Saalfeld, A., 1988. Conflation: Automated map compilation. International Journal of Geographical Information Systems 2(3):217-228.