Suggested time for completion: One week
To gain a clear understanding of what a data model is, and why data models are important. To learn the data models and data structures ESRI supports in ArcInfo8, and the similarities and differences between them. To learn the advantages and disadvantages of using certain data structures for different tasks. To reinforce basic ArcInfo8 skills.
For more information on data models in Geography:
You will notice some diversity in the definitions, as they are in the context of different companies, software, times, and degrees of specificity. For this lab, focus on the hierarchy described in the main body of the lab. Data models in geography:
- ESRI Glossary Definition of Data model
- AGI dictionary Definition of "Data Model"
- ESRI on ArcInfo8 Data Models
- GIS.com's "Data Types and Models"
- GI data models from the Environmental Systems Analysis Group
- Custom Data Models from ESRI's ArcOnline
Data models are a crucial concept for
GIS users to understand. Data models describe how geographic data
will be represented and stored. The choice of data model will yield
benefits in terms of simplifying aspects of the real world, but will also
incur costs in terms of oversimplifying or misrepresenting other features.
A map is an example of an analogue data model (1);
the cartographer has abstracted the real world with a set of conventions
that she can use to represent important aspects of the landscape.
In a computer, all information must be stored digitally: that is, it ultimately
must be reduced to numbers (1010000110...). Therefore, the abstractions
of a real-world model must be formalized in a data model.
The data model defines how the computer can store the geographic information
(geometry and attributes) in a database or other format. Bernhardsen
(1999) diagrams the process along these lines:
Figure 1: The modeling process (after Bernhardsen 1999, p. 39).
Map graphics from www.gis.com.
In order for geographic data to be represented digitally,
a geographic data model has to be chosen. Most of the confusion
about data models arises from the diversity of geographic data models.
Unlike classifications of things in the natural sciences or geometry, data
models are not necessarily defined by hard-and-fast rules derived from
observation or logic; data models are instead created by GIS programmers
and users for the purpose of representing certain specific features from
the real world. The definitions and capabilities of data models will
thus vary depending upon the aspect of reality that the GIS software designers
and users are attempting to model. Furthermore, data models (and the
resulting data structures that are actually implemented in GIS software)
may evolve
through time under the influences of technology (e.g., increasing storage
space and processing power, or networking, or software compatibility) or
even history (e.g., ESRI started with the georelational model way back in 1980,
so it is still probably their best-supported and most used data model).
Finally, the influences of the marketplace and the interests of GIS companies
and consumers must be taken into account.
The result of all this is that every GIS software
package will be capable of supporting a number of data models. The
capabilities of the data models may change with new versions of the software,
and compatibility issues may arise. Certain functions will be accessible
with data in the form of one data model but not another.
The National Center for Geographic Information and Analysis (NCGIA)
has its core GIScience curriculum online. Some resources relevant to data
structures and data models:
- Fundamentals of Data Storage
- Information Organization and Data Structure
- Non-spatial Database Models
The confusion surrounding all of this can be
reduced if one thinks of the data models as fitting within a general hierarchy
(these will all be discussed in detail in lecture).
Figure 2: Hierarchy of ESRI's ArcInfo8 data structures. Based on ESRI descriptions in Booth (1999). Note that there are raster and vector data models and raster and vector data structures. Beware: ESRI documentation often uses the same names for both data model and data structure, which may be confusing (e.g., ESRI developed a "georelational" data model which they sometimes call the "coverage" data model as well, and the resulting data structure is the "coverage", i.e., the ArcInfo coverage). TIN is a data structure resulting from the Delaunay triangulation data model.
One final complication is that geodatabase data
structures (based on ESRI's object-oriented geodatabase data model) will be
able to contain rasters and TINs, as well as vector
data sets, although this is not supported as of ArcInfo 8.0.2.
Data Models, Data Structures, and Feature Classes in ArcInfo8.
In ArcCatalog, the geometry and data structure of every feature are identified by a small picture or icon. This works much like Windows Explorer, except that only file formats recognized by ArcCatalog as geographic data will be displayed.
Your life will be made much easier if, as part of
learning about data structures, features, and other files, you learn ArcCatalog's
icons for them. There are a lot of them and they can be initially
confusing, so here is the handy table from
Lab #1 that you can refer back to. Below is a display from ArcCatalog
showing how the icons are identified by type.
The folders and files that make up shapefiles, coverages, geodatabase feature classes, rasters, and TINs fall into an organizational hierarchy in ArcCatalog (note: this is a completely different matter from the conceptual hierarchy shown above in Figure 2). Figure 3, below right, shows the hierarchy of folders, data models, data sets, and feature classes as displayed in ArcCatalog. Feature classes are the lowest level that the user accesses.
- For shapefiles, the shapefile is the feature class. Each feature (donut shops, streets, etc.) will be contained in its own shapefile. The geometric information (stored in hidden binary files) will be displayed in the "Geography Preview" and the attribute information (stored in dBASE tables) will be displayed in the "Table Preview". This linkage of geometric files to separate attribute tables is common to shapefiles and coverages and is the main conceptual tenet of ESRI's georelational data model.
Figure 3: Icons and hierarchy.
- For coverages, each feature class does not correspond to a map feature. The coverage feature classes are standard categories like arc, label, polygon, tic, etc. The feature classes are found in a folder. This folder is the coverage. Each feature on the map (landuse, railroads, etc.) will correspond to one of these coverage folders. Within the folder, the feature classes store the geometric information (coordinates are stored in hidden binary ARC files; displayed in the "Geography Preview") that is linked to attribute tables (INFO tables; displayed in the "Table Preview"). Like shapefiles, coverages are data structures resulting from a georelational data model.
- For geodatabases, like shapefiles, each feature class corresponds to a map feature such as roads, counties, etc. The feature classes are generally grouped into a feature data set, a folder that might contain data about a region or topic (e.g., "USA container" contains information about the USA). Unlike shapefiles and coverages, geodatabases employ a geodatabase data model that stores each feature as a row in a relational database table (this record would link to other tables containing geometric information, topological relationships, attribute information, etc.). A number of feature data sets can be stored in a geodatabase.
- Looking again at Figure 3, you will notice that the geodatabase, the coverages, and the shapefiles are all contained within the folder 'Some-Data.' The little blue symbol on the folder indicates that it contains recognizable geographic data in the first level beneath 'Some-Data.' In the context of coverages, this folder would often be referred to as a coverage workspace.
- Additional note: notice that we did not name the folder "Some-Data" as "Some Data", with a real space, even though this is allowed by Windows 95/98. ArcToolbox needs to read directory path names into a command line to run certain commands, and if there is a real space in the file name, the path will be split and interpreted as two separate words. You will get an error like "Spaces are not permitted in the path name" or "too many commands" or some such thing. So, for ArcInfo purposes, name your directories and files using dashes ("-") or underscores ("_") instead of using spaces. You have been warned.
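The path-splitting problem in the note above can be illustrated with a short sketch. It does not call ArcToolbox: the command name and paths are hypothetical, and plain whitespace splitting is only assumed to resemble how a command-line interpreter breaks its input into words.

```python
# Hypothetical command lines; "clip" and the paths are illustrative only.
good = r"clip C:\Some-Data\roads C:\Some-Data\roads_clip"
bad = r"clip C:\Some Data\roads C:\Some Data\roads_clip"


def tokenize(command_line):
    """Split a command line on whitespace, as a simple interpreter might."""
    return command_line.split()


good_tokens = tokenize(good)
bad_tokens = tokenize(bad)

# The dashed folder name yields the command plus two intact paths.
print(len(good_tokens), good_tokens)
# The folder name with a space cuts each path in two, so the tool
# would see more arguments than it expects.
print(len(bad_tokens), bad_tokens)
```

With the space in the folder name, neither path survives as a single word, which is consistent with the kind of error messages described above.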
/mystery -- Contains 8 data layers of several features in different data models. You will be figuring out what these are in the lab.
/sb
roads -- Santa Barbara county roads coverage, clipped to the Goleta-Santa Barbara region
sbdem -- digital elevation model of Santa Barbara county
sbtin -- TIN derived from sbdem
sbcontour -- Contour coverage derived from sbdem
cacounties -- counties of California, from the GDT data set
Copy the data to your local work folder.
The street data we are using in this lab were provided by Geographic Data Technology (GDT, located at www.geographic.com).
A bit on Geographic Data Technology
GDT is a company that specializes in collecting geographic data, improving it, and packaging it in formats customized for specific uses. Their flagship products derive from their street centerline database, a continually updated database attempting to cover all of the streets of the U.S. From this database they sell street data for uses such as displaying street maps, routing, traffic planning, and transportation. GDT has generously donated their Dynamap/2000 data set for Santa Barbara county for educational use. As such, PLEASE keep in mind that students are NOT allowed to copy it for uses outside of this course!
An additional item of interest:
- GDT's street data is originally derived from sources such as the US Census TIGER/Line files, but has gone through a number of improvements to increase its usefulness for routing. It is updated regularly on the basis of sources such as air photos and suggestions from their Community Update program, in collaboration with ESRI.
2.4.1 Understanding data models: Tables
Question 1:
As you work through the lab, fill out Tables A and B below based on information from the lab introduction, exercises, course text, and lecture. If time is short, you may want to leave some of the tables to fill out outside of section. |
Table A: Main Data Models. Briefly describe each data model.
Geographic Data Model |
|
|
(hint: for TIN data structure) |
Briefly describe the essential characteristics of each data model. Include the types of data generally represented by a particular data model (i.e., continuous or discontinuous) and the data structures that would be implemented in GIS software by the data model. Give an example of a likely geographic feature that would be represented by each model. |
Table B: ArcInfo8 Vector Data Structures. Fill out the table
as you work through the lab. If you need additional information,
make sure you examine the lecture notes, the reading for the course (Zeiler's
"Modeling our World"), and the ArcInfo help files.
Table B:
Here is the table in HTML format and Word97 format. |
Copy the mystery and sb directories onto your
zip disk, and then, if you wish, into your temp directory on the
hard drive. Connect to this folder in ArcCatalog and examine the
layers in the folder mystery.
Answer question 2:
What data model does each of the layers represent? What feature does each layer represent? (Be as specific as possible.) mystery1 --
|
Once you have identified the layers and the conceptual data models that they are based on,
convert mystery5
into the same data structure as mystery2. You will have to figure out
how to do this yourself, but here are some big hints:
Give the output a name you will remember, and run
the conversion. Take your resulting layer and display it in ArcMap,
along with mystery5.
Answer question 3:
How similar are mystery5 and your converted layer? Briefly describe the major differences between the two. What is the cause of them? What do you think was the source data from which mystery5 was derived? |
Go to the directory sb.
Now, add sbcontour, sbdem, and sbtin
into ArcMap. Display just sbcontour and sbtin, and
overlay
sbcontour on top of sbtin. To make the display
intelligible, you will have to change the properties for the two layers.
If you're curious about making better use of Properties, the main methods are the creation of Layers in ArcCatalog, and ArcMap's Style Manager, found in the Menu Bar under Tools-->Styles-->Style Manager.
You will be repeating these steps to change a layer's properties hundreds of times throughout the quarter. You will probably find the Properties functions very useful, but perhaps not as user-friendly as they could be, and somewhat tedious and frustrating to use for complicated tasks. We will discuss ways to make this easier later on in the quarter by using ArcMap's Style Manager.
Answer question 4:
Which of the three layers (sbdem, sbtin, sbcontour) do you think was the original data layer? Which is "second generation" and which is "third generation"? Why do you think this? |
2.4.3 Data Structures and ArcToolbox
Coverages are the vector data structures long used
in the old Unix workstation version of ARC/INFO. Therefore, many of the ArcToolbox tools
simply use a wizard to create a command line that runs an ARC process in
the background. As a result, many of the tools only support coverages,
although some of the newer tools are designed for geodatabases. To
familiarize yourself with the Toolbox and the input formats required, find
each tool listed below and figure out what kind of input file(s) it supports
(e.g., coverage, geodatabase feature class, grid, TIN, etc.).
Answer question 5:
Find each of these tools and determine what data model type(s) (or perhaps other file types) it takes as input: a) Clip, Select, Intersect, Buffer, & most other Analysis Tools
(all the same answer)
|
As discussed above, coverages have been the standard data structure for the generic vector data model in previous releases of ARC/INFO. With the release of ArcInfo 8.x, Arc and INFO have apparently been integrated, and the new geodatabase data structure has been promoted. However, coverages are still the best-supported data structure overall, so it behooves us to understand their structure.
Recall that coverages are based on the georelational data model. The INFO part of ARC/INFO was a relational database manager. An INFO file is a table that stores the information associated with the geographic features of a spatially referenced data set. This gives a GIS the ability to manipulate information both spatially and via standard tabular database functions. A simple example of a relational model is two tables that share a common column. The common column is often called the KEY column and is used for relating or joining tables. In a georelational model the individual records in two or more tables are related through their location in space. The polygon coverage below serves as a simple example of this concept.
(courtesy of ESRI)
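The idea of relating geometry to attributes through a shared KEY column can be sketched in a few lines. This is only an illustration under stated assumptions: the column names POLY_ID and LANDUSE, the coordinates, and the function join_on_key are hypothetical, and real coverages store this information in binary ARC files and INFO tables rather than in Python structures.

```python
# Geometry records, keyed by a hypothetical polygon ID.
geometry = {
    1: [(0, 0), (4, 0), (4, 3), (0, 3)],
    2: [(4, 0), (8, 0), (8, 3), (4, 3)],
}

# Attribute table rows; POLY_ID is the key column shared with the geometry.
attributes = [
    {"POLY_ID": 1, "LANDUSE": "residential"},
    {"POLY_ID": 2, "LANDUSE": "agricultural"},
]


def join_on_key(geometry, attributes, key="POLY_ID"):
    """Attach each attribute row to the geometry that has the same key value."""
    joined = []
    for row in attributes:
        record = dict(row)                      # copy the attribute row
        record["coords"] = geometry[row[key]]   # look up geometry by key
        joined.append(record)
    return joined


features = join_on_key(geometry, attributes)
for feature in features:
    print(feature["POLY_ID"], feature["LANDUSE"], len(feature["coords"]), "vertices")
```

Because the two tables are linked only by the key, either one can be edited or queried on its own, and the join can be rebuilt whenever both pieces are needed together.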
Let's explore the attribute tables of roads.
Go to ArcCatalog and Preview the data.
Previewing Tables
|
Answer question 6:
How many records are there? What do FNODE# and TNODE# mean? What other attribute information can you recognize or guess at in the table (pick 3 columns)? |
For a look at polygons and Polygon Attribute
Tables (PATs), open cacounty. Explore the tables for the arc, polygon, and region.cty coverage feature classes.
Answer question 7:
How many counties are there in California? Why do the AAT, PAT, and RAT have different numbers of records? Explain the relationship between arc, polygon, and region.cty in this coverage. What are the label and tic feature classes for? Hints: To figure out the answers, you will need to examine
the tables. In addition, you might want to use the Identify Tool ( |
Your map for Lab 2:
Make a map of mainland Santa Barbara county with the roads coverage overlaid on the contour coverage, using your knowledge and skills from Labs 1 and 2. You will have to choose appropriate properties for the two themes so that they are not confused on your black and white printout and so that they are easily distinguished by the viewer. Also, make sure you follow the basic principles of cartography outlined in Lab 1.
So far we have focused on digitally modeling geographic features, and attributes for those features. However, GIS users are increasingly seeking to model relationships between features as well. These relationships can have behavior and can follow rules. A primary advantage of the new geodatabase model is that it gives the user/designer the ability to build structured relationships between features.
To get a handle on this, consider the classic example of a power pole and a transformer. Perhaps you want to describe the location of the transformer on the pole -- e.g., height in feet and the side of the pole the transformer is on (North, West, etc.). The geodatabase designer could constrain the possible entries in the "location" field for the transformer to North, South, East, and West. Then, a person doing data entry would simply select the appropriate direction from the available options. Similarly, the designer could constrain the "height" field to between 10 and 20 feet.
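The two data entry constraints just described can be sketched as a small validation routine. This is a conceptual illustration only, not how ArcInfo implements such rules; the names LOCATION_DOMAIN, HEIGHT_RANGE_FT, and validate_transformer are hypothetical.

```python
# Hypothetical constraints for the transformer example in the text.
LOCATION_DOMAIN = {"North", "South", "East", "West"}  # allowed values
HEIGHT_RANGE_FT = (10, 20)                            # allowed range, in feet


def validate_transformer(location, height_ft):
    """Return a list of rule violations; an empty list means the entry is valid."""
    problems = []
    if location not in LOCATION_DOMAIN:
        problems.append(f"location {location!r} is not an allowed value")
    low, high = HEIGHT_RANGE_FT
    if not (low <= height_ft <= high):
        problems.append(f"height {height_ft} ft is outside {low}-{high} ft")
    return problems


print(validate_transformer("North", 15))      # a valid entry
print(validate_transformer("Northwest", 25))  # breaks both rules
```

The point of the design is that the rules live with the data definition, so every person doing data entry is held to the same set of choices.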
The designer could also limit the number of relationships a particular pole can have with transformers. In the real world, several transformers can reside on a pole. However, an unlimited number of transformers will not fit; we might imagine that four transformers is the maximum. The geodatabase designer could constrain the number of relationships the pole has with transformers to between 0 and 4. After four transformers have been assigned to that pole, a transformer would have to be deleted before another could be added.
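The limit on the number of relationships can be sketched the same way. Again this is only a conceptual illustration under assumptions: the class Pole, its methods, and the identifiers are hypothetical, and the limit of four is the one imagined in the text.

```python
MAX_TRANSFORMERS = 4  # upper limit imagined in the text; the lower limit is 0


class Pole:
    """A pole that refuses more transformers than the rule allows."""

    def __init__(self, pole_id):
        self.pole_id = pole_id
        self.transformers = []

    def add_transformer(self, transformer_id):
        if len(self.transformers) >= MAX_TRANSFORMERS:
            raise ValueError(f"pole {self.pole_id} already has {MAX_TRANSFORMERS} transformers")
        self.transformers.append(transformer_id)

    def remove_transformer(self, transformer_id):
        self.transformers.remove(transformer_id)


pole = Pole("P1")
for transformer_id in ["T1", "T2", "T3", "T4"]:
    pole.add_transformer(transformer_id)

# A fifth transformer is rejected while the pole is full.
try:
    pole.add_transformer("T5")
    fifth_added = True
except ValueError:
    fifth_added = False

# After one transformer is deleted, another can be added.
pole.remove_transformer("T1")
pole.add_transformer("T5")
print(fifth_added, pole.transformers)
```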
The relationship between poles and transformers
is directional as well. In a directional relationship, changing
A will change B, but changing B will not change A. If you move a
pole (in real life and in the GIS), you want the transformers on the pole
to move as well. But you don't want to be able to move a transformer
by itself, as it must always be on a pole. If you delete a pole from
the data layer (say, because it was burnt down in a forest fire), you will
want the records for the transformers on that pole to be deleted as well.
But if you delete a transformer, the pole should remain unaffected.
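A directional relationship of this kind can be sketched with two small tables. This is a conceptual illustration only; the table layout, identifiers, and function names are hypothetical and do not reflect how a geodatabase stores relationships.

```python
# Hypothetical tables: poles carry coordinates, transformers only point to a pole.
poles = {"P1": (100.0, 200.0)}            # pole id -> (x, y)
transformers = {"T1": "P1", "T2": "P1"}   # transformer id -> pole id


def transformer_position(transformer_id):
    """A transformer has no position of its own; it is read from its pole."""
    return poles[transformers[transformer_id]]


def move_pole(pole_id, new_xy):
    """Moving the pole implicitly moves every transformer on it."""
    poles[pole_id] = new_xy


def delete_transformer(transformer_id):
    """Deleting a transformer leaves its pole untouched."""
    del transformers[transformer_id]


def delete_pole(pole_id):
    """Deleting a pole also deletes the transformers that sit on it."""
    for transformer_id in [t for t, p in transformers.items() if p == pole_id]:
        del transformers[transformer_id]
    del poles[pole_id]


move_pole("P1", (150.0, 250.0))
moved = transformer_position("T1")   # follows the pole to its new location

delete_transformer("T2")
pole_survives = "P1" in poles        # the pole is unaffected

delete_pole("P1")
print(moved, pole_survives, poles, transformers)
```

Note that there is deliberately no function for moving a transformer by itself: the direction of the relationship is expressed by which operations exist and which ones propagate.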
Answer question 8:
Come up with an example of two simple (geographic) features that you might want to represent in a geodatabase as having a relationship. Come up with some rules for the relationship describing directionality and data entry constraints. This is just a conceptual exercise, so you do not have to actually create the relationship rules in the computer. Creativity is fine for this question as long as you show that you understand the concept of relationships between features. |
In this lab, you have gained a basic understanding
of geographic data models and data modeling, and the resulting, primary data
structures
used in ESRI's ArcInfo 8.x software. You have seen how the ESRI data
structures
are similar to and different from each other, and how each has advantages
and disadvantages for certain purposes. You have gained further experience
with some basic ArcInfo 8 skills, such as changing properties and using
the help functions. Finally, you have learned about the important
concept of relationships in GIS.
Bernhardsen, Tor. Geographic Information Systems: An Introduction. New York: John Wiley & Sons, Inc., 1999, pp. 37-99.
Booth, Bob. Getting Started with ArcInfo. Redlands, CA: ESRI Press, 1999, pp. 45-56.
Minami, Michael, Sakala, Michelle, and Wrightsell, Jennifer. Table: "Comparing the structure of vector data sets." In Using ArcMap. Redlands, CA: ESRI Press, 1999, p. 403.
Zeiler, Michael. Modeling Our World: The ESRI Guide to Geodatabase Design. Redlands, CA: ESRI Press, 1999, pp. 1-199.
Online Sources:
First 3 lectures of GEO 580!
AGI dictionary Definition of "Data Model"
FOLDOC definition of data model
Last update: April 8, 2002
http://dusk.geo.orst.edu/buffgis/Arc8Labs/lab2/lab2.html