Spatial Cluster Analysis for Point Data: Location Quotients versus Kernel Density

Yongmei Lu
Department of Geography
State University of New York at Buffalo


Beginning with a general review of spatial pattern analysis methods, this research investigates the pros and cons of different point pattern analysis methods. Based on a discussion of the hot spot concept and its properties, hot spot identification methods are grouped into two categories according to how they perceive point concentration over spatial extent. Density surface analysis is identified as a good measurement for hot spots because of its minimal loss of location information, its objectivity in showing hot spots, and its potential for risk analysis. However, the Location Quotient has unique advantages over kernel density for spatial pattern analysis in that it provides the opportunity to accommodate the underlying process of observation by allowing flexible partitioning of the study area. By exploring the theoretical as well as technical intention of the Location Quotient, the author indicates that Location Quotient analysis, when properly designed, could be used in addition to other analysis methods to reveal different aspects of the information embedded in a point pattern. Buffalo crime data on Unauthorized Use of Vehicle was used for experimental analysis. Hot spots of arrests as well as reported crimes are identified by different methods. After comparing the analysis results, it is concluded that further experiments and better knowledge about the study area would benefit a comprehensive understanding of Buffalo vehicle crime.

Generally speaking, different measurements look into the same spatial pattern from different perspectives and reveal different information embedded in the data. It would be very naive to expect a concrete and consistent hot spot pattern to show through various analyses. Mathematical and statistical methods are only tools that help the analyst investigate data. Interpretation of analysis results is critical and necessary for any spatial pattern analysis. The pros and cons of different measurements are predefined, but the analyst's contribution to pattern recognition through interpreting results is inestimable.

1. Introduction

As large spatial databases become available, people from research as well as application domains are facing a dilemma of being data rich but theory poor (Openshaw, 1991). Exploratory Spatial Data Analysis (ESDA) is effective and important in such an era of exploding data in helping to discover data patterns so that possible relationships can be identified and reasonable hypotheses established (Anselin, 1996). The spatial aspect of crime data, and the necessity of recognizing the crime patterns embedded in it, lends crime analysis to the power of ESDA, especially when ESDA is enhanced by the capability of GIS for processing large spatial data sets and automating mapping.

Various parties, from federal and local government to private industry and academia, have recognized the potential of GIS and ESDA in helping crime analysis (refer to the Appendix in Harries, 1999). The Crime Mapping Research Center of the National Institute of Justice organized a "Multi-Method Exploration of Crime Hot Spots" intramural research project in 1997-1998 to investigate twelve crime pattern analysis methods, especially hot spot identification techniques (Jefferis, 1999). UCGIS included GIS and Crime Analysis as one of its research themes and challenges in 1999 (Getis, 1999).

To examine the performance of spatial analysis methods and evaluate their fitness for crime analysis, I first discuss conventional ESDA techniques in general. Then the major concerns of hot spot analysis in general, and crime hot spot identification in particular, are discussed and their specific characteristics summarized. A measurement based on the traditional Location Quotient (LQ) is proposed and compared with conventional spatial clustering measurements. Finally, a crime hot spot analysis of Unauthorized Use of Vehicle (UUV) crime data for Buffalo, New York is presented. Further research is discussed at the end.

2. Spatial Clustering and related ESDA methods

According to Tobler's "first law of geography", "everything is related to everything else, but near things are more related than distant things" (Tobler, 1979). A spatial distribution in which values at certain locations are related to values at other locations is said to show spatial autocorrelation. A spatial cluster is positive spatial autocorrelation, in which similar values are spatially clustered together. The opposite is a distribution in which similar values are separated or dispersed from each other, which is called negative spatial autocorrelation (Boots et al., 1988). Spatial distribution can indicate patterns of the underlying process. Incidents exposed to the impact of a similar process tend to follow a similar locating pattern. Hence, the study of spatial clusters can reveal information about the underlying geographical process that generates the spatial pattern, which can further aid the comprehension of that process and its relationship with the phenomenon under investigation. Crime hot spots appear when crime occurrences influenced by the environmental backcloth are spatially clustered. Investigating the spatial pattern of crime could therefore shed light on the environmental backcloth of crime.

As in other branches of spatial data analysis, there are different methods for identifying and measuring spatial clusters. Spatial cluster analysis methods differ from each other either because they are designed to answer different aspects of cluster questions (i.e. whether clusters exist, where the clusters are, or how intense the clusters are), or because they are based on different philosophies for examining the real world (i.e. observation scale, measurement of spatial separation, and subjectivity introduced during the interaction with data) (figure 1).

Indicators based on measurement of the entire data set (global view) are usually good for judging whether clustering exists across the whole data distribution. However, they provide little information about where the clusters are. Indicators that show local patterns and measure local instabilities (local view) are suitable for identifying specific clusters in the data set. However, they simply assume the existence of spatial clustering without actually examining it. Thus, global view indicators should precede the use of local view indicators.
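The contrast between the global and local views can be sketched with a small numerical example. Moran's I is used here only as one familiar pair of global and local indicators; the text above does not single out a particular statistic, and the six-zone values and binary adjacency weights are hypothetical.

```python
import numpy as np

def global_morans_i(values, weights):
    """Global view: one number summarizing clustering over the whole data set."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)

def local_morans_i(values, weights):
    """Local view: one value per zone, showing where similar values cluster."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    m2 = (z @ z) / len(x)          # variance of the values
    return (z / m2) * (w @ z)

# Hypothetical example: six zones in a row; neighbors share an edge.
values = [1, 1, 1, 5, 5, 5]
weights = np.zeros((6, 6))
for i in range(5):
    weights[i, i + 1] = weights[i + 1, i] = 1.0

global_i = global_morans_i(values, weights)
local_i = local_morans_i(values, weights)
print("global:", global_i)
print("local: ", local_i)
```

The global value says only that clustering exists; the local values show which zones contribute to it. With these definitions the local values sum to the global value times the sum of the weights, which is one way to see why the global check is a natural first step.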

Spatial clustering measurements also differ from each other because they apply different methods to represent spatial separation. One group uses a continuous function of a distance metric, while the other uses uniform spatial cells and measures separation in cell units. Anselin (1996) named them the distance view and the neighborhood view respectively. Boots and Getis (1988), on the other hand, classified them into distance methods and quadrat methods. No matter which measurement is employed, analysis is always subject to ecological fallacy, through either the scale effect or the zoning effect (Wrigley et al., 1996). Thus, for any clustering analysis, special consideration must be given, on a case-by-case basis, to the scale and general layout of the underlying process as well as the overall distribution of point incidents, so that a proper spatial separation measurement and zoning definition can be selected.
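The zoning effect in the cell-based (quadrat) view can be illustrated with a minimal sketch. The point coordinates, extents, and cell size below are hypothetical; the only change between the two runs is the origin of the grid.

```python
import numpy as np

def quadrat_counts(points, extent, cell_size):
    """Count points per square cell; extent is (x_min, y_min, x_max, y_max)."""
    points = np.asarray(points, dtype=float)
    x_min, y_min, x_max, y_max = extent
    n_cols = int(np.ceil((x_max - x_min) / cell_size))
    n_rows = int(np.ceil((y_max - y_min) / cell_size))
    x_edges = x_min + cell_size * np.arange(n_cols + 1)
    y_edges = y_min + cell_size * np.arange(n_rows + 1)
    counts, _, _ = np.histogram2d(points[:, 0], points[:, 1],
                                  bins=[x_edges, y_edges])
    return counts  # counts[i, j]: i-th cell along x, j-th cell along y

# Hypothetical incidents: a tight group of four straddling x = 100, plus two others.
points = [(95, 50), (98, 52), (102, 48), (105, 50), (20, 150), (180, 30)]

counts_a = quadrat_counts(points, (0, 0, 200, 200), cell_size=100)
counts_b = quadrat_counts(points, (-50, 0, 250, 200), cell_size=100)  # shifted origin
print(counts_a.max(), counts_b.max())
```

With the first grid the tight group is split across a cell boundary and no cell holds all four incidents; shifting the grid by half a cell places the whole group in one cell.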

Figure 1. overview of different spatial cluster analysis methods

Overall, techniques showing local pattern and local non-stationarity are of most interest for identifying spatial clusters. However, most available local pattern indicators are not designed to analyze point patterns. They are based on arbitrary zoning of the study area and definition of neighborhood. Point occurrences are aggregated and described as an attribute of the corresponding zone. As mentioned above, these methods are highly subject to both the zoning effect and the scale effect. In addition, discretization of the study area into regular cells predetermines the shape of identified clusters to a large degree, and this shape does not necessarily match the actual distribution and shape of the observations. In short, calculating these indicators involves too much subjective input from the analyst and thus departs from the purpose of letting the data speak for themselves.

3. "Hot spot" and related spatial analysis techniques

Although there is a significant volume of discussion about "hot spots" in both the research and application domains, there is virtually no clear definition of the term. Basically, a "hot spot" refers to an area with an unusually high occurrence of point incidents. People sometimes transform point observations into an area measurement and define a hot spot as an area showing high quantity or intensity. Hot spot analysis aims to assist the identification of locations with an unusually high concentration of occurrences. The hot spot concept has the following properties. First, it is both subjective and objective. On one hand, different people may perceive different hot spots in the same observation, depending on their experience with hot spot analysis and their knowledge of the study area. On the other hand, high concentrations of point occurrences exist in the distribution whether or not they are recognized. Second, it is a relative rather than absolute concept. Areas with a relatively higher concentration of incidents than their immediate surroundings could show as hot spots, even though their absolute concentration may not be that high compared to the entire area. Also, it is hard to identify a clear boundary for a hot spot because of the continuity of the point distribution. Third, hot spots should be comparable in terms of "degree of hotness" both within a study and across studies under certain circumstances. There is a need to evaluate the intensity of hot spots in both research and practice. Lastly, and very importantly, population at risk is a factor to be considered in hot spot analysis. As discussed in the previous section, a point distribution is influenced by, and therefore reflects, its underlying process. Depending on the purpose of the analysis, risk analysis may be necessary so that more information about the point pattern is revealed.

From a statistical point of view, a hot spot is a spatial cluster of point occurrences. The methods for measuring spatial autocorrelation discussed in section 2 are better for evaluating general spatial pattern than for identifying and measuring concentrations of point occurrences. Local view approaches can identify locations showing concentration, but they are limited in two respects: the concentrations identified are only meaningful at the scale of the immediate surroundings; and the enforcement of rigorously defined cells and neighborhoods conflicts with the relativity and objectivity of hot spots. In addition, these methods make no attempt to remove the impact of population at risk, and thus their results reflect the pattern of observation combined with that of the risk population.

The properties of hot spots require that a hot spot analysis technique satisfy certain criteria:
(1) The analysis method should adopt a global view for evaluating the data set while following a local view for identifying hot spots. Adopting a global view guarantees the exhaustiveness of the analysis as well as the comparability of hot spots over the study area. A local view is essential for identifying locations of concentration. The absence of a smooth switch between global and local views is a major restriction of traditional spatial autocorrelation indicators in hot spot analysis.
(2) The method should let the data show their embedded pattern through "pattern enhancement" techniques rather than present the analyst with a concrete hot spot result. The final say about a hot spot and its location and spatial extent should be reserved for the interpreter. This is consistent with the relativity and subjectivity of hot spots.
(3) A good hot spot analysis technique involves as little interaction as possible from the analyst, so that distortion of the data due to outside intervention is minimized. Even though some analyst input is unavoidable, every effort should be made to honor the distribution pattern and to control possible fallacies. This is a very strict criterion and can hardly be fully satisfied by any single analysis technique, including those mentioned in section 2.
(4) A technique that gives the user the flexibility to remove the effect of the underlying process is more desirable for certain study purposes. For exploratory analysis, it is critical that the influences of known processes be removed and the "pure" point pattern revealed, so that new hypotheses can be built and further investigation properly directed. Traditional spatial analysis techniques are weak in this respect.

There are dozens of methods for spatial point pattern analysis in general and hot spot identification in particular (Everitt, 1974). Although their ultimate purposes are similar, different philosophies lead to different methodologies and thus different techniques. Two groups are most common. The first group perceives concentration through spatial separation among point incidents. These methods use a distance criterion to group incidents together, and an area with more incidents is identified as a hot spot. The second group perceives concentration as the density of incident occurrence, and so derives a density surface and identifies places with higher density as hot spots.

According to the way incidents are grouped together, methods in the first group can be further divided into bottom-up hierarchical techniques (Hartigan, 1975) and top-down partitioning techniques (Beal, 1969). The bottom-up techniques are inductive. They examine all occurrences and group together incidents within a certain distance of each other. Different distance criteria can be implemented, and the minimum cluster size is arbitrarily defined. Most importantly, these techniques disregard the user's final judgement about the hot spot area by presenting results with hot spots identified arbitrarily. Thus, the analysis result may be something other than what the user intended to get. The deductive top-down techniques assume that there is a known number of clusters over the study area. Certain criteria are implemented to select "seed" points and to assign incidents to the clusters around the seeds. The weakness of this group of techniques is that they assume certain knowledge about the distribution of incidents. Subjectivity is also involved when seeds are arbitrarily located and membership functions defined.
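The two families can be contrasted in a minimal sketch. Neither function reproduces a specific published algorithm: `distance_groups` stands in for the bottom-up idea (link incidents closer than a chosen distance, keep groups above a chosen minimum size) and `seeded_partition` for the top-down idea (a k-means style assignment around user-supplied seeds). All names, thresholds, and coordinates are hypothetical.

```python
import numpy as np

def distance_groups(points, max_dist, min_size):
    """Bottom-up: link points within max_dist, keep groups with >= min_size members."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    diff = points[:, None, :] - points[None, :, :]
    close = np.sqrt((diff ** 2).sum(axis=2)) <= max_dist
    labels = np.arange(n)
    changed = True
    while changed:  # spread the smallest label through each linked group
        changed = False
        for i in range(n):
            smallest = labels[close[i]].min()
            if smallest < labels[i]:
                labels[i] = smallest
                changed = True
    groups = [np.flatnonzero(labels == lab) for lab in np.unique(labels)]
    return [g.tolist() for g in groups if len(g) >= min_size]

def seeded_partition(points, seeds, n_iter=20):
    """Top-down: assign every point to the nearest of a fixed number of seeds."""
    points = np.asarray(points, dtype=float)
    centers = np.array(seeds, dtype=float)
    for _ in range(n_iter):
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        member = d2.argmin(axis=1)
        for k in range(len(centers)):
            if np.any(member == k):
                centers[k] = points[member == k].mean(axis=0)
    return member, centers

# Hypothetical incidents: a group of three, a pair, and one isolated point.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (30, 30)]

groups_min3 = distance_groups(points, max_dist=2.0, min_size=3)
groups_min2 = distance_groups(points, max_dist=2.0, min_size=2)
member, centers = seeded_partition(points, seeds=[(0, 0), (10, 10)])
print(groups_min3, groups_min2, member.tolist())
```

Changing the minimum size from 3 to 2 changes how many "hot spots" the bottom-up method reports, and the top-down method is forced to attach the isolated incident to one of the two assumed clusters, which mirrors the arbitrariness described above.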

Compared to the spatial separation view, kernel density is more reliable and desirable for hot spot analysis. First, it uses more information about the point distribution than virtually all distance view cluster algorithms. A continuous density surface is generated over all locations to let the data pattern show itself. Users can visually inspect the variability of density over the whole surface and identify hot spots from their own point of view. Second, since density is a measurement of magnitude, in addition to showing spatial clustering, hot spots are comparable over space as well as between studies. Algorithms based on spatial separation of observations offer little support for quantitative comparison of hot spots. Third, the kernel density technique is less arbitrary and provides relatively stable analysis results. Unlike algorithms based on distance, the only part of density surface generation that requires user input is defining the interpolation interval (the cell size of the grid). A substantial literature addresses variations of cell size (e.g. Silverman, 1986; Venables et al., 1997). Usually either a fixed size or a variable (adaptive) size is adopted (Kelsall et al., 1995). Otherwise, the user only needs to interpret the analysis results. The locations and intensity of potential hot spots are all visually clear on the density surface. Nor is it necessary for the user to decide where to cut off in order to define hot spot zones. Fourth, it allows operations on multiple density surfaces of different variables so as to cancel the effect of population at risk (Kelsall et al., 1995). This sets it apart from distance measurement techniques and is very helpful in many analyses.
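A density surface of this kind can be sketched in a few lines. The quartic kernel, the function names, and the coordinates below are assumptions made for illustration; the text does not state which kernel function was used. The second function shows the idea of dividing one surface by another to discount population at risk.

```python
import numpy as np

def kernel_density(points, x_min, y_min, n_cols, n_rows, cell_size, radius):
    """Quartic-kernel density (events per unit area) at the centers of a regular grid."""
    points = np.asarray(points, dtype=float)
    xs = x_min + (np.arange(n_cols) + 0.5) * cell_size
    ys = y_min + (np.arange(n_rows) + 0.5) * cell_size
    gx, gy = np.meshgrid(xs, ys)              # both have shape (n_rows, n_cols)
    density = np.zeros_like(gx)
    for px, py in points:
        u2 = ((gx - px) ** 2 + (gy - py) ** 2) / radius ** 2
        inside = u2 < 1.0                     # cells within the search radius
        density[inside] += 3.0 / (np.pi * radius ** 2) * (1.0 - u2[inside]) ** 2
    return density

def risk_surface(incident_density, base_density):
    """Ratio of two density surfaces; cells with no base density are set to 0."""
    out = np.zeros_like(incident_density)
    np.divide(incident_density, base_density, out=out, where=base_density > 0)
    return out

# Hypothetical incidents on a 2000 x 2000 study area.
incidents = [(500, 500), (520, 480), (1500, 1500)]
cell_size, radius = 50.0, 300.0
density = kernel_density(incidents, 0.0, 0.0, 40, 40, cell_size, radius)

row, col = np.unravel_index(np.argmax(density), density.shape)
total = density.sum() * cell_size ** 2        # approximates the number of incidents
print(row, col, total)
```

Because each kernel integrates to one, the surface sums (times cell area) to roughly the number of incidents when they lie away from the edge of the grid, which is what makes values comparable across locations and studies.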

Although density analysis satisfies almost all the criteria for hot spot analysis listed above, it is still restricted by its assumption that observations are continuously distributed over the whole study area. For the purpose of surface interpolation, point incidents are assumed to be distributed continuously over space, with distance as the only factor that changes the intensity estimate. In reality, however, the underlying background is rarely universally continuous, and so the distribution can be forced to show abrupt change or discontinuity. Techniques that accommodate the spatial pattern of the underlying process would be preferable in this case.

4. Location Quotient and its potential in hot spot analysis

Location Quotient (LQ) is an indicator used in regional science and regional planning to evaluate economic structure and specialization (Klosterman et al., 1993). Equation 1 presents the basic formula for the Location Quotient in regional science. "The LQ is a measure which compares the relative importance (in terms of output or employment) of an industry in a region to its relative importance in the nation" (Jensen et al., 1979), so as to evaluate the deviation of a region's industry structure from that of the nation. A region with an LQ greater than 1 for industry j is believed to be producing more than its share of national output in industry j, and is thus defined as specialized in industry j.

[Equation 1: basic Location Quotient formula] (1)
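For reference, the conventional regional-science form of the LQ that matches the description above can be written as follows. The symbols are assumptions introduced here for illustration and are not necessarily the author's notation: $e_{ij}$ is the output or employment of industry $j$ in region $i$, $e_i$ is the total over all industries in region $i$, $E_j$ is the national total for industry $j$, and $E$ is the national total over all industries.

```latex
LQ_{ij} = \frac{e_{ij} / e_{i}}{E_{j} / E}
```

Under this form, $LQ_{ij} > 1$ means that industry $j$ makes up a larger share of region $i$ than it does of the nation.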

There are two assumptions embedded in the LQ measurement: (1) there is a "standard" area for every spatial level of observation (e.g. the nation for a region-level study); and (2) there is a "normal" economic structure for all observations at the same spatial level. The "standard" area is taken to be the spatial extent within which subareas are tightly linked together. The economic structure of the "standard" area is usually treated as "normal" for every subarea. Depending on the spatial scale of observation and the activities investigated, there are different LQ measurements for the same region: different standard areas can be defined for the same region, and different categories of activity in the same region can be analyzed.

Although the LQ measurement has its limitations (Jensen et al., 1979) and research has been done to improve it, several properties make LQ a good candidate for spatial pattern analysis. First, LQ carries spatial implications within its non-spatial intention. Designed to measure economic structure, it also shows the spatial pattern of economic activity at the subarea level within the standard area. Second, LQ reveals non-stationarity based on global observation. LQ is calculated by comparing a local attribute with the global ("normal" area) level. This makes it better than the global view indicators discussed in section 2, because LQ identifies abnormal spatial units. Meanwhile, it is not subject to the local view restriction that neighborhood definition imposes on the local indicators mentioned in section 2. Third, like density surface analysis, the LQ measurement is comparable within and across studies. LQ is a quantitative measurement of relative local activity intensity; a larger LQ indicates relatively higher intensity in a specific region. Fourth, LQ analyzes spatial pattern based on the spatial extent and spatial units defined by the study area. Rather than generating its own extent of investigation as the density surface method does, it conforms to the real spatial scale and subarea division. In addition, a subarea definition inherited from reality is less subject to the zoning effect than arbitrarily defined regular zones. Finally, being a relative measurement, LQ can be designed to reveal relationships between variables. LQ lends itself to risk analysis, which otherwise can only be done by the density method among all the spatial cluster analysis methods discussed in sections 2 and 3.

Nevertheless, to the author's knowledge, there have not been many attempts to employ LQ in the analysis of spatial point patterns. There are some investigations that use LQ for crime pattern analysis (Brantingham and Brantingham 1994, 1995, and 1997, and Barr and Pease, 1990), but this research is conducted more from the viewpoint of crime activity structure than aimed at identifying areas of more concentrated crime occurrence, i.e. crime hot spots. In other words, these investigations map LQ directly from an indicator of economic specialization to one of crime specialization. The author believes that a careful, case-based design of the LQ measurement could enhance the investigation of spatial patterns in general and crime hot spots in particular. Details are given in section 5, where LQ is designed to inspect crime patterns.

The author believes that there is considerable room to explore the potential of LQ for spatial pattern analysis, both theoretically and technically. As a matter of fact, employing LQ as an indicator of hot spots is itself an experiment in exploring its theoretical intention. LQ in regional science emphasizes the economic structure of a single zone versus that of the standard area in order to identify the specialization of every zone. Yet a slight transformation of equation 1 (equation 2) shows that LQ can also measure a single activity in one zone versus that of the whole area, in order to identify zones above the normal level. In addition, risk analysis, as mentioned above, is another extension of LQ's theoretical domain, owing to LQ's intention of examining relationships between variables. However, careful deliberation must precede any inspection of variable relationships to avoid semantic fallacy (fallacy introduced by incorrectly linking variables together). Moreover, as LQ has been used in regional science as an indicator of the import and export of economic activity or services by subareas, it could be interpreted in terms of the displacement of point occurrences among subareas, i.e. given a certain population at risk, subareas could show different occurrence rates because of spatial displacement processes.

[Equation 2: Location Quotient rearranged to compare a single activity in one zone with the whole area] (2)
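The two readings of the LQ described above are algebraically the same quantity. The sketch below checks this numerically, assuming a hypothetical zones-by-activities count table; the function names and the exact grouping of terms are assumptions for illustration, not the author's equations.

```python
import numpy as np

def lq_structure_form(counts):
    """(e_ij / e_i) / (E_j / E): structure of zone i versus structure of the whole area."""
    counts = np.asarray(counts, dtype=float)
    zone_totals = counts.sum(axis=1, keepdims=True)       # e_i
    activity_totals = counts.sum(axis=0, keepdims=True)   # E_j
    grand_total = counts.sum()                            # E
    return (counts / zone_totals) / (activity_totals / grand_total)

def lq_share_form(counts):
    """(e_ij / E_j) / (e_i / E): zone i's share of activity j versus its share of everything."""
    counts = np.asarray(counts, dtype=float)
    zone_totals = counts.sum(axis=1, keepdims=True)
    activity_totals = counts.sum(axis=0, keepdims=True)
    grand_total = counts.sum()
    return (counts / activity_totals) / (zone_totals / grand_total)

# Hypothetical table: rows are zones, columns are activity categories.
counts = np.array([[30, 70],
                   [10, 90],
                   [20, 180]])

lq_a = lq_structure_form(counts)
lq_b = lq_share_form(counts)
print(lq_a)
```

In this table the first zone holds half of all first-category cases but only a quarter of all cases, so its LQ for that category is 2 under either reading.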

Technically, the conventional LQ can be extended to fulfill the task of spatial cluster analysis. First, the definition of the "standard" area can be modified according to the specific application and analysis purpose. The traditional LQ defines a single "standard" area for every subarea. In reality, as implied by the First Law of Geography (Tobler, 1979), each subarea has a unique spatial connection with its own spatial context. The so-called "standard" area is more accurate for subareas located near the center of the whole area than for those at the periphery. Adopting an adaptive "standard" area would describe the "normal" situation for the corresponding subarea more accurately (see equation 3).

[Equation 3: Location Quotient with an adaptive "standard" area] (3)

Second, the "normal" distribution level can also be defined as various functions of the distribution within the "standard" area. The conventional LQ calculation uses the distribution level of the "standard" area as the "normal" level for each subarea. Depending on the research purpose and the characteristics of the specific observation, other definitions of the "normal" level, such as the area mean, the median, or more complicated measurements, could be employed. In addition, if an adaptive "standard" area is implemented, subareas would have different "normal" distribution levels even if the simple standard area level is still used.
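Both extensions can be sketched together. The function below is one possible reading, not the author's formulation: it assumes each subarea has an incident count, a base quantity (for example population at risk), and a centroid, and it takes the adaptive "standard" area of a subarea to be all subareas whose centroids lie within a chosen radius. The `normal` argument switches between the pooled level, the mean, and the median of the member rates. All names and numbers are hypothetical.

```python
import numpy as np

def adaptive_lq(counts, base, centroids, radius, normal="pooled"):
    """LQ of each subarea relative to a 'normal' level within its own standard area."""
    counts = np.asarray(counts, dtype=float)
    base = np.asarray(base, dtype=float)
    centroids = np.asarray(centroids, dtype=float)
    rates = counts / base
    diff = centroids[:, None, :] - centroids[None, :, :]
    in_standard = np.sqrt((diff ** 2).sum(axis=2)) <= radius  # includes the subarea itself
    lq = np.empty_like(rates)
    for i in range(len(rates)):
        members = in_standard[i]
        if normal == "pooled":
            level = counts[members].sum() / base[members].sum()
        elif normal == "mean":
            level = rates[members].mean()
        elif normal == "median":
            level = np.median(rates[members])
        else:
            raise ValueError("normal must be 'pooled', 'mean' or 'median'")
        lq[i] = rates[i] / level
    return lq

# Hypothetical subareas arranged along a line, with equal base quantities.
counts = [10, 20, 30, 5]
base = [100, 100, 100, 100]
centroids = [(0, 0), (1, 0), (2, 0), (3, 0)]

lq_adaptive = adaptive_lq(counts, base, centroids, radius=1.0)
lq_global = adaptive_lq(counts, base, centroids, radius=10.0)   # one standard area for all
print(lq_adaptive, lq_global)
```

With a radius large enough to cover every subarea the result reduces to the conventional single "standard" area; with a small radius the second subarea, which is above the overall level, is exactly "normal" relative to its own neighbors.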

Third, depending on thematic interpretation, the selection of variables to be used in the LQ measurement can be as much an art as a science. Because a relationship between variables is always embedded in the results of an LQ measurement, a proper and well-informed combination of variables can be an effective means of exploratory as well as confirmatory data analysis. On the other hand, arbitrary and incorrect linkage of variables will lead to semantic fallacy.

However, like all other spatial analysis methods, the LQ measurement has its limitations in identifying hot spots. The location and shape of hot spots identified by LQ are both restricted by the zoning pattern of the whole area. Hot spots located on boundaries between subareas are less likely to be identified. Different zoning can lead to different analysis results. This is the so-called zoning effect. Also because of its zone-based nature, hot spots identified by LQ analysis have abrupt boundaries, while it is well accepted that gradual change is the rule and abrupt change the exception in reality.

Readers may realize that density surface analysis is free from both of these shortcomings. As a matter of fact, the density surface method could technically be regarded as a special case of the location quotient measurement (note 1). However, the two show different properties because of their difference in epistemology. Kernel density and location quotient are two analysis methods that can both be used for hot spot analysis. They have their respective advantages and disadvantages. Users should decide on a case-by-case basis. Research purpose, the analyst's point of view, properties of the study area and distributions, data format, and computation and mapping resources should all be evaluated before a final decision is made.

5. Hot spots analysis on Buffalo Unauthorized Use of Vehicle data

In this section, a hot spot pattern analysis of Buffalo Unauthorized Use of Vehicle (UUV) data is presented. While there is definitely much left to do in analyzing the patterns in this data set, the author hopes this section can serve as an experiment in applying hot spot analysis methods and techniques to crime data.

It is well recognized that some places in cities, suburbs, or rural areas have persistently higher occurrences of crime than others. "Crime hot spot" is the term used in the crime analysis lexicon for this phenomenon of spatially clustered crime incidents. Nevertheless, it is hard to define "crime hot spot" in the practical as well as the scientific domain, because it is more of an operational concept. Perception of a "hot spot" is highly related to the observer's objective. Defining crime hot spots as areas with absolutely high crime occurrence is good for crime control practice, while crime prevention practice benefits more from identifying hot spots as areas of high crime risk. Crime hot spots are also context based. Areas that do not count as hot spots in a generally high crime neighborhood may turn out to be hot spots if placed in a relatively low crime neighborhood. Furthermore, observation scale is an important factor in defining crime hot spots. Hot spots significant to a community may mean much less for a whole city. In addition, the definition of crime hot spots is related to operational scale in terms of both space and time. Readers are referred to the literature for further discussion of the "hot spot" definition (Harries, 1999, Taylor, 1998, Sherman, 1995).

Despite theoretical uncertainties about why and how certain locations are favored by criminals as offense sites, crime research and analysis has a long tradition of trying to identify effective methods and techniques for crime pattern analysis (Brantingham and Brantingham, 1984). Identification of crime hot spots has always been of great concern. The "Multi-Method Exploration of Crime Hot Spots" intramural research project, organized by the Crime Mapping Research Center of the National Institute of Justice, identifies five general categories of crime hot spot analysis methods: (1) visual interpretation; (2) choropleth mapping; (3) grid cell analysis; (4) point pattern analysis; (5) spatial autocorrelation (Jefferis, 1999). Although this is not a strict and exhaustive classification, it covers the major spatial cluster analysis methods for crime hot spot study. Visual interpretation is favored for its simplicity and ease of implementation, but it is restricted to relatively simple patterns and smaller distributions (Sadahiro, 1997). It is not effective for identifying hot spots in a crime data set with many overlapping and closely located points.

A choropleth map is designed to show spatial pattern by shading polygons according to the attribute of concern. It is actually a technique of data transformation and representation. Crime point data is aggregated into an area measurement; in this process the assumption of point location accuracy is removed, but the zoning effect can be introduced. There is no theoretical support for the homogeneous representation of each polygon. Moreover, because discrete categories are generated from a continuous measurement for map display, different classification schemes (e.g. equal interval, quantile, and standard deviation) can present different information to the map reader (map 3 versus map 5). Nevertheless, the choropleth map is still popular because it can be used to display virtually any polygon measurement. The location quotient measurement for crime occurrences can be displayed well with a choropleth map.
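The effect of the classification scheme can be seen on a small set of made-up polygon counts. The three functions below are simplified stand-ins for the equal interval, quantile, and standard deviation schemes named above; the class breaks used by any particular mapping package may differ in detail, and the counts are hypothetical.

```python
import numpy as np

def classify_equal_interval(values, n_classes):
    """Split the value range into n_classes bins of equal width."""
    values = np.asarray(values, dtype=float)
    edges = np.linspace(values.min(), values.max(), n_classes + 1)
    return np.digitize(values, edges[1:-1])

def classify_quantile(values, n_classes):
    """Choose breaks so that classes hold roughly equal numbers of polygons."""
    values = np.asarray(values, dtype=float)
    edges = np.quantile(values, np.linspace(0.0, 1.0, n_classes + 1))
    return np.digitize(values, edges[1:-1])

def classify_std_dev(values):
    """0: more than one std. dev. below the mean, 1: within one, 2: more than one above."""
    values = np.asarray(values, dtype=float)
    z = (values - values.mean()) / values.std()
    return np.digitize(z, [-1.0, 1.0])

# Hypothetical incident counts for ten polygons, one of them very high.
counts = np.array([0, 1, 1, 2, 2, 2, 3, 3, 3, 40])

by_interval = classify_equal_interval(counts, 3)
by_quantile = classify_quantile(counts, 3)
by_std = classify_std_dev(counts)
print(by_interval, by_quantile, by_std, sep="\n")
```

Equal interval and standard deviation classes single out only the extreme polygon, while quantile classes place four polygons in the top class, so the same counts can suggest one hot spot or several depending on the scheme.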

Grid cell analysis, point pattern analysis, and spatial autocorrelation analysis are all covered by the discussions in sections 2, 3, and 4. All analyses "involv(ing) a grid of equal size square cells that is draped over the point coordinates" belong to grid cell analysis according to Jefferis (1999). Among the grid cell methods, the author applied density surface analysis to the Buffalo UUV data and compared the results with those of other analyses. The pros and cons of the various analyses, especially density surface and Location Quotient analysis for Crime data (LQC), become apparent during the study.

All the empirical analyses in this research use UUV crime report and arrest data for the City of Buffalo for the year 1996. Among 837 UUV arrests, 699 were successfully address-matched, and the corresponding UUV report addresses were geocoded too. Map 1 and Map 2 show the point distributions of UUV arrest and crime report locations respectively. They are typical pin maps used for visual interpretation. However, for a large crime data set with points close to each other and even overlapping, it is very hard to identify hot spots with the naked eye, although one can tell for sure that the north and south parts of the City of Buffalo are less subject to UUV crime than the central part. To assist the identification of UUV hot spots in Buffalo, more analysis is needed.

As mentioned above, partitioning the geographical extent of Buffalo into zones is the first step for choropleth mapping. It is well known that population is not evenly distributed throughout a city. Clustering of the population at risk in the central city leads to the classic crime hot spots there. Chakravorty (1995) argues that census polygons are more appropriate for crime cluster analysis than other polygon schemes. Hence, the census blockgroup is used as the zoning unit for the city of Buffalo. Map 3 and map 4 show the number of UUV arrest and report cases at blockgroup level respectively. Compared to the pin maps (map 1 and map 2), the choropleth maps provide more information for identifying areas with more UUV cases. However, due to the data transformation and presentation techniques discussed above, UUV hot spots revealed by choropleth maps could be distorted from reality. First, choropleth mapping could "destroy" cross-boundary UUV clusters and "create" hot spots from cases that actually belong to different clusters. Second, the UUV hot spots identified are not consistent in terms of spatial scale because blockgroup extent varies. A hot spot could be identified simply because more UUV cases are located in one large blockgroup, such as those in the southwest part of Buffalo. Third, different classification techniques for map presentation could convey different information about UUV case concentration to the map reader. Map 5 is also a choropleth analysis of Buffalo UUV arrest data. Unlike map 3, map 5 shows the deviation of each blockgroup's UUV count from the average level. Hot spots identified in map 5 are relative to the average count of UUV incidents in the city and are thus more comparable within and across study areas. But map 5 does not identify hot spots consisting of contiguous, small-sized blockgroups, each with around the average count of UUV cases (and thus a possibly high concentration of UUV incidents within a certain spatial extent). UUV arrest hot spots identified by other methods in the central part of the City (the circled area on map 5) do not show on map 5 at all.
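The deviation-from-average idea behind map 5 can be illustrated with a minimal Python sketch. The paper does not state how the deviation was scaled, so the sketch assumes deviation in units of the standard deviation of blockgroup counts; the function name and the counts are hypothetical and are not Buffalo data.

```python
import statistics

def count_deviation_scores(counts):
    """Deviation of each zone's count from the mean, in standard deviation units.

    Assumption: population standard deviation is used as the scale. The
    original map may have used a different scaling.
    """
    mean = statistics.mean(counts)
    sd = statistics.pstdev(counts)
    if sd == 0:
        # All zones have the same count, so no zone deviates from the average.
        return [0.0 for _ in counts]
    return [(c - mean) / sd for c in counts]

# Hypothetical arrest counts for eight blockgroups (illustration only).
example_counts = [2, 4, 4, 4, 5, 5, 7, 9]
example_scores = count_deviation_scores(example_counts)
```

Because the score depends only on the count, a small blockgroup with an average count receives a score near zero whatever its area, which is the limitation noted above for clusters of small contiguous blockgroups.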

Map 1: Buffalo UUV Crime Arrest Pin Map

Map 2: Buffalo UUV Crime Report Pin Map (cleared cases)

Map 3: UUV Arrest Count for Blockgroup

Map 4: UUV Report Count for Blockgroup

Map 5: UUV Arrest Count for Blockgroup

For the general concern of UUV incident concentration over space, a UUV density surface map is a well-suited measurement. Map 6 and map 7 show kernel density analyses for Buffalo UUV arrest and crime report data. The search radius is 1000 feet, so that the search diameter approximates the average size of a blockgroup (2000 feet). Cell size is defined as 200 feet, and thus a grid of 171 × 256 cells is draped over the study area. Compared to choropleth mapping, kernel density analysis honors more point location information with little spatial displacement. The hot spots identified by choropleth mapping in the southwest part of Buffalo no longer appear. UUV crime hot spots identified by kernel density methods are more consistent with common sense in that they are centered along major roads rather than forced to follow blockgroup boundaries. Three UUV crime "corridors" are apparent on both maps: the north-south one on the west side of the City, the northeast-southwest one along Main Street, and the east-west one between Broadway Street and Sycamore Street. Also, the size and intensity of UUV hot spots are more quantitative and comparable throughout the study area, which eases the police's assessment of priority areas for resource deployment. Both maps show the most severe UUV hot spots in and around the central part of the City rather than in the downtown area, which usually has the worst image and is believed to have high crime occurrence. This result is consistent with the general distribution of residential population and available vehicles (see map 8 and map 9). In addition, differences between the patterns revealed by map 6 and map 7 might indicate, to a degree, the travel pattern of UUV offenders. However, to fully interpret the kernel density surface with regard to location and severity, knowledge about the City's general layout, about the social and physical context of specific areas in the city, and about the behavior patterns of UUV offenders is necessary.
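A minimal Python sketch of a grid-based kernel density estimate follows, using the search radius (1000 feet) and cell size (200 feet) given above. The kernel function used for map 6 and map 7 is not stated in this paper, so a quartic kernel is assumed here; the function name, grid origin, and incident coordinates are hypothetical.

```python
import math

def kernel_density(points, x_min, y_min, n_cols, n_rows,
                   cell_size=200.0, radius=1000.0):
    """Density per square foot at each cell centre of a regular grid.

    Assumed kernel: quartic, K(d) = 3 / (pi r^2) * (1 - (d / r)^2)^2 for d < r,
    which integrates to 1 over the search circle.
    """
    grid = [[0.0] * n_cols for _ in range(n_rows)]
    scale = 3.0 / (math.pi * radius ** 2)
    reach = int(math.ceil(radius / cell_size))  # cells to scan around a point
    for px, py in points:
        col0 = int((px - x_min) // cell_size)
        row0 = int((py - y_min) // cell_size)
        for row in range(max(0, row0 - reach), min(n_rows, row0 + reach + 1)):
            cy = y_min + (row + 0.5) * cell_size
            for col in range(max(0, col0 - reach), min(n_cols, col0 + reach + 1)):
                cx = x_min + (col + 0.5) * cell_size
                d2 = (cx - px) ** 2 + (cy - py) ** 2
                if d2 < radius ** 2:
                    grid[row][col] += scale * (1.0 - d2 / radius ** 2) ** 2
    return grid

# Hypothetical incident locations in feet (not Buffalo data).
incidents = [(2100.0, 2100.0), (2300.0, 2100.0), (3900.0, 500.0)]
surface = kernel_density(incidents, 0.0, 0.0, n_cols=21, n_rows=21)
```

Each incident spreads a total weight of about one over the cells within the search radius, so nearby incidents reinforce each other regardless of any zone boundary.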

Map 6: UUV Arrest Density

Map 7: UUV Report Density

Based on a grid of equal-sized square cells, density surface analyses such as map 6 and map 7 are open to the criticism that they carry information about the population at risk into hot spot identification. Places with more people are very likely to generate more UUV offenders, and places with more vehicles may attract more UUV incidents. Map 8 is a kernel density map showing the population distribution pattern. Map 9 is a kernel density map showing an estimate of available vehicles. Both are based on 1990 census data. Visual comparison of these two maps with map 6 and map 7 suggests that the distribution of available vehicles is more closely related to the spatial pattern of UUV crime than the population distribution is.

Map 8: Buffalo Population Density

Map 9: Buffalo Vehicle Availability

To make the above interpretation of the UUV crime-population relationship and the UUV crime-vehicle relationship quantitative and more convincing, map algebra could be performed on multiple kernel density maps. Map 10 shows the arrest rate for UUV crime: the UUV arrest density map (map 6) is divided by the population density map (map 8), and the result is displayed using the equal interval method. Compared to the UUV arrest density map (map 6), hot spots in map 10 identify areas with a high ratio of UUV arrests to population. The most severe hot spots on map 10 appear in areas south of the central parts of the City identified by map 6. While the corridor along Main Street still shows, the other two, especially the one on the west, are greatly weakened. This map identifies areas with a high UUV arrest rate, meaning that, with respect to population size, hot spot areas have relatively more people arrested for UUV crime. While map 6 can be used to assist deployment of police resources to solve UUV cases, map 10 would help to investigate the relationship between UUV arrests and factors other than population size. Similarly, map 11 is a UUV crime vehicle-risk map in which the UUV report density map (map 7) is divided by the vehicle availability map (map 9). Again, the downtown area is shown as having the most severe hot spots, and some spots appear in the northwest and northeast parts of the City. This suggests that there are likely unique factors other than vehicle availability attracting UUV offenders to these areas.
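The map algebra step can be sketched in Python as a cell-by-cell division of two density grids followed by an equal interval classification. This is only an illustration under stated assumptions: the function names are hypothetical, cells with no population at risk are treated as undefined (None) rather than zero, and the small grids are made-up values, not the surfaces behind map 10 or map 11.

```python
def divide_surfaces(numerator, denominator, min_denominator=1e-12):
    """Cell-by-cell ratio of two equally sized grids.

    Cells whose denominator is not above min_denominator are set to None,
    an assumed way of marking an undefined rate.
    """
    if len(numerator) != len(denominator):
        raise ValueError("grids must have the same number of rows")
    result = []
    for num_row, den_row in zip(numerator, denominator):
        if len(num_row) != len(den_row):
            raise ValueError("grids must have the same number of columns")
        result.append([n / d if d > min_denominator else None
                       for n, d in zip(num_row, den_row)])
    return result

def equal_interval_classes(surface, n_classes=5):
    """Assign each defined cell a class index from 0 to n_classes - 1."""
    values = [v for row in surface for v in row if v is not None]
    if not values:
        return [[None for _ in row] for row in surface]
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_classes

    def classify(value):
        if value is None:
            return None
        if width == 0:
            return 0
        # The maximum value falls on the upper edge, so clamp it to the top class.
        return min(int((value - lo) / width), n_classes - 1)

    return [[classify(v) for v in row] for row in surface]

# Hypothetical 2 x 2 density grids (illustration only).
arrest_density = [[2.0, 4.0], [0.0, 1.0]]
population_density = [[100.0, 100.0], [0.0, 50.0]]
arrest_rate = divide_surfaces(arrest_density, population_density)
rate_classes = equal_interval_classes(arrest_rate, n_classes=2)
```

The same two functions would apply to a vehicle-risk surface by passing a report density grid and a vehicle availability grid instead.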

Map 10: Buffalo UUV Arrest Rate Map

Map 11: Buffalo UUV Vehicle Risk Map

As discussed in section 4, the location quotient is a good measurement for risk analysis. Measurements of both UUV cases and population at risk can be aggregated to blockgroup level. (In fact, census data provide only areal descriptions of population size and vehicle availability. The author transformed them to point data for the density surface analysis, a process that introduced inaccuracy.) In addition to the justification based on the general similarity and continuity of socio-demographic characteristics within census polygons (Chakravorty, 1995), the census blockgroup is used as the measurement unit for LQ on the assumption that a blockgroup may form a good control zone for a community's action against crime. Calculated as an attribute of census polygons in the city of Buffalo, LQ results for UUV crime are presented with the help of choropleth mapping. Maps 12, 13, and 14 show different LQ measurements of Buffalo UUV crime. Map 12 is the UUV arrest rate LQ, calculated by comparing the UUV arrest rate of each blockgroup with that of the whole city. Map 13 shows the LQ of vehicle-risk for UUV crime, where the UUV vehicle-risk of each blockgroup is compared with the general risk of the City. Comparing map 12 and map 13 with the density surface risk analyses in map 10 and map 11, one can see that very similar patterns are identified. However, the LQ measurement shows the risk pattern in a spatial layout consistent with other social and demographic measurements, which makes it easier for the analyst to identify possible relationships between UUV crime and other characteristics. This matters for advancing the understanding of crime and criminals. Furthermore, unlike density surface analysis, LQ analysis has the flexibility to move through different observation scales, which furthers its ability to assist in exploring possible relationships. As shown by equation 2, LQ can be designed to reveal hot spots relative to different "standard" areas. Map 14 is also a UUV arrest rate map, but it uses the census tract instead of the City of Buffalo as the reference area. Thus, it reveals the arrest rate of every blockgroup relative to its census tract. This is informative for inspecting within-tract differences that may be related to high UUV arrest rates. Nevertheless, because they rely on choropleth mapping for presentation, LQ methods are subject to the restrictions of choropleth mapping. Almost all LQ maps show hot spots for the big blockgroups in the southwest part of the City. And spots in the central part of the City, where small blockgroups are concentrated, more than once escape identification due to the inconsistent spatial extent of census polygons.
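The effect of changing the reference area can be illustrated with a short Python sketch. It assumes the LQ takes the usual ratio-of-ratios form described above (the rate of a blockgroup divided by the rate of its reference area); the function name, field names, and the four blockgroups are hypothetical and are not Buffalo data.

```python
from collections import defaultdict

def location_quotients(units, reference_key=None):
    """LQ of each unit: (cases / at_risk) over the same ratio for its reference area.

    units: list of dicts with 'id', 'cases', 'at_risk' and, optionally, the
    field named by reference_key (for example 'tract'). With no
    reference_key, the whole set of units is the reference area.
    Returns a dict mapping id to LQ, or to None where the LQ is undefined.
    """
    totals = defaultdict(lambda: [0.0, 0.0])
    for unit in units:
        ref = unit[reference_key] if reference_key else "all"
        totals[ref][0] += unit["cases"]
        totals[ref][1] += unit["at_risk"]

    result = {}
    for unit in units:
        ref = unit[reference_key] if reference_key else "all"
        ref_cases, ref_at_risk = totals[ref]
        if unit["at_risk"] <= 0 or ref_cases <= 0 or ref_at_risk <= 0:
            result[unit["id"]] = None  # rate undefined for this unit or reference
        else:
            unit_rate = unit["cases"] / unit["at_risk"]
            result[unit["id"]] = unit_rate / (ref_cases / ref_at_risk)
    return result

# Hypothetical blockgroups in two tracts (illustration only).
blockgroups = [
    {"id": "A", "tract": "T1", "cases": 6, "at_risk": 1000},
    {"id": "B", "tract": "T1", "cases": 2, "at_risk": 1000},
    {"id": "C", "tract": "T2", "cases": 1, "at_risk": 2000},
    {"id": "D", "tract": "T2", "cases": 3, "at_risk": 2000},
]
lq_city = location_quotients(blockgroups)
lq_tract = location_quotients(blockgroups, reference_key="tract")
```

In this made-up example blockgroup D is below the overall rate but above the rate of its own tract, the kind of within-tract difference that a tract-referenced LQ is meant to expose.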

Map 12: LQ of UUV Arrest versus Population

Map 13: LQ of UUV Report versus Vehicle Available

Map 14: LQ of UUV Arrest versus Population Using Tract as Control

There is definitely much room to improve the UUV hot spot analysis of Buffalo. For LQ analysis, an adaptive definition of the "standard" area based on blockgroup contiguity could be used. Distance methods such as nearest neighbor analysis could be applied to the crime point pattern for comparison purposes. UUV crime data could be examined against other social, economic, and demographic characteristics of blockgroups, e.g. population under the poverty level, unemployment rate, or land use categories. UUV crime patterns could also be investigated jointly across space and time. Different spatial patterns may show for different time periods, which could be related to the temporal patterns of other social and demographic characteristics. As with most ESDA, identification of the crime point pattern is just a first step in expanding our knowledge of Buffalo UUV crime. Explanatory and confirmatory analysis should always follow to complete the knowledge-expanding process.

6. Conclusion

In short, hot spot analysis methods differ from each other not only in technical measurement and theoretical assumptions, but also in observational point of view and in the aspects of spatial pattern they are meant to reveal. Moreover, each method has its own pros and cons for hot spot identification. It is very important to be fully aware of the context of the phenomenon being investigated so that a proper hot spot analysis method is employed. It is very common for more than one analysis to be performed on the same data to reveal patterns from different aspects for different purposes. Expecting a concrete and consistent hot spot pattern to emerge from the various analyses would be too naive. Cross-referencing the results of multi-method hot spot analysis is very beneficial for a comprehensive understanding of the data. Finally, the analyst's interaction always plays a critical role in the process of hot spot analysis. Mathematical and statistical methods are only tools that help the analyst investigate data. Parameters need to be defined as required by different methods; data transformation and display techniques are involved in the analysis from time to time; spatial partitioning and aggregation sometimes need to be performed; and, most importantly, knowledge of context is required for proper design. Hot spot analysis is an art as much as it is a science.


The author wishes to thank the Buffalo Police Department for providing the UUV crime data for analysis.


1. Anselin, L., 1996, The Moran scatterplot as an ESDA tool to assess local instability in spatial association, in Fisher, M., Scholten, H., and Unwin, D (Eds.) Spatial Analytical Perspectives on GIS, pp. 111-125, London: Taylor & Francis.
2. Barr, R., and Pease, K., 1990, Crime Placement, Displacement and Detection. Crime and Justice: A Review of Research, Vol. 12, pp. 277-318.
3. Beal, E.M.L., 1969, Cluster Analysis. London: Scientific Control Systems.
4. Boots, B.N., and Getis, A., 1988, Point Pattern Analysis. Newbury Park, CA: Sage Publications.
5. Brantingham, P.J., and Brantingham, P.L., 1984, Patterns in Crime. New York: Macmillan.
6. Brantingham, P.L., Brantingham, P.J., 1997, Mapping Crime for Analytic Purposes: Location Quotients, Counts, and Rates. In Weisburd, D., and McEwen, J.T. (Eds.), Crime Mapping and Crime Prevention, pp. 263-288. New York: Criminal Justice Press.
7. Brantingham, P.L., Brantingham, P.J., 1995, Location Quotients and Crime Hot Spots in the City, in Block, C.R., Dabdoub, M., and Fregly, S. (Eds.), Crime Analysis Through Computer Mapping, pp. 129 - 149. Washington: Police Executive Research Forum.
8. Brantingham, P.L., Brantingham, P.J., 1994, Crime Analysis Using Location Quotients, In Zahm, D., and Cromwell, P. (Eds.), Proceedings of the International Seminar on Environmental Criminology and Crime Analysis, pp. 83-94. Tallahassee, Fla: Florida Statistical Analysis Center, Florida Criminal Justice Institute.
9. Chakravorty, S., 1995, Identifying Crime Clusters: the Spatial Principles. Middle States Geographer, Vol. 28.
10. Everitt, B., 1974, Cluster Analysis. London: Heinemann Education Books.
11. Getis, A., Gartin, J., Wright, R., Drummy, P., Gorr, W., Harries, K., Rogerson, P., and Stoe, D., 1999, UCGIS White Paper: Geographic Information Science and Crime Analysis.
12. Harries, Keith, 1999, Mapping Crime: Principle and Practice, Washington DC: U.S. Department of Justice, Office of Justice Program, National Institute of Justice, Crime Mapping Research Center.
13. Hartigan, J.A., 1975, Cluster Algorithms. New York: John Wiley & Sons.
14. Jefferis, E., (ed.) 1999, A Multi-method Exploration of Crime Hot Spot: A Summary of Findings. Washington DC: U.S. Department of Justice, Office of Justice Program, National Institute of Justice, Crime Mapping Research Center.
15. Jensen, R.C., Mandeville, T.D., and Karunaratne, N.D., 1979, Regional Economic Planning: Generation of Regional Input-output Analysis. London: Croom Helm.
16. Kelsall, J.E., and Diggle, P.J., 1995, Kernel estimation of relative risk, Bernoulli, 1, pp. 3-16.
17. Openshaw, Stan, 1991, Developing appropriate spatial analysis methods for GIS, in Maguire, D., Goodchild, M.F., and Rhind, D. (Eds.) Geographical Information Systems: Principles and Application, Vol. 1, pp. 389-402, London: Longman.
18. Sadahiro, Y., 1997, Cluster Perception in Distribution of Point Objects. Cartographica, Vol. 34, No. 1.
19. Silverman, B.W., 1986, Density Estimation for Statistics and Data Analysis. London: Chapman & Hall.
20. Sherman, L.W., 1995, Hot spots of crime and criminal careers of places, in Eck, J.E., and Weisburd, D. (Eds.) Crime and Place, pp. 35-52. Monsey, New York: Criminal Justice Press.
21. Taylor, Ralph, 1998, Crime and small-scale places: what we know, what we can prevent, and what else we need to know, in Crime and Place: Plenary Papers of the 1997 Conference on Criminal Justice Research and Evaluation. Washington DC: U.S. Department of Justice, Office of Justice Program, National Institute of Justice.
22. Tobler, W. 1979, Cellular geography, in Gale, S. and Olson G. (Eds.) Philosophy in Geography, pp. 379-386. Dordrecht: Reidel.
23. Venables, W.N., and Ripley B.D., 1997, Modern Applied Statistics with S-Plus (second edition). New York: Springer-Verlag.
24. Wrigley, N., Holt, T., Steel, D., and Tranmer, M., 1996, Analyzing, modeling, and resolving the ecological fallacy, in Longley, P., and Batty, M. (Eds.) Spatial Analysis: Modeling in GIS Environment, pp. 23-30. New York: John Wiley & Sons.

note 1:
1. According to the definition of LQ (see equation 1), the location quotient is a relative measurement: the ratio of one attribute of a subarea to another related attribute, compared with the same ratio for the whole area. Thus, crime activity intensity could also be measured with a Crime Density Location Quotient (equation 4):

$$LQ_i = \frac{C_i / A_i}{C / A} \qquad (4)$$

where $C_i$ and $A_i$ are the crime count and the area of subarea $i$, and $C$ and $A$ are the crime count and the area of the whole standard area.

It is not hard to tell that the numerator in the above equation is the density measurement of subarea $i$, while the denominator is a measurement of general density in the whole standard area. For a study taking a physically consistent "standard" area, such as the physical extent of the whole city, the denominator is a constant. Thus, equation 4 could be written as

$$LQ_i = k \, \frac{C_i}{A_i}, \qquad k = \frac{A}{C} \qquad (5)$$

If the whole area is divided into equal-sized square cells, as in the grid of a density surface, $LQ_i$ would directly reflect the density measurement for location $i$, just as the density surface does.
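The proportionality described in this note can be checked numerically with a small Python sketch. It assumes the density LQ is the crime count per unit area of each subarea divided by the crime count per unit area of the whole standard area; the function name and the cell counts are hypothetical.

```python
def density_location_quotients(counts, areas):
    """Density LQ of each subarea: (count / area) over (total count / total area)."""
    total_count = sum(counts)
    total_area = sum(areas)
    if total_count <= 0 or total_area <= 0:
        raise ValueError("the standard area needs a positive count and area")
    overall_density = total_count / total_area
    return [(c / a) / overall_density for c, a in zip(counts, areas)]

# Three hypothetical equal-sized cells of 200 x 200 feet (illustration only).
cell_counts = [0, 2, 6]
cell_areas = [40000.0, 40000.0, 40000.0]
cell_lq = density_location_quotients(cell_counts, cell_areas)
cell_density = [c / a for c, a in zip(cell_counts, cell_areas)]
```

With equal cell areas, every LQ equals the cell density multiplied by the same constant, so the two measures rank the cells identically.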