March, 2012

Spatial Statistics

This chapter provides a brief introduction to the statistical analysis and modeling of spatial data.

1 Introduction

Spatial data is distinguished by observations that are obtained at spatial locations s_1, s_2, ..., s_n, where the s_i are typically coordinates in the plane ℜ² or in space ℜ³.

Time series models attempt to model the correlations between responses at different time points. Similarly, with spatial data, the spatial correlation structure needs to be incorporated and modeled. The field of spatial statistics is a relatively new area of development and remains an area of active statistical research. The importance of spatial dependencies was realized in the early 1900s, and some of the methods of experimental design (randomization, blocking, etc.) were established in agricultural studies to control for spatial correlation.

Spatial statistics consists primarily of three main components:

  • Geostatistics – a spatial process indexed over a continuous space.

  • Spatial Point Patterns – pertaining to the location of “events” of interest.

  • Lattice Data – spatial data indexed over a lattice of points.

Below is a short description of these three methods.

  1. Geostatistics. Geostatistics began with mining-type applications, and the prefix “geo” indicates that geostatistics dealt initially with statistics pertaining to the earth. Geostatistics has applications in climatology, environmental monitoring, and geology. One of the most common geostatistical applications in the mining context was kriging, which deals with predicting the ore grade in a mining area from spatially observed samples. Geostatistical applications have expanded beyond the original mining applications to include modeling soil properties, ground water studies, rainfall precipitation, public health, etc.

  2. Point Patterns. Data analysis of point patterns corresponds to studies where the interest lies in where events of interest occur. A fundamental question in this context is whether the points of interest occur at random, whether they cluster in some manner, or whether they occur with some sort of regularity.

  3. Lattice Data. Lattice-type data provide the closest analogue to time series data. In time series data sets, observations are typically obtained at equally spaced time points. For lattice data, spatial data is obtained over a regularly spaced set of points (irregular lattice-type data is a possibility as well). Lattice data often come in the form of pixels, which are small rectangular regions, often obtained by remote sensing from satellites or aircraft. Lattice-type data in spatial statistics are very similar to the lattice-type data one obtains from medical imaging studies, such as PET scans, which yield images of the brain by way of pixels or voxels.

We shall explore each of these three types of spatial data in more detail next.

2 Geostatistics

Geostatistical data is characterized by measurements of a variable of interest over a spatial region where the measurement points can vary continuously. The primary tool in geostatistical analysis is the variogram, which we will describe shortly.

Let D denote a region of interest (a forest, a field, a mining area, etc.). Each point s = (x, y) in D can be described by an x and y coordinate in the plane. In applications we want to measure some variable z (water pH, mercury levels, etc.) at locations within the region D. Let z(s) denote the random variable that can be measured at location s in the region. In practice, measurements are obtained at a finite number n of points (out of a possible infinite collection of points in D). Geostatistical data then looks like z(s_1), z(s_2), ..., z(s_n).

A very simple model for geostatistical data is

z(s) = μ + ε(s).     (1)

The error ε(s) is assumed to have a mean of zero, in which case

E[z(s)] = μ.     (2)

Another common assumption is that of homoskedasticity:

var[z(s)] = σ²     (3)

for all points s in D. Recall that the autocovariance function for a stationary time series depends only on the lag between the time points. Similarly, for geostatistical data, we can consider covariance functions for the response z at two different points s_1 and s_2 that depend only on the difference in locations (distance and direction) between the two points:

cov[z(s_1), z(s_2)] = c(s_1 − s_2),     (4)

for some function c. Spatial data that satisfy the conditions (2), (3), and (4) are called second-order stationary. If only the variance of a difference, var[z(s_1) − z(s_2)], is required to depend on the separation s_1 − s_2, the process is called intrinsically stationary, which is a weaker condition. The key tool for describing this spatial dependence is the variogram,

2γ(h) = var[z(s_1) − z(s_2)],

viewed as a function of the separation h between the two points.

Figure 1: A plot of a theoretical variogram based on a Gaussian model with nugget = 3, sill = 10, and range of influence = 4.

Figure 1 shows a plot of such a theoretical variogram. As the distance h gets larger, the variogram values increase, indicating that as points get farther apart, the expected difference between the measured responses at those two points increases as well.

The Nugget Effect. One would expect that 2γ(0) should equal zero; that is, var[z(s_1) − z(s_2)] should equal zero if s_1 = s_2. However, this is usually not the case. As the distance h goes to zero, there tends to be a nugget effect due to measurement error and microscale variation. The nugget effect is evident in Figure 1 at h = 0.

The Sill. Another aspect of the variogram is the sill, which corresponds to the maximum height of the variogram curve. The sill is also indicated in Figure 1. As h gets large, the correlation (and hence the covariance) between the responses at two points separated by a distance h becomes negligible. In this case, the variogram value 2γ(h) = var[z(s_1) − z(s_2)] ≈ 2σ². Thus, the sill in the variogram plot corresponds to two times the variance.

The Range. The range is the distance h such that pairs of sites farther than this distance apart are negligibly correlated. The range of influence is sometimes defined as the distance at which the curve reaches 95% of the way from the nugget to the sill.

2.1 Models for Variograms

The variogram plot in Figure 1 is for a model variogram. There exist various models for variograms used in practice. One of the more common models is the Gaussian model, given by the following expression:

Gaussian Model:  γ(h) = c + (S − c){1 − e^(−3h²/a²)},     (5)

where c is the nugget effect, S is the sill, and a is the range of influence. Note that γ(0) = c, the nugget effect. Also, as h → ∞, γ(h) approaches S, the sill. Finally, when h = a, γ(a) = c + (S − c)(1 − e^(−3)) ≈ c + 0.95(S − c).

Some other common models used for variograms are the spherical model:

Spherical Model:  γ(h) = c + (S − c){1.5(h/a) − 0.5(h/a)³} for h ≤ a, and γ(h) = S otherwise,

the exponential model:

Exponential Model:  γ(h) = c + (S − c){1 − e^(−3h/a)},

and the power model:

Power Model:  γ(h) = c + A·h^w.

In each of these models, the constant c is the nugget effect. Note that the power model increases without bound.
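To make these model formulas concrete, here is a minimal Python sketch (not part of the original notes) of the four theoretical variogram models. The function names are invented for illustration; the parameters c (nugget), S (sill), a (range of influence), and A, w for the power model follow the notation above.

```python
import numpy as np

def gaussian_variogram(h, c, S, a):
    """Gaussian model: gamma(h) = c + (S - c) * (1 - exp(-3 h^2 / a^2))."""
    h = np.asarray(h, dtype=float)
    return c + (S - c) * (1.0 - np.exp(-3.0 * h**2 / a**2))

def spherical_variogram(h, c, S, a):
    """Spherical model: rises to the sill S at h = a and stays there."""
    h = np.asarray(h, dtype=float)
    inside = c + (S - c) * (1.5 * (h / a) - 0.5 * (h / a) ** 3)
    return np.where(h <= a, inside, S)

def exponential_variogram(h, c, S, a):
    """Exponential model: gamma(h) = c + (S - c) * (1 - exp(-3 h / a))."""
    h = np.asarray(h, dtype=float)
    return c + (S - c) * (1.0 - np.exp(-3.0 * h / a))

def power_variogram(h, c, A, w):
    """Power model: gamma(h) = c + A * h^w; increases without bound."""
    h = np.asarray(h, dtype=float)
    return c + A * h**w

# Parameter values matching the theoretical variogram of Figure 1
# (nugget = 3, sill = 10, range of influence = 4):
h = np.linspace(0.0, 6.0, 13)
print(gaussian_variogram(h, c=3.0, S=10.0, a=4.0))
```

At h = a the Gaussian and exponential curves sit roughly 95% of the way from the nugget to the sill, matching the range-of-influence convention described earlier.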

2.2 Estimation of the Variogram

The classical method of estimating the variogram (which corresponds to the method-of-moments estimator) is given by the following formula:

2γ̂(h) = (1/|N(h)|) Σ_{(i,j) ∈ N(h)} [z(s_i) − z(s_j)]²,     (6)

where N(h) is the set of all distinct pairs of points s_i, s_j such that ‖s_i − s_j‖ = h. In practice, the data are smoothed to generate an estimate of the variogram. For instance, the data can be partitioned into groups whose observations are within a certain range of distance apart, and the average squared difference of the points in each group is then used in place of the sum in (6).

Nonlinear least squares can then be used to fit a variogram model, such as the Gaussian model in (5), to this empirical estimate.
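The binning-and-averaging estimate and the subsequent nonlinear least-squares fit can be sketched as follows. This is an illustrative sketch rather than code from the notes: the inputs coords (an n-by-2 array of locations) and z (the measurements), the number of distance bins, and the reuse of the hypothetical gaussian_variogram function from the previous sketch are all assumptions.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.spatial.distance import pdist

def empirical_variogram(coords, z, n_bins=15):
    """Classical (method-of-moments) estimate of 2*gamma(h), smoothed by
    grouping pairs of points into distance bins and averaging the squared
    differences within each bin, as described around equation (6)."""
    d = pdist(coords)                         # all pairwise distances
    sq_diff = pdist(z.reshape(-1, 1)) ** 2    # squared differences [z(s_i) - z(s_j)]^2
    edges = np.linspace(0.0, d.max(), n_bins + 1)
    h_mid, two_gamma = [], []
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (d >= lo) & (d < hi)
        if in_bin.any():
            h_mid.append(0.5 * (lo + hi))
            two_gamma.append(sq_diff[in_bin].mean())
    return np.array(h_mid), np.array(two_gamma)

# Fitting the Gaussian model (5) to the semivariogram (half of 2*gamma-hat);
# the starting values for the nugget, sill, and range are rough guesses.
# h_mid, two_gamma = empirical_variogram(coords, z)
# (c_hat, S_hat, a_hat), _ = curve_fit(gaussian_variogram, h_mid, 0.5 * two_gamma,
#                                      p0=[0.0, two_gamma.max() / 2.0, h_mid.mean()])
```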

Nonconstant-Mean Assumption.

The solution for the weights a_i in (8) is found using the method of Lagrange multipliers and is given by

    ( a_1 )     ( c_11  c_12  ...  c_1n  1 )^(-1)   ( c_10 )
    ( a_2 )     ( c_21  c_22  ...  c_2n  1 )        ( c_20 )
    (  .  )  =  (  .     .          .    . )        (  .   )
    ( a_n )     ( c_n1  c_n2  ...  c_nn  1 )        ( c_n0 )
    (  m  )     (  1     1    ...   1    0 )        (  1   )

where m is the Lagrange multiplier, c_ij denotes the covariance between z(s_i) and z(s_j), and c_i0 denotes the covariance between z(s_i) and the response at the prediction location. The variogram is needed in the prediction equations used by kriging. For additional details, see Thompson (1992, p. 243).

There exist different types of kriging. Simple kriging assumes the mean is constant throughout the region, while ordinary kriging allows the mean to vary in different parts of the region by using only the sample points nearest to the point being predicted.
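As an illustration of solving the bordered linear system above, the sketch below builds the covariance matrix from an assumed exponential covariance function and solves for the weights a_i and the multiplier m. The covariance model, its parameters, the example data, and all function names are assumptions for illustration, not part of the notes.

```python
import numpy as np

def exp_covariance(d, sigma2=1.0, range_=4.0):
    """Assumed covariance model c(d) = sigma^2 * exp(-3 d / range)."""
    return sigma2 * np.exp(-3.0 * d / range_)

def kriging_weights(coords, s0, cov=exp_covariance):
    """Solve the bordered system for the weights a_1,...,a_n and multiplier m,
    where c_ij is the covariance between sampled sites and c_i0 the covariance
    with the prediction location s0."""
    n = len(coords)
    d_ij = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    d_i0 = np.linalg.norm(coords - s0, axis=-1)
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = cov(d_ij)       # covariances among the sampled sites
    lhs[n, n] = 0.0               # corner of the bordered matrix
    rhs = np.append(cov(d_i0), 1.0)
    sol = np.linalg.solve(lhs, rhs)
    return sol[:n], sol[n]        # weights a_i, Lagrange multiplier m

# The prediction at s0 is the weighted sum of the observed values; the weights sum to 1.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
z = np.array([1.2, 0.9, 1.1, 0.4])
a, m = kriging_weights(coords, s0=np.array([0.5, 0.5]))
z_hat = a @ z
```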

2.3 Spatial Designs

If the object of the spatial analysis is to predict a value at a new spatial location, then the issue of choosing sampling points arises. Which sampling points should be used in order to obtain the most accurate predictions? An obvious answer is to sample points near the location where you later want to make a prediction. However, if you do not know ahead of time where you want to make predictions, or if you want to make predictions at several locations, how should points be sampled?

The answer is to obtain points that provide the most information possible. Responses for points near each other spatially tend to be highly correlated. If responses are highly correlated, then knowing one response tells us a lot about the other response; that is, we obtain some redundant information. One can show mathematically that predictions become more accurate (in terms of mean squared error) as the correlations between the sampled points become smaller. This suggests that the sampled points should be spread out as much as possible, perhaps in a systematic fashion. However, systematic sampling patterns can become very inefficient if there is some sort of periodic variation in the region.

For systematic sampling, Matérn (1986) found that, for predicting the mean over a region, a triangular grid was more efficient than a square grid.

In order to estimate the mean concentration of a point-source pollutant over a region, McArthur (1987) found that the most efficient strategy was a stratified systematic sampling approach.

3 Spatial Point Patterns

Consider a region D and in this region we are interested in locations of certain “events”. We may ask ourselves if the events of interest are occurring randomly throughout the area, or if the events tend to cluster together. Some examples may help to illustrate the ideas:

  1. A study was done which located nests of the ant Messor wasmanni in a 240-foot by 250-foot area (Harkness and Isham 1983). Additionally, nests of the ant Cataglyphis bicolor were also located. Another question of interest, besides whether or not the nests are randomly located throughout the region, is whether there is any evidence of a relationship between the positions of the two species of ants.

  2. A study was conducted on the longleaf pine in an old-growth forest in Thomas County, Georgia (see Rathbun and Cressie 1990). Data on 584 tree locations were obtained in this study. Again, one of the main questions of interest is whether the spatial locations of the trees in the forest are random or clustered in some way.

Determining the presence of clustering in a spatial data set can be very difficult: points generated by completely random processes can appear clustered.

Spatial point patterns consisting of n events will be compared to completely random point processes. Complete spatial randomness (also known as a homogeneous Poisson process) is the situation where, conditional on n events (i.e., points in the region), the events are independently and uniformly distributed over the region.

Statistical methods have been developed to test whether or not a spatial point pattern is random; some of these methods, such as distance-based methods, are computer intensive.

Figure 2 shows a plot of spatial data that was generated in accordance with complete spatial randomness. Any pattern that shows up in this plot is due completely to chance and not to some underlying structure. Figure 3 shows a plot of spatially located points that appear to be spread roughly uniformly throughout the region in a grid-like pattern. Some people may mistakenly consider this random data because of the uniform spacing. However, the data in Figure 3 have very little randomness associated with them. Finally, Figure 4 shows data generated by two clusters of points. Is it evident that there exist two distinct clusters in Figure 4?

Before going into details on some testing procedures, we introduce a couple of definitions.

3.1 Intensity Function

For geostatistical data, the mean function μ(s) was of interest. The analogue of the mean function for spatial point patterns is the intensity function λ(·). Let N_s denote the number of events in a square centered at location s. Consider the ratio

P(N_s > 0) / (area of the square).

The intensity function λ(s) is the limit of this expression as the area of the square goes to zero.

Figure 4: Clustered spatial data generated from two underlying clusters.

There exist various methods for estimating the intensity function λ(s), including nonparametric kernel estimators (commonly used to estimate density functions), which are based on smoothing the data.
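A minimal sketch of such a kernel estimate of λ(s) on a grid is given below. The Gaussian kernel, the bandwidth, the unit-square region, and the absence of any edge correction are simplifying assumptions for illustration, not choices made in the notes.

```python
import numpy as np

def kernel_intensity(events, grid_x, grid_y, bandwidth=0.1):
    """Kernel (smoothing) estimate of the intensity lambda(s): at each grid
    point, sum a Gaussian kernel centered at every observed event."""
    gx, gy = np.meshgrid(grid_x, grid_y)
    lam = np.zeros_like(gx)
    for ex, ey in events:
        d2 = (gx - ex) ** 2 + (gy - ey) ** 2
        lam += np.exp(-0.5 * d2 / bandwidth**2) / (2.0 * np.pi * bandwidth**2)
    return lam     # approximate events per unit area at each grid point

# rng = np.random.default_rng(1)
# events = rng.uniform(0.0, 1.0, size=(100, 2))   # complete spatial randomness
# lam_hat = kernel_intensity(events, np.linspace(0, 1, 50), np.linspace(0, 1, 50))
```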

The K Function. The analogue of the variogram from geostatistics is the K function for spatial point patterns. The K function captures the spatial dependence between different parts of the region where the sampling takes place. The K function is defined to be

K(h) = (average number of events within a distance h of each other) / λ.

This is estimated empirically by replacing the numerator by the average number of pairs of points within a distance h of each other and replacing the denominator by an estimator of the intensity.
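A naive empirical version of this K function, ignoring edge corrections, might look like the sketch below; the function names and the choice of study-area input are assumptions for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist

def k_function(events, h_values, area):
    """Naive estimate of K(h): the average number of other events within
    distance h of an event, divided by an estimate of the intensity
    (no edge correction)."""
    n = len(events)
    lam_hat = n / area                    # estimated intensity
    d = pdist(events)                     # pairwise distances, each pair once
    k_hat = []
    for h in h_values:
        pairs_within_h = np.sum(d <= h)              # unordered pairs within h
        avg_neighbors = 2.0 * pairs_within_h / n     # average per event
        k_hat.append(avg_neighbors / lam_hat)
    return np.array(k_hat)

# Under complete spatial randomness K(h) is roughly pi * h^2, so values well
# above that curve suggest clustering and values below it suggest regularity.
```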

3.2 Quadrat Methods

The basic idea of quadrat methods is to divide the region D into subsets, often rectangular in shape (but other shapes are used as well), and then count the number of events in each of the subsets. Quadrat counts can be used to assess whether there is any spatial pattern in the data. If clustering is present in the data, then one would expect quadrats with higher counts to be located near each other. On the other hand, if the quadrats with higher counts are spread out over the region, then there is evidence of uniformity.

If the events correspond to spatial randomness, then the quadrat counts should conform to this randomness as well. Recall that count data are often modeled by a Poisson distribution. One can show that, for complete spatial randomness, the quadrat counts will indeed have a Poisson distribution. Recall that the Poisson distribution has the following density function:

f(x) = e^(−μ) μ^x / x!,   x = 0, 1, 2, ...,

where μ is the rate parameter (and also the mean and variance of the distribution).

Because the mean and variance of a Poisson distribution are equal, we can define a test statistic based on this property. For all the quadrats in the data set, compute the mean count x̄ and the sample variance s² of the quadrat counts. Then the quantity

R = s²/x̄

should be close to one in value if the counts are Poisson distributed. Deviations of R from 1 indicate deviations from spatial randomness. In particular,

  • If R is too small (less than 1), then the mean is bigger than the variance. This indicates that the counts appear more uniform than expected from a strictly random process. If the events are more spread out than one would expect from a random scatter of points, that is an indication of evenness.

  • If R is too big (greater than 1), then the variance is bigger than the mean. This is an indication that events accumulate together, which is indicative of clustering.

Of course, the natural question is: what constitutes “too big” or “too small”? To answer this question, the behavior (i.e., the sampling distribution) of R needs to be known when the null hypothesis is true. If R is standardized as

T = (R − 1) / √(2/(n − 1)),

then T approximately follows a t-distribution on n − 1 degrees of freedom under the hypothesis of complete spatial randomness.

Example. Returning to Figure 2, Figure 3, and Figure 4, let us test for randomness using the quadrat method by dividing each region into square quadrats. The quadrat grid is superimposed on the data in Figure 5, Figure 6, and Figure 7, and the counts are listed going down the first column, then the second column, and so on.

Figure 5: Spatial data generated from random data points with square quadrats.

The quadrat counts for the clustered data in Figure 7 are:

0 0 0 0 1 0 2 2 1 0
0 0 0 0 1 2 1 0 0 0
0 0 1 0 0 2 6 3 1 1
0 0 1 0 0 0 2 2 3 1
0 0 0 1 0 1 2 3 0 1
2 1 2 1 0 2 0 1 1 0
0 3 2 0 2 1 0 0 1 0
2 1 1 3 2 1 1 0 0 0
0 2 1 1 2 1 0 0 0 0
0 0 0 1 0 0 0 0 0 0

Quadrat counts for the clustered spatial data.

The mean and variance of the quadrat counts for the clustered data in Figure 7 are x̄ = 0.79 and s² = 1.097. The ratio of the variance to the mean is R = 1.388, and the t-test statistic is

t = (R − 1) / √(2/(n − 1)) = (1.388 − 1) / √(2/(79 − 1)) = 2.426,

which yields a two-tailed p-value of p = 0.0176. Thus, we reject the hypothesis that the data shown in Figure 4 are random. Because R = 1.388 > 1, we conclude that the counts are more variable than one would expect by chance, which is consistent with clustered data.
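The arithmetic in this example can be reproduced directly from the table of quadrat counts above. The sketch below is illustrative; it follows the notes in using n = 79 for the degrees of freedom.

```python
import numpy as np
from scipy import stats

# The 100 quadrat counts for the clustered data, copied from the table above.
counts = np.array([
    [0, 0, 0, 0, 1, 0, 2, 2, 1, 0],
    [0, 0, 0, 0, 1, 2, 1, 0, 0, 0],
    [0, 0, 1, 0, 0, 2, 6, 3, 1, 1],
    [0, 0, 1, 0, 0, 0, 2, 2, 3, 1],
    [0, 0, 0, 1, 0, 1, 2, 3, 0, 1],
    [2, 1, 2, 1, 0, 2, 0, 1, 1, 0],
    [0, 3, 2, 0, 2, 1, 0, 0, 1, 0],
    [2, 1, 1, 3, 2, 1, 1, 0, 0, 0],
    [0, 2, 1, 1, 2, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
]).ravel()

x_bar = counts.mean()            # 0.79
s2 = counts.var(ddof=1)          # about 1.097
R = s2 / x_bar                   # about 1.388
n = 79                           # as used in the notes' worked example
t_stat = (R - 1.0) / np.sqrt(2.0 / (n - 1))
p_value = 2.0 * stats.t.sf(abs(t_stat), df=n - 1)
print(R, t_stat, p_value)        # roughly 1.39, 2.43, 0.018
```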


Figure 6: Data in a spatial region forming a grid-like pattern with square quadrats.


Figure 7: Clustered spatial data generated from two underlying clusters with square quadrats.

Nearest-Neighbor Methods. For each of the n points in the observed pattern, compute the distance to its nearest neighbor, and then average these nearest-neighbor distances. Next, simulate a set of n points from a completely random point process and compute this same nearest-neighbor average distance. We can continue to generate simulated data sets that are completely random, and for each of these simulated data sets compute the nearest-neighbor average distance. These simulated nearest-neighbor average distances can then be used to determine the sampling distribution of the nearest-neighbor average distance when the null hypothesis of complete spatial randomness holds. Generally, a large number of simulated data sets needs to be generated in order to obtain a good approximation to the null distribution of the test statistic (e.g., 1,000 to 10,000 data sets). The nearest-neighbor average distance for the original data can then be compared to this null distribution to assess whether or not the hypothesis of complete spatial randomness can be rejected. In particular, if the nearest-neighbor average distance for the original data falls in the extreme tail of the null distribution, then that is evidence against the null hypothesis. This type of procedure is an example of a Monte Carlo test.
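A sketch of such a Monte Carlo test on the unit square is given below; the region, the number of simulations, and the function names are assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def mean_nn_distance(points):
    """Average distance from each point to its nearest neighbor."""
    d = squareform(pdist(points))
    np.fill_diagonal(d, np.inf)          # ignore zero self-distances
    return d.min(axis=1).mean()

def monte_carlo_nn_test(points, n_sim=1000, seed=0):
    """Compare the observed mean nearest-neighbor distance with its null
    distribution under complete spatial randomness on the unit square."""
    rng = np.random.default_rng(seed)
    n = len(points)
    observed = mean_nn_distance(points)
    null = np.array([mean_nn_distance(rng.uniform(0.0, 1.0, size=(n, 2)))
                     for _ in range(n_sim)])
    # Two-sided Monte Carlo p-value: how extreme is the observed statistic?
    p_low = (np.sum(null <= observed) + 1) / (n_sim + 1)
    p_high = (np.sum(null >= observed) + 1) / (n_sim + 1)
    return observed, min(1.0, 2.0 * min(p_low, p_high))

# Clustered data give an unusually small mean nearest-neighbor distance, while
# regular, grid-like data give an unusually large one.
```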

Using only the nearest neighbor in the test just described is not a very efficient use of the data. One can also consider measuring distances to the second, third, fourth nearest neighbor, and so on. Let Q_1 equal the mean distance of each point to its nearest neighbor, and let Q_2 equal the mean distance of each point to its second-nearest neighbor. We can compute the statistics Q_1, Q_2, ..., Q_k, and so on. Again, Monte Carlo methods can be used to simulate the null distributions of Q_1, Q_2, ..., and the values of Q_1, Q_2, ... from the actual data can then be compared to these null distributions.

One difficulty that arises with these nearest-neighbor tests concerns the boundary of the area of interest; these are called edge effects. Nearest-neighbor distances for events near the boundary will tend to be larger than those for points near the center of the region, because points near the boundary can only have nearest neighbors in directions away from the boundary. There are a few common ways of dealing with this problem, such as constructing a “guard” area inside the perimeter of the region so that only points inside the guard area are considered. These edge effects can at times have considerable influence on the statistical results (see Figure 8 of Cressie (1993, page 613)).

4 Spatial Models for Lattice Data

We have considered spatial data in a region D. In this section, the region D is a finite (or countable) collection of spatial sites where observations are made. The collection of points in D is known as a lattice. The spatial sites in a lattice are typically identified by their longitude (x) and latitude (y).

Example. Consider an epidemiological study in the state of Ohio, and let the lattice consist of the counties of Ohio. Each county can be identified by the longitude and latitude of the county seat.

Another aspect of lattice data is that of a neighborhood. In the Ohio counties example, the neighborhood of a given county will be the collection of counties that are nearby, say in terms of distance. For instance, the neighborhood of Greene County can be defined as all counties within 50 miles. The distance metric can be Euclidean distance, but other distance metrics can be used. For example, in an urban area, “city-block” distance may be the appropriate metric: |x_i − x_j| + |y_i − y_j|. Another way of defining a neighborhood is to simply define it as the set of counties that border a given county.
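As an illustration, a neighborhood list based on the city-block metric and a 50-mile cutoff could be built as in the sketch below; the coordinates and the function name are hypothetical.

```python
import numpy as np

def city_block_neighbors(coords, cutoff=50.0):
    """For each site, list the indices of all other sites whose city-block
    distance |x_i - x_j| + |y_i - y_j| is within the cutoff."""
    coords = np.asarray(coords, dtype=float)
    dist = np.abs(coords[:, None, :] - coords[None, :, :]).sum(axis=-1)
    return [list(np.flatnonzero((row <= cutoff) & (row > 0.0))) for row in dist]

# Hypothetical (x, y) site coordinates measured in miles:
# county_xy = np.array([[0.0, 0.0], [30.0, 10.0], [80.0, 5.0]])
# city_block_neighbors(county_xy)   # -> [[1], [0], []]
```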

There are three characteristics of lattice data that need to be considered:

  1. Is the lattice regular or irregular? The Ohio counties are irregular. On the other hand, consider a study of yield from an agricultural experiment. A regular lattice of points can be placed over the field to determine where the sampling will take place.

  2. Do the locations in the lattice refer to “points” or “regions”? In the Ohio counties, each county is a region.

  3. Is the response measured at the lattice sites “discrete,” such as a count, or “continuous”?

Initial Data Analysis for Lattice Data. A simple way to illustrate lattice data graphically is to use a choropleth map. A choropleth map is a map which shades regions or areas that share the same characteristics. For instance, in an epidemiological study of a particular disease in Ohio counties, we could record the number of instances of the disease in each county and then look at the rate in each county (number of cases / county population). Counties with similar rates will then be shaded the same color. However, choropleth maps of rates as described here can be quite misleading because of varying populations across counties: the variability of the rates for smaller counties will be much higher than for counties with large populations.

If the data are on a regular lattice, it is useful to “smooth” the data. One method for doing this is to apply the median polish that was discussed previously.

Depending on the type of data being analyzed, a transformation may be helpful. For instance, if we are analyzing count data, such as the number of cases of a disease in the population, the distribution of the counts is binomial. The variance of a binomial random variable depends on the number of trials n, which in this case is the population size. Often, a square-root transformation is helpful for removing the dependence between the mean and variance of the measurements.

The next two sub-sections pertain to continuous response variables at the lattice points.

4.1 Spatial Markov Process

In the Time Series chapter, the term Markov was introduced to describe a time series where the response at time t depends only on the response immediately preceding it at time t − 1, plus a random error. A spatial analogue of this sort of dependence states that the response z at the point (i, j) in a square lattice in the plane depends only on the responses at the four nearest-neighbor sites.
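As an illustration of this four-nearest-neighbor dependence, the sketch below computes the average of the four nearest neighbors at each interior point of a regular lattice, which is the quantity that such a spatial Markov model conditions on; the function name and the example array are assumptions.

```python
import numpy as np

def four_neighbor_average(z):
    """For a response z on a regular lattice, return the average of the four
    nearest neighbors (up, down, left, right) at each interior lattice point."""
    z = np.asarray(z, dtype=float)
    return 0.25 * (z[:-2, 1:-1] + z[2:, 1:-1] + z[1:-1, :-2] + z[1:-1, 2:])

# Under a first-order spatial Markov model, z(i, j) depends on the rest of the
# lattice only through this neighborhood average, plus a random error.
# z = np.arange(16.0).reshape(4, 4)
# four_neighbor_average(z)   # 2x2 array of interior neighbor averages
```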

  1. Nonnegative Images – θ can assume any positive real number: θ > 0.

Here are some terms associated with remote sensing:

Restoration: One of the statistical goals in remote sensing is to estimate the true values of θ in D based on data y(s_1), ..., y(s_n) obtained from the image.

Segmentation: The partitioning of the lattice D into regions corresponding to different types of surfaces, etc. The goal is that the groups in the partition are homogeneous in some respect.

Classification: The problem of assigning a pixel (or region) a label (e.g., water surface, sandy soil, agricultural surface, etc.). The multivariate topic of discriminant analysis is the statistical tool used for classifying pixels into one of several possible classes. The classical methods of discriminant analysis assume the observations are independent of one another. This is clearly a false assumption for spatial data due to spatial correlations. Smoothing techniques can be used to incorporate spatial information into discriminant analysis. One common approach is to use a moving average (or median) filter (see the time series notes), where the smoothed response at a lattice point is equal to a weighted average (or median) of nearby lattice points, with the nearest points getting the highest weights.
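The moving median filter mentioned above can be sketched with SciPy's ndimage module; the 3-by-3 window and the toy image below are assumptions made for illustration.

```python
import numpy as np
from scipy.ndimage import median_filter

# Hypothetical noisy image: a lattice of pixel responses with two surface types.
rng = np.random.default_rng(0)
image = np.zeros((8, 8))
image[:, 4:] = 1.0                                # two homogeneous regions
image += rng.normal(scale=0.2, size=image.shape)  # pixel-level noise

# A 3x3 moving-median filter: each smoothed pixel is the median of itself and
# its nearest lattice neighbors, which suppresses isolated noisy pixels while
# largely preserving the boundary between the two regions.
smoothed = median_filter(image, size=3)
```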

6 References

Cressie, N. (1993), Statistics for Spatial Data, Revised Edition, Wiley: New York.

Harkness, R. D. and Isham, V. (1983), “A bivariate spatial point pattern of ants’ nests,” Applied Statistics, 32, 293–303.

Matérn, B. (1986), Spatial Variation, 2nd Edition, Berlin: Springer-Verlag.

McArthur, R. D. (1987), “An Evaluation of Sample Designs for Estimating a Locally Concentrated Pollutant,” Communications in Statistics – Simulation and Computation, 16, 735–759.

Rathbun, S. L. and Cressie, N. (1990), “A space-time survival point process for a longleaf pine forest in southern Georgia,” Journal of the American Statistical Association.

Thompson, S. K. (1992), Sampling, Wiley: New York.
