|Mind / Matter|
By David Ness
Thursday, September 6, 2001
The purpose of this note is to discuss the prospects for analyzing GPS tracking data, and producing maps of roads (paths) from that data.
GPS tracking data contains errors. Worse, it makes errors in two dimensions. That's the bad news. The good news is that we often have lots of tracks to work with. Many of the roads we drive on, we drive on many times. Each time we traverse a road we can record a new track, and over time these tracks will probably generate some relatively unbiased estimates of locations along the road, so we should be able to produce some estimates of where the road actually is.
In this note I am going to speak principally of tracks which are on roads where we are assumed to be driving. There is little in analyzing driving and roads that doesn't also apply to walking and paths, but for the purposes of discussion it is simpler just to assume one environment and not continually have to say `road or path' or engage in other circumlocutions.
Roads also have one differentiating characteristic: they are pretty much immobile and, over the kind of time frames we are concerned with, they do not change. Paths are somewhat more subject to local variation, and thus the data that comes from walking may have more variability, particularly percentage-wise, than data that comes in from driving on roads. While we may drive down the very right edge of the road one day, and rather hug the centerline on another, the difference is at most a few feet. Walking, we may well vary our path by substantially more than that, taking short-cuts and crossing streets at odd angles as we dodge cars that are driving past. And in walking we are somewhat more likely to capriciously change our direction than is likely desirable, or even possible, when driving on roads.
The problem is to reduce a collection of such measurements into an estimate of the road's position, and as the discussion here should make clear, this is distinctly not a trivial problem.
How Does GPS Work?
For those who don't understand it, GPS (the Global Positioning System) seems quite miraculous. In some ways it is. In other ways it is really very simple. From our standpoint the most important thing about the way it is structured is that most of the cost of the system goes into getting the satellites in place. The consumer side of the operation demands only very simple, quite inexpensive equipment.
This means that consumer grade GPS are now available to most of the people who would be interested in them. The cheapest units cost under $100, and while the more expensive models can still get up to the thousands of dollars, very good reliable equipment can be obtained for a few hundred dollars. This makes GPS a viable source of mapping information and makes the problem of reducing GPS tracks to valid maps a problem of much wider interest than was formerly the case.
At the core of GPS is the notion of a very precise measurement of time. Atomic clocks have made the hyper-accurate measurement of time quite possible. We can now keep time, without huge trouble, to the millionth or even the billionth of a second.
In a billionth of a second (we call it a nanosecond) an electrical signal will travel about a foot. This allows us to measure distance with great accuracy. If we can generate a time signal accurate to within 10 nanoseconds, for example, we can measure distances to an accuracy of about 10 feet.
Think about it a moment. Assume I have two perfectly synchronized clocks and that I can emit an electrical signal at one and detect it at the other. Assume I emit a pulse from one of the clocks at noon. If the other clock detects the pulse one one-thousandth of a second after noon, then it must be a million feet (a couple of hundred miles) away from the source of the signal. If I have two other clocks I can also measure my distance from them. If I know with some precision where these clocks are then I can get a quite precise notion of exactly where I am by looking at when the signals arrive---as that tells me how far I am from each of them.
But how can we, without carrying around an atomic clock in our pocket, know the time quite precisely? The answer is simple. Add another remote clock into the mix, and instead of solving the three-variable problem of position with three clocks, solve the four-variable problem of position plus time with four clocks. The math may be a little tougher, but only a little.
In essence, that's what GPS does. The satellites have hyper-accurate atomic clocks that are under constant `surveillance' by the people who manage the satellite system on the ground. The satellites also have unbelievable stability in their paths. So we know where they are with considerable accuracy at every point in time. And the satellites can broadcast enough information about their position so that very simple devices in the hands of the consumer can read the signals from the satellites and make appropriate corrections.
Now think about knowing, with considerable precision, the distance from our current location to a satellite. If we know where the satellite is, and we know our distance from it, then we know we are somewhere on a sphere in space that is exactly that distance from the satellite. If we know our distance from two separate satellites, then we know we are somewhere on the circle of intersection where the two spheres in space mesh (think about what you'd get if you were able to push two tennis balls together). If we bring a third satellite into the picture, then we are down to a point or two in space which must be our true location. Generally only one of the points `makes sense', and that's where we probably are.
What's in a Track?
Current hobbyist/consumer grade GPS devices generally create sequences of data points, called tracks, that consist of 3-tuples: time, latitude and longitude. For most consumer grade devices that I have seen, time is measured in whole seconds, while latitude and longitude are measured in degrees and minutes, with minutes reported to four decimal places. As a rough calculation, one minute of latitude is 25000 miles / (360 degrees * 60 minutes), or about 1.16 miles. This means that one ten-thousandth of a minute represents about 7 inches. At the moment consumer grade GPS are not accurate to anything like this degree of fineness, as errors are generally somewhere between several feet and (a small number of) hundreds of yards.
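The rough calculation above can be spelled out explicitly; the exact numbers depend on the round 25,000-mile figure used here for the earth's circumference:

```python
# Resolution of the last reported digit of latitude in a GPS track,
# using the round figure of 25,000 miles for the earth's circumference.
miles_per_minute = 25000 / (360 * 60)            # one minute of latitude
feet_per_unit = miles_per_minute * 5280 / 10000  # one ten-thousandth of a minute
inches_per_unit = feet_per_unit * 12
print(round(miles_per_minute, 3), round(inches_per_unit, 1))  # → 1.157 7.3
```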
The time estimates are very accurate but, as this indicates, not very precise. We do not have any real information about how synchronized the reports might be, so we have to assume that a report stamped 10:10:10 could come anything from just over 4 to just under 6 seconds after one stamped 10:10:05: the later report might be `late' (i.e. 10:10:10 was almost 10:10:11) or `early' (it was barely 10:10:10), and similarly for 10:10:05, producing a swing of up to a couple of seconds around the nominal 5-second interval.
Sampling the Track
Consumer grade devices can be set to provide tracks with at least three common sampling strategies. These strategies are:
Specifying time sampling causes the GPS to output a measure every X seconds, where we are allowed to specify any number from 1 on up. Current day GPS commonly have storage capacity for 30 minutes of samples at 1 per second. Usually I set my GPS to sample every 2 seconds; this seems to be a good compromise that keeps me from losing data, but also keeps me from having to download my tracks from the GPS more than once a day (I do about an hour's worth of driving on a typical day, it seems). If I am going to be traveling longer distances I may set the time delta to five seconds, or even 15 seconds if many hours of travel are involved.
Specifying distance causes a similar track to be made, but with this setting, points are recorded every X feet that are traveled. I have never used this option much and therefore don't have experience with it.
Specifying resolution causes the GPS to record points whenever the (vector) velocity of the unit changes dramatically. This happens, of course, during starts, stops and when turns are made. With this scheme, more points are recorded when one goes around a corner than when one proceeds, at a relatively even speed, down a straight road. This is quite a useful strategy for recording points, but it is both more difficult to control (i.e. it's easier to run out of storage space in the device and begin to lose points), and it may be harder to analyze the data that comes back. This last point is not yet established to my satisfaction, but I do have a hunch that it is likely true.
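My guess at how the resolution strategy behaves can be sketched as follows. This is an assumption about the device, not a vendor's published algorithm, and the thresholds and planar toy coordinates are invented for illustration:

```python
import math

def resolution_sample(points, speed_tol=2.0, heading_tol=15.0):
    """Keep a point whenever the velocity changes `dramatically'.
    points: list of (t, x, y) in seconds and feet -- toy planar data.
    The two tolerances are invented illustrative thresholds."""
    kept = [points[0]]
    prev_v = None
    for a, b in zip(points, points[1:]):
        dt = b[0] - a[0]
        vx, vy = (b[1] - a[1]) / dt, (b[2] - a[2]) / dt
        if prev_v is not None:
            dspeed = abs(math.hypot(vx, vy) - math.hypot(*prev_v))
            dh = abs(math.degrees(math.atan2(vy, vx) -
                                  math.atan2(prev_v[1], prev_v[0])))
            dh = min(dh, 360.0 - dh)           # wrap the heading difference
            if dspeed > speed_tol or dh > heading_tol:
                kept.append(b)
        prev_v = (vx, vy)
    if kept[-1] != points[-1]:
        kept.append(points[-1])
    return kept

# A straight run at 10 ft/s, then a 90-degree turn: only the corner
# (plus the endpoints) survives the sampling.
path = [(t, 10 * t, 0) for t in range(6)] + [(t, 50, 10 * (t - 5)) for t in range(6, 11)]
corners = resolution_sample(path)
```

Note how this reproduces the behavior described above: many points around a corner, few along a straight road at even speed.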
Why Analyzing Tracks is Hard
All of this would be pretty easy to handle if the data that we got from the GPS was error free. The complications arise because there are many sources of error, and all of them contribute to making it difficult to combine the data from several separate tracks in order to draw some useful conclusions about where the roads actually lie.
There are many sources of error. They have the cumulative impact of generally placing us several (small dozens would be typical) feet away from our actual location. In certain cases the errors can also be very much more substantial than that.
There Used to be SA
If we have any historical data we may have to contend with what is called SA Error. Prior to a Presidential Order issued by President Clinton, information from the satellites was purposely slightly altered to make GPS fuzzy. This was done for (probably misguided) security purposes at the behest of the Pentagon. This process, called Selective Availability and abbreviated SA, caused measurements of location to slowly `wander' by a mile or two per hour, even if you were standing still.
The military eventually realized that this procedure didn't contribute much to national security, and given that analysis Clinton ended the practice at midnight (UTC) on May 1, 2000. Since that time we have not had to contend with this artificially induced error source.
Unfortunately, that doesn't mean all error is gone. There are several different kinds of error that are introduced by environmental considerations. For example, multi-path error occurs when the GPS sees two slightly different copies of the same signal. This can be due to something as simple as an echo or reflection off of some nearby surface. Urban canyons are a common place to notice this kind of error, so good GPS measurements in a city may prove to be problematical.
This isn't the place to go into any detailed discussion of all of the possible sources of error. All that matters for our purposes is that there are enough sources of error present in a typical track so that the estimates of location we receive are often a modest distance from our true location.
Biases in the Data
In some situations there may even be some bias introduced into the data. This might occur, for example, if some unusual geometric configuration of the landscape in a particular place led to receiving reflected signals rather than direct signals. Fortunately, practical experience seems to indicate that such circumstances are quite rare, so we don't have to give them much consideration in the very preliminary analysis that we undertake in this note.
The GPS may also introduce some rather substantial errors itself. This is an unfortunate consequence of solving some other problems with GPS data. Most of the models that underlie this situation are not discussed in the public record, but lots of commentary on Usenet newsgroups devoted to GPS indicates that the basic assumptions made in the description which follows are not very far from reality.
It is not uncommon for GPS signals to drop out. Trees, buildings and other things may momentarily obscure a signal if we are in a moving vehicle. This may mean that we temporarily lose lock on the satellites. Rather than completely drop out, the GPS will typically attempt to find the lost signals again, and while doing so will assume that you continue to travel at roughly the same speed and in roughly the same direction as it knows you were doing over the past few seconds. Most of the time the signal will be quickly re-acquired; then a little clean-up can be performed and tracking will proceed quite normally.
Sometimes, however, this isn't the case. In my neighborhood, for example, there are lots of twisting roads. Worse, these roads are in little valleys that are prone to obscure the lines of sight to the satellites. So I quite frequently see drop-outs that occur and (falsely) continue me on some path when I have actually either stopped or made a turn to a new direction. This may go on, in extreme circumstances, for the better part of a minute or so. As a result the GPS may `think' I am several hundred yards from my actual position. If it then succeeds in getting a good signal, it may well correct my position indication to the proper place, but in doing so it will likely traverse some country that is not very near my actual path (for example through a neighbor's yard, or through the woods at a very sharp corner where I have an almost switchback-like turn). This can, obviously, produce data points that are quite inaccurate.
Don't Know the `Real' Location
Since at the end of the process all we have is the `track' we don't know where our real location was with respect to any of our data points. All we know is that the data is probably reasonably consistent with the general movement down the road, and if we are able to aggregate several trips, all made on separate occasions, we might be able to generate a reasonable estimate of our actual situation.
Notice, however, that recognizing the `distance off track' for any particular point is a non-trivial task. Assume, for a moment, that we have a true picture of the track, so that we could plot the point in question with accuracy. It is still not obvious where we were on the track when the point we have obtained was being created. There is certainly no reason to believe that we were, for example, at the closest point on the track; it may just be that we have a relatively large error. All we really can say, unless we have some implications from a more sophisticated model, is that we were somewhere on the track when the point was generated.
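Computing the nearest point on a track is itself straightforward; the caution above is that this nearest distance is only a lower bound on the actual error. A sketch, using toy planar coordinates:

```python
def closest_point_on_track(p, track):
    """Nearest point on a piecewise-linear track (list of (x, y) vertices)
    to an off-track fix p.  This gives only a lower bound on the true
    error: the fix need not have been generated at the nearest spot."""
    best, best_d2 = None, float('inf')
    for (ax, ay), (bx, by) in zip(track, track[1:]):
        abx, aby = bx - ax, by - ay
        denom = abx * abx + aby * aby
        if denom == 0:
            continue                     # repeated fix: zero-length segment
        t = ((p[0] - ax) * abx + (p[1] - ay) * aby) / denom
        t = max(0.0, min(1.0, t))        # clamp the projection to the segment
        qx, qy = ax + t * abx, ay + t * aby
        d2 = (p[0] - qx) ** 2 + (p[1] - qy) ** 2
        if d2 < best_d2:
            best, best_d2 = (qx, qy), d2
    return best, best_d2 ** 0.5

# A fix 3 units off a simple L-shaped track (toy planar coordinates).
q, dist = closest_point_on_track((5, 3), [(0, 0), (10, 0), (10, 10)])
```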
Regularity of Distortions
Looking at hordes of empirical data suggests that some simplifications may be possible. Not all of the errors in measurement appear to be all that random. For example, from time to time paths seem to be generated that closely parallel other paths. It is as though one of the satellites is slightly misplaced and therefore generates position information that has some regular displacement from the real positions. If this happens, then it might be reasonable to detect it, and then adjust the data in such a way as to extract significant information from it.
In any case, while we might have considerable trouble recognizing features when traversing a city with a regular block structure, most of our typical trips will be on roads where any regular distortion is quite likely to show up.
Implications from Path
We might be able to draw some conclusions from the actual pattern of a path, however. If there are lots of turns or other irregularities in a particular path we might be able to conclude something about the tracks by doing some kind of error minimization as we superimpose the paths on one another. Most paths, unless they are generated in highly regularized neighborhoods (square block patterns, for example) have rather distinctive characteristics which might well indicate how the points should be overlaid as we attempt to digest the information in a collection of tracks.
Implications from Speed
There are also some implications that we might be able to draw by looking at the implicit speed of the vehicle that is generating the track. While accelerations and decelerations are quite common in driving, they are most often associated with starts and stops (which tend to occur at regular places, such as stop signs) than with substantial variations in the course of regular driving.
Speed indications contain other information as well. All other things being equal, one would expect similar speeds on a particular road segment over several different trips.
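The implicit speed between two consecutive fixes can be computed from the track itself. A sketch using the standard haversine great-circle formula on a spherical earth (the 3959-mile mean radius is an approximation):

```python
import math

def implied_speed_mph(p1, p2):
    """Speed implied by two consecutive (t_seconds, lat_deg, lon_deg)
    fixes, using the haversine great-circle distance on a spherical
    earth.  The 3959-mile mean radius is an approximation."""
    (t1, la1, lo1), (t2, la2, lo2) = p1, p2
    r_miles = 3959.0
    phi1, phi2 = math.radians(la1), math.radians(la2)
    dphi, dlam = math.radians(la2 - la1), math.radians(lo2 - lo1)
    a = (math.sin(dphi / 2) ** 2 +
         math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    miles = 2 * r_miles * math.asin(math.sqrt(a))
    return miles / ((t2 - t1) / 3600.0)

# One degree of latitude covered in an hour is roughly 69 mph.
v = implied_speed_mph((0, 40.0, -75.0), (3600, 41.0, -75.0))
```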
Elimination of Data
It sometimes is possible to eliminate data points that are far off course. One of the common causes of such points is a weak signal, which causes the GPS to go into a `momentum' mode where it extrapolates velocity and direction in a way that produces data points from places (extensions of the real route) that we never actually reached.
One of the consequences of these extended routes is that they often result in route corrections that produce speeds that are out of range for the situation. While such corrections are not really common, they do occur with some frequency and can result in speeds that are clearly out of the normal range. I have several tracks from my neighborhood (where the roads are quite twisting) that show brief speeds of 150 mph or more. These are clearly artifacts, not real data. Neither my car nor my local roads are capable of dealing with such speeds. I also have one track, taken from a recent visit to New York City, that shows me spending about five minutes of the trip up on the northern border of Manitoba, having gotten there at a speed of well over 4,000 mph, at a time when I happen to know I was actually quite still, gazing at a rather pastoral Central Park scene.
It is clear that there are times when we can eliminate some outliers in the data that we have by plotting points on an existing map and observing (by sight) that they are too far off-track to be legitimate points. We may want to eliminate these points from our analysis in order to keep them from distorting the results of our mapping exercises.
It is also possible that knowing something about the density of roads in some particular area will allow us to trim our data set by removing a series of points which are too far off track, even though we know from external considerations that they were generated during travel on the road segment in question.
We can accomplish this because bad points, while they occasionally are just single points far away from all of the other data, will often occur in sequences---probably due to a time period where we receive a bad signal (reflection or multi-path, for example) for some period of time. This can distort a sequence of readings, perhaps by a detectable (and occasionally even correctable) amount.
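A minimal sketch of speed-based elimination, assuming the track has already been projected into a local planar frame measured in feet; the 90 mph ceiling is an arbitrary illustrative threshold:

```python
import math

def drop_speed_outliers(track, max_mph=90.0):
    """Drop fixes whose implied speed from the last kept fix is
    implausible.  track: list of (t_sec, x_ft, y_ft) in a local planar
    frame (toy coordinates); max_mph is an arbitrary threshold.
    Comparing against the last *kept* fix means a whole run of
    extrapolated points gets discarded, not just the first one."""
    kept = [track[0]]
    for t1, x1, y1 in track[1:]:
        t0, x0, y0 = kept[-1]
        mph = math.hypot(x1 - x0, y1 - y0) / (t1 - t0) * 3600 / 5280
        if mph <= max_mph:
            kept.append((t1, x1, y1))
    return kept

# Fixes every 2 seconds at about 20 mph, with one wild 1000-foot jump.
track = [(0, 0, 0), (2, 60, 0), (4, 120, 0), (6, 1120, 0), (8, 180, 0), (10, 240, 0)]
cleaned = drop_speed_outliers(track)
```

Because the comparison is always against the last point that survived, a sequence of bad readings (from a sustained multi-path episode, say) is rejected as a block rather than one point at a time.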
Sampling at different times
The modeling picture is complicated by the fact that the sampling mechanism of the GPS is quite granular, and its regularity is not known. In other words, while the nominal period between GPS samples is set, the actual regularity of the intervals between recording points is not known. It is not immediately clear how this should be reflected in the modeling process.
I'm not very happy with the approaches to this problem that I have tried so far. I do, however, have a mental picture of an approach that I'd like to try, and I am trying to figure out how to implement it in an effective way.
---> Give an overview the rubber mat model.
---> Also give an overview of the detecting intersections model.
A Draft Procedure
---> Attempt here to explain how one might produce an `integrated vector difference' of two paths that you lay one on top of the other, and allow some transforms, particularly rotation and displacement to be minimized. Then reduce the data to some form of least squares measures. Iterate with each new data path, and see if iterations gradually begin to converge.
Then maybe make some estimates and actually go out and try to survey the projections.
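One way to make the superposition step concrete is a least-squares rigid alignment (a 2-D Kabsch/Procrustes step). This is only a sketch of one plausible implementation, and it assumes the two tracks have already been resampled into corresponding point pairs, which is itself a nontrivial subproblem:

```python
import numpy as np

def align_rigid(moving, fixed):
    """Least-squares rotation R and displacement t mapping `moving' onto
    `fixed' (rows are corresponding 2-D track points) -- a standard
    Kabsch/Procrustes step."""
    mc, fc = moving.mean(axis=0), fixed.mean(axis=0)
    H = (moving - mc).T @ (fixed - fc)       # cross-covariance of the paths
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:                 # guard against a reflection
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = fc - R @ mc
    return R, t

# Toy check: rotate and shift a distinctive path, then recover the motion.
theta = np.radians(10)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
fixed = np.array([[0., 0.], [4., 0.], [6., 2.], [7., 5.], [7., 9.]])
moving = fixed @ R_true.T + np.array([3.0, -1.0])
R, t = align_rigid(moving, fixed)
aligned = moving @ R.T + t                   # should land back on `fixed'
```

Iterating this against an accumulating average of previously aligned tracks, and watching whether the residual sum of squares settles down, is one way to realize the convergence test sketched above.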
---> To be determined.
Sidebar: Modeling Simple Things
This problem domain is appropriate for the study of Commonplace Things. It is an excellent example of a very simple situation: it occurs every day, and is generally very well understood by just about everyone. Nevertheless, it produces a circumstance that is very difficult to model.
David Ness' summary of work can be found at http://mywebpages.comcast.net/dness
The data from GPS can provide us with information that lets us build maps of roads, etc. However, it doesn't make the problem of doing so trivial. This paper discusses some of the problems that we encounter on the way to tackling this task.