Goals and Objectives:
The objective of this lab was to work with Excel tables and get comfortable with normalizing data. Normalizing data can be frustrating because there are no set standards for the way tables, and the data in them, are laid out. In GIS I, every piece of data came already normalized; GIS II is closer to what one would experience in the working world, where table normalization is part of the job. The next objective was to geocode the mines and then gauge how accurately each individual geocoded by comparing the point distance between one's own mines and the same mines geocoded by the rest of the class. This gives an idea of how much error was involved in the geocoding process. The outcome of this work is a shapefile containing all of the sand mines in Wisconsin, geocoded and ready to be analyzed spatially.
Methods:
Normalizing the data was a time-consuming task. The addresses in the initial spreadsheet come in one of two forms. The first form provides the street address outright, which is convenient. The second form is in PLSS (Public Land Survey System) notation, which doesn't help much when the data is moved to ArcMap. To get the table into ArcMap, the addresses had to be standardized: the difference between "St." and "Street" matters, and so does capitalization. The zip code and city also had to be pulled out of the address field and separated into their own columns. Converting an address from PLSS to a regular street address proved difficult; to accomplish this, the address locator tool on the geocoding toolbar can be used in conjunction with the PLSS data layer, which is available in the DNR database. Once the table was normalized it could be brought into ArcMap and geocoding could begin. The un-normalized table can be seen in figure 1, and the normalized table is shown in figure 2.
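The standardization itself could also be scripted. The sketch below shows the general idea in pandas; the file names, the "Location" column, and the abbreviation list are placeholders, not the actual fields from the lab spreadsheet.

```python
import pandas as pd

# Illustrative sketch: file and column names are assumptions, not the
# actual ones from the lab spreadsheet.
mines = pd.read_excel("mines_raw.xlsx")

# Split a combined "Street, City, ZIP" value into separate columns.
parts = mines["Location"].str.split(",", expand=True)
mines["Address"] = parts[0].str.strip()
mines["City"] = parts[1].str.strip()
mines["Zip"] = parts[2].str.strip()

# Standardize capitalization and common abbreviations so that
# "MAIN ST." and "Main Street" end up identical. A real script would
# need a fuller abbreviation list and care with names like "St. Croix".
mines["Address"] = (
    mines["Address"]
    .str.title()
    .str.replace(r"\bSt\b\.?", "Street", regex=True)
    .str.replace(r"\bRd\b\.?", "Road", regex=True)
)

mines.to_csv("mines_normalized.csv", index=False)
```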
The first step was to log in to ArcGIS Online so that it could connect with the computer and allow geocoding to be done. Geocoding is relatively straightforward and painless as long as the data has its ducks in a row. For me it worked right away, and all of my addresses were mapped with a high match rate.
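In script form, this step might look something like the sketch below. This assumes an arcpy environment signed in to ArcGIS Online; the workspace path, the locator reference, and the field-map string are placeholders, and the exact field mapping depends on the locator's input field names.

```python
import arcpy

# Sketch only: paths and the locator reference are assumptions.
arcpy.env.workspace = r"C:\GIS2\Lab1"

# Geocode the normalized table against an address locator.
# The field-map string pairs locator inputs with table columns;
# check the locator's documented input field names before running.
arcpy.geocoding.GeocodeAddresses(
    in_table="mines_normalized.csv",
    address_locator="Locators/WI_Composite",  # placeholder locator
    in_address_fields="Address Address; City City; ZIP Zip",
    out_feature_class="geocoded_mines.shp",
)
```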
Once geocoding was complete, all of the geocoded addresses for the class needed to be collected so a comparison of accuracy could be made against the classmates who had the same mines as I did. First, all of the class's shapefiles of mines needed to be merged into one feature class in ArcMap. This was frustrating, but after fixing a couple of fields the merge ran successfully. Before the comparison could be done, the mines with the same ID numbers as my own had to be queried out from all the rest. The query was long but simple enough, and it gave the desired result. A new feature class was created from the selected records. Then, by taking my geocoded mines and the same mines as geocoded by my classmates, I could run the point distance tool and discover just how accurate I had been during geocoding. Figure 3 below shows the resulting table.
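Strung together as a script, the merge, the query, and the distance comparison might look like the following sketch. The shapefile names, the "Mine_ID" field, and the ID values in the where clause are all hypothetical stand-ins for the lab's actual data.

```python
import arcpy

arcpy.env.workspace = r"C:\GIS2\Lab1"

# Merge every classmate's geocoded shapefile into one feature class.
# The list of input shapefiles is illustrative.
arcpy.management.Merge(
    inputs=["mines_student1.shp", "mines_student2.shp", "mines_student3.shp"],
    output="class_mines_merged.shp",
)

# Query out only the mines with the same IDs as my own.
# "Mine_ID" and the ID values are placeholders for the lab's field.
arcpy.analysis.Select(
    in_features="class_mines_merged.shp",
    out_feature_class="matching_mines.shp",
    where_clause="Mine_ID IN (101, 102, 103)",
)

# Measure how far my points fall from everyone else's versions
# of the same mines.
arcpy.analysis.PointDistance(
    in_features="my_geocoded_mines.shp",
    near_features="matching_mines.shp",
    out_table="point_distance.dbf",
)
```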
Results:
Figure 1. This shows the spreadsheet of mines before table normalization. As one can see, nothing is really standardized and the address column has more than just the address in it.
Figure 2. This shows the result of table normalization. Every column is neat and orderly, making the move to ArcMap possible and also making the table easier to read.
Figure 3. After running the point distance tool, one can see how far each individual mine is from all of the others. This is useful because it allows interpretation of the accuracy of the geocoding.
Discussion:
When it comes to the types of error that cause distance variation during geocoding, a couple are relevant here. The most common were systematic errors and operational errors, both caused by human influence. An example from this assignment would be the placement of mines on the map when trying to come up with an address: not everyone will pick the same spot, so error is introduced. Another large contributor was inherent error, which stems from attribute data input and digitizing. With attribute data input it is obvious where the error came from: each person in the class normalized their own mines, and no one did it exactly the same way as the next person. This results in inherent errors. One way to reduce them would be to set a standard for normalizing the data ahead of time.
As for telling which points are off and which ones are accurate, the answer can be tricky. To know for certain, we would need a highly accurate version of this data containing the exact addresses of the mines. We could then compare that reference to the geocoded addresses we came up with by running the point distance tool against it.
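Even without such a reference, summary statistics on the point distance table help flag the outliers. A minimal sketch, assuming the table from figure 3 was exported to CSV (INPUT_FID and DISTANCE are the Point Distance tool's default output fields):

```python
import pandas as pd

# Assumes the point distance table was exported to CSV first.
# INPUT_FID and DISTANCE are the Point Distance tool's output fields.
dist = pd.read_csv("point_distance.csv")

# A mine whose distances to classmates' versions are uniformly large
# was probably geocoded to the wrong spot by someone.
print(dist.groupby("INPUT_FID")["DISTANCE"].agg(["mean", "max"]))
```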
Conclusion:
Data normalization is crucial, not just for this assignment, but any time you work with spatial data and tables. It is a good idea to have a standard laid out ahead of time for data normalization; if the class had had that standard before the assignment, there would probably have been fewer headaches. Mislabeled fields came up a few times during the assignment, as did shapefiles projected differently from the rest. Geocoding is a very helpful tool in GIS, and it was useful to brush up on it in this assignment.