Friday, November 21, 2014

Network Analysis for Sand Mines and Rail Terminals

Intro

The goal of this lab was to use network analysis, python scripting, and data flow modeling to carry out the steps of the lab automatically.  This approach takes some time to set up and get running properly, but after that everything runs on its own.  Network analysis is a powerful tool that allows for many different types of routing, in this case routing to the nearest facility.  It is a key tool for logistics in many companies and is a smart way to figure out the most efficient route.  Python scripting is another great way to save yourself some time by letting scripts run the tools and create outputs for you.  Lastly, data flow modeling is another piece of the same pie: it lays out a process visually and then streamlines it.  In this instance we created a python script which narrowed our data by the criteria provided and created some feature classes.  We then made a data flow model, which we used to run network analysis and provide us with some statistics on the effects of sand trucks on roads.

Methods

First, a script was written in python to scale down our data based on criteria.  The criteria were that we didn't want to include any mines that have a railroad loading station, or any that are within 1.5km of a railroad, because in that case they would likely have built a spur.  The script applied those criteria and created new feature classes from the results.  More information on this part of the lab can be found in my last post.

Network Analyst is used to do efficient route modeling.  The Network Analyst tools were used in this lab to find the nearest facility, the facilities being the rail terminals in Wisconsin, as well as in Winona, MN.  The sand mines were loaded into the analysis as the incidents.  The data used in the network analysis came from a few sources: the network dataset that contains the roads came from ESRI StreetMap data, the mine data came from the Wisconsin DNR, and the rail terminal data was provided by the Department of Transportation.
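For reference, a closest facility workflow like this can also be scripted with arcpy's network analyst module. The sketch below is a minimal, hypothetical version; the paths, layer names, and impedance attribute are assumptions for illustration, not the exact ones used in the lab.

# Minimal closest-facility sketch using arcpy.na (hypothetical names/paths)
import arcpy

arcpy.CheckOutExtension("Network")                        # Network Analyst license
arcpy.env.workspace = r"C:\GIS\Exercise7\Exercise7.gdb"   # hypothetical geodatabase

network = r"C:\GIS\StreetMap\streets_ND"   # ESRI StreetMap network dataset (assumed)
terminals = "RailTerminals"                # facilities: the rail terminals
mines = "ActiveMines"                      # incidents: the sand mines

# Build a closest-facility layer that finds one terminal per mine
result = arcpy.na.MakeClosestFacilityLayer(network, "ClosestTerminal",
                                           "Miles", "TRAVEL_TO",
                                           default_number_facilities_to_find=1)
layer = result.getOutput(0)

arcpy.na.AddLocations(layer, "Facilities", terminals)
arcpy.na.AddLocations(layer, "Incidents", mines)
arcpy.na.Solve(layer)

# Copy the solved Routes sublayer out so the model can calculate mileage and cost
routes_name = arcpy.na.GetNAClassNames(layer)["CFRoutes"]
routes_lyr = arcpy.mapping.ListLayers(layer, routes_name)[0]
arcpy.CopyFeatures_management(routes_lyr, "MineToTerminalRoutes")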

The data flow model used in this lab allowed the automation of the routes, the creation of new fields, and the calculation and summarizing of said fields.  The two fields that were added were "Miles" and "Cost".  To calculate the Miles field I took the shape length of each route, which is in meters because of the projection, and divided it by 1,609 (roughly the number of meters in a mile). This equation gave me the number of miles.  In the second calculate field box I calculated the impact in dollars of having sand trucks drive on roads in Wisconsin by county.  The trucks took 50 trips there and back, meaning 100 total trips.  The impact on roads was 2.2 cents per mile. The equation for this calculation was ((100*[Miles])*2.2/100); dividing the total by 100 converts the result from cents to dollars.
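To make the arithmetic concrete, here is the same calculation written out in plain Python. The route length is just an example value, not one of the actual routes.

# Worked example of the Miles and Cost field calculations
shape_length_m = 48270.0          # example route length in meters

miles = shape_length_m / 1609     # ~1,609 meters per mile
# 50 round trips = 100 one-way trips; road impact is 2.2 cents per mile,
# and dividing by 100 converts cents to dollars
cost = (100 * miles) * 2.2 / 100

print(round(miles, 1))   # 30.0 miles
print(round(cost, 2))    # 66.0 dollars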

Figure 1. The model used in exercise 7 to calculate the impact in dollars on Wisconsin roads.

Results and Discussion

The results shown below are all hypothetical and should not be used to make any policymaking decisions, nor should they appear in any publications.

Figure 2 below shows a map of the quickest routes from sand mines to rail terminals.  An observation can be made that some of the routes go into Minnesota and use their roads.  For the stretches driven in Minnesota there is no impact on Wisconsin roads, as long as the mine uses the most efficient route.  In these cases it makes more sense economically to use a highway than to stay in Wisconsin and take back roads. As you can see, the rail terminals in Wood, Trempealeau, and Chippewa counties have numerous mines that utilize them.  This can be seen in the frequency field of figure 3. What figure 3 also shows is that those three counties are not the highest in the cost field ranking.  This is due in part to how short the routes are for most of the mines in those counties.  In counties like Burnett and Barron the frequency is lower, but the trips are a lot longer.  Should frac sand mines have to pay for the impacts they have on Wisconsin roads? Should more rail spurs be built to reduce the impact on roads? These are questions that the Wisconsin DOT and the state government will have to ponder as more and more mines become active in the state.


Figure 2. Map of routes from sand mines to rail terminals.

Figure 3. Table of frequency of routes per county as well as cost of trucking on roads by county.

Conclusions

After doing this analysis many questions come to mind about how the state plans to deal with the sand mines, especially since more and more will become active in the coming years.  Just by doing this hypothetical exercise one can see the impacts are happening to some degree, not to mention the other effects the excessive trucking has, like air pollution and noise pollution. Network Analyst can play a big role in answering some of these questions and will most likely be used when the state does decide to make a decision on this topic.

Wednesday, November 12, 2014

Python Script for Network Analysis Exercise

Making this python script was part one of exercise seven. In exercise seven we are looking at the effect commercial trucking has on roads from mines to rail stations. To analyze this question we first need to get rid of the mines that have rail spurs and the mines that are not active.  To do this we used python to build a query and acquire our specified results.

The first step was to set the environments for the script and import the system modules.  The workspace was set to our individual exercise seven geodatabase.  The next step was to set the variables we would be using in the script, which were just existing feature classes in the geodatabase, as well as some new ones that we would be creating.  The third step was to create three SQL statements that would query out the mines that we wanted.  The first SQL statement selected the active mines, the second statement selected all of the facilities with "Mine" as a type, and lastly the third statement kept out any mines with the word "Rail" in them. The SQL statements were then run with the addition of the variables created earlier.  The last step was to remove active mines that were within 1.5km of a railroad; any closer than that and they likely would have had a spur built to the railroad. To do this the arcpy.SelectLayerByLocation_management tool was used.  After those parts of the script were done and the script successfully ran, the result was a point feature class containing 41 mines.



Figure 1. The python script used to query active mines with no railroads in a 1.5km radius.
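Since the script itself only appears as an image above, here is a minimal sketch of the same workflow. All paths, feature class names, and field names are assumptions for illustration, not the exact ones from the lab.

# Sketch of the mine-selection script (hypothetical paths and field names)
import arcpy

arcpy.env.workspace = r"C:\GIS\Exercise7\Exercise7.gdb"   # hypothetical geodatabase
arcpy.env.overwriteOutput = True

mines = "Mines"          # existing mine points
rails = "Railroads"      # railroad lines
out_fc = "ActiveMines"   # new feature class the script creates

# Three SQL statements mirroring the criteria described above
sql_active = "STATUS = 'Active'"              # assumed field names/values
sql_mine = "FAC_TYPE LIKE '%Mine%'"
sql_norail = "FAC_TYPE NOT LIKE '%Rail%'"
where = " AND ".join([sql_active, sql_mine, sql_norail])

# Apply the attribute query, then drop mines within 1.5 km of a railroad
arcpy.MakeFeatureLayer_management(mines, "mines_lyr", where)
arcpy.SelectLayerByLocation_management("mines_lyr", "WITHIN_A_DISTANCE",
                                       rails, "1.5 Kilometers", "NEW_SELECTION")
# Invert the selection to keep only mines NOT near a rail line
arcpy.SelectLayerByAttribute_management("mines_lyr", "SWITCH_SELECTION")
arcpy.CopyFeatures_management("mines_lyr", out_fc)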

Friday, November 7, 2014

Data Normalization, Geocoding, and Error Assessment

Goals and Objectives:
     The objective of this lab was to work with excel tables and get comfortable with normalizing data.  Normalizing data can be very frustrating, because there are no set standards for the way tables, and their data, are displayed.  In GIS I every piece of data is normalized already.  In GIS II it is more like what one would experience in the working world, which involves table normalization.  Geocoding the mines was the next objective, followed by finding how accurately individuals geocoded by comparing the point distances between an individual's mines and the same mines geocoded by the rest of the class.  This gives an idea of how much error was involved in the geocoding process.  The outcome of this work is a shapefile containing all of the sand mines in Wisconsin.  The mines in the shapefile are all geocoded and ready to be analyzed spatially.
Methods:

Normalizing the data was a time consuming task.  The addresses in the initial spreadsheet were in one of two forms.  The first form provided the street address, which is good.  The second form was in PLSS, which doesn't really help when the data is moved to ArcMap.  To get the table into ArcMap the addresses had to be standardized; the un-normalized table can be seen in figure 1.  The difference between St. and Street matters, and so do capital letters.  The zip code and city also had to be pulled out of the address and separated into different columns.  Some difficulty was found in changing the addresses from PLSS to regular addresses.  To accomplish this the address locator tool on the geocoding toolbar could be used in conjunction with the PLSS data layer, which can be found in the DNR database.  Once the table was normalized it could be brought into ArcMap and geocoding could begin. The normalized table is shown in figure 2.
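As a simple illustration of the kind of splitting and standardizing involved, the snippet below breaks one combined address cell into separate columns. The sample address and the abbreviation fixes are made up for illustration; the actual work was done in excel.

# Hypothetical example of splitting one combined address cell into columns
raw = "4125 State Rd. 27, Augusta, WI 54722"

street, city, state_zip = [part.strip() for part in raw.split(",")]
state, zipcode = state_zip.split()

# Standardize abbreviations so the address locator sees consistent values
street = street.replace("Rd.", "Road").replace("St.", "Street")

print(street, "|", city, "|", state, "|", zipcode)
# 4125 State Road 27 | Augusta | WI | 54722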
The first step was to log in to ArcGIS Online so that it could connect with the computer and allow geocoding to be done.  Geocoding is relatively straightforward and painless as long as your data has its ducks in a row. For me it worked right away and all of my addresses were mapped with a high match rate.
Once geocoding was complete, all of the geocoded addresses for the class needed to be collected so a comparison of accuracy could be done with the individuals who had the same mines as myself. First all of the class's shapefiles of mines needed to be merged into one feature class in ArcMap.  This was frustrating, but after fixing a couple fields it ran successfully.  Before the comparison could be done, the mines with the same id numbers as my own needed to be queried out from all the rest.  The query was long, but simple enough, and I got the desired result.  With the selected results a new feature class was created.  Now by taking my geocoded mines and the same mines geocoded by other classmates, I could run the point distance tool and discover just how accurate I had been during geocoding.  Figure 3 below shows the resulting table.
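A scripted version of those steps might look roughly like this; the feature class names, id field, and id values are assumptions, not the actual class data.

# Rough sketch of the merge / query / point-distance comparison (hypothetical names)
import arcpy

arcpy.env.workspace = r"C:\GIS\Geocoding\Geocoding.gdb"   # hypothetical
arcpy.env.overwriteOutput = True

class_shapefiles = ["mines_student1", "mines_student2", "mines_student3"]
arcpy.Merge_management(class_shapefiles, "AllMines")

# Pull out the classmates' copies of the mines I also geocoded
my_ids = (101, 102, 103)                                  # hypothetical mine ids
where = "MINE_ID IN {0}".format(my_ids)
arcpy.MakeFeatureLayer_management("AllMines", "same_mines_lyr", where)
arcpy.CopyFeatures_management("same_mines_lyr", "SameMines")

# Distance from each of my points to every matching classmate point
arcpy.PointDistance_analysis("MyMines", "SameMines", "DistanceTable")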
Results:



Figure 1. This shows the spreadsheet of mines before table normalization. As one can see, nothing is really standardized and the address column has more than just the address in it.

Figure 2. This shows the result of table normalization.  Every column is neat and orderly, making the move to ArcMap possible and also making it easier to read.


Figure 3. After running the point distance tool one can see how far each individual mine is from all of the others. This is useful because it allows the accuracy of the geocoding to be interpreted.



Figure 4.
Discussion:

When it comes to the types of error that cause distance variation during geocoding, there are a couple that are relevant.  The most common errors were systematic errors and operational errors, which are caused by human influence.  An example that occurred during this assignment would be the placement of mines on the map when trying to come up with an address: everyone isn't going to pick the same spot, therefore systematic error is present.  Another large contributor to error in the assignment was inherent error.  This type of error spawns from attribute data input and digitizing.  When it comes to attribute data input it's obvious where the error came from: each one of us in the class normalized our own mines, and during that process no one did it exactly the same as the next person.  This results in inherent errors.  One way to fix this would be to have a standard set for normalizing data.
When it comes to the question of how to tell which points are off and which ones are accurate the answer can be tricky.  To find the exact answer to this question we would have to have a very accurate version of this data that had the exact addresses for the mines.  Then we could compare that to the geocoded addresses we came up with and run a point distance tool.  
Conclusion:

Data normalization is crucial, not just with this assignment, but anytime you work with spatial data and tables.  It is a good idea to have a standard laid out ahead of time for data normalization.  If we had that standard before the assignment the class would probably have had fewer headaches.  The problem of mislabeled fields came up a few times during the assignment, as did shapefiles projected differently from the rest.  Geocoding is a very helpful tool in GIS and it was very useful to brush up on it in this assignment.