Traffic tickets in Uruguay (2013-2014)

This is a quick, simple descriptive summary of traffic tickets in Montevideo. I put it together in about 25 minutes using pandas, IPython Notebook, and matplotlib, all of which are free. A notebook is also a fast way to share your work, and the results appear right below each cell.

For a great introduction to pandas, see http://pandas.pydata.org/pandas-docs/stable/tutorials.html

Importing modules

In [23]:
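import numpy as np  # Arrays and numerical routines that pandas builds on
import pandas as pd  # Labeled tables and time series, built on top of numpy
import matplotlib  # Plotting library
import scipy
%pylab inline
# Makes graphs appear inline and pulls numpy and matplotlib names (plt, figsize) into the namespace
figsize(15, 5)  # Make the graphs a bit bigger
pd.set_option('display.mpl_style', 'default')  # Prettier plots; newer pandas releases no longer have this option, so drop the line if it raises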
Populating the interactive namespace from numpy and matplotlib

Importing the files

The datasets come from the Uruguay open data site. They required a little cleaning, so I placed the cleaned versions on Dropbox.

In [25]:
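# URLs must be quoted strings; if read_csv gets an HTML page back, changing dl=0 to dl=1 may force a direct download
M2014 = "https://www.dropbox.com/s/9d6bpp00z9gcq70/multas2014.csv?dl=0"
M2013 = "https://www.dropbox.com/s/3ksnfx80jrvw299/multas2013.csv?dl=0"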

data_2014 = pd.read_csv(M2014)
data_2013 = pd.read_csv(M2013)

data_2013["Fecha"] = pd.to_datetime(data_2013["Fecha"].str[0:10], errors='coerce', format="%d/%m/%Y")
#For 2013 I take the substring [0:10] of each value (.str slices the data, not the column name) because the data had some inconsistencies and I only wanted the date, not the timestamp
data_2014["Fecha"] = pd.to_datetime(data_2014["Fecha"], format="%d/%m/%Y  %H:%M", errors='coerce')
#errors='coerce' turns values that cannot be parsed into NaT instead of raising an error
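
To make the parsing step concrete, here is a minimal sketch on made-up values (the strings below are illustrative and not taken from the real files). It shows how slicing each value to its first 10 characters and passing errors='coerce' behave together.

```python
import pandas as pd

# Made-up sample values, not taken from the real files
raw = pd.Series(["05/03/2013 14:30", "17/11/2013", "not a date"])

# Keep only the first 10 characters (dd/mm/yyyy) of each value, then parse
dates = pd.to_datetime(raw.str[:10], format="%d/%m/%Y", errors="coerce")

# Values that failed to parse become NaT and can be counted or dropped
n_bad = int(dates.isna().sum())
clean = dates.dropna()

print(dates)
print("unparseable rows:", n_bad)
```

Counting the NaT values right after parsing is a cheap way to check how much of the column was lost to coercion.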

Summary

In this part we look at tickets per month, day of week, and hour. The first cell tabulates the counts. The histogram shows tickets per day of week, with 0 being Monday. Maybe cops are nicer on Sunday? Nah!

In [28]:
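#Tickets in 2014 by day of week, hour, and month
#(a notebook cell displays only its last expression; run each line in its own cell or wrap it in print() to see all three)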
data_2014["Fecha"].dt.dayofweek.value_counts()
data_2014["Fecha"].dt.hour.value_counts()
data_2014["Fecha"].dt.month.value_counts()
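
As a small illustration of what these tabulations return, the sketch below uses a handful of made-up timestamps in place of the real "Fecha" column. Note that value_counts() orders by frequency, so sort_index() is used here to get the days back in calendar order.

```python
import pandas as pd

# Made-up timestamps standing in for the "Fecha" column
fecha = pd.Series(pd.to_datetime([
    "2014-01-06 08:15",  # Monday
    "2014-01-06 17:40",  # Monday
    "2014-01-08 09:05",  # Wednesday
    "2014-01-12 23:50",  # Sunday
]))

# value_counts() sorts by frequency; sort_index() restores calendar order
by_weekday = fecha.dt.dayofweek.value_counts().sort_index()
by_hour = fecha.dt.hour.value_counts().sort_index()

print(by_weekday)
print(by_hour)
```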
In [39]:
plt.title("Tickets By Day")
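plt.xlabel("Day of week (0 = Monday)")
plt.ylabel("Frequency")
plt.axis([0, 7, 0, 45000])  # assigning the list to a variable named axis has no effect; plt.axis() applies the limits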
data_2014["Fecha"].dt.dayofweek.plot(kind="hist",  bins=7, rwidth=.8 )
Out[39]:
<matplotlib.axes.AxesSubplot at 0x93ba1b0>

Distinct features

Next come a few more queries: 1) a cross tabulation between two columns, 2) a subsample of the data, and 3) the unique values in a column.

In [ ]:
pd.crosstab(data_2014["Fecha"].dt.month, data_2014["Ordenanza"])  # both columns must come from the same year's table
data_2014[data_2014["Fecha"].dt.month >10]["Articulo"].value_counts()
data_2014["Ordenanza"].unique()
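
The three queries can be tried on a tiny table first. This is only a sketch: the column names mirror the dataset, but the rows and category values are made up.

```python
import pandas as pd

# Made-up rows; the column names mirror the dataset, the values do not
df = pd.DataFrame({
    "Fecha": pd.to_datetime(["2014-10-03", "2014-11-15", "2014-11-20", "2014-12-01"]),
    "Ordenanza": ["A", "A", "B", "B"],
    "Articulo": ["1", "2", "2", "3"],
})

month = df["Fecha"].dt.month

# 1) Cross tabulation: rows = month, columns = Ordenanza, cells = counts
table = pd.crosstab(month, df["Ordenanza"])

# 2) Subsample: a boolean mask keeps only November and December
late = df[month > 10]["Articulo"].value_counts()

# 3) Distinct values, in order of first appearance
ordinances = df["Ordenanza"].unique()

print(table)
print(late)
print(ordinances)
```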

Plotting

Finally we plot the basic graphs. We plot the distributions by month and see that there were more tickets in 2014.

In [51]:
plt.title("Tickets By Month")
plt.xlabel("Month")
plt.ylabel("Frequency")
plt.axis([0, 12, 0, 45000])  # assigning the list to a variable named axis has no effect; plt.axis() applies the limits
data_2014["Fecha"].dt.month.plot(kind="hist",  bins=12 , color='b',  label='2014', rwidth=.8) #Remember the number of bins 
data_2013["Fecha"].dt.month.plot(kind="hist",  bins=12, color='r', alpha=0.3,  label='2013', rwidth=.8)
plt.legend()
Out[51]:
<matplotlib.legend.Legend at 0xdd11730>
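
The year-over-year comparison can also be checked numerically instead of by eye. The sketch below builds a month-by-year table of counts from made-up dates (not the real data); the names fecha_2013, fecha_2014, and monthly are illustrative only.

```python
import pandas as pd

# Made-up dates standing in for the two "Fecha" columns
fecha_2013 = pd.Series(pd.to_datetime(["2013-01-10", "2013-02-03", "2013-02-20"]))
fecha_2014 = pd.Series(pd.to_datetime(["2014-01-05", "2014-01-19", "2014-02-11", "2014-02-25"]))

# One column per year, one row per month; months missing in a year count as 0
monthly = pd.DataFrame({
    "2013": fecha_2013.dt.month.value_counts(),
    "2014": fecha_2014.dt.month.value_counts(),
}).fillna(0).astype(int).sort_index()

# Total tickets per year
totals = monthly.sum()

print(monthly)
print(totals)
```

Calling monthly.plot(kind="bar") on a table like this draws the two years side by side, which avoids overlapping translucent histograms.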