A quick look at three years of commuting to UCL

A bit of fun

I love cycling and running, and around this time three years ago I purchased my first GPS watch to use as a training aid. As most who have purchased such a device will know, once you start using it you very soon start recording everything you do, even if it’s just commuting to and from work or going to the shops. If it isn’t recorded it didn’t happen, right?

Since I started using one, the popularity of GPS trackers has grown massively, and a huge number of apps have been designed to cater for this demand, including MapMyFitness, Endomondo and Strava, to name a few. These apps, Strava in particular, have created a whole new social phenomenon whereby people can compete against one another to get the fastest time on particular ‘segments’ and earn the kudos of being KOM or QOM (King or Queen of the Mountain).

What I find most interesting about this phenomenon is the vast amount of data that is being collected and stored on cyclists’ mobility patterns. As is the case with me, many people now track their daily commute by bicycle as a matter of course. This creates significant opportunities to research cycling commuting behaviour at the aggregate level.

There is a general feeling, I believe, that cyclists’ journey times are not affected by vehicular traffic. If there is a queue, a cyclist can just go up the inside, or overtake and bypass the queue completely. However, anyone who cycles regularly in London will know that it is often not that simple. London is an old city with narrow roads, and frequently it is too dangerous to bypass traffic, or there is simply not enough space. This means that there can be considerable variation in the time it takes to commute the same route by bicycle.

As a start, I wanted to analyse this quantitatively by looking at my own tracks. The first thing I did was to plot the durations of the activities against their distances, which you can see below. I thought it was quite interesting that I was able to qualitatively identify different activity types visually. For example, the two clusters of points arranged horizontally are commutes between UCL and my current home and previous home.

A quick view of a few years of activity data

 

The smaller cluster at 5000 metres and just over 20 minutes is the Wimbledon Common Park Run, a weekly 5k race that I do often. You can also clearly see the two distinct profiles for running and cycling activities.

What is interesting about the commuting activities is their horizontal extent on the plot. At a cursory glance, the mean commuting time is approximately 40 minutes for my old home location and 45 minutes for the new one, but there is considerable variation around these values. There are many reasons why commuting times may vary like this, including level of effort, wind direction, precipitation, traffic congestion, cycle congestion (i.e. the number of cyclists occupying the space available for cyclists), the precise time at which the activity started and ended, and variations in the route, amongst others. In future work, I hope to look in more detail at how to isolate these effects.

 

 

Loading PostGIS geometries into R without rgdal – an approach without loops

R Stuff

Some work I have been doing recently has involved setting up a PostGIS database to store spatio-temporal data and corresponding geometries. I like to do my analysis in R, so I needed to import the well-known binary (WKB) geometries into the R environment. If you have drivers for PostGIS in your GDAL installation, this is straightforward, and instructions are here. However, getting the drivers can be tricky on Windows and was not something I wanted to spend time doing. Luckily, it is possible to get around the problem by returning the geometries as well-known text (WKT) from PostGIS and converting them to spatial objects in R. Lee Hachadoorian describes a way of doing this here. The issue with Lee’s method (as he points out) is that it uses loops, which can be slow in R when objects are built up one row at a time. I wanted to avoid explicit loops and came up with the following solution using RPostgreSQL, rgeos and sp:

# Load the required packages
library(RPostgreSQL)
library(rgeos)
library(sp)
# Create the driver
dDriver <- dbDriver("PostgreSQL")
# Connect to the server, replacing with your credentials
conn <- dbConnect(dDriver, user="user", password="password", dbname="dbname")

In the following query I am selecting the ID (id) and geometry (geom) columns from a table called ‘your_geometry’ that fall within a specified bounding box. The function ST_AsText is used to convert WKB to WKT.

# Select and return the geometry as WKT
rs <- dbSendQuery(conn, 'select id, ST_AsText(geom) from your_geometry where
your_geometry.geom && ST_MakeEnvelope(-0.149, 51.51, -0.119, 51.532, 4326);')
# Fetch the results
res <- fetch(rs, -1)
dbClearResult(rs)

The readWKT function converts the WKT to R Spatial objects such as SpatialPoints, SpatialLines or SpatialPolygons. In this case I am using lines.

# Use the readWKT function to create a list of SpatialLines 
# objects from the PostGIS geometry column
str <- lapply(res[,2], readWKT, p4s=CRS("+proj=longlat +datum=WGS84"))
# Add the IDs to the SpatialLines objects using spChFIDs
coords <- mapply(spChFIDs, str, as.character(res[,1]))

Now it is a case of creating a SpatialLinesDataFrame and adding the remaining attributes from the attribute table.

# Query the remaining fields in the attribute table
rs <- dbSendQuery(conn, 'select * from your_geometry where
your_geometry.geom && ST_MakeEnvelope(-0.149, 51.51, -0.119, 51.532, 4326);')
res <- fetch(rs, -1)
dbClearResult(rs)
# Create a SpatialLinesDataFrame with the geometry and the 
# attribute table
rownames(res) <- res[,1]
# Here I assume the geometry is in the last column and remove it from the attributes
lines_list <- unlist(lapply(coords, function(x) x@lines))
data <- SpatialLinesDataFrame(
  SpatialLines(lines_list, proj4string=CRS("+proj=longlat +datum=WGS84")),
  res[,-ncol(res)])

The tricky bit of this was working out the way that Line, Lines, SpatialLines and SpatialLinesDataFrame objects interact. readWKT makes SpatialLines objects, one per row of the query result. If you try to create a SpatialLinesDataFrame from the result, it will give you a single feature with space for one attribute. Therefore, it is necessary to extract the Lines from each SpatialLines object and then combine them into a single SpatialLines object with one feature per row, which can be joined to the attribute table by ID. This was a real pain to work out, so I thought I would post it in case it is useful for anyone.
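To make the object hierarchy concrete, here is a minimal sketch that uses only sp, with two hand-built line features standing in for the readWKT output. The coordinates, the IDs, the `name` attribute and the helper `make_feature` are all made up for illustration, and the CRS is left unset to keep the example short.

```r
library(sp)

# Hypothetical stand-in for readWKT output: a SpatialLines object that
# holds a single Lines feature with the placeholder ID "1"
make_feature <- function(xy) {
  SpatialLines(list(Lines(list(Line(xy)), ID = "1")))
}

geoms <- list(
  make_feature(cbind(c(-0.140, -0.130), c(51.510, 51.520))),
  make_feature(cbind(c(-0.130, -0.120), c(51.520, 51.530)))
)
ids <- c("101", "102")  # illustrative IDs, as if read from the id column

# Give every feature a unique ID (SIMPLIFY = FALSE always returns a list)
geoms <- mapply(spChFIDs, geoms, ids, SIMPLIFY = FALSE)

# Each SpatialLines has a @lines slot holding a list of Lines objects;
# pool them into one flat list and build a single SpatialLines from it
lines_list <- unlist(lapply(geoms, function(x) x@lines), recursive = FALSE)
all_lines <- SpatialLines(lines_list)

# Row names of the attribute table must match the Lines IDs
attrs <- data.frame(id = ids, name = c("a", "b"), row.names = ids)
sldf <- SpatialLinesDataFrame(all_lines, attrs)
```

The nesting is Line inside Lines inside SpatialLines. The IDs live on the Lines objects, which is why they have to be set before the features are pooled.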

Note that I have queried the same data twice: first to select the geometry as WKT and then to get the remaining attributes. I did this because the query was fast and I didn’t want to manually type out the column names of the attribute table in the first query. It would be easy to do the whole operation in one query.
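One possible way to do it in a single query, sketched here under assumptions rather than tested against a real database, is to select every column and add the WKT as an extra aliased column. The helper `build_query` and the alias `wkt` are hypothetical names, and the table and column names are the same placeholders used above.

```r
# Build one query that returns all attribute columns plus the geometry
# as WKT. All names here are placeholders for illustration.
build_query <- function(table = "your_geometry", geom_col = "geom",
                        bbox = c(-0.149, 51.51, -0.119, 51.532),
                        srid = 4326) {
  sprintf(
    "select *, ST_AsText(%s) as wkt from %s where %s && ST_MakeEnvelope(%s, %d);",
    geom_col, table, geom_col, paste(bbox, collapse = ", "), as.integer(srid)
  )
}

sql <- build_query()

# With an open connection `conn`, one round trip would then be enough
# (left commented out because it needs a live database):
# res   <- DBI::dbGetQuery(conn, sql)
# wkt   <- res$wkt
# attrs <- res[, !(names(res) %in% c("geom", "wkt")), drop = FALSE]
```

Dropping the columns by name avoids having to assume that the geometry is the last column of the table.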

If you have any ways of simplifying this or speeding it up further I would be interested to know!