The picture of a law-abiding cyclist


A summary of my commutes

This is a picture of my commutes between home and work over the past three years. The points are coloured by the speed value recorded by the GPS device: red points are slower and white points are faster. The individual points from all my commutes are overlaid on each other, so the opacity is a proxy for the number of times a road segment has been traversed; road segments with stronger colour have been traversed more often. Note that raw GPS speed records are not particularly accurate and depend on factors such as the number of satellites in view and multipath from buildings and trees, so the colours are only an indication of approximate speed.
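
To see why overlaid, nearly transparent points behave like a traversal count, here is a minimal sketch. It assumes standard alpha compositing, where n overlapping points of alpha a build up to a combined opacity of 1 - (1 - a)^n. The function name stackedOpacity is made up for illustration, and the two alpha values are the ones used in the plotting code further down. Real graphics devices round colours, so treat the numbers as approximate.

```r
# Combined opacity after n overlapping points, each drawn with alpha a,
# assuming standard "over" alpha compositing
stackedOpacity <- function(a, n) 1 - (1 - a)^n

# Number of times a road segment has been traversed (illustrative values)
passes <- c(1, 10, 50, 100, 300)

# Opacity reached at each end of the colour ramp
opacity <- data.frame(
  passes    = passes,
  slowRed   = stackedOpacity(0.025, passes),  # alpha of the red (slow) end
  fastWhite = stackedOpacity(0.05, passes)    # alpha of the white (fast) end
)
print(opacity)
```

Opacity rises steeply over the first few dozen passes and then saturates, so the colour strength separates rarely used roads from regular ones better than it separates regular ones from each other.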


Activities around UCL

In the second picture, the clustering of red points around road intersections represents stop points, showing that I am a law-abiding cyclist! Rightly or wrongly, cyclists as a group have a bad reputation in London for not stopping at red lights. As with any group of road users, the majority of cyclists obey the rules of the road. Unfortunately, the running of red lights is very visible because junctions naturally have a captive audience. I will not get into the debate on this issue, but I found it interesting when looking at my commutes that I could clearly see all the junctions and pedestrian crossings appear. In the future, I hope to look into the various causes of delay for cycle commuters in more detail.

The method

Garmin activities come in a format called .tcx, which is like a .gpx file with the ability to store additional data like heart rate and cadence. I imported the activities into R and converted them to a data frame using the following code:

library(XML) #Load the XML library

# Place your .tcx files in a folder on their own and list files
files <- list.files()

# Start with an empty object; rbind() takes the column headers
# (value.Time, value.Position.LatitudeDegrees, ...) from the first activity
actv <- NULL

# Loop through the files and fill up the data frame
for(i in seq_along(files))
{
	doc <- xmlParse(files[i])
	nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
	if(length(nodes) > 0)  #Check that there is data in the activity
	{
		# Convert nodes to a data frame and give the activity an ID
		mydf <- cbind(i, plyr::ldply(nodes, as.data.frame(xmlToList)))
		colnames(mydf)[1] <- "value.ActivityID"
		# I included this as some of my activities had different numbers of 
		# fields. This may not be needed (majority had 9).
		if(ncol(mydf) == 9)
		{
			actv <- rbind(actv, mydf)
		}
	}
}

To make the visualisations, I first converted the data frame to a SpatialPointsDataFrame using the following function:

library(sp) # CRS(), SpatialPointsDataFrame(), spTransform() and bbox()

tcxToPoints <- function(tcx=NA, actv=NA, proj4string=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84"))
{
	# A function to import Garmin Connect activities and convert them to spatialPoints objects
	# Inputs:
	# tcx = An individual .tcx file 
	# actv = A data frame of activities if using above code
	# proj4string = coordinate reference system for SpatialPointsDataFrame

	if(!is.na(tcx))
	{
		doc <- xmlParse(tcx)
		nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
		mydf <- plyr::ldply(nodes, as.data.frame(xmlToList))
	} else {
		mydf <- actv
	}
	# remove missing coordinates
	mydf <- mydf[!is.na(mydf[,"value.Position.LatitudeDegrees"]),]

	coords <- cbind(as.numeric(as.matrix(mydf[,"value.Position.LongitudeDegrees"])), as.numeric(as.matrix(mydf[,"value.Position.LatitudeDegrees"])))
	pts <- SpatialPointsDataFrame(coords=coords, proj4string=proj4string,
	data=subset(mydf, select=-c(value.Position.LatitudeDegrees, value.Position.LongitudeDegrees))) # data without the coordinates
	# Column 6 holds the recorded speed in metres per second
	pts@data[,6] <- as.numeric(as.character(pts@data[,6]))
	# Create a speed column in kph
	pts@data <- cbind(pts@data, (pts@data[,6]*3600)/1000)
	colnames(pts@data)[ncol(pts@data)] <- "speedKph"
	# Change dates to POSIX format
	pts@data[,2] <- as.POSIXct(gsub("T", " ", pts@data[,2]))
	return(pts)
}

tcxPoints <- tcxToPoints(actv=actv)
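# remove unrealistic speeds (in this case over 80kph)
# A logical index is used because -which() would drop every row if no speed exceeded 80
speedOK <- is.na(tcxPoints@data[,8]) | tcxPoints@data[,8] <= 80
tcxPoints <- tcxPoints[speedOK,]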
# transform to OSGB 1936
tcxPointsOSGB <- spTransform(tcxPoints, CRSobj=CRS("+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs"))
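
The speed handling above can be tried on a plain vector first. This is only a sketch: msToKph and the sample speeds are made up for illustration, and the 80 kph cut-off is the one used above.

```r
# Convert speeds from metres per second to kilometres per hour
msToKph <- function(ms) ms * 3600 / 1000

# Hypothetical raw speeds in m/s, including one GPS glitch and one missing value
speedMs  <- c(0, 4.2, 8.5, 30, NA)
speedKph <- msToKph(speedMs)

# Keep missing values and anything at or below the 80 kph cut-off
keep <- is.na(speedKph) | speedKph <= 80
print(speedKph[keep])
```

Using a logical index rather than negative positions means the filter still returns every point when no speed is above the cut-off.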

I carried out an intermediate step here to isolate journeys that had start and end points within 500 metres of my work location and 200 metres of my home locations. I then created the plot using the following code:

plt <- tcxPointsOSGB # Or a subset thereof

# Create breaks and define colour palette, the alpha value 
# is used to define transparency
brks <- quantile(plt@data$speedKph, seq(0,1,1/5), na.rm=TRUE, digits=2)
cols <- colorRampPalette(c(rgb(1,0,0,0.025), 
rgb(1,1,1,0.05)), alpha=TRUE)(5)

# Set background to black and plot
# all.inside=TRUE keeps the fastest point in the top colour class
par(bg="black")
plot(plt, col=cols[findInterval(x=plt@data[,8], vec=brks, all.inside=TRUE)], pch=1, cex=0.35)
par(bg="white") #Reset background to white

# Create a palette for the legend without transparency
legCols <- colorRampPalette(c(rgb(1,0,0,1), 
rgb(1,1,1,1)), alpha=T)(5)

# Add a legend using the bounding box for positioning
# (leglabs() comes from the maptools package)
legend(x=bbox(plt)[1,1], y=bbox(plt)[2,2], 
legend=leglabs(round(brks, digits=2)), 
fill=legCols, cex=0.75, title="Speed (km/h)",
text.col="white", bty="n")

# Add a North arrow using bounding box for positioning
# and height

arrowHeight <- (bbox(plt)[1,2]-bbox(plt)[1,1])/5
arrows(bbox(plt)[1,2], bbox(plt)[2,1], bbox(plt)[1,2], 
bbox(plt)[2,1]+arrowHeight, col="white", code=2, lwd=3)
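
The intermediate step of isolating commutes is not shown above. One possible way to do it is sketched here under assumptions: the points are in a metric projection such as OSGB 1936, they sit in a plain data frame with hypothetical columns id, x and y ordered by time, and the home and work coordinates are invented. The helper names distTo and isCommute are also hypothetical and not part of any package.

```r
# Hypothetical track points in a metric projection (e.g. OSGB eastings/northings),
# already ordered by time within each activity
pts <- data.frame(
  id = c(1, 1, 1, 2, 2, 2),
  x  = c(525000, 527000, 529500, 525050, 526000, 540000),
  y  = c(170000, 175000, 182000, 170020, 171000, 171000)
)

# Made-up home and work locations in the same coordinate system
home <- c(x = 525000, y = 170000)
work <- c(x = 529600, y = 182100)

# Straight-line distance in metres between a point and a location
distTo <- function(x, y, loc) sqrt((x - loc[["x"]])^2 + (y - loc[["y"]])^2)

# TRUE if a journey runs between home and work in either direction
isCommute <- function(d, homeTol = 200, workTol = 500) {
  n <- nrow(d)
  startHome <- distTo(d$x[1], d$y[1], home) <= homeTol
  endHome   <- distTo(d$x[n], d$y[n], home) <= homeTol
  startWork <- distTo(d$x[1], d$y[1], work) <= workTol
  endWork   <- distTo(d$x[n], d$y[n], work) <= workTol
  (startHome && endWork) || (startWork && endHome)
}

# Apply per activity and keep only the commuting journeys
flags    <- sapply(split(pts, pts$id), isCommute)
commutes <- pts[pts$id %in% names(flags)[flags], ]
print(commutes)
```

Only the first and last point of each activity are tested, so a journey that passes near home or work without starting or ending there is not counted.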

Hopefully this code can be applied to any .tcx file without much modification, but .tcx files may vary according to the functionality of the device. I would be interested to know if anyone applies this code to their data and comes across any problems.


A quick look at three years of commuting to UCL


I love cycling and running, and around this time three years ago I purchased my first GPS watch to use as a training aid. As most who have purchased such a device will know, once you start using it you very soon start recording everything you do, even if it’s just commuting to and from work or going to the shops. If it isn’t recorded it didn’t happen, right?

Since I started using one, the popularity of GPS trackers has grown massively and a huge number of apps have been designed to cater for this demand, including MapMyFitness, Endomondo and Strava to name a few. These apps, Strava in particular, have created a whole new social phenomenon whereby people can compete against one another to get the fastest time on particular ‘segments’ and earn the Kudos of being K or QOM (King or Queen of the Mountain).

What I find most interesting about this phenomenon is the vast amount of data that is being collected and stored on cyclists’ mobility patterns. As is the case with me, many people now track their daily commute by bicycle as a matter of course. This creates significant opportunities to research cycling commuting behaviour at the aggregate level.

There is a general feeling, I believe, that cyclists’ journey times are not affected by vehicular traffic. If there is a queue, a cyclist can just go up the inside, or overtake and bypass the queue completely. However, anyone who cycles often in London will know that it is often not that simple. London is an old city with narrow roads, and frequently it can be too dangerous to bypass traffic, or there is simply not enough space. This means that there can be considerable variation in the time it takes to commute the same route by bicycle.

As a start, I wanted to analyse this quantitatively by looking at my own tracks. The first thing I did was to plot the durations of the activities against their distances, which you can see below. I thought it was quite interesting that I was able to qualitatively identify different activity types visually. For example, the two clusters of points arranged horizontally are commutes between UCL and my current home and previous home.
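
A plot like this needs one duration and one distance per activity. Here is a minimal sketch of how that could be derived, assuming a hypothetical data frame trk with an activity id, a timestamp and a cumulative distance in metres (made-up values below). It is not necessarily how the figure was produced.

```r
# Hypothetical trackpoints: activity ID, timestamp and cumulative distance in metres
trk <- data.frame(
  id   = c(1, 1, 1, 2, 2),
  time = as.POSIXct(c("2013-05-01 07:30:00", "2013-05-01 07:50:00",
                      "2013-05-01 08:12:00", "2013-05-04 09:00:00",
                      "2013-05-04 09:21:30"), tz = "UTC"),
  dist = c(0, 7000, 14500, 0, 5000)
)

# One row per activity: elapsed minutes and total distance
perActivity <- do.call(rbind, lapply(split(trk, trk$id), function(d) {
  data.frame(
    id          = d$id[1],
    durationMin = as.numeric(difftime(max(d$time), min(d$time), units = "mins")),
    distanceM   = max(d$dist) - min(d$dist)
  )
}))

# Duration on the horizontal axis, so that repeated trips over the same
# distance line up horizontally
plot(perActivity$durationMin, perActivity$distanceM,
     xlab = "Duration (minutes)", ylab = "Distance (m)", pch = 16)
```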

A quick view of a few years of activity data


The smaller cluster at 5000 metres and just over 20 minutes is the Wimbledon Common Park Run, a weekly 5k race that I do often. You can also clearly see the two distinct profiles for running and cycling activities.

What is interesting about the commuting activities is their horizontal extent on the plot. At a cursory glance, the mean commuting time is approximately 40 minutes for my old home location and 45 minutes for the new one, but there is considerable variation around this. There are many reasons why commuting times may vary like this, including level of effort, wind direction, precipitation, traffic congestion, cycle congestion (i.e. the number of cyclists occupying the space available for cyclists), the precise time at which the activity started and ended, and variations in the route, among others. In future work, I hope to look in more detail into how to isolate these effects.
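
As a rough illustration of putting numbers on that variation, the sketch below computes the mean and standard deviation of commute duration per route. The data frame commute and its values are invented for the example and are not my recorded times.

```r
# Hypothetical commute durations in minutes, labelled by home location
commute <- data.frame(
  route    = c("old", "old", "old", "new", "new", "new"),
  duration = c(38, 40, 42, 43, 45, 47)
)

# Mean and standard deviation of duration for each route
meanMin <- tapply(commute$duration, commute$route, mean)
sdMin   <- tapply(commute$duration, commute$route, sd)

routeStats <- data.frame(
  route   = names(meanMin),
  meanMin = as.numeric(meanMin),
  sdMin   = as.numeric(sdMin)
)
print(routeStats)
```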