The picture of a law abiding cyclist

A bit of fun, R Stuff

A summary of my commutes

This is a picture of my commutes between home and work over the past three years. The points are coloured by the speed value recorded by the GPS device. Red points are slower and white points are faster. The individual points from all my commutes are overlaid on each other and the opacity is a proxy for the number of times a road segment has been traversed. Road segments with stronger colour have been traversed more often. It is important to note that raw GPS speed records are not particularly accurate and depend on factors such as the number of satellites in view, multipath from buildings and trees etc. Therefore it is just an indication of approximate speed.

AroundWork

Activities around UCL

In the second picture, the clustering of red points around road intersections represents stop points, showing that I am a law abiding cyclist! Rightly or wrongly, cyclists as a group have a bad reputation in London for not stopping at red lights. As with any group of road users, the majority of cyclists obey the rules of the road. Unfortunately, the running of red lights is something that is very visible because junctions naturally have a captive audience. I will not get into the debate on this issue, but I found it interesting when looking at my commutes that I could clearly see all the junctions and pedestrian crossings appear. In the future, I hope to look into the various causes of delay for cycle commuters in more detail.

The method

Garmin activities come in a format called .tcx, which is like a .gpx file with the ability to store additional data like heart rate and cadence. I imported the activities into R and converted them to a data frame using the following code:

library(XML) #Load the XML library

# Place your .tcx files in a folder on their own and list files
files <- list.files()

# Create an empty data frame to store the activities using the column headers from the .tcx files
	
actv <- data.frame("value.ActivityID"=NA, "value.Time"=NA,"value.Position.LatitudeDegrees"=NA,
"value.Position.LongitudeDegrees"=NA,
"value.AltitudeMeters"=NA,           
"value.DistanceMeters"=NA,
"value.Value"=NA,
"value.Speed"=NA,                    
"Time"=NA) 

# Loop through the files and fill up the data frame
for(i in 1:length(files))
	{
	doc <- xmlParse(files[i])
	nodes 0)  #Check that there is data in the activity
		{
		# Convert nodes to a data frame and give the activity an ID
		mydf <- cbind(i, plyr::ldply(nodes, as.data.frame(xmlToList))) 
		colnames(mydf)[1] <- "value.ActivityID"
		
		# I included this as some of my activities had different numbers of 
		# fields. This may not be needed (majority had 9).
		if(ncol(mydf)==9)
			{
			actv <- rbind(actv, mydf)
			}
		}
	}

To make the visualisations, I first converted the data frame to a SpatialPointsDataFrame using the following function:

tcxToPoints <- function(tcx=NA, actv=NA, proj4string=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84"))
	{
	# A function to import Garmin Connect activities and convert them to spatialPoints objects
	# Inputs:
	# tcx = An individual .tcx file 
	# actv = A data frame of activities if using above code
	# proj4string = coordinate reference system for SpatialPointsDataFrame

	if(!is.na(tcx))
		{
		doc <- xmlParse(tcx)
		nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
		mydf <- plyr::ldply(nodes, as.data.frame(xmlToList))
		}
	else
		{
		mydf <- actv
		}
	# remove missing coordinates
	mydf <- mydf[-(which(is.na(mydf[,"value.Position.LatitudeDegrees"]))),]

	coords <- cbind(as.numeric(as.matrix(mydf[,"value.Position.LongitudeDegrees"])), as.numeric(as.matrix(mydf[,"value.Position.LatitudeDegrees"])))
	pts <- SpatialPointsDataFrame(coords=coords, proj4string=proj4string,
	data=subset(mydf, select=-c(value.Position.LatitudeDegrees, value.Position.LongitudeDegrees))) # data without the coordinates
	mode(pts@data[,6]) <- "numeric"
	
	# Create a speed column in kph
	pts@data <- cbind(pts@data, (pts@data[,6]*3600)/1000)
	colnames(pts@data)[ncol(pts@data)] <- "speedKph"
	
	# Change dates to POSIX format
	pts@data[,2] <- as.POSIXct(gsub("T", " ", pts@data[,2]))
	return(pts=pts)
	}
tcxPoints <- tcxToPoints(actv=actv)
# remove unrealistic speeds (in this case over 80kph)
tcxPoints <- tcxPoints[-which(tcxPoints@data[,8]>80),]
# transform to OSGB 1936
tcxPointsOSGB <- spTransform(tcxPoints, CRSobj=CRS("+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs"))

I carried out an intermediate step here to isolate journeys that had start and end points within 500 metres of my work location and 200 metres of my home locations. I then created the plot using the following code:

plt <- tcxPointsOSGB # Or a subset thereof

# Create breaks and define colour palette, the alpha value 
# is used to define transparency
brks <- quantile(plt@data$speedKph, seq(0,1,1/5), na.rm=T, digits=2)
cols <- colorRampPalette(c(rgb(1,0,0,0.025), 
rgb(1,1,1,0.05)), alpha=T)(5)

# Set background to black and plot
par(bg="black")
plot(plt, col=cols[findInterval(x=plt@data[,8], vec=brks)], pch=1, cex=0.35)
par(bg="white") #Reset background to white

# Create a palette for the legend without transparency
legCols <- colorRampPalette(c(rgb(1,0,0,1), 
rgb(1,1,1,1)), alpha=T)(5)

# Add a legend using the bounding box for positioning
legend(x=bbox(plt)[1,1], y=bbox(plt)[2,2], 
legend=leglabs(round(brks, digits=2)), 
fill=legCols, cex=0.75, title="Speed (km/h)",
text.col="white", bty="n")

# Add a North arrow using bounding box for positioning
# and height

arrowHeight <- (bbox(plt)[1,2]-bbox(plt)[1,1])/5
arrows(bbox(plt)[1,2], bbox(plt)[2,1], bbox(plt)[1,2], 
bbox(plt)[2,1]+arrowHeight, col="white", code=2, lwd=3)

Hopefully this code can be applied to any .tcx file without much modification, but .tcx files may vary according to the functionality of the device. I would be interested to know if anyone applies this code to their data and come across any problems.