The picture of a law-abiding cyclist

A bit of fun, R Stuff

A summary of my commutes

This is a picture of my commutes between home and work over the past three years. The points are coloured by the speed recorded by the GPS device: red points are slower and white points are faster. The individual points from all my commutes are overlaid on each other, and opacity is a proxy for the number of times a road segment has been traversed, so segments with stronger colour have been traversed more often. It is important to note that raw GPS speed records are not particularly accurate, depending on factors such as the number of satellites in view and multipath from buildings and trees, so the colours give only an indication of approximate speed.

Activities around UCL

In the second picture, the clustering of red points around road intersections represents stop points, showing that I am a law-abiding cyclist! Rightly or wrongly, cyclists as a group have a bad reputation in London for not stopping at red lights. As with any group of road users, the majority of cyclists obey the rules of the road; unfortunately, red-light running is highly visible because junctions naturally have a captive audience. I will not get into the debate on this issue, but when looking at my commutes I found it interesting that all the junctions and pedestrian crossings were clearly visible. In the future, I hope to look at the various causes of delay for cycle commuters in more detail.

The method

Garmin activities come in a format called .tcx, which is similar to a .gpx file but can also store additional data such as heart rate and cadence. For reference, a single trackpoint looks something like the simplified example below (the values are made up, and the exact fields and namespaces vary by device).
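<Trackpoint>
  <Time>2014-05-12T08:15:03Z</Time>
  <Position>
    <LatitudeDegrees>51.5245</LatitudeDegrees>
    <LongitudeDegrees>-0.1340</LongitudeDegrees>
  </Position>
  <AltitudeMeters>32.0</AltitudeMeters>
  <DistanceMeters>1042.3</DistanceMeters>
  <HeartRateBpm>
    <Value>151</Value>
  </HeartRateBpm>
</Trackpoint>

I imported the activities into R and converted them to a data frame using the following code: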

library(XML) #Load the XML library

# Place your .tcx files in a folder on their own and list files
files <- list.files()

# Create an empty data frame to store the activities, using the column
# headers from the .tcx files
actv <- data.frame("value.ActivityID"=NA,
	"value.Time"=NA,
	"value.Position.LatitudeDegrees"=NA,
	"value.Position.LongitudeDegrees"=NA,
	"value.AltitudeMeters"=NA,
	"value.DistanceMeters"=NA,
	"value.Value"=NA,
	"value.Speed"=NA,
	"Time"=NA)

# Loop through the files and fill up the data frame
for(i in 1:length(files))
	{
	doc <- xmlParse(files[i])
	nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
	if(length(nodes) > 0)  # Check that there is data in the activity
		{
		# Convert nodes to a data frame and give the activity an ID
		mydf <- cbind(i, plyr::ldply(nodes, function(x) as.data.frame(xmlToList(x))))
		colnames(mydf)[1] <- "value.ActivityID"

		# I included this as some of my activities had different numbers of
		# fields. This may not be needed (the majority had 9).
		if(ncol(mydf)==9)
			{
			actv <- rbind(actv, mydf)
			}
		}
	}

To make the visualisations, I first converted the data frame to a SpatialPointsDataFrame using the following function:

library(sp) # provides SpatialPointsDataFrame and CRS

tcxToPoints <- function(tcx=NA, actv=NA, proj4string=CRS("+proj=longlat +ellps=WGS84 +datum=WGS84"))
	{
	# A function to import Garmin Connect activities and convert them
	# to SpatialPointsDataFrame objects
	# Inputs:
	# tcx = an individual .tcx file
	# actv = a data frame of activities if using the code above
	# proj4string = coordinate reference system for the SpatialPointsDataFrame

	if(!is.na(tcx))
		{
		doc <- xmlParse(tcx)
		nodes <- getNodeSet(doc, "//ns:Trackpoint", "ns")
		mydf <- plyr::ldply(nodes, function(x) as.data.frame(xmlToList(x)))
		}
	else
		{
		mydf <- actv
		}
	# Remove rows with missing coordinates (this also drops the NA row
	# used to initialise actv). Logical indexing is used rather than
	# -which(), which returns an empty data frame when nothing is missing.
	mydf <- mydf[!is.na(mydf[,"value.Position.LatitudeDegrees"]),]

	coords <- cbind(as.numeric(as.matrix(mydf[,"value.Position.LongitudeDegrees"])),
		as.numeric(as.matrix(mydf[,"value.Position.LatitudeDegrees"])))
	pts <- SpatialPointsDataFrame(coords=coords, proj4string=proj4string,
		data=subset(mydf, select=-c(value.Position.LatitudeDegrees, value.Position.LongitudeDegrees))) # data without the coordinates
	mode(pts@data[,6]) <- "numeric" # column 6 is value.Speed, in metres per second

	# Create a speed column in kph
	pts@data <- cbind(pts@data, (pts@data[,6]*3600)/1000)
	colnames(pts@data)[ncol(pts@data)] <- "speedKph"

	# Change dates to POSIX format
	pts@data[,2] <- as.POSIXct(gsub("T", " ", pts@data[,2]))
	return(pts)
	}
tcxPoints <- tcxToPoints(actv=actv)
# Remove unrealistic speeds (in this case over 80 kph); which() is used
# so that rows with NA speeds are dropped as well
tcxPoints <- tcxPoints[which(tcxPoints@data$speedKph <= 80),]
# Transform to OSGB 1936 (the spTransform method comes from rgdal)
library(rgdal)
tcxPointsOSGB <- spTransform(tcxPoints, CRSobj=CRS("+proj=tmerc +lat_0=49 +lon_0=-2 +k=0.9996012717 +x_0=400000 +y_0=-100000 +ellps=airy +datum=OSGB36 +units=m +no_defs"))

I carried out an intermediate step here to isolate journeys whose start and end points fell within 500 metres of my work location and 200 metres of my home location. A rough sketch of that step is below; it is not the exact code I used, and the home and work coordinates are placeholders.
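# Sketch: keep activities with one end near home and the other near work.
# The home/work coordinates are hypothetical OSGB 1936 eastings/northings.
home <- c(530000, 181000)
work <- c(529000, 182500)

isCommute <- function(id)
	{
	a <- tcxPointsOSGB[tcxPointsOSGB@data$value.ActivityID == id,]
	xy <- coordinates(a)
	ends <- xy[c(1, nrow(xy)),]  # first and last GPS fix
	dHome <- spDistsN1(ends, home)  # distances in metres (projected CRS)
	dWork <- spDistsN1(ends, work)
	(dHome[1] < 200 & dWork[2] < 500) | (dHome[2] < 200 & dWork[1] < 500)
	}
ids <- unique(tcxPointsOSGB@data$value.ActivityID)
commutes <- tcxPointsOSGB[tcxPointsOSGB@data$value.ActivityID %in%
	ids[sapply(ids, isCommute)],]

I then created the plot using the following code: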

plt <- tcxPointsOSGB # Or a subset thereof, e.g. the commutes object sketched above

# Create breaks and define the colour palette; the alpha value
# is used to define transparency
brks <- quantile(plt@data$speedKph, seq(0, 1, 1/5), na.rm=TRUE)
cols <- colorRampPalette(c(rgb(1,0,0,0.025),
	rgb(1,1,1,0.05)), alpha=TRUE)(5)

# Set background to black and plot. all.inside=TRUE keeps the maximum
# speed inside the fifth (last) colour bin rather than off the end
par(bg="black")
plot(plt, col=cols[findInterval(x=plt@data$speedKph, vec=brks, all.inside=TRUE)], pch=1, cex=0.35)
par(bg="white") # Reset background to white

# Create a palette for the legend without transparency
legCols <- colorRampPalette(c(rgb(1,0,0,1),
	rgb(1,1,1,1)), alpha=TRUE)(5)

# Add a legend using the bounding box for positioning;
# leglabs() comes from the maptools package
library(maptools)
legend(x=bbox(plt)[1,1], y=bbox(plt)[2,2],
	legend=leglabs(round(brks, digits=2)),
	fill=legCols, cex=0.75, title="Speed (km/h)",
	text.col="white", bty="n")

# Add a North arrow, using the bounding box for positioning and
# scaling the arrow to one fifth of the map's east-west extent
arrowHeight <- (bbox(plt)[1,2]-bbox(plt)[1,1])/5
arrows(bbox(plt)[1,2], bbox(plt)[2,1], bbox(plt)[1,2],
	bbox(plt)[2,1]+arrowHeight, col="white", code=2, lwd=3)

Hopefully this code can be applied to any .tcx file without much modification, although .tcx files may vary according to the functionality of the device. I would be interested to hear from anyone who applies this code to their own data and comes across any problems.

Loading PostGIS geometries into R without rgdal – an approach without loops

R Stuff

Some work I have been doing recently has involved setting up a PostGIS database to store spatio-temporal data and the corresponding geometries. I like to do my analysis in R, so I needed to import the well-known binary (WKB) geometries into the R environment. If you have drivers for PostGIS in your gdal installation this is straightforward, and instructions are here. However, getting the drivers can be tricky on Windows and was not something I wanted to spend time on. Luckily, it is possible to get around the problem by returning the geometries as well-known text (WKT) from PostGIS and converting them to spatial objects in R. Lee Hachadoorian describes a way of doing this here. The issue with Lee’s method (as he points out) is that it uses loops, which are notoriously slow in R. I wanted to avoid loops and came up with the following solution using RPostgreSQL, rgeos and sp:

# Load the required packages
require(RPostgreSQL)
require(rgeos)
require(sp)
# Create the database driver
dDriver <- dbDriver("PostgreSQL")
# Connect to the server, replacing these credentials with your own
conn <- dbConnect(dDriver, user="user", password="password", dbname="dbname")

In the following query I select the ID (id) and geometry (geom) columns from a table called ‘your_geometry’, keeping only the rows whose geometry falls within a specified bounding box. The function ST_AsText is used to convert the WKB to WKT.

# Select and return the geometry as WKT
rs <- dbSendQuery(conn, 'select id, ST_AsText(geom) from your_geometry where
your_geometry.geom && ST_MakeEnvelope(-0.149, 51.51, -0.119, 51.532, 4326);')
# Fetch the results
res <- fetch(rs, -1)
dbClearResult(rs)

The readWKT function converts the WKT to R Spatial objects such as SpatialPoints, SpatialLines or SpatialPolygons. In this case I am using lines.

# Use the readWKT function to create a list of SpatialLines 
# objects from the PostGIS geometry column
str <- lapply(res[,2], "readWKT", p4s=CRS("+proj=longlat +datum=WGS84"))
# Add the IDs to the SpatialLines objects using spChFIDs
coords <- mapply(spChFIDs, str, as.character(res[,1]))

Now it is a case of creating a SpatialLinesDataFrame and adding the remaining attributes from the attribute table.

# Query the remaining fields in the attribute table
rs <- dbSendQuery(conn, 'select * from your_geometry where
your_geometry.geom && ST_MakeEnvelope(-0.149, 51.51, -0.119, 51.532, 4326);')
res <- fetch(rs, -1)
dbClearResult(rs)
# Create a SpatialLinesDataFrame with the geometry and the 
# attribute table
rownames(res) <- res[,1]
# Here I assume the geometry is in the last column and remove it from the attributes
data <- SpatialLinesDataFrame(
	SpatialLines(unlist(lapply(coords, function(x) x@lines)),
		proj4string=CRS("+proj=longlat +datum=WGS84")),
	res[,-ncol(res)])

The tricky bit of this was working out how Line, Lines, SpatialLines and SpatialLinesDataFrame objects interact. readWKT makes SpatialLines objects. If you try to create a SpatialLinesDataFrame directly from one, you get a single feature with room for only one row of attributes. It is therefore necessary to extract the Lines from each SpatialLines object and pool them back into a single SpatialLines… This was a real pain to work out, so I thought I would post it in case it is useful for anyone. To make the hierarchy concrete, here is a toy example (with made-up coordinates) of how the classes nest:
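# A Line is a bare coordinate matrix; a Lines groups one or more Line
# objects into a single feature with an ID; a SpatialLines collects
# features and attaches a coordinate reference system
ln <- Line(cbind(c(0, 1), c(0, 1)))
lns <- Lines(list(ln), ID="a")
sl <- SpatialLines(list(lns), proj4string=CRS("+proj=longlat +datum=WGS84"))

# readWKT returns a SpatialLines holding a single Lines feature in its
# @lines slot; pooling those @lines lists is what the unlist(lapply(...))
# call above does
sl2 <- readWKT("LINESTRING(0 0, 1 1)", id="b")
length(sl2@lines)  # 1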

Note that I have queried the same data twice: first to select the geometry as WKT and then to get the remaining attributes. I did this because the query was fast and I didn’t want to type out the column names of the attribute table in the first query. It would be easy to do the whole operation in one query, along the lines of the untested sketch below (the wkt alias is my own):
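# Fetch the attributes and the WKT geometry in a single pass; the wkt
# column then feeds readWKT and is dropped from the attribute table
rs <- dbSendQuery(conn, 'select *, ST_AsText(geom) as wkt from your_geometry where
your_geometry.geom && ST_MakeEnvelope(-0.149, 51.51, -0.119, 51.532, 4326);')
res <- fetch(rs, -1)
dbClearResult(rs)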

If you have any ways of simplifying this or speeding it up further I would be interested to know!