Loading PostGIS geometries into R without rgdal – an approach without loops

R Stuff

Some work I have been doing recently has involved setting up a PostGIS database to store spatio-temporal data and corresponding geometries. I like to do my analysis in R, so I needed to import the well-known binary (WKB) geometries into the R environment. If you have drivers for PostGIS in your gdal installation, this is straightforward, and instructions are here. However, getting the drivers can be tricky on Windows and was not something I wanted to spend time doing. Luckily, it is possible to get around the problem by returning the geometries as well-known text (WKT) from PostGIS and converting them to spatial objects in R. Lee Hachadoorian describes a way of doing this here. The issue with Lee’s method (as he points out) is that it uses loops, which most R users know are very inefficient. I wanted to avoid loops and came up with the following solution using RPostgreSQL, rgeos and sp:

#Load the required packages
require(RPostgreSQL)
require(rgeos)
require(sp)
# create driver
dDriver <- dbDriver("PostgreSQL")
#connect to the server, replacing with your credentials
conn <- dbConnect(dDriver, user="user", password="password", dbname="dbname")

In the following query I am selecting the ID (id) and geometry (geom) columns from a table called ‘your_geometry’ that fall within a specified bounding box. The function ST_AsText is used to convert WKB to WKT.

# Select and return the geometry as WKT
rs <- dbSendQuery(conn, 'select id, ST_AsText(geom) from your_geometry where
your_geometry.geom && ST_MakeEnvelope(-0.149, 51.51, -0.119, 51.532, 4326);')
# Fetch the results
res <- fetch(rs, -1)
dbClearResult(rs)

The readWKT function converts the WKT to R Spatial objects such as SpatialPoints, SpatialLines or SpatialPolygons. In this case I am using lines.

# Use the readWKT function to create a list of SpatialLines 
# objects from the PostGIS geometry column
str <- lapply(res[,2], "readWKT", p4s=CRS("+proj=longlat +datum=WGS84"))
# Add the IDs to the SpatialLines objects using spChFIDs
coords <- mapply(spChFIDs, str, as.character(res[,1]))

Now it is a case of creating a SpatialLinesDataFrame and adding the remaining attributes from the attribute table.

# Query the remaining fields in the attribute table
rs <- dbSendQuery(conn, 'select * from your_geometry where
your_geometry.geom && ST_MakeEnvelope(-0.149, 51.51, -0.119, 51.532, 4326);')
res <- fetch(rs, -1)
dbClearResult(rs)
# Create a SpatialLinesDataFrame with the geometry and the 
# attribute table
rownames(res) <- res[,1]
# Here I assume the geometry is in the last column and remove it from the attributes
data <- SpatialLinesDataFrame(SpatialLines(unlist(lapply(coords, function(x) x@lines)),proj4string=CRS("+proj=longlat +datum=WGS84")), res[,-ncol(res)])

The tricky bit of this was working out the way that Line, Lines, SpatialLines and SpatialLinesDataFrame objects interact. readWKT makes SpatialLines objects. If you try to create a SpatialLinesDataFrame from the result, it will give you a single feature with space for one attribute. Therefore, it is necessary to extract the Lines from the SpatialLines, and then convert them back to individual SpatialLines objects… This was a real pain to work out, so I thought I would post it in case it is useful for anyone.

Note that I have queried the same data twice; first to select the geometry as WKT and then to get the remaining attributes. I did this because the query was fast and I didn’t want to manually type out the column names of the attribute table in the first query. It would be easy to do the whole operation in one query.

If you have any ways of simplifying this or speeding it up further I would be interested to know!

2 thoughts on “Loading PostGIS geometries into R without rgdal – an approach without loops

  1. Hi James Haworth,

    At first, thank’s a bunch! Your code helped me to get through this mentioned pain 😉
    Some questions / suggestions:
    – Why do you use dbSendQuery() and fetch() instead of just dbGetQuery? Imho the latter produces the same result and it’s a bit shorter.
    – I wouldn’t use ‘str’ as a variable name since str() is already a valuable function in R
    – For a clearer understanding it would have been better to provide the column names instead of the * in your second SQL statement
    – Your second link only leads to the homepage of r-bloggers.com
    – I did it with SpatialPolygons (Only some minor changes to apply):
    data <- SpatialPolygonsDataFrame(SpatialPolygons(unlist(lapply(coords, function(x) x@polygons)), proj4string=(CRS(p4s))), data=res[, ncol(res)])
    – I also didn't type in the proj4string CRS, I loaded it as follows:
    EPSG = make_EPSG()
    p4s = EPSG[which(EPSG$code == 31467), "prj4"]
    – It's the first time I worked with the sp-package and the following description helped a lot to understand the package' structure:

    regards
    Andreas

    1. Hi Andreas,
      Thanks for you comment and I’m glad my code was useful for you. Regarding your comments, the honest answer is I just did this in the quickest way possible and posted it because it was a bit of a pain! Thanks for your suggestions too!

      James

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s