Artur Carter

0 %
Zak Abdel-Illah
Automation Enthusiast
  • 📍 Location
    🇬🇧 London
Certifications
  • AWS Solutions Architect - Associate
Languages
  • Python
  • Go
Technologies
  • Ansible
  • Linux
  • Terraform
  • Kubernetes

Bus route-finding with PostGIS and ArangoDB

December 2, 2024

I created a bus route finding tool using PostGIS and ArangoDB.

I wanted a different way to navigate from village to village using bus routes. Transport for London (TfL) provides spider maps (which is almost the perfect answer), but they don’t show long bus routes and are specific to London.

While online mapping services are the ‘de-facto’ approach for finding specific routes, I find that they don’t tend to resolve long routes well and sometimes don’t show all services available. Instead, I’d like a way to know which routes are scheduled to pass through which village and allow myself (or a tool) to resolve that.

An example of a spider map for Heathrow Airport. (From the TfL Website). I can visually see which buses I can use to reach Hounslow, Feltham or Hayes without worrying about the stops in-between, but I can’t see what’s further than these towns.

To get an overview of possible routes at scale, I built a workflow that uses;

  • OpenStreetMap data stored in PostgreSQL to resolve bus route connections
  • ArangoDB to store resolved bus route connections and the villages that they connect to
  • Python as an intermediary for bringing data from PostgreSQL to ArangoDB and presenting routes

Obtaining & Interpreting the OpenStreetMap Data

I imported the London dataset from Geofabrik into PostgreSQL using osm2pgsql. I have a database instance running locally on my MacBook Pro, but a cloud-managed instance (e.g using RDS) would work just as well.

osm2pgsql -d osm -U osmuser --create --slim --hstore -C 2000 --number-processes 4 greater-london-latest.osm.pbf

# -d [database name]
# -U [postgres username]
# -C 2000; for limiting the amount of RAM usage by the import tool

Visualizing the OpenStreetMap Data

QGIS is a great tool that allows me to explore the openstreetmap dataset directly from PostgreSQL (or file) in it’s intended form, a map. I’ll be using it to drive how my SQL queries are structured and to see what data points I can use to reach the end goal.

Adding a data source for viewing in QGIS can be done by right-clicking PostgreSQL and hitting “New Connection” inside the browser

Since I’m working on my local MacBook, simply providing the host and database is enough. I don’t need authentication in my ‘rapid-development’ scenario.

Once imported, I extracted planet_osm_line for routes and planet_osm_points for villages or ‘places of interest’. I do this by applying a filter of route = 'bus' on the planet_osm_line layer, and place = 'suburb' on planet_osm_point.

The Attribute Table in QGIS for planet_osm_point, which shows the dataset in its’ relational form, or in the way that PostgreSQL would present it. Here is where I isolate that I need to filter place by suburb.
Representation of the data points within planet_osm_point table from PostgreSQL, filtered down to show only entries that are of type suburb.
Representation of the planet_osm_line table from PostgreSQL, filtered down to show only entries that represent a bus route.

What am I trying to achieve?

The end-goal is to be able to resolve which planet_osm_line[] can reach from planet_osm_point A -> planet_osm_point B, or in a simpler terms, “Which buses can I use to get from town A to town B”. Not specifically “How can I get from my house to the specific store on the other side of town”.

Example of how I’ll achieve this in West London. Within GIS, a Buffer expands a point into an area, while an Intersection captures at any geometry that insects with another. The idea is to get all lines that pass through the blue circle.

I was able to create a 1km boundary from planet_osm_points directly in PostgreSQL by using the ST_Buffer GIS function, and to conditionally return using the ST_Intersects GIS function. Additionally, to filter out all the irrelevant data points, I return all points where place has a value of suburb, and all where all lines where the value of route is bus.

WITH suburb_buffer AS (
    SELECT name, place, ST_Buffer(way, 1000) AS buffer
    FROM planet_osm_point
    WHERE place = 'suburb'
)
SELECT DISTINCT line.ref, sb.name
FROM planet_osm_line line
JOIN suburb_buffer sb ON ST_Intersects(line.way, sb.buffer)
WHERE line.route = 'bus';
line.refsb.name
SL8Hanwell
SL8Hillingdon
SL8Shepherd’s Bush

With this data returned, I can clearly see that the path of the SL8 bus route passes through Hanwell, Hillingdon and Shepherd’s Bush.

I’m relying on the database to handle as much of the resolving logic as possible as opposed to any Python code, mainly for reducing time spent on maintaining Python code. In order, this query is doing the following;

  1. suburb_buffer represents a list of all suburbs according to OpenStreetMap.
    • way is a single point stored as GIS data.
    • I request a circle around way points that is 1km wide, and store it as buffer.
  2. sb is a single iteration of suburb_buffer, which represents planet_osm_points.
  3. line is a single iteration of planet_osm_line, restricted by line.route = 'bus'
  4. line is also restricted by whether ST_Intersects() returns True.
    • ST_Intersects() verifies whether the line in iteration matches the point in iteration.

I then moved this query to a Python script to translate this information to ArangoDB. I did this so that I can use the best tool for the job. As a graph database, I can quickly perform complex relational queries without much computational overhead.

Three collections exist in ArangoDB, where each returned value from PostgreSQL is converted into one document in each collection;

  1. route holds all bus routes, taken from line.ref (planet_osm_line)
  2. location holds all suburbs, taken from sb.name (planet_osm_point)
  3. route_connections is the confirmation returned from ST_Intersects() (route passes through location)
    • _to and _from are required keys in an Edge document.
    • After creating the route and location, I;
      • Take the _id of each
      • Set _to on the route_connection to the _id of location
      • Set _from on the route_connection to the _id of route
A graph view with Hayes (from planet_osm_point) as a starting point. A purple node represents a suburb, a grey node represents a route, and a line states that a route passes through a suburb.

AQL (ArangoDB Query Language) is used to traverse the graph and resolve routes.

FOR vtx, edge, path IN 1..6 ANY 'location/{START}' route_connection
    FILTER vtx.name == "{TARGET}"
    RETURN path

Here, I traverse from location/{START} to a location where location.name is {TARGET}. This is checked against the route_connection collection. For the result below, {START} is “Yiewsley” and {TARGET} is “Shepherd’s Bush”.

Finally, I built an “API Bridge”, returning a JSON representation of calculated routes for use by a possible Vue.JS interface or the likes.

"698 ➡️ Hillingdon ➡️ SL8, N207 ➡️ Shepherd's Bush ➡️"
{
	"Hillingdon": {
		"routes": ["698", "U1", "U3", "U5"],
		"next": {
			"Shepherd's Bush": {
				"routes": ["SL8"]
			}
		}
	}
}
A basic CLI implemented in Python, where a route is calculated from Yiewsley to Shepherd’s Bush. It definitely presents a lot more routes, some with a lot more stops, but it’s a start! This takes approximately 0.2 seconds to perform its’ calculation.

In the future, I’d to render spider maps using this data or to be able to interpret this data on-the-go on my iPhone. But at-least now I can get all the possible routes (extreme, quick and sometimes not possible, since 698 is the local school bus) from one village to another without a specific location in mind.

I’d also like to explore optimizing the data structure within the graph database, perhaps moving the route collection into route_connections and eventually including stops and distance to provide more accurate resolves.

Posted in R&DTags:
© 2020. All rights reserved
© 2024 All Rights Reserved. me@zai.dev