I created a bus route finding tool using PostGIS and ArangoDB.
I wanted a different way to navigate from village to village using bus routes. Transport for London (TfL) provides spider maps (which is almost the perfect answer), but they don’t show long bus routes and are specific to London.
While online mapping services are the ‘de-facto’ approach for finding specific routes, I find that they don’t tend to resolve long routes well and sometimes don’t show all services available. Instead, I’d like a way to know which routes are scheduled to pass through which village and allow myself (or a tool) to resolve that.

To get an overview of possible routes at scale, I built a workflow that uses;
- OpenStreetMap data stored in PostgreSQL to resolve bus route connections
- ArangoDB to store resolved bus route connections and the villages that they connect to
- Python as an intermediary for bringing data from PostgreSQL to ArangoDB and presenting routes
Obtaining & Interpreting the OpenStreetMap Data
I imported the London dataset from Geofabrik into PostgreSQL using osm2pgsql. I have a database instance running locally on my MacBook Pro, but a cloud-managed instance (e.g using RDS) would work just as well.
osm2pgsql -d osm -U osmuser --create --slim --hstore -C 2000 --number-processes 4 greater-london-latest.osm.pbf
# -d [database name]
# -U [postgres username]
# -C 2000; for limiting the amount of RAM usage by the import tool
Visualizing the OpenStreetMap Data
QGIS is a great tool that allows me to explore the openstreetmap dataset directly from PostgreSQL (or file) in it’s intended form, a map. I’ll be using it to drive how my SQL queries are structured and to see what data points I can use to reach the end goal.

Adding a data source for viewing in QGIS can be done by right-clicking PostgreSQL and hitting “New Connection” inside the browser

Since I’m working on my local MacBook, simply providing the host and database is enough. I don’t need authentication in my ‘rapid-development’ scenario.
Once imported, I extracted planet_osm_line for routes and planet_osm_points for villages or ‘places of interest’. I do this by applying a filter of route = 'bus' on the planet_osm_line layer, and place = 'suburb' on planet_osm_point.

planet_osm_point, which shows the dataset in its’ relational form, or in the way that PostgreSQL would present it. Here is where I isolate that I need to filter place by suburb.
planet_osm_point table from PostgreSQL, filtered down to show only entries that are of type suburb.
planet_osm_line table from PostgreSQL, filtered down to show only entries that represent a bus route.What am I trying to achieve?
The end-goal is to be able to resolve which planet_osm_line[] can reach from planet_osm_point A -> planet_osm_point B, or in a simpler terms, “Which buses can I use to get from town A to town B”. Not specifically “How can I get from my house to the specific store on the other side of town”.

Buffer expands a point into an area, while an Intersection captures at any geometry that insects with another. The idea is to get all lines that pass through the blue circle.I was able to create a 1km boundary from planet_osm_points directly in PostgreSQL by using the ST_Buffer GIS function, and to conditionally return using the ST_Intersects GIS function. Additionally, to filter out all the irrelevant data points, I return all points where place has a value of suburb, and all where all lines where the value of route is bus.
WITH suburb_buffer AS (
SELECT name, place, ST_Buffer(way, 1000) AS buffer
FROM planet_osm_point
WHERE place = 'suburb'
)
SELECT DISTINCT line.ref, sb.name
FROM planet_osm_line line
JOIN suburb_buffer sb ON ST_Intersects(line.way, sb.buffer)
WHERE line.route = 'bus';
| line.ref | sb.name |
|---|---|
| SL8 | Hanwell |
| SL8 | Hillingdon |
| SL8 | Shepherd’s Bush |
With this data returned, I can clearly see that the path of the SL8 bus route passes through Hanwell, Hillingdon and Shepherd’s Bush.
I’m relying on the database to handle as much of the resolving logic as possible as opposed to any Python code, mainly for reducing time spent on maintaining Python code. In order, this query is doing the following;
suburb_bufferrepresents a list of all suburbs according to OpenStreetMap.wayis a single point stored as GIS data.- I request a circle around
waypoints that is 1km wide, and store it asbuffer.
sbis a single iteration ofsuburb_buffer, which representsplanet_osm_points.lineis a single iteration ofplanet_osm_line, restricted byline.route = 'bus'lineis also restricted by whetherST_Intersects()returnsTrue.ST_Intersects()verifies whether the line in iteration matches the point in iteration.
I then moved this query to a Python script to translate this information to ArangoDB. I did this so that I can use the best tool for the job. As a graph database, I can quickly perform complex relational queries without much computational overhead.
Three collections exist in ArangoDB, where each returned value from PostgreSQL is converted into one document in each collection;
routeholds all bus routes, taken fromline.ref(planet_osm_line)locationholds all suburbs, taken fromsb.name(planet_osm_point)route_connectionsis the confirmation returned fromST_Intersects()(routepasses throughlocation)_toand_fromare required keys in an Edge document.- After creating the
routeandlocation, I;- Take the
_idof each - Set
_toon theroute_connectionto the_idoflocation - Set
_fromon theroute_connectionto the_idofroute
- Take the

planet_osm_point) as a starting point. A purple node represents a suburb, a grey node represents a route, and a line states that a route passes through a suburb.AQL (ArangoDB Query Language) is used to traverse the graph and resolve routes.
FOR vtx, edge, path IN 1..6 ANY 'location/{START}' route_connection
FILTER vtx.name == "{TARGET}"
RETURN path
Here, I traverse from location/{START} to a location where location.name is {TARGET}. This is checked against the route_connection collection. For the result below, {START} is “Yiewsley” and {TARGET} is “Shepherd’s Bush”.
Finally, I built an “API Bridge”, returning a JSON representation of calculated routes for use by a possible Vue.JS interface or the likes.
"698 ➡️ Hillingdon ➡️ SL8, N207 ➡️ Shepherd's Bush ➡️"
{
"Hillingdon": {
"routes": ["698", "U1", "U3", "U5"],
"next": {
"Shepherd's Bush": {
"routes": ["SL8"]
}
}
}
}

In the future, I’d to render spider maps using this data or to be able to interpret this data on-the-go on my iPhone. But at-least now I can get all the possible routes (extreme, quick and sometimes not possible, since 698 is the local school bus) from one village to another without a specific location in mind.
I’d also like to explore optimizing the data structure within the graph database, perhaps moving the route collection into route_connections and eventually including stops and distance to provide more accurate resolves.