Filtering by Geometry¶
How to create geographic extracts from an OSM file.
Task¶
Given the country extract of Liechtenstein, extract all data that is within 2km of the coordinates 47.13,9.52. All objects inside the geographic area should be complete, meaning that complete geometries can be created for them.
Quick solution¶
import osmium
with osmium.ForwardReferenceWriter('../data/out/centre.osm.pbf',
'../data/liechtenstein.osm.pbf', overwrite=True) as writer:
for obj in osmium.FileProcessor('../data/liechtenstein.osm.pbf', osmium.osm.NODE):
if osmium.geom.haversine_distance(osmium.osm.Location(9.52, 47.13), obj.location) < 2000:
writer.add_node(obj)
Background¶
OSM data is not a simple selection of geometries. In an OSM data file only the OSM nodes have a location. All other OSM object are made up of OSM nodes or other OSM objects. To find out where an OSM way or relation is located on the planet, it is necessary to go back to the nodes it references.
For the task at hand this means that any filtering by geometry needs to start with the OSM nodes. Lets start with a simple script that writes out all the nodes within the circle defined in the task:
with osmium.SimpleWriter('../data/out/centre.opl', overwrite=True) as writer:
for obj in osmium.FileProcessor('../data/liechtenstein.osm.pbf', osmium.osm.NODE):
if osmium.geom.haversine_distance(osmium.osm.Location(9.52, 47.13), obj.location) < 2000:
writer.add_node(obj)
The FileProcessor reads the data and SimpleWriter writes the nodes out that we are interested in. Given that we are looking at nodes only, the FileProcessor can be restricted to that type. For one thing, this makes processing faster. For another it means, we don't have to explicitly check for the type of the object within the for loop. We can trust that only nodes will be returned. Checking if a node should be included in the output file is a simple matter of computing the distance between the target coordinates and the location of the node. pyosmium has a convenient function haversine_distance()
for that. It computes the distance between two points in meters.
This gives us a file with nodes. But what about the ways and relations? To find out which ones to include, we need to follow the forward references. Given the IDs of the nodes already included in the file, we need to find the ways which reference any of the nodes. And then we need to find relations which reference either nodes already included or one of the newly found ways. Luckily for us, OSM files are ordered by node, way and relations. So by the time the FileProcessor sees the first way, it will already have seen all the nodes and it can make an informed decision, if the way needs including or not. The same is true for relations. They are last in the file, so all the node and way members have been processed already. The situation is more complicated with relation members and nested relations. We leave those out for the moment.
Given that nodes, ways and relations need to be handled differently and we need to carry quite a bit of state, it is easier to implement the forward referencing collector as a handler class:
class CoordHandler:
def __init__(self, coord, dist, writer):
self.center = osmium.osm.Location(*coord)
self.dist = dist
self.writer = writer
self.id_tracker = osmium.IdTracker()
def node(self, n):
if osmium.geom.haversine_distance(self.center, n.location) <= self.dist:
self.writer.add_node(n)
self.id_tracker.add_node(n.id)
def way(self, w):
if self.id_tracker.contains_any_references(w):
self.writer.add_way(w)
self.id_tracker.add_way(w.id)
def relation(self, r):
if self.id_tracker.contains_any_references(r):
self.writer.add_relation(r)
The IdTracker
class helps to keep track of all the objects that appear in the file. Every time a node or way is written, its ID is recorded. Tracking relation IDs would only be necessary for nested relations. The IDTracker gives us also a convenient function contains_any_reference()
which checks if any of the IDs it is tracking is needed by the given object. If that is the case, the object needs to be written out.
This is almost it. To get a referentially complete output file, we also need to add the objects that are referenced by the ways and relations we have added. This can be easily achieved by using the BackReferenceWriter
in place of the SimpleWriter
:
with osmium.BackReferenceWriter('../data/out/centre.osm.pbf', ref_src='../data/liechtenstein.osm.pbf', overwrite=True) as writer:
osmium.apply('../data/liechtenstein.osm.pbf', CoordHandler((9.52, 47.13), 2000, writer))
To learn more about adding backward references, have a look at the cookbook on Filtering By Tags.
The ForwardReferenceWriter
helps to automate most of what we have just done manually. It is a replacement for the SimpleWriter
which collects the forward references under the hood. It will first collects the OSM data that should be written in a temporary file. When the writer is closed, it adds the forward references from a reference file. This means, the ForwardReferenceWriter
needs two mandatory parameters to be instantiated: the name of the file to write to and the name of the file to copy the referenced data from:
writer = osmium.ForwardReferenceWriter('../data/out/centre.osm.pbf', '../data/liechtenstein.osm.pbf', overwrite=True)
The writer will by default also add the necessary objects to make the file reference-complete. The writer can now replace the SimpleWriter in the code with the first attempt, resulting in the final solution shown in the Quick Solution.