Replication Tools🔗
OpenStreetMap is a database that is constantly extended and updated. When you
download the planet or an extract of it, you only get a snapshot of the
database at a given point in time. To keep up-to-date with the development
of OSM, you either need to download a new snapshot or you can update your
existing data from change files published along with the planet file.
Pyosmium ships with two tools that help you to process change files:
pyosmium-get-changes
and pyosmium-up-to-date
.
This section explains the basics of OSM change files and how to use Pyosmium's tools to keep your data up to date.
About change files🔗
Regular change files are published for the planet and also by some extract services. These change files are special OSM data files containing all changes to the database in a regular interval. Change files are not referentially complete. That means that they only contain OSM objects that have changed but not necessarily all the objects that are referenced by the changed objects. Because of that change file are rarely useful on their own. But they can be used to update an existing snapshot of OSM data.
Getting change files🔗
There are multiple sources for OSM change files available:
-
https://planet.openstreetmap.org/replication is the official source for planet-wide updates. There are change files for minutely, hourly and daily intervals available.
-
Geofabrik offers daily change files for all its updates. See the extract page for a link to the replication URL. Note that change files go only about 3 months back. Older files are deleted.
-
download.openstreetmap.fr offers minutely change files for all its extracts.
For other services also check out the list of providers on the OSM wiki.
Updating a planet or extract🔗
If you have downloaded the full planet or obtain a PBF extract file from one of the sites which offer a replication service, then updating your OSM file can be as easy as:
pyosmium-up-to-date <osmfile.osm.pbf>
This finds the right replication source and file to start with, downloads changes and updates the given file with the data. You can repeat this command whenever you want to have newer data. The command automatically picks up at the same point where it left off after the previous update.
Choosing the replication source🔗
OSM files in PBF format are able to save the replication source and the
current status on their own. That is why pyosmium-up-to-date is able to
automatically do the right thing.
If you want to switch the replication source
or have a file that does not have replication information, you need to bootstrap
the update process and manually point pyosmium-up-to-date
to the right
service:
pyosmium-up-to-date --ignore-osmosis-headers --server <replication URL> <osmfile.osm.pbf>
pyosmium-up-to-date
automatically finds the right sequence ID to use
by looking at the age of the data in your OSM file. It updates the file
and stores the new replication source in the file. The additional parameters
are then not necessary anymore for subsequent updates.
Tip
Always use the PBF format to store your data. Other format do not support to save the replication information. pyosmium-up-to-date is still able to update these kind of files if you manually point to the replication server but the process is always more costly because it needs to find the right starting point for updates first.
Updating larger amounts of data🔗
When used without any parameters, pyosmium downloads at a maximum about
1GB of changes. That corresponds to about 3 days of planet-wide changes.
You can increase the amount using the additional --size
parameter:
pyosmium-up-to-date --size=10000 planet.osm.pbf
This would download about 10GB or 30 days of change data. If your OSM data file is older than that, downloading the full file anew is likely going to be faster.
pyosmium-up-to-date
uses return codes to signal if it has downloaded all
available updates. A return code of 0 means that it has downloaded and
applied all available data. A return code of 1 indicates that it has applied
some updates but more are available.
A minimal script that updates a file until it is really up-to-date with the replication source would look like this:
status=1 # we want more data
while [ $status -eq 1 ]; do
pyosmium-up-to-date planet.osm.pbf
# save the return code
status=$?
done
Creating change files for updating databases🔗
There are quite a few tools that can import OSM data into databases, for example osm2pgsql, imposm or Nominatim. These tools often can use change files to keep their database up-to-date. pyosmium can be used to create the appropriate change files. This is slightly more involved than updating a file.
Preparing the state file🔗
Before downloading the updates, you need to find out with which sequence number to start. The easiest way to remember your current status is to save the number in a file. pyosmium can then read and update the file for you.
Method 1: Starting from the import file🔗
If you still have the OSM file you used to set up your database, then create a state file as follows:
pyosmium-get-changes -O <osmfile.osm.pbf> -f sequence.state -v
Note that there is no output file yet. This creates a new file sequence.state
with the sequence ID where updates should start and prints the URL of the
replication service to use.
Method 2: Starting from a date🔗
If you do not have the original OSM file anymore, then a good strategy is to
look for the date of the newest node in the database to find the snapshot date
of your database. Find the highest node ID, then look up the date for version 1
on the OSM website. For example the date for node 2367234 can be found at
https://www.openstreetmap.org/api/0.6/node/23672341/1
Find and copy the timestamp
field. Then create a state file using this date:
pyosmium-get-changes -D 2007-01-01T14:16:21Z -f sequence.state -v
As before, this creates a new file sequence.state
with the sequence ID where
updates should start and prints the URL of the replication service to use.
Creating a change file🔗
Now you can create change files using the state:
pyosmium-get-changes --server <replication server> -f sequence.state -o newchange.osc.gz
This downloads the latest changes from the server, saves them in the file
newchange.osc.gz
and updates your state file. <replication server>
is the
URL that was printed when you set up the state file. The parameter can be
omitted when you use minutely change files from openstreetmap.org.
This simplifies multiple edits of the same element into the final change. If you want to
retain the full version history specify --no-deduplicate
.
pyosmium-get-changes
loads only about 100MB worth of updates at once (about
8 hours of planet updates). If you want more, then add a --size
parameter.
Continuously updating a database🔗
pyosmium-get-changes
emits special return codes that can be used to set
up a script that continuously fetches updates and applies them to a
database. The important error codes are:
- 0 - changes successfully downloaded and new change file created
- 3 - no new changes are available from the server
All other error codes indicate fatal errors.
A simple shell script can look like this:
while true; do
# pyosmium-get-changes would not overwrite an existing change file
rm -f newchange.osc.gz
# get the next batch of changes
pyosmium-get-changes -f sequence.state -o newchange.osc.gz
# save the return code
status=$?
if [ $status -eq 0 ]; then
# apply newchange.osc.gz here
....
elif [ $status -eq 3 ]; then
# No new data, so sleep for a bit
sleep 60
else
echo "Fatal error, stopping updates."
exit $status
done