replication - Handling Updates of OSM Data

Replication servers provide regular updates of OSM data. This module provides helper functions to access the servers and download and apply updates.

Replication Server Class

class osmium.replication.server.ReplicationServer(url, diff_type='osc.gz')

Represents a server that publishes replication data. Replication change files make it possible to keep local OSM data up to date without downloading the full dataset again.

collect_diffs(start_id, max_size=1024)

Create a MergeInputReader and download diffs starting with sequence id start_id into it. max_size restricts the number of diffs that are downloaded. The download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If some data was downloaded, returns a namedtuple with three fields: id contains the sequence id of the last downloaded diff, reader contains the MergeInputReader with the data, and newest contains the sequence id of the most recent diff available.

Returns None if there was an error during download or no new data was available.
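The return value can be used to tell how far the local data still lags behind the server. The following is a minimal sketch that only interprets the three documented fields; DiffResult and describe_download are hypothetical names made up for illustration and are not part of the library.

```python
from collections import namedtuple

# Hypothetical stand-in that mirrors the three documented fields. The name of
# the tuple type actually returned by collect_diffs() is not given here.
DiffResult = namedtuple("DiffResult", ["id", "reader", "newest"])


def describe_download(result):
    """Summarise a collect_diffs() return value as a short status string."""
    if result is None:
        # Download error or no new data: the two cases cannot be told apart.
        return "nothing downloaded"
    behind = result.newest - result.id
    if behind == 0:
        return "up to date at sequence %d" % result.id
    return "at sequence %d, %d diffs behind" % (result.id, behind)
```

With a real server, the result of server.collect_diffs(start_id) would be passed in place of the stand-in tuple.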

apply_diffs(handler, start_id, max_size=1024, idx='', simplify=True)

Download diffs starting with sequence id start_id, merge them together and then apply them to the given handler. max_size restricts the number of diffs that are downloaded. The download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If idx is set, a location cache will be created and applied to the way nodes. You should be aware that diff files usually do not contain the complete set of nodes when a way is modified. That means that you cannot just create a new location cache, apply it to a diff and expect to get complete way geometries back. Instead you need to do an initial data import using a persistent location cache to obtain a full set of node locations and then reuse this location cache here when applying diffs.

Diffs may contain multiple versions of the same object when it was changed multiple times during the period covered by the diff. If simplify is set to False then all versions are returned. If it is True (the default) then only the most recent version will be sent to the handler.

The function returns the sequence id of the last diff that was downloaded or None if the download failed completely.
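Because max_size caps how much is fetched per call, catching up with a server may take several calls. The sketch below shows one way to loop, assuming that the next call should start at the returned sequence id plus one and that None means there is nothing more to fetch. catch_up and FakeServer are hypothetical names; FakeServer only imitates the documented return value so that the loop can be tried without network access.

```python
def catch_up(server, handler, start_id, max_size=1024, max_rounds=100):
    """Call apply_diffs() repeatedly; return the last sequence id applied or None."""
    last_applied = None
    next_id = start_id
    for _ in range(max_rounds):  # hard limit so the loop always terminates
        last = server.apply_diffs(handler, next_id, max_size=max_size)
        if last is None:
            break  # nothing could be downloaded in this round
        last_applied = last
        next_id = last + 1  # assumption: continue with the following diff
    return last_applied


class FakeServer:
    """Hypothetical test double serving diffs up to sequence id `newest`."""

    def __init__(self, newest, batch=2):
        self.newest = newest
        self.batch = batch

    def apply_diffs(self, handler, start_id, max_size=1024):
        if start_id > self.newest:
            return None
        # Pretend that at most `batch` diffs fit into max_size.
        return min(start_id + self.batch - 1, self.newest)
```

In real use, server would be a ReplicationServer instance and handler the object that should receive the changes.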

apply_diffs_to_file(infile, outfile, start_id, max_size=1024, set_replication_header=True)

Download diffs starting with sequence id start_id, merge them with the data from the OSM file named infile and write the result into a file with the name outfile. The output file must not yet exist.

max_size restricts the number of diffs that are downloaded. The download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If set_replication_header is true then the URL of the replication server and the sequence id and timestamp of the last diff applied will be written into the header of the output file. Note that this currently works only for the PBF format.

The function returns a tuple of last downloaded sequence id and newest available sequence id if new data has been written or None if no data was available or the download failed completely.

timestamp_to_sequence(timestamp, balanced_search=False)

Get the sequence number of the replication file that contains the given timestamp. The search algorithm is optimised for replication servers that publish updates at regular intervals. For servers with irregular change file publication dates, balanced_search should be set to true so that a standard binary search for the sequence will be used. The default is good for all known OSM replication services.
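To illustrate what a standard binary search over sequence ids looks like, here is a generic sketch. It is not the library's implementation. find_sequence and get_timestamp are hypothetical names, and the sketch assumes that the wanted sequence is the last one whose state timestamp is not later than the target.

```python
from datetime import datetime, timedelta, timezone


def find_sequence(get_timestamp, lower, upper, target):
    """Return the largest sequence id in [lower, upper] whose timestamp
    is <= target, or None if even the lowest id is later than target.

    get_timestamp(seq) must return the timestamp of sequence seq and
    timestamps must increase with the sequence id.
    """
    if get_timestamp(lower) > target:
        return None
    while lower < upper:
        mid = (lower + upper + 1) // 2  # round up so the range always shrinks
        if get_timestamp(mid) <= target:
            lower = mid
        else:
            upper = mid - 1
    return lower


# Irregularly spaced example timestamps (minutes after an arbitrary start).
START = datetime(2022, 1, 1, tzinfo=timezone.utc)
STAMPS = {seq: START + timedelta(minutes=m)
          for seq, m in {1: 0, 2: 5, 3: 6, 4: 60}.items()}
```

Each probe would cost one state file download on a real server, so the number of requests grows with the logarithm of the searched range.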

get_state_info(seq=None)

Downloads and returns the state information for the given sequence. If the download is successful, a namedtuple with sequence and timestamp is returned, otherwise the function returns None.
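For orientation, the sketch below parses the text of a state file into a tuple with the two documented fields. It assumes the usual Osmosis-style layout with sequenceNumber and timestamp keys and backslash-escaped colons. parse_state, StateInfo and SAMPLE are hypothetical names, and the code is not taken from the library.

```python
from collections import namedtuple
from datetime import datetime, timezone

# Hypothetical tuple type with the two documented field names.
StateInfo = namedtuple("StateInfo", ["sequence", "timestamp"])


def parse_state(text):
    """Parse state file content; return StateInfo or None if it is unusable."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and anything without a key
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip()
    try:
        sequence = int(values["sequenceNumber"])
        stamp = values["timestamp"].replace("\\", "")  # drop escaping of ':'
        timestamp = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    except (KeyError, ValueError):
        return None
    return StateInfo(sequence, timestamp.replace(tzinfo=timezone.utc))


# Made-up example content in the assumed format.
SAMPLE = (
    "#Sat Jan 01 00:00:05 UTC 2022\n"
    "sequenceNumber=4867000\n"
    "timestamp=2022-01-01T00\\:00\\:05Z\n"
)
```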

get_diff_block(seq)

Downloads the diff with the given sequence number and returns it as a byte sequence. Raises a urllib.error.HTTPError (or urllib2.HTTPError in Python 2) if the file cannot be downloaded.
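If the returned bytes need to be inspected by hand, they may have to be unpacked first. The sketch below assumes that the block is returned exactly as served, that is, still gzip-compressed for the default diff_type of 'osc.gz'. This section does not confirm that, so treat it as an assumption. decode_diff_block and FAKE_BLOCK are hypothetical names.

```python
import gzip


def decode_diff_block(block):
    """Decompress a gzip-compressed change file and return it as text."""
    return gzip.decompress(block).decode("utf-8")


# Stand-in for the bytes that get_diff_block(seq) might return.
FAKE_BLOCK = gzip.compress(b'<osmChange version="0.6"></osmChange>')
```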

get_state_url(seq)

Returns the URL of the state.txt file for a given sequence id.

If seq is None the URL for the latest state info is returned, i.e. the state file in the root directory of the replication service.

get_diff_url(seq)

Returns the URL to the diff file for the given sequence id.
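As an illustration of how a sequence id can map to a URL, the sketch below assumes the common replication directory layout, where the id is zero-padded to nine digits and split into three path components. sequence_path, state_url and diff_url are hypothetical helpers written for this example, and example.org is a placeholder base URL.

```python
def sequence_path(seq):
    """Turn a sequence id into a three-level path, e.g. 4867123 -> '004/867/123'."""
    return "%03d/%03d/%03d" % (seq // 1000000, (seq // 1000) % 1000, seq % 1000)


def state_url(base_url, seq=None):
    """URL of the state file for seq, or of the latest state if seq is None."""
    if seq is None:
        return base_url + "/state.txt"
    return "%s/%s.state.txt" % (base_url, sequence_path(seq))


def diff_url(base_url, seq, diff_type="osc.gz"):
    """URL of the diff file for seq, using diff_type as the file extension."""
    return "%s/%s.%s" % (base_url, sequence_path(seq), diff_type)


BASE = "https://example.org/replication"  # placeholder, not a real server
```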