replication - Handling Updates of OSM Data

Replication servers provide regular updates of OSM data. This module provides helper functions to access the servers and download and apply updates.

Replication Server Class

class osmium.replication.server.ReplicationServer(url: str, diff_type: str = 'osc.gz')

Represents a connection to a server that publishes replication data. Replication change files make it possible to keep local OSM data up to date without downloading the full dataset again.

url contains the base URL of the replication service. This is the directory that contains the state file with the current state. If the replication service serves something other than osc.gz files, set diff_type to the corresponding file suffix.

ReplicationServer may be used as a context manager. In this case, it keeps a connection to the server open internally, which makes downloads faster.
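A minimal sketch of the context-manager form. The helper name newest_sequence and its server_factory parameter are illustrative and not part of pyosmium; in real use the factory would be the ReplicationServer class itself.

```python
def newest_sequence(server_factory, url, diff_type="osc.gz"):
    """Return the newest sequence id published at `url`, or None."""
    # The with-block keeps one connection open for every request made
    # inside it and closes it again on exit.
    with server_factory(url, diff_type) as server:
        state = server.get_state_info()
        return None if state is None else state.sequence
```

With pyosmium installed, this would be called as newest_sequence(ReplicationServer, base_url), where base_url is the base URL of a replication service.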

close() -> None

Close any open connection to the replication server.

set_request_parameter(key: str, value: Any) -> None

Set a parameter which will be handed to the requests library when calling requests.get(). This may be used to set custom headers, timeouts and similar parameters. See the requests documentation for possible parameters. By default, a timeout of 60 seconds is set and streaming download is enabled.
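As an illustration, the following sketch sets a custom User-Agent header and a shorter timeout. The helper name configure_requests and the values used are assumptions made for this example; timeout and headers are regular keyword arguments of requests.get().

```python
def configure_requests(server, user_agent, timeout_seconds=30):
    """Set a custom User-Agent header and a timeout on `server`."""
    # Each key is handed on as a keyword argument to requests.get().
    server.set_request_parameter("timeout", timeout_seconds)
    server.set_request_parameter("headers", {"User-Agent": user_agent})
    return server
```

Here server is expected to be a ReplicationServer instance.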

make_request(url: str) -> Request
open_url(url: Request) -> Any

Download a resource from the given URL and return a byte sequence of the content.

collect_diffs(start_id: int, max_size: int = 1024) -> Optional[DownloadResult]

Create a MergeInputReader and download diffs starting with sequence id start_id into it. max_size limits the amount of data that is downloaded: the download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If some data was downloaded, returns a namedtuple with three fields: id contains the sequence id of the last downloaded diff, reader contains the MergeInputReader with the data, and newest contains the sequence id of the most recent diff available.

Returns None if there was an error during download or no new data was available.
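The three result fields can be inspected as in the sketch below. The helper name describe_batch and the keys of the returned dict are made up for this example; the count of downloaded diffs assumes that sequence ids are consecutive.

```python
def describe_batch(server, start_id, max_size=1024):
    """Download one batch of diffs and summarise what was fetched."""
    result = server.collect_diffs(start_id, max_size)
    if result is None:
        # Either the download failed or nothing newer is available.
        return None
    return {
        "last": result.id,                       # last downloaded diff
        "downloaded": result.id - start_id + 1,  # assumes consecutive ids
        "behind": result.newest - result.id,     # diffs still on the server
        "reader": result.reader,                 # MergeInputReader with the data
    }
```

A value of 0 for "behind" means that the batch reached the most recent diff on the server.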

apply_diffs(handler: BaseHandler, start_id: int, max_size: int = 1024, idx: str = '', simplify: bool = True) -> Optional[int]

Download diffs starting with sequence id start_id, merge them together and then apply them to the given handler. max_size limits the amount of data that is downloaded: the download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If idx is set, a location cache will be created and applied to the way nodes. You should be aware that diff files usually do not contain the complete set of nodes when a way is modified. That means that you cannot just create a new location cache, apply it to a diff and expect to get complete way geometries back. Instead you need to do an initial data import using a persistent location cache to obtain a full set of node locations and then reuse this location cache here when applying diffs.

Diffs may contain multiple versions of the same object when it was changed multiple times during the period covered by the diff. If simplify is set to False then all versions are returned. If it is True (the default) then only the most recent version will be sent to the handler.

The function returns the sequence id of the last diff that was downloaded or None if the download failed completely.
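Because a single call stops once max_size is exceeded, catching up with a server can take several calls. The loop below is one possible sketch; the helper name catch_up and the max_rounds safety limit are assumptions made for this example.

```python
def catch_up(server, handler, last_applied, max_size=1024, max_rounds=100):
    """Apply diffs until nothing new is returned; return the last applied id."""
    for _ in range(max_rounds):
        # start_id is the first diff to download, so continue one past
        # the last sequence id that has already been applied.
        last = server.apply_diffs(handler, last_applied + 1, max_size=max_size)
        if last is None:
            break  # nothing was downloaded in this round
        last_applied = last
    return last_applied
```

The returned id can be stored and passed in as last_applied on the next run.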

apply_diffs_to_file(infile: str, outfile: str, start_id: int, max_size: int = 1024, set_replication_header: bool = True, extra_headers: Optional[Mapping[str, str]] = None, outformat: Optional[str] = None) -> Optional[Tuple[int, int]]

Download diffs starting with sequence id start_id, merge them with the data from the OSM file named infile and write the result into a file with the name outfile. The output file must not yet exist.

max_size limits the amount of data that is downloaded: the download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If set_replication_header is true, then the URL of the replication server and the sequence id and timestamp of the last diff applied will be written into the header of the output file. Note that this currently works only for the PBF format.

extra_headers is a dict with additional header fields to be set. Most notably, the ‘generator’ can be set this way.

outformat sets the format of the output file. If None, the format is determined from the file name.

The function returns a tuple of last downloaded sequence id and newest available sequence id if new data has been written or None if no data was available or the download failed completely.
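A possible wrapper around this call is sketched below. The helper name update_file, the generator value and the keys of the returned dict are hypothetical; the explicit existence check only mirrors the requirement that the output file must not yet exist.

```python
import os

def update_file(server, infile, outfile, start_id, max_size=1024):
    """Write an updated copy of `infile` to `outfile` and report progress."""
    if os.path.exists(outfile):
        # The output file must not yet exist, so fail early and clearly.
        raise FileExistsError(outfile)
    result = server.apply_diffs_to_file(
        infile, outfile, start_id, max_size=max_size,
        extra_headers={"generator": "update-sketch"},  # hypothetical value
    )
    if result is None:
        return None  # no data available or the download failed completely
    last, newest = result
    return {"last": last, "newest": newest, "up_to_date": last >= newest}
```

If up_to_date is false, the call can be repeated with the new file as input and last + 1 as start_id.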

timestamp_to_sequence(timestamp: datetime, balanced_search: bool = False) -> Optional[int]

Get the sequence number of the replication file that contains the given timestamp. The search algorithm is optimised for replication servers that publish updates at regular intervals. For servers with irregular change file publication dates, balanced_search should be set to true so that a standard binary search for the sequence will be used. The default is good for all known OSM replication services.
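A short usage sketch follows. The helper name sequence_at and its now parameter are illustrative only. It assumes that a timezone-aware UTC datetime is the appropriate input, since replication state timestamps are given in UTC.

```python
from datetime import datetime, timedelta, timezone

def sequence_at(server, hours_ago, balanced_search=False, now=None):
    """Look up the sequence id for a point `hours_ago` hours in the past."""
    # Assumption: a timezone-aware UTC datetime is passed to the server.
    now = now or datetime.now(timezone.utc)
    timestamp = now - timedelta(hours=hours_ago)
    return server.timestamp_to_sequence(timestamp, balanced_search=balanced_search)
```

The result can then serve as a starting point for the download functions above.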

get_state_info(seq: Optional[int] = None, retries: int = 2) -> Optional[OsmosisState]

Downloads and returns the state information for the given sequence. If the download is successful, a namedtuple with sequence and timestamp is returned, otherwise the function returns None. retries sets the number of times the download is retried when pyosmium detects a truncated state file.
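The timestamp field can be used, for example, to estimate how far a given state lies in the past. In this sketch the helper name replication_lag is hypothetical, and the timestamp is assumed to be a timezone-aware datetime.

```python
from datetime import datetime, timezone

def replication_lag(server, seq=None, now=None):
    """Return the age of the state for `seq` as a timedelta, or None."""
    state = server.get_state_info(seq)
    if state is None:
        return None  # the state file could not be downloaded
    # Assumption: state.timestamp is timezone-aware, like `now`.
    now = now or datetime.now(timezone.utc)
    return now - state.timestamp
```

Called without seq, this reports the age of the newest state published by the server.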

get_diff_block(seq: int) -> str

Downloads the diff with the given sequence number and returns it as a byte sequence. Raises a urllib.error.HTTPError if the file cannot be downloaded.
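A caller that wants to treat a missing diff as "not published yet" might wrap the call as sketched below. The helper name try_get_diff is made up for this example, and the exception type follows the description above.

```python
import urllib.error

def try_get_diff(server, seq):
    """Return the raw diff for `seq`, or None if the server answers 404."""
    try:
        return server.get_diff_block(seq)
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # the diff does not exist on the server
        raise  # other HTTP errors are passed on to the caller
```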

get_state_url(seq: Optional[int]) -> str

Returns the URL of the state.txt file for a given sequence id.

If seq is None the URL for the latest state info is returned, i.e. the state file in the root directory of the replication service.

get_diff_url(seq: int) -> str

Returns the URL to the diff file for the given sequence id.
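To illustrate what such URLs typically look like, the sketch below builds them by hand. It assumes the directory layout commonly used by OSM replication services, where the sequence id is zero-padded to nine digits and split into three groups of three. The function names are hypothetical, and this is not necessarily how get_state_url() and get_diff_url() are implemented.

```python
def sequence_path(seq):
    """Split a sequence id into a three-level AAA/BBB/CCC path."""
    return f"{seq // 1_000_000:03d}/{seq // 1000 % 1000:03d}/{seq % 1000:03d}"

def state_url(base_url, seq=None):
    """URL of the state file for `seq`, or of the latest state if None."""
    base = base_url.rstrip("/")
    if seq is None:
        return f"{base}/state.txt"  # latest state, in the root directory
    return f"{base}/{sequence_path(seq)}.state.txt"

def diff_url(base_url, seq, diff_type="osc.gz"):
    """URL of the diff file for `seq`."""
    return f"{base_url.rstrip('/')}/{sequence_path(seq)}.{diff_type}"
```

Under this assumption, sequence id 6123456 maps to the path 006/123/456 below the base URL.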