Replication🔗

Replication server🔗

osmium.replication.ReplicationServer 🔗

Represents a connection to a server that publishes replication data. Replication change files let you keep local OSM data up to date without downloading the full dataset again.

ReplicationServer may be used as a context manager. In that case, it keeps a connection to the server open internally, which makes downloads faster.

__init__(url: str, diff_type: str = 'osc.gz') -> None 🔗

Set up the connection to a replication server.

url contains the base URL of the replication service. This is the directory that contains the state file with the current state. If the replication service serves something other than osc.gz files, set diff_type to the file suffix it uses.
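A minimal sketch of the context-manager usage, under stated assumptions: the helper name fetch_latest_state is hypothetical, and the URL in the usage comment is only an example of a replication service. The helper relies on nothing beyond the context-manager protocol and get_state_info(), so it accepts any object that behaves like a ReplicationServer.

```python
from typing import Any, Optional


def fetch_latest_state(server: Any) -> Optional[Any]:
    """Return the newest state published by `server`, or None on failure.

    `server` is expected to be a ReplicationServer or an object that
    offers the same context-manager and get_state_info() interface.
    """
    # Inside the with-block the server keeps one connection open, so
    # several requests in a row would reuse it.
    with server as svr:
        return svr.get_state_info()


# Usage sketch (needs pyosmium and network access; the URL is an example):
#   from osmium.replication import ReplicationServer
#   state = fetch_latest_state(
#       ReplicationServer("https://planet.openstreetmap.org/replication/minute"))
#   if state is not None:
#       print(state.sequence, state.timestamp)
```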

apply_diffs(handler: BaseHandler, start_id: int, max_size: int = 1024, idx: str = '', simplify: bool = True) -> Optional[int] 🔗

Download diffs starting with sequence id start_id, merge them together and then apply them to the given handler. max_size restricts the number of diffs that are downloaded. The download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If idx is set, a location cache will be created and applied to the way nodes. You should be aware that diff files usually do not contain the complete set of nodes when a way is modified. That means that you cannot just create a new location cache, apply it to a diff and expect to get complete way geometries back. Instead you need to do an initial data import using a persistent location cache to obtain a full set of node locations and then reuse this location cache here when applying diffs.

Diffs may contain multiple versions of the same object when it was changed multiple times during the period covered by the diff. If simplify is set to False then all versions are returned. If it is True (the default) then only the most recent version will be sent to the handler.

The function returns the sequence id of the last diff that was downloaded or None if the download failed completely.
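Because apply_diffs() returns either the id of the last downloaded diff or None, a caller usually has to decide what position to remember afterwards. The following sketch shows one way to do that. The helper name apply_pending and its last_applied parameter are hypothetical; the handler in the usage comment assumes pyosmium's osmium.SimpleHandler base class.

```python
from typing import Any


def apply_pending(server: Any, handler: Any, last_applied: int,
                  max_size: int = 1024) -> int:
    """Apply the diffs that follow `last_applied`; return the new position.

    `server` is expected to behave like ReplicationServer and `handler`
    like a pyosmium handler. Both are passed through unchanged.
    """
    newest_applied = server.apply_diffs(handler, last_applied + 1,
                                        max_size=max_size)
    # apply_diffs() returns None when nothing could be downloaded, so the
    # remembered position stays where it was in that case.
    return last_applied if newest_applied is None else newest_applied


# Usage sketch (needs pyosmium and network access):
#   import osmium
#   from osmium.replication import ReplicationServer
#
#   class NodeCounter(osmium.SimpleHandler):
#       def __init__(self):
#           super().__init__()
#           self.nodes = 0
#
#       def node(self, n):
#           self.nodes += 1
#
#   counter = NodeCounter()
#   with ReplicationServer(url) as svr:
#       position = apply_pending(svr, counter, last_applied=position)
```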

apply_diffs_to_file(infile: str, outfile: str, start_id: int, max_size: int = 1024, set_replication_header: bool = True, extra_headers: Optional[Mapping[str, str]] = None, outformat: Optional[str] = None) -> Optional[Tuple[int, int]] 🔗

Download diffs starting with sequence id start_id, merge them with the data from the OSM file named infile and write the result into a file with the name outfile. The output file must not yet exist.

max_size restricts the number of diffs that are downloaded. The download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If set_replication_header is true then the URL of the replication server and the sequence id and timestamp of the last diff applied will be written into the header of the output file. Note that this currently works only for the PBF format.

extra_headers is a dict with additional header fields to be set. Most notably, the 'generator' can be set this way.

outformat sets the format of the output file. If None, the format is determined from the file name.

The function returns a tuple of last downloaded sequence id and newest available sequence id if new data has been written or None if no data was available or the download failed completely.
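The return value distinguishes three situations: nothing written (None), fully caught up, and partially caught up because max_size was reached. A hedged sketch of how a caller might handle them follows. The helper name update_file, the early existence check and the 'generator' value are assumptions made for illustration, not library behaviour.

```python
import os
from typing import Any, Optional, Tuple


def update_file(server: Any, infile: str, outfile: str,
                start_id: int) -> Optional[Tuple[int, int]]:
    """Write an updated copy of `infile` to `outfile` using `server`."""
    # The output file must not exist yet, so fail early with a clear error.
    # This pre-check is an addition of this sketch.
    if os.path.exists(outfile):
        raise FileExistsError(outfile)

    result = server.apply_diffs_to_file(
        infile, outfile, start_id,
        extra_headers={"generator": "update-sketch/0.1"},  # hypothetical value
    )
    if result is None:
        return None  # no data available or the download failed completely

    last_id, newest_id = result
    if last_id < newest_id:
        # Not everything fit into max_size. Another round starting at
        # last_id + 1 would be needed to catch up fully.
        print(f"applied up to {last_id}, server is at {newest_id}")
    return result
```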

close() -> None 🔗

Close any open connection to the replication server.

collect_diffs(start_id: int, max_size: int = 1024) -> Optional[DownloadResult] 🔗

Create a MergeInputReader and download diffs starting with sequence id start_id into it. max_size restricts the number of diffs that are downloaded. The download stops as soon as either a diff cannot be downloaded or the unpacked data in memory exceeds max_size kB.

If some data was downloaded, returns a namedtuple with three fields: id contains the sequence id of the last downloaded diff, reader contains the MergeInputReader with the data and newest is the sequence id of the most recent diff available.

Returns None if there was an error during download or no new data was available.
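Since a single call is capped by max_size, catching up with a server can take several rounds. The sketch below loops until id reaches newest or the server returns None. The helper name apply_until_current and the max_rounds safety limit are hypothetical. The sketch also assumes that the reader is consumed with MergeInputReader.apply(handler).

```python
from typing import Any


def apply_until_current(server: Any, handler: Any, start_id: int,
                        max_rounds: int = 100) -> int:
    """Feed diffs to `handler` batch by batch; return the next id to fetch."""
    next_id = start_id
    for _ in range(max_rounds):
        result = server.collect_diffs(next_id, max_size=1024)
        if result is None:
            break  # download error or no new data
        # result.reader holds the merged batch; apply() sends it to the handler.
        result.reader.apply(handler)
        next_id = result.id + 1
        if result.id >= result.newest:
            break  # caught up with the server
    return next_id
```

The returned value is the sequence id to pass as start_id on the next run, so it is the natural thing to persist between invocations.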

get_diff_block(seq: int) -> str 🔗

Downloads the diff with the given sequence number and returns it as a byte sequence. Raises a requests.HTTPError if the file cannot be downloaded.

get_diff_url(seq: int) -> str 🔗

Returns the URL to the diff file for the given sequence id.
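For orientation, the sketch below shows how a sequence id maps to a path under the directory layout commonly used by Osmosis-style replication services, where the id is zero-padded to nine digits and split into three groups. This layout is an assumption of the sketch, and osmosis_diff_url is a hypothetical helper; in real code, call get_diff_url() and get_state_url() instead.

```python
def osmosis_diff_url(base_url: str, seq: int, diff_type: str = "osc.gz") -> str:
    """Build a diff URL under the assumed AAA/BBB/CCC directory layout."""
    base = base_url.rstrip("/")
    top = seq // 1_000_000          # millions
    middle = (seq // 1000) % 1000   # thousands
    bottom = seq % 1000             # remainder
    return f"{base}/{top:03d}/{middle:03d}/{bottom:03d}.{diff_type}"


# Example with a placeholder base URL:
#   osmosis_diff_url("https://example.org/replication", 1234567)
#   gives "https://example.org/replication/001/234/567.osc.gz"
```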

get_state_info(seq: Optional[int] = None, retries: int = 2) -> Optional[OsmosisState] 🔗

Downloads and returns the state information for the given sequence. If the download is successful, a namedtuple with sequence and timestamp is returned, otherwise the function returns None. retries sets the number of times the download is retried when pyosmium detects a truncated state file.
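To illustrate what a state file carries and why a truncated file matters, here is a small parser sketch. It assumes the usual Osmosis state.txt format: key=value lines with the keys sequenceNumber and timestamp, and colons in the timestamp escaped as "\:". The names State and parse_state are hypothetical stand-ins, not the library's implementation.

```python
import datetime as dt
from typing import NamedTuple, Optional


class State(NamedTuple):
    """Stand-in for OsmosisState, used only in this sketch."""
    sequence: int
    timestamp: dt.datetime


def parse_state(text: str) -> Optional[State]:
    """Parse state.txt content; return None if it looks incomplete."""
    seq = None
    ts = None
    for line in text.splitlines():
        key, sep, value = line.strip().partition("=")
        if not sep:
            continue  # comment or blank line
        try:
            if key == "sequenceNumber":
                seq = int(value)
            elif key == "timestamp":
                # Undo the properties-style escaping of colons.
                ts = dt.datetime.strptime(value.replace("\\:", ":"),
                                          "%Y-%m-%dT%H:%M:%SZ")
                ts = ts.replace(tzinfo=dt.timezone.utc)
        except ValueError:
            return None  # malformed value, treat the file as unusable
    # A file that lacks either field is treated as truncated.
    if seq is None or ts is None:
        return None
    return State(seq, ts)
```

A caller that gets None from such a parser could simply download the file again, which is the situation the retries parameter addresses.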

get_state_url(seq: Optional[int]) -> str 🔗

Returns the URL of the state.txt file for a given sequence id.

If seq is None the URL for the latest state info is returned, i.e. the state file in the root directory of the replication service.

open_url(url: urlrequest.Request) -> Any 🔗

Download a resource from the given URL and return a byte sequence of the content.

set_request_parameter(key: str, value: Any) -> None 🔗

Set a parameter which will be handed to the requests library when calling requests.get(). This may be used to set custom headers, timeouts and similar parameters. See the requests documentation (https://requests.readthedocs.io/en/latest/api/?highlight=get#requests.request) for possible parameters. By default, a timeout of 60 seconds is set and streaming download is enabled.
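A short sketch of typical settings, assuming the standard timeout and headers keyword arguments of requests.get(). The helper name configure_requests and the user-agent string are hypothetical.

```python
from typing import Any


def configure_requests(server: Any, timeout_sec: float = 30.0,
                       user_agent: str = "my-osm-updater/0.1") -> None:
    """Set a shorter timeout and a custom User-Agent on `server`.

    `server` is expected to be a ReplicationServer or an object with the
    same set_request_parameter() method.
    """
    # Each key is forwarded as a keyword argument to requests.get().
    server.set_request_parameter("timeout", timeout_sec)
    server.set_request_parameter("headers", {"User-Agent": user_agent})
```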

timestamp_to_sequence(timestamp: dt.datetime, balanced_search: bool = False) -> Optional[int] 🔗

Get the sequence number of the replication file that contains the given timestamp. The search algorithm is optimised for replication servers that publish updates at regular intervals. For servers with irregular change file publication dates, balanced_search should be set to True so that a standard binary search for the sequence is used. The default is good for all known OSM replication services.
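The idea behind a balanced search can be sketched as a plain binary search over sequence ids. This is an illustration only, not the library's code: the function name, the timestamp_of callback (standing in for a state lookup such as get_state_info(seq).timestamp) and the choice to return the last id whose state timestamp is not later than the target are all assumptions, and the library's exact boundary behaviour may differ.

```python
import datetime as dt
from typing import Callable, Optional


def balanced_sequence_search(timestamp_of: Callable[[int], dt.datetime],
                             oldest_seq: int, newest_seq: int,
                             target: dt.datetime) -> Optional[int]:
    """Return the last sequence id whose timestamp is <= target.

    Returns None if even the oldest state is newer than `target`.
    Timestamps are assumed to increase with the sequence id.
    """
    if timestamp_of(oldest_seq) > target:
        return None
    lo, hi = oldest_seq, newest_seq
    # Invariant: timestamp_of(lo) <= target, and the answer is in [lo, hi].
    while lo < hi:
        mid = (lo + hi + 1) // 2   # round up so the range always shrinks
        if timestamp_of(mid) <= target:
            lo = mid
        else:
            hi = mid - 1
    return lo
```

Each step halves the candidate range, so the number of state lookups grows only logarithmically with the number of published diffs, regardless of how unevenly they are spaced in time.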

osmium.replication.OsmosisState 🔗

Bases: typing.NamedTuple

Represents a state file of a replication server.

sequence: int instance-attribute 🔗

The ID of the replication change on the server.

timestamp: dt.datetime instance-attribute 🔗

Date up to which changes are contained in the change file.

osmium.replication.DownloadResult 🔗

Bases: typing.NamedTuple

Downloaded change.

id: int instance-attribute 🔗

The ID of the latest downloaded replication change on the server.

newest: int instance-attribute 🔗

ID of the newest change available on the server.

reader: MergeInputReader instance-attribute 🔗

osmium.MergeInputReader with all downloaded changes.

Replication utils🔗

osmium.replication.newest_change_from_file(filename: str) -> datetime.datetime 🔗

Find the date of the most recent change in a file.

osmium.replication.get_replication_header(fname: str) -> ReplicationHeader 🔗

Scans the given file for an Osmosis replication header. It returns a namedtuple with url, sequence and timestamp. Any or all of the fields may be None if the piece of information is not available. If any of the fields has an invalid format, it is simply ignored.

The given file must exist and be readable for osmium, otherwise a RuntimeError is raised.
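Because any field of the header may be None, code that resumes an update from a file has to handle each case. The sketch below shows one possible decision order. The helper name start_sequence is hypothetical, and it assumes that the header's sequence is the last diff already applied to the file, so the next download starts one id later.

```python
from typing import Any, Optional


def start_sequence(header: Any, server: Any) -> Optional[int]:
    """Pick the sequence id at which to resume downloading diffs.

    `header` is expected to look like a ReplicationHeader and `server`
    like a ReplicationServer.
    """
    if header.sequence is not None:
        # Assumed: the stored sequence was already applied to the file.
        return header.sequence + 1
    if header.timestamp is not None:
        # Fall back to a lookup by date; this may return None as well.
        return server.timestamp_to_sequence(header.timestamp)
    return None  # nothing to go on, the caller must choose a start point


# Usage sketch (needs pyosmium; the file name is a placeholder):
#   from osmium.replication import ReplicationServer, get_replication_header
#   header = get_replication_header("extract.osm.pbf")
#   if header.url is not None:
#       with ReplicationServer(header.url) as svr:
#           start_id = start_sequence(header, svr)
```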

osmium.replication.ReplicationHeader 🔗

Bases: typing.NamedTuple

Description of a replication state.

sequence: Optional[int] instance-attribute 🔗

ID of the change file on the server.

timestamp: Optional[dt.datetime] instance-attribute 🔗

Date of latest changes contained in the diff file.

url: Optional[str] instance-attribute 🔗

Base URL of the replication service.