Replication
Replication server
osmium.replication.ReplicationServer
Represents a connection to a server that publishes replication data. Replication change files allow you to keep local OSM data up to date without downloading the full dataset again.
ReplicationServer may be used as a context manager. In this case, it keeps a connection to the server open internally, which makes downloads faster.
__init__(url: str, diff_type: str = 'osc.gz') -> None
Set up the connection to a replication server.
url contains the base URL of the replication service, that is, the directory that contains the state file with the current state. If the replication service serves files other than osc.gz, set diff_type to their file suffix.
apply_diffs(handler: BaseHandler, start_id: int, max_size: Optional[int] = None, idx: str = '', simplify: bool = True, end_id: Optional[int] = None) -> Optional[int]
Download diffs starting with sequence id start_id, merge them together and then apply them to the handler handler. end_id optionally gives the highest sequence id to download. max_size restricts the amount of diffs that are downloaded: downloaded diffs are temporarily kept in memory, and this parameter ensures that pyosmium doesn't run out of memory. max_size is the maximum size in kB this internal buffer may have.
If neither end_id nor max_size is given, the download is restricted to a maximum size of 1MB. The download also stops when the most recent diff has been processed.
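Because each call stops after max_size kB (1MB when neither limit is given) or at the newest diff, bringing a handler fully up to date usually takes several calls. Below is a minimal sketch of such a loop, assuming server is a ReplicationServer (or any object with the same get_state_info() and apply_diffs() methods) and last_applied is the sequence id of the last diff already processed. The helper name catch_up and the 50MB default are illustrative choices, not part of pyosmium. The loop relies on apply_diffs() returning the id of the last downloaded diff, or None when the download failed completely.

```python
from typing import Any


def catch_up(server: Any, handler: Any, last_applied: int,
             max_kb: int = 50 * 1024) -> int:
    """Feed all diffs newer than last_applied to handler (illustrative helper).

    server is assumed to be an osmium.replication.ReplicationServer.
    Returns the sequence id of the last diff that was applied.
    """
    state = server.get_state_info()  # current state of the server
    if state is None:
        raise RuntimeError("cannot read the replication state")

    while last_applied < state.sequence:
        # Each call downloads at most max_kb kB of unpacked diff data.
        seq = server.apply_diffs(handler, last_applied + 1, max_size=max_kb)
        if seq is None:  # download failed completely, try again later
            break
        last_applied = seq
    return last_applied
```

With a real server, running the loop inside a with ReplicationServer(url) as server: block keeps one connection open across all calls.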
If idx is set, a location cache will be created and applied to
the way nodes. You should be aware that diff files usually do not
contain the complete set of nodes when a way is modified. That means
that you cannot just create a new location cache, apply it to a diff
and expect to get complete way geometries back. Instead you need to
do an initial data import using a persistent location cache to
obtain a full set of node locations and then reuse this location
cache here when applying diffs.
Diffs may contain multiple versions of the same object when it was
changed multiple times during the period covered by the diff. If
simplify is set to False then all versions are returned. If it
is True (the default) then only the most recent version will be
sent to the handler.
The function returns the sequence id of the last diff that was downloaded or None if the download failed completely.
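The effect of the simplify flag can be pictured with a small model. The sketch below is not pyosmium code: it represents each change as a hypothetical (type, id, version) tuple and keeps only the highest version of every object, which corresponds to simplify=True; with simplify=False every tuple would be passed on to the handler.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical change record: (object type, object id, version).
Change = Tuple[str, int, int]


def simplify(changes: Iterable[Change]) -> List[Change]:
    """Keep only the highest version of each (type, id) pair."""
    newest: Dict[Tuple[str, int], Change] = {}
    for otype, oid, version in changes:
        key = (otype, oid)
        if key not in newest or version > newest[key][2]:
            newest[key] = (otype, oid, version)
    return sorted(newest.values())  # sorted only to make the result predictable
```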
apply_diffs_to_file(infile: str, outfile: str, start_id: int, max_size: Optional[int] = None, set_replication_header: bool = True, extra_headers: Optional[Mapping[str, str]] = None, outformat: Optional[str] = None, end_id: Optional[int] = None) -> Optional[Tuple[int, int]]
Download diffs starting with sequence id start_id, merge them with the data from the OSM file named infile and write the result into a file named outfile. The output file must not exist yet.
end_id optionally gives the highest sequence id to download. max_size restricts the amount of diffs that are downloaded: downloaded diffs are kept in memory, and this parameter ensures that pyosmium doesn't run out of memory. max_size is the maximum size in kB this internal buffer may have.
If neither end_id nor max_size is given, the download is restricted to a maximum size of 1MB. The download also stops when the most recent diff has been processed.
If set_replication_header is true, then the URL of the replication server and the sequence id and timestamp of the last diff applied are written into the header of the output file. Note that this currently works only for the PBF format.
extra_headers is a dict with additional header fields to set in the output file. Most notably, the 'generator' field can be set this way.
outformat sets the format of the output file. If None, the format
is determined from the file name.
The function returns a tuple of the last downloaded sequence id and the newest available sequence id if new data has been written, or None if no data was available or the download failed completely.
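A sketch of a single update step for an extract, assuming server is a ReplicationServer and last_applied is the sequence id that the input file corresponds to. The helper name update_extract, the 100MB limit and the generator string are made up for illustration. The second element of the returned tuple tells the caller whether the server has more diffs than were applied in this run.

```python
from typing import Any, Optional, Tuple


def update_extract(server: Any, infile: str, outfile: str,
                   last_applied: int) -> Optional[Tuple[int, bool]]:
    """Write infile plus newer diffs to outfile (illustrative helper).

    Returns (last applied sequence id, whether more diffs are pending),
    or None if nothing was written.
    """
    result = server.apply_diffs_to_file(
        infile, outfile, last_applied + 1,
        max_size=100 * 1024,  # at most ~100MB of unpacked diffs in memory
        extra_headers={'generator': 'my-updater/0.1'},  # hypothetical name
    )
    if result is None:  # no new data, or the download failed completely
        return None
    last_seq, newest = result
    return last_seq, last_seq < newest
```

Since outfile must not exist yet, a caller would typically write to a temporary name and rename it over the original once the call has succeeded.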
close() -> None
Close any open connection to the replication server.
collect_diffs(start_id: int, max_size: Optional[int] = None, end_id: Optional[int] = None) -> Optional[DownloadResult]
Create a MergeInputReader and download diffs starting with sequence id start_id into it. end_id optionally gives the highest sequence number to download. max_size restricts the number of diffs that are downloaded by their size. If neither end_id nor max_size is given, the download stops after 1MB by default.
The download stops as soon as
1. a diff cannot be downloaded or
2. the end_id (inclusive) is reached or
3. the unpacked data in memory exceeds max_size kB or, when neither end_id nor max_size is given, 1024 kB.
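The stop conditions above can be modelled without any network access. In this sketch (not pyosmium's implementation), sizes_kb lists the unpacked size of each published diff from start_id onwards, None stands for a diff that fails to download, and the end of the list stands for the newest diff on the server. The function plan_download is hypothetical; following the wording "exceeds", it assumes that the diff which crosses the size limit is still kept.

```python
from typing import List, Optional

DEFAULT_MAX_KB = 1024  # limit used when neither end_id nor max_size is given


def plan_download(start_id: int, sizes_kb: List[Optional[int]],
                  max_size: Optional[int] = None,
                  end_id: Optional[int] = None) -> Optional[int]:
    """Return the id of the last diff kept, or None if nothing was downloaded."""
    if max_size is None and end_id is None:
        max_size = DEFAULT_MAX_KB
    last: Optional[int] = None
    total_kb = 0
    for offset, size in enumerate(sizes_kb):
        seq = start_id + offset
        if size is None:  # 1. the diff cannot be downloaded
            break
        total_kb += size
        last = seq
        if end_id is not None and seq >= end_id:  # 2. end_id reached (inclusive)
            break
        if max_size is not None and total_kb > max_size:  # 3. buffer too large
            break
    return last  # running out of sizes_kb means the newest diff was reached
```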
If some data was downloaded, returns a namedtuple with three fields: id contains the sequence id of the last downloaded diff, reader contains the MergeInputReader with the data, and newest is the sequence id of the most recent diff available.
Returns None if no new data was available.
If there is an error during the download, the function simply returns the data downloaded so far. If the reported error is a client error (HTTP 4xx) and happens during the download of the first diff, then a requests.HTTPError is raised: this condition is likely to be permanent, and the caller should not simply retry without investigating the cause.
get_diff_block(seq: int) -> str
Downloads the diff with the given sequence number and returns it as a byte sequence. Raises a requests.HTTPError if the file cannot be downloaded.
get_diff_url(seq: int) -> str
Returns the URL to the diff file for the given sequence id.
get_state_info(seq: Optional[int] = None, retries: int = 2) -> Optional[OsmosisState]
Downloads and returns the state information for the given sequence. If seq is None, the current state of the server (the state file in the root directory) is downloaded. If the download is successful, a namedtuple with sequence and timestamp is returned, otherwise the function returns None. retries sets the number of times the download is retried when pyosmium detects a truncated state file.
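Replication servers publish the state as a small Osmosis state.txt file made of key=value lines, with the colons in the timestamp escaped as \:. The sketch below parses such a file into a sequence and a timestamp; parse_state and State are hypothetical stand-ins for pyosmium's internal parsing and for OsmosisState. Treating a file that lacks either field as truncated is an assumption of this sketch, in the spirit of the retries parameter.

```python
import datetime as dt
from typing import Dict, NamedTuple, Optional


class State(NamedTuple):  # stand-in for osmium.replication.OsmosisState
    sequence: int
    timestamp: dt.datetime


def parse_state(text: str) -> Optional[State]:
    """Parse the content of an Osmosis state.txt file (illustrative helper)."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # skip blanks and the date comment
            continue
        key, sep, value = line.partition('=')
        if sep:
            fields[key] = value.replace('\\:', ':')  # undo the escaped colons
    try:
        seq = int(fields['sequenceNumber'])
        ts = dt.datetime.strptime(fields['timestamp'], '%Y-%m-%dT%H:%M:%SZ')
    except (KeyError, ValueError):
        return None  # incomplete, probably truncated, state file
    return State(seq, ts.replace(tzinfo=dt.timezone.utc))
```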
get_state_url(seq: Optional[int]) -> str
Returns the URL of the state.txt file for a given sequence id. If seq is None, the URL for the latest state info is returned, i.e. the state file in the root directory of the replication service.
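OSM replication services lay out their files in a three-level directory tree derived from the sequence number, split into groups of three digits (for example, sequence 2594669 lives under 002/594/669). The helpers below are an illustrative sketch of that layout, not pyosmium's own code; they assume the base URL is given without a trailing slash and that diff_type is the file suffix passed to the constructor.

```python
from typing import Optional


def seq_path(seq: int) -> str:
    """Split a sequence number into the AAA/BBB/CCC directory layout."""
    return f"{seq // 1_000_000:03d}/{seq // 1000 % 1000:03d}/{seq % 1000:03d}"


def diff_url(base: str, seq: int, diff_type: str = 'osc.gz') -> str:
    """URL of the diff file for seq."""
    return f"{base}/{seq_path(seq)}.{diff_type}"


def state_url(base: str, seq: Optional[int] = None) -> str:
    """URL of the state file for seq, or of the latest state if seq is None."""
    if seq is None:  # the latest state lives in the root directory
        return f"{base}/state.txt"
    return f"{base}/{seq_path(seq)}.state.txt"
```

For example, diff_url("https://planet.openstreetmap.org/replication/minute", 2594669) yields https://planet.openstreetmap.org/replication/minute/002/594/669.osc.gz.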
open_url(url: urlrequest.Request) -> Any
Download a resource from the given URL and return a byte sequence of the content.
set_request_parameter(key: str, value: Any) -> None
Set a parameter which will be handed to the requests library when calling requests.get(). This may be used to set custom headers, timeouts and similar parameters. See the requests documentation (https://requests.readthedocs.io/en/latest/api/?highlight=get#requests.request) for possible parameters. By default, a timeout of 60 seconds is set and streaming download is enabled.
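For example, a client that talks to a slow server might raise the timeout and identify itself with its own User-Agent header. Both timeout and headers are regular keyword arguments of requests.get(); the helper name configure_requests, the 300-second value and the User-Agent string are illustrative assumptions.

```python
from typing import Any


def configure_requests(server: Any, contact: str, timeout: int = 300) -> None:
    """Adjust the HTTP settings of a ReplicationServer (illustrative helper)."""
    # Passed on as requests.get(timeout=...), replacing the 60 s default.
    server.set_request_parameter('timeout', timeout)
    # Passed on as requests.get(headers=...).
    server.set_request_parameter(
        'headers', {'User-Agent': f'my-updater/0.1 ({contact})'})
```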
timestamp_to_sequence(timestamp: dt.datetime, balanced_search: bool = False, limit_by_oldest_available: bool = False) -> Optional[int]
Get the sequence number of the replication file that contains the given timestamp. The search algorithm is optimised for replication servers that publish updates at regular intervals. For servers with irregular change file publication dates, balanced_search should be set to True so that a standard binary search for the sequence is used. The default is good for all known OSM replication services.
When limit_by_oldest_available is set, the function returns None when the server's replication sequence does not start at 0 and the given timestamp is older than the oldest available timestamp on the server. Some replication servers do not keep the full history, and this flag avoids accidentally trying to download older data. The downside is that the function will never return the oldest available sequence ID when the flag is set.
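The difference between the two search strategies can be illustrated with a simplified model. find_sequence below is a hypothetical sketch, not pyosmium's implementation: state_time(seq) stands in for reading the timestamp of a state file, and the function returns the largest sequence in the bracket whose timestamp is still before target. With interpolate=True it guesses the split point by assuming diffs are published at a constant rate; with interpolate=False it uses a plain midpoint split, comparable in spirit to balanced_search=True.

```python
import datetime as dt
import math
from typing import Callable


def find_sequence(state_time: Callable[[int], dt.datetime], lower: int,
                  upper: int, target: dt.datetime,
                  interpolate: bool = True) -> int:
    """Return the largest sequence whose state timestamp is before target.

    Assumes state_time is non-decreasing and
    state_time(lower) < target <= state_time(upper).
    """
    lo_ts, up_ts = state_time(lower), state_time(upper)
    while lower + 1 < upper:
        if interpolate:
            # Guess the position assuming a constant publication rate.
            frac = (target - lo_ts) / (up_ts - lo_ts)
            split = lower + math.ceil(frac * (upper - lower))
            split = min(max(split, lower + 1), upper - 1)  # stay inside the bracket
        else:
            split = (lower + upper) // 2  # plain binary search
        split_ts = state_time(split)
        if split_ts < target:
            lower, lo_ts = split, split_ts
        else:
            upper, up_ts = split, split_ts
    return lower
```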
osmium.replication.OsmosisState
osmium.replication.DownloadResult
Bases: typing.NamedTuple
Downloaded change.
id: int
The ID of the latest downloaded replication change on the server.
newest: int
ID of the newest change available on the server.
reader: MergeInputReader
osmium.MergeInputReader with all downloaded changes.
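A sketch of how the three fields are typically used together, assuming server is a ReplicationServer and handler is an osmium handler. apply_pending is a hypothetical helper; it calls the reader's apply() method with simplify=True, which matches the simplify option described for apply_diffs(), but the exact apply() signature should be checked against the MergeInputReader documentation.

```python
from typing import Any, Optional, Tuple


def apply_pending(server: Any, handler: Any,
                  start_id: int) -> Optional[Tuple[int, bool]]:
    """Download diffs from start_id on and apply them to handler.

    Returns (id of the last applied diff, whether newer diffs remain),
    or None if no new data was available.
    """
    result = server.collect_diffs(start_id, max_size=10 * 1024)
    if result is None:
        return None
    # result.reader is an osmium.MergeInputReader holding all downloaded diffs.
    result.reader.apply(handler, simplify=True)
    return result.id, result.id < result.newest
```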
Replication utils
osmium.replication.newest_change_from_file(filename: str) -> datetime.datetime
Find the timestamp of the most recent change in a file.
osmium.replication.get_replication_header(fname: str) -> ReplicationHeader
Scans the given file for an Osmosis replication header. It returns a namedtuple with url, sequence and timestamp. Any or all fields may be None if the piece of information is not available. If any of the fields has an invalid format, it is simply ignored.
The given file must exist and be readable by osmium, otherwise a RuntimeError is raised.
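A common pattern is to work out which replication sequence an existing file corresponds to before updating it. The sketch below prefers the sequence from the replication header and otherwise converts a timestamp with timestamp_to_sequence(); if the header has no timestamp either, it falls back to the newest change found in the file, for example via newest_change_from_file(). find_start_sequence is a hypothetical helper, and this fallback order is a design choice for illustration, not documented pyosmium behaviour.

```python
import datetime as dt
from typing import Any, Callable, Optional


def find_start_sequence(header: Any, server: Any,
                        newest_change: Callable[[], dt.datetime]) -> Optional[int]:
    """Pick the sequence id that a file's data corresponds to (illustrative).

    header: result of get_replication_header(); server: a ReplicationServer.
    newest_change is only called as a fallback, e.g.
    lambda: newest_change_from_file(fname).
    """
    if header.sequence is not None:
        return header.sequence
    ts = header.timestamp
    if ts is None:
        ts = newest_change()  # scan the file's objects for the latest timestamp
    return server.timestamp_to_sequence(ts)
```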
osmium.replication.ReplicationHeader
Bases: typing.NamedTuple
Description of a replication state.