sewerrat package

Submodules

sewerrat.deregister module

sewerrat.deregister.deregister(path, url, retry=3, wait=1, block=True)

Deregister a directory from the SewerRat search index.

Parameters:

path (str) – Path to the directory to be deregistered. Either the directory is readable by the SewerRat API and the caller has write access to it, or the directory does not exist.
url (str) – URL to the SewerRat REST API.
retry (int) – Deprecated, ignored.
wait (float) – Deprecated, ignored.
block (bool) – Whether to block until the SewerRat API confirms successful deregistration.

Returns:

On success, the directory is deregistered.

If block=False, the function returns before the SewerRat API confirms successful deregistration. This can be useful for asynchronous processing of directories with many files.
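A minimal sketch of the non-blocking mode, assuming several directories need to be deregistered in one pass. The helper name deregister_all is hypothetical; the deregister function is passed in as an argument so that the helper can be tried without a running service.

```python
from typing import Callable, Iterable, List


def deregister_all(paths: Iterable[str], url: str, deregister: Callable[..., None]) -> List[str]:
    """Submit each directory for deregistration without waiting for confirmation.

    `deregister` is expected to be sewerrat.deregister.deregister; it is
    injected here (an illustrative choice) rather than imported.
    """
    submitted = []
    for path in paths:
        # block=False returns before the API confirms the deregistration.
        deregister(path, url, block=False)
        submitted.append(path)
    return submitted
```

With the package installed, the real function can be obtained with `from sewerrat.deregister import deregister` and passed as the third argument.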

sewerrat.list_fields module

sewerrat.list_fields.list_fields(url, pattern=None, count=False, number=100, on_truncation='message')

List available fields in the SewerRat database.

Parameters:

url (str) – URL to the SewerRat REST API.
pattern (Optional[str]) – Pattern for filtering fields, using the usual * and ? wildcards. Only fields matching the pattern will be returned. If None, no filtering is performed.
count (bool) – Whether to count the number of metadata files associated with each field.
number (int) – Integer specifying the maximum number of results to return. This can also be float("inf") to retrieve all available results.
on_truncation (Literal['message', 'warning', 'none']) – String specifying the action to take when the number of search results is capped by number.

Returns:

List of dictionaries where each dictionary corresponds to a field and contains:

field, a string containing the field.
count, an integer specifying the number of files associated with the field. This is only present if count=True in the function call.
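The * and ? wildcards accepted by pattern can be previewed locally with Python's fnmatch module, which follows the same convention. This is only an approximation: fnmatch also understands bracket expressions, and the exact matching rules of the SewerRat API (including case sensitivity) are not stated here. The field names below are invented for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical field names, invented for illustration only.
fields = ["author.name", "author.email", "title", "year"]


def matching(names, pattern):
    """Keep the names that match a * / ? wildcard pattern; None keeps everything."""
    if pattern is None:
        return list(names)
    return [name for name in names if fnmatchcase(name, pattern)]


print(matching(fields, "author.*"))  # ['author.name', 'author.email']
print(matching(fields, "?ear"))      # ['year']
```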

sewerrat.list_files module

sewerrat.list_files.list_files(path, url, recursive=True, force_remote=False)

List the contents of a registered directory or a subdirectory thereof.

Parameters:

path (str) – Absolute path of the directory to list.
url (str) – URL to the SewerRat REST API. Only used for remote access.
recursive (bool) – Whether to list the contents recursively. If False, the contents of subdirectories are not listed, and the names of directories are suffixed with / in the returned list.
force_remote (bool) – Whether to force remote access via the API, even if path is on the same filesystem as the caller.

Return type:

List[str]

Returns:

List of strings containing the relative paths of files in path.
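As a sketch of how a non-recursive listing might be consumed, the helper below separates directories from files using the trailing / described for recursive=False. The helper name and the sample listing are hypothetical.

```python
from typing import Iterable, List, Tuple


def split_listing(entries: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Separate a non-recursive listing into (directories, files)."""
    dirs, files = [], []
    for entry in entries:
        if entry.endswith("/"):
            dirs.append(entry[:-1])  # drop the trailing slash
        else:
            files.append(entry)
    return dirs, files


# Hypothetical listing in the shape documented for recursive=False.
listing = ["metadata.json", "raw/", "results/", "README.md"]
print(split_listing(listing))
# (['raw', 'results'], ['metadata.json', 'README.md'])
```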

sewerrat.list_registered_directories module

sewerrat.list_registered_directories.list_registered_directories(url, user=None, contains=None, within=None, prefix=None, exists=None, number=100, on_truncation='message')

List all registered directories in the SewerRat instance.

Parameters:

url (str) – URL to the SewerRat REST API.
user (Union[str, bool, None]) – Name of a user. If not None, this is used to filter the returned directories based on the user who registered them. Alternatively, this can be set to True to automatically use the name of the current user.
contains (Optional[str]) – String containing an absolute path. If not None, results are filtered to directories that contain this path.
within (Optional[str]) – String containing an absolute path. If not None, results are filtered to directories equal to or within this path.
prefix (Optional[str]) – String containing an absolute path or a prefix thereof. If not None, results are filtered to directories starting with this string. This is soft-deprecated and users should use within= instead.
exists (Optional[bool]) – Whether to only report directories that exist on the filesystem. If False, only non-existent directories are reported, and if None, no filtering is applied based on existence.
number (int) – Integer specifying the maximum number of results to return. This can also be float("inf") to retrieve all available results.
on_truncation (Literal['message', 'warning', 'none']) – String specifying the action to take when the number of search results is capped by number.

Returns:

List of dictionaries where each dictionary corresponds to a registered directory and contains:

path, the path to the directory.
user, the name of the user who registered it.
time, the Unix epoch time of the registration.
names, a list containing the names of the metadata files to be indexed.
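The returned records can be summarized with plain Python. The sketch below groups directories by registering user and converts the Unix epoch time to an ISO 8601 string in UTC. The helper name and the sample records are hypothetical and only mimic the documented keys.

```python
from collections import defaultdict
from datetime import datetime, timezone


def directories_by_user(records):
    """Map each user to a list of (path, registration time in UTC) pairs."""
    grouped = defaultdict(list)
    for rec in records:
        when = datetime.fromtimestamp(rec["time"], tz=timezone.utc)
        grouped[rec["user"]].append((rec["path"], when.isoformat()))
    return dict(grouped)


# Hypothetical records using the documented keys.
records = [
    {"path": "/data/projA", "user": "alice", "time": 0, "names": ["metadata.json"]},
    {"path": "/data/projB", "user": "bob", "time": 86400, "names": ["metadata.json"]},
]
print(directories_by_user(records))
```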

sewerrat.list_tokens module

sewerrat.list_tokens.list_tokens(url, pattern=None, field=None, count=False, number=100, on_truncation='message')

List available tokens in the SewerRat database.

Parameters:

url (str) – URL to the SewerRat REST API.
pattern (Optional[str]) – Pattern for filtering tokens, using the usual * and ? wildcards. Only tokens matching the pattern will be returned. If None, no filtering is performed.
field (Optional[str]) – Metadata property field for filtering tokens. Only tokens found in the specified field will be returned. If None, no filtering is performed.
count (bool) – Whether to count the number of metadata files associated with each token.
number (int) – Integer specifying the maximum number of results to return. This can also be float("inf") to retrieve all available results.
on_truncation (Literal['message', 'warning', 'none']) – String specifying the action to take when the number of search results is capped by number.

Returns:

List of dictionaries where each dictionary corresponds to a token and contains:

token, a string containing the token.
count, an integer specifying the number of files associated with the token. This is only present if count=True in the function call.
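When count=True, the result can be ranked to find the most frequent tokens. The following is a small sketch with a hypothetical helper and invented sample data in the documented shape.

```python
def most_common(results, n=3):
    """Return the n (token, count) pairs with the highest counts.

    Ties are broken alphabetically by token so the order is deterministic.
    """
    ranked = sorted(results, key=lambda r: (-r["count"], r["token"]))
    return [(r["token"], r["count"]) for r in ranked[:n]]


# Hypothetical result of a call with count=True.
results = [
    {"token": "mouse", "count": 12},
    {"token": "human", "count": 30},
    {"token": "rat", "count": 12},
]
print(most_common(results, n=2))  # [('human', 30), ('mouse', 12)]
```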

sewerrat.query module

sewerrat.query.query(url, text=None, user=None, path=None, after=None, before=None, metadata=True, number=100, on_truncation='message')

Query the metadata in the SewerRat backend based on free text, the owner, creation time, etc. This function does not require filesystem access.

Parameters:

url (str) – String containing the URL to the SewerRat REST API.
text (Optional[str]) – String containing a free-text query, following the syntax described here. If None, no filtering is applied based on the metadata text.
user (Optional[str]) – String containing the name of the user who generated the metadata. If None, no filtering is applied based on the user.
path (Optional[str]) – String containing any component of the path to the metadata file. If None, no filtering is applied based on the path.
after (Optional[int]) – Integer containing a Unix time in seconds, where only files newer than after will be retained. If None, no filtering is applied to remove old files.
before (Optional[int]) – Integer containing a Unix time in seconds, where only files older than before will be retained. If None, no filtering is applied to remove new files.
metadata (bool) – Whether to return the metadata of each file. This can be set to False for better performance if only the path is of interest.
number (int) – Integer specifying the maximum number of results to return. This can also be float("inf") to retrieve all available results.
on_truncation (Literal['message', 'warning', 'none']) – String specifying the action to take when the number of search results is capped by number.

Returns:

List of dictionaries where each dictionary corresponds to a metadata file and contains:

path, a string containing the path to the file.
user, the identity of the file owner.
time, the Unix time of most recent file modification.
metadata, a list representing the JSON contents of the file. Only reported if metadata=True in the function call.
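Because after and before take integer Unix times in seconds, it can help to derive them from datetime objects. The sketch below builds both bounds for a trailing window of days. The helper names unix_seconds and window are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def unix_seconds(moment: datetime) -> int:
    """Convert a timezone-aware datetime to integer Unix seconds."""
    if moment.tzinfo is None:
        raise ValueError("expected a timezone-aware datetime")
    return int(moment.timestamp())


def window(now: datetime, days: int) -> dict:
    """Build after/before keyword arguments covering the last `days` days."""
    return {
        "after": unix_seconds(now - timedelta(days=days)),
        "before": unix_seconds(now),
    }


print(window(datetime(1970, 1, 8, tzinfo=timezone.utc), days=7))
# {'after': 0, 'before': 604800}
```

The resulting dictionary could then be unpacked into a call such as `query(url, text=..., **window(datetime.now(timezone.utc), 7))`, assuming a running SewerRat API at url.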

sewerrat.register module

sewerrat.register.register(path, names, url, retry=3, wait=1, block=True)

Register a directory with the SewerRat search index.

Parameters:

path (str) – Path to the directory to be registered. The directory should be readable by the SewerRat API and the caller should have write access.
names (Union[str, List[str]]) – List of strings containing the base names of metadata files inside path to be indexed. Alternatively, a single string containing the base name for a single metadata file.
url (str) – URL to the SewerRat REST API.
retry (int) – Deprecated, ignored.
wait (int) – Deprecated, ignored.
block (bool) – Whether to block until the SewerRat API confirms successful registration.

Returns:

On success, the directory is registered. If a metadata file cannot be indexed (e.g., due to incorrect formatting, insufficient permissions), a warning will be printed but the function will not throw an error.

If block=False, the function returns before the SewerRat API confirms successful registration. This can be useful for asynchronous processing of directories with many files.
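A rough end-to-end sketch of registration, assuming metadata is stored as JSON. The file name metadata.json, its contents, and both helper names are hypothetical. Note that tempfile.mkdtemp creates a directory readable only by the current user, so this is only suitable when the SewerRat service runs as the same user (for example, a local test service).

```python
import json
import os
import tempfile


def make_demo_directory() -> str:
    """Create a temporary directory holding one JSON metadata file."""
    path = tempfile.mkdtemp()
    with open(os.path.join(path, "metadata.json"), "w", encoding="utf-8") as handle:
        json.dump({"title": "demo", "keywords": ["example"]}, handle)
    return path


def register_demo(url: str) -> str:
    """Register the demo directory; needs the sewerrat package and a running API."""
    # Imported lazily so the rest of this sketch works without the package.
    from sewerrat.register import register

    path = make_demo_directory()
    register(path, names=["metadata.json"], url=url)
    return path
```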

sewerrat.retrieve_directory module

sewerrat.retrieve_directory.retrieve_directory(path, url, cache=None, force_remote=False, overwrite=False, concurrent=1, update_delay=3600)

Obtain the path to a registered directory or one of its subdirectories. This may create a local copy of the directory’s contents if the caller is not on the same filesystem.

Parameters:

path (str) – Relative path to a registered directory or its subdirectories.
url (str) – URL to the SewerRat REST API. Only used for remote queries.
cache (Optional[str]) – Path to a cache directory. If None, an appropriate location is automatically chosen. Only used for remote access.
force_remote (bool) – Whether to force remote access. This will download all files in the path via the REST API and cache them locally, even if path is present on the same filesystem.
overwrite (bool) – Whether to overwrite existing files in the cache.
concurrent (int) – Number of concurrent downloads.
update_delay (int) – Delay interval, in seconds, before checking for updates in a cached directory. Only used for remote access.

Return type:

str

Returns:

Path to the subdirectory on the caller’s filesystem. This is either path if it is accessible, or a path to a local cache of the directory’s contents otherwise.
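Since the return value is an ordinary filesystem path, the retrieved directory can be traversed with the standard library. The sketch below lists its files relative to the returned path. The helper name local_files is hypothetical, and the retrieval function is injected so the example does not require a running service.

```python
import os
from typing import Callable, List


def local_files(path: str, url: str, retrieve: Callable[..., str]) -> List[str]:
    """List files under a retrieved directory, relative to its local path.

    `retrieve` is expected to be sewerrat.retrieve_directory.retrieve_directory.
    """
    local = retrieve(path, url)
    found = []
    for root, _dirs, names in os.walk(local):
        for name in names:
            found.append(os.path.relpath(os.path.join(root, name), local))
    return sorted(found)
```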

sewerrat.retrieve_file module

sewerrat.retrieve_file.retrieve_file(path, url, cache=None, force_remote=False, overwrite=False)

Retrieve the path to a single file in a registered directory. This will call the REST API if the caller is not on the same filesystem.

Parameters:

path (str) – Relative path to a file in a registered directory or its subdirectories.
url (str) – URL to the SewerRat REST API. Only used for remote queries.
cache (Optional[str]) – Path to a cache directory. If None, an appropriate location is automatically chosen. Only used for remote access.
force_remote (bool) – Whether to force remote access. This will download path via the REST API and cache it locally, even if path is present on the same filesystem.
overwrite (bool) – Whether to overwrite existing files in the cache.

Return type:

str

Returns:

Path to the file on the caller’s filesystem. This is either path if it is accessible, or a path to a local copy otherwise.
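A short sketch of reading a retrieved file, assuming it contains JSON. The helper name load_json is hypothetical, and the retrieval function is passed in so that the example works without the service.

```python
import json
from typing import Any, Callable


def load_json(path: str, url: str, retrieve: Callable[..., str]) -> Any:
    """Resolve a registered file to a local path and parse it as JSON.

    `retrieve` is expected to be sewerrat.retrieve_file.retrieve_file.
    """
    local = retrieve(path, url)
    with open(local, "r", encoding="utf-8") as handle:
        return json.load(handle)
```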

sewerrat.retrieve_metadata module

sewerrat.retrieve_metadata.retrieve_metadata(path, url)

Retrieve a single metadata entry in a registered directory from the SewerRat API.

Parameters:

path (str) – Absolute path to a metadata file in a registered directory.
url (str) – URL to the SewerRat REST API.

Returns:

Dictionary containing:

path, the path to the metadata file.
user, the identity of the owning user.
time, the Unix time at which the file was modified.
metadata, the loaded metadata, typically another dictionary representing a JSON object.
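The returned dictionary can be turned into a one-line summary. The sketch below is illustrative only: the helper name describe and the sample entry are hypothetical, and the metadata value is assumed to be a dictionary, which is described above as typical rather than guaranteed.

```python
from datetime import datetime, timezone


def describe(entry: dict) -> str:
    """Summarize one metadata entry as 'path (user, date): key, key, ...'."""
    modified = datetime.fromtimestamp(entry["time"], tz=timezone.utc)
    meta = entry["metadata"]
    # Only list top-level keys when the metadata is a JSON object.
    keys = sorted(meta) if isinstance(meta, dict) else []
    return f"{entry['path']} ({entry['user']}, {modified:%Y-%m-%d}): {', '.join(keys)}"


# Hypothetical entry using the documented keys.
entry = {
    "path": "/data/projA/metadata.json",
    "user": "alice",
    "time": 86400,
    "metadata": {"year": 2024, "title": "demo"},
}
print(describe(entry))
# /data/projA/metadata.json (alice, 1970-01-02): title, year
```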

sewerrat.start_sewerrat module

sewerrat.start_sewerrat.start_sewerrat(db=None, port=None, wait=1, version='1.3.1', whitelist=None, overwrite=False)

Start a test SewerRat service.

Parameters:

db (Optional[str]) – Path to a SQLite database. If None, one is automatically created.
port (Optional[int]) – An available port. If None, one is automatically chosen.
wait (float) – Number of seconds to wait for the service to initialize before use.
version (str) – Version of the service to run.
whitelist (Optional[list]) – List of users who can create symbolic links that will be followed by the SewerRat service. If None, this defaults to the current user and the owner of the temporary directory.
overwrite (bool) – Whether to overwrite the existing SewerRat binary.

Return type:

Tuple[bool, int]

Returns:

A tuple indicating whether a new test service was created (or an existing instance was re-used) and its URL. If a service is already running, this function is a no-op and the configuration details of the existing service will be returned.

sewerrat.start_sewerrat.stop_sewerrat()

Stop the SewerRat test service started by start_sewerrat(). If no test service was running, this function is a no-op.
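One possible way to pair the two functions is a context manager that stops the test service on exit. This is a sketch, not part of the package: the name running_service is hypothetical, the start and stop callables are injected, and the second element of the returned tuple is treated as the URL, following the Returns description above. Stopping only when a new service was created is a design choice made here so that an instance started elsewhere is left running.

```python
from contextlib import contextmanager


@contextmanager
def running_service(start, stop):
    """Yield the URL of a test service, stopping it afterwards if we started it.

    `start` and `stop` are expected to be start_sewerrat and stop_sewerrat.
    """
    created, url = start()
    try:
        yield url
    finally:
        if created:
            stop()  # leave a pre-existing instance untouched
```

With the package installed, this could be used as `with running_service(start_sewerrat, stop_sewerrat) as url: ...`, with both functions imported from sewerrat.start_sewerrat.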
