sewerrat package
Submodules
sewerrat.deregister module
- sewerrat.deregister.deregister(path, url, retry=3, wait=1, block=True)
Deregister a directory from the SewerRat search index.
- Parameters:
path (str) – Path to the directory to be deregistered. Either the directory should be readable by the SewerRat API and the caller should have write access to it, or the directory should not exist.
url (str) – URL to the SewerRat REST API.
retry (int) – Deprecated, ignored.
wait (float) – Deprecated, ignored.
block (bool) – Whether to block until the SewerRat API confirms successful deregistration.
- Returns:
On success, the directory is deregistered.
If block=False, the function returns before confirmation of successful deregistration from the SewerRat API. This can be useful for asynchronous processing of directories with many files.
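As an illustration of the non-blocking mode, the following minimal sketch deregisters several directories with block=False. The helper name deregister_many and the deregister_fn argument are hypothetical and exist only so the sketch can be exercised without a running service. The behaviour on failure (raising an exception) is an assumption, not something stated above.

```python
def deregister_many(paths, url, block=False, deregister_fn=None):
    """Deregister each directory in `paths`; return (path, error) pairs for failures."""
    if deregister_fn is None:
        # Imported lazily so the sketch can be loaded without the package installed.
        from sewerrat.deregister import deregister as deregister_fn

    failed = []
    for path in paths:
        try:
            # retry= and wait= are documented as deprecated and ignored, so omit them.
            deregister_fn(path, url=url, block=block)
        except Exception as exc:  # assumption: failures surface as exceptions
            failed.append((path, str(exc)))
    return failed
```

With block=False, a call that returns without error only means the request was submitted, so the returned list cannot be taken as proof that every other directory was deregistered.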
sewerrat.list_fields module
- sewerrat.list_fields.list_fields(url, pattern=None, count=False, number=100, on_truncation='message')
List available fields in the SewerRat database.
- Parameters:
url (str) – URL to the SewerRat REST API.
pattern (Optional[str]) – Pattern for filtering fields, using the usual * and ? wildcards. Only fields matching the pattern are returned. If None, no filtering is performed.
count (bool) – Whether to count the number of metadata files associated with each field.
number (int) – Integer specifying the maximum number of results to return. This can also be float("inf") to retrieve all available results.
on_truncation (Literal['message', 'warning', 'none']) – String specifying the action to take when the number of search results is capped by number.
- Returns:
List of dictionaries, where each dictionary corresponds to a field and contains:
field, string containing the field.
count, integer specifying the number of files associated with the field. This is only present if count=True in the function call.
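To make the * and ? wildcards concrete, here is a small local sketch that applies the same kind of pattern to a result list using Python's fnmatch module. The sample data and the helper filter_fields are hypothetical. The actual matching is performed by the SewerRat API and may differ in detail (for example, fnmatch also gives special meaning to square brackets).

```python
from fnmatch import fnmatchcase


def filter_fields(results, pattern=None):
    """Keep entries whose 'field' matches a pattern with * and ? wildcards."""
    if pattern is None:
        # Mirrors the documented behaviour: no pattern means no filtering.
        return list(results)
    return [r for r in results if fnmatchcase(r["field"], pattern)]


# Hypothetical return value of list_fields(url, count=True).
sample_fields = [
    {"field": "title", "count": 12},
    {"field": "authors.name", "count": 7},
    {"field": "authors.email", "count": 5},
]

# '*' matches any run of characters, '?' matches exactly one character.
author_fields = filter_fields(sample_fields, "authors.*")
title_fields = filter_fields(sample_fields, "t?tle")
```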
sewerrat.list_files module
- sewerrat.list_files.list_files(path, url, recursive=True, force_remote=False)
List the contents of a registered directory or a subdirectory thereof.
- Parameters:
path (str) – Absolute path of the directory to list.
url (str) – URL to the SewerRat REST API. Only used for remote access.
recursive (bool) – Whether to list the contents recursively. If False, the contents of subdirectories are not listed, and the names of directories are suffixed with / in the returned list.
force_remote (bool) – Whether to force remote access via the API, even if path is on the same filesystem as the caller.
- Returns:
List of strings containing the relative paths of files in path.
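With recursive=False, directories can be told apart from files by their trailing /. The sketch below separates the two. The listing shown is made-up sample data and split_listing is a hypothetical helper, not part of the package.

```python
def split_listing(listing):
    """Split a non-recursive listing into (directories, files)."""
    # Directory names carry a trailing '/', which is stripped here.
    directories = [name[:-1] for name in listing if name.endswith("/")]
    files = [name for name in listing if not name.endswith("/")]
    return directories, files


# Hypothetical return value of list_files(path, url, recursive=False).
sample_listing = ["metadata.json", "results/", "README.md", "raw/"]
directories, files = split_listing(sample_listing)
```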
sewerrat.list_registered_directories module
- sewerrat.list_registered_directories.list_registered_directories(url, user=None, contains=None, within=None, prefix=None, exists=None, number=100, on_truncation='message')
List all registered directories in the SewerRat instance.
- Parameters:
url (str) – URL to the SewerRat REST API.
user (Union[str, bool, None]) – Name of a user. If not None, this is used to filter the returned directories based on the user who registered them. Alternatively, this can be set to True to automatically use the name of the current user.
contains (Optional[str]) – String containing an absolute path. If not None, results are filtered to directories that contain this path.
within (Optional[str]) – String containing an absolute path. If not None, results are filtered to directories equal to or within this path.
prefix (Optional[str]) – String containing an absolute path or a prefix thereof. If not None, results are filtered to directories starting with this string. This is soft-deprecated and users should use within= instead.
exists (Optional[bool]) – Whether to only report directories that exist on the filesystem. If False, only non-existent directories are reported, and if None, no filtering is applied based on existence.
number (int) – Integer specifying the maximum number of results to return. This can also be float("inf") to retrieve all available results.
on_truncation (Literal['message', 'warning', 'none']) – String specifying the action to take when the number of search results is capped by number.
- Returns:
List of dictionaries, where each dictionary corresponds to a registered directory and contains:
path, the path to the directory.
user, the name of the user who registered it.
time, the Unix epoch time of the registration.
names, a list containing the names of the metadata files to be indexed.
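The returned dictionaries can be post-processed with ordinary Python. As a sketch, the following groups registrations by user and converts the Unix epoch time into an ISO 8601 timestamp in UTC. The entries are invented sample data and summarize_registrations is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime, timezone


def summarize_registrations(entries):
    """Group (path, ISO timestamp) pairs by the registering user."""
    by_user = defaultdict(list)
    for entry in entries:
        # 'time' is documented as the Unix epoch time of the registration.
        when = datetime.fromtimestamp(entry["time"], tz=timezone.utc)
        by_user[entry["user"]].append((entry["path"], when.isoformat()))
    return dict(by_user)


# Hypothetical return value of list_registered_directories(url).
sample_entries = [
    {"path": "/data/a", "user": "alice", "time": 1700000000, "names": ["metadata.json"]},
    {"path": "/data/b", "user": "bob", "time": 0, "names": ["meta.json"]},
]
summary = summarize_registrations(sample_entries)
```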
sewerrat.list_tokens module
- sewerrat.list_tokens.list_tokens(url, pattern=None, field=None, count=False, number=100, on_truncation='message')
List available tokens in the SewerRat database.
- Parameters:
url (str) – URL to the SewerRat REST API.
pattern (Optional[str]) – Pattern for filtering tokens, using the usual * and ? wildcards. Only tokens matching the pattern are returned. If None, no filtering is performed.
field (Optional[str]) – Metadata property field for filtering tokens. Only tokens found in the specified field are returned. If None, no filtering is performed.
count (bool) – Whether to count the number of metadata files associated with each token.
number (int) – Integer specifying the maximum number of results to return. This can also be float("inf") to retrieve all available results.
on_truncation (Literal['message', 'warning', 'none']) – String specifying the action to take when the number of search results is capped by number.
- Returns:
List of dictionaries, where each dictionary corresponds to a token and contains:
token, string containing the token.
count, integer specifying the number of files associated with the token. This is only present if count=True in the function call.
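When count=True, the counts make it easy to rank tokens by how many files mention them. A minimal sketch, using invented sample data and a hypothetical helper top_tokens:

```python
def top_tokens(results, n=3):
    """Return the n tokens with the highest file counts (ties broken alphabetically)."""
    # 'count' is only present when count=True was requested, so default to 0.
    ranked = sorted(results, key=lambda r: (-r.get("count", 0), r["token"]))
    return [r["token"] for r in ranked[:n]]


# Hypothetical return value of list_tokens(url, count=True).
sample_tokens = [
    {"token": "mouse", "count": 4},
    {"token": "human", "count": 9},
    {"token": "zebrafish", "count": 4},
]
most_common = top_tokens(sample_tokens, n=2)
```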
sewerrat.query module
- sewerrat.query.query(url, text=None, user=None, path=None, after=None, before=None, metadata=True, number=100, on_truncation='message')
Query the metadata in the SewerRat backend based on free text, the owner, creation time, etc. This function does not require filesystem access.
- Parameters:
url (str) – String containing the URL to the SewerRat REST API.
text (Optional[str]) – String containing a free-text query, following the SewerRat free-text query syntax. If None, no filtering is applied based on the metadata text.
user (Optional[str]) – String containing the name of the user who generated the metadata. If None, no filtering is applied based on the user.
path (Optional[str]) – String containing any component of the path to the metadata file. If None, no filtering is applied based on the path.
after (Optional[int]) – Integer containing a Unix time in seconds, where only files newer than after will be retained. If None, no filtering is applied to remove old files.
before (Optional[int]) – Integer containing a Unix time in seconds, where only files older than before will be retained. If None, no filtering is applied to remove new files.
metadata (bool) – Whether to return the metadata of each file. This can be set to False for better performance if only the path is of interest.
number (int) – Integer specifying the maximum number of results to return. This can also be float("inf") to retrieve all available results.
on_truncation (Literal['message', 'warning', 'none']) – String specifying the action to take when the number of search results is capped by number.
- Returns:
List of dictionaries, where each dictionary corresponds to a metadata file and contains:
path, a string containing the path to the file.
user, the identity of the file owner.
time, the Unix time of most recent file modification.
metadata, a list representing the JSON contents of the file. Only reported if metadata=True in the function call.
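The after and before arguments take Unix times in seconds, so a "last N days" search needs a small conversion. The sketch below shows one way to do it. The helpers cutoff_for_days and query_recent, and the query_fn argument used for testing, are hypothetical. Only the keyword arguments passed to query come from the signature above.

```python
import time


def cutoff_for_days(days, now=None):
    """Unix time (seconds) that lies `days` days before `now`."""
    now = int(time.time()) if now is None else int(now)
    return now - days * 86400


def query_recent(url, text, days=7, now=None, query_fn=None):
    """Return the paths of metadata files matching `text` and modified recently."""
    if query_fn is None:
        # Imported lazily so the sketch can be loaded without the package installed.
        from sewerrat.query import query as query_fn

    results = query_fn(
        url,
        text=text,
        after=cutoff_for_days(days, now),
        metadata=False,          # only the paths are needed here
        number=float("inf"),     # documented way to retrieve all results
        on_truncation="none",
    )
    return [entry["path"] for entry in results]
```

Setting metadata=False follows the performance note in the parameter description, since this helper only uses the path of each result.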
sewerrat.register module
- sewerrat.register.register(path, names, url, retry=3, wait=1, block=True)
Register a directory into the SewerRat search index.
- Parameters:
path (str) – Path to the directory to be registered. The directory should be readable by the SewerRat API and the caller should have write access.
names (Union[str, List[str]]) – List of strings containing the base names of metadata files inside path to be indexed. Alternatively, a single string containing the base name for a single metadata file.
url (str) – URL to the SewerRat REST API.
retry (int) – Deprecated, ignored.
wait (int) – Deprecated, ignored.
block (bool) – Whether to block until the SewerRat API confirms successful registration.
- Returns:
On success, the directory is registered. If a metadata file cannot be indexed (e.g., due to incorrect formatting, insufficient permissions), a warning will be printed but the function will not throw an error.
If block=False, the function returns before confirmation of successful registration from the SewerRat API. This can be useful for asynchronous processing of directories with many files.
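A typical workflow is to write a metadata file into a directory and then register the directory under that file's base name. The sketch below assumes JSON metadata and uses hypothetical helpers (write_metadata, register_directory) plus an injectable register_fn for testing. Whether the chosen directory is readable by the SewerRat API depends on the deployment and is not checked here.

```python
import json
import os


def write_metadata(directory, name, payload):
    """Write `payload` as JSON to directory/name and return the full path."""
    target = os.path.join(directory, name)
    with open(target, "w") as handle:
        json.dump(payload, handle)
    return target


def register_directory(directory, names, url, block=True, register_fn=None):
    """Register `directory`, indexing metadata files with the given base names."""
    if register_fn is None:
        # Imported lazily so the sketch can be loaded without the package installed.
        from sewerrat.register import register as register_fn

    if isinstance(names, str):
        names = [names]
    # names= is documented as base names, so reject anything with a directory part.
    bad = [n for n in names if os.path.basename(n) != n]
    if bad:
        raise ValueError(f"expected base names, got: {bad}")
    register_fn(directory, names=names, url=url, block=block)
    return names
```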
sewerrat.retrieve_directory module
- sewerrat.retrieve_directory.retrieve_directory(path, url, cache=None, force_remote=False, overwrite=False, concurrent=1, update_delay=3600)
Obtain the path to a registered directory or one of its subdirectories. This may create a local copy of the directory’s contents if the caller is not on the same filesystem.
- Parameters:
path (str) – Relative path to a registered directory or its subdirectories.
url (str) – URL to the SewerRat REST API. Only used for remote queries.
cache (Optional[str]) – Path to a cache directory. If None, an appropriate location is automatically chosen. Only used for remote access.
force_remote (bool) – Whether to force remote access. This will download all files in path via the REST API and cache them locally, even if path is present on the same filesystem.
overwrite (bool) – Whether to overwrite existing files in the cache.
concurrent (int) – Number of concurrent downloads.
update_delay (int) – Delay interval, in seconds, before checking for updates in a cached directory. Only used for remote access.
- Returns:
Path to the subdirectory on the caller’s filesystem. This is either path if it is accessible, or a path to a local cache of the directory’s contents otherwise.
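Because the returned path is always usable on the caller's filesystem, downstream code does not need to know whether the files were read in place or downloaded into the cache. A minimal sketch, with hypothetical helpers list_local_files and fetch_and_list and an injectable retrieve_fn for testing:

```python
import os


def list_local_files(root):
    """Return the sorted paths of all files under `root`, relative to `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def fetch_and_list(path, url, retrieve_fn=None, **kwargs):
    """Resolve a registered directory to a local path and list its files."""
    if retrieve_fn is None:
        # Imported lazily so the sketch can be loaded without the package installed.
        from sewerrat.retrieve_directory import retrieve_directory as retrieve_fn

    # Extra keyword arguments (e.g. cache=, force_remote=) are passed through as-is.
    local = retrieve_fn(path, url=url, **kwargs)
    return local, list_local_files(local)
```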
sewerrat.retrieve_file module
- sewerrat.retrieve_file.retrieve_file(path, url, cache=None, force_remote=False, overwrite=False)
Retrieve the path to a single file in a registered directory. This will call the REST API if the caller is not on the same filesystem.
- Parameters:
path – Relative path to a file in a registered directory or its subdirectories.
url – URL to the SewerRat REST API. Only used for remote queries.
cache (Optional[str]) – Path to a cache directory. If None, an appropriate location is automatically chosen. Only used for remote access.
force_remote (bool) – Whether to force remote access. This will download path via the REST API and cache it locally, even if path is present on the same filesystem.
overwrite (bool) – Whether to overwrite existing files in the cache.
- Returns:
Path to the file on the caller’s filesystem. This is either path if it is accessible, or a path to a local copy otherwise.
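Since the result is an ordinary local path, the file can be opened with standard tools. The sketch below loads a JSON file this way. The helper load_json_file and its retrieve_fn argument are hypothetical, and the assumption that the file contains JSON is specific to this example.

```python
import json


def load_json_file(path, url, retrieve_fn=None, **kwargs):
    """Resolve `path` to a local file and parse it as JSON."""
    if retrieve_fn is None:
        # Imported lazily so the sketch can be loaded without the package installed.
        from sewerrat.retrieve_file import retrieve_file as retrieve_fn

    # The returned path is either the original path or a cached local copy.
    local = retrieve_fn(path, url=url, **kwargs)
    with open(local) as handle:
        return json.load(handle)
```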
sewerrat.retrieve_metadata module
- sewerrat.retrieve_metadata.retrieve_metadata(path, url)
Retrieve a single metadata entry in a registered directory from the SewerRat API.
- Parameters:
- Returns:
Dictionary containing:
path, the path to the metadata file.
user, the identity of the owning user.
time, the Unix time at which the file was modified.
metadata, the loaded metadata, typically another dictionary representing a JSON object.
sewerrat.start_sewerrat module
- sewerrat.start_sewerrat.start_sewerrat(db=None, port=None, wait=1, version='1.2.0', overwrite=False)
Start a test SewerRat service.
- Parameters:
db (Optional[str]) – Path to a SQLite database. If None, one is automatically created.
port (Optional[int]) – An available port. If None, one is automatically chosen.
wait (float) – Number of seconds to wait for the service to initialize before use.
version (str) – Version of the service to run.
overwrite (bool) – Whether to overwrite the existing SewerRat binary.
- Returns:
A tuple indicating whether a new test service was created (as opposed to re-using an existing instance) and the URL of the service. If a service is already running, this function is a no-op and the configuration details of the existing service will be returned.
- sewerrat.start_sewerrat.stop_sewerrat()
Stop the SewerRat test service started by start_sewerrat(). If no test service was running, this function is a no-op.
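In tests it is convenient to pair start_sewerrat() with stop_sewerrat() so the service is shut down even when an assertion fails. One possible sketch wraps the pair in a context manager. The name sewerrat_service and the start_fn and stop_fn arguments are hypothetical, and unpacking the return value as (created, url) is an assumption based on the description of the returned tuple.

```python
from contextlib import contextmanager


@contextmanager
def sewerrat_service(start_fn=None, stop_fn=None, **start_kwargs):
    """Yield the URL of a test service, stopping it afterwards if it was created here."""
    if start_fn is None or stop_fn is None:
        # Imported lazily so the sketch can be loaded without the package installed.
        from sewerrat.start_sewerrat import start_sewerrat, stop_sewerrat
        start_fn = start_fn or start_sewerrat
        stop_fn = stop_fn or stop_sewerrat

    # Assumption: the tuple is (whether a new service was created, URL).
    created, url = start_fn(**start_kwargs)
    try:
        yield url
    finally:
        # Leave a pre-existing service running; only stop one started by this block.
        if created:
            stop_fn()
```

Stopping only when a new service was created is a design choice of this sketch, so that nested uses do not shut down a service another caller still depends on.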