gypsum_client package

Submodules

gypsum_client.auth module

gypsum_client.auth.access_token(full=False, request=True, cache_dir='/home/runner/.cache/gypsum', token_expiration_limit=10)[source]

Get a GitHub access token for authentication to the gypsum API.

Example

token = access_token()
Parameters:
  • full (bool) – Whether to return the full token details. Defaults to False, in which case only the token string is returned.

  • request (bool) – Whether to request a new token if no token is found or the current token is expired. Defaults to True.

  • cache_dir (Optional[str]) – Path to the cache directory to store tokens. Can be set to None, indicating token is not cached to disk.

  • token_expiration_limit (int) – Integer specifying the number of seconds until the token expires.

Return type:

Union[str, dict, None]

Returns:

If full=False, a string containing the GitHub token used to access gypsum’s resources.

If full=True, a dictionary containing the full token details.
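The role of token_expiration_limit can be illustrated with a small sketch. This is not the library's implementation: the helper name is_token_usable and the cached-token layout (a dictionary with an expires Unix timestamp, mirroring the keys documented for set_access_token()) are assumptions made for illustration.

```python
import time
from typing import Optional


def is_token_usable(token_info: Optional[dict], expiration_limit: int = 10,
                    now: Optional[float] = None) -> bool:
    """Hypothetical helper: True if a cached token still has more than
    `expiration_limit` seconds left before its `expires` Unix time."""
    if token_info is None:
        return False
    current = time.time() if now is None else now
    return token_info["expires"] - current > expiration_limit


# Hypothetical cached token that expires at Unix time 1000.
cached = {"token": "ghp_example", "name": "someone", "expires": 1000}
print(is_token_usable(cached, now=900))  # 100 seconds left
print(is_token_usable(cached, now=995))  # only 5 seconds left
```

Under this reading, a token that is about to expire is treated the same as a missing one, so a new token would be requested when request=True.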

gypsum_client.auth.set_access_token(token='auto', app_url='https://gypsum.artifactdb.com', app_key=None, app_secret=None, github_url='https://api.github.com', user_agent=None, cache_dir='/home/runner/.cache/gypsum')[source]

Set a GitHub access token for authentication to the gypsum API.

Parameters:
  • token (str) – A string containing a GitHub personal access token.

  • app_url (str) – URL to the gypsum REST API.

  • app_key (Optional[str]) – Key to the GitHub OAuth app.

  • app_secret (Optional[str]) – Secret to the GitHub OAuth app.

  • github_url (str) – URL to GitHub’s API.

  • user_agent (Optional[str]) – Specify the user agent for queries to various endpoints.

  • cache_dir (Optional[str]) – Path to the cache directory to store tokens. Can be set to None, indicating that the token is not cached to disk.

Return type:

dict

Returns:

Dictionary containing the following keys:

  • token, a string containing the token.

  • name, the name of the GitHub user authenticated by the token.

  • expires, the Unix time at which the token expires.

gypsum_client.cache_directory module

gypsum_client.cache_directory.cache_directory(dir=None)[source]

Cache directory.

Specify the cache directory in the local filesystem for gypsum-related data.

If the GYPSUM_CACHE_DIR environment variable is set before the first call to cache_directory(), it is used as the initial location of the cache directory. Otherwise, the initial location is set to the user’s cache directory as defined by appdirs.user_cache_dir().

Parameters:

dir (Optional[str]) – Path to the current cache directory. If None, a default cache location is chosen.

Return type:

str

Returns:

If dir is None, the path to the cache directory is returned.

If dir is supplied, it is used to set the path to the cache directory, and the previous location of the directory is returned.
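The get-or-set behavior described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the package's code: the function name cache_directory_sketch and the module-level variable are hypothetical, and a temporary-directory fallback stands in for appdirs.user_cache_dir() so that the example needs only the standard library.

```python
import os
import tempfile
from typing import Optional

# Hypothetical module-level state holding the current cache location.
_current_dir: Optional[str] = None


def cache_directory_sketch(dir: Optional[str] = None) -> str:
    """Return the current cache path; if `dir` is given, switch to it
    and return the previous location instead."""
    global _current_dir
    if _current_dir is None:
        # First call: honor the environment variable, else use a fallback.
        fallback = os.path.join(tempfile.gettempdir(), "gypsum-cache-sketch")
        _current_dir = os.environ.get("GYPSUM_CACHE_DIR", fallback)
    previous = _current_dir
    if dir is not None:
        _current_dir = dir
    return previous


initial = cache_directory_sketch()           # getter: current location
old = cache_directory_sketch("/tmp/other")   # setter: returns previous location
```

Returning the previous location from the setter makes it easy to restore the original cache directory after a temporary change.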

gypsum_client.clone_operations module

Clone a version’s directory structure.

Cloning of a versioned asset involves creating a directory at the destination that has the same contents as the corresponding project-asset-version directory. All files in the specified version are represented as symlinks from the destination to the corresponding file in the cache. The idea is that, when the destination is used in prepare_directory_upload(), the symlinks are converted into upload links, i.e., links= in start_upload(). This allows users to create new versions very cheaply as duplicate files are not uploaded to/stored in the backend.

Users can more-or-less do whatever they want inside the cloned destination, but they should treat the symlink targets as read-only. That is, they should not modify the contents of the linked-to file, as these refer to assumed-immutable files in the cache. If a file in the destination needs to be modified, the symlink should be deleted and replaced with an actual file; this avoids mutating the cache and it ensures that prepare_directory_upload() recognizes that a new file actually needs to be uploaded.

Advanced users can set download=False, in which case symlinks are created even if their targets are not present in the cache. In such cases, the destination should be treated as write-only due to the potential presence of dangling symlinks. This mode is useful for uploading a new version of an asset without downloading the files from the existing version, assuming that the modifications associated with the former can be achieved without reading any of the latter.

On Windows, the user may not have permissions to create symbolic links, so the function will transparently fall back to creating hard links or copies instead. This precludes any optimization by prepare_directory_upload as the hard links/copies cannot be converted into upload links. It also assumes that download=True as dangling links/copies cannot be created.
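The rule that symlink targets are read-only, and that a modified file should replace its symlink, can be illustrated with the standard library alone. This sketch does not call clone_version(); the directory layout and the file name data.txt are assumptions, and it presumes a platform where symbolic links can be created.

```python
import os
import tempfile

cache = tempfile.mkdtemp()  # stands in for the gypsum cache
dest = tempfile.mkdtemp()   # stands in for the cloned destination

# Pretend the cache holds one immutable, previously downloaded file.
cached_file = os.path.join(cache, "data.txt")
with open(cached_file, "w") as f:
    f.write("original")

# A clone represents that file as a symlink into the cache.
link = os.path.join(dest, "data.txt")
os.symlink(cached_file, link)

# To modify the file, delete the symlink first and write a real file,
# so that the cached copy is never mutated.
os.unlink(link)
with open(link, "w") as f:
    f.write("modified")

with open(cached_file) as f:
    cache_contents = f.read()
with open(link) as f:
    dest_contents = f.read()
```

Writing through the symlink without unlinking it first would have changed the cached file instead, which is exactly what the read-only rule is meant to prevent.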

gypsum_client.clone_operations.clone_version(project, asset, version, destination, download=True, cache_dir='/home/runner/.cache/gypsum', url='https://gypsum.artifactdb.com', **kwargs)[source]

Clone a version’s directory structure.

Clone the directory structure for a versioned asset into a separate location. This is typically used to prepare a new version for a lightweight upload.

See also

prepare_directory_upload(), to prepare an upload based on the directory contents.

Example

import tempfile

cache = tempfile.mkdtemp()
dest = tempfile.mkdtemp()

clone_version("test-R", "basic", "v1", destination=dest, cache_dir=cache)
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • cache_dir (str) – Path to the cache directory.

  • destination (str) – Destination directory at which to create the clone.

  • download (bool) – Whether the version’s files should be downloaded first. This can be set to False to create a clone without actually downloading any of the version’s files. Defaults to True.

  • url (str) – URL of the gypsum REST API.

  • **kwargs

    Further arguments to pass to save_version().

    Only used if download is True.

gypsum_client.config module

Set this to False if SSL certificates are not properly set up on your machine.

This essentially sets verify=False on all requests to the gypsum REST API.

Example

gypsum_client.create_operations module

gypsum_client.create_operations.create_project(project, owners, uploaders=[], baseline=None, growth_rate=None, year=None, url='https://gypsum.artifactdb.com', token=None)[source]

Create a new project with the associated permissions.

See also

remove_project(), to remove the project.

Example

create_project(
    "test-Py-create",
    owners="jkanche",
    uploaders=[{"id": "ArtifactDB-bot"}]
)
Parameters:
  • project (str) – Project name.

  • owners (Union[str, List[str]]) –

    List of GitHub users or organizations that are owners of this project.

    May also be a string containing the GitHub user or organization.

  • uploaders (List[str]) –

    List of authorized uploaders for this project. Defaults to an empty list.

    See fetch_permissions() for more information.

  • baseline (int) – Baseline quota in bytes.

  • growth_rate (int) – Expected annual growth rate in bytes.

  • year (int) – Year of project creation.

  • url (str) – URL to the gypsum REST API.

  • token (str) – GitHub access token to authenticate with the gypsum REST API. The token must refer to a gypsum administrator account.

gypsum_client.fetch_metadata_database module

Fetch the metadata database.

This function will automatically check for updates to the SQLite files and will download new versions accordingly. New checks are performed when one hour or more has elapsed since the last check. If the check fails, a warning is raised and the function returns the currently cached file.
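The hourly update check described above amounts to a simple time comparison. The following sketch is an illustration only: the function name should_check_for_updates and the way the last check time is passed in are assumptions, not the module's actual bookkeeping.

```python
import time
from typing import Optional

CHECK_INTERVAL_SECONDS = 60 * 60  # one hour, as stated in the text above


def should_check_for_updates(last_check: Optional[float],
                             now: Optional[float] = None) -> bool:
    """Hypothetical helper: True if no check has happened yet or if at
    least one hour has elapsed since `last_check` (Unix time)."""
    if last_check is None:
        return True
    current = time.time() if now is None else now
    return current - last_check >= CHECK_INTERVAL_SECONDS


print(should_check_for_updates(None))            # never checked before
print(should_check_for_updates(0, now=3599))     # checked under an hour ago
print(should_check_for_updates(0, now=3600))     # exactly one hour ago
```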

gypsum_client.fetch_metadata_database.fetch_metadata_database(name='bioconductor.sqlite3', cache_dir='/home/runner/.cache/gypsum', overwrite=False)[source]

Fetch the SQLite database containing metadata from the gypsum backend.

See metadata index for more details.

Each database is generated by aggregating metadata across multiple assets and/or projects, and can be used to perform searches for interesting objects.

See also

fetch_metadata_schema(), to get the JSON schema used to define the database tables.

Example

sql_path = fetch_metadata_database()
Parameters:
  • name (str) –

    Name of the database. This can be the name of any SQLite file published here.

    Defaults to “bioconductor.sqlite3”.

  • cache_dir (str) –

    Path to the cache directory.

    Defaults to None.

  • overwrite (bool) –

    Whether to overwrite existing file.

    Defaults to False.

Return type:

str

Returns:

Path to the downloaded database.

gypsum_client.fetch_metadata_database.get_current_unix_time()[source]
gypsum_client.fetch_metadata_database.get_last_modified_date(base_url)[source]

gypsum_client.fetch_metadata_schema module

gypsum_client.fetch_metadata_schema.fetch_metadata_schema(name='bioconductor/v1.json', cache_dir='/home/runner/.cache/gypsum', overwrite=False)[source]

Fetch a JSON schema file for metadata to be inserted into a SQLite database.

See metadata index for more details.

Each SQLite database is created from metadata files uploaded to the gypsum backend, so clients uploading objects to be incorporated into the database should validate their metadata against the corresponding JSON schema.

See also

validate_metadata(), to validate metadata against a chosen schema.

fetch_metadata_database(), to obtain the SQLite database of metadata.

Example

schema_path = fetch_metadata_schema()
Parameters:
  • name (str) – Name of the schema. Defaults to “bioconductor/v1.json”.

  • cache_dir (str) – Path to the cache directory.

  • overwrite (bool) – Whether to overwrite existing file in cache.

Return type:

str

Returns:

Path containing the downloaded schema.

gypsum_client.fetch_operations module

gypsum_client.fetch_operations.fetch_latest(project, asset, url='https://gypsum.artifactdb.com')[source]

Fetch the latest version of a project’s asset.

See also

refresh_latest(), to refresh the latest version.

Example

ver = fetch_latest("test-R", "basic")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • url (str) – URL to the gypsum compatible API.

Return type:

str

Returns:

Latest version of the asset.

gypsum_client.fetch_operations.fetch_manifest(project, asset, version, cache_dir='/home/runner/.cache/gypsum', overwrite=False, url='https://gypsum.artifactdb.com')[source]

Fetch the manifest for a version of an asset of a project.

Example

manifest = fetch_manifest("test-R", "basic", "v1")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • cache_dir (str) – Path to the cache directory.

  • overwrite (bool) – Whether to overwrite existing file in cache.

  • url (str) – URL to the gypsum compatible API.

Return type:

dict

Returns:

Dictionary containing the manifest for this version. Each element is named after the relative path of a file in this version. The value of each element is another dictionary with the following fields:

  • size, an integer specifying the size of the file in bytes.

  • md5sum, a string containing the hex-encoded MD5 checksum of the file.

  • Optional link, a dictionary specifying the link destination for a file. This contains the strings project, asset, version and path. If the link destination is itself a link, an ancestor dictionary will be present that specifies the final location of the file after resolving all intermediate links.
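To make the manifest layout concrete, here is a hand-written example of the structure described above together with two simple queries on it. The file names, the linked version "v0", and the contents used to compute the checksums are invented for illustration; a real manifest would come from fetch_manifest().

```python
import hashlib

# Hypothetical manifest with one regular file and one linked file.
manifest = {
    "readme.txt": {
        "size": 5,
        "md5sum": hashlib.md5(b"hello").hexdigest(),
    },
    "data/values.csv": {
        "size": 5,
        "md5sum": hashlib.md5(b"a,b\n1").hexdigest(),
        "link": {
            "project": "test-R",
            "asset": "basic",
            "version": "v0",
            "path": "data/values.csv",
        },
    },
}

# Total size of the version, in bytes.
total_bytes = sum(entry["size"] for entry in manifest.values())

# Relative paths of files that are links to files in other versions.
linked_paths = sorted(path for path, entry in manifest.items() if "link" in entry)
```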

gypsum_client.fetch_operations.fetch_permissions(project, url='https://gypsum.artifactdb.com')[source]

Fetch the permissions for a project.

See also

set_permissions(), to update or modify the permissions.

Example

perms = fetch_permissions("test-R")
Parameters:
  • project (str) – Project name.

  • url (str) – URL to the gypsum compatible API.

Return type:

dict

Returns:

Dictionary containing the permissions for this project:

  • owners, a list of strings containing the GitHub users or organizations that are owners of this project.

  • uploaders, a list of dictionaries specifying the users or organizations who are authorized to upload to this project.

Each entry of uploaders is a dictionary with the following fields:

  • id, a string containing the GitHub user or organization that is authorized to upload.

  • Optional asset, a string containing the name of the asset that the uploader is allowed to upload to. If not provided, there is no restriction on the uploaded asset name.

  • Optional version, a string containing the name of the version that the uploader is allowed to upload to. If not provided, there is no restriction on the uploaded version name.

  • Optional until, the expiry date of this authorization. If not provided, the authorization does not expire.

  • Optional trusted, whether the uploader is trusted. If not provided, defaults to False.

gypsum_client.fetch_operations.fetch_quota(project, url='https://gypsum.artifactdb.com')[source]

Fetch the quota details for a project.

See also

set_quota(), to update or modify the quota.

Example

quota = fetch_quota("test-R")
Parameters:
  • project (str) – Project name.

  • url (str) – URL to the gypsum compatible API.

Return type:

dict

Returns:

Dictionary containing baseline, the baseline quota at time zero in bytes; growth_rate, the annual growth rate for the quota in bytes; year, the creation year (i.e., time zero) for this project.
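The three quota fields suggest how the quota in effect for a given year could be derived. The formula below is an assumption based on the field descriptions (a baseline at time zero plus linear annual growth, counted in whole years); the backend may compute it differently, for example with fractional years. The function name current_quota is hypothetical.

```python
def current_quota(baseline: int, growth_rate: int, year: int, current_year: int) -> int:
    """Assumed formula: baseline plus one growth increment per elapsed year."""
    return baseline + growth_rate * (current_year - year)


# Values taken from the set_quota() example elsewhere in this reference.
quota = {"baseline": 1234, "growth_rate": 5678, "year": 2020}
print(current_quota(**quota, current_year=2024))  # 23946
```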

gypsum_client.fetch_operations.fetch_summary(project, asset, version, cache_dir='/home/runner/.cache/gypsum', overwrite=False, url='https://gypsum.artifactdb.com')[source]

Fetch the summary for a version of an asset of a project.

Example

summa = fetch_summary("test-R", "basic", "v1")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • cache_dir (str) – Path to the cache directory.

  • overwrite (bool) – Whether to overwrite existing file in cache.

  • url (str) – URL to the gypsum compatible API.

Return type:

dict

Returns:

Dictionary containing the summary for this version, with the following fields:

  • upload_user_id, a string containing the identity of the uploader.

  • upload_start, the upload start time.

  • upload_finish, the upload finish time.

  • on_probation (optional), a boolean indicating whether the upload is probational. If missing, this can be assumed to be False.

gypsum_client.fetch_operations.fetch_usage(project, url='https://gypsum.artifactdb.com')[source]

Fetch the quota usage for a project.

See also

refresh_usage(), to refresh usage details.

Example

usage = fetch_usage("test-R")
Parameters:
  • project (str) – Project name.

  • url (str) – URL to the gypsum compatible API.

Return type:

int

Returns:

Quota usage for the project, in bytes.

gypsum_client.list_operations module

gypsum_client.list_operations.list_assets(project, url='https://gypsum.artifactdb.com')[source]

List all assets in a project.

Example

all_assets = list_assets("test-R")
Parameters:
  • project (str) – Project name.

  • url (str) – URL to the gypsum compatible API.

Return type:

list

Returns:

List of asset names.

gypsum_client.list_operations.list_files(project, asset, version, prefix=None, include_dot=True, url='https://gypsum.artifactdb.com')[source]

List all files for a specified version of a project and asset.

Example

all_files = list_files("test-R", "basic", "v1")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • prefix (str) –

    Prefix for the object key.

    If provided, a file is only listed if its object key starts with {project}/{asset}/{version}/{prefix}.

    Defaults to None and all associated files with this version of the asset in the specified project are listed.

  • include_dot (bool) – Whether to list files with .. in their names.

  • url (str) – URL to the gypsum compatible API.

Return type:

list

Returns:

List of relative paths of files associated with the versioned asset.
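The effect of prefix can be shown on a plain list of object keys. This sketch only mimics the documented key layout {project}/{asset}/{version}/{prefix}; the helper filter_by_prefix and the sample keys are invented, and the real function queries the backend rather than filtering a local list.

```python
from typing import List, Optional


def filter_by_prefix(keys: List[str], project: str, asset: str, version: str,
                     prefix: Optional[str] = None) -> List[str]:
    """Keep keys under {project}/{asset}/{version}/{prefix} and return
    them as paths relative to the version directory."""
    base = f"{project}/{asset}/{version}/"
    full_prefix = base + (prefix or "")
    return [key[len(base):] for key in keys if key.startswith(full_prefix)]


# Hypothetical object keys in the bucket.
keys = [
    "test-R/basic/v1/foo/bar.txt",
    "test-R/basic/v1/blah.txt",
    "test-R/basic/v2/blah.txt",
]

print(filter_by_prefix(keys, "test-R", "basic", "v1"))
print(filter_by_prefix(keys, "test-R", "basic", "v1", prefix="foo/"))
```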

gypsum_client.list_operations.list_projects(url='https://gypsum.artifactdb.com')[source]

List all projects in the gypsum backend.

Example

all_prjs = list_projects()
Parameters:

url (str) – URL to the gypsum compatible API.

Return type:

list

Returns:

List of project names.

gypsum_client.list_operations.list_versions(project, asset, url='https://gypsum.artifactdb.com')[source]

List all versions for a project asset.

Example

all_vers = list_versions("test-R", "basic")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • url (str) – URL to the gypsum compatible API.

Return type:

list

Returns:

List of versions.

gypsum_client.prepare_directory_for_upload module

Prepare to upload a directory’s contents.

Files in directory (that are not symlinks) are used as regular uploads, i.e., files= in start_upload().

If directory contains a symlink to a file in cache, we assume that it points to a file that was previously downloaded by, e.g., save_file() or save_version(). Thus, instead of performing a regular upload, we attempt to create an upload link, i.e., links= in start_upload(). This is achieved by examining the destination path of the symlink and inferring the link destination in the backend. Note that this still works if the symlinks are dangling.

If a symlink cannot be converted into an upload link, it will be used as a regular upload, i.e., the contents of the symlink destination will be uploaded by start_upload(). In this case, an error will be raised if the symlink is dangling as there is no file that can actually be uploaded. If links="always", an error is raised instead upon symlink conversion failure.

This function is intended to be used with clone_version(), which creates symlinks to files in cache.
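The distinction between regular uploads and upload links can be sketched with a directory walk. This is a simplified illustration, not the behavior of prepare_directory_upload(): the function classify_directory is hypothetical, it reports links as relative paths rather than inferring the backend destination, and it assumes a platform where symbolic links can be created.

```python
import os
import tempfile
from typing import Dict, List


def classify_directory(directory: str, cache_dir: str) -> Dict[str, List[str]]:
    """Split a directory's contents into regular files and symlinks
    whose targets lie inside the cache directory."""
    files, links = [], []
    cache_root = os.path.realpath(cache_dir)
    for root, _, names in os.walk(directory):
        for name in names:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, directory)
            if os.path.islink(full):
                target = os.path.realpath(full)
                # A symlink into the cache is a candidate upload link.
                if os.path.commonpath([cache_root, target]) == cache_root:
                    links.append(rel)
                    continue
            # Everything else is treated as a regular upload.
            files.append(rel)
    return {"files": sorted(files), "links": sorted(links)}


cache = tempfile.mkdtemp()
dest = tempfile.mkdtemp()

cached = os.path.join(cache, "old.txt")
with open(cached, "w") as f:
    f.write("cached")
os.symlink(cached, os.path.join(dest, "old.txt"))

with open(os.path.join(dest, "new.txt"), "w") as f:
    f.write("fresh")

result = classify_directory(dest, cache)
```

Because only the symlink target path is inspected, this classification also works for dangling symlinks, which matches the note above about links created with download=False.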

See also

start_upload(), to actually start the upload.

clone_version(), to prepare the symlinks.

Example

import os
import tempfile
cache = tempfile.mkdtemp()
dest = tempfile.mkdtemp()

# Clone a project
clone_version("test-R", "basic", "v1", destination=dest, cache_dir=cache)

# Make some modification
with open(os.path.join(dest, "heanna"), "w") as f:
    f.write("sumire")

# Prepare the directory for upload
prepped = prepare_directory_upload(dest, cache_dir=cache)
gypsum_client.prepare_directory_for_upload.prepare_directory_upload(directory, links='auto', cache_dir='/home/runner/.cache/gypsum')[source]

Prepare to upload a directory’s contents.

Prepare to upload a directory’s contents via start_upload. This goes through the directory to list its contents and convert symlinks to upload links.

Parameters:
  • directory (str) – Path to a directory, the contents of which are to be uploaded via start_upload().

  • links (Literal['auto', 'always', 'never']) – Indicate how to handle symlinks in directory. Must be one of the following: - “auto”: Will attempt to convert symlinks into upload links. If the conversion fails, a regular upload is performed. - “always”: Will attempt to convert symlinks into upload links. If the conversion fails, an error is raised. - “never”: Will never attempt to convert symlinks into upload links. All symlinked files are treated as regular uploads.

  • cache_dir (str) – Path to the cache directory, used to convert symlinks into upload links.

Return type:

dict

Returns:

Dictionary containing:

  • files: list of strings to be used as files= in start_upload().

  • links: dictionary to be used as links= in start_upload().

gypsum_client.probation_operations module

gypsum_client.probation_operations.approve_probation(project, asset, version, url='https://gypsum.artifactdb.com', token=None)[source]

Approve a probational upload.

This removes the on_probation tag from the uploaded version.

See also

start_upload(), to specify probational upload.

reject_probation(), to reject the probational upload.

Example

init = start_upload(
    project="test-Py",
    asset="probation",
    version="v1",
    files=[],
    probation=True
)

complete_upload(init)
approve_probation("test-Py", "probation", "v1")

# Cleanup if this is just for testing
remove_asset("test-Py", "probation")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.

gypsum_client.probation_operations.reject_probation(project, asset, version, url='https://gypsum.artifactdb.com', token=None)[source]

Reject a probational upload.

This removes all files associated with that version.

See also

start_upload(), to specify probational upload.

approve_probation(), to approve the probational upload.

Example

init = start_upload(
    project="test-Py",
    asset="probation",
    version="v1",
    files=[],
    probation=True
)

complete_upload(init)
reject_probation("test-Py", "probation", "v1")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.

gypsum_client.refresh_operations module

gypsum_client.refresh_operations.refresh_latest(project, asset, url='https://gypsum.artifactdb.com', token=None)[source]

Refresh the latest version.

Recompute the latest version of a project’s asset. This is useful on rare occasions where multiple simultaneous uploads cause the latest version to be slightly out of sync.

See also

fetch_latest(), to fetch the latest version without recomputing.

Example

ver = refresh_latest("test-R", "basic")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.

Return type:

str

Returns:

A string containing the latest version.

gypsum_client.refresh_operations.refresh_usage(project, url='https://gypsum.artifactdb.com', token=None)[source]

Refresh the usage quota of a project.

Recompute the usage of a project. This is useful on rare occasions where multiple simultaneous uploads cause the usage calculations to be out of sync.

See also

fetch_usage(), to fetch the usage without recomputing.

Example

usage = refresh_usage("test-R")
Parameters:
  • project (str) – Project name.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.

Return type:

int

Returns:

The total quota usage of this project, in bytes.

gypsum_client.remove_operations module

gypsum_client.remove_operations.remove_asset(project, asset, url='https://gypsum.artifactdb.com', token=None)[source]

Remove an asset of a project from the gypsum backend.

See also

remove_project(), to remove a project.

remove_version(), to remove a specific version.

Example

# Mock a project
init = start_upload(
    project="test-Py-remove",
    asset="mock-remove",
    version="v1",
    files=[],
)

complete_upload(init)
remove_asset("test-Py-remove", "mock-remove")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API. The token must refer to a gypsum administrator account.

Returns:

True if asset was successfully removed.

gypsum_client.remove_operations.remove_project(project, url='https://gypsum.artifactdb.com', token=None)[source]

Remove a project from the gypsum backend.

See also

create_project(), to create a project.

remove_asset(), to remove a specific asset.

remove_version(), to remove a specific version.

Example

create_project("test-Py-remove", owners=["jkanche"])
remove_project("test-Py-remove")
Parameters:
  • project (str) – Project name.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API. The token must refer to a gypsum administrator account.

Returns:

True if the project was successfully removed.

gypsum_client.remove_operations.remove_version(project, asset, version, url='https://gypsum.artifactdb.com', token=None)[source]

Remove a version of an asset of a project from the gypsum backend.

See also

remove_asset(), to remove a specific asset.

remove_project(), to remove a project.

Example

# Mock a project
init = start_upload(
    project="test-Py-remove",
    asset="mock-remove",
    version="v1",
    files=[],
)

complete_upload(init)

remove_version("test-Py-remove", "mock-remove", "v1")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.

Returns:

True if the version of the project was successfully removed.

gypsum_client.rest_url module

gypsum_client.rest_url.rest_url(url=None)[source]

URL for the gypsum REST API.

Get or set the URL for the gypsum REST API.

Parameters:

url (Optional[str]) – URL to the gypsum REST API. Defaults to None.

Returns:

String containing the URL to the gypsum REST API.

gypsum_client.s3_config module

gypsum_client.s3_config.public_s3_config(refresh=False, url='https://gypsum.artifactdb.com', cache_dir=None)[source]

Get S3 configuration to the bucket storing the data.

Users can use this downstream to access the bucket directly using boto3.

Parameters:
  • refresh (bool) – Whether to refresh the cached credentials. Defaults to False.

  • url (str) – URL to the gypsum compatible API.

  • cache_dir (Optional[str]) – Path to the cache directory. Defaults to None.

Return type:

dict

Returns:

A dictionary containing the S3 credentials.

gypsum_client.save_operations module

gypsum_client.save_operations.save_file(project, asset, version, path, cache_dir='/home/runner/.cache/gypsum', overwrite=False, url='https://gypsum.artifactdb.com')[source]

Save a file from a version of a project asset.

Download a file from the gypsum bucket, for a version of an asset of a project.

See also

save_version(), to save all files associated with a version.

Example

out = save_file("test-R", "basic", "v1", "blah.txt")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • path (str) –

    Suffix of the object key for the file of interest, i.e., the relative path inside the version’s subdirectory.

    The full object key is defined as {project}/{asset}/{version}/{path}.

  • cache_dir (Optional[str]) – Path to the cache directory.

  • overwrite (bool) – Whether to overwrite existing file in cache.

  • url (str) – URL to the gypsum compatible API.

Returns:

The destination file path to which the file is downloaded in the local file system.

gypsum_client.save_operations.save_version(project, asset, version, cache_dir='/home/runner/.cache/gypsum', overwrite=False, relink=True, concurrent=1, url='https://gypsum.artifactdb.com')[source]

Download all files associated with a version of an asset of a project from the gypsum bucket.

See also

save_file(), to save a single file.

Example

out = save_version("test-R", "basic", "v1")
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • cache_dir (Optional[str]) – Path to the cache directory.

  • overwrite (bool) – Whether to overwrite existing file in cache.

  • relink (bool) – Whether links should be resolved, see resolve_links(). Defaults to True.

  • concurrent (int) – Number of concurrent downloads. Defaults to 1.

Return type:

str

Returns:

Path to the local directory where the files are downloaded to.

gypsum_client.search_metadata module

Text search on the metadata database.

Each string is tokenized by converting it to lower case and splitting it on characters that are not Unicode letters/numbers or a dash. We currently do not remove diacritics so these will need to be converted to ASCII by the user. If a text query involves only non-letter/number/dash characters, the filter will not be well-defined and will be ignored when constructing SQL statements.

For convenience, a non-empty list of strings may be used in query:

  • A list of length 1 is treated as shorthand for a text query with default arguments in define_text_query().

  • A list of length greater than 1 is treated as shorthand for an AND operation on default text queries for each of the individual strings.
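The tokenization rule described above (lower-casing, then splitting on anything that is not a Unicode letter, number or dash) can be approximated with a regular expression. This is a sketch rather than the tokenizer used to build the database: the function name tokenize is hypothetical, and Python's notion of a word character is only assumed to match the backend's definition of letters and numbers.

```python
import re
from typing import List

# Runs of Unicode letters/digits (word characters minus underscore) or dashes.
_TOKEN = re.compile(r"(?:[^\W_]|-)+")


def tokenize(text: str) -> List[str]:
    """Lower-case the text and return its letter/number/dash tokens."""
    return _TOKEN.findall(text.lower())


print(tokenize("Hello, World-wide web_2!"))  # ['hello', 'world-wide', 'web', '2']
print(tokenize("!!! ???"))                   # [] (such a query would be ignored)
```

Diacritics pass through unchanged in this sketch, consistent with the note that they are not removed.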

See also

fetch_metadata_database(), to download and cache the database files.

See metadata index, for details on the SQLite file contents and table structure.

class gypsum_client.search_metadata.GypsumSearchClause(type, text=None, field=None, partial=False, children=None, child=None)[source]

Bases: object

__and__(other)[source]
__init__(type, text=None, field=None, partial=False, children=None, child=None)[source]
__invert__()[source]
__or__(other)[source]

Return self|value.
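The three operator methods allow queries to be composed with &, | and ~. The following stand-alone class is a hypothetical miniature that reuses the constructor's argument names (type, text, children, child) to show the idea; it is not the GypsumSearchClause implementation, and the type labels "and", "or" and "not" are assumptions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Clause:
    """Hypothetical stand-in for a search clause tree node."""
    type: str
    text: Optional[str] = None
    children: List["Clause"] = field(default_factory=list)
    child: Optional["Clause"] = None

    def __and__(self, other: "Clause") -> "Clause":
        return Clause("and", children=[self, other])

    def __or__(self, other: "Clause") -> "Clause":
        return Clause("or", children=[self, other])

    def __invert__(self) -> "Clause":
        return Clause("not", child=self)


# (sakugawa OR uiharu) AND NOT rank
query = (Clause("text", text="sakugawa") | Clause("text", text="uiharu")) & ~Clause("text", text="rank")
```

Parentheses are needed around the OR expression because Python gives & higher precedence than |.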

gypsum_client.search_metadata.add_query_parameter(env, value)[source]
Return type:

str

gypsum_client.search_metadata.build_query(query, name, env)[source]
Return type:

List[str]

gypsum_client.search_metadata.define_text_query(text, field=None, partial=False)[source]

Define a query.

Parameters:
  • text (str) – Text to search by.

  • field (Optional[str]) –

    Name of the metadata field to search by.

    If None, search is performed on all metadata fields.

  • partial (bool) –

    Whether text contains SQLite wild cards for partial matches.

    Defaults to False.

Return type:

GypsumSearchClause

Returns:

GypsumSearchClause defining the search.

gypsum_client.search_metadata.sanitize_query(query)[source]
Return type:

Optional[GypsumSearchClause]

gypsum_client.search_metadata.search_metadata_text(path, query, latest=True, include_metadata=True)[source]

Text search on the metadata database.

Perform a text search on a SQLite database containing metadata from the gypsum backend. This is based on a precomputed tokenization of all string properties in each metadata document; see metadata index for details.

Examples

  • Search all metadata for a keyword:

search_metadata_text(
    sqlite_path,
    ["mikoto"],
    include_metadata=False,
    latest=False
)
  • Search for metadata containing multiple keywords (AND operation):

search_metadata_text(
    sqlite_path,
    ["sakugawa", "judgement"],
    include_metadata=False,
    latest=False
)

# or use the ``&`` operation
query = define_text_query("sakugawa") & define_text_query("judgement")
result = search_metadata_text(
    sqlite_path,
    query,
    include_metadata=False,
    latest=False
)
  • Search for metadata containing either of the keywords (OR operation):

# use the ``|`` operation
query = define_text_query("uiharu") | define_text_query("rank")
result = search_metadata_text(
    sqlite_path,
    query,
    include_metadata=False,
    latest=False
)
Parameters:
  • path (str) – Path to the SQLite file, usually obtained by fetch_metadata_database().

  • query (Union[str, List[str], GypsumSearchClause]) –

    List of keywords specifying the query to execute.

    May also be a GypsumSearchClause object generated by define_text_query().

  • latest (bool) – Whether to only search in the latest version for each asset. Defaults to True.

  • include_metadata (bool) – Whether metadata should be returned. Defaults to True.

Return type:

List[Dict]

Returns:

Results matching the query.

gypsum_client.search_metadata.search_metadata_text_filter(query, pid_name='paths.pid')[source]
Return type:

Dict[str, Union[str, List]]

gypsum_client.set_operations module

gypsum_client.set_operations.set_permissions(project, owners=None, uploaders=None, append=True, url='https://gypsum.artifactdb.com', token=None)[source]

Set the owner and uploader permissions for a project.

See also

fetch_permissions(), to fetch the permissions for a project.

Example

from datetime import datetime, timedelta

create_project("test-Py-perms", owners=["jkanche"])

until = (
    (datetime.now() + timedelta(seconds=1000000))
    .replace(microsecond=0)
)

set_permissions(
    "test-Py-perms",
    owners=["LTLA"],
    uploaders=[{"id": "LTLA", "until": until}]
)
Parameters:
  • project (str) – The project name.

  • owners (str) –

    List of GitHub users or organizations that are owners of this project.

    If None, no change is made.

  • uploaders (str) –

    List of authorized uploaders for this project.

    If None, no change is made.

  • append (bool) –

    Whether to append owners and uploaders to the existing ones.

    If False, the provided owners and uploaders replace the existing values.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.
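
To make the append flag concrete, here is a small local model of the documented behaviour. It is only a sketch: merge_owners is a hypothetical helper, not part of gypsum_client, and the de-duplication in append mode is an assumption made for this example.

```python
from typing import List, Optional


def merge_owners(
    existing: List[str],
    new: Optional[List[str]],
    append: bool = True,
) -> List[str]:
    """Model how a new owners list combines with the existing one."""
    if new is None:
        # None means "no change", regardless of `append`.
        return list(existing)
    if not append:
        # Replace mode: the provided values become the full list.
        return list(new)
    # Append mode: keep existing entries and add unseen ones in order
    # (skipping duplicates is an assumption of this sketch).
    merged = list(existing)
    for user in new:
        if user not in merged:
            merged.append(user)
    return merged


print(merge_owners(["jkanche"], ["LTLA"]))                # ['jkanche', 'LTLA']
print(merge_owners(["jkanche"], ["LTLA"], append=False))  # ['LTLA']
print(merge_owners(["jkanche"], None, append=False))      # ['jkanche']
```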

gypsum_client.set_operations.set_quota(project, baseline=None, growth_rate=None, year=None, url='https://gypsum.artifactdb.com', token=None)[source]

Set the quota for a project.

See also

fetch_quota(), to fetch the usage details.

Example

create_project("test-Py-quota", owners=["jkanche"])

set_quota(
    "test-Py-quota",
    baseline=1234,
    growth_rate=5678,
    year=2020
)
Parameters:
  • project (str) – The project name.

  • baseline (int) – Baseline quota, in bytes. If None, no change is made.

  • growth_rate (int) – Expected annual growth rate, in bytes. If None, no change is made.

  • year (int) – Year of project creation. If None, no change is made.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.
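
The three quota parameters are easiest to read together. The sketch below assumes the quota available in a given year is the baseline plus the annual growth rate multiplied by the number of years since project creation. That linear formula is an assumption made for illustration, and projected_quota is a hypothetical helper, so check the backend documentation before relying on it.

```python
def projected_quota(baseline: int, growth_rate: int, year: int, current_year: int) -> int:
    """Estimate the quota in bytes for `current_year` (assumed linear growth)."""
    if current_year < year:
        raise ValueError("current_year must not precede the creation year")
    return baseline + growth_rate * (current_year - year)


# Values from the set_quota() example, evaluated four years after creation.
print(projected_quota(baseline=1234, growth_rate=5678, year=2020, current_year=2024))
# 1234 + 5678 * 4 = 23946
```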

gypsum_client.upload_api_operations module

gypsum_client.upload_api_operations.abort_upload(init, url='https://gypsum.artifactdb.com')[source]

Abort an upload session, usually after an irrecoverable error.

See also

start_upload(), to create the init.

Example

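import os
import tempfile
from pathlib import Path

# blah_contents and foobar_contents are strings supplied by the caller.
tmp_dir = tempfile.mkdtemp()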

with open(f"{tmp_dir}/blah.txt", "w") as f:
    f.write(blah_contents)

os.makedirs(f"{tmp_dir}/foo", exist_ok=True)

with open(f"{tmp_dir}/foo/blah.txt", "w") as f:
    f.write(foobar_contents)

files = [
    str(file.relative_to(tmp_dir))
    for file in Path(tmp_dir).rglob("*")
    if not os.path.isdir(file)
]

init = start_upload(
    project="test-Py-demo",
    asset="upload",
    version="1",
    files=files,
    directory=tmp_dir,
)

# Abort the session instead of completing it.
abort_upload(init)
Parameters:
  • init (dict) –

    Dictionary containing abort_url and session_token.

    See start_upload(), to create init.

  • url – URL to the gypsum REST API.
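
Since abort_upload() is meant to be called after an irrecoverable error, a common pattern is to wrap the session in a try/except block. The sketch below shows that control flow only. run_session is a hypothetical helper, and the four callables are passed in so the example runs without network access; in real use they would wrap start_upload(), upload_files(), complete_upload() and abort_upload() with whatever arguments the session needs.

```python
from typing import Any, Callable, Dict


def run_session(
    start: Callable[[], Dict[str, Any]],
    upload: Callable[[Dict[str, Any]], None],
    complete: Callable[[Dict[str, Any]], None],
    abort: Callable[[Dict[str, Any]], None],
) -> bool:
    """Run one upload session, aborting it if any step after start fails."""
    init = start()
    try:
        upload(init)
        complete(init)
    except Exception:
        # Release the session before reporting failure.
        abort(init)
        return False
    return True


# Stub callables that simulate a failed transfer.
calls = []


def failing_upload(init):
    raise IOError("simulated network failure")


ok = run_session(
    start=lambda: {"session_token": "dummy"},
    upload=failing_upload,
    complete=lambda init: calls.append("complete"),
    abort=lambda init: calls.append("abort"),
)
print(ok, calls)  # False ['abort']
```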

gypsum_client.upload_api_operations.complete_upload(init, url='https://gypsum.artifactdb.com')[source]

Complete an upload session after all files have been uploaded.

See also

start_upload(), to create the init.

Example

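import os
import tempfile
from pathlib import Path

# blah_contents and foobar_contents are strings supplied by the caller.
tmp_dir = tempfile.mkdtemp()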

with open(f"{tmp_dir}/blah.txt", "w") as f:
    f.write(blah_contents)

os.makedirs(f"{tmp_dir}/foo", exist_ok=True)

with open(f"{tmp_dir}/foo/blah.txt", "w") as f:
    f.write(foobar_contents)

files = [
    str(file.relative_to(tmp_dir))
    for file in Path(tmp_dir).rglob("*")
    if not os.path.isdir(file)
]

init = start_upload(
    project="test-Py-demo",
    asset="upload",
    version="1",
    files=files,
    directory=tmp_dir,
)

# Upload the files, then mark the session as complete.
upload_files(init, directory=tmp_dir)
complete_upload(init)
Parameters:
  • init (dict) –

    Dictionary containing complete_url and session_token.

    See start_upload(), to create init.

  • url – URL to the gypsum REST API.

gypsum_client.upload_api_operations.start_upload(project, asset, version, files, links=None, deduplicate=True, probation=False, url='https://gypsum.artifactdb.com', token=None, directory=None)[source]

Start an upload.

Start an upload of a new version of an asset, or a new asset of a project.

See also

upload_files(), to actually upload the files.

complete_upload(), to indicate that the upload is completed.

abort_upload(), to abort an upload in progress.

prepare_directory_upload(), to create files and links from a directory.

Example

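import os
import tempfile
from pathlib import Path

# blah_contents and foobar_contents are strings supplied by the caller.
tmp_dir = tempfile.mkdtemp()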

with open(f"{tmp_dir}/blah.txt", "w") as f:
    f.write(blah_contents)

os.makedirs(f"{tmp_dir}/foo", exist_ok=True)

with open(f"{tmp_dir}/foo/blah.txt", "w") as f:
    f.write(foobar_contents)

files = [
    str(file.relative_to(tmp_dir))
    for file in Path(tmp_dir).rglob("*")
    if not os.path.isdir(file)
]

init = start_upload(
    project="test-Py-demo",
    asset="upload",
    version="1",
    files=files,
    directory=tmp_dir,
)

abort_upload(init)
Parameters:
  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • files (Union[str, List[str], List[dict]]) –

    A file path or a list of file paths to upload. These paths are assumed to be relative to the directory parameter.

    Alternatively, a list where each element is a dictionary containing the following keys:

    - path, a string containing the relative path of the file inside the version’s subdirectory.
    - size, a non-negative integer specifying the size of the file in bytes.
    - md5sum, a string containing the hex-encoded MD5 checksum of the file.
    - dedup (optional), a boolean value indicating whether deduplication should be attempted for this file. If absent, the deduplicate parameter is used.

  • links (List[dict]) –

    A list of dictionaries, each with the following keys:

    - from.path, a string containing the relative path of the file inside the version’s subdirectory.
    - to.project, a string containing the project of the link destination.
    - to.asset, a string containing the asset of the link destination.
    - to.version, a string containing the version of the link destination.
    - to.path, a string containing the path of the link destination.

  • deduplicate (bool) – Whether the backend should attempt deduplication of files in the immediately previous version. Defaults to True.

  • probation (bool) – Whether to perform a probational upload. Defaults to False.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.

  • directory (str) – Path to a directory containing the files to be uploaded. This directory is assumed to correspond to a version of an asset.

Return type:

dict

Returns:

Dictionary containing the following keys:

  • file_urls, a list of lists containing information about each file to be uploaded. This is used by upload_files().

  • complete_url, a string containing the completion URL, to be used by complete_upload().

  • abort_url, a string specifying the abort URL, to be used by abort_upload().

  • session_token, a string for authenticating to the newly initialized upload session.
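
The dictionary form of files can be assembled with the standard library. The sketch below walks a directory and records the documented path, size and md5sum keys. build_manifest is a hypothetical helper written for illustration; prepare_directory_upload() remains the documented way to derive files and links from a directory.

```python
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Dict, List, Union


def build_manifest(directory: str) -> List[Dict[str, Union[str, int]]]:
    """List every regular file under `directory` with its size and MD5."""
    root = Path(directory)
    manifest = []
    for file in sorted(root.rglob("*")):
        if not file.is_file():
            continue
        data = file.read_bytes()
        manifest.append(
            {
                # Path relative to the directory, using forward slashes.
                "path": file.relative_to(root).as_posix(),
                "size": len(data),
                "md5sum": hashlib.md5(data).hexdigest(),
            }
        )
    return manifest


# Build a small directory tree to demonstrate the helper.
tmp_dir = tempfile.mkdtemp()
Path(tmp_dir, "blah.txt").write_bytes(b"ABCDEFGHIJKLMNOPQRSTUVWXYZ")
os.makedirs(os.path.join(tmp_dir, "foo"))
Path(tmp_dir, "foo", "bar.txt").write_bytes(b"1\n2\n3")

for entry in build_manifest(tmp_dir):
    print(entry["path"], entry["size"], entry["md5sum"])
```

This version reads each file fully into memory, which is fine for small files; large files would be better hashed in chunks.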

gypsum_client.upload_file_actions module

gypsum_client.upload_file_actions.upload_directory(directory, project, asset, version, cache_dir='/home/runner/.cache/gypsum', deduplicate=True, probation=False, url='https://gypsum.artifactdb.com', token=None, concurrent=1, abort_failed=True)[source]

Upload a directory to the gypsum backend.

This function is a wrapper around prepare_directory_upload(), start_upload() and related functions.

The aim is to streamline the upload of a directory’s contents when no customization of the file listing is required.

Convenience method to upload a directory to the gypsum backend as a versioned asset of a project. This requires uploader permissions to the relevant project.

Example

import os
import tempfile

tmp = tempfile.mkdtemp()

with open(os.path.join(tmp, "blah.txt"), "w") as f:
    f.write("ABCDEFGHIJKLMNOPQRSTUVWXYZ")

os.makedirs(os.path.join(tmp, "foo"))
with open(os.path.join(tmp, "foo", "bar.txt"), "w") as f:
    f.write("\n".join(map(str, range(1, 11))))

upload_directory(
    tmp,
    "test-Py",
    "upload-dir",
    version="1"
)

Parameters:
  • directory (str) – Path to a directory containing the files to be uploaded. This directory is assumed to correspond to a version of an asset.

  • project (str) – Project name.

  • asset (str) – Asset name.

  • version (str) – Version name.

  • cache_dir (str) –

    Path to the cache for saving files, e.g., in save_version().

    Used to convert symbolic links to upload links, see prepare_directory_upload().

  • deduplicate (bool) – Whether the backend should attempt deduplication of files in the immediately previous version. Defaults to True.

  • probation (bool) – Whether to perform a probational upload. Defaults to False.

  • url (str) – URL of the gypsum REST API.

  • token (str) – GitHub access token to authenticate to the gypsum REST API.

  • concurrent (int) – Number of concurrent uploads. Defaults to 1.

  • abort_failed (bool) –

    Whether to abort the upload on any failure.

    Setting this to False can be helpful for diagnosing upload problems.

Return type:

bool

Returns:

True if successful, otherwise False.

gypsum_client.upload_file_actions.upload_files(init, directory=None, url='https://gypsum.artifactdb.com', concurrent=1)[source]

Upload files in an initialized upload session for a version of an asset.

Parameters:
  • init (dict) – Dictionary containing file_urls and session_token. This is typically the return value from start_upload().

  • directory (str) – Path to the directory containing files. Defaults to None, if files are part of the current working directory.

  • url (str) – URL of the gypsum REST API.

  • concurrent (int) – Number of concurrent uploads. Defaults to 1.
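
The concurrent parameter controls how many files are sent at once. As a rough illustration of that idea, and not a description of how gypsum_client implements it, the sketch below fans a list of paths out over a thread pool. upload_all and fake_upload are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def upload_all(
    paths: List[str],
    upload_one: Callable[[str], bool],
    concurrent: int = 1,
) -> Dict[str, bool]:
    """Call `upload_one` for each path using up to `concurrent` workers."""
    if concurrent < 1:
        raise ValueError("concurrent must be at least 1")
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        # map() preserves input order, so results line up with paths.
        outcomes = list(pool.map(upload_one, paths))
    return dict(zip(paths, outcomes))


# Stub uploader: pretend every file except one succeeds.
def fake_upload(path: str) -> bool:
    return path != "foo/broken.txt"


status = upload_all(
    ["blah.txt", "foo/bar.txt", "foo/broken.txt"],
    fake_upload,
    concurrent=2,
)
print(status)
```

Threads suit this kind of work because uploads are I/O-bound; with concurrent=1 the files are processed one after another.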

gypsum_client.validate_metadata module

gypsum_client.validate_metadata.validate_metadata(metadata, schema, stringify=None)[source]

Validate metadata against a JSON schema for a SQLite database.

See also

fetch_metadata_schema(), to get the JSON schema.

fetch_metadata_database(), to obtain the SQLite database of metadata.

Example

import tempfile

_cache_dir = tempfile.mkdtemp()

metadata = {
    "title": "Fatherhood",
    "description": "Luke ich bin dein Vater.",
    "sources": [{"provider": "GEO", "id": "GSE12345"}],
    "taxonomy_id": ["9606"],
    "genome": ["GRCm38"],
    "maintainer_name": "Darth Vader",
    "maintainer_email": "vader@empire.gov",
    "bioconductor_version": "3.10",
}

schema = fetch_metadata_schema(cache_dir=_cache_dir)
validate_metadata(metadata, schema)
Parameters:
  • metadata (Union[str, dict]) –

    Metadata to be checked.

    Usually a dictionary, but may also be a JSON-formatted string.

  • schema (str) – Path to a schema.

  • stringify (Optional[bool]) – Whether to convert metadata to a JSON-formatted string. Defaults to True if metadata is not already a string.

Return type:

bool

Returns:

True if metadata conforms to schema.
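
Because metadata may be either a dictionary or a JSON-formatted string, callers sometimes normalise it before validation. The sketch below shows one way to do that with the standard library and then runs a deliberately simple required-key check. as_json_string and missing_required are hypothetical helpers, the toy schema is invented for the example, and the check is not full JSON Schema validation or a substitute for validate_metadata().

```python
import json
from typing import Any, Dict, List, Union


def as_json_string(metadata: Union[str, Dict[str, Any]]) -> str:
    """Return metadata as a JSON string, checking that strings parse."""
    if isinstance(metadata, str):
        json.loads(metadata)  # raises json.JSONDecodeError if malformed
        return metadata
    return json.dumps(metadata)


def missing_required(
    metadata: Union[str, Dict[str, Any]],
    schema: Dict[str, Any],
) -> List[str]:
    """List keys named in the schema's "required" array that are absent."""
    document = json.loads(as_json_string(metadata))
    return [key for key in schema.get("required", []) if key not in document]


# Toy schema, invented for this example only.
toy_schema = {"type": "object", "required": ["title", "description"]}

print(missing_required({"title": "Fatherhood"}, toy_schema))
# ['description']
print(missing_required('{"title": "a", "description": "b"}', toy_schema))
# []
```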

Module contents