gypsum_client package
Submodules
gypsum_client.auth module
- gypsum_client.auth.access_token(full=False, request=True, cache_dir='/home/runner/.cache/gypsum', token_expiration_limit=10)
Get a GitHub access token for authentication to the gypsum API.
Example
from gypsum_client.auth import access_token
token = access_token()
- Parameters:
full (bool) – Whether to return the full token details. Defaults to False, in which case only the token is returned.
request (bool) – Whether to request a new token if no token is found or the current token has expired. Defaults to True.
cache_dir (Optional[str]) – Path to the cache directory in which to store tokens. May be set to None, in which case the token is not cached to disk.
token_expiration_limit (int) – Integer specifying the number of seconds until the token expires.
- Return type:
Union[str, dict]
- Returns:
If full=False, a string containing the GitHub token used to access gypsum's resources.
If full=True, a dictionary containing the full token details.
- gypsum_client.auth.set_access_token(token='auto', app_url='https://gypsum.artifactdb.com', app_key=None, app_secret=None, github_url='https://api.github.com', user_agent=None, cache_dir='/home/runner/.cache/gypsum')
Set the GitHub access token used for authentication to the gypsum API.
- Parameters:
token (str) – A string containing a GitHub personal access token.
app_url (str) – URL to the gypsum REST API.
app_secret (Optional[str]) – Secret for the GitHub OAuth app.
github_url (str) – URL to GitHub's API.
user_agent (Optional[str]) – User agent for queries to the various endpoints.
cache_dir (Optional[str]) – Path to the cache directory in which to store tokens. May be set to None, in which case the token is not cached to disk.
- Return type:
dict
- Returns:
Dictionary containing the following keys:
- token, a string containing the token.
- name, the name of the GitHub user authenticated by the token.
- expires, the Unix time at which the token expires.
gypsum_client.cache_directory module
- gypsum_client.cache_directory.cache_directory(dir=None)
Cache directory.
Specify the cache directory in the local filesystem for gypsum-related data.
If the GYPSUM_CACHE_DIR environment variable is set before the first call to cache_directory(), it is used as the initial location of the cache directory. Otherwise, the initial location is set to the user's cache directory as defined by appdirs.user_cache_dir().
- Parameters:
dir (Optional[str]) – Path to the cache directory to use. If None, the current location (initially a default cache location) is left unchanged.
- Return type:
str
- Returns:
If dir is None, the path to the current cache directory is returned.
If dir is supplied, it is used to set the path to the cache directory, and the previous location of the directory is returned.
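The get-or-set behavior described above can be illustrated with a small stand-alone sketch. It is not the library's code: `sketch_cache_directory` and the module-level `_current_cache` are hypothetical, and the fallback path is only a stand-in for `appdirs.user_cache_dir()`, which is not part of the standard library.

```python
import os

# Hypothetical module-level state standing in for the client's internal setting.
_current_cache = None


def sketch_cache_directory(dir=None):
    """Get (dir=None) or set the cache location, returning the previous one when setting."""
    global _current_cache
    if _current_cache is None:
        # Initialize from GYPSUM_CACHE_DIR on first use; the fallback is a
        # stand-in for appdirs.user_cache_dir(), not the library's real default.
        _current_cache = os.environ.get(
            "GYPSUM_CACHE_DIR",
            os.path.join(os.path.expanduser("~"), ".cache", "gypsum"),
        )
    if dir is None:
        return _current_cache
    previous = _current_cache
    _current_cache = dir
    return previous


os.environ["GYPSUM_CACHE_DIR"] = "/tmp/gypsum-a"   # must be set before the first call
print(sketch_cache_directory())                    # /tmp/gypsum-a
print(sketch_cache_directory("/tmp/gypsum-b"))     # /tmp/gypsum-a (the previous location)
print(sketch_cache_directory())                    # /tmp/gypsum-b
```

Returning the previous location makes it easy to restore the original setting after temporarily pointing the cache elsewhere.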
gypsum_client.clone_operations module
Clone a version's directory structure.
Cloning a versioned asset involves creating a directory at the destination that has the same contents as the corresponding project-asset-version directory. All files in the specified version are represented as symlinks from the destination to the corresponding file in the cache. The idea is that, when the destination is used in prepare_directory_upload(), the symlinks are converted into upload links, i.e., links= in start_upload(). This allows users to create new versions very cheaply, as duplicate files are not uploaded to or stored in the backend.
Users can more or less do whatever they want inside the cloned destination, but they should treat the symlink targets as read-only. That is, they should not modify the contents of the linked-to files, as these refer to assumed-immutable files in the cache. If a file in the destination needs to be modified, the symlink should be deleted and replaced with an actual file; this avoids mutating the cache and ensures that prepare_directory_upload() recognizes that a new file actually needs to be uploaded.
Advanced users can set download=False, in which case symlinks are created even if their targets are not present in the cache. In such cases, the destination should be treated as write-only due to the potential presence of dangling symlinks. This mode is useful for uploading a new version of an asset without downloading the files from the existing version, assuming that the modifications associated with the former can be achieved without reading any of the latter.
On Windows, the user may not have permission to create symbolic links, so the function transparently falls back to creating hard links or copies instead. This precludes any optimization by prepare_directory_upload(), as the hard links/copies cannot be converted into upload links. It also assumes that download=True, as dangling hard links/copies cannot be created.
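A minimal sketch of the linking strategy, assuming the version's files are already present in a local directory. `clone_tree` is a hypothetical helper, not part of gypsum_client: it tries a symlink first and falls back to a hard link and then a copy, loosely mirroring the Windows fallback described above. It ignores the download step and the client's actual cache layout.

```python
import os
import shutil
import tempfile


def clone_tree(source_dir, destination):
    """Mirror every file under source_dir into destination.

    Each file becomes a symlink to its source; if symlinks cannot be created,
    fall back to a hard link and then to a plain copy.
    Returns a dict mapping each relative path to the method that was used.
    """
    methods = {}
    for root, _dirs, files in os.walk(source_dir):
        for name in files:
            src = os.path.join(root, name)
            rel = os.path.relpath(src, source_dir)
            dst = os.path.join(destination, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            try:
                os.symlink(os.path.abspath(src), dst)
                methods[rel] = "symlink"
            except OSError:
                try:
                    os.link(src, dst)
                    methods[rel] = "hardlink"
                except OSError:
                    shutil.copy2(src, dst)
                    methods[rel] = "copy"
    return methods


# Hypothetical cached version directory containing a single file.
cache_version = tempfile.mkdtemp()
os.makedirs(os.path.join(cache_version, "foo"))
with open(os.path.join(cache_version, "foo", "bar.txt"), "w") as f:
    f.write("hello")

dest = tempfile.mkdtemp()
methods = clone_tree(cache_version, dest)
print(methods)
```

Only the symlink case preserves the information needed to turn the file into an upload link later, which is why the hard-link/copy fallback gives up that optimization.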
- gypsum_client.clone_operations.clone_version(project, asset, version, destination, download=True, cache_dir='/home/runner/.cache/gypsum', url='https://gypsum.artifactdb.com', **kwargs)
Clone a version's directory structure.
Clone the directory structure for a versioned asset into a separate location. This is typically used to prepare a new version for a lightweight upload.
See also
prepare_directory_upload(), to prepare an upload based on the directory contents.
Example
import tempfile
from gypsum_client.clone_operations import clone_version
cache = tempfile.mkdtemp()
dest = tempfile.mkdtemp()
clone_version("test-R", "basic", "v1", destination=dest, cache_dir=cache)
- Parameters:
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
destination (str) – Destination directory at which to create the clone.
download (bool) – Whether the version's files should be downloaded first. This can be set to False to create a clone without actually downloading any of the version's files. Defaults to True.
cache_dir (str) – Path to the cache directory.
url (str) – URL of the gypsum REST API.
**kwargs – Further arguments to pass to save_version(). Only used if download is True.
gypsum_client.config module
Set this to False if SSL certificates are not properly set up on your machine. This essentially sets verify=False on all requests to the gypsum REST API.
Example
gypsum_client.create_operations module
- gypsum_client.create_operations.create_project(project, owners, uploaders=[], baseline=None, growth_rate=None, year=None, url='https://gypsum.artifactdb.com', token=None)
Create a new project with the associated permissions.
See also
remove_project(), to remove the project.
Example
from gypsum_client.create_operations import create_project
create_project(
    "test-Py-create", owners="jkanche", uploaders=[{"id": "ArtifactDB-bot"}]
)
- Parameters:
project (str) – Project name.
owners (Union[str, List[str]]) – List of GitHub users or organizations that are owners of this project. May also be a string containing a single GitHub user or organization.
uploaders (List[dict]) – List of authorized uploaders for this project. Defaults to an empty list. See fetch_permissions() for more information.
baseline (int) – Baseline quota in bytes.
growth_rate (int) – Expected annual growth rate in bytes.
year (int) – Year of project creation.
url (str) – URL to the gypsum REST API.
token (str) – GitHub access token to authenticate with the gypsum REST API. The token must refer to a gypsum administrator account.
gypsum_client.fetch_metadata_database module
Fetch the metadata database.
This function will automatically check for updates to the SQLite files and will download new versions accordingly. New checks are performed when one hour or more has elapsed since the last check. If the check fails, a warning is raised and the function returns the currently cached file.
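The once-per-hour update check can be sketched as follows. `maybe_refresh` and the `fetch_remote` callable are hypothetical; the sketch only illustrates the timing rule and the warn-and-fall-back behavior, not how the client actually records the time of the last check.

```python
import time
import warnings

CHECK_INTERVAL_SECONDS = 60 * 60  # "one hour or more" since the last check


def maybe_refresh(cached_path, last_checked, fetch_remote, now=None):
    """Return (path, last_checked), contacting the remote at most once per hour.

    fetch_remote is a hypothetical callable that downloads a newer database
    if one exists and returns its local path.
    """
    now = time.time() if now is None else now
    if last_checked is not None and now - last_checked < CHECK_INTERVAL_SECONDS:
        # Checked recently: reuse the cached file without contacting the server.
        return cached_path, last_checked
    try:
        return fetch_remote(), now
    except Exception as exc:
        # On failure, warn and fall back to the currently cached file.
        warnings.warn(f"failed to check for database updates: {exc}")
        return cached_path, last_checked


def offline():
    raise ConnectionError("network unavailable")


# Only 30 minutes have passed, so offline() is never called.
print(maybe_refresh("cached.sqlite3", 1000.0, offline, now=1000.0 + 1800))
```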
- gypsum_client.fetch_metadata_database.fetch_metadata_database(name='bioconductor.sqlite3', cache_dir='/home/runner/.cache/gypsum', overwrite=False)
Fetch the SQLite database containing metadata from the gypsum backend.
See metadata index for more details.
Each database is generated by aggregating metadata across multiple assets and/or projects, and can be used to perform searches for interesting objects.
See also
fetch_metadata_schema(), to get the JSON schema used to define the database tables.
Example
from gypsum_client.fetch_metadata_database import fetch_metadata_database
sql_path = fetch_metadata_database()
- Parameters:
name (str) – Name of the database file. Defaults to 'bioconductor.sqlite3'.
cache_dir (str) – Path to the cache directory.
overwrite (bool) – Whether to overwrite the existing file in the cache.
- Return type:
str
- Returns:
Path to the downloaded database.
gypsum_client.fetch_metadata_schema module
- gypsum_client.fetch_metadata_schema.fetch_metadata_schema(name='bioconductor/v1.json', cache_dir='/home/runner/.cache/gypsum', overwrite=False)
Fetch a JSON schema file for metadata to be inserted into a SQLite database.
See metadata index for more details.
Each SQLite database is created from metadata files uploaded to the gypsum backend, so clients uploading objects to be incorporated into the database should validate their metadata against the corresponding JSON schema.
See also
validate_metadata(), to validate metadata against a chosen schema.
fetch_metadata_database(), to obtain the SQLite database of metadata.
Example
from gypsum_client.fetch_metadata_schema import fetch_metadata_schema
schema_path = fetch_metadata_schema()
gypsum_client.fetch_operations module
- gypsum_client.fetch_operations.fetch_latest(project, asset, url='https://gypsum.artifactdb.com')
Fetch the latest version of a project's asset.
See also
refresh_latest(), to refresh the latest version.
Example
from gypsum_client.fetch_operations import fetch_latest
ver = fetch_latest("test-R", "basic")
- gypsum_client.fetch_operations.fetch_manifest(project, asset, version, cache_dir='/home/runner/.cache/gypsum', overwrite=False, url='https://gypsum.artifactdb.com')
Fetch the manifest for a version of an asset of a project.
Example
from gypsum_client.fetch_operations import fetch_manifest
manifest = fetch_manifest("test-R", "basic", "v1")
- Parameters:
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
cache_dir (str) – Path to the cache directory.
overwrite (bool) – Whether to overwrite the existing file in the cache.
url (str) – URL of the gypsum REST API.
- Return type:
dict
- Returns:
Dictionary containing the manifest for this version. Each element is named after the relative path of a file in this version. The value of each element is another dictionary with the following fields:
- size, an integer specifying the size of the file in bytes.
- md5sum, a string containing the hex-encoded MD5 checksum of the file.
- Optionally link, a dictionary specifying the link destination for the file. This contains the strings project, asset, version and path. If the link destination is itself a link, an ancestor dictionary will be present that specifies the final location of the file after resolving all intermediate links.
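To find where a file's contents actually live, a reader of the manifest can prefer the fully resolved ancestor over the immediate link. This sketch assumes that ancestor is nested inside link and uses a made-up manifest with placeholder checksums; `final_location` is a hypothetical helper.

```python
def final_location(project, asset, version, path, entry):
    """Return (project, asset, version, path) where a file's bytes are stored."""
    link = entry.get("link")
    if link is None:
        # Not a link: the file is stored under its own version.
        return (project, asset, version, path)
    # Assumption: "ancestor", when present, is nested inside "link".
    target = link.get("ancestor", link)
    return (target["project"], target["asset"], target["version"], target["path"])


# Hypothetical manifest for test-R/basic/v3 (sizes and checksums are placeholders).
manifest = {
    "blah.txt": {"size": 27, "md5sum": "placeholder-1"},
    "foo/bar.txt": {
        "size": 21,
        "md5sum": "placeholder-2",
        "link": {
            "project": "test-R", "asset": "basic", "version": "v2", "path": "foo/bar.txt",
            "ancestor": {
                "project": "test-R", "asset": "basic", "version": "v1", "path": "foo/bar.txt",
            },
        },
    },
}

for rel_path, entry in manifest.items():
    print(rel_path, final_location("test-R", "basic", "v3", rel_path, entry))
```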
- gypsum_client.fetch_operations.fetch_permissions(project, url='https://gypsum.artifactdb.com')
Fetch the permissions for a project.
See also
set_permissions(), to update or modify the permissions.
Example
from gypsum_client.fetch_operations import fetch_permissions
perms = fetch_permissions("test-R")
- Parameters:
project (str) – Project name.
url (str) – URL of the gypsum REST API.
- Return type:
dict
- Returns:
Dictionary containing the permissions for this project, with the following fields:
- owners, a list of strings containing the GitHub users or organizations that are owners of this project.
- uploaders, a list of dictionaries specifying the users or organizations who are authorized to upload to this project. Each entry is a dictionary with the following fields:
  - id, a string containing the GitHub user or organization that is authorized to upload.
  - Optionally asset, a string containing the name of the asset that the uploader is allowed to upload to. If not provided, there is no restriction on the uploaded asset name.
  - Optionally version, a string containing the name of the version that the uploader is allowed to upload to. If not provided, there is no restriction on the uploaded version name.
  - Optionally until, the expiry date of this authorization. If not provided, the authorization does not expire.
  - Optionally trusted, whether the uploader is trusted. If not provided, defaults to False.
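The optional uploader fields act as restrictions that narrow an authorization. A sketch of how a client might interpret them, assuming until is serialized as an ISO 8601 string with a UTC offset (the actual serialized type is not specified here). `may_upload` is hypothetical and only looks at uploader entries, not owners.

```python
from datetime import datetime, timezone


def may_upload(perms, user, asset, version, now=None):
    """Check whether `user` has an active uploader entry covering asset/version."""
    now = now or datetime.now(timezone.utc)
    for entry in perms.get("uploaders", []):
        if entry["id"] != user:
            continue
        # A missing asset/version field means "no restriction".
        if "asset" in entry and entry["asset"] != asset:
            continue
        if "version" in entry and entry["version"] != version:
            continue
        # Assumption: "until" is an ISO 8601 string with a UTC offset.
        until = entry.get("until")
        if until is not None and datetime.fromisoformat(until) <= now:
            continue  # this authorization has expired
        return True
    return False


# Hypothetical permissions, shaped like the fields described above.
perms = {
    "owners": ["jkanche"],
    "uploaders": [
        {"id": "ArtifactDB-bot"},
        {"id": "LTLA", "asset": "basic", "until": "2020-01-01T00:00:00+00:00"},
    ],
}
print(may_upload(perms, "ArtifactDB-bot", "any-asset", "v1"))  # True
print(may_upload(perms, "LTLA", "basic", "v1"))                # False: expired in 2020
```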
- gypsum_client.fetch_operations.fetch_quota(project, url='https://gypsum.artifactdb.com')
Fetch the quota details for a project.
See also
set_quota(), to update or modify the quota.
Example
from gypsum_client.fetch_operations import fetch_quota
quota = fetch_quota("test-R")
- Parameters:
project (str) – Project name.
url (str) – URL of the gypsum REST API.
- Return type:
dict
- Returns:
Dictionary containing:
- baseline, the baseline quota at time zero in bytes;
- growth_rate, the annual growth rate for the quota in bytes;
- year, the creation year (i.e., time zero) for this project.
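Taking the field descriptions at face value, the quota in a given year would grow linearly from the baseline at time zero. This formula is inferred from the descriptions, not stated by the API; the backend may, for example, use fractional years. `quota_for_year` is a hypothetical helper.

```python
def quota_for_year(baseline, growth_rate, year, current_year):
    """Quota in bytes, assuming linear growth from `baseline` in the creation `year`."""
    return baseline + growth_rate * (current_year - year)


# Using the values from the set_quota() example: 1234 + 5678 * (2024 - 2020).
print(quota_for_year(1234, 5678, 2020, 2024))  # 23946
```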
- gypsum_client.fetch_operations.fetch_summary(project, asset, version, cache_dir='/home/runner/.cache/gypsum', overwrite=False, url='https://gypsum.artifactdb.com')
Fetch the summary for a version of an asset of a project.
Example
from gypsum_client.fetch_operations import fetch_summary
summa = fetch_summary("test-R", "basic", "v1")
- Parameters:
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
cache_dir (str) – Path to the cache directory.
overwrite (bool) – Whether to overwrite the existing file in the cache.
url (str) – URL of the gypsum REST API.
- Return type:
dict
- Returns:
Dictionary containing the summary for this version, with the following fields:
- upload_user_id, a string containing the identity of the uploader.
- upload_start, the upload start time.
- upload_finish, the upload finish time.
- on_probation (optional), a boolean indicating whether the upload is probational. If missing, this can be assumed to be False.
- gypsum_client.fetch_operations.fetch_usage(project, url='https://gypsum.artifactdb.com')
Fetch the quota usage for a project.
See also
refresh_usage(), to refresh usage details.
Example
from gypsum_client.fetch_operations import fetch_usage
usage = fetch_usage("test-R")
gypsum_client.list_operations module
- gypsum_client.list_operations.list_assets(project, url='https://gypsum.artifactdb.com')
List all assets in a project.
Example
from gypsum_client.list_operations import list_assets
all_assets = list_assets("test-R")
- gypsum_client.list_operations.list_files(project, asset, version, prefix=None, include_dot=True, url='https://gypsum.artifactdb.com')
List all files for a specified version of a project and asset.
Example
from gypsum_client.list_operations import list_files
all_files = list_files("test-R", "basic", "v1")
- Parameters:
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
prefix (str) – Prefix for the object key. If provided, a file is only listed if its object key starts with {project}/{asset}/{version}/{prefix}. Defaults to None, in which case all files associated with this version of the asset in the specified project are listed.
include_dot (bool) – Whether to list files with .. in their names.
url (str) – URL to the gypsum-compatible API.
- Return type:
List[str]
- Returns:
List of relative paths of files associated with the versioned asset.
- gypsum_client.list_operations.list_projects(url='https://gypsum.artifactdb.com')
List all projects in the gypsum backend.
Example
from gypsum_client.list_operations import list_projects
all_prjs = list_projects()
gypsum_client.prepare_directory_for_upload module
Prepare to upload a directory's contents.
Files in directory (that are not symlinks) are used as regular uploads, i.e., files= in start_upload().
If directory contains a symlink to a file in cache, we assume that it points to a file that was previously downloaded by, e.g., save_file() or save_version(). Thus, instead of performing a regular upload, we attempt to create an upload link, i.e., links= in start_upload(). This is achieved by examining the destination path of the symlink and inferring the link destination in the backend. Note that this still works if the symlinks are dangling.
If a symlink cannot be converted into an upload link, it will be used as a regular upload, i.e., the contents of the symlink destination will be uploaded by start_upload(). In this case, an error will be raised if the symlink is dangling, as there is no file that can actually be uploaded. If links="always", an error is raised instead upon symlink conversion failure.
This function is intended to be used with clone_version(), which creates symlinks to files in cache.
Example
import os
import tempfile
from gypsum_client.clone_operations import clone_version
from gypsum_client.prepare_directory_for_upload import prepare_directory_upload
cache = tempfile.mkdtemp()
dest = tempfile.mkdtemp()
# Clone a version
clone_version("test-R", "basic", "v1", destination=dest, cache_dir=cache)
# Make some modification
with open(os.path.join(dest, "heanna"), "w") as f:
    f.write("sumire")
# Prepare the directory for upload
prepped = prepare_directory_upload(dest, cache_dir=cache)
- gypsum_client.prepare_directory_for_upload.prepare_directory_upload(directory, links='auto', cache_dir='/home/runner/.cache/gypsum')
Prepare to upload a directory's contents.
Prepare to upload a directory's contents via start_upload(). This goes through the directory to list its contents and convert symlinks to upload links.
- Parameters:
directory (str) – Path to a directory, the contents of which are to be uploaded via start_upload().
links (Literal['auto', 'always', 'never']) – Indicate how to handle symlinks in directory. Must be one of the following:
- "auto": Will attempt to convert symlinks into upload links. If the conversion fails, a regular upload is performed.
- "always": Will attempt to convert symlinks into upload links. If the conversion fails, an error is raised.
- "never": Will never attempt to convert symlinks into upload links. All symlinked files are treated as regular uploads.
cache_dir (str) – Path to the cache directory, used to convert symlinks into upload links.
- Return type:
dict
- Returns:
Dictionary containing:
- files: list of strings to be used as files= in start_upload().
- links: link specifications to be used as links= in start_upload().
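The symlink conversion described for this module can be sketched with the standard library. The sketch assumes a hypothetical cache layout of {cache_dir}/{project}/{asset}/{version}/{path}, which may not match the client's real layout, and `prepare_sketch` is not the library's function; it only reuses the from.path/to.* keys documented for start_upload().

```python
import os
import tempfile


def prepare_sketch(directory, cache_dir, links="auto"):
    """Split `directory` into regular uploads and upload links.

    Assumes a hypothetical cache layout {cache_dir}/{project}/{asset}/{version}/{path}.
    """
    cache_root = os.path.realpath(cache_dir)
    out = {"files": [], "links": []}
    for root, _dirs, names in os.walk(directory):
        for name in sorted(names):
            full = os.path.join(root, name)
            rel = os.path.relpath(full, directory).replace(os.sep, "/")
            if links != "never" and os.path.islink(full):
                # realpath() only inspects paths, so it also works on dangling links.
                target = os.path.realpath(full)
                parts = os.path.relpath(target, cache_root).split(os.sep)
                if parts[0] != ".." and len(parts) >= 4:
                    out["links"].append({
                        "from.path": rel,
                        "to.project": parts[0],
                        "to.asset": parts[1],
                        "to.version": parts[2],
                        "to.path": "/".join(parts[3:]),
                    })
                    continue
                if links == "always":
                    raise ValueError(f"cannot convert symlink {rel!r} into an upload link")
            if not os.path.exists(full):
                raise ValueError(f"dangling symlink {rel!r} has no contents to upload")
            out["files"].append(rel)
    return out


# Hypothetical cached file plus a destination holding one symlink and one new file.
cache = tempfile.mkdtemp()
cached = os.path.join(cache, "test-R", "basic", "v1", "blah.txt")
os.makedirs(os.path.dirname(cached))
with open(cached, "w") as f:
    f.write("cached contents")

dest = tempfile.mkdtemp()
os.symlink(cached, os.path.join(dest, "blah.txt"))   # as a clone would create
with open(os.path.join(dest, "heanna"), "w") as f:   # a new, regular file
    f.write("sumire")

print(prepare_sketch(dest, cache))
```

With links="never" the symlinked file is simply listed under files, matching the behavior described for that option.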
gypsum_client.probation_operations module
- gypsum_client.probation_operations.approve_probation(project, asset, version, url='https://gypsum.artifactdb.com', token=None)
Approve a probational upload.
This removes the on_probation tag from the uploaded version.
See also
start_upload(), to specify a probational upload.
reject_probation(), to reject the probational upload.
Example
from gypsum_client.probation_operations import approve_probation
from gypsum_client.remove_operations import remove_asset
from gypsum_client.upload_api_operations import complete_upload, start_upload
init = start_upload(
    project="test-Py", asset="probation", version="v1", files=[], probation=True
)
complete_upload(init)
approve_probation("test-Py", "probation", "v1")
# Cleanup if this is just for testing
remove_asset("test-Py", "probation")
- gypsum_client.probation_operations.reject_probation(project, asset, version, url='https://gypsum.artifactdb.com', token=None)
Reject a probational upload.
This removes all files associated with that version.
See also
start_upload(), to specify a probational upload.
approve_probation(), to approve the probational upload.
Example
from gypsum_client.probation_operations import reject_probation
from gypsum_client.upload_api_operations import complete_upload, start_upload
init = start_upload(
    project="test-Py", asset="probation", version="v1", files=[], probation=True
)
complete_upload(init)
reject_probation("test-Py", "probation", "v1")
gypsum_client.refresh_operations module
- gypsum_client.refresh_operations.refresh_latest(project, asset, url='https://gypsum.artifactdb.com', token=None)
Refresh the latest version.
Recompute the latest version of a project's asset. This is useful on rare occasions where multiple simultaneous uploads cause the latest version to be slightly out of sync.
See also
fetch_latest(), to fetch the latest version without recomputing.
Example
from gypsum_client.refresh_operations import refresh_latest
ver = refresh_latest("test-R", "basic")
- gypsum_client.refresh_operations.refresh_usage(project, url='https://gypsum.artifactdb.com', token=None)
Refresh the quota usage of a project.
Recompute the usage of a project. This is useful on rare occasions where multiple simultaneous uploads cause the usage calculations to be out of sync.
See also
fetch_usage(), to fetch the usage without recomputing.
Example
from gypsum_client.refresh_operations import refresh_usage
usage = refresh_usage("test-R")
gypsum_client.remove_operations module
- gypsum_client.remove_operations.remove_asset(project, asset, url='https://gypsum.artifactdb.com', token=None)
Remove an asset of a project from the gypsum backend.
Example
from gypsum_client.remove_operations import remove_asset
from gypsum_client.upload_api_operations import complete_upload, start_upload
# Mock a project
init = start_upload(
    project="test-Py-remove", asset="mock-remove", version="v1", files=[]
)
complete_upload(init)
remove_asset("test-Py-remove", "mock-remove")
- gypsum_client.remove_operations.remove_project(project, url='https://gypsum.artifactdb.com', token=None)
Remove a project from the gypsum backend.
See also
create_project(), to create a project.
remove_asset(), to remove a specific asset.
remove_version(), to remove a specific version.
Example
from gypsum_client.create_operations import create_project
from gypsum_client.remove_operations import remove_project
create_project("test-Py-remove", owners=["jkanche"])
remove_project("test-Py-remove")
- gypsum_client.remove_operations.remove_version(project, asset, version, url='https://gypsum.artifactdb.com', token=None)
Remove a version of a project's asset from the gypsum backend.
See also
remove_asset(), to remove a specific asset.
remove_project(), to remove an entire project.
Example
from gypsum_client.remove_operations import remove_version
from gypsum_client.upload_api_operations import complete_upload, start_upload
# Mock a project
init = start_upload(
    project="test-Py-remove", asset="mock-remove", version="v1", files=[]
)
complete_upload(init)
remove_version("test-Py-remove", "mock-remove", "v1")
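gypsum_client.resolve_links module
- gypsum_client.resolve_links.resolve_links(project, asset, version, cache_dir='/home/runner/.cache/gypsum', overwrite=False, url='https://gypsum.artifactdb.com')
Resolve links in the cache directory.
Create hard links (or copies, if filesystem links are not supported) for linked-from files to their link destinations.
Example
import os
import tempfile
from gypsum_client.resolve_links import resolve_links
from gypsum_client.save_operations import save_version
cache = tempfile.mkdtemp()
save_version("test-R", "basic", "v3", relink=False, cache_dir=cache)
# Inspect the cache contents before resolving links
for root, _dirs, files in os.walk(cache):
    print(root, files)
resolve_links("test-R", "basic", "v3", cache_dir=cache)
- Parameters:
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
cache_dir (str) – Path to the cache directory.
overwrite (bool) – Whether to overwrite existing files in the cache.
url (str) – URL of the gypsum REST API.
- Return type:
bool
- Returns:
True if all links are resolved.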
gypsum_client.rest_url module
gypsum_client.s3_config module
gypsum_client.save_operations module
- gypsum_client.save_operations.save_file(project, asset, version, path, cache_dir='/home/runner/.cache/gypsum', overwrite=False, url='https://gypsum.artifactdb.com')
Save a file from a version of a project asset.
Download a file from the gypsum bucket, for a version of an asset of a project.
See also
save_version(), to save all files associated with a version.
Example
from gypsum_client.save_operations import save_file
out = save_file("test-R", "basic", "v1", "blah.txt")
- Parameters:
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
path (str) – Suffix of the object key for the file of interest, i.e., the relative path inside the version's subdirectory. The full object key is defined as {project}/{asset}/{version}/{path}.
cache_dir (str) – Path to the cache directory.
overwrite (bool) – Whether to overwrite an existing file in the cache.
url (str) – URL to the gypsum-compatible API.
- Return type:
str
- Returns:
The destination path in the local file system to which the file is downloaded.
- gypsum_client.save_operations.save_version(project, asset, version, cache_dir='/home/runner/.cache/gypsum', overwrite=False, relink=True, concurrent=1, url='https://gypsum.artifactdb.com')
Download all files associated with a version of an asset of a project from the gypsum bucket.
See also
save_file(), to save a single file.
Example
from gypsum_client.save_operations import save_version
out = save_version("test-R", "basic", "v1")
- Parameters:
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
cache_dir (str) – Path to the cache directory.
overwrite (bool) – Whether to overwrite existing files in the cache.
relink (bool) – Whether links should be resolved; see resolve_links(). Defaults to True.
concurrent (int) – Number of concurrent downloads. Defaults to 1.
url (str) – URL to the gypsum-compatible API.
- Return type:
str
- Returns:
Path to the local directory where the files are downloaded.
gypsum_client.search_metadata module
Text search on the metadata database.
Each string is tokenized by converting it to lower case and splitting it on characters that are not Unicode letters/numbers or a dash. We currently do not remove diacritics so these will need to be converted to ASCII by the user. If a text query involves only non-letter/number/dash characters, the filter will not be well-defined and will be ignored when constructing SQL statements.
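The tokenization rule can be approximated in a few lines. `tokenize` is a hypothetical helper that uses `str.isalnum()` as a stand-in for "Unicode letter/number"; the database's precomputed tokenization may differ in edge cases.

```python
def tokenize(text):
    """Lower-case `text` and split it on anything that is not a letter, number, or dash.

    Diacritics are kept as-is, mirroring the behavior described above.
    """
    tokens, current = [], []
    for ch in text.lower():
        if ch.isalnum() or ch == "-":
            current.append(ch)
        elif current:
            tokens.append("".join(current))
            current = []
    if current:
        tokens.append("".join(current))
    return tokens


print(tokenize("Judgement-Day at Café 2024!"))  # ['judgement-day', 'at', 'café', '2024']
print(tokenize("!!! ???"))                      # [] (such a query would be ignored)
```

Note that "café" and "cafe" produce different tokens, which is why diacritics need to be converted by the user.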
For convenience, a non-empty list of strings may be used as the query.
- A list of length 1 is treated as shorthand for a text query with default arguments in define_text_query().
- A list of length greater than 1 is treated as shorthand for an AND operation on default text queries for each of the individual strings.
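The shorthand expansion and the & and | operators can be illustrated with a toy clause tree. `Clause`, `text_clause`, `expand_shorthand` and `describe` are hypothetical; the real GypsumSearchClause also carries field, partial and child, which this sketch omits.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class Clause:
    """Hypothetical stand-in for GypsumSearchClause (text / and / or nodes only)."""
    type: str
    text: Optional[str] = None
    children: List["Clause"] = field(default_factory=list)

    def __and__(self, other):
        return Clause("and", children=[self, other])

    def __or__(self, other):
        return Clause("or", children=[self, other])


def text_clause(text):
    return Clause("text", text=text)


def expand_shorthand(query: Union[str, List[str], Clause]) -> Clause:
    """Expand the string/list shorthand into an explicit clause tree."""
    if isinstance(query, Clause):
        return query
    if isinstance(query, str):
        return text_clause(query)
    if not query:
        raise ValueError("query must be a non-empty list of strings")
    if len(query) == 1:
        return text_clause(query[0])
    return Clause("and", children=[text_clause(q) for q in query])


def describe(clause):
    """Render a clause tree as a readable boolean expression."""
    if clause.type == "text":
        return clause.text
    joiner = f" {clause.type.upper()} "
    return "(" + joiner.join(describe(c) for c in clause.children) + ")"


print(describe(expand_shorthand(["sakugawa", "judgement"])))  # (sakugawa AND judgement)
print(describe(text_clause("uiharu") | text_clause("rank")))   # (uiharu OR rank)
```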
See also
fetch_metadata_database(), to download and cache the database files.
See metadata index for details on the SQLite file contents and table structure.
- class gypsum_client.search_metadata.GypsumSearchClause(type, text=None, field=None, partial=False, children=None, child=None)
Bases: object
- gypsum_client.search_metadata.define_text_query(text, field=None, partial=False)
Define a text query.
- Return type:
GypsumSearchClause
- Returns:
GypsumSearchClause defining the search.
- gypsum_client.search_metadata.search_metadata_text(path, query, latest=True, include_metadata=True)
Text search on the metadata database.
Perform a text search on a SQLite database containing metadata from the gypsum backend. This is based on a precomputed tokenization of all string properties in each metadata document; see metadata index for details.
Examples
Fetch the database and search all metadata for a keyword:
from gypsum_client.fetch_metadata_database import fetch_metadata_database
from gypsum_client.search_metadata import define_text_query, search_metadata_text
sqlite_path = fetch_metadata_database()
search_metadata_text(
    sqlite_path, ["mikoto"], include_metadata=False, latest=False
)
Search for metadata containing multiple keywords (AND operation):
search_metadata_text(
    sqlite_path, ["sakugawa", "judgement"], include_metadata=False, latest=False
)
# or use the & operator
query = define_text_query("sakugawa") & define_text_query("judgement")
result = search_metadata_text(
    sqlite_path, query, include_metadata=False, latest=False
)
Search for metadata containing either of the keywords (OR operation):
# use the | operator
query = define_text_query("uiharu") | define_text_query("rank")
result = search_metadata_text(
    sqlite_path, query, include_metadata=False, latest=False
)
- Parameters:
path (str) – Path to the SQLite file, usually obtained by fetch_metadata_database().
query (Union[str, List[str], GypsumSearchClause]) – List of keywords specifying the query to execute. May also be a GypsumSearchClause generated by define_text_query().
latest (bool) – Whether to only search in the latest version of each asset. Defaults to True.
include_metadata (bool) – Whether metadata should be returned. Defaults to True.
- Returns:
Results matching the query.
gypsum_client.set_operations module
- gypsum_client.set_operations.set_permissions(project, owners=None, uploaders=None, append=True, url='https://gypsum.artifactdb.com', token=None)
Set the owner and uploader permissions for a project.
See also
fetch_permissions(), to fetch the permissions for a project.
Example
from datetime import datetime, timedelta
from gypsum_client.create_operations import create_project
from gypsum_client.set_operations import set_permissions
create_project("test-Py-perms", owners=["jkanche"])
until = (datetime.now() + timedelta(seconds=1000000)).replace(microsecond=0)
set_permissions(
    "test-Py-perms", owners=["LTLA"], uploaders=[{"id": "LTLA", "until": until}]
)
- Parameters:
project (str) – The project name.
owners (Optional[List[str]]) – List of GitHub users or organizations that are owners of this project. If None, no change is made.
uploaders (Optional[List[dict]]) – List of authorized uploaders for this project; see fetch_permissions() for the fields of each entry. If None, no change is made.
append (bool) – Whether to append owners and uploaders to the existing ones. If False, the provided owners and uploaders replace the existing values.
url (str) – URL of the gypsum REST API.
token (str) – GitHub access token to authenticate to the gypsum REST API.
- gypsum_client.set_operations.set_quota(project, baseline=None, growth_rate=None, year=None, url='https://gypsum.artifactdb.com', token=None)
Set the quota for a project.
See also
fetch_quota(), to fetch the quota details.
Example
from gypsum_client.create_operations import create_project
from gypsum_client.set_operations import set_quota
create_project("test-Py-quota", owners=["jkanche"])
set_quota("test-Py-quota", baseline=1234, growth_rate=5678, year=2020)
- Parameters:
project (str) – The project name.
baseline (int) – Baseline quota, in bytes. If None, no change is made.
growth_rate (int) – Expected annual growth rate, in bytes. If None, no change is made.
year (int) – Year of project creation. If None, no change is made.
url (str) – URL of the gypsum REST API.
token (str) – GitHub access token to authenticate to the gypsum REST API.
gypsum_client.upload_api_operations module
- gypsum_client.upload_api_operations.abort_upload(init, url='https://gypsum.artifactdb.com')
Abort an upload session, usually after an irrecoverable error.
See also
start_upload(), to create the init.
Example
import os
import tempfile
from pathlib import Path
from gypsum_client.upload_api_operations import abort_upload, start_upload
blah_contents = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # example file contents
foobar_contents = "1 2 3 4 5 6 7 8 9 10"
tmp_dir = tempfile.mkdtemp()
with open(f"{tmp_dir}/blah.txt", "w") as f:
    f.write(blah_contents)
os.makedirs(f"{tmp_dir}/foo", exist_ok=True)
with open(f"{tmp_dir}/foo/blah.txt", "w") as f:
    f.write(foobar_contents)
files = [
    str(file.relative_to(tmp_dir))
    for file in Path(tmp_dir).rglob("*")
    if not os.path.isdir(file)
]
init = start_upload(
    project="test-Py-demo", asset="upload", version="1", files=files, directory=tmp_dir,
)
abort_upload(init)
- Parameters:
init (dict) – Dictionary containing abort_url and session_token, typically created by start_upload().
url (str) – URL to the gypsum REST API.
- gypsum_client.upload_api_operations.complete_upload(init, url='https://gypsum.artifactdb.com')
Complete an upload session after all files have been uploaded.
See also
start_upload(), to create the init.
Example
import os
import tempfile
from pathlib import Path
from gypsum_client.upload_api_operations import complete_upload, start_upload
from gypsum_client.upload_file_actions import upload_files
blah_contents = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # example file contents
foobar_contents = "1 2 3 4 5 6 7 8 9 10"
tmp_dir = tempfile.mkdtemp()
with open(f"{tmp_dir}/blah.txt", "w") as f:
    f.write(blah_contents)
os.makedirs(f"{tmp_dir}/foo", exist_ok=True)
with open(f"{tmp_dir}/foo/blah.txt", "w") as f:
    f.write(foobar_contents)
files = [
    str(file.relative_to(tmp_dir))
    for file in Path(tmp_dir).rglob("*")
    if not os.path.isdir(file)
]
init = start_upload(
    project="test-Py-demo", asset="upload", version="1", files=files, directory=tmp_dir,
)
upload_files(init, directory=tmp_dir)
complete_upload(init)
- Parameters:
init (dict) – Dictionary containing complete_url and session_token, typically created by start_upload().
url (str) – URL to the gypsum REST API.
- gypsum_client.upload_api_operations.start_upload(project, asset, version, files, links=None, deduplicate=True, probation=False, url='https://gypsum.artifactdb.com', token=None, directory=None)
Start an upload.
Start an upload of a new version of an asset, or a new asset of a project.
See also
upload_files(), to actually upload the files.
complete_upload(), to indicate that the upload is completed.
abort_upload(), to abort an upload in progress.
prepare_directory_upload(), to create files and links from a directory.
Example
import os
import tempfile
from pathlib import Path
from gypsum_client.upload_api_operations import abort_upload, start_upload
blah_contents = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # example file contents
foobar_contents = "1 2 3 4 5 6 7 8 9 10"
tmp_dir = tempfile.mkdtemp()
with open(f"{tmp_dir}/blah.txt", "w") as f:
    f.write(blah_contents)
os.makedirs(f"{tmp_dir}/foo", exist_ok=True)
with open(f"{tmp_dir}/foo/blah.txt", "w") as f:
    f.write(foobar_contents)
files = [
    str(file.relative_to(tmp_dir))
    for file in Path(tmp_dir).rglob("*")
    if not os.path.isdir(file)
]
init = start_upload(
    project="test-Py-demo", asset="upload", version="1", files=files, directory=tmp_dir,
)
abort_upload(init)
- Parameters:
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
files (Union[str, List[str], List[dict]]) – A file path or a list of file paths to upload. These paths are assumed to be relative to the directory parameter. Alternatively, a list of dictionaries, each containing the following keys:
- path: a string containing the relative path of the file inside the version's subdirectory.
- size: a non-negative integer specifying the size of the file in bytes.
- md5sum: a string containing the hex-encoded MD5 checksum of the file.
- Optionally dedup: a boolean indicating whether deduplication should be attempted for this file. If this is not available, the parameter deduplicate is used.
links (List[dict]) – A list of dictionaries, each containing the following keys:
- from.path: a string containing the relative path of the file inside the version's subdirectory.
- to.project: a string containing the project of the link destination.
- to.asset: a string containing the asset of the link destination.
- to.version: a string containing the version of the link destination.
- to.path: a string containing the path of the link destination.
deduplicate (bool) – Whether the backend should attempt deduplication of files in the immediately previous version. Defaults to True.
probation (bool) – Whether to perform a probational upload. Defaults to False.
url (str) – URL of the gypsum REST API.
token (str) – GitHub access token to authenticate to the gypsum REST API.
directory (str) – Path to a directory containing the files to be uploaded. This directory is assumed to correspond to a version of an asset.
- Return type:
dict
- Returns:
Dictionary containing the following keys:
- file_urls, a list containing information about each file to be uploaded. This is used by upload_files().
- complete_url, a string containing the completion URL, to be used by complete_upload().
- abort_url, a string specifying the abort URL, to be used by abort_upload().
- session_token, a string for authenticating to the newly initialized upload session.
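When files is given as a list of dictionaries, each entry needs a path, size and MD5 checksum. A sketch of computing these for a local directory with hashlib; `describe_files` is a hypothetical helper, and the optional dedup key is left out.

```python
import hashlib
import os
import tempfile


def describe_files(directory):
    """Build start_upload()-style file entries (path, size, md5sum) for `directory`."""
    entries = []
    for root, _dirs, names in os.walk(directory):
        for name in sorted(names):
            full = os.path.join(root, name)
            md5 = hashlib.md5()
            with open(full, "rb") as handle:
                # Hash in 1 MiB chunks so large files need not fit in memory.
                for chunk in iter(lambda: handle.read(1 << 20), b""):
                    md5.update(chunk)
            entries.append({
                "path": os.path.relpath(full, directory).replace(os.sep, "/"),
                "size": os.path.getsize(full),
                "md5sum": md5.hexdigest(),
            })
    return entries


tmp_dir = tempfile.mkdtemp()
with open(os.path.join(tmp_dir, "blah.txt"), "w") as f:
    f.write("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
print(describe_files(tmp_dir))
```

Paths are normalized to forward slashes because they become part of the object key {project}/{asset}/{version}/{path}.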
gypsum_client.upload_file_actions module
- gypsum_client.upload_file_actions.upload_directory(directory, project, asset, version, cache_dir='/home/runner/.cache/gypsum', deduplicate=True, probation=False, url='https://gypsum.artifactdb.com', token=None, concurrent=1, abort_failed=True)
Upload a directory to the gypsum backend.
Convenience method to upload a directory to the gypsum backend as a versioned asset of a project. This requires uploader permissions to the relevant project.
This function is a wrapper around prepare_directory_upload(), start_upload() and others. The aim is to streamline the upload of a directory's contents when no customization of the file listing is required.
Example
import os
import tempfile
from gypsum_client.upload_file_actions import upload_directory
tmp = tempfile.mkdtemp()
with open(os.path.join(tmp, "blah.txt"), "w") as f:
    f.write("ABCDEFGHIJKLMNOPQRSTUVWXYZ")
os.makedirs(os.path.join(tmp, "foo"))
with open(os.path.join(tmp, "foo", "bar.txt"), "w") as f:
    f.write("\n".join(map(str, range(1, 11))))
upload_directory(tmp, "test-Py", "upload-dir", version="1")
- Parameters:
directory (str) – Path to a directory containing the files to be uploaded. This directory is assumed to correspond to a version of an asset.
project (str) – Project name.
asset (str) – Asset name.
version (str) – Version name.
cache_dir (str) – Path to the cache for saving files, e.g., in save_version(). Used to convert symbolic links to upload links; see prepare_directory_upload().
deduplicate (bool) – Whether the backend should attempt deduplication of files in the immediately previous version. Defaults to True.
probation (bool) – Whether to perform a probational upload. Defaults to False.
url (str) – URL of the gypsum REST API.
token (str) – GitHub access token to authenticate to the gypsum REST API.
concurrent (int) – Number of concurrent uploads. Defaults to 1.
abort_failed (bool) – Whether to abort the upload on any failure. Setting this to False can be helpful for diagnosing upload problems.
- Return type:
bool
- Returns:
True if successful, otherwise False.
- gypsum_client.upload_file_actions.upload_files(init, directory=None, url='https://gypsum.artifactdb.com', concurrent=1)
Upload files in an initialized upload session for a version of an asset.
- Parameters:
init (dict) – Dictionary containing file_urls and session_token. This is typically the return value from start_upload().
directory (str) – Path to the directory containing the files. Defaults to None, in which case the files are assumed to be in the current working directory.
url (str) – URL of the gypsum REST API.
concurrent (int) – Number of concurrent uploads. Defaults to 1.
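The effect of concurrent can be pictured as a bounded thread pool over the entries in file_urls. This is an illustrative sketch only: `upload_all` and the `upload_one` callable are hypothetical, and the structure of each file_urls entry is not specified here.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_all(file_urls, upload_one, concurrent=1):
    """Run `upload_one` over every entry, with at most `concurrent` running at once.

    Results come back in input order; the first exception raised is propagated.
    """
    if concurrent <= 1:
        return [upload_one(entry) for entry in file_urls]
    with ThreadPoolExecutor(max_workers=concurrent) as pool:
        return list(pool.map(upload_one, file_urls))


# Hypothetical entries and a fake uploader that just echoes each path.
entries = [{"path": "blah.txt"}, {"path": "foo/bar.txt"}]
print(upload_all(entries, lambda e: e["path"], concurrent=2))  # ['blah.txt', 'foo/bar.txt']
```

Threads suit this workload because uploads are dominated by network I/O rather than CPU time.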
gypsum_client.validate_metadata module
- gypsum_client.validate_metadata.validate_metadata(metadata, schema, stringify=None)
Validate metadata against a JSON schema for a SQLite database.
See also
fetch_metadata_schema(), to get the JSON schema.
fetch_metadata_database(), to obtain the SQLite database of metadata.
Example
import tempfile
from gypsum_client.fetch_metadata_schema import fetch_metadata_schema
from gypsum_client.validate_metadata import validate_metadata
_cache_dir = tempfile.mkdtemp()
metadata = {
    "title": "Fatherhood",
    "description": "Luke ich bin dein Vater.",
    "sources": [{"provider": "GEO", "id": "GSE12345"}],
    "taxonomy_id": ["9606"],
    "genome": ["GRCm38"],
    "maintainer_name": "Darth Vader",
    "maintainer_email": "vader@empire.gov",
    "bioconductor_version": "3.10",
}
schema = fetch_metadata_schema(cache_dir=_cache_dir)
validate_metadata(metadata, schema)
- Return type:
bool
- Returns:
True if metadata conforms to schema.