Retrieve a file from an ArtifactDB instance using its REST endpoints.

getFile(
  id,
  url,
  path = NULL,
  cache = NULL,
  follow.links = TRUE,
  user.agent = NULL
)

getFileURL(id, url, expiry = NULL)

Arguments

id

String containing the ArtifactDB identifier for a file. This is a concatenated identifier involving the project name, file path and version, i.e., <project>:<path>@<version> (see Examples).
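The identifier format can be illustrated with two small helpers. This is a minimal sketch: join_artifact_id and split_artifact_id are hypothetical names, not part of the package (packID is the package's own constructor). It assumes that the project name contains no ":" and that the version contains no "@", while the file path may contain either character.

```r
# Hypothetical helpers illustrating the <project>:<path>@<version> format.
# ASSUMPTION: the project contains no ":" and the version contains no "@";
# the path may contain either, so we split at the first ":" and last "@".
join_artifact_id <- function(project, path, version) {
    paste0(project, ":", path, "@", version)
}

split_artifact_id <- function(id) {
    pattern <- "^([^:]+):(.+)@([^@]+)$"
    if (!grepl(pattern, id)) {
        stop("'id' is not of the form <project>:<path>@<version>")
    }
    # regmatches() returns the full match followed by the three groups.
    parts <- regmatches(id, regexec(pattern, id))[[1]]
    list(project = parts[2], path = parts[3], version = parts[4])
}

id <- join_artifact_id("test-public", "blah.txt", "base")
id
#> [1] "test-public:blah.txt@base"
str(split_artifact_id(id))
```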

url

String containing the URL of the ArtifactDB REST endpoint.

path

String containing the path at which to save the file. This is only used if cache is not supplied; if NULL, the file is saved to a temporary path.

cache

Function to use for caching the result; see getFileMetadata for its requirements. If NULL, no caching is performed.

follow.links

Logical scalar indicating whether to search for links to duplicate files; see Details.

user.agent

String containing the user agent; see authorizedVerb.

expiry

Integer scalar specifying the time until expiry of the link, in seconds. Defaults to 120 if NULL; maximum value is 86400 (24 hours).
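The documented rules for expiry can be expressed as a small client-side check. This is a hypothetical sketch (check_expiry is not a package function): it treats NULL as the 120-second default and refuses values above 86400 seconds. It also assumes that the lifetime must be positive, and since this page does not say whether the server clamps or rejects larger values, the sketch simply raises an error.

```r
# Hypothetical check mirroring the documented 'expiry' rules.
# ASSUMPTIONS: non-positive values are invalid, and values above the
# documented maximum are rejected rather than clamped.
check_expiry <- function(expiry = NULL) {
    if (is.null(expiry)) {
        return(120L) # documented default, in seconds
    }
    if (!is.numeric(expiry) || length(expiry) != 1L || is.na(expiry) || expiry <= 0) {
        stop("'expiry' should be a positive number of seconds")
    }
    if (expiry > 86400) {
        stop("'expiry' cannot exceed 86400 seconds (24 hours)")
    }
    as.integer(expiry)
}

check_expiry()         # default lifetime
#> [1] 120
check_expiry(6 * 3600) # a six-hour link
#> [1] 21600
```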

Value

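For getFileURL, a string containing a URL that can be used to download the specified file until the link expires (see expiry).

For getFile, a string containing the path to the downloaded file. If cache is supplied, this is the path returned by the cache function; otherwise, it is path (or a temporary path if path=NULL).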

Details

ArtifactDB instances support linking between identical files to avoid storing unnecessary duplicates. If cache is provided and follow.links=TRUE, getFile will follow the links to determine whether the requested file is the same as any previously cached files. This avoids downloading another copy of the file if one is already available under a linked identifier. If no local duplicate can be found, getFile will download the earliest version of the file among the set of duplicates, and then populate all other duplicates in the cache directory with hard links or copies. If this fails or if cache=NULL, it will just download the requested file directly.
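The link-following strategy can be sketched without any network access. This is a hypothetical, self-contained illustration, not the package's implementation: the names cache_path, link_or_copy and get_with_links are made up, the set of linked duplicates is assumed to be a data.frame (including the requested identifier) with an uploaded time used to pick the earliest version, and download(id, path) stands in for the real REST call.

```r
# Hypothetical sketch of the link-following strategy.
# ASSUMPTIONS: 'duplicates' lists every linked identifier (including the
# requested one) with an 'uploaded' time; 'download(id, path)' stands in
# for the actual REST call and writes the file to 'path'.
cache_path <- function(cache.dir, id) {
    # Encode the identifier so that it forms a valid file name.
    file.path(cache.dir, URLencode(id, reserved = TRUE))
}

link_or_copy <- function(from, to) {
    # Prefer a hard link (no extra disk usage); copy if linking fails.
    ok <- suppressWarnings(file.link(from, to))
    if (!ok) ok <- file.copy(from, to)
    ok
}

get_with_links <- function(id, duplicates, cache.dir, download) {
    target <- cache_path(cache.dir, id)
    if (file.exists(target)) return(target)

    # 1. Reuse a cached copy of any linked duplicate, avoiding a download.
    for (dup in setdiff(duplicates$id, id)) {
        existing <- cache_path(cache.dir, dup)
        if (file.exists(existing) && link_or_copy(existing, target)) {
            return(target)
        }
    }

    # 2. Otherwise, download the earliest version among the duplicates and
    #    populate the cache entries of all other duplicates from it.
    ok <- tryCatch({
        earliest <- duplicates$id[order(duplicates$uploaded)[1]]
        src <- cache_path(cache.dir, earliest)
        download(earliest, src)
        for (dup in setdiff(duplicates$id, earliest)) {
            dest <- cache_path(cache.dir, dup)
            if (!file.exists(dest)) link_or_copy(src, dest)
        }
        file.exists(target)
    }, error = function(e) FALSE)

    # 3. If that failed, just download the requested file directly.
    if (!ok) download(id, target)
    target
}

# Usage with a fake 'download' that writes LETTERS and records each call.
# The "@v2" identifier is made up for illustration.
cache.dir <- tempfile("link-sketch")
dir.create(cache.dir)
calls <- character(0)
fake_download <- function(id, path) {
    calls <<- c(calls, id)
    writeLines(LETTERS, path)
}
dups <- data.frame(
    id = c("test-public:blah.txt@v2", "test-public:blah.txt@base"),
    uploaded = as.POSIXct(c("2023-02-01", "2023-01-01"), tz = "UTC"),
    stringsAsFactors = FALSE
)
out <- get_with_links("test-public:blah.txt@v2", dups, cache.dir, fake_download)
calls # only the earliest version was downloaded
#> [1] "test-public:blah.txt@base"
```

Hard links are tried first because they share the underlying data on disk; the copy fallback covers file systems or directories where hard links are not permitted.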

See also

packID, to create id from its project, path and version.

Author

Aaron Lun

Examples

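# Identifier and endpoint used throughout these examples:
example.id <- "test-public:blah.txt@base"
example.url <- "https://gypsum-test.aaron-lun.workers.dev"

getFileURL(example.id, url = example.url)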
#> [1] "https://gypsum-test.bfb2e522e0b245720424784fcf7c04c0.r2.cloudflarestorage.com/test-public/base/blah.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=ae0f6ee0f014deff554f709698940ab3%2F20230202%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20230202T004455Z&X-Amz-Expires=120&X-Amz-Signature=df7f95a65c44ab0a4cdc39c92e82d1946fb960524a79f349717371a7a56c4845&X-Amz-SignedHeaders=host&x-id=GetObject"

X <- getFile(example.id, url = example.url)
readLines(X) # as we know it's a text file.
#>  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
#> [20] "T" "U" "V" "W" "X" "Y" "Z"

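# Simple caching in the temporary directory. The cache function receives a
# key identifying the request and a 'save' function that downloads the file
# to a given path; it returns the path to the cached file.
tmp.cache <- file.path(tempdir(), "zircon-cache")
dir.create(tmp.cache, showWarnings = FALSE)
cache.fun <- function(key, save) {
    # Encode the key so that it forms a valid file name.
    path <- file.path(tmp.cache, URLencode(key, reserved=TRUE, repeated=TRUE))
    if (!file.exists(path)) {
        save(path) # cache miss: download into the cache.
    } else {
        cat("cache hit!\n")
    }
    path
}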
getFile(example.id, example.url, cache = cache.fun)
#> [1] "/tmp/RtmpqT1SXc/zircon-cache/https%3A%2F%2Fgypsum-test.aaron-lun.workers.dev%2Ffiles%2Ftest-public%253Ablah.txt%2540base"
getFile(example.id, example.url, cache = cache.fun)
#> cache hit!
#> cache hit!
#> [1] "/tmp/RtmpqT1SXc/zircon-cache/https%3A%2F%2Fgypsum-test.aaron-lun.workers.dev%2Ffiles%2Ftest-public%253Ablah.txt%2540base"
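The temporary-directory cache above is discarded when the R session ends. A persistent variant could look like the following hypothetical sketch: make_cache_fun is a made-up name, the "zircon" subdirectory under tools::R_user_dir (available from R 4.0.0) is an arbitrary choice, and it assumes that save(path) only writes the downloaded file to the path it is given. Unlike cache.fun, it downloads to a temporary name and renames afterwards, so that an interrupted download does not leave a truncated file behind.

```r
# Hypothetical factory: returns a cache function with the same (key, save)
# contract as cache.fun above, rooted in a directory that persists across
# R sessions. ASSUMPTION: 'save(path)' only writes the file to 'path'.
make_cache_fun <- function(dir = tools::R_user_dir("zircon", which = "cache")) {
    dir.create(dir, recursive = TRUE, showWarnings = FALSE)
    function(key, save) {
        path <- file.path(dir, URLencode(key, reserved = TRUE, repeated = TRUE))
        if (!file.exists(path)) {
            # Download under a temporary name and rename afterwards, so an
            # interrupted download never leaves a truncated file that would
            # later be mistaken for a cache hit.
            partial <- paste0(path, ".partial")
            on.exit(unlink(partial), add = TRUE)
            save(partial)
            file.rename(partial, path)
        }
        path
    }
}

# For example (requires network access):
# getFile(example.id, example.url, cache = make_cache_fun())
```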