Retrieve a file

Retrieve a file from an ArtifactDB instance using its REST endpoints.

getFile(
  id,
  url,
  path = NULL,
  cache = NULL,
  follow.links = TRUE,
  user.agent = NULL
)

getFileURL(id, url, expiry = NULL)

Arguments

id: String containing the ArtifactDB identifier for a file. This is a concatenated identifier involving the project name, file path and version, i.e., <project>:<path>@<version> (see Examples).
url: String containing the URL of the ArtifactDB REST endpoint.
path: String containing the path at which to save the file. This is only used if cache is not supplied. Defaults to a temporary path if NULL.
cache: Function to use for caching the result, see getFileMetadata for the requirements. If NULL, no caching is performed.
follow.links: Logical scalar indicating whether to search for links to duplicate files, see Details.
user.agent: String containing the user agent, see authorizedVerb.
expiry: Integer scalar specifying the time until expiry of the link, in seconds. Defaults to 120 if NULL; maximum value is 86400 (24 hours).
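As an illustration of the <project>:<path>@<version> format described for id, the following is a minimal sketch in base R. The helpers make_id and split_id are hypothetical names invented for this example and are not part of the package; the sketch assumes that the project name contains no ":" and the version contains no "@".

```r
# Hypothetical helpers (not part of the package) to illustrate the
# <project>:<path>@<version> identifier format.
make_id <- function(project, path, version) {
    paste0(project, ":", path, "@", version)
}

split_id <- function(id) {
    # Assumption: the first ':' ends the project and the last '@' starts the version.
    colon <- as.integer(regexpr(":", id, fixed = TRUE))
    at <- max(as.integer(gregexpr("@", id, fixed = TRUE)[[1]]))
    if (colon < 1L || at <= colon) {
        stop("'id' should look like <project>:<path>@<version>")
    }
    list(
        project = substr(id, 1L, colon - 1L),
        path = substr(id, colon + 1L, at - 1L),
        version = substr(id, at + 1L, nchar(id))
    )
}

id <- make_id("test-public", "blah.txt", "base")
parts <- split_id(id)
```

The components used here (test-public, blah.txt, base) are the ones that appear, URL-encoded, in the cache paths shown in the Examples.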

Value

For getFileURL, a string containing a URL that can be used to download the specified file.

For getFile, a string containing the path to the file downloaded from the ArtifactDB instance. This is a path inside the cache if cache is supplied, otherwise it is path or a temporary path.

Details

ArtifactDB instances support linking between identical files to avoid storing unnecessary duplicates. If cache is provided and follow.links=TRUE, getFile will follow the links to determine whether the requested file is the same as any previously cached file. This avoids downloading another copy of the file if one is already available under a linked identifier.

If no local duplicate can be found, getFile will download the earliest version of the file among the set of duplicates, and then populate all other duplicates in the cache directory with hard links or copies. If this process fails or if cache=NULL, getFile will just download the requested file directly.
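The "hard links or copies" step can be pictured with a small base R sketch. This is not the package's implementation; link_or_copy is a hypothetical helper and the file names are made up for illustration. It tries file.link first and falls back to file.copy when a hard link cannot be created, for example across file systems.

```r
# Hypothetical helper: make 'to' a duplicate of 'from', preferring a hard link.
link_or_copy <- function(from, to) {
    dir.create(dirname(to), recursive = TRUE, showWarnings = FALSE)
    # file.link() returns FALSE (with a warning) if the link cannot be made.
    ok <- suppressWarnings(file.link(from, to))
    if (!ok) {
        ok <- file.copy(from, to)
    }
    ok
}

# Stand-in for a file that has already been downloaded.
src <- tempfile(fileext = ".txt")
writeLines(LETTERS, src)

# Stand-in for the location of a linked duplicate.
dup <- file.path(tempdir(), "zircon-dup-demo", "copy.txt")
unlink(dup) # start from a clean slate so neither step is blocked
created <- link_or_copy(src, dup)
```

Either way, the duplicate ends up with the same contents as the original without a second download.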

Author

Aaron Lun

Examples

getFileURL(example.id, url = example.url)
#> [1] "https://gypsum-test.bfb2e522e0b245720424784fcf7c04c0.r2.cloudflarestorage.com/test-public/base/blah.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=ae0f6ee0f014deff554f709698940ab3%2F20230202%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20230202T004455Z&X-Amz-Expires=120&X-Amz-Signature=df7f95a65c44ab0a4cdc39c92e82d1946fb960524a79f349717371a7a56c4845&X-Amz-SignedHeaders=host&x-id=GetObject"

X <- getFile(example.id, url = example.url)
readLines(X) # as we know it's a text file.
#>  [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
#> [20] "T" "U" "V" "W" "X" "Y" "Z"

# Simple caching in the temporary directory:
tmp.cache <- file.path(tempdir(), "zircon-cache")
dir.create(tmp.cache)
cache.fun <- function(key, save) {
    path <- file.path(tmp.cache, URLencode(key, reserved=TRUE, repeated=TRUE))
    if (!file.exists(path)) {
        save(path)
    } else {
        cat("cache hit!\n")
    }
    path
}
getFile(example.id, example.url, cache = cache.fun)
#> [1] "/tmp/RtmpqT1SXc/zircon-cache/https%3A%2F%2Fgypsum-test.aaron-lun.workers.dev%2Ffiles%2Ftest-public%253Ablah.txt%2540base"
getFile(example.id, example.url, cache = cache.fun)
#> cache hit!
#> cache hit!
#> [1] "/tmp/RtmpqT1SXc/zircon-cache/https%3A%2F%2Fgypsum-test.aaron-lun.workers.dev%2Ffiles%2Ftest-public%253Ablah.txt%2540base"
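The behaviour of a caching function like the one above can also be checked without a network connection. The sketch below assumes, based on the example above, that the function is called with a key and a save function that writes the file to a given path; demo.fun, fake.save and n.downloads are hypothetical names used only here, and fake.save stands in for the real download.

```r
# Offline sketch of the cache function contract, using a stand-in for the download.
demo.cache <- file.path(tempdir(), "zircon-cache-demo")
dir.create(demo.cache, showWarnings = FALSE)

demo.fun <- function(key, save) {
    path <- file.path(demo.cache, utils::URLencode(key, reserved = TRUE, repeated = TRUE))
    if (!file.exists(path)) {
        save(path) # only called on a cache miss
    }
    path
}

# Counts how often the (fake) download is actually performed.
n.downloads <- 0L
fake.save <- function(path) {
    n.downloads <<- n.downloads + 1L
    writeLines(LETTERS, path)
}

key <- "test-public:blah.txt@base"
unlink(file.path(demo.cache, utils::URLencode(key, reserved = TRUE, repeated = TRUE)))

first <- demo.fun(key, fake.save)  # cache miss: fake.save is called
second <- demo.fun(key, fake.save) # cache hit: fake.save is skipped
```

Both calls return the same path, and the stand-in download runs only once.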
