getFile.Rd
Retrieve a file from an ArtifactDB using its REST endpoints.
getFile(
id,
url,
path = NULL,
cache = NULL,
follow.links = TRUE,
user.agent = NULL
)
getFileURL(id, url, expiry = NULL)
String containing the ArtifactDB identifier for a file.
This is a concatenated identifier involving the project name, file path and version, i.e., <project>:<path>@<version> (see Examples).
String containing the URL of the ArtifactDB REST endpoint.
String containing the path to save the file.
This is only used if cache is not supplied.
Defaults to a temporary path if NULL.
Function to use for caching the result, see getFileMetadata for the requirements.
If NULL, no caching is performed.
Logical scalar indicating whether to search for links to duplicate files, see Details.
String containing the user agent, see authorizedVerb.
Integer scalar specifying the time until expiry of the link, in seconds.
Defaults to 120 if NULL; maximum value is 86400 (24 hours).
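The expiry rule above (120 seconds when NULL, at most 86400) can be sketched as a small validation helper. resolve_expiry is a hypothetical name used only for illustration and is not part of the package; how out-of-range values are actually handled is not stated here, so this sketch assumes an error is appropriate.

```r
# Hypothetical helper, not part of the package: resolve an 'expiry' value
# following the documented default (120) and maximum (86400).
resolve_expiry <- function(expiry = NULL, default = 120L, maximum = 86400L) {
    if (is.null(expiry)) {
        return(default)
    }
    if (!is.numeric(expiry) || length(expiry) != 1L || is.na(expiry) || expiry <= 0) {
        stop("'expiry' should be a positive integer scalar")
    }
    if (expiry > maximum) {
        # Assumption: values above the maximum are rejected rather than clamped.
        stop("'expiry' cannot exceed ", maximum, " seconds (24 hours)")
    }
    as.integer(expiry)
}

resolve_expiry()      # falls back to the default
resolve_expiry(3600)  # one hour
```

A value obtained this way could then be passed as the expiry argument of getFileURL.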
For getFileURL, a string containing a URL that can be used to download the specified file.
For getFile, a string containing a path to a cached file downloaded from the ArtifactDB instance.
ArtifactDB instances support linking between identical files to avoid storing unnecessary duplicates.
If cache is provided and follow.links=TRUE, getFile will follow the links to determine whether the requested file is the same as any previously cached files.
This avoids downloading another copy of the file if one is already available under a linked identifier.
If no local duplicate can be found, getFile will download the earliest version of the file among the set of duplicates, and then populate all other duplicates in the cache directory with hard links or copies.
If this fails or if cache=NULL, it will just download the requested file directly.
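The duplicate handling described above can be illustrated with a minimal, offline sketch. It is not the package's implementation: fetch_with_links, link_or_copy and fake.download are hypothetical names, the set of duplicates is assumed to be already known and ordered from earliest to latest version, and the second identifier is made up for the demonstration.

```r
# Hypothetical sketch of the duplicate-handling logic; none of these names
# come from the package.
link_or_copy <- function(from, to) {
    if (file.exists(to)) {
        return(invisible(TRUE))
    }
    # Try a hard link first, then fall back to a plain copy.
    ok <- suppressWarnings(file.link(from, to))
    if (!ok) {
        ok <- file.copy(from, to)
    }
    invisible(ok)
}

fetch_with_links <- function(id, duplicates, cache.dir, download) {
    # Assumption: 'duplicates' is ordered earliest-first and includes 'id'.
    stopifnot(id %in% duplicates)
    encoded <- vapply(duplicates, URLencode, "", reserved = TRUE, USE.NAMES = FALSE)
    paths <- file.path(cache.dir, encoded)
    names(paths) <- duplicates

    cached <- file.exists(paths)
    if (any(cached)) {
        # A local duplicate exists: reuse it instead of downloading again.
        link_or_copy(paths[[which(cached)[1]]], paths[[id]])
    } else {
        # Download the earliest version, then populate the other duplicates.
        download(duplicates[1], paths[[1]])
        for (p in paths[-1]) {
            link_or_copy(paths[[1]], p)
        }
    }
    paths[[id]]
}

# Offline demonstration with a stand-in for the real download.
cache.dir <- tempfile("link-demo")
dir.create(cache.dir)
n.downloads <- 0L
fake.download <- function(id, path) {
    n.downloads <<- n.downloads + 1L
    writeLines(LETTERS, path)
}
dups <- c("test-public:blah.txt@base", "test-public:blah.txt@v2") # second id is made up
first <- fetch_with_links(dups[2], dups, cache.dir, fake.download)
second <- fetch_with_links(dups[1], dups, cache.dir, fake.download)
n.downloads # counts how often the stand-in download was invoked
```

In this sketch the second call finds the file already present in the cache directory, so the stand-in download function is invoked only once across both calls.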
packID, to create id from various pieces of information.
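As a rough illustration of the <project>:<path>@<version> layout, the sketch below assembles and splits an identifier using base R only. make_id and split_id are hypothetical stand-ins written for this example, not package functions; in real code packID should be preferred. The parsing assumes that the project name contains no ":" and the version contains no "@".

```r
# Hypothetical stand-ins for illustration only; prefer the package's packID().
make_id <- function(project, path, version) {
    paste0(project, ":", path, "@", version)
}

split_id <- function(id) {
    # Assumption: the first ':' ends the project, the last '@' starts the version.
    colon <- as.integer(regexpr(":", id, fixed = TRUE))
    at <- max(as.integer(gregexpr("@", id, fixed = TRUE)[[1]]))
    if (colon < 1L || at < colon) {
        stop("expected an identifier of the form '<project>:<path>@<version>'")
    }
    list(
        project = substr(id, 1L, colon - 1L),
        path = substr(id, colon + 1L, at - 1L),
        version = substr(id, at + 1L, nchar(id))
    )
}

id <- make_id("test-public", "blah.txt", "base")
id
split_id(id)
```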
getFileURL(example.id, url = example.url)
#> [1] "https://gypsum-test.bfb2e522e0b245720424784fcf7c04c0.r2.cloudflarestorage.com/test-public/base/blah.txt?X-Amz-Algorithm=AWS4-HMAC-SHA256&X-Amz-Content-Sha256=UNSIGNED-PAYLOAD&X-Amz-Credential=ae0f6ee0f014deff554f709698940ab3%2F20230202%2Fauto%2Fs3%2Faws4_request&X-Amz-Date=20230202T004455Z&X-Amz-Expires=120&X-Amz-Signature=df7f95a65c44ab0a4cdc39c92e82d1946fb960524a79f349717371a7a56c4845&X-Amz-SignedHeaders=host&x-id=GetObject"
X <- getFile(example.id, url = example.url)
readLines(X) # as we know it's a text file.
#> [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K" "L" "M" "N" "O" "P" "Q" "R" "S"
#> [20] "T" "U" "V" "W" "X" "Y" "Z"
# Simple caching in the temporary directory:
tmp.cache <- file.path(tempdir(), "zircon-cache")
dir.create(tmp.cache)
cache.fun <- function(key, save) {
    path <- file.path(tmp.cache, URLencode(key, reserved=TRUE, repeated=TRUE))
    if (!file.exists(path)) {
        save(path)
    } else {
        cat("cache hit!\n")
    }
    path
}
getFile(example.id, example.url, cache = cache.fun)
#> [1] "/tmp/RtmpqT1SXc/zircon-cache/https%3A%2F%2Fgypsum-test.aaron-lun.workers.dev%2Ffiles%2Ftest-public%253Ablah.txt%2540base"
getFile(example.id, example.url, cache = cache.fun)
#> cache hit!
#> cache hit!
#> [1] "/tmp/RtmpqT1SXc/zircon-cache/https%3A%2F%2Fgypsum-test.aaron-lun.workers.dev%2Ffiles%2Ftest-public%253Ablah.txt%2540base"