Utilities for deduplicating objects inside saveObject to save time and/or storage space.
createDedupSession()
checkObjectInDedupSession(x, session)
addObjectToDedupSession(x, session, path)
x: Some object, typically S4.
session: Session object created by createDedupSession.
path: String containing the absolute path to the directory in which x is to be saved. This will be used for deduplication if another object is identified as a duplicate of x. If a relative path is provided, it will be converted to an absolute path.
createDedupSession will return a deduplication session that can be modified in-place.
If x is a duplicate of an object in session, checkObjectInDedupSession will return a string containing the absolute path to a directory representing that object. Otherwise, it will return NULL.
addObjectToDedupSession will add x to session with the supplied path. It returns NULL invisibly.
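The return values described above can be checked with a short sketch. It assumes that the alabaster.base and S4Vectors packages are installed; the register helper and the variable names are hypothetical and only serve to show that the session is updated without being returned or reassigned.

```r
library(alabaster.base)
library(S4Vectors)

x <- DataFrame(A=1:3)
dest <- tempfile()
dir.create(dest)

session <- createDedupSession()

# Nothing has been registered yet, so no duplicate can be found.
before <- checkObjectInDedupSession(x, session)

# Hypothetical helper: registers the object inside a function call.
# The session is neither returned nor reassigned by the caller.
register <- function(obj, sess, dir) {
    addObjectToDedupSession(obj, sess, dir)
}
res <- register(x, session, dest)

# The same session object now reports a path for 'x'.
after <- checkObjectInDedupSession(x, session)

is.null(before)
is.null(res)
is.character(after)
```

Because the session is modified in-place, it can be handed down through several layers of function calls without any of them needing to return it.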
These utilities allow extension developers to support deduplication of objects in a top-level call to saveObject.
For a given saveObject method, we can:
1. Accept a session object in an optional <PREFIX>.dedup.session= argument. We may also accept a <PREFIX>.dedup.action= argument to specify how any deduplication should be performed. The <PREFIX> should be chosen to avoid conflicts between multiple deduplication sessions.
2. If a session argument is provided, we call checkObjectInDedupSession(x, session) to see if x is a duplicate of an existing object in the session. If a path is returned, we call cloneDirectory and return.
3. Otherwise, we save this object to disk, possibly passing the session argument as <PREFIX>.dedup.session= in further calls to saveObject for child objects.
4. We call addObjectToDedupSession to add the current object to the session.
A user can enable deduplication by passing the output of createDedupSession to <PREFIX>.dedup.session= in the top-level call to saveObject.
This is most typically performed when saving SummarizedExperiment objects with multiple assays, where one assay consists of delayed operations on another assay.
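The step of forwarding the session to child objects can be sketched as follows. This is a minimal illustration, assuming alabaster.base and S4Vectors are installed; saveLeaf, saveContainer, the leaf prefix and the RDS payload are all hypothetical stand-ins for real saveObject methods, not part of any package.

```r
library(alabaster.base)
library(S4Vectors)

# Hypothetical leaf saver: check the session, clone on a hit,
# otherwise save and register the new directory.
saveLeaf <- function(x, path, leaf.dedup.session=NULL, leaf.dedup.action="link") {
    if (!is.null(leaf.dedup.session)) {
        original <- checkObjectInDedupSession(x, leaf.dedup.session)
        if (!is.null(original)) {
            cloneDirectory(original, path, leaf.dedup.action)
            return(invisible(NULL))
        }
    }
    dir.create(path)
    saveRDS(x, file.path(path, "leaf.rds")) # placeholder for real saving code.
    if (!is.null(leaf.dedup.session)) {
        addObjectToDedupSession(x, leaf.dedup.session, path)
    }
    invisible(NULL)
}

# Hypothetical container saver: passes the same session to every child.
saveContainer <- function(children, path, leaf.dedup.session=NULL) {
    dir.create(path)
    for (i in seq_along(children)) {
        saveLeaf(children[[i]], file.path(path, as.character(i)),
            leaf.dedup.session=leaf.dedup.session)
    }
    invisible(NULL)
}

df <- DataFrame(A=1:10, B=1:10)
other <- DataFrame(A=letters)

# User side: create one session and pass it to the top-level call.
out <- tempfile()
session <- createDedupSession()
saveContainer(list(df, other, df), out, leaf.dedup.session=session)
list.files(out, recursive=TRUE)
```

Here the third child is identical to the first, so its directory is produced by cloneDirectory rather than by running the saving code a second time.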
test <- function(x, path, test.dedup.session=NULL, test.dedup.action="link") {
    # If a session is supplied, look for an earlier copy of 'x'.
    if (!is.null(test.dedup.session)) {
        original <- checkObjectInDedupSession(x, test.dedup.session)
        if (!is.null(original)) {
            # Duplicate found: clone the existing directory and stop.
            cloneDirectory(original, path, test.dedup.action)
            return(invisible(NULL))
        }
    }

    dir.create(path)
    saveRDS(x, file.path(path, "whee.rds")) # replace this with actual saving code.

    # Register the newly saved object so that later duplicates can reuse it.
    if (!is.null(test.dedup.session)) {
        addObjectToDedupSession(x, test.dedup.session, path)
    }
}

library(alabaster.base)
library(S4Vectors)
y <- DataFrame(A=1:10, B=1:10)
tmp <- tempfile()
dir.create(tmp)
# Saving the first instance of the object, which is now stored in the session.
session <- createDedupSession()
checkObjectInDedupSession(y, session) # no duplicates yet.
#> NULL
test(y, file.path(tmp, "first"), test.dedup.session=session)
# Saving it again will trigger the deduplication.
checkObjectInDedupSession(y, session)
#> [1] "/tmp/Rtmp6XG2G9/file221708cffc5/first"
test(y, file.path(tmp, "duplicate"), test.dedup.session=session)
list.files(tmp, recursive=TRUE)
#> [1] "duplicate/whee.rds" "first/whee.rds"