stageObject.Rd
Generic to stage assorted R objects. More methods may be defined by other packages to extend the alabaster.base framework to new classes.
stageObject(x, dir, path, child = FALSE, ...)
x: A Bioconductor object of the specified class.
dir: String containing the path to the staging directory.
path: String containing a prefix of the relative path inside dir where x is to be saved. The actual path used to save x may include additional components, see Details.
child: Logical scalar indicating whether x is a child of a larger object.
...: Further arguments to pass to specific methods.
dir is populated with files containing the contents of x. A named list containing the metadata for x is returned.
Methods for the stageObject generic should create a subdirectory at the input path inside dir. All files (artifacts and metadata documents) required to represent x on disk should be created inside path.
Upon method completion, path should contain:
- Zero or one file containing the data inside x. Methods are free to choose any format and name within path except for the .json file extension, which is reserved for JSON metadata documents (see below). Such a file is optional and may be omitted for metadata-only schemas.
- Zero or many subdirectories containing child objects of x. Each child object should be saved in its own subdirectory within path, which can have any name that does not conflict with the data file (if present) and does not end with .json. This allows developers to decompose a complex x into its components for more flexible staging/loading.
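As a rough illustration of the layout described above, the following base R sketch builds a mock staging directory by hand and checks the naming rules. The names obj1, simple.csv.gz and child1, as well as the helper check_layout, are hypothetical and not part of alabaster.base.

```r
# Mock staging directory; all names below are illustrative assumptions.
dir <- tempfile()
dir.create(dir)
path <- "obj1"
dir.create(file.path(dir, path))

# At most one data file, with any name that does not end in '.json'.
file.create(file.path(dir, path, "simple.csv.gz"))

# Child objects live in their own subdirectories of 'path'.
dir.create(file.path(dir, path, "child1"))

# Hypothetical helper: TRUE if the contents of 'path' follow the rules above.
check_layout <- function(dir, path) {
    full <- file.path(dir, path)
    entries <- list.files(full)
    is.dir <- dir.exists(file.path(full, entries))
    files <- entries[!is.dir]
    subdirs <- entries[is.dir]
    # Files ending in '.json' are metadata documents, not data files.
    data.files <- files[!grepl("\\.json$", files)]
    length(data.files) <= 1L && !any(grepl("\\.json$", subdirs))
}

layout_ok <- check_layout(dir, path)
```

Here layout_ok is TRUE because the mock directory holds exactly one data file and one child subdirectory, neither of which uses the reserved .json extension.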
The return value of each method should be a named list of metadata, which will (eventually) be passed to writeMetadata to save a JSON metadata file inside the path subdirectory. This list should contain at least:
- $schema, a string specifying the schema to use to validate the metadata for the class of x. This may be decorated with the package attribute to help writeMetadata find the package containing the schema.
- path, a string containing the relative path to the object's file representation inside dir. For clarity, we will denote the input path argument as PATHIN and the output path property as PATHOUT. These are different as PATHIN refers to the directory while PATHOUT refers to a file inside the directory. If a data file exists, PATHOUT should contain the relative path to that file from dir. Otherwise, for metadata-only schemas, PATHOUT should be set to a relative path of a JSON file inside the PATHIN subdirectory, specifying the location in which the metadata is to be saved by writeMetadata.
- is_child, a logical scalar equal to the input child.
This list will usually contain more useful elements to describe x. The exact nature of those elements will depend on the specified schema for the class of x.
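To make the PATHIN/PATHOUT distinction concrete, here is a minimal sketch of the metadata list that a method might return. The schema name my_class/v1.json, the package name my.package, the file name data.csv.gz and the helper sketch_metadata are hypothetical placeholders, not names defined by alabaster.base.

```r
# Hypothetical helper that assembles the minimum required metadata.
sketch_metadata <- function(path, child = FALSE, data.file = "data.csv.gz") {
    # The 'path' argument is PATHIN (a directory);
    # the 'path' property is PATHOUT (a file inside that directory).
    pathout <- paste0(path, "/", data.file)

    # Decorate the schema string with a 'package' attribute.
    # Both values are placeholders for illustration only.
    schema <- "my_class/v1.json"
    attr(schema, "package") <- "my.package"

    list(
        `$schema` = schema,
        path = pathout,
        is_child = child
    )
}

meta <- sketch_metadata("test1")
meta$path
#> [1] "test1/data.csv.gz"
```

A real method would append further schema-specific elements to this list before returning it.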
The stageObject generic will check if PATHIN already exists inside dir before dispatching to the methods. If so, it will throw an error to ensure that downstream name clashes do not occur. The exception is if PATHIN is ".", in which case no check is performed; this is useful for eliminating subdirectories in situations where the project contains only one object.
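The check can be pictured roughly as follows. This is a base R sketch of the behaviour stated above, not the actual implementation in alabaster.base; the function name check_pathin and the error message are assumptions.

```r
# Hypothetical stand-in for the check performed by the generic.
check_pathin <- function(dir, path) {
    # "." refers to 'dir' itself, so no check is performed.
    if (path != "." && file.exists(file.path(dir, path))) {
        stop("existing path '", path, "' inside the staging directory")
    }
    invisible(NULL)
}

dir <- tempfile()
dir.create(dir)
dir.create(file.path(dir, "test1"))

check_pathin(dir, ".")       # no error, the check is skipped
check_pathin(dir, "test2")   # no error, the path is unused

# 'test1' already exists, so the check fails.
clash <- try(check_pathin(dir, "test1"), silent = TRUE)
inherits(clash, "try-error")
#> [1] TRUE
```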
The concept of child objects allows developers to break down complex objects into their basic components for convenience. For example, if one DataFrame is nested within another as a separate column, the former is a child and the latter is the parent. A list of multiple DataFrames will also represent each DataFrame as a child object. This allows developers to re-use the staging/loading code for DataFrames when reconstructing the complex parent object.
If a stageObject method needs to save a child object, it should do so in a subdirectory of PATHIN (i.e., the input path argument). This is achieved by calling .altStageObject(child, dir, subdir) where child is the child component of x and subdir is the desired subdirectory path. Note the period at the start of the function, which ensures that the method respects customizations from alabaster applications (see .altStageObject for details).
We also suggest creating subdir with paste0(path, "/", subname) for a given subdirectory name, which avoids potential problems with non-/ file separators.
After creating the child object's subdirectory, the stageObject method should call writeMetadata on the output of .altStageObject to save the child's metadata. This will return a list that can be inserted into the parent's metadata list for the method's return value.
All child files created by a stageObject method should be referenced from the metadata list, i.e., the child metadata's PATHOUT should be present in the metadata list as a resource entry somewhere.
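The child-staging pattern above might look roughly like the sketch below. To keep it runnable without alabaster.base, the child staging call and writeMetadata are replaced by the hypothetical stand-ins stage_stub and write_stub. The schema names, the file names and the shape of the list returned by write_stub are all assumptions made for illustration.

```r
# Stand-in for staging a child; writes a CSV and returns minimal metadata.
stage_stub <- function(x, dir, path, child = FALSE) {
    dir.create(file.path(dir, path))
    write.csv(x, file.path(dir, path, "data.csv"), row.names = FALSE)
    list(`$schema` = "stub/v1.json", path = paste0(path, "/data.csv"), is_child = child)
}

# Stand-in for saving the child's metadata. It does not write any JSON;
# it only returns an assumed resource-like list pointing to the child's PATHOUT.
write_stub <- function(meta, dir) {
    list(type = "local", path = meta$path)
}

# Pattern for a parent method that stages one child component.
stage_parent <- function(x, dir, path, child = FALSE) {
    dir.create(file.path(dir, path))

    # Build the child's subdirectory path with an explicit "/" separator.
    subdir <- paste0(path, "/", "child1")
    child.meta <- stage_stub(x$child, dir, subdir, child = TRUE)
    child.res <- write_stub(child.meta, dir)

    list(
        `$schema` = "parent_stub/v1.json",
        path = paste0(path, "/parent.json"),  # metadata-only parent
        is_child = child,
        parent = list(child = list(resource = child.res))
    )
}

dir <- tempfile()
dir.create(dir)
x <- list(child = data.frame(A = 1:3))
meta <- stage_parent(x, dir, "test1")
meta$parent$child$resource$path
#> [1] "test1/child1/data.csv"
```

The child's PATHOUT ends up referenced from the parent's metadata list, so every file created under PATHIN can be traced from the returned value.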
Any attempt to use the stageObject generic to save another non-child object into PATHIN or its subdirectories will cause an error. This ensures that PATHIN contains all and only the contents of x.
checkValidDirectory, for validation of the staged contents.
library(alabaster.base)
library(S4Vectors)

# Stage into a fresh temporary directory.
tmp <- tempfile()
dir.create(tmp)

X <- DataFrame(X=LETTERS, Y=sample(3, 26, replace=TRUE))
stageObject(X, tmp, path="test1")
#> $`$schema`
#> [1] "csv_data_frame/v1.json"
#>
#> $path
#> [1] "test1/simple.csv.gz"
#>
#> $is_child
#> [1] FALSE
#>
#> $data_frame
#> $data_frame$columns
#> $data_frame$columns[[1]]
#> $data_frame$columns[[1]]$name
#> [1] "X"
#>
#> $data_frame$columns[[1]]$type
#> [1] "string"
#>
#>
#> $data_frame$columns[[2]]
#> $data_frame$columns[[2]]$name
#> [1] "Y"
#>
#> $data_frame$columns[[2]]$type
#> [1] "integer"
#>
#>
#>
#> $data_frame$row_names
#> [1] FALSE
#>
#> $data_frame$column_data
#> NULL
#>
#> $data_frame$other_data
#> NULL
#>
#> $data_frame$dimensions
#> [1] 26 2
#>
#> $data_frame$version
#> [1] 2
#>
#>
#> $csv_data_frame
#> $csv_data_frame$compression
#> [1] "gzip"
#>
#>
list.files(file.path(tmp, "test1"))
#> [1] "simple.csv.gz"