Generic to stage assorted R objects. More methods may be defined by other packages to extend the alabaster.base framework to new classes.

stageObject(x, dir, path, child = FALSE, ...)

Arguments

x

A Bioconductor object of the specified class.

dir

String containing the path to the staging directory.

path

String containing a prefix of the relative path inside dir where x is to be saved. The actual path used to save x may include additional components; see Details.

child

Logical scalar indicating whether x is a child of a larger object.

...

Further arguments to pass to specific methods.

Value

dir is populated with files containing the contents of x. A named list containing the metadata for x is returned.

Details

Methods for the stageObject generic should create a subdirectory at the input path inside dir. All files (artifacts and metadata documents) required to represent x on disk should be created inside path. Upon method completion, path should contain:

  • Zero or one file containing the data inside x. Methods are free to choose any format and name within path except for the .json file extension, which is reserved for JSON metadata documents (see below). Such a file is optional and may be omitted for metadata-only schemas.

  • Zero or many subdirectories containing child objects of x. Each child object should be saved in its own subdirectory within path, which can have any name that does not conflict with the data file (if present) and does not end with .json. This allows developers to decompose a complex x into its components for more flexible staging/loading.

The return value of each method should be a named list of metadata, which will (eventually) be passed to writeMetadata to save a JSON metadata file inside the path subdirectory. This list should contain at least:

  • $schema, a string specifying the schema to use to validate the metadata for the class of x. This may be decorated with the package attribute to help writeMetadata find the package containing the schema.

  • path, a string containing the relative path to the object's file representation inside dir. For clarity, we will denote the input path argument as PATHIN and the output path property as PATHOUT. These are different as PATHIN refers to the directory while PATHOUT refers to a file inside the directory.

    If a data file exists, PATHOUT should contain the relative path to that file from dir. Otherwise, for metadata-only schemas, PATHOUT should be set to a relative path of a JSON file inside the PATHIN subdirectory, specifying the location in which the metadata is to be saved by writeMetadata.

  • is_child, a logical scalar equal to the input child.

This list will usually contain more useful elements to describe x. The exact nature of those elements will depend on the specified schema for the class of x.
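
To make the distinction between PATHIN and PATHOUT concrete, the following base R sketch assembles the minimal metadata list described above. The helper minimal_metadata, the file name metadata.json and the schema name my_schema/v1.json are hypothetical and used only for illustration; they are not part of alabaster.base.

```r
# Hypothetical helper (not part of alabaster.base): build the minimal
# metadata list for an object staged in the subdirectory 'path_in'.
minimal_metadata <- function(path_in, schema, child = FALSE, data_file = NULL) {
    if (is.null(data_file)) {
        # Metadata-only schema: PATHOUT names the JSON file to be written.
        # The file name "metadata.json" is an assumption for this sketch.
        path_out <- paste0(path_in, "/metadata.json")
    } else {
        # PATHOUT is the data file, relative to the staging directory.
        path_out <- paste0(path_in, "/", data_file)
    }
    list(`$schema` = schema, path = path_out, is_child = child)
}

# PATHIN is "test1"; PATHOUT is a file inside that subdirectory.
meta <- minimal_metadata("test1", "csv_data_frame/v1.json",
    data_file = "simple.csv.gz")
str(meta)

# Metadata-only case with a made-up schema name.
meta_only <- minimal_metadata("test2", "my_schema/v1.json")
str(meta_only)
```

A real method would add further schema-specific elements to this list before returning it.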

The stageObject generic will check if PATHIN already exists inside dir before dispatching to the methods. If so, it will throw an error to ensure that downstream name clashes do not occur. The exception is if PATHIN is ".", in which case no check is performed; this is useful for eliminating subdirectories in situations where the project contains only one object.
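
The check described above can be pictured with a small base R sketch. The function check_path_unused is a hypothetical stand-in written for this illustration; the actual check inside the generic may differ in its details and error message.

```r
# Hypothetical stand-in for the check performed before dispatch.
check_path_unused <- function(dir, path) {
    # "." is exempt so that a single object can be staged directly in 'dir'.
    if (path != "." && file.exists(file.path(dir, path))) {
        stop("'", path, "' already exists inside the staging directory")
    }
    invisible(TRUE)
}

staging <- tempfile()
dir.create(staging)

# Nothing has been staged yet, so this passes.
check_path_unused(staging, "test1")

# Once the subdirectory exists, the same call raises an error.
dir.create(file.path(staging, "test1"))
res <- tryCatch(
    check_path_unused(staging, "test1"),
    error = function(e) conditionMessage(e)
)
print(res)

# The "." case is never checked.
check_path_unused(staging, ".")
```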

Saving child objects

The concept of child objects allows developers to break down complex objects into their basic components for convenience. For example, if one DataFrame is nested within another as a separate column, the former is a child and the latter is the parent. A list of multiple DataFrames will also represent each DataFrame as a child object. This allows developers to re-use the staging/loading code for DataFrames when reconstructing the complex parent object.

If a stageObject method needs to save a child object, it should do so in a subdirectory of PATHIN (i.e., the input path argument). This is achieved by calling .altStageObject(child, dir, subdir) where child is the child component of x and subdir is the desired subdirectory path. Note the period at the start of the function, which ensures that the method respects customizations from alabaster applications (see .altStageObject for details). We also suggest creating subdir with paste0(path, "/", subname) for a given subdirectory name, which avoids potential problems with non-/ file separators.
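
As a brief illustration of the suggested path construction, the base R snippet below builds the child's subdirectory path from the parent's path. The values "test1" and "column1" are placeholders chosen for this sketch.

```r
# Placeholder values: 'path' is the parent's PATHIN, 'subname' is a
# subdirectory name chosen by the method for one child object.
path <- "test1"
subname <- "column1"

# Always join with a forward slash, regardless of platform.
subdir <- paste0(path, "/", subname)
print(subdir)

# In a real method, 'subdir' would then be passed as the third argument
# when staging the child, as described in the text above.
```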

After creating the child object's subdirectory, the stageObject method should call writeMetadata on the output of .altStageObject to save the child's metadata. This will return a list that can be inserted into the parent's metadata list for the method's return value. All child files created by a stageObject method should be referenced from the metadata list, i.e., the child metadata's PATHOUT should be present in the metadata list as a resource entry somewhere.

Any attempt to use the stageObject generic to save another non-child object into PATHIN or its subdirectories will cause an error. This ensures that PATHIN contains all and only the contents of x.

See also

checkValidDirectory, for validation of the staged contents.

Author

Aaron Lun

Examples

tmp <- tempfile()
dir.create(tmp)

library(S4Vectors)
X <- DataFrame(X=LETTERS, Y=sample(3, 26, replace=TRUE))
stageObject(X, tmp, path="test1")
#> $`$schema`
#> [1] "csv_data_frame/v1.json"
#> 
#> $path
#> [1] "test1/simple.csv.gz"
#> 
#> $is_child
#> [1] FALSE
#> 
#> $data_frame
#> $data_frame$columns
#> $data_frame$columns[[1]]
#> $data_frame$columns[[1]]$name
#> [1] "X"
#> 
#> $data_frame$columns[[1]]$type
#> [1] "string"
#> 
#> 
#> $data_frame$columns[[2]]
#> $data_frame$columns[[2]]$name
#> [1] "Y"
#> 
#> $data_frame$columns[[2]]$type
#> [1] "integer"
#> 
#> 
#> 
#> $data_frame$row_names
#> [1] FALSE
#> 
#> $data_frame$column_data
#> NULL
#> 
#> $data_frame$other_data
#> NULL
#> 
#> $data_frame$dimensions
#> [1] 26  2
#> 
#> $data_frame$version
#> [1] 2
#> 
#> 
#> $csv_data_frame
#> $csv_data_frame$compression
#> [1] "gzip"
#> 
#> 
list.files(file.path(tmp, "test1"))
#> [1] "simple.csv.gz"