chihaya
Validating delayed array operations in HDF5
This repository contains a specification for delayed array operations stored in an HDF5 file. The concept of delayed operations is taken from Bioconductor's DelayedArray package, where any operations on a DelayedArray are cached in memory and evaluated on an as-needed basis. Our aim is to save these operations to file in a well-defined, cross-language format; this avoids the need to compute and store the results of such operations, which may be prohibitively expensive.
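To make the idea of a delayed operation concrete, here is a minimal C++ sketch. It is an illustration only: the `DelayedLog1p` class and its methods are hypothetical names that appear in neither chihaya nor DelayedArray. The sketch keeps the original data in a compact integer type and applies a log-transformation only when an element is requested, so the promoted `double` values never need to be stored.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical illustration; not part of chihaya or DelayedArray.
// The seed stays in its compact uint8_t form, and the log-transformation
// is applied lazily, one element at a time, when a value is requested.
class DelayedLog1p {
public:
    explicit DelayedLog1p(std::vector<std::uint8_t> seed) : seed_(std::move(seed)) {}

    std::size_t size() const { return seed_.size(); }

    // Evaluate one element on demand, promoting uint8_t to double.
    double at(std::size_t i) const {
        assert(i < seed_.size());
        return std::log1p(static_cast<double>(seed_[i]));
    }

    // Bytes needed to hold the compact seed.
    std::size_t seed_bytes() const { return seed_.size() * sizeof(std::uint8_t); }

    // Bytes that would be needed if the transformed result were stored instead.
    std::size_t realized_bytes() const { return seed_.size() * sizeof(double); }

private:
    std::vector<std::uint8_t> seed_;
};
```

Constructing `DelayedLog1p x({0, 1, 9, 255})` and calling `x.at(2)` computes `log1p(9)` at that moment; comparing `x.seed_bytes()` with `x.realized_bytes()` shows how much larger the realized result would be than the seed.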
Several use cases benefit from the serialization of delayed operations:
For example, a dataset may be stored in a compact form, e.g., as uint8_ts. We apply a transformation that promotes the type, e.g., log-transformation to floats or doubles. By saving the delayed operation, we can maintain our compact representation to reduce the file size.

In the chihaya specification, we store a "delayed object" as an HDF5 group in the file. Delayed operations are represented as further nested groups, terminating in an array containing the original data (or a reference to it). The type of delayed operation/array is specified in the group's attributes. By recursively inspecting the contents of each HDF5 group, applications can reconstitute the original delayed object in the framework of choice.
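The recursive inspection described above can be sketched with an in-memory stand-in for the nested groups. This is not chihaya's API and does not read HDF5: the `Node` struct, its `kind` and `name` fields, and the `describe` function are hypothetical names chosen for illustration, and the operation names are placeholders rather than the attribute values defined by the specification.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for an HDF5 group holding a delayed object.
// The field names are assumptions for this sketch, not the specification's
// actual attribute names.
struct Node {
    std::string kind;            // "operation" or "array"
    std::string name;            // e.g., "transpose" or "dense array"
    std::vector<Node> children;  // nested delayed objects; empty for arrays
};

// Recursively walk the nested structure and build a readable expression.
// An application would instead construct its own delayed object at each step.
std::string describe(const Node& node) {
    if (node.kind == "array") {
        // Arrays terminate the recursion: they hold the original data.
        assert(node.children.empty());
        return node.name;
    }
    std::string out = node.name + "(";
    for (std::size_t i = 0; i < node.children.size(); ++i) {
        if (i > 0) {
            out += ", ";
        }
        out += describe(node.children[i]);
    }
    return out + ")";
}
```

For instance, a transposition wrapped around a dense array would be modelled as `Node{"operation", "transpose", {Node{"array", "dense array", {}}}}`. A real reader would obtain the equivalent of `kind` and `name` from each group's attributes rather than from a struct.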
The chihaya specification currently supports a range of delayed operations including subsetting, combining, transposition, matrix products, and an assortment of unary and binary operations. It also supports dense arrays, sparse matrices, constant arrays and custom arrays. More details about the on-disk representation of each operation can be found in the specifications:
In C++, a delayed object in a file can be validated by calling the validate function:
In R, DelayedArray objects (from the DelayedArray package) can be saved to a chihaya-compliant HDF5 file using our R package. The same package also reconstitutes a DelayedArray from the file.
FetchContent
If you're using CMake, you just need to add something like this to your CMakeLists.txt:
Then you can link to chihaya to make the headers available during compilation:
find_package()
You can install the library by cloning a suitable version of this repository and running the following commands:
Then you can use find_package() as usual:
If you're not using CMake, the simple approach is to just copy the files in the include/ subdirectory, either directly or with Git submodules, and include their path during compilation with, e.g., GCC's -I. You will also need to link to the HDF5 library, usually from a system installation (1.10 or higher).
Web applications can read delayed matrices into memory using the chihaya JavaScript package.
At some point, we may also add tatami bindings to load the delayed operations into memory. This would enable C++ applications to natively read from the HDF5 files that comply with chihaya's specification.
The library is provisionally named after Chihaya Kisaragi, one of my favorite characters.