dolomite_base package
Submodules
dolomite_base.alt_read_object module
- dolomite_base.alt_read_object.alt_read_object(path, metadata=None, **kwargs)
Wrapper around read_object() that respects application-defined overrides from alt_read_object_function(). This allows applications to customize the reading process for some or all of the object classes, assuming that developers of dolomite extensions (and the associated functions called by read_object) use alt_read_object internally for staging child objects instead of read_object.
- dolomite_base.alt_read_object.alt_read_object_function(fun=None)
Get or set the alternative reading function for use by alt_read_object(). Typically set by applications prior to reading for customization, e.g., to attach more metadata to the loaded object.
- Parameters:
fun (Callable | None) – The alternative reading function. This should accept the same arguments and return the same value as read_object().
- Returns:
If fun = None, the current setting of the alternative reading function is returned. Otherwise, the alternative reading function is set to fun, and the previous function is returned.
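The get/set contract above can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: default_reader stands in for read_object(), get_or_set_reader and read_with_override mimic the documented behaviour of alt_read_object_function() and alt_read_object(), and the module-level slot is an assumption about how such an override might be stored, not dolomite_base's actual code.

```python
from typing import Any, Callable, Optional

Reader = Callable[..., Any]


def default_reader(path: str, metadata: Optional[dict] = None, **kwargs: Any) -> Any:
    """Hypothetical stand-in for read_object(); it only records the path."""
    return {"path": path}


# Hypothetical module-level slot; it starts at the default reader, never None.
_current_reader: Reader = default_reader


def get_or_set_reader(fun: Optional[Reader] = None) -> Reader:
    """Return the current reader if fun is None; otherwise install fun and return the old one."""
    global _current_reader
    if fun is None:
        return _current_reader
    previous = _current_reader
    _current_reader = fun
    return previous


def read_with_override(path: str, metadata: Optional[dict] = None, **kwargs: Any) -> Any:
    """Delegate to whichever reader is currently installed."""
    return _current_reader(path, metadata=metadata, **kwargs)


def annotating_reader(path: str, metadata: Optional[dict] = None, **kwargs: Any) -> Any:
    """Example override: call the default reader, then attach extra metadata."""
    obj = default_reader(path, metadata=metadata, **kwargs)
    obj["loaded_by"] = "annotating_reader"
    return obj


previous = get_or_set_reader(annotating_reader)  # install the override
annotated = read_with_override("some/dir")
get_or_set_reader(previous)  # restore the earlier reader
plain = read_with_override("some/dir")
```

Because the slot in this sketch starts at a real function rather than None, the value returned by the setter can always be passed back to restore the earlier state.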
dolomite_base.alt_save_object module
- dolomite_base.alt_save_object.alt_save_object(x, path, **kwargs)
Wrapper around save_object() that respects application-defined overrides from alt_save_object_function(). This allows applications to customize the saving process for some or all of the object classes, assuming that developers of dolomite extensions (and the associated save_object methods) use alt_save_object internally for saving child objects instead of save_object.
- dolomite_base.alt_save_object.alt_save_object_function(fun=None)
Get or set the alternative saving function for use by alt_save_object(). Typically set by applications prior to saving for customization, e.g., to save extra metadata.
- Parameters:
fun (Callable | None) – The alternative saving function. This should accept the same arguments and return the same value as save_object().
- Returns:
If fun = None, the current setting of the alternative saving function is returned. Otherwise, the alternative saving function is set to fun, and the previous function is returned.
dolomite_base.choose_missing_placeholder module
- dolomite_base.choose_missing_placeholder.choose_missing_float_placeholder(x, dtype=<class 'numpy.float64'>)
Choose a placeholder for missing values in a floating-point sequence.
- Returns:
Value of the placeholder. If x is a NumPy floating-point array, this is guaranteed to be of the same type as x.dtype. If no suitable placeholder can be found, None is returned instead.
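One plausible strategy for picking a floating-point placeholder is to prefer NaN when no NaN is present, then fall back to the infinities. The sketch below implements that assumed strategy on plain Python floats; it is an illustration of the idea, not the logic of choose_missing_float_placeholder(), and it simply gives up after the three special values.

```python
import math
from typing import Optional, Sequence


def pick_float_placeholder(values: Sequence[Optional[float]]) -> Optional[float]:
    """Return a float that does not occur among the non-missing values, or None."""
    present = [v for v in values if v is not None]
    # NaN never compares equal to itself, so its presence must be checked with isnan().
    if not any(math.isnan(v) for v in present):
        return math.nan
    for candidate in (math.inf, -math.inf):
        if candidate not in present:
            return candidate
    # Simplification: no search over finite values in this sketch.
    return None
```

A reader that receives a NaN placeholder must likewise use math.isnan() rather than == when restoring missing values.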
- dolomite_base.choose_missing_placeholder.choose_missing_integer_placeholder(x, max_dtype=<class 'numpy.int32'>)
Choose a placeholder for missing values in an integer sequence.
- Returns:
Value of the placeholder. This is guaranteed to be of a type that can fit into max_dtype. It may not be of the same type as x.dtype if x is a NumPy array, so some casting may be required when replacing missing values with the placeholder. If no suitable placeholder can be found, None is returned instead.
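For integers, a natural (assumed) approach is to try the two ends of the range of max_dtype first and otherwise scan for the first unused value. The sketch below works on plain Python ints, with a hypothetical bits argument standing in for max_dtype; the candidate order is an assumption, not a description of choose_missing_integer_placeholder().

```python
from typing import Optional, Sequence


def pick_integer_placeholder(values: Sequence[Optional[int]], bits: int = 32) -> Optional[int]:
    """Return an integer in the signed `bits`-bit range that is absent from values, or None."""
    lower, upper = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    present = {v for v in values if v is not None}
    for candidate in (lower, upper):
        if candidate not in present:
            return candidate
    # Both extremes are taken, so scan upwards; a gap appears within len(present) + 1
    # steps unless every value in the range is already used.
    candidate = lower + 1
    while candidate < upper:
        if candidate not in present:
            return candidate
        candidate += 1
    return None
```

As the Returns section notes, the chosen value may still need casting before it is written into an array of a narrower type than max_dtype.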
dolomite_base.lib_dolomite_base module
dolomite_base.list_objects module
- dolomite_base.list_objects.list_objects(dir, include_children=False)
List all objects in a directory, along with their types.
- Parameters:
dir (str) – Path to a directory in which one or more objects were saved, typically via save_object().
include_children (bool) – Whether to include child objects (i.e., objects that are components of other objects) in the listing.
- Return type:
BiocFrame
- Returns:
A BiocFrame where each row corresponds to an object in dir. It contains the following columns:
path – the relative path to the object's subdirectory inside dir.
type – the type of the object.
child – whether the object is a child of another object.
If include_children=False, the listing will only contain non-child objects.
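The directory walk behind such a listing can be sketched with the standard library. This hypothetical re-implementation returns plain dictionaries instead of a BiocFrame. It assumes that each object's subdirectory holds an OBJECT file in JSON format with a "type" field, and it treats an object as a child when any enclosing directory (including dir itself) also contains an OBJECT file.

```python
import json
import os
import tempfile
from typing import Dict, List, Union

Row = Dict[str, Union[str, bool]]


def list_objects_sketch(dir: str, include_children: bool = False) -> List[Row]:
    """List subdirectories of dir that contain an OBJECT file (hypothetical re-implementation)."""
    object_dirs = set()
    for root, _subdirs, files in os.walk(dir):
        if "OBJECT" in files:
            object_dirs.add(os.path.relpath(root, dir))

    listing: List[Row] = []
    for rel in sorted(object_dirs):
        # An object is a child if any enclosing directory also holds an object.
        child = rel != "." and "." in object_dirs
        parent = os.path.dirname(rel)
        while parent and not child:
            child = parent in object_dirs
            parent = os.path.dirname(parent)
        if child and not include_children:
            continue
        with open(os.path.join(dir, rel, "OBJECT")) as handle:
            obj_type = json.load(handle)["type"]
        listing.append({"path": rel, "type": obj_type, "child": child})
    return listing


def _write_object(path: str, obj_type: str) -> None:
    """Create path and write a minimal JSON OBJECT file into it."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "OBJECT"), "w") as handle:
        json.dump({"type": obj_type}, handle)


with tempfile.TemporaryDirectory() as root:
    _write_object(os.path.join(root, "df"), "data_frame")
    _write_object(os.path.join(root, "df", "nested"), "atomic_vector")  # child of "df"
    _write_object(os.path.join(root, "vec"), "atomic_vector")
    top_level = list_objects_sketch(root)
    everything = list_objects_sketch(root, include_children=True)
```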
dolomite_base.load_vector_from_hdf5 module
- dolomite_base.load_vector_from_hdf5.load_vector_from_hdf5(handle, expected_type, report_1darray)
Load a vector from a 1-dimensional HDF5 dataset, with coercion to the expected type. Any missing-value placeholders are used to set Nones or to create masks.
- Return type:
StringList | IntegerList | FloatList | BooleanList | ndarray
- Returns:
The contents of the dataset as a vector-like object. If report_1darray = False, this is a typed NamedList subclass with missing values represented by None. If report_1darray = True, a 1-dimensional NumPy array is returned instead, possibly with masking.
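Replacing placeholders with None after reading a dataset is straightforward, except that a NaN placeholder needs special handling because NaN never equals itself. The helper below is a hypothetical, NumPy-free sketch of that one step; it does not read HDF5 and is not the library's implementation.

```python
import math
from typing import Any, List, Optional, Sequence


def placeholders_to_none(values: Sequence[Any], placeholder: Optional[Any]) -> List[Any]:
    """Return a copy of values with every placeholder occurrence replaced by None."""
    if placeholder is None:  # no placeholder attribute: nothing is missing
        return list(values)
    if isinstance(placeholder, float) and math.isnan(placeholder):
        # NaN != NaN, so compare with isnan() instead of ==.
        return [None if isinstance(v, float) and math.isnan(v) else v for v in values]
    return [None if v == placeholder else v for v in values]
```

For the 1-dimensional array path, the same comparison would presumably feed a mask (for example, a NumPy masked array) instead of producing Nones.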
dolomite_base.read_atomic_vector module
- dolomite_base.read_atomic_vector.read_atomic_vector(path, metadata, atomic_vector_use_numeric_1darray=False, **kwargs)
Read an atomic vector from its on-disk representation. In general, this function should not be called directly but instead via read_object().
- Parameters:
path (str) – Path to the directory containing the object.
metadata (dict) – Metadata for the object.
atomic_vector_use_numeric_1darray (bool) – Whether numeric vectors should be represented as 1-dimensional NumPy arrays. This is more memory-efficient than regular Python lists but discards the distinction between vectors and 1-D arrays. This is False by default to ensure that names can be loaded via NamedList subclasses.
kwargs – Further arguments, passed to nested objects.
- Return type:
StringList | IntegerList | FloatList | BooleanList | ndarray
- Returns:
An atomic vector, represented as a StringList, IntegerList, FloatList or BooleanList, or as a 1-dimensional NumPy array for numeric vectors if atomic_vector_use_numeric_1darray = True.
dolomite_base.read_data_frame module
- dolomite_base.read_data_frame.read_data_frame(path, metadata, data_frame_represent_numeric_column_as_1darray=True, **kwargs)
Load a data frame from an HDF5 file. In general, this function should not be called directly but instead via read_object().
- Parameters:
path (str) – Path to the directory containing the object.
metadata (dict) – Metadata for the object.
data_frame_represent_numeric_column_as_1darray (bool) – Whether numeric columns should be represented as 1-dimensional NumPy arrays. This is more efficient than regular Python lists but discards the distinction between vectors and 1-D arrays. Usually this is not an important difference, but users can set this flag to False to load columns as (typed) lists instead.
kwargs – Further arguments, passed to nested objects.
- Returns:
A data frame.
dolomite_base.read_object module
- dolomite_base.read_object.read_object(path, metadata=None, **kwargs)
Read an object from its on-disk representation. This will dispatch to individual reading functions (possibly from different packages in the dolomite framework) based on the metadata from the OBJECT file.
Application developers can control the dispatch process by modifying read_object_registry. Each key is a string containing the object type, e.g., data_frame, while the value can either be a string specifying the fully qualified name of a reader function (including all modules, which will be loaded upon dispatch) or the reader function itself.
Any reader functions should accept the same arguments as read_object() and return the loaded object. Readers may assume that the metadata argument is available, i.e., there is no need to account for the None case.
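The registry lookup described above, where a value may be either a callable or a fully qualified name that is imported on demand, can be sketched with importlib. The registry, dispatch_read, read_toy and the error type are hypothetical, and the sketch assumes the OBJECT file is JSON; the real read_object_registry and its error handling may differ.

```python
import importlib
import json
import os
from typing import Any, Callable, Dict, Optional, Union

ReaderSpec = Union[str, Callable[..., Any]]

# Hypothetical registry: object type -> reader function or its dotted name.
registry: Dict[str, ReaderSpec] = {}


def resolve_reader(spec: ReaderSpec) -> Callable[..., Any]:
    """Turn 'package.module.function' into the function, importing the module on first use."""
    if callable(spec):
        return spec
    module_name, _, attr = spec.rpartition(".")
    return getattr(importlib.import_module(module_name), attr)


def dispatch_read(path: str, metadata: Optional[dict] = None, **kwargs: Any) -> Any:
    if metadata is None:
        # Assumption: the OBJECT file is JSON with a "type" field.
        with open(os.path.join(path, "OBJECT")) as handle:
            metadata = json.load(handle)
    obj_type = metadata["type"]
    if obj_type not in registry:
        raise NotImplementedError(f"no reader registered for type '{obj_type}'")
    reader = resolve_reader(registry[obj_type])
    # The reader always receives a non-None metadata dictionary.
    return reader(path, metadata, **kwargs)


def read_toy(path: str, metadata: dict, **kwargs: Any) -> str:
    return f"{metadata['type']} read from {path}"


registry["toy"] = read_toy
result = dispatch_read("some/dir", {"type": "toy"})
```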
dolomite_base.read_object_file module
- dolomite_base.read_object_file.read_object_file(path)
Read the OBJECT file in the directory at path, which provides some high-level metadata about the object represented by that directory. It is guaranteed to have a ‘type’ property that specifies the object type; individual objects may add their own information to this file.
dolomite_base.read_simple_list module
- dolomite_base.read_simple_list.read_simple_list(path, metadata, **kwargs)
Read an R-style list from its on-disk representation in the uzuki2 format. In general, this function should not be called directly but instead via read_object().
dolomite_base.read_string_factor module
dolomite_base.save_atomic_vector module
- dolomite_base.save_atomic_vector.save_atomic_vector_from_boolean_list(x, path, **kwargs)
Method for saving BooleanList objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (BooleanList) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_base.save_atomic_vector.save_atomic_vector_from_float_list(x, path, **kwargs)
Method for saving FloatList objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (FloatList) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_base.save_atomic_vector.save_atomic_vector_from_integer_list(x, path, **kwargs)
Method for saving IntegerList objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (IntegerList) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_base.save_atomic_vector.save_atomic_vector_from_string_list(x, path, string_list_vls=False, **kwargs)
Method for saving StringList objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (StringList) – Object to be saved.
path (str) – Path to save the object.
string_list_vls (bool | None) – Whether to save variable-length strings into a custom VLS array format. If None, this is automatically determined by comparing the required storage with that of fixed-length strings.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
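The automatic choice for string_list_vls = None compares storage costs. The worked sketch below is hypothetical: it assumes that a fixed-length layout pads every string to the longest UTF-8 byte length, and that the VLS layout stores the concatenated bytes plus a fixed per-string overhead (16 bytes here, standing in for an offset and a length). The real VLS format and its exact overhead may differ.

```python
from typing import Optional, Sequence


def prefer_vls(strings: Sequence[Optional[str]], per_string_overhead: int = 16) -> bool:
    """Return True if the assumed VLS layout needs fewer bytes than fixed-length storage."""
    encoded = [s.encode("utf-8") for s in strings if s is not None]
    if not encoded:
        return False
    n = len(strings)  # missing entries still occupy a slot
    fixed_bytes = n * max(len(b) for b in encoded)
    vls_bytes = sum(len(b) for b in encoded) + n * per_string_overhead
    return vls_bytes < fixed_bytes
```

For ["x"] * 99 + ["y" * 10000], the fixed-length layout needs 100 × 10,000 = 1,000,000 bytes, whereas the assumed VLS layout needs 99 + 10,000 + 100 × 16 = 11,699 bytes, so VLS wins. For short strings of similar length, the per-string overhead dominates and fixed-length storage is smaller.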
dolomite_base.save_data_frame module
- class dolomite_base.save_data_frame.Hdf5ColumnOutput(handle, otherable, convert_list_to_vector, convert_1darray_to_vector, use_vls)
Bases: tuple
- __annotate_func__ = None
- __annotations_cache__ = {}
- __getnewargs__()
Return self as a plain tuple. Used by copy and pickle.
- __match_args__ = ('handle', 'otherable', 'convert_list_to_vector', 'convert_1darray_to_vector', 'use_vls')
- static __new__(_cls, handle, otherable, convert_list_to_vector, convert_1darray_to_vector, use_vls)
Create new instance of Hdf5ColumnOutput(handle, otherable, convert_list_to_vector, convert_1darray_to_vector, use_vls)
- __replace__(**kwds)
Return a new Hdf5ColumnOutput object replacing specified fields with new values.
- __repr__()
Return a nicely formatted representation string.
- __slots__ = ()
- convert_1darray_to_vector
Alias for field number 3
- convert_list_to_vector
Alias for field number 2
- handle
Alias for field number 0
- otherable
Alias for field number 1
- use_vls
Alias for field number 4
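Hdf5ColumnOutput is a named tuple, so each documented alias simply indexes a tuple position. The snippet below builds an equivalent type with collections.namedtuple to show that behaviour. ColumnOutputSketch and its field values are hypothetical, and what each field means inside save_data_frame() is not documented here.

```python
from collections import namedtuple

# Same field order as the documented Hdf5ColumnOutput signature.
ColumnOutputSketch = namedtuple(
    "ColumnOutputSketch",
    ["handle", "otherable", "convert_list_to_vector", "convert_1darray_to_vector", "use_vls"],
)

out = ColumnOutputSketch(handle=None, otherable=True, convert_list_to_vector=True,
                         convert_1darray_to_vector=False, use_vls=False)

# Field aliases map to positions: handle is field 0, use_vls is field 4.
first, last = out[0], out[4]

# _replace() returns a new tuple; the original is unchanged because tuples are immutable.
updated = out._replace(use_vls=True)
```

On Python 3.13 and later, copy.replace(out, use_vls=True) is equivalent, which is what the __replace__ entry above provides.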
- dolomite_base.save_data_frame.save_data_frame(x, path, data_frame_convert_list_to_vector=True, data_frame_convert_1darray_to_vector=True, data_frame_string_list_vls=False, **kwargs)
Method for saving BiocFrame objects to their corresponding file representations, see save_object() for details.
- Parameters:
x (BiocFrame) – Object to be saved.
path (str) – Path to a directory in which to save x.
data_frame_convert_list_to_vector (bool) – If a column is a regular Python list where all entries are of the same basic type (integer, string, float, boolean) or None, should it be converted to a typed vector in the on-disk representation? This avoids creating a separate file to store this column but changes the class of the column when the BiocFrame is read back into a Python session. If False, the list is saved as an external object instead.
data_frame_convert_1darray_to_vector (bool) – If a column is a 1D NumPy array, should it be saved as a typed vector? This avoids creating a separate file for the column but discards the distinction between 1D arrays and vectors. Usually this is not an important difference, but users can set this flag to False to save all 1D NumPy arrays as an external “dense array” object instead.
data_frame_string_list_vls (bool | None) – Whether to save columns of variable-length strings into a custom VLS array format. If None, this is automatically determined by comparing the required storage with that of fixed-length strings.
kwargs – Further arguments, passed to internal alt_save_object() calls.
- Returns:
x is saved to path.
dolomite_base.save_object module
- dolomite_base.save_object.save_object(x, path, **kwargs)
Save an object to its on-disk representation. dolomite extensions should define methods for this generic to stage different object classes.
Saver methods may accept additional arguments in kwargs; these should be prefixed by the object type to avoid conflicts (see save_data_frame() for examples).
Saver methods should also use the validate_saves() decorator to ensure that the generated output in path is valid.
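A generic with per-class methods and type-prefixed keyword arguments can be sketched with functools.singledispatch. Whether save_object() itself is built this way is an assumption here; save_object_sketch, the Toy class, the toy_precision argument and the returned strings are hypothetical.

```python
from functools import singledispatch
from typing import Any


@singledispatch
def save_object_sketch(x: Any, path: str, **kwargs: Any) -> str:
    raise NotImplementedError(f"no saver registered for {type(x).__name__}")


class Toy:
    def __init__(self, value: float) -> None:
        self.value = value


@save_object_sketch.register
def _save_toy(x: Toy, path: str, toy_precision: int = 2, **kwargs: Any) -> str:
    # Consumes its own 'toy_'-prefixed option and ignores options meant for other types.
    return f"{x.value:.{toy_precision}f} -> {path}"


# data_frame_string_list_vls is meant for another saver, so _save_toy ignores it.
text = save_object_sketch(Toy(3.14159), "out/toy", toy_precision=3, data_frame_string_list_vls=True)
```

The prefix keeps one method's options from colliding with another's when the same kwargs are forwarded down to child objects.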
- dolomite_base.save_object.validate_saves(fn)
Decorator to validate the output of save_object().
- Parameters:
fn – Function that implements a method for save_object.
- Returns:
A wrapped version of the function that validates the directory containing the on-disk representation of the saved object.
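A decorator with this behaviour can be sketched with functools.wraps: run the saver, then check the directory it wrote. In this hypothetical version, check_directory only confirms that an OBJECT file containing a "type" field exists (assuming JSON); the real decorator validates the output against the object's specification.

```python
import functools
import json
import os
import tempfile
from typing import Any, Callable


def check_directory(path: str) -> None:
    """Hypothetical minimal check standing in for full validation."""
    object_file = os.path.join(path, "OBJECT")
    if not os.path.isfile(object_file):
        raise ValueError(f"no OBJECT file in '{path}'")
    with open(object_file) as handle:
        if "type" not in json.load(handle):
            raise ValueError(f"OBJECT file in '{path}' lacks a 'type' field")


def validate_saves_sketch(fn: Callable[..., Any]) -> Callable[..., Any]:
    @functools.wraps(fn)
    def wrapper(x: Any, path: str, **kwargs: Any) -> Any:
        result = fn(x, path, **kwargs)
        check_directory(path)  # raise if the saver produced an invalid directory
        return result
    return wrapper


@validate_saves_sketch
def save_number(x: int, path: str, **kwargs: Any) -> None:
    """Hypothetical saver that writes an OBJECT file and a data file."""
    os.makedirs(path, exist_ok=True)
    with open(os.path.join(path, "OBJECT"), "w") as handle:
        json.dump({"type": "number"}, handle)
    with open(os.path.join(path, "value.txt"), "w") as handle:
        handle.write(str(x))


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "n")
    save_number(42, target)  # passes validation
    saved_files = sorted(os.listdir(target))
```

functools.wraps keeps the saver's name and docstring, which matters if the wrapped function is later registered as a method of a generic.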
dolomite_base.save_object_file module
- dolomite_base.save_object_file.save_object_file(path, object_type, extra={})
Save object-specific metadata into the OBJECT file inside the directory at path, to be used by, e.g., read_object_file().
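A round trip through the OBJECT file can be sketched with the json module. The layout shown, a JSON object whose "type" key holds object_type and whose remaining keys come from extra, is an assumption about the file format, and write_object_file and read_object_file_sketch are hypothetical helpers.

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def write_object_file(path: str, object_type: str, extra: Optional[Dict[str, Any]] = None) -> None:
    # None instead of a mutable {} default, so calls never share state;
    # "type" is written last so extra cannot override it.
    content = {**(extra or {}), "type": object_type}
    with open(os.path.join(path, "OBJECT"), "w") as handle:
        json.dump(content, handle)


def read_object_file_sketch(path: str) -> Dict[str, Any]:
    with open(os.path.join(path, "OBJECT")) as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    write_object_file(tmp, "data_frame", {"data_frame": {"version": "1.0"}})
    meta = read_object_file_sketch(tmp)
```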
dolomite_base.save_simple_list module
- dolomite_base.save_simple_list.save_simple_list_from_NamedList(x, path, simple_list_mode='json', simple_list_string_list_vls=False, **kwargs)
Method for saving a NamedList to its corresponding file representation, see save_object() for details.
- Parameters:
x (NamedList) – Object to be saved.
path (str) – Path to a directory in which to save the object.
simple_list_mode (Literal['hdf5', 'json']) – Whether to save in HDF5 or JSON mode.
simple_list_string_list_vls (bool | None) – Whether to save StringList objects of variable-length strings into a custom VLS array format for HDF5. If None, this is automatically determined by comparing the required storage with that of fixed-length strings. Only relevant if simple_list_mode = "hdf5".
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_base.save_simple_list.save_simple_list_from_dict(x, path, simple_list_mode='json', simple_list_string_list_vls=False, **kwargs)
Method for saving dictionaries (Python analogues to R-style named lists) to their corresponding file representations, see save_object() for details.
- dolomite_base.save_simple_list.save_simple_list_from_list(x, path, simple_list_mode='json', simple_list_string_list_vls=False, **kwargs)
Method for saving lists (Python analogues to R-style unnamed lists) to their corresponding file representations, see save_object() for details.
dolomite_base.save_string_factor module
- dolomite_base.save_string_factor.save_string_factor(x, path, **kwargs)
Method for saving Factor objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (Factor) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
dolomite_base.validate_directory module
- dolomite_base.validate_directory.validate_directory(dir)
Check whether each object in a directory is valid by calling validate_object() on each non-child object.
- Parameters:
dir (str) – Path to a directory with subdirectories populated by save_object(). dir itself may also correspond to an object.
- Returns:
List of the paths inside dir that were validated. This contains only None if dir itself corresponds to an object.
dolomite_base.validate_object module
- dolomite_base.validate_object.validate_object(path, metadata=None)
Validate an on-disk representation of an object, typically using validators based on the takane specifications.
Applications may also register their own validators by adding entries to validate_object_registry. Each key should be the object type and each value should be a function that accepts a path to a directory (string) and JSON-derived metadata (dictionary). The function should raise an error if the object in the directory is not valid for the specified object type.
dolomite_base.write_vector_to_hdf5 module
- dolomite_base.write_vector_to_hdf5.write_boolean_vector_to_hdf5(handle, name, x, placeholder_name='missing-value-placeholder')
Write a boolean vector to an HDF5 file as a 1-dimensional dataset with an 8-bit signed integer datatype. If x contains missing values, they are replaced with a placeholder value of -1.
- Parameters:
handle (Group) – A handle to an HDF5 group.
name (str) – Name of the dataset in which to save the boolean vector.
x (Sequence[bool]) – Sequence containing booleans, Nones, and/or masked NumPy values.
placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.
- Return type:
Dataset
- Returns:
Handle for the newly created dataset.
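The value encoding described above (1 for True, 0 for False, -1 for missing) can be shown without HDF5. The sketch below only produces the integer codes that would be written to the 8-bit dataset and decodes them again; encode_booleans and decode_booleans are hypothetical helpers, they treat only None as missing, and the actual dataset write (e.g. via h5py) is out of scope.

```python
from typing import List, Optional, Sequence


def encode_booleans(x: Sequence[Optional[bool]]) -> List[int]:
    """Map True/False/None to the int8 codes 1/0/-1."""
    return [-1 if v is None else int(bool(v)) for v in x]


def decode_booleans(codes: Sequence[int], placeholder: int = -1) -> List[Optional[bool]]:
    """Invert encode_booleans(), turning the placeholder back into None."""
    return [None if c == placeholder else bool(c) for c in codes]


codes = encode_booleans([True, None, False])
restored = decode_booleans(codes)
```

Since the only valid codes are 0 and 1, -1 can never collide with real data, which is why a fixed placeholder suffices here, unlike the integer and float writers.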
- dolomite_base.write_vector_to_hdf5.write_float_vector_to_hdf5(handle, name, x, h5type='f8', placeholder_name='missing-value-placeholder')
Write a floating-point vector to an HDF5 file as a 1-dimensional dataset. If x contains missing values, a placeholder value is selected by choose_missing_float_placeholder() and used to replace all of the missing values in the dataset. The placeholder value itself is stored as an attribute of the dataset.
- Parameters:
handle (Group) – A handle to an HDF5 group.
name (str) – Name of the dataset in which to save the floating-point vector.
x (Sequence[float]) – Sequence containing floats, Nones, and/or masked NumPy values.
h5type (str) – Floating-point type of the HDF5 dataset to create.
placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.
- Return type:
Dataset
- Returns:
Handle for the newly created dataset.
- dolomite_base.write_vector_to_hdf5.write_integer_vector_to_hdf5(handle, name, x, h5type='i4', placeholder_name='missing-value-placeholder', allow_float_promotion=False)
Write an integer vector to an HDF5 file as a 1-dimensional dataset. If x contains missing values, a placeholder value is selected by choose_missing_integer_placeholder() and used to replace all of the missing values in the dataset. The placeholder value itself is stored as an attribute of the dataset.
- Parameters:
handle (Group) – A handle to an HDF5 group.
name (str) – Name of the dataset in which to save the integer vector.
x (Sequence[int]) – Sequence containing integers, Nones, and/or masked NumPy values.
h5type (str) – Integer type of the HDF5 dataset to create.
placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.
allow_float_promotion (bool) – Whether to save x into a 64-bit floating-point dataset if any values in x exceed the range of values that can be represented by h5type, or if no missing value placeholder can be found within the acceptable range of integer values. If False, an error is raised if x cannot be saved without promotion.
- Return type:
Dataset
- Returns:
Handle for the newly created dataset.
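The promotion rule for allow_float_promotion can be summarised as a small decision function. plan_integer_write is hypothetical: it works on plain Python ints, derives the range from a bit width instead of h5type, tries only the two range endpoints as placeholders (a simplification), assumes NaN as the placeholder after promotion, and returns a label instead of writing a dataset.

```python
import math
from typing import Optional, Sequence, Tuple, Union

Placeholder = Optional[Union[int, float]]


def plan_integer_write(x: Sequence[Optional[int]], bits: int = 32,
                       allow_float_promotion: bool = False) -> Tuple[str, Placeholder]:
    """Decide between integer storage and float64 promotion, and pick a placeholder if needed."""
    lower, upper = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    present = {v for v in x if v is not None}
    has_missing = any(v is None for v in x)
    out_of_range = any(v < lower or v > upper for v in present)

    placeholder: Placeholder = None
    if has_missing and not out_of_range:
        # Simplified search: only the two ends of the integer range are tried.
        placeholder = next((c for c in (lower, upper) if c not in present), None)

    if not out_of_range and (not has_missing or placeholder is not None):
        return "int", placeholder
    if not allow_float_promotion:
        raise ValueError("values cannot be stored as integers without float promotion")
    # Assumption: after promotion, NaN is a safe placeholder because no integer converts to NaN.
    return "float64", (math.nan if has_missing else None)
```

Note that float64 represents integers exactly only up to 2**53 in magnitude, so promotion can lose precision for very large values.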
- dolomite_base.write_vector_to_hdf5.write_string_vector_to_hdf5(handle, name, x, placeholder_name='missing-value-placeholder')
Write a string vector to an HDF5 file as a 1-dimensional dataset with a fixed-length string datatype. If x contains missing values, a suitable placeholder value is selected using choose_missing_string_placeholder() and used to replace all missing values in the dataset. The placeholder itself is stored as an attribute of the dataset.
- Parameters:
handle (Group) – A handle to an HDF5 group.
name (str) – Name of the dataset in which to save the string vector.
x (Sequence[str]) – Sequence containing strings, Nones, and/or masked NumPy values.
placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.
- Return type:
Dataset
- Returns:
Handle for the newly created dataset.
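For fixed-length strings, the placeholder must differ from every non-missing string, and the fixed width must fit the longest encoded value, placeholder included. The sketch below assumes a simple strategy of starting from "NA" and appending underscores until the candidate is unused, and a minimum width of 1 byte; prepare_fixed_length_strings is hypothetical and choose_missing_string_placeholder() may pick placeholders differently.

```python
from typing import List, Optional, Sequence, Tuple


def prepare_fixed_length_strings(x: Sequence[Optional[str]]) -> Tuple[List[bytes], int, Optional[str]]:
    """Return (encoded values, fixed byte width, placeholder or None)."""
    present = {s for s in x if s is not None}
    placeholder: Optional[str] = None
    if any(s is None for s in x):
        placeholder = "NA"
        while placeholder in present:  # extend until it collides with no real value
            placeholder += "_"
    filled = [placeholder if s is None else s for s in x]
    encoded = [s.encode("utf-8") for s in filled]
    # Width is measured in UTF-8 bytes, not characters.
    width = max((len(b) for b in encoded), default=1)
    return encoded, width, placeholder
```

Measuring the width in bytes matters for non-ASCII input: "héllo" has five characters but occupies six bytes in UTF-8.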