dolomite_base package
Submodules
dolomite_base.alt_read_object module
- dolomite_base.alt_read_object.alt_read_object(path, metadata=None, **kwargs)
Wrapper around read_object() that respects application-defined overrides from alt_read_object_function(). This allows applications to customize the reading process for some or all of the object classes, assuming that developers of dolomite extensions (and the associated functions called by read_object) use alt_read_object internally for reading child objects instead of read_object.
- dolomite_base.alt_read_object.alt_read_object_function(fun=None)
Get or set the alternative reading function for use by alt_read_object(). Typically set by applications prior to reading for customization, e.g., to attach more metadata to the loaded object.
- Parameters:
fun (Optional[Callable]) – The alternative reading function. This should accept the same arguments and return the same value as read_object().
- Return type:
- Returns:
If fun = None, the current setting of the alternative reading function is returned. Otherwise, the alternative reading function is set to fun, and the previous function is returned.
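The get-or-set behaviour described above can be illustrated with a small stand-in. This is only a sketch of the pattern, not the dolomite_base source: default_reader, alt_reader_function, alt_reader and tagging_reader are hypothetical names, and the module-level variable is an assumed design.

```python
from typing import Any, Callable, Optional


def default_reader(path: str, metadata: Optional[dict] = None, **kwargs) -> Any:
    """Hypothetical stand-in for read_object(); the real function reads from disk."""
    return {"path": path}


# Module-level setting holding the currently installed reader (assumed design).
_ALT_READER: Callable = default_reader


def alt_reader_function(fun: Optional[Callable] = None) -> Callable:
    """Return the current reader if fun is None; otherwise install fun and return the previous reader."""
    global _ALT_READER
    if fun is None:
        return _ALT_READER
    previous = _ALT_READER
    _ALT_READER = fun
    return previous


def alt_reader(path: str, metadata: Optional[dict] = None, **kwargs) -> Any:
    """Wrapper that always goes through the currently installed reader."""
    return _ALT_READER(path, metadata, **kwargs)


def tagging_reader(path, metadata=None, **kwargs):
    """Application override that attaches extra information to the loaded object."""
    obj = default_reader(path, metadata, **kwargs)
    obj["loaded_by"] = "my_app"
    return obj


previous = alt_reader_function(tagging_reader)  # install the override, keep the old reader
result = alt_reader("some/dir")
alt_reader_function(previous)                   # restore the old reader
restored = alt_reader("some/dir")
```

Returning the previous function on every set is what makes the temporary override and restore in the last four lines possible.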
dolomite_base.alt_save_object module
- dolomite_base.alt_save_object.alt_save_object(x, path, **kwargs)
Wrapper around save_object() that respects application-defined overrides from alt_save_object_function(). This allows applications to customize the saving process for some or all of the object classes, assuming that developers of dolomite extensions (and the associated save_object methods) use alt_save_object internally for saving child objects instead of save_object.
- dolomite_base.alt_save_object.alt_save_object_function(fun=None)
Get or set the alternative saving function for use by alt_save_object(). Typically set by applications prior to saving for customization, e.g., to save extra metadata.
- Parameters:
fun (Optional[Callable]) – The alternative saving function. This should accept the same arguments and return the same value as save_object().
- Return type:
- Returns:
If fun = None, the current setting of the alternative saving function is returned. Otherwise, the alternative saving function is set to fun, and the previous function is returned.
dolomite_base.choose_missing_placeholder module
- dolomite_base.choose_missing_placeholder.choose_missing_float_placeholder(x, dtype=<class 'numpy.float64'>)
Choose a missing placeholder for float sequences.
- Parameters:
- Return type:
- Returns:
Value of the placeholder. If x is a NumPy floating-point array, this is guaranteed to be of the same type as x.dtype. If no suitable placeholder can be found, None is returned instead.
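One way to picture a float placeholder search is sketched below in plain Python. The candidate order (NaN, then the infinities, then the largest finite values) is an assumption made for illustration and is not taken from the dolomite_base source; pick_float_placeholder is a hypothetical name.

```python
import math
import sys
from typing import Optional, Sequence


def pick_float_placeholder(x: Sequence[Optional[float]]) -> Optional[float]:
    """Return a float that does not occur in x, or None if every candidate is taken."""
    present = [v for v in x if v is not None]

    # NaN is a natural first choice as long as the data contains no NaN of its own.
    if not any(math.isnan(v) for v in present):
        return math.nan

    # Otherwise try a handful of extreme values that rarely occur in real data.
    observed = {v for v in present if not math.isnan(v)}
    for candidate in (math.inf, -math.inf, sys.float_info.max, -sys.float_info.max):
        if candidate not in observed:
            return candidate

    # Mirrors the documented behaviour of returning None when nothing suitable exists.
    return None
```

NaN has to be handled separately from the other candidates because it never compares equal to itself, so a set membership test cannot detect it.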
- dolomite_base.choose_missing_placeholder.choose_missing_integer_placeholder(x, max_dtype=<class 'numpy.int32'>)
Choose a missing placeholder for integer sequences.
- Parameters:
- Return type:
- Returns:
Value of the placeholder. This is guaranteed to be of a type that can fit into max_dtype. It also may not be of the same type as x.dtype if x is a NumPy array, so some casting may be required when replacing missing values with the placeholder. If no suitable placeholder can be found, None is returned instead.
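The idea of finding an unused integer within a bounded range can be sketched as follows. The search order (lower bound, upper bound, zero, then a linear scan) is an assumption for illustration rather than the library's documented strategy, and pick_integer_placeholder with its lower and upper arguments is hypothetical.

```python
from typing import Optional, Sequence

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def pick_integer_placeholder(
    x: Sequence[Optional[int]],
    lower: int = INT32_MIN,
    upper: int = INT32_MAX,
) -> Optional[int]:
    """Return an integer in [lower, upper] that is absent from x, or None if there is none."""
    observed = {v for v in x if v is not None}

    # Cheap candidates first: the extremes of the range and zero.
    for candidate in (lower, upper, 0):
        if lower <= candidate <= upper and candidate not in observed:
            return candidate

    # Fall back to scanning the rest of the range for the first unused value.
    for candidate in range(lower + 1, upper):
        if candidate not in observed:
            return candidate

    return None


typical = pick_integer_placeholder([1, None, 3])
exhausted = pick_integer_placeholder([0, 1, 2, None], lower=0, upper=2)
```

The bounded range plays the role of max_dtype: the placeholder must fit in the target type even when the input values use a wider one.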
dolomite_base.lib_dolomite_base module
dolomite_base.load_vector_from_hdf5 module
- dolomite_base.load_vector_from_hdf5.load_vector_from_hdf5(handle, expected_type, report_1darray)
Load a vector from a 1-dimensional HDF5 dataset, with coercion to the expected type. Any missing value placeholders are used to set Nones or to create masks.
- Parameters:
- Return type:
Union[StringList, IntegerList, FloatList, BooleanList, ndarray]
- Returns:
The contents of the dataset as a vector-like object. By default, this is a typed NamedList subclass with missing values represented by None. If report_1darray = True, a 1-dimensional NumPy array is returned instead, possibly with masking.
dolomite_base.read_atomic_vector module
- dolomite_base.read_atomic_vector.read_atomic_vector(path, metadata, atomic_vector_use_numeric_1darray=False, **kwargs)
Read an atomic vector from its on-disk representation. In general, this function should not be called directly but instead via read_object().
- Parameters:
path (str) – Path to the directory containing the object.
metadata (dict) – Metadata for the object.
atomic_vector_use_numeric_1darray (bool) – Whether numeric vectors should be represented as 1-dimensional NumPy arrays. This is more memory-efficient than regular Python lists but discards the distinction between vectors and 1-D arrays. We set this to False by default to ensure that we can load names via NamedList subclasses.
kwargs – Further arguments, passed to nested objects.
- Return type:
Union[StringList, IntegerList, FloatList, BooleanList, ndarray]
- Returns:
An atomic vector, represented as a StringList, IntegerList, FloatList, or BooleanList.
dolomite_base.read_data_frame module
- dolomite_base.read_data_frame.read_data_frame(path, metadata, data_frame_represent_numeric_column_as_1darray=True, **kwargs)
Load a data frame from an HDF5 file. In general, this function should not be called directly but instead via read_object().
- Parameters:
path (str) – Path to the directory containing the object.
metadata (dict) – Metadata for the object.
data_frame_represent_numeric_column_as_1darray (bool) – Whether numeric columns should be represented as 1-dimensional NumPy arrays. This is more efficient than regular Python lists but discards the distinction between vectors and 1-D arrays. Usually this is not an important difference, but nonetheless, users can set this flag to False to load columns as (typed) lists instead.
kwargs – Further arguments, passed to nested objects.
- Return type:
- Returns:
A data frame.
dolomite_base.read_object module
- dolomite_base.read_object.read_object(path, metadata=None, **kwargs)
Read an object from its on-disk representation. This will dispatch to individual reading functions, possibly from different packages in the dolomite framework, based on the metadata from the OBJECT file.
Application developers can control the dispatch process by modifying read_object_registry. Each key is a string containing the object type, e.g., data_frame, while the value can either be a string specifying the fully qualified name of a reader function (including all modules, which will be loaded upon dispatch) or the reader function itself.
Any reader functions should accept the same arguments as read_object() and return the loaded object. Readers may assume that the metadata argument is available, i.e., no need to account for the None case.
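A registry whose values are either callables or fully qualified names can be sketched with importlib. The names below (toy_registry, resolve_reader, dispatch, read_toy_list) are hypothetical and the code is an illustration of the dispatch idea, not the read_object implementation.

```python
import importlib
from typing import Any, Callable, Dict, Union


def resolve_reader(entry: Union[str, Callable]) -> Callable:
    """Turn a registry value into a callable, importing its module on demand if it is a string."""
    if callable(entry):
        return entry
    module_name, _, function_name = entry.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


def read_toy_list(path: str, metadata: dict, **kwargs) -> Any:
    """Hypothetical reader with the (path, metadata, **kwargs) signature described above."""
    return ["loaded from", path, metadata["type"]]


# Keys are object types; values are reader functions or fully qualified names.
toy_registry: Dict[str, Union[str, Callable]] = {"toy_list": read_toy_list}


def dispatch(path: str, metadata: dict, **kwargs) -> Any:
    """Look up the reader for metadata['type'] and call it."""
    reader = resolve_reader(toy_registry[metadata["type"]])
    toy_registry[metadata["type"]] = reader  # cache the resolved callable
    return reader(path, metadata, **kwargs)


loaded = dispatch("some/dir", {"type": "toy_list"})

# String entries are resolved lazily; a standard-library function stands in here
# for a reader that lives in another package.
resolved = resolve_reader("posixpath.basename")
```

Storing names rather than functions means the package that provides a reader is only imported when an object of that type is actually encountered.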
dolomite_base.read_object_file module
- dolomite_base.read_object_file.read_object_file(path)
Read the OBJECT file in each directory, which provides some high-level metadata of the object represented by that directory. It is guaranteed to have a ‘type’ property that specifies the object type; individual objects may add their own information to this file.
dolomite_base.read_simple_list module
- dolomite_base.read_simple_list.read_simple_list(path, metadata, **kwargs)
Read an R-style list from its on-disk representation in the uzuki2 format. In general, this function should not be called directly but instead via read_object().
dolomite_base.read_string_factor module
dolomite_base.save_atomic_vector module
- dolomite_base.save_atomic_vector.save_atomic_vector_from_boolean_list(x, path, **kwargs)
Method for saving BooleanList objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (BooleanList) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_base.save_atomic_vector.save_atomic_vector_from_float_list(x, path, **kwargs)
Method for saving FloatList objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (FloatList) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_base.save_atomic_vector.save_atomic_vector_from_integer_list(x, path, **kwargs)
Method for saving IntegerList objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (IntegerList) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_base.save_atomic_vector.save_atomic_vector_from_string_list(x, path, **kwargs)
Method for saving StringList objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (StringList) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
dolomite_base.save_data_frame module
- class dolomite_base.save_data_frame.Hdf5ColumnOutput(handle, otherable, convert_list_to_vector, convert_1darray_to_vector)
Bases: tuple
- __getnewargs__()
Return self as a plain tuple. Used by copy and pickle.
- static __new__(_cls, handle, otherable, convert_list_to_vector, convert_1darray_to_vector)
Create new instance of Hdf5ColumnOutput(handle, otherable, convert_list_to_vector, convert_1darray_to_vector)
- __repr__()
Return a nicely formatted representation string
- __slots__ = ()
- convert_1darray_to_vector
Alias for field number 3
- convert_list_to_vector
Alias for field number 2
- handle
Alias for field number 0
- otherable
Alias for field number 1
- dolomite_base.save_data_frame.save_data_frame(x, path, data_frame_convert_list_to_vector=True, data_frame_convert_1darray_to_vector=True, **kwargs)
Method for saving BiocFrame objects to the corresponding file representations, see save_object() for details.
- Parameters:
x (BiocFrame) – Object to be saved.
path (str) – Path to a directory in which to save x.
data_frame_convert_list_to_vector (bool) – If a column is a regular Python list where all entries are of the same basic type (integer, string, float, boolean) or None, should it be converted to a typed vector in the on-disk representation? This avoids creating a separate file to store this column but changes the class of the column when the BiocFrame is read back into a Python session. If False, the list is saved as an external object instead.
data_frame_convert_1darray_to_vector (bool) – If a column is a 1D NumPy array, should it be saved as a typed vector? This avoids creating a separate file for the column but discards the distinction between 1D arrays and vectors. Usually this is not an important difference, but nonetheless, users can set this flag to False to save all 1D NumPy arrays as an external “dense array” object instead.
kwargs – Further arguments, passed to internal alt_save_object() calls.
- Return type:
- Returns:
x is saved to path.
dolomite_base.save_object module
- dolomite_base.save_object.save_object(x, path, **kwargs)
Save an object to its on-disk representation. dolomite extensions should define methods for this generic to stage different object classes.
Saver methods may accept additional arguments in the kwargs; these should be prefixed by the object type to avoid conflicts (see save_data_frame() for examples).
Saver methods should also use the validate_saves() decorator to ensure that the generated output in path is valid.
- dolomite_base.save_object.validate_saves(fn)
Decorator to validate the output of save_object().
- Parameters:
fn – Function that implements a method for save_object.
- Returns:
A wrapped version of the function that validates the directory containing the on-disk representation of the saved object.
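The decorator pattern described here, saving first and then validating the output directory, can be sketched with functools.wraps. Everything in this block is hypothetical (validates_output, toy_validate, and the two savers); the toy check only looks for an OBJECT file and stands in for a real validator.

```python
import functools
import os
import tempfile


def toy_validate(path: str) -> None:
    """Hypothetical validator: the saved directory must contain an OBJECT file."""
    if not os.path.isfile(os.path.join(path, "OBJECT")):
        raise ValueError(f"no OBJECT file found in '{path}'")


def validates_output(fn):
    """Wrap a saver so that the directory it produced is validated after saving."""
    @functools.wraps(fn)
    def wrapper(x, path, **kwargs):
        result = fn(x, path, **kwargs)
        toy_validate(path)  # raises if the saver left an invalid directory behind
        return result
    return wrapper


@validates_output
def save_complete(x, path, **kwargs):
    os.makedirs(path)
    with open(os.path.join(path, "OBJECT"), "w", encoding="utf-8") as handle:
        handle.write('{"type": "toy"}')


@validates_output
def save_incomplete(x, path, **kwargs):
    os.makedirs(path)  # forgets to write the OBJECT file


with tempfile.TemporaryDirectory() as tmp:
    save_complete("payload", os.path.join(tmp, "good"))
    try:
        save_incomplete("payload", os.path.join(tmp, "bad"))
        outcome = "accepted"
    except ValueError:
        outcome = "rejected"
```

Validating inside the wrapper means a faulty saver fails at save time rather than when someone later tries to read the object.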
dolomite_base.save_object_file module
- dolomite_base.save_object_file.save_object_file(path, object_type, extra={})
Saves object-specific metadata into the OBJECT file inside each directory, to be used by, e.g., read_object_file().
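A round trip through an OBJECT file might look like the sketch below. It assumes the file is plain JSON with the extra entries stored alongside the 'type' property; write_object_file and load_object_file are hypothetical stand-ins, not the library functions.

```python
import json
import os
import tempfile
from typing import Optional


def write_object_file(path: str, object_type: str, extra: Optional[dict] = None) -> None:
    """Write an OBJECT file containing the object type plus any extra metadata."""
    contents = dict(extra or {})
    contents["type"] = object_type  # the 'type' property is always present
    with open(os.path.join(path, "OBJECT"), "w", encoding="utf-8") as handle:
        json.dump(contents, handle)


def load_object_file(path: str) -> dict:
    """Read the OBJECT file back into a dictionary."""
    with open(os.path.join(path, "OBJECT"), "r", encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    write_object_file(tmp, "data_frame", {"data_frame": {"version": "1.0"}})
    roundtrip = load_object_file(tmp)
```

The sketch uses None as the default for extra and copies the dictionary before modifying it, so the caller's dictionary is never mutated.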
dolomite_base.save_simple_list module
- dolomite_base.save_simple_list.save_simple_list_from_NamedList(x, path, simple_list_mode='json', **kwargs)
Method for saving a NamedList to its corresponding file representation, see save_object() for details.
- dolomite_base.save_simple_list.save_simple_list_from_dict(x, path, simple_list_mode='json', **kwargs)
Method for saving dictionaries (Python analogues to R-style named lists) to the corresponding file representations, see save_object() for details.
- dolomite_base.save_simple_list.save_simple_list_from_list(x, path, simple_list_mode='json', **kwargs)
Method for saving lists (Python analogues to R-style unnamed lists) to the corresponding file representations, see save_object() for details.
dolomite_base.save_string_factor module
- dolomite_base.save_string_factor.save_string_factor(x, path, **kwargs)
Method for saving Factor objects to their corresponding file representation, see save_object() for details.
- Parameters:
x (Factor) – Object to be saved.
path (str) – Path to save the object.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
dolomite_base.validate_object module
- dolomite_base.validate_object.validate_object(path, metadata=None)
Validate an on-disk representation of an object, typically using validators based on the takane specifications.
Applications may also register their own validators by adding entries to validate_object_registry. Each key should be the object type and each value should be a function that accepts a path to a directory (string) and JSON-derived metadata (dictionary). The function should raise an error if the object in the directory is not valid for the specified object type.
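Registering an application-specific validator could look roughly like this. The registry here is a local dictionary standing in for validate_object_registry, and the toy_matrix type and its 'dimensions' field are invented purely for illustration.

```python
from typing import Callable, Dict

# Local stand-in for the validator registry: object type -> validator function.
toy_validate_registry: Dict[str, Callable[[str, dict], None]] = {}


def validate_toy_matrix(path: str, metadata: dict) -> None:
    """Hypothetical validator: raise if the metadata lacks a two-element 'dimensions' list."""
    dims = metadata.get("toy_matrix", {}).get("dimensions")
    if not (isinstance(dims, list) and len(dims) == 2):
        raise ValueError(f"'{path}' does not describe a valid toy_matrix")


toy_validate_registry["toy_matrix"] = validate_toy_matrix


def run_validation(path: str, metadata: dict) -> None:
    """Dispatch to the validator registered for the object's type."""
    validator = toy_validate_registry.get(metadata["type"])
    if validator is None:
        raise KeyError(f"no validator registered for type '{metadata['type']}'")
    validator(path, metadata)


# A valid object passes silently; an invalid one raises.
run_validation("some/dir", {"type": "toy_matrix", "toy_matrix": {"dimensions": [2, 3]}})
```

Validators signal failure by raising and return nothing on success, which matches the contract described above.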
dolomite_base.write_vector_to_hdf5 module
- dolomite_base.write_vector_to_hdf5.write_boolean_vector_to_hdf5(handle, name, x, placeholder_name='missing-value-placeholder')
Write a boolean vector to an HDF5 file as a 1-dimensional dataset with an 8-bit signed integer datatype. If x contains missing values, they are replaced with a placeholder value of -1.
- Parameters:
handle (Group) – A handle to an HDF5 group.
name (str) – Name of the dataset in which to save the boolean vector.
x (Sequence[bool]) – Sequence containing booleans, Nones, and/or masked NumPy values.
placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.
- Return type:
Dataset
- Returns:
Handle for the newly created dataset.
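The encoding of booleans with missing values into 8-bit signed integers can be shown without HDF5 at all. The sketch below uses the standard-library array module in place of an HDF5 dataset; encode_booleans and decode_booleans are hypothetical helpers.

```python
from array import array
from typing import List, Optional, Sequence, Tuple


def encode_booleans(x: Sequence[Optional[bool]]) -> Tuple[array, Optional[int]]:
    """Encode booleans as signed 8-bit integers, using -1 for missing values."""
    placeholder = -1
    has_missing = any(v is None for v in x)
    encoded = array("b", (placeholder if v is None else int(v) for v in x))
    # The placeholder only needs to be recorded when something is actually missing.
    return encoded, (placeholder if has_missing else None)


def decode_booleans(encoded: array, placeholder: Optional[int]) -> List[Optional[bool]]:
    """Reverse the encoding, turning the placeholder back into None."""
    return [
        None if (placeholder is not None and v == placeholder) else bool(v)
        for v in encoded
    ]


encoded, placeholder = encode_booleans([True, None, False])
decoded = decode_booleans(encoded, placeholder)
```

In the real function the placeholder would be stored as a dataset attribute under placeholder_name so that a reader can undo the substitution.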
- dolomite_base.write_vector_to_hdf5.write_float_vector_to_hdf5(handle, name, x, h5type='f8', placeholder_name='missing-value-placeholder')
Write a floating-point vector to an HDF5 file as a 1-dimensional dataset. If x contains missing values, a placeholder value is selected by choose_missing_float_placeholder() and used to replace all of the missing values in the dataset. The placeholder value itself is stored as an attribute of the dataset.
- Parameters:
handle (Group) – A handle to an HDF5 group.
name (str) – Name of the dataset in which to save the floating-point vector.
x (Sequence[float]) – Sequence containing floats, Nones, and/or masked NumPy values.
h5type (str) – Floating-point type of the HDF5 dataset to create.
placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.
- Return type:
Dataset
- Returns:
Handle for the newly created dataset.
- dolomite_base.write_vector_to_hdf5.write_integer_vector_to_hdf5(handle, name, x, h5type='i4', placeholder_name='missing-value-placeholder', allow_float_promotion=False)
Write an integer vector to an HDF5 file as a 1-dimensional dataset. If x contains missing values, a placeholder value is selected by choose_missing_integer_placeholder() and used to replace all of the missing values in the dataset. The placeholder value itself is stored as an attribute of the dataset.
- Parameters:
handle (Group) – A handle to an HDF5 group.
name (str) – Name of the dataset in which to save the integer vector.
x (Sequence[int]) – Sequence containing integers, Nones, and/or masked NumPy values.
h5type (str) – Integer type of the HDF5 dataset to create.
placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.
allow_float_promotion (bool) – Whether to save x into a 64-bit floating-point dataset if any values in x exceed the range of values that can be represented by h5type, or if no missing value placeholder can be found within the acceptable range of integer values. If False, an error is raised if x cannot be saved without promotion.
- Return type:
Dataset
- Returns:
Handle for the newly created dataset.
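The promotion rule for allow_float_promotion can be written as a small decision function. This sketch fixes the integer type to 32 bits and takes the outcome of the placeholder search as an argument; choose_storage_type and placeholder_available are hypothetical names, and the exception type is an assumption.

```python
from typing import Optional, Sequence

INT32_MIN, INT32_MAX = -2**31, 2**31 - 1


def choose_storage_type(
    x: Sequence[Optional[int]],
    placeholder_available: bool = True,
    allow_float_promotion: bool = False,
) -> str:
    """Return 'i4' if x fits in a 32-bit integer dataset, 'f8' if promotion is needed and allowed."""
    present = [v for v in x if v is not None]
    has_missing = len(present) != len(x)

    # Promotion is needed if a value is out of range, or if missing values
    # exist but no integer placeholder could be found for them.
    out_of_range = any(v < INT32_MIN or v > INT32_MAX for v in present)
    needs_promotion = out_of_range or (has_missing and not placeholder_available)

    if not needs_promotion:
        return "i4"
    if allow_float_promotion:
        return "f8"
    raise ValueError("cannot save as 32-bit integers without promotion")
```

Note that 64-bit floats represent integers exactly only up to 2**53 in magnitude, so promotion trades range for exactness on very large values.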
- dolomite_base.write_vector_to_hdf5.write_string_vector_to_hdf5(handle, name, x, placeholder_name='missing-value-placeholder')
Write a string vector to an HDF5 file as a 1-dimensional dataset with a fixed-length string datatype. If x contains missing values, a suitable placeholder value is selected using choose_missing_string_placeholder() and used to replace all missing values in the dataset. The placeholder itself is stored as an attribute of the dataset.
- Parameters:
handle (Group) – A handle to an HDF5 group.
name (str) – Name of the dataset in which to save the string vector.
x (Sequence[str]) – Sequence containing strings, Nones, and/or masked NumPy values.
placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.
- Return type:
Dataset
- Returns:
Handle for the newly created dataset.
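Preparing strings for a fixed-length datatype involves two steps: substituting a placeholder for missing values and padding every entry to a common byte width. The sketch below shows both in plain Python. The placeholder strategy (start from "NA" and append underscores until unused) and the null-byte padding are assumptions for illustration, and both helper names are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple


def pick_string_placeholder(x: Sequence[Optional[str]]) -> str:
    """Return a string that does not occur in x."""
    observed = {v for v in x if v is not None}
    candidate = "NA"
    while candidate in observed:
        candidate += "_"
    return candidate


def to_fixed_width(x: Sequence[Optional[str]]) -> Tuple[List[bytes], Optional[str], int]:
    """Encode x as equal-length UTF-8 byte strings, replacing None with a placeholder."""
    has_missing = any(v is None for v in x)
    placeholder = pick_string_placeholder(x) if has_missing else None
    filled = [(placeholder if v is None else v).encode("utf-8") for v in x]

    # The width is measured in bytes, not characters, since UTF-8 is variable-length.
    width = max((len(b) for b in filled), default=1)
    padded = [b.ljust(width, b"\0") for b in filled]
    return padded, placeholder, width


padded, placeholder, width = to_fixed_width(["a", None, "NA"])
```

Choosing the placeholder before computing the width matters, because the placeholder may itself be the longest entry.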