dolomite_base package

Submodules

dolomite_base.alt_read_object module

dolomite_base.alt_read_object.alt_read_object(path, metadata=None, **kwargs)[source]

Wrapper around read_object() that respects application-defined overrides from alt_read_object_function(). This allows applications to customize the reading process for some or all of the object classes, assuming that developers of dolomite extensions (and the associated functions called by read_object) use alt_read_object internally for reading child objects instead of read_object.

Parameters:
  • path (str) – Directory containing the object to load.

  • metadata (Optional[Dict]) – Metadata for the object. If None, this should be read from the OBJECT file inside path.

  • kwargs – Further arguments, passed to individual methods.

Return type:

Any

Returns:

Some kind of object.

dolomite_base.alt_read_object.alt_read_object_function(fun=None)[source]

Get or set the alternative reading function for use by alt_read_object(). Typically set by applications prior to reading for customization, e.g., to attach more metadata to the loaded object.

Parameters:

fun (Optional[Callable]) – The alternative reading function. This should accept the same arguments and return the same value as read_object().

Return type:

Callable

Returns:

If fun = None, the current setting of the alternative reading function is returned.

Otherwise, the alternative reading function is set to fun, and the previous function is returned.
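The get-or-set behaviour described above can be sketched with a small stand-in that does not import dolomite_base. All functions below are local illustrations: read_object here is a stub that echoes its inputs, and tagging_reader is a hypothetical application override.

```python
from typing import Any, Callable, Optional


def read_object(path: str, metadata: Optional[dict] = None, **kwargs) -> Any:
    # Stand-in for the real reader: it just echoes its inputs.
    return {"path": path, "metadata": metadata}


# Module-level setting; assumed here to default to the plain reader.
_alt_reader: Callable = read_object


def alt_read_object_function(fun: Optional[Callable] = None) -> Callable:
    """Return the current reader if fun is None; otherwise install fun
    and return the previously installed reader."""
    global _alt_reader
    if fun is None:
        return _alt_reader
    previous = _alt_reader
    _alt_reader = fun
    return previous


def alt_read_object(path: str, metadata: Optional[dict] = None, **kwargs) -> Any:
    # Always goes through whatever reader the application installed.
    return _alt_reader(path, metadata, **kwargs)


def tagging_reader(path: str, metadata: Optional[dict] = None, **kwargs) -> Any:
    # Hypothetical override that attaches extra information to the result.
    obj = read_object(path, metadata, **kwargs)
    obj["source"] = path
    return obj


previous = alt_read_object_function(tagging_reader)  # install the override
loaded = alt_read_object("some/dir")
alt_read_object_function(previous)  # restore the earlier setting
```

Returning the previous function on each set makes it easy to restore the earlier behaviour once the customized read is finished.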

dolomite_base.alt_save_object module

dolomite_base.alt_save_object.alt_save_object(x, path, **kwargs)[source]

Wrapper around save_object() that respects application-defined overrides from alt_save_object_function().

This allows applications to customize the saving process for some or all of the object classes, assuming that developers of dolomite extensions (and the associated save_object methods) use alt_save_object internally for saving child objects instead of save_object.

Parameters:
  • x (Any) – Object to be saved.

  • path (str) – Path to a directory to save x.

  • kwargs – Further arguments to be passed to individual methods.

Return type:

Dict[str, Any]

Returns:

x is saved to path.

dolomite_base.alt_save_object.alt_save_object_function(fun=None)[source]

Get or set the alternative saving function for use by alt_save_object(). Typically set by applications prior to saving for customization, e.g., to save extra metadata.

Parameters:

fun (Optional[Callable]) – The alternative saving function. This should accept the same arguments and return the same value as save_object().

Return type:

Callable

Returns:

If fun = None, the current setting of the alternative saving function is returned.

Otherwise, the alternative saving function is set to fun, and the previous function is returned.

dolomite_base.choose_missing_placeholder module

dolomite_base.choose_missing_placeholder.choose_missing_float_placeholder(x, dtype=<class 'numpy.float64'>)[source]

Choose a missing placeholder for float sequences.

Parameters:
  • x (Sequence[float]) – Sequence of floats, possibly containing masked or None values.

  • dtype (type) – Floating-point NumPy type to use for the placeholder. Ignored if x is already a NumPy floating-point array, in which case x.dtype is used instead.

Return type:

Optional[generic]

Returns:

Value of the placeholder. If x is a NumPy floating-point array, this is guaranteed to be of the same type as x.dtype.

If no suitable placeholder can be found, None is returned instead.

dolomite_base.choose_missing_placeholder.choose_missing_integer_placeholder(x, max_dtype=<class 'numpy.int32'>)[source]

Choose a missing placeholder for integer sequences.

Parameters:
  • x (Sequence[int]) – Sequence of integers, possibly containing masked or None values.

  • max_dtype (type) – Integer NumPy type that is guaranteed to faithfully represent all (non-None, non-masked) values of x.

Return type:

Optional[generic]

Returns:

Value of the placeholder. This is guaranteed to be of a type that can fit into max_dtype. It also may not be of the same type as x.dtype if x is a NumPy array, so some casting may be required when replacing missing values with the placeholder.

If no suitable placeholder can be found, None is returned instead.
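As a rough illustration of what "choosing a placeholder" means, the sketch below picks an integer inside a given range that does not occur among the non-missing values. The search order (extremes first, then a scan) is an assumption for illustration and not necessarily what choose_missing_integer_placeholder() does; pick_integer_placeholder is a hypothetical name, and the default range is that of a 32-bit signed integer.

```python
from typing import Optional, Sequence


def pick_integer_placeholder(
    x: Sequence[Optional[int]],
    lower: int = -(2**31),
    upper: int = 2**31 - 1,
) -> Optional[int]:
    # Collect the values that are actually present, ignoring missing entries.
    present = {v for v in x if v is not None}

    # Try the extremes of the range first, since real data rarely uses them.
    for candidate in (lower, upper):
        if candidate not in present:
            return candidate

    # Otherwise scan the interior of the range for any unused value.
    for candidate in range(lower + 1, upper):
        if candidate not in present:
            return candidate

    # Every representable value is taken, so no placeholder exists.
    return None
```

For example, pick_integer_placeholder([1, None, 3]) yields the lower bound, while a range that is fully occupied yields None.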

dolomite_base.choose_missing_placeholder.choose_missing_string_placeholder(x)[source]

Choose a missing placeholder for string sequences.

Parameters:

x (Sequence[str]) – Sequence of strings, possibly containing missing or None values.

Return type:

str

Returns:

String to use as the placeholder. This may be longer than the maximum string length in x (for fixed-length-string arrays), so some casting may be required.
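A minimal sketch of one way to find an unused string, and of why the result can be longer than any string in x. The starting value "NA" and the underscore-appending strategy are assumptions for illustration, and pick_string_placeholder is a hypothetical name.

```python
from typing import Optional, Sequence


def pick_string_placeholder(x: Sequence[Optional[str]]) -> str:
    # Values that are actually present, ignoring missing entries.
    present = {v for v in x if v is not None}

    # Start from a conventional marker and lengthen it until it is unused.
    candidate = "NA"
    while candidate in present:
        candidate += "_"
    return candidate
```

Because the candidate grows on every collision, a suitable string is always found, which is consistent with the str (rather than Optional) return type above.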

dolomite_base.lib_dolomite_base module

dolomite_base.lib_dolomite_base.load_list_hdf5(arg0: str, arg1: str, arg2: list) -> object
dolomite_base.lib_dolomite_base.load_list_json(arg0: str, arg1: list) -> object
dolomite_base.lib_dolomite_base.validate(arg0: str, arg1: object, arg2: dict) -> None

dolomite_base.load_vector_from_hdf5 module

dolomite_base.load_vector_from_hdf5.load_vector_from_hdf5(handle, expected_type, report_1darray)[source]

Load a vector from a 1-dimensional HDF5 dataset, with coercion to the expected type. Any missing value placeholders are used to set Nones or to create masks.

Parameters:
  • handle (Dataset) – Handle to a HDF5 dataset.

  • expected_type (type) – Expected type of the output vector. This should be one of float, int, str or bool.

  • report_1darray (bool) – Whether to report the output as a 1-dimensional NumPy array.

Return type:

Union[StringList, IntegerList, FloatList, BooleanList, ndarray]

Returns:

The contents of the dataset as a vector-like object. If report_1darray = False, this is a typed NamedList subclass with missing values represented by None. If report_1darray = True, a 1-dimensional NumPy array is returned instead, possibly with masking.
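The step of turning placeholders back into None can be sketched in plain Python, without HDF5 or NumPy. restore_missing is a hypothetical helper; the special handling of NaN is included because a NaN placeholder never compares equal to itself.

```python
import math
from typing import Any, List, Sequence


def _matches(value: Any, placeholder: Any) -> bool:
    # NaN != NaN, so a NaN placeholder needs an explicit check.
    if isinstance(value, float) and isinstance(placeholder, float):
        if math.isnan(value) and math.isnan(placeholder):
            return True
    return value == placeholder


def restore_missing(values: Sequence[Any], placeholder: Any = None) -> List[Any]:
    # Without a placeholder, nothing in the data is considered missing.
    if placeholder is None:
        return list(values)
    return [None if _matches(v, placeholder) else v for v in values]
```

For instance, restore_missing([1, -1, 3], -1) gives [1, None, 3].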

dolomite_base.read_atomic_vector module

dolomite_base.read_atomic_vector.read_atomic_vector(path, metadata, atomic_vector_use_numeric_1darray=False, **kwargs)[source]

Read an atomic vector from its on-disk representation. In general, this function should not be called directly but instead via read_object().

Parameters:
  • path (str) – Path to the directory containing the object.

  • metadata (dict) – Metadata for the object.

  • atomic_vector_use_numeric_1darray (bool) – Whether numeric vectors should be represented as 1-dimensional NumPy arrays. This is more memory-efficient than regular Python lists but discards the distinction between vectors and 1-D arrays. We set this to False by default to ensure that we can load names via NamedList subclasses.

  • kwargs – Further arguments, passed to nested objects.

Return type:

Union[StringList, IntegerList, FloatList, BooleanList, ndarray]

Returns:

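An atomic vector, represented as a StringList, IntegerList, FloatList or BooleanList. Numeric vectors may instead be returned as a 1-dimensional NumPy array if atomic_vector_use_numeric_1darray = True.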

dolomite_base.read_data_frame module

dolomite_base.read_data_frame.read_data_frame(path, metadata, data_frame_represent_numeric_column_as_1darray=True, **kwargs)[source]

Load a data frame from a HDF5 file. In general, this function should not be called directly but instead via read_object().

Parameters:
  • path (str) – Path to the directory containing the object.

  • metadata (dict) – Metadata for the object.

  • data_frame_represent_numeric_column_as_1darray (bool) – Whether numeric columns should be represented as 1-dimensional NumPy arrays. This is more efficient than regular Python lists but discards the distinction between vectors and 1-D arrays. Usually this is not an important difference, but nonetheless, users can set this flag to False to load columns as (typed) lists instead.

  • kwargs – Further arguments, passed to nested objects.

Return type:

BiocFrame

Returns:

A data frame.

dolomite_base.read_object module

dolomite_base.read_object.read_object(path, metadata=None, **kwargs)[source]

Read an object from its on-disk representation. This will dispatch to individual reading functions, possibly from different packages in the dolomite framework, based on the metadata from the OBJECT file.

Application developers can control the dispatch process by modifying read_object_registry. Each key is a string containing the object type, e.g., data_frame, while the value can either be a string specifying the fully qualified name of a reader function (including all modules, which will be loaded upon dispatch) or the reader function itself.

Any reader function should accept the same arguments as read_object() and return the loaded object. Readers may assume that the metadata argument is available, i.e., there is no need to account for the None case.

Parameters:
  • path (str) – Path to a directory containing the object.

  • metadata (Optional[dict]) – Metadata for the object. If None, the metadata is read from the OBJECT file inside path.

  • kwargs – Further arguments, passed to individual methods.

Return type:

Any

Returns:

Some kind of object.
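The registry-based dispatch described above can be sketched as follows. This is a stand-in that does not import dolomite_base: registry, resolve_reader, dispatch and read_data_frame_stub are hypothetical names, and caching the resolved function back into the registry is an assumption about how such a dispatcher might behave.

```python
import importlib
import os
from typing import Any, Callable, Dict, Union


def read_data_frame_stub(path: str, metadata: dict, **kwargs) -> str:
    # Hypothetical reader; a real one would load the object from path.
    return f"data frame at {path}"


# Values are either reader functions or fully qualified function names.
registry: Dict[str, Union[str, Callable]] = {
    "data_frame": read_data_frame_stub,
}


def resolve_reader(entry: Union[str, Callable]) -> Callable:
    # Functions are used as-is; strings are imported on demand.
    if callable(entry):
        return entry
    module_name, _, function_name = entry.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


def dispatch(path: str, metadata: dict, **kwargs) -> Any:
    object_type = metadata["type"]
    if object_type not in registry:
        raise NotImplementedError(f"no reader registered for '{object_type}'")
    reader = resolve_reader(registry[object_type])
    registry[object_type] = reader  # remember the resolved function
    return reader(path, metadata, **kwargs)


result = dispatch("some/dir", {"type": "data_frame"})
```

Storing names as strings means that the module providing a reader is only imported when an object of that type is actually encountered.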

dolomite_base.read_object_file module

dolomite_base.read_object_file.read_object_file(path)[source]

Read the OBJECT file in each directory, which provides some high-level metadata of the object represented by that directory. It is guaranteed to have a ‘type’ property that specifies the object type; individual objects may add their own information to this file.

Parameters:

path (str) – Path to a directory containing the object.

Return type:

Dict[str, Any]

Returns:

Dictionary containing the object metadata.

dolomite_base.read_simple_list module

dolomite_base.read_simple_list.read_simple_list(path, metadata, **kwargs)[source]

Read an R-style list from its on-disk representation in the uzuki2 format. In general, this function should not be called directly but instead via read_object().

Parameters:
  • path (str) – Path to the directory containing the object.

  • metadata (dict) – Metadata for the object.

  • kwargs – Further arguments, passed to nested objects.

Return type:

Union[dict, list]

Returns:

A list or dictionary.

dolomite_base.read_string_factor module

dolomite_base.read_string_factor.read_string_factor(path, metadata, **kwargs)[source]

Read a string factor from disk.

In general, this function should not be called directly but instead via read_object().

Parameters:
  • path (str) – Path to the directory containing the object.

  • metadata (dict) – Metadata for the object.

  • kwargs – Further arguments, passed to nested objects.

Return type:

Factor

Returns:

A Factor object.

dolomite_base.save_atomic_vector module

dolomite_base.save_atomic_vector.save_atomic_vector_from_boolean_list(x, path, **kwargs)[source]

Method for saving BooleanList objects to their corresponding file representation, see save_object() for details.

Parameters:
  • x (BooleanList) – Object to be saved.

  • path (str) – Path to save the object.

  • kwargs – Further arguments, ignored.

Returns:

x is saved to path.

dolomite_base.save_atomic_vector.save_atomic_vector_from_float_list(x, path, **kwargs)[source]

Method for saving FloatList objects to their corresponding file representation, see save_object() for details.

Parameters:
  • x (FloatList) – Object to be saved.

  • path (str) – Path to save the object.

  • kwargs – Further arguments, ignored.

Returns:

x is saved to path.

dolomite_base.save_atomic_vector.save_atomic_vector_from_integer_list(x, path, **kwargs)[source]

Method for saving IntegerList objects to their corresponding file representation, see save_object() for details.

Parameters:
  • x (IntegerList) – Object to be saved.

  • path (str) – Path to save the object.

  • kwargs – Further arguments, ignored.

Returns:

x is saved to path.

dolomite_base.save_atomic_vector.save_atomic_vector_from_string_list(x, path, **kwargs)[source]

Method for saving StringList objects to their corresponding file representation, see save_object() for details.

Parameters:
  • x (StringList) – Object to be saved.

  • path (str) – Path to save the object.

  • kwargs – Further arguments, ignored.

Returns:

x is saved to path.

dolomite_base.save_data_frame module

class dolomite_base.save_data_frame.Hdf5ColumnOutput(handle, otherable, convert_list_to_vector, convert_1darray_to_vector)

Bases: tuple

__getnewargs__()

Return self as a plain tuple. Used by copy and pickle.

static __new__(_cls, handle, otherable, convert_list_to_vector, convert_1darray_to_vector)

Create new instance of Hdf5ColumnOutput(handle, otherable, convert_list_to_vector, convert_1darray_to_vector)

__repr__()

Return a nicely formatted representation string

__slots__ = ()
convert_1darray_to_vector

Alias for field number 3

convert_list_to_vector

Alias for field number 2

handle

Alias for field number 0

otherable

Alias for field number 1

dolomite_base.save_data_frame.save_data_frame(x, path, data_frame_convert_list_to_vector=True, data_frame_convert_1darray_to_vector=True, **kwargs)[source]

Method for saving BiocFrame objects to the corresponding file representations, see save_object() for details.

Parameters:
  • x (BiocFrame) – Object to be saved.

  • path (str) – Path to a directory in which to save x.

  • data_frame_convert_list_to_vector (bool) – If a column is a regular Python list where all entries are of the same basic type (integer, string, float, boolean) or None, should it be converted to a typed vector in the on-disk representation? This avoids creating a separate file to store this column but changes the class of the column when the BiocFrame is read back into a Python session. If False, the list is saved as an external object instead.

  • data_frame_convert_1darray_to_vector (bool) – If a column is a 1D NumPy array, should it be saved as a typed vector? This avoids creating a separate file for the column but discards the distinction between 1D arrays and vectors. Usually this is not an important difference, but nonetheless, users can set this flag to False to save all 1D NumPy arrays as an external “dense array” object instead.

  • kwargs – Further arguments, passed to internal alt_save_object() calls.

Return type:

Dict[str, Any]

Returns:

x is saved to path.

dolomite_base.save_object module

dolomite_base.save_object.save_object(x, path, **kwargs)[source]

Save an object to its on-disk representation. dolomite extensions should define methods for this generic to stage different object classes.

Saver methods may accept additional arguments in the kwargs; these should be prefixed by the object type to avoid conflicts (see save_data_frame() for examples).

Saver methods should also use the validate_saves() decorator to ensure that the generated output in path is valid.

Parameters:
  • x (Any) – Object to be saved.

  • path (str) – Path to the output directory.

  • kwargs – Further arguments to be passed to individual methods.

Returns:

x is saved to path.
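A minimal sketch of defining a saver method with a prefixed option, assuming the generic behaves like functools.singledispatch. The save_object below is a local stand-in, and the "toy_dict" object type, its toy_dict_sort_keys option and the contents.json file name are hypothetical.

```python
import json
import os
import tempfile
from functools import singledispatch


@singledispatch
def save_object(x, path, **kwargs):
    # Fallback when no method is registered for type(x).
    raise NotImplementedError(f"no save_object method for {type(x).__name__}")


@save_object.register
def _save_dict(x: dict, path: str, toy_dict_sort_keys: bool = False, **kwargs):
    # The option name is prefixed with the (hypothetical) object type, so it
    # cannot clash with options intended for methods of other classes.
    os.mkdir(path)
    with open(os.path.join(path, "contents.json"), "w", encoding="utf-8") as handle:
        json.dump(x, handle, sort_keys=toy_dict_sort_keys)


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "my_dict")
    # Options meant for other methods are swallowed by **kwargs and ignored.
    save_object(
        {"b": 1, "a": 2},
        target,
        toy_dict_sort_keys=True,
        data_frame_convert_list_to_vector=False,
    )
    with open(os.path.join(target, "contents.json"), encoding="utf-8") as handle:
        saved_text = handle.read()
```

Prefixing lets a caller pass one set of keyword arguments through a nested save, with each method picking out only the options it recognizes.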

dolomite_base.save_object.validate_saves(fn)[source]

Decorator to validate the output of save_object().

Parameters:

fn – Function that implements a method for save_object.

Returns:

A wrapped version of the function that validates the directory containing the on-disk representation of the saved object.
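The general shape of such a decorator can be sketched as below. This is a local stand-in, not the package's implementation: check_directory is a hypothetical validator that only requires an OBJECT file to exist, whereas the real validation is presumably far stricter.

```python
import functools
import os
import tempfile


def check_directory(path: str) -> None:
    # Hypothetical validator: only requires an OBJECT file to be present.
    if not os.path.isfile(os.path.join(path, "OBJECT")):
        raise ValueError(f"no OBJECT file in '{path}'")


def validate_saves(fn):
    @functools.wraps(fn)
    def wrapper(x, path, **kwargs):
        result = fn(x, path, **kwargs)  # run the actual saver first
        check_directory(path)           # then validate what it wrote
        return result

    return wrapper


@validate_saves
def good_saver(x, path, **kwargs):
    os.mkdir(path)
    with open(os.path.join(path, "OBJECT"), "w", encoding="utf-8") as handle:
        handle.write('{"type": "toy"}')


@validate_saves
def bad_saver(x, path, **kwargs):
    os.mkdir(path)  # forgets to write the OBJECT file


with tempfile.TemporaryDirectory() as tmp:
    good_saver(None, os.path.join(tmp, "good"))
    try:
        bad_saver(None, os.path.join(tmp, "bad"))
        bad_was_rejected = False
    except ValueError:
        bad_was_rejected = True
```

Validating immediately after the save means that a faulty saver method fails at write time rather than when the directory is read back later.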

dolomite_base.save_object_file module

dolomite_base.save_object_file.save_object_file(path, object_type, extra={})[source]

Saves object-specific metadata into the OBJECT file inside each directory, to be used by, e.g., read_object_file().

Parameters:
  • path (str) – Path to the directory representing an object. An OBJECT file will be created inside this directory.

  • object_type (str) – Type of the object.

  • extra (dict) – Extra metadata to be written to the OBJECT file in path. Any entry named type will be overwritten by object_type.
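The overwrite rule for the type entry can be illustrated with a small round trip, assuming that the OBJECT file is a JSON document. write_object_file and load_object_file are hypothetical stand-ins for save_object_file() and read_object_file().

```python
import json
import os
import tempfile
from typing import Any, Dict, Optional


def write_object_file(path: str, object_type: str, extra: Optional[dict] = None) -> None:
    metadata = dict(extra or {})    # copy, so the caller's dictionary is untouched
    metadata["type"] = object_type  # any existing "type" entry is overwritten
    with open(os.path.join(path, "OBJECT"), "w", encoding="utf-8") as handle:
        json.dump(metadata, handle)


def load_object_file(path: str) -> Dict[str, Any]:
    with open(os.path.join(path, "OBJECT"), encoding="utf-8") as handle:
        return json.load(handle)


with tempfile.TemporaryDirectory() as tmp:
    write_object_file(tmp, "data_frame", extra={"type": "ignored", "note": "demo"})
    roundtrip = load_object_file(tmp)
```

Here roundtrip contains the type "data_frame" together with the extra note entry; the "ignored" value supplied in extra does not survive.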

dolomite_base.save_simple_list module

dolomite_base.save_simple_list.save_simple_list_from_NamedList(x, path, simple_list_mode='json', **kwargs)[source]

Method for saving a NamedList to its corresponding file representation, see save_object() for details.

Parameters:
  • x (NamedList) – Object to be saved.

  • path (str) – Path to a directory in which to save the object.

  • simple_list_mode (Literal['hdf5', 'json']) – Whether to save in HDF5 or JSON mode.

  • kwargs – Further arguments, ignored.

Returns:

x is saved to path.

dolomite_base.save_simple_list.save_simple_list_from_dict(x, path, simple_list_mode='json', **kwargs)[source]

Method for saving dictionaries (Python analogues to R-style named lists) to the corresponding file representations, see save_object() for details.

Parameters:
  • x (dict) – Object to be saved.

  • path (str) – Path to a directory in which to save the object.

  • simple_list_mode (Literal['hdf5', 'json']) – Whether to save in HDF5 or JSON mode.

  • kwargs – Further arguments, ignored.

Returns:

x is saved to path.

dolomite_base.save_simple_list.save_simple_list_from_list(x, path, simple_list_mode='json', **kwargs)[source]

Method for saving lists (Python analogues to R-style unnamed lists) to the corresponding file representations, see save_object() for details.

Parameters:
  • x (list) – Object to be saved.

  • path (str) – Path to a directory in which to save the object.

  • simple_list_mode (Literal['hdf5', 'json']) – Whether to save in HDF5 or JSON mode.

  • kwargs – Further arguments, ignored.

Returns:

x is saved to path.

dolomite_base.save_string_factor module

dolomite_base.save_string_factor.save_string_factor(x, path, **kwargs)[source]

Method for saving Factor objects to their corresponding file representation, see save_object() for details.

Parameters:
  • x (Factor) – Object to be saved.

  • path (str) – Path to save the object.

  • kwargs – Further arguments, ignored.

Returns:

x is saved to path.

dolomite_base.validate_object module

dolomite_base.validate_object.validate_object(path, metadata=None)[source]

Validate an on-disk representation of an object, typically using validators based on the takane specifications.

Applications may also register their own validators by adding entries to validate_object_registry. Each key should be the object type and each value should be a function that accepts a path to a directory (string) and JSON-derived metadata (dictionary). The function should raise an error if the object in the directory is not valid for the specified object type.

Parameters:
  • path (str) – Path to the directory containing the object’s representation.

  • metadata (Optional[Dict]) – Metadata for the object. If None, this is read from the OBJECT file in the path.

Raises:

Error if the validation fails.

dolomite_base.write_vector_to_hdf5 module

dolomite_base.write_vector_to_hdf5.write_boolean_vector_to_hdf5(handle, name, x, placeholder_name='missing-value-placeholder')[source]

Write a boolean vector to a HDF5 file as a 1-dimensional dataset with an 8-bit signed integer datatype. If x contains missing values, they are replaced with a placeholder value of -1.

Parameters:
  • handle (Group) – A handle to a HDF5 group.

  • name (str) – Name of the dataset in which to save the boolean vector.

  • x (Sequence[bool]) – Sequence containing booleans, Nones, and/or masked NumPy values.

  • placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.

Return type:

Dataset

Returns:

Handle for the newly created dataset.
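The integer encoding described above can be sketched in plain Python, without HDF5. encode_booleans and decode_booleans are hypothetical helpers; the only facts taken from the description are that booleans are stored as small integers and that -1 marks a missing value.

```python
from typing import List, Optional, Sequence, Tuple


def encode_booleans(x: Sequence[Optional[bool]]) -> Tuple[List[int], Optional[int]]:
    # True -> 1, False -> 0, missing -> -1.
    has_missing = any(v is None for v in x)
    encoded = [-1 if v is None else int(v) for v in x]
    # A placeholder is only reported if something was actually missing.
    return encoded, (-1 if has_missing else None)


def decode_booleans(encoded: Sequence[int], placeholder: Optional[int]) -> List[Optional[bool]]:
    return [
        None if (placeholder is not None and v == placeholder) else bool(v)
        for v in encoded
    ]


encoded, placeholder = encode_booleans([True, None, False])
decoded = decode_booleans(encoded, placeholder)
```

All three possible values fit comfortably in an 8-bit signed integer, so no search for an unused placeholder is needed for booleans.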

dolomite_base.write_vector_to_hdf5.write_float_vector_to_hdf5(handle, name, x, h5type='f8', placeholder_name='missing-value-placeholder')[source]

Write a floating-point vector to a HDF5 file as a 1-dimensional dataset. If x contains missing values, a placeholder value is selected by choose_missing_float_placeholder() and used to replace all of the missing values in the dataset. The placeholder value itself is stored as an attribute of the dataset.

Parameters:
  • handle (Group) – A handle to a HDF5 group.

  • name (str) – Name of the dataset in which to save the floating-point vector.

  • x (Sequence[float]) – Sequence containing floats, Nones, and/or masked NumPy values.

  • h5type (str) – Floating-point type of the HDF5 dataset to create.

  • placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.

Return type:

Dataset

Returns:

Handle for the newly created dataset.

dolomite_base.write_vector_to_hdf5.write_integer_vector_to_hdf5(handle, name, x, h5type='i4', placeholder_name='missing-value-placeholder', allow_float_promotion=False)[source]

Write an integer vector to a HDF5 file as a 1-dimensional dataset. If x contains missing values, a placeholder value is selected by choose_missing_integer_placeholder() and used to replace all of the missing values in the dataset. The placeholder value itself is stored as an attribute of the dataset.

Parameters:
  • handle (Group) – A handle to a HDF5 group.

  • name (str) – Name of the dataset in which to save the integer vector.

  • x (Sequence[int]) – Sequence containing integers, Nones, and/or masked NumPy values.

  • h5type (str) – Integer type of the HDF5 dataset to create.

  • placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.

  • allow_float_promotion (bool) – Whether to save x into a 64-bit floating-point dataset if any values in x exceed the range of values that can be represented by h5type, or if no missing value placeholder can be found within the acceptable range of integer values. If False, an error is raised if x cannot be saved without promotion.

Return type:

Dataset

Returns:

Handle for the newly created dataset.
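The decision controlled by allow_float_promotion can be sketched as a small planning function. This only mirrors the rules stated in the parameter description; plan_integer_storage is a hypothetical name, the default range is that of a 32-bit signed integer, and the real function writes a dataset rather than returning a label.

```python
from typing import Optional, Sequence


def plan_integer_storage(
    x: Sequence[Optional[int]],
    lower: int = -(2**31),
    upper: int = 2**31 - 1,
    allow_float_promotion: bool = False,
) -> str:
    present = {v for v in x if v is not None}
    has_missing = any(v is None for v in x)

    # Do all non-missing values fit into the requested integer type?
    in_range = all(lower <= v <= upper for v in present)

    # A placeholder exists if nothing is missing, or if at least one
    # representable value is unused by the data.
    placeholder_available = (not has_missing) or len(present) < (upper - lower + 1)

    if in_range and placeholder_available:
        return "integer"
    if allow_float_promotion:
        return "float64"
    raise ValueError("x cannot be saved without promotion to floating-point")
```

With the defaults, plan_integer_storage([1, None]) returns "integer", while a value such as 2**40 either triggers promotion or raises an error, depending on the flag.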

dolomite_base.write_vector_to_hdf5.write_string_vector_to_hdf5(handle, name, x, placeholder_name='missing-value-placeholder')[source]

Write a string vector to a HDF5 file as a 1-dimensional dataset with a fixed-length string datatype. If x contains missing values, a suitable placeholder value is selected using choose_missing_string_placeholder() and used to replace all missing values in the dataset. The placeholder itself is stored as an attribute of the dataset.

Parameters:
  • handle (Group) – A handle to a HDF5 group.

  • name (str) – Name of the dataset in which to save the string vector.

  • x (Sequence[str]) – Sequence containing strings, Nones, and/or masked NumPy values.

  • placeholder_name (str) – Name of the attribute in which to store the missing value placeholder, if x contains None or masked values.

Return type:

Dataset

Returns:

Handle for the newly created dataset.

Module contents