dolomite_matrix package
Submodules
dolomite_matrix.DelayedMask module
- class dolomite_matrix.DelayedMask.DelayedMask(seed, placeholder, dtype=None)
Bases: DelayedOp
Delayed operation that masks all values equal to a missing-value placeholder, producing a NumPy masked array.
- __init__(seed, placeholder, dtype=None)
- Parameters:
seed – Any object that satisfies the seed contract; see DelayedArray for details.
placeholder – Placeholder value that defines masked values, of the same type as seed.dtype (or coercible into that type). All values equal to the placeholder are considered to be missing.
dtype (Optional[dtype]) – Desired type of the masked output. Defaults to seed.dtype.
- property placeholder
Returns: The placeholder value.
- property seed
Returns: The seed object.
- dolomite_matrix.DelayedMask.extract_dense_array_DelayedMask(x, subset)
See extract_dense_array() for details.
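As a rough illustration of the masking step, the sketch below turns a dense block into a NumPy masked array by masking every element equal to the placeholder. The helper name mask_placeholder and its special handling of a NaN placeholder are assumptions made for illustration; this is not the DelayedMask implementation.

```python
import numpy


def mask_placeholder(block, placeholder, dtype=None):
    """Hypothetical helper: mask every element of `block` equal to `placeholder`."""
    block = numpy.asarray(block)
    if dtype is None:
        # Mirror the documented default: keep the input's own type.
        dtype = block.dtype
    # Assumption: a NaN placeholder needs an explicit test, because NaN != NaN.
    if isinstance(placeholder, float) and numpy.isnan(placeholder):
        mask = numpy.isnan(block)
    else:
        mask = block == placeholder
    return numpy.ma.MaskedArray(block.astype(dtype), mask=mask)


# Usage: -1 marks missing values in an integer block.
masked = mask_placeholder([[1, -1], [3, 4]], placeholder=-1)
print(masked.mask.tolist())  # [[False, True], [False, False]]
print(masked.sum())          # 8
```

Masked entries are skipped by reductions such as sum(), which is why the placeholder no longer leaks into downstream computations.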
dolomite_matrix.ReloadedArray module
- class dolomite_matrix.ReloadedArray.ReloadedArray(seed, path)
Bases: DelayedArray
An array that was reloaded from disk by the read_object() function and remembers the path from which it was loaded. This class allows methods to refer to the existing on-disk representation by inspecting the path. For example, save_object() can simply copy or link to the existing files instead of repeating the saving process.
- __init__(seed, path)
To construct a ReloadedArray from an existing ReloadedArraySeed, use wrap() instead.
- Parameters:
seed – The contents of the reloaded array.
path (str) – Path to the directory containing the on-disk representation.
- class dolomite_matrix.ReloadedArray.ReloadedArraySeed(seed, path)
Bases: WrapperArraySeed
Seed for the ReloadedArray class. This is a subclass of WrapperArraySeed.
- dolomite_matrix.ReloadedArray.save_object_ReloadedArray(x, path, reloaded_array_reuse_mode='link', **kwargs)
Method for saving ReloadedArray objects to disk; see save_object() for details.
- Parameters:
x (ReloadedArray) – Object to be saved.
path (str) – Path to a directory in which to save x.
reloaded_array_reuse_mode (str) – How the files in x.path should be reused when populating path. This can be "link", to create a hard link to each file; "symlink", to create a symbolic link to each file; "copy", to create a copy of each file; or "none", to perform a fresh save of x without relying on x.path.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
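The three file-reuse modes map naturally onto standard-library calls. The sketch below is a hypothetical reuse_files helper, not the package's own code: it mirrors a source directory (playing the role of x.path) into a destination with os.link, os.symlink or shutil.copy2. The "none" mode is left out because it performs a fresh save rather than reusing files.

```python
import os
import shutil


def reuse_files(src: str, dest: str, mode: str = "link") -> None:
    """Hypothetical sketch: populate `dest` by reusing every file under `src`."""
    if mode not in ("link", "symlink", "copy"):
        raise ValueError("unsupported reuse mode: " + mode)
    for root, _, files in os.walk(src):
        # Recreate the directory layout of `src` inside `dest`.
        target_dir = os.path.normpath(os.path.join(dest, os.path.relpath(root, src)))
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            source = os.path.join(root, name)
            target = os.path.join(target_dir, name)
            if mode == "link":
                os.link(source, target)  # hard link: no extra disk space
            elif mode == "symlink":
                # Absolute target so the link still resolves from `dest`.
                os.symlink(os.path.abspath(source), target)
            else:
                shutil.copy2(source, target)  # independent copy, keeps metadata
```

A hard link requires both directories to be on the same filesystem, while a symbolic link breaks if the original directory is later moved or deleted; copying avoids both issues at the cost of disk space.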
dolomite_matrix.WrapperArraySeed module
- class dolomite_matrix.WrapperArraySeed.WrapperArraySeed(seed)
Bases: object
Wrapper for a DelayedArray seed, which forwards all of the required operations to the seed object. This is intended as a base for concrete subclasses that attach more provenance-tracking information; see ReloadedArray for an example.
- property seed
Returns: The underlying seed instance.
- dolomite_matrix.WrapperArraySeed.chunk_grid_WrapperArraySeed(x)
See chunk_grid() for details.
- dolomite_matrix.WrapperArraySeed.create_dask_array_WrapperArraySeed(x)
See create_dask_array() for details.
- dolomite_matrix.WrapperArraySeed.extract_dense_array_WrapperArraySeed(x, subset)
See extract_dense_array() for details.
- Return type:
ndarray
- dolomite_matrix.WrapperArraySeed.extract_sparse_array_WrapperArraySeed(x, subset)
See extract_sparse_array() for details.
- Return type:
SparseNdarray
- dolomite_matrix.WrapperArraySeed.is_masked_WrapperArraySeed(x)
See is_masked() for details.
- Return type:
bool
- dolomite_matrix.WrapperArraySeed.is_sparse_WrapperArraySeed(x)
See is_sparse() for details.
- Return type:
bool
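The forwarding pattern can be sketched with functools.singledispatch, assuming (as in the delayedarray package) that generics such as is_sparse() and extract_dense_array() are single-dispatch functions. Wrapper, PathTracked, sketch_is_sparse and sketch_extract_dense are hypothetical stand-ins, not the real classes or generics.

```python
from functools import singledispatch

import numpy


class Wrapper:
    """Hypothetical stand-in for a WrapperArraySeed-like base class."""

    def __init__(self, seed):
        self._seed = seed

    @property
    def seed(self):
        return self._seed

    @property
    def shape(self):
        return self._seed.shape

    @property
    def dtype(self):
        return self._seed.dtype


class PathTracked(Wrapper):
    """Subclass that only attaches provenance (a path), like ReloadedArraySeed."""

    def __init__(self, seed, path: str):
        super().__init__(seed)
        self.path = path


@singledispatch
def sketch_is_sparse(x) -> bool:
    # Fallback: treat plain NumPy arrays (and anything unregistered) as dense.
    return False


@sketch_is_sparse.register
def _(x: Wrapper) -> bool:
    # Forward to the wrapped seed; subclasses inherit this through the MRO.
    return sketch_is_sparse(x.seed)


@singledispatch
def sketch_extract_dense(x, subset):
    # Fallback for NumPy arrays: select the outer product of the index sequences.
    return numpy.asarray(x)[numpy.ix_(*subset)]


@sketch_extract_dense.register
def _(x: Wrapper, subset):
    return sketch_extract_dense(x.seed, subset)
```

Because dispatch follows the class hierarchy, a subclass such as PathTracked needs no registrations of its own; it only adds its extra attributes.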
dolomite_matrix.choose_chunk_dimensions module
- dolomite_matrix.choose_chunk_dimensions.choose_chunk_dimensions(shape, size, min_extent=100, buffer_size=10000000.0)
Choose chunk dimensions to use for a dense HDF5 dataset. For each dimension, we consider a slice of the array that spans the full extent of all other dimensions. We want this slice to occupy less than buffer_size bytes in memory, so we resize the slice along the current dimension until it fits. The chunk extent for the current dimension is then set to the extent of the slice along that dimension. This ensures that efficient iteration along each dimension will not use more than buffer_size bytes.
- Parameters:
shape (Tuple[int, ...]) – Shape of the array.
size (int) – Size of each array element in bytes.
min_extent (int) – Minimum extent of each chunk dimension, to avoid problems with excessively small chunk sizes when the data is large.
buffer_size (int) – Size of the (conceptual) memory buffer to use for storing blocks of data during iteration through the array, in bytes.
- Return type:
Tuple[int, ...]
- Returns:
Tuple containing the chunk dimensions.
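The procedure described above can be sketched as follows. sketch_chunk_dimensions is a hypothetical re-implementation written from the prose, not the package's source; in particular, capping each chunk at the array's own extent is an assumption.

```python
from math import prod
from typing import Sequence, Tuple


def sketch_chunk_dimensions(
    shape: Sequence[int], size: int, min_extent: int = 100, buffer_size: float = 1e7
) -> Tuple[int, ...]:
    """Hypothetical sketch of the chunk-choosing procedure."""
    max_elements = int(buffer_size // size)  # elements that fit in the buffer
    chunks = []
    for i, extent in enumerate(shape):
        # Elements in a slice of extent 1 along dimension i, full along the others.
        others = prod(s for j, s in enumerate(shape) if j != i)
        proposed = max_elements // max(others, 1)
        # Apply the minimum extent (assumed) but never exceed the array's extent.
        proposed = min(max(proposed, min_extent), extent)
        chunks.append(max(proposed, 1))
    return tuple(chunks)


print(sketch_chunk_dimensions((1000, 2000), size=8))  # (625, 1250)
```

In this example a slice of 625 rows by 2000 columns of 8-byte values occupies exactly 10^7 bytes. When the other dimensions are very large, the min_extent floor can push a chunk-aligned slice past buffer_size; for a 100000 by 100000 array this sketch returns (100, 100).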
dolomite_matrix.read_compressed_sparse_matrix module
- dolomite_matrix.read_compressed_sparse_matrix.read_compressed_sparse_matrix(path, metadata, **kwargs)
Read a compressed sparse matrix from its on-disk representation. In general, this function should not be called directly; it is instead dispatched via read_object().
- Parameters:
path (str) – Path to the directory containing the on-disk representation.
metadata (dict) – Metadata for the object.
kwargs – Further arguments.
- Return type:
ReloadedArray
- Returns:
A ReloadedArray containing an HDF5-backed compressed sparse matrix as a seed.
dolomite_matrix.read_dense_array module
- dolomite_matrix.read_dense_array.read_dense_array(path, metadata, **kwargs)
Read a dense array from its on-disk representation. In general, this function should not be called directly; it is instead dispatched via read_object().
- Parameters:
path (str) – Path to the directory containing the on-disk representation.
metadata (dict) – Metadata for the object.
kwargs – Further arguments.
- Return type:
ReloadedArray
- Returns:
A ReloadedArray containing an HDF5-backed dense array as a seed.
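The (path, metadata, **kwargs) signature of these readers suits a registry-based dispatcher. The sketch below is a toy model of that arrangement, not dolomite_base's read_object(): READERS, register_reader, dispatch_read and toy_read_dense_array are hypothetical names, and storing the object type as a "type" field in a JSON OBJECT file is an assumption about the on-disk layout.

```python
import json
import os
from typing import Any, Callable, Dict

# Hypothetical registry mapping an on-disk object type to its reader.
READERS: Dict[str, Callable[..., Any]] = {}


def register_reader(type_name: str):
    """Decorator that records the decorated function as the reader for `type_name`."""
    def decorator(fun):
        READERS[type_name] = fun
        return fun
    return decorator


def dispatch_read(path: str, **kwargs) -> Any:
    # Assumption: the object's type is stored as JSON in an "OBJECT" file.
    with open(os.path.join(path, "OBJECT")) as handle:
        metadata = json.load(handle)
    reader = READERS[metadata["type"]]
    # Readers receive the path, the parsed metadata and any extra arguments.
    return reader(path, metadata, **kwargs)


@register_reader("dense_array")
def toy_read_dense_array(path: str, metadata: dict, **kwargs):
    # A real reader would open the HDF5 file here; this toy just echoes its inputs.
    return {"path": path, "type": metadata["type"]}
```

Passing the already-parsed metadata to the reader means the dispatcher reads the type information only once, and each reader can still inspect any other fields it needs.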
dolomite_matrix.save_compressed_sparse_matrix module
- dolomite_matrix.save_compressed_sparse_matrix.save_compresssed_sparse_matrix_from_Sparse2darray(x, path, compressed_sparse_matrix_chunk_size=10000, compressed_sparse_matrix_buffer_size=100000000.0, **kwargs)
Method for saving a SparseNdarray to disk; see save_object() for details.
- Parameters:
x (SparseNdarray) – Object to be saved.
path (str) – Path to a directory in which to save x.
compressed_sparse_matrix_chunk_size (int) – Chunk size for the data and indices. Larger values improve compression at the potential cost of reducing random access efficiency.
compressed_sparse_matrix_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_matrix.save_compressed_sparse_matrix.save_compresssed_sparse_matrix_from_scipy_coo_matrix(x, path, compressed_sparse_matrix_chunk_size=10000, compressed_sparse_matrix_buffer_size=100000000.0, **kwargs)
Method for saving coo_matrix objects to disk; see save_object() for details.
- Parameters:
x (coo_matrix) – Matrix to be saved.
path (str) – Path to a directory in which to save x.
compressed_sparse_matrix_chunk_size (int) – Chunk size for the data and indices. Larger values improve compression at the potential cost of reducing random access efficiency.
compressed_sparse_matrix_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_matrix.save_compressed_sparse_matrix.save_compresssed_sparse_matrix_from_scipy_csc_matrix(x, path, compressed_sparse_matrix_chunk_size=10000, compressed_sparse_matrix_buffer_size=100000000.0, **kwargs)
Method for saving csc_matrix objects to disk; see save_object() for details.
- Parameters:
x (csc_matrix) – Matrix to be saved.
path (str) – Path to a directory in which to save x.
compressed_sparse_matrix_chunk_size (int) – Chunk size for the data and indices. Larger values improve compression at the potential cost of reducing random access efficiency.
compressed_sparse_matrix_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_matrix.save_compressed_sparse_matrix.save_compresssed_sparse_matrix_from_scipy_csr_matrix(x, path, compressed_sparse_matrix_chunk_size=10000, compressed_sparse_matrix_buffer_size=100000000.0, **kwargs)
Method for saving csr_matrix objects to disk; see save_object() for details.
- Parameters:
x (csr_matrix) – Matrix to be saved.
path (str) – Path to a directory in which to save x.
compressed_sparse_matrix_chunk_size (int) – Chunk size for the data and indices. Larger values improve compression at the potential cost of reducing random access efficiency.
compressed_sparse_matrix_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
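To see why the buffer size trades speed for memory, the sketch below builds the data/indices/indptr arrays of a compressed sparse column layout while holding only a buffer-sized block of columns at a time. to_csc_arrays is a hypothetical helper; taking a dense NumPy input is a simplification, and it is not the package's writer.

```python
import numpy


def to_csc_arrays(dense: numpy.ndarray, buffer_size: float = 1e8):
    """Hypothetical sketch: build CSC data/indices/indptr in column blocks."""
    nrow, ncol = dense.shape
    # How many whole columns fit in the buffer (always at least one).
    bytes_per_column = max(nrow * dense.itemsize, 1)
    block = max(int(buffer_size // bytes_per_column), 1)
    data, indices, indptr = [], [], [0]
    for start in range(0, ncol, block):
        # Only this block of columns is processed at a time.
        columns = dense[:, start:start + block]
        for j in range(columns.shape[1]):
            rows = numpy.flatnonzero(columns[:, j])  # row indices of non-zeros
            indices.append(rows)
            data.append(columns[rows, j])
            indptr.append(indptr[-1] + len(rows))
    return (
        numpy.concatenate(data) if data else numpy.empty(0, dtype=dense.dtype),
        numpy.concatenate(indices) if indices else numpy.empty(0, dtype=numpy.intp),
        numpy.array(indptr),
    )
```

Shrinking buffer_size changes only how many columns are held in memory at once, not the resulting arrays: more, smaller blocks mean more loop iterations (and, in a real writer, more I/O calls) but a smaller memory footprint.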
dolomite_matrix.save_delayed_array module
- dolomite_matrix.save_delayed_array.save_delayed_array(x, path, delayed_array_preserve_operations=False, **kwargs)
Method for saving DelayedArray objects to disk; see save_object() for details.
If the array is pristine, we attempt to use the save_object method of the seed. Otherwise, if delayed_array_preserve_operations = False, we save the DelayedArray as a dense array or a compressed sparse matrix.
- Parameters:
x (DelayedArray) – Object to be saved.
path (str) – Path to a directory in which to save x.
delayed_array_preserve_operations (bool) – Whether to preserve delayed operations via the chihaya specification. Currently not supported.
kwargs – Further arguments, passed to the save_object methods for dense arrays and compressed sparse matrices.
- Returns:
x is saved to path.
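The decision described above can be sketched as a small function. choose_save_route is hypothetical; it assumes that "pristine" means no delayed operations have been applied and that sparsity decides between the two fallback formats. It does not model what happens if the seed's save_object attempt fails.

```python
def choose_save_route(pristine: bool, sparse: bool, preserve_operations: bool = False) -> str:
    """Hypothetical sketch of how a DelayedArray's output format might be chosen."""
    if pristine:
        # Assumed: no delayed operations, so delegate to the seed's save_object.
        return "seed"
    if preserve_operations:
        # Saving via the chihaya specification is documented as unsupported.
        raise NotImplementedError("preserving delayed operations is not supported")
    # Assumption: sparsity decides between the two fallback formats.
    return "compressed_sparse_matrix" if sparse else "dense_array"


print(choose_save_route(pristine=False, sparse=True))  # compressed_sparse_matrix
```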
dolomite_matrix.save_dense_array module
- dolomite_matrix.save_dense_array.save_dense_array_from_ndarray(x, path, dense_array_chunk_dimensions=None, dense_array_chunk_args={}, dense_array_buffer_size=100000000.0, dense_array_string_vls=False, **kwargs)
Method for saving ndarray objects to disk; see save_object() for details.
- Parameters:
x (ndarray) – Object to be saved.
path (str) – Path to a directory in which to save x.
dense_array_chunk_dimensions (Optional[Tuple[int, ...]]) – Chunk dimensions for the HDF5 dataset. Larger values improve compression at the potential cost of reducing random access efficiency. If not provided, chunk dimensions are chosen with choose_chunk_dimensions().
dense_array_chunk_args (Dict) – Arguments to pass to choose_chunk_dimensions() if dense_array_chunk_dimensions is not provided.
dense_array_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
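To make the buffer-size trade-off concrete, here is a sketch of blockwise writing along the first dimension. write_blockwise is a hypothetical helper; out is an in-memory stand-in for an HDF5 dataset (h5py datasets accept the same slice assignment), and iterating over the first dimension is an assumption rather than the order the package necessarily uses.

```python
from math import prod

import numpy


def write_blockwise(x: numpy.ndarray, out, buffer_size: float = 1e8) -> int:
    """Hypothetical sketch: copy `x` into `out` in buffer-sized blocks of rows.

    Returns the number of blocks written.
    """
    # Bytes needed to hold one slice along the first dimension.
    row_bytes = max(prod(x.shape[1:]) * x.itemsize, 1)
    # Rows per block, so each block fits in the buffer (at least one row).
    step = max(int(buffer_size // row_bytes), 1)
    blocks = 0
    for start in range(0, x.shape[0], step):
        out[start:start + step] = x[start:start + step]
        blocks += 1
    return blocks
```

For a 5 by 4 float64 array, each row takes 32 bytes, so a 64-byte buffer writes two rows per block (three blocks in total), whereas the default buffer writes everything in a single block.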