dolomite_matrix package¶
Submodules¶
dolomite_matrix.DelayedMask module¶
- class dolomite_matrix.DelayedMask.DelayedMask(seed, placeholder, dtype=None)[source]¶
Bases: DelayedOp
Delayed mask to replace the missing value placeholder with a NumPy masked array.
- property placeholder¶
Returns: The placeholder value.
- property seed¶
Returns: The seed object.
- dolomite_matrix.DelayedMask.extract_dense_array_DelayedMask(x, subset)[source]¶
See extract_dense_array().
dolomite_matrix.ReloadedArray module¶
- class dolomite_matrix.ReloadedArray.ReloadedArray(seed, path)[source]¶
Bases: DelayedArray
An array that was reloaded from disk by the read_object() function, and remembers the path from which it was loaded. This class allows methods to refer to the existing on-disk representation by inspecting the path. For example, save_object() can just copy/link to the existing files instead of repeating the saving process.
- class dolomite_matrix.ReloadedArray.ReloadedArraySeed(seed, path)[source]¶
Bases: WrapperArraySeed
Seed for the ReloadedArray class. This is a subclass of WrapperArraySeed.
- dolomite_matrix.ReloadedArray.save_object_ReloadedArray(x, path, reloaded_array_reuse_mode='link', **kwargs)[source]¶
Method for saving ReloadedArray objects to disk, see save_object() for details.
- Parameters:
x (ReloadedArray) – Object to be saved.
path (str) – Path to a directory to save x.
reloaded_array_reuse_mode (str) – How the files in x.path should be re-used when populating path. This can be "link", to create a hard link to each file; "symlink", to create a symbolic link to each file; "copy", to create a copy of each file; or "none", to perform a fresh save of x without relying on x.path.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
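The four reuse modes described above correspond to standard filesystem operations. The following is a minimal sketch of that correspondence using only the Python standard library. The helper name `reuse_files` is hypothetical and is not part of dolomite_matrix; the real method may treat nested directories, existing destinations, or failures differently.

```python
import os
import shutil


def reuse_files(src_dir, dest_dir, mode="link"):
    """Populate dest_dir from src_dir according to a reuse mode.

    Hypothetical helper, not part of dolomite_matrix. Returns True if the
    existing files were reused, or False if the caller should perform a
    fresh save instead (mode "none").
    """
    if mode == "none":
        return False
    if mode not in ("link", "symlink", "copy"):
        raise ValueError(f"unknown reuse mode: {mode!r}")

    for root, _dirs, files in os.walk(src_dir):
        # Mirror the directory layout of the source under the destination.
        rel = os.path.relpath(root, src_dir)
        target_root = os.path.normpath(os.path.join(dest_dir, rel))
        os.makedirs(target_root, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dest = os.path.join(target_root, name)
            if mode == "link":
                os.link(src, dest)  # hard link: no extra storage used
            elif mode == "symlink":
                os.symlink(os.path.abspath(src), dest)  # points at the original
            else:
                shutil.copy2(src, dest)  # independent copy of the contents
    return True
```

Hard links require the source and destination to be on the same filesystem, which is one reason a caller might choose "symlink" or "copy" instead.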
dolomite_matrix.WrapperArraySeed module¶
- class dolomite_matrix.WrapperArraySeed.WrapperArraySeed(seed)[source]¶
Bases: object
Wrapper for a DelayedArray seed, which forwards all of the required operations to the seed object. This is expected to be used as a base for concrete subclasses that attach more provenance-tracking information; see ReloadedArray for an example.
- property seed¶
Returns: The underlying seed instance.
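The forwarding pattern can be illustrated with a small, self-contained sketch. This is not the package's implementation: dolomite_matrix forwards operations through module-level functions registered for the wrapper class, whereas this sketch swaps in plain attribute delegation via `__getattr__`. The names `ListSeed`, `ForwardingSeed`, and `PathTrackingSeed` are hypothetical.

```python
class ListSeed:
    """Toy seed holding a 2D list; stands in for a real array seed."""

    def __init__(self, rows):
        self._rows = rows

    @property
    def shape(self):
        return (len(self._rows), len(self._rows[0]) if self._rows else 0)

    def row(self, i):
        return list(self._rows[i])


class ForwardingSeed:
    """Hypothetical wrapper that delegates unknown attributes to its seed."""

    def __init__(self, seed):
        self._seed = seed

    @property
    def seed(self):
        return self._seed

    def __getattr__(self, name):
        # Only called when normal lookup fails, so wrapper attributes win.
        # Private names are never forwarded, which avoids infinite recursion
        # if _seed has not been set yet.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._seed, name)


class PathTrackingSeed(ForwardingSeed):
    """Concrete subclass that attaches provenance (the source path)."""

    def __init__(self, seed, path):
        super().__init__(seed)
        self._path = path

    @property
    def path(self):
        return self._path
```

A `PathTrackingSeed` then behaves like its underlying seed for `shape` and `row()`, while additionally exposing `path`.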
- dolomite_matrix.WrapperArraySeed.chunk_grid_WrapperArraySeed(x)[source]¶
See chunk_grid() for details.
- dolomite_matrix.WrapperArraySeed.create_dask_array_WrapperArraySeed(x)[source]¶
See create_dask_array() for details.
- dolomite_matrix.WrapperArraySeed.extract_dense_array_WrapperArraySeed(x, subset)[source]¶
See extract_dense_array() for details.
- dolomite_matrix.WrapperArraySeed.extract_sparse_array_WrapperArraySeed(x, subset)[source]¶
See extract_sparse_array() for details.
- dolomite_matrix.WrapperArraySeed.is_masked_WrapperArraySeed(x)[source]¶
See is_masked() for details.
- dolomite_matrix.WrapperArraySeed.is_sparse_WrapperArraySeed(x)[source]¶
See is_sparse() for details.
dolomite_matrix.choose_chunk_dimensions module¶
- dolomite_matrix.choose_chunk_dimensions.choose_chunk_dimensions(shape, size, min_extent=100, buffer_size=10000000.0)[source]¶
Choose chunk dimensions to use for a dense HDF5 dataset. For each dimension, we consider a slice of the array that consists of the full extent of all other dimensions. We want this slice to occupy less than buffer_size in memory, and we resize the slice along the current dimension to achieve this. The chunk size is then chosen as the size of the slice along the current dimension. This ensures that efficient iteration along each dimension will not use any more than buffer_size bytes.
- Parameters:
shape – Shape of the array.
size (int) – Size of each array element in bytes.
min_extent (int) – Minimum extent of each chunk dimension, to avoid problems with excessively small chunk sizes when the data is large.
buffer_size (int) – Size of the (conceptual) memory buffer to use for storing blocks of data during iteration through the array, in bytes.
- Returns:
Tuple containing the chunk dimensions.
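The procedure described above can be sketched in a few lines of plain Python. This is an illustrative re-derivation from the description, not the library's source: the function name `sketch_chunk_dimensions` is hypothetical, and the clamping order (raise to at least `min_extent`, then cap at the array's own extent) is an assumption.

```python
def sketch_chunk_dimensions(shape, size, min_extent=100, buffer_size=10_000_000):
    """Hypothetical re-derivation of the chunk choice described above."""
    # Number of elements that fit in the conceptual buffer.
    max_elements = int(buffer_size // size)
    chunks = []
    for d, extent in enumerate(shape):
        # Elements in a unit-thick slice spanning all other dimensions.
        other = 1
        for d2, extent2 in enumerate(shape):
            if d2 != d:
                other *= extent2
        # Thickness along dimension d that keeps the slice within the buffer.
        proposed = max_elements // max(other, 1)
        # Assumed clamping: at least min_extent, but never beyond the array.
        proposed = min(max(proposed, min_extent), extent)
        chunks.append(proposed)
    return tuple(chunks)
```

For a 1000 x 2000 array of 8-byte values and a buffer of 10,000,000 bytes, the buffer holds 1,250,000 elements, so the sketch proposes chunks of 625 x 1250. Note that when `min_extent` takes effect, a slice can exceed the buffer; the minimum exists to avoid very small chunks, as described in the parameter list.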
dolomite_matrix.read_compressed_sparse_matrix module¶
- dolomite_matrix.read_compressed_sparse_matrix.read_compressed_sparse_matrix(path, metadata, **kwargs)[source]¶
Read a compressed sparse matrix from its on-disk representation. In general, this function should not be called directly but instead be dispatched via read_object().
- Returns:
A ReloadedArray containing an HDF5-backed compressed sparse matrix as a seed.
dolomite_matrix.read_dense_array module¶
- dolomite_matrix.read_dense_array.read_dense_array(path, metadata, **kwargs)[source]¶
Read a dense array from its on-disk representation. In general, this function should not be called directly but instead be dispatched via read_object().
- Returns:
A ReloadedArray containing an HDF5-backed dense array as a seed.
dolomite_matrix.save_compressed_sparse_matrix module¶
- dolomite_matrix.save_compressed_sparse_matrix.save_compresssed_sparse_matrix_from_Sparse2darray(x, path, compressed_sparse_matrix_chunk_size=10000, compressed_sparse_matrix_buffer_size=100000000.0, **kwargs)[source]¶
Method for saving a SparseNdarray to disk, see save_object() for details.
- Parameters:
x (SparseNdarray) – Object to be saved.
path (str) – Path to a directory to save x.
compressed_sparse_matrix_chunk_size (int) – Chunk size for the data and indices. Larger values improve compression at the potential cost of reducing random access efficiency.
compressed_sparse_matrix_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_matrix.save_compressed_sparse_matrix.save_compresssed_sparse_matrix_from_scipy_coo_matrix(x, path, compressed_sparse_matrix_chunk_size=10000, compressed_sparse_matrix_buffer_size=100000000.0, **kwargs)[source]¶
Method for saving coo_matrix objects to disk, see save_object() for details.
- Parameters:
x (coo_matrix) – Matrix to be saved.
path (str) – Path to a directory to save x.
compressed_sparse_matrix_chunk_size (int) – Chunk size for the data and indices. Larger values improve compression at the potential cost of reducing random access efficiency.
compressed_sparse_matrix_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_matrix.save_compressed_sparse_matrix.save_compresssed_sparse_matrix_from_scipy_csc_matrix(x, path, compressed_sparse_matrix_chunk_size=10000, compressed_sparse_matrix_buffer_size=100000000.0, **kwargs)[source]¶
Method for saving csc_matrix objects to disk, see save_object() for details.
- Parameters:
x (csc_matrix) – Matrix to be saved.
path (str) – Path to a directory to save x.
compressed_sparse_matrix_chunk_size (int) – Chunk size for the data and indices. Larger values improve compression at the potential cost of reducing random access efficiency.
compressed_sparse_matrix_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
- dolomite_matrix.save_compressed_sparse_matrix.save_compresssed_sparse_matrix_from_scipy_csr_matrix(x, path, compressed_sparse_matrix_chunk_size=10000, compressed_sparse_matrix_buffer_size=100000000.0, **kwargs)[source]¶
Method for saving csr_matrix objects to disk, see save_object() for details.
- Parameters:
x (csr_matrix) – Matrix to be saved.
path (str) – Path to a directory to save x.
compressed_sparse_matrix_chunk_size (int) – Chunk size for the data and indices. Larger values improve compression at the potential cost of reducing random access efficiency.
compressed_sparse_matrix_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.
dolomite_matrix.save_delayed_array module¶
- dolomite_matrix.save_delayed_array.save_delayed_array(x, path, delayed_array_preserve_operations=False, **kwargs)[source]¶
Method to save DelayedArray objects to disk, see save_object() for details.
If the array is pristine, we attempt to use the save_object method of the seed. If delayed_array_preserve_operations = False, we save the DelayedArray as a dense array or a compressed sparse matrix.
- Parameters:
x (DelayedArray) – Object to be saved.
path (str) – Path to a directory to save x.
delayed_array_preserve_operations (bool) – Whether to preserve delayed operations via the chihaya specification. Currently not supported.
kwargs – Further arguments, passed to the save_object methods for dense arrays and compressed sparse matrices.
- Returns:
x is saved to path.
dolomite_matrix.save_dense_array module¶
- dolomite_matrix.save_dense_array.save_dense_array_from_ndarray(x, path, dense_array_chunk_dimensions=None, dense_array_chunk_args={}, dense_array_buffer_size=100000000.0, **kwargs)[source]¶
Method for saving ndarray objects to disk, see save_object() for details.
- Parameters:
x (ndarray) – Object to be saved.
path (str) – Path to a directory to save x.
dense_array_chunk_dimensions (Optional[Tuple[int, ...]]) – Chunk dimensions for the HDF5 dataset. Larger values improve compression at the potential cost of reducing random access efficiency. If not provided, we choose some chunk sizes with choose_chunk_dimensions().
dense_array_chunk_args (Dict) – Arguments to pass to choose_chunk_dimensions if dense_array_chunk_dimensions is not provided.
dense_array_buffer_size (int) – Size of the buffer in bytes, for blockwise processing and writing to file. Larger values improve speed at the cost of memory.
kwargs – Further arguments, ignored.
- Returns:
x is saved to path.