Source code for dolomite_matrix.choose_chunk_dimensions
from typing import Tuple


def choose_chunk_dimensions(
    shape: Tuple[int, ...],
    size: int,
    min_extent: int = 100,
    buffer_size: int = 10000000,
) -> Tuple[int, ...]:
    """Choose chunk dimensions to use for a dense HDF5 dataset.

    For each dimension, we consider a slice of the array that consists of the
    full extent of all other dimensions. We want this slice to occupy no more
    than ``buffer_size`` bytes in memory, and we resize the slice along the
    current dimension to achieve this. The chunk size is then chosen as the
    size of the slice along the current dimension. This ensures that efficient
    iteration along each dimension will not use any more than ``buffer_size``
    bytes, unless ``min_extent`` forces a larger chunk.

    Args:
        shape:
            Shape of the array.

        size:
            Size of each array element in bytes.

        min_extent:
            Minimum extent of each chunk dimension, to avoid problems with
            excessively small chunk sizes when the data is large.

        buffer_size:
            Size of the (conceptual) memory buffer to use for storing blocks
            of data during iteration through the array, in bytes.

    Returns:
        Tuple containing the chunk dimensions.
    """
    # Number of array elements that fit into the buffer.
    num_elements = int(buffer_size / size)
    chunks = []

    for d, s in enumerate(shape):
        # Product of the extents of all dimensions other than 'd'. We just
        # calculate it again for each dimension to avoid overflow issues.
        otherdim = 1
        for d2, s2 in enumerate(shape):
            if d2 != d:
                otherdim *= s2

        # Extent along 'd' such that the slice fits into the buffer, capped at
        # the dimension's extent and raised to 'min_extent' if it is too small.
        proposed = int(num_elements / otherdim)
        if proposed > s:
            proposed = s
        elif proposed < min_extent:
            proposed = min_extent

        chunks.append(proposed)

    return (*chunks,)
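
The arithmetic described in the docstring can be checked on a few concrete shapes. The snippet below is a minimal sketch, not part of the package: `sketch_chunk_dimensions` is a hypothetical, compact restatement of the same rule so that the example is self-contained, and it assumes every extent in `shape` is positive.

```python
from math import prod
from typing import Tuple


def sketch_chunk_dimensions(
    shape: Tuple[int, ...],
    size: int,
    min_extent: int = 100,
    buffer_size: int = 10_000_000,
) -> Tuple[int, ...]:
    """Hypothetical restatement of the chunking rule, for illustration only."""
    # Number of elements that fit into the conceptual buffer.
    num_elements = buffer_size // size
    chunks = []
    for d, s in enumerate(shape):
        # Product of the extents of every dimension except 'd'
        # (assumes all extents are positive, so this is never zero).
        otherdim = prod(s2 for d2, s2 in enumerate(shape) if d2 != d)
        proposed = num_elements // otherdim
        if proposed > s:
            proposed = s  # never exceed the extent when the slice already fits
        elif proposed < min_extent:
            proposed = min_extent  # avoid excessively small chunks
        chunks.append(proposed)
    return tuple(chunks)


# 10000 x 5000 matrix of 8-byte values: 1,250,000 elements fit in the buffer,
# so a slice spans 1250000 // 5000 rows or 1250000 // 10000 columns.
print(sketch_chunk_dimensions((10000, 5000), size=8))  # (250, 125)

# Small array: each proposal exceeds the extent, so the array is one chunk.
print(sketch_chunk_dimensions((50, 20), size=8))  # (50, 20)

# Very large array: the proposal falls to 1 and is raised to min_extent.
print(sketch_chunk_dimensions((1_000_000, 1_000_000), size=8))  # (100, 100)
```

In the first case, a slice of 250 full rows (or 125 full columns) occupies exactly 10,000,000 bytes, matching the default `buffer_size`. In the last case the `min_extent` floor takes priority, so a slice of 100 full rows is larger than the buffer; that trade-off follows from the rule as written.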