ritsuko
Helper utilities for ArtifactDB C++ code
ritsuko Namespace Reference

Assorted helper functions for parsing and validation. More...

Namespaces

namespace  hdf5
 Assorted helper functions for HDF5 parsing.
 

Classes

struct  FloatExtremes
 Extremes for float types. More...
 
struct  IntegerExtremes
 Extremes for integer types. More...
 
struct  Version
 Version number. More...
 

Functions

template<class Iterator, class Mask, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
std::pair< bool, Type_ > choose_missing_integer_placeholder (Iterator start, Iterator end, Mask mask)
 
template<class Iterator, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
std::pair< bool, Type_ > choose_missing_integer_placeholder (Iterator start, Iterator end)
 
template<class Iterator, class Mask, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
std::pair< bool, Type_ > choose_missing_float_placeholder (Iterator start, Iterator end, Mask mask, bool skip_nan)
 
template<class Iterator, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
std::pair< bool, Type_ > choose_missing_float_placeholder (Iterator start, Iterator end, bool skip_nan=false)
 
template<class Iterator, class Mask, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
IntegerExtremes find_integer_extremes (Iterator start, Iterator end, Mask mask)
 
template<class Iterator, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
IntegerExtremes find_integer_extremes (Iterator start, Iterator end)
 
template<class Iterator, class Mask, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
FloatExtremes find_float_extremes (Iterator start, Iterator end, Mask mask, bool skip_nan)
 
template<class Iterator, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
FloatExtremes find_float_extremes (Iterator start, Iterator end, bool skip_nan=false)
 
bool is_date_prefix (const char *ptr)
 
bool is_date (const char *ptr, size_t len)
 
bool is_rfc3339_suffix (const char *ptr, size_t len)
 
bool is_rfc3339 (const char *ptr, size_t len)
 
Version parse_version_string (const char *version_string, size_t size, bool skip_patch=false)
 
double r_missing_value ()
 
template<typename Float_ >
bool are_floats_identical (const Float_ *x, const Float_ *y)
 

Detailed Description

Assorted helper functions for parsing and validation.

Function Documentation

◆ are_floats_identical()

template<typename Float_ >
bool ritsuko::are_floats_identical ( const Float_ * x,
const Float_ * y )

Check for identical floating-point numbers, including NaN status and the payload.

Template Parameters
Float_  Floating-point type.
Parameters
x  Pointer to a floating-point value.
y  Pointer to another floating-point value.
Returns
Whether or not x and y have identical bit patterns.
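A bitwise comparison differs from operator== in two ways: NaNs with the same payload count as identical, and 0.0 and -0.0 do not. The following is a minimal sketch of that idea, assuming a byte-wise comparison with std::memcmp; floats_identical_sketch is a hypothetical name and this is not necessarily how ritsuko implements the check. It is intended for float and double; types with padding bytes (such as some long double layouts) would need more care.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Hypothetical stand-in for ritsuko::are_floats_identical(); not the library's code.
template<typename Float_>
bool floats_identical_sketch(const Float_* x, const Float_* y) {
    assert(x != nullptr && y != nullptr);
    // Compare the raw bytes so that NaN payloads and the sign of zero are respected.
    return std::memcmp(x, y, sizeof(Float_)) == 0;
}
```

With this sketch, two copies of the same NaN compare as identical even though a == b is false for them, while 0.0 and -0.0 compare as different even though they are equal under operator==.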

◆ choose_missing_float_placeholder() [1/2]

template<class Iterator, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
std::pair< bool, Type_ > ritsuko::choose_missing_float_placeholder ( Iterator start,
Iterator end,
bool skip_nan = false )

Overload of choose_missing_float_placeholder() where no values are masked.

Template Parameters
Iterator  Forward iterator for floating-point values.
Type_  Float type pointed to by Iterator.
Parameters
start  Start of the dataset.
end  End of the dataset.
skip_nan  Whether to skip NaN as a potential placeholder.
Returns
Pair containing (i) a boolean indicating whether a placeholder was successfully found, and (ii) the chosen placeholder if the previous boolean is true.

◆ choose_missing_float_placeholder() [2/2]

template<class Iterator, class Mask, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
std::pair< bool, Type_ > ritsuko::choose_missing_float_placeholder ( Iterator start,
Iterator end,
Mask mask,
bool skip_nan )

Choose an appropriate placeholder for missing values in a floating-point dataset, after ignoring all masked values. This will try the various IEEE special values (NaN, Inf, -Inf) and then some type-specific boundaries (the minimum, the maximum, and for signed types, 0) before sorting the dataset and searching for an unused float.

Template Parameters
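Iterator  Forward iterator for floating-point values.
Mask  Random access iterator for mask values.
Type_  Float type pointed to by Iterator.
Parameters
start  Start of the dataset.
end  End of the dataset.
mask  Start of the mask vector. This should have the same length as end - start; each entry is true if the corresponding value of the float dataset is masked, and false otherwise.
skip_nan  Whether to skip NaN as a potential placeholder. Useful in frameworks like R that need special consideration of NaN payloads.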
Returns
Pair containing (i) a boolean indicating whether a placeholder was successfully found, and (ii) the chosen placeholder if the previous boolean is true.
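The search order described above can be illustrated with a small sketch. It assumes IEEE 754 doubles, takes a std::vector instead of an iterator pair, and omits the mask for brevity; choose_float_placeholder_sketch is a hypothetical name, and the exact candidate order and the std::nextafter fallback are assumptions rather than ritsuko's actual implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <initializer_list>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical sketch, assuming IEEE 754 doubles and no mask; not ritsuko's code.
inline std::pair<bool, double> choose_float_placeholder_sketch(std::vector<double> values, bool skip_nan = false) {
    using lim = std::numeric_limits<double>;
    static_assert(lim::is_iec559, "sketch assumes IEEE 754 doubles");
    auto is_nan = [](double x) { return std::isnan(x); };

    // NaN never compares equal to anything, so it needs its own test.
    bool has_nan = std::any_of(values.begin(), values.end(), is_nan);
    if (!skip_nan && !has_nan) return {true, lim::quiet_NaN()};

    // Drop NaNs so that the remaining values can be sorted and searched.
    values.erase(std::remove_if(values.begin(), values.end(), is_nan), values.end());
    std::sort(values.begin(), values.end());
    auto is_used = [&](double x) { return std::binary_search(values.begin(), values.end(), x); };

    // Infinities first, then the type's boundaries and zero.
    for (double candidate : {lim::infinity(), -lim::infinity(), lim::lowest(), lim::max(), 0.0}) {
        if (!is_used(candidate)) return {true, candidate};
    }

    // Fall back to the first representable double that sits in a gap between sorted values.
    for (std::size_t i = 1; i < values.size(); ++i) {
        double next = std::nextafter(values[i - 1], lim::infinity());
        if (next < values[i]) return {true, next};
    }
    return {false, 0.0};
}
```

In this sketch, a dataset without NaN yields NaN as the placeholder unless skip_nan is true, in which case positive infinity is the next candidate.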

◆ choose_missing_integer_placeholder() [1/2]

template<class Iterator, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
std::pair< bool, Type_ > ritsuko::choose_missing_integer_placeholder ( Iterator start,
Iterator end )

Overload of choose_missing_integer_placeholder() where no values are masked.

Template Parameters
Iterator  Forward iterator for integer values.
Type_  Integer type pointed to by Iterator.
Parameters
start  Start of the dataset.
end  End of the dataset.
Returns
Pair containing (i) a boolean indicating whether a placeholder was successfully found, and (ii) the chosen placeholder if the previous boolean is true.

◆ choose_missing_integer_placeholder() [2/2]

template<class Iterator, class Mask, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
std::pair< bool, Type_ > ritsuko::choose_missing_integer_placeholder ( Iterator start,
Iterator end,
Mask mask )

Choose an appropriate placeholder for missing values in an integer dataset, after ignoring all the masked values. This will try the various special values (the minimum, the maximum, and for signed types, 0) before sorting the dataset and searching for an unused integer value.

Template Parameters
Iterator  Forward iterator for integer values.
Mask  Random access iterator for mask values.
Type_  Integer type pointed to by Iterator.
Parameters
start  Start of the dataset.
end  End of the dataset.
mask  Start of the mask vector. This should have the same length as end - start; each entry is true if the corresponding value of the integer dataset is masked, and false otherwise.
Returns
Pair containing (i) a boolean indicating whether a placeholder was successfully found, and (ii) the chosen placeholder if the previous boolean is true.
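To make the search order concrete, here is a minimal sketch that tries the minimum, the maximum and (for signed types) 0, then looks for a gap in the sorted unmasked values. It uses std::vector arguments rather than iterators, and choose_integer_placeholder_sketch is a hypothetical name; the details are assumptions and may differ from ritsuko's implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical sketch of the search order described above; not ritsuko's code.
template<typename Type_>
std::pair<bool, Type_> choose_integer_placeholder_sketch(const std::vector<Type_>& values, const std::vector<bool>& mask) {
    assert(values.size() == mask.size());

    // Keep only the unmasked values, sorted for fast lookup.
    std::vector<Type_> kept;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (!mask[i]) kept.push_back(values[i]);
    }
    std::sort(kept.begin(), kept.end());
    auto is_used = [&](Type_ x) { return std::binary_search(kept.begin(), kept.end(), x); };

    // Special values first; 0 is only distinct from the minimum for signed types.
    const Type_ lowest = std::numeric_limits<Type_>::min();
    const Type_ highest = std::numeric_limits<Type_>::max();
    if (!is_used(lowest)) return {true, lowest};
    if (!is_used(highest)) return {true, highest};
    if (std::numeric_limits<Type_>::is_signed && !is_used(0)) return {true, Type_(0)};

    // Fall back to the first gap between consecutive sorted values.
    for (std::size_t i = 1; i < kept.size(); ++i) {
        if (kept[i - 1] < highest && static_cast<Type_>(kept[i - 1] + 1) < kept[i]) {
            return {true, static_cast<Type_>(kept[i - 1] + 1)};
        }
    }
    return {false, Type_(0)};
}
```

The search only fails when every representable value of Type_ appears among the unmasked values, which in practice is only plausible for narrow types such as 8-bit integers.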

◆ find_float_extremes() [1/2]

template<class Iterator, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
FloatExtremes ritsuko::find_float_extremes ( Iterator start,
Iterator end,
bool skip_nan = false )

Overload of find_float_extremes() where no values are masked.

Template Parameters
Iterator  Forward iterator for float values.
Type_  Float type pointed to by Iterator.
Parameters
start  Start of the dataset.
end  End of the dataset.
skip_nan  Whether to skip searches for NaN. Useful in frameworks like R that need special consideration of NaN payloads.
Returns
Whether extreme values are present in [start, end).

◆ find_float_extremes() [2/2]

template<class Iterator, class Mask, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
FloatExtremes ritsuko::find_float_extremes ( Iterator start,
Iterator end,
Mask mask,
bool skip_nan )

Check for the presence of extreme values in a floating-point dataset. This can be used to choose a missing placeholder value in an online fashion, by calling this function on blocks of the dataset; if any of the extreme values are absent from all blocks, they can be used as the missing value placeholder. By contrast, choose_missing_float_placeholder() requires access to the full dataset.

Template Parameters
Iterator  Forward iterator for float values.
Mask  Random access iterator for mask values.
Type_  Float type pointed to by Iterator.
Parameters
start  Start of the dataset.
end  End of the dataset.
mask  Start of the mask vector. This should have the same length as end - start; each entry is true if the corresponding value of the float dataset is masked, and false otherwise.
skip_nan  Whether to skip searches for NaN. Useful in frameworks like R that need special consideration of NaN payloads.
Returns
Whether extreme values are present in [start, end). If skip_nan = true, FloatExtremes::has_nan is set to false and should be ignored. If Type_ is not an IEEE754-compliant float, users should ignore FloatExtremes::has_nan, FloatExtremes::has_negative_inf and FloatExtremes::has_positive_inf.

◆ find_integer_extremes() [1/2]

template<class Iterator, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
IntegerExtremes ritsuko::find_integer_extremes ( Iterator start,
Iterator end )

Overload of find_integer_extremes() where no values are masked.

Template Parameters
Iterator  Forward iterator for integer values.
Type_  Integer type pointed to by Iterator.
Parameters
start  Start of the dataset.
end  End of the dataset.
Returns
Whether extreme values are present in [start, end).

◆ find_integer_extremes() [2/2]

template<class Iterator, class Mask, class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type>
IntegerExtremes ritsuko::find_integer_extremes ( Iterator start,
Iterator end,
Mask mask )

Check for the presence of extreme values in an integer dataset. This can be used to choose a missing placeholder value in an online fashion, by calling this function on blocks of the dataset; if any of the extreme values are absent from all blocks, they can be used as the missing value placeholder. By contrast, choose_missing_integer_placeholder() requires access to the full dataset.

Template Parameters
Iterator  Forward iterator for integer values.
Mask  Random access iterator for mask values.
Type_  Integer type pointed to by Iterator.
Parameters
start  Start of the dataset.
end  End of the dataset.
mask  Start of the mask vector. This should have the same length as end - start; each entry is true if the corresponding value of the integer dataset is masked, and false otherwise.
Returns
Whether extreme values are present in [start, end). If Type_ is unsigned, IntegerExtremes::has_lowest and IntegerExtremes::has_zero are the same.
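The online workflow described above can be sketched as follows: scan each block, accumulate which extremes have been seen, and pick a placeholder at the end. ExtremesSketch is a hypothetical stand-in for IntegerExtremes; has_lowest and has_zero are named in the text above, while has_highest and the two helper functions are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <utility>
#include <vector>

// Hypothetical stand-in for ritsuko::IntegerExtremes; has_highest is an assumed name.
struct ExtremesSketch {
    bool has_lowest = false;
    bool has_highest = false;
    bool has_zero = false;
};

// Scan one block and fold its findings into the running summary.
template<typename Type_>
void update_extremes(const std::vector<Type_>& block, ExtremesSketch& summary) {
    for (Type_ x : block) {
        if (x == std::numeric_limits<Type_>::min()) summary.has_lowest = true;
        if (x == std::numeric_limits<Type_>::max()) summary.has_highest = true;
        if (x == 0) summary.has_zero = true;
    }
}

// After every block has been scanned, any extreme that never appeared is free to use.
template<typename Type_>
std::pair<bool, Type_> pick_placeholder(const ExtremesSketch& summary) {
    if (!summary.has_lowest) return {true, std::numeric_limits<Type_>::min()};
    if (!summary.has_highest) return {true, std::numeric_limits<Type_>::max()};
    if (!summary.has_zero) return {true, Type_(0)};
    return {false, Type_(0)};
}
```

Unlike the full-dataset search, this approach cannot fall back to hunting for a gap, so it reports failure whenever all three extremes occur somewhere in the data.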

◆ is_date()

bool ritsuko::is_date ( const char * ptr,
size_t len )
inline
Parameters
[in] ptr  Pointer to a C-style string.
len  Length of the string referenced by ptr, excluding the null terminator.
Returns
Whether ptr refers to a XXXX-YY-ZZ date, for approximately valid combinations of YY and ZZ (see is_date_prefix() for details).

◆ is_date_prefix()

bool ritsuko::is_date_prefix ( const char * ptr)
inline

Does a string start with a XXXX-YY-ZZ date, for approximately valid combinations of YY and ZZ? (This is only approximate as we do not check the exact correctness of the number of days for each month.)

Parameters
[in] ptr  Pointer to a C-style string containing at least 10 characters.
Returns
Whether or not the string starts with a date.
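As an illustration of what an approximate check can look like, the sketch below validates the XXXX-YY-ZZ layout and accepts any month from 01 to 12 and any day from 01 to 31, regardless of the month. is_date_prefix_sketch is a hypothetical name and these bounds are assumptions; the exact rules applied by ritsuko may differ.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical sketch of an approximate XXXX-YY-ZZ check; not ritsuko's code.
// Assumes ptr refers to at least 10 readable characters.
inline bool is_date_prefix_sketch(const char* ptr) {
    assert(ptr != nullptr);
    auto digit = [&](int i) { return std::isdigit(static_cast<unsigned char>(ptr[i])) != 0; };
    auto two = [&](int i) { return (ptr[i] - '0') * 10 + (ptr[i + 1] - '0'); };

    // Layout check: four digits, dash, two digits, dash, two digits.
    if (!digit(0) || !digit(1) || !digit(2) || !digit(3) || ptr[4] != '-') return false;
    if (!digit(5) || !digit(6) || ptr[7] != '-' || !digit(8) || !digit(9)) return false;

    // Approximate range check: the day is not validated against the month.
    int month = two(5);
    int day = two(8);
    return month >= 1 && month <= 12 && day >= 1 && day <= 31;
}
```

Under these assumptions, a string such as "2023-02-31" passes even though February never has 31 days, which is the kind of imprecision the description above refers to.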

◆ is_rfc3339()

bool ritsuko::is_rfc3339 ( const char * ptr,
size_t len )
inline

Does a string follow the RFC3339 format? This uses is_date_prefix() and is_rfc3339_suffix() to check the date and the rest of the timestamp, respectively.

Parameters
[in] ptr  Pointer to a C-style string.
len  Length of the string in ptr.
Returns
Whether or not the string is RFC3339-compliant.

◆ is_rfc3339_suffix()

bool ritsuko::is_rfc3339_suffix ( const char * ptr,
size_t len )
inline

Does a string finish with an RFC3339-compliant timestamp, i.e., does the substring starting at T after the date follow the RFC3339 specification? Note that the timestamp validity checks are only approximate, as the correctness of leap seconds is not currently considered. It is expected that the start of the string up to the T was already validated with is_date_prefix().

Parameters
[in] ptr  Pointer to a character array containing at least 10 characters. This should start at offset 10 (zero-based) of the original string, i.e., at the T in the timestamp.
len  Length of the string in ptr.
Returns
Whether or not the string finishes with an RFC3339-compliant timestamp.
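For illustration, a suffix check of the form THH:MM:SS, optional fractional seconds, then either Z or a numeric offset could be sketched as below. is_rfc3339_suffix_sketch is a hypothetical name; accepting seconds from 00 to 60 (so that a leap second is tolerated but not verified) and hours up to 23 are assumptions of this sketch, not a statement of ritsuko's exact rules.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical sketch for "THH:MM:SS[.fff](Z|+HH:MM|-HH:MM)"; not ritsuko's code.
inline bool is_rfc3339_suffix_sketch(const char* ptr, std::size_t len) {
    auto digit = [&](std::size_t i) { return i < len && std::isdigit(static_cast<unsigned char>(ptr[i])) != 0; };
    auto two = [&](std::size_t i) { return (ptr[i] - '0') * 10 + (ptr[i + 1] - '0'); };

    // The shortest valid suffix is "THH:MM:SSZ", i.e., 10 characters.
    if (len < 10 || ptr[0] != 'T') return false;
    if (!digit(1) || !digit(2) || ptr[3] != ':') return false;
    if (!digit(4) || !digit(5) || ptr[6] != ':') return false;
    if (!digit(7) || !digit(8)) return false;
    // Seconds up to 60 are accepted without checking whether a leap second really occurred.
    if (two(1) > 23 || two(4) > 59 || two(7) > 60) return false;

    std::size_t pos = 9;
    if (ptr[pos] == '.') {
        // Fractional seconds need at least one digit.
        ++pos;
        if (!digit(pos)) return false;
        while (digit(pos)) ++pos;
    }
    if (pos >= len) return false;
    if (ptr[pos] == 'Z') return pos + 1 == len;
    if (ptr[pos] != '+' && ptr[pos] != '-') return false;

    // A numeric offset must be exactly "HH:MM" and end the string.
    if (pos + 6 != len) return false;
    if (!digit(pos + 1) || !digit(pos + 2) || ptr[pos + 3] != ':') return false;
    if (!digit(pos + 4) || !digit(pos + 5)) return false;
    return two(pos + 1) <= 23 && two(pos + 4) <= 59;
}
```

A full-string check would then validate the first 10 characters as a date and pass the remainder, starting at the T, to a function like this one.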

◆ parse_version_string()

Version ritsuko::parse_version_string ( const char * version_string,
size_t size,
bool skip_patch = false )
inline
Parameters
[in] version_string  Pointer to a version string.
size  Length of the version_string.
skip_patch  Whether to skip the patch number.
Returns
A Version object containing the version number. If skip_patch = true, the patch number is always zero.
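The following sketch shows one way such a parser could behave. VersionSketch, its field names and parse_version_sketch are hypothetical; the sketch also assumes that skip_patch = true means the string holds only "major.minor", and that malformed input is reported by throwing std::invalid_argument. Neither assumption is confirmed by the description above, and overflow of very long digit runs is not handled.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>

// Hypothetical stand-in for ritsuko::Version; the field names are assumptions.
struct VersionSketch {
    int major_number = 0;
    int minor_number = 0;
    int patch_number = 0;
};

// Hypothetical sketch of a "major.minor[.patch]" parser; not ritsuko's code.
inline VersionSketch parse_version_sketch(const char* s, std::size_t size, bool skip_patch = false) {
    std::size_t pos = 0;

    // Read one run of decimal digits starting at pos.
    auto read_number = [&]() {
        if (pos >= size || !std::isdigit(static_cast<unsigned char>(s[pos]))) {
            throw std::invalid_argument("expected a digit in version string");
        }
        int value = 0;
        while (pos < size && std::isdigit(static_cast<unsigned char>(s[pos]))) {
            value = value * 10 + (s[pos] - '0');
            ++pos;
        }
        return value;
    };

    // Consume the separator between two components.
    auto expect_dot = [&]() {
        if (pos >= size || s[pos] != '.') {
            throw std::invalid_argument("expected '.' in version string");
        }
        ++pos;
    };

    VersionSketch out;
    out.major_number = read_number();
    expect_dot();
    out.minor_number = read_number();
    if (!skip_patch) {
        expect_dot();
        out.patch_number = read_number();
    }
    if (pos != size) {
        throw std::invalid_argument("trailing characters in version string");
    }
    return out;
}
```

Taking an explicit size rather than relying on a null terminator lets the parser work on substrings and on buffers read directly from a file.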

◆ r_missing_value()

double ritsuko::r_missing_value ( )
inline

Create R's missing value for doubles, allowing us to mimic R's missingness concept in other languages.

Returns
A quiet NaN with a payload of 1954, equivalent to R's double-precision missing value.
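A NaN with a specific payload can be built by writing the bit pattern directly. The sketch below assumes 64-bit IEEE 754 doubles and stores 1954 in the low 32 bits of the mantissa; how the remaining mantissa bits are set is an assumption, chosen here so that the quiet bit is set as in the description above. Both function names are hypothetical, and this is not necessarily how ritsuko constructs the value.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical sketch of a NaN carrying the payload 1954; not ritsuko's code.
inline double r_missing_value_sketch() {
    static_assert(sizeof(double) == sizeof(std::uint64_t), "sketch assumes 64-bit doubles");
    // High word: all exponent bits set plus the quiet bit (assumed). Low word: 1954.
    const std::uint64_t bits = (static_cast<std::uint64_t>(0x7FF80000u) << 32) | 1954u;
    double out;
    std::memcpy(&out, &bits, sizeof(out));
    return out;
}

// Extract the low 32 bits of a double's bit pattern, where the payload is stored.
inline std::uint32_t low_word_sketch(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<std::uint32_t>(bits & 0xFFFFFFFFu);
}
```

Because ordinary comparisons cannot distinguish one NaN from another, code that needs to tell this value apart from other NaNs has to inspect the payload, for example with a bitwise comparison such as are_floats_identical().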