Assorted helper functions for parsing and validation.

Namespaces
namespace	hdf5
	Assorted helper functions for HDF5 parsing.

Classes
struct	FloatExtremes
	Extremes for float types.

struct	IntegerExtremes
	Extremes for integer types.

struct	Version
	Version number.

Functions
template<class Iterator , class Mask , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >
std::pair< bool, Type_ >	choose_missing_integer_placeholder (Iterator start, Iterator end, Mask mask)

template<class Iterator , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >
std::pair< bool, Type_ >	choose_missing_integer_placeholder (Iterator start, Iterator end)

template<class Iterator , class Mask , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >
std::pair< bool, Type_ >	choose_missing_float_placeholder (Iterator start, Iterator end, Mask mask, bool skip_nan)

template<class Iterator , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >
std::pair< bool, Type_ >	choose_missing_float_placeholder (Iterator start, Iterator end, bool skip_nan=false)

template<class Iterator , class Mask , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >
IntegerExtremes	find_integer_extremes (Iterator start, Iterator end, Mask mask)

template<class Iterator , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >
IntegerExtremes	find_integer_extremes (Iterator start, Iterator end)

template<class Iterator , class Mask , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >
FloatExtremes	find_float_extremes (Iterator start, Iterator end, Mask mask, bool skip_nan)

template<class Iterator , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >
FloatExtremes	find_float_extremes (Iterator start, Iterator end, bool skip_nan=false)

bool	is_date_prefix (const char *ptr)

bool	is_date (const char *ptr, size_t len)

bool	is_rfc3339_suffix (const char *ptr, size_t len)

bool	is_rfc3339 (const char *ptr, size_t len)

Version	parse_version_string (const char *version_string, size_t size, bool skip_patch=false)

double	r_missing_value ()

template<typename Float_ >
bool	are_floats_identical (const Float_ *x, const Float_ *y)

Detailed Description

Assorted helper functions for parsing and validation.

Function Documentation

◆ are_floats_identical()

template<typename Float_ >

bool ritsuko::are_floats_identical	(	const Float_ *	x,
		const Float_ *	y )

Check for identical floating-point numbers, including NaN status and the payload.

Template Parameters

Float_ Floating-point type.

Parameters

x	Pointer to a floating point value.
y	Pointer to another floating point value.

Returns: Whether or not x and y have identical bit patterns.
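
The difference between a bitwise comparison and `==` can be illustrated with a short sketch. This is not the library's implementation: `sketch_bits_identical` is a hypothetical name, and the use of `std::memcmp` is only one assumed way of comparing object representations.

```cpp
#include <cassert>
#include <cstring>
#include <limits>

// Hypothetical stand-in for a bitwise float comparison; not ritsuko's code.
template<typename Float_>
bool sketch_bits_identical(const Float_* x, const Float_* y) {
    // Compare the raw bytes so that NaN payloads and the sign of zero
    // are taken into account, unlike operator==.
    return std::memcmp(x, y, sizeof(Float_)) == 0;
}
```

Under this sketch, a NaN is identical to a copy of itself even though `==` reports them as unequal, while `0.0` and `-0.0` are equal under `==` but not bitwise identical.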

◆ choose_missing_float_placeholder() [1/2]

template<class Iterator , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >

std::pair< bool, Type_ > ritsuko::choose_missing_float_placeholder	(	Iterator	start,
		Iterator	end,
		bool	skip_nan = false )

Overload of choose_missing_float_placeholder() where no values are masked.

Template Parameters

Iterator_	Forward iterator for floating-point values.
Type_	Float type pointed to by `Iterator_`.

Parameters

start	Start of the dataset.
end	End of the dataset.
skip_nan	Whether to skip NaN as a potential placeholder.

Returns: Pair containing (i) a boolean indicating whether a placeholder was successfully found, and (ii) the chosen placeholder if the previous boolean is true.

◆ choose_missing_float_placeholder() [2/2]

template<class Iterator , class Mask , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >

std::pair< bool, Type_ > ritsuko::choose_missing_float_placeholder	(	Iterator	start,
		Iterator	end,
		Mask	mask,
		bool	skip_nan )

Choose an appropriate placeholder for missing values in a floating-point dataset, after ignoring all masked values. This will try the various IEEE special values (NaN, Inf, -Inf) and then some type-specific boundaries (the minimum, the maximum, and for signed types, 0) before sorting the dataset and searching for an unused float.

Template Parameters

Iterator_	Forward iterator for floating-point values.
Mask_	Random access iterator for mask values.
Type_	Float type pointed to by `Iterator_`.

Parameters

start	Start of the dataset.
end	End of the dataset.
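mask	Start of the mask vector. This should have the same length as `end - start`; each entry is true if the corresponding value of the float dataset is masked, and false otherwise.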
skip_nan	Whether to skip NaN as a potential placeholder. Useful in frameworks like R that need special consideration of NaN payloads.

Returns: Pair containing (i) a boolean indicating whether a placeholder was successfully found, and (ii) the chosen placeholder if the previous boolean is true.

◆ choose_missing_integer_placeholder() [1/2]

template<class Iterator , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >

std::pair< bool, Type_ > ritsuko::choose_missing_integer_placeholder	(	Iterator	start,
		Iterator	end )

Overload of choose_missing_integer_placeholder() where no values are masked.

Template Parameters

Iterator_	Forward iterator for integer values.
Type_	Integer type pointed to by `Iterator_`.

Parameters

start	Start of the dataset.
end	End of the dataset.

Returns: Pair containing (i) a boolean indicating whether a placeholder was successfully found, and (ii) the chosen placeholder if the previous boolean is true.

◆ choose_missing_integer_placeholder() [2/2]

template<class Iterator , class Mask , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >

std::pair< bool, Type_ > ritsuko::choose_missing_integer_placeholder	(	Iterator	start,
		Iterator	end,
		Mask	mask )

Choose an appropriate placeholder for missing values in an integer dataset, after ignoring all the masked values. This will try the various special values (the minimum, the maximum, and for signed types, 0) before sorting the dataset and searching for an unused integer value.

Template Parameters

Iterator_	Forward iterator for integer values.
Mask_	Random access iterator for mask values.
Type_	Integer type pointed to by `Iterator_`.

Parameters

start	Start of the dataset.
end	End of the dataset.
mask	Start of the mask vector. This should have the same length as `end - start`; each entry is true if the corresponding value of the integer dataset is masked, and false otherwise.

Returns: Pair containing (i) a boolean indicating whether a placeholder was successfully found, and (ii) the chosen placeholder if the previous boolean is true.
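
The search order described above (special values first, then a scan of the sorted data for a gap) might look roughly like the following. This is a simplified sketch under stated assumptions, not the library's code: `sketch_choose_integer_placeholder` is a hypothetical name, it takes vectors rather than iterators, and it sorts up front for brevity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <limits>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical sketch of the placeholder search; names and details are assumptions.
template<typename Int_>
std::pair<bool, Int_> sketch_choose_integer_placeholder(const std::vector<Int_>& values, const std::vector<bool>& mask) {
    // Keep only the unmasked values, then sort them for searching.
    std::vector<Int_> kept;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (!mask[i]) kept.push_back(values[i]);
    }
    std::sort(kept.begin(), kept.end());

    // Try the special values first: minimum, maximum, and 0 for signed types.
    std::vector<Int_> candidates{ std::numeric_limits<Int_>::min(), std::numeric_limits<Int_>::max() };
    if (std::is_signed<Int_>::value) candidates.push_back(0);
    for (auto c : candidates) {
        if (!std::binary_search(kept.begin(), kept.end(), c)) return { true, c };
    }

    // Otherwise look for a gap between adjacent sorted values.
    // kept[i-1] + 1 cannot overflow here because kept[i-1] < kept[i].
    for (std::size_t i = 1; i < kept.size(); ++i) {
        if (kept[i - 1] < kept[i] && static_cast<Int_>(kept[i - 1] + 1) != kept[i]) {
            return { true, static_cast<Int_>(kept[i - 1] + 1) };
        }
    }
    return { false, 0 }; // every representable value is in use
}
```

With an 8-bit unsigned dataset that contains every value from 0 to 255, the sketch reports failure; masking any one entry frees that value for use as the placeholder.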

◆ find_float_extremes() [1/2]

template<class Iterator , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >

FloatExtremes ritsuko::find_float_extremes	(	Iterator	start,
		Iterator	end,
		bool	skip_nan = false )

Overload of find_float_extremes() where no values are masked.

Template Parameters

Iterator_	Forward iterator for float values.
Type_	Float type pointed to by `Iterator_`.

Parameters

start	Start of the dataset.
end	End of the dataset.
skip_nan	Whether to skip searches for NaN. Useful in frameworks like R that need special consideration of NaN payloads.

Returns: Whether extreme values are present in [start, end).

◆ find_float_extremes() [2/2]

template<class Iterator , class Mask , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >

FloatExtremes ritsuko::find_float_extremes	(	Iterator	start,
		Iterator	end,
		Mask	mask,
		bool	skip_nan )

Check for the presence of extreme values in a floating-point dataset. This can be used to choose a missing placeholder value in an online fashion, by calling this function on blocks of the dataset; if any of the extreme values are absent from all blocks, they can be used as the missing value placeholder. By contrast, choose_missing_float_placeholder() requires access to the full dataset.

Template Parameters

Iterator_	Forward iterator for float values.
Mask_	Random access iterator for mask values.
Type_	Float type pointed to by `Iterator_`.

Parameters

start	Start of the dataset.
end	End of the dataset.
mask	Start of the mask vector. This should have the same length as `end - start`; each entry is true if the corresponding value of the float dataset is masked, and false otherwise.
skip_nan	Whether to skip searches for NaN. Useful in frameworks like R that need special consideration of NaN payloads.

Returns: Whether extreme values are present in [start, end). If skip_nan = true, FloatExtremes::has_nan is set to false and should be ignored. If Type_ is not an IEEE754-compliant float, users should ignore FloatExtremes::has_nan, FloatExtremes::has_negative_inf and FloatExtremes::has_positive_inf.
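
The block-wise ("online") usage described above can be sketched as follows. This is an illustration only: `SketchFloatExtremes` and `sketch_scan_block` are hypothetical names that mirror just the three flags named in the return description, and the accumulate-in-place design is an assumption rather than the library's interface.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Hypothetical mirror of a subset of the flags described above.
struct SketchFloatExtremes {
    bool has_nan = false;
    bool has_positive_inf = false;
    bool has_negative_inf = false;
};

// Scan one block and update the flags in place, so that results
// accumulate across successive blocks of the same dataset.
inline void sketch_scan_block(const std::vector<double>& block, const std::vector<bool>& mask, bool skip_nan, SketchFloatExtremes& acc) {
    for (std::size_t i = 0; i < block.size(); ++i) {
        if (mask[i]) continue; // masked values are ignored
        double v = block[i];
        if (std::isnan(v)) {
            if (!skip_nan) acc.has_nan = true;
        } else if (std::isinf(v)) {
            (v > 0 ? acc.has_positive_inf : acc.has_negative_inf) = true;
        }
    }
}
```

After all blocks have been scanned, any flag that is still false identifies a special value that never appeared unmasked and is therefore a candidate placeholder.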

◆ find_integer_extremes() [1/2]

template<class Iterator , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >

IntegerExtremes ritsuko::find_integer_extremes	(	Iterator	start,
		Iterator	end )

Overload of find_integer_extremes() where no values are masked.

Template Parameters

Iterator_	Forward iterator for integer values.
Type_	Integer type pointed to by `Iterator_`.

Parameters

start	Start of the dataset.
end	End of the dataset.

Returns: Whether extreme values are present in [start, end).

◆ find_integer_extremes() [2/2]

template<class Iterator , class Mask , class Type_ = typename std::remove_cv<typename std::remove_reference<decltype(*(std::declval<Iterator>()))>::type>::type >

IntegerExtremes ritsuko::find_integer_extremes	(	Iterator	start,
		Iterator	end,
		Mask	mask )

Check for the presence of extreme values in an integer dataset. This can be used to choose a missing placeholder value in an online fashion, by calling this function on blocks of the dataset; if any of the extreme values are absent from all blocks, they can be used as the missing value placeholder. By contrast, choose_missing_integer_placeholder() requires access to the full dataset.

Template Parameters

Iterator_	Forward iterator for integer values.
Mask_	Random access iterator for mask values.
Type_	Integer type pointed to by `Iterator_`.

Parameters

start	Start of the dataset.
end	End of the dataset.
mask	Start of the mask vector. This should have the same length as `end - start`; each entry is true if the corresponding value of the integer dataset is masked, and false otherwise.

Returns: Whether extreme values are present in [start, end). If Type_ is unsigned, IntegerExtremes::has_lowest and IntegerExtremes::has_zero are the same.

◆ is_date()

bool ritsuko::is_date	(	const char *	ptr,
		size_t	len )

inline

Parameters

[in]	ptr	Pointer to a C-style string.
	len	Length of the string referenced by `ptr`, excluding the null terminator.

Returns: Whether ptr refers to a XXXX-YY-ZZ date, for approximately valid combinations of YY and ZZ (see is_date_prefix() for details).

◆ is_date_prefix()

bool ritsuko::is_date_prefix ( const char * ptr )

inline

Does a string start with a XXXX-YY-ZZ date, for approximately valid combinations of YY and ZZ? (This is only approximate as we do not check the exact correctness of the number of days for each month.)

Parameters

[in] ptr Pointer to a C-style string containing at least 10 characters.

Returns: Whether or not the string starts with a date.
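
To make "approximately valid" concrete, a check of this kind could be written as below. This is a sketch, not the library's code: `sketch_is_date_prefix` is a hypothetical name, and the specific ranges (months 01 to 12, days 01 to 31) are an assumption about what the approximate check accepts.

```cpp
#include <cassert>
#include <cctype>

// Hypothetical approximation; the exact rules used by the library are not shown here.
inline bool sketch_is_date_prefix(const char* ptr) {
    // Expect digits everywhere except the two hyphens at offsets 4 and 7.
    for (int i = 0; i < 10; ++i) {
        if (i == 4 || i == 7) {
            if (ptr[i] != '-') return false;
        } else if (!std::isdigit(static_cast<unsigned char>(ptr[i]))) {
            return false;
        }
    }
    int month = (ptr[5] - '0') * 10 + (ptr[6] - '0');
    int day = (ptr[8] - '0') * 10 + (ptr[9] - '0');
    // Coarse range checks only: days are not validated against the specific month.
    return month >= 1 && month <= 12 && day >= 1 && day <= 31;
}
```

Under these assumptions, an impossible date such as `2023-02-31` is still accepted, which is the kind of imprecision the description refers to.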

◆ is_rfc3339()

bool ritsuko::is_rfc3339	(	const char *	ptr,
		size_t	len )

inline

Does a string follow the RFC3339 format? This uses is_date_prefix() and is_rfc3339_suffix() to check the date and the rest of the timestamp, respectively.

Parameters

[in]	ptr	Pointer to a C-style string.
	len	Length of the string in `ptr`.

Returns: Whether or not the string is RFC3339-compliant.

◆ is_rfc3339_suffix()

bool ritsuko::is_rfc3339_suffix	(	const char *	ptr,
		size_t	len )

inline

Does a string finish with an RFC3339-compliant timestamp, i.e., does the substring starting at T after the date follow the RFC3339 specification? Note that the timestamp validity checks are only approximate, as the correctness of leap seconds is not currently considered. It is expected that the start of the string up to the T was already validated with is_date_prefix().

Parameters

[in]	ptr	Pointer to a character array containing at least 10 characters. This should start at offset 10 (zero-based) of the original string, i.e., the `T` in the timestamp.
	len	Length of the string in `ptr`.

Returns: Whether or not the string finishes with an RFC3339-compliant timestamp.
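
A suffix check along these lines could be sketched as shown below. It is an assumption-laden illustration rather than the library's implementation: `sketch_two_digits` and `sketch_is_rfc3339_suffix` are hypothetical names, only uppercase `T` and `Z` are accepted, and a seconds value of 60 is allowed without verifying that a leap second actually occurred.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>

// Hypothetical helper: are p[0] and p[1] digits forming a number no larger than `upper`?
inline bool sketch_two_digits(const char* p, int upper) {
    auto is_digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    if (!is_digit(p[0]) || !is_digit(p[1])) return false;
    return (p[0] - '0') * 10 + (p[1] - '0') <= upper;
}

// Hypothetical sketch of a suffix check for "THH:MM:SS[.frac](Z|+HH:MM|-HH:MM)".
inline bool sketch_is_rfc3339_suffix(const char* ptr, std::size_t len) {
    // The shortest valid suffix is "THH:MM:SSZ", i.e., 10 characters.
    if (len < 10 || ptr[0] != 'T') return false;
    if (!sketch_two_digits(ptr + 1, 23) || ptr[3] != ':') return false;
    if (!sketch_two_digits(ptr + 4, 59) || ptr[6] != ':') return false;
    if (!sketch_two_digits(ptr + 7, 60)) return false; // 60 permits a leap second, unchecked

    std::size_t i = 9;
    // Optional fractional seconds: '.' followed by one or more digits.
    if (ptr[i] == '.') {
        std::size_t start = ++i;
        while (i < len && std::isdigit(static_cast<unsigned char>(ptr[i]))) ++i;
        if (i == start) return false;
    }
    if (i >= len) return false;
    if (ptr[i] == 'Z') return i + 1 == len;

    // Otherwise expect a numeric offset of exactly six characters.
    if (ptr[i] != '+' && ptr[i] != '-') return false;
    if (len - i != 6) return false;
    return sketch_two_digits(ptr + i + 1, 23) && ptr[i + 3] == ':' && sketch_two_digits(ptr + i + 4, 59);
}
```

The 10-character minimum in this sketch corresponds to the shortest suffix, `THH:MM:SSZ`.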

◆ parse_version_string()

Version ritsuko::parse_version_string	(	const char *	version_string,
		size_t	size,
		bool	skip_patch = false )

inline

Parameters

[in]	version_string	Pointer to a version string.
	size	Length of the `version_string`.
	skip_patch	Whether to skip the patch number.

Returns: A Version object containing the version number. If skip_patch = true, the patch number is always zero.
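
The effect of `skip_patch` can be illustrated with a small standalone parser. Everything here is an assumption made for illustration: `SketchVersion` and its field names are hypothetical, the sketch assumes that `skip_patch = true` means the string contains only `major.minor`, and throwing `std::runtime_error` on malformed input is a choice of this sketch rather than documented behavior.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>

// Hypothetical version holder; the real Version struct is documented separately.
struct SketchVersion {
    int major_num;
    int minor_num;
    int patch_num;
};

// Hypothetical parser for "X.Y.Z", or "X.Y" when skip_patch is true.
inline SketchVersion sketch_parse_version(const char* s, std::size_t size, bool skip_patch = false) {
    int parts[3] = {0, 0, 0};
    int expected = skip_patch ? 2 : 3;
    std::size_t i = 0;
    for (int p = 0; p < expected; ++p) {
        if (p > 0) {
            // Components are separated by a single period.
            if (i >= size || s[i] != '.') throw std::runtime_error("expected '.' separator");
            ++i;
        }
        std::size_t start = i;
        // Accumulate decimal digits; this sketch has no overflow guard.
        while (i < size && std::isdigit(static_cast<unsigned char>(s[i]))) {
            parts[p] = parts[p] * 10 + (s[i] - '0');
            ++i;
        }
        if (i == start) throw std::runtime_error("expected digits");
    }
    if (i != size) throw std::runtime_error("trailing characters");
    return SketchVersion{ parts[0], parts[1], parts[2] }; // patch stays 0 if skipped
}
```

In this sketch the patch field is left at zero when `skip_patch` is true, matching the return description above.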

◆ r_missing_value()

double ritsuko::r_missing_value ( )

inline

Create R's missing value for doubles, allowing us to mimic R's missingness concept in other languages.

Returns: A quiet NaN with a payload of 1954, equivalent to R's double-precision missing value.
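
For readers unfamiliar with NaN payloads, the following sketch shows one way to build a NaN that carries 1954 in its low bits. It is not the library's implementation: the function names are hypothetical, a 64-bit IEEE 754 `double` is assumed, and the exact bit layout produced by r_missing_value() (including the state of the quiet bit) is an assumption here.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical construction of a NaN carrying the payload 1954 in its low bits.
inline double sketch_nan_with_payload_1954() {
    // Assumed layout: all-ones exponent, quiet bit set, payload in the low word.
    std::uint64_t bits = (UINT64_C(0x7FF8) << 48) | UINT64_C(1954);
    double out;
    std::memcpy(&out, &bits, sizeof(out)); // copy bytes to avoid type punning
    return out;
}

// Extract the low 32 bits, where the payload is stored in this sketch.
inline std::uint32_t sketch_low_word(double x) {
    std::uint64_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    return static_cast<std::uint32_t>(bits & UINT64_C(0xFFFFFFFF));
}
```

Because the value is still a NaN, `std::isnan()` cannot distinguish it from an ordinary NaN; a bitwise comparison such as are_floats_identical() is needed to detect the payload.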
