A data frame object stored inside a group of an HDF5 file (see here for a detailed specification). Atomic columns are stored as one-dimensional datasets of the same length in the data subgroup, named by their positional 0-based index in the data frame. Column names are stored in column_names, a one-dimensional string dataset of length equal to the number of columns. Row names, if present, are stored in a row_names dataset of the same length as the number of rows. For non-atomic columns, the corresponding dataset is omitted and the actual contents are obtained from other files; a pointer to the resource should be stored in the corresponding entry of the data_frame.columns property.
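The group layout described above can be sketched with plain Python containers. This is a minimal illustration, not a reader or writer for real HDF5 files: dicts and lists stand in for groups and datasets, and the function name `to_hdf5_layout` and its arguments are hypothetical.

```python
def to_hdf5_layout(columns, row_names=None):
    """Arrange columns following the group layout described above.

    `columns` is a list of (name, values) pairs. A value of None marks a
    non-atomic column, whose dataset is omitted from `data`. Plain dicts and
    lists stand in for HDF5 groups and datasets (assumption for illustration).
    """
    lengths = {len(v) for _, v in columns if v is not None}
    if len(lengths) > 1:
        raise ValueError("all atomic columns must have the same length")

    layout = {
        # One name per column, including non-atomic columns.
        "column_names": [name for name, _ in columns],
        # Datasets are named by the 0-based position of the column.
        "data": {
            str(i): list(v) for i, (_, v) in enumerate(columns) if v is not None
        },
    }
    if row_names is not None:
        if lengths and len(row_names) != next(iter(lengths)):
            raise ValueError("row_names must have one entry per row")
        layout["row_names"] = list(row_names)
    return layout


example = to_hdf5_layout(
    [("gene", ["A", "B"]), ("nested", None), ("count", [3, 5])],
    row_names=["r1", "r2"],
)
```

In this example the second column is non-atomic, so `data` holds only the entries named "0" and "2", while `column_names` still lists all three columns.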
Derived from data_frame/v1.json: virtual data frame object stored in a yet-to-be-defined file format. Columns with simple types are stored directly in the file. For columns with non-obvious types (e.g., nested data frames), their contents should be stored in other files, and a pointer to a resource should be stored in the corresponding entry of columns (a placeholder column may be created in the file, depending on the format).
The schema to use.
Location of additional metadata for each column, stored as another data_frame. Omitted if no additional per-column metadata is present.
Relative path of the resource from the root of the project directory.
Type of file. Local files should be present in the same project directory.
Information about the columnar fields in the data frame. This should be in the same order as the columns in the on-disk representation.
No additional items.
Each item is validated conditionally: if the "If" conditions hold, the "Then" conditions must hold; otherwise, the "Else" conditions must hold.
"string"
"factor"
"other"
Formatting constraints for string types.
YYYY-MM-DD format.
Levels for a categorical factor, used by file formats that cannot store the levels internally (e.g., CSVs). This property points to a separate resource containing the levels as a vector of unique non-missing strings. For ordered factors, the order is respected in the saved vector.
Older instances (version = 1) store the levels in a 1-column data frame; this column can simply be treated as the vector of strings.
For file formats that are capable of storing the levels internally (e.g., HDF5), this property is not required and may be ignored.
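The constraints on the levels vector can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are hypothetical, and a version-1 levels data frame is modelled as a dict mapping a column name to a list of values.

```python
def check_levels(levels):
    """Return the levels as a list if they are unique, non-missing strings."""
    for level in levels:
        if not isinstance(level, str):
            raise ValueError(f"level {level!r} is missing or not a string")
    if len(set(levels)) != len(levels):
        raise ValueError("levels must be unique")
    return list(levels)


def levels_from_legacy(frame):
    """Treat a version-1 one-column data frame as a plain vector of levels.

    `frame` is a hypothetical dict of column name -> list of values.
    """
    if len(frame) != 1:
        raise ValueError("expected exactly one column")
    (only_column,) = frame.values()
    return check_levels(only_column)
```

For example, `levels_from_legacy({"levels": ["low", "high"]})` yields the same list that `check_levels(["low", "high"])` would, with the saved order preserved.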
Relative path of the resource from the root of the project directory.
Type of file. Local files should be present in the same project directory.
Name of the column. Each column must have a non-empty name. Column names should not be duplicated within columns.
Must be at least 1 character long
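The naming rules for columns could be enforced as follows. This is only a sketch: `check_column_names` is a hypothetical helper, each entry of `columns` is assumed to be a dict with a `name` key, and treating duplicates as a hard error is a design choice here (the schema text only says they should not occur).

```python
def check_column_names(columns):
    """Return the column names, rejecting empty or duplicated ones."""
    seen = set()
    names = []
    for entry in columns:
        name = entry.get("name", "")
        if not isinstance(name, str) or len(name) < 1:
            raise ValueError("each column must have a non-empty name")
        if name in seen:
            raise ValueError(f"duplicated column name: {name!r}")
        seen.add(name)
        names.append(name)
    return names
```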
Whether to assume that the levels are ordered.
Relative path of the resource from the root of the project directory.
Type of file. Local files should be present in the same project directory.
Type of the column.
- The string type may have an additional format property that restricts its contents, e.g., for dates or times (version >= 2).
- The factor type is represented as an integer, to be used as a 1-based index into a vector of string levels. This type has an additional levels property specifying the levels, as well as an ordered property indicating whether they are ordered.
- Older instances (data_frame.version = 1) store factor and ordered types as strings instead of integers. All such strings are guaranteed to belong to the string levels in levels. This representation is deprecated and the integer representation should be used in version >= 2.
- The ordered type is a deprecated alias for the factor type with the ordered property set to true; the latter should be used in version >= 2.
- The date type is a soft-deprecated alias for the string type with format property set to date; the latter should be used in version >= 2.
- The date-time type is a soft-deprecated alias for the string type with format property set to date-time; the latter should be used in version >= 2.
- Columns of type other are assumed to be non-simple and should contain a resource property pointing to the column's contents.
Dimensions of a two-dimensional object.
Must contain a minimum of 2 items
Must contain a maximum of 2 items
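Returning to the column type rules given under "Type of the column": the deprecated aliases and the 1-based factor encoding can be sketched in Python. The helper names are hypothetical, column entries are assumed to be dicts with a `type` key, and representing a missing factor value as `None` is an assumption made only for this example.

```python
# Deprecated type aliases and their version >= 2 equivalents.
DEPRECATED_ALIASES = {
    "ordered": {"type": "factor", "ordered": True},
    "date": {"type": "string", "format": "date"},
    "date-time": {"type": "string", "format": "date-time"},
}


def normalize_column(entry):
    """Return a copy of a column entry with deprecated aliases rewritten."""
    updated = dict(entry)
    updated.update(DEPRECATED_ALIASES.get(entry["type"], {}))
    return updated


def decode_factor(codes, levels):
    """Map 1-based integer codes to level strings; None is kept as missing."""
    decoded = []
    for code in codes:
        if code is None:
            decoded.append(None)
        elif 1 <= code <= len(levels):
            decoded.append(levels[code - 1])
        else:
            raise ValueError(f"code {code} is outside 1..{len(levels)}")
    return decoded
```

With levels `["a", "b"]`, the codes `[2, 1]` decode to `["b", "a"]`; a code of 0 is rejected because the index is 1-based.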
Location of additional metadata for this object, typically stored as a list. Omitted if no other metadata is present.
Relative path of the resource from the root of the project directory.
Type of file. Local files should be present in the same project directory.
Whether the data frame has row names. If true, these are stored in the first column of the CSV.
Minor version of this format.
Value must be less than or equal to 2
1
Name of the group inside the HDF5 file that contains the contents of the data frame.
Minor version of this format. Only used for older hdf5_data_frame instances, and is ignored if a version number attribute is present in the HDF5 group named by group.
Value must be less than or equal to 3
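The precedence between the two version sources can be expressed as a small helper. This is a sketch only: `resolve_version` is a hypothetical function, and falling back to 1 when neither source is present is an assumption that the text above does not state.

```python
def resolve_version(metadata_version=None, group_attribute=None, default=1):
    """Pick the format version, preferring the attribute on the HDF5 group.

    `metadata_version` is the value from the metadata document and
    `group_attribute` is the version attribute read from the group, if any.
    The `default` of 1 is an assumption for this sketch.
    """
    if group_attribute is not None:
        # The metadata value is ignored when the group carries a version.
        return int(group_attribute)
    if metadata_version is not None:
        return int(metadata_version)
    return default
```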
Is this a child document, only to be interpreted in the context of the parent document from which it is linked? This may have implications for search and metadata requirements.
MD5 checksum for the file.
Path to the file in the project directory.
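A checksum of this kind can be computed with Python's standard `hashlib` module. The sketch below reads the file in chunks so that large HDF5 files need not fit in memory; the function name and chunk size are illustrative choices, not part of the schema.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until an empty bytes object signals EOF.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

The returned string can then be compared against the stored checksum to detect a modified or corrupted file.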