CSV Data Frame

Type: object

A data frame stored in a CSV file (see here for a detailed specification). If the data_frame.row_names property is present and truthy, the first column of the CSV file contains the row names of the data frame as non-missing strings. This column should be ignored when indexing entries of the data_frame.columns property, so the first entry of data_frame.columns describes the second column of the CSV. For non-simple columns, a placeholder column is created in the CSV, and a pointer to the relevant resource is stored in the corresponding entry of the data_frame.columns property. The CSV file may be compressed if the csv_data_frame.compression property is set to "gzip".
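The column-offset rule for row names can be sketched as a small reader. This is a minimal sketch, not a reference implementation: the helper name read_csv_data_frame is hypothetical, it assumes the first CSV record is a header line (this section does not say so), it only handles "none" and "gzip" compression, and it returns every value as a raw string without type conversion.

```python
import csv
import gzip


def read_csv_data_frame(path, columns, row_names=False, compression="none"):
    """Hypothetical reader for a CSV data frame.

    `columns` is the parsed data_frame.columns array, `row_names` mirrors
    data_frame.row_names and `compression` mirrors csv_data_frame.compression.
    Returns (row names or None, {column name: list of raw string values}).
    """
    opener = gzip.open if compression == "gzip" else open
    with opener(path, "rt", newline="") as handle:
        reader = csv.reader(handle)
        next(reader, None)  # assumption: the first record is a header line
        # With row names, entry i of `columns` maps to CSV field i + 1.
        offset = 1 if row_names else 0
        names = []
        values = {col["name"]: [] for col in columns}
        for record in reader:
            if row_names:
                names.append(record[0])
            for i, col in enumerate(columns):
                values[col["name"]].append(record[i + offset])
    return (names if row_names else None), values
```

Skipping the row-name field through a single offset keeps data_frame.columns indexed exactly as it is stored, whether or not row names are present.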

Derived from data_frame/v1.json: virtual data frame object stored in a yet-to-be-defined file format. Columns with simple types are stored directly in the file. For columns with non-obvious types (e.g., nested data frames), their contents should be stored in other files, and a pointer to a resource should be stored in the corresponding entry of columns (a placeholder column may be created in the file, depending on the format).

No Additional Properties

Type: string

The schema to use.

Type: object
No Additional Properties

Type: enum (of string)

Type of compression applied to the file.

Must be one of:

  • "none"
  • "gzip"
  • "bzip2"
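Each allowed compression value maps naturally onto a Python standard-library opener. The sketch below assumes that "none" means a plain text file; the function name open_compressed_text is hypothetical.

```python
import bz2
import gzip


def open_compressed_text(path, compression="none"):
    """Open a CSV file as text according to its compression value (sketch)."""
    # One opener per enum value listed above.
    openers = {"none": open, "gzip": gzip.open, "bzip2": bz2.open}
    if compression not in openers:
        raise ValueError(f"unsupported compression: {compression!r}")
    # newline="" is what the csv module expects for files it reads.
    return openers[compression](path, "rt", newline="")
```

Rejecting unknown values explicitly keeps a typo in the compression property from being silently read as an uncompressed file.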


Type: object

Type: object

Location of additional metadata for each column, stored as another data_frame. Omitted if no additional per-column metadata is present.

Type: object

Type: string

Relative path of the resource from the root of the project directory.

Type: enum (of string)

Type of file. Local files should be present in the same project directory.

Must be one of:

  • "local"

Type: array of object

Information about the columnar fields in the data frame. This should be in the same order as the columns in the on-disk representation.

No Additional Items

Each item of this array must be:


No Additional Properties

Type: object

Conditional (if/then/else): if an instance satisfies the "if" subschema, it must also satisfy the "then" subschema; otherwise, it must satisfy the "else" subschema.


Must not be:

Type: object

The following properties are required:

  • format
Type: object

Conditional: if the column type is neither "factor" nor "ordered", the levels property must not be present.


Must not be:

Type: object

Type: enum (of string)

Must be one of:

  • "factor"
  • "ordered"

Must not be:

Type: object

The following properties are required:

  • levels
Type: object

Conditional (if/then/else): if an instance satisfies the "if" subschema, it must also satisfy the "then" subschema; otherwise, it must satisfy the "else" subschema.


Must not be:

Type: object

The following properties are required:

  • ordered
Type: object

Conditional: if the column type is "other", the resource property is required; otherwise, the resource property must not be present.

Type: object

Type: const
Specific value: "other"
Type: object

The following properties are required:

  • resource

Must not be:

Type: object

The following properties are required:

  • resource
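The two conditionals that are fully readable here (levels only for "factor"/"ordered" columns, resource exactly when the type is "other") can be checked per column entry. This is a hedged sketch: the function name check_column_entry is hypothetical, and it deliberately leaves out the conditionals on format and ordered, whose "if" conditions are not fully visible in this rendering.

```python
def check_column_entry(column):
    """Return a list of problems in one data_frame.columns entry (sketch)."""
    problems = []
    col_type = column.get("type")
    # levels is forbidden unless the column is a factor (or deprecated ordered).
    if col_type not in ("factor", "ordered") and "levels" in column:
        problems.append("levels is only allowed for factor/ordered columns")
    # resource is required for "other" columns and forbidden for all others.
    if col_type == "other" and "resource" not in column:
        problems.append("resource is required for 'other' columns")
    if col_type != "other" and "resource" in column:
        problems.append("resource is only allowed for 'other' columns")
    return problems
```

Returning a list of messages, rather than raising on the first failure, lets a validator report every problem in a column at once.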

Type: enum (of string)

Formatting constraints for string types.

  • Dates are strings consisting of integers and dashes, following the YYYY-MM-DD format.
  • Date-times are strings following RFC 3339 Section 5.6, i.e., the Internet Date/Time format.

Must be one of:

  • "date"
  • "date-time"
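The two string formats can be checked with regular expressions plus a calendar check. This is a minimal sketch with hypothetical function names. For date-time it follows the RFC 3339 Section 5.6 grammar with a mandatory "T" separator (lowercase "t"/"z" accepted, a space separator rejected) and allows a leap second of 60.

```python
import re
from datetime import date

_DATE = re.compile(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")
_DATE_TIME = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})[Tt]"
    r"([0-9]{2}):([0-9]{2}):([0-9]{2})(\.[0-9]+)?"
    r"([Zz]|[+-][0-9]{2}:[0-9]{2})"
)


def _is_calendar_date(y, m, d):
    """True if the year/month/day strings form a real calendar date."""
    try:
        date(int(y), int(m), int(d))
    except ValueError:
        return False
    return True


def is_valid_date(value):
    """YYYY-MM-DD that is also a real calendar date."""
    match = _DATE.fullmatch(value)
    return bool(match) and _is_calendar_date(*match.groups())


def is_valid_date_time(value):
    """RFC 3339 Section 5.6 date-time (sketch)."""
    match = _DATE_TIME.fullmatch(value)
    if not match:
        return False
    y, mo, d, hh, mm, ss, _fraction, offset = match.groups()
    if not _is_calendar_date(y, mo, d):
        return False
    if int(hh) > 23 or int(mm) > 59 or int(ss) > 60:  # 60 = leap second
        return False
    if offset[0] in "+-":
        off_h, off_m = offset[1:].split(":")
        if int(off_h) > 23 or int(off_m) > 59:
            return False
    return True
```

Using fullmatch rather than "^...$" avoids accepting a value with a trailing newline.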

Type: object

Levels for a categorical factor, used by file formats that cannot store the levels internally (e.g., CSVs). This property points to a separate resource containing the levels as a vector of unique non-missing strings. For ordered factors, the saved vector lists the levels in their order.

Older instances (version = 1) store the levels in a 1-column data frame; this column can simply be treated as the vector of strings.

For file formats that are capable of storing the levels internally (e.g., HDF5), this property is not required and may be ignored.
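Decoding a stored factor column against its levels can be sketched as follows. The function name decode_factor is hypothetical, and so is the use of None for missing values. For version 1 instances, the single column of the levels data frame would be passed as the levels list.

```python
def decode_factor(values, levels, version=2):
    """Map stored factor values to level strings (sketch).

    version >= 2: values are 1-based integer indices into `levels`.
    version == 1: values are already level strings (deprecated form).
    None is treated as a missing value (an assumption of this sketch).
    """
    decoded = []
    for v in values:
        if v is None:
            decoded.append(None)
        elif version == 1:
            # Old representation: the string must be one of the levels.
            if v not in levels:
                raise ValueError(f"{v!r} is not a known level")
            decoded.append(v)
        else:
            # Current representation: 1-based index into the levels vector.
            if not 1 <= v <= len(levels):
                raise ValueError(f"factor code {v} is out of range")
            decoded.append(levels[v - 1])
    return decoded
```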

Type: object

Type: string

Relative path of the resource from the root of the project directory.

Type: enum (of string)

Type of file. Local files should be present in the same project directory.

Must be one of:

  • "local"

Type: string

Name of the column. Each column must have a non-empty name. Column names should not be duplicated within columns.

Must be at least 1 character long
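The naming rules (non-empty, no duplicates within columns) can be checked in one pass. This is a sketch with a hypothetical function name. It reports problems rather than raising, because the description says names "should not" be duplicated and does not say how a reader must react.

```python
def check_column_names(columns):
    """Return problems with the names in a data_frame.columns array (sketch)."""
    problems = []
    seen = set()
    for i, col in enumerate(columns):
        name = col.get("name", "")
        if not isinstance(name, str) or len(name) < 1:
            problems.append(f"column {i} has an empty or missing name")
        elif name in seen:
            problems.append(f"duplicate column name {name!r}")
        else:
            seen.add(name)
    return problems
```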

Type: boolean Default: false

Whether to assume that the levels are ordered.

Type: object

Type: string

Relative path of the resource from the root of the project directory.

Type: enum (of string)

Type of file. Local files should be present in the same project directory.

Must be one of:

  • "local"

Type: enum (of string)

Type of the column.

  • Integers, (floating-point) numbers and booleans are their usual selves.
  • Strings have an optional format property that restricts their contents, e.g., for dates or times. This property is only available in version >= 2.
  • The factor type is represented as an integer, to be used as a 1-based index into a vector of string levels. This type has an additional levels property specifying the levels, as well as an ordered property indicating whether they are ordered.
    • Older instances (data_frame.version = 1) store factor and ordered types as strings instead of integers. All such strings are guaranteed to belong to the string levels in levels. This representation is deprecated, and the integer representation should be used in version >= 2.
  • The ordered type is a deprecated alias for the factor type with the ordered property set to true; the latter should be used in version >= 2.
  • The date type is a soft-deprecated alias for the string type with the format property set to date; the latter should be used in version >= 2.
  • The date-time type is a soft-deprecated alias for the string type with the format property set to date-time; the latter should be used in version >= 2.
  • Columns listed as other are assumed to be non-simple and should contain a resource property pointing to the column's contents.

Must be one of:

  • "integer"
  • "number"
  • "string"
  • "factor"
  • "ordered"
  • "boolean"
  • "date"
  • "date-time"
  • "other"
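The deprecated aliases above can be rewritten into their version >= 2 equivalents with a lookup table. This sketch (hypothetical names) only rewrites the column metadata. It does not convert version 1 factor values from strings to integer codes, which would also be needed when upgrading an instance.

```python
# Deprecated type -> properties of the equivalent version >= 2 column entry.
_ALIASES = {
    "ordered": {"type": "factor", "ordered": True},
    "date": {"type": "string", "format": "date"},
    "date-time": {"type": "string", "format": "date-time"},
}


def normalize_column(column):
    """Return a copy of a column entry with deprecated types rewritten."""
    replacement = _ALIASES.get(column.get("type"))
    if replacement is None:
        return dict(column)
    # Later keys win, so the alias overrides the old "type" value.
    return {**column, **replacement}
```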

Type: array of integer

Dimensions of a two-dimensional object.

Must contain a minimum of 2 items

Must contain a maximum of 2 items

No Additional Items

Each item of this array must be:

Type: object

Location of additional metadata for this object, typically stored as a list. Omitted if no other metadata is present.

Type: object

Type: string

Relative path of the resource from the root of the project directory.

Type: enum (of string)

Type of file. Local files should be present in the same project directory.

Must be one of:

  • "local"

Type: boolean Default: false

Whether the data frame has row names. If true, these are stored in the first column of the CSV.

Type: integer Default: 1

Minor version of this format.

Value must be less than or equal to 2

Type: object

Conditional (if/then/else): if an instance satisfies the "if" subschema, it must also satisfy the "then" subschema; otherwise, it must satisfy the "else" subschema.

Type: object

Type: const
Specific value: 1
Type: object

Type: object
Type: object

Conditional (if/then/else): if an instance satisfies the "if" subschema, it must also satisfy the "then" subschema; otherwise, it must satisfy the "else" subschema.

Type: object

Type: object
Type: object

Type: object

Type: boolean Default: false

Is this a child document, only to be interpreted in the context of the parent document from which it is linked? This may have implications for search and metadata requirements.

Type: string

MD5 checksum for the file.

Type: string

Path to the file in the project directory.
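A stored MD5 checksum can be verified by hashing the file in chunks. This sketch assumes the checksum is recorded as a lowercase hexadecimal digest of the file bytes exactly as stored on disk (i.e., after any compression). The function name md5_of_file is hypothetical.

```python
import hashlib


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes (EOF).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing md5_of_file(project_root / path) with the recorded value would then detect a modified or truncated file, where project_root is whatever directory the relative path is resolved against.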