Metadata File

jlchang edited this page Jun 21, 2021 · 19 revisions

Metadata file

Purpose of the metadata file

SCP Conventional File Format

Legacy Metadata File Format

Purpose

A metadata file provides metadata describing cells in the study. The metadata provided in this file are interpreted as either "group" (categorical/factor) or "numeric" (continuous) data. These metadata are available throughout the visualization portal.

Metadata will be available to paint cell plots. Categorical metadata will paint the cells with discrete color panels (as discrete groups, each with their own color). Continuous metadata will paint cells as a gradient of color. These metadata not only determine color on plots of cells, but also are used when viewing genes across cells.

If categorical metadata is currently being viewed and a gene is searched, the gene will be shown as box plots, one box plot for each group of cells in the metadata. If continuous metadata is currently being viewed, the gene will be plotted against the metadata as a scatter plot.

There is no limit on the number of metadata you can provide. Metadata with more than one and up to two hundred unique values are available globally to all plots; to restrict metadata to a specific plot of cells, include it in the cluster file used to make that plot instead. To make metadata available for advanced search, submit metadata in SCP Conventional File Format.
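As a rough illustration of the unique-value range mentioned above, the following Python sketch counts the distinct values in a column and checks that the count is more than one and at most two hundred. The function name and the sample columns are hypothetical, and the portal's actual check may differ (for example, the text does not say whether the range also applies to numeric columns).

```python
def globally_available(values, min_exclusive=1, max_inclusive=200):
    """Return True if the column has more than one and at most 200 unique values.

    The bounds come from the text above; the function itself is only a sketch.
    """
    n_unique = len(set(values))
    return min_exclusive < n_unique <= max_inclusive


# Hypothetical metadata columns, for illustration only.
columns = {
    "cluster": ["A", "B", "A", "C"],      # three unique values
    "batch": ["b1", "b1", "b1", "b1"],    # a single unique value
}

for name, values in columns.items():
    print(name, globally_available(values))
```

A column with a single value carries no grouping information, which is presumably why it falls outside the range.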

Note: A metadata file is required for study visualization.

Metadata-powered Advanced Search

Single Cell Portal offers search across all studies with SCP Conventional Metadata using a faceted search interface.

To make study data available as part of advanced search, submitted metadata files need to use SCP Conventional header names for common search metadata (species, organ, disease, etc.) and supply values from designated ontologies or controlled lists as described below. Study owners are welcome to include unconventional metadata, which will not be part of advanced search but is available for painting cells in plots.

SCP Conventional File Format

The Single Cell Portal Metadata Convention defines a set of metadata terms that are intended to be used in a consistent manner across participating studies within the Portal. Where possible, the Metadata Convention uses established biomedical ontologies for the communication of complex domain information in a clear and concise manner. For simpler concepts, controlled lists of terms are used to ensure the data is uniform across participating studies.

Using the SCP Metadata Convention is optional, but it enables your data to be discovered through our advanced search interface. When metadata files for an SCP study are validated against the SCP Metadata Convention, extra steps are applied to ensure that submitted metadata conform to the requirements of the convention. Note that unconventional metadata is not stored for advanced search.

Conventional metadata files are tab delimited (without quotes).

Conventional metadata file anatomy

Conventional metadata files must:

Download the example Conventional Metadata File

Required Conventional Metadata

Studies uploaded using the SCP Metadata convention must include the following required metadata:

| Metadata Name | Type | Description |
| --- | --- | --- |
| NAME | string | unique identifier for each cell in the study; must be the first column in the metadata file (listed as "CellID" in the convention schema) |
| biosample_id | string | unique identifier for each sample in the study |
| donor_id | string | unique identifier for each biosample donor in the study |
| species | ontology | ontology identifier from NCBItaxon |
| species__ontology_label | ontology_label | ontology label from NCBItaxon |
| disease | ontology* | ontology identifier from MONDO or PATO (if no disease, use ontology ID "PATO_0000461") |
| disease__ontology_label | ontology_label* | ontology label from MONDO or PATO (if no disease, use ontology label "normal") |
| organ | ontology | ontology identifier from Uberon |
| organ__ontology_label | ontology_label | ontology label from Uberon |
| library_preparation_protocol | ontology | ontology identifier from Experimental Factor ontology:library preparation |
| library_preparation_protocol__ontology_label | ontology_label | ontology label from Experimental Factor ontology:library preparation |
| sex | controlled list (enum) | one of ["male", "female", "mixed", "unknown"] |

*multiple values allowed - see Array valued metadata

biosample_id and donor_id are identifiers assigned by the study owner to associate cells in a study to their biosample and/or donor, respectively.

Inclusion of ontology_label metadata allows validation of required metadata to ensure the submitted ontology ID is the intended ontology term.
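Before uploading, it can help to confirm that a file's header row contains every required column. The sketch below does this for a tab-delimited header line in Python. The column names are taken from the table of required metadata above; the function name `check_header` is hypothetical, and this is not the portal's own validator.

```python
# Required column names, copied from the table of required conventional metadata.
REQUIRED_COLUMNS = [
    "NAME",
    "biosample_id",
    "donor_id",
    "species",
    "species__ontology_label",
    "disease",
    "disease__ontology_label",
    "organ",
    "organ__ontology_label",
    "library_preparation_protocol",
    "library_preparation_protocol__ontology_label",
    "sex",
]


def check_header(header_line):
    """Return a list of problems found in a tab-delimited header line."""
    columns = header_line.rstrip("\n").split("\t")
    problems = []
    # NAME must be the first column in the metadata file.
    if columns[0] != "NAME":
        problems.append("first column must be NAME")
    # Names are compared exactly, so the check is case-sensitive.
    missing = [name for name in REQUIRED_COLUMNS if name not in columns]
    if missing:
        problems.append("missing required columns: " + ", ".join(missing))
    return problems


print(check_header("NAME\tbiosample_id\tdonor_id\tsex"))
```

An empty list means the header passed both checks; any extra, unconventional columns are simply ignored by this sketch.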

Recommended Conventional Metadata

| Metadata Name | Type | Description |
| --- | --- | --- |
| cell_type | ontology | ontology identifier from Cell Ontology |
| cell_type__ontology_label | ontology_label | ontology label from Cell Ontology |

Additional optional convention metadata are listed in the SCP Metadata convention with documentation on convention components and additional guidance on usage.

Metadata Validation Rules

Metadata files uploaded using the SCP Metadata convention are validated for the following properties:

  • Metadata names
    • must exactly match Metadata Name in metadata convention for metadata to be added to the query database
    • are case-sensitive and must also case-match the conventional metadata name in the schema
    • unconventional metadata names may only have alphanumeric characters or underscore
  • Ontology terms must
    • be correctly formatted ("<ontology name>_<numeric ID>" or "<ontology name>:<numeric ID>")
      • example: MONDO_0000001 or MONDO:0000001
    • exist in the expected metadata ontology, validated through EBI OLS.
    • be accompanied by the human-readable ontology label for the provided ontology ID, if the metadata is required (supplied as <metadata attribute>__ontology_label; note the double underscore)
  • Ontology labels must exactly match the label or synonym in EBI OLS
  • Metadata with controlled lists (type = enum) must exactly match one of the enumerated values.
  • Metadata values provided must match their type and class declarations in the metadata convention.
  • Array-valued metadata must be delimited with the pipe symbol (|)
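Some of the purely syntactic rules above can be checked locally before upload. The following Python sketch covers three of them: the ontology ID format, pipe-delimited array values, and the controlled list for sex. It assumes ontology names consist of letters only, which holds for the ontologies named on this page but is an assumption of the sketch. It does not check that a term exists in EBI OLS or that a label matches, and all function names are hypothetical.

```python
import re

# "<ontology name>_<numeric ID>" or "<ontology name>:<numeric ID>".
# Assumption: the ontology name contains letters only.
ONTOLOGY_ID = re.compile(r"[A-Za-z]+[_:]\d+")

# Enumerated values for "sex", as listed in the required metadata.
SEX_VALUES = {"male", "female", "mixed", "unknown"}


def split_array(value):
    """Split an array-valued metadata entry on the pipe symbol."""
    return value.split("|")


def is_ontology_id(value):
    """Check only the format of an ontology ID, not its existence."""
    return ONTOLOGY_ID.fullmatch(value) is not None


def valid_ontology_array(value):
    """Check every pipe-delimited entry against the ontology ID format."""
    return all(is_ontology_id(item) for item in split_array(value))


def valid_sex(value):
    """Enum values must match exactly, including case."""
    return value in SEX_VALUES


print(is_ontology_id("MONDO_0000001"))
print(valid_ontology_array("MONDO:0000001|PATO_0000461"))
print(valid_sex("Male"))
```

Format checks like these catch typos early, but only validation against the convention itself confirms that an ID and its label refer to the intended term.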

Example Conventional Metadata File

View an example Conventional Metadata File

The example conventional metadata file demonstrates a study with 5 samples from 3 donors. The "mouse1" donor contributes two samples "mm1_lymph" and "mm1_blood". The sample collection had both normal (PATO_0000461) samples and disease samples. Unconventional metadata "average intensity" is included and can be used to paint cells in plots. The cells in this metadata file were annotated with the convention-optional "cell_type" metadata, so that cells from this study are findable through the Cell Type facet of advanced search if the metadata file is validated against the metadata convention.

Download the example Conventional Metadata File

Metadata Validation

How to upload your metadata file with the metadata convention (File upload wizard interface):

To validate your metadata file against the metadata convention, select the "yes" option.

Visit our Metadata Validation Errors FAQ for information on common metadata file issues and solutions.

Legacy Metadata File Format

The legacy SCP metadata file format is a tab delimited file (without quotes) that has one required column and two required rows.

Columns: The first column is required and contains cell names; it should include all cells given in the expression file. Each additional column holds a different metadata to be viewed. Please note that cell names should match the cell names in other study files.

Rows: The first two rows are required. The first row starts with the entry "NAME", followed by the name of the metadata contained in each column; this is the name users will see and select in the portal. The second row starts with "TYPE", followed by the value "group" or "numeric" describing each column of metadata. Each additional row describes one cell, giving the cell name first and then its metadata entries. The cell names should match cell names in other files. For unconventional metadata, please use descriptive names and name groups in ways others will understand when viewing them. Please use only alphanumeric characters and underscores.
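The row and column layout described above can be illustrated with a short Python sketch that parses a legacy metadata file held in a string. The sample cell names, the column names "cluster" and "quality_score", and the function name `parse_legacy_metadata` are all made up for illustration; this is not the portal's parser.

```python
import csv
import io


def parse_legacy_metadata(text):
    """Parse tab-delimited legacy metadata into {cell_name: {column: value}}."""
    reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
    rows = [row for row in reader if row]  # skip blank lines
    if len(rows) < 2:
        raise ValueError("file needs a NAME row and a TYPE row")
    names, types, cells = rows[0], rows[1], rows[2:]
    if names[0] != "NAME":
        raise ValueError('first row must start with "NAME"')
    if types[0] != "TYPE":
        raise ValueError('second row must start with "TYPE"')
    unknown = [t for t in types[1:] if t not in ("group", "numeric")]
    if unknown:
        raise ValueError(f"TYPE values must be group or numeric, got {unknown}")

    metadata = {}
    for row in cells:
        cell_name, values = row[0], row[1:]
        # Numeric columns become floats; group columns stay as strings.
        metadata[cell_name] = {
            name: float(value) if kind == "numeric" else value
            for name, kind, value in zip(names[1:], types[1:], values)
        }
    return metadata


# Hypothetical file contents, for illustration only.
example = (
    "NAME\tcluster\tquality_score\n"
    "TYPE\tgroup\tnumeric\n"
    "cell_1\tA\t0.5\n"
    "cell_2\tB\t1.25\n"
)

print(parse_legacy_metadata(example))
```

Reading with `csv.QUOTE_NONE` matches the "without quotes" requirement: quote characters, if present, are treated as ordinary text rather than as field delimiters.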

Example Legacy Metadata File
