Metadata File
- Anatomy of an SCP Conventional Metadata File
- Required Conventional Metadata
- Metadata Validation Rules
- Example Conventional Metadata File
- Metadata Validation
- Metadata Validation Errors FAQ
A metadata file provides metadata describing cells in the study. The metadata provided in this file are interpreted as either "group" (categorical/factor) or "numeric" (continuous) data. These metadata are available throughout the visualization portal.
Metadata can be used to paint cell plots. Categorical metadata paint cells with discrete colors, one color per group. Continuous metadata paint cells along a color gradient. Beyond coloring plots of cells, metadata are also used when viewing genes across cells.
If categorical metadata are being viewed when a gene is searched, the gene's expression is shown as box plots, one box plot per group in the metadata. If continuous metadata are being viewed, the gene's expression is plotted against the metadata as a scatter plot.
There is no limit on the number of metadata you can provide. Metadata are available globally to all plots (for metadata with more than one and up to 200 unique values). To restrict a metadata to a specific plot of cells, include it in the cluster file used to make that plot instead. To make metadata available for advanced search, submit it in the SCP Conventional File Format.
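The unique-value rule above amounts to counting the distinct values in each metadata column. The sketch below is a hypothetical helper, not Portal code: the function names and example columns are made up for illustration, and it assumes values are compared as exact strings.

```python
MAX_UNIQUE_VALUES = 200  # upper limit stated on this page


def is_globally_available(values):
    """Return True if a metadata column has more than one and at most
    MAX_UNIQUE_VALUES distinct values (exact string comparison)."""
    n_unique = len(set(values))
    return 1 < n_unique <= MAX_UNIQUE_VALUES


def globally_available_columns(columns):
    """Names of the metadata columns (name -> list of values) that pass the rule."""
    return [name for name, values in columns.items() if is_globally_available(values)]


# Hypothetical columns, for illustration only.
columns = {
    "cell_type": ["B cell", "T cell", "B cell"],
    "batch": ["run1", "run1", "run1"],               # only one unique value
    "barcode_group": [f"g{i}" for i in range(250)],  # 250 unique values
}
print(globally_available_columns(columns))  # ['cell_type']
```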
Note: A metadata file is required for study visualization.
Single Cell Portal offers search across all studies with SCP Conventional Metadata using a faceted search interface.
To make study data available in advanced search, metadata files must use SCP Conventional header names for common search metadata (Species, Organ, Disease, etc.) and supply values from the designated ontologies or controlled lists described below. Study owners may also include unconventional metadata; these are not part of advanced search but are available for painting cells in plots.
The Single Cell Portal Metadata Convention defines a set of metadata terms that are intended to be used in a consistent manner across participating studies within the Portal. Where possible, the Metadata Convention uses established biomedical ontologies for the communication of complex domain information in a clear and concise manner. For simpler concepts, controlled lists of terms are used to ensure the data is uniform across participating studies.
Using the SCP Metadata Convention is optional, but it enables your data to be discovered through our advanced search interface. When metadata files for an SCP study are validated against the SCP Metadata Convention, extra checks ensure that the submitted metadata conform to the requirements of the convention. Note that unconventional metadata are not stored for advanced search.
Conventional metadata files are tab delimited (without quotes).
Conventional metadata files must:
- have NAME and TYPE rows (as described in Legacy Metadata File Format)
- provide cell names in column 1 of the metadata file, ensuring cell names exactly match cell names in other study files
- include all Required Conventional Metadata
- follow Metadata Validation Rules
- be uploaded to the Portal with Metadata Validation
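A minimal Python sketch of writing a file in this layout, assuming the NAME and TYPE rows from the legacy format and plain tab-joined, unquoted fields. The helper `write_metadata`, the cell names, and all values are hypothetical. To keep the example short it omits the other required columns (disease, organ, library_preparation_protocol and their labels), so its output on its own would not pass convention validation.

```python
import io


def write_metadata(handle, columns, types, rows):
    """Write a tab-delimited, unquoted metadata file with NAME and TYPE rows.

    columns: metadata names (the NAME column is added automatically)
    types:   "group" or "numeric", one per metadata column
    rows:    one list per cell: [cell_name, value_1, value_2, ...]
    """
    if len(columns) != len(types):
        raise ValueError("each metadata column needs exactly one TYPE")
    if any(t not in ("group", "numeric") for t in types):
        raise ValueError('TYPE values must be "group" or "numeric"')
    lines = [["NAME", *columns], ["TYPE", *types]]
    for row in rows:
        if len(row) != len(columns) + 1:
            raise ValueError(f"row {row!r} has the wrong number of fields")
        lines.append([str(value) for value in row])
    # Validate everything first so a bad value never leaves a partial file.
    for fields in lines:
        for value in fields:
            # Fields are written without quotes, so tabs, newlines and quotes are rejected.
            if any(ch in value for ch in '\t\r\n"'):
                raise ValueError(f"unsupported character in {value!r}")
    for fields in lines:
        handle.write("\t".join(fields) + "\n")


# Hypothetical example values; NCBITaxon_10090 is the NCBITaxon ID for Mus musculus.
columns = ["biosample_id", "donor_id", "species", "species__ontology_label", "sex", "average_intensity"]
types = ["group", "group", "group", "group", "group", "numeric"]
rows = [
    ["cell_1", "mm1_lymph", "mouse1", "NCBITaxon_10090", "Mus musculus", "female", "0.41"],
    ["cell_2", "mm1_blood", "mouse1", "NCBITaxon_10090", "Mus musculus", "female", "1.7"],
]
buf = io.StringIO()
write_metadata(buf, columns, types, rows)
print(buf.getvalue())
```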
Download the example Conventional Metadata File
Studies uploaded using the SCP Metadata convention must include the following required metadata:
Metadata Name | Type | Description |
---|---|---|
NAME | string | unique identifier for each cell in the study, must be first column in metadata file (Listed as "CellID" in the convention schema) |
biosample_id | string | unique identifier for each sample in the study |
donor_id | string | unique identifier for each biosample donor in the study |
species | ontology | ontology identifier from NCBITaxon |
species__ontology_label | ontology_label | ontology label from NCBITaxon |
disease | ontology* | ontology identifier from MONDO or PATO (if no disease, use ontology ID "PATO_0000461") |
disease__ontology_label | ontology_label* | ontology label from MONDO or PATO (if no disease, use ontology label "normal") |
organ | ontology | ontology identifier from Uberon |
organ__ontology_label | ontology_label | ontology label from Uberon |
library_preparation_protocol | ontology | ontology identifier from Experimental Factor ontology:library preparation |
library_preparation_protocol__ontology_label | ontology_label | ontology label from Experimental Factor ontology:library preparation |
sex | controlled list (enum) | one of ["male", "female", "mixed", "unknown"] |
*Multiple values allowed; see Array-valued metadata.
biosample_id and donor_id are identifiers assigned by the study owner to associate cells in a study to their biosample and/or donor, respectively.
Including the ontology_label metadata allows validation to confirm that each submitted ontology ID is the intended ontology term.
The following convention metadata are optional:
Metadata Name | Type | Description |
---|---|---|
cell_type | ontology | ontology identifier from Cell Ontology |
cell_type__ontology_label | ontology_label | ontology label from Cell Ontology |
Additional optional convention metadata are listed in the SCP Metadata convention with documentation on convention components and additional guidance on usage.
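Missing or mis-cased required columns can be caught locally before upload. This sketch checks only the NAME row against the required metadata listed on this page; the function name is hypothetical and it does not replace the Portal's own validation.

```python
# Required conventional metadata as listed on this page (NAME must be the first column).
REQUIRED_METADATA = [
    "biosample_id", "donor_id",
    "species", "species__ontology_label",
    "disease", "disease__ontology_label",
    "organ", "organ__ontology_label",
    "library_preparation_protocol", "library_preparation_protocol__ontology_label",
    "sex",
]


def check_required_header(name_row):
    """Return the problems found in the NAME row of a conventional metadata file."""
    problems = []
    if not name_row or name_row[0] != "NAME":
        problems.append('the first column must be "NAME"')
    present = set(name_row[1:])
    for column in REQUIRED_METADATA:
        # Exact, case-sensitive comparison: "Species" does not satisfy "species".
        if column not in present:
            problems.append(f"missing required metadata: {column}")
    return problems


header = "NAME\tbiosample_id\tdonor_id\tSpecies\tsex".split("\t")
for problem in check_required_header(header):
    print(problem)
```

Because names are case-sensitive, the example reports `species` as missing even though a `Species` column is present.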
Metadata files uploaded using the SCP Metadata convention are validated for the following properties:
- Metadata names
  - must exactly match the Metadata Name in the metadata convention for the metadata to be added to the query database
  - are case-sensitive and must match the case of the conventional metadata name in the schema
  - for unconventional metadata, may contain only alphanumeric characters and underscores
- Ontology terms must
  - be correctly formatted ("<ontology name>_<numeric ID>" or "<ontology name>:<numeric ID>"), for example MONDO_0000001 or MONDO:0000001
  - exist in the expected ontology, as validated through EBI OLS
  - be accompanied by the human-readable ontology label for the provided ontology ID if the metadata is required (<metadata attribute>__ontology_label; note the double underscore)
- Ontology labels must exactly match the label or a synonym in EBI OLS
- Metadata with controlled lists (type = enum) must exactly match one of the enumerated values
- Metadata values must match their type and class declarations in the metadata convention
- Array-valued metadata must be delimited with the pipe symbol (|)
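Several of these rules can be checked locally before upload. The sketch below covers name characters, ontology ID format, expected ontology prefixes, controlled lists, and pipe-delimited arrays. It is an illustration, not the Portal's validator: the regular expressions, the prefix mapping (NCBITaxon, MONDO/PATO, UBERON, EFO, CL), and the function names are assumptions, and it does not query EBI OLS, so term existence and label matching are not checked.

```python
import re

# "<ontology name>_<numeric ID>" or "<ontology name>:<numeric ID>", e.g. MONDO_0000001
ONTOLOGY_ID = re.compile(r"^([A-Za-z]+)[_:](\d+)$")
# Allowed characters for unconventional metadata names
UNCONVENTIONAL_NAME = re.compile(r"^[A-Za-z0-9_]+$")

# Names and prefixes from the tables on this page only (assumed subset of the convention).
CONVENTIONAL_NAMES = {
    "biosample_id", "donor_id", "sex",
    "species", "species__ontology_label",
    "disease", "disease__ontology_label",
    "organ", "organ__ontology_label",
    "library_preparation_protocol", "library_preparation_protocol__ontology_label",
    "cell_type", "cell_type__ontology_label",
}
EXPECTED_PREFIXES = {
    "species": {"NCBITaxon"},
    "disease": {"MONDO", "PATO"},
    "organ": {"UBERON"},
    "library_preparation_protocol": {"EFO"},
    "cell_type": {"CL"},
}
ENUMS = {"sex": {"male", "female", "mixed", "unknown"}}
ARRAY_VALUED = {"disease", "disease__ontology_label"}


def check_name(column):
    """Names are case-sensitive, so "Species" is treated as an unconventional
    name: allowed, but not used for advanced search."""
    if column in CONVENTIONAL_NAMES or UNCONVENTIONAL_NAME.match(column):
        return []
    return [f"invalid metadata name {column!r}: use only letters, digits and underscores"]


def check_value(column, raw):
    """Check one cell's value locally; existence in EBI OLS is not checked here."""
    problems = []
    # Array-valued metadata are pipe-delimited; other metadata hold a single value.
    values = raw.split("|") if column in ARRAY_VALUED else [raw]
    for value in values:
        if column in EXPECTED_PREFIXES:
            match = ONTOLOGY_ID.match(value)
            if match is None:
                problems.append(f"{column}: {value!r} is not formatted like MONDO_0000001")
            elif match.group(1) not in EXPECTED_PREFIXES[column]:
                problems.append(f"{column}: {value!r} is not from {sorted(EXPECTED_PREFIXES[column])}")
        if column in ENUMS and value not in ENUMS[column]:
            problems.append(f"{column}: {value!r} is not one of {sorted(ENUMS[column])}")
    return problems


print(check_value("disease", "MONDO_0000001|PATO_0000461"))  # []
print(check_value("organ", "UBERON0000178"))  # missing "_" or ":" separator
print(check_value("sex", "Female"))           # enum values are case-sensitive
```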
View an example Conventional Metadata File
The example conventional metadata file describes a study with 5 samples from 3 donors. The "mouse1" donor contributes two samples, "mm1_lymph" and "mm1_blood". The sample collection includes both normal (PATO_0000461) and disease samples. The unconventional metadata "average intensity" is included and can be used to paint cells in plots. Cells in this file are also annotated with the convention-optional "cell_type" metadata, so if the file is validated against the metadata convention, cells from this study can be found through the Cell Type facet of advanced search.
To validate your metadata file against the metadata convention, select the "yes" option when uploading the file.
Visit our Metadata Validation Errors FAQ for information on common metadata file issues and solutions.
The legacy SCP metadata file format is a tab delimited file (without quotes) that has one required column and two required rows.
Columns: The first column is required and contains cell names; include every cell in the expression file. Each additional column holds one metadata field to be viewed. Cell names must match the cell names in other study files.
Rows: The first required row starts with the entry "NAME", followed by the name of the metadata in each column; this is the name users see and select in the portal. The second required row starts with "TYPE", followed by "group" or "numeric" for each metadata column. Each remaining row describes one cell: the cell name followed by its metadata values. For unconventional metadata, use descriptive names, naming groups in ways others will understand when they view them. Use only alphanumeric characters and underscores.
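The layout above can be parsed in a few lines of Python. This is a hypothetical reader (`read_metadata` is not part of any Portal tool) that assumes well-formed, tab-delimited, unquoted input. It converts numeric columns to floats and, using a hypothetical set of expression-file cell names, reports cells missing from the metadata.

```python
import io


def read_metadata(handle):
    """Parse a tab-delimited metadata file with NAME and TYPE rows.

    Returns (types, cells): types maps each metadata name to "group" or "numeric";
    cells maps each cell name to {metadata name: value}, numeric values as floats.
    """
    lines = [line.rstrip("\r\n") for line in handle if line.strip()]
    if len(lines) < 2:
        raise ValueError("a metadata file needs NAME and TYPE rows")
    header, type_row = lines[0].split("\t"), lines[1].split("\t")
    if header[0] != "NAME" or type_row[0] != "TYPE":
        raise ValueError('the first two rows must start with "NAME" and "TYPE"')
    names, kinds = header[1:], type_row[1:]
    if len(names) != len(kinds) or any(k not in ("group", "numeric") for k in kinds):
        raise ValueError('each metadata column needs a TYPE of "group" or "numeric"')
    cells = {}
    for line in lines[2:]:
        cell, *values = line.split("\t")
        if len(values) != len(names):
            raise ValueError(f"cell {cell!r} has {len(values)} values, expected {len(names)}")
        if cell in cells:
            raise ValueError(f"duplicate cell name {cell!r}")
        cells[cell] = {
            name: float(value) if kind == "numeric" else value
            for name, kind, value in zip(names, kinds, values)
        }
    return dict(zip(names, kinds)), cells


text = (
    "NAME\tcell_type\tavg_intensity\n"
    "TYPE\tgroup\tnumeric\n"
    "cell_1\tB cell\t0.5\n"
    "cell_2\tT cell\t1.25\n"
)
types, cells = read_metadata(io.StringIO(text))
expression_cells = {"cell_1", "cell_2", "cell_3"}  # hypothetical expression-file cell names
missing = sorted(expression_cells - set(cells))
print(types)    # {'cell_type': 'group', 'avg_intensity': 'numeric'}
print(missing)  # ['cell_3']
```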