update vignette

stemangiola · Feb 9, 2023 · f110413
1 parent 11a35bf
commit f110413
Showing 1 changed file with 38 additions and 5 deletions.
diff --git a/vignettes/Introduction.Rmd b/vignettes/Introduction.Rmd
@@ -7,6 +7,10 @@ vignette: >
   %\VignetteEncoding{UTF-8}
 ---
 
+`CuratedAtlasQuery` is a query interface that allows the programmatic exploration and retrieval of the harmonised, curated and reannotated CELLxGENE single-cell human cell atlas. Data can be retrieved at cell, sample, or dataset levels based on filtering criteria.
+
+# Query interface
+
 ```{r, include = FALSE}
 # Note: knit this to the repo readme file using:
 # rmarkdown::render("vignettes/readme.Rmd", output_format = "github_document", output_dir = getwd())
@@ -17,7 +21,7 @@ knitr::opts_chunk$set(
 ```
 
 ```{r, echo=FALSE, out.height = "139px", out.width = "120px"}
-knitr::include_graphics("../inst/logo.png")
+knitr::include_graphics("inst/logo.png")
 ```
 
 ## Load the package
@@ -168,10 +172,39 @@ get_metadata() |>
 ```
 
 ```{r, echo=FALSE, message=FALSE, warning=FALSE}
-knitr::include_graphics("../inst/NCAM1_figure.png")
+knitr::include_graphics("inst/NCAM1_figure.png")
 ```
 
-```{r}
-sessionInfo()
-```
+# Cell metadata
+
+Dataset-specific columns (definitions available at cellxgene.cziscience.com)
+
+`cell_count`, `collection_id`, `created_at.x`, `created_at.y`, `dataset_deployments`, `dataset_id`, `file_id`, `filename`, `filetype`, `is_primary_data.y`, `is_valid`, `linked_genesets`, `mean_genes_per_cell`, `name`, `published`, `published_at`, `revised_at`, `revision`, `s3_uri`, `schema_version`, `tombstone`, `updated_at.x`, `updated_at.y`, `user_submitted`, `x_normalization`
+
+Sample-specific columns (definitions available at cellxgene.cziscience.com)
+
+`.sample`, `.sample_name`, `age_days`, `assay`, `assay_ontology_term_id`, `development_stage`, `development_stage_ontology_term_id`, `ethnicity`, `ethnicity_ontology_term_id`, `experiment___`, `organism`, `organism_ontology_term_id`, `sample_placeholder`, `sex`, `sex_ontology_term_id`, `tissue`, `tissue_harmonised`, `tissue_ontology_term_id`, `disease`, `disease_ontology_term_id`, `is_primary_data.x`
+
+Cell-specific columns (definitions available at cellxgene.cziscience.com)
+
+`.cell`, `cell_type`, `cell_type_ontology_term_id`, `cell_type_harmonised`, `confidence_class`, `cell_annotation_azimuth_l2`, `cell_annotation_blueprint_singler`
+
+Through harmonisation and curation we introduced custom columns that are not present in the original CELLxGENE metadata:
+
+- `tissue_harmonised`: a coarser tissue name for better filtering
+- `age_days`: the number of days corresponding to the age
+- `cell_type_harmonised`: the consensus cell identity call (for immune cells), derived from the original annotation and three novel annotations produced with Seurat Azimuth and SingleR
+- `confidence_class`: an ordinal class describing how confident the `cell_type_harmonised` call is. 1 is complete consensus, 2 is three out of four annotations in agreement, and so on.
+- `cell_annotation_azimuth_l2`: Azimuth cell annotation
+- `cell_annotation_blueprint_singler`: SingleR cell annotation using Blueprint reference
+- `cell_annotation_blueprint_monaco`: SingleR cell annotation using Monaco reference
+- `sample_id_db`: Sample subdivision for internal use
+- `file_id_db`: File subdivision for internal use
+- `.sample`: Sample ID
+- `.sample_name`: How samples were defined
+
+# RNA abundance
+
+The `raw` assay includes RNA abundance on the positive real scale (not transformed with non-linear functions, e.g. log or sqrt). The original CELLxGENE data include a mix of scales and transformations, specified in the `x_normalization` column.
 
+The `cpm` assay includes counts per million.
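
The relationship between the `raw` and `cpm` assays added in this vignette can be illustrated with a minimal base-R sketch. It assumes that "counts per million" means each cell's counts divided by that cell's total and multiplied by one million; the commit does not show how the package computes it. The toy matrix and the `to_cpm` helper are hypothetical and are not part of `CuratedAtlasQuery`.

```r
# Toy raw-count matrix: genes in rows, cells in columns (hypothetical values).
raw <- matrix(
  c(10, 0, 90,   # cell1
    5, 5, 40),   # cell2
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("cell1", "cell2"))
)

# Hypothetical helper (an assumption, not the package's implementation):
# rescale each cell (column) so that its counts sum to one million.
to_cpm <- function(counts) {
  library_size <- colSums(counts)
  sweep(counts, 2, library_size, "/") * 1e6
}

cpm <- to_cpm(raw)

# After scaling, every cell has the same total abundance.
print(colSums(cpm))
```

Because the scaling is per cell, cells with different sequencing depths become comparable. A cell with a total count of zero would produce `NaN` values in this sketch and would need to be filtered out beforehand.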