For chembl.drug_mechanisms, can we add CHEMBL.TARGET ID mapping to UniProt? #151
Comments
Example from Andrew: The chemical Evacetrapib has a chembl drug mechanism entry that says the target is CHEMBL3572. The ChEMBL website shows a UniProt accession for this CHEMBL3572: P11597. So it seems like CHEMBL.TARGET -> UniProtKB mappings exist. |
Source Data
It's not necessary to use the csv/tsv that Chunlei suggested. We can collect the accessions from the target data directly. E.g. from the dumped target data:
{
'organism': 'Homo sapiens',
'pref_name': 'Prostanoid IP receptor',
'target_chembl_id': 'CHEMBL1995',
'target_components': [
{
'accession': 'P43119',
'component_description': 'Prostacyclin receptor',
'component_id': 325,
'component_type': 'PROTEIN',
'relationship': 'SINGLE PROTEIN',
}
],
'target_type': 'SINGLE PROTEIN',
'tax_id': 9606
}
The result is identical to the accession shown on https://www.ebi.ac.uk/chembl/g/#search_results/all/query=CHEMBL1995
Special Cases
The Fix
The initial fix is committed to branch issue-151-fix.
Sample document:
{
'_id': 'QXWZQTURMXZVHJ-UHFFFAOYSA-N',
'chembl': {
'molecule_chembl_id': 'CHEMBL238804',
'inchi_key': 'QXWZQTURMXZVHJ-UHFFFAOYSA-N',
'smiles': 'CC(C)N(CCCCOCC(=O)NS(C)(=O)=O)c1cnc(-c2ccccc2)c(-c2ccccc2)n1',
'inchi': 'InChI=1S/C26H32N4O4S/c1-20(2)30(16-10-11-17-34-19-24(31)29-35(3,32)33)23-18-27-25(21-12-6-4-7-13-21)26(28-23)22-14-8-5-9-15-22/h4-9,12-15,18,20H,10-11,16-17,19H2,1-3H3,(H,29,31)',
'drug_indications': [ ... ],
'drug_mechanisms': [
{
'action_type': 'AGONIST',
'mechanism_refs': [
{
'id': 'label/2015/207947s000lbl.pdf',
'type': 'FDA',
'url': 'http://www.accessdata.fda.gov/drugsatfda_docs/label/2015/207947s000lbl.pdf',
'FDA': 'label/2015/207947s000lbl.pdf'
}
],
'target_chembl_id': 'CHEMBL1995',
'target_components': ['P43119'],
'target_type': 'SINGLE PROTEIN',
'target_organism': 'Homo sapiens',
'target_name': 'Prostanoid IP receptor'
}
]
}
}
TODO: field structure
Several questions to @colleenXu:
|
Some questions:
|
@erikyao Feedback on the ENSEMBL ID part: The ENSG ID and that entity (microRNA 30a) don't seem to exist in UniProt, which I guess makes sense since it doesn't seem to code for a protein... I guess the structure I proposed above (which I edited since I made a mistake) would still work... |
Thank you, @colleenXu
From the data I parsed from https://www.ebi.ac.uk/chembl/api/data/target.json, a component is either None, an Ensembl Gene ID, or a UniProt accession ID.
Sure, I can make it in the parser. |
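The three cases above (no accession, an Ensembl Gene ID, or a UniProt accession) could be separated in the parser roughly as follows. This is only a minimal sketch, not the code on the fix branch: the function names, the output keys `uniprot` and `ensembl_gene`, and the regular expression used to recognize Ensembl Gene IDs are all assumptions made for illustration.

```python
import re

# Assumption: Ensembl gene IDs look like "ENSG" (optionally with a species
# infix, e.g. "ENSMUSG") followed by 11 digits. Any other non-empty accession
# is treated as a UniProt accession.
ENSEMBL_GENE_ID = re.compile(r"^ENS[A-Z]*G\d{11}$")


def classify_components(target):
    """Split the component accessions of one target record by ID namespace."""
    result = {"uniprot": [], "ensembl_gene": []}
    for component in target.get("target_components") or []:
        accession = component.get("accession")
        if not accession:
            continue  # component with no accession (None in the dump)
        key = "ensembl_gene" if ENSEMBL_GENE_ID.match(accession) else "uniprot"
        result[key].append(accession)
    return result


def build_target_map(targets):
    """Map each target_chembl_id to its classified accessions."""
    return {t["target_chembl_id"]: classify_components(t) for t in targets}


# The target record quoted earlier in this thread.
sample_target = {
    "organism": "Homo sapiens",
    "pref_name": "Prostanoid IP receptor",
    "target_chembl_id": "CHEMBL1995",
    "target_components": [
        {
            "accession": "P43119",
            "component_description": "Prostacyclin receptor",
            "component_id": 325,
            "component_type": "PROTEIN",
            "relationship": "SINGLE PROTEIN",
        }
    ],
    "target_type": "SINGLE PROTEIN",
    "tax_id": 9606,
}

target_map = build_target_map([sample_target])
print(target_map["CHEMBL1995"])  # {'uniprot': ['P43119'], 'ensembl_gene': []}
```

A lookup table keyed by `target_chembl_id` keeps the join with each `drug_mechanisms` entry to a single dictionary access per mechanism.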
Nope, my current code cannot fix this problem. Do you have any idea (like a second file/API to fill the blanks)? |
|
I am not sure... and I think
My latest code will remove those null fields. Those documents will be kept. |
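For illustration, dropping null fields while keeping the document could look like the sketch below. It is not the actual parser code: `drop_null_fields` is a hypothetical helper, and the example document (including its `_id`) is made up to show the shape of the problem.

```python
def drop_null_fields(value):
    """Recursively remove None values, and containers left empty, from dicts and lists."""
    empty = (None, {}, [])
    if isinstance(value, dict):
        cleaned = {k: drop_null_fields(v) for k, v in value.items()}
        return {k: v for k, v in cleaned.items() if v not in empty}
    if isinstance(value, list):
        cleaned = [drop_null_fields(v) for v in value]
        return [v for v in cleaned if v not in empty]
    return value


# Hypothetical document: a mechanism entry whose target info is missing.
doc = {
    "_id": "EXAMPLE-INCHIKEY",
    "chembl": {
        "drug_mechanisms": [
            {
                "action_type": "AGONIST",
                "target_chembl_id": None,
                "target_name": None,
            }
        ]
    },
}

print(drop_null_fields(doc))
# {'_id': 'EXAMPLE-INCHIKEY', 'chembl': {'drug_mechanisms': [{'action_type': 'AGONIST'}]}}
```

Only `None` and empty containers are removed, so legitimate falsy values such as `0` or an empty string survive.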
Here are the ChEMBL pages for some chemicals that have this "null" issue. It looks like ChEMBL's website also lacks the target info / any mechanism info. |
Yep, I think we can first align our documents to the ChEMBL report cards, and then fill the blanks in the future if necessary. |
Side note: I found a possible error in CHEMBL, and have reported it to chembl/GLaDOS#1310 |
Check the latest release of MyChem for The |
Looks good to me. @erikyao, can we close this issue? |
@colleenXu issue closed.
|
@erikyao I'm confused. I see 60 documents that seem to have both |
@colleenXu sorry, my bad. I found I was analyzing only a subset of the source data... Fortunately, we have both the uniprot and ensembl fields indexed. |
I asked here biothings/pending.api#100 whether we could make a new pending API for the chembl drug mechanisms data https://mychem.info/v1/query?q=_exists_:%22chembl.drug_mechanisms%22.
One reason is that the current API uses CHEMBL.TARGET IDs (this target entity is involved in the drug mechanism of the chembl compound X). It looks like the CHEMBL.TARGETs are mostly Gene/Protein entities. It would be easier to use if we used a more universal ID namespace...
Related: Chunlei has found mappings between CHEMBL.TARGET and UniProtKB IDs here biothings/mygene.info#105 (comment)
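As a small aid for reproducing the query linked above, the URL can be built programmatically. This is a sketch only: `exists_query_url` is a hypothetical helper, and it just constructs the URL string without sending a request.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://mychem.info/v1/query"


def exists_query_url(field, **params):
    """Build a query URL matching documents in which `field` exists."""
    query = {"q": f'_exists_:"{field}"', **params}
    # quote (rather than the default quote_plus) with ':' kept unescaped
    # reproduces the URL form used in this issue.
    return f"{BASE_URL}?{urlencode(query, quote_via=quote, safe=':')}"


print(exists_query_url("chembl.drug_mechanisms"))
# https://mychem.info/v1/query?q=_exists_:%22chembl.drug_mechanisms%22
```

Extra keyword arguments are passed through as query parameters, so something like `exists_query_url("chembl.drug_mechanisms", size=10)` would append `&size=10`.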