Cell line: depmap, 2024-Q2
!lamin load laminlabs/bionty-assets
✅ wrote new records from public sources.yaml to /home/zeth/.lamin/bionty/versions/sources_local.yaml!
if you see this message repeatedly, run: import bionty; bionty.base.reset_sources()
💡 connected lamindb: laminlabs/bionty-assets
import lamindb as ln
import bionty as bt
import pandas as pd
ln.context.uid = "GOgp5sRkbin90000"
ln.track()
new_ontology = ln.ULabel.filter(name="new_ontology").one()
ln.context.run.transform.ulabels.add(new_ontology)
✅ wrote new records from public sources.yaml to /home/zeth/.lamin/bionty/versions/sources_local.yaml!
if you see this message repeatedly, run: import bionty; bionty.base.reset_sources()
💡 connected lamindb: laminlabs/bionty-assets
💡 notebook imports: bionty==0.48.2 lamindb==0.76.0 pandas==2.2.2
WARNING: Skipping /home/zeth/miniconda3/envs/lamindb/lib/python3.11/site-packages/jupyterlab_widgets-3.0.10.dist-info due to invalid metadata entry 'name'
💡 loaded Transform('GOgp5sRkbin90000') & loaded Run('2024-08-21 12:49:45.216289+00:00')
Curate source
We downloaded the model.csv file of the 24Q2 release from https://depmap.org/portal/data_page/?tab=allData and read it from a local copy named depmap_q2_model.csv.
depmap_df = pd.read_csv("depmap_q2_model.csv", sep=",")
depmap_df.head(3)
| | ModelID | PatientID | CellLineName | StrippedCellLineName | DepmapModelType | OncotreeLineage | OncotreePrimaryDisease | OncotreeSubtype | OncotreeCode | LegacyMolecularSubtype | ... | EngineeredModel | TissueOrigin | ModelDerivationMaterial | PublicComments | CCLEName | HCMIID | WTSIMasterCellID | SangerModelID | COSMICID | DateSharedIndbGaP |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ACH-000001 | PT-gj46wT | NIH:OVCAR-3 | NIHOVCAR3 | HGSOC | Ovary/Fallopian Tube | Ovarian Epithelial Tumor | High-Grade Serous Ovarian Cancer | HGSOC | NaN | ... | NaN | NaN | NaN | NaN | NIHOVCAR3_OVARY | NaN | 2201.0 | SIDM00105 | 905933.0 | NaN |
1 | ACH-000002 | PT-5qa3uk | HL-60 | HL60 | AML | Myeloid | Acute Myeloid Leukemia | Acute Myeloid Leukemia | AML | NaN | ... | NaN | NaN | NaN | NaN | HL60_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE | NaN | 55.0 | SIDM00829 | 905938.0 | NaN |
2 | ACH-000003 | PT-puKIyc | CACO2 | CACO2 | COAD | Bowel | Colorectal Adenocarcinoma | Colon Adenocarcinoma | COAD | NaN | ... | NaN | NaN | NaN | NaN | CACO2_LARGE_INTESTINE | NaN | NaN | SIDM00891 | NaN | NaN |
3 rows × 43 columns
completeness_per_column = depmap_df.notna().mean() * 100
completeness_per_column
ModelID 100.000000
PatientID 100.000000
CellLineName 100.000000
StrippedCellLineName 100.000000
DepmapModelType 100.000000
OncotreeLineage 99.744768
OncotreePrimaryDisease 100.000000
OncotreeSubtype 100.000000
OncotreeCode 92.802450
LegacyMolecularSubtype 7.708014
LegacySubSubtype 42.419602
PatientMolecularSubtype 7.044410
RRID 96.324655
Age 79.428280
AgeCategory 100.000000
Sex 98.723839
PatientRace 32.363451
PrimaryOrMetastasis 83.665135
SampleCollectionSite 99.744768
SourceType 97.498724
SourceDetail 91.475242
CatalogNumber 52.935171
PatientTreatmentStatus 3.215926
PatientTreatmentType 0.153139
PatientTreatmentDetails 0.153139
Stage 0.357325
StagingSystem 0.000000
PatientTumorGrade 0.000000
PatientTreatmentResponse 0.204186
GrowthPattern 100.000000
OnboardedMedia 78.611536
FormulationID 78.611536
PlateCoating 0.000000
EngineeredModel 0.714650
TissueOrigin 0.000000
ModelDerivationMaterial 0.102093
PublicComments 4.338948
CCLEName 97.090352
HCMIID 0.612557
WTSIMasterCellID 49.821337
SangerModelID 62.072486
COSMICID 49.872384
DateSharedIndbGaP 0.000000
dtype: float64
# Drop all columns with less than 70% completeness
depmap_df = depmap_df.loc[:, completeness_per_column >= 70]
depmap_df
| | ModelID | PatientID | CellLineName | StrippedCellLineName | DepmapModelType | OncotreeLineage | OncotreePrimaryDisease | OncotreeSubtype | OncotreeCode | RRID | ... | AgeCategory | Sex | PrimaryOrMetastasis | SampleCollectionSite | SourceType | SourceDetail | GrowthPattern | OnboardedMedia | FormulationID | CCLEName |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | ACH-000001 | PT-gj46wT | NIH:OVCAR-3 | NIHOVCAR3 | HGSOC | Ovary/Fallopian Tube | Ovarian Epithelial Tumor | High-Grade Serous Ovarian Cancer | HGSOC | CVCL_0465 | ... | Adult | Female | Metastatic | ascites | ATCC | ATCC | Adherent | MF-001-041 | RPMI + 20% FBS + 0.01 mg/ml insulin | NIHOVCAR3_OVARY |
1 | ACH-000002 | PT-5qa3uk | HL-60 | HL60 | AML | Myeloid | Acute Myeloid Leukemia | Acute Myeloid Leukemia | AML | CVCL_0002 | ... | Adult | Female | Primary | haematopoietic_and_lymphoid_tissue | ATCC | ATCC | Suspension | MF-005-001 | IMDM + 10% FBS | HL60_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE |
2 | ACH-000003 | PT-puKIyc | CACO2 | CACO2 | COAD | Bowel | Colorectal Adenocarcinoma | Colon Adenocarcinoma | COAD | CVCL_0025 | ... | Adult | Male | Primary | Colon | ATCC | ATCC | Adherent | MF-015-009 | EMEM + 20% FBS | CACO2_LARGE_INTESTINE |
3 | ACH-000004 | PT-q4K2cp | HEL | HEL | AML | Myeloid | Acute Myeloid Leukemia | Acute Myeloid Leukemia | AML | CVCL_0001 | ... | Adult | Male | Primary | haematopoietic_and_lymphoid_tissue | DSMZ | DSMZ | Suspension | MF-001-001 | RPMI + 10% FBS | HEL_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE |
4 | ACH-000005 | PT-q4K2cp | HEL 92.1.7 | HEL9217 | AML | Myeloid | Acute Myeloid Leukemia | Acute Myeloid Leukemia | AML | CVCL_2481 | ... | Adult | Male | NaN | bone_marrow | ATCC | ATCC | Mixed | MF-001-001 | RPMI + 10% FBS | HEL9217_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
1954 | ACH-003161 | PT-or1hkT | ABM-T9430 | ABMT9430 | ZIMMPSC | Pancreas | Non-Cancerous | Immortalized Pancreatic Stromal Cells | NaN | NaN | ... | Unknown | NaN | NaN | pancreas | ABM | ABM | Adherent | MF-043-001 | PriGrow I (TM001) + 25 μg/ml BPE + 0.15 ng/ml ... | NaN |
1955 | ACH-003181 | PT-W75e4m | NRH-LMS1 | NRHLMS1 | LMS | Soft Tissue | Leiomyosarcoma | Leiomyosarcoma | LMS | NaN | ... | Adult | Female | Metastatic | soft_tissue | Academic lab | Oslo University Hospital-The Norwegian Radium ... | Mixed | MF-001-014 | RPMI + 5% FBS | NRH-LMS1 |
1956 | ACH-003183 | PT-BqidXH | NRH-MFS3 | NRHMFS3 | MFS | Soft Tissue | Myxofibrosarcoma | Myxofibrosarcoma | MFS | NaN | ... | Adult | Male | Primary | soft_tissue | Academic lab | Oslo University Hospital-The Norwegian Radium ... | Mixed | MF-001-014 | RPMI + 5% FBS | NRH-MFS3 |
1957 | ACH-003184 | PT-21NMVa | NRH-LMS2 | NRHLMS2 | LMS | Soft Tissue | Leiomyosarcoma | Leiomyosarcoma | LMS | NaN | ... | Adult | Female | Primary | soft_tissue | Academic lab | Oslo University Hospital-The Norwegian Radium ... | Mixed | MF-001-014 | RPMI + 5% FBS | NRH-LMS2 |
1958 | ACH-003191 | PT-B8KJKw | NRH-GCT2 | NRHGCT2 | GCTB | Bone | Giant Cell Tumor of Bone | Giant Cell Tumor of Bone | GCTB | NaN | ... | Adult | Male | Primary | soft_tissue | Academic lab | Oslo University Hospital-The Norwegian Radium ... | Mixed | MF-001-014 | RPMI + 5% FBS | NRH-GCT2 |
1959 rows × 21 columns
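The completeness filter used above can be illustrated on a small toy frame. This is a minimal sketch: `toy_df` and its values are made up for illustration and are not DepMap data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not taken from the DepMap file
toy_df = pd.DataFrame(
    {
        "ModelID": ["A", "B", "C", "D"],
        "Sex": ["Female", np.nan, "Male", "Male"],
        "Stage": [np.nan, np.nan, np.nan, "II"],
    }
)

# Share of non-missing values per column, in percent
completeness = toy_df.notna().mean() * 100

# Keep only the columns that are at least 70% complete
kept = toy_df.loc[:, completeness >= 70]
print(list(kept.columns))
```

Here `Stage` is only 25% complete and is dropped, while `Sex` (75%) passes the threshold.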
# Unfortunately, there is no reasonable 'definitions' column
depmap_df = depmap_df.rename(columns={"ModelID": "ontology_id", "CellLineName": "name"})
# No parent terms are assigned: every record gets an empty list
depmap_df["parents"] = "[]"
# Combine both alternative names into one pipe-separated synonyms column
depmap_df["synonyms"] = depmap_df["StrippedCellLineName"] + "|" + depmap_df["CCLEName"]
depmap_df = depmap_df.drop(["StrippedCellLineName", "CCLEName"], axis=1)
# Move name and synonyms to the front
cols = ["name", "synonyms"] + [col for col in depmap_df.columns if col not in ["name", "synonyms"]]
depmap_df = depmap_df[cols]
depmap_df = depmap_df.set_index("ontology_id")
# Drop patient and source metadata columns
depmap_df = depmap_df.drop(
    ["PatientID", "Age", "AgeCategory", "Sex", "SourceType", "SourceDetail"], axis=1
)
depmap_df
ontology_id | name | synonyms | DepmapModelType | OncotreeLineage | OncotreePrimaryDisease | OncotreeSubtype | OncotreeCode | RRID | PrimaryOrMetastasis | SampleCollectionSite | GrowthPattern | OnboardedMedia | FormulationID | parents |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
ACH-000001 | NIH:OVCAR-3 | NIHOVCAR3|NIHOVCAR3_OVARY | HGSOC | Ovary/Fallopian Tube | Ovarian Epithelial Tumor | High-Grade Serous Ovarian Cancer | HGSOC | CVCL_0465 | Metastatic | ascites | Adherent | MF-001-041 | RPMI + 20% FBS + 0.01 mg/ml insulin | [] |
ACH-000002 | HL-60 | HL60|HL60_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE | AML | Myeloid | Acute Myeloid Leukemia | Acute Myeloid Leukemia | AML | CVCL_0002 | Primary | haematopoietic_and_lymphoid_tissue | Suspension | MF-005-001 | IMDM + 10% FBS | [] |
ACH-000003 | CACO2 | CACO2|CACO2_LARGE_INTESTINE | COAD | Bowel | Colorectal Adenocarcinoma | Colon Adenocarcinoma | COAD | CVCL_0025 | Primary | Colon | Adherent | MF-015-009 | EMEM + 20% FBS | [] |
ACH-000004 | HEL | HEL|HEL_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE | AML | Myeloid | Acute Myeloid Leukemia | Acute Myeloid Leukemia | AML | CVCL_0001 | Primary | haematopoietic_and_lymphoid_tissue | Suspension | MF-001-001 | RPMI + 10% FBS | [] |
ACH-000005 | HEL 92.1.7 | HEL9217|HEL9217_HAEMATOPOIETIC_AND_LYMPHOID_TI... | AML | Myeloid | Acute Myeloid Leukemia | Acute Myeloid Leukemia | AML | CVCL_2481 | NaN | bone_marrow | Mixed | MF-001-001 | RPMI + 10% FBS | [] |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
ACH-003161 | ABM-T9430 | NaN | ZIMMPSC | Pancreas | Non-Cancerous | Immortalized Pancreatic Stromal Cells | NaN | NaN | NaN | pancreas | Adherent | MF-043-001 | PriGrow I (TM001) + 25 μg/ml BPE + 0.15 ng/ml ... | [] |
ACH-003181 | NRH-LMS1 | NRHLMS1|NRH-LMS1 | LMS | Soft Tissue | Leiomyosarcoma | Leiomyosarcoma | LMS | NaN | Metastatic | soft_tissue | Mixed | MF-001-014 | RPMI + 5% FBS | [] |
ACH-003183 | NRH-MFS3 | NRHMFS3|NRH-MFS3 | MFS | Soft Tissue | Myxofibrosarcoma | Myxofibrosarcoma | MFS | NaN | Primary | soft_tissue | Mixed | MF-001-014 | RPMI + 5% FBS | [] |
ACH-003184 | NRH-LMS2 | NRHLMS2|NRH-LMS2 | LMS | Soft Tissue | Leiomyosarcoma | Leiomyosarcoma | LMS | NaN | Primary | soft_tissue | Mixed | MF-001-014 | RPMI + 5% FBS | [] |
ACH-003191 | NRH-GCT2 | NRHGCT2|NRH-GCT2 | GCTB | Bone | Giant Cell Tumor of Bone | Giant Cell Tumor of Bone | GCTB | NaN | Primary | soft_tissue | Mixed | MF-001-014 | RPMI + 5% FBS | [] |
1959 rows × 14 columns
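Note that string concatenation with `+` propagates missing values, which is why ACH-003161 ends up with `NaN` synonyms even though it has a `StrippedCellLineName`. If keeping the partial synonym is preferred, one possible alternative is sketched below. This is not what the notebook ran; `join_synonyms` is a hypothetical helper and `toy_df` holds only two illustrative rows.

```python
import numpy as np
import pandas as pd


def join_synonyms(row: pd.Series) -> object:
    """Join the non-missing values of a row with '|'; return NaN if none remain."""
    values = [str(v) for v in row if pd.notna(v)]
    return "|".join(values) if values else np.nan


# Two illustrative rows; the second one has no CCLEName
toy_df = pd.DataFrame(
    {
        "StrippedCellLineName": ["HL60", "ABMT9430"],
        "CCLEName": ["HL60_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE", np.nan],
    }
)

# Plain concatenation: a missing CCLEName makes the whole result missing
plain = toy_df["StrippedCellLineName"] + "|" + toy_df["CCLEName"]

# Row-wise join that skips missing values instead
safe = toy_df[["StrippedCellLineName", "CCLEName"]].apply(join_synonyms, axis=1)
```

With this variant the second row keeps `ABMT9430` as its only synonym rather than becoming missing.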
depmap_df.to_parquet("df_all__depmap__2024-Q2__CellLine.parquet")
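Sanity checks of the following kind could be run on the curated table before it is registered. This is a sketch under assumptions: `check_curated_table` is a hypothetical helper, and the checked columns are simply the ones this notebook creates, not a documented bionty requirement.

```python
import pandas as pd


def check_curated_table(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if df.index.name != "ontology_id":
        problems.append("index is not named 'ontology_id'")
    if not df.index.is_unique:
        problems.append("ontology_id values are not unique")
    for col in ("name", "synonyms", "parents"):
        if col not in df.columns:
            problems.append(f"missing column: {col}")
    if "name" in df.columns and df["name"].isna().any():
        problems.append("some records have no name")
    return problems


# Toy table shaped like the curated frame above
toy = pd.DataFrame(
    {
        "ontology_id": ["ACH-000001", "ACH-000002"],
        "name": ["NIH:OVCAR-3", "HL-60"],
        "synonyms": ["NIHOVCAR3|NIHOVCAR3_OVARY", "HL60|HL60_HAEMATOPOIETIC_AND_LYMPHOID_TISSUE"],
        "parents": ["[]", "[]"],
    }
).set_index("ontology_id")

print(check_curated_table(toy))
```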
Register in laminlabs/bionty-assets
from bionty.core._bionty import register_source_in_bionty_assets
source_record = bt.Source.filter(name="depmap", organism="all", version="2024-Q2", entity="bionty.CellLine").one()
register_source_in_bionty_assets(filepath="df_all__depmap__2024-Q2__CellLine.parquet", source=source_record)
... uploading df_all__depmap__2024-Q2__CellLine.parquet: 100.0%
registered Source(uid='2zHO', entity='bionty.CellLine', organism='all', name='depmap', version='2024-Q2', in_db=False, currently_used=True, description='Dependency Map', url='s3://bionty-assets/df_all__depmap__2024-Q2__CellLine.parquet', md5='', source_website='https://depmap.org/portal/', created_by_id=2, dataframe_artifact_id=180, updated_at='2024-08-21 13:28:33 UTC') with dataframe Artifact(uid='ImWoMC3V3jU2WFCztblE', is_latest=True, key='df_all__depmap__2024-Q2__CellLine.parquet', suffix='.parquet', size=110099, hash='Ic6bd5W9ImRj0ZAbM_1zww', _hash_type='md5', visibility=1, _key_is_virtual=False, created_by_id=2, storage_id=1, transform_id=10, run_id=11, updated_at='2024-08-21 13:28:27 UTC')
Artifact(uid='ImWoMC3V3jU2WFCztblE', is_latest=True, key='df_all__depmap__2024-Q2__CellLine.parquet', suffix='.parquet', size=110099, hash='Ic6bd5W9ImRj0ZAbM_1zww', _hash_type='md5', visibility=1, _key_is_virtual=False, created_by_id=2, storage_id=1, transform_id=10, run_id=11, updated_at='2024-08-21 13:28:27 UTC')
ln.finish()
❗ cells [(9, 11), (11, 14)] were not run consecutively