Pavel P. Kuksa's Publications


FILER: a framework for harmonizing and querying large-scale functional genomics knowledge

Pavel P Kuksa, Yuk Yee Leung, Prabhakaran Gangadharan, Zivadin Katanic, Lauren Kleidermacher, Alexandre Amlie-Wolf, Chien-Yueh Lee, Liming Qu, Emily Greenfest-Allen, Otto Valladares, and Li-San Wang. FILER: a framework for harmonizing and querying large-scale functional genomics knowledge. NAR Genomics and Bioinformatics, 4(1), January 2022.

Abstract

Querying massive functional genomic and annotation data collections, and linking and summarizing the query results across data sources/data types, are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge with a large, curated integrated catalog of harmonized functional genomic and annotation data coupled with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) the ability to analyze and annotate users' experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying 7 × 10^9 hg19 FILER records shows that FILER is highly scalable, with a sub-linear 32-fold increase in querying time when the number of queries increases 1000-fold, from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).
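To make the "sub-linear" benchmark figure concrete, the arithmetic below converts it into an effective scaling exponent. This is a back-of-the-envelope sketch, not an analysis from the paper: it assumes querying time follows a simple power law, T(n) ∝ n^α, in the number of query intervals n, and fits α from only the two endpoints quoted above (1000 and 1 000 000 intervals, 32-fold time increase).

```latex
% Assumption: T(n) \propto n^{\alpha}, fitted from the two quoted endpoints only
\[
\frac{T(10^{6})}{T(10^{3})} = 32 = \left(\frac{10^{6}}{10^{3}}\right)^{\alpha} = 1000^{\alpha}
\quad\Longrightarrow\quad
\alpha = \frac{\ln 32}{\ln 1000} = \frac{5\ln 2}{3\ln 10} \approx 0.50
\]
% Per-query cost T(n)/n therefore drops by a factor of about 1000/32 \approx 31
```

Under this assumption, total querying time grows roughly with the square root of the number of queried intervals (α < 1 is what "sub-linear" means here), and the average time per interval falls about 31-fold over the benchmarked range.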

BibTeX

@article{nargab2022filer,
	abstract = {Querying massive functional genomic and annotation data collections, and linking and summarizing the query results across data sources/data types, are important steps in high-throughput genomic and genetic analytical workflows. However, these steps are made difficult by the heterogeneity and breadth of data sources, experimental assays, biological conditions/tissues/cell types and file formats. FILER (FunctIonaL gEnomics Repository) is a framework for querying large-scale genomics knowledge with a large, curated integrated catalog of harmonized functional genomic and annotation data coupled with a scalable genomic search and querying interface. FILER uniquely provides: (i) streamlined access to >50 000 harmonized, annotated genomic datasets across >20 integrated data sources, >1100 tissues/cell types and >20 experimental assays; (ii) a scalable genomic querying interface; and (iii) the ability to analyze and annotate users' experimental data. This rich resource spans >17 billion GRCh37/hg19 and GRCh38/hg38 genomic records. Our benchmark querying $7 \times 10^{9}$ hg19 FILER records shows that FILER is highly scalable, with a sub-linear 32-fold increase in querying time when the number of queries increases 1000-fold, from 1000 to 1 000 000 intervals. Together, these features facilitate reproducible research and streamline integrating/querying large-scale genomic data within analyses/workflows. FILER can be deployed on cloud or local servers (https://bitbucket.org/wanglab-upenn/FILER) for integration with custom pipelines and is freely available (https://lisanwanglab.org/FILER).},
	author = {Kuksa, Pavel P and Leung, Yuk Yee and Gangadharan, Prabhakaran and Katanic, Zivadin and Kleidermacher, Lauren and Amlie-Wolf, Alexandre and Lee, Chien-Yueh and Qu, Liming and Greenfest-Allen, Emily and Valladares, Otto and Wang, Li-San},
	bib2html_pubtype = {Journal},
	date-added = {2022-01-26 12:29:57 -0500},
	date-modified = {2022-01-26 12:29:57 -0500},
	doi = {10.1093/nargab/lqab123},
	issn = {2631-9268},
	journal = {NAR Genomics and Bioinformatics},
	mendeley-groups = {Inferno},
	month = jan,
	number = {1},
	title = {{FILER: a framework for harmonizing and querying large-scale functional genomics knowledge}},
	url = {https://academic.oup.com/nargab/article/doi/10.1093/nargab/lqab123/6507423},
	volume = {4},
	year = {2022},
	bdsk-url-1 = {https://academic.oup.com/nargab/article/doi/10.1093/nargab/lqab123/6507423},
	bdsk-url-2 = {https://doi.org/10.1093/nargab/lqab123}}

Generated by bib2html.pl (written by Patrick Riley) on Mon Feb 07, 2022 18:48:31