Pavel P. Kuksa, PhD
I am a Research Assistant Professor in the Department of Pathology and Laboratory Medicine at the Perelman School of Medicine, University of Pennsylvania, and a Senior Fellow in the Institute for Biomedical Informatics at the University of Pennsylvania.
Previously, I was a postdoctoral scientist in the Machine Learning department at NEC Laboratories America, Inc.
Before that, I earned my Ph.D. in the Department of Computer Science at Rutgers University, where I was affiliated with the Rutgers Sequence Analysis and Modeling Lab (SeqAM).
My main research interests include applied machine learning, sequence modeling and analysis, biomedical informatics, natural language processing, algorithms, pattern recognition, text and data mining, and computer vision.
I received a bachelor's degree in Computer Engineering (2002) and an M.Sc. in Information and Computer Sciences (2004) from the Bauman Moscow State Technical University, Moscow, Russia. Full biography
See also my Google Scholar page
- NEW. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants. Human Molecular Genetics, 2022.
- NEW. Alzheimer's Disease Variant Portal: A Catalog of Genetic Findings for Alzheimer's Disease. Journal of Alzheimer's Disease, 2022. [ADVP website]
- NEW. FILER: a framework for harmonizing and querying large-scale functional genomics knowledge. NAR Genomics and Bioinformatics, 2022. [FILER website] [FILER code repository]
- NEW. SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants. Bioinformatics, 2020. [SparkINFERNO bitbucket repository] [SparkINFERNO Docker]
- NEW. HiPR: High-throughput probabilistic RNA structure inference. Computational and Structural Biotechnology Journal, 2020. [HiPR supplementary website] [HiPR GitHub repository]
- NEW. HIPPIE2: a method for fine-scale identification of physically interacting chromatin regions. NAR Genomics and Bioinformatics, 2020. [preprint] [HIPPIE2 Bitbucket repository]
- NEW. DASHR 2.0: integrated database of human small non-coding RNA genes and mature products. Bioinformatics, 2019. [DASHR 2.0 database website]
- NEW. SPAR: small RNA-seq portal for analysis of sequencing experiments. Nucleic Acids Research, 2018. [SPAR website]
- NEW. INFERNO: inferring the molecular mechanisms of noncoding genetic variants. Nucleic Acids Research, 2018. [INFERNO website]
- NEW. The landscape of short RNAs in human cell types and tissues. ASHG 2017 (selected as top 10% Reviewer's Choice)
- NEW. DASHR 2.0: database of small human non-coding RNAs. ASHG 2017
- NEW. Book chapter: In Silico Identification of RNA Modifications from High-Throughput Sequencing Data Using HAMR. In A. Lusser (Ed.), RNA Methylation: Methods and Protocols (pp. 211-229). Springer, 2017. [book] [chapter pdf]
- NEW. DASHR: database of small human noncoding RNAs. Nucleic Acids Research, 2016 (Database Issue). [Database website]
Nov 25, 2015 News article about DASHR database in RNA-Seq news Read here
Nov 20, 2015 News article about DASHR database in miRNA Research & Industry news Read here
The landscape of regulatory post-transcriptionally derived small non-coding RNAs in the human transcriptome. ASHG 2016.
INFERNO - INFERring the molecular mechanisms of NOncoding genetic variants. ASHG 2016.
- NEW. Chemical Modifications Mark Alternatively Spliced and Uncapped Messenger RNAs in Arabidopsis. Plant Cell, 2015.
Nov 11, 2015 Editorial on the paper: Revealing the Elusive Plant Epitranscriptome
Nov 23, 2015 News media coverage:
Penn biologists characterize new form of mRNA regulation
New form of mRNA regulation characterized
Dec 4, 2015 PennNews: Dynamic Regulation
Modeling and prediction of inter-molecular interactions: deep learning, high-order networks, high-order kernel methods
Prediction and modeling of molecular structures and intra-molecular interactions: RNA structure
Genomic interaction networks (Hi-C) and gene regulation
Motif finding and identification of protein-factors mediating enhancer-gene
Prediction of disease-associated enhancer elements
High-throughput sequencing for RNA structure probing
Protein remote homology prediction: similarity estimation, prediction and functional and structural characterization of proteins with low similarity (remote homologs). Protein classification and ranking
Motif finding algorithms: sub-linear motif-finding algorithms, algorithms for motif finding in large-alphabet sequences
Sequence modeling and classification
Spatial sample representations (SSR), sparse spatial sample kernels (SSSK) for modeling relationships between sequences of various kinds (word sequences, DNA sequences, music sequences)
DNA barcoding: species identification and classification
Music classification: genre prediction, artist ID, etc.
Natural language processing
Semi-supervised large-scale learning
Algorithms for sequence comparison and computation of string kernels
- Introduction of systematic sufficient-statistics-based algorithms for computing a general class of sequence kernels
- Introduction of motif-stem search in the area of motif finding
- Development of sub-linear selection algorithms for motif finding
- First linear-time algorithms for sequence comparison under general (non-Hamming) similarity metrics
University of Pennsylvania
423 Blockley Hall
pkuksa at upenn dot edu
pkuksa at cs dot rutgers dot edu