Pavel P. Kuksa

Pavel P. Kuksa, PhD

Erdős Number: 3

Einstein Number: 5

Research Assistant Professor
Perelman School of Medicine,
University of Pennsylvania

Postdoctoral Scientist
Machine Learning Department,
NEC Laboratories America, Princeton, NJ

Ph.D. Candidate
Department of Computer Science,
Rutgers University,
110 Frelinghuysen Road,
Piscataway, NJ 08854

Curriculum vitae (CV) [pdf]
Publications
Teaching
Talks
Software
Statement of Research Interests (pdf)
Statement of Teaching Interests (pdf)

I am a Research Assistant Professor in the Department of Pathology and Laboratory Medicine at the University of Pennsylvania Perelman School of Medicine and a Senior Fellow in the Institute for Biomedical Informatics at the University of Pennsylvania. Previously, I was a postdoctoral scientist in the Machine Learning department at NEC Laboratories America, Inc. Before that, I completed my Ph.D. in the Department of Computer Science at Rutgers University, where I was affiliated with the Rutgers Sequence Analysis and Modeling Lab (SeqAM). My main research interests include applied machine learning, sequence modeling and analysis, biomedical informatics, natural language processing, algorithms, pattern recognition, text and data mining, and computer vision. I received a bachelor's degree in Computer Engineering (2002) and an M.Sc. in Information and Computer Sciences (2004) from the Bauman Moscow State Technical University, Moscow, Russia. Full biography

Education

2005-2011	PhD in Computer Science, Rutgers University. Advisor: Vladimir Pavlovic. Ph.D. thesis: Scalable Kernel Methods and Algorithms for General Sequence Analysis. Dissertation committee: Vladimir Pavlovic, Casimir Kulikowski, Alexander Schliep, Christina Leslie
2002-2004	M.Sc. in Computer Science, Bauman Moscow State Technical University
1998-2002	B.Sc. in Computer Engineering, Bauman Moscow State Technical University

Publications

See also my Google Scholar page

Curriculum vitae

Teaching

Research highlights

NEW. hipFG preprint on bioRxiv: hipFG: High-throughput harmonization and integration pipeline for functional genomics data. doi: https://doi.org/10.1101/2023.04.21.537695
NEW. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants. Human Molecular Genetics 2022.
NEW. Alzheimer's Disease Variant Portal: A Catalog of Genetic Findings for Alzheimer's Disease. Journal of Alzheimer's Disease, 2022. [ADVP website]
NEW. FILER: a framework for harmonizing and querying large-scale functional genomics knowledge. NAR Genomics and Bioinformatics 2022. [FILER website] [FILER code repository]
NEW. SparkINFERNO: a scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants. Bioinformatics, 2020. [SparkINFERNO bitbucket repository] [SparkINFERNO Docker]
NEW. HiPR: High-throughput probabilistic RNA structure inference. Computational and Structural Biotechnology Journal, 2020. [HiPR supplementary website] [HiPR GitHub repository]
NEW. HIPPIE2: a method for fine-scale identification of physically interacting chromatin regions. NAR Genomics and Bioinformatics, 2020. [preprint] [HIPPIE2 Bitbucket repository]
NEW. DASHR 2.0: integrated database of human small non-coding RNA genes and mature products. Bioinformatics, 2019. DASHR 2.0 database website
NEW. SPAR: small RNA-seq portal for analysis of sequencing experiments. Nucleic Acids Research, 2018. SPAR website
NEW. INFERNO: inferring the molecular mechanisms of noncoding genetic variants. Nucleic Acids Research, 2018. INFERNO website
NEW. The landscape of short RNAs in human cell types and tissues. ASHG 2017 (selected as top 10% Reviewers' Choice)
NEW. DASHR 2.0: database of small human non-coding RNAs. ASHG 2017
NEW. Book chapter: In Silico Identification of RNA Modifications from High-Throughput Sequencing Data Using HAMR. In A. Lusser (Ed.), RNA Methylation: Methods and Protocols (pp. 211-229). Springer, 2017 [book] [chapter pdf]
NEW. DASHR: database of small human noncoding RNAs. Nucleic Acids Research, 2016 (Database Issue). Database website
Nov 25, 2015 News article about DASHR database in RNA-Seq news Read here
Nov 20, 2015 News article about DASHR database in miRNA Research & Industry news Read here
The landscape of regulatory post-transcriptionally derived small non-coding RNAs in the human transcriptome. ASHG 2016.
INFERNO - INFERring the molecular mechanisms of NOncoding genetic variants. ASHG 2016.
NEW. Chemical Modifications Mark Alternatively Spliced and Uncapped Messenger RNAs in Arabidopsis. Plant Cell, 2015.
Nov 11, 2015 Editorial on the paper: Revealing the Elusive Plant Epitranscriptome
Nov 23, 2015 News media coverage:
Penn biologists characterize new form of mRNA regulation
New form of mRNA regulation characterized
Dec 4, 2015 PennNews: Dynamic Regulation

Modeling and prediction of inter-molecular interactions: deep learning, high-order networks, high-order kernel methods.
- NEW. High-order neural networks and kernel methods for MHC-peptide binding prediction. Bioinformatics'2015.
- High-order neural networks and kernel methods for MHC-peptide binding prediction. MLCB at NIPS'2014.
Prediction and modeling of molecular structures and intra-molecular interactions: RNA structure
Genomic interaction networks (Hi-C) and gene regulation
- NEW. HIPPIE: A high-throughput identification pipeline for promoter interacting enhancer elements. Bioinformatics'2014.
Motif finding and identification of protein-factors mediating enhancer-gene interactions
- NEW. Identifying the transcription factors mediating enhancer-target gene regulation in the human genome. ASHG'2015 [Platform talk] [Talk slides].
Prediction of disease-associated enhancer elements
- NEW. Prediction of Late-Onset Alzheimer's Disease Associated Enhancer Elements. AAIC'2015.
High-throughput sequencing for RNA structure probing
- NEW. Transcriptome-wide measurement of plant RNA secondary structure. Current Opinion in Plant Biology'2015.
Protein remote homology prediction: similarity estimation, prediction and functional and structural characterization of proteins with low similarity (remote homologs). Protein classification and ranking
Motif finding algorithms: sub-linear motif-finding algorithms, algorithms for motif finding in large-alphabet sequences
- Sublinear selection algorithms for motif finding
- Efficient motif finding algorithms for large-alphabet inputs. BMC Bioinformatics.
- Fast motif selection for biological sequences. BIBM.
Sequence modeling and classification
Spatial sample representations (SSR) and sparse spatial sample kernels (SSSK) for modeling relationships between sequences of various kinds (word sequences, DNA sequences, music sequences)
- Spatial Representation for Efficient Sequence Classification. ICPR'2010.
DNA barcoding: species identification and classification
- Efficient alignment-free DNA barcode analytics. BMC Bioinformatics.
Music classification: genre prediction, artist ID, etc.
- Efficient multivariate sequence classification
- Efficient time series classification with Multivariate similarity kernels. NYAS Machine learning [Spotlight talk].
Natural language processing
- Natural Language Processing (Almost) from Scratch. JMLR
- Semi-Supervised Abstraction-Augmented String Kernel for Multi-Level Bio-Relation Extraction. ECML.
Semi-supervised large-scale learning
- Semi-Supervised Bio-Named Entity Recognition with Word-Codebook Learning. SDM.
- Semi-Supervised Sequence Labeling with Self-Learned Features. ICDM.
Algorithms for sequence comparison and computation of string kernels
- Efficient evaluation of large sequence kernels. KDD.
- Scalable Algorithms for String Kernels with Inexact Matching. NIPS.

Selected achievements

Introduction of systematic sufficient-statistics-based algorithms for computing a general class of sequence kernels
Introduction of motif-stem search in the area of motif finding
Development of sub-linear selection algorithms for motif finding
First linear-time algorithms for sequence comparison under general (non-Hamming) similarity metrics

Contact information:

University of Pennsylvania
423 Blockley Hall

E-mail: pkuksa at upenn dot edu
pkuksa at cs dot rutgers dot edu

Web: http://pkuksa.org