Pavel P. Kuksa

Research associate
Department of Pathology and Laboratory Medicine
Institute for Biomedical Informatics
Perelman School of Medicine
University of Pennsylvania

Publications:

See also my Google Scholar page

    Journal Publications

  1. Pavel P. Kuksa, Chia-Lun Liu, Wei Fu, Liming Qu, Yi Zhao, Zivadin Katanic, Kaylyn Clark, Amanda B. Kuzma, Pei-Chuan Ho, Kai-Teh Tzeng, Otto Valladares, Shin-Yi Chou, Adam C. Naj, Gerard D. Schellenberg, Li-San Wang, and Yuk Yee Leung. Alzheimer's Disease Variant Portal: A Catalog of Genetic Findings for Alzheimer's Disease. Journal of Alzheimer's Disease, preprint, pp. 1–17, IOS Press, 2022.
    Details     BibTeX    [pdf] [URL] 
  2. Pavel P. Kuksa, Yuk Yee Leung, Prabhakaran Gangadharan, Zivadin Katanic, Lauren Kleidermacher, Alexandre Amlie-Wolf, Chien-Yueh Lee, Liming Qu, Emily Greenfest-Allen, Otto Valladares, and Li-San Wang. FILER: a framework for harmonizing and querying large-scale functional genomics knowledge. NAR Genomics and Bioinformatics, 4(1), January 2022.
    Details     BibTeX    [pdf] [URL] 
  3. Pavel P. Kuksa, Chien-Yueh Lee, Alexandre Amlie-Wolf, Prabhakaran Gangadharan, Elizabeth E Mlynarski, Yi-Fan Chou, Han-Jen Lin, Heather Issen, Emily Greenfest-Allen, Otto Valladares, Yuk Yee Leung, and Li-San Wang. SparkINFERNO: A scalable high-throughput pipeline for inferring molecular mechanisms of non-coding genetic variants. Bioinformatics, April 2020.
    Details     BibTeX    [pdf] [URL] 
  4. Pavel P. Kuksa, Alexandre Amlie-Wolf, Yih-Chii Hwang, Otto Valladares, Brian D. Gregory, and Li-San Wang. HIPPIE2: a method for fine-scale identification of physically interacting chromatin regions. NAR Genomics and Bioinformatics, 2020.
    Details     BibTeX    [pdf] (2.0MB )  [URL] 
  5. Pavel P. Kuksa, Fan Li, Sampath Kannan, Brian D. Gregory, Yuk Yee Leung, and Li-San Wang. HiPR: High-throughput probabilistic RNA structure inference. Computational and Structural Biotechnology Journal, 18:1539–1547, 2020.
    Details     BibTeX    [pdf] [URL] 
  6. Pavel P Kuksa, Alexandre Amlie-Wolf, Zivadin Katanic, Otto Valladares, Li-San Wang, and Yuk Yee Leung. DASHR 2.0: integrated database of human small non-coding RNA genes and mature products. Bioinformatics, 35(6):1033–1039, Mar 2019.
    Details     BibTeX    [pdf] [URL] 
  7. Alexandre Amlie-Wolf, Mitchell Tang, Elisabeth E Mlynarski, Pavel P. Kuksa, Otto Valladares, Zivadin Katanic, Debby Tsuang, Christopher D Brown, Gerard D Schellenberg, and Li-San Wang. INFERNO: inferring the molecular mechanisms of noncoding genetic variants. Nucleic Acids Research, 46(17):8740–8753, 2018.
    Details     BibTeX    [pdf] [URL] 
  8. Pavel P Kuksa, Alexandre Amlie-Wolf, Živadin Katanić, Otto Valladares, Li-San Wang, and Yuk Yee Leung. SPAR: small RNA-seq portal for analysis of sequencing experiments. Nucleic Acids Research, 46(W1):W36–W42, 2018.
    Details     BibTeX    [pdf] [URL] 
  9. Yuk Yee Leung*, Pavel P. Kuksa*, Alexandre Amlie-Wolf, Otto Valladares, Lyle H. Ungar, Sampath Kannan, Brian D. Gregory, and Li-San Wang. DASHR: database of small human noncoding RNAs. Nucleic Acids Research (Database Issue), 2016.
    Details     BibTeX    [pdf] [URL] 
  10. Lee E. Vandivier, Rafael Campos, Pavel P. Kuksa, Ian M. Silverman, Li-San Wang, and Brian D. Gregory. Chemical Modifications Mark Alternatively Spliced and Uncapped Messenger RNAs in Arabidopsis. The Plant Cell, 27(11):3024–3037, 2015.
    Details     BibTeX    [pdf] [URL] 
  11. Pavel P. Kuksa, Martin Renqiang Min, Rishabh Dugar, and Mark Gerstein. High-order neural networks and kernel methods for peptide-MHC binding prediction. Bioinformatics, 31(22):3600–3607, 2015.
    Details     BibTeX    [pdf] [URL] 
  12. Shawn W Foley, Lee E Vandivier, Pavel P Kuksa, and Brian D Gregory. Transcriptome-wide measurement of plant RNA secondary structure. Current Opinion in Plant Biology (Cell signalling and gene regulation), 27:36–43, 2015.
    Details     BibTeX    [pdf] [URL] 
  13. Yih-Chii Hwang, Chiao-Feng Lin, Otto Valladares, John Malamon, Pavel Kuksa, Qi Zheng, Brian D. Gregory, and Li-San Wang. HIPPIE: A high-throughput identification pipeline for promoter interacting enhancer elements. Bioinformatics, 2014.
    Details     BibTeX    [pdf] (218.2kB )  [URL] 
  14. Pavel P. Kuksa. Biological Sequence Analysis with Multivariate String Kernels. IEEE/ACM Transactions on Computational Biology and Bioinformatics, March 2013.
    Details     BibTeX    [pdf] (396.0kB )  
  15. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Efficient use of unlabeled data for protein sequence classification: a comparative study. BMC Bioinformatics, 10(Suppl 4):S2, 2009. Impact factor: 3.78
    Details     BibTeX    [pdf] (173.3kB )  [URL] 
  16. Pavel Kuksa and Vladimir Pavlovic. Efficient alignment-free DNA barcode analytics. BMC Bioinformatics, 10(Suppl 14):S9, 2009. Impact factor: 3.78
    Details     BibTeX    [pdf] (1.8MB )  [URL]  [Supplementary Material]
  17. Pavel Kuksa and Vladimir Pavlovic. Efficient motif finding algorithms for large-alphabet inputs. BMC Bioinformatics, 11(Suppl 8):S1, 2010.
    Details     BibTeX    [pdf] (597.3kB )  [URL] 
  18. Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493–2537, 2011.
    Details     BibTeX    [pdf] (414.8kB )  

    Refereed Conferences

  20. Pavel P. Kuksa, Prabhakaran Gangadharan, Chien-Yueh Lee, Yi-Fan Chou, Emily Greenfest-Allen, Han-Jen Lin, Z. Katanic, Otto Valladares, Yuk Yee Leung, and Li-San Wang. GADB: Large-scale, curated Functional Genomics Annotation Database. In American Society of Human Genetics Annual Meeting (ASHG), 2019.
    Details     BibTeX     (unavailable)
  21. C.-Y. Lee*, Pavel P. Kuksa*, A. Amlie-Wolf, E.E. Mlynarski, Y.-F. Chou, H.-J. Lin, E. Greenfest-Allen, Z. Katanic, O. Valladares, A. Kuzma, A. Naj, G.D. Schellenberg, Y.Y. Leung, L.-S. Wang, and Alzheimer's Disease Sequencing Project. INFERNO2: Scalable Spark-based framework for inferring dysregulated enhancer and noncoding RNAs for WGS and GWAS data. In American Society of Human Genetics Annual Meeting (ASHG), 2019.
    Details     BibTeX     (unavailable)
  22. Y.Y. Leung, Pavel P. Kuksa, C.-Y. Lee, Y.-F. Chou, A. Amlie-Wolf, G.D. Schellenberg, and L.-S. Wang. Non-coding regulatory landscape of Alzheimer's disease variants using GWAS of 63,926 individuals. In American Society of Human Genetics Annual Meeting (ASHG), 2019.
    Details     BibTeX     (unavailable)
  23. Pavel P. Kuksa*, A. Amlie-Wolf*, Y.-C. Hwang*, B. D. Gregory, and L.-S. Wang. Hi-C-based characterization of the landscape of physically interacting regions and interaction mechanisms across six human cell lines using HiPPIE2. In American Society of Human Genetics Annual Meeting (ASHG), 2018.
    Details     BibTeX     (unavailable)
  24. E.E. Mlynarski, A. Amlie-Wolf, Pavel P. Kuksa, O. Valladares, G.D. Schellenberg, and L.-S. Wang. SV-INFERNO: A Spark based pipeline for INFERring the molecular mechanisms of NOncoding structural variants. In American Society of Human Genetics Annual Meeting (ASHG), 2018.
    Details     BibTeX     (unavailable)
  25. Pavel P. Kuksa, Y.Y. Leung, A. Amlie-Wolf, O. Valladares, and L.-S. Wang. DASHR 2.0: Database of small non-coding RNAs in normal human tissues and cell types. In American Society of Human Genetics Annual Meeting (ASHG), 2017.
    Details     BibTeX    [pdf] (133.7kB )  
  26. Y.Y. Leung*, Pavel P. Kuksa*, A. Amlie-Wolf, and L.-S. Wang. The landscape of short RNAs in human cell types and tissues. In American Society of Human Genetics Annual Meeting (ASHG), 2017. Top 10% Reviewers Choice
    Details     BibTeX    [pdf] (164.9kB )  
  27. A. Amlie-Wolf, M. Tang, Pavel P. Kuksa, Y.Y. Leung, B. Slaff, J. King, B. Dombroski, G.D. Schellenberg, and L.-S. Wang. INFERNO – INFERring the molecular mechanisms of NOncoding genetic variants. In American Society of Human Genetics Annual Meeting (ASHG), 2016.
    Details     BibTeX     (unavailable)
  28. Y.Y. Leung, Pavel P. Kuksa, A. Amlie-Wolf, and L.-S. Wang. The landscape of regulatory post-transcriptionally derived small non-coding RNAs in the human transcriptome. In American Society of Human Genetics Annual Meeting (ASHG), 2016.
    Details     BibTeX     (unavailable)
  29. Pavel P. Kuksa, Martin Renqiang Min, Rishabh Dugar, and Mark Gerstein. High-order neural networks and kernel methods for MHC-peptide binding prediction. In NIPS Machine Learning in Computational Biology, 2014.
    Details     BibTeX    [pdf] (328.0kB )  
  30. Y.-C. Hwang*, P. P. Kuksa*, B. D. Gregory, L.-S. Wang. Identifying the transcription factors mediating enhancer–target gene regulation in the human genome. In American Society of Human Genetics Annual Meeting (ASHG), 2015. (Platform talk)
    Details     BibTeX    [URL] 
  31. Mitchell Tang, Christian Kramer, George Xu, Michele Hawk, Yih-Chii Hwang, Chiao-Feng Lin, Pavel Kuksa, Weixin Wang, Beth A. Dombroski, Adam C. Naj, Li-San Wang, and Gerard D. Schellenberg. Prediction of Late-Onset Alzheimer's Disease Associated Enhancer Elements. In Alzheimer's Association International Conference (AAIC), 2015.
    Details     BibTeX    [URL] 
  32. P. P. Kuksa, Y. Y. Leung, A. Amlie-Wolf, B. D. Gregory, L.-S. Wang. SPAR: Sequencing-based pipeline for annotating novel small non-coding RNAs. In American Society of Human Genetics Annual Meeting (ASHG), 2015.
    Details     BibTeX    [URL] 
  33. Y. Y. Leung*, P. P. Kuksa*, A. Amlie-Wolf, O. Valladares, B. D. Gregory, L.-S. Wang. DASHR - Database of small human non-coding RNAs. In American Society of Human Genetics Annual Meeting (ASHG), 2015.
    Details     BibTeX    [URL] 
  34. Pavel P. Kuksa. Efficient multivariate sequence classification. In CoRR abs/1409.8211, 2014.
    Details     BibTeX    [pdf] (357.2kB )  
  35. Pavel P. Kuksa and Vladimir Pavlovic. Efficient evaluation of large sequence kernels. In KDD, 2012 (oral presentation). Acceptance rate: 133/755 (17.6%)
    Details     BibTeX    [pdf] (303.6kB )  
  36. Pavel P. Kuksa, Imdadullah Khan, and Vladimir Pavlovic. Generalized Similarity Kernels for Efficient Sequence Classification. In SDM, 2012. Acceptance rate: 99/362 (27%)
    Details     BibTeX    [pdf] (315.8kB )  
  37. Pavel P. Kuksa. Efficient sequence kernel-based genome-wide prediction of transcription factors. In ICPR, 2012.
    Details     BibTeX    [pdf] (168.7kB )  
  38. Pavel P. Kuksa. 2D similarity kernels for biological sequence classification. In BIOKDD, 2012.
    Details     BibTeX    [pdf] (222.4kB )  
  39. Pavel P. Kuksa. Efficient time series classification with Multivariate similarity kernels. In NYAS Machine Learning Symposium, 2012. Oral presentation
    Details     BibTeX    [pdf] (180.7kB )  
  40. Pavel Kuksa and Yanjun Qi. Semi-Supervised Bio-Named Entity Recognition with Word-Codebook Learning. In SDM, 2010. Acceptance rate: 82/351 (23%)
    Details     BibTeX    [pdf] (394.1kB )  
  41. Pavel P. Kuksa, Yanjun Qi, Bing Bai, Ronan Collobert, Jason Weston, Vladimir Pavlovic, and Xia Ning. Semi-Supervised Abstraction-Augmented String Kernel for Multi-Level Bio-Relation Extraction. In ECML, 2010. Acceptance rate: 106/658 (16%)
    Details     BibTeX    [pdf] (199.4kB )  
  42. Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kavukcuoglu, and Jason Weston. Combining labeled and unlabeled data with word-class distribution learning. In Proceedings of the 18th ACM Conference on Information and Knowledge Management (CIKM 2009), pp. 1737–1740, 2009. Acceptance rate: (123+171)/847 (20% for short papers)
    Details     BibTeX    [pdf] (142.2kB )  [URL] 
  43. Yanjun Qi, Pavel P. Kuksa, Ronan Collobert, Kunihiko Sadamasa, Koray Kavukcuoglu, and Jason Weston. Semi-Supervised Sequence Labeling with Self-Learned Features. In Proc. International Conference on Data Mining (ICDM'09), IEEE, 2009. Acceptance rate: 8.9% regular (70/786)
    Details     BibTeX    [pdf] (231.8kB )  
  44. Pavel P. Kuksa and Vladimir Pavlovic. Efficient Motif Finding Algorithms for Large-Alphabet Inputs. In BIOKDD, 2010. Acceptance rate: 7/29 regular (24%)
    Details     BibTeX    [pdf] (442.7kB )  
  45. Pavel Kuksa and Vladimir Pavlovic. Fast motif selection for biological sequences. In IEEE International Conference on Bioinformatics and Biomedicine BIBM'09, 2009. Acceptance rate: (44+37)/233 (35%)
    Details     BibTeX    [pdf] (159.6kB )  [URL] 
  46. Pavel P. Kuksa and Vladimir Pavlovic. Spatial Representation for Efficient Sequence Classification. In ICPR, 2010. Acceptance rate: 385/2140 oral (18%)
    Details     BibTeX    [pdf] (144.0kB )  
  47. Pavel Kuksa and Vladimir Pavlovic. Efficient Alignment-free Barcode Analytics. In Third International Barcode of Life Conference, 2009.
    Details     BibTeX     [URL] 
  48. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Scalable Algorithms for String Kernels with Inexact Matching. In NIPS, 2008. Spotlight Presentation. Acceptance rate: 123/1022 (12%)
    Details     BibTeX    [pdf] (111.0kB )  [supplementary data]
  49. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. On the role of local matching for efficient semi-supervised protein sequence classification. In BIBM, 2008. Acceptance rate: 38/156 (24%)
    Details     BibTeX    [pdf] (133.0kB )  
  50. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. A fast, semi-supervised learning method for protein sequence classification. In 8th International Workshop on Data Mining in Bioinformatics (BIOKDD 2008), pp. 29–37, 2008. Acceptance rate: 8/25 (32%)
    Details     BibTeX    [pdf] (197.0kB )  
  51. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Fast and Accurate Multi-class Protein Fold Recognition with Spatial Sample Kernels. In Computational Systems Bioinformatics: Proceedings of the CSB2008 Conference, pp. 133–143, 2008. Acceptance rate: 30/135 (22%)
    Details     BibTeX    [pdf] (399.1kB )   [supplementary materials]
  52. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Fast Protein Homology and Fold Detection with Sparse Spatial Sample Kernels. In 19th International Conference on Pattern Recognition ICPR 2008, 2008. Acceptance rate: 18% (oral). Best paper nominee
    Details     BibTeX    [pdf] (89.0kB )  [supplementary materials]
  53. Pavel Kuksa and Vladimir Pavlovic. Fast Barcode-Based Species Identification Using String Kernels. In Second International Barcode of Life Conference, 2007. Acceptance rate: 30% (oral)
    Details     BibTeX    [URL] 
  54. Pavel Kuksa and Vladimir Pavlovic. Fast Kernel Methods for SVM Sequence Classifiers. In WABI, pp. 228–239, 2007. Acceptance rate: 37/131 (28%)
    Details     BibTeX    [pdf] (145.2kB )  

    Workshop Papers

  56. Pavel P. Kuksa. Using string kernels to predict gene expression. Snowbird Learning Workshop, Snowbird, Utah, April 2012.
    Details     BibTeX    [pdf] (93.4kB )  
  57. Pavel P. Kuksa. 2D similarity kernels and representations for sequence data. Snowbird Learning Workshop, Snowbird, Utah, April 2012.
    Details     BibTeX    [pdf] (225.4kB )  
  58. Pavel Kuksa and Vladimir Pavlovic. Efficient evaluation of large sequence kernels. In NYAS Machine Learning Symposium, 2011.
    Details     BibTeX    [pdf] (203.4kB )  
  59. Pavel Kuksa and Vladimir Pavlovic. Efficient Sequence Classification with Spatial Representations. In Snowbird Learning Workshop, April 2010. Oral presentation.
    Details     BibTeX    [pdf] (22.9kB )  
  60. Vladimir Pavlovic and Pavel Kuksa. Large scale sequence analytics. In Center for Dynamic Data Analytics (CDDA) Workshop, January 25–26, 2010.
    Details     BibTeX    [URL] 
  61. Jason Weston, Ronan Collobert, Frederic Ratle, Hossein Mobahi, Pavel Kuksa, and Koray Kavukcuoglu. Deep Learning via Semi-Supervised Embedding. In ICML 2009 Workshop on Learning Feature Hierarchies, 2009.
    Details     BibTeX    [URL] 
  62. Pavel Kuksa and Vladimir Pavlovic. Efficient Discovery of Common Patterns in Sequences. Snowbird Learning Workshop, Clearwater, Florida, April 13–16, 2009.
    Details     BibTeX    [pdf] (100.8kB )  
  63. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. High Performance Sequence Classification with Novel Spatial Sample Embedding. 3rd Annual Machine Learning Symposium, NY, Oct 10, 2008.
    Details     BibTeX    [URL] 
  64. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Spatially-constrained sample kernel for sequence classification. Snowbird Learning Workshop, Utah, April 1–4, 2008.
    Details     BibTeX    [pdf] (38.6kB)  [poster]
  65. Pavel Kuksa and Vladimir Pavlovic. Kernel methods for DNA barcoding. Snowbird Learning Workshop, San Juan, Puerto Rico, March 2007.
    Details     BibTeX    [URL] 

    Technical Reports

  67. Pavel Kuksa and Vladimir Pavlovic. Sublinear selection algorithms for motif finding. DIMACS, 2010.
    [pdf]
  68. Pavel Kuksa and Vladimir Pavlovic. Efficient discovery of common patterns in sequences over large alphabets. Technical Report 2009-15, DIMACS, 2009.
    Details     BibTeX    [pdf] (395.2kB )  
  69. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Kernel Methods and Algorithms for General Sequence Analysis. Technical Report DCS-TR-630, Rutgers University, 2008.
    Details     BibTeX    [URL] 
  70. Pai-Hsi Huang, Pavel Kuksa, and Vladimir Pavlovic. Fast and accurate semi-supervised protein homology detection with large uncurated sequence databases. Technical Report RU-DCS-TR634, Rutgers University, 2008.
    Details     BibTeX    [pdf] (218.6kB )  
  71. Robert S. Moore, Richard Howard, Pavel Kuksa, and Richard P. Martin. A Geometric Approach to Device-Free Motion Localization Using Signal Strength. Technical Report DCS-TR-674, Rutgers University, 2010.
    Details     BibTeX    [pdf] (456.2kB )  
  72. Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural Language Processing (almost) from Scratch. arXiv:1103.0398v1, 2011.
    Details     BibTeX    [pdf] (726.8kB )  
  73. Pavel P. Kuksa, Imdadullah Khan, and Vladimir Pavlovic. Generalized Similarity Kernels for Efficient Sequence Classification. Technical Report RU-DCS-TR684, Rutgers University, 2011.
    Details     BibTeX    [pdf] (164.2kB )  

Theses

Invited Lectures and Talks

Other Presentations

Publications in Russian