
Pavel P. Kuksa

Associate Fellow
Institute for Biomedical Informatics
School of Medicine
University of Pennsylvania

Publications:

    Journal Publications

  1. Yuk Yee Leung*, Pavel P. Kuksa*, Alexandre Amlie-Wolf, Otto Valladares, Lyle H. Ungar, Sampath Kannan, Brian D. Gregory, and Li-San Wang. DASHR: database of small human noncoding RNAs. Nucleic Acids Research (Database Issue), 2016.
  2. Lee E. Vandivier, Rafael Campos, Pavel P. Kuksa, Ian M. Silverman, Li-San Wang, and Brian D. Gregory. Chemical Modifications Mark Alternatively Spliced and Uncapped Messenger RNAs in Arabidopsis. The Plant Cell, 27(11):3024–3037, 2015.
  3. Pavel P. Kuksa, Martin Renqiang Min, Rishabh Dugar, and Mark Gerstein. High-order neural networks and kernel methods for peptide-MHC binding prediction. Bioinformatics, 31(22):3600–3607, 2015.
  4. Shawn W Foley, Lee E Vandivier, Pavel P Kuksa, and Brian D Gregory. Transcriptome-wide measurement of plant RNA secondary structure. Current Opinion in Plant Biology, 27:36–43, 2015. Cell signalling and gene regulation
  5. Yih-Chii Hwang, Chiao-Feng Lin, Otto Valladares, John Malamon, Pavel Kuksa, Qi Zheng, Brian D. Gregory, and Li-San Wang. HIPPIE: A high-throughput identification pipeline for promoter interacting enhancer elements. Bioinformatics, 2014.
  6. Pavel P. Kuksa. Biological Sequence Analysis with Multivariate String Kernels. IEEE/ACM Transactions on Computational Biology and Bioinformatics, March 2013.
  7. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Efficient use of unlabeled data for protein sequence classification: a comparative study. BMC Bioinformatics, 10(Suppl 4):S2, 2009. Impact factor: 3.78
  8. Pavel Kuksa and Vladimir Pavlovic. Efficient alignment-free DNA barcode analytics. BMC Bioinformatics, 10(Suppl 14):S9, 2009. Impact factor: 3.78
  9. Pavel Kuksa and Vladimir Pavlovic. Efficient motif finding algorithms for large-alphabet inputs. BMC Bioinformatics, 11(Suppl 8):S1, 2010.
  10. Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural Language Processing (Almost) from Scratch. Journal of Machine Learning Research, 12:2493–2537, 2011.

    Refereed Conferences

  12. Pavel P. Kuksa, Martin Renqiang Min, Rishabh Dugar, and Mark Gerstein. High-order neural networks and kernel methods for MHC-peptide binding prediction. In NIPS Machine Learning in Computational Biology, 2014.
  13. Y.-C. Hwang*, P. P. Kuksa*, B. D. Gregory, L.-S. Wang. Identifying the transcription factors mediating enhancer–target gene regulation in the human genome. In American Society of Human Genetics Annual Meeting (ASHG), 2015. (Platform talk)
  14. Mitchell Tang, Christian Kramer, George Xu, Michele Hawk, Yih-Chii Hwang, Chiao-Feng Lin, Pavel Kuksa, Weixin Wang, Beth A. Dombroski, Adam C. Naj, Li-San Wang, Gerald D. Schellenberg. Prediction of Late-Onset Alzheimer's Disease Associated Enhancer Elements. In Alzheimer's Association International Conference (AAIC), 2015.
  15. P. P. Kuksa, Y. Y. Leung, A. Amlie-Wolf, B. D. Gregory, L.-S. Wang. SPAR: Sequencing-based pipeline for annotating novel small non-coding RNAs. In American Society of Human Genetics Annual Meeting (ASHG), 2015.
  16. Y. Y. Leung*, P. P. Kuksa*, A. Amlie-Wolf, O. Valladares, B. D. Gregory, L.-S. Wang. DASHR - Database of small human non-coding RNAs. In American Society of Human Genetics Annual Meeting (ASHG), 2015.
  17. Pavel P. Kuksa. Efficient multivariate sequence classification. In CoRR abs/1409.8211, 2013.
  18. Pavel P. Kuksa and Vladimir Pavlovic. Efficient evaluation of large sequence kernels. In KDD, 2012 (oral presentation). Acceptance rate: 133/755 (17.6%)
  19. Pavel P. Kuksa, Imdadullah Khan, and Vladimir Pavlovic. Generalized Similarity Kernels for Efficient Sequence Classification. In SDM, 2012. Acceptance rate: 99/362 (27%)
  20. Pavel P. Kuksa. Efficient sequence kernel-based genome-wide prediction of transcription factors. In ICPR, 2012.
  21. Pavel P. Kuksa. 2D similarity kernels for biological sequence classification. In BIOKDD, 2012.
  22. Pavel P. Kuksa. Efficient time series classification with Multivariate similarity kernels. In NYAS Machine Learning Symposium, 2012. Oral presentation
  23. Pavel Kuksa and Yanjun Qi. Semi-Supervised Bio-Named Entity Recognition with Word-Codebook Learning. In SDM, 2010. Acceptance rate: 82/351 (23%)
  24. Pavel P. Kuksa, Yanjun Qi, Bing Bai, Ronan Collobert, Jason Weston, Vladimir Pavlovic, and Xia Ning. Semi-Supervised Abstraction-Augmented String Kernel for Multi-Level Bio-Relation Extraction. In ECML, 2010. Acceptance rate: 106/658 (16%)
  25. Yanjun Qi, Ronan Collobert, Pavel Kuksa, Koray Kavukcuoglu, and Jason Weston. Combining labeled and unlabeled data with word-class distribution learning. In Proceedings of the 18th ACM Conference on Information and Knowledge Management CIKM 2009, pp. 1737–1740, 2009. Acceptance rate: (123+171)/847 (20% short paper)
  26. Yanjun Qi, Pavel P. Kuksa, Ronan Collobert, Kunihiko Sadamasa, Koray Kavukcuoglu, and Jason Weston. Semi-Supervised Sequence Labeling with Self-Learned Features. In Proc. International Conference on Data Mining (ICDM'09), IEEE, 2009. Acceptance rate: 8.9% regular (70/786)
  27. Pavel P. Kuksa and Vladimir Pavlovic. Efficient Motif Finding Algorithms for Large-Alphabet Inputs. In BIOKDD, 2010. Acceptance rate: 7/29 regular (24%)
  28. Pavel Kuksa and Vladimir Pavlovic. Fast motif selection for biological sequences. In IEEE International Conference on Bioinformatics and Biomedicine BIBM'09, 2009. Acceptance rate: (44+37)/233 (35%)
  29. Pavel P. Kuksa and Vladimir Pavlovic. Spatial Representation for Efficient Sequence Classification. In ICPR, 2010. Acceptance rate: 385/2140 oral (18%)
  30. Pavel Kuksa and Vladimir Pavlovic. Efficient Alignment-free Barcode Analytics. In Third International Barcode of Life Conference, 2009.
  31. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Scalable Algorithms for String Kernels with Inexact Matching. In NIPS, 2008. Spotlight Presentation. Acceptance rate: 123/1022 (12%)
  32. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. On the role of local matching for efficient semi-supervised protein sequence classification. In BIBM, 2008. Acceptance rate: 38/156 (24%)
  33. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. A fast, semi-supervised learning method for protein sequence classification. In 8th International Workshop on Data Mining in Bioinformatics (BIOKDD 2008), pp. 29–37, 2008. Acceptance rate: 8/25 (32%)
  34. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Fast and Accurate Multi-class Protein Fold Recognition with Spatial Sample Kernels. In Computational Systems Bioinformatics: Proceedings of the CSB2008 Conference, pp. 133–143, 2008. Acceptance rate: 30/135 (22%)
  35. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Fast Protein Homology and Fold Detection with Sparse Spatial Sample Kernels. In 19th International Conference on Pattern Recognition ICPR 2008, 2008. Acceptance rate: 18% (oral). Best paper nominee
  36. Pavel Kuksa and Vladimir Pavlovic. Fast Barcode-Based Species Identification Using String Kernels. In Second International Barcode of Life Conference, 2007. Acceptance rate: 30% (oral)
  37. Pavel Kuksa and Vladimir Pavlovic. Fast Kernel Methods for SVM Sequence Classifiers. In WABI, pp. 228–239, 2007. Acceptance rate: 37/131 (28%)

    Workshop Papers

  39. Pavel P. Kuksa. Using string kernels to predict gene expression. Snowbird Learning Workshop, Snowbird, Utah, April 2012.
  40. Pavel P. Kuksa. 2D similarity kernels and representations for sequence data. Snowbird Learning Workshop, Snowbird, Utah, April 2012.
  41. Pavel Kuksa and Vladimir Pavlovic. Efficient evaluation of large sequence kernels. In NYAS Machine Learning Symposium, 2011.
  42. Pavel Kuksa and Vladimir Pavlovic. Efficient Sequence Classification with Spatial Representations. In Snowbird Learning Workshop, April 2010. Oral presentation.
  43. Vladimir Pavlovic and Pavel Kuksa. Large scale sequence analytics. In Center for Dynamic Data Analytics (CDDA) Workshop, January 25-26, 2010.
  44. Jason Weston, Ronan Collobert, Frederic Ratle, Hossein Mobahi, Pavel Kuksa, and Koray Kavukcuoglu. Deep Learning via Semi-Supervised Embedding. In ICML 2009 Workshop on Learning Feature Hierarchies, 2009.
  45. Pavel Kuksa and Vladimir Pavlovic. Efficient Discovery of Common Patterns in Sequences. Snowbird Learning Workshop, Clearwater, Florida, April 13-16, 2009.
  46. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. High Performance Sequence Classification with Novel Spatial Sample Embedding. 3rd Annual Machine Learning Symposium, NY, Oct 10, 2008.
  47. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Spatially-constrained sample kernel for sequence classification. Snowbird Learning Workshop, Utah, April 1-4, 2008.
  48. Pavel Kuksa and Vladimir Pavlovic. Kernel methods for DNA barcoding. Snowbird Learning Workshop, San Juan, Puerto Rico, March 2007.

    Technical Reports

  50. Pavel Kuksa and Vladimir Pavlovic. Sublinear selection algorithms for motif finding. DIMACS, 2010.
  51. Pavel Kuksa and Vladimir Pavlovic. Efficient discovery of common patterns in sequences over large alphabets. Technical Report 2009-15, DIMACS, 2009.
  52. Pavel Kuksa, Pai-Hsi Huang, and Vladimir Pavlovic. Kernel Methods and Algorithms for General Sequence Analysis. Technical Report DCS-TR-630, Rutgers University, 2008.
  53. Pai-Hsi Huang, Pavel Kuksa, and Vladimir Pavlovic. Fast and accurate semi-supervised protein homology detection with large uncurated sequence databases. Technical Report RU-DCS-TR634, Rutgers University, 2008.
  54. Robert S. Moore, Richard Howard, Pavel Kuksa, and Richard P. Martin. A Geometric Approach to Device-Free Motion Localization Using Signal Strength. Technical Report DCS-TR-674, Rutgers University, 2010.
  55. Ronan Collobert, Jason Weston, Leon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. Natural Language Processing (almost) from Scratch. arXiv:1103.0398v1, 2011.
  56. Pavel P. Kuksa, Imdadullah Khan, and Vladimir Pavlovic. Generalized Similarity Kernels for Efficient Sequence Classification. Technical Report RU-DCS-TR684, Rutgers University, 2011.

Theses

Invited Lectures and Talks

Other Presentations

Publications in Russian