Miller School Researchers Help Push the Limits of Programming Languages in Biology
Bohdan Khomtchouk, a fourth-year human genetics and genomics Ph.D. candidate working in the Center for Therapeutic Innovation and the Department of Psychiatry and Behavioral Sciences at the University of Miami Miller School of Medicine, has spearheaded a high-impact review paper in Briefings in Bioinformatics that was the subject of this year’s invited keynote speech at the European Lisp Symposium in Brussels, Belgium.
Khomtchouk is a recipient of the prestigious National Defense Science and Engineering Graduate Fellowship, which is awarded through the U.S. Department of Defense. He plans to continue pushing the frontiers of bioinformatics programming languages in their application to biological research problems at Stanford University, where he will be starting his postdoctoral research career in the field of epigenetics this year in the Department of Biology.
Khomtchouk is joined in authorship by his graduate advisor, Claes Wahlestedt, M.D., Ph.D., professor of psychiatry and behavioral sciences, associate dean for therapeutic innovation and director of the Center for Therapeutic Innovation at the Miller School; Edmund Weitz, Ph.D., professor of mathematics at the University of Applied Sciences in Hamburg, Germany; and Peter D. Karp, Ph.D., director of the Bioinformatics Research Group within the Artificial Intelligence Center at SRI International.
“Bioinformatics and computational biology software is dominated largely by higher-level languages like R and Python and lower-level languages like C and C++,” said Khomtchouk. “In our paper, we systematically review the advantages posed by a unique hybrid of languages, called the Lisp family of languages, that offer both high-level scripting and low-level performance capabilities not commonly seen in other languages.”
In bioinformatics and computational biology, Lisp has successfully been applied to research in systems biology, high-performance computing, database curation, drug discovery, computational chemistry and nanotechnology, network and pathway -omics analysis, single-nucleotide polymorphism analysis and RNA structure prediction.
“It has generally been an accepted practice in biology to inject C or C++ code into computationally intensive portions of Python or R for performance reasons,” said Wahlestedt, senior author of the study. “This is often not an easy task, so to get around it, most of the heavy lifting in today’s Python and R code is done predominantly using libraries written almost entirely in Fortran or C. These packages are great to use when they are available, but that may not be possible when writing something very novel that no one has tried before or when developing your own library, where there is no existing set of software packages to leverage. These days if you want to build things from scratch and still have the code run fast, you have to master the fine balancing act of injecting a lower-level language into your software architecture.”
“This was actually some of the most complex software engineering work I’ve had to do in grad school, as it required a near-perfect understanding of the intricacies of C and R, to the point where you can actually resonate them off of each other effectively,” said Khomtchouk, who developed a popular R/Bioconductor package called geneXtendeR, which is almost entirely written in C. “Getting the R and C code to play nicely with each other was sometimes very tricky. That’s when I started looking into Lisp.”
“Lisp offers programmers the unique ability to write code as quickly and easily as in other high-level languages like R or Python, yet retain all or nearly all of the performance from writing in a lower-level language like C,” said Karp. “Lisp boosts programmer efficiency and maximizes productivity. My group has used it for 25 years to develop one of the most comprehensive bioinformatics software systems.”
“I think that besides its hybrid nature — being able to write both interactive and compiled Lisp within the same application — Lisp itself is particularly great for creating domain-specific languages,” said Khomtchouk. “That can be quite useful for fitting the language to the problem, not the other way around. Basically, the language never gets in your way. The easiest way to explain it would be that if you want to cross a mountain, you don’t have to climb over it or go around it — you can simply go through it. It is a hard analogy to explain until you try coding something complex in Lisp yourself and are coming in with experience using other programming languages. There can certainly be a fair share of ‘aha’ moments.”
“These moments were the driving motivation behind sharing our experiences with the broader biological community,” said Wahlestedt. “We recommend giving Lisp a try, whether it be Common Lisp, or Scheme, or Clojure. To quote one famous Lisper: ‘Lisp is worth learning for the profound enlightenment experience you will have when you finally get it; that experience will make you a better programmer for the rest of your days, even if you never actually use Lisp itself a lot.’”
“I was personally very pleased that Bohdan accepted our invitation to speak at the European Lisp Symposium,” said Alberto Riva, Ph.D., the symposium’s Program Committee chair. “I am a bioinformatics scientist with a background in Artificial Intelligence and Knowledge Engineering, and I have been using Common Lisp for most of my career. It is very rewarding to see that, thanks to the hard work of our small but dedicated community, Lisp and the ideas it spearheaded are returning to the forefront of scientific research. I am looking forward to working with a new generation of scientists, of which Bohdan is an excellent representative, to bring the power and elegance of Lisp to the field of computational biology.”
Tags: bioinformatics, Bohdan Khomtchouk, Claes Wahlestedt, computational biology, Lisp languages, Miller School of Medicine, University of Miami