Figuring Out What Genes Do

Summary
  • The Molecular Phenotypes of Null Alleles in Cells (MorPhiC) Consortium is studying how genes affect cells and organisms.
  • The Data Resource and Administrative Coordinating Center is curating and managing the massive datasets involved in the project.
  • A Miller School research team is analyzing 1,000 genes but believes its work will be applicable to all genes.

The reference genome from the Human Genome Project was a stunning achievement, but it left a lot of questions unanswered. While the project revealed how many genes there are (around 22,000), where they live on chromosomes, disease susceptibilities and other important details, it didn’t tell us what the genes do. That enormous job is now being performed by the Molecular Phenotypes of Null Alleles in Cells (MorPhiC) Consortium.

Sponsored by the National Human Genome Research Institute (NHGRI) in 2022, this five-year project is using a variety of techniques to remove genes and observe how those losses affect cells, tissues and organisms. By illuminating each gene’s function, the group hopes to better understand developmental biology and different diseases.

The University of Miami Miller School of Medicine’s portion of the consortium is being led by scientists at the Data Resource and Administrative Coordinating Center (DRACC), which is curating and managing these massive datasets.

Dr. Stephan Schürer says the work his research team is doing will be applicable to the analysis of all genes.

Now, halfway through the project, the team is sharing early data on its website, highlighting MorPhiC’s first set of 71 gene and protein perturbations. A recently published Nature article describes the strategies behind this work and how these approaches are helping to reveal gene function across a range of biological systems. In addition, MorPhiC is freely sharing its data and tools with the scientific community to accelerate the work.

“This first data release is a major milestone because it provides functional information about many genes that was previously unknown,” said Stephan Schürer, Ph.D., professor of molecular and cellular pharmacology at the Miller School, associate director of data science at Sylvester Comprehensive Cancer Center and MorPhiC’s lead investigator. “But even more than that, it describes the protocols we used to get this data, which will give scientists new tools to gather even more information about gene function.”

Analyzing a Gene’s Mission

Researchers have spent the past 24 years trying to understand how genes work, but the results have often been scattered. Most investigations have focused heavily on genes associated with diseases, providing important insights but neglecting the big picture. More than 13,000 papers have been published about TP53, a gene closely linked to cancer progression, while more than half of all human genes have been research afterthoughts.

MorPhiC is changing that. The project includes four centers that perturb genes and observe the consequences, three data analysis centers and the DRACC, which is being led by the Miller School team in collaboration with three other institutions. The group has been incredibly productive.

“The first phase has been focused on about 1,000 genes,” said Dušica Vidović, Ph.D., lead data scientist at Sylvester Comprehensive Cancer Center and a MorPhiC co-investigator. “But I think, by the end of this first phase, we will have done more than we originally projected.”


The DRACC team is standardizing information from many different sources. In addition to the experimental data that is central to the project, MorPhiC is collecting metadata to put those findings in context. Recording how the experiments have been conducted, for example, provides richer information to advance the work.

“Our role, as part of the DRACC, is to essentially implement data standards and processing operations and oversee how the data moves,” said Dr. Vidović. “It’s a lot of information—more than 50 terabytes for each dataset—and it has to be extremely robust so everything is reproducible. How the data goes from production to primary and secondary analysis and then into the data portal…it has to be an extremely clean process.”

The Challenge of Taking Out a Gene

MorPhiC scientists eliminate a gene to learn what it does. It’s a case of addition by subtraction, but knocking out a gene is often challenging.

“For many genes, genetic knockout is lethal to the cell, making it difficult to obtain meaningful data,” said Dr. Schürer. “Therefore, alternative approaches, such as targeted protein degradation, are necessary to study essential genes.”

Team members (from left) Dr. Dušica Vidović, Dr. Nicolette Ross, Caty Chung, M.S., and Dr. Schürer

Another challenge is to characterize the phenotypic and functional consequences of removing a gene in different cellular contexts. The MorPhiC consortium is using a variety of methods to study the DNA, RNA, proteins, metabolome (molecules produced by cellular chemical reactions) and lipidome (the collection of fat molecules in a cell).

Perturbing these genes is producing a wealth of information. The gene ISL1 is crucial for early heart development, and ISL1 knockouts mimic traits often seen in congenital heart defects. EPAS1 is often boosted in people living at high altitudes, helping them adapt to oxygen-poor environments, and losing EPAS1 is detrimental to cells in low oxygen. PAX6 is associated with pancreatic development, and MorPhiC research is showing how PAX6 mutations contribute to neonatal diabetes and pancreatic development disorders.

“This data will be a rich and valuable resource for scientists everywhere,” said Dr. Schürer. “We are capturing incredible information about how genes function. MorPhiC is tasked with analyzing 1,000 genes, but the tools we are developing will eventually help us figure out all of them.”

Members of the MorPhiC Consortium recently met to discuss their progress, collaborations and future direction. The next phase will expand efforts to perturb genes and proteins to understand their function, knowledge that will ultimately improve human health.

“For every effective drug, there has usually been years of foundational research to understand the underlying biology,” said Dr. Schürer. “By systematically characterizing genes, we’re building a roadmap that could accelerate the development of treatments for a wide range of diseases.”

