Newly released genetics data on poplar trees may be applied to the development of innovative materials.
Image Credit: Flickr user Dano

In January 2017, the Oak Ridge National Laboratory (ORNL) in Tennessee released the largest-ever dataset of single nucleotide polymorphisms (SNPs) for plants of the genus Populus, more commonly known as poplar trees. Using data from nearly 900 trees on the west coast of North America, the genome-wide association study (GWAS) identified over 28 million SNPs. According to Gerald Tuskan, the leader of the Plant Systems Biology group at ORNL, this research has implications not only for biologists and ecologists, but also for materials scientists.1

For instance, he suggests that materials science researchers could use the group’s findings on SNPs related to the poplar tree’s production of lignin, a key structural polymer in the plant cell wall. They could look at the way that carbon is transferred through the lignin pathway in different poplar tree variants, and then develop materials, such as new types of carbon fiber or medical devices, based on these modified versions of lignin. They could also look for genetic variants that cause trees to produce low levels of lignin, which makes the trees better suited to biofuel production.2 Scientists hoping to use the poplar GWAS to inform their research can benefit from software that automates data management processes and supports information sharing between researchers.

Effective Management of Large Datasets

Trying to generate research leads from a genome-wide study consisting of 28 million SNPs can seem daunting. If scientists cannot efficiently sort through existing information to find the results that are relevant to their research interests, a potential project can get so mired in complicated data that it might never get off the ground. Moreover, manual analysis of datasets as large as the poplar GWAS invites human error, which further slows down the research and development process. With modern software, it is possible to automate the access and analysis of information from large datasets, making it much easier for materials scientists to isolate the information that is relevant to the industry and start figuring out how it might be applied to materials development.
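As a rough illustration of what automating this kind of filtering can look like, the sketch below streams through a tab-delimited SNP table one row at a time and keeps only the rows annotated with a pathway of interest. The column names (chrom, pos, gene, pathway), the sample rows, and the filter_snps function are all hypothetical; they do not reflect the actual layout of the ORNL dataset or any particular product's API.

```python
import csv
import io

# Hypothetical sample data. A real SNP table would have millions of rows
# and its own column layout; these names and values are illustrative only.
SAMPLE_TSV = (
    "chrom\tpos\tgene\tpathway\n"
    "Chr01\t10234\tgeneA\tlignin\n"
    "Chr01\t58811\tgeneB\tcellulose\n"
    "Chr02\t7790\tgeneC\tlignin\n"
)


def filter_snps(handle, pathway):
    """Yield rows whose 'pathway' column matches, reading one row at a time.

    Because this is a generator, the whole file never has to fit in memory.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        if row["pathway"] == pathway:
            yield row


# In practice, handle would be an open file; StringIO stands in for one here.
lignin_snps = list(filter_snps(io.StringIO(SAMPLE_TSV), "lignin"))
for snp in lignin_snps:
    print(snp["chrom"], snp["pos"], snp["gene"])
```

Reading row by row rather than loading the full table is one simple way to keep a multi-million-row file manageable on an ordinary workstation.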

Interdisciplinary Research on the Materials Development Applications of the Poplar Genome Dataset

The potential materials development applications of information from the poplar GWAS are numerous and wide-ranging. In addition to vehicle manufacturing and sustainable fueling options, the data could also be used to develop new plastic alternatives or more effective building insulation materials.3 Research leading to the development of such materials would necessarily be highly interdisciplinary, with biologists working alongside chemists and engineering experts to create materials that could address the world’s greatest environmental and economic challenges. One way to support these collaborations is to use modern software that allows scientists from different backgrounds to share their automated analytical protocols. That way, they will not waste time duplicating each other’s experiments, and they can also draw up general procedures that can be tweaked slightly based on the researcher’s area of interest and development goals.

At the same time, individual scientists trying to apply poplar genetic information to materials development will likely need to combine key biological findings with the results from previous materials engineering experiments. For example, when considering how a lignin-like substance may be used to develop a lighter-weight carbon fiber, scientists need to look at both the SNP dataset and their previous research on different types of carbon fiber. Modern software makes it possible to automatically aggregate all of the necessary data from disparate databases, both within and outside of the organization. From there, researchers can integrate the data and run the appropriate analysis on the combined dataset in order to identify possible development applications.
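To make the aggregation step concrete, here is a minimal sketch of combining a genetic summary with materials test results by joining on a shared identifier. Everything in it is an assumption made for illustration: the field names (variant_id, lignin_snp_count, fiber_density_g_cm3), the values, and the join_on_key helper are invented and are not drawn from the poplar dataset or from any real experiment.

```python
# Hypothetical records from two separate sources. Neither the field names
# nor the values come from the poplar dataset; they are placeholders.
snp_summary = [
    {"variant_id": "V1", "lignin_snp_count": 12},
    {"variant_id": "V2", "lignin_snp_count": 3},
    {"variant_id": "V3", "lignin_snp_count": 7},
]
fiber_tests = [
    {"variant_id": "V1", "fiber_density_g_cm3": 1.74},
    {"variant_id": "V3", "fiber_density_g_cm3": 1.69},
]


def join_on_key(left, right, key):
    """Inner join of two lists of dicts on a shared key.

    Rows in `left` with no match in `right` are dropped.
    """
    right_by_key = {row[key]: row for row in right}
    return [
        {**row, **right_by_key[row[key]]}
        for row in left
        if row[key] in right_by_key
    ]


combined = join_on_key(snp_summary, fiber_tests, "variant_id")
for row in combined:
    print(row)
```

An inner join is used here so that the combined table contains only variants with both genetic and materials data; a different join would be appropriate if incomplete records should be kept.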

Workflow Authoring for Non-Computer Scientists

Many of the scientists who are tasked with parsing the poplar tree SNP dataset might not have extensive computer science or software development expertise. After all, for some analytical tasks, it makes much more sense to put an experienced plant biologist or biochemist on the job. In the past, this could lead to research bottlenecks, since these scientists had to rely on information technology professionals to program analysis procedures in order to keep the research moving forward. Luckily, today’s analytical software offers a user-friendly interface that allows researchers to author their own data analytics protocols, even if they have not worked with such software in the past, thereby speeding up the overall research and development process.

BIOVIA Pipeline Pilot is a workflow authoring application that can automate the aggregation and analysis of large datasets. It supports the creation of scientific protocols that can easily be shared between researchers, regardless of their locations. Contact us today to learn more about this software.

  1. “Largest Populus SNP dataset holds promise for biofuels, materials, metabolites,” January 17, 2017.
  2. “Sugar and Splice,” February 16, 2015.
  3. “Genome-Wide Association Study (GWAS) Released,” 2017.