[Image caption: A prototypical heat shock protein whose overexpression is linked to carcinogenesis in humans. Source: http://www.ebi.ac.uk/]
As the big data revolution continues, doctors, immunologists, and other biologists focused on human health are growing increasingly disillusioned with their ability to connect insights from large data sets to concrete human health impacts. A new review crystallizes the problem as primarily a lack of consensus on data-driven mechanistic modeling of immunological structures, and sets the agenda for further progress in the field.1
The review isn’t the first to examine immunology’s modeling problems. Prior research explicitly claims that gleaning immunological insights from big data sets “often requires computational and statistical skills beyond the reach of most bench scientists,” quite a radical statement.2 Much of the gap stems from a lack of robust modeling software and limited familiarity with molecular modeling methods among bench scientists, though the consensus is shifting toward solving the problem as quickly as possible.
Bridging the data gap
As the review points out, the disconnect between large data sets and new information relevant to human health outcomes is especially unfortunate given that fundamental research in immunology continually yields critical downstream information about disease.3 Connecting data to outcomes isn’t just a bonus; it’s a requirement. Modeling immunological functions with big data as a starting point has proven fruitful for immunologists investigating human health phenomena in the past, but, as the new review notes, modeling remains the domain of computational specialists.4
Indeed, if you heard a computational immunologist explaining their research to other immunologists, you might conclude that they hold very low expectations of their peers’ knowledge of modeling.5 Both the current and the next generation of immunologists are being trained to use big data, so they should be able to connect population-level insights to human disease outcomes if they’re equipped with the right software suite and modeling know-how.6 7
From big data to molecular modeling
The pathway from big data to clinical knowledge is more direct than in many scientific pipelines, leaving low-hanging fruit that aggressive researchers have happily gobbled up and published. Many of the basic statistical techniques for finding correlations between data sets are common knowledge, though not every correlation yields a publishable insight. For “merely” big-data-driven immunological insights, the pipeline is:
- Acquire an already-harvested data set from another researcher or a public resource
- Run statistical tests on the data set to identify populations of interest
- Characterize populations of interest based on their shared features
- Compare populations of interest with additional data sets chronicling their health outcomes, if available
- Compile and submit the article to the journal of choice for publication
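The first-pass pipeline above can be sketched in a few lines of Python. Everything here is invented for illustration: the per-cell marker readings, the CD4/CD8 gating rule, and the outcome labels are stand-ins, not a real analysis.

```python
# Minimal sketch of the first-pass big-data pipeline; data and gates are invented.
from statistics import mean

# Step 1: "acquire" a data set -- here, synthetic per-cell marker readings.
cells = [
    {"CD4": 0.9, "CD8": 0.1, "outcome": "remission"},
    {"CD4": 0.8, "CD8": 0.2, "outcome": "remission"},
    {"CD4": 0.1, "CD8": 0.9, "outcome": "relapse"},
    {"CD4": 0.2, "CD8": 0.8, "outcome": "relapse"},
]

# Step 2: a crude gate (stand-in for real statistical tests) to identify
# populations of interest.
def gate(cell):
    return "CD4+" if cell["CD4"] > cell["CD8"] else "CD8+"

populations = {}
for cell in cells:
    populations.setdefault(gate(cell), []).append(cell)

# Step 3: characterize each population by a shared feature (mean marker level).
profile = {name: round(mean(c["CD4"] for c in members), 2)
           for name, members in populations.items()}

# Step 4: compare populations against outcome data, if available.
outcomes = {name: {c["outcome"] for c in members}
            for name, members in populations.items()}
print(profile, outcomes)
```

A real study would replace the threshold gate with proper statistical testing or clustering, but the shape of the pipeline is the same.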
There’s far more to big data sets than easy pickings, however. Immunologists have struggled with basic logistical concerns like data sharing before even delving into the deeper data, though it’s unlikely they’ll continue to tolerate substandard information systems moving forward.8 Once the logistics are handled, a deep dive into the data set means taking the populations of interest identified and characterized in the first-pass analysis into the modeling drydock, where the populations can be fleshed out and hypotheses about downstream medical impact can be formed. This is the step where we lose most of our immunologists: hardcore modeling of molecular variations for the purpose of further hypothesis generation is too cumbersome with the software tools most have on hand through their institutions. It’s no secret why.
To use big data in conjunction with molecular modeling within immunology and successfully relate it to human health impacts, researchers must:
- Perform all of the steps to derive a big-data insight as described previously, save publishing
- Model the relevant immunological molecules or systems for each characteristic of each identified population
- Determine if any of the modeled mechanisms intersect with a mechanism of human pathology using a database search
- Model whatever is known regarding the pathology’s mechanism of action
- Combine the pathology model with the immunological data model within one population
- Determine the chain of causation connecting the modeled immunological mechanisms to the pathology, modeling each step of the pathological process first for one characteristic within one population, then for all characteristics within all populations
- Run another big-data study to correlate health outcomes across the populations of interest whose immunological features correlate with a pathological mechanism and who may be at risk, depending on the veracity of the models
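The core loop in the steps above, modeling every characteristic of every population and intersecting it with known pathology mechanisms, can be sketched as follows. The population names, mechanisms, and the pathology "database" are all hypothetical placeholders; real modeling and database searches would replace the stub functions.

```python
# Hypothetical sketch of the modeling loop; all names and mappings are invented.

# Characterized populations from the first-pass analysis.
populations = {
    "CD8+_variant_A": ["granzyme_B_variant", "PD-1_overexpression"],
    "CD4+_variant_B": ["IL-2_deficit"],
}

# A stand-in "database" mapping molecular mechanisms to known pathologies.
pathology_db = {
    "PD-1_overexpression": "T-cell exhaustion",
    "IL-2_deficit": "impaired clonal expansion",
}

def model_molecule(feature):
    # Placeholder for actual molecular modeling of one characteristic.
    return {"feature": feature, "modeled": True}

# Model every characteristic of every population, then check whether the
# modeled mechanism intersects a known pathology mechanism.
at_risk = {}
for pop, features in populations.items():
    for feature in features:
        model = model_molecule(feature)
        if model["modeled"] and feature in pathology_db:  # database-search step
            at_risk.setdefault(pop, []).append(
                {"mechanism": feature, "pathology": pathology_db[feature]}
            )
print(at_risk)
```

The nested loop is the important part: the workload grows multiplicatively with populations and characteristics, which is exactly the scaling problem discussed next.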
Scaling up original methods
Depending on how many distinct populations emerged from the original statistical analysis, immunologists could be modeling hundreds of protein variations or thousands of immunological mechanisms across dozens of steps understood to be part of a single pathology. The problem now has an additional dimension: data transfer. If immunologists struggled to transfer big data sets between each other before exploding each population in a data set into a series of complex models, connecting data to models to human disease will escalate the problem even further. Even if researchers overcome the logistical data difficulties at both ends, they’ll still need a platform that enables extremely streamlined modeling of immunological molecules; anything less will add tremendous time and effort to producing the many models that will characterize immunology research in the near future. Thankfully, there is a software platform that can handle both the scale and the depth of big-data-enabled immunological modeling.
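The combinatorics driving that scale can be made concrete with a small sketch. Assuming each (population, characteristic) pair is an independent modeling job, a hypothetical placeholder, the jobs can at least be farmed out in parallel:

```python
# Sketch of scaling many independent model runs; names and counts are invented.
from concurrent.futures import ThreadPoolExecutor
from itertools import product

populations = [f"pop_{i}" for i in range(10)]        # hypothetical populations
characteristics = [f"feature_{j}" for j in range(20)]  # hypothetical features

def run_model(task):
    pop, feature = task
    # Placeholder for one molecular-modeling job.
    return (pop, feature, "model_ok")

# 10 populations x 20 characteristics = 200 independent model runs.
tasks = list(product(populations, characteristics))
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_model, tasks))
print(len(results))  # 200
```

Even this toy setup produces 200 model artifacts to store and transfer; real molecular models multiply that by pathology steps and file sizes, which is where a dedicated platform earns its keep.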
BIOVIA Biologics is the modeling software that the immunologists of the future will use to turn their big data insights into usable models that yield information on human health outcomes. Contact us today to find out how you can use Biologics to model the populations your group characterizes through computational analysis of big data sets.
1. “Solving Immunology?” December 2016, http://www.cell.com/trends/immunology/abstract/S1471-4906(16)30202-2.
2. “Expanding the Immunology Toolbox: Embracing Public-Data Reuse and Crowdsourcing.” 2016, http://www.cell.com/immunity/fulltext/S1074-7613(16)30491-5.
3. “The Journey from Discoveries in Fundamental Immunology to Cancer Immunotherapy.” April 2015, http://www.cell.com/cancer-cell/fulltext/S1535-6108(15)00095-1.
4. “Model-based genotype-phenotype mapping used to investigate gene signatures of immune sensitivity and resistance in melanoma micrometastasis.” April 2016, http://www.nature.com/articles/srep24967.
5. “Probabilistic modeling and molecular phylogeny.” 2010, http://www.cbs.dtu.dk/courses/27685.imm/presentations_2010/gorm_max_likelihood_immunology.pdf.
6. “University of Rochester School of Medicine & Dentistry Center for Biodefense Immune Modeling 2015 Symposium on Immune Modeling in the Big Data Era.” June 2015, https://cbim.urmc.rochester.edu/education/2015-symposium.
7. “Teaching ‘big data’ analysis to young immunologists.” August 2015, http://www.nature.com/ni/journal/v16/n9/full/ni.3250.html.
8. “Toward effective sharing of high-dimensional immunology data.” August 2014, http://www.nature.com/nbt/journal/v32/n8/full/nbt.2974.html.