EXPLAINER: the evolution of genomic read-outs, from exomes to genomes to epigenomes
And an analogy for understanding the uses of the different types of 'Omics
I recently saw a Twitter thread about some new Whole-Genome Sequencing (WGS) initiatives that captivated the attention of the #SciTwitter crowd, or what remains of it. The discussion was about the type of science that can be done with exomes versus whole genomes. I want to add epigenomes to the picture, which I think gives a broader overview of what we will see more of in the near future.
Definitions
Here’s a breakdown of the differences between Whole Exome Sequencing (WES), Whole Genome Sequencing (WGS), and periodic sequencing of epigenome profiles from liquid biopsy cell-free DNA:
1. Whole Exome Sequencing (WES)
Definition: WES is a sequencing technique that focuses on sequencing only the exome, which includes all the protein-coding regions of the genome. These regions account for about 1-2% of the entire genome but contain approximately 85% of known disease-causing mutations.
Focus: Only the coding regions (exons) of genes.
Cost & Data Size: WES used to be less expensive than WGS, although it is difficult to say whether that is still the case. It generates smaller datasets than WGS, since only a small fraction of the genome is sequenced and needs to be archived in EHRs for future reference.
Applications:
Identification of genetic mutations in protein-coding regions.
Diagnosis of monogenic (single-gene) disorders.
Studies of inherited diseases or cancer where coding mutations are implicated.
2. Whole Genome Sequencing (WGS)
Definition: WGS is a sequencing approach that reads the entire genome, including both the coding (exons) and non-coding regions (introns, regulatory elements, intergenic regions).
Focus: The entire genome, encompassing both coding and non-coding DNA.
Cost & Data Size: WGS used to be more expensive than WES, and it produces larger, more complex datasets due to the breadth of information obtained. Illumina brought WGS to the forefront of genomics with the launch of the HiSeq X instrument in 2013-2014, which delivered what became known as the $1,000 genome. Since then the cost has gone down to the $100-200 per genome range.
Applications:
Comprehensive genomic studies, including detecting mutations in non-coding regions.
Structural variant detection (e.g., insertions, deletions, duplications).
Deeper insights into regulatory regions and non-coding RNAs involved in disease.
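To put the data-size difference between WES and WGS in rough numbers, here is a minimal back-of-the-envelope sketch. Only the 1-2% exome fraction comes from the definitions above. The genome size (about 3.1 billion bases) and the sequencing depths (100x for an exome, 30x for a genome) are illustrative assumptions, and the sketch ignores off-target reads, quality scores and file compression.

```python
# Back-of-the-envelope comparison of raw sequencing yield for WES vs WGS.
# All constants are illustrative assumptions, except the 1-2% exome
# fraction quoted in the definitions above.

GENOME_SIZE_BP = 3_100_000_000   # assumed approximate human genome size
EXOME_FRACTION = 0.02            # upper end of the 1-2% range
WES_DEPTH = 100                  # assumed mean depth for an exome
WGS_DEPTH = 30                   # assumed mean depth for a genome


def raw_bases(target_size_bp: float, depth: float) -> float:
    """Bases that must be sequenced to cover a target at a given mean depth."""
    return target_size_bp * depth


exome_size_bp = GENOME_SIZE_BP * EXOME_FRACTION
wes_bases = raw_bases(exome_size_bp, WES_DEPTH)
wgs_bases = raw_bases(GENOME_SIZE_BP, WGS_DEPTH)

print(f"Exome target:    {exome_size_bp / 1e6:.0f} Mb")
print(f"WES raw yield:   {wes_bases / 1e9:.1f} Gb")
print(f"WGS raw yield:   {wgs_bases / 1e9:.1f} Gb")
print(f"WGS / WES ratio: {wgs_bases / wes_bases:.0f}x")
```

Under these assumptions the genome needs roughly an order of magnitude more raw sequence than the exome, even though the exome is sequenced more deeply. Changing the depth constants shifts the ratio, which is one reason the cost comparison between the two is no longer clear-cut.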
3. Periodic Sequencing of Epigenome Profiles from Liquid Biopsy Cell-Free DNA (cfDNA-EPS)
Definition: This refers to the periodic analysis of epigenetic modifications (like DNA methylation) found in cell-free DNA (cfDNA), which circulates in the blood and can be obtained through non-invasive liquid biopsies. The focus is on epigenetic markers rather than just DNA sequence mutations.
Focus:
Epigenetic alterations, particularly changes in methylation patterns or chromatin accessibility, rather than genetic mutations.
The epigenome profile in cfDNA typically reflects the status of tissues or tumors from which the DNA is released, making it valuable for detecting early-stage cancers or monitoring disease progression.
Cost & Data Size: Varies depending on the extent of the epigenetic profiling (e.g., whole-genome sequencing or targeted methylation panels), but it can be less expensive than WGS for certain applications. The list price of Grail Bio’s Galleri for patients paying out of pocket is a bit less than $1,000 per assay.
Applications:
Cancer detection and monitoring: cfDNA often reflects the tumor's epigenetic landscape, providing insights into cancer without the need for invasive biopsies.
Tracking disease progression: Changes in methylation patterns can indicate tumor evolution or response to therapy.
Non-invasive prenatal testing (NIPT): Analysis of fetal cfDNA in maternal blood for epigenetic and genetic markers. Epigenome profiling for NIPT is still uncommon, but cfDNA genome profiling for NIPT is widely used around the world.
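As a toy illustration of what "tracking changes in methylation patterns" over periodic blood draws can mean, the sketch below summarises 5mC calls on cfDNA fragments at a single locus. The encoding ("M" for a methylated CpG, "U" for an unmethylated one) and the fragments themselves are hypothetical illustration data, not the output of any real assay. Commercial tests combine many loci with trained classifiers, which this sketch does not attempt.

```python
# Toy sketch: summarising 5mC calls on cfDNA fragments at one locus.
# The encoding ("M" = methylated CpG, "U" = unmethylated CpG) and the
# fragments below are hypothetical illustration data.

def read_methylation_fraction(cpg_calls: str) -> float:
    """Fraction of CpG sites on a single fragment that are called methylated."""
    if not cpg_calls:
        raise ValueError("fragment has no CpG calls")
    return cpg_calls.count("M") / len(cpg_calls)


def locus_methylation(fragments: list[str]) -> float:
    """Fraction of methylated CpG calls across all fragments at a locus."""
    total_calls = sum(len(f) for f in fragments)
    methylated_calls = sum(f.count("M") for f in fragments)
    return methylated_calls / total_calls


# Two hypothetical blood draws from the same person, taken months apart.
draw_1 = ["UUUU", "UUMU", "UUUU", "UUUU"]
draw_2 = ["UUUU", "MMMM", "UUMU", "MMMM"]

for label, draw in (("draw 1", draw_1), ("draw 2", draw_2)):
    fully_methylated = sum(read_methylation_fraction(f) == 1.0 for f in draw)
    print(f"{label}: locus methylation {locus_methylation(draw):.2f}, "
          f"fully methylated fragments {fully_methylated}/{len(draw)}")
```

The per-fragment view matters because a handful of fully methylated molecules among mostly unmethylated ones is a different signal from a small uniform shift across all molecules, and only the per-molecule readout can tell the two apart.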
Summary of Key Differences:
WES targets only exons (protein-coding regions), whereas WGS covers the entire genome.
Epigenome profiling of cfDNA is focused on studying epigenetic changes rather than sequence changes, and it's performed on circulating DNA in the bloodstream, making it a non-invasive tool for monitoring dynamic biological processes such as cancer progression.
Each method serves different purposes in genomic and epigenetic research, depending on the need for specificity (WES), comprehensiveness (WGS), or periodic monitoring of epigenetic alterations (cfDNA).
How different are WES, WGS and cfDNA-EPS?
A good analogy for the differences between these technologies, and the kind of information one can get from them, is to compare them to the exercise of working out what the weather will be like on a given day in, say, Cambridge, UK.
If someone is planning a trip to Cambridge, UK, and wants a good guess of what it’ll be like when they are there, one quick and simple way of getting a rough forecast is to go to the Wikipedia page for the city and check the Climate section.
Let’s say someone is planning to visit in October. This simple readout tells us that they should expect Mean Daily Max/Mean/Min temperatures of 15.3/11.4/7.4 °C. The average precipitation for October is the second highest of the entire year at 56.2 mm (2.22 in), second only to August with 53.2 mm (2.27 in), and it rains on an average of 9.5 days in October. So it’s never too hot in October, and in fact it’s never too hot in Cambridge all year round, and one should expect rain on 1 of every 3-4 days, with little difference across the year.
The analogy with genomics in the modern era of Next-Generation Sequencing, and with how this information can be applied in the clinical setting, goes as follows. Thousands, if not hundreds of thousands, of exomes and genomes have already been sequenced, and GWAS have been run on them. So for a patient needing a diagnosis, even before we look at the individual’s omics data, just by looking at their ancestry and some other clinical metadata we can guesstimate their individual risk of certain diseases. On average, that risk will be lower or higher for a specific disease than for people from other ancestries, or with a different health history for themselves and their ancestors (parents, grandparents, etc.). It also helps if we can reconstruct the family tree of the individual and layer onto it the information on past diseases: for example, a pedigree of family members prone to cardiovascular diseases may be predictive of the patient’s individual risk for such diseases.
Now if we sequenced the exome of this hypothetical patient, we would get a readout of about 2% of the genome they inherited from their biological mum and dad. This would tell us whether the patient has a genetic condition related to a protein-coding mutation identified in their exome, and we would also be able to be more specific about their risk of diseases that may develop later in life. A Polygenic Risk Score (PRS) is a numerical value that estimates an individual's genetic predisposition to a certain trait or disease by summing the effects of multiple genetic variants, such as single nucleotide polymorphisms (SNPs), each of which may contribute a small amount to the overall risk. Unlike a whole-genome PRS, which includes both coding and non-coding variants, an exome PRS is limited to the SNPs found within exons (the parts of the genome that code for proteins). For traits where coding variants play a key role in the disease mechanism (e.g., some types of cancers or rare Mendelian disorders), an exome-based PRS might still be quite informative. However, for polygenic diseases with significant contributions from non-coding regions (e.g., cardiovascular disease, diabetes), the exome PRS may be incomplete.
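The difference between an exome PRS and a whole-genome PRS can be sketched in a few lines. This is a minimal illustration that assumes the simplest additive form of the score: each variant's effect size multiplied by the number of risk alleles the person carries (0, 1 or 2), summed over variants. The variant IDs, effect sizes and genotypes are hypothetical and are not taken from any real GWAS.

```python
# Minimal sketch of a Polygenic Risk Score as a weighted sum of allele counts.
# Variant IDs, effect sizes and genotypes are hypothetical illustration
# values, not taken from any real GWAS.
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    effect_size: float   # per-allele weight (e.g. a log odds ratio)
    is_exonic: bool      # True if the variant lies within an exon


def polygenic_risk_score(variants, dosages, exome_only=False):
    """Sum of effect_size * risk-allele count (0, 1 or 2) over variants.

    With exome_only=True, non-coding variants are skipped, which mimics a
    PRS computed from exome data alone.
    """
    score = 0.0
    for v in variants:
        if exome_only and not v.is_exonic:
            continue
        # Variants missing from the genotype table count as zero risk alleles.
        score += v.effect_size * dosages.get(v.variant_id, 0)
    return score


variants = [
    Variant("var_A", 0.30, is_exonic=True),
    Variant("var_B", 0.10, is_exonic=False),
    Variant("var_C", 0.20, is_exonic=False),
    Variant("var_D", -0.05, is_exonic=True),
]
# Risk-allele counts for one hypothetical patient.
dosages = {"var_A": 1, "var_B": 2, "var_C": 1, "var_D": 0}

whole_genome_prs = polygenic_risk_score(variants, dosages)
exome_prs = polygenic_risk_score(variants, dosages, exome_only=True)
print(f"Whole-genome PRS: {whole_genome_prs:.2f}")
print(f"Exome-only PRS:   {exome_prs:.2f}")
```

In this made-up example more than half of the whole-genome score comes from the two non-coding variants, so the exome-only score misses it. That is the sense in which an exome PRS "may be incomplete" for diseases with large non-coding contributions.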
Continuing with the weather analogy, we could go beyond the climate summary for Cambridge, UK, and check what the weather forecast says for the next couple of weeks. If we check the BBC Weather Forecast website at the time of this writing, we get a more detailed readout with Min/Max daily temperatures, percentage chance of rain, wind forecast, UV and pollution levels. There are no weather warnings for the next few days.
The forecast gets less detailed for one or two weeks from today.
Going back to the genomics side of this analogy, we can equate the more specific weather prediction for today to exome sequencing, where we can make better predictions about the diseases that are mostly driven by protein-coding mutations. The exome also gives some information about the forecast for the next one or two weeks, of the kind that we would get with Whole-Genome Sequencing.
The forecast above can be summarized in individual data points, as shown in the screenshots, but it is only a summary of all the information that was used to generate it. A higher-resolution readout would be the readings from radars pointing at the sky over the UK, which tell us where the clouds are located at 5-minute intervals. In the animation we can see where the clouds are moving, where the rain is currently falling and where it could fall in the next few minutes or the next hour. Sometimes this is more informative, sometimes it is not.
We have clear skies in Cambridge at the time of this writing. In other words, for certain periods of time, looking at the animation of the clouds doesn’t change our rain prediction for Cambridge from what we got on the BBC Weather website. Similarly in the Exome vs Whole-Genome analogy, for certain classes of diseases or conditions the Whole-Genome doesn’t tell us much more than the Exome, but sometimes the answer from WGS will be very different from the answer from WES.
Finally, we can add to the analogy the periodic profiling of the cfDNA epigenome from blood samples. This can be done periodically by taking a 10-20 ml blood sample from the patient, say a 40-year-old person, and assaying the cfDNA to measure the epigenetic marks on each molecule, most commonly 5mC, although technologies like Oxford Nanopore sequencing can also detect 5hmC and related epigenetic marks. For certain diseases, like cancer, early detection is crucial to the prognosis: identifying cancer early makes it more treatable, makes the treatments less invasive, and gives the patient a much better prognosis than someone with Stage 4 metastatic cancer.
The analogy here is to the weather forecasting tools used for tornadoes and tornado seasons. In some parts of the world we are now entering tornado season, and it is crucial to have ways of measuring the atmosphere in great detail so that we can predict when a tornado is forming and the direction in which it is travelling at any given time. Not being able to predict it early enough means that the effects of the tornado can be catastrophic. At the same time, we can be highly confident that there won’t be any tornadoes in a given territory if the history of the last few hundred years shows no seasonal incidence there.
In cancer screening, the age at which a patient would benefit from cfDNA-EPS or a similar type of regular screening has been set at 45-50 years old or older. This means that the “tornado season” for cancer in the average human population starts at 45-50 years old and finishes later in life, at a point where the risk of developing cancer is reduced again.
This latter point is difficult to pinpoint precisely, but there is plenty of literature describing how the human body enters what’s called “sniper’s alley” at a given age, and if we survive it, there comes a point where the risk of death from certain diseases is reduced. So for a window of time, the “tornado season” of cancer incidence, we benefit from regular screening using tools like Grail Bio’s Galleri, the pioneering Multi-Cancer Early Detection (MCED) assay that started this new trend, or, for specific cancer types such as colorectal, breast or lung cancer, other assays that play the equivalent screening role for an individual cancer type. The Galleri assay is a pioneer in the use of cfDNA-EPS, but it is not without its critics: people in the know, like Alex Dickinson, who posts on LinkedIn about it, have publicly criticised what can and can’t be done with Grail Bio’s Galleri as it is today. Others, like the investment analysts behind the Ark Genomics ARKG fund, have also been critical of Grail Bio’s Galleri and in favour of “one cancer at a time” approaches like Exact Sciences’ Cologuard, although in recent times the team behind ARKG has changed, and so has their opinion on this matter. One way or another, regardless of the specific assay and the branding on the box, it won’t take long for these high-resolution “tornado early detection” systems to be applied in many clinical settings.