The Delta and Delta-plus variants of SARS-CoV-2 are making their way around the world, but these are just two of hundreds of unique mutations listed by TRON, a research institute in Mainz, Germany. Using the publicly available CoVigator web search tool, researchers and scientists can navigate TRON’s extensive database, which tracks the biogeographic evolution of the virus using genetic sequences from thousands of COVID-19 patients.

TRON gGmbH is a non-profit research organization established as an independent spin-off of the University Medical Center of the Johannes Gutenberg University in Mainz (Germany). TRON, which stands for “Translational Oncology,” bridges the gap between scientific research and pharmaceutical application, pursuing new research into the immunological mechanisms and therapeutic modulation of the immune system. Their research has primarily been cancer-specific, but as the pandemic developed in 2020, they applied their tools and expertise to compile a living database of SARS-CoV-2 mutations.

“Vaccines are designed to make the human immune system recognize an invading virus or bacteria and do what it normally does to kill it,” said Martin Löwer, deputy director of the Biomarker Development Center (BDC) at TRON. “But to develop a vaccine, you have to understand how the virus or bacteria invades human cells and what it does to modify them. We analyze genomes to look for specific mutations in DNA, which are important for understanding the course of disease and how individualized medicine may affect disease in different patients.”

Throughout 2020, as COVID-19 became a global pandemic and millions of people around the world were infected, these diverse populations offered hosts in which the SARS-CoV-2 virus could evolve by mutation. Mutational variants are of considerable interest to healthcare professionals and scientists as vaccines are developed to control the spread of the disease. We are seeing the dangerous impact of mutations as the Delta variant of SARS-CoV-2 infects immunized individuals and spreads virulently among the unvaccinated.

While several studies in 2020 had already reported different strains and an increase in the mutation rate, scientists at TRON became interested in the impact of these changes on the effectiveness of vaccines as they began to emerge. So they began to analyze the SARS-CoV-2 spike glycoprotein (spike protein), the feature the virus uses to invade human host cells. The mRNA vaccines approved for use are designed to induce immune responses against the spike protein.

“When we started, there was little genomic information for the virus,” said Thomas Bukur, a bioinformatics scientist at TRON. “But we quickly realized that this virus had the potential to mutate and evolve. We also realized that more and more countries were launching major sequencing initiatives. We wanted to create a collection of all this virus information and understand the mutations in different populations and geographic regions over time.”

Sequencing initiatives have created large and diverse repositories of virus DNA sequence reads. These sequences continue to lead to the discovery of virus variants, such as Delta, but a database on the evolution of the spike protein across geographies and individuals was not available at the time. TRON used these repositories to begin their search for variants of the spike protein.

Screenshot of CoVigator. Credit: TRON

Discovering mutations requires supercomputing capabilities


Finding variants in a genome is a complex process that involves many calculations performed by powerful computers. A whole genome is assembled by aligning short reads of base sequences from next-generation sequencing (NGS) machines. The SARS-CoV-2 genome is roughly 30,000 bases long (compared to the 3.2 billion base pairs in the human genome). Genomic repositories provide entire genome assemblies, viral datasets and sequence reads that must first be aligned, much like putting together the pieces of a 30,000-piece jigsaw puzzle. Once the sequences are aligned, TRON’s analyses examine specific differences between the original “wild-type” reference genome and the sample studied. The differences are annotated as variants (mutations), which, across hundreds of thousands of samples, could amount to millions of variants.
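To make the comparison step concrete, here is a minimal sketch of how differences between a reference and an already-aligned sample could be listed. It is not TRON’s pipeline: the function name `call_substitutions` and the toy sequences are hypothetical, and it only handles single-base substitutions on sequences of equal length, ignoring insertions, deletions and read-level quality.

```python
def call_substitutions(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (1-based position, reference base, sample base) for each mismatch.

    Assumes both sequences are already aligned to the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be pre-aligned to the same length")

    variants = []
    for pos, (ref_base, alt_base) in enumerate(zip(reference, sample), start=1):
        # Skip alignment gaps ("-") and unknown bases ("N"): they are not substitutions.
        if ref_base in "-N" or alt_base in "-N":
            continue
        if ref_base != alt_base:
            variants.append((pos, ref_base, alt_base))
    return variants


# Toy sequences for illustration only (not real SARS-CoV-2 data).
reference = "ATGTTTGTTTTTCTT"
sample = "ATGTTTGTCTTTCTT"
print(call_substitutions(reference, sample))  # [(9, 'T', 'C')]
```

Real variant callers work from many overlapping reads per position and weigh base qualities, which is why the computation becomes heavy across hundreds of thousands of samples.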

To identify non-synonymous mutations in the spike protein, scientists at TRON have built a computational pipeline that uses various genome processing and bioinformatics tools. This Corona Virus Navigator (CoVigator) NGS pipeline includes trimming, alignment, variant calling, and other tasks that use open-source tools from many genomics software repositories, including the Genome Analysis Toolkit (GATK), BCFtools, LoFreq and iVar. The CoVigator NGS pipeline is implemented in the Nextflow framework and is publicly available on GitHub for use by other researchers.
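What “non-synonymous” means can be shown with a small sketch: a mutation is non-synonymous when the changed codon encodes a different amino acid. The code below uses the standard genetic code; the function name `classify_codon_change` is hypothetical, and this is only an illustration of the concept, not the annotation step the CoVigator pipeline actually runs.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous or non-synonymous."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    return f"non-synonymous ({ref_aa}->{alt_aa})"


print(classify_codon_change("GAT", "GAC"))  # synonymous
print(classify_codon_change("GAT", "GGT"))  # non-synonymous (D->G)
```

Only the non-synonymous changes alter the protein itself, which is why they are the ones of interest when asking whether a mutation might affect how the immune system recognizes the spike protein.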

TRON’s main work focuses on cancers and other diseases with high medical need. Reallocating existing computational resources from this work to analyses of SARS-CoV-2 could delay other important discoveries. Working with PrimeLine Solutions and Intel’s Pandemic Response Technology Initiative, TRON acquired ten new Intel Server System nodes based on 2nd Generation Intel Xeon Scalable processors. The new system gave TRON 960 dedicated threads to run thousands of tasks in parallel for their analyses. TRON is now able to analyze and process over 20,000 sequencing datasets in less than three hours, providing near real-time analysis of the ever-growing publicly available datasets.
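A quick back-of-the-envelope calculation puts these figures in perspective. The sketch below uses only the numbers quoted above (ten nodes, 960 threads, 20,000 datasets, three hours) and assumes, purely for illustration, that every thread is busy for the full three hours; since the run takes less than three hours, the per-dataset cost is an upper bound and the hourly rate a lower bound.

```python
nodes = 10
threads = 960
datasets = 20_000
wall_clock_hours = 3.0  # quoted as "less than three hours", so an upper bound

threads_per_node = threads // nodes
datasets_per_hour = datasets / wall_clock_hours
# Thread time available per dataset if all threads are fully used (an assumption).
thread_minutes_per_dataset = threads * wall_clock_hours * 60 / datasets

print(f"threads per node: {threads_per_node}")
print(f"datasets per hour (at least): {datasets_per_hour:.0f}")
print(f"thread-minutes per dataset (at most): {thread_minutes_per_dataset:.2f}")
```

Under these assumptions each dataset gets at most about 8.6 thread-minutes of compute, and the cluster handles at least roughly 6,700 datasets per hour.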

CoVigator offers a new perspective on SARS-CoV-2


The original TRON study, published as a preprint, used 146,917 whole-genome assemblies and 2,393 next-generation sequencing (NGS) datasets from the GISAID, NCBI Virus and NCBI SRA archives. The study found that only a small percentage of samples contained the wild-type spike protein without variation, and it identified 2,592 distinct variants across all samples. The mutation rate was low, but it increased over time. Additionally, TRON found subclonal mutations, indicating potential coinfection with various strains of SARS-CoV-2 and/or intra-host evolution of the virus, as well as variants that could affect antibody binding or recognition by T lymphocytes.

“The most interesting finding of the research,” commented Löwer, “was to see the many variants of the spike protein. Second, the ability to look back over the past year and a half and track exactly how it has evolved offers a new perspective. We are able to detect even small changes early in its evolution. We can see how it starts in a single patient and mutates within the patient into multiple variations and across populations and geographic regions. And because the virus continues to travel around the world, we see how mutations move around the world over time.”

Löwer points out that, although they were not the first to sequence it, the B.1.1.7 (UK) lineage is characterized by the accumulation of 17 variants, eight of which are located in the spike protein. Given the findings of variants, including the latest Delta and Delta-plus variants, it is important to continue to monitor and catalog the course of SARS-CoV-2 and the effects of the variants on vaccines.

“We now know that the immune system does not recognize all of the spike protein,” added Löwer. “It recognizes specific small parts of the protein that we call epitopes. People have studied how vaccines trigger the immune system to identify these epitopes. We want to continue to examine whether the mutations we identify alter the epitopes detected by the immune system. This is something that vaccine producers and scientists would like to know.”

A tool for future pandemics


TRON’s work has given us new insight into the SARS-CoV-2 virus, but the study was only the beginning of an effort to first understand the evolution of the virus and then help scientists continuously monitor and analyze it. Their new work combines a large amount of data into a single resource and gives scientists the opportunity not only to examine many low-frequency variants, but also to examine the same variant across many hosts in parallel, along many different dimensions in one database.

“Without new computational resources, we would only have been able to complete the initial study,” said Bukur. “With the new system, we are able to provide ongoing research that downloads data from millions of samples and processes it in parallel to identify spike protein variants. As soon as we have the results, we post them to our CoVigator web service dashboard. With this platform, we are able to continue the work and make the latest data available to the research community.”

Understanding the mutations of new and known viruses is essential to being able to continuously treat and manage the therapeutic response to widespread and dangerous diseases. There will certainly be other unique pandemics. And, with climate change, scientists are observing the migration of existing dangerous tropical diseases from equatorial regions to more temperate areas. These threats will create new challenges for healthcare. The workflows and pipelines created by scientists at TRON can be quickly adapted to new viruses and virus strains, providing new tools for research and response in collaborative immunobiology.

“At some point, we can detect these variants during sequencing,” concluded Löwer, “and be able to see those individual mutations in a single patient, which months later become more dangerous variants, like Delta. But seeing these very small changes early enough and deciding if it’s a variation or just a random fluctuation in the data means we’re working very close to the noise level. But we know we can be precise because we now know where the critical variants are occurring. It will not be the last pandemic. And we hope to use what we’ve learned to develop methods for the early detection of subsequent viruses and their mutations.”

Learn more about the TRON CoVigator NGS pipeline.