Uncovering the Secrets of Amino Acids

By: Cloris S.
Year: 2021
School: Jeffrey Trail Middle
Grade: 8
Science Teacher: Jose Ignacio

This project uses data from publicly available databases (NCBI Virus) to analyze the receptor-binding sequences of human coronaviruses.

The novel virus SARS-CoV-2, which causes COVID-19, is one of six human coronaviruses. Current methods for the initial analysis of novel viruses are time-consuming, so more efficient methods are needed.

The question that led this project is: How does the receptor pathway a novel coronavirus uses affect the similarity of amino acid frequencies (in %) at its receptor-binding domain (RBD) to those of other existing coronaviruses?

Cloris hypothesized that little intraspecies variance exists at the RBD because viruses would evolve an optimal, conserved sequence. She used sequences collected by the National Center for Biotechnology Information (NCBI) Virus database from May 2003 to August 2020, sampling 30 of the more than 150,000 spike protein sequences available for six coronaviruses. The deviation between isolates was 0.22%, and SARS-CoV, SARS-CoV-2, and NL63 exhibited 100% similarity at critical sites.

Cloris also hypothesized that, between species of coronaviruses, those that employ the same receptor, angiotensin-converting enzyme 2 (ACE2), would exhibit identical mean distributions. The mean deviation of 0.224% for viruses that use the same receptor, in contrast to the 3.33% deviation across all six coronaviruses, correlates receptor usage with biochemical distribution. The project's algorithm can be employed in preliminary analyses of future viruses. It forms conclusions on receptor pathway, critical amino acids, deviation rates, and biochemical distributions from just an amino acid sequence.
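
The comparison described above can be illustrated with a minimal Python sketch. It is not the project's actual algorithm, which is not published here. It assumes that "deviation" means the mean absolute difference in percentage frequency across the 20 standard amino acids. The function names and the short toy sequences are hypothetical and are not real RBD data.

```python
from collections import Counter
from itertools import combinations

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def frequencies(sequence):
    """Return the percentage of each standard amino acid in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return {aa: 100.0 * counts[aa] / total for aa in AMINO_ACIDS}


def mean_deviation(freq_a, freq_b):
    """Assumed metric: mean absolute difference in % over the 20 amino acids."""
    diffs = (abs(freq_a[aa] - freq_b[aa]) for aa in AMINO_ACIDS)
    return sum(diffs) / len(AMINO_ACIDS)


def mean_pairwise_deviation(sequences):
    """Average the assumed deviation metric over every pair of sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences to compare")
    freqs = [frequencies(seq) for seq in sequences]
    pairs = list(combinations(freqs, 2))
    return sum(mean_deviation(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy sequences for illustration only, not real coronavirus data.
    same_receptor_group = ["NLCPFGEVFNAT", "NLCPFGEVFNAS"]
    all_viruses = same_receptor_group + ["KKWWHHDDYYMM"]
    print(mean_pairwise_deviation(same_receptor_group))
    print(mean_pairwise_deviation(all_viruses))
```

Under these assumptions, a lower mean pairwise deviation within a group than across all sequences would point in the same direction as the 0.224% versus 3.33% result, although the exact metric behind those figures may differ.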

Cloris’ research helps future biochemists design targeted studies of a novel virus at earlier stages of research.