Overview of genomic surveillance related to Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2)

Since the start of the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) pandemic, several thousand variants have circulated and others are emerging. Genomic surveillance is therefore crucial: it aims to detect the emergence of new variants, in particular Variants of Concern (VOC), and to assess the impact of priority mutations on the transmissibility and lethality of the virus, on the performance of viral diagnostic methods and on vaccine efficacy. An overview of available papers was performed to understand the conduct, tools and utility of genomic sequencing and surveillance related to Covid-19. We also report the experience of Morocco in this field through available data. A national SARS-CoV-2 genomic consortium has been established in order to continuously inform the health authorities of the genetic evolution of circulating strains in Morocco. Genomic sequencing shows no predominant SARS-CoV-2 lineage among Moroccan genomes. The genomes are dispersed across the evolutionary tree of SARS-CoV-2 and carry between 4 and 16 mutations. As the pandemic is ongoing, continuous genomic surveillance and regular sequencing are fundamental to understand the spread of SARS-CoV-2, to rapidly identify potential global transmission networks and to consolidate response strategies, especially targeted Covid-19 vaccination.


Introduction
Several coronaviruses are already known to infect humans. Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), a lineage B betacoronavirus belonging to the Coronaviridae family, is the seventh coronavirus pathogenic to humans and is responsible for COronaVIrus Disease 2019 (Covid-19). Covid-19 causes pneumonia and severe acute respiratory syndrome due to a high level of inflammatory response [1]. It was first identified in Wuhan, China, in early December 2019. Widespread cases were subsequently reported in other countries. In January 2020, the World Health Organization (WHO) declared Covid-19 a Public Health Emergency of International Concern. As of 25 May 2021, the pandemic had infected more than 168,017,381 people and caused more than 3,488,273 deaths globally (https:/www.worlmeters.info/coronavirus/). In Morocco, the first Covid-19 case was declared in March 2020, and 517,113 confirmed cases and 9,126 deaths had been recorded by May 2021.
SARS-CoV-2 is an enveloped RiboNucleic Acid (RNA) virus. Its genome has a size of approximately 30 kilobases and codes for 15 genes, 4 of which correspond to structural proteins: the surface protein (Spike or S protein), the envelope protein (E), the matrix protein (M) and the nucleocapsid protein (N), together with accessory proteins encoded by the ORF3a, ORF6, ORF7a, ORF7b and ORF8 genes [2]. Through its Receptor Binding Domain (RBD), the most variable part of the coronavirus genome, the Spike protein is responsible for binding the Angiotensin-Converting Enzyme 2 (ACE2) cell surface receptor and inducing cell entry during infection [3]. Although SARS-CoV-2 has a lower mutation rate than most RNA viruses, mutations accumulate over time and generate genomic diversity. This acquired genetic heterogeneity enables viral adaptation to different hosts and environments. In hosts, genomic diversity is most often associated with disease progression, drug resistance and vaccination issues. Since the beginning of the pandemic, several mutations of SARS-CoV-2 have been reported in the literature, most often due to nucleotide substitutions, but gene deletions have also been described. Like other countries, in February 2021 the Moroccan health ministry set up a consortium of laboratories with sequencing platforms as part of its strategy for genomic monitoring of the disease. This consortium is composed of the Reference Laboratory for Influenza and Respiratory Viruses of the National Institute of Hygiene, the Medical Biotechnology Laboratory of the Faculty of Medicine and Pharmacy and the Functional Genomic Platform of the National Scientific Research, as well as the Institute Pasteur in Casablanca. The main mission of this laboratory network is to identify SARS-CoV-2 variants and to characterize them by genomic sequencing.
We chose to perform an overview as an appropriate approach to understand the conduct, tools and utility of genomic sequencing and surveillance related to Covid-19. Furthermore, we summarize how genomic sequencing and surveillance have supported the identification of new SARS-CoV-2 variants and mutations. We also report the available data on the Moroccan experience in this field.

Phylogenetic network analysis
In genomic analysis, after specimen collection, DNA synthesis, viral genome amplification and next-generation sequencing, genomes are mapped to the reference sequence Wuhan-Hu-1/2019 and the differences are recorded in Variant Call Format (VCF). A phylogenetic analysis is then performed to construct the phylogenetic tree, which permits genomic comparison and analysis against reference strain repositories such as GenBank and GISAID and the identification of mutations [4,6]. To date, the globally circulating viruses have been classified into seven major clades denoted S, L, V, G, GH, GR and GRY [5].
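The comparison against the reference sequence can be illustrated with a minimal Python sketch. It assumes that the sample genome has already been aligned to the reference, so that both strings have the same length. The function name `list_substitutions` and the ten-base toy sequences are hypothetical and are not taken from any of the cited tools; real pipelines work on the full genome of about 30 kilobases and store the result in VCF.

```python
def list_substitutions(reference: str, sample: str) -> list[str]:
    """Return nucleotide substitutions as REF<1-based position>ALT, e.g. 'A5G'.

    Assumption: `sample` is already aligned to `reference` (same length).
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")

    changes = []
    pairs = zip(reference.upper(), sample.upper())
    for position, (ref_base, alt_base) in enumerate(pairs, start=1):
        # Ignore identical bases, alignment gaps ('-') and ambiguous calls ('N').
        if ref_base == alt_base or ref_base in "-N" or alt_base in "-N":
            continue
        changes.append(f"{ref_base}{position}{alt_base}")
    return changes


# Toy sequences for illustration only (not real SARS-CoV-2 data).
toy_reference = "ATTAAAGGTT"
toy_sample = "ATTAGAGGTC"
print(list_substitutions(toy_reference, toy_sample))
```

Counting the entries of the returned list gives the number of substitutions a genome carries relative to the reference.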
As the Covid-19 outbreak continues to evolve and scientific evidence expands rapidly, the information provided in this paper is current only as of the date this work was prepared.

Bioinformatics tools for genomic analysis
Many tools are available for each step, from quality control of the genomic sequence data to viral genome verification [7,8]. Several bioinformatics tools have been developed for the detection and analysis of SARS-CoV-2 genome sequences (Covidex for SARS-CoV-2 genome subtyping, CoV-GLUE for tracking changes accumulating in the SARS-CoV-2 genome, PoSeiDon for detection of positive selection in protein-coding genes, etc.) [7,8].
Researchers regularly deposit datasets into public databases such as GISAID, the standard database for sharing SARS-CoV-2 consensus sequences internationally, with no imposed limitations on the sharing and use of the genomic sequences [5]. GISAID holds the largest number of SARS-CoV-2 genome sequences.
An additional classification effort has been provided by Hodcroft et al. [10], which supports the PANGOLIN lineage nomenclature system and allows comparison with the SARS-CoV-2 reference sequence, assigning sequences to clades and defining where they fall on the SARS-CoV-2 phylogenetic tree. Nextclade uses a year-letter nomenclature in which the year the clade emerged is followed by a capital letter. A new major clade is named once its frequency exceeds 20% for more than 2 months in a representative global sample. Currently, clades 19A, 19B, 20A, 20B, 20C and 20I are named [10,11].
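The naming rule can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Nextclade implementation: frequencies are taken to be one value per month, "more than 2 months" is interpreted as more than two consecutive monthly values above the threshold, and the function names `qualifies_as_major_clade` and `clade_label` are hypothetical.

```python
def qualifies_as_major_clade(monthly_frequencies, threshold=0.20, min_months=2):
    """Return True if the frequency stays above `threshold` for more than
    `min_months` consecutive monthly values (an assumed reading of the rule)."""
    consecutive = 0
    for frequency in monthly_frequencies:
        # Extend the current run, or reset it when the frequency drops.
        consecutive = consecutive + 1 if frequency > threshold else 0
        if consecutive > min_months:
            return True
    return False


def clade_label(year: int, letter: str) -> str:
    """Build a year-letter label: two-digit year followed by a capital letter."""
    return f"{year % 100:02d}{letter.upper()}"


# Hypothetical monthly global frequencies for one candidate clade.
print(qualifies_as_major_clade([0.05, 0.22, 0.25, 0.31]))
print(clade_label(2020, "A"))
```

With these assumptions, a clade whose frequency rises above 20% and then falls back within two months would not receive a new major label.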
In Morocco, the report of Badaoui [12] shows that Moroccan genomes are dispersed across the evolutionary tree of SARS-CoV-2. Viral strains originated not only from Belgium, Spain and France but also from the USA and Vietnam. Nine viruses from clade 20A, 9 from 20B and 2 from 20C were included, with no predominant SARS-CoV-2 lineage. The virus was already circulating in February 2020, before the official discovery of the first case in March [12].

SARS-CoV-2 variants and mutations
Since the start of the pandemic, thousands of mutations (including amino acid insertions and deletions), which in turn have given rise to thousands of variants, have been identified [13]. By convention, an amino acid change is written in the form N501Y, denoting the wildtype amino acid (N, asparagine) and the replacement amino acid (Y, tyrosine) at site 501 of the amino acid sequence.
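This notation is regular enough to be read programmatically. The following Python sketch, given only as an illustration, splits a label such as N501Y into its three parts using the standard one-letter amino acid codes; the names `AminoAcidChange`, `parse_change` and `describe_change` are hypothetical, and insertions, deletions and stop codons are deliberately not handled.

```python
import re
from typing import NamedTuple

# Standard one-letter codes for the 20 amino acids.
AMINO_ACIDS = {
    "A": "alanine", "R": "arginine", "N": "asparagine", "D": "aspartic acid",
    "C": "cysteine", "Q": "glutamine", "E": "glutamic acid", "G": "glycine",
    "H": "histidine", "I": "isoleucine", "L": "leucine", "K": "lysine",
    "M": "methionine", "F": "phenylalanine", "P": "proline", "S": "serine",
    "T": "threonine", "W": "tryptophan", "Y": "tyrosine", "V": "valine",
}

CHANGE_PATTERN = re.compile(r"^([A-Z])(\d+)([A-Z])$")


class AminoAcidChange(NamedTuple):
    wildtype: str
    site: int
    replacement: str


def parse_change(label: str) -> AminoAcidChange:
    """Split a substitution label such as 'N501Y' into (wildtype, site, replacement)."""
    match = CHANGE_PATTERN.match(label.strip().upper())
    if match is None:
        raise ValueError(f"not a simple substitution label: {label!r}")
    wildtype, site, replacement = match.groups()
    if wildtype not in AMINO_ACIDS or replacement not in AMINO_ACIDS:
        raise ValueError(f"unknown amino acid code in {label!r}")
    return AminoAcidChange(wildtype, int(site), replacement)


def describe_change(label: str) -> str:
    """Spell out a substitution label in words."""
    change = parse_change(label)
    return (f"{AMINO_ACIDS[change.wildtype]} to "
            f"{AMINO_ACIDS[change.replacement]} at site {change.site}")


print(describe_change("N501Y"))  # asparagine to tyrosine at site 501
```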
In general, a non-synonymous mutation, which is subject to natural selection, is a nucleotide mutation that alters the amino acid sequence of a protein, here the spike protein. It differs from a synonymous substitution, or silent mutation, which leaves the amino acid sequence unchanged.
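The distinction can be made concrete by translating the codon before and after the change with the standard genetic code. The Python sketch below is a minimal illustration: the helper name `classify_codon_change` is hypothetical, the codon pairs are illustrative examples, and stop codons are simply treated as one more translation symbol.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate_codon(codon: str) -> str:
    """Translate one DNA or RNA codon to its one-letter amino acid ('*' = stop)."""
    return CODON_TABLE[codon.upper().replace("U", "T")]


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Label a codon change as synonymous or non-synonymous."""
    if translate_codon(ref_codon) == translate_codon(alt_codon):
        return "synonymous"
    return "non-synonymous"


# AAT (asparagine) -> TAT (tyrosine): the amino acid changes.
print(classify_codon_change("AAT", "TAT"))  # non-synonymous
# TTT -> TTC: both codons encode phenylalanine, so the change is silent.
print(classify_codon_change("TTT", "TTC"))  # synonymous
```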
On 25 February 2021, WHO released a document outlining working definitions of variants of concern (VOC) and variants of interest (VOI) [14].
A VOI is defined as an isolate of SARS-CoV-2 with genotypic and/or phenotypic changes compared to the reference genome that have been associated with changes to receptor binding, reduced neutralization by antibodies generated against previous infection or vaccination, reduced efficacy of treatments, potential diagnostic impact, or predicted increase in transmissibility or disease severity.
A VOC is defined as a VOI for which there is evidence of an increase in transmissibility or virulence, and/or which is not being controlled effectively by current public health measures.
On 31 May 2021, the WHO assigned simple labels that are easy to say and remember to key variants of SARS-CoV-2, using letters of the Greek alphabet. These labels do not replace existing scientific names (assigned by GISAID, Nextstrain and Pango), which convey important scientific information and will continue to be used in research [14].
In South Africa, the B.1.351 (Beta) variant, also designated 501Y.V2 or VOC 20C/501Y.V2, was identified by the Network for Genomic Surveillance in South Africa (NGS-SA) in December 2020 [17]. It has been shown to have increased transmissibility and to reduce the efficacy of some vaccines [18].
The Brazilian variant P.1 (Gamma), also designated 501Y.V3 or B.1.1.28.1, was reported by Japan in December 2020 after detection in four travellers who had returned from Brazil [19]. The P.1 variant is flagged as a VOC because of its spike mutations, which are also found in the B.1.351 variant: N501Y and K417N/T (which increase the binding affinity of the virus to the ACE2 receptor on human cells and are associated with fast-growing lineages with possible resistance to some antibodies) and E484K (which leads to escape from the immune response).
The Indian variant B.1.617.2 (Delta) emerged in India in December 2020 [20,21] and was declared a VOC by the UK on 7 May 2021. This variant is defined by four mutations in the Spike protein, including E484Q, L452R (linked to increased transmissibility and virulence and to evasion of immune protection specifically targeting the spike RBD) and P681R (which may increase the infectivity of the virus by facilitating cleavage at the S1/S2 site).

Variants of interest
For the seven variants of interest, genomic and epidemiological evidence is available that could imply a significant impact on transmissibility, severity and/or immunity, and therefore on the epidemiological situation [22]. We adopt the classification of the European Centre for Disease Prevention and Control (ECDC) for VOI [22].
The B.1.525 (Eta) variant emerged in Nigeria and the UK in December 2020. It is a variant under investigation whose infectivity level is still unknown. B.1.525 is defined by 3 mutations: E484K, Q677H and F888L. These mutations make B.1.525 similar to the B.1.1.7 variant and may increase transmissibility, virulence and immune escape [22].

Variants under monitoring
Additional variants of SARS-CoV-2 have been detected that could have properties similar to those of a VOC, but the evidence is weak or has not yet been scientifically assessed.
In our context, Badaoui [12] and Laamrati [23] reported that the virus genomes from Moroccan patients carry between 4 and 16 mutations relative to the common Wuhan-Hu-1/2019 ancestor. The most frequent non-synonymous mutations in SARS-CoV-2 isolates from Moroccan patients were nsp12 P323L, D614G, R203K and G204R [12]. The D614G non-synonymous mutation, which is associated with the emergence of clade A2 and is known as the most prevalent variant worldwide, was found [23]. This mutation has already been associated with the observed increase in transmission in the United States.

Conclusion
As the pandemic is ongoing, continuous genomic surveillance and regular sequencing are fundamental to understand the spread of SARS-CoV-2 in different regions, to rapidly identify potential global transmission networks and to consolidate response strategies. Bioinformatics resources for sequence and phylogenetic alignment, tree visualization and genomic analysis are essential. However, the increase in the amount of SARS-CoV-2 genome sequence data available poses serious challenges for data storage and analysis. National and international improvement of genomic surveillance tools and resources is required to resolve this problem. Updating genome sequences in real time is therefore crucial for rapidly tracking the genetic evolution of SARS-CoV-2 and the diffusion of emerging clades, in order to develop appropriate health strategies against circulating variants as well as newly emerging VOC, especially in terms of targeted Covid-19 vaccination.
Ethical Approval and Consent to participate
Not applicable.
Consequently, genome sequencing of SARS-CoV-2 became a public health priority. It aims to enable genomic epidemiology investigations into the origins and spread of Covid-19, to contribute to a better understanding of viral pathogenesis and virulence and to provide support for targeted vaccines. As of 26 May 2021, 1,732,197 SARS-CoV-2 genomic sequences had been shared via the Global Initiative on Sharing All Influenza Data (GISAID) database [4,5]. In this global health crisis, to
https://doi.org/10.1051/e3sconf/202131901043