UPGMA analysis of type II CRISPR RNA-guided Cas9 endonuclease homologues from the compost metagenome

Metagenomic approaches provide access to the genetic diversity of the environment for biotechnological applications, enabling the discovery of new enzymes and new pathways for numerous catalytic processes. Five new putative type II CRISPR-Cas9 DNA endonucleases were identified in the compost community using the DELTA-BLAST algorithm. Phylogenetic UPGMA analysis showed that four of these putative enzymes are similar to those of the Bacteroidetes. Protein structural modeling confirmed the DELTA-BLAST and UPGMA results. These five new proteins may be promising for genome editing in thermoresistant Actinomyces.


Introduction
Extreme environments such as hot springs, deep-sea hydrothermal vents and organic composts are reservoirs of unique microbial diversity and potential sources of new enzymes with desirable properties. Adaptation to these conditions accounts for the high genomic and metabolic flexibility of these microbial communities, which often encode enzymes with novel properties suitable for many applications [1].
The aim of this work was to search the compost metagenome for homologues of the CRISPR-Cas9 DNA nuclease. Such homologues may be useful for developing a system for editing the genes of the various bacteria inhabiting this artificial biotope. These enzymes need to be thermotolerant, because temperatures during compost incubation rise to 90 °C or more. Thermotolerant enzymes could also be used to edit the genomes of bacteria isolated from other extreme biotopes, and such sequences could additionally provide a thermostable in vitro DNA editing system. A structural study of the thermoresistant (TR) type II CRISPR-Cas9 endonuclease homologues found here would be of fundamental interest and could guide the production of biotechnologically significant mutants based on amino acid sequences extracted from the compost metagenome.

Materials and methods
We described the methodological approach in detail previously [1]. The query for the homologue search was the amino acid sequence of the type II CRISPR-Cas9 gene product. The search for homologues in the compost metagenome (env_nr, taxid: 702656) [3-4] was performed with the DELTA-BLAST algorithm, changing one default parameter: gap costs were set to existence 9, extension 1. Sequences MNK39233.1, MNQ43276.1, MNF63500.1, MNX56837.1 and MNU15097.1 were found with statistical confidence. CRISPR-Cas9 homologue sequences from various bacterial taxa were used as controls. Phylogenetic analysis was performed with the UPGMA algorithm and 2000 bootstrap replicates [7] in the MEGA X software package [5]. Structural models were built with SWISS-MODEL [11]. Template searches were performed with BLAST [9] and HHblits [10] against the SWISS-MODEL Template Library (SMTL [12], last updated: 2021-03-03, last included PDB release: 2021-02-26). Each model was built from the target-template alignment using the ProMod3 software tool [13].
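With BLAST-style affine gap scoring, a gap of length k is penalized by the existence cost a plus k times the extension cost b. The worked example below applies this to the setting used in the search (existence 9, extension 1); it illustrates the scoring rule and does not restate any output of the search itself.

$$
C(k) = a + b\,k, \qquad a = 9,\; b = 1
$$
$$
C(1) = 9 + 1 = 10, \qquad C(5) = 9 + 5 \cdot 1 = 14
$$

Assuming the standard BLOSUM62 default of existence 11, extension 1, this setting lowers the penalty for opening each gap by 2 while leaving the per-residue extension cost unchanged, so alignments with more separate gaps are penalized somewhat less.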

Results
Amino acid sequences found with DELTA-BLAST in the metagenome of compost from the Experimental Botanical Garden Goettingen, Germany [3-4] were used for phylogenetic analysis (Fig. 1).
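The UPGMA clustering with bootstrap support described in Materials and methods can be illustrated with a minimal sketch. It assumes an already aligned set of sequences and uses the uncorrected p-distance; the distance model actually used in MEGA X is not stated here, so the p-distance, the function names and the toy alignment are all hypothetical illustrations. Support for a clade is the fraction of column-resampled replicates whose UPGMA tree contains the same leaf set.

```python
import random
from collections import Counter
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Proportion of differing sites, skipping columns with a gap in either sequence."""
    sites = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not sites:
        return 0.0
    return sum(a != b for a, b in sites) / len(sites)


def upgma(names, dist):
    """Cluster taxa with UPGMA.

    names: taxon labels; dist(x, y): distance between two labels.
    Returns (newick, clades): a Newick topology without branch lengths and the
    leaf sets of all non-root internal nodes.
    """
    # Active cluster id -> (Newick fragment, set of leaf names)
    clusters = {i: (name, frozenset([name])) for i, name in enumerate(names)}
    d = {
        frozenset((i, j)): dist(names[i], names[j])
        for i, j in combinations(range(len(names)), 2)
    }
    clades = set()
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of active clusters (ties go to the earliest pair).
        i, j = sorted(min(d, key=d.get))
        nwk_i, leaves_i = clusters.pop(i)
        nwk_j, leaves_j = clusters.pop(j)
        merged = leaves_i | leaves_j
        if len(merged) < len(names):
            clades.add(merged)
        # UPGMA update: distance to the new cluster is the size-weighted mean.
        n_i, n_j = len(leaves_i), len(leaves_j)
        for k in clusters:
            d[frozenset((next_id, k))] = (
                n_i * d[frozenset((i, k))] + n_j * d[frozenset((j, k))]
            ) / (n_i + n_j)
        # Forget every distance involving the two merged clusters.
        d = {pair: v for pair, v in d.items() if i not in pair and j not in pair}
        clusters[next_id] = (f"({nwk_i},{nwk_j})", merged)
        next_id += 1
    newick, _ = next(iter(clusters.values()))
    return newick + ";", clades


def bootstrap_support(alignment, replicates=2000, seed=0):
    """Share of bootstrap replicates in which each clade of the original tree recurs.

    alignment: dict mapping name -> aligned sequence (all the same length).
    """
    names = list(alignment)
    length = len(next(iter(alignment.values())))
    rng = random.Random(seed)

    def tree_for(columns):
        # Rebuild the sequences from the chosen columns, then cluster them.
        seqs = {n: "".join(alignment[n][c] for c in columns) for n in names}
        return upgma(names, lambda x, y: p_distance(seqs[x], seqs[y]))

    newick, clades = tree_for(range(length))
    counts = Counter()
    for _ in range(replicates):
        # Nonparametric bootstrap: resample alignment columns with replacement.
        columns = [rng.randrange(length) for _ in range(length)]
        _, rep_clades = tree_for(columns)
        counts.update(clades & rep_clades)
    return newick, {clade: counts[clade] / replicates for clade in clades}


# Hypothetical toy alignment, not data from this study.
toy = {"A": "MKVLAT", "B": "MKVLAT", "C": "QRSTWY", "D": "QRSTWY"}
tree, support = bootstrap_support(toy, replicates=100)
print(tree)
for clade, value in sorted(support.items(), key=lambda cv: sorted(cv[0])):
    print(sorted(clade), value)
```

Because A and B are identical and differ from C and D at every column, every replicate recovers the clades {A, B} and {C, D}, so both receive full support. Real alignments give lower values where replicates disagree, which is what the bootstrap percentages on a tree such as Fig. 1 summarize.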
The evolutionary relationships of the homologue MNU15097.1 remain unclear. Notably, comparing it against the GenBank databases also failed to give an unambiguous result. It may represent a new, rare type II CRISPR-Cas9 endonuclease sequence.
The MNK39233.1 model was built with SWISS-MODEL homology modelling of protein structures [11]. The template search in the SMTL database [12] for structural modeling confirmed the DELTA-BLAST and UPGMA results.
Protein structural modeling was performed for all five sequences found: MNK39233.1, MNQ43276.1, MNF63500.1, MNX56837.1 and MNU15097.1 (Tab. 1). As an example, we present the best result, obtained for MNK39233.1. A total of 19 templates were found for MNK39233.1, and the model was built on the structure with SMTL ID 6jdv.1 (crystal structure of Nme1Cas9 in complex with sgRNA and target DNA (ATATGATT PAM) in the catalytic state) [14] (Tab. 1). The model was obtained by superimposing the studied sequence onto the 6jdv.1 structure from the SMTL (Fig. 2).
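Whether a target-template alignment spans the whole query, as stated for these models, can be checked from the gapped pairwise alignment itself. The sketch below computes sequence identity, query coverage and the covered query segments for one alignment. The function name and the toy alignment are hypothetical, and SWISS-MODEL's own definitions of identity and coverage may differ in detail; here identity is counted over columns where both sequences have a residue.

```python
def alignment_stats(query_aln, template_aln):
    """Return (identity, coverage, segments) for one gapped pairwise alignment.

    identity: identical residues / columns where both sequences have a residue.
    coverage: such aligned columns / number of query residues.
    segments: 1-based, inclusive query intervals aligned to template residues.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have the same length")
    pos = aligned = identical = 0  # pos counts query residues seen so far
    segments, start, end = [], None, None
    for q, t in zip(query_aln, template_aln):
        if q == "-":
            continue  # insertion in the template: no query residue here
        pos += 1
        if t == "-":
            # A query residue without a template counterpart ends the current segment.
            if start is not None:
                segments.append((start, end))
                start = None
            continue
        aligned += 1
        identical += int(q == t)
        if start is None:
            start = pos
        end = pos
    if start is not None:
        segments.append((start, end))
    identity = identical / aligned if aligned else 0.0
    coverage = aligned / pos if pos else 0.0
    return identity, coverage, segments


# Hypothetical toy alignment, not the MNK39233.1 / 6jdv.1 alignment.
identity, coverage, segments = alignment_stats("MKV--LATG", "MRVEELA-G")
print(f"identity={identity:.2f} coverage={coverage:.2f} segments={segments}")
```

For the toy pair this prints `identity=0.83 coverage=0.86 segments=[(1, 5), (7, 7)]`: one query residue (position 6) has no template counterpart, splitting the covered region into two segments.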

Findings
Five new putative type II CRISPR-Cas9 endonucleases were identified in the compost microbial community using the DELTA-BLAST algorithm. Phylogenetic UPGMA analysis showed that four of these putative enzymes are similar to those of the Bacteroidetes from the taxa Cytophagales, Chryseobacterium and Flavobacterium. Protein structural modeling confirmed the DELTA-BLAST and UPGMA results: putative secondary structures similar to those of the known Cas9 endonuclease are distributed along the entire length of the pairwise sequence alignment, and this distribution was typical of all the pairs studied. This suggests that the sequences found could be used to synthesize artificial genes and to obtain real endonucleases for experimental verification and subsequent use. The potential thermal stability of these Cas9 homologues may determine their practical scope. These five new proteins may be promising for genome editing in thermoresistant microbes.