Reassessment of the mitochondrial 12S-rRNA gene for DNA barcoding of museum specimens of shelled marine gastropods from Japan

DNA barcoding is an effective and powerful tool for taxonomic identification and is thus very useful for biodiversity monitoring. This study investigated the usefulness of the mitochondrial 12S-rRNA gene for the DNA barcoding of shelled marine gastropods. To do so, we determined partial 12S-rRNA sequences of 75 vouchered museum specimens from 69 species of shelled gastropods from Japan. The specimens had been identified morphologically, and their natural history data cataloged. Sequence analyses through BLAST searches, maximum likelihood phylogenetic analysis, and species delimitation analysis suggested that the 12S-rRNA gene is helpful for barcoding shelled marine gastropods. It could thus complement barcoding studies using other markers such as COI. The analyses successfully confirmed the identity of all samples at higher taxonomic levels (subfamily and above), but much less so at the species level. Our result thus also underlines a lingering problem of DNA barcoding: the lack of comprehensive reference sequence databases. However, because we provide sequences of properly curated, vouchered museum specimens, this study also contributes taxonomically reliable reference sequences for biodiversity monitoring and identification of shelled gastropods, which include many important fisheries species.


Introduction
With ca. 35,000 recorded species, Gastropoda, a class of shelled mollusks (Conchifera), is one of the most prominent invertebrate groups. During its long evolutionary history, it has radiated and occupied a diverse array of ecological niches, including marine and freshwater habitats, and the group includes biofouling and invasive organisms [2]. Many shelled members of this group are also greatly influenced by the recent ocean acidification event caused by global warming [1][3][4]. As such, it is essential to monitor the diversity of this taxon constantly.
Recent developments in DNA sequencing and the growth of DNA sequence databases have allowed DNA sequences to be used for the quick and effective identification and classification of samples collected from the field with relatively high accuracy, a method called "DNA barcoding" [5]. Further development of DNA sequencing technology (e.g., Next Generation Sequencing) has enabled a non-invasive method of biodiversity monitoring using DNA sequences, called the environmental DNA (eDNA) method, which is essentially the barcoding of DNA fragments shed into the environment by living organisms [6][7][8]. For DNA barcoding and eDNA to work, a robust, reliable, and exhaustive reference database of DNA sequences collected from the target taxa is essential. At present, however, biases in biodiversity studies might have left such a reliable database unavailable for some taxa, including gastropods (e.g. [9]). In addition, Machida et al. [10] and Page [11] reported that some data are currently not adequately curated. As a result, some taxa might become unidentifiable and thus become "dark taxa." This becomes very problematic in monitoring studies, especially those conducted by people with inadequate taxonomic skills and resources, those in which the target taxa contain many possibly undescribed or cryptic species, or those using eDNA-based monitoring, for which morphological samples are simply unavailable.
On the technical side, a marker gene must have enough base substitutions to distinguish different species of the same genus [5], but not so many that sympatric individuals of the same species are strongly differentiated. Several effective universal primers that amplify a region of the mitochondrial COI gene have been developed for metazoans. These have led to a tremendous number of DNA barcoding studies, which have empirically shown that the amplified COI segment has a substitution rate sufficient to distinguish animals at the species level (e.g. [12][13][14][15][16][17]). However, in a previous study, we suggested that DNA barcoding with only one genetic marker could be risky, primarily because of technical problems (limited primer efficacy and the inability of a single marker to place samples properly at higher taxonomic levels) [16][17]. In that study [16][17], we also suggested that using multiple markers would help to alleviate these problems because it allows the collection of more complete genetic data (primers for different genes might work on samples not amplifiable with those of one marker gene, and a combination of markers allows for a more robust phylogeny), and thus gives researchers a more complete picture of the actual biodiversity. Therefore, in that study, we evaluated and proposed the nuclear gene Histone-H3 (H3) as a marker for DNA barcoding of shelled gastropods, using previously developed primers [18].
In this study, to develop and assess the utility of another molecular marker for DNA barcoding-based studies of shelled gastropods, we investigated the usefulness of the mitochondrial gene 12S-rRNA using previously reported primers [19][20]. The 12S-rRNA gene has been used in previous molecular phylogenetics, phylogeography, and DNA barcoding studies of various metazoans, including gastropods. We sequenced the 12S-rRNA of different vouchered samples of shelled marine gastropods stored at the University Museum of The University of Tokyo, including some old museum samples (the oldest was sampled in 1999, and the most recent in 2015). Our result, which was based on BLAST searches, phylogenetic analysis, and species delimitation analysis, confirmed the usefulness of the 12S-rRNA as a genetic marker for DNA barcoding of shelled gastropods, which also include many fisheries species. The result also highlighted the lack of a robust and comprehensive reference database for this gene if it is to be used as a marker for gastropods. Meanwhile, because we used properly curated and vouchered museum specimens as samples, the natural history data of our representatives are reliable. Therefore, through this study we have also contributed a set of reference sequence data from taxonomically reliable samples, which is crucial for biodiversity monitoring using DNA barcoding and eDNA.

Sample collection
A total of 75 individuals of 69 species of shelled gastropods were used in this study. All samples were vouchered specimens stored in the University Museum, The University of Tokyo. These samples were initially collected from various locations in Japan, and then fixed and stored in 95% EtOH. Morphological identifications (based on [21]) of collected samples were conducted before or after fixation. Representatives were chosen at random, with one or two individuals per species. Most specimens are at least nine years old, with the oldest sample collected in 1999 and the most recent in 2015. The list of samples is provided in Table 1.

DNA sequencing and sequence data acquisition
A piece of muscle tissue from the mantle or the foot (about 0.25 mg) was cut out from each sample. Total genomic DNA was extracted using the standard CTAB phenol-chloroform method. PCR was performed using a standard protocol but with an annealing temperature of 52 °C. Three previously published primers [19][20] were used in different combinations to amplify the 12S-rRNA fragment of the samples. Sanger sequencing of amplicons (using both the forward and reverse primers) was outsourced (FASMAC Co. Ltd., Kanagawa, Japan). For comparison, we also sequenced a fragment of the COI gene of all samples, using previously published primers [22][23][24]. The list of primers used in this study is shown in Table 2. Obtained sequences were then checked for contamination by BLASTn searches [25]. Sequence fidelity was confirmed and sequences were edited manually in MESQUITE ver. 3.61 [26][27], while simultaneously checking the chromatograms by eye (visualized in ApE ver. 2.0.61 [28]). After sequence editing, all forward and reverse sequences were assembled manually by eye.

Sequence identification (DNA barcoding) through BLASTn searches
To confirm whether the obtained sequences were homologous to previously published sequences, and thus to obtain taxonomic information on the organisms from which the sequences were obtained, we performed BLASTn searches on the assembled sequences. We considered a sample correctly identified if the morphological identification matched the BLASTn search result. We also recorded the taxonomic level at which each sequence was identified (species, genus, family, order) to check the availability of reference sequences on GenBank and the fidelity of the GenBank sequences. BLASTn searches were conducted for both the 12S-rRNA and COI gene sequences.
Phylogenetic analysis was conducted only on the 12S-rRNA gene sequences.
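The BLASTn-based check described above, in which a hit is compared with the morphological identification and the finest matching taxonomic level is recorded, can be illustrated with a minimal sketch. It is not the procedure actually scripted in this study (the comparison may well have been done by hand); the function name `match_level`, the dictionary layout, and the example lineages are assumptions introduced only for illustration.

```python
# Hypothetical sketch: find the finest taxonomic rank at which the top
# BLASTn hit agrees with the morphological identification.
# Ranks are ordered from finest to coarsest, following the levels named
# in the text (species, genus, family, order).
RANKS = ["species", "genus", "family", "order"]


def match_level(morph, hit):
    """Return the finest rank shared by both lineages, or None.

    `morph` and `hit` are dicts mapping rank names to taxon names;
    a missing rank is simply skipped.
    """
    for rank in RANKS:
        m = morph.get(rank)
        h = hit.get(rank)
        if m is not None and m == h:
            return rank
    return None


# Illustrative lineages only (not taken from Table 1).
morph_id = {"species": "Cellana grata", "genus": "Cellana", "family": "Nacellidae"}
top_hit = {"species": "Cellana toreuma", "genus": "Cellana", "family": "Nacellidae"}

print(match_level(morph_id, top_hit))  # genus
```

Under this scheme, a sample counts as identified at the species level only when the function returns "species"; any coarser result indicates that the reference database lacks a conspecific sequence or that one of the two identifications needs review.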

Species delimitation analysis
Species delimitation analyses were conducted on the COI and 12S-rRNA sequence datasets using the Automatic Barcode Gap Discovery (ABGD) software [33] to see if the 12S-rRNA fragment used in our barcoding could differentiate species. The analyses used the aligned sequence data of the 12S-rRNA (length = 280 bp) and COI (length = 632 bp). The prior intraspecific divergence range (Pmin to Pmax) was set to 0.001-0.1, and the X value for the minimum relative gap width was set to 0.99 under the K2P model with TS/TV = 2.0.
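For readers unfamiliar with the K2P (Kimura two-parameter) model named above, the following minimal sketch computes the K2P distance between two aligned sequences. It is only an illustration of the distance formula, d = -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q), where P and Q are the proportions of transitional and transversional differences; it is not ABGD's own code, and ABGD additionally performs barcode-gap detection and recursive partitioning that are not shown here. The function name `k2p_distance` is an assumption.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Sites containing gaps or ambiguous bases in either sequence are skipped.
    math.log raises ValueError if the sequences are too divergent for the
    formula to be defined (saturation).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1  # A<->G or C<->T
        else:
            transversions += 1  # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    # Equivalent to -1/2 ln(1 - 2P - Q) - 1/4 ln(1 - 2Q)
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


# One transition (A<->G) in ten compared sites.
d = k2p_distance("AAAAAAAAAA", "GAAAAAAAAA")
print(round(d, 4))  # 0.1116
```

Pairwise distances of this kind are what a barcode-gap method compares against the prior intraspecific divergence range (here 0.001-0.1) when proposing species partitions.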

Sequence data acquisition
For the 12S-rRNA gene marker, we successfully amplified and obtained ca. 450 bp with the 12Sma / 12Smb primer pair (60 samples) and ca. 430 bp with the 12S97L / 12Smb primer pair (15 samples), and thus obtained 12S-rRNA sequences for all individuals used in this study. After sequence editing and alignment, the 12S-rRNA sequence length used for phylogenetic analysis was 280 bp. We also successfully amplified the COI gene marker for all samples in this study (ca. 650 bp for the LCO1490 / HCO2198 primer pair = 54 samples; ca. 1100 bp for the LCO1490 / H7005 primer pair = 6 samples; ca. 650 bp for the LCOmod_Kano2008 / HCOmod_Kano2008 primer pair = 15 samples). The sequence length of the COI marker after sequence editing and alignment was 632 bp.

BLASTn searches-based DNA barcoding
BLASTn searches of the 12S-rRNA gene matched 41 individuals of 39 species at the species level (Table 1), with sequence identities of 92%-100% and e-values of 0.00 to 6.00e-117. Meanwhile, 30 samples of 29 species were confirmed only at the genus level (22 genera). The remaining four individuals matched GenBank sequences only at higher taxonomic levels (i.e., subfamily, family, and order). For the COI gene, 47 species (50 individuals; 67%) were confirmed at the species level, with the other 25 individuals matched at the genus level or above. Detailed results of the BLASTn searches are presented in Table 1.
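The species-level success rates implied by these counts can be checked with a short tally. This is a sketch using only the counts reported above; the variable and function names are assumptions.

```python
# Counts of individuals by the finest level at which BLASTn matched,
# as reported in the text for each marker.
counts_12s = {"species": 41, "genus": 30, "higher": 4}
counts_coi = {"species": 50, "genus_or_above": 25}


def species_rate(counts):
    """Percentage of individuals matched at the species level, rounded."""
    total = sum(counts.values())
    return round(100 * counts["species"] / total)


print(sum(counts_12s.values()), species_rate(counts_12s))  # 75 55
print(sum(counts_coi.values()), species_rate(counts_coi))  # 75 67
```

Both tallies sum to the 75 individuals analyzed, giving species-level rates of about 55% for 12S-rRNA and 67% for COI.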

Obtained phylogeny and the taxonomic placement of the samples
The phylogenetic tree obtained from the maximum likelihood inference performed on the 12S-rRNA sequences is shown in Figure 1. Most samples were placed with their morphologically identified species and genera with reasonable support values (e.g., Cellana, BS = 72%; Chlorostoma, BS = 70%; Nerita, BS = 53%). However, Nassarius and Patelloida were not monophyletic, although both genera were placed in their proper higher taxa (Nassarius: Buccinoidea; Patelloida: Lottiidae). At higher taxonomic levels (subfamily and above), most other genera and species were also placed in their valid taxa, despite the lack of strong bootstrap support on many clades and the improper placement of some samples (Figure 1). For example, the muricoideans Nucella lima and Ocenebra inornatus endermonis were placed in Buccinoidea, while Notocochlis gualteriana (Naticoidea) was placed in Muricoidea; the cypraeoid Erronea errones was placed in Buccinoidea; some members of Littorinimorpha (Erronea errones, Notocochlis gualteriana, Canarium mutabile, and Conomurex luhuanus) were instead placed inside Neogastropoda; and Cassidula mustelina (Ellobiida) was also included in Neogastropoda.

Species delimitations of target samples
Species delimitation conducted in ABGD on the 12S-rRNA gene dataset identified 61 species, while the same analysis on the COI dataset identified 65 species (Table 3). Meanwhile, our samples (75 OTUs) were morphologically identified as 69 species. As shown in Table 3, the ABGD results for both genes generally agree with the morphological identification. However, some closely related congeneric species were not properly delimited and were identified as the same species by one or both markers. For example, Batillaria multiformis and Batillaria zonalis were identified as one species by the 12S-rRNA, while Cellana grata and Cellana toreuma were identified as one species by both gene markers.

Sequence data acquisition success rate on preserved museum samples
In this study, we tested the 12S-rRNA primers on various museum samples from across Gastropoda. The usefulness of the COI primers has been shown in multiple previous studies [13][14], including our own [16][17]. In general, our results indicate that the 12S-rRNA primers tested in this study are useful and can amplify 12S-rRNA sequence fragments for DNA barcoding to complement results obtained using other markers such as COI. Moreover, we successfully obtained DNA sequences of both gene markers for all samples using standard PCR protocols, even though some of the samples were old museum samples that had been stored under conditions not ideal for molecular work (e.g., room temperature storage). Therefore, our result also suggests that these primers could probably be used for museum studies, such as the barcoding of museum samples and studies that aim to obtain molecular data from old museum samples [34].

The non-exhaustiveness of GenBank for the identification of gastropods
We performed BLASTn searches of the COI and 12S-rRNA sequences obtained from the samples to see whether sequences of our taxa are present on GenBank, and to check whether our morphological identification was consistent with the sequence data on GenBank. The result of our BLASTn searches suggested that only 55% of our samples were correctly identified at the species level using the 12S-rRNA, and only 67% even when using COI, which is the most commonly used DNA barcoding marker [35]. This result thus underlines the problem of taxonomic bias in biodiversity observation, which causes the incompleteness and/or non-exhaustiveness of the database [9]. This could be problematic for studies that depend on identification based on DNA sequences only, such as eDNA and metagenomics [6].

The phylogeny is relatively well-resolved for a single marker
We also conducted a phylogenetic analysis of the 12S-rRNA sequences to see if the gene could adequately place the samples in their proper taxa. We found that while most samples were properly grouped with their conspecifics or congeners, some samples were not (Figure 1). At present, we are unable to pinpoint the cause of these misplacements, which might include sequence errors, homoplasies and long-branch attraction, and a possible lack of diagnostic substitutions in the 12S-rRNA fragment used in our phylogenetic analysis. Meanwhile, classification at higher taxonomic levels is, in general, congruent with the recently proposed gastropod systematics [36][37]. This result is also generally in agreement with the result of our preliminary study [38]. However, detailed interrelationships did not agree entirely, and statistical support was low at most nodes of higher taxonomy. This is expected, since the interrelationships among higher taxa (above genus) cannot usually be resolved using only single-gene data [16][17][38][39].

The 12S-rRNA marker was able to delimit most species in this study
We also conducted species delimitation analyses on the COI and 12S-rRNA to confirm whether both markers could correctly identify/delimit the species of the samples, as identified morphologically. There were 69 morphospecies (out of 75 individuals) in our samples, which were identified by professional taxonomists/curators who are also co-authors of this study. Interestingly, however, both markers were unable to completely delimit all morphospecies (COI = 65 species, but 12S = 71 species), apparently having difficulties differentiating closely related congeneric species (Table 3). The differences in species delimitation could probably be attributed to differences in the substitution rates of each taxon, which might be related to their different biology. This was also suggested by the species delimitation results of both genes before the removal of ambiguously aligned regions by GBlocks. These results indicated that the removal of ambiguously aligned regions might affect the detection of sequence diversity because potentially informative regions are removed [40]. However, all in all, our result also indicated that both markers could delimit the samples at least at the genus level.

General conclusion and future directions
Our present results of species delimitation analysis, phylogenetic analysis, and BLASTn searches suggest that the short fragment of the 12S-rRNA gene used in this study is useful and effective enough to delimit various gastropod species, and is thus useful for DNA barcoding and metabarcoding (eDNA studies) of shelled marine gastropods. The marker could thus be used to complement other DNA barcoding markers such as COI and 18S-rRNA. DNA barcoding using multiple markers would allow researchers to capture a more complete snapshot of biodiversity and avoid the numerous possible pitfalls caused by using only a single marker [16][17][41][42][43][44].
Moreover, because our study used appropriately curated museum samples, our sequence data would become an essential addition to the reference database for future studies. Therefore, in the future, we will register our sequence data in an adequately curated database such as GenBank or DDBJ. We will also continue our study by testing more prospective markers on more properly curated museum samples of shelled marine gastropods from Japan to provide a comprehensive reference sequence database for further studies involving DNA barcoding, metabarcoding, and eDNA.