Molecular barcoding of marine ornamental fish from the southern coast of West Java validates conventional identification

Conventional identification of marine ornamental fish is difficult because closely related species have similar color patterns and because juveniles can differ in color pattern from adults. Molecular barcoding using the cytochrome c oxidase I (COI) gene provides a reliable tool for resolving such difficulties. This study aimed to barcode marine ornamental fish from the southern coast of West Java. A fragment of the COI gene was sequenced from 54 morphotypes. We determined the taxonomic status of the samples based on a 5% genetic divergence threshold, using sequence percent identity, genetic distance, and branch length within monophyletic clades of a phylogenetic tree as parameters. Most samples had high sequence identities, low genetic distances, and short branches in monophyletic clades, whereas the remaining samples did not. These data indicate that most samples could be identified to species level without doubt and that the results support conventional identification. Barcoding success also depends on the availability of conspecific sequences in the databases. This study concluded that molecular barcoding can strengthen and validate traditional identification.


Introduction
Indonesian coral reefs support consumptive and non-consumptive fish species. Ornamental fish are a non-consumptive group used for recreation. This group is in high demand because of its beautiful colors and color patterns, in both juveniles and adults. Ornamental fish have a broad market, from national to international trade [1].
Trade in these wildlife commodities in Indonesia, both local and international, began in the 1990s. Many publications have reviewed marine ornamental fish from Indonesia, but they focused mainly on trade values, with data collected from prominent exporters [1][2][3]. Studies of the species diversity of marine ornamental fish at the particular sites where the commodities are collected are relatively rare, especially for the southern coast of West Java. Data on marine ornamental fish production on the southern coast of West Java are also unavailable.
Two recent studies reported marine ornamental fish from the southern coast of West Java, collected from Pangandaran [4] and from Pelabuhan Ratu, Ujung Genteng, and Taman Manalusu [5]. Both studies showed that a high diversity of marine ornamental fish species is involved in the aquarium trade on the southern coast of West Java.
In ornamental fish, species identification that relies mainly on morphological characters such as color pattern faces difficulties and might lead to misidentification. On the one hand, closely related fish species might show only subtle morphological differences [6], and different species might show similar colors and patterns during the juvenile stage [7]. On the other hand, different life stages of a single species can show different colors and color patterns, as in Pomacanthus semicirculatus [8].
In addition to morphological characters, this study used molecular characters for species identification of marine ornamental fish from the southern coast of West Java, with the cytochrome c oxidase I (COI) gene as the barcode marker. The COI gene has been a reliable marker for species-level identification [9,10]. There are exceptions: in some fish groups, COI barcodes could not differentiate closely related species [11]. Moreover, studies have demonstrated that COI barcoding can reveal cryptic species, which are also abundant [12,13]. Other studies have shown that COI barcoding strengthens and validates morphological identification [6].
Researchers have used various sequence homology values for species delimitation. A minimum sequence homology of 97%, equivalent to 3% sequence divergence, is used for species delimitation in Boldsystems [14]. A similar value was used in previous studies by Ward et al. (2009) and Amatya (2019) [15,16]. Other researchers used a minimum of 98% sequence homology as the species threshold. However, homology below 95% was observed when the reference species came from different localities [17], and other studies used 99% homology for species determination [18]. At the same time, many studies reported that intraspecific genetic distances in fish vary widely among species, ranging from 0.0 to more than 0.05 [19,20,21,22,23]. Higher genetic distance among species was reported when the geographic localities of the samples were considered [24]. Another study reported an overlap in genetic distance between intraspecific and interspecific comparisons [25].
This study aimed to identify marine ornamental fish collected on the southern coast of West Java based on cytochrome c oxidase I (COI) gene barcoding, in order to validate morphological identification.

Materials and methods

Sampling sites and times
A total of 367 ornamental fish samples were bought from first collectors in Pelabuhan Ratu and Ujung Genteng (Sukabumi Regency), Taman Manalusu (Garut Regency), and Bojongsalawe Village, Parigi District (Pangandaran Regency) (Figure 1). The samples were collected during field trips in 2018 and 2019.

Marker amplification and sequencing
Molecular barcoding was carried out on 54 morphotypes whose morphological identification was questionable because of overlapping characters between closely related species. Genomic DNA was isolated from caudal fin clips using the Chelex®100 method [26] with slight modification [27]. The selected marker was amplified using primers FishF2 and FishR2 [28]. The reaction mixture was as follows: 5 µl of 10X PCR buffer, 5 µl of MgCl2 (50 mM), 2 µl (0.01 mM) of each primer, 2 µl of dNTPs (0.05 mM), 1 U of Taq polymerase, and 4 µl of template DNA. RNase- and DNase-free water was added to a final volume of 50 µl. The thermal cycling profile was as follows: pre-denaturation at 95°C for 5 minutes, followed by 35 cycles of denaturation at 94°C for 1 minute, annealing at 53°C to 55°C (depending on the suspected species), and extension at 72°C for 1.5 minutes, with a final extension at 72°C for 5 minutes.
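The volume bookkeeping for this reaction can be sketched in a few lines of Python. This is an illustration only: the reagent volumes are those listed above, but the text gives Taq polymerase as 1 U rather than as a volume, so the Taq volume passed to `water_volume` is an assumption, as are the function names.

```python
# Reagent volumes (µl) per 50 µl reaction, as listed in the text.
REAGENTS_UL = {
    "10X PCR buffer": 5.0,
    "MgCl2 (50 mM)": 5.0,
    "primer FishF2": 2.0,
    "primer FishR2": 2.0,
    "dNTPs": 2.0,
    "template DNA": 4.0,
}
FINAL_VOLUME_UL = 50.0


def water_volume(taq_volume_ul: float) -> float:
    """Return the volume of nuclease-free water needed to reach 50 µl.

    taq_volume_ul is an ASSUMPTION: the text gives Taq as 1 U, not a volume.
    """
    used = sum(REAGENTS_UL.values()) + taq_volume_ul
    if used > FINAL_VOLUME_UL:
        raise ValueError("reagents exceed the final reaction volume")
    return FINAL_VOLUME_UL - used


def final_concentration(stock: float, volume_ul: float) -> float:
    """Dilution rule C1*V1 = C2*V2; returns C2 in the same unit as stock."""
    return stock * volume_ul / FINAL_VOLUME_UL


if __name__ == "__main__":
    # Hypothetical Taq stock of 5 U/µl, so that 1 U corresponds to 0.2 µl.
    print(water_volume(0.2))
    # Final MgCl2 concentration (mM) from 5 µl of a 50 mM stock.
    print(final_concentration(50.0, 5.0))
```

With a hypothetical 0.2 µl of Taq, the listed reagents occupy 20.2 µl, leaving about 29.8 µl of water per reaction.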
Half of the fish samples were processed as follows to obtain sequence data. Genomic DNA was isolated using the ZR Tissue and Insect DNA Miniprep Kit (Zymo Research, D6016) following the manufacturer's protocol. PCR amplification of the selected COI marker was performed using MyTaq HS Red Mix (Bioline, BIO-25047), and the COI gene was sequenced bidirectionally. All DNA analyses were conducted at Genetika Laboratory (PT. Genetika Science Indonesia).

Sequences editing and species determination
All sequences were manually edited and trimmed using the BioEdit 7.0 software package [29]. Multiple sequence alignment was conducted using ClustalW as implemented in BioEdit 7.0 [29] and checked manually. To confirm that the amplified marker was a genuine COI sequence, each sequence was translated to amino acids using the online version of ORF Finder (https://www.ncbi.nlm.nih.gov/orffinder/). The translation results were rechecked with BLAST, using the formatting options with the CDS feature displayed. This step ensured that no stop codon occurred in the middle of the COI sequences.
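The stop-codon check described above can be illustrated with a minimal Python sketch. It is not the ORF Finder procedure used in this study; it assumes the vertebrate mitochondrial genetic code (NCBI translation table 2), in which TAA, TAG, AGA, and AGG are stop codons and TGA encodes tryptophan. The function names are hypothetical.

```python
# Stop codons in the vertebrate mitochondrial genetic code (NCBI table 2).
MITO_STOPS = {"TAA", "TAG", "AGA", "AGG"}


def internal_stops(seq: str, frame: int) -> list[int]:
    """Return 0-based nucleotide positions of stop codons in a reading frame.

    Only complete codons are checked; a trailing partial codon is ignored.
    """
    seq = seq.upper().replace("-", "")
    positions = []
    for i in range(frame, len(seq) - 2, 3):
        if seq[i:i + 3] in MITO_STOPS:
            positions.append(i)
    return positions


def open_frames(seq: str) -> list[int]:
    """Return the forward frames (0, 1, 2) that contain no stop codon."""
    return [f for f in range(3) if not internal_stops(seq, f)]


if __name__ == "__main__":
    toy = "ATGAGACCC"  # toy sequence, not real data: AGA is a stop in frame 0
    print(internal_stops(toy, 0))
    print(open_frames(toy))
```

All three forward frames are checked because a barcode fragment does not necessarily begin at the first position of a codon. A genuine COI fragment should leave at least one frame free of stop codons.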
We determined the taxonomic status of each morphotype based on a sequence identity (similarity) threshold of 95%. This value was chosen because sequences can diverge within a species [6] and because the current samples and the reference species [17] available in the barcode libraries (GenBank and Boldsystems) may come from different geographic localities. A Kimura 2-parameter (K2P) genetic distance of 0.05 was used as an additional criterion for species-level identification. Further support was obtained from a phylogenetic tree reconstructed from 527 base pair (bp) sequences using the neighbor-joining method with the Kimura 2-parameter substitution model. Branch support was estimated from 1000 bootstrap pseudoreplicates. Genetic distance calculation and tree reconstruction were performed in MEGA X [30]. Samples joined by short branches (maximum 0.05 on the scale) within a monophyletic clade were regarded as a single species. The barcoding results were compared with previous studies that identified marine ornamental fish from the same sites based on morphology [4,5], in order to check the validity of morphological identification.
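For readers unfamiliar with the Kimura 2-parameter model, the distance between two aligned sequences is d = -1/2 ln[(1 - 2P - Q) sqrt(1 - 2Q)], where P and Q are the proportions of sites showing transitions and transversions, respectively. The sketch below is a minimal Python illustration of that formula, not the MEGA X implementation used in this study. Skipping sites with gaps or ambiguous bases is an assumption of the sketch.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned sequences.

    Sites with gaps or ambiguous bases in either sequence are skipped
    (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; everything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    d = -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))
    return d + 0.0  # turns -0.0 into 0.0 for identical sequences


if __name__ == "__main__":
    # Toy sequences, not real data: one transition in ten sites.
    print(round(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"), 4))
```

For very divergent sequences the logarithm's argument can become non-positive and the function raises an error. That case does not arise at the distances near 0.05 considered here.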

Taxonomic status
A sequence identity test using the Basic Local Alignment Search Tool (BLAST) against the reference species available in GenBank resulted in identity values ranging from 94.65% to 100%. Sequence similarities of the samples to conspecific references were also rechecked in Boldsystems. The lowest identity value, 94.65%, was obtained for the sequence of WJM5. Detailed data on sequence identity values and genetic distances between the samples and their reference species are presented in Table 1. Pairwise Kimura 2-parameter (K2P) comparisons indicated that the samples had genetic distances between 0.000 and 0.055 (Table 1). The highest genetic distance, 0.055, was found between morphotype WJM5 and its reference species Lutjanus decussatus. The neighbor-joining (NJ) phylogenetic tree reconstructed under the K2P model indicated that most samples formed a monophyletic clade with their reference species. Most samples were joined to their reference species by short branches, except WJM5 and L. decussatus. Almost all clades had branch lengths below 0.05 on the scale; only the clade of WJM5 and L. decussatus had branch lengths above 0.05. The phylogenetic tree is presented in Figure 2.

Molecular barcoding versus morphological identification
Comparison with previous studies [4,5] showed that 51 out of 54 morphotypes (94.44%) received the same taxonomic status from molecular barcoding and from conventional identification based on morphological characters. The remaining three morphotypes (5.56%) differed between molecular and conventional identification. Complete data on the comparison between molecular barcoding and conventional identification are presented in Table 2.

Taxonomic status
Fifty-three morphotypes had identity values above 95% to their conspecific references, with genetic distances below 0.05 (Table 1). Those morphotypes also formed monophyletic clades with branch lengths of less than 0.05 to their conspecific references (Figure 2). These three lines of evidence (sequence identity, genetic distance, and branch length within a monophyletic clade) indicated that those 53 morphotypes could be assigned to species level. Species-level assignment followed the barcoding threshold used in this study, 5% genetic divergence, which corresponds to 95% genetic similarity between query samples and conspecific references. Several studies reported that 95% could be used for species-level barcoding [17,22,23]. The use of 3% to 5% genetic divergence must be supported by other data [31], including geographic localities [32]. This study used the geographic localities of the samples and reference species as additional considerations for species determination. An interesting finding was that two morphotypes had high sequence identities to two different reference species. PGN015 had 100% sequence identity to Pseudobalistes flavimarginatus and 99.84% to Balistoides viridescens. In contrast, PGN715 had 100% identity to B. viridescens and 99.84% to P. flavimarginatus. In such cases, this study used the highest homology and the lowest genetic distance, even though the sequences formed a monophyletic clade with 0.00 branch length in the phylogenetic tree. Therefore, PGN015 and PGN715 were taxonomically referred to as P. flavimarginatus and B. viridescens, respectively. This situation was not surprising because a previous study reported a similar condition in other fish groups [11]. That study found that a Mystus vittatus sample had high homology to both M. vittatus and M. horai in barcode databases (99% to each reference species). Similarly high homology was reported for Bagarius bagarius samples, which matched both B. bagarius and B. yarrelli in the databases at 100% [11].
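The rule applied to PGN015 and PGN715 (take the reference with the highest identity) can be written as a short Python sketch. The identity values are those reported above. The function name is hypothetical, and the sketch covers only the identity criterion, not the genetic-distance criterion that the study used alongside it.

```python
def best_reference(hits: dict[str, float]) -> str:
    """Return the reference species with the highest percent identity.

    Raises ValueError on an exact tie, because the rule cannot choose then.
    """
    if not hits:
        raise ValueError("no hits")
    top = max(hits.values())
    winners = [name for name, identity in hits.items() if identity == top]
    if len(winners) > 1:
        raise ValueError(f"tied top hits: {winners}")
    return winners[0]


# Percent identities reported in the text for the two morphotypes.
pgn015 = {"Pseudobalistes flavimarginatus": 100.0, "Balistoides viridescens": 99.84}
pgn715 = {"Balistoides viridescens": 100.0, "Pseudobalistes flavimarginatus": 99.84}

print(best_reference(pgn015))  # Pseudobalistes flavimarginatus
print(best_reference(pgn715))  # Balistoides viridescens
```

An exact tie, such as the 100% matches to two Bagarius species reported in [11], raises an error in this sketch, which mirrors the case where barcoding alone cannot separate the candidates.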
Morphotype WJM5 had a sequence identity of 94.65% to 13 sequences of L. decussatus in GenBank and to more than 50 sequences of L. decussatus in Boldsystems, a genetic distance of 0.055, and a branch length longer than 0.05. Its top hits by sequence identity were all L. decussatus. However, because the thresholds used were 95% sequence similarity and a genetic distance of 0.05, morphotype WJM5 was assigned only to genus level, as Lutjanus sp.
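The molecular decision rule can be summarised in a minimal Python sketch using the thresholds stated in the Methods. Treating values exactly at a threshold as passing is an assumption, and the branch-length criterion is left out for brevity. The class and function names are hypothetical.

```python
from dataclasses import dataclass

IDENTITY_THRESHOLD = 95.0   # percent identity, from the Methods
DISTANCE_THRESHOLD = 0.05   # K2P distance, from the Methods


@dataclass
class BarcodeResult:
    sample: str
    top_hit: str
    identity_pct: float
    k2p: float


def assignment_level(result: BarcodeResult) -> str:
    """Return 'species' when both thresholds are met, otherwise 'genus'.

    Values exactly at a threshold count as passing (an assumption).
    """
    if (result.identity_pct >= IDENTITY_THRESHOLD
            and result.k2p <= DISTANCE_THRESHOLD):
        return "species"
    return "genus"


# Values reported in the text for WJM5.
wjm5 = BarcodeResult("WJM5", "Lutjanus decussatus", 94.65, 0.055)
print(assignment_level(wjm5))  # genus
```

The sketch reproduces only the molecular criteria. It does not include morphology or sampling locality.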

Molecular barcoding versus morphological identification
Based on the data in Table 2, the results of molecular identification were highly congruent (94.44%) with conventional identification [4,5]. High congruence between molecular and morphological identification was reported in a previous study, with success rates between 90% and 99% [33]. Congruence between morphological and molecular identification was also reported in mosquitoes [34].
In the case of WJM5, although its genetic distance was 0.055 (higher than 0.05) and its sequence identity was 94.65% (lower than 95%), its nearest relative in the barcode libraries (GenBank, 13 individuals, and Boldsystems, more than 50 individuals) was L. decussatus. This result was congruent with conventional identification. It is reasonable that WJM5 showed higher genetic divergence from its nearest relative in the barcode libraries, because the sample and the references were collected from different geographic regions, or even different oceans. This study collected samples from the southern coast of West Java (East Indian Ocean), whereas the conspecific reference L. decussatus (KF009608) published in GenBank was collected from the Philippines (Pacific Ocean). Combining molecular and conventional identification, we finally referred WJM5 to L. decussatus. The argument is that samples of a single species collected from different localities can have relatively low genetic identity and high genetic distance to their conspecific references in barcode libraries [6,17,21,22,23,31,32].

Conclusions
This study highlighted that under certain circumstances, molecular barcoding could strengthen and validate conventional identification. The success of species-level barcoding depends on the availability of conspecific sequences in databases.