An integrated proteomic and transcriptomic analysis of perivitelline ﬂ uid proteins in a freshwater gastropod laying aerial eggs

Proteins of theegg perivitelline ﬂ uid (PVF)that surrounds theembryoare criticalfor embryonicdevelopment in many animals, but little is known about their identities. Using an integrated proteomic and transcriptomic approach, we identi ﬁ ed 64 proteins from the PVF of Pomacea maculata , a freshwater snail adopting aerial oviposition. Proteins were classi ﬁ ed into eight functional groups: major multifunctional perivitellin subunits, immune response, energy metabolism, protein degradation, oxidation-reduction, signaling and binding, transcription and translation, and others. Comparison of gene expression levels between tissues showed that 22 PVF genes were exclusively expressed in albumen gland, the female organ that secretes PVF. Base substitution analysis of PVF and housekeeping genes between P. maculata and its closely related species Pomacea canaliculata showed thatthe reproductive proteins had a higher meanevolutionary rate.Predicted3D structures ofselectedPVF proteins showed that some nonsynonymous substitutions are located at or near the binding regions that may affect protein function. The proteome and sequence divergence analysis revealed a substantial amount of maternal investment in embryonic nutrition and defense, and higher adaptive selective pressure on PVF protein-coding genes when compared with housekeeping genes, providing insight into the adaptations associated with the unusual reproductive strategy in these mollusks. Signi ﬁ cance: Therehasbeengreatinterestinstudyingreproduction-relatedproteinsassuchstudiesmaynotonly answer fundamental questionsaboutspeciation andevolution, but alsosolve practicalproblems ofanimalinfer-tility and pest outbreak. Our study has demonstrated the effectiveness of an integrated proteomic and transcriptomic approach in understanding the heavy maternal investment of proteins in the eggs of a non-modelsnail,and how the reproductive proteins may have evolved during the transition from layingunderwater eggs to aerial eggs.


Introduction
There has been great interest over the last several decades in studying reproduction-related proteins as such studies not only have the potential to answer fundamental questions about speciation and evolution [1,2], but also to solve practical problems of animal infertility and pest outbreak [3,4].In this regard, apple snails (Gastropoda: Ampullariidae) provide an excellent model for addressing numerous questions in evolutionary biology and good candidates to study these reproductive proteins due to a number of characteristics including high diversification of reproductive strategies [5].In fact, while all members of Ampullariidae live in freshwater, their diverse oviposition strategies include laying egg masses either underwater, at the water surface or above the waterline [6].This last strategy has seldom occurred in animals, and the egg proteins involved are the focus of the present study.In particular, species of the South American Pomacea deposit colorful egg masses above the waterline, which prevents the eggs from attack by aquatic predators, and is considered a critical innovation that has contributed to the diversification of Pomacea to become the most species-rich and derived genus of the family [6].Nevertheless, embryonic development in aerial condition is challenging as the eggs are exposed to stressful abiotic conditions such as desiccation, heat and UV radiation, as well as terrestrial predators [7][8][9].Therefore drastic structural and compositional changes in their eggs must have taken place to protect the embryos from abiotic stress and predation [10].
The Pomacea eggs contain a perivitelline fluid (PVF), which surrounds the embryo and provides energy and nutrients for embryonic development [10,11].Proteins, termed perivitellins, are an important component of organic matter in the PVF, ranging from 13% up to 18% of the total dry weight [11,12].Previous reports in the best studied Pomacea species, Pomacea canaliculata (Lamarck, 1822), characterized the two most abundant perivitellins (termed PcOvo and PcPV2) and found that they not only serve as a source of energy and nutrient, but also play several defensive functions against abiotic stressors and predators [9,[13][14][15][16][17][18].Moreover, a proteomic study identified 59 proteins in the PVF of P. canaliculata, showing that the PVF proteome is complex and highlighting the need of more studies on Pomacea reproductive proteins to better understand their adaptations [10].
Comparison between closely related species is an effective approach to study the evolution of reproductive proteins [19][20][21].In this regard, Pomacea maculata (Perry, 1810), a close relative of P. canaliculata [22], is a good candidate to address the PVF protein evolution.The two species have been widely introduced out of South America in tropical and subtropical North America [23] and Asia [24][25][26] becoming invasive pests of agricultural and natural wetlands.Their distribution ranges overlap in their native South American and invaded Asian wetlands [23,24] and they can hybridize naturally [27].Nevertheless, the two species have different distribution ranges associated with their differential cold tolerance [27].Therefore comparing the PVF proteins between the two species has the potential to reveal their contribution to adaptation and further range expansion.Nevertheless, little is known about the perivitellins of P. maculata.Recent studies have reported two major perivitellins of P. maculata, which are structurally and functionally similar to the corresponding PVF proteins of P. canaliculata [12,28].However, very little is known about the presence and roles of the less abundant PVF proteins of P. maculata [12].
We therefore profiled the PVF proteome of P. maculata by using a combined de novo transcriptome assembly and LC-MS/MS approach.We hypothesized that perivitellin genes would be actively expressed in albumen gland (AG) when compared with genes coding non-PVF proteins.We therefore compared the gene expression level of PVF proteins in AG against that of other tissues.In addition, we analyzed the substitution rates of protein-coding genes between P. maculata and P. canaliculata in order to gain insight into the mechanisms of their divergence.

Snail culture
Adults of Pomacea maculata (N30 mm shell length) were originally collected from the Paraná River near San Pedro, Argentina (33°39′ 35.97″ S, 59°41′52.86″W), cultured at Universidad Nacional de La Plata (Argentina) and later transported to Hong Kong Baptist University.The identity of P. maculata was confirmed by amplifying fragments of the elongation factor 1-alpha (EF1α) and cytochrome c oxidase subunit I (COI) genes [27] and comparing them with the sequences deposited in GenBank.Snails were cultured at 25 ± 1 °C in 250-L tanks filled with tap water.Aeration was provided by an air stone connected to an air pump, and the water was cleaned using an overflow filter.Tap water used in the snail culture was dechlorinated by aeration before use, and changed once a week.The snails were fed with lettuce, carrot and fish feed.Food leftovers and snail feces were removed daily.Under these conditions, the snails were able to grow, mate and deposit egg clutches on the tank wall.

Transcriptome sequencing, assembly and annotation
To obtain a database for protein identification following MS/MS analysis, and to detect genes specifically expressed in the AG, we sequenced the transcriptomes of AG, and other tissues (OT: foot, mantle and visceral mass) pooled in equal amount from four individuals (Fig. 1).Total RNA from AG and OT was extracted separately using the TRIzol reagent (Invitrogen, Carlsbad, USA) following the manufacturer's manual except that a high salt solution (mixed solution of 1.2 M sodium chloride and 0.8 M sodium citrate) and a lithium chloride solution (final concentration 2 M) were added before and after the isopropanol precipitation step to remove polysaccharides.RNA quality was checked using agarose gel electrophoresis and an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA).The RIN value for the albumen gland and other tissues sample was 8.1 and 9.1, respectively.Messenger RNA was selected, reversetranscribed into cDNA, and paired-end sequenced on an Illumina Hiseq 2000 to produce 100 base pair (bp) reads (Illumina, San Diego, CA, USA).Reads were assembled using Trinity (release 20130225) [29].The generated sequences were annotated by searching against protein databases (NCBI nr, KEGG, COG, and Swissprot) using BLASTx with an E-value threshold of 1 × e − 5 [30].Protein-coding regions were translated into amino acid sequences and used as a database for protein identification.

Egg mass collection and protein extraction
Egg masses laid within 12 h were carefully removed from the culture tank wall, rinsed several times with MilliQ water, and then air-dried in a fume hood.When the egg shells were hardened and became non-sticky, each egg was individually cracked gently using a fine sterile needle and the perivitelline fluid (PVF) was extracted using 10 μL pipette tips.The PVF was dissolved in 8 M urea, homogenized thoroughly and centrifuged at 12,000 ×g for 10 min at 4 °C.Then the supernatant was transferred into new tubes.The protein solution was purified using the methanol/chloroform method [31].Three biological samples were collected from different egg masses coming from different females, and the protein concentrations were measured with the RC-DC kit (Bio-Rad).

SDS-PAGE and LC-MS/MS analysis
The purified protein solution was mixed with the SDS-PAGE buffer (50% glycerol, 0.2 M Tris-HCl pH 6.8, 0.05% bromophenol blue, 10 mM dithiothreitol, and 10% SDS) at a ratio of 3:1 (v/v), and placed on a heating block at 105 °C for 5 min.Proteins in each sample were separated by SDS-PAGE and stained by Coomassie Brilliant Blue.Then 1% acetic acid was applied to destain the gel.Each sample was divided into 10 slices based on their band intensity and protein molecular weight (Fig. 1).
Gel slices were further destained with a solution of 50% methanol/ 50 mM NH 4 HCO 3 , washed with MilliQ, dried with 100% ACN, and re-hydrated in 100 mM NH 4 HCO 3 .Dithiothreitol (10 mM) and iodoacetamide (55 mM) were sequentially applied to reduce the disulfide bonds of proteins and alkylate the exposed sulfhydryl (−SH) groups, respectively.The gel was then digested with sequencing grade trypsin (Promega, Madison, WI) in 50 mM NH 4 HCO 3 .Peptide solutions were recovered from the gel, desalted using Sep-Pak C18 cartridges (Waters, Milford, MA), and dried in a vacuum concentrator (Eppendorf, Hamburg, Germany).
The dried fraction of each biological sample was reconstituted with 0.1% formic acid and analyzed twice with a LTQ-Orbitrap Elite coupled to an Easy-nLC (Thermo Fisher, Bremen, Germany) as described [10].Briefly, peptides were separated in a C18 capillary column (Michrom BioResources, CA) using a 90-min gradient.Mass spectrometry scans with a range of 350 to 1600 m/z were acquired with a resolution of 60,000 under the positive charge mode.The five most abundant multiple-charged ions having a minimum signal threshold of 500.0 were selected for high-energy collision-induced dissociation (HCD) and fragmentation using collision-induced dissociation (CID).Both the HCD and CID scanning strategies adopted an isolation width of 2.0 m/z.For HCD fragmentation, the activation time was 10 ms and the normalized collision energy was 45%.For CID fragmentation, the activation time was also 10 ms, but the normalized collision energy was 35%.

Database search
Raw LC-MS/MS data files were converted into .mgffiles using Proteome Discovery 1.3.0.339 (Thermo Finnigan, CA), and submitted to Mascot version 2.3.2 (Matrix Sciences, London, U.K.) to search against the P. maculata database with 77,584 protein sequences including both 'target' and 'decoy' sequences.Searching parameters were identical to those described in Mu et al. (2015) [32] except that only one maximum missed cleavage of trypsin was allowed, and fixed modification was set for cysteine carbamidomethylation. Matched peptides with an ion score ≥ 22 were retained to achieve a 95% confidence in identification.Only peptides longer than nine amino acids were retained because shorter peptides could easily match the decoy database.A 1% false discovery rate threshold was applied in the final protein identification in each biological replicate.Only proteins detected in at least two replicates and had at least three matched peptides were kept.

Transcriptomic expression analysis
To distinguish genes that were specifically expressed in AG and OT, we compared their expression levels quantified as transcripts per million (TPM) which takes both gene length and sequence depth into account [33].TPM was calculated by dividing the read counts by the length of each gene in kilobases to get a reads per kilobase (RPK) value, summing up all the RPK values in AG and OT and dividing this number by 1,000,000 to get a "per million" scaling factor, then dividing the RPK values by the scaling factor.Genes with an expression level (TPM) less than 0.5 were considered to be nonexpressed [29].
Reverse transcription-PCR (RT-PCR) reactions were performed to determine the tissue specific expression pattern of seven PVF proteincoding genes.RNA samples from albumen gland, digestive gland, foot, mantle, and testis were reverse transcribed into complementary DNA (cDNA) using the High-Capacity cDNA Reverse Transcription Kit (Applied Biosystems), then the cDNAs were used as templates in PCR experiments to show gene expression.Cytochrome c oxidase I (COI) was chosen as control as this gene had high expression levels in the different tissues, and has been considered as a house keeping gene [34].Primers were designed using NCBI's Primer Designing Tool and their sequences are shown in Supplementary Table S1.PCR reactions were performed in 20 μL solution containing 0.2 μL of TaKaRa Taq™ DNA Polymerase, 1 μL of primers (final working concentration 0.5 μM), 1 μL of DNA template, 2 μL of 10 × reaction buffer, 1.6 μL of dNTP (final working concentration 0.2 mM), and 1.2 μL of 25 mM of MgCl 2 and 13 μL of DNase-free water.PCR conditions were one cycle of 5 min at 94 °C, 25 cycles of 30 s at 94 °C, 40 s at 57 °C and 40 s at 72 °C, followed by a 10 min extension at 72 °C, and a final 30 min incubation at 4 °C.Due to the technical difficulty of having no antibodies for the non-model organism, we were not able to perform Western Blot to confirm the protein expression levels.

Evolutionary analysis
A reciprocal best hit method was applied to identify ortholog pairs between P. maculata and P. canaliculata transcriptomes [32,35].Only proteins with at least 67 amino acids were used in the orthologous analysis.Local BLASTp was performed with an E-value threshold of 1 × e −5 .Paired orthologs between the two species were aligned by MUSCLE implemented in ParaAT1.0 [36] and aligned codons with gaps were removed.KaKs_Calculator was applied to calculate the nonsynonymous substitutions per nonsynonymous site (Ka), synonymous substitutions per synonymous site (Ks), and Ka/Ks ratio using the GY method [37].Sequences with Ka N 1 or Ks N 1 were removed because they might be saturated for nonsynonymous or synonymous substitutions, respectively [38].Genes with a Ks b 0.01 were excluded in order to avoid pseudohigh Ka/Ks values due to the low Ks value [39].To determine whether the genes coding PVF proteins evolved faster than conserved genes, the Ka, Ks, and Ka/Ks values of PVF gene orthologs were compared with three groups of housekeeping genes: actin, tubulin and ribosomal proteins (Supplementary Table S2) [40] using the non-parametric Wilcoxon rank sum test [40,41].

Bioinformatic analysis
Protein abundance was quantified using the Exponentially Modified Protein Abundance Index (emPAI) in Mascot [42].The PVF proteins were classified into functional categories based on their annotation.SignalP 4.1 was applied to determine whether the PVF proteins were secreted [43].For PVF proteins with a Ka/Ks value N 0.5, their three-dimensional structures were predicted using Phyre2 which applied a profile-profile alignment algorithm [44].Models having a confidence level of N 90% in Phyre2 were validated using PROCHECK and those having N 90% residues in the most favored and additional allowed regions were kept.The potential ligand binding sites (or active sites) were then predicted by 3DLigandSite [45].Sequences were also searched against the Conserved Domain Database to detect the conserved protein regions/domains [46].

Results
Transcriptome sequencing produced a total of 52,732,156 and 54,961,478 clean reads for AG and OT, respectively.The mean GC content of AG and OT sequences was 44.94% and 45.05%, respectively.Assembly of the sequences using Trinity produced 92,567 and 130,305 unigenes in AG and OT, respectively.Combining the unigenes from the two organs resulted in 105,349 unigenes and the N50 is 1332 nt.Among them, 38,587 were successfully annotated by searching against several protein databases (i.e., NCBI nr, SwissProt, KEGG, GO and COG).These annotated unigenes were used for protein identification.
Since AG is considered the organ that synthesizes most of the PVF proteins in P. canaliculata [10], the expression of genes coding the PVF proteins in AG is expected to be higher than in OT.Indeed, comparison of AG and OT transcriptomes showed that genes coding 22 of the PVF proteins were only expressed in AG, and as the majority of PVF proteins were preferentially expressed in AG (Table 1).The five most highly expressed PVF genes in AG were PmPV1-4b, perivitellin ovorubin-2, PmPV1-4a, PmPV1-2, and perivitellin ovorubin-3, all with a TPM N 43000.Their expression levels in OT were all less than 0.3, below the threshold of 0.5 considered to be non-expressed in a previous study [29].Seven PVF protein-coding genes were exclusively expressed in AG (Fig. 2).Fig. 3 shows the correlation between the abundances of 64 PVF proteins and their corresponding gene expression levels in AG and OT.Protein abundance was not significantly correlated with gene expression level in OT; but was positively correlated with gene expression level in AG, consistent with the fact that AG is the organ where most of the PVF proteins are synthesized.
Base substitution analysis of the orthologs between the two species revealed 12,231 pairs of genes having Ka/Ks values.Among the 64 PVF gene coding proteins, 30 had Ka/Ks values (Supplementary Table S2).
Homology modeling of the 11 proteins having Ka/Ks N 0.5 using Phyre2 with a threshold of 90% confidence showed that, in 5 of them, the predicted 3D structure had a reasonable match to the corresponding template (Fig. 5, Supplementary Fig. S2).Analysis with 3DLigandSite revealed that some of the nonsynonymous substitution sites and potential ligand binding/active sites overlapped or were very closely located (e.g., Kunitz-like protease inhibitor, ferritin (partial), and transferrin-like protein), indicating the potential of nonsynonymous substitution to modify protein conformation and thus their function.For instance, in ferritin (partial) there were 115 amino acid residues matching its homologous template (PDB ID: 3VNX) with a relatively high sequence similarity (49%).The confidence that this snail ferritin and its template are homologous was 100%, indicating it was reasonable to predict the 3D structure of the PVF protein based on that of 3VNX.As shown in Fig. 5, the snail ferritin has four alpha helixes (76% of amino acids) and one short beta strand (3% of amino acids).The molecule has three predicted ligand binding sites (8E, 11H and 53E), four putative ferrihydrite nucleation centers (3K, 6D, 7E and 10E) and three putative iron ion channels (64H, 77N and 80E).Comparison with its ortholog of P. canaliculata revealed 11 nonsynonymous substitution sites, some overlapping either with the binding site or the nucleation centers of ion channels.The homology models of the other proteins are depicted in Fig. S2.

Functional composition of PVF proteins
The PVF of P. maculata eggs contain 64 proteins belonging to many functional groups, indicating a heavy maternal investment to support the embryonic development.The most abundant P. maculata PVF proteins are the subunits of the two major perivitellins, namely PmPV1 and PmPV2 [28].A previous study on the sister species P. canaliculata indicates that the orthologs of these perivitellins are an energy source and structural precursors of other biological molecules during embryonic development [11].PmPV1, representing ~52% of PVF protein, is a glyco-lipo-carotenoprotein responsible for the bright red coloration (presumably a warning signal) and embryo protection from oxidative damage [28].
Among the 13 P. maculata multifunctional perivitellin subunits detected, the most abundant group is the PmPV1 subunits, which is in agreement with a previous study of P. canaliculata, where subunits of PcOvo, the ortholog of PmPV1, are also the most abundant [10].Interestingly, the 5 subunits differed markedly in their relative abundance, pointing to the potential of different arrangements in the native protein structure in the two species.For example, the PmPV1-2 has an emPAI of 24.41%, but its corresponding ortholog in P. canaliculata is 48.61%.The second most abundant perivitellin group in P. maculata is PV2 which is antinutritive, antidigestive and neurotoxic in P. canaliculata, due to the presence of a light chain with lectin binding property and a heavy chain with perforin properties [9].Sequence alignment of the full amino acid sequences of the light and heavy chain of PV2 (i.e, tachylectin and MACPF subunits) [9,10] between P. maculata and P. canaliculata showed 97% and 96% similarities, respectively (Supplementary Fig. S3).The nonsynonymous substitution sites (9 and 20 amino acid sites in tachylectin and MACPF subunit, respectively) were not located in the phosphorylation, glycosylation sites or MACPF signature regions.This was consistent with a previous study on PcPV2 [9] showing that the MACPF signature in Pomacea was quite conserved, which might be related to their adaptation to the aerial egg deposition strategy as an egg defense protein.A comparison of PV3 between the two species showed that they differed markedly both in protein abundance and subunit compositions [12].Therefore, more work is needed to better understand the relationship between the differences in subunit abundance of this perivitellin fraction within and between the two species and the impact this may have in embryonic development.
Apart from the pigmented perivitellins that are responsible for the conspicuous reddish coloration, antidigestive and antinutritive defenses against predators [28], immune responsive proteins are a potentially important functional group in P. maculata eggs that protect the embryos from pathogens.Among them, the iron storage protein ferritin, the ninth most abundant PVF protein (~1.38%) in P. maculata, has been shown to actively participate in mollusk immunity [47,48].It is a major exogenous yolk protein in the freshwater snails Lymnaea stagnalis and Planorbarius corneus [47].Ferritin was overexpressed in P. canaliculata undergoing estivation, which might prevent iron binding to pathogens to reduce their growth [49].Kunitz-like protease inhibitor also plays an essential role in innate immunity by serving as an inhibitor of serine proteinase in the razor clam Solen grandis [50].Although its protein abundance in our study was relatively low (~0.33%), the transcript expression level in albumen gland (TPM = 1938.09)was much higher than that in other tissues (TPM = 0.30), indicating its high transcription rate in the AG of P. maculata.In fact P. maculata eggs display a moderate serine protease inhibition activity [12].Serpin 1 and serine protease inhibitor 1 may protect the egg masses against foreign serine protease released by bacteria and viruses.They may also function in melanization and immune response by activating phenoloxidase [51,52].Several other immune-related proteins (e.g., C1q domain-containing protein, peptidoglycan recognition protein S1L and thioester-containing protein-C) are also present in the PVF of P. maculata, and these proteins have been detected in the PVF of P. canaliculata [10] and mucosal secretions of Crassostrea virginica [53].The transcript abundances of C1q domain-containing protein and peptidoglycan recognition protein S1L in AG (TPM = 19.602.56 and 51.85, respectively) are much higher than those in OT (TPM = 0.06 and 0.00, respectively), indicating that these proteins may be specifically secreted for packaging into the egg.
A recent study on the newly laid eggs of P. maculata shows that the PVF contains 76.4% carbohydrates, 18.7% proteins, and 1.2% lipid in dry weight [12], which is in the same range observed for its sister species P. canaliculata [11].These molecules serve as an energy and carbon sources during the embryonic development, which requires a wide variety of enzymes.Our study detected several carbohydrate, protein, and lipid metabolizing enzymes in P. maculata PVF, which are readily available even before the embryo develops and starts to secret the other enzymes required to fully metabolize the PVF.The PVF enzymes include 4hydroxyphenylpyruvate dioxygenase (HPPD), serine and cysteine protease inhibitors and 15-hydroxyprostaglandin dehydrogenase (NAD +).HPPD, a Fe(II)-dependent, non-heme oxygenase, functions in tyrosine catabolic pathway which is common to almost all aerobic life forms [54].Cysteine protease inhibitor plays various roles in physiological and pathological processes such as inhibiting protein processing and degradation as well as MHC class II immune responses [55,56].15-hydroxyprostaglandin dehydrogenase (NAD+) has alcohol dehydrogenase activity and is involved in the metabolism of lipid signaling molecules.For example, some glycolysis-related enzymes (e.g.dihydrolipoamide S-acetyltransferase, pyruvate dehydrogenase β, enolase, phosphoglycerate kinase 1 and glyceraldehyde 3-phosphate dehydrogenase) have been reported to play a critical role in the embryonic development of the snails P. canaliculata and Lymnaea stagnalis [57,58].

Evolutionary characterization of PVF proteins
A self-BLAST of the P. maculata PVF protein coding genes with an Evalue threshold of 1 × e −5 revealed 29 pairs of homologs, indicating these proteins have undergone several duplication even`ts during their evolutionary history.However, similar to the PVF genes of P. canaliculata [10], the sequence similarities between the PmPVF paralogs   vary greatly from 27.1% to 99.7%, indicating some of these gene duplications must have occurred early in the diversification of Pomacea.A phylogenetic analysis of PmPV1 paralogs and sequence alignment between the four PmPV1 paralog subunits in P. maculata and their corresponding orthologs in P. canaliculata revealed several fully conserved amino acid sites and motifs such as the IXGGP motif.Since PVF proteins in ampullariids play vital roles and their differential composition is expected to be critical for the acquisition of various oviposition strategies [6,8], we hypothesize that at least some of these proteins must be under high pressure of adaptive selection.Base substitution analysis of PVF protein orthologs between the sister species P. maculata and P. canaliculata indeed showed that their sequence divergence varied substantially.Among them, five PVF genes had a Ka/Ks value N1.0, a clear indication that they have undergone positive selection.In addition, six genes had a Ka/Ks value between 0.5 and 1, which are candidate genes for further examination to evaluate whether certain sequence regions or amino acid sites have undergone positive selection [19].These 11 genes with relatively high Ka/Ks values encode not only protein subunits of the major multifunctional perivitellins, but also proteins involved in signaling and binding, immunity, protein degradation, and energy metabolism functions, indicating widespread positive selection in PVF protein coding genes in these freshwater snails adopting aerial oviposition.Comparing the Ka, Ks, and Ka/Ks values between PVF protein-coding genes with a set of housekeeping genes shows that the PVF genes have significantly higher mean Ka, Ks and Ka/Ks values than the housekeeping genes.This result indicates that the PVF genes are under weaker selective constraints than the housekeeping genes, further supporting previous studies showing that reproductive proteins evolve faster than housekeeping genes in various animals [19,59].Since the Ks values are quite similar between PVF and housekeeping genes, the higher evolutionary rate of PVF protein-coding genes is likely due to the higher rates of nonsynonymous substitutions.

Possible functional consequences of base substitution in some PVF proteins
Homology modeling shows that, in some of PVF proteins, the nonsynonymous substitutions are located at or near the potential binding regions, suggesting the mutation may modify protein conformation and thus protein function.For example, ferritin, a protein that maintains iron homeostasis and provides antioxidant and antimicrobial protection have several nonsynonymous substitution sites in this apple snail.Of interest are the 8E located near the ferrihydrite nucleation center (6D and 7E), and 65K and 66I located near an iron ion channel (64H).Mutations at these sites could change the conformation of the 3D structure, and thus weaken or strengthen the iron transport and storage functions [60,61].

Conclusions
The present study provides the first comprehensive proteomic analysis of P. maculata egg perivitelline fluid and the development of a valuable transcriptomic resource for a non-model organism which is an invasive species and serious crop pest.Albumen gland exclusively expresses many of the PVF proteins, highlighting the critical role of this accessory gland in apple snail reproduction.The maternal investment is not limited to nutrition and protection as previously reported for the major perivitellins.Evolutionary analysis between P. maculata and the sister species P. canaliculata proteins shows that around 8% of the PVF protein-coding genes have undergone positive selection.Some of these proteins have experienced nonsynonymous amino acid substitutions at or near the ligand binding sites, potentially affecting protein conformation and function.Overall, our study provides the proteomic and transcriptomic foundation for further functional and evolutionary analysis of reproductive proteins in a group of snails that have shifted from aquatic to aerial oviposition, a strategy that has seldom occurred in the animal kingdom.

Fig. 1 .
Fig. 1.Pomacea maculata.An individual (shell length = 63 mm) crawling near the water surface (A).A clutch of newly deposited eggs (length = 7.9 cm) (B).A representative SDS-PAGE of the egg perivitelline fluid proteins (right lane) and protein molecular weight markers (kDa, left lane) (C).Numbers 1 to 10 on the right indicate the gel slices used for LC-MS/MS analysis.Anatomy of P. maculata with and without shell showing the positions of several structures and organs (D).

Fig. 2 .
Fig. 2. Gene expression analysis of seven highly-abundant PVF protein-coding genes and a cytochrome c oxidase I (COI) gene.The capital letters on top of each lane represent different organs.A: albumen gland; D: digestive gland; F: foot; M: mantle; T: testis.

Fig. 4 .
Fig. 4. Distribution of Ka, Ks and Ka/Ks values in 30 PVF protein coding genes and 30 housekeeping genes.The solid line and dotted line represents Ka/Ks = 1 and Ka/Ks = 0.5, respectively.

Fig. 5 .
Fig.5.Predicted 3D structure (A) and amino acid sequence (B) of P. maculata ferritin (partial).The locations of potential ligand binding sites (yellow), nonsynonymous substitution sites (red) and overlapped ligand binding and nonsynonymous substitution sites (green) are highlighted.

Table 1
Proteins identified from the egg perivitelline fluid of Pomacea maculata.

Table 1
(continued) Normalized emPAI values.Data are expressed as mean ± standard deviation based on two or three biological replicates.b Transcriptomic expression levels of genes in albumen gland (AG) and other tissues (OT).Data are expressed in transcripts per million (TPM).Bold TPM values: Genes only expressed in AG (TPM b 0.5 in OT).
a c Indicates that the protein has a predicted signal peptide.