Full-length LTR retroelements in Capsicum annuum revealed a few species-specific family bursts with insertional preferences

Capsicum annuum is a species whose genome has expanded in size mainly through the amplification of repetitive DNA sequences, including mobile genetic elements. Based on the pepper genome sequence, the estimated fraction of retroelements is approximately 81%, and previous results revealed an important contribution of lineages derived from the Gypsy superfamily. However, the dynamics of retroelements in the C. annuum genome are poorly understood. The present work therefore investigates the phylogenetic diversity and genomic abundance of the families of autonomous (complete and intact) LTR retroelements of C. annuum and inspects their distribution along its chromosomes. In total, we identified 1151 structurally full-length retroelements (340 Copia; 811 Gypsy) grouped into 124 phylogenetic families on the basis of their retrotranscriptase. All the evolutionary lineages of LTR retroelements identified in plants were present in pepper; however, three of them, Athila, Del/Tekay, and Ale/Retrofit, comprise 83% of the entire LTR retroelement population. Within these lineages, only three families account for 70.8% of the identified retroelements. A massive family-specific wave of amplification of two of them occurred in the last 0.5 Mya (GypsyCa_16; CopiaCa_01), whereas the third is more ancient and occurred 3.0 Mya (GypsyCa_13). Fluorescent in situ hybridization performed with family- and lineage-specific probes revealed contrasting patterns of chromosomal affinity. Our results provide a database of the LTR retroelement populations specific to the C. annuum genome. The most abundant families were analyzed according to their chromosomal insertional preferences, supplying useful tools for the design of retroelement-based markers specific to the species.


Introduction
LTR retroelements are the most abundant transposable elements in plant genomes and constitute an important portion of the dispersed repetitive DNA (Bennetzen 2000; Schnable et al. 2009; Jiang and Ramachandran 2013). These elements are DNA sequences of between 4000 and 30,000 bp that, in their complete and intact state, carry all the machinery necessary for their replication and amplification in the host genome (Grandbastien 1998; Hirochika et al. 1996). This machinery is organized in two structurally different regions: (i) the coding module, typically made up of one or two open reading frames: Gag, which encodes a structural protein involved in the maturation and packaging of mRNA, and Pol, a polycistronic mRNA that encodes a polyprotein containing a protease (PR), a retrotranscriptase (RT), an integrase (INT), and an RNase H (RH) (Kumar and Bennetzen 1999; Havecker et al. 2004); and (ii) the regulatory module, made up of two long terminal repeats (LTRs) of between 100 bp and 5 kbp that flank the coding portion and carry promoters, regulatory elements, and terminators (Benachenhou et al. 2013; Cavrak et al. 2014; Galindo-González et al. 2017; Neumann et al. 2019). At the time of insertion of an individual retrotransposon, both LTRs are identical; however, as time passes, they accumulate independent mutations and diverge, although they maintain a high degree of similarity (SanMiguel et al. 1998; Bowen and McDonald 2001).
Using a taxonomic classification based on a phylogenetic criterion, plant retroelements have been grouped into two major superfamilies, Copia and Gypsy. Although both groups are ancestral and share common characteristics such as their life cycle, genome organization, and protein functions, their origin is polyphyletic (Kumar and Bennetzen 1999). Thus, significant differences are found in their protein organization and amino acid sequences. Moreover, each superfamily has diversified into different evolutionary lineages. In Angiosperms, six canonical lineages in Gypsy (Athila, Tat, Galadriel, Reina, CRM/CR, and Del/Tekay) and seven canonical lineages in Copia (Tar/Tork, Angela/Tork, GMR/Tork, Maximus/Sire, Ivana/Oryco, Ale/Retrofit, and Bianca) are widely accepted by the international community working in the field (Wicker and Keller 2007; Du et al. 2010; Llorens et al. 2011; Domingues et al. 2012). Following the same criterion but on a finer scale, lineages consist of copies of retroelements that share different degrees of sequence similarity. Copies of retroelements with high sequence similarity are considered to belong to the same family, which is the lowest accepted taxonomic level for retroelements (Wicker and Keller 2007). Whereas the canonical lineages derived from Gypsy and Copia LTR retroelements are ancestral and mirror eukaryotic macroevolution (Llorens et al. 2009), the radiation of retroelement families is involved in microevolutionary processes such as adaptation and speciation (Bourgeois and Boissinot 2019).
At a functional level, LTR retroelements can be classified into two main groups based on their capacity to complete their life cycle: autonomous and non-autonomous. The first group corresponds to fully functional elements and is characterized by carrying all the essential components for self-retrotransposition (Schulman and Kalendar 2005). Thus, individual autonomous copies may be, to varying degrees, transcriptionally or translationally competent (translation leading to a functional protein) or active. In contrast, non-autonomous retroelements lack some of the coding domains for the main proteins required for retrotransposon replication (Sabot and Schulman 2006). This group can be generated by deletion or mutation from autonomous retroelements, and its members can gain mobility by parasitizing autonomous members of the same or related families (Sabot and Schulman 2006). Although some non-autonomous retroelements show successful radiation with a fairly uniform structure in plant genomes (Myers 2001; Jiang et al. 2002a), the vast majority correspond to older or fossil elements that have experienced severe deletions or fragmentation by unequal homologous recombination and illegitimate recombination (Devos et al. 2002; Ma et al. 2004; Du et al. 2010). Hence, they cannot be dated or delimited precisely at lineage and family level using the traditional procedure, and the identification of the autonomous partner families that gave rise to them is very difficult (Jiang et al. 2002b; Kalendar et al. 2004; Kejnovsky et al. 2006). Thus, from a functional and taxonomic perspective, the analysis of autonomous retroelement populations in genomes is a robust tool for understanding the dynamics of recently amplified families that are potentially still active (Du et al. 2010; Marcon et al. 2015; Paz et al. 2017).
Despite the great diversity of retroelement families present in plant genomes, most of the copies are in a quiescent state, highly regulated by genetic and epigenetic mechanisms of the host genome (Vicient 2010; Beulé et al. 2015). This state can be altered by different types of stress, which promote the activation and expression of certain copies (Hirochika and Hirochika 1993; Grandbastien 1998; Paz et al. 2015). During this process, a single activated copy of a retroelement can produce a large number of identical copies of itself, which can be inserted at new positions within the host genome (Biémont and Vieira 2006; Feschotte and Pritham 2007; Wicker and Keller 2007; Zhao and Ma 2013). Therefore, depending on the window of deregulation in the host genome, waves of family-specific amplification can occur. When the genome regains control and activates the silencing mechanisms, the newly inserted copies occupy a permanent place in the genome, with different genetic consequences. If this phenomenon occurs in the reproductive tissues, the genetic modification becomes heritable (Maupetit-Mehouas and Vaury 2020).
Several tools have been developed to study the dynamics of retroelements in genomes. On the one hand, the divergence between the two LTR sequences of a retroelement has proven to be an excellent molecular clock for estimating the insertion time of each copy and for identifying those that have been inserted recently in a given genome (Sharma et al. 2008; Kijima and Innan 2009; Paz et al. 2017). On the other hand, there is evidence that retroelements integrate into genomic regions in a non-random manner (Gao et al. 2008; Baucom et al. 2009; Nellåker et al. 2012). From that perspective, certain retroelements tend to be inserted in regions that are not silenced and have less competition, or in regions enriched with other retroelements (SanMiguel et al. 1996; Gao et al. 2008; Naito et al. 2009). In this way, the FISH technique allows us to identify affinities of retroelement lineages for particular chromosome regions.
In this study, we analyzed the dynamics of autonomous retroelements in the Capsicum annuum genome. This species is diploid (2n = 24) but has a relatively large genome (3.26 Gb) compared with other solanaceous species of the same ploidy level (Bennetzen 2002; Park et al. 2011; Qin et al. 2014). This "genomic obesity" is due mostly to the accumulation of repetitive sequences, especially mobile elements (Bennetzen 2002), so pepper is an ideal model for studying the expansion and distribution of LTR retroelements. This research therefore aims to identify recently radiated LTR retroelement lineages in the C. annuum genome and to determine their insertional preferences along the chromosomes. Moreover, a database of autonomous and potentially active families of LTR retroelements specific to the C. annuum genome was obtained. The information provided here is of interest for the study of intraspecific genetic variability in pepper, since most of the retroelement-based markers used in the species are heterologous.

Data mining
The nuclear genome sequence of Capsicum annuum cv. Zunla (Ref_v1.0) was obtained from the GenBank database (accession no. ASJU00000000). De novo LTR retroelements were identified with LTR-Finder software (http://tlife.fudan.edu.cn/ltr_finder/; Xu and Wang 2007) with the following parameters: (i) minimal distance between LTRs: 3500 bp; (ii) ps_scan algorithm to detect the protein domains of RT, IN, and RH, if present; (iii) prediction of the conserved PBS (primer binding sequence) domain, using the "Arabidopsis thaliana (2004)" database as reference; (iv) presence of conserved sequences, such as the conserved TG-CA endings; and (v) presence of at least two of the following features: TSR (target site repeat), PBS, and PPT (polypurine tract).
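The structural criteria listed above can be restated as a simple predicate. The following is only a minimal sketch of the selection logic, not LTR-Finder's implementation: the `Candidate` record and its field names are hypothetical, and the protein-domain criterion (ii) is left out.

```python
from dataclasses import dataclass

# Minimal distance between LTRs stated in the text.
MIN_LTR_DISTANCE_BP = 3500


@dataclass
class Candidate:
    """Hypothetical record for one predicted element (not an LTR-Finder type)."""
    ltr_distance_bp: int  # distance between the two LTRs
    ltr_sequence: str     # sequence of one LTR, 5'->3'
    has_tsr: bool         # TSR detected
    has_pbs: bool         # primer binding sequence detected
    has_ppt: bool         # polypurine tract detected


def has_tg_ca_ends(ltr: str) -> bool:
    """True if the LTR starts with TG and ends with CA."""
    seq = ltr.upper()
    return seq.startswith("TG") and seq.endswith("CA")


def passes_structural_filters(c: Candidate) -> bool:
    """Apply criteria (i), (iv) and (v) from the text to one candidate."""
    n_features = sum([c.has_tsr, c.has_pbs, c.has_ppt])
    return (
        c.ltr_distance_bp >= MIN_LTR_DISTANCE_BP
        and has_tg_ca_ends(c.ltr_sequence)
        and n_features >= 2
    )
```

For example, a candidate with LTRs 5000 bp apart, TG-CA endings, and both TSR and PBS detected would pass, whereas the same candidate with only a PBS would not.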
The sequence between the two putative LTRs (internal region) was subsequently analyzed against the Conserved Domains database at NCBI as described by Paz et al. (2017). Structurally full-length elements were defined as those containing both LTRs and an internal portion encoding all the typical proteins of the Gypsy and Copia superfamilies (Fig. S1A). Full-length elements were annotated, and the amino acid sequences of the RT used for phylogenetic analyses were extracted from the list of domain hits provided in the output of the Conserved Domains database, as described in Fig. S1A. Truncated elements and fragments were not considered in this study (Fig. S1B). Retroelement families were defined by evolutionary relationships based on a phylogenetic tree of the RT.

Phylogenetic analyses
The evolutionary relationships of all the annotated copies of LTR retroelements were analyzed. Reference sequences from previously characterized LTR retroelements from different plant hosts, including several Solanaceae spp., were included (Table S1). Protein sequences were aligned in Seaview using Muscle (Gouy et al. 2010). Maximum likelihood phylogenetic analyses based on the amino acid sequence of the RT were performed with RAxML version 7.2.8, under the JTT + Γ model. One hundred rapid bootstrap inferences were done with RAxML.
Retroelement families were defined by LTR sequence clustering and by evolutionary relationships based on a phylogenetic tree of the RT. This analysis gave results similar to those of a phylogenetic analysis using RH (Paz et al. 2017). Copies of LTR retroelements grouped with bootstrap values higher than 95% were considered to belong to the same family. Once the families were defined, a reference sequence of one member per family was selected and submitted to the DDBJ database (www.ddbj.nig.ac.jp), accession numbers LC434324-LC434447.
To validate and to determine the physical distribution of each retroelement family in the C. annuum reference genome, a BLAST search was performed with a threshold of 85% sequence identity using the software Unipro UGENE version 1.31 (Okonechnikov et al. 2012). In addition, Pearson's correlation between the frequency of each family identified by LTR-Finder and the number of hits identified by UGENE was calculated.
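The validation step compares two per-family counts with Pearson's correlation. As a minimal sketch of that calculation, assuming two equal-length lists of counts (the function name and the example values are illustrative, not data from this study):

```python
import math


def pearson_r(x: list[float], y: list[float]) -> float:
    """Pearson's correlation coefficient between two equal-length samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length (n >= 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


# Illustrative values only: copies per family vs. BLAST hits per family.
copies_per_family = [1, 3, 10, 48]
hits_per_family = [2, 5, 25, 90]
r = pearson_r(copies_per_family, hits_per_family)
```

The significance test (p value) reported in the text is not covered by this sketch.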

Estimation of insertion time for LTR retrotransposons in pepper
The insertion time was estimated according to the method described by Ma et al. (2004). The CLUSTAL multiple alignment method from MEGA4 (Tamura et al. 2007) was used to align all LTR pairs. The Kimura two-parameter method was used to calculate the distance (d) and its SE for all LTR pairs, under the complete deletion option (Tamura et al. 2007). The rate variation among sites was modeled with a gamma distribution (shape parameter = 8). SE estimates were obtained by using the analytical formula option in MEGA4. Insertion times were estimated by using the following equation: t = d/2r, with a rate (r) of neutral evolution of 1.3 × 10⁻⁸ substitutions per site per year.
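The dating calculation can be illustrated with a short sketch. It assumes a pair of already aligned LTRs of equal length and uses the standard Kimura two-parameter formula without the gamma correction applied in MEGA4, so values will differ slightly from those of the study; function names are illustrative.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(ltr5: str, ltr3: str) -> float:
    """Kimura two-parameter distance between two aligned LTRs.

    Columns containing a gap or an ambiguous base are skipped,
    which mimics the complete deletion option for a single pair.
    """
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    sites = transitions = transversions = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        sites += 1
        if a == b:
            continue
        # A<->G and C<->T are transitions; anything else is a transversion.
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p = transitions / sites
    q = transversions / sites
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


def insertion_time_years(d: float, rate: float = 1.3e-8) -> float:
    """t = d / (2r), with r in substitutions per site per year."""
    return d / (2 * rate)
```

With the rate used in the text, a distance of d = 0.026 corresponds to about 1.0 million years, and two identical LTRs (d = 0) give an insertion time of 0.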

Comparative genomic analysis
The dynamics and radiation of the retroelement families identified in C. annuum in the genomes of other Solanaceae species were determined by performing a BLAST search of the complete reference sequence of each family against the NCBI database (www.blast.ncbi.nlm.nih.gov). Eight genomic top-level sequences are currently available under taxid 4070. BLAST parameters were set as follows: (i) Database, RefSeq Genome Database; (ii) Organisms, Solanaceae (taxid: 4070); (iii) Program selection, Megablast (highly similar sequences); and (iv) Algorithm general parameters, Max target sequences set to display between 500 and 1000 aligned sequences according to the number of hits. Additional BLAST analyses were performed using retrotransposons identified in Solanaceae genomes, described in Table S1. Results were filtered by Query Coverage from 80 to 100%, E-value = 0.0, and Score > 200. The number of hits was plotted by species and retroelement family.
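The filtering and counting of BLAST results described above can be sketched as follows. The `Hit` record and its field names are hypothetical (they do not correspond to a specific BLAST output format), and the example rows are invented for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    """Hypothetical record for one BLAST hit."""
    family: str            # query retroelement family
    species: str           # subject genome
    query_coverage: float  # percent
    evalue: float
    score: float


def passes_filters(hit: Hit) -> bool:
    """Thresholds stated in the text: coverage 80-100%, E-value 0.0, score > 200."""
    return (
        80.0 <= hit.query_coverage <= 100.0
        and hit.evalue == 0.0
        and hit.score > 200
    )


def count_hits(hits: Iterable[Hit]) -> Counter:
    """Number of retained hits per (family, species) pair."""
    return Counter((h.family, h.species) for h in hits if passes_filters(h))


# Invented example rows.
example = [
    Hit("CopiaCa_30", "Solanum tuberosum", 95.0, 0.0, 5200.0),
    Hit("CopiaCa_30", "Solanum tuberosum", 60.0, 0.0, 3100.0),   # low coverage
    Hit("CopiaCa_30", "Nicotiana tabacum", 88.0, 1e-50, 400.0),  # E-value > 0
    Hit("GypsyCa_15", "Solanum lycopersicum", 99.0, 0.0, 9000.0),
]
counts = count_hits(example)
```

The resulting counter can be passed directly to a plotting routine to graph hits by species and family.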

Plant material
Seeds of C. annuum cv. Zunla, kindly provided by Dr. Qin Cheng (Zunyi Academy of Agricultural Sciences, China), were used. Twenty seeds were pre-germinated in a Petri dish on wet paper for a week. Once the seedlings emerged, they were transplanted into pots of 10 cm diameter filled with sterilized soil as substrate and maintained in a greenhouse with a photoperiod of 16/8 h at 24/19°C (day/night), a relative humidity of 60-80%, and a light intensity of 200 μmol m⁻² s⁻¹. DNA was extracted by the CTAB II procedure (Weising et al. 2005) from foliar tissue obtained from Zunla plants.

Chromosome preparation
Mitotic chromosomes were examined in root tips obtained from plants grown as previously described. Roots were pretreated in 2 mM 8-hydroxyquinoline for 4-5 h at 14°C, fixed in EtOH:acetic acid (3:1; v/v), washed in distilled water, digested for 45 min at 37°C with Pectinex SP ULTRA® (Novozymes), and squashed in a drop of 45% acetic acid. After coverslip removal in liquid nitrogen, the slides were stored at −20°C.

LTR retroelement family selection and specific probe design and construction
Specific probes for each of the three most abundant LTR retroelement families identified in the C. annuum genome, GypsyCa_16, GypsyCa_13, and CopiaCa_01 (see "Results"), were developed as described below. Firstly, a consensus sequence for each family of retroelements was obtained from multiple alignments of at least twenty members of the family using the Muscle tool in the software MEGA4. Then, specific primers were designed over the conserved RT and RH regions of each retroelement family with the primer3 online software (http://biotools.umassmed.edu/bioapps/primer3_www.cgi) with the following selection criteria: (i) primer length between 25 and 29 bp; (ii) GC content between 40 and 60%; (iii) no formation of dimers and no self-complementarity; (iv) similar Tm among primers; and (v) an amplicon that includes portions of the conserved RT and RH sequences and ranges from 800 to 1100 bp. The sequences of the designed primers are detailed in Table S2.
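Criteria (i), (ii), and (v) above are simple numeric checks and can be sketched as follows. This is an illustrative helper, not part of primer3; dimer formation, self-complementarity, and Tm similarity (criteria iii and iv) are not evaluated, and the example sequence is invented.

```python
def gc_content(seq: str) -> float:
    """GC content of a sequence, in percent."""
    if not seq:
        raise ValueError("empty sequence")
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def primer_passes(seq: str) -> bool:
    """Criteria (i) and (ii): length 25-29 bp and GC content 40-60%."""
    return 25 <= len(seq) <= 29 and 40.0 <= gc_content(seq) <= 60.0


def amplicon_passes(length_bp: int) -> bool:
    """Size part of criterion (v): amplicon of 800-1100 bp."""
    return 800 <= length_bp <= 1100


# Invented 26-mer used only to exercise the checks.
example_primer = "ATGCATGCATGCATGCATGCATGCAT"
print(len(example_primer), round(gc_content(example_primer), 1),
      primer_passes(example_primer))
```

In practice these checks would only pre-screen candidates; the remaining criteria still require a dedicated primer design tool.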
Secondly, probes were amplified with the specific primers described previously using Zunla DNA as a template. PCR was performed with 25 nmol of the template, 1 μl PCR buffer 10×, 1.4 μl MgCl2 (25 mM), 1.0 μl primer Fw (10 μM), 1.0 μl primer Rv (10 μM), 0.5 μl of dNTPs (10 mM), and 1 unit of Taq DNA polymerase (Invitrogen®), in a final volume of 10 μl. The thermocycler program consisted of 1 cycle of 5 min at 94°C; 35 cycles of amplification of 45 s at 94°C, 45 s at 62.5°C/68.3°C/65.8°C (CopiaCa_01/GypsyCa_16/GypsyCa_13, respectively), and 30 s at 72°C; and a final elongation cycle of 10 min at 72°C. Amplified fragments were extracted from agarose gels and cloned in the plasmid vector pGEM-T Easy (Promega) as described in Paz et al. (2015). Single clones positive for inserts were selected for sequencing. The plasmid DNA of individual clones was obtained by the alkaline lysis procedure. The presence of the insert in the purified plasmids was verified by PCR with the universal M13F (5′-GTAAAACGACGGCCAG-3′) and M13R (5′-CAGGAAACAGCTATGAC-3′) primers. DNA sequencing was carried out with the M13 forward primer by Macrogen Inc. (Seoul, South Korea). All nucleic acid sequences obtained were screened for vector contamination using the VecScreen program (www.ncbi.nlm.nih.gov/VecScreen), and primer sequences were removed. The nucleotide sequences obtained were deposited in the DDBJ database (DNA Data Bank of Japan, www.ddbj.nig.ac.jp) under accession nos. LC431733-LC431740 (Table S2).
A homology search between the obtained sequences and their respective LTR retrotransposon families was conducted using the BLAST2 tool of the National Center for Biotechnology Information (NCBI; www.ncbi.nlm.nih.gov). The homology assignment criterion was based on maximum sequence cover (>90%), maximum identity (>60%), and a minimum E-value of 10⁻²⁰. Based on this criterion, three clones were selected for probe construction: two family-specific (P-GypsyCa_16 and P-CopiaCa_01) and one clade-specific (P-[Del/Tekay]-complex) (see "Results"). The amplicons obtained by PCR with the M13 primers and the purified plasmids were used as probes for FISH. Purified DNA was labeled with Digoxigenin-11-dUTP (DIG Nick translation mix, Roche) according to the manufacturer's recommendations.

Fluorescent in situ hybridization
To investigate the chromosomal distribution of the specific probes P-GypsyCa_16, P-CopiaCa_01, and P-[Del/Tekay]-complex in the C. annuum genome, we performed fluorescent in situ hybridization (FISH) on somatic metaphase chromosomes and interphase nuclei. The location and number of specific signals from the different probes were determined by FISH, using the protocol described by Schwarzacher and Heslop-Harrison (2000) with minor modifications. The preparations were incubated in 100 μg/ml RNase, post-fixed in 4% (w/v) paraformaldehyde, dehydrated in a 70-100% graded ethanol series, and air-dried. On each slide, 15 μl of hybridization mixture was added (4-6 ng/μl of the probe, 50% formamide, 10% dextran sulfate, 2× SSC, and 0.3% SDS), previously denatured at 70°C for 10 min. Chromosome denaturation/hybridization was done at 90°C for 10 min, 48°C for 10 min, and 38°C for 5 min using a thermal cycler (Mastercycler, Eppendorf, Hamburg, Germany), and slides were placed in a humid chamber at 37°C overnight. Hybridization signals were detected with Avidin-FITC (Sigma) and/or anti-DIG-Rhodamine (Roche), and preparations were mounted with Vectashield-DAPI (Vector Labs).
At least five metaphases were photographed with phase contrast in an Olympus BX61 microscope with a monochromatic CV-M4+ CL model JAI® camera. All the chromosome images were captured in black and white to be subsequently pseudo-colored. Based on the metaphase photographs, each chromosome arm was divided into four regions of equal size which, together with the centromeric region, define the chromosome portions according to Roa and Guerra (2012): centromeric (C); proximal (P); interstitial-proximal (IP); interstitial-terminal (IT); and terminal (T). Hybridization signals were scored as dots, taking into account the intensity observed in each chromosome portion. The idiogram was constructed on the basis of chromosome measurements according to Levan et al. (1964). For the construction of the cytogenetic map, the absolute distance in micrometers from the hybridization signal to the centromere was measured with Adobe Photoshop CS4 (Adobe Systems Inc.) and then located in the idiogram.

Distribution and frequencies of full-length LTR retroelements in pepper genome
The analysis of the C. annuum genome with LTR-Finder identified 3522 hits. From these data, the search for complete and intact LTR retroelements in the C. annuum genome yielded 1151 structurally full-length retroelements distributed across the 12 chromosomes, 340 belonging to superfamily Copia (30%) and 811 to superfamily Gypsy (70%) (Table 1; Table S3). Copia:Gypsy ratios were highly variable among chromosomes, ranging from 0.2 to 1.2. Likewise, there was a ~5-fold variation in retroelement density between one of the most and one of the least populated chromosomes (Ch06 vs Ch07). This variation in retroelement number was not correlated with chromosome size (Pearson's correlation 0.33; p = 0.30).

Evolutionary relationships and family radiation of full-length LTR retroelements
Phylogenetic analysis on the basis of the conserved RT domains revealed the presence of 124 families of LTR retroelements in the C. annuum genome, belonging to all the phylogenetic clades described in plants (Table 2 and Table S4). However, some differences were observed in frequencies and family . Contrarily, the clades Angela/Tork, Bianca, Galadriel, and Tat are sparsely populated, with fewer than 5 copies and very few families. There was a slight but non-significant correlation between the number of retroelements per clade and the number of families (Pearson's correlation: 0.53; p = 0.0634). At family level, only 6 families, each with a relative frequency (F_R) higher than 2%, encompass 77.6% of the LTR retroelement population in the C. annuum genome (Table 3). Sorted in decreasing F_R order: (i) GypsyCa_16, F_R = 57.1% (Athila clade); (ii) CopiaCa_01, F_R = 9.5% (Ale/Retrofit clade); (iii) GypsyCa_13, F_R = 4.2% and (iv) GypsyCa_09, F_R = 2.6% (both belonging to the Del/Tekay clade); (v) CopiaCa_09, F_R = 2.2% (TAR/Tork clade); and (vi) CopiaCa_03, F_R = 2.1% (Sire/Maximus clade). The remaining 22.4% of the identified retroelements are distributed among 118 sparsely populated families, of which 78 consist of only one retroelement (monotypic). These results were validated by UGENE BLAST against the reference genome, with a high and significant positive correlation (LTR-Finder vs UGENE, Pearson's correlation 0.84, p < 0.0001; Fig. S4).
All the full-length LTR retroelement families identified in this research were inserted into the C. annuum genome within the last 5.85 Mya (Table 3; Fig. 2; Table S3). The insertion times of the Gypsy and Copia superfamilies follow a leptokurtic distribution with asymmetry to the left (Fisher's coefficient, Gypsy: G = 6.2; Copia: G = 2.6), indicating that the vast majority of insertion events occurred less than 1.0 Mya (84% and 38% of insertions, respectively; Fig. 2A, B). It is noteworthy that 22% of the identified complete and intact retroelements were inserted very recently (0.0 Mya).
Different waves of amplification were observed (Fig. 2). In the case of Copia, the three clades Tar/Tork, Sire/Maximus, and GMR/Tork exhibited a similar trend, with a gradual increase from 4.5 Mya and a peak at 3 Mya, followed by a gradual reduction, with very few new insertions in the most recent period. Another wave of expansion was experienced more recently by the Copia clades Ale/Retrofit and Oryco/Ivana, with a gradual increase in their populations from 3 Mya and a substantial numeric expansion in the last 0.5 Mya, especially in Ale/Retrofit (Fig. 2A). This last clade is the one that experienced the greatest radiation, with a great diversification of families in the first wave of its expansion, dominated by the formation of new monotypic families. In contrast, the second wave was experienced by only one family, CopiaCa_01, which during the last 6 My maintained a very low rate of insertion (between 0 and 4 copies every 0.5 My) and in the last 0.5 Mya experienced a 100-fold amplification (Fig. 2C).
Although the Gypsy superfamily is less diverse in family radiation than Copia, two of its clades, Del/Tekay and Athila, experienced significant radiation processes (Fig. 2B). The Del/Tekay retroelements experienced two expansion waves, the first between 5.0 and 2.0 Mya and the second within the last 0.5 Mya. In Athila, on the other hand, the insertion rate was between 2 and 7 retroelements every 0.5 My; however, in the last 0.5 Mya, this rate has increased to more than 600 new insertions (Fig. 2B). The analysis of the family dynamics in those clades revealed that these waves coincide with the amplification of only three LTR retroelement families: GypsyCa_09, GypsyCa_13, and GypsyCa_16 (Fig. 2C). The first two families were associated with the first and second waves of expansion observed in the Del/Tekay clade, respectively, whereas the third is the main family responsible for the expansion observed in Athila in the last 0.5 Mya. As observed in Ale/Retrofit, the wave of amplification of Athila families that preceded the amplification of GypsyCa_16 was mainly due to the radiation of monotypic families.
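The amplification waves described above come from counting insertions in consecutive 0.5 My windows. A minimal sketch of that binning, assuming a list of estimated insertion times in Mya for one clade or family (the function name and the example values are illustrative, not data from Table S3):

```python
from collections import Counter

BIN_WIDTH_MY = 0.5  # window size used in the text


def bin_insertions(times_mya: list[float],
                   bin_width: float = BIN_WIDTH_MY) -> dict[int, int]:
    """Count insertions per window.

    Key k covers the interval [k * bin_width, (k + 1) * bin_width) Mya,
    so key 0 holds the most recent insertions.
    """
    counts: Counter = Counter()
    for t in times_mya:
        if t < 0:
            raise ValueError("insertion times must be non-negative")
        counts[int(t // bin_width)] += 1
    return dict(sorted(counts.items()))


# Illustrative insertion times (Mya) for a hypothetical family.
example_times = [0.0, 0.1, 0.49, 0.5, 3.0]
print(bin_insertions(example_times))
```

Comparing the count in window 0 with the counts in older windows gives the kind of fold-change used to describe a recent burst.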

Comparative genomic analysis
All the retroelement families identified in this work were subsequently recovered by BLAST on the reference genome of C. annuum (Table 3; Table S6). Also, a significant positive correlation was found between the absolute frequency of copies in each family and the number of hits identified by BLAST (LTR-Finder vs NCBI BLAST hits against the C. annuum Zunla genome, Pearson's correlation 0.45, p < 0.0001). When this analysis was extended to the 8 reference Solanaceae genomes available in the RefSeq Genome Database, most of the families were found to be exclusive to C. annuum (92 families, 74%), this trend being more prominent in Gypsy (48 families, 86% of Gypsy families) than in Copia (44 families, 65% of Copia families) (Fig. 3). In addition, the family CopiaCa_01 and all members of the Athila and Del/Tekay lineages belonged to this group of retroelements specifically radiated in C. annuum.
Of the remaining 32 families of retroelements identified in C. annuum, only five were universal (they presented copies in the 8 reference genomes): the families CopiaCa_30, CopiaCa_07, CopiaCa_11, monotypic-Ch06a_26, and GypsyCa_15 (Fig. 3B, C). Likewise, some families derived from the GMR/Tork lineage presented high radiation in Nicotiana spp. (CopiaCa_23 and CopiaCa_35), while others presented radiation in Solanum spp., particularly S. tuberosum (CopiaCa_30 and CopiaCa_45, belonging to the TAR/Tork and Ale/Retrofit lineages, respectively).
When these results were compared with the behavior of the 28 families of retroelements identified in other species of Solanaceae, a similar trend was observed. Few families presented a universal radiation (Tnt1; CopiaSL_23; CopiaSL_25; CopiaSL_26), whereas most of them presented a genus-specific radiation (Fig. 3B, C). Thus, a large number of retroelement families previously identified in Solanum spp. radiated specifically within species of the genus (CopiaSL_05; CopiaSL_15; CopiaSL_17; Tork4/CopiaSL_37; GypsySL_01; GypsySL_03; GypsySL_04; GypsySL_05; GypsySL_07; GypsySL_11; GypsySL_monotypic|Ch03_1s10; GypsySL_monotypic|Ch12_1s55), while others identified in Nicotiana spp. behaved similarly (Tto1 and Tntom1) (Fig. 3).
Another aspect that emerges from this analysis is that the lineages derived from Copia presented a greater degree of conservation among hosts of different Solanaceae genera, while the families derived from Gypsy have a greater degree of divergence. This is revealed by the fact that only the GypsyCa_15 family was identified in the genomes of other species. The same behavior was observed with the Tntom1 (Galadriel) family identified in N. tabacum and with the GypsySL_01, GypsySL_03, GypsySL_05, and Gypsode1/GypsySL_07 families (all derived from Del/Tekay), exclusive to Solanum.

Distribution of most abundant families of LTR retrotransposons in the genome of C. annuum
The three probes showed homology with their respective retroelement families/lineages (Table 4) and hybridized along all the chromosomes of Zunla, with differences in the number of hybridization signals for each probe (Figs. 4 and 5). Thus, P-GypsyCa_16 showed a higher number of hybridization signals than P-CopiaCa_01 and P-[Del/Tekay]-complex, which showed similar values (Fig. 4). These values are proportionally in agreement with the number of retroelements identified by bioinformatic analysis. In some cases, differences in the presence of hybridization signals were observed between homologous chromosomes (Fig. 5).
Differential insertion patterns of the retroelements along the C. annuum chromosomes were observed (Fig. 6). The P-GypsyCa_16 and P-[Del/Tekay]-complex probes share a similar pattern, in which the signals were concentrated mainly in the interstitial-proximal and proximal regions of the chromosomes, followed by the interstitial-terminal regions, and were practically absent in the terminal and centromeric regions. The only exception was the incidence in centromeric regions, which was close to zero for P-[Del/Tekay]-complex whereas it reached a moderate frequency for P-GypsyCa_16. In contrast, the insertion pattern of P-CopiaCa_01 was marked by a high incidence in the terminal region, followed by a moderate incidence in the proximal and interstitial regions and a lower incidence in the centromeric and interstitial-terminal regions.

Discussion
Complete and intact retroelements are a minor fraction in the universe of repetitive sequences
Different studies at the genomic level in Solanaceae species have revealed that LTR retroelements constitute the major fraction of repetitive sequences (between 20 and 80% depending on the species; Qin et al. 2014; Xu et al. 2017; Gaiero et al. 2019). Thus, they have been identified as the main contributors to the variation in genome size in this botanical family, and consist mainly of two fractions: (i) ancestral or fossil retroelements, generated by the gradual loss of the different components of the retroelements throughout evolution, giving rise to truncated, incomplete, and non-autonomous elements; and (ii) Solo-LTR retroelements (consisting only of LTR-5′ and LTR-3′, lacking the internal coding portion), generated by local homologous recombination between both LTRs of the same element (Vicient et al. 1999; Xu and Du 2014). These fractions result from the cellular mechanisms that regulate the activity of retroelements and aim to interrupt their life cycle.
A third minor fraction, the subject of this work, consists of (iii) complete and intact retroelements, characterized by carrying all the essential components for their retrotransposition. This autonomous and mobile fraction of the genome can induce changes in gene or genome structure, often with accompanying alterations in gene activity, promoting genome divergence and evolution (Bennetzen 1996, 2000; Raskina et al. 2008; Belyayev 2014; Bennetzen and Wang 2014; Anderson et al. 2019). Their potential for action can be substantially increased by the activity of the non-autonomous elements that parasitize them (Sabot and Schulman 2006). In our study, this fraction represents 0.4% of the pepper genome; this value is of a similar order of magnitude to those found in the genomes of other plant species (Vitte et al. 2007; Beulé et al. 2015; Yadav et al. 2015; Paz et al. 2017), as well as in other species of the genus Capsicum (De Assis et al. 2020).

Intra- and inter-specific radiation of retroelement populations in Solanaceae
In our study, we detected that autonomous Gypsy retroelements were ~2.4-fold more abundant and younger than Copia ones (Tables 1 and 3). The radiation of Gypsy and Copia and their respective lineages has been described as widely variable among plant genomes. Thus, in species such as Vitis vinifera (Jaillon et al. 2007

Although all the evolutionary lineages of LTR retroelements described in Angiosperms were identified in pepper, only three comprised 83% of the retroelements: Athila (60%; Gypsy), Ale/Retrofit (17%; Copia), and Del/Tekay (7%; Gypsy) (Table 3; Fig. 1). This result contrasted with the one observed in the genus Solanum, where a majority of Del/Tekay radiation was detected, with variations in the radiation of the other lineages depending on whether the species were related to potatoes or tomatoes (Fig. 3; Park et al. 2012; Paz et al. 2017; Esposito et al. 2019; Gaiero et al. 2019). In Nicotiana, important radiation of GMR/Tork lineages was observed (Fig. 3; Melayah et al. 2004; Petit et al. 2007). This would indicate that differential evolutionary dynamics are shaping the composition of retroelements in the genomes of this group of species.

Chromosomes (light blue) were stained with 4′,6-diamidino-2-phenylindole (DAPI), while red fluorescent dots (signals) indicate hybridization of the different probes (built on the conserved RT and RH sequences of the selected retroelement families), labeled with digoxigenin (DIG) and detected with antibodies conjugated with tetramethylrhodamine isothiocyanate (TRITC). Scale 5 μm

Fig. 6 Average values of hybridization signal intensity of the three retroelement probes on different chromosomal regions of the reference cultivar Zunla. Abbreviations: C, centromeric; P, proximal; IP, interstitial-proximal; IT, interstitial-terminal; T, terminal. The asterisks indicate values of statistical significance: ***p < 0.0001; **p < 0.01, and *p < 0.05 = violet color. The fully colored rectangles correspond to the detection of hybridization signals in both homologous chromosomes, while the rectangles that have a diagonal line correspond to the detection of the hybridization signal in one of the homologous chromosomes. Scale 5 μm
These lineage-specific expansion phenomena are due to the massive retrotransposition of a few families of retroelements in the genomes of different plant species at different periods during their evolution (Vicient et al. 2001; Baucom et al. 2009; Beulé et al. 2015; Paz et al. 2017; Zhang and Gao 2017). In our research, we delimited a total of 127 families of complete and intact LTR retroelements in the C. annuum genome, of which only three represented 71% of the total population: GypsyCa_16 (Athila; 57.1%), CopiaCa_01 (Ale/Retrofit; 9.5%), and GypsyCa_13 (Del/Tekay; 4.2%) (Table 3). These highly radiated families in the Capsicum genome were not found in the genomes of other Solanaceae species; they belong to the group of retroelement families exclusive to pepper (78% of the total) (Fig. 3). In the same way, some families identified in the Solanum and Nicotiana genera show a similar behavior, especially Gypsy-derived families (Fig. 3). This family-specific behavior has been observed in other diploid plant species with a large genome size, such as Hordeum vulgare, where a single family of retroelements, BARE-1, represents 10% of the genome of the species (Jääskeläinen et al. 2013), and Oryza australiensis, where the amplification of only three families of retroelements doubled the size of its genome (Piegu et al. 2006). In the case of S. lycopersicum, the survey of the families of retroelements inhabiting its genome revealed differential radiation of two families derived from Del/Tekay and Tork/GMR (both exclusive to Solanum), which together comprised almost 50% of the identified retroelements (Jingling/GypsySL_01 and Tork4/CopiaSL_37; Paz et al. 2017); however, this radiation was much lower than the one found in pepper (Fig. 3).
This behavior could be related to the differential genome expansion between both species, in a similar way to that observed in the Oryza genus (Piegu et al. 2006; Zhang and Gao 2017). In this regard, the comparison of the dynamics of retroelement populations in related diploid species with different genome sizes, such as those of the Oryza genus, revealed the differential expansion of a few families of retroelements in the species with the larger genomes (Zuccolo et al. 2007).
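The proportions quoted above can be checked with simple arithmetic. The following minimal sketch uses only figures stated in the article (811 Gypsy and 340 Copia elements from the abstract, and the three family shares from Table 3 as cited in the text); the function and variable names are illustrative and not part of the original analysis.

```python
from typing import Mapping

# Counts of structurally full-length elements, as stated in the abstract.
GYPSY_COUNT = 811
COPIA_COUNT = 340

# Share (%) of the total population for the three dominant families,
# as quoted in the Discussion (Table 3).
FAMILY_SHARE_PCT = {
    "GypsyCa_16 (Athila)": 57.1,
    "CopiaCa_01 (Ale/Retrofit)": 9.5,
    "GypsyCa_13 (Del/Tekay)": 4.2,
}


def gypsy_to_copia_ratio(gypsy: int, copia: int) -> float:
    """Fold difference between Gypsy and Copia element counts."""
    if copia <= 0:
        raise ValueError("copia count must be positive")
    return gypsy / copia


def cumulative_share(shares: Mapping[str, float]) -> float:
    """Sum of the percentage shares, rounded to one decimal place."""
    return round(sum(shares.values()), 1)


if __name__ == "__main__":
    ratio = gypsy_to_copia_ratio(GYPSY_COUNT, COPIA_COUNT)
    print(f"Gypsy/Copia ratio: {ratio:.2f}")
    print(f"Top three families: {cumulative_share(FAMILY_SHARE_PCT)}% of elements")
```

The ratio rounds to the ~2.4-fold difference reported in the text, and the three family shares sum to the 70.8% given in the abstract (rounded to 71% in the Discussion).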
Currently, the best-studied retroelements in plants belong to the GMR/Tork lineage, identified and isolated from different Solanaceae species. Interestingly, in C. annuum, this lineage was characterized by a few sparsely populated families that nevertheless show a high degree of homology and conservation with their relatives inhabiting other Solanaceae genomes, a feature not observed in the other lineages (Fig. 3). Tnt1 constitutes one of the most abundant families of retroelements in the N. tabacum genome, and its radiation has been extensively studied in the genomes of Nicotiana spp. (Melayah et al. 2004), Solanum spp. (Manetti et al. 2009; Paz et al. 2017; Tam et al. 2005), and Petunia spp. (Kriedt et al. 2014). In pepper, a single derived family was found with a very low number of copies (CopiaCa_27). This family is ancestral and could have arisen before the Nicotiana radiation (which occurred 23 Mya; Xu et al. 2017). Despite having a very low copy number in Capsicum spp. and Solanum spp., it is still present in these genomes with a high degree of homology (Fig. 3; Fig. S2; Melayah et al. 2004; Paz et al. 2015, 2017; Tam et al. 2005, 2009). Another similar example is T135/CopiaSL_33, originally identified and isolated from S. lycopersicum (Tam et al. 2009), which is kept in a perfect state of conservation in C. annuum (CopiaCa_04; Fig. S2). It is the only case in this study showing a high degree of coverage and identity with its tomato counterpart even at the level of the LTR sequence (results not shown). Although the retroelements derived from Tnt1 and T135/CopiaSL_33 have a low copy number in pepper, their high degree of homology allowed the application of heterologous primers as highly informative genetic markers for phylogenetic inferences in Capsicum spp. (Tam et al. 2005). Other well-known and studied families from the Tork/GMR lineage in Solanaceae species did not have the same success in the C. annuum genome; this is the case of Tto1 and Tork4/CopiaSL_37, the latter very widespread in the tomato genome (Fig. 3; Paz et al. 2017).
A notable feature of C. annuum is that a large proportion of the retroelements that inhabit its genome were inserted recently. In the case of members of the Copia superfamily, at least 35% of the total population was inserted less than 0.5 Mya, while for Gypsy this number is much higher, 85% (Fig. 2). These amplification waves are mainly driven by three families of retroelements derived from the Athila, Ale/Retrofit, and Del/Tekay lineages. Evolutionarily, these three lineages are the most likely to radiate in Solanaceae genomes. However, Gypsy-derived lineages are generally more ancient, with little participation in recent radiation events (Paz et al. 2017; Esposito et al. 2019; Gaiero et al. 2019).
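Insertion ages such as those discussed above are commonly inferred from the divergence between the two LTRs of an element, which are identical at the time of insertion. The exact procedure and substitution rate used in this work are not given in this section, so the following is only a minimal sketch of the widely used relation T = K / (2r), where K is the corrected divergence between the 5′ and 3′ LTRs and r is the substitution rate per site per year. The Jukes-Cantor correction, the function names, and the rate value are assumptions chosen for illustration.

```python
import math

# ASSUMPTION: illustrative substitution rate (substitutions per site per year).
# It is not taken from this article; replace it with the rate appropriate
# for the species under study.
ASSUMED_RATE = 1.3e-8


def p_distance(ltr5: str, ltr3: str) -> float:
    """Proportion of mismatched sites between two aligned LTRs (gaps skipped)."""
    if len(ltr5) != len(ltr3):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return mismatches / compared


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance K = -3/4 * ln(1 - 4p/3)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def insertion_time_mya(k: float, rate: float = ASSUMED_RATE) -> float:
    """Insertion time in million years: T = K / (2r)."""
    if k < 0 or rate <= 0:
        raise ValueError("k must be non-negative and rate positive")
    return k / (2.0 * rate) / 1e6


if __name__ == "__main__":
    # Hypothetical aligned LTR fragments, for demonstration only.
    p = p_distance("ACGTACGTAC", "ACGTACGTAT")
    k = jukes_cantor(p)
    print(f"p = {p:.3f}, K = {k:.4f}, T = {insertion_time_mya(k):.2f} My")
```

Under the assumed rate, a corrected divergence of 0.013 between the two LTRs corresponds to an insertion about 0.5 Mya, which is the time scale of the recent amplification wave described in the text.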

Bridging bioinformatics data with cytogenetics
In our research, we were able to locate cytogenetically the most abundant families identified by bioinformatics tools in the C. annuum genome. In this sense, it is important to highlight that cytogenetically identifying retroelement families in a genome is challenging, not only because they are highly variable sequences, but also because they are dispersed throughout the genome. The FISH technique is a tool that allows detecting and locating a specific DNA sequence on a chromosome (Kato et al. 2004). The technique relies on exposing chromosomes to a small DNA sequence, called a probe, that has a fluorescent molecule attached to it. The visualization procedure is indirect, through the analysis of the fluorescent signal intensity. Thus, it is important to note that the technique is qualitative, not quantitative. That is why it is not possible to quantitatively correlate the hybridization signals with what was observed at the bioinformatic level.
In practice, under optimal hybridization and detection conditions, the sensitivity of the FISH technique depends on the accessibility of the probe to the homologous region on the DNA. In turn, this is determined by the degree of condensation of chromosomal DNA. In other words, the less condensed the chromosomes are, the less coiled the DNA molecule will be and, therefore, the better the accessibility of the probes to chromosomal DNA (Van de Rijke et al. 2000). The degree of chromatin condensation varies substantially, not only between the different phases of cell division but also between the different types of configuration adopted by chromosomal DNA. In this work, we employed mitotic metaphase chromosomes. In this kind of sample, the spatial resolution (minimum physical distance at which two adjacent sequences can be identified under a fluorescence microscope; De Jong et al. 1999) is 5-10 Mb and the sensitivity (minimum size of a DNA sequence that can be unambiguously detected under the microscope; De Jong et al. 1999) is 10 kb (Valárik et al. 2004). Besides, the spatial resolution depends on how the chromosomal material has been previously treated and spread or stretched on the microscope slide, which can produce some decrease in the hybridization signal (De Jong et al. 1999; Valárik et al. 2004). A decrease in the hybridization signal has been observed in other cultivated Solanaceae genomes (Braz et al. 2018).
Another important factor that defines the success of the technique is the number of target sequences. The more copies of the target sequence are present in the genome, the more intense the fluorescent signals are. That is why a large number of copies is required to visualize a chromosomal region, whether the targets are short, highly repeated, or long DNA sequences (Boyle et al. 2011; Yamada et al. 2011; Beliveau et al. 2012). Previous work describes that retroelements are integrated into regions of the genome and can show site-specific preferences (Gao et al. 2008; Baucom et al. 2009; Nellåker et al. 2012), forming groups. The BARE-1 element has been observed in a nested form in the barley genome (Shirasu et al. 2000), a characteristic also observed in the genomes of the genus Hordeum and the tribe Triticeae (Vicient et al. 1999; Gribbon et al. 1999) and in the families analyzed in this research. This type of insertion would favor detection by FISH.
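The detection limits quoted above for metaphase chromosomes (a spatial resolution of 5-10 Mb and a sensitivity of 10 kb) can be turned into a rough back-of-the-envelope check. The sketch below is illustrative only: the function names, the example positions, and the per-copy target length are hypothetical, and the assumption that clustered copies contribute additively to the detectable target is a simplification, not a claim from the article.

```python
import math

# Limits quoted in the text for mitotic metaphase chromosomes.
SPATIAL_RESOLUTION_MB = 10.0  # conservative upper bound of the 5-10 Mb range
SENSITIVITY_KB = 10.0         # minimum unambiguously detectable target size


def is_resolvable(pos_a_mb: float, pos_b_mb: float,
                  resolution_mb: float = SPATIAL_RESOLUTION_MB) -> bool:
    """True if two loci are far enough apart to appear as separate signals."""
    return abs(pos_a_mb - pos_b_mb) >= resolution_mb


def is_detectable(target_kb: float,
                  sensitivity_kb: float = SENSITIVITY_KB) -> bool:
    """True if a contiguous target reaches the detection threshold."""
    return target_kb >= sensitivity_kb


def min_clustered_copies(target_per_copy_kb: float,
                         sensitivity_kb: float = SENSITIVITY_KB) -> int:
    """Copies needed in one cluster to reach the threshold.

    ASSUMPTION: signals from nearby copies add up linearly.
    """
    if target_per_copy_kb <= 0:
        raise ValueError("target length must be positive")
    return math.ceil(sensitivity_kb / target_per_copy_kb)


if __name__ == "__main__":
    # Hypothetical values, for demonstration only.
    print(is_resolvable(12.0, 30.0))   # loci 18 Mb apart
    print(is_detectable(1.5))          # a single 1.5 kb target
    print(min_clustered_copies(1.5))   # copies needed if each offers 1.5 kb
```

Under these assumptions, a probe target shorter than 10 kb would not be seen as a single isolated copy, which is consistent with the argument that nested or clustered insertions favor detection by FISH.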
Finally, our FISH results agree with our bioinformatic family-abundance analysis: P-GypsyCa_16 showed a higher number of hybridization signals than P-CopiaCa_01 and P-[Del/Tekay]-complex, both with similar values (Fig. 4). Likewise, GypsyCa_16 was the most abundant family in the family-abundance analyses, followed by CopiaCa_01 and the Del/Tekay lineage, both with a similar number of retroelements (Table 3). In this way, our FISH analysis not only validates the results obtained at the bioinformatic and taxonomic level but also provides information about the distribution of these families/lineages along the chromosomes of C. annuum.

Different retroelement lineages, different affinity to chromatin
The genomic environment is highly heterogeneous and zoned. This characteristic is defined at different levels: (i) the complexity of the DNA sequence (coding or noncoding); (ii) the spatial configuration that this sequence adopts; (iii) the epigenetic setting; and (iv) the association with other molecules (Jarillo et al. 2009). Experimental evidence suggests that the different retroelement lineages have an affinity for different types of heterochromatin as a strategy to evade the genome's regulatory mechanisms of silencing and/or nonhomologous recombination. In our work, we found that the three evaluated probes hybridized to all the C. annuum chromosomes and that they also presented differential affinity towards the different chromosomal regions.
In the case of the P-[Del/Tekay]-complex lineage-specific probe, signals were predominant in the proximal, interstitial-proximal, and interstitial-terminal regions, with little or no presence in the centromeres and telomeres (Fig. 5). This lineage is characterized by an additional Chromo domain that confers an affinity towards heterochromatin (Neumann et al. 2011). In this sense, it has been demonstrated in different plant species that families of retroelements belonging to the Del/Tekay lineage have an affinity towards heterochromatic regions but with reduced hybridization towards centromeric regions, secondary constrictions, and major heterochromatic blocks (Wang et al. 2006; Neumann et al. 2011; Park et al. 2011; Domingues et al. 2012; Weber et al. 2013; Yang et al. 2020). In the specific case of Solanaceae, this lineage is quite ancient and has exhibited different waves of amplification before, during, and after speciation between tomato and pepper (Park et al. 2011; Paz et al. 2017). Its insertion in C. annuum has been associated with the heterochromatinization of euchromatic regions (Park et al. 2012) and may affect the expression of neighboring genes.
The family-specific probe derived from the Athila lineage, P-GypsyCa_16, exhibited hybridization signals in all chromosomal regions, with a preponderance towards the proximal and interstitial-proximal regions, and to a lesser extent in the centromeres and the interstitial-terminal region. This behavior has been described for the Athila lineage in other plant species (De Souza et al. 2018; Li et al. 2019). In the case of Capsicum, the distribution of potentially autonomous retroelement lineages has recently been described in different species of the genus, including C. annuum (De Assis et al. 2020). In that study, the Athila distribution shows a trend towards the interstitial chromosomal regions. The differences with our results could be attributed to different criteria for choosing the retroelement, probe design, and/or particularities of the plant material. Likewise, another study revealed an accumulation of the Athila lineage in the pericentromeric to interstitial regions of all C. annuum chromosomes, with a marked affinity for gene-rich regions (Park et al. 2011).
Concerning the CopiaCa_01 family, this work shows a distribution pattern concentrated mostly in terminal regions, with little or no signal in the interstitial-terminal and centromeric regions of most pepper chromosomes. This lineage-specific preference towards telomeres has also been observed in Erianthus arundinaceus (Huang et al. 2017) and Allium cepa (Pearce et al. 1996). However, other works report a more heterogeneous distribution (Li et al. 2019; Yang et al. 2020), even in C. annuum (Park et al. 2011). Various studies have suggested that there is an association between transposable elements and rDNA, affecting their distribution, abundance, and expression (Dubcovsky and Dvorák 1995; Raskina et al. 2004; Datson and Murray 2006). This association has evolutionary implications for plant genomes. The presence of CopiaCa_01 has been associated with polymorphisms of rDNA sites in C. annuum (unpublished results, Yañez Santos, AM; Paz RC; Urdampilleta JD). Although preliminary, these results suggest that the activity of CopiaCa_01 could be related to the generation of this type of polymorphism through the generation of nonhomologous recombination sites. However, further studies are necessary to obtain more conclusive data in this regard.

Conclusions
In this work, we demonstrate that a large number of families of autonomous LTR retroelements have been inserted in the C. annuum genome within the last 6 million years. All the LTR retroelement lineages described in plants are present in pepper. While the families derived from the Del/Tekay lineage (closely associated with speciation events in Solanaceae) exhibited different waves of amplification in this period, two families derived from Athila and Ale/Retrofit have experienced a significant wave of amplification in the last 0.5 million years. The FISH analysis of the insertion preferences of the most abundant elements identified in this work revealed significant differences: (i) GypsyCa_16 exhibited a wide insertion profile with a preponderance of signals from the centromere towards the interstitial-proximal region; (ii) CopiaCa_01 exhibited a marked insertion preference towards telomeres; (iii) the Del/Tekay lineage was limited to the proximal to interstitial-terminal regions, with little or no presence in telomeres and centromeres. Knowing these particularities within a species may be of interest in the development of molecular markers, since insertional polymorphisms can be detected in different genomic regions within the same species.
Acknowledgements
The authors acknowledge Dr. Maria Virginia Sanchez-Puerta for her valuable help with phylogenetic tree construction, Dr. Carlos Llorens for his support in retroelement classification, Dr. Thomas Wicker for providing retroelement reference sequences, and Dr. Alejandra Trenchi for sharing her experience with FISH experiments. We also appreciate the valuable language assistance of Lic. Vanesa Heredia and Dr. Hernan G. Rosli. The authors are also grateful for the comments of the anonymous reviewers and the editor, Dr. Jiming Jiang, which substantially improved this work. This work was part of the PhD thesis of YSA, which benefited from AGENCIA and CONICET fellowships.