Section: Genomics
Topic: Genetics/genomics

Sequencing, de novo assembly of Ludwigia plastomes, and comparative analysis within the Onagraceae family

Corresponding author(s): Barloy, D. (dominique.barloy@agrocampus-ouest.fr)

10.24072/pcjournal.536 - Peer Community Journal, Volume 5 (2025), article no. e43.

Peer reviewed and recommended by PCI

Abstract

The Onagraceae family, which belongs to the order Myrtales, consists of approximately 657 species and 17 genera. This family includes the genus Ludwigia L., which comprises 82 species. In this study, we focused on the two aquatic invasive species Ludwigia grandiflora subsp. hexapetala (Lgh) and Ludwigia peploides subsp. montevidensis (Lpm), which are widely distributed in aquatic environments in North America and Europe. Both species have been found to degrade major watersheds, leading to ecological and economic damage. Genomic resources for Onagraceae are limited, with only the Ludwigia octovalvis (Lo) plastid genome available for the genus Ludwigia L. at the time of our study. This scarcity constrains phylogenetic, population genetics, and genomic studies. To expand genomic resources, new complete plastid genomes of Ludwigia grandiflora subsp. hexapetala (Lgh) and Ludwigia peploides subsp. montevidensis (Lpm) were generated using a combination of MiSeq (Illumina) and GridION (Oxford Nanopore) sequencing technologies. These plastomes were then compared to the published Ludwigia octovalvis (Lo) plastid genome, which was re-annotated by the authors. We initially sequenced and assembled the chloroplast (cp) genomes of Lpm and Lgh using a hybrid strategy combining short- and long-read sequences. We observed the existence of two Lgh haplotypes and two potential Lpm haplotypes. Lgh, Lpm, and Lo plastomes were similar in terms of genome size (around 159 kb), gene number, structure, and inverted repeat (IR) boundaries, comparable to other species in the Myrtales order. A total of 45 to 65 SSRs (simple sequence repeats) were detected, depending on the species, with the majority consisting solely of A and T, which is common among angiosperms. Four chloroplast genes (matK, accD, ycf2 and ccsA) were found to be under positive selection pressure, which is commonly associated with plant development, especially in aquatic plants such as Lgh and Lpm.
Our hybrid sequencing approach revealed the presence of two Lgh plastome haplotypes which will help to advance phylogenetic and evolutionary studies, not only specifically for Ludwigia, but also the Onagraceae family and Myrtales order. To enhance the robustness of our findings, a larger dataset of chloroplast genomes would be beneficial.

Metadata
DOI: 10.24072/pcjournal.536
Type: Research article
Keywords: Water primrose, Ludwigia, Onagraceae, chloroplast genome, long and short reads, hybrid assembly, haplotype

Barloy-Hubler, F. 1; Le Gac, A.-L. 2; Boury, C. 3; Guichoux, E. 3; Barloy, D. 4

1 CNRS, UMR 6553 ECOBIO, Université de Rennes, Rennes 35000, France
2 Institut Curie, 7500, Paris, France
3 Université de Bordeaux, INRAE, BIOGECO, Cestas, France
4 DECOD (Ecosystem Dynamics and Sustainability), Institut Agro, IFREMER, INRAE, Rennes, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
Barloy-Hubler, F.; Le Gac, A.-L.; Boury, C.; Guichoux, E.; Barloy, D. Sequencing, de novo assembly of Ludwigia plastomes, and comparative analysis within the Onagraceae family. Peer Community Journal, Volume 5 (2025), article no. e43. doi: 10.24072/pcjournal.536. https://peercommunityjournal.org/articles/10.24072/pcjournal.536/

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.genomics.100334

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

Full text

The full text below may contain a few conversion errors compared to the version of record of the published article.

Introduction

The Onagraceae family belongs to the order Myrtales and includes approximately 657 species of herbs, shrubs, and trees across 17 genera grouped into two subfamilies: subfam. Ludwigioideae W. L. Wagner and Hoch, which only has one genus (Ludwigia L.), and subfam. Onagroideae, which contains six tribes and 21 genera (Wagner et al. 2007). Ludwigia L. is composed of 83 species (Levin et al. 2003, Levin et al. 2004). The current classification for Ludwigia L., which includes several hybrid and/or polyploid species, lists 23 sections. A recent molecular analysis clarified and supported several major relationships in the genus but challenged the complex sectional classification of Ludwigia L. (Liu et al. 2017).

The diploid species Ludwigia peploides (Kunth) Raven subsp. montevidensis (Spreng.) (Raven 1963) (named here Lpm) (2n=16), and the decaploid species Ludwigia grandiflora (Michx) Greuter & Burdet subsp. hexapetala (Hook. & Arn) Nesom & Kartesz (named here Lgh) (2n=80), reproduce essentially by clonal propagation, which suggests that there is low genetic diversity within each species (Dandelot et al. 2005). Lgh and Lpm are native to South America and are considered to be among the most aggressive aquatic invasive plants (Reddy et al. 2021). Widely distributed in aquatic environments in North America and in Europe (Hussner et al. 2016), both species have been found to degrade major watersheds as well as aquatic and riparian ecosystems (Grewell et al. 2016), leading to ecological and economic damage. In France, both species occupy aquatic habitats, such as static or slow-flowing waters and riversides, and have recently been observed in wet meadows (Lambert et al. 2010). The transition from an aquatic to a terrestrial habitat has led to the emergence of two Lgh morphotypes (Haury et al. 2014a). The appearance of metabolic and morphological adaptations could explain the ability to acclimatize to terrestrial conditions, and this phenotypic plasticity involves various genomic and epigenetic modifications (Billet et al. 2018).

Adequate genomic resources are necessary to identify the genes and metabolic pathways involved in the adaptation process leading to plant invasion (Gioria et al. 2023), with genomic information making it possible to predict and control invasiveness (Moravcová et al. 2015). However, even though the number of terrestrial plant genomes has increased considerably over the last 20 years, only a small fraction (~0.16%) have been sequenced, with some clades being significantly more represented than others (Marks et al. 2021). Thus, for the Onagraceae family (which includes Ludwigia sp.), only a handful of chloroplast sequences (plastomes) are available, and the complete genome has not yet been sequenced. While Lpm is a diploid species (2n=2x=16) with a relatively small genome size (262 Mb), Lgh is a decaploid species (2n=10x=80) with a large genome of 1419 Mb (Barloy et al. 2024). Obtaining a reference genome for these two non-model species without a genome from a species close to Ludwigia is challenging, and the development of a plastome and/or mitogenome is a first step toward generating genomic resources. As of April 2023, there are 10,712 reference plastomes listed on GenBank (Release 255: April 15 2023), with the vast majority (10,392 genomes) belonging to Viridiplantae (green plants). However, in release 255, the number of plastomes available for the Onagraceae family is limited, with only 36 plastomes currently listed. Among these, 15 plastomes are from the tribe Epilobieae, with 11 in the Epilobium genus and 4 in the Chamaenerion genus. Additionally, there are 23 plastomes from the tribe Onagreae, with 17 in the Oenothera genus, 5 in the Circaea genus, and only one in the Ludwigia genus. The Ludwigia octovalvis chloroplast genome was released in 2016 as a unique haplotype of approximately 159 kb (Liu et al. 2016). L. octovalvis belongs to sect. Macrocarpon (Micheli) H.Hara while Lpm and Lgh belong to Jussieae section (Zardini and Raven 1992, Hoch et al. 2015). Generally, the inheritance of chloroplast genomes is considered to be maternal in angiosperms. However, biparentally inherited chloroplast genomes could potentially exist in approximately 20% of angiosperm species (Hu et al. 2008, Zhang and Sodmergen 2010). Both maternal and biparental inheritance are described in the Onagraceae family. In tribe Onagreae, Oenothera subsect. Oenothera are known to have biparental plastid inheritance (Wagner et al. 2007, Jones and Cleland 1974). In tribe Epilobieae, biparental plastid inheritance was also reported in Epilobium L. with mainly maternal transmission, and very low proportions of paternally transmitted chloroplasts (Schmitz and Kowallik 1986).

The chloroplast is the emblematic organelle of plants and plays a fundamental role in photosynthesis. Chloroplasts evolved from cyanobacteria through endosymbiosis and thereby inherited components of photosynthesis reactions (photosystems, electron transfer and ATP synthase) and gene expression systems (transcription and translation, Sato 2021). In general, chloroplast genomes (plastomes) are highly conserved in size, structure, and genetic content. They are rather small (120-170 kb, Gualberto et al. 2014), with a quadripartite structure comprising two long identical inverted repeats (IR, 10–30 kb) separated by a large and a small single-copy region (LSC and SSC, respectively). They are also rich in genes, with around 100 unique genes encoding key proteins involved in photosynthesis, and a comprehensive set of ribosomal RNAs (rRNAs) and transfer RNAs (tRNAs, Tonti-Filippini et al. 2017). Plastomes are generally circular but linear forms also exist (Oldenburg and Bendich 2016). Chloroplast DNA usually represents 5-20% of total DNA extracted from young leaves, and therefore low-coverage whole genome sequencing can generate enough data to assemble an entire chloroplast genome (Twyford and Ness 2017).

If we refer to their GenBank records, more than 95% of these plastomes were sequenced by so-called short-read techniques (mostly Illumina). However, in most seed plants, the plastid genome exhibits two large inverted repeat regions (60 to 335 kb, Twyford and Ness 2017), which are longer than the short read lengths (< 300 bp). This leads to incomplete or approximate assemblies (Wang et al. 2018). Recent long-read sequencing (> 1000 bp) provides compelling evidence that terrestrial plant plastomes exhibit two structural haplotypes. These haplotypes are present in equal proportions and differ in the orientation of their inverted repeats (IR) (Wang and Lanfear 2019). This shows the importance of using so-called third-generation sequencing (TGS; PacBio or Nanopore) to correctly assemble the IRs of chloroplasts and to identify any different structural haplotypes. The current problem with PacBio or Nanopore long-read sequencing is the higher error rate compared to short-read technology (Ferrarini et al. 2013, Jain et al. 2018, Rang et al. 2018). Thus, a hybrid strategy which combines long reads (to access the genomic structure) and short reads (to correct sequencing errors) could be effective (Wang et al. 2018, Scheunert et al. 2020).

Here, we report the newly sequenced complete plastid genomes of Ludwigia grandiflora subsp. hexapetala (Lgh) and Ludwigia peploides subsp. montevidensis (Lpm), obtained using a combination of different sequencing technologies, as well as a comparative genomic analysis that includes the re-annotated, published Ludwigia octovalvis (Lo) plastome. The main objectives of this study are (1) to assemble and annotate the plastomes of two new species of Ludwigia sp., (2) to reveal the divergent sequence hotspots of the plastomes in this genus and in the Onagraceae, and (3) to identify the genes under positive selection.

To achieve this, we used long-read sequencing data from Oxford Nanopore and short-read sequencing data from Illumina to assemble the Lgh plastomes and compared these assemblies with those obtained solely from long reads of Lpm. We also compared both plastomes to the published plastome of Lo. Our findings demonstrated the value of de novo assembly in reducing assembly errors and enabling accurate reconstruction of full heteroplasmy. We also evaluated the performance of a variety of software for sequence assembly and correction in order to define a workflow that will be used in the future to assemble Ludwigia sp. mitochondrial and nuclear genomes. Finally, the three new Ludwigia plastomes generated by our study make it possible to extend the phylogenetic study of the Onagraceae family and to compare it with previously published analyses (Liu et al. 2017, Bedoya and Madriñán 2015, Liu et al. 2020).

Material and Methods

Plant sampling and experimental design

The original plant materials were collected in June 2018 near Nantes (France) and formally identified by D. Barloy. L. grandiflora subsp. hexapetala (Lgh) plants were taken from the Mazerolles swamps (N47 23.260, W1 28.206), and L. peploides subsp. montevidensis (Lpm) plants from La Musse (N 47.240926, W -1.788688). Plants were cultivated in a growth chamber in a mixture of 1/3 soil, 1/3 sand, 1/3 loam with flush water level, at 22°C and a 16 h/8 h (light/dark) cycle. A single stem of 10 cm for each species was used for vegetative propagation in order to avoid potential genetic diversity. De novo shoots, taken three centimeters from the apex, were sampled for each species. Samples for gDNA extraction were pooled and immediately snap-frozen in liquid nitrogen, then lyophilized over 48 h using a Cosmos 20K freeze-dryer (Cryotec, Saint-Gély-du-Fesc, France) and stored at room temperature. All the plants were destroyed after use, as required by French authorities for invasive plants (article 3, prefectorial decree n°2018/SEE/2423).

Due to the high polysaccharide and polyphenol content of Lpm and Lgh tissues, and as no standard kit provided DNA of sufficient quality for sequencing, genomic DNA extraction was carried out using a modified version of the protocol proposed by Panova et al. (2016), with three purification steps.

40 mg of lyophilized buds were ground at 30 Hz for 60 s (Retsch MM200 mixer mill, Fisher). The ground tissues were lysed with 1 ml CF lysis buffer (Macherey-Nagel) supplemented with 20 μl RNase and incubated for 1 h at 65°C under agitation. 20 μl proteinase K was then added before another incubation for 1 h at 65°C under agitation. To avoid breaking the DNA during pipetting, the extracted DNA was recovered using a phase-lock gel tube as described by Belser et al. (2018). The extracts were transferred to 2 ml tubes containing phase-lock gel, and an equal volume of PCIA (Phenol, Chloroform, Isoamyl Alcohol; 25:24:1) was added. After shaking for 5 min, tubes were centrifuged at 11000 g for 20 min. The aqueous phase was transferred into a new tube containing phase-lock gel and extraction with PCIA was repeated. DNA was then precipitated after addition of an equal volume of binding buffer C4 (Macherey-Nagel) and 99% ethanol, overnight at 4°C or for 1 h on ice, then centrifuged at 800 rpm for 10 min. After removal of the supernatant, 1 ml of CQW buffer was added and the DNA pellet was re-suspended. Next, DNA purification was carried out by adding a 2 ml mixture of wash buffer PW2 (Macherey-Nagel), wash buffer B5 (Macherey-Nagel), and 99% ethanol in equal volumes, followed by centrifugation at 800 rpm for 10 min. This DNA purification step was carried out twice. Finally, the DNA pellet was dried in the oven at 70°C for 30 min then re-suspended in 100 μl elution buffer BE (Macherey-Nagel) (5 mM Tris solution, pH 8.5) after 10 min incubation at 65°C under agitation.

A second purification step was performed using the Macherey-Nagel (MN) NucleoSpin® Gel and PCR Clean-up kit (a kit for extracting PCR products from agarose gels), restarting the above protocol from the step with the addition of CQW buffer then PW2 buffer.

The third purification step consisted of DNA purification using a Macherey-Nagel (MN) NucleoMag kit for clean-up and size selection. Finally, the DNA was resuspended after a 5 min incubation at 65°C in 5 mM TRIS at pH 8.5.

The quantity and quality of the gDNA were verified using a NanoDrop spectrometer, agarose gel electrophoresis with ethidium bromide staining under UV light, and the Fragment Analyzer (Agilent Technologies) of the University of Rennes 1.

Library preparation and sequencing

MiSeq (Illumina) and GridION (Oxford Nanopore Technologies, referred to here as ONT) sequencing were performed at the PGTB (doi:10.15454/1.5572396583599417E12). Lgh and Lpm genomic DNA were re-purified using homemade SPRI beads (1.8X ratio). Lgh has a large genome of 1419 Mb, about 5-fold larger than the Lpm genome (262 Mb) (Barloy et al. 2024). SR (Illumina, one run) and LR (Oxford Nanopore, three runs) sequencing were therefore carried out for Lgh and only LR sequencing for Lpm (one run). For Illumina sequencing, 200 ng of Lgh DNA was used according to the QIAseq FX DNA Library Kit protocol (Qiagen). The final library was checked on TapeStation D5000 screentape (Agilent Technologies) and quantified using a QIAseq Library Quant Assay Kit (Qiagen). The pool was sequenced on an Illumina MiSeq using V3 chemistry and 600 cycles (2x300 bp). For ONT sequencing, around 8 µg of Lgh and Lpm DNA were size selected using a Circulomics SRE kit (according to the manufacturer’s instructions) before library preparation using a SQK-LSK109 ligation sequencing kit following ONT recommendations. High-accuracy basecalling (Guppy version 4.0.11, MinKNOW GridION release 20.06.9) was performed over the 48 h of sequencing. Long reads (LR) and short reads (SR) were available for Lgh and only LR for Lpm.

Chloroplast assemblies

Quality controls and preprocessing of sequences were conducted using Guppy v4.0.14 for long reads (via Oxford Nanopore Technology Client access) and fastp v0.20.0 (Chen et al. 2018) for short reads, using Q15, since increasing the Phred quality to 20 or higher has no effect on the number of sequences retained (66%). A preliminary draft assembly was performed using Lgh short reads (SR, 2 × 23,067,490 reads) with GetOrganelle v1.7.0 (Jin et al. 2020) and NOVOPlasty v4.2.1 (Dierckxsens et al. 2017), and chloroplastic short and long reads were extracted by mapping against this draft genome. Chloroplastic short reads were then de novo assembled using Velvet (version 1.2.10, Zerbino and Birney 2008), ABySS (version 2.1.5, Simpson et al. 2009, Jackman et al. 2017), MEGAHIT (version 1.1.2, Li et al. 2016), and SPAdes (version 3.15.4, Bankevich et al. 2012), without and with prior error correction. The best k-mer parameters were tested using kmergenie (Chikhi and Medvedev 2014) and k=99 was found to be optimal. For ONT reads, Lgh (550,516 reads) and Lpm (68,907 reads) reads were self-corrected using CANU 1.8 (Koren et al. 2017) or SR-corrected using Ratatosk (Holley et al. 2021), then de novo assembled using CANU (Koren et al. 2017) and FLYE 2.8.2 (Kolmogorov et al. 2019) run with the options --meta and --plasmids. For all these assemblers, unless otherwise specified, we used the default parameters.

Post plastome assembly validation

As we used many assemblers and different strategies, we produced multiple contigs that needed to be analyzed and filtered in order to retain only the most robust plastomes. To do so, all assemblies were evaluated using the QUality ASsessment Tool (QUAST, Gurevich et al. 2013) and visualized using BANDAGE (Wick et al. 2015), both with default parameters. BANDAGE-compatible graphs (.gfa format) were created with the megahit_toolkit for MEGAHIT (Li et al. 2016) and with gfatools for ABySS (Jackman et al. 2017). Overlaps between fragments were manually checked and ambiguous “IUPAC or N” nucleotides were also biocurated with Illumina reads when available.

Chloroplast genome annotation

Plastomes were annotated via GeSeq (Tillich et al. 2017), using ARAGORN and tRNAscan-SE to predict tRNAs and rRNAs, and via the Chloe prediction site (Zhong 2020). The previously reported Lo chloroplast genome was re-annotated in the same way to facilitate genomic comparisons. Gene boundaries, alternative splice isoforms, pseudogenes and gene names and functions were manually checked and biocurated using Geneious (v.10). Finally, plastomes were represented using OrganellarGenomeDRAW (OGDRAW, Greiner et al. 2019). These genomes were submitted to GenBank at the National Center for Biotechnology Information (NCBI) with specific accession numbers (for Lgh haplotype 1, (LGH1) OR166254 and Lgh haplotype 2, (LGH2) OR166255; for Lpm haplotype, (LPM) OR166256) using annotation tables generated through GB2sequin (Lehwark and Greiner 2019).

SSRs and Repeat Sequences Analysis

Simple Sequence Repeats (SSRs) were analyzed through the MISA-web server (Beier et al. 2017), with minimum repeat numbers set to 10, 5, 4, 3, 3, and 3 for mono-, di-, tri-, tetra-, penta-, and hexa-nucleotides, respectively. Direct, reverse and palindromic repeats were identified using RepEx (Gurusaran, Ravella, and Sekar 2013). Parameters used were: for inverted repeats (min size 15 nt, spacer = local, class = exact); for palindromes (min size 20 nt); for direct repeats (minimum size 30 nt, minimum repeat similarity 97%). Tandem repeats were identified using Tandem Repeats Finder (Benson 1999), with parameters set to two for the alignment parameter match and seven for mismatches and indels. The IRa region was removed for all these analyses to avoid overrepresentation of the repeats.
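To make the SSR thresholds concrete, the following minimal Python sketch scans a sequence for perfect repeats using the same minimum repeat numbers (10, 5, 4, 3, 3, 3 for motif lengths 1 to 6). It is only an illustration and not the MISA algorithm: the function names and the toy sequence are hypothetical, compound SSRs are not handled, and motifs that are themselves repeats of a shorter unit (for example "AA" or "ATAT") are skipped so that each tract is reported under its shortest motif.

```python
import re

# Minimum number of repeats per motif length, as quoted in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}

def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    k = len(motif)
    return not any(k % d == 0 and motif == motif[:d] * (k // d)
                   for d in range(1, k))

def find_ssrs(seq):
    """Return sorted (start, motif, repeat_count) tuples, 0-based start."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        pos = 0
        while pos < len(seq):
            m = pattern.match(seq, pos)
            if m and is_primitive(m.group(1)):
                hits.append((m.start(), m.group(1), len(m.group(0)) // k))
                pos = m.end()   # continue after the reported tract
            else:
                pos += 1        # slide by one base and try again
    return sorted(hits)

# Hypothetical toy sequence: A x12, AT x6 and AGC x4 separated by spacers.
toy = "GC" + "A" * 12 + "CG" + "AT" * 6 + "GG" + "AGC" * 4
for start, motif, count in find_ssrs(toy):
    print(start, motif, count)
```

Scanning position by position, rather than relying on non-overlapping regex iteration, avoids missing a tract whose first base was consumed by a rejected non-primitive match.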

Comparative chloroplast genomic analyses

Lgh and Lpm plastomes were compared with the reannotated and biocurated Lo plastome using the mVISTA program (Frazer et al. 2004), with the LAGAN alignment algorithm (Brudno et al. 2003) and a cut-off of 70% identity. Nucleotide diversity (Pi) was analyzed using the software DnaSP v.6.12.01 (Rozas and Rozas 1999, Rozas et al. 2017) with step size set to 200 bp and window length to 300 bp. IRscope (Amiryousefi et al. 2018) was used for the analyses of inverted repeat (IR) region contraction and expansion at the junctions of chloroplast genomes. To assess the impact of environmental pressures on the evolution of these three Ludwigia species, we calculated the nonsynonymous (Ka) and synonymous (Ks) substitution rates and their ratio (ω = Ka/Ks) using TBtools (Chen et al. 2020) to measure the selective pressure. Genes with ω < 1, ω = 1, and ω > 1 were considered to be under purifying selection (negative selection), neutral selection, and positive selection, respectively.
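The two quantities used here can be illustrated with a short Python sketch. It is a simplified stand-in, not the DnaSP or TBtools implementation: Pi is computed as the mean number of pairwise differences per site over fixed windows (300 bp window, 200 bp step), columns containing gaps or ambiguous bases are skipped (an assumption of this sketch), and ω = Ka/Ks is only classified, with Ka and Ks assumed to be supplied by another tool. All function names are hypothetical.

```python
from itertools import combinations
import math

def nucleotide_diversity(aligned):
    """Mean pairwise differences per site (Pi) for equal-length aligned sequences.
    Columns containing anything other than A, C, G or T are skipped."""
    columns = [col for col in zip(*aligned) if all(b in "ACGT" for b in col)]
    pairs = list(combinations(range(len(aligned)), 2))
    if not columns or not pairs:
        return 0.0
    diffs = sum(col[i] != col[j] for col in columns for i, j in pairs)
    return diffs / (len(pairs) * len(columns))

def sliding_pi(aligned, window=300, step=200):
    """Return a list of (window_start, Pi) along the alignment (0-based start)."""
    length = len(aligned[0])
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        chunk = [s[start:start + window] for s in aligned]
        results.append((start, nucleotide_diversity(chunk)))
    return results

def classify_omega(ka, ks):
    """Classify selective pressure from Ka and Ks using omega = Ka/Ks."""
    if ks == 0:
        return "undefined (Ks = 0)"
    omega = ka / ks
    if math.isclose(omega, 1.0):
        return "neutral"
    return "positive selection" if omega > 1 else "purifying selection"

# Toy alignment of three sequences: 4 pairwise differences over 3 pairs x 4 sites.
print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))
print(classify_omega(0.02, 0.10))
```

In practice a gene is rarely reported at exactly ω = 1, so the "neutral" class is mostly a conceptual boundary between the other two.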

Phylogenetic analysis of Ludwigia based on MatK sequences

We performed a phylogenetic analysis of the Ludwigia genus using MatK, the only protein-coding barcode available for a large number of Ludwigia species. All MatK amino acid sequences were aligned with the FFT-NS-2 (Fast Fourier Transform-based Narrow Search) algorithm and BLOSUM62 scoring matrix using MAFFT 7 (Katoh et al. 2002). The phylogenetic tree analysis was conducted using the rapid hill-climbing algorithm (command line: -f d) in RAxML 8.2.11 (Stamatakis 2014), with the GAMMA JTT (Jones-Taylor-Thornton) protein model. Node support was assessed through fast bootstrapping (-f a) with 1,000 non-parametric bootstrap pseudo-replicates. Circaea MatK sequences were selected as the outgroup, and all accession numbers are indicated on the phylogenetic tree labels.

Graphic representation

Statistical analyses were performed using R software in the RStudio integrated development environment (R Core Team 2015; RStudio: Integrated Development for R. RStudio, Inc., Boston, MA). Figures were produced using the ggplot2, ggpubr, tidyverse, dplyr, gridExtra, reshape2, and viridis packages. SNPs were represented using trackViewer (Ou and Zhu 2019) and genes were represented using the gggenes package.

Results

Plastome short read assembly

The chloroplastic fraction of Lgh short reads (SR) was extracted by mapping against the two draft haplotypes generated by GetOrganelle, which differ only by a “flip-flop” of the SSC region (Figure 1). Since the assembly by NOVOplasty did not provide any additional information compared to GetOrganelle, it was not retained. This subset (1,360,507 reads) was assembled using ABySS, Velvet, MEGAHIT and SPAdes in order to identify the best assembler for this plant model.
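The “flip-flop” relationship between the two haplotypes can be sketched in a few lines of Python. This is a toy illustration only: the function names and the short sequences are hypothetical, and it assumes the plastome is written linearly as LSC + IRb + SSC + IRa, with IRa being the reverse complement of IRb.

```python
def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def build_haplotypes(lsc, irb, ssc):
    """Build both structural haplotypes from LSC, one IR copy and SSC."""
    ira = reverse_complement(irb)          # second IR copy, inverted
    hap_a = lsc + irb + ssc + ira
    hap_b = lsc + irb + reverse_complement(ssc) + ira  # SSC flipped
    return hap_a, hap_b

# Hypothetical toy segments (real segments are about 90 kb, 25 kb and 20 kb).
hap_a, hap_b = build_haplotypes("ATGGCCTTAA", "GGATTC", "AACGT")
print(hap_a)
print(hap_b)
```

Because the two IR copies are inverted relative to each other, inverting the whole IRb-SSC-IRa block gives exactly the same result as flipping the SSC alone, which is why the two haplotypes have identical length and gene content and differ only in SSC orientation.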

As shown in Figure 2, both the number and size of contigs depend greatly on the algorithms used and the correction step. The effect of prior read correction is notable for MEGAHIT and Velvet, especially concerning the increase in the size of the largest alignment (Figure A1-A), loss of misassemblies, and reduction of the number of mismatches (Figure 1A-B). Investigating results via BANDAGE (Figure A2), we observed that ABySS and SPAdes suggest a tripartite structure with the long single-copy (LSC) region as the larger circle in the graph (blue), joined to the small single-copy region (green) by one copy of the inverted repeats (IRs, red), both IRs being collapsed into a segment with approximately twice the coverage. For Velvet and MEGAHIT, graphs confirm the significant fragmentation of the assemblies, which is improved by prior correction of the reads.

In conclusion, none of the short-read assemblers tested in our study produced a complete plastome. The best result was achieved by SPAdes using corrected short reads (mean coverage 1900 X) to assemble a plastome consisting of three contigs: 90,272 bp (corresponding to LSC), 19,788 bp (corresponding to SSC), and 24,762 bp (corresponding to one of the two copies of the IR).
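As a quick arithmetic check, the three SPAdes contigs account for a complete plastome once the collapsed IR contig is counted twice. The short Python sketch below uses only the contig lengths quoted above; the variable names are illustrative.

```python
# Contig lengths (bp) reported for the SPAdes assembly of corrected short reads.
contigs = {"LSC": 90_272, "SSC": 19_788, "IR": 24_762}

# The two IR copies are identical, so the assembler collapses them into one contig;
# the full circular plastome therefore contains the IR length twice.
plastome_length = contigs["LSC"] + contigs["SSC"] + 2 * contigs["IR"]
print(plastome_length)  # 159584
```

The result, 159,584 bp, equals the Lgh plastome size reported in Table 1.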

Figure 1 - Two structural haplotypes of L. grandiflora subsp. hexapetala plastomes representing the flip-flop organization of SSC segment

Figure 2 - Comparative results of L. grandiflora subsp. hexapetala short read (SR) assemblies. A: Total number of contigs obtained with the uncorrected (dark green) and corrected (light green) chloroplast SRs for the 4 assemblers (ABySS, MEGAHIT, Velvet and SPAdes). B: Comparison of the size of contigs assembled by the 4 tools using corrected or uncorrected SRs. C: Boxplot showing the distribution of these contigs by size and the improvement brought by the prior correction of the SRs with the long reads for each tool.

Plastome long read assembly

Chloroplast fractions of Lgh long reads (28,882 reads) were assembled using CANU or FLYE. With raw data, CANU generates a unique contig (NGA50 112648) corresponding to haplotype 2, whereas FLYE makes two contigs (NGA50 133687) that reconstruct haplotype 1. Self-corrected LR leads to fragmentation into two (CANU) or three (FLYE) contigs which both reconstruct haplotype 1, with a large gap corresponding to one of the IR copies for CANU. Finally, SR-correction by RATATOSK allows CANU to assemble two redundant contigs reproducing haplotype 2 while FLYE makes two contigs corresponding to haplotype 1 (Figure A3). In conclusion, the two Lgh haplotypes were reconstructed (average coverage 700X) and the most complete and accurate hybrid assemblies (99.94% accuracy, Additional Figure 3B) were submitted to GenBank.

Unfortunately, due to the absence of short read data, we could only perform self-corrected long read assembly for Lpm using CANU. We also compared CANU and FLYE assembler efficiency, and found that assembly using CANU produces 13 contigs whereas FLYE produces 12 contigs. In both cases, only three contigs are required to reconstitute a complete cpDNA assembly (no gap, no N), with an SSC region oriented like that of the Lgh haplotype 2 and the Lo plastome. Although it is more than likely that these two SSC region orientations also exist for Lpm, the low number of nanopore sequences generated (68,907 reads) and the absence of Illumina short reads prevented us from demonstrating the existence of both haplotypes. As a result, only the “haplotype 2” sequence was deposited in GenBank.

Annotation and comparison of Ludwigia plastomes

General Variations

Plastomes of the three species of Ludwigia sp., Lgh, Lpm and Lo, are circular double-stranded DNA molecules (Figure 3) which are all (as shown in Table 1) approximately the same size: Lo is 159,396 bp long, making it the smallest, while Lgh is the largest with 159,584 bp, and Lpm is intermediate at 159,537 bp. The overall GC content is almost the same for the three species (37.4% for Lo, 37.3% for Lgh and Lpm) and the GC contents of the IR regions are higher than those of the LSC and SSC regions (approximately 43.5% compared to 35% and ca. 32%, respectively). Among the three species, the lengths of the total chloroplast genome, LSC, SSC, and IR are broadly similar (approximately 90.2 kb for LSC, 19.8 kb for SSC and 24.8 kb for IR, see details in Table 1) and the three plastomes are perfectly syntenic if we orient the SSC fragments the same way.

All three Ludwigia sp. plastomes contain the same number of functional genes (134 in total) encoding 85 proteins (including 7 duplicated in the IR region: ndhB, rpl2, rpl23, rps7, rps12, ycf2, ycf15), 37 tRNAs (including trnK-UUU which contains matK), and 8 rRNAs (16S, 23S, 5S, and 4.5S as duplicated sets in the IR). Among these genes, 18 contain introns, of which six are tRNAs (Table 2). Only the rps12 gene is a trans-spliced gene. A total of 46 genes are involved in photosynthesis, and 71 genes are related to transcription and translation, including a bacterial-like RNA polymerase and 70S ribosome, as well as a full set of transfer RNAs (tRNAs) and ribosomal RNAs (rRNAs). Six other protein-coding genes are involved in essential functions: accD, which encodes the β-carboxyl transferase subunit of acetyl-CoA carboxylase, an important enzyme for fatty acid synthesis; matK, which encodes maturase K, involved in the splicing of group II introns; cemA, which encodes a protein located in the membrane envelope of the chloroplast that is involved in the extrusion of protons and thereby indirectly allows the absorption of inorganic CO2 in the plastids; clpP1, which is involved in proteolysis; and ycf1 and ycf2, two ATPase members of the TIC translocon. Finally, a highly pseudogenized ycf15 locus was annotated in the IR even though premature stop codons indicate loss of functionality.

Figure 3 - Circular representation of the annotated plastomes of Ludwigia octovalvis, Ludwigia grandiflora subsp. hexapetala and Ludwigia peploides subsp. montevidensis using OGDRAW. Each map contains four circles. From the center outwards, the first circle shows forward and reverse repeats (red and green arcs, respectively). The next circle shows tandem repeats as bars. The third circle shows the microsatellite sequences. Finally, the fourth and fifth circles show the genes colored according to their functional categories (see colored legend). Only haplotype 1 of L. grandiflora subsp. hexapetala is represented, as haplotype 2 differs only in the orientation of the SSC segment. Accession numbers are indicated for each plastome.

Table 1 - The general characteristics of the 3 Ludwigia plastomes

| | | L. octovalvis* | L. grandiflora subsp. hexapetala | L. peploides subsp. montevidensis |
|---|---|---|---|---|
| Size (bp) | Total | 159,396 | 159,584 | 159,537 |
| | LSC | 90,183 | 90,272 | 90,156 |
| | SSC | 19,703 | 19,788 | 19,799 |
| | IR | 24,755 | 24,762 | 24,791 |
| GC% | Total | 37.4 | 37.3 | 37.3 |
| | LSC | 35.2 | 35.1 | 35.1 |
| | SSC | 32 | 31.7 | 31.7 |
| | IR | 43.5 | 43.5 | 43.4 |

* KX827312 (ref)
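The figures in Table 1 can be cross-checked with a few lines of Python. The sketch below is only an illustration: it assumes the usual quadripartite layout (total = LSC + SSC + 2 × IR) and approximates the overall GC content as the length-weighted mean of the segment values. The names `PLASTOMES`, `expected_total` and `weighted_gc` are hypothetical helpers, not part of the original analysis.

```python
# Segment lengths (bp) and GC content (%) copied from Table 1.
# Assumption: total = LSC + SSC + 2 * IR (standard quadripartite layout).
PLASTOMES = {
    "Lo": {"total": 159396, "LSC": 90183, "SSC": 19703, "IR": 24755,
           "gc": {"total": 37.4, "LSC": 35.2, "SSC": 32.0, "IR": 43.5}},
    "Lgh": {"total": 159584, "LSC": 90272, "SSC": 19788, "IR": 24762,
            "gc": {"total": 37.3, "LSC": 35.1, "SSC": 31.7, "IR": 43.5}},
    "Lpm": {"total": 159537, "LSC": 90156, "SSC": 19799, "IR": 24791,
            "gc": {"total": 37.3, "LSC": 35.1, "SSC": 31.7, "IR": 43.4}},
}


def expected_total(p):
    """Plastome length implied by the segment lengths (IR counted twice)."""
    return p["LSC"] + p["SSC"] + 2 * p["IR"]


def weighted_gc(p):
    """Overall GC% approximated as the length-weighted mean of segment GC%."""
    gc = p["gc"]
    weighted = (p["LSC"] * gc["LSC"] + p["SSC"] * gc["SSC"]
                + 2 * p["IR"] * gc["IR"])
    return weighted / expected_total(p)


for name, p in PLASTOMES.items():
    size_ok = expected_total(p) == p["total"]
    print(name, "size consistent:", size_ok,
          "weighted GC%:", round(weighted_gc(p), 1))
```

Under these assumptions, the segment lengths sum exactly to the reported totals for the three plastomes, and the weighted GC values round to the totals given in the table.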

Table 2 - Genes present in the plastomes of Ludwigia sp.

| Category | Function | Name |
|---|---|---|
| Photosynthesis | Rubisco | rbcL |
| | Photosystem I (PSI) | psaA, psaB, psaC, psaI, psaJ |
| | PSI assembly factors | ycf3# (pafI), ycf4 (pafII) |
| | Photosystem II | psbA, psbB, psbC, psbD, psbE, psbF, psbH, psbI, psbJ, psbK, psbL, psbM, pbf1 (psbN), psbT, psbZ |
| | ATP synthase | atpA, atpB, atpE, atpF#, atpH, atpI |
| | Cytochrome b6f | petA, petB#, petD#, petG, petL, petN |
| | Cytochrome biogenesis | ccsA |
| | NADPH dehydrogenase | ndhA#, ndhB**#, ndhC, ndhD, ndhE, ndhF, ndhG, ndhH, ndhI, ndhJ |
| Transcription and translation | Transcription | rpoA, rpoB, rpoC1#, rpoC2 |
| | Small ribosomal proteins | rps2, rps3, rps4, rps7**, rps8, rps11, rps12**#, rps14, rps15, rps16#, rps18, rps19 |
| | Large ribosomal proteins | rpl2**#, rpl14, rpl16#, rpl20, rpl22, rpl23**, rpl32, rpl33, rpl36 |
| | Translation initiation | infA |
| | Ribosomal RNA | rrn5**, rrn4.5**, rrn16**, rrn23** |
| | Transfer RNA | trnA-UGC**#, trnC-GCA, trnD-GUC, trnE-UUC, trnF-GAA, trnfM-CAU, trnG-GCC, trnG-UCC#, trnH-GUG, trnI-CAU**, trnI-GAU**#, trnK-UUU#, trnL-CAA**, trnL-UAA#, trnL-UAG, trnM-CAU, trnN-GUU**, trnP-UGG, trnQ-UUG, trnR-ACG**, trnR-UCU, trnS-GCU, trnS-GGA, trnS-UGA, trnT-GGU, trnT-UGU, trnV-GAC**, trnV-UAC#, trnW-CCA, trnY-GUA |
| Other functions | Group II intron splicing | matK |
| | Inorganic carbon uptake | cemA |
| | Protease | clpP1# |
| | Fatty acid synthesis/Heat tolerance | accD |
| | TIC machinery (protein import) | ycf1 (Tic214), ycf2** |
| | Unknown function pseudogene | ycf15** |

** duplicated in IR region, # spliced genes
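The `**` and `#` markers of Table 2 lend themselves to a simple tally. As a minimal sketch, the snippet below parses the transfer RNA row only and counts distinct genes, gene copies (duplicated genes counted twice) and spliced genes. The helper name `parse_entry` is hypothetical, and the row is copied from the table with the stray double comma removed.

```python
# tRNA entries copied from Table 2 ('**' = duplicated in IR, '#' = spliced).
TRNA_ROW = (
    "trnA-UGC**#,trnC-GCA,trnD-GUC,trnE-UUC,trnF-GAA,trnfM-CAU,"
    "trnG-GCC,trnG-UCC#,trnH-GUG,trnI-CAU**,trnI-GAU**#,trnK-UUU#,"
    "trnL-CAA**,trnL-UAA#,trnL-UAG,trnM-CAU,trnN-GUU**,trnP-UGG,"
    "trnQ-UUG,trnR-ACG**,trnR-UCU,trnS-GCU,trnS-GGA,trnS-UGA,"
    "trnT-GGU,trnT-UGU,trnV-GAC**,trnV-UAC#,trnW-CCA,trnY-GUA"
)


def parse_entry(entry):
    """Split a table entry into its gene name and its two markers."""
    return {
        "name": entry.rstrip("*#"),
        "duplicated": "**" in entry,
        "spliced": "#" in entry,
    }


genes = [parse_entry(e.strip()) for e in TRNA_ROW.split(",") if e.strip()]
distinct = len(genes)
copies = sum(2 if g["duplicated"] else 1 for g in genes)
spliced = sum(g["spliced"] for g in genes)

print(distinct, "distinct tRNA genes,", copies, "copies,", spliced, "spliced")
```

With the row as transcribed, this gives 30 distinct tRNA genes, 37 copies once the seven IR duplicates are counted twice, and six intron-containing tRNAs, in agreement with the numbers quoted in the text.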

Segments Contractions/Expansion

The junctions between the different chloroplast segments were compared among the three Ludwigia sp. (Lpm, Lgh and Lo), and the overall similarity of the Ludwigia sp. plastomes was confirmed at all junctions (Figure 4A). In all three genomes, rpl22, rps19, and rpl2 were located around the LSC/IRb border, and rpl2, trnH, and psbA were located at the IRa/LSC border. The JSB (junction between IRb and SSC) is located either in the ndhF gene or in the ycf1 gene, depending on the orientation of the SSC region (Figure 4B). In Lo, the ycf1 gene was initially annotated as a 1139 nt pseudogene, which we biocurated as a larger gene (5302 nt) carrying a frameshift due to a single-base deletion, whereas Lgh and Lpm both carry a complete ycf1 gene.

Figure 4 - Comparison of the borders of LSC, SSC, and IR regions in Onagraceae plastomes. A: Comparison of the junctions between large single-copy (LSC, light blue), inverted repeat (IR, orange) and small single-copy (SSC, light green) regions among the chloroplast genomes of L. octovalvis, L. peploides subsp. montevidensis and L. grandiflora subsp. hexapetala (both haplotypes). Genes are denoted by colored boxes and the gaps between genes and boundaries are indicated by base lengths (bp). JLB: junction line between LSC and IRb; JSB: junction line between IRb and SSC; JSA: junction line between SSC and IRa; JLA: junction line between IRa and LSC. B: Comparison of SSC boundaries in haplotype 1 (L. peploides subsp. montevidensis and L. grandiflora subsp. hexapetala haplotype 1) and haplotype 2 (L. octovalvis and L. grandiflora subsp. hexapetala haplotype 2) plastomes.
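To make the four junction names concrete, the following sketch derives their coordinates from the segment lengths of Table 1. It assumes, purely for illustration, that the sequence is linearized at the first base of the LSC and that the segments follow the order LSC, IRb, SSC, IRa; the function name `junctions` is hypothetical, and deposited records may use a different start point.

```python
def junctions(lsc, ir, ssc):
    """1-based position of the last base before each junction.

    Assumes the order LSC -> IRb -> SSC -> IRa, starting at LSC base 1.
    """
    jlb = lsc                 # LSC / IRb
    jsb = jlb + ir            # IRb / SSC
    jsa = jsb + ssc           # SSC / IRa
    jla = jsa + ir            # IRa / LSC (end of the circular sequence)
    return {"JLB": jlb, "JSB": jsb, "JSA": jsa, "JLA": jla}


# Segment lengths of L. octovalvis taken from Table 1.
lo = junctions(lsc=90183, ir=24755, ssc=19703)
print(lo)
```

Under this assumed linearization, JLA coincides with the total plastome length, which provides a quick check on the segment annotation.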

When the Ludwigia sp. LSC/SSC/IR junctions are compared (via IRscope) with those of representative Onagraceae plastomes, namely Chamaenerion conspersum (MZ353638) and Chamaenerion angustifolium (NC_052848), Circaea cordata (NC_060876) and Circaea alpina (NC_061010), Epilobium amurense (NC_061015), and Oenothera villosa subsp. strigosa (NC_061365) and Oenothera lindheimeri (MW538951) (Figure 5), the gene positions at the JLB (junction of LSC/IRb) and JLA (junction of IRa/LSC) boundaries are well conserved throughout the family, whereas those at the JSB and JSA boundaries differ. At the JSB (junction of IRb/SSC), ndhF is duplicated in the five Onagraceae genera studied, with the exception of Circaea sp. and Ludwigia sp. In Oenothera villosa, the first copy of ndhF, which is located in the IRb, overlaps the JSB border, whereas in Oenothera lindheimeri, Epilobium amurense and Chamaenerion sp., ndhF is located only in the inverted repeats. Only Circaea sp. and Ludwigia sp. have a single copy of this locus, which is found in the SSC segment (Figure 5). At the JSA border (junction of SSC/IRa), in Circaea sp., the ycf1 gene crosses the IRa/SSC boundary and extends into the IRa region.

When the respective sizes of the chloroplast segments (IR/SSC/LSC) are compared across Onagraceae, Ludwigia species exhibit expansions of the SSC and LSC regions that are not compensated by significant contractions of the IR regions. This is likely due to the relocation of ndhF to the SSC region and of rps19 to the LSC region. Additionally, there may be significant size variations in the intergenic region between trnI and ycf2, as well as in the intergenic segment containing the ycf15 pseudogene (Figure A4).

Figure 5 - Comparison of LSC, SSC and IR region boundaries in Onagraceae chloroplast genomes. Representative sequences from each genus have been chosen (noted R on the diagram), except for Oenothera lindheimeri (only 89.35% identity with other Oenothera), Circaea alpina (99.5% identity, whereas all other Circaea are 99.9% identical) and Chamaenerion conspersum (99% identity, whereas all other Chamaenerion are ca. 99.7% identical). As shown in Figure 4, the 3 Ludwigia plastomes had the same structure, so L. octovalvis was chosen as a representative of this genus. JLB: junction of LSC/IRb; JSB: junction of IRb/SSC; JSA: junction of SSC/IRa; JLA: junction of IRa/LSC. Accession numbers: Chamaenerion conspersum (MZ353638), Chamaenerion angustifolium (NC_052848), Circaea cordata (NC_060876), Circaea alpina (NC_061010), Epilobium amurense (NC_061015), Oenothera villosa subsp. strigosa (NC_061365) and Oenothera lindheimeri (MW538951).

Repeats and SSRs analysis

In this study, we analyzed the nature and distribution of simple sequence repeats (SSRs), as their polymorphism is a useful indicator in phylogenetic analyses. A total of 65 (Lgh), 48 (Lpm) and 45 (Lo) SSRs were detected, the majority being mononucleotide repeats (21–38), followed by tetranucleotides (10–12) and then di-, tri- and pentanucleotides (Figure A5-A). Mononucleotide SSRs are exclusively composed of A and T, indicating a bias towards A/T bases, which is confirmed for all SSRs (Figure A5-B). In addition, the SSRs are mainly distributed in the LSC region in the three species, which probably reflects the fact that the LSC is the longest segment of the plastome (Figure A5-C). The analysis of SSR locations revealed that most were distributed in non-coding regions (intergenic regions and introns, Figure A5-D).
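As an illustration of how perfect SSRs can be located, the sketch below scans a sequence with regular expressions. It is not the tool used for the counts above: the minimum repeat numbers in `MIN_REPEATS` are hypothetical thresholds chosen for the example (the settings actually applied are not stated in this section), and the function name `find_ssrs` is also an assumption.

```python
import re

# Hypothetical minimum number of repeat units per motif length; the
# thresholds actually used for the Ludwigia plastomes are not given here.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return (start, motif, n_units) for each perfect SSR, 0-based start."""
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT"),
            # so that a poly-A run is reported once, as a mononucleotide SSR.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence: a 12-nt poly-A run and a (TATC)*3 tetranucleotide repeat.
toy = "GC" + "A" * 12 + "CG" + "TATC" * 3 + "GG"
print(find_ssrs(toy))
```

On the toy sequence, the function reports the poly-A run and the (TATC)*3 repeat, the same motif as the tetranucleotide SSR described in ycf2. Dedicated SSR finders additionally handle compound and interrupted repeats, which this sketch ignores.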

The chloroplast genomes of the three Ludwigia species were also screened for long repeat sequences. They were counted in a non-redundant way (if smaller repeats were included in larger ones, only the larger ones were considered). Four types of repeats (tandem, palindromic, inverted and direct) were surveyed in the three Ludwigia sp. plastomes. No inverted repeats were detected with the criteria used.

For the three other types of repeats, here are their distributions:

Tandem repeats (Table 3): Perfect tandem repeats (TRs) of more than 15 bp were examined. Twenty-two loci were identified in the three Ludwigia sp. plastomes (Lgh, Lpm, Lo), heterogeneously distributed as shown in Table 3: 13 loci (plus one imperfect) in Lo, nine loci (plus one imperfect) in Lgh and seven loci (plus two imperfect) in Lpm. The TR distributions (occurrence and location) are therefore specific to each plastome, since only four TRs are common to the three species. Nine TRs are unique to Lo, three to Lpm and three to Lgh. Two are common to Lgh and Lpm and one is common to Lo and Lgh. TRs are mainly intergenic or intronic but are also detected in two genes (accD and ycf1). These genes have accelerated substitution rates, although this does not generate a large difference in their lengths. This point is developed later in this article.

Direct repeats (Table 4): There are few direct (non-tandem) repeats (DRs) in the chloroplast genomes of Ludwigia sp. A single direct repeat of 41 nt is common to the three species; its two copies lie about 2 kb apart, in the psaB and psaA genes. This DR corresponds to an amino acid repeat [WLTDIAHHHLAIA] located in a region predicted to be transmembrane. We also observed three direct repeats conserved in Lpm and Lgh, in ycf1, accD and clpP1 respectively, two DRs unique to Lo (in the accD gene and the rps12-clpP1 intergenic region) and one unique to Lgh (in clpP1 intron 1 and clpP1 intron 2).

Palindromes (Table 5): Palindromic repeats make up the majority of long repeats, with 19, 24 and 26 perfect repeats in Lo, Lgh and Lpm, respectively, and 8, 3 and 6 quasi-palindromes (1 mutation). They are mainly found in the intronic and intergenic regions, with the exception of six genic locations in psbD, ndhK, ccsA and rpl22, and two palindromic sequences in ycf2. These genic palindromic repeats do not seem to cause genetic polymorphism in Ludwigia and can be considered silent.

Thirteen palindromes are common to the three species (including 2 with co-variations in Lo). Thirteen others present in Lpm and Lgh correspond to quasi-palindromes (QPs) in Lo due to mutated bases and, conversely, three perfect palindromes in Lo are mutated in Lpm and Lgh. Finally, only five palindromes are species-specific. Two in particular are located in the hypervariable intergenic spacer ndhF-rpl32 and are absent in Lo due to a large deletion of 160 nt.

Repeat distribution in LSC, SSC and IR segments

In the IRa/IRb regions, repeats are only identified in the first 9 kb region between rpl2 and ycf2: a tandem repeat in the Lpm rpl2 intron, and a tetranucleotide repeat, [TATC]*3, located in the ycf2 gene in the three species. In ycf2 we also found one common palindrome (16 nt), a palindrome unique to Lo (20 nt, absent from the two other species following an A:G mutation), a shared tandem repeat (24 nt), and an additional 15 nt tandem repeat in Lo, which adds 4 amino acids to the protein sequence.

In the SSC region, the repeats are almost all located in the intergenic and/or intronic regions, with a hotspot between ndhF and ccsA. There is also a shared microsatellite in ndhF, and a palindrome (16 nt) in ccsA which is absent in Lo (due to an A:C mutation), resulting in a conservative amino acid substitution (from isoleucine to leucine). We also observed multiple and varied repeats in the ycf1 gene: 3 common poly-A repeats (from 10 to 13 nt), 3 species-specific microsatellites, (ATAG)*3 and (ACCA)*4 in Lgh and (CAAC)*3 in Lo, as well as two direct repeats of 32 nt (37 nt spacing), which were absent from Lo due to a G:T SNP. Two tandem repeats were also observed in Lo and Lgh. None of these repeats is at the origin of the frameshift causing the pseudogenization of ycf1 in Lo, which is due to the deletion of a single A at position 3444 of the gene.

Table 3 - Tandem repeats

| Sequence | L. octovalvis (Lo) | L. grandiflora (Lgh) | L. peploides (Lpm) | Length | Region | Locus | Comments |
|---|---|---|---|---|---|---|---|
| TTGTAGTCAGGGGTGTAGTACTAT | | | | 24 | IRs | ycf2 | |
| TAGAAGAGAGTGCAG | | X | X | 15 | IRs | ycf2 | 15 nt deletion in Lgh and Lpm |
| ATGAAATATCGTATAATGAAGTACCACACGAGTGGATAT | X | X | | 39 | IRs | rpl2 intron | 39 nt deletion in Lgh and Lo |
| AAAAATAGGATAGGAT | | X | X | 16 | LSC | ycf1-trnH-GUG | 56 nt deletion in Lgh and Lpm |
| TAAATTAATATCTATATA | | X | X | 18 | LSC | psbZ-trnG-GCC | 18 nt deletion in Lgh and Lpm |
| TTTTCTATCTATCTTATATCAA | | X | X | 22 | LSC | trnK-UUU-rps16 | 22 nt deletion in Lgh and Lpm |
| AGATCCATAACATCATCAAA | | X | X | 20 | LSC | rps16 intron | 22 nt deletion in Lgh and Lpm |
| TATTAGTTATTAATATTATTAGA | | X | X | 23 | LSC | trnP-UGG-psaJ | 23 nt deletion in Lgh and Lpm |
| AATAATATATAATAACTTAAATA | | X | X | 23 | LSC | rpl33-rps18 | 33 and 44 nt deletions in Lgh and Lpm, respectively |
| TTTTTATTTAACATGCTATCAAATCAACAATGCCATACCGTAGGGCATCTGTT | | X | X | 53 | LSC | rpl20-clpP1 | 107 nt deletion in Lgh and Lpm |
| ATATATTTCGATTCAATTC | X | | X | 19 | LSC | trnH-GUG-psbA | 3 copies in a 57 nt deletion in Lo and Lpm |
| ATAGAAATATCAGTATTTGAGTG | X | | X | 23 | LSC | atpH-atpI | 23 nt deletion in Lo and Lpm |
| TTAATTTTAATTGAAGAA | X | | X | 18 | LSC | psbJ-psbL | 17 and 24 nt deletions in Lo and Lpm, respectively |
| TTAAAGAATATTAATATTC | imperfect TR | | | 19 | LSC | trnR-UCU-atpA | A -> C mutation in second copy in Lo |
| TATTATTATTATTAAT | X | X | | 16 | LSC | atpH-atpI | 16 nt deletion in Lgh and Lo |
| TCTAAGGCTGAAATAAGG | X | X | | 18 | LSC | pafI intron | 18 nt deletion in Lgh and Lo |
| TGTGAATCTATCTAT | | | X | 15 | LSC | trnS-UGA-psbZ | 8 nt deletion in Lpm |
| TTTTTTCTAGTA | | | | 12 | LSC | pafI intron | |
| CTAGTTATTGACATGG | | imperfect TR | imperfect TR | 16 | LSC | psaJ-rpl33 | G -> A mutation in second copy in Lpm and Lgh |
| ATTTTTATTAACTCT | X | | imperfect TR | 15 | SSC | ycf1 | T -> A mutation in first copy in Lpm, other sequence in first copy in Lo |
| AATCAAATAGTTGAT | | X | X | 15 | SSC | ycf1 | other sequence in first copy of Lpm and Lgh |
| ATAATAATATATTTATTATTAATTAATA | X | | | 28 | SSC | ndhF-rpl32 | 160 nt deletion in Lo |

Lo = Ludwigia octovalvis; Lgh = L. grandiflora subsp. hexapetala; Lpm = L. peploides subsp. montevidensis.
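The sharing pattern of tandem repeats summarized in the text can be recomputed from Table 3. The sketch below encodes one tuple per table row and tallies which species carry each repeat. It rests on an assumed reading of the cells, inferred from the comments column: an empty cell means that the perfect repeat is present, "X" that it is absent, and "imperfect TR" (abbreviated "i") that an imperfect copy is present. The names `ROWS` and `sharing_pattern` are hypothetical.

```python
from collections import Counter

# One tuple per Table 3 row: (locus, length, Lo, Lgh, Lpm).
# Assumed reading of the cells: "" = perfect repeat present, "X" = absent,
# "i" = imperfect repeat (still counted as present).
ROWS = [
    ("ycf2", 24, "", "", ""),
    ("ycf2", 15, "", "X", "X"),
    ("rpl2 intron", 39, "X", "X", ""),
    ("ycf1-trnH-GUG", 16, "", "X", "X"),
    ("psbZ-trnG-GCC", 18, "", "X", "X"),
    ("trnK-UUU-rps16", 22, "", "X", "X"),
    ("rps16 intron", 20, "", "X", "X"),
    ("trnP-UGG-psaJ", 23, "", "X", "X"),
    ("rpl33-rps18", 23, "", "X", "X"),
    ("rpl20-clpP1", 53, "", "X", "X"),
    ("trnH-GUG-psbA", 19, "X", "", "X"),
    ("atpH-atpI", 23, "X", "", "X"),
    ("psbJ-psbL", 18, "X", "", "X"),
    ("trnR-UCU-atpA", 19, "i", "", ""),
    ("atpH-atpI", 16, "X", "X", ""),
    ("pafI intron", 18, "X", "X", ""),
    ("trnS-UGA-psbZ", 15, "", "", "X"),
    ("pafI intron", 12, "", "", ""),
    ("psaJ-rpl33", 16, "", "i", "i"),
    ("ycf1", 15, "X", "", "i"),
    ("ycf1", 15, "", "X", "X"),
    ("ndhF-rpl32", 28, "X", "", ""),
]
SPECIES = ("Lo", "Lgh", "Lpm")


def sharing_pattern(cells):
    """Name the species in which the repeat is present, e.g. 'Lgh+Lpm'."""
    return "+".join(sp for sp, cell in zip(SPECIES, cells) if cell != "X")


patterns = Counter(sharing_pattern(row[2:]) for row in ROWS)
per_species = {sp: sum(row[2 + i] != "X" for row in ROWS)
               for i, sp in enumerate(SPECIES)}

print(dict(patterns))
print(per_species)
```

With this reading of the table, the tally gives four repeats shared by the three species, nine unique to Lo, three unique to Lgh, three unique to Lpm, two shared by Lgh and Lpm and one shared by Lo and Lgh, for per-species totals of 14, 10 and 9 loci (perfect plus imperfect).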

Table 4 - Direct repeats

| Sequence | L. octovalvis (Lo) | L. grandiflora subsp. hexapetala (Lgh) | L. peploides subsp. montevidensis (Lpm) | Size (nt) | Spacers (nt) | Region | Locus | Comments |
|---|---|---|---|---|---|---|---|---|
| TTCAATTGGAACGGACGATTCGTCAATCATCT | | | | 32 | 37 | SSC | ycf1 | 2 copies. In Lo, one mutation (G->A) in the second copy |
| CATCGATGATGAAAGTGAAAACAGTAATGAAGAGG | X | | | 35 | 28 - 22 - 11 | LSC | accD | 3 perfect copies and 1 mutated (G->A) copy in Lgh and Lpm. Region of 174 nt deleted in Lo |
| TTAAGAGCCGTACAGGCACCTTTTGATGCATACGG | X | | | | 408 in Lpm, 406 in Lgh | LSC | clpP1 | 2 copies. In Lgh, one mutation (C->T) in the second copy |
| AGATGGTGAAGAACCTTATGAAGATGGTGAAGAACCTTATG | | X | X | 41 | 22 | LSC | accD | Region of 147 nt deleted in Lgh and Lpm |
| TATCAAATCAACAATGCCATACCGTAGGGCAT | | X | X | 32 | 22 - 21 | LSC | rps12-clpP1 | 3 copies |
| TTAAGAGCCGTACAGGCACTTTTTGATGCATACGG | X | | X | 35 | 811 | LSC | clpP1 intron 1 - intron 2 | |
| TGCAATAGCCAAATGATGATGAGCAATATCAGTCAGCCATA | | | | 41 | 2178 | LSC | psaB & psaA | |

Lo = Ludwigia octovalvis; Lgh = L. grandiflora subsp. hexapetala; Lpm = L. peploides subsp. montevidensis.
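The amino acid motifs quoted for the genic direct repeats can be recovered by translating the Table 4 sequences in the six reading frames. The sketch below does this with the standard genetic code, which is assumed here to be adequate for these codons; the helper names (`reverse_complement`, `translate`, `six_frames`) are hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position
# (assumption: sufficient for these plastid sequences).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; a trailing partial codon is ignored."""
    return "".join(CODON_TABLE[seq[i:i + 3]]
                   for i in range(0, len(seq) - 2, 3))


def six_frames(seq):
    """Yield (strand, offset, peptide) for the six reading frames."""
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            yield strand, offset, translate(s[offset:])


# Direct repeats copied from Table 4.
PSA_REPEAT = "TGCAATAGCCAAATGATGATGAGCAATATCAGTCAGCCATA"   # psaB & psaA
ACCD_REPEAT = "CATCGATGATGAAAGTGAAAACAGTAATGAAGAGG"        # accD (Lgh, Lpm)

psa_frames = list(six_frames(PSA_REPEAT))
accd_frames = list(six_frames(ACCD_REPEAT))
for frame in psa_frames + accd_frames:
    print(frame)
```

Read on the reverse strand, the 41 nt psaB/psaA repeat translates to WLTDIAHHHLAIA, and the 35 nt accD repeat translates in one forward frame to IDDESENSNEE, which contains the nine-residue DESENSNEE motif discussed for Lgh and Lpm.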

Table 5 - Palindromic repeats

Common perfect palindromic repeats

| Sequence | Locus | Comments |
|---|---|---|
| AGACTCTCATGAGAGTCT | trnC-GCA-petN | |
| ATTAAATAGAATATTCTATTTAAT | trnE-UUC-trnT-GGU | |
| TTGGTAAATTTACCAA | psbD | |
| TTCATTTCAATTTCAATTGAAATTGAAATGAA | trnI-CAU-ycf2 | 2 copies in IR |
| GAAAAAGGCCTTTTTC | ycf2 | 2 copies in IR |
| TCTCAAATGATTAATCATTTGAGA | trnL-UAA intron | |
| GGATTACTAGTAATCC | trnD-GUC-trnY-GUA | |
| TTTGAATGCATTCAAA | trnG-UCC intron | |
| ATATATTCGAATATAT | trnG-UCC-trnR-UCU | |
| TAGTAATTAATTACTA | trnG-GCC-trnfM-CAU | |
| CCAGTATGCATACTGG | ndhK | |

Common palindromic repeats with covariation

| In L. octovalvis | In L. grandiflora and L. peploides | Locus |
|---|---|---|
| ATAGAATCTATATTCTATTAGAATATAGATTCTAT | ATCGAATCTATATTCTATTAGAATATAGATTCGAT | ndhC-trnV-UAC |
| ATGTATATATATCGAT | ATCTATATATATAGAT | trnE-UUC-trnT-GGU |

Common palindromic and quasi-palindromic repeats

| In L. octovalvis | In L. grandiflora and L. peploides | Locus | Comments |
|---|---|---|---|
| TTTAACGAATATTAATATTtGTTAAA | TTTAACGAATATTAATATTCGTTAAA | trnR-UCU-atpA | |
| TTAAcGAATATTAATATTCTTTAA | TTAAAGAATATTAATATTCTTTAA | trnR-UCU-atpA | |
| AATTGTAcTTACAATT | AATTGTAATTACAATT | ccsA | |
| AGGAAGATTGATCAATCTTtCT | AGGAAGATTGATCAATCTTCCT | trnL-UAG-rpl32 | |
| TTAcTAATATTACTAA | TTAGTAATATTACTAA | trnK-UUU intron | |
| ATATAGAATATcCTATAT | ATATAGAATATTCTATAT | psbZ-trnG-GCC | |
| ACATATCATGATAgGT | ACATATCATGATATGT | rpl22 | |
| AATTACTAATTTCTATTACTATGTTCAATTGAACATAGTAATAGAAATTAGTAATT | AATTACTAATTTCTATTACTtTGTTCAATTGAACATAGTAATAGAAATTAGTAATT | atpH-atpI | |
| TAGTTAGAATTCTAACTA | TAGTTcGAATTCTAACTA | trnT-UGU-trnL-UAA | |
| TATTTTTTCTAGAAAAAATA | TATTTTTTCTAGAAgAAATA | ycf2 | 2 copies in IR |

| In L. octovalvis and L. peploides | In L. grandiflora | Locus |
|---|---|---|
| CCCATCAATCATGATTGtTGGG | CCCATCAATCATGATTGATGGG | psbN-trnD-GUC |

| In L. octovalvis and L. grandiflora | In L. peploides | Locus |
|---|---|---|
| ATGAAAAAAATCGATTTTTTTCAT | ATGATAAAAATAGATTTTTaTCAT | trnK-UUU-rps16 |
| ATGAAAAAAATCGATTTTTTTCAT- ATGATAAAAATCGATTTTTATCAT | ATGATAAAAATAgATTTTTATCAT | trnK-UUU-rps16 |

Unique palindromic repeats

| Species | Sequence | Locus | Comments |
|---|---|---|---|
| L. peploides | TTATATATATATATATATAA | rpl32-ndhF | Full deletion in Lo; 6-base deletion in Lgh |
| L. octovalvis | ATTGAAATTCGAATTTCAAT | psbZ-trnG-GCC | Full deletion in Lgh and Lpm |
| L. peploides and L. grandiflora | AAAAAATGGATCCATTTTTT | trnL-UAG-rpl32 | 3 bases deleted and 3 bases mutated in Lo |
| | AATATATTATTATAATAATATATT | rpl32-ndhF | Full deletion in Lo |
| | TATATTTATTATTAATTAATAATAAATATA | rpl32-ndhF | Full deletion in Lo |

Lo = Ludwigia octovalvis; Lgh = L. grandiflora subsp. hexapetala; Lpm = L. peploides subsp. montevidensis.
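The distinction between perfect palindromes and quasi-palindromes in Table 5 can be expressed as a count of mismatched base pairs between a sequence and its reverse complement. The sketch below is a minimal illustration: it assumes that "1 mutation" means a single mismatched pair, and the function names `mismatched_pairs` and `classify` are hypothetical. Lowercase bases, which appear to mark the mutated position in the table, are simply converted to uppercase.

```python
PAIR = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatched_pairs(seq):
    """Number of positions i whose base does not pair with position n-1-i.

    For odd lengths the central base is compared with itself and therefore
    always counts as one mismatch.
    """
    seq = seq.upper()
    n = len(seq)
    return sum(PAIR[seq[i]] != seq[n - 1 - i] for i in range((n + 1) // 2))


def classify(seq, max_mismatches=1):
    """Label a sequence as palindrome, quasi-palindrome or neither."""
    mismatches = mismatched_pairs(seq)
    if mismatches == 0:
        return "palindrome"
    if mismatches <= max_mismatches:
        return "quasi-palindrome"
    return "not palindromic"


# Sequences copied from Table 5.
examples = {
    "trnC-GCA-petN (common)": "AGACTCTCATGAGAGTCT",
    "ccsA (Lgh, Lpm)": "AATTGTAATTACAATT",
    "ccsA (Lo)": "AATTGTAcTTACAATT",
}
for label, seq in examples.items():
    print(label, classify(seq))
```

With these definitions, the ccsA sequence of Lgh and Lpm is a perfect palindrome, whereas the Lo version, which carries one substituted base, is classified as a quasi-palindrome.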

Finally, in the LSC region, the longest segment, which consequently contains the largest number of repeats, we still observed a preferential localization in the intergenic and intronic regions, since only the genes atpA, rpoC2, rpoB, psbD, psbA, psbB, ndhK and clpP1 contain mononucleotide repeats (poly-A and poly-T), palindromes, or microsatellites (most often common to the three species and without affecting the sequences of the proteins produced). As mentioned earlier, the only exception is the accD gene, which contains several direct and tandem repeats in Lgh and Lpm, corresponding to a region of 174 nt (58 amino acids) missing in Lo and, conversely, a direct repeat of 41 nucleotides (Table 4), in a region of 147 nt (49 aa), which is present in Lo and missing in the other two species. These tandem repeats lead to the presence of four copies of nine amino acids [DESENSNEE] in Lgh and Lpm, two of which form a larger duplication of 17 aa [FLSDSDIDDESENSNEE]. Similarly, the TRs present only in Lo generate two perfect nine amino acid repeats [EELSEDGEE], included in two longer degenerate repeats of 27 nt (Figure A6). Although these TRs do not disrupt the open reading frame, it remains possible that they form an intron that is not translated. Functional studies will be necessary to clarify this point. The polymorphism of the accD gene between Lo and the two other species (Lpm, Lgh) is interesting because accD encodes a subunit of acetyl-CoA carboxylase (EC 6.4.1.2). This enzyme is essential in fatty acid synthesis and catalyzes the synthesis of malonyl-CoA, which is necessary for the growth of dicots, plant fitness and leaf longevity, and is involved in the adaptation to specific ecological niches (Konishi and Sasaki 1994). Large accD expansions due to TRs have also been described in other plants such as Medicago (Wu et al. 2021) and Cupressophytes (Li et al. 2018). Some authors have suggested that these inserted repeats are not important for acetyl-CoA carboxylase activity, as the reading frame is always preserved, and they assume that these repeats must have a regulatory role (Gurdon and Maliga 2014).

Sequence Divergence Analysis and Polymorphic Loci Identification

Determination of divergent regions with mVISTA, using Lo as a reference, confirmed that the three Ludwigia sp. plastomes are well conserved when the SSC segment is oriented in the same way (Figure A7). Sliding window analysis (Figure 6) indicated variation in specific coding regions, notably clpP, accD, ndh5 and ycf1, which have high Pi values, and to a lesser extent rps16, matK, ndhK, petA, ccsA and four tRNAs (trnH, trnD, trnT and trnN). These polymorphic loci could be suitable for inferring genetic diversity in Ludwigia sp.

A comparative analysis of the sizes of protein-coding genes also shows that the rps11 gene initially annotated in Lo is shorter than those newly annotated in Lgh and Lpm (345 bp instead of 417 bp). Comparative analysis by BLAST shows that it is the long form which is annotated in other Myrtales, and inspection of the locus in Lo reveals a frameshift mutation (deletion of a nucleotide at position 311). Functional analysis would be necessary to check whether the rps11 frameshift mutation produces a shorter protein that has lost its function, and only the complete genome will make it possible to verify whether copies of some of these genes have been transferred to the mitochondrial or nuclear genomes. Such horizontal transfers of rps11 have been reported in the mitochondrial genomes of various plant families (Richardson and Palmer 2007). This also applies to ycf1, found as a pseudogene in Lo (as specified previously), although it is not known whether this reflects a gene transfer or a complete loss of function (de Vries et al. 2015, Filip and Skuza 2021). Moreover, there is a deletion of nine nucleotides in the 3’ region of the rpl32 gene in Lgh and Lpm, leading to premature termination of translation and the loss of the last four amino acids [QRLD], which are replaced by a K. However, a careful look at the conserved region defined by the RPL32 domain (CHL00152, member of the superfamily CL09115) shows that these last amino acids are not important for rpl32 function, since they are not found in the orthologs.

Figure 6 - Illustration of nucleotide diversity of the three Ludwigia chloroplast genome sequences. The graph was generated using DnaSP software version 6.0 (window length: 800 bp, step size: 200 bp) (Rozas et al. 2017, Rozas and Rozas 1999). The x-axis corresponds to the base sequence of the alignment, and the y-axis represents the nucleotide diversity (π value). LSC, SSC and IR segments are indicated under the line representing the protein-coding genes (in light blue), the tRNAs (in pink) and the rRNAs (in red). The genes marking diversity hotspots are noted at the top of the peaks.
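The principle of the sliding-window analysis can be sketched in a few lines. The code below computes nucleotide diversity (π) as the mean proportion of pairwise differences per site in each window of an alignment, using the window and step sizes given in the caption as defaults. It is a simplified illustration rather than a reimplementation of DnaSP: columns containing gaps or N are simply ignored, which is an assumption, and the function names `pairwise_pi` and `sliding_pi` are hypothetical.

```python
from itertools import combinations


def pairwise_pi(columns):
    """Mean pairwise differences per site over a list of alignment columns.

    Columns containing a gap ('-') or 'N' are skipped (simplifying assumption).
    """
    sites = [col for col in columns if "-" not in col and "N" not in col]
    if not sites:
        return 0.0
    pairs = list(combinations(range(len(sites[0])), 2))
    diffs = sum(col[i] != col[j] for col in sites for i, j in pairs)
    return diffs / (len(pairs) * len(sites))


def sliding_pi(aligned, window=800, step=200):
    """Return (window_start, pi) tuples along equal-length aligned sequences."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    columns = list(zip(*(seq.upper() for seq in aligned)))
    results = []
    for start in range(0, max(length - window, 0) + 1, step):
        results.append((start, pairwise_pi(columns[start:start + window])))
    return results


# Toy alignment of three sequences, scanned with a 4-column window.
toy_alignment = ["AAAAAAAA", "AAAATAAA", "AAAATAAC"]
toy_profile = sliding_pi(toy_alignment, window=4, step=2)
print(toy_profile)
```

In the toy alignment the first window is invariant (π = 0), and π rises in the windows covering the variable columns, which is the behavior that produces the peaks over the divergent loci in the figure.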

Our results show that the Ka/Ks ratio is less than 1 for most genes (Figure 7), indicating purifying selection that maintains the protein sequence. The exceptions are matK (1.17 between Lgh and Lpm), accD (2.48 between Lgh and Lo and 2.16 between Lpm and Lo), ycf2 (4.3 for both Lgh and Lpm versus Lo) and ccsA (1.4 for both Lgh and Lpm versus Lo), which suggests positive selection on these genes and a possible key role in the ecological adaptation of the species. As we have already described the variability in the accD sequence, we will focus on ycf2, matK, and ccsA variations.

Concerning ccsA, the variations observed, although significant, concern only five amino acids, and modifications do not seem to affect the C-type cytochrome synthase gene function.

Concerning ycf2, our analysis shows that this gene is highly polymorphic, with 256 SNPs that result in 10 deletions, 7 insertions, 21 conservative and 49 non-conservative substitutions in Lo (Figure A8) compared to Lgh and Lpm (which are 100% identical). This gene has been described as variable in other plant species such as Helianthus tuberosus (Zhong et al. 2019).

The matK gene has been used as a universal barcoding locus to enable species discrimination of terrestrial plants (Antil et al. 2023) and is often, together with the rbcL gene, the only known genetic resource for many plants. We therefore propose a phylogenetic tree based on Ludwigia matK sequences (Figure 8). It should however be noted that this tree is based on only the 149 amino acids common to all the sequences (out of the 499 in the complete protein). As only three complete Ludwigia plastomes were available at the time of our study, we cannot specify whether these barcodes reflect the phylogenomic history of Ludwigia as faithfully as the complete plastome. In any case, in this tree, Lo stands apart from the other Ludwigia sp., Lpm and Lgh, and L. grandiflora subsp. hexapetala belongs to the same branch as L. ovalis (an aquatic taxon used in aquariums, Li J et al. 2022), L. stolonifera (native to the Nile, found in a variety of habitats, from freshwater wetlands to brackish and marine waters, Soliman et al. 2018) and L. adscendens (a common weed of rice fields in Asia, Kamoshita et al. 2016). Lpm is in a sister branch, close to L. grandiflora subsp. hexapetala, forming a phylogenetic group corresponding to subsect. Jussiaea (in green, Figure 8).

Figure 7 - The Ka/Ks ratios of the 80 protein-coding genes of Ludwigia plastomes. The blue curve represents L. grandiflora subsp. hexapetala versus L. peploides subsp. montevidensis, purple curve denotes L. grandiflora subsp. hexapetala versus L. octovalvis and green curve L. peploides subsp. montevidensis versus L. octovalvis. Four genes (matK, accD, ycf2 and ccsA) have Ka/Ks ratios greater than 1.0, whereas the Ka/Ks ratios of the other genes were less than 1.0.
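The interpretation of Figure 7 relies on a simple rule: a Ka/Ks ratio above 1 is read as positive selection and a ratio below 1 as purifying selection. The sketch below applies this rule to the ratios quoted in the text. The dictionary `REPORTED` contains only those published values, and the function name `selection_regime` is a hypothetical helper.

```python
# Ka/Ks ratios quoted in the text for the genes exceeding 1
# (keys: gene, pairwise comparison).
REPORTED = {
    ("matK", "Lgh-Lpm"): 1.17,
    ("accD", "Lgh-Lo"): 2.48,
    ("accD", "Lpm-Lo"): 2.16,
    ("ycf2", "Lgh-Lo"): 4.3,
    ("ycf2", "Lpm-Lo"): 4.3,
    ("ccsA", "Lgh-Lo"): 1.4,
    ("ccsA", "Lpm-Lo"): 1.4,
}


def selection_regime(ratio):
    """Conventional reading of a Ka/Ks ratio."""
    if ratio > 1:
        return "positive"
    if ratio < 1:
        return "purifying"
    return "neutral"


positively_selected = sorted(
    {gene for (gene, _), ratio in REPORTED.items()
     if selection_regime(ratio) == "positive"}
)
print(positively_selected)
```

This threshold-based reading is a convention. Ratios estimated from very few substitutions can be unstable, so values close to 1 deserve cautious interpretation.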

Figure 8 - Phylogenetic tree based on Ludwigia MatK protein sequences. Only six Ludwigia sequences are complete (yellow star), the others correspond to amino acids ranging from 128 to 289 aa, with an average of 244 aa. Clades are named and colored regarding the Ludwigia phylogeny proposed by Liu et al. (2017). The sections are based on the works of Raven (1963), Wagner et al. (2007) and Liu et al. (2023). The scale bar indicates the branch length.

Discussion

In the present study, we first sequenced and de novo assembled the chloroplast (cp) genomes of Ludwigia peploides subsp. montevidensis (Lpm) and Ludwigia grandiflora subsp. hexapetala (Lgh), two species belonging to the Onagraceae family. We employed a hybrid strategy and demonstrated the presence of two cp haplotypes in Lgh and one haplotype in Lpm, although the presence of both haplotypes in Lpm is likely. Furthermore, we compared these genomes with those of other species in the Onagraceae family to expand our knowledge of genome organization and molecular evolution in these species.

Our findings show that short reads alone failed to produce complete Ludwigia plastomes, likely because of the challenges posed by long repeats and rearrangements. Conversely, relying solely on long reads resulted in a lower-quality sequence, owing to insufficient coverage and sequencing errors. For the Lgh plastomes, hybrid assembly, which incorporates both long and short reads, produced the most complete assemblies. This approach capitalizes on the advantages of both sequencing technologies, combining the accuracy of short reads with the length of long reads. This result is consistent with studies on other chloroplast genomes, such as those of Eucalyptus (Wang et al. 2018), Falcataria (Anita et al. 2023), Carex (Xu et al. 2023) or Cypripedium (Guo et al. 2021).

In our study, we were able to identify two haplotypes in Lgh, which is a first for Ludwigia (and more broadly within Onagraceae), as the plastome of L. octovalvis was published as a single haplotype (Liu et al. 2016).

Because no sequence data are available for Ludwigia octovalvis, and because the only long reads we have for Ludwigia peploides subsp. montevidensis are not long enough to cover the SSC/IR junctions, we are unable to conclusively establish the presence of these two forms across the Ludwigia genus. However, we believe that they are likely to be present. Structural heteroplasmy, which is expected to be widespread in angiosperms, has largely been overlooked. The existence of two plastome haplotypes has been identified within the order Myrtales (Eucalyptus sp.), in 58 species of Angiosperms (Wang and Lanfear 2019), and in Asparagales (Ophrys apifera orchid, Bateman et al. 2021), Brassicales (Carica papaya, Vasconcellea pubescens, Lin et al. 2020), Solanales (Solanum tuberosum, Lihodeevskiy and Shanina 2022), Laurales (avocado, Persea americana, Nath et al. 2022) and Rhamnaceae (Rhamnus crenata, Wanichthanarak et al. 2023). However, the majority of reference plastomes in the current GenBank database (Release 260: April 15, 2024) are described as a single haplotype, indicating an underrepresentation of structural heteroplasmy in angiosperm chloroplasts. This underscores the importance of sequencing techniques, as the database is predominantly composed of short-read data (98%), which are less effective than long reads or hybrid assemblies at detecting flip-flop phenomena in the LSC region.

The chloroplast genome sizes for the three genera of Onagraceae subfam. Onagroideae varied as follows: Circaea sp. ranged from 155,817 bp to 156,024 bp, Chamaenerion sp. ranged from 159,496 bp to 160,416 bp, and Epilobium sp. ranged from 160,748 bp to 161,144 bp (Luo et al. 2021). Our study revealed that the size of the complete chloroplast genome of Ludwigia (Onagraceae subfamily Ludwigioideae) ranged from 159,396 bp to 159,584 bp (Table 1), which is similar to other Onagraceae plants (average length of 162,030 bp). Furthermore, Ludwigia plastome sizes are consistent with the range observed in Myrtales (between 152,214 and 171,315 bp, Zhang et al. 2021). In the same way, similar overall GC content was found in Ludwigia sp. (from 37.3 to 37.4%), Circaea sp. (37.7 to 37.8%), Chamaenerion sp. and Epilobium sp. (38.1 to 38.2%, Luo et al. 2021) and more generally for the order Myrtales (36.9–38.9%, with an average GC content of 37%, Zhang et al. 2021). The higher GC content of the IR regions (43.5%) found in Ludwigia sp. has already been shown in the Myrtales order (39.7–43.5%) and in other families/orders such as Amaranthaceae (order Caryophyllales, Xu et al. 2020) or Lamiaceae (order Lamiales, Lian et al. 2022), and is mainly due to the presence of the four GC-rich rRNA genes.

The complete chloroplast genomes of the three Ludwigia species encoded an identical set of 134 genes, including 85 protein-coding genes, 37 tRNA genes and eight ribosomal RNAs, consistent with the gene content found in the Myrtales order, where the gene number varies from 123 to 133 genes, with 77–81 protein-coding genes, 29–31 tRNA genes and four rRNA genes (Zhang et al. 2021). Chloroplast genes have been selected during evolution due to their functional importance (Mohanta et al. 2020). In the present study, we found that the matK, accD, ycf2, and ccsA genes were subject to positive selection pressure. These genes have frequently been reported in the literature as being under positive selection and are known to play crucial roles in plant development.

Lgh and Lpm are known to thrive in aquatic environments, where they grow alongside rooted emergent aquatic plants, with their leaves and stems partially submerged during growth (Wagner et al. 2007). Both species are capable of vegetative reproduction, enabling them to establish themselves rapidly in diverse habitats, including terrestrial habitats (Haury et al. 2014b). Additionally, Lo is a wetland plant that typically grows in gullies and at the edges of ponds (Wagner et al. 2007). Given their ability to adapt to different habitats, these species may have evolved specialized mechanisms to cope with various abiotic stresses, such as reduced carbon and oxygen availability or limited access to light in submerged or emergent conditions.

Concerning matK, Barthet and Hilu (2007) demonstrated the relationship between light, developmental stages and MatK maturase activity, suggesting important functions in plant physiology. This gene has recently been widely reported to be under positive selection in an aquatic plant (Anubias sp., Li L et al. 2022) and more generally in terrestrial plants (Pinus sp., Zeb et al. 2022; Chrysosplenium sp., Wu et al. 2020). The accD gene has been described as an essential gene required for leaf development (Kode et al. 2005) and longevity in tobacco (Nicotiana tabacum, Madoka et al. 2002). Under drought stress, plant resistance can be increased by inhibiting accD (Gu et al. 2020); conversely, resistance to flooding stress can be enhanced by upregulating accD accumulation (Bharadwaj et al. 2023). Hence, we can hypothesize that the positive selection observed on the accD gene is explained by the alternating submerged and emerged conditions experienced by Ludwigia species. The ycf2 gene seems to be subject to adaptive evolution in Ludwigia species. Its function, although still unclear, would be to contribute to a protein complex generating ATP for the TIC machinery (protein import into chloroplasts; Kikuchi et al. 2018, Schreier et al. 2018), as well as to plant cell survival (Drescher et al. 2000, Xing et al. 2022). Positive selection on the ccsA gene is found in some aquatic plants such as Anubias sp. (Li L et al. 2022), in marine flowering plants such as Zostera species (Chen et al. 2023), and in some species of Lythraceae (Gu et al. 2020). The ccsA gene is required for cytochrome c biogenesis (Xie and Merchant 1996), and this hemoprotein plays a key role in aerobic and anaerobic respiration, as well as in photosynthesis (Kranz et al. 1998). Furthermore, we showed that Lgh colonization is supported by metabolic adjustments mobilizing glycolysis and fermentation pathways in terrestrial habitats, and the aminoacyl-tRNA biosynthesis pathway, a key component of protein synthesis, in aquatic habitats (Billet et al. 2018). It can be assumed that the ability of Ludwigia to invade aquatic and wet environments, where the amounts of oxygen and light can be variable, leads to a high selective pressure on genes involved in respiration and photosynthesis.

Molecular markers are often used to establish population genetic relationships through phylogenetic studies. Five chloroplast markers (rps16, rpl16, trnL-trnF, trnL-CD, trnG) and two nuclear markers (ITS, waxy) were used in previous phylogenetic studies of Ludwigia sp. (Liu et al. 2017). However, no SSR markers had previously been made available for the Ludwigia genus or, more broadly, for the Onagraceae. In this study, we identified 45 to 65 SSR markers depending on the Ludwigia species. Most of them were A/T mononucleotide repeats, as already recorded for other angiosperms (Maheswari et al. 2021, Zhang et al. 2016). In addition, we identified various genes with highly mutated regions that can also be used as SNP markers. Chloroplast SSRs (cpSSRs) represent potentially useful markers showing high levels of intraspecific variability due to the non-recombinant and uniparental inheritance of the plastomes (Huang et al. 2018, Leontaritou et al. 2021). Chloroplast SSR characteristics for Ludwigia sp. (location, type of SSR) were similar to those described in most plants. While the usual molecular markers used for phylogenetic analysis are nuclear DNA markers, cpSSRs have also been used to explore cytoplasmic diversity in many studies (Snoussi et al. 2022, Song et al. 2014, Wheeler et al. 2014). To conclude, the 13 highly variable loci and the cpSSRs identified in this study are potential markers for population genetics or phylogenetic studies of Ludwigia species and, more generally, of Onagraceae.
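To make the notion of SSR detection concrete, the sketch below shows one simple way to scan a sequence for perfect microsatellites with regular expressions and to measure the share of motifs made only of A and T. It is an illustrative approximation, not the tool or the settings used in this study: the minimum repeat counts in MIN_REPEATS, the function names and the toy sequence are all assumptions.

```python
import re

# Minimum number of repeats per motif length (1 = mononucleotide, ...).
# These thresholds are illustrative assumptions only.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return a sorted list of (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) copies.
        pattern = re.compile(
            r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1)
        )
        for match in pattern.finditer(seq):
            motif = match.group(2)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), already reported at that shorter length.
            if any(motif == motif[:k] * (unit_len // k)
                   for k in range(1, unit_len) if unit_len % k == 0):
                continue
            hits.append((match.start(), motif,
                         len(match.group(1)) // unit_len))
    return sorted(hits)


def at_only_fraction(hits):
    """Fraction of detected SSRs whose motif contains only A and/or T."""
    if not hits:
        return 0.0
    at_only = sum(1 for _, motif, _ in hits if set(motif) <= {"A", "T"})
    return at_only / len(hits)


# Toy sequence: a 12-bp poly-A run and six AT repeats separated by spacers.
toy_sequence = "GGC" + "A" * 12 + "CGC" + "AT" * 6 + "GCC"
toy_hits = find_ssrs(toy_sequence)
```

Dedicated tools additionally handle compound and interrupted repeats, so counts from this sketch would not necessarily match theirs.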

Concerning the MatK-based phylogenetic tree, its topology is generally congruent with the first molecular classification of Liu et al. (2017), as all Ludwigia from sect. Jussiaea (clade B1), sect. Ludwigia (clade A1) and sect. Isnardia (clade A2) branched together. In this MatK-based tree, Ludwigia prostrata, a species absent from previously published phylogenetic studies, is positioned alone at the root of the Ludwigia tree. This species, the sole member of section Nematopyxis, has been reported as having no close relatives (Raven and Tai 1979), a finding supported by our work. We also observed that Ludwigia ovalis branches within sect. Jussiaea, as its 258-amino-acid partial MatK sequence (ca. half of the complete sequence) is identical to the MatK proteins of L. grandiflora subsp. hexapetala, L. stolonifera and L. adscendens. Its phylogenetic placement remains unresolved: it was classified alone in sect. Miquelia by Raven (1963) and Wagner et al. (2007), and later positioned by Liu et al. (2017) within the Isnardia-Microcarpium section (using nuclear DNA) or as sister to it (using plastid DNA). For this reason, a whole plastome analysis would be valuable to clarify the phylogenetic position of L. ovalis. Another species positioned on the margins of sect. Isnardia (clade A2) is Ludwigia suffruticosa (previously classified in sect. Microcarpium), which branches within sect. Ludwigia (clade A1). This positioning raises questions about the current grouping of sections Isnardia, Miquelia, and Microcarpium into a single section Isnardia as proposed by Liu et al. (2023) and highlights that plastid protein-coding markers can provide differing phylogenetic insights. Finally, the last species positioned differently from its clade (clade B4) is Ludwigia decurrens (sect. Pterocaulon), which clusters with L. leptocarpa (clade B3) and L. bonariensis (clade B4a). However, it is important to note that Liu et al. (2017) indicate that clade B4 is moderately supported and that the two members of sect. Pterocaulon, L. decurrens and L. nervosa, diverge in all trees. In summary, acquiring complete plastomes for Ludwigia sp. could significantly enhance our understanding of the phylogeny of this complex genus. Furthermore, comparing nuclear and plastid phylogenies would help determine whether they reflect the same evolutionary history and whether plastid phylogeny alone can accurately reconstruct the phylogeny of the Ludwigia genus.

Conclusion

In this study, we sequenced and assembled, for the first time, the complete plastomes of Lpm and Lgh, which are the only genomic resources available for functional analysis in both species. We identified two haplotypes in Lgh, but further investigations will be necessary to confirm their presence in Lo and Lpm and, more broadly, within the Ludwigia genus. Comparison of all 10 Onagraceae plastomes revealed a high degree of conservation in genome size, gene number, structure, and IR boundaries. However, to further elucidate the phylogeny and evolution of Ludwigia and Onagraceae, additional chloroplast genomes will be necessary, as highlighted in recent studies of Iris and Aristidoideae species (Feng et al. 2022).

Availability of data and materials

The datasets generated and/or analysed during the current study are available in GenBank (Lgh haplotype 1 (LGH1): OR166254; Lgh haplotype 2 (LGH2): OR166255; Lpm haplotype (LPM): OR166256). Chloroplast short and long reads are available in the EBI-ENA database (https://www.ebi.ac.uk/ena/browser/home) under the following accession numbers: for LGH plastomes, long reads (Experiment: ERX13439011; Run: ERR14035997) and short reads (Experiment: ERX13439002; Run: ERR14035988); for LPM plastomes, long reads (Experiment: ERX13439014; Run: ERR14036000).

Conflict of interest disclosure

The authors declare that they comply with the PCI rule of having no financial conflicts of interest in relation to the content of the article.

Funding

The post-doctoral research grant of Anne-Laure Le Gac was supported by the Conseil regional Bretagne (SAD18001).

Acknowledgements

Preprint version v4 of this article has been peer-reviewed and recommended by Peer Community In Genomics (https://doi.org/10.24072/pci.genomics.100334; Sabot, 2025). We are grateful to Luis Portillo-Lemus for developing the high molecular weight genomic DNA extraction protocol. All sequencing experiments were performed at the PGTB (https://doi.org/10.15454/1.5572396583599417E12).

Appendix

Figure A1 - QUAST evaluation of performance of the four assembly tools (using corrected or uncorrected SRs). A: Comparison of plastome fraction, duplication rate and size of the largest alignment obtained. B: Comparison of classic metrics (NGA50 and LGA50), number of errors (misassemblies and mismatches) produced.

Figure A2 - BANDAGE visualization of the L. grandiflora plastome assembly graphs on corrected or uncorrected SRs. Contigs are colored according to their BLAST match to the LSC (blue), SSC (green), and IR (red) segments.

Figure A3 - Graphs representing the assemblies of L. grandiflora long reads. A: Contigs are represented in light blue and the three segments (LSC, SSC and IR) in dark blue, green and yellow, respectively. B: Comparative effectiveness of CANU and RATATOSK correctors.

Figure A4 - Comparison of LSC, SSC and IR sizes in the Onagraceae. A: Comparison of the sizes of LSC, SSC and IR segments in the Onagraceae family (Chamaenerion in blue, Circaea in yellow, Epilobium in dark purple, Ludwigia in light green and Oenothera in dark green). B: Maximum likelihood tree built using RAxML (model GTR-GAMMA, algorithm Rapid Hill-climbing) on a multiple sequence alignment of Onagraceae plastomes made using MAFFT. C: Average size of the different chloroplast segments (LSC, SSC and IR) for the five genera of Onagraceae. IR size corresponds to the sum of the two copies.

Figure A5 - Comparative analysis of simple sequence repeats (SSRs) in Ludwigia chloroplast genomes. A: SSR numbers detected in the three species, by repeat class (mono-, di-, tri-, tetra- and pentanucleotides). B: Frequency of SSR motifs by repeat class. C: Frequency of SSRs in LSC, SSC and IR regions. D: Distribution of SSRs in intergenic, protein-coding and intronic regions.

Figure A6 - Diagram showing the position of tandem repeats in the accD gene of L. octovalvis (in red) and of L. peploides and L. grandiflora (in green). The diagram also shows the consequence of these repeats at the protein level, namely the insertion of amino acids that are themselves repeated.

Figure A7 - Comparison of the three Ludwigia plastomes using mVISTA, with L. octovalvis as a reference. A: The y-axis represents the identity percentage (between 50 and 100%). The arrows show the genes (in green: protein-coding genes, in purple: rRNAs and in fuchsia: tRNAs). Blue blocks indicate exonic regions. LSC, IR and SSC regions are also distinguished (in dark blue, red and green, respectively). The second line corresponds to L. grandiflora haplotype 2 (for this haplotype, the SSC segment is oriented as in L. octovalvis) and the third line corresponds to L. peploides, for which the SSC region has been artificially oriented in the same way as in the two other plastomes to allow comparison. B: Inset showing part of the alignment and illustrating the consequences of not artificially orienting the SSC segments in the same direction for the analysis.

Figure A8 - Lollipop diagram showing SNPs and their translational effects on the ycf2 gene. A: Location of the 256 single nucleotide polymorphisms (SNPs) observed by comparing L. grandiflora-L. peploides with L. octovalvis. Two regions particularly dense in SNPs (between 3420 and 3460 and between 6100 and 6600) have been enlarged for readability. B: Effect of these SNPs on the translated sequence of L. octovalvis, compared to Ycf2 of the other two species: non-conservative mutation, red square; conservative mutation, green circle; deletion, blue upward-pointing triangle; insertion, orange downward-pointing triangle. As in A, two regions were enlarged in order to distinguish each mutation.


References

[1] Amiryousefi, A.; Hyvönen, J.; Poczai, P. IRscope: an online program to visualize the junction sites of chloroplast genomes, Bioinformatics, Volume 34 (2018) no. 17 | DOI

[2] Anita, V. P. D.; Matra, D. D.; Siregar, U. J. Chloroplast genome draft assembly of Falcataria moluccana using hybrid sequencing technology, BMC Research Notes, Volume 16 (2023) no. 1 | DOI

[3] Antil, S.; Abraham, J. S.; Sripoorna, S.; Maurya, S.; Dagar, J.; Makhija, S.; Bhagat, P.; Gupta, R.; Sood, U.; Lal, R.; Toteja, R. DNA barcoding, an effective tool for species identification: a review, Molecular Biology Reports, Volume 50 (2023) no. 1, pp. 761-775 | DOI

[4] Bankevich, A.; Nurk, S.; Antipov, D.; Gurevich, A. A.; Dvorkin, M.; Kulikov, A. S.; Lesin, V. M.; Nikolenko, S. I.; Pham, S.; Prjibelski, A. D.; Pyshkin, A. V.; Sirotkin, A. V.; Vyahhi, N.; Tesler, G.; Alekseyev, M. A.; Pevzner, P. A. SPAdes: A new genome assembly algorithm and its applications to single-cell sequencing, Journal of Computational Biology, Volume 19 (2012) no. 5 | DOI

[5] Barloy, D.; Portillo-Lemus, L.; Krueger-Hadfield, S.; Huteau, V.; Coriton, O. Genomic relationships among diploid and polyploid species of the genus Ludwigia L. section Jussiaea using a combination of molecular cytogenetic, morphological, and crossing investigations, Peer Community Journal, Volume 4 (2024) | DOI

[6] Barthet, M. M.; Hilu, K. W. Expression of matK: Functional and evolutionary implications, American Journal of Botany, Volume 94 (2007) no. 8 | DOI

[7] Bateman, R. M.; Rudall, P. J.; Murphy, A. R. M.; Cowan, R. S.; Devey, D. S.; Peréz-Escobar, O. A. Whole plastomes are not enough: Phylogenomic and morphometric exploration at multiple demographic levels of the bee orchid clade Ophrys sect. Sphegodes, Journal of Experimental Botany, Volume 72 (2021) no. 2 | DOI

[8] Bedoya, A. M.; Madriñán, S. Evolution of the aquatic habit in Ludwigia (Onagraceae): Morpho-anatomical adaptive strategies in the Neotropics, Aquatic Botany, Volume 120 (2015) no. PB | DOI

[9] Beier, S.; Thiel, T.; Münch, T.; Scholz, U.; Mascher, M. MISA-web: A web server for microsatellite prediction, Bioinformatics, Volume 33 (2017) no. 16 | DOI

[10] Belser, C.; Istace, B.; Denis, E.; Dubarry, M.; Baurens, F.-C.; Falentin, C.; Genete, M.; Berrabah, W.; Chèvre, A.-M.; Delourme, R.; Deniot, G.; Denoeud, F.; Duffé, P.; Engelen, S.; Lemainque, A.; Manzanares-Dauleux, M.; Martin, G.; Morice, J.; Noel, B.; Vekemans, X.; D’Hont, A.; Rousseau-Gueutin, M.; Barbe, V.; Cruaud, C.; Wincker, P.; Aury, J.-M. Chromosome-scale assemblies of plant genomes using nanopore long reads and optical maps, Nature Plants, Volume 4 (2018) no. 11, pp. 879-887 | DOI

[11] Benson, G. Tandem repeats finder: A program to analyze DNA sequences, Nucleic Acids Research, Volume 27 (1999) no. 2 | DOI

[12] Bharadwaj, B.; Mishegyan, A.; Nagalingam, S.; Guenther, A.; Joshee, N.; Sherman, S. H.; Basu, C. Physiological and genetic responses of lentil (Lens culinaris) under flood stress, Plant Stress, Volume 7 (2023) | DOI

[13] Billet, K.; Genitoni, J.; Bozec, M.; Renault, D.; Barloy, D. Aquatic and terrestrial morphotypes of the aquatic invasive plant, Ludwigia grandiflora, show distinct morphological and metabolomic responses, Ecology and Evolution, Volume 8 (2018) no. 5 | DOI

[14] Brudno, M.; Do, C. B.; Cooper, G. M.; Kim, M. F.; Davydov, E.; Program, N. C. S.; Green, E. D.; Sidow, A.; Batzoglou, S. LAGAN and Multi-LAGAN: Efficient Tools for Large-Scale Multiple Alignment of Genomic DNA, Genome Research, Volume 13 (2003) no. 4, pp. 721-731 | DOI

[15] Chen, C.; Chen, H.; Zhang, Y.; Thomas, H. R.; Frank, M. H.; He, Y.; Xia, R. TBtools: An Integrative Toolkit Developed for Interactive Analyses of Big Biological Data, Molecular Plant, Volume 13 (2020) no. 8, pp. 1194-1202 | DOI

[16] Chen, J.; Zang, Y.; Shang, S.; Yang, Z.; Liang, S.; Xue, S.; Wang, Y.; Tang, X. Chloroplast genomic comparison provides insights into the evolution of seagrasses, BMC Plant Biology, Volume 23 (2023) no. 1 | DOI

[17] Chen, S.; Zhou, Y.; Chen, Y.; Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor, Bioinformatics, Volume 34 (2018) no. 17 | DOI

[18] Chikhi, R.; Medvedev, P. Informed and automated k-mer size selection for genome assembly, Bioinformatics, Volume 30 (2014) no. 1 | DOI

[20] Dandelot, S.; Verlaque, R.; Dutartre, A.; Cazaubon, A. Ecological, Dynamic and Taxonomic Problems Due to Ludwigia (Onagraceae) in France, Hydrobiologia, Volume 551 (2005) no. 1, pp. 131-136 | DOI

[21] Dierckxsens, N.; Mardulyn, P.; Smits, G. NOVOPlasty: de novo assembly of organelle genomes from whole genome data, Nucleic Acids Research (2017) | DOI

[22] Drescher, A.; Ruf, S.; Calsa, T.; Carrer, H.; Bock, R. The two largest chloroplast genome‐encoded open reading frames of higher plants are essential genes, The Plant Journal, Volume 22 (2000) no. 2, pp. 97-104 | DOI

[23] Eyde, R. H. Reproductive Structures and Evolution in Ludwigia (Onagraceae). I. Androecium, Placentation, Merism, Annals of the Missouri Botanical Garden, Volume 64 (1977) no. 3, pp. 644-655 | DOI

[24] Eyde, R. H. Reproductive Structures and Evolution in Ludwigia (Onagraceae). III. Vasculature, Nectaries, Conclusions, Annals of the Missouri Botanical Garden, Volume 68 (1981) no. 3, pp. 379-412 | DOI

[25] Feng, J.-l.; Wu, L.-w.; Wang, Q.; Pan, Y.-j.; Li, B.-l.; Lin, Y.-l.; Yao, H. Comparison Analysis Based on Complete Chloroplast Genomes and Insights into Plastid Phylogenomic of Four Iris Species, BioMed Research International, Volume 2022 (2022) no. 1 | DOI

[26] Ferrarini, M.; Moretto, M.; Ward, J. A.; Šurbanovski, N.; Stevanović, V.; Giongo, L.; Viola, R.; Cavalieri, D.; Velasco, R.; Cestaro, A.; Sargent, D. J. An evaluation of the PacBio RS platform for sequencing and de novo assembly of a chloroplast genome, BMC Genomics, Volume 14 (2013) no. 1 | DOI

[27] Filip, E.; Skuza, L. Horizontal Gene Transfer Involving Chloroplasts, International Journal of Molecular Sciences, Volume 22 (2021) no. 9 | DOI

[28] Frazer, K. A.; Pachter, L.; Poliakov, A.; Rubin, E. M.; Dubchak, I. VISTA: computational tools for comparative genomics, Nucleic Acids Research, Volume 32 (2004) | DOI

[29] Gioria, M.; Hulme, P. E.; Richardson, D. M.; Pyšek, P. Why Are Invasive Plants Successful?, Annual Review of Plant Biology, Volume 74 (2023) no. 1, pp. 635-670 | DOI

[30] Greiner, S.; Lehwark, P.; Bock, R. OrganellarGenomeDRAW (OGDRAW) version 1.3.1: expanded toolkit for the graphical visualization of organellar genomes, Nucleic Acids Research, Volume 47 (2019) no. W1 | DOI

[31] Grewell, B. J.; Netherland, M. D.; Thomason, M. J. S. Establishing research and management priorities for invasive water primroses (Ludwigia spp.), Aquatic Plant Control Research Program, US Army Corps of Engineers, Engineer Research and Development Center, Environmental Laboratory Technical Report ERDC/ELTR-15-X (2016) no. February

[32] Gu, H.; Wang, Y.; Xie, H.; Qiu, C.; Zhang, S.; Xiao, J.; Li, H.; Chen, L.; Li, X.; Ding, Z. Drought stress triggers proteomic changes involving lignin, flavonoids and fatty acids in tea plants, Scientific Reports, Volume 10 (2020) no. 1 | DOI

[33] Gualberto, J. M.; Mileshina, D.; Wallet, C.; Niazi, A. K.; Weber-Lotfi, F.; Dietrich, A. The plant mitochondrial genome: Dynamics and maintenance, Biochimie, Volume 100 (2014), pp. 107-120 | DOI

[34] Guo, Y.-Y.; Yang, J.-X.; Li, H.-K.; Zhao, H.-S. Chloroplast Genomes of Two Species of Cypripedium: Expanded Genome Size and Proliferation of AT-Biased Repeat Sequences, Frontiers in Plant Science, Volume 12 (2021) | DOI

[35] Gurdon, C.; Maliga, P. Two Distinct Plastid Genome Configurations and Unprecedented Intraspecies Length Variation in the accD Coding Region in Medicago truncatula, DNA Research, Volume 21 (2014) no. 4, pp. 417-427 | DOI

[36] Gurevich, A.; Saveliev, V.; Vyahhi, N.; Tesler, G. QUAST: Quality assessment tool for genome assemblies, Bioinformatics, Volume 29 (2013) no. 8, pp. 1072-1075 | DOI

[37] Gurusaran, M.; Ravella, D.; Sekar, K. RepEx: Repeat extractor for biological sequences, Genomics, Volume 102 (2013) no. 4, pp. 403-408 | DOI

[38] Haury, J.; Druel, A.; Cabral, T.; Paulet, Y.; Bozec, M.; Coudreuse, J. Which adaptations of some invasive Ludwigia spp. (Rosidae, Onagraceae) populations occur in contrasting hydrological conditions in Western France?, Hydrobiologia, Volume 737 (2014) no. 1, pp. 45-56 | DOI

[39] Hoch, P. C.; Wagner, W. L.; Raven, P. The correct name for a section of Ludwigia L. (Onagraceae), PhytoKeys, Volume 50 (2015), pp. 31-34 | DOI

[40] Holley, G.; Beyter, D.; Ingimundardottir, H.; Møller, P. L.; Kristmundsdottir, S.; Eggertsson, H. P.; Halldorsson, B. V. Ratatosk: hybrid error correction of long reads enables accurate variant calling and assembly, Genome Biology, Volume 22 (2021) no. 1 | DOI

[41] Hu, Y.; Zhang, Q.; Rao, G.; Sodmergen Occurrence of Plastids in the Sperm Cells of Caprifoliaceae: Biparental Plastid Inheritance in Angiosperms is Unilaterally Derived from Maternal Inheritance, Plant and Cell Physiology, Volume 49 (2008) no. 6, pp. 958-968 | DOI

[42] Huang, L.; Sun, Y.; Jin, Y.; Gao, Q.; Hu, X.; Gao, F.; Yang, X.; Zhu, J.; El‐Kassaby, Y. A.; Mao, J. Development of high transferability cpSSR markers for individual identification and genetic investigation in Cupressaceae species, Ecology and Evolution, Volume 8 (2018) no. 10, pp. 4967-4977 | DOI

[43] Hussner, A.; Windhaus, M.; Starfinger, U. From weed biology to successful control: an example of successful management of Ludwigia grandiflora in Germany, Weed Research, Volume 56 (2016) no. 6, pp. 434-441 | DOI

[44] Jackman, S. D.; Vandervalk, B. P.; Mohamadi, H.; Chu, J.; Yeo, S.; Hammond, S. A.; Jahesh, G.; Khan, H.; Coombe, L.; Warren, R. L.; Birol, I. ABySS 2.0: resource-efficient assembly of large genomes using a Bloom filter, Genome Research, Volume 27 (2017) no. 5, pp. 768-777 | DOI

[45] Jain, M.; Koren, S.; Miga, K. H.; Quick, J.; Rand, A. C.; Sasani, T. A.; Tyson, J. R.; Beggs, A. D.; Dilthey, A. T.; Fiddes, I. T.; Malla, S.; Marriott, H.; Nieto, T.; O'Grady, J.; Olsen, H. E.; Pedersen, B. S.; Rhie, A.; Richardson, H.; Quinlan, A. R.; Snutch, T. P.; Tee, L.; Paten, B.; Phillippy, A. M.; Simpson, J. T.; Loman, N. J.; Loose, M. Nanopore sequencing and assembly of a human genome with ultra-long reads, Nature Biotechnology, Volume 36 (2018) no. 4, pp. 338-345 | DOI

[46] Jin, J.-J.; Yu, W.-B.; Yang, J.-B.; Song, Y.; dePamphilis, C. W.; Yi, T.-S.; Li, D.-Z. GetOrganelle: a fast and versatile toolkit for accurate de novo assembly of organelle genomes, Genome Biology, Volume 21 (2020) no. 1 | DOI

[47] Jones, K.; Cleland, R. E. Oenothera, Cytogenetics and Evolution, Kew Bulletin, Volume 29 (1974) no. 1 | DOI

[48] Kamoshita, A.; Ikeda, H.; Yamagishi, J.; Lor, B.; Ouk, M. Residual effects of cultivation methods on weed seed banks and weeds in Cambodia, Weed Biology and Management, Volume 16 (2016) no. 3, pp. 93-107 | DOI

[49] Katoh, K. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform, Nucleic Acids Research, Volume 30 (2002) no. 14, pp. 3059-3066 | DOI

[50] Kikuchi, S.; Asakura, Y.; Imai, M.; Nakahira, Y.; Kotani, Y.; Hashiguchi, Y.; Nakai, Y.; Takafuji, K.; Bédard, J.; Hirabayashi-Ishioka, Y.; Mori, H.; Shiina, T.; Nakai, M. A Ycf2-FtsHi Heteromeric AAA-ATPase Complex Is Required for Chloroplast Protein Import, The Plant Cell, Volume 30 (2018) no. 11, pp. 2677-2703 | DOI

[51] Kode, V.; Mudd, E. A.; Iamtham, S.; Day, A. The tobacco plastid accD gene is essential and is required for leaf development, The Plant Journal, Volume 44 (2005) no. 2, pp. 237-244 | DOI

[52] Kolmogorov, M.; Yuan, J.; Lin, Y.; Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs, Nature Biotechnology, Volume 37 (2019) no. 5, pp. 540-546 | DOI

[53] Konishi, T.; Sasaki, Y. Compartmentalization of two forms of acetyl-CoA carboxylase in plants and the origin of their tolerance toward herbicides., Proceedings of the National Academy of Sciences, Volume 91 (1994) no. 9, pp. 3598-3601 | DOI

[54] Koren, S.; Walenz, B. P.; Berlin, K.; Miller, J. R.; Bergman, N. H.; Phillippy, A. M. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation, Genome Research, Volume 27 (2017) no. 5, pp. 722-736 | DOI

[55] Kranz, R.; Lill, R.; Goldman, B.; Bonnard, G.; Merchant, S. Molecular mechanisms of cytochrome c biogenesis: three distinct systems, Molecular Microbiology, Volume 29 (1998) no. 2, pp. 383-396 | DOI

[56] Lambert, E.; Dutartre, A.; Coudreuse, J.; Haury, J. Relationships between the biomass production of invasive Ludwigia species and physical properties of habitats in France, Hydrobiologia, Volume 656 (2010) no. 1, pp. 173-186 | DOI

[57] Lehwark, P.; Greiner, S. GB2sequin - A file converter preparing custom GenBank files for database submission, Genomics, Volume 111 (2019) no. 4, pp. 759-761 | DOI

[58] Leontaritou, P.; Lamari, F. N.; Papasotiropoulos, V.; Iatrou, G. Exploration of genetic, morphological and essential oil variation reveals tools for the authentication and breeding of Salvia pomifera subsp. calycina (Sm.) Hayek, Phytochemistry, Volume 191 (2021) | DOI

[59] Levin, R. A.; Wagner, W. L.; Hoch, P. C.; Hahn, W. J.; Rodriguez, A.; Baum, D. A.; Katinas, L.; Zimmer, E. A.; Sytsma, K. J. Paraphyly in Tribe Onagreae: Insights into Phylogenetic Relationships of Onagraceae Based on Nuclear and Chloroplast Sequence Data, Systematic Botany, Volume 29 (2004) no. 1, pp. 147-164 | DOI

[60] Levin, R. A.; Wagner, W. L.; Hoch, P. C.; Nepokroeff, M.; Pires, J. C.; Zimmer, E. A.; Sytsma, K. J. Family‐level relationships of Onagraceae based on chloroplast rbcL and ndhF data, American Journal of Botany, Volume 90 (2003) no. 1, pp. 107-115 | DOI

[61] Li, D.; Luo, R.; Liu, C.-M.; Leung, C.-M.; Ting, H.-F.; Sadakane, K.; Yamashita, H.; Lam, T.-W. MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices, Methods, Volume 102 (2016), pp. 3-11 | DOI

[62] Li, J.; Su, Y.; Wang, T. The Repeat Sequences and Elevated Substitution Rates of the Chloroplast accD Gene in Cupressophytes, Frontiers in Plant Science, Volume 9 (2018) | DOI

[63] Li, J.; Wang, Y.; Cui, J.; Wang, W.; Liu, X.; Chang, Y.; Yao, D.; Cui, J. Removal effects of aquatic plants on high-concentration phosphorus in wastewater during summer, Journal of Environmental Management, Volume 324 (2022) | DOI

[64] Li, L.; Liu, C.; Hou, K.; Liu, W. Comparative Analyses of Plastomes of Four Anubias (Araceae) Taxa, Tropical Aquatic Plants Endemic to Africa, Genes, Volume 13 (2022) no. 11 | DOI

[65] Lian, C.; Yang, H.; Lan, J.; Zhang, X.; Zhang, F.; Yang, J.; Chen, S. Comparative analysis of chloroplast genomes reveals phylogenetic relationships and intraspecific variation in the medicinal plant Isodon rubescens, PLOS ONE, Volume 17 (2022) no. 4 | DOI

[66] Lihodeevskiy, G. A.; Shanina, E. P. The Use of Long-Read Sequencing to Study the Phylogenetic Diversity of the Potato Varieties Plastome of the Ural Selection, Agronomy, Volume 12 (2022) no. 4 | DOI

[67] Lin, Z.; Zhou, P.; Ma, X.; Deng, Y.; Liao, Z.; Li, R.; Ming, R. Comparative analysis of chloroplast genomes in Vasconcellea pubescens A.DC. and Carica papaya L., Scientific Reports, Volume 10 (2020) no. 1 | DOI

[68] Liu, S. H.; Edwards, C.; Hoch, P. C.; Raven, P. H.; Barber, J. C. Complete plastome sequence of Ludwigia octovalvis (Onagraceae), a globally distributed wetland plant, Genome Announcements, Volume 4 (2016) no. 6 | DOI

[69] Liu, S. H.; Hung, K. H.; Hsu, T. W.; Hoch, P. C.; Peng, C. I.; Chiang, T. Y. New insights into polyploid evolution and dynamic nature of Ludwigia section Isnardia (Onagraceae), Botanical Studies, Volume 64 (2023) no. 1 | DOI

[70] Liu, S.-H.; Hoch, P. C.; Diazgranados, M.; Raven, P. H.; Barber, J. C. Multi‐locus phylogeny of Ludwigia (Onagraceae): Insights on infra‐generic relationships and the current classification of the genus, TAXON, Volume 66 (2017) no. 5, pp. 1112-1127 | DOI

[71] Liu, S.-H.; Yang, H.-A.; Kono, Y.; Hoch, P. C.; Barber, J. C.; Peng†, C.-I.; Chung, K.-F. Disentangling Reticulate Evolution of North Temperate Haplostemonous Ludwigia (Onagraceae), Annals of the Missouri Botanical Garden, Volume 105 (2020) no. 2, pp. 163-182 | DOI

[72] Luo, Y.; He, J.; Lyu, R.; Xiao, J.; Li, W.; Yao, M.; Pei, L.; Cheng, J.; Li, J.; Xie, L. Comparative Analysis of Complete Chloroplast Genomes of 13 Species in Epilobium, Circaea, and Chamaenerion and Insights Into Phylogenetic Relationships of Onagraceae, Frontiers in Genetics, Volume 12 (2021) | DOI

[73] Madoka, Y.; Tomizawa, K.-I.; Mizoi, J.; Nishida, I.; Nagano, Y.; Sasaki, Y. Chloroplast Transformation with Modified accD Operon Increases Acetyl-CoA Carboxylase and Causes Extension of Leaf Longevity and Increase in Seed Yield in Tobacco, Plant and Cell Physiology, Volume 43 (2002) no. 12, pp. 1518-1525 | DOI

[74] Maheswari, P.; Kunhikannan, C.; Yasodha, R. Chloroplast genome analysis of Angiosperms and phylogenetic relationships among Lamiaceae members with particular reference to teak (Tectona grandis L.f), Journal of Biosciences, Volume 46 (2021) no. 2 | DOI

[75] Marks, R. A.; Hotaling, S.; Frandsen, P. B.; VanBuren, R. Representation and participation across 20 years of plant genome sequencing, Nature Plants, Volume 7 (2021) no. 12, pp. 1571-1578 | DOI

[76] Mohanta, T. K.; Mishra, A. K.; Khan, A.; Hashem, A.; Abd_Allah, E. F.; Al-Harrasi, A. Gene Loss and Evolution of the Plastome, Genes, Volume 11 (2020) no. 10 | DOI

[77] Moravcová, L.; Pyšek, P.; Jarošík, V.; Pergl, J. Getting the Right Traits: Reproductive and Dispersal Characteristics Predict the Invasiveness of Herbaceous Plant Species, PLOS ONE, Volume 10 (2015) no. 4 | DOI

[78] Nath, O.; Fletcher, S. J.; Hayward, A.; Shaw, L. M.; Masouleh, A. K.; Furtado, A.; Henry, R. J.; Mitter, N. A haplotype resolved chromosomal level avocado genome allows analysis of novel avocado genes, Horticulture Research, Volume 9 (2022) | DOI

[79] Nesom, G. L.; Kartesz, J. T. Observations on the Ludwigia uruguayensis complex (Onagraceae) in the United States, Castanea, Volume 65 (2000) no. 2, pp. 123-125

[80] Nikolenko, S. I.; Korobeynikov, A. I.; Alekseyev, M. A. BayesHammer: Bayesian clustering for error correction in single-cell sequencing, BMC Genomics, Volume 14 (2013) | DOI

[81] Oldenburg, D. J.; Bendich, A. J. The linear plastid chromosomes of maize: terminal sequences, structures, and implications for DNA replication, Current Genetics, Volume 62 (2016) no. 2 | DOI

[82] Ou, J.; Zhu, L. J. trackViewer: a Bioconductor package for interactive and integrative visualization of multi-omics data, 2019 | DOI

[83] Panova, M.; Aronsson, H.; Cameron, R. A.; Dahl, P.; Godhe, A.; Lind, U.; Ortega-Martinez, O.; Pereyra, R.; Tesson, S. V. M.; Wrange, A.-L.; Blomberg, A.; Johannesson, K. DNA Extraction Protocols for Whole-Genome Sequencing in Marine Organisms, Methods in Molecular Biology, Springer New York, New York, NY, 2016, pp. 13-44 | DOI

[84] R Core Team RStudio: Integrated Development Environment for R, 2015

[85] Rang, F. J.; Kloosterman, W. P.; de Ridder, J. From squiggle to basepair: computational approaches for improving nanopore sequencing read accuracy, Genome Biology, Volume 19 (2018) no. 1 | DOI

[86] Raven, P. The Old World species of Ludwigia including Jussia, Reinwardtia, Volume 6 (1963), pp. 327-427

[87] Raven, P. H.; Tai, W. Observations of Chromosomes in Ludwigia (Onagraceae), Annals of the Missouri Botanical Garden, Volume 66 (1979), pp. 862-879

[88] Reddy, A. M.; Pratt, P. D.; Grewell, B. J.; Harms, N. E.; Walsh, G. C.; Hern, M. C.; Faltlhauser, A.; Cibils-Stewart, X. Biological control of invasive water primroses, Ludwigia spp., in the United States: A feasibility assessment, Journal of Aquatic Plant Management, Volume 59 (2021), pp. 67-77

[89] Richardson, A. O.; Palmer, J. D. Horizontal gene transfer in plants, Journal of Experimental Botany, Volume 58 (2006) no. 1, pp. 1-9 | DOI

[90] Rozas, J.; Rozas, R. DnaSP version 3: an integrated program for molecular population genetics and molecular evolution analysis., Bioinformatics, Volume 15 (1999) no. 2, pp. 174-175 | DOI

[91] Rozas, J.; Ferrer-Mata, A.; Sánchez-DelBarrio, J. C.; Guirao-Rico, S.; Librado, P.; Ramos-Onsins, S. E.; Sánchez-Gracia, A. DnaSP 6: DNA Sequence Polymorphism Analysis of Large Data Sets, Molecular Biology and Evolution, Volume 34 (2017) no. 12, pp. 3299-3302 | DOI

[92] Sabot, F. Onagre, monster, invasion and genetics, Peer Community in Genomics, Volume 100334 (2025) | DOI

[93] Sato, N. Are Cyanobacteria an Ancestor of Chloroplasts or Just One of the Gene Donors for Plants and Algae?, Genes, Volume 12 (2021) no. 6 | DOI

[94] Scheunert, A.; Dorfner, M.; Lingl, T.; Oberprieler, C. Can we use it? On the utility of de novo and reference-based assembly of Nanopore data for plant plastome sequencing, PLOS ONE, Volume 15 (2020) no. 3 | DOI

[95] Schmitz, U. K.; Kowallik, K.-V. Plastid inheritance in Epilobium, Current Genetics, Volume 11 (1986) no. 1, pp. 1-5 | DOI

[96] Schreier, T. B.; Cléry, A.; Schläfli, M.; Galbier, F.; Stadler, M.; Demarsy, E.; Albertini, D.; Maier, B. A.; Kessler, F.; Hörtensteiner, S.; Zeeman, S. C.; Kötting, O. Plastidial NAD-Dependent Malate Dehydrogenase: A Moonlighting Protein Involved in Early Chloroplast Development through Its Interaction with an FtsH12-FtsHi Protease Complex, The Plant Cell, Volume 30 (2018) no. 8, pp. 1745-1769 | DOI

[97] Simpson, J. T.; Wong, K.; Jackman, S. D.; Schein, J. E.; Jones, S. J.; Birol, İ. ABySS: A parallel assembler for short read sequence data, Genome Research, Volume 19 (2009) no. 6, pp. 1117-1123 | DOI

[98] Snoussi, M.; Riahi, L.; Ben Romdhane, M.; Mliki, A.; Zoghlami, N. Chloroplast DNA Diversity of Tunisian Barley Landraces as Revealed by cpSSRs Molecular Markers and Implication for Conservation Strategies, Genetics Research, Volume 2022 (2022), pp. 1-7 | DOI

[99] Soliman, A.; Hamed, A.; Hamdy, R. Ludwigia stolonifera, insight into its phenotypic plasticity, habitat diversity and associated species, Egyptian Journal of Botany (2018) | DOI

[100] Song, S.-L.; Lim, P.-E.; Phang, S.-M.; Lee, W.-W.; Hong, D. D.; Prathep, A. Development of chloroplast simple sequence repeats (cpSSRs) for the intraspecific study of Gracilaria tenuistipitata (Gracilariales, Rhodophyta) from different populations, BMC Research Notes, Volume 7 (2014) no. 1 | DOI

[101] Stamatakis, A. RAxML version 8: A tool for phylogenetic analysis and post-analysis of large phylogenies, Bioinformatics, Volume 30 (2014) no. 9, pp. 1312-1313 | DOI

[102] Tillich, M.; Lehwark, P.; Pellizzer, T.; Ulbricht-Jones, E. S.; Fischer, A.; Bock, R.; Greiner, S. GeSeq – versatile and accurate annotation of organelle genomes, Nucleic Acids Research, Volume 45 (2017) no. W1 | DOI

[103] Tonti-Filippini, J.; Nevill, P. G.; Dixon, K.; Small, I. What can we do with 1000 plastid genomes?, The Plant Journal, Volume 90 (2017) no. 4, pp. 808-818 | DOI

[104] Twyford, A. D.; Ness, R. W. Strategies for complete plastid genome sequencing, Molecular Ecology Resources, Volume 17 (2017) no. 5, pp. 858-868 | DOI

[105] de Vries, J.; Sousa, F. L.; Bölter, B.; Soll, J.; Gould, S. B. YCF1: A Green TIC?, The Plant Cell, Volume 27 (2015) no. 7, pp. 1827-1833 | DOI

[106] Wagner, W. L.; Hoch, P. C.; Raven, P. H. Revised classification of the Onagraceae, Systematic Botany Monograph, Volume 83 (2007), pp. 1-240

[107] Wang, W.; Lanfear, R. Long-reads reveal that the chloroplast genome exists in two distinct versions in most plants, Genome Biology and Evolution (2019) | DOI

[108] Wang, W.; Schalamun, M.; Morales-Suarez, A.; Kainer, D.; Schwessinger, B.; Lanfear, R. Assembly of chloroplast genomes with long- and short-read data: a comparison of approaches using Eucalyptus pauciflora as a test case, BMC Genomics, Volume 19 (2018) no. 1 | DOI

[109] Wanichthanarak, K.; Nookaew, I.; Pasookhush, P.; Wongsurawat, T.; Jenjaroenpun, P.; Leeratsuwan, N.; Wattanachaisaereekul, S.; Visessanguan, W.; Sirivatanauksorn, Y.; Nuntasaen, N.; Kuhakarn, C.; Reutrakul, V.; Ajawatanawong, P.; Khoomrung, S. Revisiting chloroplast genomic landscape and annotation towards comparative chloroplast genomes of Rhamnaceae, BMC Plant Biology, Volume 23 (2023) no. 1 | DOI

[110] Wheeler, G. L.; Dorman, H. E.; Buchanan, A.; Challagundla, L.; Wallace, L. E. A review of the prevalence, utility, and caveats of using chloroplast simple sequence repeats for studies of plant biology, Applications in Plant Sciences, Volume 2 (2014) no. 12 | DOI

[111] Wick, R. R.; Schultz, M. B.; Zobel, J.; Holt, K. E. Bandage: interactive visualization of de novo genome assemblies, Bioinformatics, Volume 31 (2015) no. 20, pp. 3350-3352 | DOI

[112] Wu, S.; Chen, J.; Li, Y.; Liu, A.; Li, A.; Yin, M.; Shrestha, N.; Liu, J.; Ren, G. Extensive genomic rearrangements mediated by repetitive sequences in plastomes of Medicago and its relatives, BMC Plant Biology, Volume 21 (2021) no. 1 | DOI

[113] Wu, Z.; Liao, R.; Yang, T.; Dong, X.; Lan, D.; Qin, R.; Liu, H. Analysis of six chloroplast genomes provides insight into the evolution of Chrysosplenium (Saxifragaceae), BMC Genomics, Volume 21 (2020) no. 1 | DOI

[114] Xie, Z.; Merchant, S. The Plastid-encoded ccsA Gene Is Required for Heme Attachment to Chloroplast c-type Cytochromes, Journal of Biological Chemistry, Volume 271 (1996) no. 9, pp. 4632-4639 | DOI

[115] Xing, J.; Pan, J.; Yi, H.; Lv, K.; Gan, Q.; Wang, M.; Ge, H.; Huang, X.; Huang, F.; Wang, Y.; Rochaix, J.-D.; Yang, W. The plastid-encoded protein Orf2971 is required for protein translocation and chloroplast quality control, The Plant Cell, Volume 34 (2022) no. 9, pp. 3383-3399 | DOI

[116] Xu, J.; Shen, X.; Liao, B.; Xu, J.; Hou, D. Comparing and phylogenetic analysis chloroplast genome of three Achyranthes species, Scientific Reports, Volume 10 (2020) no. 1 | DOI

[117] Xu, S.; Teng, K.; Zhang, H.; Gao, K.; Wu, J.; Duan, L.; Yue, Y.; Fan, X. Chloroplast genomes of four Carex species: Long repetitive sequences trigger dramatic changes in chloroplast genome structure, Frontiers in Plant Science, Volume 14 (2023) | DOI

[118] Zardini, E.; Raven, P. H. A New Section of Ludwigia (Onagraceae) with a Key to the Sections of the Genus, Systematic Botany, Volume 17 (1992) no. 3 | DOI

[119] Zeb, U.; Wang, X.; AzizUllah, A.; Fiaz, S.; Khan, H.; Ullah, S.; Ali, H.; Shahzad, K. Comparative genome sequence and phylogenetic analysis of chloroplast for evolutionary relationship among Pinus species, Saudi Journal of Biological Sciences, Volume 29 (2022) no. 3, pp. 1618-1627 | DOI

[120] Zerbino, D. R.; Birney, E. Velvet: Algorithms for de novo short read assembly using de Bruijn graphs, Genome Research, Volume 18 (2008) no. 5, pp. 821-829 | DOI

[121] Zhang, Q.; Sodmergen Why does biparental plastid inheritance revive in angiosperms?, Journal of Plant Research, Volume 123 (2010) no. 2, pp. 201-206 | DOI

[122] Zhang, X.-F.; Landis, J. B.; Wang, H.-X.; Zhu, Z.-X.; Wang, H.-F. Comparative analysis of chloroplast genome structure and molecular dating in Myrtales, BMC Plant Biology, Volume 21 (2021) no. 1 | DOI

[123] Zhang, Y.; Du, L.; Liu, A.; Chen, J.; Wu, L.; Hu, W.; Zhang, W.; Kim, K.; Lee, S.-C.; Yang, T.-J.; Wang, Y. The Complete Chloroplast Genome Sequences of Five Epimedium Species: Lights into Phylogenetic and Taxonomic Analyses, Frontiers in Plant Science, Volume 7 (2016) | DOI

[124] Zhong, Q.; Yang, S.; Sun, X.; Wang, L.; Li, Y. The complete chloroplast genome of the Jerusalem artichoke (Helianthus tuberosus L.) and an adaptive evolutionary analysis of the ycf2 gene, PeerJ, Volume 7 (2019) | DOI

[125] Zhong, X. Assembly, annotation and analysis of chloroplast genomes, The University of Western Australia (2020) | DOI