Structural variation turnovers and defective genomes: key drivers for the in vitro evolution of the large double-stranded DNA koi herpesvirus (KHV)

Structural variations (SVs) constitute a significant source of genetic variability in virus genomes. Yet knowledge about SV variability and contribution to the evolutionary process in large doublestranded (ds)DNA viruses is limited. Cyprinid herpesvirus 3 (CyHV-3), also commonly known as koi herpesvirus (KHV), has the largest dsDNA genome within herpesviruses. This virus has become one of the biggest threats to common carp and koi farming, resulting in high morbidity and mortalities of fishes, serious environmental damage, and severe economic losses. A previous study analyzing CyHV-3 virulence evolution during serial passages onto carp cell cultures suggested that CyHV-3 evolves, at least in vitro, through an assembly of haplotypes that alternatively become dominant or under-represented. The present study investigates the SV diversity and dynamics in CyHV-3 genome during 99 serial passages in cell culture using, for the first time, ultra-deep whole-genome and amplicon-based sequencing. The results indicate that KHV polymorphism mostly involves SVs. These SVs display a wide distribution along the genome and exhibit high turnover dynamics with a clear bias towards inversion and deletion events. Analysis of the pathogenesis-associated ORF150 region in ten intermediate cell passages highlighted mainly deletion, inversion and insertion variations that deeply altered the structure of ORF150. Our findings indicate that SV turnovers and defective genomes represent key drivers in the viral population dynamics and in vitro evolution of KHV. Thus, the present study can contribute to the basic research needed to design safe live-attenuated vaccines, classically obtained by viral attenuation after serial passages in cell culture.

can interfere with the wild-type virus replication (Vignuzzi and López 2019). Since then, the role of DVGs in antiviral immunity, viral persistence and their negative impact on virus replication and production has been established (Bull et al., 2003;Li et al., 2011;Vignuzzi and López 2019;Loiseau et al., 2020). Nowadays, DVGs have been described in most RNA viruses and to a lesser extent in dsDNA viruses (Vignuzzi and López 2019;Loiseau et al., 2020). Despite the critical role of SVs in virus infection dynamics, the knowledge about structural variation diversity, and their evolutionary impact in viral populations, especially those with large dsDNA, is limited.
The large dsDNA Cyprinid herpesvirus 3 (CyHV-3), more commonly known as koi herpesvirus (KHV), is one of the most virulent viruses of fish. It is a lethally infectious agent that infects common carp and koi (Cyprinus carpio) at all stages of their life (Hedrick et al., 2000;Haenen et al., 2004). KHV infections are usually associated with high morbidities and mortalities (up to 95%), resulting in serious environmental damages and severe economic losses (Sunarto et al., 2011;Rakus et al., 2013). This threatening virus had a rapid worldwide spread due to global fish trade and international ornamental koi exhibitions (Gotesman et al., 2013). Classified within the family Alloherpesviridae, genus Cyprinivirus, CyHV-3 is the subject of an increasing number of studies and has become the archetype of alloherpesviruses (Boutier et al., 2015). Despite this "status", only 19 isolates have been entirely sequenced so far (source: NCBI) since the release of the first complete genome sequences in 2007 (Aoki et al., 2007). Such a low number of full genomes impairs large-scale phylogenomic studies (Gao et al., 2018). On the other hand, KHV infections have been shown to be the result of haplotype mixtures, both in vivo and in vitro (Hammoumi et al, 2016;Klafack et al, 2019). If mixed-haplotype infections probably represent an additional source of diversification for KHV (Renner and Szpara, 2018), they make genomic comparisons more challenging.
KHV has the largest genome among all known herpesviruses, with a size of approximately 295 kb and 156 predicted open reading frames (ORFs) (Aoki et al. 2007). Several studies focusing on the analysis of viral ORFs have shown the implication of some of them in KHV virulence (Boutier et al., 2015;Fuchs et al. 2014). KHV isolates are known to carry mutations in ORFs that are likely to alter gene functions, and these mutations may vary from virus to virus (Gao et al., 2018) and even within viruses (Hammoumi et al., 2016). Aoki et al (2007) hypothesized that virulent KHV would have arisen from a wild-type ancestor by loss of gene function. However, nearly 15 years later, this hypothesis has still not been tested, probably because of the lack of extensive genomic comparisons. SVs may play a key role in this gene function loss, as recently shown by Klafack et al (2019). These authors conducted a comparative study of a cell culture-propagated isolate that suggested that CyHV-3 evolves through an assemblage of haplotypes whose composition changes within cell passages. This study revealed a deletion of 1,363 bp in the ORF150 of the majority of haplotypes after 78 passages (P78), which was not detected after 99 passages. Furthermore, experimental infections showed that the virus passaged 78 times was much less virulent compared to the original wild-type on the one hand and slightly less virulent compared to the same virus passaged 99 times (P99), highlighting the potentially important role of the ORF150 in the virulence of KHV. Besides, this study demonstrated that haplotype assemblies evolve very rapidly along successive in vitro cell passages during infectious cycles, and raised many questions regarding the mechanisms leading to such rapid gene loss and gain in vitro.
The present study sought to characterize the SV diversity and dynamics in the KHV genome using viruses propagated onto cell cultures. First, P78 and P99 whole virus genomes were sequenced using ultra-deep long-read sequencing, a first with KHV. Then, the obviously pathogenesis-associated ORF150 region (~5 kb) was sequenced in ten intermediate successive cell passages through an Oxford nanopore® amplicon-based sequencing approach to gain insights into the gene loss and gain mechanisms.

Extraction of high molecular weight DNA from P78 and P99 cell culture passages
The virus isolate used in this study was the same as that previously described in Klafack et al. (2019), i.e. an isolate collected from an infected koi in Taiwan (KHV-T) and passed 99 times onto common carp brain (CCB) cells. Considering previous results, a special focus was made on passages 78 (P78) and 99 (P99). Genomic DNA was extracted from cell cultures stored at -80°C, using the MagAttract HMW DNA Kit (Qiagen). Each frozen culture was thawed quickly in a 37°C water bath, equilibrated to room temperature (25°C) and divided into 12 cell culture aliquots of 250 µL. Tubes were centrifuged at 3,000× g for 1 minute and supernatants were transferred into new 2-mL tubes containing 200 µL of proteinase K and RNase A solution. DNA was subsequently extracted according to the manufacturer's recommendations and eluted in 200 SL distilled water provided in the kit. The 12 replicates of each sample were pooled together and evaporated at room temperature using a vacuum concentrator, to reach a final volume of around 60 µL. Concentrated DNA was quantified by fluorometry (Qubit, ThermoFisher Scientific) and its quality was evaluated by spectrophotometry (Nanodrop 2100) and agarose gel electrophoresis. The final concentration of P78 and P99 was 14.4 and 2.6 ng·µL -1 , respectively. consisted in an initial denaturation at 95°C for 5 min followed by 45 cycles of amplification at 95°C for 10 sec, annealing at 60°C for 20 sec and elongation at 72°C for 10 sec with a single fluorescence measurement. After amplification, a melting step was applied, which comprised a denaturation at 95°C for 5 sec, a renaturation at 65°C for 60 sec and a heating step from 65 to 97°C with a ramp of 0.1°C per second and a continuous fluorescence acquisition. Specificity of amplification was verified by visual inspection of the melting profiles, and the ratio between cellular and viral DNA was estimated as 2 -ΔCqCq , assuming that each primer pair has an amplification efficiency close to 2 and that each amplicon is present as a single copy per genome.

Genomic library preparation and Oxford Nanopore whole genome sequencing
High-quality genomic DNA from the two samples (P78 and P99) was sequenced using Oxford

Amplicon-based Oxford nanopore sequencing
To specifically investigate ORF150 region, a fragment of ~4.3 kb encompassing the whole µL of each primer (10 µM). Cycling conditions were as follows: initial denaturation at 95°C for 10 min, amplification with 40 cycles of 95°C for 10 sec, 60°C for 20 sec, 72°C for 3 min, and final extension at 72°C for 5 min. PCR products were purified using 1X Agencourt AMPure XP beads, tested for purity using the NanoDrop™ One spectrophotometer (ThermoFisher Scientific), and quantified fluorometrically using the Qubit dsDNA High sensitivity kit. DNA libraries were prepared using the Rapid Barcoding kit (SQK-RBK004), following the manufacturer's instructions. For each sample, 400 ng of purified amplicon were adjusted with nuclease-free water to a total volume of 7.5 µL and supplemented with 2.5 µL of Fragmentation Mix RB01-4 (one for each sample). Four barcoded samples were combined with an equimolar ratio by mixing 2.5 µL of each sample in a total volume of 10 µL. Furthermore, two additional barcodes RB01-2 were used with the passages P78 and P99 as described above. Pooled libraries were sequenced on 3 R9.4.1 flow cells (4 barcodes, 4 barcodes, 2 barcodes) for 24 hours and sequencing runs were controlled with MinKNOW version 0.49.3.7.

DNA sequence analysis
For each sample, bases from raw FAST5 files with a pass sequencing tag were recalled using order to span large structural variants. After the mapping step using minimap2 (Li, 2018), the sequencing depth was calculated for each sample using the plotCoverage tool implemented in deepTools2.0 tool suite (Ramírez et al., 2016). Sequencing coverage was assessed with the bamCoverage tool from the same tool suite and normalized using the RPGC (reads per genome coverage) method ( Figure 1).

Structural variant detection
To detect structural variants (SVs) in the P78 and P99 whole genomes, a mapping step followed by BAM filtering was performed. Two aligners were used to map the raw long-reads against the KHV-J AP008984.1 reference genome: minimap2 (Li, 2018) and NGMLR (Sedlazeck et al., 2018). BAM files were then filtered using the option '-F' of Samtools view  with the flag "4" to keep only mapped reads and with the flag "0x800" to remove chimeric reads (inconsistent/supplemental mapping). For P78 and P99, 99.33% and 97.77% of reads were mapped, respectively. Chimeric reads represented 28.05% and 17.74% of the mapped reads in P78 and P99, respectively. The resulting filtered BAM files from each mapper were used as input data for SV caller, Sniffles (Sedlazeck et al., 2018). Only SVs ≥ 30 bp and supported by at least 10 reads were kept in the final VCF files. A cross-validation step was performed using SURVIVOR (Jeffares et al., 2017) by extracting common SVs from each mapper/caller combination for each sample. Although the KHV-J reference genome used for the mapping is phylogenetically close to the KHV-T isolate, some genetic diversity exists (Klafack et al. 2017).
Hence, to exclude inter-isolate SVs, a pairwise comparison between P78 and P99 was made.
The distribution of the different SVs along with P78 and P99 genomes was assessed by estimating their occurrences using a 5 kb sliding window. SNPs and Indels variants were called in P78 and P99 by medaka_variant implemented in medaka (1.4.4) using KHV-J AP008984.1 as a reference genome. To detect structural variants in the amplified region (257,345) of P10 to P99 samples, a size filtering step using guppyplex was added to the steps described above. Only reads from 1.5 kb to 8 kb were used for the analysis.

Main features of sequencing data for P78 and P99
A total of 4,900,000 and 2,293,830 long-reads were obtained for P78 and P99, respectively.
After filtering, 462,982 long-reads with an average length of 4.96 kb were retained for P78, and 418,034 reads with an average length of 7.06 kb for P99 (Table 1, Table S1).100% of the sampled bases from the P78 genome had at least 5,000 overlapping reads and 100% of the sampled bases from the P99 genome had at least 7,500 overlapping reads ( Figure S2). Both P78 and P99 genomes were entirely covered by the sequencing data ( Figure 1).
The ratio between cellular and viral DNA, calculated by qPCR, was 34800 and 33200 for P78 and P99, and the percent of mapped reads 99.33% and 97.77%, respectively.

Figure 1.
Normalized sequencing coverage for P78 and P99 samples using the RPGC (reads per genome coverage) method. Both P78 and P99 genomes were totally covered by the sequencing. For P78, the coverage break (green triangle) corresponds to the 1.3 kb deletion. total size Q20=1 in 100 probability of an incorrect base call, Q30= 1 in 1000 probability of an incorrect base call. Here 59.53% of P78 bases=Q20 and 58.27% of P99 bases=Q20.

SV distribution in P78 and P99
For P78, the mapper/caller combination minimap2/Sniffles detected 731 structural variations (SVs), and the combination NGMLR/Sniffles detected 460 SVs (Table S2). For P99, the combination minimap2/Sniffles detected 210 SVs and NGMLR/Sniffles detected 397 SVs (Table   S2). Independently from the mapper/caller combination that was used, P78 showed more SVs than P99 (Table S2). After the cross-validation step (extracting common SVs from each mapper/ caller combination), 236 and 87 SVs were kept for P78 and P99, respectively. For comparison, the number of small variations (with a size < 30 nt, which were excluded from this study) amounted to 57 and 77 for P78 and P99, respectively.
In  (Figure 2.B). The frequencies of these SVs were low and did not exceed 1% of the total reads (with a few exceptions, Table S2). In spite of such low frequencies, it is interesting to note that the most frequent SVs were located in ORFs potentially involved in DNA replication and encapsidation, e.g. ORF33, 46, 47, 55 (Aoki et al, 2007; Table S2).
Altogether, these results highlight high SV turnover dynamics during the in vitro infection cycles (from 78 passages to 99) with a clear trend, or bias, towards inversion and deletion events.

Dynamics and impacts of SVs in ORF150 region
Taking advantage of the high-resolution SV detection provided by the long-read sequencing, we looked for the SV events around the potential virulence-linked ORF150 in P78 and P99 (nt 257,103-261,345 according to AP008984.1). Results confirmed that P99 had a reference-like profile with an unmodified ORF150. In P78, the deletion (nt 258,154-259,517; D258153) was found in 6,902 reads (100% of the reads), whereas the reference haplotype was also detected in 30 reads, representing 0.44% of the total 6,734 supporting reads (Figure 3, Tables S3).
Surprisingly, 26 reads revealed a haplotype as yet unidentified (INV258153), consisting of an inversion of the same length (1,363 bp) and at the same breakpoints as the deletion. The inverted haplotype (INV258153) in P78 deeply altered the ORF150, by inverting the first 1200 bp of the ORF and 160 bp of the 5'UTR in the middle of the ORF (Figure 3).
In order to trace the unexpected dynamics of gain and loss of the full ORF150 along passages, we searched for the SV turnovers during 10 intermediate passages (P10, P20, P30, P40, P50, P70, P78, P80, P90, P99). This analysis revealed the presence of haplotype D258153 at low frequency (from 0.05 to 0.15% of the reads) in passages P10 to P40 and a strong increase in its frequency at P50 (88.7% of the reads) (Figure 4). The frequency of the haplotype D258153 reached a maximum at P78 (100% of the reads) then dropped quickly at P80 (30.7% of the reads) to stabilize at low frequency (0.31% of readings) at P90., as during the first 40 passages ( Figure 4, Table S3). Interestingly, shorter deletions of 119 and 881 bp were observed near the 5' end of the ORF150 in P40 and P80, respectively, at low frequencies (0.42 % in P40 and 0.18% in P80) ( Figure 4, Table S3). The haplotype D258153 completely disappeared at passage 99 ( Figure 4, Table S3).
This analysis also evidenced several other SVs that alter the structure of ORF150 and of its upstream region, including the beginning of ORF149 ( Figure 5). Besides the large deletion, inversions and insertions were also observed in the ORF149-ORF150 region. Inversions were at a low frequency (between 0.01% and 0.53% of the supporting reads) in all passages except for P70, P90 and P99. P10 and P40 showed the lowest and the highest inversion frequencies, respectively ( Figure 5, Table S3). A large insertion of about 1 kb appeared in P50 and P70 at moderate frequencies (14,34% and 16.01% of the supporting reads, respectively) to disappear in P78 and re-appear at a lower frequency (6,79% of the supporting reads) in P80. The consensus sequence of this insertion corresponds to the fragment 259,517-260,477 of the KHV genome, with an identity of about 90%. In P90, an intriguing inverted-duplicated haplotype was observed at a low proportion (0.054% of the supporting reads). Surprisingly, P99 exhibited a unique reference-like, SV-free haplotype ( Figure 5, Table S3). All the variations deeply impacted the structure of ORF150 -and sometimes that of ORF149 as well -by shrinking or increasing its size, causing the ORF149 and ORF150 fusion, inverting the ORF150 sequences and duplicating the ORF150 with the deleted, inserted, inverted and inverted-duplicated haplotypes ( Figure 5).

Discussion
SVs significantly impact the adaptation of viruses to their natural host and environment (Pérez-Losada et al., 2015). Yet the role of SV diversity and dynamics in large DNA viruses is barely known. Ultra-deep long-read sequencing opens unprecedented ways to gain insights into these untapped viral genome polymorphisms. The present study started to tackle the impact of SVs in the evolution of the large dsDNA KHV during cell culture serial passages using ultra-deep whole-genome and amplicon-based sequencing. The sequence data showed a wide distribution of various SVs along the genome associated with high SV turnover dynamics during the in vitro infection cycles and a clear bias towards inversion and deletion events. Analysis of the pathogenicity-associated ORF150 region in ten serial passages mainly highlighted deletions, inversions and insertions that deeply altered the structure of ORF150.
Serial passages of viruses in cell culture may lead to the accumulation of mutations and gene disruptions (Spatz 2010;Colgrove et al., 2014). These mutations can modify viral adaptation and increase or decrease virulence López-Muñoz et al., 2021;. In the case of KHV, a previous work using short-read sequencing showed that 99 consecutive in vitro passages onto CCB cells resulted in the accumulation of less than 60 small variations (<100 nt) (Klafack et al., 2019). It also showed that the haplotype composition can quickly vary along with infection cycles of KHV in vitro. The present study unexpectedly highlighted a high number of structural variations: 87 for P99 and 236 for P78. In contrast, the accumulation of small variations was consistent with what had been observed with short-read sequencing (Klafack et al, 2019). These findings illustrate that long-read sequencing is highly suitable for genome-wide comparisons of viruses. Most importantly, they revealed a hidden source of virus diversification, which had never been reported so far for KHV. They also confirmed that P78 consists of a mixture of undeleted and deleted haplotypes revealing that the undeleted haplotype did not correspond to the native one but to an inverted version of it. We experimentally validated this inversion in P78 by PCR followed by sequencing. Moreover, the inversion found in the middle of the reads excluded the formation of in silico chimeras, i.e., chimeras resulting from the basecaller when two molecules are sequenced in the same pore that undergoes fast reloading (Martin and Legget, 2021). These structural variations form a 'mosaic' of viral subpopulations that seem to result from multiple rearrangement events, mainly inversions and deletions and, to a lesser extent, insertions and duplications. Such sub-viral variants often lead to Defective Viral Genomes (DVGs) (Vignuzzi and López 2019). Because of their negative impact on viral replication, some forms of DVGs have been extensively studied, and three pathogenesis-related functions have been well-described: interference with viral replication, immunostimulation, and viral persistence (Marriott and Dimmock 2010;Vignuzzi and López 2019). DVGs play a role in viral production interference by accumulating at higher rates than the full-length viral genomes and consequently interfere with viral replication by taking up the polymerase activity and competing for structural proteins (Calain and Roux 1995;Portner and Kingsbury 1971). In addition, DVGs can act as primary stimuli and trigger antiviral immunity by inducing the expression of some interleukins and pro-inflammatory cytokines (for review, see Interesting New Gene (RING) domain that is predicted to span 628 amino acids (Li et al, 2015;Aoki et al, 2007). This RING domain contains the HC (C3HC4) type of RING structure, which is involved in the ubiquitination pathway, by acting as a E3 ubiquitin-protein ligase (He and Kwang 2008). Viruses that encode RING finger like-ubiquitin ligases (E3s)may evade host immune responses and also hijack the host's RING E3 to enhance their replication (Zhang et al.,2018).
In aquatic viruses, RING family genes have been reported to be involved in virus latency, replication, and host protein degradation (Shekar and Venugopal 2019;Wang et al., 2021).
As previously observed, the most significant difference affecting all viral haplotypes between P78 and P99 was around the ORF150. For this reason, we focused on this interesting region to determine the passage number at which the presence of the deletion inversion first arose, by sequencing selected representative passages (P10, P20, P30, P40, P50, P70, P80 and P90).
We found that deleted and inverted haplotypes appeared as soon as P10 and that their proportion varied along the successive passages. The most striking feature was the rapid and total disappearance of the deletion between P78 and P99, raising many questions regarding the mechanisms that led to the clearance of this major haplotype.
The rapid SV turnover of DNA viruses, including herpesviruses, likely involves recombination (Szpara and VanDoorslaer, 2021;Renner and Szpara, 2018;Cudini et al., 2019;Kolb et al., 2017;Tomer et al., 2019), which is often linked to replication and DNA repair, as well as errors during viral genome replication (Kulkarni and Fortunato 2011;Xiaofei and Kowalik 2014). In the present case, the pretty good conservation of breakpoints around ORF150 may be a sign of homologous recombination. However, whether this recombination occurs within the same genomic entities or between different viruses remains open. It would be interesting to assess the involvement of each of these mechanisms in generating the observed structural diversity. The multiple mechanisms of DNA virus evolution beyond single nucleotide substitutions likely confer KHV a high level of evolutionary adaptability.
Classically, the generation of live-attenuated vaccines is achieved by passaging the virus in cell culture under different conditions (in different host species or at lower or higher temperatures), in order to induce mutation accumulation that supports viral adaptation to the specific conditions and provides viral attenuation (Minor, 2015, Hanley 2011. With the exception of the OPV polio vaccine viruses (Kew et al., 2005), the exact mechanisms by which these mutations lead to attenuated phenotypes are usually poorly characterized (Lauring et al., 2010). However, liveattenuated viruses can revert to virulent phenotypes either by reversions (as shown here between P78 and P99), introduction of compensatory mutations, or recombination with viruses belonging to the same genus (Cann et al., 1984;Bull et al., 2018;Muslin et al., 2019).
Additionally, the combination of multiple live-attenuated viruses may result in competition or facilitation between individual vaccine viruses, resulting in undesirable increases in virulence or decreases in immunogenicity (Hanley 2011;Pereira-Gomez et al., 2021). Recently, genetic engineering has led to many novel approaches to generate live-attenuated virus vaccines that contain modifications to prevent reversion to virulence (Yeh et al., 2020) and improve interferences among multiple vaccine strains (Pereira-Gomez et al., 2021).

Conclusion
Our findings confirm that CyHV-3 can evolve rapidly during infectious cycles in cell culture, and

Conflict of interest disclosure
The authors declare they have no conflict of interest relating to the content of this article.