A simple procedure to detect, test for the presence of stuttering, and cure stuttered data with spreadsheet programs

Microsatellites are powerful markers for empirical population genetics, but may be affected by ampliﬁcation problems like stuttering that produces heterozygote deﬁcits between alleles with one repeat diﬀerence. In this paper, we present a simple procedure that aims at detecting stuttering for each locus overall subsamples and only requires the use of a spreadsheet interactive application on any operating system. We compare the performances of this procedure with the one of MicroChecker on simulations of dioecious pangamic populations, monoecious selﬁng populations and clonal populations with or without stuttering, and on real data of vectors and parasites. We also propose a cure for loci aﬀected and compare the results with those expected without stuttering. In sexual populations (dioecious or selfers), the new procedure appeared more than three times more eﬃcient than MicroChecker. Cure was able to restore Wright’s FIS of stuttered data to the expected value, and particularly model following following of


Introduction
Despite the recent democratization of NGS based techniques, microsatellite loci are still very useful markers, in particular for empirical population genetics of non-model and small organisms, as many parasites and/or their vectors, which are difficult (or impossible) to study with direct methods as direct observation (as for birds) or as mark-release-recapture approaches. Sequencing and single nucleotide polymorphism markers (SNPs) still represent expensive alternatives in time, money and expertise, which lies beyond the reach of many laboratories and most of the time at the expense of sample sizes. Three decades ago, microsatellite markers were presented as the most powerful genetic markers (Jarne & Lagoda, 1996). However, researchers began to detect the different problems that can arise, as null alleles, allelic dropout, short allele dominance or stuttering (Wattier, Engel et al., 1998;De Meeûs, Humair et al., 2004;Chapuis & Estoup, 2007;Guichoux, Lagache et al., 2011). The last kind of detection tools and possible cures only arose very recently (Wang, Schroeder et al., 2012;De Meeûs, 2018;Manangwa, De Meeûs et al., 2019;De Meeûs, Chan et al., 2021).
Stuttering is the result of the Taq polymerase slippage during the PCR amplification of the targeted DNA strand. This generates several PCR products that differ from each other by one repeat and can cause difficulties when discriminating between fake and true homozygotes, such as heterozygous individuals for dinucleotide microsatellite allele sequences with a single repeat difference (De Meeûs et al., 2021). Stuttering produces heterozygote deficits as compared to Castle-Weinberg (CW) expected genotypic proportions (Castle, 1903;Weinberg, 1908), also known as Hardy-Weinberg (HW) expectations (please have a glance at (De Meeûs et al., 2021) for an explanation why we prefer using CW instead of HW). This phenomenon is locus specific and the deviation produced, as measured by wright's FIS (Wright, 1965), proportional to the intensity with which each locus is affected.
Today, and to our knowledge, the only procedure to detect stuttering is the one used in MicroChecker ( Van Oosterhout, Hutchinson et al., 2004). Though it works well enough, it only studies each locus one by one, which is fine because stuttering presence and intensity are expected to be locus specific. Nonetheless, MicroChecker tests for stuttering in each subsample separately, though a global test might be more powerful. Furthermore, MicroChecker was developed under Microsoft® Windows in 2003 (Windows XP), and it begins to display incompatibility issues with most current systems. A simple alternative, which can detect and test for the presence of stuttering at each locus overall subsamples on any platform kind would thus be welcome and timely.
In this paper, we present a very simple procedure that only requires the use of a spreadsheet interactive computer application such as Apache® OpenOffice Calc or Microsoft® Excel. We compare the performances of this new procedure with the one implemented in MicroCheker on simulated data without (null hypothesis) or with (alternative hypothesis) stuttering, in dioecious populations of various sizes with random mating, hermaphrodites with selfing or clonal populations. We also checked how the cure of loci with stuttering signature, as proposed in a previous work (De Meeûs et al., 2021), restore the values expected for some parameters. We finally reanalyzed four real data sets on vectors and/or their parasite: the tick Ixodes scapularis in North America (De Meeûs et al., 2021); Glossina palpalis palpalis, vector of African trypnosomiasis in Côte d'Ivoire (Berté, De Meeus et al., 2019); the snail Galba truncatula and the fluke it transmits, Fasciola hepatica in France (Correa, De Meeûs et al., 2017); and Trypanosoma brucei gambiense, the agent of sleeping sickness in Guinea and Côte d'Ivoire (Koffi, De Meeûs et al., 2009). On these datasets, we checked if more loci with stuttering could be diagnosed, cured the loci with suspicion of stuttering, following the technique proposed recently (De Meeûs et al., 2021) to verify if some conclusions could be changed.

Methods
Despite the recent democratization of NGS based techniques, microsatellite loci are still very useful markers, in particular for empirical population genetics of non-model and small organisms, as many parasites and/or their vectors, which are difficult (or impossible) to study with direct methods as direct observation (as for birds) or as mark-release-recapture approaches. Sequencing and single nucleotide polymorphism markers (SNPs) still represent expensive alternatives in time, money and expertise, which lies beyond the reach of many laboratories and most of the time at the expense of sample sizes. Three decades ago, microsatellite markers were presented as the most powerful genetic markers (Jarne & Lagoda, 1996). However, researchers began to detect the different problems that can arise, as null alleles, allelic dropout, short allele dominance or stuttering (Wattier et al., 1998;Chapuis & Estoup, 2007;Guichoux et al., 2011). The last kind of detection tools and possible cures only arose very recently (Wang et al., 2012;De Meeûs, 2018;Manangwa et al., 2019;De Meeûs et al., 2021).
Stuttering is the result of the Taq polymerase slippage during the PCR amplification of the targeted DNA strand. This generates several PCR products that differ from each other by one repeat and can cause difficulties when discriminating between fake and true homozygotes, such as heterozygous individuals for dinucleotide microsatellite allele sequences with a single repeat difference (De Meeûs et al., 2021). Stuttering produces heterozygote deficits as compared to Castle-Weinberg (CW) expected genotypic proportions (Castle, 1903;Weinberg, 1908), also known as Hardy-Weinberg (HW) expectations (please have a glance at (De Meeûs et al., 2021) for an explanation why we prefer using CW instead of HW). This phenomenon is locus specific and the deviation produced, as measured by wright's FIS (Wright, 1965), proportional to the intensity with which each locus is affected.
Today, and to our knowledge, the only procedure to detect stuttering is the one used in MicroChecker (Van Oosterhout et al., 2004). Though it works well enough, it only studies each locus one by one, which is fine because stuttering presence and intensity are expected to be locus specific. Nonetheless, MicroChecker tests for stuttering in each subsample separately, though a global test might be more powerful. Furthermore, MicroChecker was developed under Microsoft® Windows in 2003 (Windows XP), and it begins to display incompatibility issues with most current systems. A simple alternative, which can detect and test for the presence of stuttering at each locus overall subsamples on any platform kind would thus be welcome and timely.
In this paper, we present a very simple procedure that only requires the use of a spreadsheet interactive computer application such as Apache® OpenOffice Calc or Microsoft® Excel. We compare the performances of this new procedure with the one implemented in MicroCheker on simulated data without (null hypothesis) or with (alternative hypothesis) stuttering, in dioecious populations of various sizes with random mating, hermaphrodites with selfing or clonal populations. We also checked how the cure of loci with stuttering signature, as proposed in a previous work (De Meeûs et al., 2021), restore the values expected for some parameters. We finally reanalyzed four real data sets on vectors and/or their parasite: the tick Ixodes scapularis in North America (De Meeûs et al., 2021); Glossina palpalis palpalis, vector of African trypnosomiasis in Côte d'Ivoire (Berté et al., 2019); the snail Galba truncatula and the fluke it transmits, Fasciola hepatica in France (Correa et al., 2017); and Trypanosoma brucei gambiense, the agent of sleeping sickness in Guinea and Côte d'Ivoire (Koffi et al., 2009). On these datasets, we checked if more loci with stuttering could be diagnosed, cured the loci with suspicion of stuttering, following the technique proposed recently (De Meeûs et al., 2021) to verify if some conclusions could be changed.

Simulations
Simulations were undertaken with EASYPOP (v. 2.0.1) (Balloux, 2001). We simulated random mating dioecious populations (pangamy), like what probably occurs in the wild for ticks (De Meeûs et al., 2021), Nematocera flies (Prudhomme, De Meeûs et al., 2020), Hemipteran bugs (Gomez-Palacio, Triana et al., 2013), or tsetse flies (Berté et al., 2019). We also simulated selfing monoecious populations, as flukes and water snails (Correa et al., 2017). The total size of populations was NT=10,000 individuals subdivided into either n=100 subpopulations of N=100 individuals, or n=500 and N=20, with an even sex ratio (dioecy) or with selfing rate s=0.3 (monoecy). The model of migration followed an Island model with migration rate m=0.01. We simulated 20 independent loci with a mutation rate of u=0.0001 that followed a mixed model with 70% of mutations following a stepwise mutation model (SMM) and 30% following a KAM model. The maximum possible number of alleles was K=20. Each simulation started with maximum variability and was run for 10,000 generations. At the end of each simulation, 20 individuals (10 males and 10 females in dioecious populations), when N≥20, were randomly sampled in 10 subpopulations. As can be seen with the real datasets reanalyzed in the present work, such a sampling design approximately represents what is classically obtained for most parasites or vectors studies.
Simulations of monoecious populations with 30% of selfing allowed checking the interaction of stuttering detection in inbred populations with a high expected heterozygote deficit (here 18%).
A subset of simulations with n=100, N=100 (same values for other parameters as above) but with 100% clonal propagation, was finally undertaken, to fit with diploid clonal pathogens as trypanosomes (Koffi et al., 2009) or yeasts as Candida albicans (Nébavi, Ayala et al., 2006).
Each parameter set was replicated 10 times.
In the supplementary material, we provided an example with the results files of the first simulation, with the root name "TestStutterDioeciousNoStutter-n1000N100-1" and extensions "txt", "equ", "dat", and "gen", for the parameters used, the statistics along the simulation (all generations) and the resulting data files in Fstat and genepop formats, respectively.

Generating stuttering
Data were analyzed with Fstat 2.9.4 (Goudet, 2003) updated from  to get information on the identity of alleles kept at the end of simulations. For each simulation, data files were imported into a spreadsheet keeping each allele of each locus separated in a single column. Mimicking stuttering needed to follow several steps. We first arbitrarily considered that 10% of possible alleles affected by stuttering was enough. This means that two alleles (out of 20 possible ones) needed to be recoded for stuttering. Because of genetic drift, not all the 20 possible alleles were present at the end of each simulation. For each locus, among the allele still present, only the first two alleles separated by a single repeat were concerned. For each individual carrying one of these alleles as the second allele, if different by a single repeat from the first allele, the second allele was recoded as identical to the first one. Let us assume that the first two alleles separated by a single repeat were, for instance, allele 5 and 6 for the first locus (Locus1). If allele 5 was in cell B2 and allele 6 in cell C2 in the spreadsheet, the command for generating stuttering in a cell from a free zone (e.g. the first free column after the last column of the data) would be: =B2, for the first allele (no change), and =IF(ABS(B2-C2)=1,IF(OR(B2=5,B2=6),B2,C2),C2), for the second allele of that locus. This way, individuals 4/5, 5/6, or 6/7 are recoded as homozygotes for the first allele.
This command was then copied and pasted to transform the whole locus. A template, using the first simulation, can be found in the supplementary material files as the spreadsheet file "TestStutterDioecious-n1000N100-1-10%Stuttering.xlsx".
Because of drift, some loci in some subsamples did not display allele with one repeat difference. Such manipulation thus generated data with 0% to 100% of alleles displaying stuttering for all of the 20 loci, but with various intensity from one locus to the other, and from one subsample to the other, as expected in real situations. This also allowed checking the kind of variance stuttering can generate on parameter estimates (see below).

Detection of stuttering and testing with MicroChecker
All datasets (raw and with stuttering) were analyzed with MicroChecker with 10,000 randomizations. All loci were considered as mononucleotidic, as simulated by Easypop. Stuttering was detected when the observed heterozygosity for alleles with one repeat difference was below the 95% confidence interval (95%CI) for random mating expectation. This was observed from the graphic outputs of MicroChecker ( Figure 1).
For each locus, we summed the number of times MicroChecker found a significant heterozygote deficit probably due to stuttering over the 10 subsamples. We compared this quantity with the expected 5% under the null hypothesis with a one sided exact binomial test with R (R-Core-Team, 2020) (command "binom.test"). The alternative hypothesis was that there are more than 5% significant tests. This test was repeated 20 times across the different loci. To take into account this repetition of independent tests, we used the Benjamini and Hochberg's (BH) false discovery rate (FDR) procedure (Benjamini & Hochberg, 1995) with R (R-Core-Team, 2020) (command "p.adjust") to identify which tests are really significant (see (De Meeûs, Guégan et al., 2009)).

Figure 1:
Examples of significant stuttering tests using MicroChecker graphic outputs. Black crosses represent the number of observed genotype of a given class and red diamonds stand for the corresponding values expected under the null hypothesis (random mating). The abscissa are the different genotypic classes in terms of size differences between the two alleles within an individual (e.g. 0 stands for homozygous genotypes). 1-A: an example with a significant homozygous excess and a significant deficit of heterozygotes with one repeat size difference between the two alleles. 1-B: an example where only the deficit of heterozygotes with one repeat size difference between the two alleles was significant.
In the present paper, both situations are considered significant while only the first one is for MicroChecker.

Alternative method to detect and test for stuttering
We needed to compute the expected frequency of individuals heterozygous for two alleles with one repeat difference, for each locus over all subsamples. All allele frequencies outputted and sorted by Fstat were copied in a spreadsheet. Let us assume, for instance, that subsample size was in cell B3, that the size of the smallest allele of the first locus of the first subsample was in cell A4 and its frequency in cell B4, and allele size and allele frequency of the second allele was in cells A5 and B5 respectively. Then the expected frequency of individuals heterozygous for two alleles with one repeat difference was obtained by typing the following command in, for instance, cell C4: =IF(ABS($A4-$A5)=1,2*B4*B5*B$3,0) As can be seen, for two successive alleles with more than one repeat difference, this expected frequency was set to 0. Please, note that for a dinucleotidic locus the difference in size must be two (e.g. ABS($A4-$A5)=2). For imperfect dinucleotidic loci, the conditional command would be of the form =IF(OR(ABS($A4-$A5)=1, ABS($A4-$A5)=2),2*B4*B5*B$3,0), to include both the cases of one base difference, which may also generate stuttering, and of two bases (one repeat) difference . Now, if the penultimate allele is on line 10 of the spreadsheet, the sum of all expected heterozygotes with one repeat difference for the concerned locus and subsample was obtained by typing the following command in, for instance, cell C12: =SUM(C4:C10). Finally, if this sum for the last subsample is in column U, then, the total number of expected heterozygotes with one repeat difference across all subsamples for that locus was obtained with: =SUM(C12:U12). Then, we needed to compute the observed frequency of such heterozygotes. For this, we copied the raw data (one allele per column) in a spreadsheet. Let us assume that the first allele of the first locus of the first individual was in cell B2 and the last allele of the last locus of the first individual was in cell AO2. In cell AQ2 we typed: =IF(ABS(B2-C2)=1,1,0). Please, note again that for dinucleotidic loci, the difference in size would be two, and it should be one and two for imperfect dinucleotidic loci. We then copied this command in all remaining cells corresponding to the rest of the dataset. In the cell AQ202 (below the last line of the data), to compute the total of observed heterozygous individuals for alleles with one repeat difference, we typed: =SUM(AQ2:AQ201). A template, for the first simulation, is available in the spreadsheet file "TestStutterDioeciousNoStutter-n1000N100-1-FstatRes.xlsx".
We then compared the observed and expected frequencies with a one sided exact binomial test with R, the alternative hypothesis being "there are less heterozygote observed with 1 repeat difference than expected". This provided 20 independent p-values that we corrected for False Discovery Rate with the Benjamini and Hochberg's procedure.
With selfing, natural homozygosity increase may artificially enhance stuttering detection. Nevertheless, since there is no easy way to estimate precisely the selfing rate in case of stuttering, we did not adapt stuttering detection.
For clonal propagation, full clonality exhibits specific signature regarding genetic diversity, FIS and linkage disequilibrium (De Meeûs & Balloux, 2004, 2005De Meeûs, Lehmann et al., 2006;Séré, Kabore et al., 2014;Stoeckel & Masson, 2014;De Meeûs, 2015). Nevertheless, as shown by the simulations we undertook, the absence of segregation makes it impossible to predict the expected frequency of specific heterozygous classes. This led us to entirely modify stuttering detection in that case. Knowing that the expected proportion of homozygous sites is QI=1/K in clonal populations (De Meeûs, 2015), the expected total heterozygosity should be HI=(K-1)/K (K is the total number of possible alleles). The quantity K is never known, so we considered the total number of alleles observed Ko, as an underestimate, with HI'=(Ko-1)/Ko. Some heterozygote kinds are expected to be more frequent than others. We thus considered frequencies pi and pj of alleles i and j, to weight expected values by 2pipj and built an "expected" heterozygote frequency between alleles i and j as: We can see that this way the sum of all Hexpij indeed reaches HI'. For each locus, Hexpij was computed for each heterozygous class with one repeat difference within each subpopulation. We then summed all these expected frequencies over all heterozygote classes with one repeat difference and multiplied it by 20 (subsample size) to obtain the expected number of heterozygotes with one repeat difference for a given subsample. We then summed the results obtained across all the 10 subsamples to obtain the total number of expected heterozygotes with one repeat difference and compared it, for each locus, to the one observed in the simulation. Since Ko≤K, these expectations may be under-estimates of the real expected frequencies.
We might thus expect a deficiency in stuttering detection. Alternatively, since drift should favor particular heterozygous classes by chance, we also expect a total lack of other heterozygous classes, which may lead to strongly significant spurious stuttering signatures.

Estimation of fixation indices estimation and linkage disequilibrium
Wright's F-statistics (Wright, 1965) estimated with Weir and Cockerham's estimators (Weir & Cockerham, 1984) were computed. FIS measures inbreeding of individuals relative to inbreeding of their subpopulation and FST measures inbreeding of subpopulations relative to the total inbreeding. We also computed the 95% confidence intervals of bootstrap over loci of these statistics. These were estimated and computed with Fstat 2.9.4 (Goudet, 2003) updated from Fstat 1.2 .
Amplification problems can increase the variance of F-statistic estimation across loci, and this affects more the FIS than the FST (De Meeûs, 2018). We used the jackknife over loci estimate of the standard error of FIS and FST (StdrdErrFIS and StdrdErrFST) of Fstat to measure the effect of stuttering on parameter variation across loci. In particular, in case of null alleles, StdrdErrFIS is at least twice StdrdErrFST (De Meeûs, 2018). We thus measured the ratio RSE=StdrdErrFIS/StdrdErrFST.
Linkage disequilibrium can be favored by allele miscoring (De Meeûs et al., 2021). We thus tested linkage disequilibrium between all pairs of loci with the G-based randomization test of Fstat over all subsamples because it is the most powerful for combining tests across subsamples ). The False Discovery Rate for dependent tests series was computed following Benjamini and Yekutieli procedure (Benjamini & Yekutieli, 2001) with R (command p.adjust).

Statistical comparisons of method performances
Performance of tests were compared with the Fisher exact test with R-commander package (Fox, 2005;Fox, 2007) for R.
We also undertook generalized linear mixed models with the package lme4 (Bates, Maechler et al., 2015) of R to explain the number of times a test appeared significant. We used a Poisson distribution with a log link. The models were of the form NSig~n+N+Mating+Stuttering+Mating:Stuttering+ (1|Rep) where NSig was the number of loci that outputted a significant stuttering test or the number of locus pairs that appeared in significant LD, n is the total number of subpopulations, N was the size of subpopulations, Mating was the mating system (either pangamic dioecy or Monoecy with 30% selfing), Stuttering was either 0 (no stuttering) or 10 (10% stuttering), ":" stood for interaction between two variables, and (1|Rep) was the random effect of replicates.

Cured data sets
Stuttering correction was made for loci that appeared with a significant stuttering at the BH level with the new method described in the present paper. We used the rules described in (De Meeûs et al., 2021): for each incriminated locus, all alleles with one repeat difference were pooled together. Each group of pooled alleles contained one allele with a frequency of at least 5%. The main principle behind this rule is that rare alleles should keep small weights in the data. Pooling rare alleles together may artificially create a fairly frequent artificial allele, with a strong though artificial weight. Pooling rare alleles with a reasonably frequent one is supposed to attenuate this problem. If no frequent allele was available, then two solutions were chosen. If the sum of the frequencies of these alleles remained below 0.05, these were not pooled. Otherwise, to minimize the impact that these successive alleles may jointly have on the heterozygote deficit, and to avoid pooling rare alleles together, they were pooled with the closest allele with frequency above or equal to 0.05, even if more than one repeats distant from the closest one.
In rare cases, all alleles were one repeat different. To prevent obtaining a monomorphic locus in that case, we pooled alleles two by two, taking care of one of the two alleles pooled displayed a frequency of at least 0.05. In case of uneven number of alleles, the last allele was not left alone and pooled with the previous pair in allele size.
Cured data were reanalyzed and statistics compared with the results expected under the null hypothesis (without stuttering). The efficiency of the correction was checked for each locus in each replicate of each simulation. We retained only the corrected loci for which the correction produced a lower FIS as compared to the value obtained with the uncured locus. The average FIS and 95% bootstraps confidence intervals were thus computed on data sets with efficiently cured loci and uncured remaining loci. These values were compared to those obtained under the null hypothesis (no stuttering), and to expected value for Wright's FIS, FIS_exp. For pangamic dioecious populations we used equation 8 from Balloux (2004) (Balloux, 2004): FIS_exp=-1/(2N+1). For selfing populations, we used the classic FIS_exp=s/(2-s) (e.g. (De Meeûs, McCoy et al., 2007) page 213).

Real data sets
Five data sets were reanalyzed: two regarding dioecious species, two regarding monoecious species and the last regarding a clonal species.
The first real data set reanalyzed was on the tick Ixodes scapularis the vector of Lyme disease in western USA (De Meeûs et al., 2021). We used the data cured for short allele dominance (SAD) as explained in the originator paper (De Meeûs et al., 2021) but uncured for stuttering.
The second data set concerned the tsetse fly Glossina palpalis palpalis, an important vector of sleeping sickness in Côte d'Ivoire (Berté et al., 2019), for which loci X55-3 and pGp23 displaying uncured SAD and locus GPCAG, obviously under selection, were removed. Because some loci were X-linked, only females were kept. Three dinucleotidic loci (pGp20, pGp24, B3) displayed some discrepancies of allele sizes and were marked as mononucleotidic for MicroChecker. For these loci, heterozygotes with single and double nucleotide differences in size were checked for heterozygote deficit due to stuttering.
The third and the fourth data sets concerned the highly selfing snail Galba truncatula and its parasite Fasciola hepatica, also monoecious but almost panmictic, in France (Correa et al., 2017). For both species, microsatellite profiles did not fit with the expected pure dinucleotidic motives and, again, stuttering was considered between alleles of 1 base and two bases differences in size.
Finally, the clonal species studied was Trypanosoma brucei gambiense 1 in Western African foci of sleeping sickness (Koffi et al., 2009), for which one locus (Trbpa1/2), suspected of being under selection, was removed. For this data, we used a derived version of Séré et al superimposition criterion (Séré et al., 2014). In pure clonal populations, the expected value for FIS is FIS_exp=-(1-HS)/HS, where HS is Nei's estimator of local genetic diversity (measured within subsamples) (Nei & Chesser, 1983). This criterion can only be used for sufficiently polymorphic loci with HS≥0.5. To express the goodness of fit of observed FIS towards this value, we designed a superimposition index SC=|FIS-FIS_exp|/max(|FIS|,| FIS_exp|), where "max" means the maximum value of the two FIS's absolute values.

Results and discussion
A synthetic view of simulation results, averages detail computations and test tables are available in the supplementary file S1 for sexual simulations and in the supplementary file S2 for clonal simulations.

Detection of stuttering in sexual populations
The results of these analyses are summarized in Figure 2. Stuttering detection per locus was weak in general, with 0‰ and 5‰ significant tests for MicroChecker and the alternative methods, respectively, under the null hypothesis (H0: there is no stuttering) in monoecious populations. For Microchecker, the total proportion of significant tests over all loci and subsamples was 2‰ in that case. No test stayed significant after Benjamini and Hochber correction. In populations with 30% selfing, but still under H0 (i.e. no stuttering), these proportions increased to 10% for the alternative method only. It dropped to 2.5‰ after Benjamini and Hochberg correction. For MicroChecker over all loci and subsamples, 2% only appeared significant under H0 with selfing. With stuttering (H1), the number of significant tests reached 7% and 24% in dioecious populations for MicroChecker and alternative methods respectively (5% and 17% respectively with Benjamini and Hochberg). This reached 14% and 47% respectively with 30% selfing (10% and 38% respectively with Benjamini and Hochberg). Figure 2: Statistical comparisons, within each kind of simulation, between MicroChecker and the alternative method (New method), between reproductive systems and, within the alternative method, between dataset without or with 10% stuttering to detect stuttering under the null hypothesis with no stuttering or with 10% of alleles affected, and for different system of mating (dioecious pangamy or 30% of selfing). Significant differences (all p-values<0.0001) are indicated by different letters and correspond to Fisher exact test comparing number of significant (S) and not significant (NS) tests (before Benjamini and Hochberg correction) between the two methods. Asterisks stand for comparisons, within a single parameter set, between Microchecker and the alternative method. To this respect, the p-value within dioecious pangamic (Pangamy) populations with 0% selfing was p-value=0.2494. Figure 2: Statistical comparisons, within each kind of simulation, between MicroChecker and the alternative method (New method), between reproductive systems and, within the alternative method, between dataset without or with 10% stuttering to detect stuttering under the null hypothesis with no stuttering or with 10% of alleles affected, and for different system of mating (dioecious pangamy or 30% of selfing). Significant differences (all p-values<0.0001) are indicated by different letters and correspond to Fisher exact test comparing number of significant (S) and not significant (NS) tests (before Benjamini and Hochberg correction) between the two methods. Asterisks stand for comparisons, within a single parameter set, between Microchecker and the alternative method. To this respect, the p-value within dioecious pangamic (Pangamy) populations with 0% selfing was p-value=0.2494.
As can be seen from Figure 2, selfing significantly increased stuttering detection, even under H0, where it significantly appeared above the 5% threshold (p-value<0.0001).
Expectedly, stuttering was much more easily detected in populations with stuttering than under H0 (Figure 2).
The stuttering proportion used here (10%) was relatively small, since the realized actual proportion of alleles affected at the end of simulations was in general much lower on average. This also explains why the power of stuttering detection appeared quite small. With higher values, we may expect that the method proposed here will be very accurate, especially in inbred populations (selfers).

Fixation indices and linkage disequilibrium
The results for F-statistics are presented in Table 1. With 10% stuttering, we observed a significant heterozygote deficit of 4% in pangamic dioecious populations. With 30% selfing, FIS reached 20%. Here, the difference between 0 and 10% of allele submitted to stuttering was not significant (95% CI overlap).
The ratio between standard errors of FIS and FST was 1.04 on average under H0 in 95% CI=[0.95, 1.14] and reached 1.77 in 95%CI=[1.52, 2] with stuttering. It thus increased slightly with stuttering but rarely reached RSE=2 on average, as was observed in case of null alleles (De Meeûs, 2018). The proportion of locus pairs in significant LD varied between 8% and 53% depending on the population structure and mating system (average 18%). With the Benjamini and Yekutieli False Discovery Rate correction, this varied between 0% (majority of cases) and 15% (average 3%). The effect of stuttering on LD was never significant, whatever the mating system (all p-values>0.388). This could be expected since in our simulations, stuttering was not correlated between loci. In real datasets, however, it may occur that stuttering happen in samples with issues (poor preservation, mutations affecting the zone of primers' anchorage, low DNA concentration). In that case, several loci of the same individuals will be affected together, then producing fake significant LDs, as was observed for the tick I. scapularis (De Meeûs et al., 2021).

Generalized linear mixed models
The generalized linear mixed models confirmed the results seen above with more accuracy. For the number of significant tests, the results figure in Table 2. All parameters appeared to display a significant effect that stayed so after BH correction. The most important parameters were stuttering (positive effect), mating system (selfing increases the effect) and their interaction (more effect of stuttering in random mating dioecious populations). Number of subpopulations and subpopulation sizes displayed a rather weak (though significant) negative effect, but this is probably an artefact due to inconsistencies of results as a function of n or N (see supplementary File S1). For instance, N=50, under H0 with random mating, provided the smallest numbers of significant stuttering while N=100 provided more significant results than N=200. In the same framework, for subpopulation numbers, it was n=500 that provided the smallest number of significant tests, followed by 100 and 50. Similar observations can be done for 10% of alleles affected by stuttering and/or in monecious populations with 30% selfing (see supplementary File S1). Table 2: Summary of the generalized linear mixed model for the number of loci found with a significant stuttering (response variable) with the new alternative multi-subsamples method. Explanatory variables were: n (number of subpopulations), N (subpopulation size), mating system (dioecious pangamy or monoecy with 30% selfing) and stuttering intensity (0 or 10%). In case of qualitative variables (mating system), the modalities with least positive effects are compared to the one with the most positive effect (not shown in the output of the analysis). ":" stands for the interaction between two variables. Coefficient estimates (Estimate), standard error (SE), the Z statistic and its p-value are given. For LD, results are presented in Table 3. The main effects were hold by n, with a positive impact, and mating system with a strong negative impact of random mating as compared to selfing. Subpopulation sizes seemed to play a weaker though significant role. Nevertheless, the pattern of simulations explored introduced a strong collinearity between n and N. If n is removed from the model, then N become highly significant with a much stronger (as expected) negative impact (Coefficient of estimate=-0.006). Stuttering did not influence at all the occurrence of significant linkage disequilibrium between pairs of loci. These conclusions did not change when the Benjamini and Hochberg procedure was applied to these series of pvalues. Table 3: Summary of the generalized linear mixed model for the number of locus pairs found with a significant linkage disequilibrium. Explanatory variables were: n (number of subpopulations), N (subpopulation size), mating system (dioecious pangamy or monoecy with 30% selfing) and stuttering intensity (0 or 10%). In case of qualitative variables (mating system), the modalities with least positive effects are compared to the one with the most positive effect (not shown in the output of the analysis).
":" stands for the interaction between two variables. Coefficient estimates (Estimate), standard error (SE), the Z statistic and its p-value are given.
Clonal populations generated very high proportions of false stuttering detections. This was because without segregation of alleles, only some classes of heterozygotes propagate by chance. Heterozygote classes with one repeat difference are quite rare. Indeed, if all 20 alleles were present (which never happened), there would be 19 heterozygote classes with one repeat difference amongst ( 20 2 )=190 possible heterozygous states, hence a proportion of 0.1. Nevertheless, not all allele combinations were kept by drift, which means that in many instances, even if some alleles were one repeat different at the end of simulations, no such heterozygotes were kept by drift. This produced many tests with very small p-values<0.0001, but also many with very high p-values>0.5: 43 % and 53 % respectively. Put it another way, many simulations ended with no individuals heterozygous for two alleles with one repeat difference, and many others with too many of those (more than 20%): 40 % and 20 % respectively. This also explains why stuttering did not have much impact on the global results, such as FIS estimates. It means that much more than 10% stuttering will be needed to significantly affect parameter estimates in clonal organisms.
Nevertheless, given the lack of performance of the expected frequency of heterozygotes with one repeat difference proposed in the present paper, direct attempts of stuttering cure may be the best option in populations with very high clonal rate (0.9<c≤1). This would apply to all loci with excessive heterozygote deficit. If the cure successfully lowers the FIS of a given loci, then it probably means that this locus was indeed affected by stuttering. Loci with unsuccessful cure should just represent outliers that are expected in such populations with very small rates of meiotic sex (Balloux et al., 2003). Intermediate populations (0.2≤c≤0.9), should converge towards panmictic expectations, but only at equilibrium, which may take some time (Reichel, Masson et al., 2016), while smaller clonal rates should swiftly fit panmictic expectations to this respect. In such situations, , using the panmictic expectations will probably display better detection performances.

Cured data
The alleles that were indeed pooled are highlighted with the same color and can be found in excel files that are contained in the supplementary file S3 (zipped file) "PoolingProtocolCureSupFileS3.zip ".
In dioecious pangamic simulations, cured data did not entirely fix the stuttering problem. Indeed, using the new method for stuttering detection, eight tests (1.3 %) remained significant after BH correction, which is significantly more than the initial absence of significant test (Fisher exact test, p-value=0.0076), before stuttering was introduced (H0). In monoecious populations with 30% selfing, two tests remained significant (0.5 %), which is not significantly more than the initial result (one significant test) under H0 (p-value=1).
Regarding FIS estimates, in pangamic simulations, the fit between the expected values and the one observed in simulations under the null hypothesis, was not very good and better for larger n's (not shown). We thus preferred the more complex but more accurate equation 11 in (Vitalis, 2002): With this equation, the fit was very good, as can be seen in Figure 3.

Figure 3:
Comparison of Wright's FIS estimates obtained in different simulations in dioecious pangamic populations, without stuttering, with 10% stuttering and with 10% stuttering cured data. The F IS_exp , according to Vitalis (2002), is represented, as are the line of F IS =0, and 95% bootstraps confidence intervals (Li and Ls).
From this figure, we can see that stuttering had a significant impact on FIS estimates, driving it sometimes quite far above the expected value. Cured data, though always providing values higher as compared to H0, always included the expected value in their 95% confidence intervals.
As can be seen from Figure 4, in monoecious simulations with 30% of selfing, the fit is almost perfect between FIS_exp and the value obtained under H0 (no stuttering). Stuttering expectedly significantly increased observed FIS, while cured data presented values that were almost superimposed with expected ones.

Figure 4:
Comparison of Wright's FIS estimates obtained in different simulations in monoecious populations with 30% selfing, without stuttering, with 10% stuttering and with 10% stuttering cured data.
The FIS_exp=s/(2-s), is represented, as are the 95% bootstraps confidence intervals (Li and Ls) around observed values.
In conclusion, 10% stuttering significantly increased FIS and the cure used reasonably restored the FIS expected under the null hypothesis, and particularly so in inbred populations (with 30% of selfing).

Real datasets
For I. scapularis from Western USA, MicroChecker test only found one locus with significant stuttering out of nine (i.e. 11%), after exact binomial tests and Benjamini and Hochberg correction. Alternatively, the new method developed here found three loci out of nine loci with a significant stuttering (33%). Cured data provided results in agreement with a pangamic reproductive strategy, as was already observed by the authors (De Meeûs et al., 2021).
For Glossina palpalis palpalis in Côte d'Ivoire no binomial tests provided a significant result with MicroChecker statistics. For the new method presented here, two loci (B3 and XB110) out of seven (22.22%) displayed a significant stuttering. Cured data set was obtained by pooling alleles following De Meeûs et al (2019)'s rules (De Meeûs et al., 2021) (see Appendix 2, A2.1). This cure provided a slightly lower FIS=0.221 (instead of 0.231) for locus B3, but a higher one for locus XB110 (FIS=0.277 instead of 0.252), meaning that the heterozygote deficit at this locus was better explained by another phenomenon (i.e. null alleles).
In G. truncatula, after Benjamini and Hochberg correction, 50% of loci displayed a significant stuttering, while all six loci were significant with the new method (all p-BH<0.0104). Allele pooling for curing the data was as described in Appendix 2 (A2.2). This cure lowered the FIS for three loci, Lt9 (0.778 to 0.776), Lt16 (0.958 to 0.006), and Lt24 (0.966 to 0.947) and increased it for the others. However, missing data (assumed null homozygotes) explained almost 50% of FIS variation, with a highly significant Spearman's rank correlation between FIS and number of blank genotypes (ρ=0.9411, p-value=0.0025), while with the data set cured for loci Lt9, Lt16 and Lt24, the correlation dropped to a non-significant value (p-value=0.2589). When removing the correction for locus Lt16, which provided an outlier as compared to other loci, the correlation became significant again (p-value=0.0103), but with a smaller correlation (ρ=0.8804) and only 41% of FIS variation explained by missing genotypes.
For F. hepatica, no locus was significant with MicroChecker, while only one locus (Fh28) presented a significant stuttering signature after Benjamini and Hochberg correction (p-BH=0.0001). Pooling of alleles at this locus is described in Appendix 2 (A2.3). The FIS of cured data dropped from 0.644 to 0.536. However, we knew that null alleles explained well most of the observed heterozygote deficit (Correa et al., 2017), and the correlation between missing data and FIS, when excluding one locus (Fh25) that displayed too many missing genotypes, was initially significant (ρ=1, p-value=0.0417), with 83 % of the FIS variation explained by blank genotypes. In the cured data, it dropped to a not significant relationship (ρ=0.8, p-value=0.1667), with 72 % of FIS variation explained by missing genotypes.
These discrepancies strongly suggested that stuttering detection in G. truncatula and F. hepatica corresponded to type error I, due to the fact that null alleles better explain the data and probably interacted with our stuttering detection test. It shows that several checks appeared necessary before deciding that a locus is significantly affected by stuttering and requires being cured, especially in selfing species.
For the clonal T. brucei gambiense 1, we avoided using MicoChecker (for obvious reasons). With the method expounded in the present paper, three loci displayed a significant stuttering signature: micbg1, misatg4 and misatg9. These loci were cured as in Appendix 2 (A2.4). These loci presented a lower FIS when cured: from -0.647 to -818, from -0.579 to -0.72, and from -0.471 to -0.496 for micbg1, misatg4 and misatg9, respectively. Moreover, several observations suggested an improvement of the quality of the data after the cure for stuttering. The proportion of significant linkage disequilibrium between locus pairs increased from 53% (uncured data) to 80% (data cured for the three loci), and the superimposition SC increased from 0.9554 to 0.9782, from 0.9800 to 0.9975, and from 0.9539 to 0.9663 for micbg1, misatg4 and lisatg9 respectively. Three loci with missing genotypes could be suspected to display null alleles, i.e. misatg4, misatg9 and m6c8. When these loci were removed, averaged superimposition increased to SC=0.9908. Here again, we can see that some checks allowed deciding that stuttering corrections were valid and significantly improved the quality of the data. After stuttering cure, removing other loci with suspected null alleles (i.e. with missing genotypes) drove the superimposition index defined in the Material and Methods section to almost unity, i.e. a perfect fit with the expected value under full clonality known to occur in that species (Weir, Capewell et al., 2016).

Conclusions
The new method developed here appeared at least three times more efficient (and often much more) than MicroChecker. Moreover, the use of spreadsheet programs makes its portability universal for any microcomputer.
In dioecious pangamic populations, like ticks and tsetse flies, detection works well and cure improves population genetics parameter estimates but not perfectly so, which means that, for instance, FIS and FST will still be slightly overestimated in datasets cured for stuttering. So, whenever possible, removal of affected loci may help to shift such estimates towards (slightly) more accurate values.
In monoecious selfers, detection works well and cure works very well, providing other confounding factors as null alleles do not interfere, in which case avoiding stuttering cure and correct for null alleles appear more appropriate. In doubt, and for subdivision measures, curing for null alleles may be achieved by the elimination of involved loci for strong selfers, or applying the INA correction (Chapuis & Estoup, 2007), for reasonably panmictic populations.
In highly clonal populations, only loci with the highest F IS as compared to other loci, may require stuttering cure, in which case parameter estimates may be much improved. For populations with intermediate clonal rates at genotypic equilibrium, or highly sexual populations, panmictic expected frequencies will be more accurate. Nonetheless, parameters' behavior in partially clonal populations is much harder to predict and will always need extreme caution.
In any way, cure must only be kept if FIS values are improved (lower than it was initially) and special care must be devoted to the behavior of their variation in relation to other factors as number of missing data (null alleles) or superimposition index (clonal populations).
The stuttering detection and cure strategies proposed in the present paper will help interpreting microsatellite data with more accuracy and at the lowest cost. This will be particularly helpful in non-model organisms, as parasite-vector systems, for which microsatellite markers still represent the best cost benefit ratio.