A new and almost perfectly accurate approximation of the eigenvalue effective population size of a dioecious population: comparisons with other estimates and detailed proofs

The effective population size is an important concept in population genetics. It corresponds to a measure of the speed at which genetic drift affects a given population. Moreover, this is most of the time the only kind of population size that empirical population genetics can give access to. Dioecious populations are expected to display excesses of heterozygosity as compared to monoecious panmictic populations, as measured by Wright’s FIS. It can be shown that these excesses are negatively correlated with the population size. This is why FIS can be used to estimate the eigenvalue effective population size of dioecious populations. In this paper, we propose a new approximation that provides a very accurate estimate of the eigenvalue effective population size of a dioecious population as a function of the real population size. We then explore the accuracy of different FIS-based methods using the leading eigenvalue of transition matrices or coalescence. It appears that the eigenvalue-based method provides more accurate results in very small populations, probably due to approximations made by the coalescence approach that are less valid in such situations. We also discuss the applicability of this method in the field.


Introduction
(3) = 4 + 1 2 + 1 2 From equations (2) and (3), and for sex-ratios (SR) that are not too female biased (e.g. − < √ 2 ⁄ , for Equation 2), one can see that dioecy tends to slightly increase Ne. This is obviously a consequence of the supplementary delay required for two alleles to become identical by descent in the same individual, since selfing cannot occur. Another consequence is that dioecious populations are expected to display heterozygote excesses (Robertson, 1965). This led to formalizing the expected deviation of heterozygote frequency in dioecious populations, as measured by Wright's FIS (Wright, 1965), which may provide a simple tool to estimate the effective population size, assuming even sex ratios. Using simple algebra on observed and expected heterozygosity, Pudovkin et al. (1996) proposed the following equation (see Appendix 5 for a detailed proof) for the eigenvalue effective population size: (4) = − 1 2 IS They also proposed a supposedly more accurate approximation with their equation 4 (but see also Appendix 5): (5) = − 1 2 IS + 1 2(1− IS ) Balloux (2004) proposed another solution, based on the coalescent effective population size and requiring quite cumbersome analytic treatments, which are detailed in Appendix 6:

The general model of a dioecious pangamic population
For now, and unless specified otherwise, we assume a dioecious diploid population of constant size and sex-ratio, with discrete generations, no mutation and no migration. At each generation, alleles that will be present in an individual of generation t were randomly drawn with replacement in the pool of gametes of their two parents (or from infinite pools of gametes), and females and males are polygamous and mate randomly. For a dioecious population with Nf females and Nm males, probabilities of identity within individuals QI(t) and between individuals within the subpopulation QS(t) at generation t respectively are (see for instance Balloux (2004) two different alleles are sampled from the same individual with probability ½, but in that case they are identical by descent with probability QI(t-1). If the two alleles came from two different females, the probability that these two alleles are identical by descent is QS(t-1). The same reasoning applies to the twomales case. For the one-female-one-male case, the probability that the two alleles are identical is also QS(t-1).
Combining equations 7 and 8, we get: Assuming λ to be constant from one generation to the next, and dividing all terms by HS(t-1), Equation 10 writes: Note that the same results can be obtained with the leading eigenvalue of the transition matrix describing the evolution of genetic identities (Appendix 7).
For a monoecious panmictic population: and in that case: We now need to combine Equations 11 and 12 to get:

A new approximation
According to Taylor-MacLaurin's expansion series, √1 + ≈ 1 + The reasons for this discrepancy between these two sets of equations are unclear due to the lack of details in Balloux' paper. For his equation 10, Balloux simply cites Wright's book (Wright, 1969) (Felsenstein, 2019)). Appendix 6 provides a detailed proof for Balloux's equation 10 in the (simpler) case of even sex-ratios.
In Figure 1, it can be seen that the first approximation found in Wright's book (Equation 1), as in all population genetics textbooks, strongly underestimates Ne, except for very big populations (as expected), as compared to other approximations. Wright's second equation and Balloux's one seem to display an equivalently small bias, though in varying directions for Balloux's equation, depending on the sex-ratio. This can be seen with a study of the sign of ΔNe_B=Ne_Eq3-Ne_Eq13 (see appendix 9), where we notice that with the unique valid root of ΔNe (SR2), Balloux's equation will provide an over-estimate when SR>SR2, an under-estimate when SR<SR2 and will be accurate when SR=SR2=3-2√2≈0.1716 (e.g.SR≈1/6; SR≈4/23). From there, it is easily deduced that, in Figure 1, positive values of ΔNe_B correspond to SR>SR2, negative ones to SR<SR2, and close or very close to accuracy around this threshold (for instance 35/204 is very close to SR2). This bias is very small when Ne>10 (Figure 1). Finally, the new simplified estimate proposed in Equation 15 perfectly fits to Equation 13, except for very small Ne<4 where a very small overestimate can be noticed (Figure 1). Figure 1 -Comparisons of the performances of different approximations of effective population size (Ne_a) in dioecious populations with uneven (left) and even (right) sex-ratios, as compared to equation 13 (Eq 13) (Ne). Performance was measured as Δe=(Ne_a-Ne)/Ne, with Wright's equations 1 and 2 (Eq1 and Eq2 respectively), Balloux (Eq3), and the new simplified estimate of the present paper (Eq15).

Estimating the effective population size from Wright's FIS
As seen above with equations 4, 5 and 6, heterozygote excesses expected in pangamic dioecious populations as computed by Wright's FIS, can give access to an estimate of Ne from genotypic data. In the following, we propose other FIS based estimates.
Let Hexp and Hobs be the expected (under Castle-Weinberg expectations) and the observed proportion of heterozygotes in the population, respectively. We can express these proportions as the probabilities of drawing at random two different alleles in the population HS(t) and in an individual HI(t) respectively at any generation t: Hexp=HS(t) and Hobs=HI(t). Finally, from equation 7, we can see that HI(t)= HS(t-1). If we take Nei's parametric definition of FIS (Nei & Chesser, 1983): We can combine equations 10 and 17 to obtain: Now, combining equation 18 with equation 12 yields: This result is the same as Equation 4 plus one half of an individual. We can also express N as a function of FIS in a dioecious population with an even sex-ratio (Appendix 10): Here, N can be called the effective number of breeders, which must not be confused with the effective population size. Notice that equation 20 differs from equation 4 by the subtraction of half an individual.
If we use Equation 20 in Equation 16 and rearrange the formula we get (Appendix 10): We can here note that, if we use equation 20 in equation 2 (Wright (1969), page 197), with an even sex-ratio, we obtain equation 4 (first FIS based estimate of Pudovkin et al. (1996)).
We have estimated FIS based Ne under various population sizes and sex-ratios, using the expected FIS computed in Appendix 11: In the Figure 2, it can be seen that, as compared to expected values (Equation 13) Equations 5 (Pudovkin et al., second equation) and 19 (simple equation of the present paper) tend to over-estimate Ne as compared to other estimates, unless population size becomes big enough. Equations 4 (Pudovkin et al., first equation) and 6 (Balloux) present an equivalent bias but in the opposite direction (under-estimate and over-estimate, respectively). This bias is only visible for small Ne's (i.e. Ne<10). Interestingly, the average of these two is exactly equation 21 of the present study, which fits perfectly with the expected one, with an extremely small over-estimate for the smallest Ne<3. ) estimates of effective population size (Ne_a) in dioecious populations, as compared to equation 13 (Eq 13) (Ne), measured as Δe=(Ne_a-Ne)/Ne, with Pudovkin et al. equations 1 and 2 (Eq 4 and Eq 5 respectively), Balloux (Eq 6), and the new simplified estimates of the present paper: the most simple (Eq 19), and the final one (Eq 21).

Discussion
Genetic drift can be influenced by several factors other than dioecy, population size and sex-ratio: reproductive variance, non-overlapping generations, changes in population size and/or sex ratio, subdivision and selection. Such complications may lead to very cumbersome algebra if one wanted to present a more general expression for the effective population size.
The main goals of the present study were, in decreasing order of importance. 1) to present an improved approximation of the classic case where only dioecy and/or uneven sex-ratio have an effect and compare it to previous approximations; 2) to present detailed derivations of both the old and new results that would be accessible to most readers; and 3) to provide an improved FIS estimate deriving from this new approximation, to be used in empirical population genetics studies instead of the two most often used methods: Pudovkin et al. (1996) and Balloux (2004) (11 and 8 citations per year respectively according to Google scholar's based research in the computer program Publish and Perish 8.8.4275.8412 (Harzing, 2007)).
We do not expect any natural population to closely follow the model we explored in the present work. Nevertheless, excluding reproductive system variation and selection, any combination of the factors mentioned above will tend to reduce Ne and FIS accordingly, and most probably their variance, with not much harm to the average expectations. Partial selfing (e.g. deriving from some kinds of automictic parthenogenesis of females (De Meeûs et al., 2007)), or partial sib-mating, should be easily detected, produce generalized heterozygote deficits and thus exclude our method. Clonal propagation (e.g. through a special kind of endomitotic automixis of females (De Meeûs et al., 2007)) should also be easy to detect (De Meeûs et al., 2006) and again the FIS based method would be dismissed. Selection is locus specific and should only affect one or two of the loci used, which consequently should be easy to detect and exclude. Subdivision can have two effects: if there is a Wahlund effect, this should be easy to detect (Manangwa et al., 2019); and if not, highly subdivided populations should exhibit effective sub-population sizes that are very close to the one that these would exhibit if totally isolated (because rare immigrants are not expected to display much influence), and if not, subsamples should all converge toward the total effective population size, which should be easily detected too (Ravel et al. (2023), see also Waples & Do (2010), and Waples & England (2011)). Additionally, in most situation, empiricists have no idea of the effective sex-ratio, of the scenarios regarding how generations overlap, or how population size fluctuates across generations. Consequently, complication of estimates will neither allow an easy understanding of the mathematical developments, which was an important goal of the present work, nor take into account with certainty the real scenarios that occurred in the population under investigation. This is why, even if the new estimate will hardly give significantly different values as compared to the previous ones, we think it is still better using the theoretical one that is closer to the exact expected value (equation 21) and interpretable on more sound biological means (see below).
The new approximation proposed in Equation 15 is equivalent to what is expected in large dioecious populations (Equation 1), plus half an individual, plus half of a coalesced individual in a large dioecious population. As far as we know, such added quantities were never discussed before (but see Felsenstein (2019), page 266). We can here try to provide some biological interpretation of such quantities. One half of heterozygosity is lost when an individual reproduces by selfing. In a panmictic population (i.e. monoecious) of size N, a proportion 1/N of gametes are produced by selfing (Rousset, 1996), in which case half the genes coalesce in the progeny (PCM=1/2N). This can happen in the N individuals. Hence (1/N)×(1/2)×N=1/2 individuals are produced with coalesced genes per generation through selfing in a WF population. This means that one half of such coalescent event does not happen when random selfing is forbidden, as it is necessarily the case in dioecious populations. Additionally, in very big dioecious populations, Ne≈4NfNm/N (Equation 1). For any diploid population, the instantaneous probability of coalescence is PC=1/(2Ne) (see Laporte & Charlesworth (2002), Equation 7;or Nomura (2008), Equation 3). Consequently, for a very big dioecious population, this probability becomes (Equation 1) PCBD=(1/2)×N/(4NfNm). The number of individuals concerned are those that inherited twice the same allele from their grand-parents, which is (1/Nf)×(1/4)×Nf for females and (1/Nm)×(1/4)×Nm for males, hence ½ individuals. This yields PCBD×(1/2) individuals. In small dioecious populations, such coalescent events hardly happen, because as long as polymorphism is kept, male and female parents that mate randomly can only rarely have sampled twice the same alleles from the same grand-parent. These two differences with 1) panmictic populations, and 2) big dioecious populations, may explain Equation 15. In other words, if we call Ne_BD the effective population size of a very big dioecious population, NNCNS the number of individuals that cannot be coalescent due to the absence of selfing and NNCSD the number of individuals that cannot be coalescent in a dioecious population due to the limited number of possible mates, then, in small dioecious populations, the effective population size is Ne_SD = Ne_BD + NNCNS + NNCSD.
Interestingly, the highly sophisticated, and fairly complicated to compute, Balloux's equation (Balloux, 2004), Equation 3 in the present paper, did not perform better than Wright's second equation (Equation 2), and worse than our Equation 15. As shown in Appendix 6, the coalescence effective population size was obtained after neglecting different terms at several successive steps of the analytical process. To be as accurate as Equation 21, Equation 2 indeed requires Ne > 10 and/or a sufficient number of generations (e.g. 10). As seen from the Figure 1, this seems to indeed apply as long as Ne>12. No explanations were provided for the abstract notion of the coalescence effective population size and the way used to weight approximated coalescent times computed at different hierarchical levels (e.g. individuals, subpopulations, etc…). What we were interested in, in this paper, was to compute the local effective population size, i.e. the one that affects the speed of polymorphism loss within subpopulations. In that case, the eigenvalue effective population size may be the most accurate.
Regarding FIS-based estimates, the fact that Pudovkin et al., 2 nd equation (Pudovkin et al. (1996), equation 5) did not perform well, probably comes from the confusion between effective population size and the number of individuals (or of breeders), at different steps of the analytical procedure. Equation 19 provided similar results as equation 5, though with a slightly stronger bias and is thus too biased also. Pudovkin et al., second equation (equation 5 of the present manuscript), is quite popular in empirical population genetics studies, and is the one implemented in NeEstimator (Do et al., 2014). It presented underestimates, even when Ne>20. Balloux's equation (equation 6), also popular, suffered from an overestimation of Ne, in a symmetric position as compared to underestimations of Equation 4 (Pudovkin et al., first equation). For both, the biases are small, particularly so when Ne>4. Interestingly, following the steps described in Balloux's paper (Balloux, 2004), but replacing the coalescence approach by the leading eigenvalue approach, provided the most accurate FIS-based estimate of the effective population size in dioecious populations (Equation 21). It appeared to exactly correspond to the average between Equations 4 (Pudovkin et al., first equation) and 6 (Balloux).
It is worth recalling that the FIS-based estimates given in Equation 21 assumes an even sex ratio. Nevertheless, strongly biased sex-ratios will affect FIS accordingly and should not have strong consequences on the estimate of effective population sizes, as suggested by the Figure 2.
We may also bear in mind that although random mating was assumed, we did not specify any reproductive strategy (mono or polygamy). Indeed, Equation 8 assumes polygamy, but monogamy is known to lead to the same result as polygamy, as demonstrated pages 267-268 in Felsenstein's book (Felsenstein, 2019), and in Appendix 12. The only difference is that, in monogamous populations, the sexratio of individuals that reproduce is necessarily even. Consequently, monogamy can prevent a possible high variance in male mating success, which would reduce Ne. But monogamy cannot produce an increase of Ne as compared to pangamic polygamy. In that sense, and everything else being equal, gibbons (which are monogamous) should preserve better their genetic diversity than gorillas, but not better than bonobos (assuming bonobos are pangamous).
It is worth mentioning that these computations were based on accurate (exact) measures of FIS. Unbiased estimators of F-statistics (Weir & Cockerham, 1984) suffer from large variances (Robertson & Hill, 1984). It is thus likely that deviation from the real value will have a large impact on FIS-based estimates of Ne, especially for small expected ("real") values. It can be seen that Ne_Eq4<Ne_Eq21<Ne_Eq6<Ne_Eq5<Ne_Eq19. From there, it can be expected that with small underestimations of FIS, Ne_Eq6 will be closer to the real value; then Ne_Eq5 for bigger underestimations, and finally Ne_Eq19 for the strongest underestimations. On the contrary, overestimations of FIS will necessarily lead Ne_Eq4 to stay closer to the real Ne. However, the differences are expected to be quite small, particularly so as compared to Pudovkin 1 (Equation 4) and Balloux (Equation 6), especially for the highest values (e.g. Ne>6): Nevertheless, not knowing what the real FIS should be, it is probably wiser using the less biased estimate, i.e. Ne_Eq21.
It is also worth mentioning that FIS should be estimated from adults, as the genetic structure in immature individuals may considerably differ from the one they would display in the pool of adults that survived.
The fact that our equation 21 outperformed other equations for Ne<4-6 may suggest strong limitations in the practical applicability of this performance since such systems may be expected to quickly undergo extinction. In addition to the fact that it is generally preferable to work with the most accurate equation, these results are likely to be especially pertinent for certain types of biological systems that are able to persist for extended periods despite having very small effective population sizes. For instance, the populations of the parasitoid wasp Nasonia vitripenis, depending on the distribution of its host (parasitic flies), often display systematic mating of females with their brothers (Werren, 1980). For the mite Varroa destructor, a female enters a brood cell, which she caps, where she feeds on the bee larva and then gives birth to a haploid male, which later mates with its diploid sisters, laid by the mother from fertilized eggs from a previous mating that occurred before the colonization of the brood cell, or after mating with her son (which may happen for 30% of females) (Beaurepaire et al., 2017;Häußermann et al., 2020). In both cases, males are produced by arrhenotokous parthenogenesis (unfertilized haploid eggs), meaning that many populations of these organisms probably display very small Ne, and even smaller than 1 in some instances. We did not undertake an extensive review of similar cases, as it is not in the scope of the present paper, but such kind of situations may not be rare in dioecious parasitic organisms like parasitoid hymenoptera, mites or nematodes.
According to recent papers based on simulations (i.e. perfect data), FIS-based single sample (or subsample) estimates of Ne are not the most accurate (Wang, 2009;Do et al., 2014;Wang, 2016). According to Do et al. (2014), the linkage disequilibrium (LD) based estimate (Waples, 2006), appeared to perform better than the co-ancestry method (CoA) (Nomura, 2008) and the FIS-based method (Equation 5) (Pudovkin et al., 1996). According to more recent simulation studies (Wang, 2016), the sibship frequency based estimate (SF) (Wang, 2009) seemed to provide more accurate results than the previous ones. No comparison was ever made with an alternative method based on one and two locus identity measures (1&2LI) (Vitalis & Couvet, 2001c, 2001b, implemented by the software Estim 1.2.2 (Vitalis, 2002), updated from Estim 1 (Vitalis & Couvet, 2001a). Based on simulations, the 1&2LI method provided accurate (though slightly underestimated) results, especially when more than four loci were used (Vitalis & Couvet, 2001b). Again, no simulation study exhaustively compared all available one-sample estimates. This would require replicated simulations of different scenarios of population structure (Island or stepping stone models with varying subpopulation number, sub-population sizes and immigration rates), different kinds of loci (microsatellite like or SNP like loci) with varying number of loci, number of alleles and mutation rates, and with or without amplification problems (null alleles, stuttering, short allele dominance or allelic dropouts), and varying sampling strategies. A comparison with temporal methods (Nei & Tajima, 1981;Pollak, 1983;Wang & Whitlock, 2003;Jorde & Ryman, 2007) might also prove interesting, though the number of generations between two samples of the same site will add another relevant parameter to explore (Waples & Do, 2010). This will obviously require much more work to undertake, which is beyond the scope of the present paper.
We nevertheless undertook a quick simulation study with Easypop (Balloux, 2001). We simulated single isolated and randomly mating dioecious populations, with varying sex-ratio, at 100 independent loci with a KAM model of mutation with K=100 possible allelic states and a mutation rate of u=0.00001, and 100 generations. All simulations started with maximum diversity. We then computed effective population sizes. We computed FIS with Fstat (Goudet, 1995). For these simulations, most of the averaged FIS across loci were positive and therefore could not be used to estimate Ne. We then preferred computing the average across loci displaying a negative FIS. For NeEstimator analyses (LD and Coancestries), we assumed polygamy and kept estimates excluding alleles less frequent than 5% (LD method). For Estim (1&2LI), we assumed panmixia. For Colony (SF), we generated data using Create (Coombs et al., 2008) and assumed polygamy and some inbreeding, as this may occur at unknown level in real data. Figure 3 illustrates what kind of variations could be observed from one parameter set to the other and from one method to the other. It suggests that some kind of average across methods may allow grasping the range of actual effective population sizes of sub-populations from genotyped sub-samples. Regarding real data, quoting Nomura, "combined estimate of several independent estimates is expected to improve the precision of separate estimates" (Nomura, 2008). For each method, one could compute the average Ne across subsamples of the same population, ignoring undefined values (negative or infinite), note the maximum and minimum values obtained and keep the number of usable values as a weight. Finally, the grand average (across methods) and average minimum and maximum, all weighted by the number of usable values obtained in each method, could be computed. For more clarity, a template of this method can be found in the file "TemplateRhipicephalusFstatResNeFiveMethods.xlsx", coming from the analysis of cattle tick populations from New-Caledonia (De Meeûs et al., 2010). With all data, average Ne≈120 in minimax≈ [80,200]. When excluding the two most extreme values Ne≈50 in minimax≈ [10,110]. Using the harmonic mean, as suggested by Nomura (Nomura, 2008), Ne≈20 in minimax= [10,30]. Simulation studies could be used to identify an estimator that more accurately approximates the eigenvalue effective population size of genotyped populations.
Temporal data are rarely available (but see Palstra & Ruzzante (2008)), but when these are, they give access to different estimates, which may be usefully included in the computations of averages and magnitude of variations.
Undefined Ne may correspond to very big values. Thus, ignoring these may lead to underestimates. They may also correspond to the variance of estimate of the parameter used, like FIS, as mentioned above. This possible flaw may be attenuated by the use of repeated subsamples and independent loci. Waples and Do proposed to include negative Ne as such in the computation of an harmonic mean, with weights proportional to reciprocals of variances (Waples & Do, 2010). Nevertheless, on the tick data set, this strategy ended with globally negative (and then unsound) values for these populations (not shown), which are expected to display important population sizes (i.e. 120 ≤ Ne ≤ 1200 (Koffi et al., 2006)). As already discussed, this will require a more thorough exploration through simulations of various kinds of populations.
To conclude, even if the differences with some other equations are not very big, the new approximation proposed here appeared almost perfect and biologically relatively sound, and, wherever it is used for, we suggest to use it instead of the previous and more biased estimates found in the literature.

Acknowledgements
Thierry de Meeûs would like to thank François Rousset, François Ziegler and members of the Maximadiscuss list for recurrently answering to his queries and for giving him tips that really unblocked the advances of the present work. Special thanks to Joseph Felsenstein who gave the idea of using Taylor-MacLaurin series, which triggered the transformation of a small paragraph of a book into a full article. We also are grateful to Robin Waples, Jay Taylor and two anonymous referees, whose comments considerably helped improving this manuscript, as Myriam Heuertz for her significant editorial inputs. This work was entirely financed by the Institut de Recherche pour le Développement (IRD) (recurring subsidies of UMR Intertryp). Preprint version 6 of this article has been peer-reviewed and recommended by Peer Community In Evolutionary Biology (https://doi.org/10.24072/pci.evolbiol.100651, Baer (2023)).

Data availability
Maxima scripts are provided in the section Scripts at the end of the manuscript. Simulations used for the graphic of Figure 3 are available in the file "Simulations-1st-pop.zip". This file, the template "TemplateRhipicephalusFstatResNeFiveMethods.xlsx", with the corresponding dataset "BoophilusAdultsDataCattle.txt", and the file VarDif.pdf (computation of the variance of a difference) are also available online at https://doi.org/10.5281/zenodo.7945369 (De Meeûs, 2023

Appendices
Appendix 1: Matrix multiplication, identity matrix, matrix determinant and matrix inversion. By convention, matrices and vectors appear in bold, while scalars write in italics, and matrix multiplication is noted by a point.
Let matrix A and vector x be: ).  Consequently, the reverse of matrix A is: We know that ad-bc=Det(A), thus: When Det(A)=0, A is singular. When Det(A)≠0, A is nonsingular. A nonsingular matrix is necessarily squared.
To compute the inverse of a 33 matrix, it is easier using a mathematical software as Maxima (Vodopivec, 2017) (command invert(A)).

Appendix 2: eigenvalues and eigenvectors
For the sake of space saving and simplicity, we will take the example of a 2×2 matrix X and a two lines column vector e: If λi is an eigenvalue and ei the corresponding eigenvector of matrix X, then they must satisfy the equation: X.ei=λi.ei. We can translate this into the system of equations: { 11 × 1 + 12 × 2 = 1 21 × 1 + 22 × 2 = 2 Excluding the trivial solution where e1=e2=0, we can rewrite the preceding equations as: We can go back to EqA2-2 to obtain eigenvalues as function of x21 (as is the case in certain textbooks), this leads to:

Appendix 3: Matrix power and diagonalization
Computing matrix powers is difficult, except for diagonal matrices. Indeed, using equation A1-2, it is easy to see that: For any other squared matrix A, it may thus be useful to diagonalize it, if one wants to compute any power of it. In Horn and Johnson's book, page 59 (Horn & Johnson, 2013), we are invited to solve the equation: where P is an invertible matrix and D a diagonal matrix. Let A, P and D, v1 and v2 be:  We recognize what we saw about eigenvalues and eigenvectors in Appendix 2, meaning that matrix S is a combination of eigenvectors of A, and D is a diagonal matrix with matrix A's eigenvalues on the diagonal from the bigger (top left) to the smallest (bottom right).
From there, computing the power of any matrix A is relatively easy. Indeed, if we have P -1 .A.P=D, then this also writes P.P -1 .A.P.P -1 =P.D.P -1  A=P.D.P -1 . From there, computing A t is easy: where I is the identity matrix. From there, we can compute: Consequently, we can use equation (A3-3) to calculate the power of any diagonalizable square matrix.
We can now derive some other properties of eigenvalue-eigenvector pairs (eigenpairs). For the 2×2 matrix A, with the eigenpairs λi and ei, where i stands for 1 or 2, A.ei=λi.ei. Then
Let v be a vector composed of a combination of eigenvectors of matrix A so that v=∑ . , where the xi's are scalars that can be computed. We can then write:  This property can be used to any power function of matrices. In particular, for the matrix S=(I-γ.A), which should be invertible, it is easy to see that the eigenpairs of S are (in decreasing order of the hierarchy) λ1'=1-γλ2, e1'=e2 and λ2'=1-γλ1, e2'=e1. Indeed, if we take the eigenvector ei, then: S.ei=ei-γ.A.ei  S.ei=eiγ.λi.ei  S.ei = (I-γ.λi).ei (QED).
Using equation A3-5, we can write: This logically yields:

Equation A3
-6 corresponds to equation A.10 of Rousset's book (page 219), which was given without any detailed proof.

Appendix 4: eigenvalue effective population size
This notion refers to the evolution of heterozygosity, or more exactly genetic diversity, defined as the probability, at generation t, to draw randomly two alleles that are not identical by descent, from one generation to the other, and labelled Ht-1 and Ht. If the evolution of a population by genetic drift has reached a steady state, the ratio of Ht/Ht-1 remains constant generation after generation and has been shown to correspond to the leading eigenvalue of the transition matrix for the evolution of vectors of genetic identity probabilities (see below).
Let QI be the probability of identity within diploid individuals, and QS, the probability of identity between two alleles from two individuals of the same population. In an ideal population of size N, and thus under panmixia (thus here QI=QS), we can set the system of equations: If we replace identities with their corresponding values in terms of genetic non-identity (hence diversity), we obtain: Assuming a steady state, so that HSt/HSt-1=HSt-1/HSt-2=λ, we can set: The ratio of genetic diversities of generation t and t-1 is indeed the leading eigenvalue of the transition matrix describing the evolution of genetic diversities (and of genetic identities as well) (QED).
We can also determine c1 and c2, if genetic diversities at time 0 are known. From equations A4-3 and A4-7, we know that: Now if we combine equations A1-8 and A1-10 we can compute, for HS (which is here the same as HI): This results confirms that, at any generation HI=HS, and hence c2=0. We can then simply write, for the Wright-Fisher model: where HS is the local genetic diversity at time 0, and λ1 id the leading eigenvalue of the transition matrix A. From equation A2-4, it is very easy to see that λ1+λ2=x11+x22, and that λ1>λ2. From matrix A defined for the WF model, we can see that all xij are probabilities that only sum to 1 for the Castle-Weinberg model and hence x11+x22≤1. In the case of WF, the difference between λ1 and λ2 is even very big (equation A4-6). Consequently, when t becomes big enough λ1 t >>λ2 t and equation A1-12 can be approximated as: Combining A4-13 with A2-5 yields: From there, and for any HX (X=I or S), it is straightforward that the ratio Ht/Ht-1=λ1. From equation A4-6, we know that the leading eigenvalue of the transition matrix of a fully panmictic model (i.e. WF) is λe=1-1/(2Ne) and thus, the effective population size of any non-reference population will follow λ1=1-1/(2Ne), which is equivalent to: Consequently, the eigenvalue effective population size of any population will be: where λ1 is the leading eigenvalue of the transition matrix, describing the evolution of genetic diversities (or same wise of genetic identities) from one generation to the other, for that population. All these detailed explanations leading to equation A4-15 provide the same result as equation 3.105 in Ewen's book (page 120) (Ewens, 2004), which was given with much more elliptic explanations.
It is also worth noting that equation A4-15 is only accurate when t is big enough, or when the population has reached a steady state so that the ratio Ht/Ht-1 becomes constant and equal to λ1.

Appendix 5: Pudovkin et al.'s methods to compute Ne
Let pf and pm be allele frequencies of one of two alleles at a given locus in females and males respectively, in a population with an even sex-ratio. Then, in the progeny, the proportion of heterozygotes observed should be Hexp-dio=pf(1-pm)+(1-pf)pm. In this population, the frequency of this allele will be (pf+pm)/2. Consequently, the expected frequency of heterozygotes under the panmictic (monoecious) model in the progeny (Hexpmon) would be: Please note that Hexp-dio and Hexp-mon here correspond to Hobs and Hexp respectively in (Pudovkin et al., 1996). There is thus an observed heterozygote excess in the progeny.
The quantity pf-pm can be considered as a random variable with average 0 over all possible parental groups. If we consider that the frequency of the first allele was p in the parental population, then the average of (pf-pm)² is the variance of a difference in allele frequencies between two binomial samples of size N for each gender (N alleles in females and N in males=2N alleles in total). The variance of frequency of a given allele randomly taken in a population of size N, is p(1-p)/N, where p is the frequency of the allele in the parents. The variance of a difference between two uncorrelated (e.g. independent) random variables is the sum of individual variances (see the file VarDif. In their appendix, Pudovkin et al. then used a sleight of hand. They set N=Ne again, 2p(1-p)=Ht-1 and Hexp-mon=Ht, and used the equation λ= Ht/Ht-1, citing Kimura and Crow's book (Crow & Kimura, 1970). Then, with p(1-p)=Ht-1/2, Ht-1=Ht/λ and Hexp=Ht, we can rewrite A5-2: Pudovkin et al. used another sleight of hand from equation 3.11.8 (page 104) from Crow and Kimura's book (Crow & Kimura, 1970), replacing subpopulation sizes N by Ne (again) and obtained: The way Pudovkin et al used this equation may be inaccurate because Crow and Kimura's equation refers to the number of individuals, not the effective population size. Nevertheless, if we combine equations A5-6 and A5-7 we obtain: From equation A5-4, and setting that Ht is the expected heterozygote frequency in the progeny, hence Hexp-mon, we can rewrite equation A5-8 as:

Appendix 6: Coalescent effective population size in a dioecious pangamic population
Let QI and QS be the probabilities that the same allele is sampled twice, either in one individual or in two distinct individuals from the same population. Let u be the mutation rate per generation in an infinite allele model where each mutation event producesa new allele that never existed before (no homoplasy). Then, for a dioecious population with an even sex-ratio and random mating, we can set the following recurrences between generation t and t-1 (Equations 7 and 8 with an even sex-ratio and mutation rate u) (see equations 7 and 8 with Nf=Nm).
Let Qt the vector of genetic identities at time t and A be the squared matrix of transition for genetic identities, v the corresponding vector of residuals, and I the identity matrix. If γ=(1-u)² is the probability that two alleles taken at random did not mutate, then we can write: = ( −2 + ) +  = 2 2 −2 + 2 +  = 2 2 ( −3 + ) + 2 +  = 3 3 −3 + 3 2 + 2 +  = ( −1) ( −1) 1 + ( −1) ( −2) + ( −2) ( −3) + ⋯ + 2 1 + 1 0 Assuming that equilibrium values has been reached at time t (t→): We can see that the second term in these equations will increase with t, albeit at a diminishing rate, while the first term will decrease with t.. Hence, if inbreeding within individuals and within subpopulation are small enough at time t=1, after a sufficient number of generations, and using equation A3-5 and decomposing v as v=∑ . , where the xi's are scalars that can be computed and ei are eigenvectors of A, we can approximate equation A6-4 as: This is the same as the second part of equation 4.10 in Rousset's book, page 56 (Rousset, 2004). It is worthy of note that such an approximation is invalid in populations with poor levels of genetic diversity in the first generations.
We can also compute Q at equilibrium. For this we set equation A6-2 as: We can use equation A3-6 to obtain: This equation corresponds to the first part of equation 4.10 given in Rousset's book.
We can also express Q as a function of probability of pairwise coalescence at time t. If we define a vector Ct of such probabilities within individuals and between individuals (to stick to our framework with two hierarchies), we can write: In equation CI(t) and CS(t) are the probabilities, at time t, that two alleles of one individual (I) or of different individuals in the subpopulation (S), respectively, all randomly chosen, had coalesced somewhere in the past. At equilibrium, or after a lot of generations, identities will correspond to the sum of all coalescent events that occurred in the past, and if no mutation ever occurred, and hence: Combining equations A6-5 and A6-9 provides the following equality: This means that: This equation meets with equation 4.11 page 56 in Rousset's book (Rousset, 2004). The mean coalescent time between two alleles in hierarchy J TJ(t) (J=I or S for the example treated in the present paper) at time t, can be computed as the sum of the products of the time of each event of coalescence by the probability of coalescence at that time for these two alleles of J (Rousset, 2004) (page 59), in vector format: Please note that in Rousset's book or other papers n=∞.
Using the result of equation A6-10 we get: for the general case of any squared transition matrices (for the present case this sum stops at λ2).
The eigenpairs and scalars are constant through time and we can for now focus on the different sums, Si of each eigenpair of order i: = ∑ ( −1) =1 = 1 0 + 2 1 + 3 2 + ⋯ + ( −1)  = 1 + 2 2 + ⋯ + ( − 1) −1 + We can then set: We can again use the fact that: We can now replace this Si' in Si to obtain: Now, using this Si in equation A6-12 yields: Here, simplifying equation A6-13 is possible, but at the expense of another approximation. In the case of an isolated dioecious subpopulation, numerical applications suggested that if n big (i.e.n>400 generations) or if the subpopulation is big enough (N>4) and n>10, then equation A6-13 can be approximated as: For a dioecious population with random mating and even sex-ratio we can write Equation A6-2 (see also A6-3) as: Qt=γ(A.Qt-1+v). If we combine equations A6-14, and A6-16, we get:

Eigenpairs of matrix
[ 1 + 2 2 − 2 2 − 1 − 1 2 + 2 1 (1 − 1 ) 2 (1 − 2 ) 2 ] S ≈ 1 2 ( 1 − 2 ) [ 1 (1 + 2 2 − 2 2 ) − 2 (1 + 1 2 − 2 1 ) If we use A6-15 in A6-17, we obtain: This result is the same as in Balloux's paper (Balloux, 2004) (equation 15) with an even sex ratio. For a panmictic population of size Ne: The corresponding transition matrix has the following eigenpair:  From there, it is easy to understand that the coalescent effective population size can then be defined as in equation 17 of , i.e.: where ̅ is the weighted average of the different Ti's, here: The weights in fact correspond to the probabilities to sample two genes from the considered hierarchy: within one individual, from two different individuals within the same sub-population, from two different sub-populations from the same archipelago, etc… In our context, for a dioecious and isolated population of size N with an even sex-ratio, combining equations A6-18, A6-22 and A6-23 leads to: -24 is exactly the same as equation 10 in Balloux (2004). To give a biological meaning to this result, it corresponds to the census size (or number of breeders) plus half an individual that would have been coalescent through random selfing in a WF population, plus one coalescent individual that would occur in a WF population (which may sound redundant with the second).

Appendix 7: Matrix method to compute eigenvalue effective population size in a dioecious population
Let QI(t) and QS(t) be the probabilities of identity between two alleles at time t within individuals and between individuals in a dioecious random mating population. We can then use Equations 7 and 8 in the main text: To save time we used wxMaxima to find the leading eigenvalue of A (see Script which is the same as equation 11 in the main text (QED).

Basic notions about derivative functions
Readers acquainted with derivatives can skip this first section. The derivative of a function f(x) describes the orientation and speed of variation of this function, measured between two points separated by a distance Δx that tends to 0: ∆ For the present paper, we will need to compute the derivative of several functions. For instance for the function f(x)=x n , then: ( + ∆ ) − ∆ For any n, beginning with n=2 or 3, it is easy to show that: is a function of x with one term in x m<n , one term in Δx n-2 and other terms in xΔx, so that the limit when Δx→0 necessary is nx n-1 . Then f'(x)=nx n-1 . It is easy to see that the derivative of a sum of functions is simply the sum of derivatives of the different functions of this sum.
Next, we need to compute the derivative of f(g(x)) or more correctly (f ○ g)(x). For this, it will be easier to change of notation:

Examples of Taylor-MacLaurin's expansion series
We need to find a proxy for √1 + and for 1/(1-X), when X is small.
Using the same approach as for g'(X): If we now use Taylor for ( ) = √1 + in the neighborhood of a: When a→0, we get: The same result can be obtained with the command "taylor(sqrt(1+X),X,0,3)" in wxMaxima (Vodopivec, 2017).

Appendix 10: Balloux's like method to compute FIS based Ne
Let QI and QS be the probabilities to sample twice the same allele in one individual and between individuals from the same population, respectively, u the mutation rate, then, for a dioecious population with an even sex-ratio and random mating, we can set the following recurrences between generation t and t-1 (see equations 7 and 8 with Nf=Nm): The same results can be obtained with classic algebra, without the use of matrix computations, but it is much faster this way. This is also the same results as equation 8 in Balloux's paper. It is worth mentioning here that equation A10-3 can also theoretically give access to the census size of individuals in the population (N) or, more precisely, to the exact number of adult parents of the individuals in the population, that some may call the effective number of breeders: (A10-4) = − 1+ IS

IS
If we go back to equation 16 of the main manuscript, we can compute the eigenvalue effective population size as: (A10-5) ≈ + Now, with a stronger approximation, Ne≈N+1/2, which, combined with equation A9-4 yields Pudovkin et al.'s equation 3 (see equation 4 of the present manuscript). We can also notice that A9-6 is the average of equations 4 (Pudovkin et al. second equation) and 6 (Balloux).

Appendix 11: Equilibrium value for FIS in a dioecious population (general case)
For this we need to use equation A6-1 and add a mutation rate u so that equation A6-1 becomes: (A10-7) Terms in u are small in front of 1 so that equation A7-2 can be simplified as: (A10-9) IS ≈ − + + +8 Appendix 12: Effective population size of an isolated monogamous population We will use the same notations as in other sections. Monogamy implies an even sex ratio in the pool of adults that are involved in a mating. For the identity within individuals, the recurrence stays the same as in polygamous populations. The recursion for the identity between individuals can be determined by conditioning on the ancestry of the sampled pair in the previous generation. One possibility is that the two