Section: Genomics
Topic: Genetics/genomics, Statistics
Conference: JOBIM

localScore: an R package to highlight optimal and suboptimal segments in a sequence with associated p-values computation

Corresponding author(s): Déjean, Sébastien (sebastien.dejean@math.univ-toulouse.fr)

10.24072/pcjournal.650 - Peer Community Journal, Volume 5 (2025), article no. e141

Peer reviewed and recommended by PCI

Highlighting atypical segments of a sequence is an important goal in very diverse domains. For the case where nothing is known in advance about the length of the segment to be highlighted, Karlin and Altschul defined the local score for biological sequence analysis in 1990, and an asymptotic approximation of its distribution was proposed in 1992. Many other theoretical results are now available for establishing the p-value of the local score in different contexts. We have developed an R package that brings these results together for a sequence modelled by independent and identically distributed (i.i.d.) variables or by Markovian variables. It computes the local score, the sub-optimal scores and their positions, and establishes the p-value of the local score using the various theoretical methods available to date. An automatic analysis is also provided, which applies the most appropriate method to the sequence analyzed. Here we present the software package and various application examples, together with comparisons with other tools used in each application context. The localScore package is available on CRAN under the GPL-2 license (core program) and various licenses for the embedded Eigen library.
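As an illustration of the quantity the abstract describes, the following minimal Python sketch computes the local score of an integer score sequence with the Lindley recursion (the process named in the keywords), and estimates a p-value by simulation under an i.i.d. model. It is not the localScore package's code or API: the function names `local_score` and `monte_carlo_p_value` are hypothetical, and the Monte Carlo estimate is a simple stand-in chosen for illustration, not one of the theoretical methods gathered in the package.

```python
import random


def local_score(scores):
    """Return (local score, start, end) for a sequence of numeric scores.

    Uses the Lindley recursion W_i = max(0, W_{i-1} + X_i); the local
    score is the maximum of W. Indices are 0-based and inclusive; if no
    segment has a positive score, the result is (0, -1, -1).
    """
    best, best_start, best_end = 0, -1, -1
    current, start = 0, 0
    for i, s in enumerate(scores):
        if current == 0:
            start = i  # a new excursion can only begin when the process is at 0
        current = max(0, current + s)
        if current > best:
            best, best_start, best_end = current, start, i
    return best, best_start, best_end


def monte_carlo_p_value(observed, values, probs, length, n_sim=2000, seed=0):
    """Estimate P(local score >= observed) for an i.i.d. sequence.

    Assumption (illustrative only): scores are drawn independently from
    `values` with probabilities `probs`, whose expectation should be
    negative for the local score to be meaningful.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        simulated = rng.choices(values, weights=probs, k=length)
        if local_score(simulated)[0] >= observed:
            hits += 1
    # The +1 terms keep the estimate strictly positive.
    return (hits + 1) / (n_sim + 1)


sequence = [-1, 2, 3, -4, 1, -2, 5, -1]
score, start, end = local_score(sequence)
p_value = monte_carlo_p_value(
    score, values=[-2, -1, 0, 1, 2], probs=[0.3, 0.3, 0.2, 0.1, 0.1],
    length=len(sequence),
)
print(score, start, end, p_value)
```

In this toy sequence the highest-scoring segment is the pair of scores 2 and 3; ties with later segments of equal score are resolved in favour of the first one found. The score distribution passed to the simulation is an arbitrary example with negative mean.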

DOI: 10.24072/pcjournal.650
Type: Research article
Keywords: Statistical significance, Local score, Sequence analysis, Atypical segment detection, Lindley-CUSUM process

Robelin, David 1; Déjean, Sébastien 2; Mercier, Sabine 2

1 INRAE, INPT, ENVT, Université de Toulouse, GenPhySE UMR 1388, Castanet-Tolosan, France
2 Université de Toulouse, UT2J, UT Capitole, INUC, INSA, CNRS, Institut de Mathématiques de Toulouse, UMR 5219, Toulouse, France
License: CC-BY 4.0
Copyrights: The authors retain unrestricted copyrights and publishing rights
@article{10_24072_pcjournal_650,
     author = {Robelin, David and D\'ejean, S\'ebastien and Mercier, Sabine},
     title = {localScore: an {R} package to highlight optimal and suboptimal segments in a sequence with associated p-values computation},
     journal = {Peer Community Journal},
     eid = {e141},
     year = {2025},
     publisher = {Peer Community In},
     volume = {5},
     doi = {10.24072/pcjournal.650},
     language = {en},
     url = {https://peercommunityjournal.org/articles/10.24072/pcjournal.650/}
}
Robelin, D.; Déjean, S.; Mercier, S. localScore: an R package to highlight optimal and suboptimal segments in a sequence with associated p-values computation. Peer Community Journal, Volume 5 (2025), article no. e141. https://doi.org/10.24072/pcjournal.650

PCI peer reviews and recommendation, and links to data, scripts, code and supplementary information: 10.24072/pci.genomics.100420

Conflict of interest of the recommender and peer reviewers:
The recommender in charge of the evaluation of the article and the reviewers declared that they have no conflict of interest (as defined in the code of conduct of PCI) with the authors or with the content of the article.

[1] Bairstow, L. Applied Aerodynamics, Longmans, Green and Company, London, 1920, pp. 551-560

[2] Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing, Journal of the Royal Statistical Society: Series B (Methodological), Volume 57 (1995) no. 1, pp. 289-300 | DOI

[3] Cellier, D.; Charlot, F.; Mercier, S. An improved approximation for assessing the statistical significance of molecular sequence features, Journal of Applied Probability, Volume 40 (2003), pp. 427-441 | DOI

[4] Chabriac, C.; Lagnoux, A.; Mercier, S.; Vallois, P. Elements related to the largest complete excursion of a reflected BM stopped at a fixed time. Application to local score, Stochastic Processes and their Applications, Volume 124 (2014) no. 12, pp. 4202-4223 | DOI

[5] Coop, G.; Witonsky, D.; Di Rienzo, A.; Pritchard, J. K. Using environmental correlations to identify loci underlying local adaptation, Genetics, Volume 185 (2010) no. 4, pp. 1411-1423 | DOI

[6] Cucala, L. A hypothesis-free multiple scan statistic with variable window, Biometrical Journal, Volume 50 (2008) no. 2, pp. 299-310 | DOI

[7] Cucala, L. Variable Window Scan Statistics: Alternatives to Generalized Likelihood Ratio Tests, Handbook of Scan Statistics, Springer, New York, NY, 2017, pp. 1-16 | DOI

[8] Fariello, M.; Boitard, S.; Mercier, S.; Robelin, D.; Faraut, T.; Arnould, C.; Le Bihan-Duval, E.; Recoquillay, J.; Salin, G.; Dahais, G.; Pitel, F.; Leterrier, G.; Sancristobal, M. Accounting for Linkage Disequilibrium in genome scans for selection without individual genotypes: the local score approach, Molecular Ecology, Volume 26 (2017) no. 14, pp. 3700-3714 | DOI

[9] Glaz, J.; Pozdnyakov, V.; Wallenstein, S. Scan statistics - Methods and applications, Birkhäuser Boston, 2009 | DOI

[10] Glaz, J.; Naus, J.; Wallenstein, S. Scan Statistics, Springer Series in Statistics, Springer, New York, NY, 2001 | DOI

[11] Grusea, S.; Mercier, S. Improvement on the distribution of maximal segmental score in a Markovian sequence, Journal of Applied Probability, Volume 57 (2020) no. 1, pp. 29-52 | DOI

[12] Hassenforder, C.; Mercier, S. Exact Distribution of the Local Score for Markovian Sequences, Annals of the Institute of Statistical Mathematics, Volume 59 (2007) no. 4, pp. 741-755 | DOI

[13] Karlin, S.; Altschul, S.-F. Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes, PNAS, Volume 87 (1990), pp. 2264-2268 | DOI

[14] Karlin, S.; Dembo, A. Limit distributions of maximal segmental score among Markov-dependent partial sums, Advances in Applied Probability, Volume 24 (1992), pp. 113-140 | DOI

[15] Knox, G. Secular pattern of congenital oesophageal atresia, British Journal of Preventive and Social Medicine, Volume 13 (1959), pp. 222-226 | DOI

[16] Kyte, J.; Doolittle, R. A simple method for displaying the hydropathic character of a protein, Journal of molecular biology, Volume 157 (1982) no. 1, pp. 105-132 | DOI

[17] Lagnoux, A.; Mercier, S.; Vallois, P. Statistical significance based on length and position of the local score in a model of i.i.d. sequences, Bioinformatics, Volume 33 (2017) no. 5, pp. 654-660 | DOI

[18] Mercier, S. Transferring biological sequence analysis tools to break‐point detection for on‐line monitoring: A control chart based on the local score, Quality and Reliability Engineering International, Volume 36 (2020) no. 7, pp. 2379-2397 | DOI

[19] Mercier, S.; Daudin, J. Exact Distribution for the Local Score of One i.i.d. Random Sequence, Journal of Computational Biology, Volume 8 (2001) no. 4, pp. 373-380 | DOI

[20] Nagarwalla, N. A Scan Statistic with a Variable Window, Statistics in Medicine, Volume 15 (1996) no. 7-9, pp. 845-850 | DOI

[21] Naus, J.; Wallenstein, S. Temporal surveillance using scan statistics, Statistics in Medicine, Volume 25 (2006) no. 2, pp. 311-324 | DOI

[22] Naus, J. I. Approximations for Distributions of Scan Statistics, Journal of the American Statistical Association, Volume 77 (1982) no. 377, pp. 177-183 | DOI

[23] Poklukar, K.; Mestre, C.; Škrlep, M.; Čandek-Potokar, M.; Ovilo, C.; Fontanesi, L.; Riquet, J.; Bovo, S.; Schiavo, G.; Ribani, A.; Muñoz, M.; Gallo, M.; Bozzi, R.; Charneca, R.; Quintanilla, R.; Kušec, G.; Mercat, M.-J.; Zimmer, C.; Razmaite, V.; Araujo, J. P.; Radović, Č.; Savić, R.; Karolyi, D.; Servin, B. A meta-analysis of genetic and phenotypic diversity of European local pig breeds reveals genomic regions associated with breed differentiation for production traits, Genetics Selection Evolution, Volume 55 (2023) no. 88 | DOI

[24] R Core Team R: A Language and Environment for Statistical Computing (R Foundation for Statistical Computing. Vienna, Austria), 2024 (https://www.R-project.org/)

[25] Robelin, D.; Dejean, S.; Mercier, S. Supplementary material: localScore: an R package to highlight optimal and suboptimal segments in a sequence with associated p-values computation, Recherche Data Gouv, 2025 | DOI

[26] Shewhart, W. A. Economic Control of Quality of Manufactured Product, D. Van Nostrand Company, New York, 1931

[27] Simon, S.; Robelin, D.; Mercier, S.; Dejean, S. localScore: Package for Sequence Analysis by Local Score, R package version 1.0.11 (2023) (https://cran.r-project.org/package=localScore)

[28] Wallenstein, S.; Neff, N. An approximation for the distribution of the scan statistic, Statistics in Medicine, Volume 6 (1987) no. 2, pp. 197-207 | DOI

[29] Wang, S. localScore: finding optimal segments in genetic sequences, Peer Community in Genomics (2025) no. 100420 | DOI

[30] Wang, X.; Glaz, J. Variable Window Scan Statistics for Normal Data, Communications in Statistics - Theory and Methods, Volume 43 (2014) no. 10-12, pp. 2489-2504 | DOI
