Comparison of modularity-based approaches for nodes clustering in hypergraphs

Statistical analysis and node clustering in hypergraphs constitute an emerging topic suffering from a lack of standardization. In contrast to the case of graphs, the concept of nodes' community in hypergraphs is not unique and encompasses various distinct situations. In this work, we conducted a comparative analysis of the performance of modularity-based methods for clustering nodes in binary hypergraphs. To address this, we begin by presenting, within a unified framework, the various hypergraph modularity criteria proposed in the literature, emphasizing their differences and respective focuses. Subsequently, we provide an overview of the state-of-the-art codes available to maximize hypergraph modularities for detecting node communities in binary hypergraphs. Through exploration of various simulation settings with controlled ground truth clustering, we offer a comparison of these methods using different quality measures, including true clustering recovery, running time, (local) maximization of the objective, and the number of clusters detected. Our contribution marks the first attempt to clarify the advantages and drawbacks of these newly available methods. This effort lays the foundation for a better understanding of the primary objectives of modularity-based node clustering methods for binary hypergraphs.


Introduction
The interest in higher-order interactions stems from the recognition that many phenomena are inherently more complex than what can be effectively represented by pairwise relationships alone. While graphs model pairwise interactions, hypergraphs generalize this concept by capturing higher-order interactions involving more than two elements. This extension provides a more expressive framework for modeling intricate dependencies and interactions in various fields, ranging from social network analysis (early acknowledged in Wolff, 1950) or co-authorship relations (Roy and Ravindran, 2015) to ecological systems (Muyinda et al., 2020), neurosciences (Chelaru et al., 2021) or even chemistry (Flamm et al., 2015). We refer to Battiston et al., 2020; Bick et al., 2023; Torres et al., 2021 for recent reviews on higher-order interactions.
With the emergence of hypergraph datasets (see e.g. Lee et al., 2021) to model higher-order interactions, the question of node clustering and, more specifically, the detection of communities in hypergraphs arises. In the context of graphs, the seminal paper by Newman and Girvan, 2004 introduced the concept of modularity (commonly known as the Newman-Girvan modularity), paving the way for a flourishing literature on community detection in networks. In the context of hypergraphs, the past few years have witnessed a surge of modularity-based proposals for hypergraph community detection. One of the first challenges is to propose a modularity criterion that measures the extent to which a hypergraph is composed of communities. This raises a more fundamental question: what is a community of nodes in a hypergraph? While in the context of graphs, a community is simply a set of nodes with more within-cluster interactions than between-clusters ones, generalizing that concept to hypergraphs is not immediate. As hypergraph interactions have heterogeneous sizes (the size being the number of nodes a hyperedge contains), a primary issue is whether one should weigh the links with respect to (wrt) their sizes and put more emphasis on larger hyperedges (see Figure 1 for an illustration). Consequently, various modularity criteria have recently emerged in the literature.
Figure 1 - On the left, a modular graph with two clusters is depicted, represented as circle-blue and triangle-green nodes, respectively. In each cluster, the number of within-cluster interactions is much larger than the between-clusters ones. On the right, a hypergraph is shown using the same set of nodes, where each clique from the previous graph is replaced by a hyperedge. In this hypergraph, the number of within-cluster interactions in each of the two clusters is the same as the number of between-clusters interactions. Is this hypergraph modular? Should we consider weighting hyperedges with respect to their sizes to analyze how modular the hypergraph is?
For a long time now, the computer science literature has tackled hypergraphs by simplifying them into graphs, employing two primary methods: the clique reduction graph, also known as the two-section graph, and the star-expansion graph. In the clique reduction graph, each hyperedge of a hypergraph is transformed into a clique in a graph over the same set of nodes (as illustrated in Figure 1, where the graph on the left represents the clique reduction of the hypergraph on the right). Conversely, the star-expansion graph constructs a bipartite graph by treating the original vertex set as the first part and introducing a new vertex for every original hyperedge in a second part. These parts are then connected whenever a node is contained in a hyperedge in the hypergraph. While the former reduction loses information (the original hypergraph cannot be reconstructed from its clique reduction graph), the latter transformation is one-to-one, given that the two parts are labeled (allowing the distinction between original nodes and original hyperedges) and hypergraphs with self-loops and multiple hyperedges are allowed. Consequently, a natural approach is to define hypergraph modularity by relying on graphs. Figure 1 illustrates the limitations of such a method, where the clique reduction graph (on the left) appears clearly modular, while one may question whether the original hypergraph (on the right) should be considered modular or not.
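The two reductions described above can be illustrated with a minimal Python sketch. The function names (`clique_reduction`, `star_expansion`, `from_star_expansion`) and the toy hypergraphs are hypothetical and introduced only for illustration; they do not come from any of the packages discussed in this article.

```python
from itertools import combinations


def clique_reduction(hyperedges):
    """Graph edges obtained by turning each hyperedge into a clique."""
    edges = set()
    for e in hyperedges:
        for u, v in combinations(sorted(set(e)), 2):
            edges.add((u, v))
    return edges


def star_expansion(hyperedges):
    """Bipartite edges (node, hyperedge label), with one new label per hyperedge."""
    return [(v, f"e{j}") for j, e in enumerate(hyperedges) for v in sorted(set(e))]


def from_star_expansion(bipartite_edges):
    """Rebuild the hyperedges from a labeled star expansion."""
    members = {}
    for v, label in bipartite_edges:
        members.setdefault(label, set()).add(v)
    return [frozenset(nodes) for nodes in members.values()]


# Two different hypergraphs on the nodes {1, 2, 3} ...
h1 = [{1, 2, 3}]
h2 = [{1, 2}, {1, 3}, {2, 3}]
# ... share the same clique reduction, which therefore cannot be inverted,
same_clique = clique_reduction(h1) == clique_reduction(h2)
# whereas the labeled star expansion gives back the original hyperedges.
recovered = from_star_expansion(star_expansion(h2))
```

The example makes the loss of information concrete: one triangle of pairwise edges and a single size-3 hyperedge are indistinguishable after clique reduction.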
In this article, we explore the current state-of-the-art and challenges posed by modularity-based community detection methods in binary hypergraphs. In the context of graphs, Yang et al., 2016 propose a comparative analysis of community detection algorithms for undirected and binary graphs. In the same vein, we here restrict our attention to modularity-based methods whose performances for community detection in binary hypergraphs are compared. The methodology is described in Section 2. After introducing general notation, we first present a reformulated version of the different hypergraph modularities existing in the literature (Section 2.2). The goal of this reformulation is to facilitate the comparison of concepts introduced independently from each other and never fully connected before. To be a valuable concept, a hypergraph modularity should come with a (local and/or heuristic) maximization algorithm that outputs a node clustering. Available implementations of such algorithms are presented in Section 2.3. To compare the different modularities and maximization algorithms, it is mandatory to work with synthetic datasets where ground truth clustering is known and hypergraph statistics can be controlled. While in the graph context, recent years have seen the emergence of benchmark datasets for such a task, as for instance the Lancichinetti-Fortunato-Radicchi benchmark graph (LFR, Lancichinetti et al., 2008) used in Yang et al., 2016, there is yet no such benchmark for hypergraphs. We thus rely on several models for generating synthetic modular hypergraphs, described in Section 2.4. Then Section 3 describes our experiments: which scenarios have been explored in each model generating method (Section 3.1) and quality assessment through the lens of different measures, namely true clustering recovery, running time, (local) maximization of the objective and the number of clusters detected (Section 3.2). All the results are presented in Section 4 and a discussion follows in Section 5. The scripts to reproduce the experiments are available online.
To conclude this introduction, we mention that there are other methods to cluster the nodes of a hypergraph, such as spectral clustering approaches (Chodrow et al., 2023; Ghoshdastidar and Dukkipati, 2017) or model-based methods (Brusa and Matias, 2022b; Ruggeri et al., 2023). It is also possible to cluster hyperedges instead of nodes (Ng and Murphy, 2022). However, our focus in this work is on clustering nodes through modularity-based methods.

General notation and definitions
A hypergraph $H = (V, E)$ is defined as a set of nodes $V = \{1, \dots, n\}$ and a set of hyperedges $E \subset \mathcal{P}(V)$, where $\mathcal{P}(V)$ is the set of all subsets of $V$. In other words, each hyperedge $e \in E$ is a subset of nodes in $V$ (namely, $e \subset V$ or $e \in \mathcal{P}(V)$). A hypergraph can either be binary (presence/absence of subsets of nodes) or weighted (also equivalently called multiple). In the latter case, the hypergraph $H = (V, E, w)$ comes with a weight function $w : \mathcal{P}(V) \to \mathbb{N} \cup \{0\}$ such that $w(e) = 0$ for all $e \notin E$, and $w(e) \in \mathbb{N}$ otherwise. The weight counts how many times a hyperedge appears in the hypergraph. Multiple (i.e., weighted) hypergraphs can be viewed as hypergraphs where the set of hyperedges $E$ is allowed to be a multiset (some hyperedges may appear several times). A binary hypergraph is a particular case of a weighted hypergraph with weight function being the indicator function $w(e) = 1\{e \in E\}$ (i.e., each hyperedge has multiplicity 1). The size of a hyperedge $e$ is the number of nodes it contains, $|e| = \sum_{v \in V} 1\{v \in e\}$. A hypergraph is said to be $s$-uniform if it only contains hyperedges of size $s$. Any 2-uniform hypergraph is simply a graph. We let $E_s$ denote the subset of $E$ of hyperedges with size $s$. We can allow hyperedges $e \in E$ to be multisets of $V$, in which case nodes may appear more than once in the same hyperedge. Such hypergraphs are called multiset hypergraphs and can either be binary or multiple. In a multiset hypergraph, each node $v \in V$ has a multiplicity in hyperedge $e \in E$, denoted by $m_e(v) \in \mathbb{N} \cup \{0\}$, which counts the number of times this node appears in that hyperedge. Moreover, the hyperedge size accounts for the node multiplicities and becomes $|e| = \sum_{v \in V} m_e(v)$. For example, a self-loop $\{u, u\}$ is a (multiset) hyperedge of size 2.
In the following, unless otherwise stated, all sets can be multisets, in which case all counts include multiplicities (be it for nodes or for hyperedges). A hypergraph is said to be simple whenever it is binary and nonmultiset, i.e., neither nodes nor hyperedges may be repeated. The (weighted) degree $\deg_H(v)$ of a node $v$ in a hypergraph $H$ is the (weighted) count of the hyperedges it belongs to, namely $\deg_H(v) = \sum_{e \in E} w(e) 1\{v \in e\}$. The incidence matrix $H$ of the hypergraph has size $|V| \times |E|$ and entries $H(v, e) = 1\{v \in e\}$, or $m_e(v)$ for multiset hypergraphs. Note that we use the same notation $H$ for a hypergraph and its incidence matrix; the difference should be clear from the context. Letting $w = (w(e))_{e \in E}$ denote the (column) vector of the hyperedge weights and $w^\top$ its transpose, we obtain the vectors of node degrees and hyperedge sizes as $Hw$ and $w^\top H$, respectively. Two nodes are said to be incident whenever they belong to the same hyperedge $e \in E$.
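For any subset of nodes $C \subset V$, we define its volume
$$\mathrm{Vol}_H(C) = \sum_{v \in C} \deg_H(v),$$
and the (weighted) number of hyperedges whose nodes are all included in $C$:
$$e_H(C) = \sum_{e \in E} w(e) 1\{e \subset C\}.$$
Note that $\mathrm{Vol}_H(V) = \sum_{v \in V} \deg_H(v) = \sum_{e \in E} w(e) |e|$.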
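To fix ideas, the quantities introduced so far can be computed on a toy binary (nonmultiset) hypergraph. The following NumPy sketch is only an illustration: the example hypergraph and the helper names `volume` and `inner_count` are hypothetical, and the volume is taken to be the sum of the degrees of the nodes in the subset.

```python
import numpy as np

# Toy binary hypergraph on 5 nodes (0-indexed) with 3 hyperedges.
hyperedges = [{0, 1, 2}, {2, 3}, {0, 1, 2, 4}]
n = 5
w = np.array([1, 1, 1])  # hyperedge weights (all equal to 1: binary hypergraph)

# Incidence matrix of size |V| x |E|, with H[v, e] = 1{v in e}.
H = np.zeros((n, len(hyperedges)), dtype=int)
for j, e in enumerate(hyperedges):
    H[list(e), j] = 1

degrees = H @ w        # weighted node degrees
sizes = H.sum(axis=0)  # hyperedge sizes |e| (column sums of H)


def volume(C):
    """Volume of a subset of nodes: sum of the degrees of its nodes."""
    return int(degrees[list(C)].sum())


def inner_count(C):
    """Weighted number of hyperedges whose nodes are all included in C."""
    C = set(C)
    return int(sum(w[j] for j, e in enumerate(hyperedges) if e <= C))
```

On this example, `volume(range(n))` equals the weighted sum of the hyperedge sizes, which is the identity relating the total volume to the hyperedge sizes.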
From a (weighted) hypergraph $H = (V, E)$ we may construct its clique reduction graph $G^{\text{clique}} = (V, \mathcal{E})$. This graph has the same set of nodes $V$ as the hypergraph and every hyperedge $e \in E$ in the hypergraph is turned into a clique in the graph. In other words, for any hyperedge $e \in E$ with size $|e| \ge 2$ and for any pair of incident nodes $u, v \in e$, the graph $G^{\text{clique}}$ contains the edge $\{u, v\} \in \mathcal{E}$, and only edges obtained in this way are contained in $\mathcal{E}$. The (weighted) adjacency matrix $A^{\text{clique}}$ of the clique reduction graph satisfies $A^{\text{clique}} = H \mathrm{diag}(w) H^\top$, where $H$ is the incidence matrix of the hypergraph and $\mathrm{diag}(w)$ is the diagonal matrix induced by the vector of hyperedge weights $w$. In general, self-loops are removed from $A^{\text{clique}}$ and $A^{\text{clique}}_{uu}$ is set to 0 for any $u \in V$. This can be done directly by setting $A^{\text{clique}}_{uv} = 1\{u \neq v\} \sum_{e \in E} w(e) H(u, e) H(v, e)$.
A node clustering is a partition $\mathcal{C} = (C_1, \dots, C_K)$ of the set of nodes $V$ into parts called clusters. For any partition $\mathcal{C} = (C_1, \dots, C_K)$ of the set of nodes $V$ and any subset $e \subset V$, we let $e \cap \mathcal{C} = (e_1, \dots, e_J)$ denote the partition of the subset $e$ induced by $\mathcal{C}$. It has $J$ parts with $J \le K$ and is indeed a partition of $e$, namely $e = \cup_{j=1}^{J} e_j$ and $e_j \cap e_{j'} = \emptyset$ for any $j \neq j'$. The adjacent clusters of a node $u \in V$ are the parts $C_k$ that contain at least one node $v \in C_k$ that is incident to $u$, or in other words such that there is a hyperedge $e \in E$ with $u, v \in e$.
In this manuscript, the identity matrix is denoted by $I$ (its size should be clear from the context). We already used the notation $|S|$ for the cardinality of a set $S$ (or a multiset), and $1\{S\}$ for the indicator function of an event $S$.

Modularities in hypergraphs
Different hypergraph modularity criteria have been proposed in the literature up to now (Chodrow et al., 2021; Kamiński et al., 2019a; Kamiński et al., 2021; Kumar et al., 2020). We recall these different quantities, using a unified presentation that highlights similarities and differences between them. As we will see, these are all constructed in the same way, namely as the difference between a first term that is a specific hyperedge count and a second term that in some cases corresponds to the expected value of this count under some null model, and otherwise is a correction term. The differences between the expressions of those hypergraph modularities come from: i) the type of hyperedges that are counted; ii) the null model used for computing the expectation or the correcting term; iii) possible weights applied to each of these terms.
Kumar et al., 2020's definition of hypergraph modularity corresponds to a graph modularity as originally defined in Newman and Girvan, 2004 and applied to a specific graph choice. Considering the clique reduction graph of a hypergraph, Kumar et al. noticed that the reduction does not preserve the node degrees: in the clique reduction graph $G^{\text{clique}}$, the degree of a node differs from its initial value in the hypergraph $H$. Indeed, a simple computation shows that
$$\deg_{G^{\text{clique}}}(u) = \sum_{v \neq u} A^{\text{clique}}_{uv} = \sum_{e \in E} w(e) (|e| - 1) 1\{u \in e\}.$$
Thus, Kumar et al., 2020 simply modified the weights in the clique reduction graph to preserve these degrees. Let $D_E = \mathrm{diag}(|e|)_{e \in E}$ denote the diagonal matrix of the hyperedge sizes. We define the weighted clique reduction graph $G^{\text{w-clique}}$ through its adjacency matrix
$$A^{\text{w-clique}} = H \mathrm{diag}(w) (D_E - I)^{-1} H^\top,$$
where, as before, the diagonal entries are set to 0. The node degrees in this graph $G^{\text{w-clique}}$ are equal to the initial node degrees in the hypergraph $H$ (where self-loops are discarded). This construction is equivalent to saying that for each hyperedge $e \in E$, we create $G^{\text{w-clique}}$ by forming a total of $\binom{|e|}{2}$ edges with weights $w(e)/(|e| - 1)$, between any pair of nodes incident in the hypergraph $H$.
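The degree-preservation property of the weighted clique reduction can be checked numerically. The sketch below is a minimal illustration on a hypothetical toy hypergraph (binary, nonmultiset, all hyperedges of size at least 2); it assumes that each pair of nodes inside a hyperedge $e$ receives the weight $w(e)/(|e| - 1)$ and that diagonal entries are set to 0, as described above.

```python
import numpy as np

hyperedges = [{0, 1, 2}, {2, 3}, {0, 1, 2, 4}]
n = 5
w = np.array([1.0, 1.0, 1.0])

# Incidence matrix and hyperedge sizes.
H = np.zeros((n, len(hyperedges)))
for j, e in enumerate(hyperedges):
    H[list(e), j] = 1.0
sizes = H.sum(axis=0)

# Plain clique reduction: H diag(w) H^T with the diagonal set to 0.
A_clique = H @ np.diag(w) @ H.T
np.fill_diagonal(A_clique, 0.0)

# Weighted variant: every pair inside hyperedge e gets weight w(e) / (|e| - 1).
A_wclique = H @ np.diag(w / (sizes - 1)) @ H.T
np.fill_diagonal(A_wclique, 0.0)

hyper_degrees = H @ w                   # degrees in the hypergraph
clique_degrees = A_clique.sum(axis=1)   # inflated by the factor (|e| - 1)
wclique_degrees = A_wclique.sum(axis=1)  # equal to the hypergraph degrees
```

In the plain reduction, a node belonging to a large hyperedge sees its degree inflated by the factor $|e| - 1$; the reweighting removes exactly this factor.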

Veronica Poda & Catherine Matias 5
Peer Community Journal, Vol. 4 (2024), article e37 https://doi.org/10.24072/pcjournal.404
Then for any hypergraph $H = (V, E)$ and any partition $\mathcal{C} = (C_1, \dots, C_K)$ of its set of nodes $V$, we let
$$Q^{\text{w-clique}}(\mathcal{C}) = \frac{1}{\mathrm{Vol}_H(V)} \sum_{k=1}^{K} \sum_{u, v \in C_k} \Big[ A^{\text{w-clique}}_{uv} - \frac{\deg_H(u) \deg_H(v)}{\mathrm{Vol}_H(V)} \Big]. \tag{1}$$
Note that $Q^{\text{w-clique}}$ ranges in $[-1; 1]$. It is an average over all pairs of nodes $u, v$ belonging to the same cluster $C_k$ of the difference between the weighted edge value $A^{\text{w-clique}}_{uv}$ in the weighted clique reduction graph and its expectation under a configuration model (Chung and Lu, 2002) that accounts only for node degrees and plays the role of a null model. A high value of modularity $Q^{\text{w-clique}}$ means dense connections in the weighted clique reduction graph $G^{\text{w-clique}}$ between the nodes within the same cluster and sparse connections between nodes in different clusters. Going back to the hypergraph $H$, this means that node pairs $u, v \in V$ belonging to the same cluster participate together in more hyperedges than node pairs in different clusters.
Kamiński et al., 2019a introduce a strict hypergraph modularity such that only the hyperedges $e \in E$ entirely included in a same cluster contribute to increasing modularity, which is in sharp contrast with the previous proposal. For any hypergraph $H = (V, E)$ and any partition $\mathcal{C} = (C_1, \dots, C_K)$ of its set of nodes $V$, we let
$$Q^{\text{strict}}(\mathcal{C}) = \frac{1}{|E|} \sum_{k=1}^{K} \Big[ e_H(C_k) - \sum_{s \ge 2} |E_s| \Big( \frac{\mathrm{Vol}_H(C_k)}{\mathrm{Vol}_H(V)} \Big)^{s} \Big]. \tag{2}$$
Note that $Q^{\text{strict}}$ also ranges in $[-1; 1]$. Here, the first term inside the sum accounts for the number of hyperedges whose nodes all lie within the same cluster. The second term comes from a generalization of the Chung and Lu model to hypergraphs. Again, it plays the role of an expected value of the first term $e_H(C_k)$ under some null model which preserves both node degrees and the (weighted) number $|E_s|$ of size-$s$ hyperedges. This quantity is called the degree tax by its authors.
Kamiński et al., 2021 propose a more general modularity that accounts for the homogeneity of each hyperedge, namely, the fraction of its vertices that belong to the largest cluster (provided it is more than 50%). For any subset $C \subset V$, any size $s \ge 2$ and any integer $c \in \{\lfloor s/2 \rfloor + 1, \dots, s\}$, we let $e^{s,c}_H(C)$ denote the number of size-$s$ hyperedges that have exactly $c$ nodes included in their majority part $C$. With our previous notation, we have $e_H(C) = \sum_{s \ge 2} e^{s,s}_H(C)$.
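In the following, $P(\mathrm{Bin}(s, p) = c) = \binom{s}{c} p^c (1 - p)^{s-c}$ is the probability that a Binomial random variable with parameters $(s, p)$ takes the value $c$. Then for any partition $\mathcal{C} = (C_1, \dots, C_K)$ of the set of nodes, Kamiński et al., 2021 define
$$Q^{\text{wsc}}(\mathcal{C}) = \frac{1}{|E|} \sum_{k=1}^{K} \sum_{s \ge 2} \sum_{c = \lfloor s/2 \rfloor + 1}^{s} w_{s,c} \Big[ e^{s,c}_H(C_k) - |E_s| \, P\Big( \mathrm{Bin}\Big(s, \frac{\mathrm{Vol}_H(C_k)}{\mathrm{Vol}_H(V)}\Big) = c \Big) \Big], \tag{3}$$
where $w_{s,c} \in [0, 1]$ are hyper-parameters to be specified. Note that we have $P(\mathrm{Bin}(s, p) = s) = p^s$, so that (2) is a special case of (3) where $w_{s,c} = 1\{c = s\}$. Different setups may be considered for the hyper-parameters $w_{s,c}$ and we focus here on the choices for which an optimisation algorithm is available, namely
$$w_{s,c} = \begin{cases} 1\{c = s\} & \text{strict setting}, \\ 1\{c > s/2\} & \text{majority setting}, \\ (c/s) \, 1\{c > s/2\} & \text{linear setting}. \end{cases} \tag{4}$$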
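As an illustration of how such a criterion can be evaluated, the following pure-Python sketch computes a modularity of the form "weighted count of homogeneous hyperedges minus a binomial null-model term" for a binary, nonmultiset hypergraph. It is written under the assumption that the criterion is normalised by $|E|$ and that the binomial parameter is the relative volume of each cluster; the function names and the toy data are hypothetical, and this is not the HyperNetX implementation.

```python
from collections import Counter
from math import comb


def weight(setting, s, c):
    """Hyper-parameters w_{s,c} for the strict, majority and linear settings."""
    if setting == "strict":
        return 1.0 if c == s else 0.0
    if setting == "majority":
        return 1.0 if 2 * c > s else 0.0
    if setting == "linear":
        return c / s if 2 * c > s else 0.0
    raise ValueError(f"unknown setting: {setting}")


def binom_pmf(s, p, c):
    return comb(s, c) * p**c * (1 - p) ** (s - c)


def wsc_modularity(hyperedges, clusters, setting="linear"):
    """hyperedges: list of sets of nodes; clusters: dict node -> cluster label."""
    deg = Counter(v for e in hyperedges for v in e)
    vol = Counter()
    for v, d in deg.items():
        vol[clusters[v]] += d
    total = sum(vol.values())
    n_by_size = Counter(len(e) for e in hyperedges)

    q = 0.0
    # First term: each hyperedge contributes w_{s,c}, where c is the number of
    # its nodes in its largest part (the weight is 0 unless c > s/2).
    for e in hyperedges:
        _, c = Counter(clusters[v] for v in e).most_common(1)[0]
        q += weight(setting, len(e), c)
    # Second term: expected counterpart under the binomial null model.
    for k in vol:
        p = vol[k] / total
        for s, m_s in n_by_size.items():
            for c in range(s // 2 + 1, s + 1):
                q -= weight(setting, s, c) * m_s * binom_pmf(s, p, c)
    return q / len(hyperedges)


hyperedges = [{0, 1, 2}, {3, 4, 5}, {2, 3}]
clusters = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
q_strict = wsc_modularity(hyperedges, clusters, "strict")
q_linear = wsc_modularity(hyperedges, clusters, "linear")
```

With the `"strict"` setting only fully included hyperedges are counted, so the value coincides with the strict modularity under the same normalisation; placing all nodes in a single cluster gives a value of 0 in the three settings.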
As already mentioned, the strict setting gives back $Q^{\text{strict}}$, already introduced in (2). For the other two settings, we call the corresponding modularities $Q^{\text{majority}}$ and $Q^{\text{linear}}$, respectively.
Finally, Chodrow et al., 2021 first defined a general symmetric modularity, where for any partition $\mathcal{C}$ of the set of nodes, the contribution of a hyperedge $e \in E$ to the modularity of this partition is characterized only by the vector $p$ whose entries $p_k$ count the number of nodes in $e$ belonging to the $k$-th largest part in $e \cap \mathcal{C}$. It is based on a general affinity function $\Omega : \mathcal{P} \to \mathbb{R}$ that modulates the weight of the contribution of each partition vector $p$, where $\mathcal{P}$ denotes the set of partition vectors, i.e., vectors of positive integers sorted in non-increasing order. For instance, an $s$-tuple of nodes with $s = 7$ that are clustered by a partition $\mathcal{C}$ into the parts $\{v_1\}; \{v_2, v_3\}; \{v_4\}; \{v_5, v_6, v_7\}$ induces the partition vector $p = (3, 2, 1, 1)$. The symmetric modularity from Chodrow et al., 2021 will thus account for the different cluster counts that compose a hyperedge, treating all the clusters in an exchangeable way. We present the details of this modularity in Section A from the Supplementary Material. Then, the authors consider particular cases of their general symmetric modularity, relying on specific forms of the affinity function $\Omega$ (see Table 1 in that reference). However, an implementation of the algorithm for optimising the induced specific modularities is available only for the all-or-nothing affinity function, on which we focus now.
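The partition vector of a hyperedge is easy to compute from a cluster assignment. The short Python sketch below reproduces the example given above with seven nodes; the function name `partition_vector` and the cluster labels are hypothetical.

```python
from collections import Counter


def partition_vector(hyperedge, clusters):
    """Sizes of the parts of a hyperedge induced by a clustering, largest first."""
    counts = Counter(clusters[v] for v in hyperedge)
    return tuple(sorted(counts.values(), reverse=True))


# Parts {v1}; {v2, v3}; {v4}; {v5, v6, v7}, encoded with arbitrary labels.
clusters = {"v1": "a", "v2": "b", "v3": "b", "v4": "c",
            "v5": "d", "v6": "d", "v7": "d"}
p = partition_vector(list(clusters), clusters)
```

Because the vector is sorted, any relabeling of the clusters yields the same `p`, which is the exchangeability property mentioned above.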
The all-or-nothing modularity function $Q^{\text{aon}}$ is defined as: […] where $\beta_s$ and $\gamma_s$ are parameters estimated from the data. In general we may expect that both $\beta_s, \gamma_s > 0$ (see Section B in the Supplementary Material for more details on these parameters); in that case, we recover in this expression a sum of difference terms between a count of specific hyperedges, namely those entirely included in a cluster, and a correcting volume term. The extra parameters $\beta_s, \gamma_s$ might not seem natural at first. In fact, they appear as the result of an approximate maximum likelihood approach in a specific degree-corrected hypergraph stochastic blockmodel (DCHSBM), in the same way as Newman, 2016 did in a graph context.
As a final remark, Chodrow et al., 2021 notice that considering the specific choices $\beta_s = 1$ and $\gamma_s = |E_s| / \mathrm{Vol}_H(V)^s$ in their modularity $Q^{\text{aon}}$, they recover (up to a scaling factor and an additional term not depending on the partition $\mathcal{C}$ and which can thus be discarded) the expression of the modularity $Q^{\text{strict}}$ from (2). However, they argue that leaving these parameters free (adapting to the data) lends important flexibility to their approach.
Additional comments. We already highlighted similarities and differences between the different modularities defined above. Let us add some more comments. Two extreme cases are represented by the modularities $Q^{\text{w-clique}}$ and $Q^{\text{strict}}$, the former being less stringent than the latter. Whenever a hyperedge is split by the partition $\mathcal{C}$ into different clusters, it will be ignored by $Q^{\text{strict}}$, but as soon as this hyperedge contains at least 2 nodes in the same cluster, the modularity $Q^{\text{w-clique}}$ will account for it. The weakness of $Q^{\text{w-clique}}$ is that the exact composition of each hyperedge, in terms of nodes falling into the different clusters, is captured only through pairs of nodes. The modularity $Q^{\text{wsc}}$ represents a compromise between the two previous extremes: it accounts for homogeneous hyperedges, namely hyperedges such that more than half of their nodes fall into a cluster that becomes a majority cluster. In particular, Kamiński et al., 2021 argue that the hyper-parameters $w_{s,c}$ may be chosen so that $Q^{\text{wsc}}$ approximates $Q^{\text{w-clique}}$ well, because contributions in the latter from parts that contain at most $s/2$ vertices may often be neglected. Finally, the modularity $Q^{\text{aon}}$ is as strict as $Q^{\text{strict}}$ and focuses on hyperedges whose nodes all fall into a unique cluster of the partition $\mathcal{C}$. As already stressed, the major difference between $Q^{\text{strict}}$ and $Q^{\text{aon}}$ is that the latter, while summing similar differences as the former, weights each term in those differences differently (with weights adaptive to the data, as they are estimated from it).
Note that possible self-loops in the hypergraph $H$ never contribute to a modularity and may thus be discarded from the dataset. However, we highlight that all these modularities are developed for multiset hypergraphs, where nodes may be repeated in a same hyperedge. In particular, the Chung and Lu null models (for graphs and hypergraphs) used in defining the modularities $Q^{\text{w-clique}}$, $Q^{\text{strict}}$ and $Q^{\text{wsc}}$, as well as the DCHSBM underlying the definition of the modularity $Q^{\text{aon}}$, all rely on models for multiset hypergraphs. While it is known in the case of graphs that this is inadequate (Cafieri et al., 2010; Massen and Doye, 2005; Squartini and Garlaschelli, 2011), that assumption has not yet been discussed in the context of hypergraph modularities. It might be that the computational simplifications enabled by this assumption discourage any attempt to dispense with it (see e.g. Section B2 in Supplementary Material from Brusa and Matias, 2022b).

Modularity maximization methods
In this section, we focus on available implementations for hypergraph node clustering through modularity-based methods. We briefly describe the corresponding algorithms and their major characteristics, as well as the options that were chosen for our comparison study. All the algorithms require an initialization, most of the time relying on an initial partition where each node is in its own part, i.e., $\mathcal{C}^{\text{own}} = (\{1\}, \dots, \{n\})$. We group the different methods by the packages where they can be found. A summary is given in Table 1.
Note that we did not include in our experiments a comparison with methods based on clique reductions. Indeed, Kumar et al., 2020 already did so and concluded that "hypergraph based methods perform consistently better than their clique based equivalents" (end of page 16 in that reference).
Kumar et al., 2020 propose to maximize their modularity $Q^{\text{w-clique}}$ relying on the popular and fast Louvain algorithm for graphs (Blondel et al., 2008). More precisely, they do not simply apply the Louvain algorithm on the graph $G^{\text{w-clique}}$ but rather propose an Iteratively Reweighted Modularity Maximization (IRMM) algorithm where they iteratively apply Louvain on a weighted clique reduction graph and compute new hyperedge weights (see Algorithm 1 in Kumar et al., 2020). The hyperedge re-weighting step puts a larger weight on hyperedges which are cut into more unbalanced partition vectors by the current partition $\mathcal{C}$. For example, a size-$s$ hyperedge cut into the partition vector $p = (s - 1, 1)$ (meaning a unique node falls in a cluster different from the majority one) is much more unbalanced than another one cut into the partition vector $p = (s/2, s/2)$ (namely half of the nodes belong to a first cluster, the other half belonging to a second cluster) and thus gets a larger weight (see Figure 1 in Kumar et al., 2020). By getting a larger weight, it is more likely that the unique node in this hyperedge will join the majority cluster at Louvain's next step. The function hmod.kumar from the HyperNetX package implements the IRMM algorithm.

HyperNetX package.
The last step refinement (LSR) is an algorithm described in Kamiński et al., 2021. This is a general method that, starting from an initial partition of the nodes, iteratively moves one vertex at a time (in a random order) to a neighboring cluster whenever it improves $Q^{\text{wsc}}$, until convergence. The authors propose to start by running the IRMM on the weighted clique reduction graph; the resulting partition is then used as initialisation in their LSR procedure, which aims at maximizing $Q^{\text{wsc}}$. For the specific choices strict, majority and linear of the hyper-parameters $w_{s,c}$ described above, implementations are provided. The modularity $Q^{\text{wsc}}$ is obtained through the function hmod.modularity from the HyperNetX package and the LSR algorithm is implemented in the function hmod.last_step from this same package. Both functions contain the 3 different options for hyper-parameters $w_{s,c}$ defined in (4) and the default choice is linear. This is the option that we choose for our comparisons.
strictModularity package.
Kamiński et al., 2019a propose a Clauset-Newman-Moore like (CNM-like) algorithm to maximize $Q^{\text{strict}}$ (see Clauset et al., 2004, for the original CNM algorithm). Starting with partition $\mathcal{C}^{\text{own}}$ where each node is in its own part, this algorithm iterates over the set of hyperedges that are split into more than 2 clusters by the current partition, trying to merge all the parts it touches and looking for a modularity improvement. More precisely, the algorithm comes in two versions. In the first one, a loop over all hyperedges is taken, so that at each step all hyperedges are searched and evaluated for merging. In the second one, a stochastic approach is taken which evaluates at each step just one randomly chosen hyperedge (see Algorithm 1 in Kamiński et al., 2019a, for more details). The stochastic version is computationally less expensive, especially for larger hypergraphs; however, it requires setting a maximal number of iterations. In what follows, we choose that second version and set the number of iterations to twice the total number of hyperedges. The implementation is available from Kamiński et al., 2019b, in a mix of Python and Julia files. More precisely, a script strictModularity.py contains a "quick" Python implementation that should work on small datasets only, while a Julia function find_comms is more generally provided to perform the CNM-like algorithm. We rely on the latter in our experiments.
HyperModularity package.
Chodrow et al., 2021 propose two algorithms: the general HMLL algorithm for maximizing their symmetric modularity (see the Supplementary Material) and the simpler and faster AON-HMLL algorithm for maximizing the specific all-or-nothing $Q^{\text{aon}}$ modularity. Both the HMLL and the AON-HMLL are implemented in the Julia package HyperModularity (Chodrow et al., 2022). However, the current version of the HyperModularity package does not contain an implementation of the estimation of a general affinity function $\Omega$, which is required to compute the symmetric modularity. That is why we focus on the AON modularity $Q^{\text{aon}}$ and the corresponding AON-HMLL algorithm.
The AON-HMLL algorithm is an iterative algorithm that mimics the standard graph Louvain algorithm in that it starts with the initial configuration $\mathcal{C}^{\text{own}}$ (each node is in its own part) and, at the first iteration, greedily moves nodes to adjacent clusters (i.e., clusters that contain incident nodes) until no more improvement of $Q^{\text{aon}}$ is possible. The subsequent iterations however differ from Louvain's approach: instead of considering a weighted graph on "supernodes", the algorithm greedily moves entire clusters to adjacent ones whenever this improves $Q^{\text{aon}}$. Note that the option startclusters from Simple_AON_Louvain_mod determines which initial partition is used to estimate the parameters $\beta_s, \gamma_s$. We rely on the value "cliquelouvain" for startclusters, which gives the best results in general.
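The first, node-moving phase shared by such Louvain-like procedures can be sketched generically. The Python code below is a hedged illustration only, not the HyperModularity implementation: the function `greedy_node_moves` is hypothetical, it recomputes the objective from scratch at each trial move (real implementations use incremental updates), and it uses a strict-type modularity as a stand-in objective since the parameters of $Q^{\text{aon}}$ are not estimated here.

```python
from collections import Counter


def strict_modularity(hyperedges, clusters):
    """Stand-in objective: fully included hyperedges minus a degree tax."""
    deg = Counter(v for e in hyperedges for v in e)
    vol = Counter()
    for v, d in deg.items():
        vol[clusters[v]] += d
    total = sum(vol.values())
    n_by_size = Counter(len(e) for e in hyperedges)
    inside = sum(1 for e in hyperedges if len({clusters[v] for v in e}) == 1)
    tax = sum(m * (x / total) ** s
              for x in vol.values() for s, m in n_by_size.items())
    return (inside - tax) / len(hyperedges)


def greedy_node_moves(nodes, hyperedges, objective, max_sweeps=100):
    """Move single nodes to adjacent clusters while the objective improves."""
    clusters = {v: v for v in nodes}  # each node starts in its own part
    neighbors = {v: set() for v in nodes}
    for e in hyperedges:
        for v in e:
            neighbors[v] |= set(e) - {v}

    best = objective(hyperedges, clusters)
    for _ in range(max_sweeps):
        improved = False
        for v in nodes:
            home = clusters[v]
            # Adjacent clusters: clusters containing a node incident to v.
            candidates = {clusters[u] for u in neighbors[v]} - {home}
            best_label = home
            for label in candidates:
                clusters[v] = label
                value = objective(hyperedges, clusters)
                if value > best + 1e-12:
                    best, best_label = value, label
            clusters[v] = best_label
            improved = improved or best_label != home
        if not improved:
            break
    return clusters, best


nodes = list(range(6))
hyperedges = [{0, 1}, {1, 2}, {0, 2}, {0, 1, 2},
              {3, 4}, {4, 5}, {3, 5}, {3, 4, 5}, {2, 3}]
initial = strict_modularity(hyperedges, {v: v for v in nodes})
clusters, best = greedy_node_moves(nodes, hyperedges, strict_modularity)
```

Note that with an objective that only rewards fully included hyperedges, single-node moves cannot complete a hyperedge of size 3 or more in one step when starting from singletons, which is one reason why the procedures described above also merge whole clusters or all the parts touched by a hyperedge.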

Synthetic models for binary and modular hypergraphs
To compare the different modularity-based approaches for clustering hypergraph nodes, it is mandatory to rely on simulations of modular hypergraphs where ground truth clusters are known. As mentioned earlier, there is no single standard method for generating modular hypergraphs, and, to our knowledge, there are two main approaches. The first approach is based on hypergraph stochastic block models, with several variants proposed in the literature. The second approach involves a generalization of the LFR model for graphs (Lancichinetti et al., 2008). We chose to consider two variants of the first approach and the only one that we are aware of in the second approach. A summary of these models is given in Table 2. We highlight the similarities and differences between those different generating models and the characteristics of the hypergraphs generated by those approaches. In all those models, we fix a number of nodes $n$, either fix or randomly generate a true number of clusters $K$ (that might depend on $n$) as well as a true partition of the nodes $\mathcal{C}^{\text{true}} = (C^{\text{true}}_1, \dots, C^{\text{true}}_K)$ and a maximal size $S$ of hyperedges.
Hypergraphs with HSBM. We consider datasets simulated under a simple (binary and nonmultiset) Hypergraph Stochastic Blockmodel (HSBM, see Brusa and Matias, 2022b) generated through the R package HyperSBM (Brusa and Matias, 2022a). In this model, we fix the true number of clusters $K$, their proportions $\pi = (\pi_1, \dots, \pi_K)$ such that $\pi_k \in (0, 1)$ and $\sum_k \pi_k = 1$, and the following parameters, for any $2 \le s \le S$: $\alpha_s$ (resp. $\beta_s$) is the probability for an $s$-tuple of nodes to form a hyperedge given that they belong to the same cluster (resp. given that they are not all in the same cluster). The parameters should be chosen in order to ensure that the generated hypergraphs are modular. To this aim, we consider the ratios $\rho_s$ of the (expected) number of within-cluster size-$s$ hyperedges over the (expected) number of between-clusters size-$s$ hyperedges, obtained as:
$$\rho_s = \frac{\alpha_s \sum_{k=1}^{K} \pi_k^s}{\beta_s \big(1 - \sum_{k=1}^{K} \pi_k^s\big)}.$$
In our simulations, we impose $\rho_s > 1$, with larger values corresponding to more modular hypergraphs. Note that in this setting, the total number of size-$s$ hyperedges is random and has expected value
$$E(|E_s|) = \binom{n}{s} \Big[ \alpha_s \sum_{k=1}^{K} \pi_k^s + \beta_s \Big(1 - \sum_{k=1}^{K} \pi_k^s\Big) \Big].$$
We simulate hypergraphs with decreasing values of $E(|E_s|)$ when $s$ increases, which is more realistic than the constant case.
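A minimal Python sketch of this generating mechanism may help to fix ideas. It is not the HyperSBM package: the function names `sample_hsbm` and `rho` are hypothetical, node labels are assumed to be drawn independently with proportions $\pi$, and all $s$-tuples are enumerated explicitly, which is only feasible for very small $n$ and $S$. The function `rho` computes the ratio of expected within-cluster over between-clusters counts under that assumption.

```python
import random
from itertools import combinations


def sample_hsbm(n, pi, alpha, beta, seed=0):
    """alpha, beta: dicts mapping a size s to a probability.

    Returns (labels, hyperedges) for a simple (binary, nonmultiset) hypergraph.
    """
    rng = random.Random(seed)
    labels = rng.choices(range(len(pi)), weights=pi, k=n)
    hyperedges = []
    for s in alpha:
        for nodes in combinations(range(n), s):
            same = len({labels[v] for v in nodes}) == 1
            # An s-tuple becomes a hyperedge with probability alpha_s
            # (same cluster) or beta_s (not all in the same cluster).
            if rng.random() < (alpha[s] if same else beta[s]):
                hyperedges.append(frozenset(nodes))
    return labels, hyperedges


def rho(s, pi, alpha, beta):
    """Ratio of expected within-cluster over between-clusters size-s counts."""
    within = sum(p**s for p in pi)
    return alpha[s] * within / (beta[s] * (1 - within))


labels, hyperedges = sample_hsbm(
    n=12, pi=[0.5, 0.5], alpha={2: 0.3, 3: 0.05}, beta={2: 0.1, 3: 0.01}
)
ratio_2 = rho(2, [0.5, 0.5], {2: 0.3}, {2: 0.1})
```

For sparse hypergraphs on many nodes, enumerating all tuples is wasteful and one would rather first draw the number of hyperedges of each type and then sample the tuples, but the explicit loop keeps the link with the model definition transparent.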
Hypergraphs with DCHSBM-like. We consider datasets simulated under the DCHSBM-like generating model proposed by Chodrow et al., 2021. This model relies on a fixed true number of clusters $K$, balanced clusters and equal numbers of size-$s$ hyperedges for $2 \le s \le S$, so that for each size $s$, a total of $|E|/(S - 1)$ hyperedges are drawn. With probability $p_s$, such a hyperedge is placed on an $s$-tuple of (distinct) nodes within the same cluster and with probability $1 - p_s$, it is placed on any $s$-tuple of (distinct) nodes.
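The sampling scheme just described can be sketched in a few lines of Python. This is a hedged illustration, not the code of Chodrow et al., 2021: the function name `sample_dchsbm_like` is hypothetical, within-cluster hyperedges are assumed to pick their cluster uniformly at random, every cluster must contain at least $S$ nodes, and repeated hyperedges are not removed.

```python
import random


def sample_dchsbm_like(n, K, S, n_hyperedges, p, seed=0):
    """p: dict mapping a size s to the probability p_s of a within-cluster draw."""
    rng = random.Random(seed)
    labels = [v % K for v in range(n)]  # balanced clusters
    members = [[v for v in range(n) if labels[v] == k] for k in range(K)]
    per_size = n_hyperedges // (S - 1)  # same number of hyperedges for each size
    hyperedges = []
    for s in range(2, S + 1):
        for _ in range(per_size):
            if rng.random() < p[s]:
                pool = members[rng.randrange(K)]  # s distinct nodes in one cluster
            else:
                pool = range(n)                   # s distinct nodes anywhere
            hyperedges.append(frozenset(rng.sample(pool, s)))
    return labels, hyperedges


labels, hyperedges = sample_dchsbm_like(
    n=30, K=3, S=4, n_hyperedges=30, p={2: 0.8, 3: 0.8, 4: 0.8}
)
```

Note that a hyperedge drawn "anywhere" may still fall inside a single cluster by chance, which is why the ratio of within-cluster over between-clusters hyperedges is random in this model.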
The ratio ρ_s of within-cluster over between-clusters size-s hyperedges is random. Note that the DCHSBM-like generating model was originally proposed for multiset hypergraphs, where nodes may be repeated in hyperedges, and hyperedges may be multiple. In practice, as we consider sparse hypergraphs where the number of hyperedges is linear wrt the number of nodes, multiple hyperedges are rare. Section C from the Supplementary Material contains some further considerations on the links between parameters in HSBM and DCHSBM-like.

Hypergraphs with h-ABCD. The h-ABCD benchmark (Kamiński et al., 2023a) is an appropriate candidate to compare modularity approaches. In this model, we fix the number of nodes n and we either input a sequence of node degrees or sample it from a power-law distribution with some input exponent γ ∈ (2, 3) and input minimum/maximum degree values. The true cluster sizes are also either input or sampled from a power-law with some input exponent β ∈ (1, 2) and minimum/maximum size values. In our case, we choose to fix the cluster sizes so that the number of clusters is fixed rather than random. The model requires a sequence q = (q_1, ..., q_S) of weights summing to 1 such that S is the maximal hyperedge size and q_s is the fraction of size-s hyperedges. For instance, fixing q_1 = 0 prohibits self-loops.

The script abcdh.jl also handles the proportion of homogeneous hyperedges, where homogeneity is the concept discussed in Section 2.2 when introducing Q_wsc. We recall that a homogeneous hyperedge has more than half of its nodes within the same (majority) cluster. Let ω_{c,s} denote the fraction of homogeneous hyperedges of size s that have exactly c ≥ ⌊s/2⌋ + 1 nodes belonging to their majority cluster, so that $\sum_{c=\lfloor s/2 \rfloor + 1}^{s} \omega_{c,s} = 1$. This notation is not to be confused with the weights w_{s,c} introduced in (4). To link it to previously introduced quantities, we remark that $\omega_{c,s} = \sum_{k=1}^{K} e^{H}_{s,c}(C^{\mathrm{true}}_k)/|E_s|$. The current implementation handles 3 different options for these fractions: linear, strict and majority. In the majority setting, a homogeneous hyperedge is randomly drawn among all hyperedges with more than half of their nodes in their majority cluster, while in the strict setting, homogeneous hyperedges are exactly within-cluster hyperedges (i.e., all nodes belong to the majority cluster). The linear setting spreads the homogeneous hyperedges in a linear fashion across the different values c of the number of nodes in the majority cluster. In that setting, there is thus a larger number of homogeneous hyperedges showing a larger number of nodes in their majority cluster.

Having set which hyperedges are homogeneous, a mixing parameter ξ ∈ (0, 1) controls the proportion of the degree of each node that is assigned to non-homogeneous hyperedges. In this generating model, the total number of hyperedges is random. In the strict setting of h-ABCD, the ratio ρ_s of within-cluster over between-cluster size-s hyperedges can also be expressed from the model parameters.
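To make the notion of homogeneity concrete, the following Python sketch computes, for a given hypergraph and clustering, the empirical counterpart of the fractions ω_{c,s}: among homogeneous size-s hyperedges, the share having exactly c nodes in their majority cluster. The function names majority_count and homogeneity_profile are hypothetical, and the normalization (over homogeneous hyperedges of each size) follows the verbal definition given in the text; this is an illustration, not part of abcdh.jl.

```python
from collections import Counter, defaultdict


def majority_count(hyperedge, labels):
    """Number of nodes of the hyperedge lying in its most represented cluster."""
    return max(Counter(labels[v] for v in hyperedge).values())


def homogeneity_profile(hyperedges, labels):
    """Per size s, fraction of homogeneous hyperedges with exactly c majority nodes."""
    counts = defaultdict(Counter)
    for e in hyperedges:
        s, c = len(e), majority_count(e, labels)
        if 2 * c > s:  # homogeneous: more than half of the nodes in one cluster
            counts[s][c] += 1
    return {
        s: {c: m / sum(by_c.values()) for c, m in by_c.items()}
        for s, by_c in counts.items()
    }


labels = [0, 0, 0, 1, 1, 1]
hyperedges = [(0, 1), (0, 3), (0, 1, 2), (0, 1, 3), (0, 3, 4), (1, 2, 5)]
print(homogeneity_profile(hyperedges, labels))
```

In the toy example, the pair (0, 3) is not homogeneous (only half of its nodes share a cluster), and a profile concentrated on c = s would correspond to the strict setting.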

Scenarios
General principles and base case scenario. Our simulations explore various settings in order to i) highlight the global behaviors of the methods and compare their performances; ii) explore which hypergraph characteristics most impact those performances. In all the settings, we choose to focus on sparse hypergraphs, for which the number of hyperedges grows linearly with the number of nodes, as this is the most realistic setting.
We start with "base case" scenarios, called scenarios A, that we define under the 3 different generating models (HSBM, DCHSBM and h-ABCD). Then we explore other scenarios (called B to F and Z), simulated under the most convenient model to do so, and in which we modify only one characteristic at a time wrt the base case. Each scenario is composed of subcases with different sample sizes, comprising in general cases 1 to 6 corresponding to n ∈ {50, 100, 150, 200, 500, 1000}. Moreover, in each scenario explored, we randomly generated 25 hypergraphs. Table 3 gives a summary of the scenarios considered and the empirical characteristics of the 25 hypergraphs generated for each of them. In this table, each simulation is summarized through its differing characteristic wrt the base case (namely, scenarios A). For example, ScenB-DCHSBM is a simulation of hypergraphs less sparse than the base case.

Scenarios A play the role of a reasonable sparse case for the methods to work. To explore the robustness of our conclusions, these scenarios are presented under the 3 different generating models, relying on similar settings for sample size n and number of hyperedges |E|. We set the numbers of hyperedges such that they grow linearly with the number of nodes n (sparse setting). We generated K = 3 clusters with equal size or probability (depending on the generating model) and set the maximum hyperedge size to S = 3. This latter choice ensures both reasonable computing times and simplicity of model parametrization. The ratio governing the relative numbers of size-2 and size-3 hyperedges is set to 0.7 (on average) to reflect the fact that we expect larger-size hyperedges to be less frequent than smaller-size ones. The within-cluster over between-cluster hyperedge ratio is constant wrt size s ∈ {2, 3} and set to ρ_s = 1.7 (either exactly or on average), in order to obtain modular hypergraphs.
For this scenario A, we first generated hypergraphs under HSBM with a number of nodes n up to 500, the algorithm becoming too slow for n = 1000. Under DCHSBM, we went up to sample size n = 1000. Finally, we generated samples under h-ABCD, again up to a number of nodes n = 1000. In this latter model, we considered the strict setting regarding homogeneous hyperedges, chose the parameter ξ such that the resulting ρ_s = 1.7, and set |E_2| = 3|E_3|, which is approximately the case in the other 2 models. The degree distribution is scale-free with γ = 2.07 and minimum and maximum values set to 1 and 32, respectively (the observed values in the other 2 models). Note that the range γ ∈ (2, 3) did not allow us to select mean degrees with values similar to those of the other 2 models. In this sense, scenario A under the h-ABCD generating model diverges from the other ones (under HSBM and DCHSBM).
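The constraint that the exponent range puts on mean degrees can be illustrated with a small Python sketch. It assumes that degrees follow a truncated discrete power law with probabilities proportional to d^(−γ) on {d_min, ..., d_max}; the exact sampler used by abcdh.jl may differ, and the function names below are hypothetical.

```python
import random


def truncated_power_law(gamma, d_min, d_max):
    """Support and unnormalized weights d**(-gamma) on {d_min, ..., d_max}."""
    support = list(range(d_min, d_max + 1))
    weights = [d ** (-gamma) for d in support]
    return support, weights


def expected_degree(gamma, d_min, d_max):
    """Mean of the truncated discrete power law (assumed degree model)."""
    support, weights = truncated_power_law(gamma, d_min, d_max)
    return sum(d * w for d, w in zip(support, weights)) / sum(weights)


def sample_degrees(n, gamma, d_min, d_max, seed=0):
    """Draw n independent degrees from the truncated power law."""
    support, weights = truncated_power_law(gamma, d_min, d_max)
    return random.Random(seed).choices(support, weights=weights, k=n)


degrees = sample_degrees(n=1000, gamma=2.07, d_min=1, d_max=32)
print(expected_degree(2.07, 1, 32), sum(degrees) / len(degrees))
```

Under this assumption, with γ = 2.07 and degrees between 1 and 32 the expected degree stays below 3, and it only decreases for larger γ, which is in line with the difficulty of matching the mean degrees of the other 2 models.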
Variant scenarios. We further contrasted scenarios A by varying one characteristic at a time, keeping all others fixed. As our conclusions on scenarios A were globally robust against the choice of the generating model (at least among HSBM and DCHSBM, see next Section 4), we explored those variations in the most convenient model to do so. In scenario B, we decrease the sparsity of the model by generating more hyperedges (keeping all other parameters identical to those of scenario A). In scenario C, we explore the effect of unbalanced clusters, while in scenario D, we explore the effect of varying the proportions of size-2 and size-3 hyperedges, namely considering more size-3 than size-2 hyperedges. Scenario E (resp. F) considers the case where the within-cluster over between-cluster hyperedge ratio ρ_s is increased (resp. decreased) wrt scenario A. Finally, because we obtained poor results for all modularity clustering methods on hypergraphs generated by h-ABCD (see next Section 4), we explored in scenario Z the authors' default values of that model to generate modular hypergraphs. Note that in this case, the true number of clusters K is random and the ratio ρ_s cannot be obtained from the model parameters.

Quality assessment
We now describe the different properties explored to assess the quality of each method. These properties are summarized in Table 4.
We first consider the accuracy of the clustering, relying on the Adjusted Rand Index (ARI, Hubert and Arabie, 1985) that measures similarity between Ĉ and C_true (up to label switching). It is upper bounded by 1, where a value of 1 indicates perfect agreement between the clusterings, and negative values indicate less agreement than expected by chance. Then we consider running times (expressed in seconds) of each method. The results have been obtained on a computer with an AMD EPYC 7542 32-Core processor, 128 CPU (2 sockets of 32 double-threaded cores; we used just one core for each job as none of the procedures is parallelized) and 675Gb RAM. We already mentioned that modularity maximization is far from trivial because of the size of the search space. Thus, an important question is whether the method at stake indeed maximizes its objective. To assess this, we measure the relative error between the ground truth modularity Q_true and the modularity Q̂ of the estimated clustering, namely (Q_true − Q̂)/Q_true. A method that reaches its objective (modularity maximization) without being able to recover the true modular clusters would reveal that it is based on a definition of modularity that is not appropriate. Also note that this error has a sign, with negative values indicating that ground truth modularity is not the maximum value. The mean values and standard deviations for ground truth modularity Q_true are also reported (Table 1 in the Supplementary Material), since values close to zero could induce unstable errors. We finally also consider the estimated number of clusters K̂ wrt its true value K. In general we present a barplot of the estimated values, to be compared to the true and fixed one. Only for scenarios Z, where the true value K is random, we plotted the difference K̂ − K.
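As an illustration of the first and third quality measures, here is a minimal, dependency-free Python sketch. The ARI is computed from the contingency table of the two labelings (Hubert and Arabie, 1985). The relative error is written under the assumption that it is defined as (Q_true − Q̂)/Q_true, a sign convention consistent with the remarks made in the text (negative values when the estimated clustering has larger modularity than the ground truth, and a value of 1 when the maximized modularity is zero). Function names are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same nodes."""
    n = len(labels_a)
    cells = Counter(zip(labels_a, labels_b))  # contingency table
    sum_cells = sum(comb(m, 2) for m in cells.values())
    sum_a = sum(comb(m, 2) for m in Counter(labels_a).values())
    sum_b = sum(comb(m, 2) for m in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings have a single cluster
    return (sum_cells - expected) / (max_index - expected)


def relative_error(q_true, q_hat):
    """Signed relative error (assumed definition): (Q_true - Q_hat) / Q_true."""
    return (q_true - q_hat) / q_true


truth = [0, 0, 0, 1, 1, 1]
print(adjusted_rand_index(truth, [1, 1, 1, 0, 0, 0]))  # label switching is ignored
print(adjusted_rand_index(truth, list(range(6))))      # one cluster per node
print(relative_error(0.4, 0.0))
```

Note that a clustering made of singletons has an ARI of exactly 0 against any ground truth with at least one non-trivial cluster, and that the relative error is undefined (and numerically unstable) when Q_true is zero or close to zero.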

Question | Measure
Is the classification correct? | ARI(Ĉ; C_true)
Is the method fast? | Running times
Is the modularity maximized? | Relative error between Q_true and Q̂
Is the number of clusters correct? | Distribution of K̂ wrt K

Results
General comparison. We first analyze the results under the simplest scenarios (namely scenarios A, which represent our base case) and the HSBM generating model. Results are presented in Figure 2. First, the CNM-like algorithm does not recover the ground truth clusters, with ARI values around 0 (Figure 2, top left). In fact, the algorithm did not improve over its initialization at C_own = ({1}, ..., {n}) and the number of estimated clusters corresponds to the actual number of nodes (bottom right). Its relative error on modularity is constant and corresponds to the relative difference between the modularity of C_true and that of C_own. It is positive, so that the modularity maximization goal is clearly not achieved here. The other 3 methods successfully recover the true clusters. For those 3 methods, median ARI values are above 0.7 (top left) and the number of estimated clusters varies between 3 and 6 (bottom right). The AON-HMLL globally obtains the best ARI results (top left); it is also the fastest method (top right) and it attains its objective of modularity maximization (relative error around 0, see bottom left). The LSR algorithm was proposed to improve over the IRMM. Its relative error on modularity (bottom left) seems in general improved over the latter (with smaller values). Note that Table 1 in the Supplementary Material shows that the modularity Q_w-clique optimized by IRMM is close to 0 for the ground truth clusters, thus giving unstable errors, while Q_linear optimized by LSR is strictly positive at those ground truth clusters. Most importantly, from the clustering point of view, ARI is not improved (top left) and computing times are much larger (top right). This seems to indicate that the LSR places too much emphasis on maximizing modularity at the expense of clustering recovery. As the number of nodes n increases, we observe that ARI values globally have a lower dispersion, but do not seem to overall improve (top left). This might be due to our setting where the within-cluster over between-cluster hyperedge ratio ρ_s is kept constant when n varies.

Let us now compare these results with those obtained on scenarios A generated under DCHSBM and presented in Figure 3.
From these simulations, we confirm the previous conclusions: the AON-HMLL is globally the best method and the CNM-like algorithm has very low performance for clustering recovery (ARI values very small). The other 2 methods successfully recover the clusters but the LSR does not improve on the IRMM and has a much larger computing time. Computing times are similar in this simulation and the former one; to see this, we chose to remove computing times for the LSR method in scenario A6 (Figure 3, top right). Indeed, those values are all above 15,000 seconds and including them would have changed the y-scale in a way that prevents any comparison. As a consequence, we conclude that our analysis is robust against the choice of HSBM or DCHSBM generating model.

To finish with these settings from scenarios A, we consider Figure 4 where the results for hypergraphs generated under the h-ABCD benchmark method are provided. Let us recall that while we tried to mimic as much as possible the characteristics of the scenarios A obtained under HSBM and DCHSBM, it was impossible to obtain similar node degrees within that h-ABCD generating process (see Table 3) and the ones obtained here are much smaller. We observe that in this setting, none of the proposed methods is able to reconstruct the true clusters: ARI values are generally lower than 0.3 (see Figure 4, top left) and the number of estimated clusters is too large (bottom right). Nonetheless, the modularity maximization seems to work as the relative error between the ground truth modularity and its estimation is small (bottom left). Note also that the LSR algorithm seems to find a clustering with a larger value of the Q_linear modularity than at the ground truth clusters (negative errors). Overall, our conclusions raise the following question: are these datasets indeed modular? We will come back to this later when discussing scenarios Z.

We now explore additional insights on the methods' performances provided by other scenarios.

Impact of sparsity. In scenario B, we decreased the sparsity wrt scenario A (note that the hypergraphs remain nonetheless sparse, see Table 3). Results are presented in Figure 5. Here again, we removed from the time plot (top right) all values for the LSR method in scenario B6. Their range, between 16,961 and 17,895 seconds, would have changed the y-scale. We mostly observe that while the above conclusions are still valid, the performances of the 3 "working" methods (AON-HMLL, IRMM and LSR) increase wrt scenario A. Indeed, except for the CNM-like algorithm, the methods exactly recover the true number of clusters (bottom right) and ARI values are almost equal to 1 (top left). Relative errors on modularity are also almost zero for those 3 methods, indicating that the local maximization of the modularity works. We note that the CNM-like method has a relative error equal to 1. This comes from the fact that the maximized modularity is zero while the ground truth modularity is not zero. Also note that the computing time for this method in scenario B6 becomes significantly larger.
Impact of unbalanced clusters. Let us now turn to scenario C where we explore the impact of unbalanced clusters. Results are presented in Figure 6, where we removed from the time plot (top right) all values for the LSR method in scenario C5 as they range between 2,646 and 3,842 seconds. We observe that the overall performances of the methods have decreased wrt scenario A: ARI values are quite low (top left) and the number of clusters is over-estimated (bottom right). Contrary to scenario A, increasing the number of nodes n degrades ARI values. This is quite counterintuitive, as we expect that with larger values of n, the cluster sizes increase and the clusters should thus be easier to detect.

Relative errors on modularity are also almost zero for those 3 methods, indicating that the local maximization of the modularity works. From the simulation of scenario D, we conclude that clustering via modularity maximization is easier for datasets with a larger proportion of large-size hyperedges and conversely, more difficult in the realistic setting where larger-size hyperedges are in smaller proportion.
Impact of within-cluster over between-cluster hyperedges ratio. Scenarios E (resp. F) rely on a larger (resp. smaller) value for the within-cluster over between-cluster hyperedge ratio ρ_s (still constant with hyperedge size s) compared to scenario A. The results of this simulation are presented in Figure 8 (resp. Figure 9). In those figures again, time values for the LSR method in scenarios E6 and F6 respectively have been removed. We can observe that the modularity-based methods are sensitive to this parameter ρ_s, with better clustering results obtained when this ratio is large.
As expected, the more modular the hypergraphs are, the easier it is to recover the clusters.
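For a given simulated hypergraph, the empirical value of the ratio ρ_s can be checked with a few lines of Python. The sketch below assumes that a within-cluster hyperedge has all its nodes in a single cluster and that every other hyperedge counts as between-clusters; the function name within_between_ratio is hypothetical.

```python
def within_between_ratio(hyperedges, labels, s):
    """Empirical ratio of within-cluster over between-clusters size-s hyperedges."""
    sized = [e for e in hyperedges if len(e) == s]
    # A hyperedge is within-cluster when all its nodes carry the same label.
    within = sum(1 for e in sized if len({labels[v] for v in e}) == 1)
    between = len(sized) - within
    # Convention of this sketch: infinite ratio when no between-clusters hyperedge exists.
    return within / between if between else float("inf")


labels = [0, 0, 0, 1, 1, 1]
hyperedges = [(0, 1), (1, 2), (3, 4), (0, 3), (0, 1, 2), (0, 1, 3)]
print(within_between_ratio(hyperedges, labels, 2))
print(within_between_ratio(hyperedges, labels, 3))
```

Values above 1 correspond to the modular regime imposed in the simulations, where scenario A targets a ratio of 1.7 for both sizes.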
Exploring possible bias from generating models. The poor results obtained by all methods on the datasets generated from scenarios A under the h-ABCD model raised the question of whether those hypergraphs are indeed modular. As we chose the settings of this simulation to mimic the observations obtained under HSBM and DCHSBM but did not completely succeed in that task, one could wonder whether our parameter choices make sense for this model. That is why we consider scenarios Z under h-ABCD, relying on the default parameter choices of the authors of the model. Note that we started at sample size n = 100 because n = 50 did not work. The results obtained on these datasets are presented in Figure 10. Here, we observe that again, none of the methods is able to recover the ground truth clusters (the top left plot shows ARI values around 0 and the bottom right plot shows a quite large difference between estimated and true numbers of clusters). This seems to indicate that h-ABCD is not an appropriate benchmark method to test community detection algorithms.

Overall, we could wonder whether the generating models DCHSBM and HSBM could be favoring the AON-HMLL method. This could be particularly the case for the DCHSBM model, as this model and the AON-HMLL method both derive from the same article (Chodrow et al., 2021). However, we can argue against that claim: the modularities Q_aon and Q_strict maximized by the methods AON-HMLL and CNM-like, respectively (see summary in Table 1), both focus on contributions by within-cluster hyperedges only. More precisely, the difference between Q_aon and Q_strict lies only in the adaptive weights included in the former, the latter appearing as a special choice of those weights. We thus conclude that our simulations, which partly focused on the ratio of within-cluster over between-clusters hyperedges, are not especially in favor of the AON-HMLL method.
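As a final note, we notice that in our experiments, the IRMM method sometimes shows some very large values for the relative modularity error (points that we called "outliers" and removed to preserve y-scales in the plots). Looking at Table 1 in the Supplementary Material, we observe that the ground truth modularity Q_w-clique is close to zero, explaining this unstable behaviour of the relative error.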

Discussion
Let us now summarize the main findings of this study:
• Globally, the best modularity-based approach is the AON-HMLL, as it often recovers the ground truth clusters and is among the fastest approaches;
• The IRMM algorithm often has good results at recovering ground truth clusters, but it is slower than the AON-HMLL;
• Though the LSR algorithm is specifically designed to improve on the IRMM, it does not improve on the clustering problem at stake;
• The CNM-like algorithm does not recover the ground truth clusters in any simulation setting;
• We did not observe any algorithm for which the modularity Q would be correctly maximized (relative error in modularity close to zero) while clusters would not be recovered (low ARI values). Nonetheless, the modularity Q_strict from Kamiński et al., 2019a is not fully maximized by the CNM-like method, which leaves open the question of whether it is able to capture communities in hypergraphs.
In the following, we concentrate on commenting the results of the "working methods", namely the AON-HMLL, IRMM and the LSR:
• The working methods tend to have better results when the densities of the hypergraphs increase, though still in a sparse setting (i.e., when the number of hyperedges increases, see scenarios B);
• The methods are sensitive to the balance in the cluster sizes, with better results when clusters are balanced (see scenarios C);
• The methods tend to have better results when we observe a larger proportion of larger-size hyperedges (i.e., when |E_3| becomes larger than |E_2|, see scenarios D);
• The methods are sensitive to the ratio ρ_s of within-cluster over between-cluster size-s hyperedges, with better results when this ratio is larger (thus the hypergraph is more modular, see scenarios E and F).
Another conclusion from our study is that the h-ABCD benchmark model (Kamiński et al., 2023b) does not seem appropriate to generate modular hypergraphs, or at least that none of the current modularity-based approaches is able to detect the simulated clusters in those hypergraphs.
Our work is a first building block in gaining a better understanding of modularity in hypergraphs, yet it comes with certain limitations that warrant attention in future research. One constraint arises from computational limitations in both the generating models and modularity maximization methods, restricting our exploration to relatively small hypergraphs (with a number of nodes n ≤ 1,000). Consequently, we constrained ourselves to a limited number of clusters (K ≤ 3), as larger values might lead to clusters too small for effective detection. Our focus was on binary hypergraphs, which already encompass a vast array of higher-order interactions. However, weighted hypergraphs are also of significant interest. Additionally, our approach relied on simulated hypergraphs with characteristics dictated by methodological constraints (e.g., the number of nodes, number of clusters) and others chosen to align with what we believe to be realistic (e.g., sparse hypergraphs, |E_2| ≫ |E_3|, ...). Lee et al., 2021 examined 13 real-world hypergraphs with heterogeneous sparsity (ratios |E|/n ranging from as small as 0.5 to around 50) and an average hyperedge size generally less than 3.9, with two exceptions (hypergraphs related to drug chemicals). Despite this attempt, the literature still lacks a large-scale study on the characteristics of real-world hypergraphs that could inform and support simulations.

There remain numerous unresolved questions that extend beyond the scope of the present contribution. Realistic characteristics, influenced by parameter choices in the generating models, are intricately tied to the issue of detectability thresholds. Specifically, under what circumstances is it possible to effectively recover clusters in a hypergraph? While this question has garnered attention for uniform hypergraphs (Angelini et al., 2015; Chien et al., 2019; Stephan and Zhu, 2022; Zhang and Tan, 2023), real-world hypergraphs, which are non-uniform, remain largely unexplored in this context. Furthermore, moving beyond clustering recovery, it would be valuable to investigate the discriminative power of modularities. Specifically, understanding how discriminative each proposed modularity measure is could provide insights on their design. Examining the distribution of modularity values across a diverse set of hypergraphs, including non-modular ones, holds significant importance. In a similar vein, whether hypergraph modularities are unimodal or not is an important question. Characterizing the behavior of modularities across the entire spectrum of node clusterings would aid in designing suitable modularity-based methods for community detection in hypergraphs.


Figure 2 - Datasets HSBM, scenarios A1 to A5. Comparison by increasing the number of nodes n ∈ {50, 100, 150, 200, 500}: Adjusted Rand Index (top left), time in seconds (top right), relative error on modularity (bottom left) and estimated number of clusters (bottom right, true value is 3). Outlier points have been removed: from the relative error plot (bottom left), 1 value below -500 concerning the IRMM method in scenario A1. Moreover, one dataset from scenario A5 gave an error with the IRMM and (consequently) the LSR methods; corresponding results were removed from the plots.

Figure 3 - Datasets DCHSBM, scenarios A1 to A6. Comparison by increasing the number of nodes n ∈ {50, 100, 150, 200, 500, 1000}: Adjusted Rand Index (top left), time in seconds (top right), relative error on modularity (bottom left) and estimated number of clusters (bottom right, true value is 3). From the time plot (top right), values for the LSR method in scenario A6 range between 15,796 and 22,350 seconds and are not shown. Outlier points have been removed from the relative error plot (bottom left): 1 value above 300 concerning the IRMM method in scenario A4.

Figure 4 - Datasets h-ABCD, scenarios A1 to A6. Comparison by increasing the number of nodes n ∈ {50, 100, 150, 200, 500, 1000}: Adjusted Rand Index (top left), time in seconds (top right), relative error on modularity (bottom left) and estimated number of clusters (bottom right, true value is 3). Outlier points have been removed: from the relative error plot (bottom left), 3 values at 25, -50 and -55 concerning the IRMM method in scenarios A6, A2 and A3 respectively.

Figure 5 - Datasets DCHSBM, scenarios B1 to B6. Comparison by increasing the number of nodes n ∈ {50, 100, 150, 200, 500, 1000}: Adjusted Rand Index (top left), time in seconds (top right), relative error on modularity (bottom left) and estimated number of clusters (bottom right, true value is 3). From the time plot (top right), values for the LSR method in scenario B6 range between 16,961 and 17,895 seconds and are not shown.

Figure 6 - Datasets HSBM, scenarios C1 to C5. Comparison by increasing the number of nodes n ∈ {50, 100, 150, 200, 500}: Adjusted Rand Index (top left), time in seconds (top right), relative error on modularity (bottom left) and estimated number of clusters (bottom right, true value is 3). From the time plot (top right), values for the LSR method in scenario C5 range between 2,646 and 3,842 seconds and are not shown. Outlier points have been removed: from the relative error plot (bottom left), 2 values concerning the IRMM method, one above 30 in scenario C3 and the second below -60 in scenario C4.

Figure 9 - Datasets DCHSBM, scenarios F1 to F6. Comparison by increasing the number of nodes n ∈ {50, 100, 150, 200, 500, 1000}: Adjusted Rand Index (top left), time in seconds (top right), relative error on modularity (bottom left) and estimated number of clusters (bottom right, true value is 3). From the time plot (top right), values for the LSR method in scenario F6 range between 22,891 and 39,967 seconds and are not shown. Outlier points have been removed from the relative error plot (bottom left): 3 values concerning the IRMM method, with 2 values smaller than -40 and -680 and 1 value larger than 22 in scenarios F1, F4 and F6 respectively.

Figure 10 - Datasets h-ABCD, scenarios Z1 to Z5. Comparison by increasing the number of nodes n ∈ {100, 150, 200, 500, 1000}: Adjusted Rand Index (top left), time in seconds (top right), relative error on modularity (bottom left) and difference between estimated number of clusters and true value (bottom right). Outlier points have been removed: from the time plot (top right), 4 values above 900 seconds concerning the LSR method in scenario Z5 and from the relative error plot (bottom left), 1 value below -28 concerning the IRMM method in scenario Z4.

Table 1 - Summary of functions (with package name and reference) for clustering hypergraphs through modularity-based approaches. We indicate which modularity is maximized by the function (second column), the corresponding algorithm (third column), the implementation language (fourth column) and our option choices (fifth column).

Table 2 - Summary of synthetic models for modular hypergraphs and their characteristics. In all the models, the number of nodes n and the maximal hyperedge size S are chosen by the user. The numbers of clusters K and hyperedges |E| (resp. size-s hyperedges |E_s|) can either be fixed or random.

Table 3 - Simulation settings and empirical descriptors of the 25 simulated hypergraphs in each scenario (line): number of clusters (K), maximal hyperedge size (S), within-cluster over between-clusters size-s hyperedges ratio (ρ_s), number of nodes (n), mean number of size-s hyperedges ($\overline{|E_s|}$), mean node degree ($\bar{d}$) and maximum node degree (max(d)).
Veronica Poda & Catherine Matias, Peer Community Journal, Vol. 4 (2024), article e37, https://doi.org/10.24072/pcjournal.404