Structify-Net: Random Graph generation with controlled size and customized structure

Network structure is often considered one of the most important features of a network, and various models exist to generate graphs having one of the most studied types of structures, such as blocks/communities or spatial structures. In this article, we introduce a framework for the generation of random graphs with a controlled size -- number of nodes, edges -- and a customizable structure, beyond blocks and spatial ones, based on node-pair rank and a tunable probability function allowing to control the amount of randomness. We introduce a structure zoo -- a collection of original network structures -- and conduct experiments on the small-world properties of networks generated by those structures. Finally, we introduce an implementation as a Python library named Structify-net.


Introduction
The structure of networks has long been one of the most studied research questions in network science.In this article, we introduce a method to generate networks of a chosen size, organized according to a structure that can be expressed as an arbitrary ranking function for node pairs.This process thus allows the generation of classic structures such as communities, blocks, and spatial organizations, but also more exotic ones.We subsequently show an application of this framework to study network properties, by extending the classic small-world experiment by (Watts and Strogatz, 1998).A python library (Cazabet, 2023) allowing reproduction of the results and generating networks with custom structures is also introduced.

Context
Generating networks of a chosen size and respecting some constraints is a key topic in network science.It is used in various tasks, for instance, to study network properties (Wang and Chen, 2003), as null models (e.g., Durak et al., 2013), to study the impact on diffusion processes (e.g., Ódor et al., 2021), as benchmarks (e.g., Lancichinetti et al., 2008), etc.In this article, we focus more particularly on a class of random graph models, in which the probability of observing 2 Remy Cazabet et al. Peer Community Journal, Vol. 3 (2023), article e103 https://doi.org/10.24072/pcjournal.335 edges between each pair of nodes is independent of the probability of observing edges between other pairs.This class of models is commonly used to generate various structures, from the homogeneous Erdős-Rényi (ER) generator, to configuration models preserving node degrees, block structures (Abbe, 2017), spatial structures (Cazabet et al., 2017;Waxman, 1988), etc. See section 7 for an overview of related works.The originality of our work is to propose a generic framework to generate many different network structures while allowing to set: • The number of nodes n; • The number of edges m (equivalently, the density); • A parameter ϵ ∈ [0, 1] controlling the strength of the structure bias, i.e., the network is fully determined by the structure definition when ϵ = 0, and increase in randomness with ϵ, becoming an ER network for ϵ = 1.The main advantage of our proposition compared with existing frameworks such as SBM or latent space models is 1) the simplicity for the user to design their own structure logic, by providing their own ranking function (see Section 2), 2) to control the strenght of the structure bias using a single numerical parameter.Structify-Net is not intended to be used in parameter inference tasks, but only for the generation of null models, reference models, and random graphs with controlled properties in general.

Motivational examples
Generating multiple random networks with common properties, either fixed beforehand or preserved from an observed network, is needed in most domains of network science.In this section, we list three motivational examples of usages of such models, for which Structify-net provides a simple practical solution.These examples are in no way exhaustive since random networks are used in many other contexts in network science.
Reference model.When one is interested in studying a network property, such as the transitivity or the homophily of a node attribute, one usually needs a random graph model as a reference.The simplest one of them is the Erdős-Rényi (ER) random graph model, in which only the nodes and the expected number of edges are preserved; but in most cases, one would like to compare with other reference models.For instance, when studying the transitivity, one might wonder if the observed transitivity of a network is significantly higher than the transitivity of a similar graph having a spatial structure, a block structure, or a strong degree of heterogeneity, etc.To explore those hypotheses, one needs a generative model to generate random networks having the same number of nodes and edges as the observed network, but with a particular structure.
Benchmark for network tasks.A generative model can also be used as a benchmark to test algorithms for complex network tasks developed for capturing fundamental patterns of networks and their functions.A common task where synthetic networks are used to evaluate the performances of an algorithm is community detection, namely the task of identifying -in its general intuition-well-connected and/or well-separated groups of nodes within a network (Fortunato and Hric, 2016).Generators with planted communities are used to estimate the agreement between their ground-truth structure and the communities captured by an algorithm, as for instance the LFR benchmark (Lancichinetti et al., 2008).To test the robustness of a wide variety of algorithms defined for different purposes, one needs generators able to model a wide range of planted structures capturing the properties one intends an algorithm to handle.These properties can range from density to homophily, plus any preferred combination of structural properties leading to clique-, grid-, and star-based structures, among others (Yamaguchi et al., 2020).
Influence of structure on dynamical processes.Dynamical processes on networks, such as diffusion or synchronization, have been studied for a long time.Typical examples would be the diffusion of pandemics or polarization on social media.The structure and properties of the network are well-known to have various effects on these processes.For instance, diffusion speed depends on degree heterogeneity (Barthélemy et al., 2004), community structure (Peng et al., 2020) and clustering (Zhuang et al., 2017), etc.In order to experiment with which factors might control Remy Cazabet et al. the speed and scale of a particular diffusion process, one should compare such processes with random networks that 1) are comparable in terms of size, i.e., number of nodes and edges, and 2) differ in their structure.

Method
Structify-Net principle is to create probabilistic random graph generators in two steps: (1) A rank function R sets an order among the node pairs, from most likely to appear to less likely to appear; (2) A probability function P assigns to each pair of nodes a probability to be connected by an edge, based on its rank.P allows to control the expected number of edges m.The function P is independent from the graph structure represented by R; and reciprocally R is independent from the expected number of edges m or the function P.

Rank function
The principle of Structify-Net generator is to describe a network structure by an edge-pair ranking function.More formally, we define R(u, v ) = r the function assigning a value r to each undirected node pair, such as r ∈ [1, n * (n−1)

2
] corresponds to the rank of the node pair (u, v ), and r (u, v ) < r (u2, v 2) means that it is more likely to observe an edge between the pair (u, v ) than between the pair (u2, v 2).This function can be expressed directly in that form, or be trivially derived from a function R ′ assigning a cost to each node pair, coupled with a sorting function, ranking pairs by increasing or decreasing values of R ′ (u, v ).In practice, in that case, we also add an infinitesimal random value ι to each cost, in order to avoid ties.
An intuitive example of such a cost function is for the spatial structure: given a position vector W u for each node u (provided to mimic real data, or generated in fully synthetic network generation), the tendency to observe edges can be a function of the distance, e.g., R ′ (u, v ) = ||W u , W v ||.By sorting node pairs by increasing distance, we obtain a spatial structure such that the closer the nodes, the higher their tendency to be connected.
Section 3 describes in more detail various types of network structures that can be represented this way.

Probability Function
To go from a ranking of node pairs to a random network generator, we use a rank probability function assigning a probability to each rank, i.e., P(r ) ∈ [0, 1].The only constraint to this function is that it must be non-increasing, so that a node pair of lower rank is at least as likely to be connected by an edge than a node pair of higher rank.
The probability function controls the expected number of edges: Bézier Interpolated Probability Function.We propose a family of probability functions controlled by 1) the target expected number of edges m, 2) a parameter ϵ, which controls how strongly is the random graph driven by the planted structure.The family is defined as follows: at one extreme (ϵ = 1), the probability of observing an edge is independent of the rank, i.e., P(r ) = m/L, as in an ER random graph.Conversely, at the other extreme ϵ = 0, the m edges connect the m pairs of nodes of lower rank: To interpolate between the two, we use a rational Bézier parametric curve, as illustrated in Fig.The function giving the probability of observing an edge between node pairs given their rank is defined by the derivative of the parametric Bézier curve (See Fig. 2).
The choice of the Bézier parametric curve arises as a natural solution to the problem as introduced in Fig. 2a: the curve for ϵ = 0 and ϵ = 1 are independent of the interpolation method, the chosen family function should thus propose a smooth interpolation between the two.The Bézier curve answers this problem in a convenient way, although other functions could be used.

Structure Zoo
To illustrate the expression power of the Structify-Net rank generation approach, we propose a collection of structures, available in the Python library under the name of Structure Zoo.This collection contains both classic structures widely found in the literature, together with original ones.Fig. 3 introduces matrix representations of the node-pair ranks of all structures in the zoo.The structures in the Zoo are only to be taken as examples, chosen arbitrarily among a few wellknown structure types, and a few original ones.The main interest of Structify-Net is for the users to be able to specify their own structure with their own rank functions.(Waxman, 1988).More complex versions exist such as the Gravity model (Cazabet et al., 2017).A simple spatial structure can be easily implemented as a Rank model by using a cost Function, with d(u, v ) a notion of distance, such as Euclidean or Haversine distance.W is a matrix such as W i is a vector representing the position of the node in a d dimensional space.Positions can come from observed data, or be generated.In Fig. 3, we attributed to nodes random positions in a 1-dimensional space.

Assortative block structure
Community structure is one of the best-known types of organization of networks.A simple way to implement such a structure as a random graph generator is to use the stochastic Block Model (SBM), with a constraint of having an assortative structure, i.e., edges are more likely to be present between nodes affiliated to the same community than to nodes affiliated to different ones.A simple way to implement this as a rank model is using the following cost function: With B the block affiliation vector, such as B i identifies the block to which node i is affiliated with.Of course, many variants are possible, for instance, to take into account the size of blocks/communities.

Overlapping Assortative Block structure
A variant of the block structure allowing nodes to have multiple affiliations.There are numerous ways to model this situation.In the example provided here, we keep the same threshold cost function as for the non-overlapping case, extending it to multiple affiliations, i.e., we use the following cost function: with B u the set of blocks to which node u belongs.In Fig. 3, each node belongs to exactly two communities and the affiliations are chosen such as each community has half of its nodes shared with another community c1 and the other half shared with another community c2.

Block Structure: Disconnected Cliques
Communities are often understood as sets of nodes that are strongly connected to each other and more weakly connected to the rest of the graph.A special case of extreme community structure can be set up by having only cliques, without links between them -disconnected cliques.Keeping the same threshold cost function as for assortative blocks, we can find the value for community sizes such as obtaining the densest possible disconnected subgraphs for ϵ = 0, for a fixed m.Given the average degree k = m 2n , we want cliques to be of size n c ⌈ k⌉.Because n is not necessarily a multiple of n c , we set the number of communities to ⌊ n nc ⌋, and group the remaining nodes in an additional community.The already defined assortative block structure is then applied as usual with those blocks.

Nested structure
Nested network structures are well-known in some fields, such as ecology and economics (Alves et al., 2019;Mariani et al., 2019).A nested structure is a type of hierarchical structure, Remy Cazabet et al. 7 Peer Community Journal, Vol. 3 (2023), article e103 https://doi.org/10.24072/pcjournal.335 in which the properties/links of each entity are subsets of the properties/links of entities at a higher hierarchical level.We implement this as follows: Where u, v are consecutive node indices in [1,n], and u < v

Star structure
Hubs are known to play an important role in many real networks.The hub-and-spoke structure is frequent both in human-designed infrastructure and in natural systems, forming patterns also known as stars.One way to obtain such a structure is by using the following rank function: Where u, v are consecutive node indices in [1,n], and u < v .
As seen in Fig. 3, this rank function ranks first all the pairs of nodes including the node of ID 0, then all the pairs of nodes containing node ID 1 and another node with a larger ID, and so on and so forth.The structure therefore tends to create stars with nodes of low IDs in the center.

Core Periphery
Core periphery structure is another well-known type of organization for complex systems.This organization is often modeled using blocks, one block being the dense core, another block, internally sparse, representing the periphery, and the density between the two blocks is set at an intermediate value.To illustrate the flexibility of the Rank approach, we propose a soft-core alternative, the coreness dissolving progressively into a periphery.To do so, we consider nodes embedded into a latent space, as for the spatial structure -random 1d positions in our example.The node-pair rank score is computed as the inverse of the product of 3 distances: the distances from both nodes to the center, and the distance between the two nodes.As a consequence, when two nodes belong to the center, they are very likely to be connected; two nodes far from the center are unlikely to be connected unless they are extremely close to each other.
with 0 the vector corresponding to the center of the space, i.e., the core of the generated network.
Note that this is only an example of a rank function implementing a soft core, and one could imagine many variations of it.

Perlin noise
Perline noise (Perlin, 2002) is a type of gradient noise frequently used in computer graphics to create images with a realistic feel, such as textures and landscapes.We use it to generate an adjacency matrix, from the upper triangle of a 2d image of size (in pixels) n × n.The R ′ cost function is the black intensity of the pixel.In practice, Perlin noise tends to create continuous shapes of lower and higher values, with smooth transitions between the two (see Fig. 3) for an example.Such a structure can be interpreted as a fuzzy version of a non-assortative SBM; with stronger relations between some groups of nodes and some other groups of nodes.Perlin noise has a parameter, called octaves, allowing the addition of smaller-scale structures on top of each other.

Fractal structures
To better illustrate the expression power of the Structify-Net structure definition, we propose three variations of what we call fractal structures.The principle is to embed the nodes into a complete binary tree and to compute the rank scores based on distances on that binary tree.The purpose is to introduce heterogeneity among nodes, which can be used to create specific structures.In the example, we embed 6 nodes.In the simpler case, the probability of observing a graph in the resulting graph is proportional to the distance between the nodes in the tree.
Fractal leaves.In the fractal leaves structure, we create a complete binary tree such that the number of leaves is n (Fig. 4).We embed nodes of the network on the leaves and use the distance between them in the graph as the cost function.This creates (see 3) a sort of Matryoshka doll, hierarchical block structure, in which -considering that edge probability decreases with distance without reaching zero-small dense blocks are contained into larger, sparser blocks, recursively.
Fractal root.In the fractal root structure, we embed nodes of the graph in a complete binary tree of size n, and define the cost function as the distance between nodes in the embedding tree (Fig. 4), R ′ (u, v ) = d T (u, v ), with d T the geodesic -shortest path-distance between the nodes in the embedding tree.This structure has similarities with the previous one, but introduces a particular role for some of the nodes: the root and nodes close to the role now occupy a central, pivotal role, since they are on the shortest paths between nodes on their rights and on their lefts.
The network has both a hierarchy of blocks and a sort of central core composed of the nodes close to the root of the tree.
Fractal hierarchy.The fractal principle and custom rank function can be used to create random networks with particular properties of interest.For instance, it has been pointed out in a seminal article (Ravasz and Barabási, 2003) that most real networks have a negative correlation between nodes' individual clustering coefficient and their degrees, while most network models do not reproduce this correlation.In the original article, a network model called the "hierarchical network model" has been introduced to generate networks with these properties.To reproduce the socalled hirarchical property -note that many other notions of hierarchical networks exist and that this is only the one introduced in (Ravasz and Barabási, 2003)-networks must have 1) heterogeneous degrees, 2) a high average clustering coefficient, and 3) a controlled relation between degrees and clustering coefficient.In the original article, such networks were created through an iterative deterministic algorithm, replacing graph parts with predefined subgraphs until reaching the target size.Instead, we propose here a rank-score approach, embedding nodes in a complete tree as in the fractal root embedding.However, we propose to use a ternary tree instead of a binary one, to increase local clustering.We then choose a score function such as: 1) leaves tend to have high clustering and low degree, and 2) root and other nodes in the higher levels tend to have high degrees and low clustering coefficients.The principle is thus to have a high probability of observing edges 1) between groups of nodes at the bottom of the tree if they have close common ancestors and 2) between nodes at the top of the tree and nodes at the bottom of the tree.The purpose of this example is to show that, by designing an appropriate rank function, one can obtain random graph generators such that the generated graphs have a property of interest.
The rank-score is thus defined as follows: S are scores capturing respectively a Descendent and Sibling similarity.We use: with h(T ) the global height of the tree and h(u) the height of node u, such as h(u) = 0 if u is a leaf, and h(u) = h(T ) if u is the root of the tree.This function ranks first pairs of nodes that are far away in terms of tree levels, with a value of 0 between the root and the leaves.
with d(u, v ) the shortest path distance in the tree.

Discussion on the structure zoo
Structures introduced in this structure zoo are only a few examples of the infinite number of possibilities for structures that can be defined by cost functions.We stress that once such a cost function has been chosen, we are able to generate graphs following them, with a chosen number of nodes and edges.
The structures we introduced allow the generation of synthetic networks without prior data, but one can perfectly define a cost function defined on node attributes, e.g., take a real network in which nodes are located in space, belong to known groups, and have other characteristic attributes, and define a structure by using a cost function taking all these attributes into account.Fig. 3 proposes a representation as matrices of all these structures on a network of 128 nodes and 1048 edges.

Application: Swall World Structures
One of the most famous properties of network structure is the so-called small world property.Introduced in (Watts and Strogatz, 1998), it characterizes a network as being a small world if it has both 1) a high clustering coefficient -significantly larger than in an ER random graph, 2) a short average distance -of the same order of magnitude as in an ER random graph.This property, considered ubiquitous in real networks, has been reproduced in (Watts and Strogatz, 1998) by progressively adding randomness to a regular network, built such as the n nodes are ordered in a circle, and each node is connected to its k/2 neighbors in both directions.The small world property emerges because, when we rewire edges at random, the average distance decreases faster than the clustering coefficient -both being large in the regular network and low in the ER random graph.
We conduct an experiment to observe how other structures behave when submitted to a similar experiment, i.e., starting with an archetypal structure, and adding noise progressively.

Reproducing the Watts-Strogatz experiment
Watts-Strogatz rank model.To mimic the original small-world experiment, we define a rank-based structure using a cost function, parameterized by the number of nodes n and the desired average node degree k.Scoring functions.In the original article, clustering coefficients and average distances were expressed as a fraction of the value obtained for the regular structure.We cannot reuse this approach for multiple structures, having different starting values.Instead, for the clustering, we directly use the average clustering coefficient score, CC (g ) ∈ [0, 1].
For the average distance, we propose a scaled value defined as: , otherwise with G(G ) the largest connected component of graph G , and d(G ) the average shortest path distance between nodes of graph G .
The property of this score is that δ ∈ [0, 1], with δ = 1 if every node can reach any other node in two hops or less (e.g., a full star structure), and δ decreases quickly as the average distance d increases.
Watts-Strogatz experiment replication.We use our setting to replicate an experiment similar in nature to the one in the original article.We used the same parameters, i.e., n = 1000, k = 10.We progressively add randomness, from a deterministic network to an ER random graph by varying parameter ϵ of the probability function.Fig 5b shows results coherent with the original article: after adding some randomness, the short path index has increased significantly, while the clustering coefficient still remains close to its value for the deterministic network.

Small-World profiles for other structures
We can apply the same process to the other structures defined in our structure zoo, with the same number of nodes and edges.In Fig. 6, we observe a wide variety of behaviors.
• Fractal-hierarchy and maximal-star structures display a super-small-world behavior, having both short paths and high clustering coefficients.This can be easily understood: their hierarchical nature creates a giant hub maximizing reachability.Fractal-hierarchy is designed such as most nodes of low degree have a high clustering coefficient, due to a local structure.On the contrary, in maximal stars, most nodes are connected only to a few hubs, but since those hubs are connected to each other, they also have a high clustering coefficient.• Nested, overlapping communities, and Perlin noise seems, on the contrary, to be antismall-world, with both a low clustering coefficient and long paths.Again, this is due to different factors.For instance, the nestedness and Perlin noise concentrate so many edge probability between a small subset of nodes, that many nodes are disconnected, leading to the absence of a giant component -thus to an infinite average distance.The overlapping community, instead, is created in a way that makes it look like the original Watts-Strogatz circular structure, as can be observed in Fig. 3. Its low clustering coefficient comes from 1)many nodes not having a degree of at least 2, or 2)structures being too large compared to the number of edges, so that the probability of forming triangles is low.• Other structures tend to follow a pattern roughly similar to the original article, with more or less pronounced profiles.In some cases, the short distance score remains at zero until reaching a certain amount of noise, due to the absence of a giant component.

Working With Node Attributes
In the previous section, we have seen an experiment in random networks without metadata/attributes, i.e., with interchangeable nodes.However, as mentioned in the model definition, it is possible to use any node information in the rank function.In this section, we provide two simple motivational examples.
Let's imagine working with a dataset such as a global air transport network.Nodes correspond to airports and are identified by their name, location, and country.Edges might correspond to having at least one direct flight between two airports.To study a property of such a network, for instance, its clustering, average shortest path, or properties of a virus diffusion on that network, one would usually compare that property with its value in networks generated by a reference null model, such as ER or Configuration model.Another possibility offered by Structifynet is to use the node attributes.For instance, we can consider only the airport positions, and rank node pairs according to the distance between them in the dataset, which corresponds to using the Spatial structure defined in the Zoo, using metadata instead of random positions.Alternatively, we could use the country information as metadata to the Assortative Block Structure of the Zoo.Of course, it could be possible to use already-existing methods such as block models or spatial models to do the same.Structify-Net simply offers a convenient way to do it, for any model, just by providing the appropriate function.
But it is of course possible to go far beyond simple blocks or latent space, and to propose a custom ranking function based on the case study.For instance, one can use machine learning to design a ranking function retaining complex properties from an observed graph.In the airport example, we could use a classification algorithm such as logistic regression or a decision tree to learn how likely it is to observe an edge between two nodes given their properties.From the observed network, we would extract a set of training examples {distance, sameCountry ∈ {0, 1}, edge ∈ {0, 1}}, and train a classifier, which can then assign a class probability to each nodepair.This probability would not however be usable directly to create null models preserving the number of edges.Instead, we can use this probability as a rank function, and generate random graphs with a chosen number of edges using Structify-Net.The structure produced will preserve some of the properties related to attributes of the original graph -probably, a higher tendency of airports to be connected if they are located in the same country and if they are close in space.

Python library
An important aspect of such a generator is to allow other researchers to use it for their own needs, whether it be to generate networks according to a structure described in the structure zoo, or to define their own.We thus release with this paper a pip installable python library (Cazabet, 2023), together with its documentation (https://structify-net.readthedocs.io/en/latest/).For convenience, the library is compatible with Networkx (Hagberg et al., 2008).Obtaining a rank model corresponding to one of those defined in the zoo, such as the nested structure, is as simple as calling it: 1 import s t r u c t i f y _ n e t .zoo as zoo 2 n=128 3 rank_model = zoo .s o r t _ n e s t e d n e s s ( n ) Generating a network as a Networkx object from it is straightforward: 1 import s t r u c t i f y _ n e t .zoo as zoo 2 n=128 3 m=512 4 g e n e r a t o r = zoo .s o r t _ n e s t e d n e s s ( n ) .g e t _ g e n e r a t o r ( e p s i l o n = 0 .5 ,m=m) 5 g_generated = g e n e r a t o r .g e n e r a t e ( ) One can also define a custom structure by providing a rank-score function: 1 import s t r u c t i f y _ n e t as s t n 2 n=128 3 m=512 4 5 def R_nestedness ( u , v , _ ) : 6 r e t u r n u+v 7 r a n k _ n e s t e d = s t n .Rank_model ( n , R_nestedness ) 8 g = r a n k _ n e s t e d .g e n e r a t e _ g r a p h ( e p s i l o n = 0 . 1 ,m=m) The library allows easy plotting of the rank-score matrices and node-pair probability matrices, and more generally reproduces all the content of the current article.

Related Works
Many works can be found in the literature on the generation of random graphs.A complete survey is beyond the scope of this paper; we will nevertheless briefly introduce in this section the most common random graph models and existing software for random graph generation.
We can make a distinction between generative models that are designed for model parameter inference, and those that are not.Models designed for inference are usually controlled by a limited number of parameters, and have some appropriate statistical properties allowing to infer parameter values to match a specific observed graph or series of graphs(e.g., SBM or latentspace models).On the contrary, generative-only models are designed to generate graphs with some specific properties in order to use them in downstream tasks (e.g., LFR or Waxman graphs).Structify-Net, although defined as an edge-independent random graph model, rather belongs to the second category, since it is not designed for parameter inference.

Common Random Network Models
The simplest way to generate random graphs is certainly the Erdos-Renyi (ER) random graph model, that we already introduced.ER random graphs are fully homogeneous and do not have any particular structure; but a controlled expected size.Configuration models, in particular the Chung-Lu version, allow the generation of graphs of controlled size without mesoscopic organization but preserving the nodes' degrees.
Remy Cazabet et al. 13 Peer Community Journal, Vol. 3 (2023), article e103 Block structures.Stochastic Block Models (SBM) define random graph models with block structures.In their simpler form, they are defined by two sets of attributes: a vector defining the block to which each node belongs, and a matrix defining the number of edges between each pair of blocks.They exist in various flavors, the canonical version being edge-independent (Snijders and Nowicki, 1997), while microcanonical versions (Tiago P Peixoto, 2017) are not.These models also exist with or without node degree preservation.The literature on SBM mostly focuses on inference, but SBM can naturally be used to generate random networks, either by choosing model parameters or by using those obtained after inference.A popular way to set manually the parameters for custom structure generation is to fix a number of blocks and a number of nodes in each block, then to choose an internal edge probability p in and an external probability p out .Usually, p in > p out , thus defining assortative blocks.Arbitrary block structures, with different block sizes or non-assortative structures can be defined by setting the parameters accordingly.
Multiple variants of block models exist, such as overlapping SBM (Latouche et al., 2009) or hierarchical ones (Schaub et al., 2023).Block models can also be used to generate core-periphery structures, typically by setting one core block and one or several peripheral blocks.This structure however cannot generate other types of possible core-periphery structures, such as a continuous change between core and periphery.
A popular random graph generator with community structure is the LFR Benchmark (Lancichinetti et al., 2008).Not designed for inference, it allows the generation of networks with realistic properties with a limited number of parameters, thus in a more convenient way than with manually initialized SBM.A more recent variant solving some of the problems of LFR is the ABCD random graph generator (Kamiński et al., 2021).
Latent space structure.Various models exist to generate random graphs in which nodes are embedded into a space, the probability of observing an edge depending on the distance between them.Among popular examples, we can cite Random Geometric Graphs (RGG; Dall and Christensen, 2002), in which nodes are connected if their distance is below a parameter, and Waxman Graphs (Waxman, 1988), in which edge probability decreases exponentially with the distance.The gravity model (Wojahn, 2001) is an alternative in which the probability of observing an edge depends both on nodes' degrees and on a deterrence function controlling the influence of distance on edge probability.The parameters, in particular the deterrence function, can also be inferred to fit a given observed network (Cazabet et al., 2017).Latent spaces are not limited to geographical space, and models have been proposed for the inference of social spaces, for instance (Hoff et al., 2002).
A model related both to SBM and to spatial models is the Random Dot Product Graph (RDPG) model (Young and Scheinerman, 2007).Nodes are characterized by a vector defining their positions in a latent space, and the probability of observing an edge between nodes is given as the dot product between their vectors.Some authors consider instead that networks are better represented in hyperbolic space, leading to the proposition of Hyperbolic random graph generators (Aldecoa et al., 2015).
Homophily.Other generators model edge probabilities depending on the nodes' attributes.They allow to analyze the interplay between similarities in structure (e.g., common friends in social networks) and similarities in node attributes (Asikainen et al., 2020), or to investigate mechanisms of non-structural closures such as the formation of links between nodes having similar attributes that do not share common neighbors, as a base of node homophily (Murase et al., 2019).
Generic random graph models.A difference between models introduced until then in this section and Structify-Net is the restriction to a single type of network structure.Since Structify-Net accepts any node-pair ranking function as input, it allows the generation of block, spatial, but also other types of structures such as core-periphery, nestedness, etc.
Another family of highly expressive random graph models is the Exponential Random Graph Model family (ERGM; Lusher et al., 2013).ERGMs define the probability of observing a given graph G as , with Θ a vector of network parameters, X (G ) network characteristics, including for instance the number of triangles or node properties, and c(Θ) a normalizing constant ensuring that the sum of P(G ) for all G is equal to 1. ERGMs are mostly used in the context of model inference, and allow in theory to model non-independent edges, e.g., taking into account a triangle closure propension.However, due to the computational complexity, this approach is limited to small graphs.
Finally, an approach sharing some similarities with our framework is the graphon (Glasscock, 2015), contraction of graph function, first introduced in (Lovász, 2006).A graphon can be defined (Orbanz and Roy, 2014) as a bivariate function W : [0, 1] 2 → [0, 1].That function returns an edge probability for each pair of nodes, based on a node latent variable.Graphons were first introduced mostly as theoretical objects, in the context of sequences of large, dense graphs.More recently, works have focused on the inference of this nonparametric model, as smoothgraphons (Sischka and Kauermann, 2022a) or combined with an SBM approach (Orbanz and Roy, 2014;Sischka and Kauermann, 2022b).While graphons share the principle of using a function to characterize the network structure with our approaches, they are part of a very different literature.Graphons are more generic, so much so that SBM, spatial, and nearly all latent-variablebased statistical models can be considered a special case of graphons.The literature on the topic focuses on inference problems, and notions such as node-pair ranking or randomness parameters are not present.
The Structify-net framework is aimed to play a different role compared with all methods introduced in this section.ERGM and Graphons are families of models, designed for model inference rather than network generation.They are so general that they do not offer much help to define a particular structure, and are used in general in a restricted context, for instance with block-approximations for graphons, or imposed number of triangles for ERGMs.SBM, gravity models and configuration models are on the contrary more specific than structify-net, focusing on a single type of network structure.Furthermore, they are often used in the context of model inference.On the contrary, other models such as LFR benchmark or Waxman graphs are designed, as Structify-Net, to generate networks with controlled properties, but they also focus on one specific type of structure.Our contribution thus occupies an original position in the scientific landscape on random graphs: it is designed for the generation of random graphs and not the inference task, it is more flexible than LFR or SBM, and offers a more convenient way to generate graphs of controlled properties compared with ERGM or Graphons.

Software
Several easy-to-use libraries propose to generate networks with blocks, following the Stochastic Block Model approach.Among the most popular, we can cite networkx (Hagberg et al., 2008) and iGraph (Csardi, Nepusz, et al., 2006), which include an SBM generation function allowing to define blocks of arbitrary sizes, arbitrary probabilities of observing edges between them, and then generate a graph accordingly.More advanced functions are proposed in the graph-tool (Tiago P. Peixoto, 2014) library, allowing to generate microcanonical, degree-preserving versions, and several other variants of block structures.
The same libraries offer, under the name of geometric models, some possibilities for spatially constrained network structures.Most of these methods, however, do not allow setting the number of edges, since they instead require setting a threshold below which edges exist, or using an a priori function (Waxman random graph).Only the k-nearest neighbors method allows one to choose the number of edges among multiples of the number of nodes, but it is a deterministic generator.These libraries also contain other types of network generators with non-random structures, such as lattices, or networks defined by a process such as the forest-fire model.
Another notable network generator is the LFR benchmark, which is implemented in networkx in a simplified version, or available as a standalone code to have access to all its settings.Software to work with graphons is much more scarce; we found an R library implementing graphon inference and graphon random graph generation (https://cran.r-project.org/web/packages/graphon/index.html); a recent python code exists also, although not in the form of a documented library (https://github.com/BenjaminSischka/GraphonPy).These libraries

Discussion
This article introduced a new method to generate random networks with a customizable network structure, and a target number of nodes and edges, while controlling the amount of randomness.To the best of our knowledge, this is the first random network generator allowing to do so.We think that having such a generator opens doors to new research directions in network science, for studying the properties of networks with some particular structures -as we have done in the experimental section, or as a reference model for observed graphs, to name a few.
Moreover, one of the main strengths of the generator is its ability to control situations where a process/rule of the structural organization (expressed by a pair-node rank) can be mixed with "unknown" random processes (expressed by ϵ); thus, among the observed edges, some of them strictly follow the structural constraints imposed by the rank, and some of them can go beyond the explanation of such constraints.The possibility to analyze this mix -between edges driven by the organization and randomness-is quite important, especially given that network behaviors like small-worldness or homophily can be better explained when randomness is added to a rule of connection (Talaga and Nowak, 2020).
Regarding the analysis of such network behaviors, another strength of the generator is the possibility to exploit the properties of nodes when defining a rank, thus including elements representing, in principle, physical position, political opinions on a spectrum, gender, or even degree.We focused here on the distance between vectors of nodes' positions in a d dimensional space for building a spatial structure (cf."Structure Zoo", 2.1).We also focused on the affiliation to the same group for building an assortative block structure (cf."Structure Zoo", 2.2).Similarities between such structures lead us to acknowledge the significance of incorporating node metadata/attributes to generalize a wide variety of network behaviors.We focused here on analyzing the small-world property, but the same can be applied to other behaviors.In principle, behaviors like homophily (McPherson et al., 2001), could be described just as a particular case of either a spatial organization (if attributes are numerical) or of an assortative block structure (if attributes are categorical).

Limits and future work
The main limit of the current work is scalability: node-pair ranks and probability matrices are dense matrices, which can be memory-demanding for large graphs.The generation process also requires an independent random draw for each node-pair.These limits could be overcome in future work.Another limit is that network structures in which probabilities of observing an edge between a pair of nodes are not independent of adjacent ones -for instance, to generate random networks of a specific size and a specific clustering coefficient-cannot be expressed by a rank-structure.Another limit compared with some other random graph models such as SBM or RDPG is the absence of an efficient inferential solution.In particular, without any constraint on the node-pair order, an inferential approach would always find a trivial uninformative solution in which all connected node-pairs are ranked first.Note however that, since it is possible to compute the probability to obtain a given graph for a set of parameters, it could be possible in theory to use maximal-likelihood inference on a subset of the models, e.g., by fixing the amount of randomness ϵ, and constraining the domain of acceptable rank functions.
In future work, we plan to compare the properties of real-world networks with those of the synthetic ones generated from structures such as those of the zoo.Having such a variety of possible structures, we expect to be able to characterize real networks, by observing similarities and differences with the synthetic ones, e.g., a real network might have a clustering coefficient and an average distance compatible with the Watts-Strogatz network, but differ in degree heterogeneity and robustness, while another synthetic network might have more similar properties

Figure 1 -
Figure 1 -Generating networks using the Structify-Net approach.A rank function defines an ordering between node pairs.A probability function is used to assign an edge probability to each node pair based on their rank in the ordering.This gives a Random graph model, that can be used to generate graph instances.Note how the same Rank function R1 can give 2 Random Graph Models using different Probability functions P1 and P2, how the same Probability function P1 is used for two different Rank functions R1 and R2, and how multiple graphs can be generated from a single Random Graph Model.

Figure 2 -
Figure 2 -Example of probability functions for various values of ϵ.In this example, we set m = 128, n = 512 2. The Bézier curve is defined by two endpoints, corresponding to the two points shared by both cumulative distributions: the points (0,0) and (L,m).The control point of the curve is (m,m).A weight b allows controlling how close the curve is to each of the two extremes.If b = 0, the Remy Cazabet et al. 5 Peer Community Journal, Vol. 3 (2023), article e103 https://doi.org/10.24072/pcjournal.335curve corresponds to ϵ = 0, and ϵ(lim b→∞ ) = 1.For convenience, we thus rescale the given parameter ϵ into b as follows: By convention, if ϵ = 0, b = b max and if ϵ = 1,b = 0, with b max a large integer constant.

Figure 3 -
Figure 3 -The Structure Zoo.Matrix of node-pairs ranks for networks with 128 nodes.Darker colors correspond to lower ranks.For Disconnected cliques, we set m = 128 * 8.When involving spatial or clique positions, nodes are ordered according to this value.

Figure 4 -
Figure4-Two methods to create fractal structures by embedding nodes into complete binary trees.In the example, we embed 6 nodes.In the simpler case, the probability of observing a graph in the resulting graph is proportional to the distance between the nodes in the tree.
otherwise With T u , the position of node u in the embedding tree, ANCESTOR a function such as ANCESTOR(u, v ) = TRUE if u is an ancestor of v in the embedding tree.Functions D and Remy Cazabet et al. 9 Peer Community Journal, Vol. 3 (2023), article e103 https://doi.org/10.24072/pcjournal.335

. Spatial structure Spatial
structures are commonly found in the literature.Several versions of random graphs spatial models exist, for instance, the Waxman Graph (Hunter et al., 2008)to see, in terms of usage of capabilities, with Structify-Net.They are designed for completely different purposes.The reference library for working with ERGM is the R package ergm(Hunter et al., 2008), focusing on model inference.(https://gvegayon.github.io/appliedsnar/the-ergm-package.html).