Large deviations for the annealed Ising model on inhomogeneous random graphs: spins and degrees

We prove a large deviations principle for the total spin and the number of edges under the annealed Ising measure on generalized random graphs. We also give detailed results on how the annealing over the Ising model changes the degrees of the vertices in the graph and show how it gives rise to interesting correlated random graphs.


Introduction and main results
Recently, there has been substantial work on Ising models on random graphs, as a paradigmatic model for dependent random variables on complex networks. While much work exists on random graphs with independent randomness on the edges or vertices, such as percolation and first-passage percolation (see [20] for a substantial overview of results for these models on random graphs), the dependence of the random variables on the vertices raises many interesting new questions. We refer to [4,5,8,11,12,13,18,17] for recent results on the Ising model on random graphs, as well as [20,Chapter 5] and [9] for overviews. The crux about the Ising model is that the variables that are assigned to the vertices of the random graph wish to be aligned, thus creating positive dependence. Since the Ising model lives on a random graph, we are dealing with non-trivial double randomness of both the spin system as well as the random environment. While [8,12,13,17] study the quenched setting, in which the random graph is either fixed (random-quenched) or the Boltzmann-Gibbs measure is averaged out with respect to the random medium (averaged-quenched), recently the annealed setting, in which both the partition function and the Boltzmann weight are averaged out separately has attracted substantial attention [4,5,11,18]. The random graph models investigated are rank-1 inhomogeneous random graphs [11,18], as well as random regular graphs and configuration models [4,5,17]. Depending on the setting, the annealed setting may have a different critical temperature. However, as predicted by the non-rigorous physics work [23,14], the annealed Ising model turns out to be in the same universality class as the quenched model for all settings investigated [5,11,13].
In this paper, we extend the analysis of the annealed Ising model on inhomogeneous random graphs to their large deviation properties. We investigate the large deviations of the total spin, a classical problem dating back at least to Ellis [16,15], and we also consider the large deviation properties, under the annealed measure, of purely graph quantities such as the number of edges or the vertex degrees.
Such problems are in general difficult since the rate function is not convex at low temperatures (β > β_c), so the Gärtner-Ellis theorem cannot be used directly.
Our main results provide a formula for the large deviation function of the total spin that holds true even when the hypotheses of the Gärtner-Ellis theorem are not satisfied, i.e., at low temperatures. This formula is valid for all values of the parameters determining the phase diagram. To overcome the lack of differentiability of the annealed pressure at low temperatures (differentiability being a necessary condition for the application of the Gärtner-Ellis theorem), we use the key property that the annealed Ising model on the generalized random graph can be mapped to an inhomogeneous mean-field (Curie-Weiss) model. As a consequence, the large deviation function of the total spin can be deduced from classical results for independent variables and an application of Varadhan's lemma.
The study of large deviations for the number of edges brings to light the fact that, if one focuses solely on graph observables and properties, then annealing can be described in terms of a modified law for the graph. Our results show that in the annealed setting, the typical number of edges present is substantially larger than the typical value under the original law of the graph, thus quantifying the effect that the annealing has on the structure of the random graph involved. As explained in more detail below, one could think of the annealed Ising model on a random graph as giving rise to a random graph with an interesting correlation structure between the edges. To gain more understanding of this correlation structure, we also investigate the degree distribution under the annealed Ising measure. Again we find that the degree of a fixed vertex (or the degree of a uniformly chosen vertex) under the modified graph law has a distribution with a larger mean.

The annealed Ising model on generalized random graphs
We now introduce the model. We first define the specific random graph model, the so-called generalized random graph, and then define the (annealed) Ising model.

Generalized random graph
To construct the generalized random graph [3], let $I_{ij}$ denote the Bernoulli indicator that the edge between vertex i and vertex j is present and let $p_{ij} = P(I_{ij} = 1)$ be the edge probability, where different edges are present independently. Further, consider a sequence of non-negative weights $w = (w_i)_{i\in[n]}$ whose label i runs through the vertex set [n] = {1, . . . , n}. Then, the generalized random graph, denoted by $\mathrm{GRG}_n(w)$, is defined by
\[ p_{ij} = \frac{w_i w_j}{\ell_n + w_i w_j}, \tag{1.1} \]
where $\ell_n = \sum_{i\in[n]} w_i$ is the total weight of all vertices. Denote the law of $\mathrm{GRG}_n(w)$ by P and its expectation by E. There are many related random graph models (also called rank-1 inhomogeneous random graphs [2]), such as the random graph with specified expected degrees or Chung-Lu model [6,7] and the Poisson random graph or Norros-Reittu model [24]. Janson [21] shows that many of these models are asymptotically equivalent. Even though his results do not apply to the large deviation properties of these random graphs, all our results also apply to these other models.
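The construction above can be sketched in a few lines of Python. This is a minimal illustration, assuming the standard generalized random graph edge probability $p_{ij} = w_i w_j/(\ell_n + w_i w_j)$; the function names `grg_edge_probability` and `sample_grg` and the example weights are hypothetical and chosen only for this sketch.

```python
import random


def grg_edge_probability(w_i, w_j, total_weight):
    """Edge probability p_ij = w_i w_j / (l_n + w_i w_j) (assumed form of (1.1))."""
    return (w_i * w_j) / (total_weight + w_i * w_j)


def sample_grg(weights, rng):
    """Sample the edge set of GRG_n(w): every pair i < j is present independently."""
    n = len(weights)
    total_weight = sum(weights)  # l_n, assumed strictly positive
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = grg_edge_probability(weights[i], weights[j], total_weight)
            if rng.random() < p:
                edges.append((i, j))
    return edges


rng = random.Random(0)                 # fixed seed, only for reproducibility
weights = [1.0, 2.0, 0.5, 3.0]         # illustrative weights
edges = sample_grg(weights, rng)
```

Note that $p_{ij} \le w_i w_j/\ell_n$ always holds for this choice, which is the bound used repeatedly in the estimates of Section 3.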
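We need to assume that the vertex weight sequences $w = (w_i)_{i\in[n]}$ are sufficiently nicely behaved. Let $U_n \in [n]$ denote a uniformly chosen vertex in $\mathrm{GRG}_n(w)$ and $W_n = w_{U_n}$ its weight. Then, the following condition defines the asymptotic weight W and sets the convergence properties of $(W_n)_{n\ge 1}$ to W:
Condition 1.1 (Weight regularity). There exists a random variable W such that, as n → ∞,
(a) $W_n \xrightarrow{d} W$;
(b) $E[W_n] = \frac{1}{n}\sum_{i\in[n]} w_i \to E[W] < \infty$;
(c) $E[W_n^2] = \frac{1}{n}\sum_{i\in[n]} w_i^2 \to E[W^2] < \infty$.
Further, we assume that E[W] > 0.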
As explained in more detail in [19,Chapter 6], conditions (a)-(b) imply that the empirical degree distribution of the random graph converges to a mixed Poisson distribution with mixing distribution W , i.e., the proportion of vertices with degree k is close to the probability that a Poisson random variable with random parameter W equals k. We note also that, by uniform integrability, Condition 1.1(c) implies (b).
Notation. Throughout this paper, given a probability measure µ we denote by E µ the average w.r.t. µ.

Annealed Ising model
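Let $\sigma = (\sigma_i)_{i\in[n]} \in \{-1,+1\}^n =: \Omega_n$ be a spin configuration. Then, for a given graph $G_n = ([n], E_n)$, where $E_n$ denotes the edge set, the Ising model is defined by the following Boltzmann-Gibbs measure
\[ \mu^{\mathrm{qe}}_n(\sigma) = \frac{1}{Z_n(\beta,B)} \exp\Big( \beta \sum_{(i,j)\in E_n} \sigma_i\sigma_j + B \sum_{i\in[n]} \sigma_i \Big), \tag{1.2} \]
where
\[ Z_n(\beta,B) = \sum_{\sigma\in\Omega_n} \exp\Big( \beta \sum_{(i,j)\in E_n} \sigma_i\sigma_j + B \sum_{i\in[n]} \sigma_i \Big) \]
is the quenched partition function. Here β ≥ 0 is the inverse temperature and B ∈ R is the external field. When $G_n$ is a random graph, this is known as the random quenched Ising model [17].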
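To obtain the annealed model, we take expectations with respect to the random graph measure in both the numerator and denominator of (1.2), i.e., we define the annealed Ising measure by
\[ \mu^{\mathrm{an}}_n(\sigma) = \frac{E\Big[ \exp\Big( \beta \sum_{(i,j)\in E_n} \sigma_i\sigma_j + B \sum_{i\in[n]} \sigma_i \Big) \Big]}{Z^{\mathrm{an}}_n(\beta,B)}, \tag{1.3} \]
where the annealed partition function $Z^{\mathrm{an}}_n(\beta,B)$ is equal to
\[ Z^{\mathrm{an}}_n(\beta,B) = \sum_{\sigma\in\Omega_n} E\Big[ \exp\Big( \beta \sum_{(i,j)\in E_n} \sigma_i\sigma_j + B \sum_{i\in[n]} \sigma_i \Big) \Big]. \]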
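For very small n, the annealed measure can be computed by brute force, which may help to make the definition concrete. The sketch below assumes the edge probabilities $p_{ij} = w_i w_j/(\ell_n + w_i w_j)$ and uses the independence of the edges to write the graph average as a product of factors $1 - p_{ij} + p_{ij} e^{\beta\sigma_i\sigma_j}$; all function names and the example weights are hypothetical.

```python
import itertools
import math


def edge_probability(weights, i, j):
    """Assumed GRG edge probability p_ij = w_i w_j / (l_n + w_i w_j)."""
    total_weight = sum(weights)
    return weights[i] * weights[j] / (total_weight + weights[i] * weights[j])


def annealed_weight(sigma, weights, beta, B):
    """Graph average of exp(beta * sum_{edges} s_i s_j + B * sum_i s_i) for fixed sigma."""
    n = len(weights)
    value = math.exp(B * sum(sigma))
    for i in range(n):
        for j in range(i + 1, n):
            p = edge_probability(weights, i, j)
            # E[exp(beta * I_ij * s_i * s_j)] for a Bernoulli(p) indicator I_ij
            value *= 1.0 - p + p * math.exp(beta * sigma[i] * sigma[j])
    return value


def annealed_measure(weights, beta, B):
    """Return (Z_an, mu_an) by enumerating all 2^n spin configurations."""
    configs = list(itertools.product((-1, 1), repeat=len(weights)))
    unnormalised = [annealed_weight(s, weights, beta, B) for s in configs]
    Z = sum(unnormalised)
    return Z, {s: u / Z for s, u in zip(configs, unnormalised)}


weights = [1.0, 2.0, 0.5, 1.5]          # illustrative weights
Z_an, mu_an = annealed_measure(weights, beta=0.8, B=0.1)
```

At β = 0 the graph plays no role and the partition function reduces to $(2\cosh B)^n$, which gives a simple sanity check of the enumeration.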

Previous results for the annealed Ising model on the generalized random graph
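In this section, we describe some important results about the annealed Ising model that have been derived previously. An important quantity in the study of the annealed Ising model is the annealed pressure defined by
\[ \psi^{\mathrm{an}}_n(\beta,B) = \frac{1}{n} \log Z^{\mathrm{an}}_n(\beta,B). \]
The thermodynamic limit of this quantity, $\psi^{\mathrm{an}}(\beta,B) := \lim_{n\to\infty} \psi^{\mathrm{an}}_n(\beta,B)$, is determined in the following theorem:
Theorem 1.2 (Annealed pressure [18]). Suppose that Condition 1.1 holds. Then for all 0 ≤ β < ∞ and all B ∈ R,
\[ \psi^{\mathrm{an}}(\beta,B) = \log 2 + \alpha(\beta) + E\Big[ \log\cosh\Big( \sqrt{\tfrac{\sinh(\beta)}{E[W]}}\, W z^\star + B \Big) \Big] - \frac{(z^\star)^2}{2}, \tag{1.4} \]
where $\alpha(\beta) = \lim_{n\to\infty} \alpha_n(\beta)$, with $\alpha_n(\beta)$ defined in (1.19) below, is given by
\[ \alpha(\beta) = \tfrac{1}{2} (\cosh(\beta) - 1)\, E[W], \tag{1.5} \]
and $z^\star = z^\star(\beta,B)$ is, for B ≠ 0, given by the unique solution with the same sign as B of the fixed-point equation
\[ z = E\Big[ \tanh\Big( \sqrt{\tfrac{\sinh(\beta)}{E[W]}}\, W z + B \Big) \sqrt{\tfrac{\sinh(\beta)}{E[W]}}\, W \Big]. \tag{1.6} \]
This theorem is proved in [18,Thm 1.1]. In Section 2.2 we provide an alternative expression for the annealed pressure that is instrumental for our large deviation analysis.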
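As a numerical illustration of the fixed-point equation in Theorem 1.2, the following sketch solves it by monotone iteration for an empirical weight sequence. It assumes the equation has the form $z = E[\tanh(a W z + B)\, a W]$ with $a = \sqrt{\sinh(\beta)/E[W]}$, and it replaces the expectation over W by an average over a finite list of weights. The critical value $\sinh(\beta_c) = E[W]/E[W^2]$ used in `critical_beta` comes from linearizing this assumed equation at z = 0 and B = 0. The function names are hypothetical.

```python
import math


def fixed_point_z(weights, beta, B, tol=1e-12, max_iter=100000):
    """Iterate z -> mean(tanh(a*w*z + B) * a*w), with a = sqrt(sinh(beta)/mean(w)).

    The iteration starts from the a priori bound |z| <= a * mean(w), on the side
    selected by the sign of B (positive side for B = 0), and moves monotonically
    towards the outermost fixed point on that side.
    """
    n = len(weights)
    mean_w = sum(weights) / n          # assumed strictly positive
    a = math.sqrt(math.sinh(beta) / mean_w)
    z = a * mean_w if B >= 0 else -a * mean_w
    for _ in range(max_iter):
        z_new = sum(math.tanh(a * w * z + B) * a * w for w in weights) / n
        if abs(z_new - z) < tol:
            return z_new
        z = z_new
    return z


def critical_beta(weights):
    """Solve sinh(beta_c) = mean(w) / mean(w^2) (linearization at z = 0, B = 0)."""
    n = len(weights)
    mean_w = sum(weights) / n
    mean_w2 = sum(w * w for w in weights) / n
    return math.asinh(mean_w / mean_w2)


weights = [1.0] * 10                                  # homogeneous example
z_high_temp = fixed_point_z(weights, beta=0.5, B=0.0)  # below critical_beta
z_low_temp = fixed_point_z(weights, beta=1.5, B=0.0)   # above critical_beta
```

For homogeneous weights the sketch returns a value numerically indistinguishable from zero below `critical_beta(weights)` and a strictly positive value above it, in line with the second order phase transition described in the text.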
In [18,Thm 1.1] it is also proved that the annealed Ising model on the generalized random graph has a second order phase transition at a critical inverse temperature β an c given by , and suppose that (β, B) ∈ U . Then, for all ε > 0 there exists a constant L = L(ε) > 0 such that, for all n sufficiently large, being z ⋆ (β, B) the solution of (1.6), equals the annealed magnetization, that is lim n→∞ M an n (β, B). Furthermore, where χ an (β, B) = ∂ ∂B M an (β, B) is the annealed susceptibility and N (0, σ 2 ) denotes a centered normal random variable with variance σ 2 .
Analogously one can define the random quenched pressure: This has been determined for the GRG as well as other locally tree-like random graph models in [8,12], where it is also proven that ψ qe (β, B) is a non-random quantity. An SLLN and CLT for the total spin w.r.t. µ qe n have been obtained in [17]. In general, the quenched and annealed pressures are different, and also the critical temperatures of the models are different. The only exception that we are aware of is the random regular graph (see [4]). The critical temperature in the quenched setting will be denoted by β qe c .

Main results
In this paper, we study the spin sum in more detail (i.e., beyond the CLT scale) and prove a large deviation principle for $S_n$, as well as a weighted version that plays a crucial role in the annealed Ising model. Let us start by recalling what a large deviation principle is. Consider a sequence of random variables $(X_n)_{n\ge 1}$ taking values in the measurable space $(\mathcal{X}, \mathcal{B})$, with $\mathcal{X}$ a topological space and $\mathcal{B}$ a σ-field of subsets of $\mathcal{X}$. Then the large deviation principle is defined as follows:
Definition 1.4 (Large deviation principle (LDP) [10]). We say that $(X_n)_{n\ge 1}$ satisfies an LDP with rate function I(x) and speed n w.r.t. a probability measure $P_n$ if, for all $F \in \mathcal{B}$,
\[ -\inf_{x\in F^o} I(x) \le \liminf_{n\to\infty} \frac{1}{n} \log P_n(X_n \in F) \le \limsup_{n\to\infty} \frac{1}{n} \log P_n(X_n \in F) \le -\inf_{x\in \bar{F}} I(x), \]
where $F^o$ denotes the interior of F and $\bar{F}$ its closure.
In this definition $I : \mathcal{X} \to [0,\infty]$ is a lower semicontinuous function. Our first main result is an LDP for the total spin in the high-temperature regime for both the random quenched and the annealed Ising model:
Theorem 1.5 (Total spin LDPs in high-temperature regime). In the annealed Ising model, under Condition 1.1, the total spin $S_n$ satisfies an LDP w.r.t. $\mu^{\mathrm{an}}_n$ for $\beta \le \beta^{\mathrm{an}}_c$ and B ∈ R, with rate function
\[ I^{\mathrm{an}}(x) = \sup_{t\in\mathbb{R}} \big\{ x t - \psi^{\mathrm{an}}(\beta, B+t) + \psi^{\mathrm{an}}(\beta, B) \big\}. \tag{1.8} \]
In the random quenched Ising model, under Condition 1.1, the total spin $S_n$ also satisfies an LDP w.r.t. $\mu^{\mathrm{qe}}_n$ for $\beta \le \beta^{\mathrm{qe}}_c$ and B ∈ R, with rate function
\[ I^{\mathrm{qe}}(x) = \sup_{t\in\mathbb{R}} \big\{ x t - \psi^{\mathrm{qe}}(\beta, B+t) + \psi^{\mathrm{qe}}(\beta, B) \big\}. \]
The proof of Theorem 1.5 is highly general, and applies to settings where the pressure is known to exist and to be differentiable. As such, the proof is basically identical for the annealed and quenched Ising models on $\mathrm{GRG}_n(w)$.
For the annealed Ising model we also prove an LDP for all positive temperatures. For this, we also introduce the total weighted spin S (w) n = i∈ [n] w i σ i . Theorem 1.6 (Alternative form of the pressure and LDPs for the annealed Ising model). For all β ≥ 0 and B ∈ R, under Condition 1.1, the annealed pressure is given by and the couple (S n , S (w) n ) satisfies an LDP w.r.t. µ an n with rate function (1.10) Furthermore the annealed pressure has the alternative expression and also with the alternative expression of the rate function given by Naturally, in the high-temperature setting, the large deviation rate functions in (1.8) and (1.10) (or (1.12)) coincide after the application of a contraction principle. Combining Theorem 1.2 and Theorem 1.6 we see that the annealed pressure is either given by the optimization of a real function (as in (1.4)) or it can be expressed as the solution of a two-dimensional variational problem (as in (1.9) or (1.11)). In Section 2.2 we shall prove Theorem 1.2 starting from Theorem 1.6, thus obtaining that the expressions for the annealed pressure do coincide.
We next discuss the LDP for the total number of edges in the annealed Ising model on GRG n (w): Theorem 1.7 (LDPs for the edges in the annealed Ising model). Suppose that Condition 1.1 holds. For all β ≥ 0 and B ∈ R, the total number of edges |E n | satisfies an LDP w.r.t. µ an n with rate function that is the Legendre transform of the function which is explicitly computed in (3.18) below. Further, the number of edges under the annealed Ising model on GRG n (w) satisfies We continue by investigating the limiting distribution of the degrees of vertices. Our main result is as follows: cosh z ⋆ (β, B)e t w j sinh(β) (1.14) Consequently, the degree D U of a uniformly chosen vertex satisfies (1.15) In the above, z ⋆ (β, B) is the solution to (1.6).
We remark that in (1.15) we both take the average w.r.t. the annealed measure µ an n as well as with the uniform vertex U ∈ [n]. E µ an n e tDv → E e cosh(β)W (e t −1) (1. 16) In (1.14), we see that the moment generating function of a vertex having weight w is close to We recognize e cosh(β)w(e t −1) as the moment generating function of a Poisson random variable with mean cosh(β)w, which is multiplied by another function. However, this factor does not turn out to be a moment generating function.
By setting a(β) = sinh(β) E[W ] for the sake of notation, we can rewrite the product of the second and third factors in the r.h.s. of (1.14) as e (w j +B)a(β)z ⋆ e w j (cosh(β)+a(β)z ⋆ )(e t −1) + e −(w j +B)a(β)z ⋆ e w j (cosh(β)−a(β)z ⋆ )(e t −1) This shows that the limiting moment generating function of D j is a mixed Poisson random variables with parameters w j (cosh(β) + Y a(β)z ⋆ ), where provided w j (cosh(β) ± a(β)z ⋆ ) are both positive. We lack a more detailed interpretation of the above two realizations.
Let us next relate Theorem 1.8 to Theorem 1.7. We can use (1.15) to show that, as in (1.13), Indeed, note that .
Here, in the middle formula, we again take the average w.r.t. both µ an n as well as the uniform vertex U ∈ [n]. Convergence of the moment-generating function implies convergence of all moments, so that as required, where we have made use of (1.6) in the last step. Thus, for (1.13), it suffices to prove that 1 n |E n | is concentrated.
In the next theorem, we extend Theorem 1.8 to several vertices:  (1)).
Theorem 1.10 implies that the degrees of different vertices under the annealed measure are approximately independent.

Discussion
In this section, we discuss our results and state some further conjectures.
Random-quenched LDP. For the random-quenched model we only obtain an LDP in the high-temperature regime. The difficulty in this analysis is that the rate function is non-convex at low temperature. This means that the usual technique relying on the Gärtner-Ellis theorem, by taking the Legendre transform of the cumulant generating function, does not work. The cumulant generating function can easily be expressed in terms of the difference of the pressure for different values of the external field B. However, its Legendre transform is always convex, so it can only yield the convex envelope of the rate function. This raises the question of how to obtain the LDP for all inverse temperatures β.
Averaged-quenched LDP. The averaged quenched measure is defined as E µ qe n (σ) (recall (1.2)). Here, even in the high-temperature regime, we are in trouble since the averaged quenched cumulant generating function is not a difference of pressures. Independently of the explicit computation, an interesting question is whether it is possible to relate the random-quenched and the averaged-quenched large deviation rate functions.
Large deviations of random graph quantities. As already mentioned in the introduction, if one is interested only in graph quantities, then the effect of the annealing amounts to changing the graph law from P (the law of $\mathrm{GRG}_n(w)$) to a new law $P_{\beta,B}$ depending on the two parameters β and B. Evidently $\lim_{\beta\to 0, B\to 0} P_{\beta,B} = P$. We know that under the law P a uniform degree has an asymptotic mixed Poisson distribution with mixing distribution W. From formula (1.15) we see that in zero external field B = 0, the moment generating function of a uniform degree changes in two ways: firstly, in the high-temperature regime, the mixing distribution changes to W cosh(β) (since $z^\star(\beta,0) = 0$ there); secondly, in the low-temperature region a new effect appears due to the non-zero value of $z^\star(\beta,0)$. It would be of interest to invert the moment generating function (1.15) and thus explicitly characterize the distribution of a uniform degree at low temperatures. This can be done once we know that the parameters of the mixed Poisson distribution are non-negative (see Remark 1.9), but we do not know this to be true in general. Also, as of yet, we have no interpretation for this novel mixed Poisson distribution for the degrees. It might also be interesting to investigate other properties of the random graph under the annealed Ising model. An example would be the distribution of triangles, for which the positive dependence of edges enforced by the annealed Ising model might have a pronounced effect. A further interesting problem is to identify the large deviation rate function in a joint LDP for both the spin and the total number of edges.
Organisation of this paper. We start in Section 1.4 by describing an enlightening computation that is at the heart of our analysis. In Section 2, we derive the LDP for the total spin and the total weighted spin. In Section 3, we investigate the large deviation properties, as well as the weak convergence, of the number of edges in the annealed Ising model, thus quantifying the statement that under the annealed Ising model, there are more edges in the graph than for the typical graph. In Section 4, we investigate the degree distribution under the annealed Ising model. Finally in the Appendix we re-derive the LDP for the total spin by combinatorial arguments.

Preliminaries: an enlightening computation
Our large deviations results are obtained from exact expressions for moment generating functions of spin or of edge variables under the annealed GRG n (w) measure. Such exact expressions follow from the observation (already contained in [18, Sec. 2.1]) that the annealed GRG n (w) measure can be identified as an inhomogeneous Ising model on the complete graph, which is called the rank-1 inhomogeneous Curie-Weiss model in [18]. In this paper, we will extend such computations significantly, for example by also including the edge statuses. We can write the numerator in the definition (1.3) of µ an n as where we have used the independence of the edges in the second equality. Define Then, we can write Hence, also using the symmetry β ij = β ji , We observe that the quantity e σ i can be regarded as the Hamiltonian of an inhomogeneous Curie-Weiss model with couplings given by (β ij ) ij . Thus, the annealed Ising model on the GRG n (w) is equivalent to such inhomogeneous model, see [18,11]. Moreover, since β ij is close to factorizing into a contribution due to i and to j, one can prove [18,11] that: This computation shows that, in the large n-limit, the annealed measure µ an n at inverse temperature β is close to the Boltzmann-Gibbs measure µ ICW n of the rank-1 inhomogeneous Curie-Weiss model at inverse temperatureβ = sinh(β) with Hamiltonian and normalizing partition function (1.23) The above analysis can be simply extended to moment generating functions involving (some of) the edge variables (I ij ) 1≤i<j≤n , as these can be incorporated into the exponential term and the expectation w.r.t. them can then again be taken. Of course, in such settings, the connection to the rank-1 inhomogeneous Curie-Weiss model is changed as well, and a large part of our paper deals precisely with the description of such changes, as well as their effects.
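The single-edge step behind this computation can be written out as a short worked derivation. This is a sketch: the symbol $C_{ij}$ for the spin-independent prefactor is borrowed from Section 3, and the last line uses only the bound $p_{ij} \le w_i w_j/\ell_n$ together with a first-order Taylor expansion of the logarithm.

```latex
% Average over a single edge indicator I_{ij} ~ Bernoulli(p_{ij}), for s = \sigma_i\sigma_j \in \{-1,+1\}:
\[
  E\big[ e^{\beta I_{ij} s} \big] = 1 - p_{ij} + p_{ij} e^{\beta s} = C_{ij}\, e^{\beta_{ij} s}.
\]
% Evaluating at s = +1 and s = -1 gives two equations for the two unknowns:
\[
  C_{ij}\, e^{\beta_{ij}} = 1 + p_{ij}(e^{\beta} - 1),
  \qquad
  C_{ij}\, e^{-\beta_{ij}} = 1 + p_{ij}(e^{-\beta} - 1),
\]
% so that
\[
  \beta_{ij} = \frac{1}{2} \log \frac{1 + p_{ij}(e^{\beta} - 1)}{1 + p_{ij}(e^{-\beta} - 1)},
  \qquad
  C_{ij} = \sqrt{\big(1 + p_{ij}(e^{\beta} - 1)\big)\big(1 + p_{ij}(e^{-\beta} - 1)\big)}.
\]
% Expanding the logarithms for small p_{ij}:
\[
  \beta_{ij} = p_{ij} \sinh(\beta) + O(p_{ij}^2),
  \qquad
  \log C_{ij} = p_{ij} (\cosh(\beta) - 1) + O(p_{ij}^2).
\]
```

Since $p_{ij}$ is close to $w_i w_j/\ell_n$, the leading term of $\beta_{ij}$ factorizes over i and j, which is the origin of the rank-1 Curie-Weiss structure with effective inverse temperature sinh(β).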

LDP in the high-temperature regime
We first prove the LDP in the high-temperature regime for the annealed Ising model using the Gärtner-Ellis theorem.
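Proof of Theorem 1.5. To apply the Gärtner-Ellis theorem we need the thermodynamic limit of the cumulant generating function of $S_n$ w.r.t. $\mu^{\mathrm{an}}_n$, given by
\[ c_n(t) = \frac{1}{n} \log E_{\mu^{\mathrm{an}}_n}\big[ e^{t S_n} \big]. \]
Observe that
\[ E_{\mu^{\mathrm{an}}_n}\big[ e^{t S_n} \big] = \frac{Z^{\mathrm{an}}_n(\beta, B+t)}{Z^{\mathrm{an}}_n(\beta, B)}. \]
Hence,
\[ c(t) := \lim_{n\to\infty} c_n(t) = \psi^{\mathrm{an}}(\beta, B+t) - \psi^{\mathrm{an}}(\beta, B), \]
where the existence of the limit follows from Theorem 1.2. We know that, for B ≠ 0,
\[ \frac{\partial}{\partial B} \psi^{\mathrm{an}}(\beta, B) = M^{\mathrm{an}}(\beta, B), \]
and, for $\beta \le \beta^{\mathrm{an}}_c$, the magnetization $M^{\mathrm{an}}(\beta,B)$ is continuous at B = 0, so that t ↦ c(t) is differentiable for all t ∈ R. Hence, it follows from the Gärtner-Ellis theorem [10, Thm. 2.3.6] that $S_n$ satisfies an LDP with rate function given by the Legendre transform of c(t), which is given by (1.8).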
The proof for the random quenched Ising model is analogous.
Let us now elaborate on the interpretation of the above results. The stationarity condition for (1.8) is
\[ x = \frac{\partial}{\partial t} \psi^{\mathrm{an}}(\beta, B+t) = M^{\mathrm{an}}(\beta, B+t), \tag{2.1} \]
which defines a function $\check{t} = \check{t}(x; \beta, B)$ such that
\[ I^{\mathrm{an}}(x) = x \check{t} - \psi^{\mathrm{an}}(\beta, B+\check{t}) + \psi^{\mathrm{an}}(\beta, B). \]
Given (β, B), the total spin per particle will concentrate around its typical value $M^{\mathrm{an}}(\beta, B)$, coinciding with the magnetization. To observe the atypical value x, the field must be changed from B to B + t, where t is determined by requiring that x is the magnetization $M^{\mathrm{an}}(\beta, B+t)$. Note that we have not made use of any specifics about the graph sequence, or whether we are in the annealed or quenched setting. Hence, the above holds for Ising models on any graph sequence, as long as the appropriate thermodynamic limit of the pressure exists.
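The Legendre-transform recipe can be checked numerically in the simplest case β = 0, where the spins are independent and the pressure is exactly log(2 cosh B), so that c(t) = log cosh(B + t) − log cosh(B). The sketch below is only an illustration under this assumption: it compares a grid-search Legendre transform with the closed form obtained from the stationarity condition x = tanh(B + t). The function names, the grid bounds and the step count are hypothetical choices.

```python
import math


def cgf_independent_spins(t, B):
    """c(t) = psi(0, B + t) - psi(0, B) with psi(0, B) = log(2 cosh B)."""
    return math.log(math.cosh(B + t)) - math.log(math.cosh(B))


def legendre_transform(cgf, x, t_min=-10.0, t_max=10.0, steps=20000):
    """Approximate sup_t [x * t - cgf(t)] by a grid search over [t_min, t_max]."""
    best = -math.inf
    for k in range(steps + 1):
        t = t_min + k * (t_max - t_min) / steps
        best = max(best, x * t - cgf(t))
    return best


def rate_function_closed_form(x, B):
    """Closed form from the stationarity condition x = tanh(B + t), for |x| < 1."""
    return x * math.atanh(x) + 0.5 * math.log(1.0 - x * x) - B * x + math.log(math.cosh(B))


B = 0.3
numeric = legendre_transform(lambda t: cgf_independent_spins(t, B), 0.5)
exact = rate_function_closed_form(0.5, B)
```

The rate function vanishes at x = tanh(B), the magnetization at β = 0, and the optimal tilt is t = atanh(x) − B, i.e., the shift of the field needed to make x typical.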
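For $\beta > \beta^{\mathrm{an}}_c$, the spontaneous magnetization $m_+ := \lim_{B\searrow 0} M^{\mathrm{an}}(\beta, B)$ is strictly positive, so that the left and right derivatives of $B \mapsto \psi^{\mathrm{an}}(\beta, B)$ at B = 0 differ, and hence c(t) is not differentiable at t = −B and the Gärtner-Ellis theorem can no longer be applied.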
Since the spontaneous magnetization is not zero, it is not possible to find a t such that (2.1) holds for −m + < x < m + . Therefore, the Legendre transform (1.8) has a flat piece. By the Gärtner-Ellis theorem, this Legendre transform still gives a lower bound on the rate function, but it is only an upper bound for so-called exposed points of the Legendre transform, i.e., for x outside this flat piece. In fact, we show that the Legendre transform in general does not give the correct rate function, since the Legendre transform of the pressure is convex and we show that the rate function in the low temperature regime in general is not.

LDPs for the total spin and weighted spin
In this section we prove Theorem 1.6 and then we deduce from it a new proof of Theorem 1.2 (thus by a method different from that of [18]). Following Ellis' approach [15], we can compute the annealed pressure ψ an (β, B) and the large deviation function of Y n (σ) : the annealed measure µ an n , starting from the LDP of (m n , m (w) n ) w.r.t. the product measure The large deviations of Y n = (m n , m (w) n ) w.r.t. P n can easily be obtained by applying the Gärtner-Ellis theorem.
Proof of Theorem 1.6. Let t = (t 1 , t 2 ) and compute where E Pn denotes average w.r.t. P n . Thus, the cumulant generating function of the vector here E represents the average w.r.t. the uniformly chosen vertex W n . Since | log cosh(t 1 + W n t 2 )| ≤ |t 1 + W n t 2 | ≤ |t 1 | + W n |t 2 | it follows from Condition 1.1(b) and the dominated convergence theorem that with W limiting weight of the graph. By the Gärtner-Ellis theorem, we conclude that Y n has a large deviation principle with rate function We have are given by the stationarity condition For any function f : Ω n → R we can write σ∈Ωn f (σ) = 2 n Ωn f (σ)dP n (σ).
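The product-measure computation at the start of this proof can be verified by direct enumeration for a small system. The sketch below assumes that $P_n$ is the uniform product measure on $\{-1,+1\}^n$, so that the cumulant generating function of $(m_n, m^{(w)}_n)$ at speed n equals the empirical average of $\log\cosh(t_1 + w_i t_2)$; the function names and example weights are hypothetical.

```python
import itertools
import math


def cgf_brute_force(weights, t1, t2):
    """(1/n) log E_{P_n}[exp(t1 * S_n + t2 * S_n^(w))] under uniform independent spins."""
    n = len(weights)
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        spin_sum = sum(sigma)
        weighted_sum = sum(w * s for w, s in zip(weights, sigma))
        total += math.exp(t1 * spin_sum + t2 * weighted_sum)
    return math.log(total / 2 ** n) / n


def cgf_closed_form(weights, t1, t2):
    """E[log cosh(t1 + W_n * t2)], with W_n the weight of a uniformly chosen vertex."""
    return sum(math.log(math.cosh(t1 + w * t2)) for w in weights) / len(weights)


weights = [0.5, 1.0, 2.0, 1.5, 0.7]     # illustrative weights
lhs = cgf_brute_force(weights, t1=0.4, t2=-0.3)
rhs = cgf_closed_form(weights, t1=0.4, t2=-0.3)
```

The two values agree up to rounding, and the bound $|\log\cosh(x)| \le |x|$ used for dominated convergence is easy to confirm on a few sample points.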
Hence, also using (1.20), and, similarly, which is equivalent to (1.9), and the rate function of (m n , m (w) n ) w.r.t. the annealed measure is [16, Thm. II.7.2] This shows that indeed (S n , S (w) n ) satisfies an LDP w.r.t. µ an n with rate function given by (1.10). By applying the contraction principle, we obtain the rate functions I an β,B of m n and J an β,B of m (w) n as I an β,B (x 1 ) = inf In a similar way, we can also immediately obtain an LDP by incorporating the magnetic field in the a priori measure on the spins. For this, define where E P (B) n denotes average w.r.t. P (B) n . Hence, the cumulant generating function is given by (with E the average w.r.t. the uniformly chosen vertex W n ) which, as in the previous case, converges to We can apply the Gärtner-Ellis theorem to obtain that (m n , m (w) n ) satisfies an LDP w.r.t. P (B) n with rate function The stationarity conditions are given by Note that Hence, where Z an n (β, B) = (2 cosh B) n e nαn Ωn e n 2 sinh(β) As above, it immediately follows that (m n , m (w) n ) satisfies an LDP w.r.t. the annealed measure with rate function where the pressure is given by This proves that also (1.12) is a rate function for the LDP of (S n , S (w) n ).
where it should be noted that, by the contraction principle, inf x 1 I (B) (x 1 , x 2 ) is equal to the rate function I (w) for the LDP of m (w) n w.r.t. P (B) n . Setting t 1 = 0 in the above computations, this can be proved to be so that The supremum in (2.10) is attained for t satisfying Since f (t) is strictly increasing, its inverse f −1 is well defined. Hence, and d dx Hence, the supremum in (2.11) for x, or equivalently, For any solution x ⋆ of (2.12), For B > 0, f (t) is an increasing, bounded and concave function for t ≥ 0 with f (0) > 0, and hence there is a unique positive solution x + to (2.12). For any negative solution to (2.12), x − say, since x + is the unique positive local maximum. An analogous argument holds for B < 0. Hence, where x ⋆ is the unique solution to (2.12) with the same sign as B. The value for B = 0 follows from Lipschitz continuity. This is equivalent to the formulation in (1.4) by making a change of variables 3 LDP for the number of edges: proof of Theorem 1.7 So far we have considered large deviations of the total spin. We now consider observables that depend only on the graph and investigate their large deviation properties w.r.t. the annealed Ising measure. Such an analysis sheds light on what graph structures optimize the Ising Hamiltonian.

Strategy of the proof
In this section, we investigate the large deviation properties for the number of edges |E n | = i<j I ij under the annealed Ising model on the generalized random graph, where we recall that (I ij ) 1≤i<j≤n denote the independent Bernoulli indicators of the event that the edge ij is present in the graph, which occurs with probability p ij in (1.1). We aim to apply the Gärtner-Ellis theorem, for which we need to compute the generating function of |E n | w.r.t. the annealed measure µ an n given by For later purposes, we will generalize the above computation and, introducing the variables t ij , instead compute the generating function of the Bernoulli indicators (I ij ) ij defined for t = (t ij ) ij ∈ R n(n−1)/2 R β,B,n (t) := E µ an n e 1≤i<j≤n t ij I ij = This can be carried out in a similar way as in [18]. Let us focus on the numerator in the previous display, which we denote by A n (t, β, B), so that We have We rewrite where β ij (t ij ) and C ij (t ij ) are chosen such that From the above system, we get By symmetry β ij (t ij ) = β ji (t ij ). Furthermore, defining t ji = t ij for 1 ≤ i < j ≤ n and The equations (3.3) and (3.6) give us an explicit formula for the moment generating function of the edge variables (I ij ) ij in the annealed GRG n (w) that will prove useful throughout the remainder of this paper.
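The tilted single-edge identity behind this strategy can be checked numerically. The sketch assumes that $\beta_{ij}(t)$ and $C_{ij}(t)$ are defined by requiring $E[e^{(t + \beta s) I_{ij}}] = C_{ij}(t)\, e^{\beta_{ij}(t) s}$ for both values s = ±1 of the spin product, which yields the expression for $\beta_{ij}(t)$ quoted in the next subsection; the function names and parameter values are hypothetical.

```python
import math


def tilted_coupling(p, beta, t):
    """Return (beta_ij(t), C_ij(t)) for an edge present with probability p."""
    plus = 1.0 + p * (math.exp(t + beta) - 1.0)    # edge average for s = +1
    minus = 1.0 + p * (math.exp(t - beta) - 1.0)   # edge average for s = -1
    beta_t = 0.5 * math.log(plus) - 0.5 * math.log(minus)
    c_t = math.sqrt(plus * minus)
    return beta_t, c_t


def edge_average(p, beta, t, spin_product):
    """E[exp((t + beta * s) * I)] for I ~ Bernoulli(p) and s = spin_product."""
    return 1.0 - p + p * math.exp(t + beta * spin_product)


p, beta, t = 0.2, 0.7, 0.3                          # illustrative values
beta_t, c_t = tilted_coupling(p, beta, t)

# For small p the coupling is close to p * e^t * sinh(beta).
small_p = 1e-4
beta_small, _ = tilted_coupling(small_p, beta, t)
leading_order = small_p * math.exp(t) * math.sinh(beta)
```

The leading-order behaviour $\beta_{ij}(t) \approx e^t \sinh(\beta)\, p_{ij}$ is what turns the tilted model into a rank-1 Curie-Weiss model at inverse temperature $e^t \sinh(\beta)$.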

Moment generating function for the number of edges
Since the moment generating function for the number of edges in (3.1) can be obtained from R β,B,n (t) in (3.2) by choosing t ij = t for all 1 ≤ i < j ≤ n, we continue by studying the asymptotics of A n (t, β, B) for such case, which we denote as A n (t, β, B). By a Taylor expansion of x → log(1 + x), β ij (t) = 1 2 log 1 + p ij (e t+β − 1) − 1 2 log 1 + p ij (e t−β − 1) For any fixed t, the term O( i,j∈[n] p 2 ij (e t±β − 1) 2 ) can be controlled by using p ij ≤ w i w j ℓn and Condition 1.1(c), which implies that and then, We can proceed further and write where we have also used that, under Condition 1.1(c), Recalling the definition of the partition function of the Inhomogeneous Curie-Weiss model we can thus rewrite A n (t, β, B) = G n (t, β)e o(n) Z ICW n (e t sinh(β), B) , while the denominator in (3.1) equals Therefore, the annealed cumulant generating function of the number of the edges is In order to apply the Gärtner-Ellis theorem, we need to compute the limit of ϕ β,B,n (t). We can deal with the first and second term in the r.h.s. of (3.9) by using the results obtained in [18], in which the limit pressure of the Inhomogeneous Curie-Weiss model has been computed. Indeed, from [18] ψ ICW (sinh(β), B) := lim = log 2 + E log cosh e t sinh(β) with z ⋆ (t, β, B) be the unique fixed point with the same sign as B of the equation Next, we have to deal with the third term in (3.9) which, recalling (3.7) and (3.4), we write explicitly as (3.13) We start by computing the first term in the r.h.s. of the previous display, then we show that the remaining terms give a vanishing contribution in the limit. We start by recalling that, on the basis of the Weight Regularity Condition 1.1(a) and (c), ℓ n = n(E[W ] + o(1)) = O(n) and 1≤i<j≤n p 2 ij = O(n −1 ). Thus, we write the first term in (3.13) as where the Taylor expansions of 1/(1 + x) and log(1 + x) have been used. 
Therefore, Then, by Weight Regularity Condition 1.1(c) and p ij ≤ w i w j /ℓ n , where the definition of β ii (t) in (3.5) has been used. Combining (3.13) with the estimates in (3.14), (3.15), (3.16) leads to (3.17) Considering the limit n → ∞ in (3.9) and using (3.17), (3.11) and (3.10) , finally gives us

Conclusion of the proof
With (3.18) in hand, we are finally ready to prove Theorem 1.7. Equation (3.18) identifies the infinitevolume limit of the cumulant generating function of the number of edges. By the Gärtner-Ellis theorem, this also identifies the rate function as its Legendre transform, provided that t → ϕ β,B (t) is differentiable. We compute the derivative of t → ϕ β,B (t) in (3.18) explicitly as Since z ⋆ (t, β, B) is the fixed point for the ICW withβ = e t sinh(β), which is an analytic function of t, it holds that z ⋆ (t, β, B) is analytic in t for B = 0 and hence d dt z ⋆ (t, β, B) exists. By (3.12), the first expectation equals z ⋆ (t, β, B), so that the two terms containing the factors For B = 0, d dt z ⋆ (t, β, B) might not exist in the critical point e t sinh(β) =β c . However, since the specific heat is finite, both the left and right derivative exist. Therefore, the above argument can be repeated for the left and right derivative, which both give the r.h.s. of (3.20), so that this equation is also true for This shows that t → ϕ β,B (t) is differentiable and it concludes the proof of the main statement in Theorem 1.7 about the large deviations function for the number of edges in the annealed GRG n (w). Formula (1.13) for the expected number of edges is immediately obtained by evaluating (3.20) in t = 0.
Finally, we note that by the LDP derived in the previous section, and the fact that the limiting rate function is strictly convex (this can be seen by noting that both terms on the r.h.s. of (3.20) are strictly increasing) the rate function has a unique minimum, which immediately shows that |E n |/n is concentrated around its mean, which has already been derived in (1.17) as well as in (3.20).  because z ⋆ (0, 0) = 0, which can also be seen by direct computation.

Degree distribution under annealed measure: proof of Theorem 1.8
Denoting by $D = (D_i)_{i\in[n]}$ the degree sequence of $\mathrm{GRG}_n(w)$, we want to compute its moment generating function with respect to the annealed measure $\mu^{\mathrm{an}}_n$, i.e.,
\[ g_{\beta,B,n}(s) = E_{\mu^{\mathrm{an}}_n}\Big[ e^{\sum_{i\in[n]} s_i D_i} \Big], \qquad s = (s_1, s_2, \ldots, s_n) \in \mathbb{R}^n. \]
Since $D_i = \sum_{j\ne i} I_{ij}$, where $(I_{ij})_{1\le i<j\le n}$ are the independent Bernoulli variables with parameters $p_{ij}$ representing the indicator that the edge ij exists and $I_{ji} = I_{ij}$, we can write
\[ g_{\beta,B,n}(s) = E_{\mu^{\mathrm{an}}_n}\Big[ e^{\sum_{1\le i<j\le n} t_{ij}(s) I_{ij}} \Big] = R_{\beta,B,n}(t(s)), \tag{4.1} \]
where we define $t_{ij}(s) := s_i + s_j$ for 1 ≤ i < j ≤ n. Furthermore, by (3.3),
\[ g_{\beta,B,n}(s) = \frac{A_n(t(s), \beta, B)}{Z^{\mathrm{an}}_n(\beta, B)}, \tag{4.2} \]
where we recall that $A_n(t, \beta, B)$ was defined in (3.6). This is the starting point of our analysis. In Section 4.1 we simplify the expression for the moment generating function of the degrees by using the mapping of the annealed Ising measure to the rank-1 inhomogeneous Curie-Weiss model. We then investigate the degree of a fixed vertex under the annealed Ising model in Section 4.2 and we consider finitely many degrees in Section 4.3.

Moment generating function of the degrees
We start by rewriting the generating function of the degrees $g_{\beta,B,n}(\boldsymbol{s})$. To this aim, due to (4.2), we need to rewrite $A_n(t(\boldsymbol{s}),\beta,B)$. This can be done using again the Hubbard-Stratonovich identity. Introducing the standard Gaussian variable $Z$, we will extend the arguments in [18] to show that $\log\cosh\big(a_n(\beta)e^{s_i}w_iZ + B\big)$ where $a_n(\beta) = \sqrt{\sinh(\beta)/\ell_n}$, $\kappa(t)$ is some appropriate constant and $\mathbb{E}_Z$ denotes the expectation w.r.t. the Gaussian variable $Z$. This boils down to proving convergence of the moment generating function, which requires sharp asymptotics for $A_n(t(\boldsymbol{s}),\beta,B)$, while in [18] it sufficed to study the logarithmic asymptotics.
To see (4.3), we define the $\boldsymbol{s}$-dependent rank-1 inhomogeneous Curie-Weiss model measure as $\sum_{i,j}\sinh(\beta)e^{s_i}e^{s_j}w_iw_j$ with $Z^{\mathrm{ICW}}_{n,\boldsymbol{s}}(\sinh(\beta),B)$ the appropriate partition function. Then, using (3.6), we can follow [11, (4.64)] to obtain that $A_n(t(\boldsymbol{s}),\beta,B) = G_n(t(\boldsymbol{s}),\beta)\,Z^{\mathrm{ICW}}_{n,\boldsymbol{s}}(\sinh(\beta),B)\,\mathbb{E}_{\mu^{\mathrm{ICW}}_{n,\boldsymbol{s}}}\big[e^{F_n(\boldsymbol{s})}\big]$, (4.4) where now $F_n(\boldsymbol{s}) = \frac{1}{2}\sum_{i,j}\Big[\beta_{ij}(s_i+s_j) - e^{s_i+s_j}\sinh(\beta)\frac{w_iw_j}{\ell_n}\Big]\sigma_i\sigma_j$, and we have adapted notation from $E_n$ in [11, (4.64)] to $F_n$ here to avoid confusion with the total number of edges. To further simplify (4.4), we observe that, following the proof of [11, Lemma 4.1], one has $\log\cosh\big(a_n(\beta)e^{s_i}w_iZ + B\big)$.
Further, under Condition 1.1(a)-(c), we can follow the proof of [11, Lemma 4.7] to identify the limit of $\mathbb{E}_{\mu^{\mathrm{ICW}}_{n,\boldsymbol{s}}}\big[e^{F_n(\boldsymbol{s})}\big]$, as formulated in the next lemma: In particular, $\kappa(\boldsymbol{s}) = \kappa(\boldsymbol{0})$ when $\boldsymbol{s} = (s_1,\dots,s_n)$ only contains finitely many non-zero coordinates.
Proof of Lemma 4.1. We follow the proof of [11, Lemma 4.7] to obtain that Due to the negativity of this term, Lemma 4.1 follows when we prove that, for some $\bar\kappa(\boldsymbol{s})$, and then Lemma 4.1 follows with $\kappa(\boldsymbol{s}) = \frac{1}{2}(\bar\kappa(\boldsymbol{s}))^2\sinh(\beta)\cosh(\beta)$. We proceed to prove (4.5), which, in turn, is equivalent to proving that, as $n\to\infty$, $\mathbb{E}_{\mu^{\mathrm{ICW}}_{n,\boldsymbol{s}}}\big[e^{r\sum_{i\in[n]}e^{s_i}\sigma_i w_i^2/\ell_n}\big] \to e^{r\bar\kappa(\boldsymbol{s})}$.
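Following [18, (4.71)], we start by applying again the Hubbard-Stratonovich identity, which gives $\mathbb{E}_{\mu^{\mathrm{ICW}}_{n,\boldsymbol{s}}}\big[e^{r\sum_{i\in[n]}e^{s_i}\sigma_i w_i^2/\ell_n}\big]$ The sum over the spins can now be performed, yielding By introducing the random variables $W_n(\boldsymbol{s}) = w_Ue^{s_U}$, where $U\in[n]$ is a uniform vertex, the previous expression can be rewritten as $\mathbb{E}_{\mu^{\mathrm{ICW}}_{n,\boldsymbol{s}}}\big[e^{r\sum_{i\in[n]}e^{s_i}\sigma_i w_i^2/\ell_n}\big] = \int_{\mathbb{R}}\exp\Big\{-z^2/2 + n\,\mathbb{E}\Big[\log\cosh\Big(\frac{r}{\ell_n}W_n^2(\boldsymbol{s}/2) + \sqrt{\frac{\sinh(\beta)}{\ell_n}}\,W_n(\boldsymbol{s})z + B\Big)\Big]\Big\}\,dz$ We do a change of variables replacing $z/\sqrt{n}$ by $z$, so that $\mathbb{E}_{\mu^{\mathrm{ICW}}_{n,\boldsymbol{s}}}\big[e^{r\sum_{i\in[n]}e^{s_i}\sigma_i w_i^2/\ell_n}\big]$ Assuming that $W_n(\boldsymbol{s}) \xrightarrow{\mathcal{D}} W(\boldsymbol{s})$ for some limiting distribution, as well as $\mathbb{E}[W_n(\boldsymbol{s})^2] \to \mathbb{E}[W(\boldsymbol{s})^2]$ (which in fact is a condition on $\boldsymbol{s}$), an application of the Laplace method yields $\mathbb{E}_{\mu^{\mathrm{ICW}}_{n,\boldsymbol{s}}}\big[e^{r\sum_{i\in[n]}e^{s_i}\sigma_i w_i^2/\ell_n}\big]$ where $z^\star(\boldsymbol{s},\beta,B)$ is the solution with the same sign as $B$ of All in all, the previous computation shows that (4.5) holds with .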
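When $\boldsymbol{s}$ only has a finite number of non-zero coordinates, it holds that $W_n(\boldsymbol{s}) \xrightarrow{\mathcal{D}} W(\boldsymbol{0})$, so that $\bar\kappa(\boldsymbol{s}) = \bar\kappa(\boldsymbol{0})$, as required.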
Armed with (4.3), we recall (4.2) and thus conclude that the moment generating function of the degrees is given by with $a_n(\beta) = \sqrt{\sinh(\beta)/\ell_n}$.
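The Hubbard-Stratonovich step used throughout this section rests on the Gaussian identity $e^{x^2/2} = \mathbb{E}_Z[e^{xZ}]$, which turns the sum over spins of a quadratic interaction into a Gaussian average of a product of hyperbolic cosines. The following Python sketch checks this numerically on a small system. It assumes the interaction is written as $\exp\{\frac12(\sum_i a_i\sigma_i)^2 + B\sum_i\sigma_i\}$ with $a_i = a_n(\beta)e^{s_i}w_i$; the function names `spin_sum`, `gaussian_average` and `hs_fields`, the numerical values, and the trapezoidal quadrature are illustrative choices and not part of the paper.

```python
import itertools
import math


def spin_sum(fields, B):
    """Brute-force sum over sigma in {-1,+1}^n of exp(0.5*(sum_i a_i sigma_i)^2 + B*sum_i sigma_i)."""
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=len(fields)):
        x = sum(a * s for a, s in zip(fields, sigma))
        total += math.exp(0.5 * x * x + B * sum(sigma))
    return total


def gaussian_average(fields, B, half_width=12.0, steps=24000):
    """E_Z[prod_i 2 cosh(a_i Z + B)] for a standard Gaussian Z, by the trapezoidal rule."""
    dz = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        z = -half_width + k * dz
        weight = 0.5 if k in (0, steps) else 1.0
        density = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
        product = 1.0
        for a in fields:
            product *= 2.0 * math.cosh(a * z + B)
        total += weight * density * product * dz
    return total


def hs_fields(weights, s, beta):
    """Coefficients a_i = a_n(beta) * exp(s_i) * w_i with a_n(beta) = sqrt(sinh(beta) / l_n)."""
    a_n = math.sqrt(math.sinh(beta) / sum(weights))
    return [a_n * math.exp(si) * wi for si, wi in zip(s, weights)]


# Hypothetical small example: four vertices, tilt only on the first one.
weights = [1.0, 2.0, 0.5, 1.5]
s = [0.3, 0.0, 0.0, 0.0]
fields = hs_fields(weights, s, beta=0.7)
print(spin_sum(fields, B=0.2), gaussian_average(fields, B=0.2))
```

The two printed numbers should agree up to quadrature error, which illustrates why, after the Gaussian variable is introduced, the sum over spins factorizes into the terms $\cosh(a_n(\beta)e^{s_i}w_iZ + B)$.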

Degree of a fixed vertex: proof of Theorem 1.8
We want to study the distribution of the degree of a fixed vertex. Without loss of generality we can fix, for instance, vertex $i = 1$. Thus, we choose $\boldsymbol{s} = \boldsymbol{s}_1$ with $\boldsymbol{s}_1 = (s, 0, \dots, 0)$, and write $\exp\Big\{\sum_{i=1}^n \log\cosh\big(a_n(\beta)e^{s_i}w_iZ + B\big)\Big\} = \frac{\cosh(a_n(\beta)e^{s}w_1Z + B)}{\cosh(a_n(\beta)w_1Z + B)}\exp\Big\{\sum_{i=1}^n \log\cosh\big(a_n(\beta)w_iZ + B\big)\Big\}$. Since $\sum_{i=1}^n \log\cosh(a_n(\beta)w_iZ + B) = n\,\mathbb{E}_{W_n}\big[\log\cosh(a_n(\beta)W_nZ + B)\big]$, where $\mathbb{E}_{W_n}$ is the average w.r.t. $W_n = w_U$, with $U$ a uniformly chosen vertex in $[n]$, we can introduce the probability measure on $\mathbb{R}$ by and write (4.6) as $g_{\beta,B,n}(\boldsymbol{s}_1) = (1+o(1))\,\frac{G_n(t(\boldsymbol{s}_1),\beta)}{G_n(0,\beta)}\,\mathbb{E}_{\gamma_{\beta,B,n}}\Big[\frac{\cosh(a_n(\beta)e^{s}w_1Z + B)}{\cosh(a_n(\beta)w_1Z + B)}\Big]$, (4.8) since, by Lemma 4.1, $\kappa(t) = \kappa(0)$. Now, under the measure $\gamma_{\beta,B,n}$, $Z/\sqrt{n} \xrightarrow{\mathbb{P}} z^\star(\beta,B)$, which can be seen by performing a Laplace method on the integral In fact, that is precisely the interpretation that $z^\star(\beta,B)$ in Theorem 1.2 has. As a result, $\mathbb{E}_{\gamma_{\beta,B,n}}\Big[\frac{\cosh(a_n(\beta)e^{s}w_1Z + B)}{\cosh(a_n(\beta)w_1Z + B)}\Big]$ Thus, and we are left with the problem of studying the limit of $G_n(t(\boldsymbol{s}_1),\beta)/G_n(0,\beta)$. We have (4.10) where (3.7) has been used. From the definition of the $C_{ij}(s)$'s, we get $\prod_{j>1}\frac{\cosh(\beta_{1j}(0))}{\cosh(\beta_{1j}(s))}$. (4.11) Putting $p_{ij} = w_iw_j/(\ell_n + w_iw_j)$, the first factor in the r.h.s. is rewritten as $\prod_{j>1}\frac{e^{s}\cosh(\beta)p_{1j} + 1 - p_{1j}}{\cosh(\beta)p_{1j} + 1 - p_{1j}} = \prod_{j>1}\frac{\ell_n + e^{s}\cosh(\beta)w_1w_j}{\ell_n + \cosh(\beta)w_1w_j} = e^{\cosh(\beta)w_1(e^{s}-1)}(1+o(1))$ as $n\to\infty$. Next, we consider the second factor in the r.h.s. of (4.11). Arguing as in the previous section for equation (3.15), since $\max_{j\in[n]} w_j = o(n)$. Taking the exponential of the previous relation, we obtain $\prod_{j>1}\frac{\cosh(\beta_{1j}(0))}{\cosh(\beta_{1j}(s))} = 1 + o(1)$, as $n\to\infty$. Finally, since $\beta_{ij}(s) = o(1)$ as $n\to\infty$ (since $p_{ij}\to 0$ in the same limit), the second factor in the r.h.s. of (4.10) is $1 + o(1)$. This proves that and from (4.9), we finally obtain $\frac{\cosh(a_n(\beta)e^{s_i}w_iZ + B)}{\cosh(a_n(\beta)w_iZ + B)}$, $\frac{\cosh(a_n(\beta)e^{s_i}w_iZ + B)}{\cosh(a_n(\beta)w_iZ + B)}$ (4.14) as $n\to\infty$. Now we have to study the limit of $G_n(t(\boldsymbol{s}_m),\beta)/G_n(\boldsymbol{0}_m,\beta)$.
From the definition of $G_n(t,\beta)$ given in (3.7) and recalling that $t_{ij}(\boldsymbol{s}) = s_i + s_j$, We analyze the three factors separately:
• First and third factors of (4.15). By the definition of $C_{ij}(t_{ij})$, , (4.16) where, by definition of $p_{ij}$, $1\le i<j\le m$ $e^{s_i}e^{s_j}\cosh(\beta)p_{ij}$ We show that this factor is $1 + o(1)$. Indeed, following [19], we expand $\log(1+x)$, obtaining: since $\ell_n = O(n)$ and $m$ is fixed. The second term in the r.h.s. of (4.16) and the third factor of (4.15) converge to 1. Thus we have shown that the first and third factors of (4.15) are $1 + o(1)$.
• Second factor of (4.15). For any fixed $i\in[m]$, The second factor in the r.h.s. of the previous display can be treated as in (4.12), showing that it is $1 + o(1)$, while the first factor is close to the generating function of $D_i$ in a GRG with vertex set $\{i, m+1, \dots, n\}$ and weight of vertex $i$ given by $\cosh(\beta)w_i$. We can deal with this term as we have already done, that is, $(1+o(1))$. Thus we conclude that $(1+o(1))$.
Going back to (4.13), we finally obtain that as required.
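The key limit in the single-vertex computation of this section is that $\prod_{j>1}\frac{\ell_n + e^{s}\cosh(\beta)w_1w_j}{\ell_n + \cosh(\beta)w_1w_j}$ converges to $e^{\cosh(\beta)w_1(e^{s}-1)}$, which is the moment generating function of a Poisson variable with mean $\cosh(\beta)w_1$. A minimal numerical sketch of this convergence follows; the weight sequence, the values of $\beta$ and $s$, and the function names `degree_factor` and `poisson_mgf` are assumptions made only for illustration.

```python
import math


def degree_factor(weights, beta, s):
    """prod_{j>1} (l_n + e^s cosh(beta) w_1 w_j) / (l_n + cosh(beta) w_1 w_j), accumulated in logs."""
    l_n = sum(weights)          # total weight l_n
    w1 = weights[0]             # weight of the fixed vertex 1
    c = math.cosh(beta)
    log_prod = 0.0
    for wj in weights[1:]:
        log_prod += math.log(l_n + math.exp(s) * c * w1 * wj)
        log_prod -= math.log(l_n + c * w1 * wj)
    return math.exp(log_prod)


def poisson_mgf(mean, s):
    """Moment generating function E[e^{sX}] of a Poisson variable X with the given mean."""
    return math.exp(mean * (math.exp(s) - 1.0))


# Hypothetical finite-type weights: vertex 1 has weight 2, the others cycle through 0.5, 1, 1.5.
beta, s = 0.8, 0.3
for copies in (10, 100, 1000, 5000):
    weights = [2.0] + [0.5, 1.0, 1.5] * copies
    ratio = degree_factor(weights, beta, s) / poisson_mgf(math.cosh(beta) * weights[0], s)
    print(len(weights), ratio)
```

The printed ratio should approach 1 as the number of vertices grows, in line with the $(1+o(1))$ factor in the text.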
A Appendix: LDP for the total spin using combinatorial arguments
In this appendix, we obtain the large deviation function of the total spin in the rank-1 inhomogeneous Curie-Weiss model (and thus in the annealed Ising model) by employing direct combinatorial arguments. We restrict to the finite-type setting in which, roughly, the $w_i$'s take values in a finite set. More precisely, we define this setting as follows:
Condition A.1 (Finite-type setting). The vertex weight sequences $\boldsymbol{w} = (w_i)_{i\in[n]}$ satisfy the following conditions:
(a) There exists a $K\in\mathbb{N}$ and a set of positive numbers $A = \{a_1, a_2, \dots, a_K\}$, with $a_1 < a_2 < \dots < a_K$, such that $w_i\in A$ for all $i\in[n]$;
(b) Denoting by $\bar n_k(n)$ the number of weights $(w_i)_{i\in[n]}$ such that $w_i = a_k$, then the following limits exist
Hereafter, to lighten notation, we drop $n$ from $\bar n_k(n)$, $\bar p_k(n)$, $e_k(n)$.
In this finite-type setting, the previous Condition A.1 is equivalent to Condition 1.1 in which $W_n$ is the uniformly chosen weight with and $W$ is the limit weight assuming values $a_k$ with probability $p_k$, so that Assuming In the theorem below, we write $\lfloor x\rfloor$ for the integer part of $x > 0$.
Theorem A.2 (LDPs for the total spin in the finite-type ICW model). In the inhomogeneous Curie-Weiss model defined by (1.22), and assuming the finite-type setting in Condition A.1, the total spin $S_n$ satisfies that for $m\in(-1,1)$, with $A = \big[\tfrac{1}{2}(1+m)a_1, \tfrac{1}{2}(1+m)a_K\big]$, where $\psi^{\mathrm{ICW}}(\beta,B)$ is the pressure of the model and where $I_m(x) = \mathbb{E}\,\frac{e^{\lambda_1W+\lambda_2}}{1+e^{\lambda_1W+\lambda_2}}\log\frac{e^{\lambda_1W+\lambda_2}}{1+e^{\lambda_1W+\lambda_2}} + \frac{1}{1+e^{\lambda_1W+\lambda_2}}\log\cdots$
Remark A.3. The expression for the large deviation rate function of the total spin in Theorem A.2 coincides with the one that is obtained from Theorem 1.6 by application of the contraction principle and the relation between the annealed Ising model and the inhomogeneous Curie-Weiss model. Indeed, recalling that the annealed measure $\mu_n^{\mathrm{an}}$ at inverse temperature $\beta$ is close to the Boltzmann-Gibbs measure $\mu_n^{\mathrm{ICW}}$ of the inhomogeneous Curie-Weiss model at inverse temperature $\tilde\beta = \sinh(\beta)$ (in the sense of equation (2.7)), and by using $\psi^{\mathrm{ICW}}(\beta,B) = -\alpha(\beta) + \psi^{\mathrm{an}}(\beta,B)$, one finds that the large deviation function of the total spin in the inhomogeneous Curie-Weiss model obtained from (2.4) reads To see that (A.7) is equal to the r.h.s. of (A.4), one employs the substitution $x_2 = 2x - \mathbb{E}(W)$. In doing so, clearly the energetic contributions are equal since It remains to prove that This can be shown by changing the spin variables $\sigma_i$ to the variables $y_i = \frac{1}{2}(\sigma_i + 1)$ and introducing $\hat S_n = \sum_{i\in[n]} y_i$ and $\hat S_n^{(w)} = \sum_{i\in[n]} w_iy_i$. Observe that $S_n = 2\hat S_n - n$, $S_n^{(w)} = 2\hat S_n^{(w)} - n\,\mathbb{E}[W_n]$, so that we can write $\prod_{i\in[n]}\big(1 + e^{2(t_1+w_it_2)}\big)$, we obtain that the moment generating function of $(S_n, S_n^{(w)})$ w.r.t. the product measure (2.2) can be expressed as $c_n(t) = -\log 2 - (t_1 + t_2\,\mathbb{E}[W_n]) + \mathbb{E}\big[\log(1 + e^{2(t_1+W_nt_2)})\big]$.
Thus, arguing as in the proof of Theorem 1.6, we obtain that the limit of $c_n(t)$ exists and equals $c(t) = -\log 2 - (t_1 + t_2\,\mathbb{E}[W]) + \mathbb{E}\big[\log(1 + e^{2(t_1+Wt_2)})\big]$. By applying the Gärtner-Ellis theorem we get the expression for the rate function The stationarity conditions read as (A.8) Since $x_1$ represents the magnetization $m$ and $x_2$ represents the weighted magnetization $m^{(w)}$, and using again the substitution $x = (x_2 + \mathbb{E}(W))/2$, we obtain that (A.8) is identical to (A.6) provided that $\lambda_1$ is identified with $2t_2$ and $\lambda_2$ with $2t_1$.
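The formula for $c_n(t)$ in Remark A.3 can be checked by brute force on a small system. The sketch below assumes that the product measure (2.2) is the uniform measure on $\{-1,+1\}^n$, that $S_n^{(w)} = \sum_i w_i\sigma_i$, and that $c_n(t)$ denotes $\frac{1}{n}\log\mathbb{E}[e^{t_1S_n + t_2S_n^{(w)}}]$; these readings, as well as the function names and the sample weights, are assumptions made for illustration.

```python
import itertools
import math


def cgf_bruteforce(weights, t1, t2):
    """(1/n) log E[exp(t1*S_n + t2*S_n^(w))] under the uniform measure on {-1,+1}^n."""
    n = len(weights)
    total = 0.0
    for sigma in itertools.product((-1, 1), repeat=n):
        s_n = sum(sigma)                                   # total spin
        s_w = sum(w * x for w, x in zip(weights, sigma))   # weighted total spin
        total += math.exp(t1 * s_n + t2 * s_w)
    return math.log(total / 2 ** n) / n


def cgf_formula(weights, t1, t2):
    """c_n(t) = -log 2 - (t1 + t2 E[W_n]) + E[log(1 + exp(2 (t1 + W_n t2)))]."""
    n = len(weights)
    mean_w = sum(weights) / n
    mean_log = sum(math.log1p(math.exp(2.0 * (t1 + w * t2))) for w in weights) / n
    return -math.log(2.0) - (t1 + t2 * mean_w) + mean_log


# Hypothetical finite-type weights with values in {0.5, 1, 2}.
weights = [0.5, 1.0, 2.0, 1.0, 0.5, 2.0]
print(cgf_bruteforce(weights, 0.3, -0.2), cgf_formula(weights, 0.3, -0.2))
```

Both functions should return the same value, because the expectation factorizes into $\prod_i\cosh(t_1 + w_it_2)$ and $\log\cosh(x) = -\log 2 - x + \log(1 + e^{2x})$.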
Proof of Theorem A.2. Given a configuration $\sigma$, we denote by $n_+$ and $n_-$ the number of its positive resp. negative spins. We can group the spins in $\sigma$ according to either $m_n$ or $n_+$, since these quantities are related by $n_+ = n(1+m_n)/2$. We can also identify each configuration of spins $\sigma$ with the set $I_+\subset[n]$ of vertices in which $\sigma_i = 1$; obviously, the cardinality of this set is $|I_+| = n_+$. Given any $I_+\subset[n]$, we define $q_k := \frac{1}{n}\big|\{i\in I_+ : w_i = a_k\}\big|$ to be the frequency of type $a_k$ in $I_+$. Then $\sum_{k=1}^K q_k = n_+/n$ and $r_n^{(w)} := \frac{1}{n}\sum_{i\in I_+} w_i = \sum_{k=1}^K a_kq_k$. Moreover, given $n$, we define the set $Q_n := \big\{\big(\tfrac{\ell_1}{n},\dots,\tfrac{\ell_K}{n}\big) \,\big|\, \ell_k\in\mathbb{N},\ \ell_k\le\bar n_k(n),\ k = 1,\dots,K\big\}$ of the possible frequency vectors $q = (q_1, q_2, \dots, q_K)$.
Exponential estimate for the conditional probability of $q = (q_1, q_2, \dots, q_K)$. We start by counting the number of sets $I_+$ with $\lfloor n(1+m)/2\rfloor$ elements and a given $q = (q_1, q_2, \dots, q_K)\in Q_n$ that satisfies the condition $n\sum_{k=1}^K q_k = \lfloor n(1+m)/2\rfloor = |I_+| = n_+$. For any $k = 1,\dots,K$, in $[n]$ there are $n\bar p_k$ sites corresponding to $a_k$, and we choose $nq_k$ out of them to form $I_+$. On the other hand, there are $\binom{n}{n_+}\equiv\binom{n}{\lfloor n(1+m)/2\rfloor}$ possible ways to form a set $I_+$ with $n_+$ elements. Thus, the conditional distribution of $q = (q_1, q_2, \dots, q_K)$ given $m$ is multi-hypergeometric, i.e., $P_n(q_1, q_2, \dots, q_K \mid m) = \prod_{k=1}^K\binom{n\bar p_k}{nq_k}\Big/\binom{n}{\lfloor n(1+m)/2\rfloor}$. The asymptotic behavior of this probability as $n\to\infty$ can be obtained by using Stirling's approximation $n! = e^{-n}n^n\sqrt{2\pi n}\,(1+o(1))$ to estimate the binomial coefficient as where $0 < a < b$. Then, generalizing the previous formula to a set of variables $a_k < b_k$, $k = 1,\dots,K$, we obtain and the function is defined on the set We now compute the asymptotics of the numerator in (A.12). Recalling that $\bar p_k = p_k + e_k$ and Taylor expanding the sum in (A.14) and $c_1$ as a function of the $b_k$'s, we obtain for some constants $c_k^{(1)}$ and $c_k^{(2)}$. The second factor in the r.h.s. comes from the substitution $p_k\to p_k + e_k$ in the factor $c_1$ of (A.14), and the third from the sum in the same equation. From Condition A.1 we have that these terms are both $(1+o(1))$. Then, we conclude that the numerator in (A.12) is We can deal with the denominator in (A.12) in a similar fashion, obtaining: $(1+o(1))$ and
Exponential estimate for the conditional probability of $r_n^{(w)}$. which are the sets of values of $r_n^{(w)} = \frac{1}{n}\sum_{i\in I_+} w_i$ corresponding to subsets $I_+\subset[n]$ with $\lfloor n(1+m)/2\rfloor$ elements.
We have that with $\rho_n(m) = n(1+m)/2 - \lfloor n(1+m)/2\rfloor$. Obviously $\rho_n(m)/n = O(n^{-1})$, since $0\le\rho_n(m) < 1$. Moreover, by (A.21) and the fact that $\inf A_n(m)\ge a_1\sum_k q_k$ and $\sup A_n(m)\le a_K\sum_k q_k$, as $n\to\infty$. The previous remarks imply that $r_n^{(w)}$ is close to some $x\in\big[\tfrac{1}{2}(1+m)a_1, \tfrac{1}{2}(1+m)a_K\big]$ for large $n$. Therefore, we claim that where the sum is extended to those $K$-tuples $q\in Q_n$ for which the event $\sum_{i\in I_+} w_i = nx$ is realized. In the previous sum, the term that corresponds to the largest value of the exponent $g(q,m)$ in (A.18) controls the behavior in the limit, the remaining terms being sub-leading. The quantity depending on $m$ in the definition of $h(q,m)$, see (A.20), is negative and the sum on $k$ is positive, while $h(q,m)$ is negative in the range defined in the first line of (A.19). Thus, defining $\bar h(q_1, q_2, \dots, q_K) = \sum_{k=1}^K\big[p_k\log p_k - q_k\log q_k - (p_k - q_k)\log(p_k - q_k)\big]$, we have to find $S_n = \sup_{a\cdot q = x,\ \sum_k q_k = \frac{1}{n}\lfloor n(m+1)/2\rfloor}\bar h(q_1, q_2, \dots, q_K)$.
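The entropy-like function $\sum_{k}[p_k\log p_k - q_k\log q_k - (p_k - q_k)\log(p_k - q_k)]$ appearing above is the exponential growth rate produced by Stirling's approximation: for $0 < a < b$, it is a standard fact that $\frac{1}{n}\log\binom{nb}{na}\to b\log b - a\log a - (b-a)\log(b-a)$. The short Python sketch below compares the exact logarithm of the product of binomial coefficients with this rate. The type frequencies `p` and `q` and the helper names are hypothetical, and `math.lgamma` is used to evaluate log-factorials.

```python
import math


def log_binom(top, bottom):
    """log of the binomial coefficient C(top, bottom), via log-gamma."""
    return math.lgamma(top + 1) - math.lgamma(bottom + 1) - math.lgamma(top - bottom + 1)


def entropy_term(a, b):
    """b log b - a log a - (b - a) log(b - a), for 0 < a < b."""
    return b * math.log(b) - a * math.log(a) - (b - a) * math.log(b - a)


def h_bar(p, q):
    """sum_k [p_k log p_k - q_k log q_k - (p_k - q_k) log(p_k - q_k)]."""
    return sum(entropy_term(qk, pk) for pk, qk in zip(p, q))


def log_count_per_site(n, p, q):
    """(1/n) log prod_k C(n p_k, n q_k): ways to pick n q_k plus-spins among n p_k sites of type k."""
    return sum(log_binom(round(n * pk), round(n * qk)) for pk, qk in zip(p, q)) / n


# Hypothetical example with K = 3 types.
p = (0.5, 0.3, 0.2)
q = (0.2, 0.1, 0.05)
for n in (10**2, 10**4, 10**6):
    print(n, log_count_per_site(n, p, q), h_bar(p, q))
```

The gap between the two columns is of order $\log(n)/n$, which is the size of the polynomial prefactors that Stirling's formula contributes and that are sub-leading on the exponential scale.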
Asymptotic behavior of $P_{\mu_n^{\mathrm{ICW}}}(m_n = m)$. Let us observe that, since the conditional average on the left-hand side of the previous display is computed with respect to the uniform measure $\pi_n(\sigma) = 2^{-n}$ on the spins $\sigma$,