Modularity in Complex Multilayer Networks with Multiple Aspects: A Static Perspective

Complex systems are usually illustrated by networks which captures the topology of the interactions between the entities. To better understand the roles played by the entities in the system one needs to uncover the underlying community structure of the system. In recent years, systems with interactions that have various types or can change over time between the entities have attracted an increasing research attention. However, algorithms aiming to solve the key problem - community detection - in multilayer networks are still limited. In this work, we first introduce the multilayer network model representation with multiple aspects, which is flexible to a variety of networks. Then based on this model, we naturally derive the multilayer modularity - a widely adopted objective function of community detection in networks - from a static perspective as an evaluation metric to evaluate the quality of the communities detected in multilayer networks. It enables us to better understand the essence of the modularity by pointing out the specific kind of communities that will lead to a high modularity score. We also propose a spectral method called mSpec for the optimization of the proposed modularity function based on the supra-adjacency representation of the multilayer networks. Experiments on the electroencephalograph network and the comparison results on several empirical multilayer networks demonstrate the feasibility and reliable performance of the proposed method.


INTRODUCTION
C OMPLEX systems are usually illustrated by networks which captures the topology of the interactions between the entities [1], [2], [3], [4], [5]. For systems with more complicated entity interconnections, edges with different attributes, e.g. directed graphs [2], [6], weighted graphs [7], [8], signed graphs [9], [10] and so on, have been thoroughly studied. In recent years, systems with entity interactions that have various types or can change over time have attracted an increasing research attention [11], [12], [13], [14]. For example, a person interacts with his friends in Facebook and uses emails for business will demonstrate different behaviors in Facebook social network and email social network. Such networks are usually interpreted as a combination of different "layers" (or "views", "edge colors", "relations", "slices", etc. in the literature), and is regarded as multilayer networks. In different contexts, "multigraph", "multiplex network", "multirelational network", "multislice network", "multilevel network", "network of network" and "temporal network" always refer to a similar network structure [15]. Following the conventional terminology in network science, we refer to networks with such structure as multilayer networks.
Although there is actually no consensus on its definition, a community usually refers to a group of nodes that are compactly connected with each other and sparsely connected with those nodes outside the group. By partitioning a network into communities we obtain its community structure, which is a coarse-grained representation of the network that assists us analyzing the roles played by each node [17]. Despite numerous studies on multilayer networks in recent years, there is still a lack of evaluation metrics for measuring the community structure of a multilayer network, which in turn limits the number of available algorithms to find the optimal community structure in multilayer networks. Existing evaluation metrics in multilayer networks are mainly derived from "single-layer" cases, where the evaluation metrics are designed to detect modular structures in conventional networks that can be represented simply with nodes and edges, e.g. edge centrality, clustering coefficient, and metrics based on dynamic process [15], [18], [19], [20], [21], [22]. In such methods, detections are applied independently on each layers before final assignment, or on an "collapsed network" which is a single-layer network generated by aggregating the layers [23]. Such treatment is intuitive to find an "average" role played by a node in different layers, but somehow fails to treat the multiple layers fundamentally as a whole. In [16], Mucha et al. proposed a modularity-based metric for multilayer network community structure derived from a Laplacian dynamic. To the best of our knowledge, they for the first time introduce couplings to the multilayer network models, which are links that appear between layers and connect a node with its copy in other layers, to combine the layers and form an interconnectedlayer network model. Based on such an interconnected-layer structure, the generalized modularity is able to evaluate the community structure without any compression or loss of the arXiv:1605.06190v1 [cs.SI] 20 May 2016  The resolution parameter controlling the contribution of the null model to the modularity µ, µ The normalization factor of multilayer modularity proposed in this paper and [16] information encoded in the multilayer networks.
In spite of the great advances, the generalized modularity still has weaknesses which will lead to confusion especially when it comes to temporal networks, where the layers are usually time slices of a specific evolving single-layer network. The derivation of modularity is based on the stability of global communities [24], which is measured by comparing the position of a random walker with the stable state. This assumes that the random walker keeps transferring as the time goes. However, the layers are interdependent w.r.t. time and the couplings are introduced to describe the continuity of the interaction between nodes along the layers (time slices) [16]. It is confusing since one layer may be the result of the evolvement of another layer but the random walker is assumed to be able to travel between them. Another important weakness is, although the generalized modularity is generally similar with its original version in the single-layer case, the definition of a community in multilayer networks becomes vaguer to understand -what does a community look like in multilayer networks if a random walker can hardly escape from it? Actually, the current derivation of multilayer modularity focus on capturing the dynamic property of a communitystopping the random walker from leaving it. In some cases where there is such random process defined on the network, the definition of the community is apparent. But in other cases, the definition of a community becomes vague.
The above two issues are inevitably brought by the derivation from a dynamic perspective. In order to address them, in this paper, we derive the generalized modularity from a static perspective, i.e. without defining dynamic process on the network. As will be shown in section 3, from such perspective, the generalized modularity is represented as the predominant part of Hamiltonian, which measures the total energy of the systems in a variety of cases including community structure in the networks [25]. Thus the optimization of the proposed metric is equivalent to that of Hamiltonian, which provides the generalized modularity with an energy explanation. We also demonstrate in section 3 that the generalized modularity just finds communities with high cohesion, i.e. densely distributed internal efficient edges (not the couplings), which is more intuitive to understand and returns to its original definition in the single-layer case [26]. With such a static derivation, we are able to generalize the modularity to multiple aspect cases, where the layers belong to different groups [15] or the layer relation is flexible. We also propose a spectral algorithm called mSpec for optimizing the proposed modularity evaluation metric, which extends the spectral bisection algorithm in the single-layer case [26].
We summarize our contribution in this paper briefly as follows: • We derive the multilayer modularity from a static perspective to address the confusion in temporary networks and point out which kind of topological structure will lead to a high modularity value. • We generalize the multilayer modularity to adapt to networks with multiple aspects or there are flexible constraints on the layer relation. • We propose a spectral bisection algorithm (mSpec) for multilayer modularity optimization based on the supraadjacency representation for multilayer structure. • We apply the proposed metric to electroencephalogram (EEG) networks as an attempt of application.
The rest of this paper will be arranged as follows. We review the related works that have been done in the literature in section 2. The proposed multilayer modularity and the mSpec optimization will be described in section 3 and section 4 respectively. The experimental results are reported in section 5. We conclude this paper in section 6. For clarity, Table 1 summarizes the notations used in this paper.

BACKGROUND
In this section, we will briefly introduce the network models that have been explored in the literature and the strategies that have been adopted to detect communities in multilayer networks, including evaluation metrics and optimization.

Layer2
Layer3 Aspect ( II ) Fig. 1. Aspect-aspect representation of the multilayer network model [15]. Aspect (I) has two layer sets and aspect (II) has three layer sets. The within-layer edges are denoted with solid lines and the couplings are denoted with dotted lines where different colors indicate the couplings of different aspects.
During the process of exploring the multilayer networks, different network models have been proposed [15], [16], [27]. Mucha et al. linked multiple single-layer networks with couplings, which refer to the edges that connect the nodes with their copies in different layers, to represent a multilayer network [16]. This model allows the layers to communicate through the couplings and is widely adopted especially by research involving dynamics defined on multilayer networks [28], [29]. De Domenico et al. proposed a multilayer model based on tensor representation, which no longer restrains the between-layer connection to appear between node-copy pairs [28]. In the rest of this paper, we will use between-layer edges to refer to this kind of connections that link a node with another node in different layers. On the one hand, the presence of between-layer edges makes the network more flexible. But on the other hand, a multilayer network with between-layer edges is very similar to a single-layer network in structure since they both have no limitations on the presence of edges (any node in any layer is allowed to link with another node in another layer). This will sometimes blur the boundary between single-layer and multilayer networks.
In more complex systems, the layers may be divided into several groups, which indicates that multilayer networks should also be distinguished when observed from different aspects [15]. For example, a cellphone contact network can be characterized by different means such as calling and texting.
Meanwhile, this network is also temporal since there are callings and textings at any time point. Thus this layers is divided into two groups: according to time stamps or according to communication means. In order not to lose the information of the networks from either aspect, we have to construct a more complex multilayer network. There are actually two types of multilayer networks with multiple aspects, which has not been clearly distinguished in the literature. If a layer can belong to more than one aspects, which means the aspects may overlap, we can locate a single layer by indicating all aspects it belongs to. In the rest of this paper, we will call this an aspect-aspect representation, as shown in Fig. 1. In such representation, we need a F -dimensional (the number of aspects) vector to locate a layer. For example the layer at the top right can be located by (2, 1) since the layer is in layer set 2 of aspect (I) and layer set 1 of aspect (II). When there are additional aspects, the dimension of the location vector grows. Therefore, it is a challenging task to represent this network by matrix with predetermined size.
In other networks with multiple aspects, each layer only belongs to a unique aspect. For instance, conventional electroencephalogram (EEG) networks (single-layer networks) for different individuals construct a multilayer network, where each layer corresponds to an EEG network of a person. With the fact that all testees receive the same treatment, the layers reflect the common reactions to the test, but still hold the individual difference between the testees. In addition, a person can take several EEG tests to obtain different EEG networks that all reflect the roles played by different regions of his brain in the test. Thus we have two aspects observing the EEG network of the testees, enabling us to analyze individual differences and similarities as well as the role of different brain regions simultaneously. With respect to different individuals, we may have as many aspects as the number of persons that takes the EEG test, and the layers within that aspect are several EEG network obtained from several tests. To locate a layer in such networks, we just need to point out to which aspect the layer belongs and its position within that aspect, as shown in Fig. 2. We will refer to such representation as aspect-layer representation to distinguish with the aspectaspect representation.
Actually, for a more convenient implementation, we can convert the aspect-aspect representation to the aspect-layer representation by absorbing aspects hierarchically into one aspect. We can interpret this process by considering how multidimensional arrays are stored on the disk. A 2-dimensional array is represented as an "array of arrays". The multiple aspects are arranged in a similar way so that we can represent the network using matrices with predetermined size.

Existing Evaluation Metrics for Community Detection in Multilayer Networks
As one of the most concerned issues in network analysis, community detection aims at partitioning the network into groups of closely connected nodes (which is called a community) to obtain a coarse-grained representation, which helps us better understand the structure of the network. However, as far as we are concerned, most of existing evaluation metrics designed for community detection in multilayer networks assume that the layers are independent. The multilayer stochastic blockmodels (SBM), which are generative models that make inferences on the role of nodes given the network structure as evidence [23], [30], usually adopt two types of strategies. They either learn a SBM on each layer, just like in single-layer networks, and then make global assignments based on the result of each layers, or they aggregate the layers to produce a "collapsed network" [23]. The final community assignment of each node is made based on the SBM result on the collapsed network. De Domenico et al. extended the well-known infomap method [31] to the multilayer case [22]. The infomap method solves the community detection problem by considering its duality with a coding problem. It assumes that the community is able to capture the flows on the network so that by utilizing the community structure, we can greatly compress the coding length needed to describe a random process on the network [31]. The goal is to minimize the "map equation", which describes the coding length based on a specific partition and the transition probability of the random process. De Domenico et al. defined the transition probability of a random walker in multilayer networks so that the map equation is able to describe the flow in multilayer scenarios. Such treatment is intuitively correct, albeit they assume the node can reach the neighbors of its copies in other layers in a single step. In fact this implicitly erases the difference between layers -it is equivalent to consider a collapsed network.
Some other existing evaluation metrics also provide considerable solutions to the community detection problem in multilayer networks, such as multilayer clustering coefficient (the authors consider the overlapping of layers or the networks with multiple types of connections) [18], [19], multilayer centrality (the authors consider a random walker to jump between layers through specific node pairs or edges) [20], [21], etc. What these methods share in common is that they assume the layers are independent or can be aggregated and attempt to find global roles for the nodes. Such treatments would have considerable effects as the network structure varies when we wish to find the similarity of the layers. But when we are interested in the different roles of nodes in the layers, these methods may generate a poor result, as we will discuss in the experiments. Thus it is highly recommended to adopt an interconnected-layer structure.
Modularity is a widely adopted metric for community detection in single-layer networks [2], [26], [32], [33]. The original definition of modularity is the edge difference between the current network and a null model, which is a rewired random network with the same degree distribution as the original network. Modularity reflects the cohesion of nodes within a community, so by optimizing global modularity one can find a partition of the network with communities within which the edges are densely distributed [26]. Recently, Mucha et al. extended the single-layer modularity to multilayer case using a Laplacian dynamic process defined on the multilayer network (without between-layer edges), which measures the stability of a community by comparing the probability of a random walker to stay in the same community at time t to the static solution (i.e. t → ∞) [16], [24].
This generalized modularity is of great contribution due to the fact that it combines the layers (using the couplings) on a model level for the first time and is adopted in a wide range of areas [12], [34], [35]. Nevertheless, this evaluation metric still has weaknesses. The multilayer modularity is derived based on a dynamic process (actually it is a random walk process), which means the random walker is jumping between nodes as time goes. So what if the network is evolving over time? When it comes to temporal networks, whose layers can be interpreted as different time slices of an evolving network (i.e. the edges varies over time), things get confusing, because the layers can be seen as different states in the network evolving process. Moreover, although the within-layer representation is the same as the conventional form proposed by Newman et al., it is not clear what kind of community the multilayer modularity tends to find. It is of vital importance to know the bias of the evaluation metrics on the communities, so that we can pick appropriate evaluation metrics for corresponding network structures. Last but not the least, the coupling strength strategy needs modification to adapt to more general cases, since the original one is brought without much discussion.

Optimization
Optimizing the single-layer modularity is an NP-hard problem [36], so we can only obtain a good approximation of the optimal solution efficiently. Since the single-layer modularity is actually a component of the multilayer modularity, the optimization of the multilayer modularity will also be NP-hard. To our best knowledge, there are rare algorithms except a generalized Louvain heuristic approach for multilayer modularity optimization [16]. The Louvain method is a greedy iterative method which hierarchically aggregates two nodes into a group by making the optimal modularity gain in each iteration. Then the generated node group is regarded as a new node and another iteration starts. This algorithm converges when there is no such merger that increases global modularity value. Some tricks like adding a Kernighan-Lin node swapping step [37] after each iteration will give better detection result. The Louvain method is a widely adopted heuristic for optimizing quality functions of community structure, which implies that it does not utilize the property of the evaluation metric. Meanwhile, the community assignments of nodes are not guaranteed to converge to a good approximation, so we may need to run the algorithm several times to obtain a relatively more reasonable solution. As will be discussed in section 5, we cannot control the community scale detected by the Louvain method. When it comes to EEG networks, the Louvain method provides a relatively fine-grained detection result, whereas we expect it to find two communities -the regions that are active or inactive.
In order to tackle the above issues, we adopt the aspectlayer representation for describing network structure which is intuitive to implement and derive the multilayer modularity from a static perspective (not involving the dynamic process). We also discuss the extension of the evaluation metric so as to make it applicable when considering different types of multilayer networks such as unbalanced multilayer networks, temporal networks or signed networks etc. We propose a spectral method for optimizing the multilayer modularity which provides a stable solution and is helpful when we concern the scale of the discovered communities.

MULTILAYER MODULARITY FROM A STATIC PERSPECTIVE
We start with several general requirements that a quality function should satisfy as introduced in [25]: (i) rewarding existing edges within a community, (ii) penalizing non-existing edges within a community, (iii) penalizing existing edges between two communities and (iv) rewarding non-existing edges between two communities. Thus a general quality function takes the form where A ij is the edge strength of nodes i and j, g i indicates the label of the community that node i belongs to, and a, b, c, d are free parameters. The delta function δ is the Kronecker delta. In multilayer networks, since there are three kinds of edges (within-layer edges, couplings and between-layer edges), we need to expand this function to enable the additional edge types. To be more explicit, the between-layer edges will be ignored in this paper since they blur the boundaries between such multilayer model and a single-layer network (i.e. both of them have no restraints on the appearance of the edges). But similar tricks can be designed to easily enable between-layer edges in this model. The expanded quality function can be written as Within-layer internal existing edges Within-layer internal non-existing links Between-layer internal existingcouplings Between-layer internal non-existing couplings Between-layer external non-existing couplings , where we use s and r for the denotation of different layers, v and w for that of aspects. N and V v represent the total number of nodes within a layer and total number of layers of aspect v and matrix A, C and g denote the within-layer adjacency, between-layer adjacency and the community label matrix, respectively. The number of parameters has doubled after taking between-layer couplings into account. Eq. (1) points out the general form of an objective function for community detection, and can be used to derive the Hamiltonian of a Potts model in statistical mechanics as well as the modularity [25], [38], while Eq. (2) restricts the quality that an objective function in the multilayer case should satisfy. Since the parameters of Eq. (2) control the punishment (encouragement) and are free to choose, we can take a to obtain a similar representation as the multilayer modularity [16], where p {v} ijs known as null model is the penalty factor, and the parameter γ In Eq. (3), we notice that the terms that do not contain δ will be constant in optimization process, so we can rewrite H M (g) as By usingC , we can get the standard Hamiltonian form for system with many particles where the first term is proportional to Mucha's modularity [16] (except the valueC {vw} isr takes differs from C ijs in [16] which will be discussed in section 3.1). The last two terms which can be interpreted as bias are dependent on the given network and the parameters. Therefore, minimizing Hamiltonian is equivalent to optimizing modularity. Finally we obtain the modularity representation .
which groups the edges into two types and gives different punishment (encouragement). The value of the parameters are actually the efficient number of each type of edges (the edge difference of current network structure and the null model). In other words, a positive modularity is obtained if the edges and couplings within the community are more than those between different communities. A higher modularity is reached when the edges are more densely distributed within the communities.

Selection ofC {vw} isr
In [16], Mucha et al. proposed a multilayer modularity based on a Laplacian dynamic defined on multilayer network model as follows Although similar to the proposed form as Eq. (7) in structure (taking v = w to obtain a single-aspect representation of modularity in this paper), Mucha et al. did not discuss much about the coupling strength C jsr . They chose C jsr to take binary value {0, ω} to represent the absence and presence of couplings and ω controls the contribution of couplings. In the proposed form, we notice thatC is totally free, the proposed form of multilayer modularity is flexible to adjust to various types of multilayer networks. We will use two typical types of network as an example.

Unevenly Distributed Views
Consider a common type of multilayer network whose distribution of layers is uneven, i.e. the intervals between pairs of layers can be unequal. In this situation, simply letting C {vw} isr take the same value without considering the closeness of layers will cause large errors. For instance, suppose we have a multilayer electroencephalogram network in which each person is treated as a layer. Apparently, the age difference and gender of the patients will greatly influence the result [40], [41]. Therefore, we should enable the proposed model to handle such networks with unevenly distributed layers.
where M sr measures the closeness of layer s and r. Here we still use ω to control the coupling strength so as to control the balance between within-layer edges and between-layer couplings.

Temporal Networks
In some research a temporal network is defined as a sequence of networks corresponding to successive time points with between-layer couplings indicating the continuity between adjacent layers [14], [39], [42]. For example, suppose in a phone calling temporal network, two nodes are linked by an edge in two successive layers. If there is a coupling connecting the corresponding nodes in both layers, then we can tell that this call lasts through these two time points.
Otherwise we can tell that they have two calls at both of the time points. Therefore, between-layer couplings only appear between adjacent layers in such temporal networks. In order to satisfy this we let e {vw} isr = 0 when |s − r| = 1 or the link between the nodes does not last between two time points.
Notice that the interval between two time slices can also be unequal. For example, the Facebook social networks of a person when he was 15 and 16 will be similar but they may have large difference compared with the network when he was 20. Such time interval problem can be addressed just like the unevenly distributed layers discussed before.

Signed Networks
Connections in complex systems reflect either positive or negative interactions between nodes, which can be modeled as signed networks that contain edges with positive or negative weight [9], [10]. The effect of both kinds of edges on the structure of such networks should be distinguished: the contribution of positive edges should be awarded, while the contribution of the negative edges should be punished. In [16], Mucha et al. derive the modularity by using a Laplacian dynamics operator that contains the sign information. We can bring in signed edges into the proposed metric by representing the adjacency A Thus we obtain the signed version of the proposed metric This is equivalent to considering the within-layer modularity as the combination of two "networks" with opposite contribution. We can now conclude that the proposed metric is able to deal with signed networks by considering the negative edges as additional networks of the within-layer modularity.

MSPEC: AN ITERATIVE SPECTRAL OPTI-MIZATION OF MULTILAYER MODULARITY
In order to find a good approximation of the optimal solution of multilayer modularity maximization problem, Mucha et al. adopted a generalized Louvain method, which hierarchically merges two communities to increase the modularity score [16]. The result is improved by a KL-swap step that swaps the nodes between the communities to see if further increase on modularity score is possible [37]. But such optimization method is unstable, so we need to run it multiple times to avoid converging to a local maxima. And it sometimes fails to find expected number (always small) of communities, since the  Fig. 3. Supra-adjacency matrix of a multilayer network with 3 aspects. The first aspect consists of two layers and the others contain only one layer. The nondiagonal blocks of the supra-adjacency matrix represent the between-layer adjacency of the layers. Since we only consider the between-layer couplings, these blocks are all diagonal. The diagonal blocks record the withinlayer adjacency. Since we only discuss about undirected networks, the supra-adjacency matrix is symmetric.
algorithm stops before the number of communities decreases to the desired value. Newman et al. proposed a spectral method for single-layer modularity optimization which hierarchically divides the network into two communities [26]. Inspired by their work, we propose a spectral bisection method called mSpec based on the supra-adjacency representation of the multilayer network. This method will provide more stable performance as will be discussed in the experiments.

Supra-adjacency representation: An Equivalent Single-layer Network
In multilayer network analysis, a supra-adjacency always refers to a single-layer network which is flattened from a multilayer network [15], [27], [39], [43], [44]. The basic idea is to combine two layers which are represented by two N × N graphs, to obtain an expanded layer which is represented by a 2 × 2 block graph with the diagonal blocks representing the within-layer adjacency of each layer and off-diagonal blocks representing the between-layer couplings. By repeating such flattening step until the number of layers reduces to one, we obtain an expanded equivalent single-layer network containing all nodes in the original multilayer network, where the nodes are distinguished from their copies in different layers and aspects (see Fig. 3).
Based on the supra-adjacency representation, we obtain a mapping from a multilayer network to an equivalent singlelayer network where the mapped subscript for node i in layer s {v} is We therefore can apply the same mapping on the modularity matrix which records the modularity of each node pair (i, j) in each layer pair (s {v} , r {w} ) to obtain a supra-modularitymatrix 13) We will illustrate this supra-modularity-matrix maintains all the information of the original multilayer network and can be utilized for optimization.

Dividing Networks into Two Communities
Let the index matrix L identify the community label of each node in each layer Then we can rewrite the modularity function as We notice that which means once the graph is given, χ is a constant value and will not influence the global maximization of modularity function. Also, the 1 µ and 1 2 values in the parentheses do not make sense in the maximization, either. So our objective function can be rewritten as Then we can map the multilayer network to the corresponding supra-adjacency as described in Eq. (13) where the mapping is performed according to Eq. (12). We can also bring in a new label vector z with Therefore, we can represent the objective function Eq. (17) as By applying this mapping, the problem is converted to be a relatively simple one, on which we can apply the same spectral method used in the single-layer case. We can solve it by utilizing the eigenvectors and eigenvalues of matrix D as follows We can then represent z as the linear combination of the eigenvectors of D, i.e. z = x a x u x , where u x is the xth eigenvector of D and a x is the corresponding weight. We can obtain a according to the fact that D is symmetric because B We know that in order to maximize Q, supposing that the eigenvector corresponding to the largest eigenvalue is u M , all we need to do is to assign the vector z according to u M Thus we obtain the optimal division using the supramodularity-matrix.

Dividing Networks into More than Two Communities
To divide the network into more communities, we have to rewrite the additional modularity contribution of further division. Suppose the subcommunities after dividing community C are A and B, we have where L {w} jr ∈ {−1, +1} is the community label indicating to A or B the node belongs. Here we use the fact that the sum of entries of modularity matrix B is constant once the network is determined so that it will not influence the optimization. Then the multilayer modularity gain can be written as Similarly, we also bring in an assistant matrix D to maximize the global modularity, B {vw} isjr = D xy . Notice that is also symmetric and isjrvw B {vw}(C) isjr = 0, so we can repeatedly apply the bisection method on the detected communities using D as the modularity matrix B until the modularity gain ∆Q does not increase.

Complexity Analysis
The mSpec method is based on a linear mapping and spectral decomposition. The time complexity of the linear mapping is O( where N is the total number of nodes in a single layer and V v is total layers within aspect v. By applying Lanczos algorithm [45], finding the dominant eigenvector can be done in O(( [26]. Thus, suppose there are k divisions, we can complete the total calculation in time . The total number of divisions depends on the depth of the division tree, which is expected to be log( . In summary, the theoretical efficiency of the mSpec method can be guaranteed as the scale of data grows.

EXPERIMENTS
In this section we present community detection results using the proposed modularity in several multilayer networks. As we will demonstrate in the results, 1) the proposed method can be applied to a wide range of networks by flexibly adjusting the couplings and parameters; 2) the mSpec is more stable than the generalized Louvain method.
We conduct several experiments on a well-known benchmark network to discuss how the parameters can influence the results of community detection. The proposed method is also applied to the electroencephalograph (EEG) networks as an attempt of its application, the result of which turns out to coincide with the functional division of the human brains. In order to evaluate the performance of the proposed modularity optimization method (mSpec), it is compared with baseline optimization methods. As will be reported, the proposed optimization performs more reliably as the coupling scale varies.
The networks we use in experiments are 1) Parameter analysis data: • Zachary Karate Club network: network of friendships between 34 members of a karate club in a US university [46]. 2) Comparison data: • CKM-Physicians Innovation multilayer network: a network of the physicians' adoption of a new drug, tetracycline, in four towns [47]. There are 246 nodes and 3 layers (according to three questions asking about the relationship between the physicians). • CS-Aarhus social network: a multilayer social network consists of five online and offline relationships (5 layers) between 61 employees of Computer Science department at Aarhus [48]. • Kapferer Tailor Shop network: a time-varying network recording the interactions in a tailor shop in Zambia over ten months [49]. The network consists of two layers according to the interaction types and 39 nodes. • Krackhardt High-Tech network: three kinds of social relationships (Advice, Friendship and "Reports to") between 21 managers of a high-tech company [50]. • London Transportation network: multilayer transportation network of 369 London train stations with 3 layers recording different types of connection (Underground, Overground and DLR) [51]. This network is relatively sparse. • Padgett Florentine Families network: the network of marriage alliances and business relationships between Florentine families in the Renaissance [52]. There are 16 nodes in total. • Vickers Class Relation network: the networks collected from 29 seventh grade students in an Australia school about three questions on the classmate relationship ("Get on with", "Best friend" and "Prefer to work with") [53].

3) Case Study data: EEG network
• Signed multilayer network that characterize the correlation of the testees' brain regions during a visual stimuli test. The nodes include 128 scalp electrodes as well as a standard control electrode and 11 testees and several test records form a two-aspect multilayer network.

Parameter Analysis
In order to study how the parameters (i.e. γ s and ω) in the proposed method influence the community detection results, we conduct experiments with similar experimental settings as Mucha et al. do in [16]. We construct a ten-layer network with resolution parameter γ s ∈ {0.1, 0.2, . . . , 1}, where the adjacency of each layer is the benchmark network Zachary Karate network [46] and we assume the between-layer coupling exists between any pair of nodes and their copies. We perform community detection on the generated network with different coupling strength parameter ω ∈ {0, 0.01, 0.1, 1, 10} and the community assignment for each node in ten layers is depicted with different colors.
From Fig. 4 we can see, when ω = 0, the layers show great divergence due to the value of resolution parameter γ s . As γ s grows, the network is inclined to split into subcommunities. By comparing with standard community label, we see the detection result with parameter γ s setting from 0.5 to 0.9 matches the ground truth, while there are misclassification in the rest. As ω increases, we see the nodes in different layers tend to be assigned to the same community. When ω = 1, we see that every node has the same community label as its copies in other layers, and the detection result consistent with the ground truth.
We can then conclude that, the resolution parameter γ s controls the tendency of the splitting and the coupling strength parameter ω controls the consistency of the community assignment between layers. Too large or too small γ s will cause misclassification, which can be fixed, however, by the between-layer couplings. Meanwhile, too small ω will lead to the isolation between layers. When there are noises in the network data, the result can be poor (as shown in Fig.  4(a)) since cross-layer information has not been fully utilized.
Nevertheless, the peculiarity of each layer will be damaged by large ω (as shown in Fig. 4(d)).
In section 5.2 we will compare the performance of several algorithms as the network scale varies, where we bring in a parameter ρ to explicitly control the coupling density. However, since ρ reflects the density of the raw network data, we can consider it as a super parameter that is unalterable once the network is given.

Comparison Results
For comparison, several state-of-the-art approaches are used so as to evaluate the performance of the proposed optimization method (mSpec): 1) mLouv: multilayer Louvain-like method plus KL-swap improvement, which is the most widely adopted heuristic method for modularity optimization [16]; 2) sMeanSpec: single-layer spectral optimization method that will be applied on the mean of adjacency matrices of all layers [54]; 3) sFullSpec: single-layer spectral optimization method applied on each layer [26], [32].
In order to examine the reliability of the proposed method, the detection is performed over seven datasets with different between-layer coupling density ρ. The parameter ρ depending on the raw network data reflects how closely connected any two layers are, and in experiments we generate random       between-layer couplings according to the probability ρ. The nodes are linked with all its copies in other layers when ρ = 1 and there is no couplings at all when ρ = 0. The result is evaluated by the modularity value Q computed according to Eq. (7) using the community assignment of each algorithm, as shown in Tables 2 to 8. The variance and mean of the modularity value reflect the stability and reliability of each algorithm against network with different between-layer coupling scales.
As the results suggest, the proposed method significantly outperforms the existing methods, achieving 18.65% improvement over the second best in terms of mean modularity values while maintaining a relatively low variances. The mLouv method and sMeanSpec method show low Q when the couplings are sparse (small ρ) and high Q when the couplings are dense (large ρ), while sFullSpec performs oppositely. This is because the mLouv and sMeanSpec methods incline to look for a global community label for all nodes and ignore the peculiarity of each layers, so that when the couplings are sparse (which implies high heterogeneity between layers), such algorithm fail to make a distinguished assignment. Similarly, the performance of sFullSpec degenerates seriously when the couplings are dense since it runs detection over each layer respectively and lacks the consideration of consistency. The proposed method is based on a supra-adjacency representation of the multilayer network, with ω dominates the consistency. This guarantees the reliable performance of the proposed method against networks with different conditions of the connection between layers. In a nutshell, the proposed method performs stably as the coupling density varies so that is relatively reliable when the condition of the raw network is unclear.

Case Study: EEG Network
The event-related potentials (ERPs) which are measured by means of electroencephalography (EEG) is the measured brain response of testee with a specific stimuli [55], [56]. Since the EEG monitoring collects electrical impulse data from the electrodes placed on the scalp, it should be totally noninvasive in most cases except for an inevitable invasive electrode for specific application. Moreover, the monitoring process is silent so that the auditory disturbance is reduced to a very subtle level and is tolerant to subject movement. Owing to the numerous advantages, EEG is widely adopted as the analysis tool for brain activity, especially on children testees. Nevertheless, the traditional output of the EEG monitoring manifests as waveforms, so that the analysis of them is unintuitive and usually relies on the experiential judgements of the EEG providers. In recent years, more and more research focus is concentrated on the analysis of EEG data, but almost all of such work focuses on the average performance of similar testees, which may lead to the loss of information about each distinct testee [57], [58], [59]. In this experiment, we attempt to apply the proposed method on the signed multilayer network generated from the EEG data to explore the functional performance of the regions of brain. We compare the detected result with a standard empirical brain functional region division to find a surprising match between clinical experience and graph data mining [60].
We regard the 128 electrodes and a standard control electrode placed on the testee's scalp as 129 nodes involved, and calculate the correlation coefficients between the ERPs recorded from each pair of electrodes when the testee is given a series of visual stimuli as the edge weights between them. Thus we generate a single-layer network based on the EEG data of one test record of a specific testee. By combining the networks generated in this way from 11 testees and their several test records, we obtain a two-aspect EEG network that contains the information of the brain activities of all testees. Since the electrodes are placed identically for every testee, we assume the between-layer coupling exists between each pair of corresponding electrodes. We can adjust the parameter γ s to control the resolution and ω to control the consistency of the detection result of each testee. The detection results on the first four testees are shown in Fig. 5.
We find that the EEG networks are always divided into two communities, yellow and blue, in all experiments. By comparing the detection results with the corresponding adjacency, we observe that the edges with negative weights mainly lie between the two communities and within each community the nodes are connected by the edges with positive weights. Therefore, in order to better illustrate the brain terrain, we directly treat the dominant eigenvector u M of the modularity matrix as the detected community labels of the corresponding nodes for plotting since such non-binary labels make it possible to picture the contour of the brain. Say, the dominant eigenvector is (0.5, 0.2, −0.1, −1) and the label vector will also be (0.5, 0.2, −0.1, −1), where the last two nodes will be dyed blue (the darkness distinguishes the magnitude) and the first two will be dyed yellow. Meanwhile, since such treatment also maximize the modularity function, the result is more accurate and reliable than discrete community labels. We present the continuous community label as the topographic map of the brain where the two communities correspond to regions with different color. By adding the standard brain (a) testee 1 (b) testee 2 (c) testee 3 (d) testee 4 Fig. 5. The detection result of EEG network. We randomly pick 4 layers from the multilayer network. The standard brain region division is plotted with different symbols: i) purple triangle: prefrontal cortex that controls thinking, perception, information memory and attention; ii) white square: premotor cortex that controls eye movements; iii) blue circle: auditory cortex that controls the audition; iv) green star: somatosensory cortex that controls the sense of touch; v) red diamond: visual cortex that controls the sense of sight. The detected result is presented as the topographic map of the brain where we directly treat the dominant eigenvector µ as the community label. The blue region corresponds to the negative terms of µ while the yellow region corresponds to the positive terms, where the darkness indicates the magnitude of corresponding label value.
function region division to the figures, we find the detection results reach a surprising match with the widely accepted brain functional partition. The visual cortex (red diamond), prefrontal cortex (purple triangle) and the premotor cortex (white square) share the same community while the auditory cortex which is denoted with blue circles belongs to the other community. The former is more or less relevant to the visual and attention while the latter is closely related with audition. The results coincide with the clinical experience that the visual and audition always demonstrate relatively strong divergence and interaction. Moreover, from the colorbar attached we can notice the magnitude of continuous community label of the blue part which corresponds to visual brain region is much higher than that of the yellow part which refers to the auditory region. The magnitude of the continuous label indicates the contribution of the node to the global modularity value, which can imply how active the region is during the test. Therefore, we see the visual region is much more active than the auditory region, which coincides our intuition.
To sum up, this experiment on EEG network shows encouraging results about the feasibility of the proposed method on empirical networks. It also provides a new direction of the application of the proposed method and similar approaches.

CONCLUSION
In this paper, we discussed the representation of multilayer networks with multiple aspects and then derived the multilayer modularity based on the assumption of the contribution of the edges and couplings. According to the derivation, we demonstrate that the modularity prefers the community structure where the edges and couplings are densely distributed within the communities. Then we proposed a spectral bisection method for optimization of the modularity based on the supra-adjacency representation. In the experiment section, we reported the performance of the proposed evaluation metric as the parameters change and the comparison result with some other baseline methods. We applied the proposed method on a two-aspect EEG network as an attempt of application, and the results tend to coincide with the functional region of the brain.