Community detection based on influence power
Applied Informatics volume 4, Article number: 8 (2017)
Abstract
Nowadays, more and more people use social networks to share their lives and communicate with each other. In a real social network, a person influences others and is influenced by them at the same time, so the status of a person in the network can be characterized by his influence power. In other words, a person with larger influence power plays a more important role and is more likely to act as the core of a community. Unlike most existing community detection algorithms, which concentrate on the topology of networks, we propose an algorithm based on influence power that discovers potential core members, from which the community structure can be revealed. Extensive experiments confirm that the proposed algorithm performs well in detecting communities in real social networks.
Introduction
With the development of network technology, more and more people use social platforms such as Twitter and Facebook every day to share their lives and follow what others share. Social platforms badly need to classify their users according to status, hobby or other factors, which helps them manage users in different communities and push advertising information precisely (Wang et al. 2014; Ding et al. 2017). Because of the rapid growth of social platforms, community detection in social networks has attracted a lot of attention in recent years (Xu et al. 2013; Zhang et al. 2017).
In both online and offline social networks, influence power plays an important role in social activities. A person who has great influence power can be the core of a closely related group, which is called a community. Every person in a social network is affected by his friends while he affects others. Motivated by these intuitive phenomena, we propose a novel algorithm based on influence power to discover potential core members in the social network, from which the community structure can be revealed.
Related work
Detecting communities is of great importance in disciplines such as sociology, biology and computer science, where systems are often represented as graphs (Fortunato 2010; Wang et al. 2013). In the past few decades, many algorithms have been developed for community detection because community is one of the most common and important properties to reveal the hidden structures of a network (Newman 2010).
One representative line of work builds on modularity (Newman 2006; Wang et al. 2016), a typical measure for evaluating community structure that has been widely used as an evaluation criterion in many algorithms. Generally, many of the existing community detection algorithms are based on the topological structure of the network, such as GN (Newman and Girvan 2004), CNM (Clauset et al. 2004) and Ratio Cut (Wei and Cheng 1991). They mainly focus on the distribution of nodes and how edges connect the nodes in the network. Among the topological-structure-based algorithms, there are many directions. For example, some algorithms focus on local topological structure, such as GCE (Lee et al. 2010), and some focus on higher-order features of network topology, such as motif-based algorithms (Benson et al. 2016).
With the development of community detection, it is revealed that real community distribution does not always obey the rules of topological structure. That is, the detection result strictly based on topological structure may have a huge deviation compared to the ground truth. To improve the accuracy of detection, some efforts have been made by taking into account the formation of the networks and communities when designing algorithms. For example, node attributes may affect the behavior of nodes, and then affect the connections between nodes (Yang et al. 2014). For social networks, human influence is an important factor in community formation, which means the human factors may help the detection result approach the true community distribution. To address the above issue, we designed a new algorithm based on influence power, which is an important factor in social activities. At the same time, we also take the topological rules into account and use modularity as criterion.
Existing community detection algorithms discover communities in one of two ways. One type divides all members into different communities directly. For example, the label propagation algorithm (LPA) (Raghavan et al. 2007) produces the detection result directly. The other type finds community cores first and then allocates the remaining members. Community detection algorithms developed from clustering algorithms mainly belong to this type, such as the OCDDP algorithm (Bai et al. 2016), developed from the density-peak algorithm (Rodriguez and Laio 2014). Our influence power based algorithm belongs to the second type and is also an iterative algorithm.
Our work
Different from other kinds of networks, a real social network is formed by humans, which means that social factors in human activities shape the network topology. It is therefore natural to ask which social factors affect the formation of the network. Inspired by this question, we observe that influence power in social activities can reveal the importance of a person in the network. Namely, a person with greater influence power in social activities will have more followers around, so the corresponding node in the network topology has a larger probability of being the core of a community.
Apart from the aforementioned basic idea, we make one further observation about influence power: the energy a person can spend on social contact is limited. If he is busy reading messages from others (being influenced more), he has little time to express himself or send messages to others (influencing others less). We therefore assume that the influence power of a person can be counteracted by the power of being influenced.
Accordingly, the change of influence power can be described in the following three phases. First, every node starts with an initial value as its initial influence power. In this part, a random initialization strategy based on degree centrality (Freeman 1978) is used. Second, every node influences its neighbors and is influenced by its neighbors, which yields the pure influence power. Third, as time evolves, if the pure influence power of a node is negative, it loses the chance to be a core and is ignored temporarily. Otherwise, the node is retained and its pure influence power in this iteration determines its influence power in the next iteration. In every iteration, the remaining nodes are chosen as the potential cores of distinct communities. As a temporary result, communities are detected by allocating non-core nodes to the communities associated with the potential cores. We always keep the community detection result with the largest modularity until convergence.
Compared with most of the existing algorithms, the proposed approach not only takes topological rules into account, but also introduces an important social factor, namely influence power. For this reason, our approach achieves good performance in detecting communities in real social networks, as confirmed by the extensive experiments in the “Experiments” section.
Community detection based on influence power
Preliminaries
On social network platforms, everyone influences others and is influenced by others at the same time. So we use BI\(_i\) (BeInfluenced) to measure how much user i is influenced by his neighbors and IP\(_i\) (InfluencePower) to measure how much user i can influence others. Assuming that these two variables can counteract each other, we define PP\(_i\) (PurePower) as the pure influence power of user i, representing the centrality of user i in the community. Figure 1 shows how IP and BI work in our model.
The input of our algorithm is an undirected graph \(G=(V,E)\) where V is the set of nodes and E is the set of edges. In addition, \(e=\{i,j\}\in E\) represents the edge between node i and node j. Some definitions are introduced as follows.
Definition 1
(Neighbors of Node i) Given an undirected graph \(G=(V,E)\), the neighbors of a node \(i \in V\) form the set N(i) of nodes directly linked with i, i.e., \(N(i) = \{j \in V \mid \{i,j\} \in E\}\).
Definition 2
(Social Circle of Node i) Given an undirected graph \(G=(V,E)\), social circle of a node \(i \in V\) is the set S(i) containing both node i and its neighbors, i.e., \(S(i) = N(i) \cup \{i\}\).
Given an undirected graph G, the intimate degree of two directly linked nodes, which is used in both the updating and allocation procedures of our algorithm, is not given directly. An intuitive way to quantify it is the Jaccard similarity (Jaccard 1912): if two nodes have more common neighbors, their relationship is closer and they are more similar in the network.
Definition 3
(Intimate Degree) The intimate degree of two linked nodes i and j is defined as
\[\omega _{ij} = \frac{|S(i) \cap S(j)|}{|S(i) \cup S(j)|}.\]
From the above equation, \(\omega _{ij}\) can be described as the ratio of the common friends of user i and user j within their social circles. For a weighted graph, the weighted Jaccard similarity can be used instead.
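The intimate degree can be illustrated with a minimal Python sketch. It assumes that the intimate degree is the plain Jaccard similarity of the two social circles S(i) and S(j); the function names and the toy graph are illustrative and not taken from the paper.

```python
def social_circle(adj, i):
    """S(i) = N(i) united with {i} (Definition 2)."""
    return adj[i] | {i}


def intimate_degree(adj, i, j):
    """Assumed form: Jaccard similarity of the social circles of i and j."""
    s_i, s_j = social_circle(adj, i), social_circle(adj, j)
    return len(s_i & s_j) / len(s_i | s_j)


# Toy undirected graph: a triangle 0-1-2 with a pendant node 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}

w01 = intimate_degree(adj, 0, 1)  # identical social circles {0, 1, 2}
w23 = intimate_degree(adj, 2, 3)  # circles {0, 1, 2, 3} and {2, 3}
print(w01, w23)
```

Nodes 0 and 1 share their whole social circle, so their intimate degree is maximal, while the pendant node 3 shares only half of the combined circle with node 2.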
Initialization
According to degree centrality, the degree of each node can be used to initialize IP directly, based on the assumption that a person with more neighbors is more likely to be the core of a community and thus has greater influence power. This initialization strategy gives a stable result, but it may lead to degenerate results. Another strategy is to initialize IP randomly. Though random initialization may lead to different results across runs, it avoids being trapped in degeneration, and acceptable performance can be found by running the algorithm several times. By combining the two strategies, a random initialization strategy based on degree centrality can be derived.
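The text does not state how randomness and degree are combined, so the following Python sketch shows only one possible reading: each node's degree is scaled by a random factor. The multiplicative form, the `noise` parameter and the function name are assumptions made for illustration.

```python
import random


def init_influence_power(adj, noise=0.5, seed=None):
    """Degree-based random initialization (assumed form).

    IP_i = degree(i) * (1 + u), with u drawn uniformly from [-noise, noise].
    Setting noise=0 recovers the plain degree initialization.
    """
    rng = random.Random(seed)
    return {i: len(nbrs) * (1.0 + rng.uniform(-noise, noise))
            for i, nbrs in adj.items()}


# Toy star graph: node 0 is linked to nodes 1 and 2.
adj = {0: {1, 2}, 1: {0}, 2: {0}}
ip = init_influence_power(adj, noise=0.5, seed=0)
print(ip)
```

Running the sketch with several seeds mimics the strategy of testing several times and keeping the best result.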
Update procedure
BI updating
In this procedure, we update the BeInfluenced value BI of every user in the network. Everyone in a social network is influenced by his neighbors. How much a user is influenced, measured by BI, depends on two factors: (1) the intimate degree between the user and his neighbors, and (2) the influence power of the user’s neighbors. These two factors are intuitive, since everyone is more easily influenced by his close friends, especially friends who have great influence power.
That is, the value BI\(_i\) for every user i can be updated as follows,
\[\text {BI}_i = \sum _{j \in N(i)} W_{ij}\, \text {IP}_j,\]
where \(W_{ij}\) represents the normalized intimate degree.
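A possible reading of the BI update is sketched below in Python. It assumes that BI of a user is the sum of the neighbors' IP values weighted by \(W_{ij}\), that the intimate degree is the Jaccard similarity of social circles, and that \(W_{ij}\) is obtained by dividing \(\omega _{ij}\) by the sum of \(\omega _{ik}\) over the neighbors k of i. The normalization in particular is an assumption of this sketch.

```python
def jaccard(adj, i, j):
    """Jaccard similarity of the social circles S(i) and S(j)."""
    s_i, s_j = adj[i] | {i}, adj[j] | {j}
    return len(s_i & s_j) / len(s_i | s_j)


def be_influenced(adj, ip):
    """BI_i = sum over neighbors j of W_ij * IP_j (assumed row normalization)."""
    bi = {}
    for i, nbrs in adj.items():
        omega = {j: jaccard(adj, i, j) for j in nbrs}
        total = sum(omega.values())
        # An isolated node has no neighbors, so the sum is empty and BI is 0.
        bi[i] = sum(omega[j] / total * ip[j] for j in nbrs)
    return bi


# Toy path graph 0 - 1 - 2 with degree used as the initial influence power.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
ip = {i: float(len(nbrs)) for i, nbrs in adj.items()}
bi = be_influenced(adj, ip)
print(bi)
```

In this toy graph the two end nodes are influenced only by the more powerful middle node, so their BI exceeds their own IP.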
PP updating
In this procedure, based on the aforementioned assumption that IP and BI can counteract each other, we calculate the value PP of every user i in the network as
\[\text {PP}_i = \text {IP}_i - \lambda \, \text {BI}_i,\]
where \({\lambda }\) is a factor that controls the relationship between IP and BI. Note that a pure influence power PP\(_i<0\) means user i is influenced by others more than he influences others, so he loses the possibility of being the central person of a community. For this reason, if PP\(_i<0\), user i and his relationships are temporarily ignored during the iteration until we select the potential cores of communities, as shown in Fig. 2.
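The counteraction and the temporary removal of nodes can be sketched in Python as follows. The sketch assumes the linear form PP = IP - λ·BI, which is how we read the description of \(\lambda\); the function names and the input values are illustrative.

```python
def pure_power(ip, bi, lam=1.0):
    """Assumed form: PP_i = IP_i - lam * BI_i."""
    return {i: ip[i] - lam * bi[i] for i in ip}


def prune(adj, pp):
    """Keep nodes with non-negative PP and drop edges to removed nodes."""
    keep = {i for i, value in pp.items() if value >= 0}
    return {i: adj[i] & keep for i in keep}


# Path graph 0 - 1 - 2 with hypothetical IP and BI values.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
ip = {0: 1.0, 1: 2.0, 2: 1.0}
bi = {0: 2.0, 1: 1.0, 2: 2.0}

pp = pure_power(ip, bi, lam=1.0)
remaining = prune(adj, pp)
print(pp, remaining)
```

Only the middle node keeps a non-negative pure power here, so it is the single potential core that survives this iteration.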
IP updating
After we obtain the value PP and temporarily remove the users that cannot be potential cores, we proceed to a new iteration. Before that, we renew the value IP. From our perspective, the influence power of a user may depend not only on himself but also on the influence of his neighbors, because someone with an influential friend may himself become more influential. Namely, the average influence power of a social circle may affect the influence power of every user in it. Based on this observation, we calculate the value IP\(_i\) for every user i as follows,
\[\text {IP}_i = \frac{1}{|S'(i)|} \sum _{j} \chi (\text {dis}_{ij} - \text {dis}\_c)\, \text {PP}_j, \tag{5}\]
where \(\chi (x) = 1\) if \(x\le 0\) and \(\chi (x)=0\) otherwise. dis\(_{ij}\) is the distance between user i and user j, while dis\(\_c\) is a distance threshold. For example, dis\(\_c=0\) means that we only consider the influence power of the user himself, and dis\(\_c=1\) means that we also take the influence power of the user’s direct friends into consideration. The choice of dis\(\_c\) depends on the specific network. \(|S'(i)|\) is the number of users within range dis\(\_c\) of user i. From Eq. (5), we can regard IP\(_i\) as the average pure power of the users within a specific range of user i.
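The IP update can be sketched in Python as an average of PP over all users within hop distance dis\(\_c\). The sketch assumes that distance means shortest-path length in the remaining graph and that user i himself (distance 0) is always counted; the function names are illustrative.

```python
from collections import deque


def within_distance(adj, source, dis_c):
    """Nodes whose hop distance from source is at most dis_c (BFS)."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if dist[u] == dis_c:
            continue  # do not expand beyond the threshold
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return set(dist)


def update_influence_power(adj, pp, dis_c=0):
    """IP_i = average PP over the nodes within distance dis_c of i."""
    ip = {}
    for i in adj:
        ball = within_distance(adj, i, dis_c)
        ip[i] = sum(pp[j] for j in ball) / len(ball)
    return ip


# Path graph 0 - 1 - 2 with hypothetical pure power values.
adj = {0: {1}, 1: {0, 2}, 2: {1}}
pp = {0: 1.0, 1: 4.0, 2: 1.0}

ip0 = update_influence_power(adj, pp, dis_c=0)
ip1 = update_influence_power(adj, pp, dis_c=1)
print(ip0, ip1)
```

With dis\(\_c=0\) the new IP simply equals the current PP, while larger thresholds smooth the values over a wider neighborhood.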
Terminal condition and allocation
The above updating procedure iterates until convergence. In particular, if in one iteration no node is cut off and PP changes little (i.e., the change is smaller than a very small threshold), the iteration has converged and is terminated. In every iteration, we regard the remaining nodes as potential cores. Jaccard similarity is used to allocate each non-core node to the most similar potential core, which yields a community detection result. Modularity is calculated for every detection result as our criterion, and the largest modularity together with the corresponding community detection result is recorded until convergence.
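The allocation step and the modularity criterion can be sketched in Python as follows. The sketch assumes that similarity between a node and a core is the Jaccard similarity of their social circles and that ties are broken by the order of the core list; how nodes with zero similarity to every core are handled is not specified in the text, and here they simply fall to the first core. Modularity is the standard Newman Q value.

```python
def jaccard(adj, i, j):
    """Jaccard similarity of the social circles S(i) and S(j)."""
    s_i, s_j = adj[i] | {i}, adj[j] | {j}
    return len(s_i & s_j) / len(s_i | s_j)


def allocate(adj, cores):
    """Label every node with its most similar potential core."""
    label = {c: c for c in cores}
    for v in adj:
        if v not in label:
            label[v] = max(cores, key=lambda c: jaccard(adj, v, c))
    return label


def modularity(adj, label):
    """Q = sum over communities of L_c / m - (d_c / 2m)^2."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    internal, degree = {}, {}
    for u, nbrs in adj.items():
        c = label[u]
        degree[c] = degree.get(c, 0) + len(nbrs)
        # Every internal edge is counted twice, once from each endpoint.
        internal[c] = internal.get(c, 0) + sum(1 for v in nbrs if label[v] == c)
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2 - 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
label = allocate(adj, cores=[2, 3])
q = modularity(adj, label)
print(label, q)
```

With nodes 2 and 3 as potential cores, each triangle is recovered as one community.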
The overall algorithm
In summary, the proposed method is an iterative method for detecting communities in social networks. In every iteration, we calculate BI, PP and IP of all nodes in the network. When the PP value of a node is negative, the node is cut off temporarily. After many iterations, the remaining nodes are more likely to be the core nodes of communities. At the end of every iteration, non-core nodes are allocated to core nodes, leading to a community detection result. The modularity of each detection result is calculated, and the best result is retained until the iterations terminate. Algorithm 1 shows the procedure of the whole method based on influence power.
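The whole iteration can be sketched end to end in Python. This is not a transcription of Algorithm 1 but a simplified reading of the prose under several assumptions: plain degree initialization, dis\(\_c=0\) (so the next IP equals the current PP), Jaccard similarity of social circles computed on the original graph, weights normalized over the remaining neighbors, and PP = IP - λ·BI. The names `detect`, `tol` and `max_iter` are illustrative.

```python
def jaccard(adj, i, j):
    """Jaccard similarity of the social circles S(i) and S(j)."""
    s_i, s_j = adj[i] | {i}, adj[j] | {j}
    return len(s_i & s_j) / len(s_i | s_j)


def modularity(adj, label):
    """Newman modularity Q of a partition given as node -> community id."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    q = 0.0
    for c in set(label.values()):
        members = [u for u in adj if label[u] == c]
        degree = sum(len(adj[u]) for u in members)
        internal = sum(1 for u in members for v in adj[u] if label[v] == c)
        q += internal / two_m - (degree / two_m) ** 2
    return q


def detect(adj, lam=1.0, tol=1e-6, max_iter=100):
    """Return the best (labels, Q) found while pruning non-core nodes."""
    alive = set(adj)
    ip = {i: float(len(adj[i])) for i in adj}  # degree initialization
    best_q, best_label, prev_pp = float("-inf"), None, None
    for _ in range(max_iter):
        pp = {}
        for i in alive:
            nbrs = adj[i] & alive  # pruned nodes are ignored
            omega = {j: jaccard(adj, i, j) for j in nbrs}
            total = sum(omega.values())
            bi = sum(omega[j] / total * ip[j] for j in nbrs)
            pp[i] = ip[i] - lam * bi
        survivors = {i for i in alive if pp[i] >= 0}
        if not survivors:
            break
        cores = sorted(survivors)
        # Allocate every non-core node to its most similar potential core.
        label = {v: v if v in survivors
                 else max(cores, key=lambda c: jaccard(adj, v, c))
                 for v in adj}
        q = modularity(adj, label)
        if q > best_q:
            best_q, best_label = q, label
        # Converged: no node was cut off and PP barely changed.
        if survivors == alive and prev_pp is not None and all(
                abs(pp[i] - prev_pp[i]) < tol for i in survivors):
            break
        alive, prev_pp = survivors, pp
        ip = {i: pp[i] for i in survivors}  # dis_c = 0: next IP is current PP
    return best_label, best_q


# Two triangles {0, 1, 2} and {3, 4, 5} joined by the bridge 2 - 3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
       3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels, q = detect(adj)
print(labels, q)
```

On this toy graph the first iteration already prunes the four degree-2 nodes and leaves the two bridge endpoints as cores, which yields the two triangles as communities. Exact comparisons with zero are sensitive to floating-point rounding, so a practical implementation would likely add a small tolerance when testing the sign of PP.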
Experiments
Experimental setting
To conduct parameter analysis and test the performance of the proposed method, both synthetic networks and real social networks are used. For comparison, three classical community detection methods, namely FastNewman (Newman 2003), Louvain (Blondel et al. 2008) and Infomap (Rosvall and Bergstrom 2007), are used. For evaluation, both modularity (i.e., Q value) and NMI (Normalized Mutual Information) (Danon et al. 2005) are used to measure the community detection results.
Synthetic networks
We use LFR benchmark (Lancichinetti et al. 2008) to generate synthetic networks, which has been widely used in community detection.
To test the performance of the proposed method on networks of different scales, we generate networks with the number of nodes ranging in (20, 2000) and the average degree k ranging in (3, 15). As for the other parameters, the number of overlapping nodes is set to 0 because the proposed method is not an overlapping community detection method. Besides, the mixing parameter \(\mu\) is set to 0.1, which indicates that the generated networks have relatively clear community structure. The configurations of the LFR datasets are listed in Table 1.
Real social networks
Different from synthetic networks, every real social network has its unique background, which may affect the final community distribution. In other words, a community in a real social network is determined not only by topological structure but also by its background and other factors. Six real social networks are used for the comparison experiments, namely Karate (Zachary 1977), Dolphin (Lusseau et al. 2003), Polbooks (Krebs 2017), Football (Girvan and Newman 2001), Jazz (Gleiser and Danon 2003) and Science (Newman 2006), the first four of which have ground-truth community labels. Because of the effect of background mentioned above, we introduce the background of each network according to Newman (2017).

Karate: a social network of friendships between 34 members of a karate club at a US university in the 1970s.
Dolphin: an undirected social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand.
Polbooks: a network of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon.com. Edges between books represent frequent co-purchasing of books by the same buyers.
Football: a network of American football games between Division IA colleges during regular season Fall 2000.
Jazz: a social network of jazz musicians.
Science: a co-authorship network of scientists working on network theory and experiment.
Parameter analysis
In this section, we analyze the effect of the two parameters \(\lambda\) and \(\text {dis}\_c\) on six LFR networks. When analyzing one parameter, the other parameter is fixed and degree is used to initialize IP directly.
Parameter \(\lambda\)
The parameter \(\lambda\) balances how much BI can counteract IP. With larger \(\lambda\), more of a person’s influence power is counteracted. For each LFR dataset, we vary \(\lambda\) from 0.1 to 1.0 with step 0.1 and record the result. In this parameter analysis, \(\text {dis}\_c\) is set to 0. Figure 3 shows the change of Q value, NMI and number of iterations with different \(\lambda\) on different LFR datasets. Though no single value of \(\lambda\) always performs best, some patterns emerge. First, in most situations, Q value and NMI are larger with larger \(\lambda\), though not without exception. Second, Q value and NMI change synchronously, which implies that the two criteria have similar features in some respects and that, for synthetic networks, Q value can reliably evaluate the community detection result without ground truth. Third, the red curve, which represents the number of iterations needed to reach convergence, shows that the number of iterations increases several times over as \(\lambda\) becomes smaller. Based on this analysis, we set \(\lambda = 1\) in the comparison experiments, for fewer iterations and better performance.
Parameter \(\text {dis}\_c\)
According to “Community detection based on influence power” section, IP of a user in the next iteration can be calculated by the PP values of himself and his friends in a threshold distance \(\text {dis}\_c\) in this iteration. For each LFR dataset, we set \(\text {dis}\_c\) from 0 to 5 with step 1 and get the result. In this parameter analysis, \(\lambda\) is set to 1.
As shown in Fig. 4, the performance of our algorithm tends to worsen as \(\text {dis}\_c\) becomes larger, though not without exception. Unlike in the experiments on \(\lambda\), the blue and green curves show that with \(\text {dis}\_c=0\) our algorithm always performs better than with other \(\text {dis}\_c\) values on these six LFR networks. The red curve shows that different values of \(\text {dis}\_c\) have little influence on the number of iterations. Based on this analysis, we set \(\text {dis}\_c=0\) in the comparison experiments.
Comparison experiments
In the comparison experiments, we use the six LFR datasets and six real social networks mentioned above to see how the proposed algorithm performs compared to three classical algorithms, namely FastNewman, Louvain and Infomap. To explore the potential of our algorithm, we use the random initialization strategy based on degree centrality to initialize IP, run the algorithm several times and record the best results. Table 2 shows the results of our proposed method compared with the other three classical algorithms.
For networks with ground truth, though we measure both Q value and NMI, we care more about NMI because this criterion is based on ground truth. In other words, a larger NMI indicates that the communities found by the algorithm are closer to the real communities. For networks without ground truth, we can only use Q value to evaluate the performance of algorithms.
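To make the NMI criterion concrete, the following Python sketch computes it for two label lists. It assumes the common normalization NMI = 2I(A;B) / (H(A) + H(B)), which is how we read the measure of Danon et al. (2005); the function name and the convention of returning 1 when both partitions are trivial are choices of this sketch.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information 2 * I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(n_ab / n * log(n_ab * n / (count_a[a] * count_b[b]))
                 for (a, b), n_ab in joint.items())
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node in one community
    return 2 * mutual / (h_a + h_b)


truth = [0, 0, 1, 1]
same_up_to_renaming = [1, 1, 0, 0]
unrelated = [0, 1, 0, 1]
print(nmi(truth, same_up_to_renaming), nmi(truth, unrelated))
```

The measure ignores how communities are named: a detected partition that matches the ground truth up to relabeling scores 1, while a partition carrying no information about the ground truth scores 0.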
LFR network analysis
As shown in Table 2, because LFR networks strictly obey the topological rules and are generated without any social background, the proposed influence power based algorithm does not perform the best in most cases compared to the topological structure based algorithms. But since we take topological rules into consideration and use modularity as the criterion, the results of our algorithm are always at a relatively high level and are never the worst among the four algorithms. So for synthetic networks, we consider the performance of our algorithm competitive.
Real social network analysis
According to the results on real social networks, the proposed influence power based algorithm performs better in most cases from the perspective of NMI. More specifically, it achieves better NMI than the second best algorithm on Karate (by 0.301), Dolphin (by 0.002) and Football (by 0.011). Though the proposed algorithm performs worse than Infomap on Polbooks (by 0.101), it is still the second best there, and the background of this dataset is the co-purchasing of books rather than direct human relationships. From the perspective of Q value, our method is also competitive among all compared algorithms. From these results, we conclude that the proposed algorithm performs better on real social networks than the other three classical algorithms in our experiments.
The overall analysis
In LFR networks, an algorithm always achieves the maximum Q value and the maximum NMI at the same time. However, in real social networks, an algorithm often achieves the maximum NMI without the maximum Q value. For example, on Karate, the proposed algorithm obtains NMI = 1, which means the detected communities exactly match the ground truth, but its Q value is smaller than those of FastNewman and Infomap. We thus encounter the problem that a larger Q value does not indicate a larger NMI in real social networks, which is opposite to the conclusion drawn from the results on LFR networks. To explain this, we re-emphasize the difference between synthetic networks and real social networks, namely the social factors in the background of the datasets. The LFR networks are generated with mixing parameter \(\mu =0.1\), which means that their community structures are very clear and easy to detect. Real social networks, however, form their community structure under a unique background and do not strictly obey the topological structure theory, which means that a high Q value does not directly indicate a good detection result. From this point of view, to detect communities in real social networks with high accuracy, community detection should take into account the social factors that affect the formation of the network. Otherwise, if only topological structure is taken into account, the Q value of the result may be very large while the algorithm divides the network in a wrong way, which leads to small NMI values. For example, the Infomap algorithm always gets the maximum Q value in real social networks, but its NMI values are smaller than those of our algorithm on Karate, Dolphin and Football.
Influence power, the starting point of the proposed algorithm, affects people in their daily social activities and thereby leads to the formation of communities. From the experimental results and the background of the real social networks, we conclude that our algorithm has great potential and performs well in detecting communities in real social networks whose background is friendship or direct human social behavior, such as Karate and Football, compared with methods that only consider topological structure.
Conclusions
In this paper, we have proposed a community detection algorithm based on influence power from a novel perspective, taking both social factors and topological structure into account. We have described the initialization, iteration and allocation procedures of our algorithm and conducted experiments on both synthetic networks and real social networks. Based on the results, we have analyzed why our algorithm has the potential to perform better on real social networks than comparison algorithms based only on topological structure. In short, many human factors play an important part in the formation of social networks, which implies that these networks do not fully follow the existing topological structure theory. To detect communities in social networks more precisely, it is important to design algorithms based on human factors such as influence power in the future.
References
Bai X, Yang P, Shi X (2016) An overlapping community detection algorithm based on density peaks. Neurocomputing 226:7–15
Benson AR, Gleich DF, Leskovec J (2016) Higher-order organization of complex networks. Science 353(6295):163–166
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):155–168
Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6 Pt 2):066111
Danon L, Díaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp 2005(09):09008
Ding Y, Huang L, Wang CD, Huang D (2017) Community detection in graph streams by pruning zombie nodes. PAKDD. Springer, Berlin, pp 574–585
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Freeman LC (1978) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239
Girvan M, Newman MEJ (2001) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–6
Gleiser PM, Danon L (2003) Community structure in jazz. Adv Compl Syst 06(4):565–573
Jaccard P (1912) The distribution of the flora in the alpine zone. New Phytol 11(2):37–50
Krebs V (2017) Social Network Analysis software & services for organizations, communities, and their consultants. http://www.orgnet.com/. Accessed 10 April 2017
Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(2):046110
Lee C, Reid F, McDaid A, Hurley N (2010) Detecting highly overlapping community structure by greedy clique expansion. PLoS ONE 6(4):e18961
Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM (2003) The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav Ecol Sociobiol 54(4):396–405
Newman ME (2003) Fast algorithm for detecting community structure in networks. Phys Rev E Stat Nonlinear Soft Matter Phys 69(6 Pt 2):066133
Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E Stat Nonlinear Soft Matter Phys 74(3 Pt 2):036104
Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577
Newman M (2010) Networks: an introduction. Oxford University Press, Inc., Oxford, pp 741–743
Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2 Pt 2):026113
Newman M (2017) Network data. http://www-personal.umich.edu/~mejn/netdata/. Accessed 10 April 2017
Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(2):036106
Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–6
Rosvall M, Bergstrom CT (2007) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci 105(4):1118–1123
Wang CD, Lai JH, Yu PS (2014) NEIWalk: community discovery in dynamic content-based networks. IEEE Trans Knowl Data Eng 26(7):1734–1748
Wang CD, Lai JH, Yu PS (2013) Dynamic community detection in weighted graph streams. In: Proceedings of the 2013 SIAM International Conference on Data Mining, SIAM, New Delhi, pp. 151–161
Wang X, Wang CD, Lai JH (2016) Modularity optimization by global–local search. In: IJCNN, pp. 840–846
Wei YC, Cheng CK (1991) Ratio cut partitioning for hierarchical designs. IEEE Trans Comput Aided Design Integr Circuits Syst 10(7):911–921
Xu H, Hu Y, Wang Z, Ma J, Xiao W (2013) Core-based dynamic community detection in mobile social networks. Entropy 15(12):5419–5438
Yang J, McAuley J, Leskovec J (2014) Community detection in networks with node attributes. In: ICDM, pp 1151–1156
Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
Zhang H, Wang CD, Lai JH, Yu PS (2017) Modularity in complex multilayer networks with multiple aspects: a static perspective. Appl Inform 4(1):7
Authors' contributions
The authors discussed the problem and the solutions proposed all together. All authors participated in drafting and revising the final manuscript. All authors read and approved the final manuscript.
Acknowledgements
This work was supported by NSFC (61502543), Guangdong Natural Science Funds for Distinguished Young Scholar (2016A030306014), and Tiptop Scientific and Technical Innovative Youth Talents of Guangdong special support program (2016TQ03X542).
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
Not applicable.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 Modularity
 Community detection
 Social network
 Influence power