Community detection based on influence power
 Wei Shi^{1},
 ChangDong Wang^{1} and
 JianHuang Lai^{1}
Received: 28 August 2017
Accepted: 25 September 2017
Published: 5 October 2017
Abstract
Nowadays, more and more people use social networks to share their lives and communicate with each other. In a real social network, a person is influenced by others and influences others at the same time, so the status of a person in the network can be determined by his or her influence power. In other words, a person with larger influence power plays a more important role and is more likely to act as the core of a community. Unlike most existing community detection algorithms, which concentrate on the topology of networks, we propose an algorithm based on influence power that discovers potential core members, from which the community structure can be revealed. Extensive experiments confirm that the proposed algorithm performs well in detecting communities in real social networks.
Keywords
Introduction
With the development of network technology, more and more people use social platforms such as Twitter and Facebook every day to share their lives and follow what others share. Social platforms need to classify their users according to status, hobbies or other factors, which helps them manage users in different communities and target advertising precisely (Wang et al. 2014; Ding et al. 2017). Because of the rapid growth of social platforms, community detection in social networks has attracted a lot of attention in recent years (Xu et al. 2013; Zhang et al. 2017).
In both online and offline social networks, influence power plays an important role in social activities. A person with great influence power can be the core of a closely connected group, which is called a community. Every person in a social network is affected by his or her friends while also affecting others. Motivated by these intuitive phenomena, we propose a novel algorithm based on influence power to discover potential core members in the social network, from which the community structure can be revealed.
Related work
Detecting communities is of great importance in disciplines such as sociology, biology and computer science, where systems are often represented as graphs (Fortunato 2010; Wang et al. 2013). In the past few decades, many algorithms have been developed for community detection because community is one of the most common and important properties to reveal the hidden structures of a network (Newman 2010).
One representative line of work is built on modularity (Newman 2006; Wang et al. 2016), a typical measure for evaluating community structure that has been widely used as an evaluation criterion in many algorithms. Many existing community detection algorithms are based on the topological structure of the network, such as GN (Newman and Girvan 2004), CNM (Clauset et al. 2004) and Ratio Cut (Wei and Cheng 1991). They mainly focus on the distribution of nodes and on how edges connect the nodes in the network. Topological-structure-based algorithms follow several directions. For example, some focus on local topological structure, such as GCE (Lee et al. 2010), and some focus on higher-order features of network topology, such as motif-based algorithms (Benson et al. 2016).
With the development of community detection, it has become clear that real community distributions do not always obey the rules of topological structure. That is, a detection result based strictly on topological structure may deviate substantially from the ground truth. To improve detection accuracy, some efforts have taken the formation of networks and communities into account when designing algorithms. For example, node attributes may affect the behavior of nodes, and in turn the connections between nodes (Yang et al. 2014). In social networks, human influence is an important factor in community formation, so human factors may help the detection result approach the true community distribution. To address this issue, we design a new algorithm based on influence power, an important factor in social activities. At the same time, we also take topological rules into account and use modularity as the criterion.
Existing community detection algorithms discover communities in two ways. The first divides all members into communities directly; for example, the label propagation algorithm (LPA) (Raghavan et al. 2007) produces the detection result directly. The second finds community cores first and then allocates the remaining members. Community detection algorithms developed from clustering algorithms mainly belong to this type, such as the OCDDP algorithm (Bai et al. 2016), developed from the density-peak algorithm (Rodriguez and Laio 2014). Our influence-power-based algorithm belongs to the second type and is also iterative.
Our work
Unlike other kinds of networks, a real social network is formed by humans, which means that social factors in human activities shape the network topology. It is natural to ask which social factors affect the formation of the network. Inspired by this question, we observe that influence power in social activities reveals the importance of a person in the network. Namely, a person with greater influence power in social activities has more followers, so the corresponding node in the network topology has a larger probability of being the core of a community.
Beyond this basic idea, we make one further observation about influence power: the energy a person can spend on social contact is limited. If a person is busy reading messages from others (being influenced more), he or she has little time to express himself or herself or to send messages to others (influencing others less). We therefore assume that the influence power of a person can be counteracted by the power of being influenced.
Accordingly, the change of influence power can be described in the following three phases. First, every node starts with an initial value as its initial influence power. In this part, a random initialization strategy based on degree centrality (Freeman 1978) is used. Second, every node influences its neighbors and is influenced by them, which yields the pure influence power. Third, as time evolves, if the pure influence power of a node is negative, it loses the chance to be a core and is ignored temporarily. Otherwise, the node is retained, and its pure influence power in this iteration determines its influence power in the next iteration. In every iteration, the remaining nodes are chosen as the potential cores of distinct communities. As a temporary result, communities are detected by allocating non-core nodes to the communities associated with the potential cores. The community detection result with the largest modularity is saved until convergence.
Compared with most existing algorithms, the proposed approach not only takes topological rules into account but also introduces an important social factor, namely influence power. For this reason, our approach performs well in detecting communities in real social networks, as confirmed by the extensive experiments in the “Experiments” section.
Community detection based on influence power
Preliminaries
In social network platforms, everyone influences others and is influenced by others at the same time. So we use BI\(_i\) (BeInfluenced) to measure how much user i is influenced by his or her neighbors and IP\(_i\) (InfluencePower) to measure how much user i can influence others. Assuming that these two variables can counteract each other, we define PP\(_i\) (PurePower) as the pure influence power of user i, which represents the centrality of user i in the community. Figure 1 shows how IP and BI work in our model.
The input of our algorithm is an undirected graph \(G=(V,E)\) where V is the set of nodes and E is the set of edges. In addition, \(e=\{i,j\}\in E\) represents the edge between node i and node j. Some definitions are introduced as follows.
Definition 1
(Neighbors of Node i) Given an undirected graph \(G=(V,E)\), the neighbors of a node \(i \in V\) form the set N(i) of nodes directly linked with i, i.e., \(N(i) = \{j \in V \mid \{i,j\} \in E\}\).
Definition 2
(Social Circle of Node i) Given an undirected graph \(G=(V,E)\), social circle of a node \(i \in V\) is the set S(i) containing both node i and its neighbors, i.e., \(S(i) = N(i) \cup \{i\}\).
An undirected graph G does not by itself quantify the degree of intimacy between two directly linked nodes, which is needed in both the updating and the allocation procedures of our algorithm. An intuitive way to solve this problem is the Jaccard similarity (Jaccard 1912): if two nodes have more common neighbors, their relationship is closer and they are more similar in the network.
Definition 3
From the above equation, \(\omega _{ij}\) can be described as the ratio of common friends of user i and user j in their social circles. For weighted graphs, the weighted Jaccard similarity can be used instead.
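A minimal sketch of this similarity, assuming it is the standard Jaccard index taken over the social circles of Definition 2, i.e. \(\omega _{ij} = |S(i) \cap S(j)| / |S(i) \cup S(j)|\). This exact form is an assumption, and the helper names build_graph, social_circle and jaccard are illustrative only.

```python
def build_graph(edges):
    """Build the neighbor sets N(i) of an undirected graph from an edge list."""
    graph = {}
    for i, j in edges:
        graph.setdefault(i, set()).add(j)
        graph.setdefault(j, set()).add(i)
    return graph


def social_circle(graph, i):
    """S(i) = N(i) united with {i} (Definition 2)."""
    return graph[i] | {i}


def jaccard(graph, i, j):
    """Assumed form of omega_ij: |S(i) & S(j)| / |S(i) | S(j)|."""
    s_i, s_j = social_circle(graph, i), social_circle(graph, j)
    return len(s_i & s_j) / len(s_i | s_j)


# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
g = build_graph([(1, 2), (1, 3), (2, 3), (3, 4)])
print(jaccard(g, 1, 2))  # S(1) = S(2) = {1, 2, 3}, so the ratio is 1.0
print(jaccard(g, 3, 4))  # S(3) = {1, 2, 3, 4}, S(4) = {3, 4}, so the ratio is 0.5
```

Using social circles rather than bare neighbor sets means that two linked nodes always share at least themselves, so the similarity of an existing edge is never zero.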
Initialization
Update procedure
BI updating
In this procedure, we update the BeInfluenced value BI of every user in the network. The underlying observation is that everyone in a social network is influenced by his or her neighbors. BI depends on two factors: (1) the degree of intimacy between a person and his or her neighbors, and (2) the influence power of those neighbors. These two factors are intuitive: a person is more easily influenced by close friends, especially friends who have great influence power.
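One way to combine the two factors is a similarity-weighted sum of the neighbors' influence power, \(\text {BI}_i = \sum _{j \in N(i)} \omega _{ij}\,\text {IP}_j\). The sketch below uses that form purely as an assumption; it is not taken from the paper, and the function name update_bi is hypothetical. IP is initialized with the node degree, as in the parameter analysis.

```python
def jaccard(graph, i, j):
    """Jaccard similarity of the social circles S(i) = N(i) | {i}."""
    s_i, s_j = graph[i] | {i}, graph[j] | {j}
    return len(s_i & s_j) / len(s_i | s_j)


def update_bi(graph, ip):
    """Assumed rule: BI_i is the sum over neighbors j of omega_ij * IP_j."""
    return {
        i: sum(jaccard(graph, i, j) * ip[j] for j in neighbors)
        for i, neighbors in graph.items()
    }


# Triangle 1-2-3 with a pendant node 4 attached to node 3.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
ip = {i: float(len(neighbors)) for i, neighbors in graph.items()}  # degree init
bi = update_bi(graph, ip)
print(bi)
```

Under this assumed rule, node 4 receives BI = 0.5 * 3 = 1.5 from its single, highly connected neighbor, while node 1 receives 1.0 * 2 + 0.75 * 3 = 4.25 from its two close neighbors.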
PP updating
IP updating
Terminal condition and allocation
The above updating procedure iterates until convergence. In particular, if in one iteration no node is cut off and PP changes little (i.e., the change is smaller than a very small threshold), the iteration has converged and is terminated. In every iteration, we regard the remaining nodes as potential cores. Jaccard similarity is used to allocate each non-core node to the most similar potential core, which gives a community detection result. Modularity is calculated for every detection result as our criterion, and the largest modularity and the corresponding community detection result are recorded until convergence.
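The allocation and bookkeeping steps can be sketched as follows. The sketch assumes Jaccard similarity over social circles, the standard Newman modularity for an unweighted undirected graph, and a deterministic tie-break (smallest core id) that the text does not specify. The names allocate, modularity and best_partition are illustrative, and the candidate core sets are supplied by hand rather than produced by the update procedure.

```python
def jaccard(graph, i, j):
    """Jaccard similarity of the social circles S(i) = N(i) | {i}."""
    s_i, s_j = graph[i] | {i}, graph[j] | {j}
    return len(s_i & s_j) / len(s_i | s_j)


def allocate(graph, cores):
    """Label every node with its most similar core; a core labels itself."""
    ordered = sorted(cores)  # fixed order, so ties go to the smallest core id
    return {
        node: (node if node in cores
               else max(ordered, key=lambda c: jaccard(graph, node, c)))
        for node in graph
    }


def modularity(graph, labels):
    """Q = sum over communities c of e_c / m - (d_c / 2m)^2."""
    two_m = sum(len(neighbors) for neighbors in graph.values())
    intra, degree = {}, {}
    for i, neighbors in graph.items():
        c = labels[i]
        degree[c] = degree.get(c, 0) + len(neighbors)
        # Each intra-community edge is counted once from each endpoint.
        intra[c] = intra.get(c, 0) + sum(1 for j in neighbors if labels[j] == c)
    return sum(intra[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def best_partition(graph, candidate_core_sets):
    """Keep the allocation with the largest modularity seen so far."""
    best_q, best_labels = float("-inf"), None
    for cores in candidate_core_sets:
        labels = allocate(graph, cores)
        q = modularity(graph, labels)
        if q > best_q:
            best_q, best_labels = q, labels
    return best_q, best_labels


# Two triangles (1, 2, 3) and (4, 5, 6) joined by the edge 3-4.
graph = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
q, labels = best_partition(graph, [{3}, {2, 5}])
print(round(q, 3), labels)
```

With the single core {3} every node falls into one community and Q is 0; with the cores {2, 5} the two triangles are separated and Q rises to 5/14, so that result is the one kept.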
The overall algorithm
Experiments
Experimental setting
To conduct parameter analysis and test the performance of the proposed method, both synthetic networks and real social networks are used. For comparison, classical community detection methods such as FastNewman (Newman 2003), Louvain (Blondel et al. 2008) and Infomap (Rosvall and Bergstrom 2007) are used. For evaluation, both modularity (i.e., Q value) and NMI (Normalized Mutual Information) (Danon et al. 2005) are used to measure the community detection results.
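For reference, NMI between a detected partition and the ground truth can be computed as in the following sketch. It assumes the common normalization \(2I(A;B)/(H(A)+H(B))\) and non-overlapping communities given as two label lists in the same node order; the function name nmi is illustrative.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information 2 I(A;B) / (H(A) + H(B))."""
    n = len(labels_a)
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    mi = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a + h_b == 0:
        return 1.0  # both partitions put every node into a single community
    return 2 * mi / (h_a + h_b)


truth = [0, 0, 1, 1]
print(nmi(truth, [1, 1, 0, 0]))  # same grouping under different label names
print(nmi(truth, [0, 1, 0, 1]))  # grouping unrelated to the ground truth
```

The first call yields an NMI of 1 (up to floating-point error) because NMI ignores label names, and the second yields 0 because the two partitions are statistically independent.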
Synthetic networks
We use the LFR benchmark (Lancichinetti et al. 2008), which has been widely used in community detection, to generate synthetic networks.
Table 1 LFR networks configurations
Network  #Nodes  #Edges  Avg. degree
LFR1     20      25      2.5
LFR2     50      89      3.56
LFR3     100     220     4.4
LFR4     1000    7773    15.55
LFR5     1500    11415   15.22
LFR6     2000    15480   15.48
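The average-degree column follows from the other two columns, since an undirected graph with \(|V|\) nodes and \(|E|\) edges has average degree \(2|E|/|V|\). A short check, with the table values copied into a dictionary (the variable names are illustrative):

```python
# (nodes, edges, reported average degree) for each LFR network in the table.
lfr = {
    "LFR1": (20, 25, 2.5),
    "LFR2": (50, 89, 3.56),
    "LFR3": (100, 220, 4.4),
    "LFR4": (1000, 7773, 15.55),
    "LFR5": (1500, 11415, 15.22),
    "LFR6": (2000, 15480, 15.48),
}


def average_degree(nodes, edges):
    """Every undirected edge contributes to the degree of two nodes."""
    return 2 * edges / nodes


for name, (nodes, edges, reported) in lfr.items():
    print(name, round(average_degree(nodes, edges), 2), reported)
```

All six reported values agree with \(2|E|/|V|\) to two decimal places; LFR4 is the only rounded entry (15.546 reported as 15.55).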
Real social networks

Karate: a social network of friendships between 34 members of a karate club at a US university in the 1970s.
Dolphin: an undirected social network of frequent associations between 62 dolphins in a community living off Doubtful Sound, New Zealand.
Polbooks: a network of books about US politics published around the time of the 2004 presidential election and sold by the online bookseller Amazon.com. Edges between books represent frequent co-purchasing of books by the same buyers.
Football: a network of American football games between Division IA colleges during the regular season of Fall 2000.
Jazz: a social network of jazz musicians.
Science: a coauthorship network of scientists working on network theory and experiment.
Parameter analysis
In this section, we analyze the effect of the two parameters \(\lambda\) and \(\text {dis}\_c\) on the six LFR networks. When one parameter is analyzed, the other is fixed, and degree is used to initialize IP directly.
Parameter \(\lambda\)
The parameter \(\lambda\) is used to balance how much BI can counteract IP. With larger \(\lambda\), more of a person's influence power is counteracted. For each LFR dataset, we vary \(\lambda\) from 0.1 to 1.0 with step 0.1 and record the result. In this parameter analysis, \(\text {dis}\_c\) is set to 0. Figure 3 shows how Q value, NMI and the number of iterations change with \(\lambda\) on the different LFR datasets. Although no single value of \(\lambda\) always performs best, some regularities can be observed. First, in most situations, Q value and NMI tend to be larger for larger \(\lambda\), though not without exception. Second, Q value and NMI change almost synchronously, which suggests that the two criteria share similar features and that, for synthetic networks, Q value can evaluate the community detection result well without ground truth. Third, the red curve, which shows the number of iterations needed to reach convergence, indicates that the iteration number increases several times over as \(\lambda\) becomes smaller. Based on this analysis, we set \(\lambda\) = 1 in the comparison experiments, for fewer iterations and better performance.
Parameter \(\text {dis}\_c\)
As shown in Fig. 4, the performance of our algorithm generally degrades as \(\text {dis}\_c\) becomes larger, though not without exception. Unlike in the experiments on \(\lambda\), the blue and green curves clearly show that \(\text {dis}\_c=0\) gives better performance than the other \(\text {dis}\_c\) values on all six LFR networks. The red curve shows that \(\text {dis}\_c\) has little influence on the number of iterations. Based on this analysis, we set \(\text {dis}\_c=0\) in the comparison experiments.
Comparison experiments
Table 2 Comparison results in terms of Q value and NMI
Dataset   Nodes  Influence power  FastNewman       Louvain          Infomap
                 Q      NMI       Q      NMI       Q      NMI       Q      NMI
LFR1      20     0.774  1.000     0.774  1.000     0.707  0.958     0.774  1.000
LFR2      50     0.740  0.960     0.739  1.000     0.622  0.824     0.627  0.792
LFR3      100    0.616  0.809     0.675  0.928     0.597  0.761     0.670  0.846
LFR4      1000   0.844  0.974     0.843  0.934     0.862  1.000     0.862  1.000
LFR5      1500   0.759  0.974     0.860  0.933     0.873  0.999     0.876  1.000
LFR6      2000   0.975  0.999     0.979  1.000     0.979  1.000     0.971  0.971
Karate    34     0.371  1.000     0.381  0.692     0.361  0.520     0.402  0.699
Dolphin   62     0.446  0.575     0.495  0.573     0.495  0.451     0.520  0.491
Polbooks  105    0.505  0.441     0.502  0.390     0.483  0.326     0.293  0.542
Football  115    0.588  0.938     0.548  0.694     0.597  0.927     0.597  0.924
Jazz      198    0.445  –         0.439  –         0.423  –         0.442  –
Science   1589   0.946  –         0.881  –         0.870  –         0.954  –
For networks with ground truth, although we measure both Q value and NMI, we care more about NMI because this criterion is based on the ground truth. In other words, a larger NMI means that the communities found by the algorithm are closer to the real communities. For networks without ground truth, only Q value can be used to evaluate the performance of the algorithms.
LFR network analysis
As shown in Table 2, because LFR networks obey the topological rules strictly and are generated without any social background, the proposed influence-power-based algorithm does not perform the best in most cases compared with the topological-structure-based algorithms. However, since we take topological rules into consideration and use modularity as the criterion, the results of our algorithm stay at a relatively high level: in terms of NMI it is never the worst of the four algorithms, and the only case in which it ranks last is the Q value on LFR5. For synthetic networks, we therefore consider the performance of our algorithm competitive.
Real social network analysis
According to the results on real social networks, the proposed influence-power-based algorithm performs best in most cases from the perspective of NMI. More specifically, it achieves a higher NMI than the second-best algorithm on Karate (by 0.301), Dolphin (by 0.002) and Football (by 0.011). Although its NMI on Polbooks is lower than that of Infomap (by 0.101), our method is still the second best there, and this dataset describes co-purchasing of books rather than direct human relationships. From the perspective of Q value, our method is also competitive among all compared algorithms. From these results, we conclude that, in terms of NMI, the proposed algorithm performs better on real social networks than the other three classical algorithms in our experiments.
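The margins quoted above can be recomputed from the NMI columns of the comparison table. The snippet below copies those values into a dictionary and subtracts the best competing NMI from that of the influence-power algorithm; the names nmi_scores and margin_over_best_rival are illustrative.

```python
# NMI values for the four real networks that have ground truth.
nmi_scores = {
    "Karate": {"Influence power": 1.000, "FastNewman": 0.692,
               "Louvain": 0.520, "Infomap": 0.699},
    "Dolphin": {"Influence power": 0.575, "FastNewman": 0.573,
                "Louvain": 0.451, "Infomap": 0.491},
    "Polbooks": {"Influence power": 0.441, "FastNewman": 0.390,
                 "Louvain": 0.326, "Infomap": 0.542},
    "Football": {"Influence power": 0.938, "FastNewman": 0.694,
                 "Louvain": 0.927, "Infomap": 0.924},
}


def margin_over_best_rival(scores, method="Influence power"):
    """Score of `method` minus the best score among the other methods."""
    rival = max(value for name, value in scores.items() if name != method)
    return round(scores[method] - rival, 3)


for network, scores in nmi_scores.items():
    print(network, margin_over_best_rival(scores))
```

A positive margin means the influence-power algorithm has the highest NMI on that network; the only negative margin is Polbooks, where Infomap leads by 0.101.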
The overall analysis
In LFR networks, an algorithm that attains the maximum Q value usually also attains the maximum NMI. However, in real social networks, an algorithm often attains the maximum NMI without the maximum Q value. For example, on Karate, the proposed algorithm achieves NMI = 1, which means that the detected communities exactly match the ground truth, but its Q value is smaller than those of FastNewman and Infomap. We thus encounter the problem that a larger Q value does not indicate a larger NMI in real social networks, contrary to the conclusion drawn from the results on LFR networks. To explain this, we re-emphasize the difference between synthetic networks and real social networks, namely the social factors in the background of the datasets. The LFR networks are generated with mixing parameter \(\mu =0.1\), which means that their community structures are very clear and easy to detect. Real social networks, in contrast, form their community structure against a unique background and do not strictly obey the topological structure theory, so a high Q value does not directly indicate a good detection result. From this point of view, to detect communities in real social networks with high accuracy, the social factors that affect the formation of the final network should be considered. Otherwise, if only topological structure is taken into account, the Q value of the result may be very large while the algorithm divides the network in a wrong way, which leads to small NMI values. For example, Infomap attains the largest Q value on Karate and Dolphin, but its NMI values on these networks are smaller than those of our algorithm.
Influence power, the starting point of the proposed algorithm, affects people's daily social activities and thereby leads to the formation of communities. From the experimental results and the background of the real social networks, we conclude that, compared with methods that consider only topological structure, our algorithm has great potential and performs well in detecting communities in real social networks whose background is friendship or direct human social behavior, such as Karate and Football.
Conclusions
In this paper, we have proposed a community detection algorithm based on influence power, a novel perspective that takes both a social factor and topological structure into account. We have described the initialization, iteration and allocation procedures of our algorithm and conducted experiments on both synthetic networks and real social networks. Based on the results, we have also analyzed why our algorithm has the potential to perform better in real social networks than the comparison algorithms, which are based only on topological structure. In short, many human factors play an important part in the formation of social networks, which implies that such networks do not fully follow the existing topological structure theory. To detect communities in social networks more precisely, it is important to design algorithms based on human factors such as influence power in the future.
Declarations
Authors' contributions
All authors discussed the problem and the proposed solutions together. All authors participated in drafting and revising the final manuscript. All authors read and approved the final manuscript.
Acknowledgements
This work was supported by NSFC (61502543), Guangdong Natural Science Funds for Distinguished Young Scholar (2016A030306014), and Tiptop Scientific and Technical Innovative Youth Talents of Guangdong special support program (2016TQ03X542).
Competing interests
The authors declare that they have no competing interests.
Availability of data and materials
Not applicable.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Authors’ Affiliations
References
Bai X, Yang P, Shi X (2016) An overlapping community detection algorithm based on density peaks. Neurocomputing 226:7–15
Benson AR, Gleich DF, Leskovec J (2016) Higher-order organization of complex networks. Science 353(6295):163–166
Blondel VD, Guillaume JL, Lambiotte R, Lefebvre E (2008) Fast unfolding of communities in large networks. J Stat Mech Theory Exp 2008(10):155–168
Clauset A, Newman ME, Moore C (2004) Finding community structure in very large networks. Phys Rev E 70(6 Pt 2):066111
Danon L, Díaz-Guilera A, Duch J, Arenas A (2005) Comparing community structure identification. J Stat Mech Theory Exp 2005(09):09008
Ding Y, Huang L, Wang CD, Huang D (2017) Community detection in graph streams by pruning zombie nodes. PAKDD. Springer, Berlin, pp 574–585
Fortunato S (2010) Community detection in graphs. Phys Rep 486(3–5):75–174
Freeman LC (1978) Centrality in social networks conceptual clarification. Soc Netw 1(3):215–239
Girvan M, Newman MEJ (2001) Community structure in social and biological networks. Proc Natl Acad Sci USA 99(12):7821–6
Gleiser PM, Danon L (2003) Community structure in jazz. Adv Compl Syst 06(4):565–573
Jaccard P (1912) The distribution of the flora in the alpine zone. New Phytol 11(2):37–50
Krebs V (2017) Social Network Analysis software & services for organizations, communities, and their consultants. http://www.orgnet.com/. Accessed 10 April 2017
Lancichinetti A, Fortunato S, Radicchi F (2008) Benchmark graphs for testing community detection algorithms. Phys Rev E 78(2):046110
Lee C, Reid F, Mcdaid A, Hurley N (2010) Detecting highly overlapping community structure by greedy clique expansion. PLoS ONE 6(4):e18961
Lusseau D, Schneider K, Boisseau OJ, Haase P, Slooten E, Dawson SM (2003) The bottlenose dolphin community of Doubtful Sound features a large proportion of long-lasting associations. Behav Ecol Sociobiol 54(4):396–405
Newman ME (2003) Fast algorithm for detecting community structure in networks. Phys Rev E Stat Nonlinear Soft Matter Phys 69(6 Pt 2):066133
Newman ME (2006) Finding community structure in networks using the eigenvectors of matrices. Phys Rev E Stat Nonlinear Soft Matter Phys 74(3 Pt 2):036104
Newman ME (2006) Modularity and community structure in networks. Proc Natl Acad Sci 103(23):8577
Newman M (2010) Networks: an introduction. Oxford University Press, Inc., Oxford, pp 741–743
Newman ME, Girvan M (2004) Finding and evaluating community structure in networks. Phys Rev E 69(2 Pt 2):026113
Newman M (2017) Network data. http://www-personal.umich.edu/~mejn/netdata/. Accessed 10 April 2017
Raghavan UN, Albert R, Kumara S (2007) Near linear time algorithm to detect community structures in large-scale networks. Phys Rev E 76(2):036106
Rodriguez A, Laio A (2014) Clustering by fast search and find of density peaks. Science 344(6191):1492–6
Rosvall M, Bergstrom CT (2007) Maps of random walks on complex networks reveal community structure. Proc Natl Acad Sci 105(4):1118–1123
Wang CD, Lai JH, Yu PS (2014) NEIWalk: community discovery in dynamic content-based networks. IEEE Trans Knowl Data Eng 26(7):1734–1748
Wang CD, Lai JH, Yu PS (2013) Dynamic community detection in weighted graph streams. In: Proceedings of the 2013 SIAM International Conference on Data Mining, SIAM, New Delhi, pp 151–161
Wang X, Wang CD, Lai JH (2016) Modularity optimization by global–local search. In: IJCNN, pp 840–846
Wei YC, Cheng CK (1991) Ratio cut partitioning for hierarchical designs. IEEE Trans Comput Aided Design Integr Circuits Syst 10(7):911–921
Xu H, Hu Y, Wang Z, Ma J, Xiao W (2013) Core-based dynamic community detection in mobile social networks. Entropy 15(12):5419–5438
Yang J, Mcauley J, Leskovec J (2014) Community detection in networks with node attributes. In: ICDM, pp 1151–1156
Zachary WW (1977) An information flow model for conflict and fission in small groups. J Anthropol Res 33(4):452–473
Zhang H, Wang CD, Lai JH, Yu PS (2017) Modularity in complex multilayer networks with multiple aspects: a static perspective. Appl Inform 4(1):7