SSRW: A Scalable Algorithm for Estimating Graphlet Statistics Based on Random Walk

Abstract

Mining graphlet statistics has wide applications in social networks, bioinformatics, and information security. However, counting graphlets exactly is a major challenge because the number of subgraphs increases exponentially with the graph size, so sampling algorithms are widely used to estimate graphlet statistics within reasonable time. Existing sampling algorithms do not scale to large graphlets; for example, they may get stuck when estimating graphlets with more than five nodes. To address this issue, we propose a highly scalable algorithm, Scalable subgraph Sampling via Random Walk (SSRW), for estimating graphlet counts and concentrations. SSRW samples graphlets by generating new nodes from the neighbors of previously visited nodes instead of fixed ones. Thanks to this flexibility, it can generate any k-graphlet in a unified way and estimate k-graphlet statistics efficiently even for large k. Our extensive experiments on estimating counts and concentrations of 4-, 5-, 6-, and 7-node graphlets show that SSRW is scalable, accurate, and fast.
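
The expansion step described in the abstract, drawing each new node from the neighbors of any previously visited node, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `sample_connected_subgraph`, the adjacency-dict input, and the uniform choice over the frontier are all assumptions made for illustration, and the sketch omits the sampling weights an unbiased estimator of counts or concentrations would need.

```python
import random

def sample_connected_subgraph(adj, k, rng=None):
    """Grow a connected k-node subgraph (hypothetical helper, not from the paper).

    adj: dict mapping each node to an iterable of its neighbors (undirected).
    Returns a list of k nodes in visit order, or None if the start node's
    connected component has fewer than k nodes.
    """
    rng = rng or random.Random()
    start = rng.choice(sorted(adj))
    visited = [start]
    visited_set = {start}
    while len(visited) < k:
        # Candidates are unvisited neighbors of ANY visited node,
        # not only of the most recently added one.
        frontier = sorted({v for u in visited for v in adj[u]} - visited_set)
        if not frontier:
            return None
        # Assumption: pick uniformly from the frontier.
        nxt = rng.choice(frontier)
        visited.append(nxt)
        visited_set.add(nxt)
    return visited

if __name__ == "__main__":
    # Small example: a 5-node path with one extra chord.
    graph = {0: [1], 1: [0, 2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
    print(sample_connected_subgraph(graph, 4, random.Random(7)))
```

Because every new node is adjacent to some already visited node, the returned node set always induces a connected subgraph, which is what allows one procedure to cover every k.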

Publication
Database Systems for Advanced Applications
