DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING | DWIGHT LOOK COLLEGE OF ENGINEERING | TEXAS A&M UNIVERSITY

 

HOME

ABOUT

COURSES

PEOPLE

PROJECTS

PUBLICATIONS

CONTACT

Motif listing in large graphs

Abstract

Motifs are small subgraphs (e.g., triangles, four-cycles) whose appearance in nature is much more frequent than in classical random graphs. Their discovery (enumeration or listing) plays an important role in various fields. This project aims to analyze complexity of the involved algorithms and design techniques that can handle trillion-edge graphs with limited RAM.

Journal Papers

 
  • Y. Cui, D. Xiao, D.B.H. Cline, and D. Loguinov, "Improving I/O Complexity of Triangle Enumeration," IEEE Transactions on Knowledge and Data Engineering, vol. 34, no. 4, April 2022.

PDF
  • Y. Cui, D. Xiao, and D. Loguinov, "On Efficient External-Memory Triangle Listing," IEEE Transactions on Knowledge and Data Engineering, vol. 31, no. 8, August 2019.

PDF

Conference Papers

 
  • Y. Cui, D. Xiao, D.B.H. Cline, and D. Loguinov, "Improving I/O Complexity of Triangle Enumeration," IEEE ICDM, November 2017.

PDF, PPT
 
  • D. Xiao, Y. Cui, D.B.H. Cline, and D. Loguinov, "On Asymptotic Cost of Triangle Listing in Random Graphs," ACM PODS, May 2017.

PDF, PPT
 
  • Y. Cui, D. Xiao, and D. Loguinov, "On Efficient External-Memory Triangle Listing," IEEE ICDM, December 2016.

PDF, PPT

Technical Reports

 
  • Y. Cui, D. Xiao, D.B.H. Cline, and D. Loguinov, "Improving I/O Complexity of Triangle Enumeration," Texas A&M Technical Report  2017-8-3, August 2017.

PDF
 
  • D. Xiao, Y. Cui, D.B.H. Cline, and D. Loguinov, "On Asymptotic Cost of Triangle Listing in Random Graphs," Texas A&M Technical Report  2016-9-2, September 2016.

PDF
 
  • Y. Cui, D. Xiao, and D. Loguinov, "On Efficient External-Memory Triangle Listing," Texas A&M Technical Report 2016-9-1, September 2016.

PDF

Datasets

The next four undirected graphs are used in the ICDM 2016 paper. Each consists of two files -- (source node, degree) pairs and all adjacency lists dumped one after the other. All node IDs and degrees are unsigned 4-byte integers, LSB order. The IDs are sequential, with no gaps. Source nodes and neighbor lists are sorted ascending.

IRLbot domain (14 GB): degree, edges IRLbot host (53 GB): degree, edges IRLbot IP (6 GB): degree, edges Full ClueWeb09 webgraph (358 GB): degree, edges Trigon source code (can be used for PCF as well)


     Copyright © 2002-2024 IRL at Texas A&M. All Rights Reserved.