Scalable learning of large networks
Cellular networks inferred from condition-specific microarray data can capture the functional rewiring of cells in response to different environmental conditions. Unfortunately, many algorithms for inferring cellular networks do not scale to whole-genome data with thousands of variables. We propose a novel approach for scalable learning of large networks: cluster and infer networks (CIN). CIN learns network structures in two steps: (a) partition the variables into smaller clusters, and (b) learn a network for each cluster. Optionally, we revisit the cluster assignments of variables with poor neighbourhoods. Results on networks with known topologies suggest that CIN offers substantial speed benefits without substantial loss in performance. We applied our approach to microarray compendia of glucose-starved yeast cells. The inferred networks contained a significantly higher number of subgraphs representing meaningful biological dependencies than random graphs did. Analysis of these subgraphs identified biological processes that agreed well with existing knowledge about yeast populations under glucose starvation, and also implicated novel pathways not previously known to be associated with these populations.
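The abstract does not say which clustering method or structure learner CIN uses. The sketch below only illustrates the general two-step cluster-then-infer idea under stated assumptions. Step (a) runs k-means with farthest-first initialization on standardized expression profiles. Step (b) uses a simple correlation-threshold rule as a stand-in network learner. The functions `kmeans`, `infer_cluster_network`, and `cluster_and_infer` and the `threshold` parameter are hypothetical. The optional reassignment of variables with poor neighbourhoods is not shown.

```python
import numpy as np


def kmeans(X, k, n_iter=100):
    """Lloyd's k-means on the rows of X (variables x samples).

    Assumption: farthest-first initialization, chosen here only to make the
    sketch deterministic; it is not taken from the CIN paper.
    """
    centers = [X[0]]
    for _ in range(1, k):
        # Distance from every row to its nearest existing center.
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each variable to its nearest center (squared Euclidean).
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Recompute centers, keeping the old center if a cluster empties.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def infer_cluster_network(X, members, threshold):
    """Stand-in structure learner (hypothetical): link two variables in the
    same cluster if the absolute Pearson correlation is >= threshold."""
    edges = set()
    if len(members) < 2:
        return edges
    r = np.corrcoef(X[members])
    for a in range(len(members)):
        for b in range(a + 1, len(members)):
            if abs(r[a, b]) >= threshold:
                edges.add((int(members[a]), int(members[b])))
    return edges


def cluster_and_infer(X, k, threshold=0.8):
    """(a) Partition variables into k clusters, (b) learn one network per cluster."""
    # Standardize each variable so k-means groups co-varying profiles
    # (assumes no constant rows).
    Z = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)
    labels = kmeans(Z, k)
    edges = set()
    for j in range(k):
        members = np.flatnonzero(labels == j)
        edges |= infer_cluster_network(X, members, threshold)
    return labels, edges


# Toy data: variables 0-2 follow one latent factor, variables 3-5 another.
rng = np.random.default_rng(0)
f1, f2 = rng.normal(size=(2, 200))
X = np.vstack([f1 + 0.1 * rng.normal(size=200) for _ in range(3)]
              + [f2 + 0.1 * rng.normal(size=200) for _ in range(3)])
labels, edges = cluster_and_infer(X, k=2)
print(labels, sorted(edges))
```

Step (b) only compares variables within the same cluster. Its pairwise work therefore grows with the sum of squared cluster sizes rather than with the square of the total number of variables. The trade-off is that this sketch never considers edges between variables in different clusters.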