Multivariate dependence and genetic networks inference
- Author(s): A.A. Margolin ; K. Wang ; A. Califano ; I. Nemenman
- DOI: 10.1049/iet-syb.2010.0009
- Author(s): A.A. Margolin 1, 2 ; K. Wang 2, 3 ; A. Califano 2 ; I. Nemenman 2, 4
Affiliations:
1: Cancer Program, The Broad Institute of Harvard and MIT, Cambridge, USA
2: Joint Centers for Systems Biology, Columbia University Medical Center, New York, USA
3: Joint Centers for Systems Biology, Oncology Research Unit, San Diego, USA
4: Departments of Physics and Biology, Emory University, Computational and Life Sciences Strategic Initiative, Atlanta, USA
- Source: Volume 4, Issue 6, November 2010, p. 428–440
DOI: 10.1049/iet-syb.2010.0009 , Print ISSN 1751-8849, Online ISSN 1751-8857
A critical task in systems biology is the identification of genes that interact to control cellular processes by transcriptional activation of a set of target genes. Many methods have been developed that use statistical correlations in high-throughput data sets to infer such interactions. However, cellular pathways are highly cooperative, often requiring the joint effect of many molecules. Few methods have been proposed to explicitly identify such higher-order interactions, partly because the notion of multivariate statistical dependence itself remains imprecisely defined. The authors define the concept of dependence among multiple variables using maximum entropy techniques and introduce computational tests for identifying such dependencies. Synthetic network results reveal that this procedure uncovers dependencies even in undersampled regimes, where the joint probability distribution cannot be reliably estimated. Analysis of microarray data from human B cells reveals that third-order statistics, but not second-order ones, uncover relationships between genes that interact in a pathway to cooperatively regulate a common set of targets.
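The idea of a third-order dependence defined through maximum entropy can be illustrated with a small sketch. This is not the authors' implementation, and the abstract does not specify their estimator. It assumes three discrete variables with a fully known joint distribution, fits the maximum entropy distribution that matches all pairwise marginals by iterative proportional fitting, and reports the Kullback-Leibler divergence between the joint distribution and that fit. The function names `maxent_pairwise`, `kl_bits`, and `xor_joint` are hypothetical and introduced only for this example.

```python
import numpy as np


def maxent_pairwise(p, n_iter=200):
    """Maximum entropy distribution matching all pairwise marginals of p.

    p is a 3-D array holding the joint probabilities of three discrete
    variables. Uses iterative proportional fitting, starting from uniform.
    """
    q = np.full(p.shape, 1.0 / p.size)
    for _ in range(n_iter):
        for axis in range(3):
            # Summing out one variable leaves the marginal of the other two.
            target = p.sum(axis=axis, keepdims=True)
            current = q.sum(axis=axis, keepdims=True)
            ratio = np.divide(
                target, current, out=np.zeros_like(target), where=current > 0
            )
            q = q * ratio
    return q


def kl_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits, over the support of p."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / q[mask])))


def xor_joint():
    """Joint distribution of (X, Y, Z) with X, Y fair coins and Z = X xor Y."""
    p = np.zeros((2, 2, 2))
    for x in range(2):
        for y in range(2):
            p[x, y, x ^ y] = 0.25
    return p


# Hypothetical toy case: every pair of variables looks independent,
# so all of the dependence is of third order.
p_xor = xor_joint()
third_order = kl_bits(p_xor, maxent_pairwise(p_xor))
print(f"third-order dependence for XOR: {third_order:.3f} bits")
```

In the XOR case the pairwise-constrained maximum entropy fit is uniform over all eight states, so the divergence is 1 bit even though every second-order statistic is consistent with independence. With real, undersampled expression data the joint distribution would have to be estimated first, which this sketch does not address.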
Inspec keywords: cellular biophysics; maximum entropy methods; genetics; correlation methods; higher order statistics; molecular biophysics
Subjects: Probability theory, stochastic processes, and statistics; Biomolecular structure, configuration, conformation, and active sites; Physics of subcellular structures; Biomolecular dynamics, molecular probes, molecular pattern recognition