Comparison of clustering approaches with application to dual colour protein data
- Author(s): Sabrina Siebert (1); Katja Ickstadt (1); Martin Schäfer (2); Yvonne Radon (3); Peter J. Verveer (3)
Affiliations:
1: Faculty of Statistics, TU Dortmund University, Dortmund, Germany;
2: Chair of Mathematical Optimization, Mathematical Institute, Heinrich Heine University, Düsseldorf, Germany;
3: Max-Planck-Institute Dortmund, Dortmund, Germany
- Source: Volume 12, Issue 1, February 2018, pp. 7–17
DOI: 10.1049/iet-syb.2017.0019, Print ISSN 1751-8849, Online ISSN 1751-8857
Cells communicate with their environment via proteins located in the plasma membrane, which separates the interior of a cell from its surroundings. The spatial distribution of these proteins in the plasma membrane under different physiological conditions is important, since it may influence their signal transmission properties. In this study, the authors compare different methods, such as hierarchical clustering, extensible Markov models and the gammics method, for analysing such a spatial distribution. The methods are examined in a simulation study to determine how best to use them. Afterwards, the authors analyse experimental imaging data and extend these methods to simulated dual colour data.
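Hierarchical clustering is one of the methods compared above. As a rough illustration only, the sketch below applies naive agglomerative clustering to 2D protein localisation coordinates: starting from singleton clusters, it repeatedly merges the two closest clusters and stops once the closest pair is farther apart than a distance threshold, which corresponds to cutting the dendrogram at that height. The function names, the `cut_height` parameter and the three linkage options are illustrative assumptions, not the authors' implementation; for real data an optimised library routine would be preferable to this O(n³) loop.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # hypothetical (x, y) localisation of one protein


def linkage_distance(a: Sequence[Point], b: Sequence[Point], method: str) -> float:
    """Distance between two clusters under the chosen linkage rule."""
    dists = [math.dist(p, q) for p in a for q in b]
    if method == "single":
        return min(dists)  # closest pair of points
    if method == "complete":
        return max(dists)  # farthest pair of points
    if method == "average":
        return sum(dists) / len(dists)  # mean over all pairs
    raise ValueError(f"unknown linkage: {method}")


def agglomerative_clusters(points: Sequence[Point], cut_height: float,
                           method: str = "single") -> List[List[int]]:
    """Merge the two closest clusters until the closest pair is farther apart
    than cut_height; return the clusters as sorted lists of point indices."""
    clusters: List[List[int]] = [[i] for i in range(len(points))]
    while len(clusters) > 1:
        best = None
        # Scan every pair of current clusters for the smallest linkage distance.
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = linkage_distance([points[k] for k in ci],
                                 [points[k] for k in cj], method)
            if best is None or d < best[0]:
                best = (d, i, j)
        d, i, j = best
        if d > cut_height:
            break  # cutting the dendrogram at cut_height: no further merges
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return sorted(sorted(c) for c in clusters)
```

For example, with `pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (30, 30)]`, `agglomerative_clusters(pts, 2.0)` returns `[[0, 1, 2], [3, 4], [5]]`, i.e. two groups of nearby proteins and one isolated protein. Switching to `method="complete"` with a threshold of 1.2 splits the first group, because complete linkage requires every pair within a cluster to stay close. This sensitivity to linkage choice and threshold is one reason such methods are worth comparing in simulation.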
Inspec keywords: molecular biophysics; statistical analysis; bioinformatics; cellular biophysics; biomembranes; Markov processes; proteins
Other keywords: hierarchical clustering; plasma membrane; gammics method; cell communication; experimental imaging data; signal transmission; extensible Markov models; spatial protein distribution; clustering approaches; dual colour protein data
Subjects: Probability theory, stochastic processes, and statistics; Physics of subcellular structures; Natural and artificial biomembranes; Biomolecular interactions, charge transfer complexes; Markov processes; Interactions with radiations at the biomolecular level; Biology and medical computing