In many applications, data classification is hindered when input samples are produced under multiple contexts. Context-based classification alleviates this problem by using different classifiers depending on a measure of the context. Context-based classifiers offer the promise of increased performance by allowing each classifier to become an expert at classifying input samples of a certain type, rather than forcing a single classifier to perform well on all possible inputs. This study introduces a novel mixture of experts (ME) model, the mixture of hidden Markov model (HMM) experts, for context-based classification of samples that are variable-length sequences, and derives the update equations for a single probabilistic model that learns both the experts and the gate that connects them. The model has a high-level structure similar to that of the ME model, with the novelty that the gate and the experts are HMMs and the input data are sequences. Experimental results are presented on three datasets, including one for landmine detection. A detailed analysis of the model is provided; over multiple runs and cross-validation experiments, the model shows superior results to the compared algorithms.
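The high-level idea of gating sequence experts by context can be illustrated with a small sketch. This is not the model or the update equations derived in the study: it assumes already-trained discrete-observation HMMs, equal priors, and a simple softmax over log-likelihoods for both the gate and the experts. All names (`hmm_log_likelihood`, `me_posterior`, `toy_hmm`) and all parameter values are hypothetical and chosen only for illustration.

```python
import math

def hmm_log_likelihood(seq, pi, A, B):
    """Scaled forward algorithm: log p(seq | HMM) for a discrete-observation HMM.
    pi: initial state probabilities, A: transition matrix, B: emission matrix."""
    n = len(pi)
    alpha = [pi[s] * B[s][seq[0]] for s in range(n)]
    log_lik = 0.0
    for t in range(len(seq)):
        if t > 0:
            alpha = [sum(alpha[r] * A[r][s] for r in range(n)) * B[s][seq[t]]
                     for s in range(n)]
        scale = sum(alpha)            # p(o_t | o_1..o_{t-1})
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik

def softmax(scores):
    """Numerically stable softmax over a list of log-scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]

def me_posterior(seq, gate_hmms, expert_hmms):
    """Assumed combination rule: gate weights come from one HMM per context,
    and expert_hmms[i] holds one HMM per class for context i."""
    gate = softmax([hmm_log_likelihood(seq, *h) for h in gate_hmms])
    n_classes = len(expert_hmms[0])
    posterior = [0.0] * n_classes
    for g, experts in zip(gate, expert_hmms):
        class_probs = softmax([hmm_log_likelihood(seq, *h) for h in experts])
        for c in range(n_classes):
            posterior[c] += g * class_probs[c]
    return gate, posterior

def toy_hmm(emit_state0, emit_state1):
    """Hypothetical two-state HMM with fixed, illustrative transitions."""
    return ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], [emit_state0, emit_state1])

# Context 0 mostly emits symbols 0/1; context 1 mostly emits symbols 2/3.
gate_hmms = [
    toy_hmm([0.6, 0.2, 0.1, 0.1], [0.2, 0.6, 0.1, 0.1]),
    toy_hmm([0.1, 0.1, 0.6, 0.2], [0.1, 0.1, 0.2, 0.6]),
]
# Within each context, one expert HMM per class (class 0, class 1).
expert_hmms = [
    [toy_hmm([0.7, 0.1, 0.1, 0.1], [0.7, 0.1, 0.1, 0.1]),
     toy_hmm([0.1, 0.7, 0.1, 0.1], [0.1, 0.7, 0.1, 0.1])],
    [toy_hmm([0.1, 0.1, 0.7, 0.1], [0.1, 0.1, 0.7, 0.1]),
     toy_hmm([0.1, 0.1, 0.1, 0.7], [0.1, 0.1, 0.1, 0.7])],
]

gate, posterior = me_posterior([0, 0, 1, 0, 0], gate_hmms, expert_hmms)
print("gate weights:", gate)
print("class posterior:", posterior)
```

Because each HMM scores a whole sequence through the forward recursion, the same code handles sequences of any length, which is the property the abstract highlights. In the study itself the gate and experts are learned together within one probabilistic model, whereas this sketch only combines fixed, pre-specified HMMs.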


Context-based classification via mixture of hidden Markov model experts with applications in landmine detection
