Effect of the order of parameterisation in gradient learning for kernel methods

IET Control Theory & Applications

Reproducing kernel Hilbert spaces (RKHSs) provide a natural framework for data modelling and have been applied in signal processing, control, machine learning and function approximation. A significant problem with models derived from RKHSs is that estimation scales poorly with the number of data, owing to the need to invert a matrix whose size equals the number of data points. Among the methods proposed to overcome this are gradient-based iterative techniques, such as steepest descent and conjugate gradient, which avoid direct matrix inversion. In this study the authors explore the use of gradient methods for estimating RKHS models from data. The gradient iteration can be applied in function space and the resulting algorithm subsequently parameterised, or, alternatively, applied directly to a parameterised version of the function approximation problem. The main contribution of this study is to demonstrate that the order in which the model is parameterised affects the rate of convergence of gradient-based iterative solution algorithms. Criteria for selecting the better approach in practice, functional or parametric, are given, and results demonstrating the different convergence rates are presented.
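The contrast drawn in the abstract can be made concrete with a small numerical sketch. The Python snippet below is not taken from the paper; the Gaussian kernel, the synthetic data and the step sizes are illustrative assumptions. It fits the kernel expansion f(x) = Σ_i α_i k(x_i, x) by least squares using both orders of parameterisation: the functional iteration updates the coefficients with the raw residual y − Kα (the gradient is taken in the RKHS and the update is parameterised afterwards), whereas the parametric iteration, obtained by first parameterising the cost as J(α) = ||y − Kα||² and then differentiating, updates with K(y − Kα), so its convergence is governed by the spectrum of K² rather than K.

import numpy as np

# Synthetic one-dimensional regression problem (illustrative assumption).
rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(-3.0, 3.0, n))
y = np.sin(x) + 0.1 * rng.standard_normal(n)

# Gram matrix for a Gaussian kernel (an assumed, not prescribed, choice).
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
lam_max = np.linalg.eigvalsh(K)[-1]  # largest eigenvalue of K

def functional_gd(K, y, eta, iters=500):
    # Gradient taken in function space, then parameterised:
    # f <- f + eta * sum_i r_i k(x_i, .) becomes alpha <- alpha + eta * r,
    # with r = y - K alpha; convergence is governed by cond(K).
    a = np.zeros_like(y)
    for _ in range(iters):
        a += eta * (y - K @ a)
    return a

def parametric_gd(K, y, eta, iters=500):
    # Cost parameterised first, J(alpha) = ||y - K alpha||^2, then
    # differentiated: alpha <- alpha + eta * K (y - K alpha);
    # convergence is governed by cond(K)^2, typically much slower.
    a = np.zeros_like(y)
    for _ in range(iters):
        a += eta * (K @ (y - K @ a))
    return a

a_fun = functional_gd(K, y, eta=1.0 / lam_max)
a_par = parametric_gd(K, y, eta=1.0 / lam_max**2)
print("functional residual:", np.linalg.norm(y - K @ a_fun))
print("parametric residual:", np.linalg.norm(y - K @ a_par))

With an ill-conditioned Gram matrix the parametric residual decays markedly more slowly for the same number of iterations, which is one way of seeing why the order of parameterisation matters in practice.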
