Effect of the order of parameterisation in gradient learning for kernel methods

IET Control Theory & Applications

Reproducing kernel Hilbert spaces (RKHSs) provide a natural framework for data modelling and have been applied in signal processing, control, machine learning and function approximation. A significant problem with models derived from RKHSs is that estimation scales poorly with the number of data, owing to the need to invert a matrix whose size equals the number of data points. Among the methods proposed to overcome this are gradient-based iterative techniques, such as steepest descent and conjugate gradient, which avoid direct matrix inversion. In this study the authors explore the use of gradient methods for estimating RKHS models from data. The gradient iteration can be applied in function space and the resulting algorithm subsequently parameterised, or, alternatively, applied directly to a parameterised version of the function approximation problem. The main contribution of this study is to demonstrate that the order in which the model is parameterised affects the rate of convergence of gradient-based iterative solution algorithms. Criteria for selecting the better approach in practice, functional or parametric, are given, and results demonstrating the different convergence rates are presented.
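The contrast drawn in the abstract can be made concrete with a small numerical sketch. The Python snippet below is not taken from the paper; the Gaussian kernel, the synthetic data and the step sizes are illustrative assumptions. It fits the kernel expansion f(x) = Σ_i α_i k(x_i, x) by least squares using both orders of parameterisation: the functional iteration updates the coefficients with the raw residual y − Kα (the gradient is taken in the RKHS and the update is parameterised afterwards), whereas the parametric iteration, obtained by first parameterising the cost as J(α) = ||y − Kα||² and then differentiating, updates with K(y − Kα), so its convergence is governed by the spectrum of K² rather than K.

import numpy as np

# Synthetic one-dimensional regression problem (illustrative assumption).
rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.uniform(-3.0, 3.0, n))
y = np.sin(x) + 0.1 * rng.standard_normal(n)

# Gram matrix for a Gaussian kernel (an assumed, not prescribed, choice).
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)
lam_max = np.linalg.eigvalsh(K)[-1]  # largest eigenvalue of K

def functional_gd(K, y, eta, iters=500):
    # Gradient taken in function space, then parameterised:
    # f <- f + eta * sum_i r_i k(x_i, .) becomes alpha <- alpha + eta * r,
    # with r = y - K alpha; convergence is governed by cond(K).
    a = np.zeros_like(y)
    for _ in range(iters):
        a += eta * (y - K @ a)
    return a

def parametric_gd(K, y, eta, iters=500):
    # Cost parameterised first, J(alpha) = ||y - K alpha||^2, then
    # differentiated: alpha <- alpha + eta * K (y - K alpha);
    # convergence is governed by cond(K)^2, typically much slower.
    a = np.zeros_like(y)
    for _ in range(iters):
        a += eta * (K @ (y - K @ a))
    return a

a_fun = functional_gd(K, y, eta=1.0 / lam_max)
a_par = parametric_gd(K, y, eta=1.0 / lam_max**2)
print("functional residual:", np.linalg.norm(y - K @ a_fun))
print("parametric residual:", np.linalg.norm(y - K @ a_par))

With an ill-conditioned Gram matrix the parametric residual decays markedly more slowly for the same number of iterations, which is one way of seeing why the order of parameterisation matters in practice.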
