Strategy of finding optimal number of features on gene expression data

Electronics Letters

Feature selection is an important step in the analysis of transcriptome or gene expression data. It mitigates the curse of dimensionality and improves the interpretability of the results. Numerous feature selection methods have been proposed in the literature, and these methods rank genes in order of their relative importance. However, most of them determine the number of genes to be used in an arbitrary or heuristic fashion. A theoretical way to determine the optimal number of genes to select for a given task is proposed. The proposed strategy has been applied to a number of gene expression datasets, and promising results have been obtained.
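
As a rough illustration of the rank-then-cut workflow that the abstract describes as common practice (this is not the strategy proposed in the letter, whose details are not given on this page), the sketch below ranks genes by a Welch t-score and keeps an arbitrarily chosen top k. The synthetic data, the function name rank_genes_by_t_score, and the value of k are all assumptions made for illustration only.

```python
import numpy as np


def rank_genes_by_t_score(X, y):
    """Rank features (genes) by absolute Welch t-score between two classes.

    X: array of shape (n_samples, n_genes); y: binary labels (0 or 1).
    Returns gene indices ordered from most to least discriminative,
    together with the per-gene scores.
    """
    a, b = X[y == 0], X[y == 1]
    mean_diff = a.mean(axis=0) - b.mean(axis=0)
    std_err = np.sqrt(a.var(axis=0, ddof=1) / len(a)
                      + b.var(axis=0, ddof=1) / len(b))
    scores = np.abs(mean_diff) / (std_err + 1e-12)  # guard against zero variance
    order = np.argsort(scores)[::-1]                # descending by score
    return order, scores


# Synthetic stand-in for an expression matrix (assumption, not real data):
# 40 samples, 500 genes, of which the first five separate the two classes.
rng = np.random.default_rng(0)
n_samples, n_genes = 40, 500
X = rng.normal(size=(n_samples, n_genes))
y = np.repeat([0, 1], n_samples // 2)
X[y == 1, :5] += 4.0

order, scores = rank_genes_by_t_score(X, y)

# The cut-off k is picked by hand here, which is exactly the kind of
# arbitrary choice the abstract argues should be replaced by a principled one.
k = 5
selected = order[:k]
print("selected gene indices:", sorted(selected.tolist()))
```

The ranking step is generic; any other scoring function could be swapped in. The point of the sketch is only that k is an external, hand-set parameter in this workflow.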


