Activity assessment of small drug molecules in estrogen receptor using multilevel prediction model

The authors propose an efficient multilevel prediction model for activity assessment, to test whether certain chemical compounds can disrupt processes in the human body and thereby cause negative health effects. An in-silico computational method is proposed to predict the quality of drugs for estrogen receptors (ERs), in terms of their activity, activity score, potency, and efficacy, from various physicochemical properties (molecular descriptors). PaDEL-Descriptor is used for feature extraction. The ER dataset contains 8481 drug molecules, of which 1084 are active and 7397 are inactive, and each molecule is described by 1444 features. The dataset is therefore highly imbalanced (roughly 6.8 inactive molecules per active one) and high-dimensional. First, the class imbalance is addressed with the synthetic minority oversampling technique (SMOTE) algorithm, and feature selection is performed with the FSelector library of R. A machine-learning-based multilevel prediction model is then developed, with classification at the first level and regression at the second. Combining these strategies yields higher accuracy than many other computational approaches. K-fold cross-validation is performed to measure the consistency of the model across all target classes. Finally, the validity of the proposed method is demonstrated on several drug molecules used in AIDS therapy.
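The oversampling step and the two-level structure can be illustrated with a minimal plain-Python sketch. It assumes SMOTE in its original form (Chawla et al., 2002): each synthetic active molecule is placed at a random point on the segment between a real active molecule and one of its k nearest active neighbours in descriptor space. The multilevel part assumes that the second-level regressor is applied only to molecules the first-level classifier labels active. The names `smote`, `multilevel_predict`, the `"active"` label string and the toy models are hypothetical illustrations, not the authors' R implementation; real descriptor vectors would have 1444 dimensions (fewer after feature selection).

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority, n_synthetic, k=5, seed=0):
    """Return n_synthetic new minority-class vectors, each interpolated
    between a minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority-class neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: euclidean(x, minority[j]))
        neighbours.append(others[:k])
    synthetic = []
    for n in range(n_synthetic):
        i = n % len(minority)  # cycle so every sample is used about equally
        x = minority[i]
        nb = minority[rng.choice(neighbours[i])]
        gap = rng.random()  # uniform random gap in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


def multilevel_predict(x, classify, regress):
    """Level 1 classifies a molecule; level 2 regresses a continuous score
    (hypothetically, an activity score) only for molecules predicted active."""
    label = classify(x)
    score = regress(x) if label == "active" else None
    return label, score


if __name__ == "__main__":
    # Class counts reported for the ER dataset.
    n_active, n_inactive = 1084, 7397
    print("synthetic actives for a 1:1 balance:", n_inactive - n_active)

    # Toy 2-D "descriptor" vectors standing in for active molecules.
    actives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    new_actives = smote(actives, n_synthetic=6, k=2)
    print(len(new_actives), "synthetic samples")

    # Hypothetical stand-ins for trained level-1 and level-2 models.
    def classify(v):
        return "active" if v[0] > 0.5 else "inactive"

    def regress(v):
        return 2.0 * v[0]

    print(multilevel_predict([0.75, 0.2], classify, regress))  # ('active', 1.5)
    print(multilevel_predict([0.25, 0.2], classify, regress))  # ('inactive', None)
```

For the ER data, a 1:1 balance would need 7397 − 1084 = 6313 synthetic actives. In practice a tested library implementation, such as the `SMOTE` class in Python's imbalanced-learn package, would normally be preferred over a hand-rolled version like this one.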

Inspec keywords: drugs; feature extraction; sampling methods; pattern classification; regression analysis; learning (artificial intelligence)

Other keywords: feature extraction; chemical compounds; AIDS therapy; efficient multilevel prediction model; molecular descriptors; negative health effects; estrogen receptor; human body; activity score; activity assessment; physicochemical properties; feature selection; synthetic minority oversampling technique algorithm; 8481 drug molecules; PaDEL-Descriptor; computational method; drug molecule; ER dataset

Subjects: Data handling techniques; Other topics in statistics; Knowledge engineering techniques; Biology and medical computing

http://iet.metastore.ingenta.com/content/journals/10.1049/iet-syb.2018.5068