Optimized Feature Selection and Transformations for Early Stage Prediction of Autism Using Supervised Machine Learning Models

Full Text (PDF, 1236KB), PP.73-89



Praveena K N 1,* R Mahalakshmi 1 Manjunath C 2 Ahmad Faiz Zubair 3 P. Karthikeyan 4

1. Bio-intelligence Lab, Department of Computer Science and Engineering, Presidency University, Itkalpur, Rajanukunte, Bengaluru

2. School of Mechanical Engineering, REVA University, Yelahanka, Bengaluru

3. School of Mechanical Engineering, College of Engineering, Universiti Teknologi Mara, Kampus Pulau Pinang, 13500 Permatang Pauh, Pulau Pinang, Malaysia

4. Dept. of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan-62102

* Corresponding author.

DOI: https://doi.org/10.5815/ijmecs.2023.06.06

Received: 2 Jan. 2023 / Revised: 25 Mar. 2023 / Accepted: 25 May 2023 / Published: 8 Dec. 2023

Index Terms

Autism, AQ-10 dataset, ML algorithms, Feature transformation, Feature selection technique, predictive model


Autism Spectrum Disorder (ASD) is a neurodevelopmental condition that cannot be cured but can be predicted at an early stage. Early prediction supports timely diagnosis and intervention. Existing methods do not identify the most significant feature for detecting autism at an early stage. This research predicts ASD by identifying the best combination of feature transformation technique and ML classifier, and by finding the most significant feature for diagnosing autism at an early age. Early-detection ASD datasets for toddlers and children were collected, and several feature transformation techniques, comprising log, Box-Cox power, and Yeo-Johnson transformations, were applied to them. Several classification approaches were then applied to these datasets and their performance was evaluated. AdaBoost achieved 100% accuracy on the toddler dataset, whereas Random Forest achieved 98.3% accuracy on the child dataset. The transformations giving the best predictions were the log, Box-Cox, and Yeo-Johnson transformations for the toddler dataset and the log transformation for the child dataset. Following this exploration, feature selection techniques such as univariate selection (UNI) and recursive feature elimination (RFE) were applied to the transformed datasets to identify the most significant ASD risk feature for early prediction in toddlers and children. Based on univariate selection and RFE, A5 is the most significant feature for toddlers and A4 is the most significant feature for children. This can help doctors provide a suitable diagnosis early in a child's life. The results show that ML methods can yield precise ASD predictions when they are properly optimised, which suggests that using these models for early ASD detection may be feasible.
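The three feature transformations named in the abstract can be illustrated with their standard textbook definitions. The sketch below is not the authors' implementation. It assumes a fixed, hand-picked power parameter `lam` (in practice this is usually estimated from the data) and a `log(1 + x)` shift so that zero-valued features stay finite. The function names and the sample values are hypothetical.

```python
import math


def log_transform(x):
    """Log transform with a +1 shift (an assumption) so that x = 0 maps to 0."""
    if x <= -1:
        raise ValueError("log(1 + x) requires x > -1")
    return math.log1p(x)


def box_cox(x, lam):
    """Box-Cox power transform; defined only for strictly positive x."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive input")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


def yeo_johnson(x, lam):
    """Yeo-Johnson transform; extends Box-Cox to zero and negative x."""
    if x >= 0:
        if lam == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lam - 1.0) / lam
    # Negative branch uses the mirrored exponent (2 - lam).
    if lam == 2:
        return -math.log1p(-x)
    return -(((1.0 - x) ** (2.0 - lam)) - 1.0) / (2.0 - lam)


# Hypothetical numeric feature values, for illustration only.
sample_values = [0, 3, 8, 15]
transformed = [yeo_johnson(v, 0.5) for v in sample_values]
```

With `lam = 1` the Yeo-Johnson transform leaves values unchanged, and with `lam = 0` it reduces to `log(1 + x)` for non-negative inputs, which is why the log transform can be seen as a special case of the power family.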

Cite This Paper

Praveena K N, Mahalakshmi R, Manjunath C, Ahmad Faiz Zubair, P. Karthikeyan, "Optimized Feature Selection and Transformations for Early Stage Prediction of Autism Using Supervised Machine Learning Models", International Journal of Modern Education and Computer Science (IJMECS), Vol. 15, No. 6, pp. 73-89, 2023. DOI: 10.5815/ijmecs.2023.06.06

