Educational data mining with regression algorithms: a study on performance prediction

Authors

DOI:

https://doi.org/10.15536/reducarmais.6.2022.2691

Keywords:

Educational Data Mining, Machine Learning, Regression Algorithms, Performance Prediction

Abstract

With the increasing availability of data, especially in education, Educational Data Mining (EDM) has become increasingly important for decision making in this field. One of the main objectives of EDM is performance prediction: when students' performance is known in advance, it is possible to intervene and prevent failures and even dropouts. This study therefore aims to predict student performance on a publicly available data set using regression algorithms, and to identify the main predictive attributes of student performance. To this end, an EDM process based on the four steps described by Aggarwal (2015) was implemented. Of the two data sets analyzed, Decision Trees was the most accurate for the Mathematics subject, with an accuracy of 90%, while Random Forest performed best on the data for the Portuguese subject, with an accuracy of 80%. In addition, attributes related to school activities were found to be the strongest predictors of student performance, although some demographic and socioeconomic attributes also influence it.
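As an illustration only, the sketch below shows how a regression tree can predict a numeric grade and rank attributes by how much they reduce prediction error. It is not the authors' pipeline: the data are synthetic, the attribute names (`prior_grade`, `absences`, `parent_education`) are hypothetical, and the tree is a simplified pure-Python stand-in for a library implementation. Error is reported as mean absolute error, since the abstract does not state how accuracy was computed.

```python
import random

def sse(values):
    """Sum of squared errors of the values around their mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, targets):
    """Find the (feature, threshold, gain) split that reduces SSE the most."""
    base, best = sse(targets), (None, None, 0.0)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if left and right:
                gain = base - sse(left) - sse(right)
                if gain > best[2]:
                    best = (f, t, gain)
    return best

def build_tree(rows, targets, depth=0, max_depth=3, min_size=5):
    """Grow a regression tree; each leaf predicts the mean target of its rows."""
    leaf = {"leaf": sum(targets) / len(targets)}
    if depth >= max_depth or len(rows) < min_size:
        return leaf
    f, t, gain = best_split(rows, targets)
    if f is None:
        return leaf
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return {
        "feature": f, "threshold": t, "gain": gain,
        "left": build_tree([r for r, _ in left], [y for _, y in left],
                           depth + 1, max_depth, min_size),
        "right": build_tree([r for r, _ in right], [y for _, y in right],
                            depth + 1, max_depth, min_size),
    }

def predict(tree, row):
    """Walk from the root to a leaf and return the leaf's mean target."""
    while "leaf" not in tree:
        side = "left" if row[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

def importances(tree, n_features):
    """Total SSE reduction per feature over all splits, normalised to sum to 1."""
    totals, stack = [0.0] * n_features, [tree]
    while stack:
        node = stack.pop()
        if "leaf" not in node:
            totals[node["feature"]] += node["gain"]
            stack += [node["left"], node["right"]]
    total = sum(totals)
    return [t / total for t in totals] if total else totals

# Synthetic toy data (an assumption, not the study's data set): the final
# grade depends mostly on a prior grade, slightly on the other attributes.
random.seed(0)
FEATURES = ["prior_grade", "absences", "parent_education"]  # hypothetical names

def make_student():
    prior = random.randint(0, 20)
    absences = random.randint(0, 30)
    parent = random.randint(0, 4)
    final = 0.9 * prior - 0.05 * absences + 0.1 * parent + random.gauss(0, 1)
    return [prior, absences, parent], final

data = [make_student() for _ in range(200)]
train, test = data[:150], data[150:]
X_train, y_train = [x for x, _ in train], [y for _, y in train]

tree = build_tree(X_train, y_train)
mae = sum(abs(predict(tree, x) - y) for x, y in test) / len(test)

# Baseline for comparison: always predict the mean training grade.
mean_grade = sum(y_train) / len(y_train)
baseline_mae = sum(abs(mean_grade - y) for _, y in test) / len(test)

imp = importances(tree, len(FEATURES))
print(f"tree MAE: {mae:.2f}  baseline MAE: {baseline_mae:.2f}")
for name, value in zip(FEATURES, imp):
    print(f"{name}: {value:.3f}")
```

A Random Forest extends this idea by training many such trees on bootstrap samples of the data and averaging their predictions, which usually reduces variance compared with a single tree.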

Author Biographies

Vanessa Faria de Souza, Instituto Federal de Educação, Ciência e Tecnologia do Rio Grande do Sul - IFRS

Doctoral candidate in the Graduate Program in Informatics in Education (PPGIE) at the Universidade Federal do Rio Grande do Sul (UFRGS). Master's degree in Informatics from the Graduate Program in Informatics (PPGI) at the Universidade Tecnológica Federal do Paraná (UTFPR), in the area of Applied Computing with an emphasis on Software Engineering. I hold a specialization in Inclusive Special Education, with an emphasis on Assistive Technology. I graduated in Information Systems from the Universidade Estadual do Norte do Paraná (2011) and completed a teaching degree (Licenciatura) in Mathematics at UTFPR. I am currently a full-time (exclusive dedication) faculty member at the Instituto Federal do Rio Grande do Sul, Ibirubá Campus, on leave to pursue my doctorate. I teach in the Computer Science, Integrated High School Technical Program in Informatics, Mathematics teaching degree, and Specialization in Teaching Languages and their Technologies courses. I have also worked as a higher education professor at the Universidade Estadual do Norte do Paraná (UENP) in the undergraduate Computer Science and Information Systems courses, teaching Digital Systems, Design and Analysis of Algorithms, Advanced Topics in Computing, Symbolic and Numerical Computing, and Scientific Methodology, as well as at UTFPR. I have also taught Mathematics in basic education.

Prof. Dr. Sílvio César Cazella, Universidade Federal de Ciências da Saúde de Porto Alegre

Sílvio César Cazella completed his Ph.D. in Computer Science at the Federal University of Rio Grande do Sul in 2006, including a "sandwich" doctoral period at the University of Alberta in Canada. He received his Master's degree in Computer Science from the Federal University of Rio Grande do Sul in 1997 and graduated in Computer Science from the Pontifical Catholic University of Rio Grande do Sul in 1993. He is currently an Associate Professor, Level II, at the Federal University of Health Sciences of Porto Alegre (UFCSPA). He is a permanent faculty member of the Graduate Program in Health Education (UFCSPA) and of the Graduate Program in Information Technologies and Health Management (UFCSPA), and a collaborator in the Graduate Program in Health Science (UFCSPA) and the Graduate Program in Informatics in Education (UFRGS). In his Lattes curriculum, the most frequent terms in the context of scientific, technological and artistic-cultural production are: Recommender Systems, Data Mining, Software Engineering, Multi-Agent Systems, Artificial Intelligence, Expert Systems, Data Warehouse, Distance Education, Informatics in Education and Continuing Education. He works as an analyst and designer (architecture) of structured and object-oriented information systems, as a business analyst, and as a consultant for companies in the field of Information Technology. He holds TOEFL and IELTS certificates for English and the DELE for Spanish.

References

AGGARWAL, C. C. (2015). Data Mining: The Textbook. 1. ed. New York, USA: Springer. E-book. Available at: <https://doi.org/10.1007/978-3-319-14142-8>.

ALMASRI, A.; CELEBI, E.; ALKHAWALDEH, R. S. (2019). EMT: ensemble meta-based tree model for predicting student performance. Scientific Programming, v. 19, p. 1-14. Available at: <https://doi.org/10.1155/2019/3610248>.

BAKER, R.; ISOTANI, S.; CARVALHO, A. (2011). Mineração de Dados Educacionais: Oportunidades para o Brasil. Revista Brasileira de Informática na Educação, v. 19, n. 2, p. 3–13. Available at: <https://doi.org/10.5753/rbie.2011.19.02.03>.

BAKER, R. S.; INVENTADO, P. S. (2014). Educational Data Mining and Learning Analytics. In: LARUSSON, J. A.; WHITE, B. (Eds.). Learning Analytics: From Research to Practice. 1. ed. New York, USA: Springer, p. 1–195. E-book. Available at: <https://doi.org/10.1007/978-1-4614-3305-7>.

BAKER, R. S. J. D. (2015). Big data and education. 2. ed. New York, USA: A Massive Online Open Textbook (MOOT) - Teachers College, Columbia University. Available at: <http://www.columbia.edu/~rsb2162/bigdataeducation.html>.

CORTEZ, P.; SILVA, A. (2008). Using Data Mining to Predict Secondary School Student Performance. In: BRITO, A.; TEIXEIRA, J. (Eds.). Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008). Available at: <http://www3.dsi.uminho.pt/pcortez/student.pdf>.

DABHADE, P.; AGARWAL, R.; ALAMEEN, K. P.; FATHIMA, A. T.; SRIDHARAN, R.; GOPAKUMAR, G. (2021). Educational data mining for predicting students’ academic performance using machine learning algorithms. Materials Today: Proceedings. Available at: <https://doi.org/10.1016/j.matpr.2021.05.646>.

MALINI, J.; KALPANA, Y. (2021). Investigation of factors affecting student performance evaluation using education materials data mining technique. Materials Today: Proceedings. Available at: <https://doi.org/10.1016/j.matpr.2021.05.026>.

EDM. International Educational Data Mining Society. (2020). Available at: <http://educationaldatamining.org/>. Accessed: 15 Sep. 2021.

JAPKOWICZ, N.; SHAH, M. (2014). Evaluating Learning Algorithms: A Classification Perspective. 1. ed. Cambridge. E-book. Available at: <https://dl.acm.org/doi/book/10.5555/1964882>.

KUBAT, M. (2017). An Introduction to Machine Learning. 2. ed. Coral Gables, FL, USA: Springer. E-book. Available at: <https://doi.org/10.1007/978-3-319-63913-0>.

RIESTRA-GONZÁLEZ, M.; PAULE-RUÍZ, M. DEL P.; ORTIN, F. (2021). Massive LMS log data analysis for the early prediction of course-agnostic student performance. Computers & Education, v. 163, p. 1-20. Available at: <https://doi.org/10.1016/j.compedu.2020.104108>.

ROMERO, C.; VENTURA, S. (2013). Data mining in education. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, v. 3, n. 1, p. 12–27. Available at: <https://doi.org/10.1002/widm.1075>.

ROMERO, C.; VENTURA, S. (2020). Educational data mining and learning analytics: An updated survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, v. 10, n. 3, p. 1–21. Available at: <https://doi.org/10.1002/widm.1355>.

SHAHIRI, A. M.; HUSAIN, W.; RASHID, N. A. (2015). A Review on Predicting Student’s Performance Using Data Mining Techniques. Procedia Computer Science, v. 72, p. 414–422. Available at: <https://doi.org/10.1016/j.procs.2015.12.157>.

SINGH, R.; PAL, S. (2020). Machine learning algorithms and ensemble technique to improve prediction of students performance. International Journal of Advanced Trends in Computer Science and Engineering, v. 9, n. 3, p. 3970-3976. Available at: <https://doi.org/10.30534/ijatcse/2020/221932020>.

SOUZA, V. F.; PERRY, G. T. (2020). Tendências de Pesquisas em Mineração de Dados Educacionais em MOOCs: um Mapeamento Sistemático. Revista Brasileira de Informática na Educação, v. 28, p. 491-508. Available at: <http://dx.doi.org/10.5753/rbie.2020.28.0.491>.

TALAL, H.; SAEED, S. (2019). A study on adoption of data mining techniques to analyze academic performance. ICIC Express Letters, Part B: Applications, v. 10, n. 8, p. 681-687. Available at: <http://doi.org/10.24507/icicelb.10.08.681>.

YAACOB, W. F. W.; NASIR, S. A. M.; YAACOB, W. F. W.; SOBRI, N. M. (2019). Supervised data mining approach for predicting student performance. Indonesian Journal of Electrical Engineering and Computer Science, v. 16, n. 3, p. 1584-1592. Available at: <http://doi.org/10.11591/ijeecs.v16.i3.pp1584-1592>.

Published

2022-02-18

How to Cite

Faria de Souza, V., & Cazella, S. C. (2022). Educational data mining with regression algorithms: a study on performance prediction. Educar Mais, 6, 183–198. https://doi.org/10.15536/reducarmais.6.2022.2691