Abstract:
This paper revisits five previous studies spanning a decade, which were used in the development and validation of a computational model named PreSS (Predict Student Success) that predicts student success in an introductory programming module with 80% accuracy. PreSS was initially developed over three studies from 2004 to 2006, which recorded data from 240 students across multiple institutions. These studies examined 25 factors and six machine learning algorithms, resulting in a model, PreSS, that uses only three factors and the naïve Bayes machine learning algorithm (coupled with multiple other statistical techniques). The authors completed two subsequent independent studies over the academic years 2013-2015 to investigate whether the PreSS model was still valid on a modern data set, while also collecting new data (factors) that were not considered during the initial development of PreSS. These two studies successfully validated PreSS, even with a decade separating the student profiles and landscapes.
This paper has two main objectives. The first objective is an investigation of factors gathered during the initial development of PreSS that were not used in the final model. This is important because several factors were found to be significant at the time but were excluded, as their associated sample size was small and the goal was to develop the most generalizable model possible. PreSS is arguably a universal model for two reasons: 1) PreSS is independent of the programming language used. It has so far been exposed to six different programming languages (Java, C#, Processing, Python, Visual Basic and C++) and has maintained the same high level of accuracy. 2) PreSS is also independent of the specifics of the cohort sampled, such as gender bias, institution bias and age bias. The second objective of this paper is an examination of additional data collected in the two recent independent studies, to determine whether incorporating some or all of these new factors could improve the accuracy of PreSS.
This paper reports on 90 experiments using data from all five previous studies, examining 126 possible factors. The work identified 16 factors that, when used in combination with the original PreSS factors, either produced significant increases in prediction accuracy or exhibited noteworthy findings. Some of the 16 factors produced substantial gains in accuracy (in some cases in excess of 8%) or, when integrated into the PreSS model, revealed interesting substitutions of factors.