Student Retention Model

In today’s competitive workforce, a postsecondary degree is a critical factor in young adults finding employment and earning a good living. However, only 57% of college students complete their degrees within six years, and 31% drop out altogether.

An independent private university with an average freshman-to-sophomore attrition rate of 20% engaged EdjAnalytics to help identify at-risk freshmen and optimize its retention efforts. Edj’s predictive retention model helps universities identify high-risk students before classes begin and follows them throughout their college careers. Retaining students through graduation affects not only student success but also a university’s bottom line, in both cost and reputation.


Edj developed an attrition risk ratings model for undergraduate applicants for the university’s incoming 2015 class. The school provided data for over 5,000 freshmen. To keep the problem tractable, Edj restricted the scope of the analysis to students who did not return for their 2nd or 3rd semester. Two models were developed to predict student retention: the first was restricted exclusively to application-level data, and the second had access to student records after the 1st semester. Data analyzed included demographics, high school academic and social records, financial standing (FAFSA), and college credit hours and grades.

The Application Model includes the following variables: social factors, ACT score, and high school academics.

The First Semester Model includes the Application Model variables plus semester one scholarship, semester one hours, and semester one GPA.
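The nesting of the two variable sets can be sketched in a few lines of Python. This is only an illustration: the source names the variable groups but not the underlying columns, so every column name below (and the `feature_vector` helper) is a hypothetical placeholder.

```python
# Hypothetical column names: the source lists variable groups, not columns.
SOCIAL_FACTORS = [
    "first_generation",
    "legacy",
    "early_application",
    "private_high_school",
    "catholic",
]

# Application-stage variables: social factors, ACT score, high school academics.
APPLICATION_FEATURES = SOCIAL_FACTORS + ["act_score", "hs_academics"]

# First-semester variables: everything above plus three semester-one fields.
FIRST_SEMESTER_FEATURES = APPLICATION_FEATURES + [
    "sem1_scholarship",
    "sem1_hours",
    "sem1_gpa",
]

FEATURES_BY_STAGE = {
    "application": APPLICATION_FEATURES,
    "first_semester": FIRST_SEMESTER_FEATURES,
}


def feature_vector(student, stage):
    """Return the student's values, in a fixed order, for the given stage.

    Raises KeyError if the stage is unknown or a required field is missing.
    """
    return [student[name] for name in FEATURES_BY_STAGE[stage]]
```

Keeping the first-semester list as an extension of the application list means a student scored before classes begin can be rescored after the first semester without redefining the earlier inputs.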


For the social scoring, a repeated cross-validation process identified five factors that were the most predictive of retention:

  • First generation status
  • Legacy status
  • Early application submission
  • High school classification (public or private)
  • Identified as Catholic

All of the above social factors are binary (yes/no) variables extracted from application documents. Self-identifying as Catholic was a significant predictor of whether a student would make it to their sophomore year. This effect is likely idiosyncratic to this university and may not generalize to universities at large.
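To make the idea of repeated cross-validation over binary factors concrete, here is a minimal sketch in Python. It is not Edj’s procedure, which the source does not describe in detail. It assumes each student is a dict of 0/1 values with a `"retained"` outcome, and it scores a factor by how much it lowers the held-out Brier score (mean squared error of the predicted probability) compared with predicting the overall retention rate. The function names, the scoring rule, and the fold counts are all assumptions.

```python
import random


def brier(preds, outcomes):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(preds, outcomes)) / len(outcomes)


def repeated_cv_gain(records, factor, repeats=10, folds=5, seed=0):
    """Average held-out Brier improvement from splitting on one binary factor.

    records: list of dicts holding 0/1 values for `factor` and "retained".
    A positive result means the factor helped on data it was not fit on.
    """
    rng = random.Random(seed)
    gains = []
    for _ in range(repeats):
        idx = list(range(len(records)))
        rng.shuffle(idx)  # a fresh random partition for every repeat
        for k in range(folds):
            test = idx[k::folds]
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            # Baseline: overall retention rate in the training folds.
            base = sum(records[i]["retained"] for i in train) / len(train)
            # Factor model: retention rate within each factor value.
            rate = {}
            for value in (0, 1):
                group = [records[i]["retained"] for i in train
                         if records[i][factor] == value]
                rate[value] = sum(group) / len(group) if group else base
            y = [records[i]["retained"] for i in test]
            with_factor = [rate[records[i][factor]] for i in test]
            gains.append(brier([base] * len(y), y) - brier(with_factor, y))
    return sum(gains) / len(gains)


# Synthetic data for illustration only: factor_a is tied to the outcome,
# factor_b is unrelated to it by construction.
records = [
    {
        "factor_a": i % 2,
        "factor_b": (i // 4) % 2,
        "retained": 1 if (i % 2 == 1 or i % 4 == 0) else 0,
    }
    for i in range(200)
]
gain_a = repeated_cv_gain(records, "factor_a")
gain_b = repeated_cv_gain(records, "factor_b")
```

On this synthetic data `gain_a` comes out clearly positive while `gain_b` stays near zero. Ranking candidate factors by this kind of out-of-sample gain, averaged over many random splits, is one way to arrive at a short list of predictive factors.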

Edj found that high school GPA provides value in predicting retention; however, it is not standardized across school systems. To contextualize high school GPA, a method was devised to normalize these numbers by school. Data on students from past graduating classes of the same high school was used to estimate how likely current students from that high school are to leave the university.
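One simple way to turn past students’ outcomes into a per-school signal is a smoothed historical retention rate, sketched below in Python. This is an assumption about how such a feature could be built, not Edj’s actual normalization method. The `prior_weight` parameter, the function names, and the sample history are hypothetical; the smoothing pulls schools with few past students toward the overall rate so that a handful of records does not dominate the estimate.

```python
from collections import defaultdict


def school_retention_rates(history, prior_weight=10.0):
    """Smoothed historical retention rate per high school.

    history: iterable of (school_id, retained) pairs, with retained 0 or 1.
    prior_weight: hypothetical number of pseudo-students, at the overall
        rate, that each school's estimate is shrunk toward.
    Returns (rates_by_school, overall_rate).
    """
    history = list(history)
    if not history:
        raise ValueError("history must not be empty")
    overall = sum(retained for _, retained in history) / len(history)
    counts = defaultdict(lambda: [0, 0])  # school -> [retained, total]
    for school, retained in history:
        counts[school][0] += retained
        counts[school][1] += 1
    rates = {
        school: (kept + prior_weight * overall) / (total + prior_weight)
        for school, (kept, total) in counts.items()
    }
    return rates, overall


def school_context(school_id, rates, overall):
    """Rate for a known school; fall back to the overall rate otherwise."""
    return rates.get(school_id, overall)


# Made-up history for illustration: 18 of 20 retained from "HS 1",
# 4 of 6 retained from "HS 2".
history = (
    [("HS 1", 1)] * 18 + [("HS 1", 0)] * 2
    + [("HS 2", 1)] * 4 + [("HS 2", 0)] * 2
)
rates, overall = school_retention_rates(history)
```

A feature like this can sit next to the raw GPA in the model, so that the same GPA is read differently depending on how past students from that school fared.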

This graph illustrates the variance in retention between students from private Catholic high schools and students from public high schools with the same GPA. For example, ‘HS 1’ represents a private Catholic high school, and a student from it with a 2.0 GPA is 90% likely to continue to their sophomore year. However, a student from ‘HS 2’, a public school, with the same 2.0 GPA is 67% likely to be retained by the university.

At the conclusion of the cohort’s first semester, the students’ updated academic, social, and financial support variables were reassessed in the model. Edj found that including this additional information increased the model’s predictive power.

While students’ high school social information and GPA are predictive, having access to near-real-time information allows university administration to build focused intervention strategies that target students at risk of leaving.

The table below compares Edj’s predictions with the true outcomes for the 2015 freshman class.

Freshman-to-Sophomore Attrition: Predictions vs. Results

*Risk Ratings – (1=Lowest Risk, 5=Highest Risk)
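The source does not say how the 1-to-5 risk ratings are derived from the model’s output. As a sketch under that caveat, the Python below assumes the model produces a predicted attrition probability per student and assigns ratings by rank quintile, then tabulates the observed attrition rate for each rating. Both function names and the sample numbers are hypothetical.

```python
def risk_ratings(attrition_probs):
    """Map predicted attrition probabilities to ratings 1 (lowest) to 5.

    Ratings are assigned by rank quintile, so each rating covers roughly
    one fifth of the students. Tied probabilities are ordered by position
    and may therefore land in adjacent ratings.
    """
    n = len(attrition_probs)
    order = sorted(range(n), key=lambda i: attrition_probs[i])
    ratings = [0] * n
    for rank, i in enumerate(order):
        ratings[i] = rank * 5 // n + 1
    return ratings


def attrition_rate_by_rating(ratings, left):
    """Observed share of students who left, per rating (None if empty)."""
    totals = {r: [0, 0] for r in range(1, 6)}  # rating -> [left, total]
    for rating, did_leave in zip(ratings, left):
        totals[rating][0] += did_leave
        totals[rating][1] += 1
    return {r: (gone / n if n else None) for r, (gone, n) in totals.items()}


# Made-up probabilities and outcomes for ten students, illustration only.
probs = [0.05, 0.40, 0.10, 0.80, 0.20, 0.15, 0.60, 0.30, 0.02, 0.55]
left = [0, 0, 0, 1, 0, 0, 1, 1, 0, 0]
ratings = risk_ratings(probs)
observed = attrition_rate_by_rating(ratings, left)
```

If the model is informative, the observed attrition rate should rise from rating 1 to rating 5, which is the kind of comparison a predictions-versus-results table makes.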