The British Secondary School Test Debacle: When Algorithms Are Designed to Churn Out a Result, Not a Rational Conclusion

Few Americans have heard of the standardized test disaster that occurred in Britain this year and shook both the British education system and the Johnson government. The disaster should serve as a warning whenever any entity, and in particular a government, wants to use an algorithm to justify a predetermined result rather than to make objective determinations.
This debacle started with a seemingly beneficial goal in mind. The British education system had long been accused of “grade creep”: more top marks were being awarded than the quality of the students receiving them warranted. To make matters worse, that creep seemed to favor upper-class students attending the most posh public (i.e., private, to the confusion of the average U.S. citizen) high schools. After all, parents pay good money for those exclusive secondary schools so that their children will be more likely to be admitted to the best universities in the British Isles.
In response to these criticisms, the British government hired designers to develop an algorithm that would prevent grade creep. Because the COVID-19 pandemic had forced the cancellation of that year's exams, each student's grade would start as an expected grade given by a teacher. The algorithm would determine the percentage of students who “should” fall within each grade range. Each teacher's expected grade would then be evaluated against two benchmarks: the historical results of students with similar schooling and the expected results of all students taking the tests that year. Those historical and expected results were, in turn, based on algorithmic analysis. If a teacher's grade deviated from the algorithm's expected grade, the algorithm could override it and raise or lower the student's score.
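To illustrate the core mechanism, here is a deliberately simplified Python sketch; it is not the model actually used. It assumes that a school's students are ranked from strongest to weakest. Grades are then handed out down that ranking so that the school's results match its historical grade distribution. The function `moderate_grades`, the student labels, and all of the numbers are hypothetical.

```python
from typing import Dict, List


def moderate_grades(
    ranked_students: List[str],
    historical_share: Dict[str, float],
    grade_order: List[str],
) -> Dict[str, str]:
    """Hypothetical sketch of distribution-based grade moderation.

    ranked_students: a school's students, strongest first (e.g. a teacher's ranking).
    historical_share: fraction of the school's past students who received each grade.
    grade_order: grades from best to worst.

    Grades are assigned down the ranking so that this year's results mirror
    the school's past distribution. The teacher's own grade plays no part.
    """
    n = len(ranked_students)
    result: Dict[str, str] = {}
    position = 0
    cumulative = 0.0
    for grade in grade_order:
        cumulative += historical_share.get(grade, 0.0)
        # How many students, counting from the top, may receive this grade or better.
        cutoff = min(round(cumulative * n), n)
        while position < cutoff:
            result[ranked_students[position]] = grade
            position += 1
    # Students left over because of rounding receive the lowest grade.
    for student in ranked_students[position:]:
        result[student] = grade_order[-1]
    return result


# Hypothetical school: teachers predicted strong results, but no past student earned an A.
teacher_grades = {"S1": "A", "S2": "A", "S3": "B", "S4": "B", "S5": "C"}
ranking = ["S1", "S2", "S3", "S4", "S5"]
history = {"A": 0.0, "B": 0.2, "C": 0.4, "D": 0.4}

moderated = moderate_grades(ranking, history, ["A", "B", "C", "D"])
for student in ranking:
    print(student, teacher_grades[student], "->", moderated[student])
```

Under these made-up numbers every student's grade falls: S1 drops from A to B, S2 from A to C, S3 from B to C, S4 from B to D, and S5 from C to D. Even the strongest student cannot receive an A, simply because no one at the school earned one in the past. The actual model was far more intricate; this sketch isolates only the distribution-matching step.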
The resulting regrading was so disastrous that the government had no choice but to scrap the entire plan and fall back on the teachers' original grades. Students from schools with historically low test scores found their grades lowered regardless of their personal achievement. Students from schools with historically high results saw the opposite. This was particularly true of those in small classes (in other words, the upper-class private schools), whose grades were revised upward. The alterations affected so many students who were striving to outperform society's expectations of them that the results were widely condemned as unfair and biased. This happened even though the algorithm had been intricately designed precisely to be a tool for overcoming unfairness and bias. In fact, its creators wrote a 317-page report explaining just how fair and objective its results would be. See Will Bedingfield, Everything that went wrong with the botched A-levels algorithm, WIRED (Aug. 19, 2020), https://www.wired.co.uk/article/alevel-exam-algorithm.
So what went wrong? The complicated answer is that many of the problems were to be expected, given the algorithm's complexity. The simple answer is that this outcome is a prime example of what happens when governments design and use algorithms to reach a desired outcome rather than a proper one. Moreover, it demonstrates what happens when algorithms use a bell curve to define outcomes. The people who have traditionally fallen outside the norms that establish the bell curve are the ones most harmed by the outcomes the curve forces. Most importantly, it proves that algorithms will go wrong. Even when most of an algorithm's determinations are accurate, no algorithm is perfectly accurate. When an algorithm affects a population as large as Britain's graduating students, the number harmed by its inevitably inaccurate results could run into the thousands, even with the best algorithm. As this debacle shows, algorithms that are merely “just good” will still harm too many individuals.
Finally, consider what could have happened had government officials not acted. How could the average student protest his or her wrongful treatment? The process was so opaque that students could never prove how the algorithm harmed them, or perhaps even whether they were among those harmed. The government, meanwhile, could easily establish that for “most” students the results were acceptably accurate. It would inevitably be buttressed by experts paid by the algorithm's designer to argue that the algorithm was acceptable. Students would face discrimination, as well as harm from arbitrary and unreasonable results. That treatment would clearly be unconstitutional but for the fact that students lack the resources to meet their burden of proof. Indeed, even with substantial resources, the black-box nature of algorithms means students still could never meet that burden. The government's use of the algorithm was therefore sure to defeat any student's due process rights. This debacle foreshadows two dangers of the Age of Algorithms. The first is the harm that could befall recipients of government benefits and determinations. The second is the inevitable deprivation of constitutional rights that will prevent those harmed from ever being made whole.