The most dangerous technology in the justice system is rarely the one that announces itself as revolutionary; it is the one that quietly expands its job description. Across industries, “scope creep” in AI systems, the tendency of developers and users to push these systems into uses beyond their intended functions, is a growing concern. In courts, technologies like algorithmic risk assessments and other automated aids to judicial decisionmaking have already expanded well beyond their original use cases. Now new offerings like the American Arbitration Association’s “AI Arbitrator” are entering a market that is hesitant to experiment where corporate bottom lines are at stake but eager to pursue efficiency in disputes with individuals, creating risks of scope creep that could significantly affect individual rights and outcomes in courtrooms.
Consider the trajectory of Correctional Offender Management Profiling for Alternative Sanctions (COMPAS), a proprietary risk assessment algorithm that has become a staple in American courts. When it was first introduced, its proposed use was modest and targeted at providing services to people leaving incarceration. The idea was to help corrections officials identify treatment needs among incarcerated people and better allocate rehabilitative resources to ensure their reintegration into the community on their return. In that narrow lane, if a tool could highlight who needed substance abuse counseling or anger management classes, perhaps it could make prisons marginally less blunt instruments. But COMPAS did not stay in its lane.
Over time, courts and policymakers began to use COMPAS scores for decisions well beyond the tool’s original purpose. Judges consulted it when determining pretrial release and bail, decisions about whether someone should remain free while presumed innocent. Then came sentencing, where COMPAS scores began to inform how long someone would spend behind bars. What started as a tool for treatment planning became, in practice, a shadow arbiter of liberty.
A widely cited investigation by ProPublica underscores how dangerous that drift can be. Its analysis of COMPAS found that the tool’s predictions were not only imperfect but systematically skewed: Black defendants were far more likely to be falsely labeled as high risk, while white defendants were more often incorrectly classified as low risk. In other words, the algorithm reproduced and in some cases amplified the very disparities it was assumed to neutralize. Just as troubling, COMPAS’s proprietary nature meant that defendants could not meaningfully challenge the scores influencing decisions about their liberty. These findings highlight a core failure mode of scope creep: when a system that lacks transparency and exhibits biased error rates is extended into higher-stakes decisions, it does not simply carry its flaws forward—it magnifies their consequences.
This kind of scope creep tends to be the default lifecycle of algorithmic systems introduced into bureaucracies hungry for efficiency and certainty. Systems like COMPAS offer a veneer of objectivity in environments saturated with discretion and bias. They produce numbers, categories, and risk levels that can be cited, recorded, and defended. For overburdened courts, they promise speed. For risk-averse officials, they offer cover. And for institutions wary of scrutiny, they shift responsibility onto a machine. What they do not offer—at least not without rigorous validation—is legitimacy outside their original design.
The problem is not just that COMPAS was used beyond its intended purpose. It’s that each incremental expansion felt reasonable in isolation. If the tool can identify who might benefit from treatment, why not use it to estimate who might reoffend? If it can estimate reoffending, why not use it to guide bail decisions? And if it can guide bail, why not sentencing?
The risk of similar scope creep for the “AI Arbitrator” seems high. Arbitration has long been attractive to businesses precisely because it offers efficiency, confidentiality, and control. Those same features make it an ideal testing ground for artificial intelligence. And yet, when asked whether AI should actually decide cases, the institutions and professionals who operate in this space have shown a striking degree of hesitation. Surveys of arbitration practitioners consistently find that while AI is welcomed for document review, research, and case management, there is deep resistance to allowing it to exercise judgment. Concerns about bias, error, and the inability of AI systems to explain their reasoning remain pervasive.
This hesitation is reasonable. Arbitration, at its core, is an exercise of discretion. It requires weighing incomplete evidence, interpreting ambiguous contracts, and making normative judgments about fairness. These are precisely the areas where AI systems remain weakest and most opaque. The so-called “black box” problem—where even developers cannot fully explain how an output was generated—poses a direct challenge to the legitimacy of any decision-making system that claims authority over rights and obligations.
For now, these concerns have produced a cautious equilibrium. Businesses and arbitration providers are experimenting with AI at the margins—deploying it in narrow, low-stakes contexts where the upside is efficiency and the downside is contained. What they are not doing, at least not yet, is handing over the final decision. But that restraint may not hold evenly across all contexts.
If history is any guide, new decision-making technologies are rarely introduced first where the stakes are highest for those with power. They are introduced where the “guinea pigs” lack the power to refuse becoming test subjects. In the case of arbitration, that may mean the widespread use of AI not in disputes between sophisticated corporate actors, but in the far more routine, asymmetrical disputes between companies and individuals.
Forced arbitration clauses already channel millions of consumer and employment disputes out of public courts and into private systems designed by the very companies involved. These are high-volume, low-visibility cases: wage disputes, wrongful termination claims, consumer complaints. They are also precisely the kinds of cases where efficiency pressures are greatest and individualized consideration is most easily sacrificed.
In that environment, the incentives align differently. A corporation deciding whether to adopt AI arbitration for its own high-value contractual disputes may hesitate, wary of unpredictable errors and reputational risk. The same corporation deciding how to process thousands of employee claims may see the calculation differently. Speed, cost reduction, and consistency begin to outweigh concerns about nuance or explainability.
The result is a likely asymmetry in adoption. AI arbitrators may arrive first not as tools used between equals, but as systems imposed on individuals with little bargaining power. Employees and consumers—already bound by contracts they did not negotiate—may find their disputes resolved by systems that were never validated for that purpose, and whose inner workings they cannot meaningfully challenge. This is where scope creep becomes something more than a technical issue. It becomes a question of rights.
If AI systems in arbitration inherit the same trajectory as tools like COMPAS, their role will not remain confined to narrow, low-risk applications. What begins as assistance in document-heavy disputes can evolve into recommendations, then into default judgments, and eventually into de facto decision-making. Each step will be justified in isolation, each will promise efficiency, and each will be harder to roll back than the last.
For workers, the implications are significant. Arbitration already limits transparency and access to justice. Introducing AI into that system risks compounding those limitations with opacity, automation bias, and the diffusion of accountability. A worker challenging a termination or wage violation may not only face a private system tilted toward employers, but one in which the reasoning behind a decision is inaccessible or unchallengeable.
The danger is not that AI will suddenly replace human arbitrators across the board. It is that, in the spaces where oversight is weakest and incentives for efficiency are strongest, it will not need to. And by the time its role becomes visible, it will already feel indispensable.
