When a judge in Wisconsin sentenced Eric Loomis to six years in prison in 2013, a proprietary algorithm called COMPAS had rated him as "high risk" for recidivism. Loomis never saw the formula or how it weighed his answers, and the Wisconsin Supreme Court later upheld the sentence even while acknowledging the algorithm's secrecy. This case crystallizes a tension that now cuts across every level of the legal system: algorithms can process evidence faster than any human, but they also encode errors, biases, and opaque logic. For lawyers, judges, and defendants, understanding where AI helps and where it harms is no longer optional—it's essential. This article breaks down the specific tools in use, their documented limitations, and actionable strategies to preserve fairness when machines participate in justice.
Legal AI is not a future concept. It handles real tasks today in at least four major areas: pretrial risk assessment, document review, sentencing recommendations, and predictive policing. Each domain uses a different class of algorithm, trained on different data, and carries distinct risks.
COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) remains the most scrutinized tool. Developed by Equivant (formerly Northpointe), it scores defendants on a scale from 1 to 10 for risk of failure to appear, new criminal activity, and violent recidivism. A 2016 ProPublica investigation found that the tool's predictions were correct for white defendants 59.5% of the time; more troubling, among defendants who did not go on to reoffend, Black defendants had been labeled high risk 45.6% of the time, nearly double the white rate of 23.5%. Courts in New York, California, and Florida now use variants of this tool.
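To make that disparity concrete, here is a minimal sketch of the kind of per-group error analysis ProPublica ran. The records below are synthetic, not the actual COMPAS data; the false-positive rate asks how often people who never reoffended were nonetheless flagged.

```python
from dataclasses import dataclass

@dataclass
class Case:
    group: str        # demographic group label
    high_risk: bool   # tool scored the defendant as high risk
    reoffended: bool  # observed recidivism over the follow-up window

def false_positive_rate(cases: list[Case], group: str) -> float:
    """Share of non-reoffenders in `group` whom the tool labeled high risk."""
    did_not_reoffend = [c for c in cases if c.group == group and not c.reoffended]
    if not did_not_reoffend:
        return 0.0
    return sum(c.high_risk for c in did_not_reoffend) / len(did_not_reoffend)

# Synthetic records at toy scale, purely to show the computation.
cases = [
    Case("white", False, False), Case("white", False, False),
    Case("white", True,  False), Case("white", True,  True),
    Case("Black", True,  False), Case("Black", True,  False),
    Case("Black", False, False), Case("Black", True,  True),
]
for g in ("white", "Black"):
    print(g, round(false_positive_rate(cases, g), 2))  # 0.33 vs 0.67 here
```

The key point the computation captures: two groups can face very different false-positive rates even when the tool's overall accuracy looks similar.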
Technology-assisted review (TAR) platforms such as Relativity, Everlaw, and Catalyst process millions of documents in civil litigation. In the 2012 Da Silva Moore ruling, a New York federal court first approved TAR as a discovery method. These systems use active learning: a lawyer codes a few hundred documents as relevant or not, the algorithm identifies similar ones, and the system iterates. In the 2015 case Rio Tinto v. Vale, TAR reviewed 4.9 million documents in one month—work that would have required hundreds of lawyers for a year.
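The platforms themselves are proprietary, but the underlying active-learning loop is standard. A minimal sketch, assuming scikit-learn and a plain TF-IDF text model rather than any vendor's actual pipeline:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

def tar_round(docs, labeled_idx, labels, batch_size=10):
    """One review iteration: fit on the coded documents, then return the
    indices of the documents the model is least certain about, so a lawyer
    can code those next. New decisions get appended to labeled_idx/labels
    and the loop repeats until little new material surfaces."""
    X = TfidfVectorizer().fit_transform(docs)
    model = LogisticRegression(max_iter=1000)
    model.fit(X[labeled_idx], labels)
    probs = model.predict_proba(X)[:, 1]
    uncertainty = np.abs(probs - 0.5)   # closest to 0.5 = least certain
    unlabeled = [i for i in range(len(docs)) if i not in set(labeled_idx)]
    ranked = sorted(unlabeled, key=lambda i: uncertainty[i])
    return ranked[:batch_size]
```

Uncertainty sampling is only one strategy; some platforms instead surface the documents the model rates most likely relevant. Either way, the human coding decisions remain the training signal.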
Sentencing and release algorithms go a step beyond document review by feeding directly into judicial decisions about liberty. The Public Safety Assessment (PSA), used in more than 300 jurisdictions, assigns points based on age, criminal history, and prior failures to appear—but explicitly excludes race, gender, and income. Yet studies from the Arnold Foundation (the PSA's developer) show that PSA scores correlate strongly with socioeconomic status: defendants from high-poverty ZIP codes score 1.7 points higher on average than those from low-poverty areas, even after controlling for criminal history.
One edge case exposes the flaw: a 22-year-old charged with petty theft who missed an earlier court date due to a hospitalization may score higher than a 39-year-old with three prior misdemeanors. The algorithm cannot weigh context. Ohio's 2018 Criminal Sentencing Commission report noted that judges followed PSA recommendations in 73% of cases, but in the 27% of cases where they overrode the algorithm, recidivism rates were actually lower—suggesting human judgment corrected algorithmic errors.
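A toy point system makes the inversion visible. The weights below are hypothetical, not the PSA's actual scoring table, but they mirror the kinds of factors it uses: youth and prior failures to appear add points, and there is no field for why a court date was missed.

```python
# Hypothetical weights for illustration only.
def risk_points(age: int, prior_misdemeanors: int, prior_ftas: int) -> int:
    points = 0
    if age < 23:                           # youth counted as a risk factor
        points += 2
    points += min(prior_misdemeanors, 2)   # capped criminal-history points
    points += 2 * prior_ftas               # failures to appear weigh heavily
    return points

# 22-year-old, one missed court date (a hospitalization the tool can't see):
print(risk_points(age=22, prior_misdemeanors=0, prior_ftas=1))   # 4
# 39-year-old with three prior misdemeanors and no missed dates:
print(risk_points(age=39, prior_misdemeanors=3, prior_ftas=0))   # 2
```

The younger defendant scores twice as high, and nothing in the inputs lets the system distinguish a hospitalization from flight.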
Bias in legal algorithms does not begin in the code. It begins in the historical data used to train the models. Arrest records themselves reflect law enforcement patterns: drug offenses, for example, are prosecuted at higher rates in Black communities despite similar usage rates across demographics. An algorithm trained on arrest data will encode those disparities as “ground truth.”
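A toy simulation shows how enforcement patterns become "ground truth." Assume, purely for illustration, two groups with identical offense rates but different arrest probabilities; the training data then records roughly twice the "risk" for one group.

```python
import random

random.seed(0)
OFFENSE_RATE = 0.10                              # identical true behavior
ARREST_GIVEN_OFFENSE = {"A": 0.30, "B": 0.60}    # unequal enforcement (assumed)

def observed_arrest_rate(group: str, n: int = 100_000) -> float:
    """Arrest rate the training data would record for this group."""
    arrests = sum(
        1 for _ in range(n)
        if random.random() < OFFENSE_RATE
        and random.random() < ARREST_GIVEN_OFFENSE[group]
    )
    return arrests / n

print(observed_arrest_rate("A"))   # ~0.03 "ground truth" risk
print(observed_arrest_rate("B"))   # ~0.06, double, despite identical behavior
```

Any model fit to those labels will faithfully reproduce the enforcement gap and present it as a behavioral difference.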
A 2020 study from the National Institute of Justice examined five risk assessment tools used in state courts and found that all of them overpredicted violent recidivism for Black defendants by 20% or more compared to white defendants. The error was not random—it was systematic. The tool with the lowest bias (a COMPAS variant called COMPAS-2) still showed a 12% gap.
Another propagation path is feedback loops. When a predictive policing algorithm sends officers to a neighborhood it has flagged as high risk, more arrests occur there, generating more data that reinforces the algorithm's original call. Legal systems that rely on these outputs for bail decisions compound the problem: a higher score leads to pretrial detention, which increases the likelihood of a guilty plea, which adds a conviction to the very records that future predictions are trained on. The loop becomes self-justifying.
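A few lines of simulation show the dynamic. The numbers below are invented; the point is only that when patrol allocation follows past arrests and presence generates arrests, an initial skew grows without any difference in underlying behavior.

```python
# Two districts with identical underlying offending; "north" starts with a
# slight historical skew in recorded arrests.
arrests = {"north": 55, "south": 45}

for year in range(1, 6):
    flagged = max(arrests, key=arrests.get)   # the algorithm's "high-risk" call
    for district in arrests:
        base = 50                             # identical underlying offending
        # Extra patrols in the flagged district surface extra arrests.
        found = base + (20 if district == flagged else 0)
        arrests[district] += found
    print(year, flagged, dict(arrests))
```

The flagged district stays flagged every year, and the recorded gap widens each cycle: the data the system generates keeps ratifying the system's own prediction.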
Article 22 of the European Union's General Data Protection Regulation (GDPR) restricts decisions based solely on automated processing and guarantees individuals the right to contest them, a provision widely described as a “right to explanation.” No equivalent federal law exists in the United States. As a result, defendants in many states cannot access the source code, training data, or validation results for tools used in their own cases.
New York's 2020 Algorithmic Accountability Bill attempted to mandate public audits for any automated decision system used by state agencies, but it stalled in committee. California's 2021 AB-13 went further, requiring pretrial agencies to disclose all variables used in risk scores and to validate tools against local population data every two years. As of early 2024, only three states have passed any form of algorithmic transparency legislation for courtroom tools.
Courts themselves have produced a patchwork of rulings. In State v. Loomis (2016), the Wisconsin Supreme Court held that COMPAS could be used at sentencing provided judges receive warnings about its limitations. But what counts as an adequate warning? A 2022 study by Georgetown Law found that standard judicial warnings reduced reliance on risk scores by only 8%, while more specific warnings—ones that spelled out the tool's 12% false-positive gap—reduced reliance by 31%. The specificity of the warning matters.
Three capabilities remain firmly outside algorithmic reach: weighing mitigating circumstances, reading emotional testimony, and integrating community-specific knowledge.
Mitigating circumstances. A 2019 case in New Mexico involved a defendant charged with drug possession who had been the victim of domestic violence the same week and was coerced into holding the drugs. The prosecutor offered a plea deal contingent on the defendant's risk score. The algorithm had no mechanism to input the coercion data. The judge learned of the situation during a sidebar conversation and dismissed the case.
Emotional testimony. Algorithms process text as tokens, not as human expression. In a 2021 study published in the Stanford Law Review, researchers fed victim impact statements through sentiment analysis tools and found that the algorithms rated formal, measured statements as “neutral” or “low emotion,” while statements with fragmented grammar scored as “high distress.” A human judge knows that trauma often fractures speech; an algorithm counts fragments as a signal of severity.
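A toy scorer illustrates the failure mode. This is not any production sentiment tool, just the kind of surface-level token signal such tools rely on: it treats short sentence fragments as markers of distress.

```python
import re

def distress_score(statement: str) -> float:
    """Fraction of sentences shorter than four words, a crude 'fragmentation'
    signal of the kind token-level models pick up on."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", statement) if s.strip()]
    if not sentences:
        return 0.0
    fragments = sum(1 for s in sentences if len(s.split()) < 4)
    return fragments / len(sentences)

measured = ("The defendant's actions changed our family permanently "
            "and we ask the court to consider that.")
fractured = "He came in. The door. I couldn't. We never slept again."

print(distress_score(measured))    # 0.0  -> read as "low emotion"
print(distress_score(fractured))   # 0.75 -> read as "high distress"
```

Both statements describe serious harm, but only the fractured one registers; the measured one reads as neutral to a model that never learns what composure costs.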
Community knowledge. A judge in rural Maine in 2020 handled a juvenile case where a 14-year-old had been flagged by a predictive algorithm as “high risk” because his older siblings had prior records. The judge knew the family personally—the siblings' issues involved substance abuse, but the 14-year-old was a star student with no behavioral problems. The judge ordered a 90-day monitoring period instead of detention. The algorithm had no concept of individual distinction within a household.
If you work in a courtroom that uses algorithmic tools, these steps can help you audit outputs and advocate for fairer processes:

- Request the full list of input variables and any local validation studies for the tool—the disclosures California's AB-13 requires of pretrial agencies.
- Compare error rates across demographic groups, not just overall accuracy; a tool can be "accurate" on average while its false positives fall disproportionately on one group.
- Push for specific warnings that quantify known error gaps; in the Georgetown study, generic caveats reduced judicial reliance on risk scores by only 8%, while specific ones cut it by 31%.
- Treat the score as one factor among many, alongside community ties, employment history, and health status, as the Department of Justice's 2023 pretrial guidance recommends.
- Track cases where judges override the algorithm and what happens afterward; Ohio's data suggests those overrides often correct algorithmic error.

A minimal sketch of one such audit, checking whether a given score corresponds to the same observed outcome rate in every group, appears below.
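This sketch assumes you can obtain case records containing a group label, a score, and an observed outcome; the field names are hypothetical and would need to match your jurisdiction's data. A well-calibrated tool shows similar observed rates per score bucket in every group; diverging rates are a red flag.

```python
from collections import defaultdict

def calibration_table(records):
    """records: iterable of dicts with 'group', 'score' (1-10), and
    'reoffended' (bool). Returns observed outcome rates per (group, score)."""
    buckets = defaultdict(lambda: [0, 0])   # (group, score) -> [reoffenders, total]
    for r in records:
        key = (r["group"], r["score"])
        buckets[key][0] += int(r["reoffended"])
        buckets[key][1] += 1
    return {
        key: round(reoffenders / total, 2)
        for key, (reoffenders, total) in sorted(buckets.items())
        if total > 0
    }

# Usage: compare the observed rate behind the same score across groups, e.g.
#   table[("A", 7)] vs table[("B", 7)]
# If a 7 means a 40% observed rate for one group and 20% for another, the
# score is not measuring the same thing for both.
```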
Three developments through early 2024 signal where regulation is heading. First, the National Association of Criminal Defense Lawyers published a model statute in January 2024 requiring any risk assessment tool used in criminal court to be open-source, with full training data and performance audits published annually. Colorado and Vermont have introduced bills based on this model.
Second, the American Bar Association's House of Delegates adopted Resolution 112 in August 2023, urging courts to prohibit the use of proprietary black-box algorithms in sentencing unless the defendant has the right to inspect and challenge the code. While non-binding, the resolution shifts professional norms.
Third, the Department of Justice's 2023 guidance on pretrial release explicitly cautions against using risk scores as the sole basis for detention. The guidance cites the Loomis case and recommends that courts treat algorithm outputs as one of at least twelve factors, including community ties, employment history, and health status. A 2024 study from the Brennan Center tracked 50,000 cases in Texas after similar guidance was issued and found that courts relying on algorithms alone had a 22% higher detention rate than courts that supplemented them with the twelve-factor checklist.
The rise of legal algorithms is not reversible. But its direction—toward greater fairness or deeper entrenchment of bias—depends on how rigorously humans audit, override, and regulate the code. The next time you step into a courtroom, ask what the algorithm sees that you don't. Then ask what it misses. The gap between those two answers is where justice lives.