In early 2023, a class-action lawsuit filed against Stability AI, Midjourney, and DeviantArt sent shockwaves through the generative AI community. The core allegation: these companies trained their image-generating models on billions of copyrighted images scraped from the web without consent, compensation, or credit. For creators watching their distinctive styles get replicated by a text prompt, this felt like a long-overdue reckoning. But behind the headlines lies a complex web of fair use doctrine, technical realities of model training, and unresolved questions about whether AI outputs are derivative works or entirely new creations. This article unpacks exactly what the lawsuits allege, how each defendant is responding, and what it means for artists and developers navigating this uncertain terrain.
The consolidated complaint, filed by artists Sarah Andersen, Kelly McKernan, and Karla Ortiz, targets the core pipeline of modern image generation: training data. The plaintiffs argue that Stable Diffusion, the open-source model underlying many tools, was trained by copying some 2.3 billion images indexed by LAION-5B, a dataset of image URLs and captions compiled from web crawls. The lawsuit asserts that every image downloaded via that index during training constitutes an unauthorized reproduction of the original copyrighted work, and that the model itself is a compressed copy of those billions of images.
Prior to 2022, copyright disputes around AI largely involved specific outputs that closely replicated existing works—for example, a music-generating AI producing a melody uncannily similar to a pop song. The Stability AI case goes further by challenging the training process itself, arguing that even if the model never reproduces the original image exactly, the act of storing a mathematical representation derived from that image still infringes copyright. This is a novel legal theory, and it hinges on whether the court views intermediate copies (made during training) as copyright violations similar to downloading an image from the web.
The complaint also introduces the concept of "style mimicry" as a distinct harm. The plaintiffs claim that Midjourney and DreamUp (DeviantArt's AI tool) allow users to generate images in the style of specific living artists by referencing their names in prompts—like "in the style of Sarah Andersen"—without the artist's permission. While style itself is not copyrightable under current U.S. law, the argument here is that the model learned to mimic it by training on copyrighted works, which may constitute a form of derivative creation.
To understand the legal stakes, you need to grasp how a model like Stable Diffusion actually works. The training process does not store images as files. Instead, it analyzes patterns across billions of images, compressing those patterns into a set of mathematical weights. When you prompt the model, it reconstructs an image from random noise guided by those weights—never by retrieving a stored copy.
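To make the mechanics concrete, here is a deliberately simplified sketch of the reverse-diffusion sampling loop. The `model` argument stands in for the trained denoising network, and the update rule is a toy approximation; real pipelines add a noise scheduler, text conditioning, and a latent decoder.

```python
import torch

def sample(model, steps: int = 50, shape=(1, 4, 64, 64)) -> torch.Tensor:
    """Toy reverse-diffusion loop: no stored image is ever retrieved."""
    x = torch.randn(shape)               # start from pure Gaussian noise
    for t in reversed(range(steps)):
        # The trained weights predict the noise present at step t;
        # real samplers apply scheduler-specific update formulas here.
        predicted_noise = model(x, t)
        x = x - predicted_noise / steps  # strip away a fraction of the noise
    return x                             # a latent, decoded to pixels afterwards
```

The structure is what matters legally: generation starts from random noise and is steered by learned weights, never by looking up a stored training image.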
Stability AI's defense leans heavily on the fact that the model is not a database of images. In legal filings, the company has argued that the training process is akin to a human artist studying thousands of paintings to learn a style, then creating original works. The plaintiffs counter that this analogy fails because a model can memorize and regurgitate near-identical copies of some training images—as demonstrated by researchers at Google and elsewhere, who extracted recognizable training images from diffusion models by repeatedly sampling with captions drawn from the training set and clustering near-duplicate outputs. The key nuance: these extractions require targeted, adversarial techniques, not standard prompting.
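A rough sketch of that extraction idea, assuming the Hugging Face `diffusers` and `imagehash` packages are installed (the model ID, caption, sample count, and duplicate threshold below are illustrative, not the researchers' exact setup):

```python
import itertools
import imagehash
import torch
from diffusers import StableDiffusionPipeline

# Load a public Stable Diffusion checkpoint (illustrative model ID).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A caption suspected to appear verbatim in the training data.
caption = "a portrait of ..."
samples = [pipe(caption).images[0] for _ in range(16)]

# Memorized images tend to reappear nearly pixel-for-pixel across samples,
# so near-duplicate generations (tiny perceptual-hash distance) are a red flag.
hashes = [imagehash.phash(img) for img in samples]
suspects = [
    (i, j) for i, j in itertools.combinations(range(len(samples)), 2)
    if hashes[i] - hashes[j] < 5  # Hamming-distance threshold (assumed)
]
print(f"{len(suspects)} near-duplicate pairs; possible memorization")
```

In the published work, candidates flagged this way were then checked against actual training images; the sketch covers only the generate-and-cluster step.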
Another critical technical detail is that not all training data is treated equally. The LAION dataset includes images under various licenses—some public domain, some Creative Commons, some scraped from social media or personal portfolios without any explicit permission. The legal status of each subset varies wildly. For instance, a photograph shared on Flickr under a non-commercial CC license may be legally usable for research but not for a commercial product like Midjourney's paid subscription tiers. The lawsuits do not distinguish between these categories; they treat all scraped images as infringing unless proof of explicit consent exists.
Stability AI, Midjourney, and DeviantArt have taken notably different stances, reflecting their different business models and legal resources.
As the company behind the open-source Stable Diffusion, Stability AI has argued that releasing the model weights under a permissive license means it is not responsible for how users apply them. However, the lawsuit targets the company's own training and distribution of the model, not the actions of downstream users. Stability AI moved to dismiss the case, arguing that the model does not store copies of copyrighted images and that the plaintiffs failed to identify specific infringed works. In October 2023, the court partially denied this motion, allowing core claims to proceed, including direct copyright infringement over the training process itself.
Midjourney, which is not open-source, has disclosed less about its training data. The lawsuit claims that the company used the LAION dataset but augmented it with additional scraped images from platforms like ArtStation and Pinterest. Midjourney's legal team has argued that the plaintiffs lack standing because they cannot prove their specific works were included in the training set. The artist Karla Ortiz, however, has presented evidence that her artworks appear in the LAION dataset, which the company has not contested. Midjourney has attempted to shift focus to the transformative use argument, claiming that outputs are "new" creative products.
DeviantArt, a long-established platform for artists, faced unique criticism because it launched DreamUp, an AI generator integrated directly into its platform. The lawsuit alleges that DreamUp was trained on user-uploaded artwork without explicit consent, even when those uploads were clearly tagged with copyright notices. DeviantArt responded by introducing an "opt-out" system that lets artists request their images be excluded from future training—but crucially, this opt-out only applied to images uploaded after a certain date. Thousands of artists who had maintained portfolios on the site for years found their work included without any say. DeviantArt also argued that the case should be dismissed because the plaintiffs had agreed to terms of service permitting the platform to use submitted content—a claim the court met with skepticism, since those terms were not explicitly tied to AI training.
As of mid-2024, the case remains in pretrial motions, but several pivotal decisions have been made. Most notably, the court's October 2023 order on the motions to dismiss trimmed several claims with leave to amend while allowing the core direct infringement claim over Stability AI's training process to proceed.
Regardless of the final outcome, the litigation has already reshaped how the AI industry operates and how artists protect their work.
If you are a visual artist concerned about your work being used, several concrete steps can reduce your exposure. First, check whether your work appears in the LAION dataset using tools like haveibeentrained.com, which lets you search the dataset by image or keyword. If it does, you can request removal from future dataset versions, though the dataset maintainers have no legal obligation to comply in most jurisdictions. Second, consider adding a clear copyright watermark to high-resolution versions of your work, even if you share lower-resolution previews online; while this does not prevent scraping, it makes it harder for AI trainers to claim ignorance (a minimal watermarking sketch follows below). Third, if you use platforms like DeviantArt or ArtStation, regularly review their terms of service for changes to AI-training policies and the opt-out mechanisms available.
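As a sketch of the watermarking step, here is one way to stamp a semi-transparent copyright notice onto an image with Pillow (the file names, notice text, and placement are placeholders to adapt):

```python
from PIL import Image, ImageDraw, ImageFont

def watermark(src: str, dst: str, notice: str = "© 2024 Your Name") -> None:
    """Overlay a semi-transparent copyright notice in the lower-right corner."""
    base = Image.open(src).convert("RGBA")
    overlay = Image.new("RGBA", base.size, (0, 0, 0, 0))
    draw = ImageDraw.Draw(overlay)
    font = ImageFont.load_default()  # swap in a real TTF font for production use
    # Measure the text so it can be anchored to the corner with a margin.
    left, top, right, bottom = draw.textbbox((0, 0), notice, font=font)
    x = base.width - (right - left) - 20
    y = base.height - (bottom - top) - 20
    draw.text((x, y), notice, font=font, fill=(255, 255, 255, 128))
    Image.alpha_composite(base, overlay).convert("RGB").save(dst)

watermark("artwork_full_res.png", "artwork_watermarked.jpg")
```

The notice does not stop scraping, but it travels with every copy of the file and undercuts any later claim of innocent use.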
For small AI startups building on open-source models, the case offers a warning: training on scraped data carries legal risk, even if the model's weights are publicly available. A practical alternative is to train only on curated datasets with clear licenses—such as images from the OpenImages dataset or works released under Creative Commons Zero (CC0) license. Some companies now license training data directly from stock photo agencies or partner with artists for consent. Another approach is to use fine-tuning on smaller, private datasets rather than relying solely on LAION-style scraping. While this reduces the model's versatility, it dramatically lowers infringement liability.
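One lightweight way to enforce that policy is to filter a dataset's metadata down to permissively licensed entries before downloading anything. A minimal sketch with pandas, assuming a metadata export with hypothetical `url` and `license` columns:

```python
import pandas as pd

# Hypothetical metadata export: one row per image with its declared license.
meta = pd.read_csv("dataset_metadata.csv")  # assumed columns: url, license

# Keep only licenses that clearly permit commercial training use.
ALLOWED = {"cc0", "public-domain"}
usable = meta[meta["license"].str.lower().isin(ALLOWED)]

print(f"Keeping {len(usable)} of {len(meta)} images")
usable.to_csv("train_manifest.csv", index=False)  # feed this to your downloader
```

Declared license fields can be wrong or stale, so teams with the budget pair this kind of filtering with the direct licensing deals mentioned above.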
Several key issues remain unsettled, and their resolution will likely shape the next decade of AI development.
First, the question of whether AI-generated images can themselves be copyrighted remains open. The U.S. Copyright Office has stated that works created entirely by AI without human creativity are not eligible for copyright, though images with significant human input (such as substantial editing or creative arrangement; prompts alone generally do not qualify) may be. Second, the legal definition of "transformative use" in the context of AI training is still unclear. Traditional fair use analysis weighs whether a secondary use adds new expression or meaning, but an AI model does not "express" meaning; it generates outputs from learned patterns. Third, international differences loom: the EU's copyright rules let rightsholders opt out of text-and-data mining, the EU AI Act adds transparency obligations for training data, and Chinese courts have begun issuing their own rulings, including a 2023 Beijing Internet Court decision recognizing copyright in an AI-assisted image.
This is a rapidly evolving area where court decisions, not just settlement announcements, matter. Follow docket filings for Andersen v. Stability AI Ltd. (Northern District of California) if you want the most accurate updates; most legal analysis in the press is based on these filings. If you are an artist considering joining the class action, consult an attorney who specializes in intellectual property, as windows for participation may be limited. For developers, monitor updates to the LAION dataset: the organization has announced plans to introduce consent-based filtering, but implementation is slow.
The single move worth making: assume that any image you post publicly online can and will be scraped for AI training unless you take deliberate measures to prevent it. This does not mean you should stop sharing your work—but it does mean understanding the trade-offs. Watermarking consistently, monitoring terms of service, and supporting organizations like the American Society for Collective Rights Licensing in their push for AI-specific legislation are concrete steps you can take today. The outcome of this lawsuit will not settle every question, but it will set a precedent that determines whether artists have a meaningful right to say no to being a training data point.