When an artist spends weeks perfecting a digital painting only to see an AI generate a strikingly similar image in seconds, the frustration is visceral. But the legal battle is more than just bruised egos—it’s a showdown over the economic future of creative work. Since 2023, a wave of class-action lawsuits and corporate legal actions has swept through the U.S. and UK courts, challenging the core assumption that AI companies can scrape the internet for training data without paying creators. This article unpacks the specific legal theories being tested, the outcomes that have already shifted industry behavior, and the practical implications for anyone building or using generative AI tools today.
The fundamental argument in almost every AI copyright case boils down to a single question: is using copyrighted images, text, or music to train a generative model a form of infringement, or is it protected fair use? Courts have yet to settle the answer. Plaintiffs argue that each image in a training set is copied into the model during training, and that the model’s output can reproduce the “style” or even exact elements of the original work without attribution.
Under US copyright law, the fair use defense (Section 107 of the Copyright Act) weighs four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the market for the original. AI companies argue their use is “transformative” (a new creation, not a replica) and that training is an intermediate technical process. Artists counter that scraping millions of images without opt-in is not transformative; it is commercial exploitation of their intellectual property.
A key nuance often missed: even if training itself is deemed fair use, generating outputs that mimic a specific artist’s recognizable style could still draw infringement claims if those outputs are commercialized. The US Copyright Office’s 2023 guidance stated that AI-generated works can be registered only to the extent a human made sufficient creative contributions, further muddying the waters for derivative disputes.
Not all lawsuits are created equal. Some have been dismissed early, while others have survived motions to dismiss and are moving toward discovery, where internal model training details will be exposed.
This is arguably the most concrete case to follow. Getty Images initiated proceedings in the UK High Court in January 2023 and filed a parallel suit in Delaware the following month, alleging that Stability AI copied more than 12 million watermarked photographs from Getty’s database to train Stable Diffusion. Getty’s position is unusually strong because it owns the metadata and licensing records for those images, which makes each individual infringement easier to prove. The UK case cleared a preliminary hurdle in late 2023, when the judge refused Stability AI’s application to strike out Getty’s copyright claims. Stability AI counters that the images were publicly accessible and that the model does not “store” them in any conventional sense.
Filed in January 2023 by three visual artists (Sarah Andersen, Kelly McKernan, and Karla Ortiz), this class action targets the “large-scale, systematic copying” of hundreds of millions of images. A critical ruling came in October 2023, when US District Judge William Orrick dismissed most claims but allowed the direct copyright infringement claim against Stability AI to proceed, because the artists plausibly alleged that their works were included in the LAION-5B dataset used to train Stable Diffusion. The case is now in discovery, and industry watchers expect it to reveal whether the model can reconstruct near-exact copies of training images, which would severely undermine the fair use defense.
Although not a visual-arts case, the New York Times lawsuit (filed December 2023) could set the precedent for text-based generative AI. The Times provided dozens of examples in which ChatGPT regurgitated near-verbatim paragraphs from paywalled articles. If that argument succeeds, it could force AI companies to license news content, directly affecting tools like ChatGPT and Google’s Gemini. The case is in its early stages, but OpenAI has already struck licensing deals with Axel Springer and the Associated Press, signaling that big publishers may settle and leave individual creators without leverage.
These lawsuits are not abstract legal debates—they’re already changing how generative AI tools are built, marketed, and used. For those deploying AI in creative workflows, three shifts are already visible.
Stability AI launched an opt-out system in late 2023, allowing artists to submit their work to a database that future training runs will exclude. Critics note, however, that the process requires artists to know the system exists and to submit each image manually. In contrast, Adobe’s Firefly was trained on licensed Adobe Stock content and public domain works, and Adobe has agreed to indemnify enterprise users against copyright claims. That makes Firefly a safer choice for commercial outputs, though its stylistic range is narrower than Stable Diffusion’s.
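To make the mechanics concrete, here is a minimal sketch of how a training pipeline might honor an opt-out registry before ingesting images. The registry file name, its one-hash-per-line format, and the use of exact SHA-256 file hashes are all assumptions for illustration; real opt-out services such as Spawning’s expose their own lookup APIs, and a production system would use perceptual hashing to catch re-encoded copies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the raw file bytes; a stand-in for a robust perceptual hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_optout_registry(registry_path: Path) -> set[str]:
    """Assumed format (hypothetical): one hex digest per line."""
    return {line.strip() for line in registry_path.read_text().splitlines() if line.strip()}


def filter_training_set(image_dir: Path, registry_path: Path) -> list[Path]:
    """Return only images whose hashes do not appear in the opt-out registry."""
    opted_out = load_optout_registry(registry_path)
    kept = []
    for img in sorted(image_dir.glob("*.png")):
        if sha256_of(img) in opted_out:
            print(f"excluded (opted out): {img.name}")
        else:
            kept.append(img)
    return kept


if __name__ == "__main__":
    kept = filter_training_set(Path("dataset"), Path("optout_hashes.txt"))
    print(f"{len(kept)} images eligible for training")
```

The weakness critics point to is visible even in this toy version: the burden sits entirely with artists to get their work into the registry in the first place.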
Shutterstock and Getty Images have both launched generative AI tools trained only on licensed content. Shutterstock’s generator, powered by OpenAI’s DALL-E, pays contributing artists when their work is used in training or generation. Getty’s tool, developed with Nvidia, draws exclusively on Getty’s own licensed creative library. For a small business, these licensed tools eliminate the risk of accidentally generating a derivative of Harry Potter or a Disney character, the kind of output that can trigger legal takedown notices.
The EU’s AI Act, whose obligations phase in through 2026, will require providers of general-purpose models to publish a “sufficiently detailed summary” of the copyrighted data used in training. That will make future lawsuits easier to mount, because plaintiffs will be better able to establish that their works were in a training set. For practitioners, it suggests that training custom models on controlled datasets (e.g., your own photos) will become more attractive than relying on opaque, internet-scraped models.
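If disclosure summaries become mandatory, maintaining per-file provenance records from day one will be far cheaper than reconstructing them under subpoena. The sketch below assumes a simple sidecar convention, invented for illustration, in which each `image.png` has a matching `image.json` recording its source and license:

```python
import json
from collections import Counter
from pathlib import Path


def summarize_provenance(image_dir: Path) -> dict:
    """Aggregate per-image sidecar metadata into a disclosure-style summary."""
    licenses: Counter = Counter()
    missing = []
    for img in sorted(image_dir.glob("*.png")):
        sidecar = img.with_suffix(".json")  # assumed convention: foo.png -> foo.json
        if sidecar.exists():
            meta = json.loads(sidecar.read_text())
            licenses[meta.get("license", "unspecified")] += 1
        else:
            missing.append(img.name)
    return {"license_breakdown": dict(licenses), "undocumented_files": missing}


if __name__ == "__main__":
    print(json.dumps(summarize_provenance(Path("dataset")), indent=2))
```

The particular schema matters less than adopting one early; the `undocumented_files` list is exactly what a regulator or opposing counsel will ask about.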
Both sides of this conflict often fall into predictable traps. Understanding these can save time and legal exposure.
By mid-2024, several trends are solidifying that will define the next three years. First, the supply of foundation models trained openly on internet-scraped data will shrink. Meta released Llama 3 under a license that prohibits certain commercial uses, and OpenAI has not disclosed the training data for GPT-4. Expect more “black box” models with limited transparency.
Startups like Bria and Spawning are building AI image generators from pools of work that artists have explicitly opted in to. Spawning’s tooling aims to let creators set a price for the use of their work, potentially creating a new revenue stream. If courts ultimately rule against training on scraped data, these licensed pools may become the only legally safe option for commercial use.
Some large media companies, such as Condé Nast and The Atlantic, have already signed contracts with AI firms that specify how AI can be used in content creation. These contracts typically include clauses that require human final approval, limit the number of AI-generated paragraphs per article, and forbid using the AI to plagiarize staff writers. Independent creators may want to explore similar terms in their agreements with platforms like Medium or Substack.
In the US, the Generative AI Copyright Disclosure Act, introduced in April 2024, would require AI companies to file a notice with the Copyright Office disclosing the copyrighted works used in training. It has drawn strong support from creator groups but is not yet law. If passed, it would effectively kill the argument that training data is a trade secret, giving artists a direct way to demand compensation.
Whether you are an artist trying to protect your work or a developer building tools, the safest path today is to choose licensed platforms and document your usage. If you are an artist, register your best works with the US Copyright Office before the next wave of model releases—registration is a prerequisite for suing for statutory damages. If you are a developer, audit your training data: if you cannot prove every image was either public domain, explicitly licensed, or generated by you, you are sitting on a legal liability. Start building a pipeline that filters out copyrighted material now, before the courts force you to. The AI copyright crackdown is not slowing down; it is the single factor that will decide whether generative AI becomes a sustainable industry or a legal minefield.
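As a starting point for that audit, here is a sketch that sorts a dataset into documented and undocumented material. The category names and the sidecar-metadata convention are illustrative assumptions (the same hypothetical `.json` sidecars as above), not a legal standard; what actually satisfies a court is a question for counsel.

```python
import json
from pathlib import Path

# Categories treated as "provable" for this sketch only (assumed labels,
# not legal terms of art).
SAFE_LICENSES = {"public_domain", "explicitly_licensed", "self_generated"}


def audit_dataset(image_dir: Path) -> tuple[list[Path], list[Path]]:
    """Split a dataset into (documented, needs_review) using sidecar metadata."""
    documented, needs_review = [], []
    for img in sorted(image_dir.glob("*.png")):
        sidecar = img.with_suffix(".json")
        if not sidecar.exists():
            needs_review.append(img)  # no provenance record at all
            continue
        license_tag = json.loads(sidecar.read_text()).get("license", "")
        (documented if license_tag in SAFE_LICENSES else needs_review).append(img)
    return documented, needs_review


if __name__ == "__main__":
    documented, needs_review = audit_dataset(Path("dataset"))
    print(f"{len(documented)} documented, {len(needs_review)} need review before training")
```

Even a crude gate like this, run before every training job, turns “we think our data is clean” into a record you can actually show in discovery.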