When an artist spends weeks perfecting a digital painting only to see an AI generate a strikingly similar image in seconds, the frustration is visceral. But the legal battle is more than just bruised egos—it’s a showdown over the economic future of creative work. Since 2023, a wave of class-action lawsuits and corporate legal actions has swept through the U.S. and UK courts, challenging the core assumption that AI companies can scrape the internet for training data without paying creators. This article unpacks the specific legal theories being tested, the outcomes that have already shifted industry behavior, and the practical implications for anyone building or using generative AI tools today.
The fundamental argument in almost every AI copyright case boils down to a single question: is using copyrighted images, text, or music to train a generative model a form of infringement, or is it protected fair use? Courts have yet to settle the answer. Plaintiffs argue that each image in a training set is copied into the model during training, and that the model’s output can reproduce the “style” or even exact elements of the original work without attribution.
Under US copyright law, the fair use defense (Section 107 of the Copyright Act) weighs four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect on the market for the original. AI companies argue their use is “transformative” (a new creation, not a replica) and that training is an intermediate technical process. Artists counter that scraping millions of images without opt-in is not transformative; it is commercial exploitation of their intellectual property.
A key nuance often missed: even if training itself is deemed fair use, generating outputs that mimic a specific artist’s recognizable style could still draw infringement claims if those outputs are commercialized. The US Copyright Office’s 2023 guidance stated that AI-generated works can be registered only to the extent a human made sufficient creative contributions, further muddying the waters for derivative disputes.
Not all lawsuits are created equal. Some have been dismissed early, while others have survived motions to dismiss and are moving toward discovery, where internal model training details will be exposed.
This is arguably the most concrete case to follow. Getty Images initiated proceedings in the UK High Court in January 2023 and filed a parallel suit in Delaware the following month, alleging that Stability AI copied more than 12 million watermarked photographs from Getty’s database to train Stable Diffusion. Getty’s position is unusually strong because it owns the metadata and licensing records for those images, which makes each individual infringement easier to prove. The UK case cleared a preliminary hurdle in late 2023, when the judge refused Stability AI’s application to strike out Getty’s copyright claims. Stability AI counters that the images were publicly accessible and that the model does not “store” them in any conventional sense.
Filed in January 2023 by three visual artists (Sarah Andersen, Kelly McKernan, and Karla Ortiz), this class action targets the “large-scale, systematic copying” of hundreds of millions of images. A critical ruling came in October 2023, when US District Judge William Orrick dismissed most claims but allowed the direct copyright infringement claim against Stability AI to proceed, because the artists plausibly alleged that their works were included in the LAION-5B dataset used to train Stable Diffusion. The case is now in discovery, and industry watchers expect it to reveal whether the model can reconstruct near-exact copies of training images, which would severely undermine the fair use defense.
Although not a visual-arts case, the New York Times lawsuit (filed December 2023) could set the precedent for text-based generative AI. The Times provided dozens of examples in which ChatGPT regurgitated near-verbatim paragraphs from paywalled articles. If that argument succeeds, it could force AI companies to license news content, directly affecting tools like ChatGPT and Google’s Gemini. The case is in its early stages, but OpenAI has already struck licensing deals with Axel Springer and the Associated Press, signaling that big publishers may settle and leave individual creators without leverage.
These lawsuits are not abstract legal debates—they’re already changing how generative AI tools are built, marketed, and used. For those deploying AI in creative workflows, three shifts are already visible.
Stability AI launched an opt-out system in late 2023, allowing artists to submit their work to a database that future training runs will exclude. Critics note, however, that the process requires artists to know the system exists and to submit each image manually. In contrast, Adobe’s Firefly was trained on licensed Adobe Stock content and public domain works, and Adobe has agreed to indemnify enterprise users against copyright claims. That makes Firefly a safer choice for commercial outputs, though its stylistic range is narrower than Stable Diffusion’s.
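To make the mechanics concrete, here is a minimal sketch of how a training pipeline might honor an opt-out registry before ingesting images. The registry file name, its one-hash-per-line format, and the use of exact SHA-256 file hashes are all assumptions for illustration; real opt-out services such as Spawning’s expose their own lookup APIs, and a production system would use perceptual hashing to catch re-encoded copies.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the raw file bytes; a stand-in for a robust perceptual hash."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def load_optout_registry(registry_path: Path) -> set[str]:
    """Assumed format (hypothetical): one hex digest per line."""
    return {line.strip() for line in registry_path.read_text().splitlines() if line.strip()}


def filter_training_set(image_dir: Path, registry_path: Path) -> list[Path]:
    """Return only images whose hashes do not appear in the opt-out registry."""
    opted_out = load_optout_registry(registry_path)
    kept = []
    for img in sorted(image_dir.glob("*.png")):
        if sha256_of(img) in opted_out:
            print(f"excluded (opted out): {img.name}")
        else:
            kept.append(img)
    return kept


if __name__ == "__main__":
    kept = filter_training_set(Path("dataset"), Path("optout_hashes.txt"))
    print(f"{len(kept)} images eligible for training")
```

The weakness critics point to is visible even in this toy version: the burden sits entirely with artists to get their work into the registry in the first place.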
Shutterstock and Getty Images have both launched generative AI tools trained only on licensed content. Shutterstock’s generator, powered by OpenAI’s DALL-E, pays contributing artists when their work is used in training or generation. Getty’s tool, developed with Nvidia, draws exclusively on Getty’s own licensed creative library. For a small business, these licensed tools eliminate the risk of accidentally generating a derivative of Harry Potter or a Disney character, the kind of output that can trigger legal takedown notices.
The EU’s AI Act, whose obligations phase in through 2026, will require providers of general-purpose models to publish a “sufficiently detailed summary” of the copyrighted data used in training. That will make future lawsuits easier to mount, because plaintiffs will be better able to establish that their works were in a training set. For practitioners, it suggests that training custom models on controlled datasets (e.g., your own photos) will become more attractive than relying on opaque, internet-scraped models.
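If disclosure summaries become mandatory, maintaining per-file provenance records from day one will be far cheaper than reconstructing them under subpoena. The sketch below assumes a simple sidecar convention, invented for illustration, in which each `image.png` has a matching `image.json` recording its source and license:

```python
import json
from collections import Counter
from pathlib import Path


def summarize_provenance(image_dir: Path) -> dict:
    """Aggregate per-image sidecar metadata into a disclosure-style summary."""
    licenses: Counter = Counter()
    missing = []
    for img in sorted(image_dir.glob("*.png")):
        sidecar = img.with_suffix(".json")  # assumed convention: foo.png -> foo.json
        if sidecar.exists():
            meta = json.loads(sidecar.read_text())
            licenses[meta.get("license", "unspecified")] += 1
        else:
            missing.append(img.name)
    return {"license_breakdown": dict(licenses), "undocumented_files": missing}


if __name__ == "__main__":
    print(json.dumps(summarize_provenance(Path("dataset")), indent=2))
```

The particular schema matters less than adopting one early; the `undocumented_files` list is exactly what a regulator or opposing counsel will ask about.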
Both sides of this conflict often fall into predictable traps. Understanding these can save time and legal exposure.
By mid-2024, several trends are solidifying that will define the next three years. First, the supply of foundation models trained openly on internet-scraped data will shrink. Meta released Llama 3 under a license that prohibits certain commercial uses, and OpenAI has not disclosed the training data for GPT-4. Expect more “black box” models with limited transparency.
Startups like Bria and Spawning are building AI image generators from pools of work that artists have explicitly opted in to. Spawning’s tooling aims to let creators set a price for the use of their work, potentially creating a new revenue stream. If courts ultimately rule against training on scraped data, these licensed pools may become the only legally safe option for commercial use.
Some large media companies, such as Condé Nast and The Atlantic, have already signed contracts with AI firms that specify how AI can be used in content creation. These contracts typically include clauses that require human final approval, limit the number of AI-generated paragraphs per article, and forbid using the AI to plagiarize staff writers. Independent creators may want to explore similar terms in their agreements with platforms like Medium or Substack.
In the US, the Generative AI Copyright Disclosure Act, introduced in April 2024, would require AI companies to file a notice with the Copyright Office disclosing the copyrighted works used in training. It has drawn strong support from creator groups but is not yet law. If passed, it would effectively kill the argument that training data is a trade secret, giving artists a direct way to demand compensation.
Whether you are an artist trying to protect your work or a developer building tools, the safest path today is to choose licensed platforms and document your usage. If you are an artist, register your best works with the US Copyright Office before the next wave of model releases—registration is a prerequisite for suing for statutory damages. If you are a developer, audit your training data: if you cannot prove every image was either public domain, explicitly licensed, or generated by you, you are sitting on a legal liability. Start building a pipeline that filters out copyrighted material now, before the courts force you to. The AI copyright crackdown is not slowing down; it is the single factor that will decide whether generative AI becomes a sustainable industry or a legal minefield.
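As a starting point for that audit, here is a sketch that sorts a dataset into documented and undocumented material. The category names and the sidecar-metadata convention are illustrative assumptions (the same hypothetical `.json` sidecars as above), not a legal standard; what actually satisfies a court is a question for counsel.

```python
import json
from pathlib import Path

# Categories treated as "provable" for this sketch only (assumed labels,
# not legal terms of art).
SAFE_LICENSES = {"public_domain", "explicitly_licensed", "self_generated"}


def audit_dataset(image_dir: Path) -> tuple[list[Path], list[Path]]:
    """Split a dataset into (documented, needs_review) using sidecar metadata."""
    documented, needs_review = [], []
    for img in sorted(image_dir.glob("*.png")):
        sidecar = img.with_suffix(".json")
        if not sidecar.exists():
            needs_review.append(img)  # no provenance record at all
            continue
        license_tag = json.loads(sidecar.read_text()).get("license", "")
        (documented if license_tag in SAFE_LICENSES else needs_review).append(img)
    return documented, needs_review


if __name__ == "__main__":
    documented, needs_review = audit_dataset(Path("dataset"))
    print(f"{len(documented)} documented, {len(needs_review)} need review before training")
```

Even a crude gate like this, run before every training job, turns “we think our data is clean” into a record you can actually show in discovery.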