Big tech has been stealthily training its AI models. Creatives are finally waking up to the dangers. Are they too late?

More than 20,000 artists, writers, composers and other cultural creatives are objecting to unlicensed scraping of their work in the AI space race

ChatGPT: John Grisham and George RR Martin are among the authors suing OpenAI. Photograph: Arsenii Vaselenko/NYT

“The unlicensed use of creative works for training generative AI is a major, unjust threat to the livelihoods of the people behind those works, and must not be permitted.”

The statement above has already been signed by more than 20,000 artists, writers, composers and other cultural creatives since it was launched by the composer and former AI executive Ed Newton-Rex. Signatories include the actor Julianne Moore, the musicians Thom Yorke and Max Richter, and the novelists Kazuo Ishiguro and Ann Patchett.

It’s the latest act in what may turn out to have been a belated campaign to stop tech companies mining the entirety of human creative output to “train” their large language models, or LLMs. For several years they’ve been stealthily doing this on an ask-questions-later basis. More recently, big media companies such as the New York Times have launched civil proceedings for breach of copyright. In the United States John Grisham and George RR Martin are among a group of authors suing OpenAI, the maker of ChatGPT, while Sony Music, Universal Music Group and Warner Music Group are suing the AI-music creators Suno and Udio. Those cases may or may not succeed. But expect plenty more of them.

A California court recently dismissed a claim of vicarious copyright infringement brought by the writer and comedian Sarah Silverman and the novelist Paul Tremblay against OpenAI, on the basis that the authors had not shown a “substantial similarity” between their books and ChatGPT’s output. The authors’ claim that ChatGPT’s outputs are “infringing derivative work” was deemed insufficient. The judge did, however, rule that OpenAI should continue to face an allegation that it had violated unfair-competition law by using copyrighted books without their authors’ permission.

That seems to be the crux of the matter, and it is what is being debated not just in courtrooms but also in the corridors of political power. What Yorke, Moore, Ishiguro and the rest are protesting against is not straightforward, old-fashioned plagiarism or theft. ChatGPT and the other LLMs are not producing simulacrums of their works. They are mining them in order to turn out infinite variations that draw on their themes and styles.

You might think it pretty cut and dried that they should not be able to do so without the prior permission of the authors. Not necessarily.

The Financial Times reported last week that the UK government is debating whether to introduce regulations that would permit tech companies to scrape creative content from any source that had not actively opted out of being used in this way. Content creators argue that this would effectively give carte blanche to the scrapers, as it would be difficult to know how to manage opt-outs, which would become an unsustainable burden for smaller creators. Far better, they argue, to have an opt-in system where the artificial-intelligence companies have to apply for the right to use specific content and can then negotiate appropriate rates.

“It’s totally unfair to put the burden of opting out of AI training on the creator whose work is being trained on,” Newton-Rex told the Guardian this week. “If a government really thought this was a good thing for creators then it would create an opt-in scheme.”

According to the report, however, UK ministers will soon unveil a consultation paper that favours the AI companies’ preferred option.

The European Union’s new AI Act, which came into force in August, also allows companies to mine content so long as rights holders have not expressly denied permission.

All this takes place against a broader backdrop of authorities in the EU and UK alike being increasingly fearful of being left behind in what has rapidly become an AI space race, and of the big US-based companies being unafraid to exert their financial muscle to get what they want.

There is still a lot to play for, however. In the EU the granular detail of how the AI Act will be implemented remains to be worked out. One element worth keeping an eye on is a requirement that the companies provide “detailed summaries” of the data used to train their models. In theory, content creators who discovered that their work had been used to train an AI model could seek compensation, although this will undoubtedly be tested in court. OpenAI has drawn criticism for refusing to divulge the content it has used, citing commercial confidentiality.

Maximilian Gahntz, AI policy lead at the non-profit Mozilla Foundation, told Reuters that companies are “going out of their way to avoid transparency. The AI Act presents the best chance to shine a light on this crucial aspect and illuminate at least part of the black box.”