A hot potato: More than 8,000 authors including luminaries such as James Patterson, Margaret Atwood, and Jonathan Franzen have signed an open letter asking leaders from the top six AI companies to not use their work for training models without first obtaining consent and offering compensation.
The letter, published by professional writers' organization The Authors Guild, is addressed to the bosses of OpenAI, Alphabet, Meta, Stability AI, IBM, and Microsoft. It calls out the CEOs over the "inherent injustice" in using the authors' works to train their large language models without consent, credit, or compensation.
"These technologies mimic and regurgitate our language, stories, style, and ideas. Millions of copyrighted books, articles, essays, and poetry provide the 'food' for AI systems, endless meals for which there has been no bill," the letter states.
"You're spending billions of dollars to develop AI technology. It is only fair that you compensate us for using our writings, without which AI would be banal and extremely limited."
It's also claimed that many of the book texts that AI systems are trained on come from notorious piracy websites.
NPR writes that an incoming report from The Authors Guild reveals incomes for writers have declined by 42% between 2009 and 2019, with the median income for a full-time writer last year down to $23,000. With generative AIs such as ChatGPT and Bard adding to their pressure, and some companies already replacing workers with these systems, it's easy to understand where the anger comes from.
Mary Rasenberger, CEO of the Authors Guild, said the intention of the letter was to convince the AI companies to settle with the authors without going down the expensive and lengthy lawsuit route. Not that all authors are avoiding legal action: Sarah Silverman, Paul Tremblay and Mona Awad are plaintiffs in class action suits against Meta and/or OpenAI for training their programs on pirated copies of their work.
OpenAI said in a statement (via the Wall Street Journal) that ChatGPT is trained on "licensed content, publicly available content, and content created by human AI trainers and users," adding that the company respects the rights of creators and authors.
It's not just authors whose work is being used for AI training. Google updated its privacy policy earlier this month to explicitly state that the company reserves the right to collect and analyze pretty much anything people share on the web to train its AI systems.
The scraping of text by AI companies is a contentious issue right now. Elon Musk said Twitter limited the number of tweets accounts could read per day to allegedly address "extreme levels" of data scraping and "system manipulation" on the platform. He also threatened to sue Microsoft, which has invested billions into OpenAI, for illegally using Twitter data.
Reddit has also faced a slew of troubles since turning off free access to its APIs to stop data harvesting. The move resulted in over 8,000 subreddits going dark in protest and some switching to NSFW.