OpenAI is at the centre of an intensifying debate over the use of copyrighted material in training artificial intelligence (AI) models. The recent lawsuit filed by the New York Times (NYT) against OpenAI and Microsoft has drawn attention to the pivotal role copyrighted content plays in the development of AI tools such as ChatGPT.
OpenAI, a prominent developer in the AI space, contends that creating advanced AI tools such as ChatGPT would be impossible without access to copyrighted material. Chatbots and image generators like Stable Diffusion rely on extensive datasets sourced from the internet, a considerable portion of which is protected by copyright, the legal safeguard against unauthorized use of creative works.
Last month's legal action by the NYT accused OpenAI and Microsoft of "unlawful use" of its content to build their AI products. In response, OpenAI submitted a statement to the House of Lords communications and digital select committee, asserting that large language models like GPT-4, the technology powering ChatGPT, cannot be trained effectively without leveraging copyrighted works.
OpenAI's submission emphasized the ubiquitous nature of copyright, which covers a wide array of human expression, including blog posts, photographs, forum discussions, software code snippets, and government documents. The company argued that restricting training materials to out-of-copyright books and drawings from over a century ago would yield interesting experiments but fail to produce AI systems that meet the needs of contemporary users.
In addressing the NYT lawsuit, OpenAI declared its support for journalism, highlighted its partnerships with news organizations, and dismissed the legal action as lacking merit. The company reiterated its commitment to respecting the rights of content creators and owners while asserting that copyright law does not prohibit the training of AI models.
AI companies' defense of using copyrighted material often relies on the legal concept of "fair use," which allows certain uses of content without the owner's explicit permission. In its submission, OpenAI maintained that its use of training data falls within fair use principles.
The NYT lawsuit is not the only legal challenge faced by OpenAI. In September, prominent authors, including John Grisham, Jodi Picoult, and George RR Martin, filed a lawsuit alleging "systematic theft on a mass scale." Getty Images is pursuing legal action against Stability AI, the creator of Stable Diffusion, for alleged copyright breaches, while a group of music publishers, including Universal Music, is suing Anthropic, the company behind the Claude chatbot, for alleged misuse of copyrighted song lyrics.
In its House of Lords submission, OpenAI addressed AI safety concerns and expressed support for independent analysis of its security measures. The company endorsed the practice of "red-teaming," where external researchers assess the safety of AI systems by simulating the actions of rogue actors. OpenAI has joined other companies in an agreement to collaborate with governments on safety testing their most powerful models, both before and after deployment, following a global safety summit in the UK last year.
As the legal landscape surrounding AI and copyrighted material continues to evolve, OpenAI remains at the forefront, navigating challenges and emphasizing the necessity of copyrighted content in the development of cutting-edge AI technologies.