The House of Lords’ Communications and Digital Committee has addressed copyright concerns in the development of large language models (LLMs). The session shed light on the clash between AI companies and creators over ownership of proprietary work and the alleged infringement of copyrighted content by LLMs.
During a session chaired by Baroness Stowell, expert witnesses, including Dan Conway, CEO of the Publishers Association, highlighted widespread copyright concerns in the development of LLMs. Conway argued that LLMs, despite their potential benefits, are not being developed responsibly, ethically, or in compliance with intellectual property (IP) law. He cited instances of copyright infringement at scale, such as LLMs trained on the Books3 dataset, which contains pirated titles.
Conway’s assertion aligns with ongoing class actions in the United States, indicating a broader debate within the industry. Some tech giants, including Google, Amazon, and Meta, claim to seek compliance with IP laws. However, the ambiguity of the term “compliance” and the lack of transparency among developers raise questions about the extent to which these companies adhere to copyright regulations.
The legal landscape: copyright and AI development
Dr. Hayleigh Bosher, a copyright expert from Brunel University London, clarified the legal standpoint of the Publishers Association, emphasizing that AI developers need licenses for activities involving copyrighted content. She differentiated between reading a book for personal benefit and AI ‘reading’ datasets for commercial purposes, arguing that the latter typically requires a license.
While some argue for fair-use conventions and the protection of academic inquiry, Bosher maintained that copyright law is technologically neutral and applies equally across different circumstances. She acknowledged that not all AI companies use unlicensed data, but noted that the lack of transparency makes it difficult to verify whether proper permissions have been obtained.
Jurisdictional challenges: US vs. UK laws
The debate extends to jurisdictional differences, with companies like Google, Amazon, and Meta asserting compliance within the U.S. legal framework. Richard Mollet of RELX pointed out that while some erroneously argue that U.S. law permits fair use of unlicensed content, UK and EU laws clearly state that commercial entities reproducing copyrighted works for text and data mining must obtain permission from rights holders.
The complexity of U.S. copyright law, particularly regarding fair use, adds another layer of uncertainty. The question of what counts as a ‘transformative’ use is a potential legal battleground for generative AI.
A balancing act or obstruction to innovation?
Addressing concerns about copyright being an obstacle to innovation, Dr. Bosher dismissed the notion, citing the withdrawal of a proposal for broad IP law exceptions for data mining in AI and machine learning. She argued that copyright’s purpose is to encourage creativity and innovation while balancing protection and limitations.
Richard Mollet, representing RELX as both a traditional publisher and an AI user, highlighted the importance of copyright for maintaining data quality and incentivizing the creation of high-quality data. He emphasized that transparency in the AI development process is crucial for ensuring trust in the outputs.
Billion-dollar investments and copyright liability
A written note from venture capital firm Andreessen Horowitz raised eyebrows by suggesting that AI companies’ massive investments were premised on the understanding that copyright allows copying for the purpose of extracting statistical facts. The note implied that developers had assured investors that copyright challenges would be minimal. It also pointed out, however, that under any licensing framework requiring substantial payments to rights holders, AI developers could face astronomical liabilities.
The focus on ‘statistical’ facts and the reluctance to pay substantial royalties raise ethical questions about the AI industry’s approach to compensating creators for their work.
As the Lords Committee delves into the intersection of AI development and copyright, it becomes evident that the issues are multifaceted. Balancing innovation, copyright protection, and fair compensation for creators presents a challenging landscape. The ongoing debate underscores the need for clear regulation, transparency in AI development, and a delicate balance between fostering innovation and protecting intellectual property rights.