Meta Platforms, the parent company of Facebook and Instagram, is facing a consolidated copyright lawsuit. A group of authors, including comedian Sarah Silverman and Pulitzer Prize winner Michael Chabon, have combined their lawsuits against the company. They allege that Meta used their copyrighted books without permission to train its artificial intelligence language model, Llama.
Details of the lawsuit
The lawsuit, filed on December 11, intensifies scrutiny of Meta’s practices in developing AI technologies. A critical component of the filing is a set of chat logs from a Meta-affiliated researcher. The logs, which record discussions in a Discord server, suggest that Meta was aware its use of the books might violate U.S. copyright law. In them, researcher Tim Dettmers, a doctoral student at the University of Washington, is quoted discussing the legal implications of using copyrighted materials as training data for AI models.
In 2021, Dettmers mentioned discussions with Meta’s legal department regarding the legality of using book files for training. The logs reveal that Meta’s lawyers had expressed concerns about using such data, indicating a potential awareness of the legal risks involved.
Impact on the AI industry
This lawsuit comes amidst a growing number of legal challenges faced by tech companies over their use of copyrighted content to train generative AI models. These models, which have gained global attention and spurred substantial investment, are being scrutinized for their data-sourcing practices. The outcome of these cases could significantly influence the generative AI landscape, potentially increasing the costs of developing AI models by requiring compensation for content creators.
Moreover, emerging AI regulations in Europe could compel companies to disclose their training data, further exposing them to legal risks. This legal environment is a growing concern for AI developers and the wider tech industry.
Meta’s Llama models and training data disclosure
Meta released the first version of its Llama language model in February, detailing the datasets used for its training, which included the “Books3 section of ThePile.” This dataset reportedly contains 196,640 books. However, for its latest version, Llama 2, released for commercial use in the summer, Meta has not disclosed the training data used.
Llama 2, offered freely to companies with fewer than 700 million monthly active users, has been seen as a potential disruptor in the generative AI software market. It poses a challenge to established players like OpenAI and Google, which charge for the use of their models.
The lawsuit against Meta Platforms highlights the complex legal and ethical issues surrounding AI development. As AI technologies become more advanced and more integral to various industries, responsible sourcing of training data matters more. The outcome of this lawsuit could set a significant precedent for how AI models are trained and for the balance between innovation and copyright protection.
Meta’s response to these allegations and the legal decisions that follow will be closely watched by the tech community and content creators alike. This case underscores the need for clear legal frameworks and ethical guidelines in the rapidly evolving field of artificial intelligence.