Mark Zuckerberg is alleged to have approved Meta’s use of “pirated” versions of copyright-protected books to train the company’s artificial intelligence models, according to a court filing in the U.S. by a group of authors. The filing cites internal communications from Meta that suggest the chief executive supported the use of the LibGen dataset, which is known to include pirated material, despite warnings from his own AI executive team regarding the legality of its use.

Concerns Over Legal Compliance

Internal messages indicated that relying on a dataset containing pirated material could affect Meta’s negotiations with regulators. One internal note stated: “Media coverage suggesting we have used a dataset we know to be pirated… may undermine our negotiating position with regulators.”

The Lawsuit Against Meta

The plaintiffs include the author Ta-Nehisi Coates and the comedian Sarah Silverman. They are suing Meta for copyright infringement, asserting that their works were improperly used to train Llama, the large language model that underpins Meta’s chatbots. Library Genesis, or LibGen, is described as a “shadow library” that originated in Russia and claims to house millions of literary works and scientific journals.

Legal Precedents

Over the past year, attention has grown on the intersection of copyright and AI, particularly the use of copyrighted content to train generative AI tools such as ChatGPT. LibGen itself has faced legal action: a New York federal court ordered its unidentified operators to pay $30 million in damages for copyright violations. The ruling underscores the tensions surrounding the sources of AI training data.

Internal Discussions and Legal Proceedings

The court filings suggest that, after the matter was escalated to Zuckerberg, Meta’s AI team was cleared to proceed with the LibGen data. The team had reportedly expressed reservations about accessing LibGen files, saying that downloading pirated data from a Meta corporate laptop “doesn’t feel right.”

Judicial Developments

In a ruling last year, U.S. District Judge Vince Chhabria dismissed certain copyright infringement claims related to text generated by Meta’s AI models, but he permitted the plaintiffs to revise their claims in light of new evidence. During a hearing on Thursday, Chhabria indicated that he was willing to let the writers file an amended complaint, but he cast doubt on the viability of their claims concerning copyright management information (CMI) and fraud.

Meta has been contacted for comment on the allegations. The case continues, with implications for copyright, AI ethics, and regulatory scrutiny in the tech industry.