A federal judge ruled that OpenAI must hand over 20 million de-identified ChatGPT conversation logs to plaintiffs in a sprawling copyright infringement lawsuit, rejecting the AI giant’s privacy objections.
U.S. District Judge Sidney H. Stein of the Southern District of New York denied OpenAI’s challenge to earlier discovery orders in the consolidated multidistrict litigation, which involves claims from news organizations and a proposed class of authors alleging the company trained its AI models on copyrighted works without permission.
The dispute centered on whether plaintiffs could access a sample of the tens of billions of user conversations—consisting of prompts and ChatGPT’s responses—that OpenAI stores in the ordinary course of business. OpenAI had initially proposed producing a 20-million-conversation sample, but later attempted to limit production to only those logs containing search terms related to plaintiffs’ specific works.
Magistrate Judge Ona T. Wang rejected that approach, finding that even conversations not directly reproducing plaintiffs’ works could be relevant to OpenAI’s fair use defense. OpenAI appealed to Judge Stein, arguing the ruling failed to adequately protect user privacy.
Judge Stein disagreed, distinguishing the case from precedent involving wiretapped phone calls. “Privacy interests in users’ conversations with ChatGPT, which users voluntarily disclosed to OpenAI,” are weaker than those in secretly recorded calls, he wrote.
The court found that three safeguards adequately protected user privacy: reducing the sample from billions to 20 million logs, removing personally identifiable information through de-identification, and applying the case’s existing protective order, which restricts disclosure of the logs to outside parties.
The decision was issued by the U.S. District Court for the Southern District of New York in In re OpenAI, Inc., Copyright Infringement Litigation.