OpenAI stole “massive amounts of personal data” to train ChatGPT, a lawsuit alleges.
According to the proposed class-action lawsuit, Sam Altman’s business “secretly” collected data to train its sophisticated language models, which allowed its chatbot to mimic human speech.
“Despite established protocols for the purchase and use of personal information, Defendants took a different approach: theft,” the lawyers wrote in the 157-page lawsuit, filed on Wednesday in the US District Court for the Northern District of California.
Per the lawsuit, OpenAI amassed enormous amounts of data by crawling the web, including a sizable amount of information taken from social media sites. For instance, the lawsuit says that WebText2, which it describes as OpenAI’s proprietary AI corpus of personal data, was built by scraping enormous amounts of information from Reddit posts and the websites they linked to.
The data accessed included “private information and private conversations, medical data, information about children — essentially every piece of data exchanged on the internet it could take — without notice to the owners or users of such data, much less with anyone’s permission,” per the lawsuit.
This amounted to “the negligent and otherwise illegal theft of personal data of millions of Americans who do not even use AI tools,” the lawsuit claims.
OpenAI did not immediately respond to Insider’s request for comment, which was made outside regular business hours.
The lawsuit asserts that in addition to scraping the “digital footprints” of the general public, OpenAI also stores and makes public the private information of users, including the information they provide when creating OpenAI accounts, their chat log data, and social media data.
This includes information from users of applications that have integrated ChatGPT, such as Snapchat, Stripe, Spotify, Microsoft Teams, and Slack, the lawsuit claims. Those companies did not immediately respond to Insider’s requests for comment.
The lawsuit asks for a temporary freeze on commercial access to, and development of, OpenAI’s products until the company has put more regulations and safeguards in place, in order to prevent the products from “surpassing human intelligence and harming others.” It also argues that people whose data was used to train the bots are entitled to financial compensation.
Microsoft, a major backer of OpenAI, was also listed as a defendant.
The plaintiffs were identified only by their initials, occupations, and state, which their lawyers said was to “avoid intrusive scrutiny as well as any potentially dangerous backlash.”
Generative AI, which can create text, audio, images, and videos, has exploded in popularity since OpenAI released ChatGPT in November. People have been using generative AI for personal, professional, and academic purposes, though there are concerns about its access to data.