When OpenAI released GPT-3 in July 2020, it offered a glimpse of the data used to train the large language model. Millions of pages scraped from the web, Reddit posts, books, and more were used to create the generative text system, according to a technical document. Some of the personal information people share about themselves online is swept up in that data. Now this data is causing problems for OpenAI.
On March 31, the Italian data regulator issued a temporary emergency decision demanding that OpenAI stop using the personal information of millions of Italians that is included in its training data. According to the regulator, Garante per la Protezione dei Dati Personali, OpenAI does not have the legal right to use the personal information of people in ChatGPT. In response, OpenAI blocked people in Italy from accessing its chatbot while it provides answers to officials, who are investigating further.
The action is the first taken against ChatGPT by a Western regulator and highlights privacy tensions around the creation of giant generative AI models, which are often trained on vast expanses of internet data. Just as artists and media companies have complained that generative AI developers used their work without permission, the data regulator is now saying the same about people's personal information.
Similar decisions could follow across Europe. In the days following Italy’s announcement of its probe, data regulators in France, Germany, and Ireland contacted the Garante to request more information about its findings. “If the business model is just scraping the internet for whatever you can find, then there could be a really big problem here,” says Tobias Judin, head of international at the Norwegian data protection authority, who is monitoring developments. Judin adds that if a model is built on data that may have been collected unlawfully, it raises questions about whether anyone can use the tools legally.
Italy’s blow to OpenAI also comes as scrutiny of big AI models is steadily increasing. On March 29, technology leaders called for a pause on the development of systems like ChatGPT, fearing their future implications. Judin says the Italian move highlights more immediate concerns. “Essentially, we’re finding that there’s a potentially huge gap in AI development to date,” Judin says.
Europe’s GDPR rules, which cover how organizations collect, store, and use the personal data of individuals, protect the data of more than 400 million people across the continent. This personal data can be anything from a person’s name to their IP address; if it can be used to identify someone, it can count as their personal information. Unlike the patchwork of state-level privacy rules in the United States, GDPR protections apply even if people’s information is freely available online. In short: just because someone’s information is public doesn’t mean you can suck it up and do whatever you want with it.
Italy’s Garante believes ChatGPT has four problems under GDPR: OpenAI has no age controls to stop people under 13 from using the text-generating system; it can provide information about individuals that is not accurate; and people were not informed that their data had been collected. Perhaps most importantly, its fourth argument asserts that there is “no legal basis” for collecting people’s personal information in the massive swells of data used to train ChatGPT.
“The Italians have called their bluff,” says Lilian Edwards, professor of law, innovation and society at Newcastle University in the UK. “It seemed pretty obvious in the EU that this was a breach of data protection law.”