Italian regulators say ChatGPT must meet local and GDPR privacy regulations by April 30, but AI experts say the model's architecture makes such compliance almost impossible.
OpenAI may soon face its biggest regulatory challenge yet as Italian authorities insist the company has until April 30 to comply with local and European data protection and privacy laws, a task artificial intelligence (AI) experts say could be near impossible.
Italian authorities issued a blanket ban on OpenAI’s GPT products in late March, making Italy the first Western country to ban the products outright. The action came on the heels of a data breach in which ChatGPT and GPT API customers could see data generated by other users.
We believe the number of users whose data was actually revealed to someone else is extremely low and we have contacted those who might be impacted. We take this very seriously and are sharing details of our investigation and plan here. 2/2 https://t.co/JwjfbcHr3g
— OpenAI (@OpenAI) March 24, 2023
Per a Bing-powered translation of the Italian order commanding OpenAI to cease its ChatGPT operations in the nation until it’s able to demonstrate compliance:
“In its order, the Italian SA highlights that no information is provided to users and data subjects whose data are collected by Open AI; more importantly, there appears to be no legal basis underpinning the massive collection and processing of personal data in order to 'train' the algorithms on which the platform relies.”
The Italian complaint goes on to state that OpenAI must also implement age verification measures to ensure that its software and services comply with the company’s own terms of service, which require users to be over the age of 13.
In order to achieve privacy compliance in Italy and throughout the rest of the European Union, OpenAI will have to provide a basis for its sweeping data collection processes.
Under the EU’s General Data Protection Regulation (GDPR), tech outfits need a lawful basis, such as user consent, to train on personal data. Companies operating in Europe must also give Europeans the option to opt out of data collection and sharing.
According to experts, this will prove a difficult challenge for OpenAI because its models are trained on massive data troves that are scraped from the internet and merged into training sets. This form of black box training aims to produce a phenomenon called “emergence,” where useful traits manifest unpredictably in models.
"GPT-4...exhibits emergent behaviors".
Wait wait wait wait. If we don't know the training data, how can we say what's "emergent" vs. what's "resultant" from it?!?!
I think they're referring to the idea of "emergence", but still I'm unsure what's meant. https://t.co/Mnupou6D1d
— MMitchell (@mmitchell_ai) April 11, 2023
Unfortunately, this means that the developers seldom have any way of knowing exactly what’s in the dataset. And, because the machine tends to conflate multiple data points as it generates outputs, it may be beyond the scope of modern technicians to extricate or modify individual pieces of data.
Margaret Mitchell, an AI ethics expert, told MIT Technology Review that “OpenAI is going to find it near-impossible to identify individuals’ data and remove it from its models.”
To reach compliance, OpenAI will have to demonstrate that it obtained the data used to train its models with user consent — something the company’s research papers show isn’t true — or demonstrate that it had a “legitimate interest” in scraping the data in the first place.
Lilian Edwards, an internet law professor at Newcastle University, told MIT Technology Review that the dispute is bigger than just the Italian action, stating that “OpenAI’s violations are so flagrant that it’s likely that this case will end up in the Court of Justice of the European Union, the EU’s highest court.”
This puts OpenAI in a potentially precarious position. If it can’t identify and remove individuals’ data on request, or correct data that misrepresents people, it may find itself unable to operate its ChatGPT products in Italy after the April 30 deadline.
The company’s problems may not stop there, as French, German, Irish and EU regulators are also considering action to regulate ChatGPT.