X suspends personal data training of AI chatbot Grok following Irish DPC pressure
The question
How are the data regulators addressing the use of personal data when training AI language models?
The key takeaway
The training of an AI language model using an individual's personal data needs to comply with data protection laws, including the EU General Data Protection Regulation (EU GDPR) where this applies. Following an investigation into X's training of its AI language model, Grok, by the Irish Data Protection Commission (DPC), X has suspended the use of EU user personal data to train Grok. In cases like that of X, it is possible that an organisation will ultimately be asked to delete the personal data used to train the AI model from its systems, although there is currently no obligation to delete the resulting AI model itself.
The background
On 6 August 2024, the DPC launched proceedings in Ireland's High Court against Twitter International Unlimited Company (TIUC), the main Irish subsidiary of Elon Musk's social media platform X. The action related to the use of the personal data of X users that is subject to the EU GDPR to train an artificial intelligence model named Grok. Grok was intended to act as an AI search assistant exclusive to X's premium account holders, and was created by another one of Musk's companies, US-based xAI Corp.
X had changed its privacy settings in July 2024 so that its EU users' data was used to train Grok by default, requiring users to actively opt out. The DPC's action was brought on the grounds of a breach of the EU GDPR in the training of Grok, specifically that the use of public posts on X to develop the model amounted to processing personal data without a lawful basis for doing so. The DPC requested the suspension of the processing of personal data collected between May and August 2024. Following an interim suspension of the data processing starting on 8 August 2024, X's Global Government Affairs team tweeted: "The order that the Irish DPC has sought is unwarranted, overbroad and singles out X without any justification. This is deeply troubling…While many companies continue to scrape the web to train AI models with no regard for user privacy, X has done everything it can to give users more control over their data."
The Irish DPC's proceedings were terminated on 4 September 2024, after TIUC agreed to permanently discontinue the processing of some of the personal data. Under the undertaking, TIUC must delete and stop using EU users' data to train Grok. Interestingly, however, TIUC is not obliged to delete AI models that were trained using this data, despite the absence of explicit consent from data subjects.
The development
Due to a lack of clarity in this area, the DPC has referred the case to the European Data Protection Board (EDPB), the EU body responsible for ensuring the consistent application of data protection rules, for its view on whether TIUC breached any data privacy laws during the period when personal data subject to the EU GDPR was being used to train Grok. Furthermore, another nine complaints have been filed against xAI Corp by the data privacy advocacy group NOYB, alleging breaches of 16 articles of the EU GDPR.
With the rise of AI and the rapid development of AI tools by large tech platforms, the Irish data commissioner, Dale Sunderland, is lobbying for the EDPB to introduce "proactive, effective and consistent Europe-wide regulation" of the training of AI. The EDPB is expected to reach a two-thirds majority decision on this issue in October 2024. So far, TIUC and xAI have escaped any sanctions, but further EU GDPR complaints relating to the training of Grok remain under investigation.
Why is this important?
This case has exposed a potential loophole, whereby AI platforms do not need to delete AI models trained using personal data even if they are required to delete the data itself. When US technology news outlet TechCrunch put this point to the DPC, the watchdog replied that its immediate concern was the processing of EU and EEA users' data, and it declined to comment on the information already learned by Grok.
In a parallel GDPR complaint, Marco Scialdone, a lawyer and university professor, has demanded that X perform an "algorithmic disgorgement", whereby the AI model trained with the deleted data would itself be retrained or deleted. Given the surge in AI development, the outcome of this decision is likely to set a crucial precedent for future cases.
Any practical tips?
Understanding what data can be fed into an AI language model, and how breaches of privacy laws can be avoided, is key to safe deployment. Any consent sought from users whose personal data is used to train a large language model should require their explicit agreement to that processing.
Autumn 2024