LinkedIn under fire for secret AI training data harvest

By default, users are opted in to having their data used for AI training

LinkedIn has come under fire for secretly training its AI models on user data without explicit consent.

The professional networking platform recently updated its privacy policy to clarify how it uses user data to develop and provide AI-powered services.

In a "trust and safety" update, LinkedIn's senior vice president and general counsel, Blake Lawit, disclosed that the company has been collecting and analysing users' posts, articles, and other data to enhance its generative AI features.

By default, LinkedIn users are opted in to having their data used for AI training. This data feeds AI models that provide writing suggestions and post recommendations. Additionally, LinkedIn's parent company, Microsoft, may also train its own AI models on this user data.

The revelation has angered users who feel their privacy has been violated. Many expressed their disappointment on the platform, accusing the company of breaching their trust. Some argued that Microsoft should pay LinkedIn users for the use of their data in AI training.

Meanwhile, LinkedIn has introduced an opt-out for users who do not want their data used for AI training. Users can opt out by going to the "Data Privacy" section of the settings menu and toggling off the "Use my data for training content creation AI models" option.

However, this opt-out may not affect training that has already taken place.

LinkedIn also says it will not use data from users in the EU, Iceland, Norway, Liechtenstein and Switzerland to train its AI models.

Despite LinkedIn's claims to minimise the use of personal data in its training datasets, there are concerns about the potential for sensitive information to be exposed. The company acknowledges that generative AI features can sometimes produce outputs that include personal data from the input, such as names or specific details.

"Members or customers may provide personal data as an input to a generative AI powered feature, which could result in personal data being provided as an output. For example, if a member used our generative AI "Writing Suggestions" feature to help write an article about the best advice they got from their mentors, they may include the names of those mentors in the input," LinkedIn says on its FAQ page.

"The resulting output from the generative AI “Writing Suggestions” feature may include those names, which that member can edit or revise before deciding to post."

The controversy has prompted calls for investigations from data protection authorities in the UK and Ireland.

The Information Commissioner's Office (ICO) in the UK and the Data Protection Commission (DPC) in Ireland are both examining LinkedIn's practices and considering potential enforcement actions.

Other social media platforms, such as Meta and Snap, have also faced similar criticism for their use of user data to train AI models.

Platforms like Reddit and Stack Overflow have established data licensing agreements with AI companies, allowing those companies to train their models on large amounts of user-generated content.

Earlier this week, Meta announced that it will resume training its AI models on public content shared by users on Facebook and Instagram in the UK, after pausing the programme in June amid regulatory concerns.