
Meta and Scale AI Used Gig Workers to Scrape Your Social Media Data

Categories: AI, Data Privacy
- Meta’s new AI model “Muse Spark” – codenamed Avocado – is tied to a broader pipeline involving Scale AI, a data-labelling firm in which Meta invested $14.3B for a 49% stake in June 2025.
- Contractors on Scale AI’s Outlier platform report scraping public social media profiles, tagging personal data, and processing sensitive content for AI training.
- Workers describe unstable pay, exposure to explicit or disturbing material, and unclear guidance around copyrighted data usage.
- Meta has separately used public Facebook and Instagram data dating back to 2007 for AI training, raising concerns about consent — especially outside EU/UK protections.
- Human review of personal data introduces added privacy risks beyond automated scraping, including exposure of identifiable information to individual contractors.
- Users can limit exposure by adjusting privacy settings, opting out where legally possible, and deleting historical content with tools like Redact.
On 7 April 2026, The Guardian published an investigation revealing that tens of thousands of gig workers have been employed through Scale AI to manually trawl social media profiles, copy copyrighted images, and transcribe explicit audio, all in service of training Meta’s artificial intelligence systems. The following day, on 8 April, Meta announced Muse Spark, its first major AI model from the newly assembled Meta Superintelligence Labs. The timing was not coincidental. The investigation and the product launch together tell a complete story: where the data came from, who handled it, and what it was ultimately used to build.
The story has since been reported by TechCrunch, CNBC, Euronews, Eastern Eye, and numerous technology and privacy publications. What has emerged is a detailed picture of how one of the world’s most powerful technology companies is sourcing the raw material for its AI ambitions, and the human and ethical cost of that process.
What Is Scale AI and How Is It Connected to Meta?
Scale AI is a San Francisco-based technology company that specialises in data labelling and annotation: the process of tagging and categorising information so that AI models can learn from it. In June 2025, Meta invested $14.3 billion in Scale AI, acquiring a 49% stake. As part of that deal, Scale AI’s co-founder and then-CEO, Alexandr Wang, was brought in to lead Meta Superintelligence Labs, the new division responsible for building Meta’s most advanced AI models. Meta then embarked on a broader hiring push, recruiting researchers and executives from OpenAI, Anthropic, and Google.
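To make “labelling and annotation” concrete, the sketch below shows what a single labelled training record might look like. This is a minimal, purely illustrative example: the schema, field names, and labels are assumptions invented for this article and do not represent Scale AI’s actual tooling or data format.

```python
from dataclasses import dataclass, field

# Hypothetical annotation record: this schema is invented for illustration
# and does not reflect Scale AI's or Outlier's real task format.
@dataclass
class AnnotationTask:
    item_id: str                                # identifier for the item being labelled
    content: str                                # raw text, or a reference to an image/audio file
    labels: dict = field(default_factory=dict)  # tags supplied by a human annotator

# A contractor reviews the content and attaches structured labels;
# records like this become supervised training data for a model.
task = AnnotationTask(item_id="post-00042", content="Sunset over the bay with friends!")
task.labels = {"topic": "travel", "sentiment": "positive", "contains_people": True}
```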
Scale AI operates a platform called Outlier, through which it recruits contractors with specialist backgrounds in fields such as medicine, science, economics, and journalism. The platform markets itself as offering flexible, skilled, project-based work. Workers are paid per task and classified as independent contractors rather than employees, which has significant implications for their legal rights and protections.
This is not the only direction from which Meta has been acquiring data. As previously covered by Redact, Meta exploited a GDPR “legitimate interest” loophole to begin training AI on European users’ public Facebook and Instagram content, opting them in by default rather than seeking explicit consent. The Scale AI investigation represents a separate and parallel channel for the same goal: building a training dataset large enough to compete with OpenAI and Google.
Meta and Scale AI Social Media Scraping: What Were Workers Actually Doing?
According to The Guardian’s investigation, the tasks assigned to contractors frequently went far beyond refining or testing high-level AI systems. Workers described being asked to comb through public social media profiles on platforms including Facebook and Instagram, tagging individuals, analysing personal content, and harvesting images for inclusion in machine learning datasets.
“I don’t think people understood quite that there’d be somebody on a desk in a random state, looking at your profile, using it to generate AI data.”
– Scale AI contractor, speaking to The Guardian
The tasks were not limited to social media scraping. Multiple workers reported being asked to transcribe explicit audio content, label disturbing and graphic imagery, and review sensitive material, despite being told during onboarding that they would not be exposed to such content. One worker described feeling uncomfortable when images of children appeared in training materials. Others reported being asked to order images by the apparent age of the person depicted.
Workers also reported that some tasks involved the use of copyrighted artwork and creative work to train AI systems that would then generate original content. Scale AI has stated that contractors were specifically instructed not to use copyrighted material. Several workers disputed this account, suggesting the instructions were either unclear or inconsistently enforced.
Analysts and workers have speculated that many of these tasks were specifically linked to training Meta’s “Avocado” model, the internal codename for what was later released publicly as Muse Spark. That model is now being rolled out across Facebook, Instagram, WhatsApp, Messenger, and Meta’s Ray-Ban AI glasses. Meta’s AI-related capital expenditure in 2026 is projected at between $115 billion and $135 billion.
Who Are the Gig Workers Behind Meta’s AI Training Data?
The workforce recruited through Outlier is notable for its educational and professional diversity. Scale AI positions the platform as suitable for people with expert-level knowledge, and workers include former journalists, academics, healthcare professionals, and scientists. This distinguishes the Outlier workforce from more typical gig economy labour, where tasks are usually lower-skilled and repetitive.
Despite that positioning, many workers told The Guardian that financial desperation, rather than professional interest, drove them to the platform. “A lot of us were really desperate,” one worker said. “Many people really needed this job, myself included, and really tried to make the best of a bad situation.”
“I have to be positive about AI because the alternative is not great.”
– Scale AI contractor, speaking to The Guardian
Workers described the income as unpredictable, with projects disappearing without notice and no guaranteed minimum work available. Many continued despite the conditions because the alternative, in industries increasingly disrupted by the very AI tools they were helping to train, looked worse.
This pattern is not unique to Scale AI. A recently published report by SOMO (Centre for Research on Multinational Corporations) found that AI data workers globally face unstable pay, heavy monitoring, and working conditions that mirror those of the broader gig economy, while the companies they serve are among the most profitable in the world. In January 2026, workers at Covalen’s Dublin offices providing AI training services for Meta went on strike to demand union recognition, better wages, and improved working conditions.
What Privacy Risks Does the Meta and Scale AI Data Scraping Create for You?
The immediate concern raised by this investigation is straightforward: personal information that people shared on social media, often years ago and in a very different context, has been reviewed and processed by human contractors working for an AI company. This includes names, faces, locations, and relationship data visible on public profiles.
This is part of a longer pattern of Meta using social media content as AI training material without obtaining meaningful consent from users. Meta has acknowledged scraping all public Facebook and Instagram posts since 2007 for AI training purposes, an admission made during a public inquiry in Australia. The company has also tested a feature allowing it to upload and scan photos from users’ device camera rolls, including images that were never posted to any platform.
The Scale AI investigation adds a dimension that automated scraping alone does not create. It is one thing for an algorithm to process public data at scale. It is another for individual contractors, located anywhere in the world, to manually review personal profiles, photographs, and private-seeming content. Human reviewers introduce different and more personal risks: data can be mishandled, stored informally, or viewed by people with no legitimate reason to see it. Human rights organisations cited in the original investigation documented widespread exposure of personally identifiable information to reviewers, including names, selfies, and the content of private messages.
The concern is compounded by another recent development: Meta is removing end-to-end encryption from Instagram direct messages, with the change taking effect in May 2026. While the stated reason is content moderation and legal compliance, the change could also enable Meta to expand data collection for ad targeting and personalisation. Taken together, these moves suggest a broader trend away from user privacy at Meta.
What Have Meta and Scale AI Said in Response?
Scale AI told The Guardian that its Outlier platform offers flexible, project-based work with transparent pay, and that contributors choose when to participate. The company said it had made meaningful investments in the contributor experience and characterised the arrangement as mutually beneficial.
On the question of copyrighted material, Scale AI stated that contractors were specifically instructed not to use copyrighted work in training data. The company did not directly address the broader allegations about social media profile scraping or the exposure of workers to explicit and disturbing content.
Meta did not comment when approached by multiple outlets. Notably, OpenAI ended its partnership with Scale AI in June 2025, likely due to concerns around confidentiality and a potential conflict of interest. Scale AI maintains partnerships with numerous companies and government agencies, including Cisco, the U.S. Army, and the U.S. Air Force, according to its website.
How to Opt Out of Meta Using Your Data for AI Training
Your options depend significantly on where you live. The rights available to users in the European Union and the United Kingdom are considerably stronger than those available to users in the United States and most other countries.
If you are in the EU or UK: Under GDPR, you have a formal right to object to Meta processing your data for AI training. To exercise this, log into Facebook or Instagram, go to Settings, then Privacy Center, and look for the section titled “How Meta uses information for generative AI models and features.” Click “Right to object” and complete the form. You will need to provide your email address and a brief explanation of your objection. Meta is required to review your request, though it is not obligated to approve every submission automatically. MIT Technology Review has a detailed walkthrough of this process. It is also worth noting that an opt-out covers your own content but does not cover photos or posts that other users have shared featuring you.
If you are in the US: There is currently no equivalent opt-out mechanism. Meta does not offer a setting that prevents American users’ public posts from being used as AI training data. As Proton’s privacy team has noted, the most effective steps available to US users are setting their accounts to private, limiting what they share publicly, and deleting content they would not want to contribute to a training dataset.
For all users: It is also worth checking Meta’s camera roll cloud processing setting if you use the Facebook mobile app. If enabled, this setting allows Meta to upload and scan recent photos from your device, including images you have never posted. To check this: open the Facebook app, go to Settings, and look for “Camera roll sharing suggestions.” If the toggle is blue, it is active. Turn it off to prevent ongoing photo uploads.
For users who want to go further than adjusting settings, the most direct action available is to reduce the amount of personal content that exists on public social media profiles in the first place. Opting out only limits future use; content that has already been harvested may be embedded in existing training datasets. Removing old posts, photos, and profile information limits both current and future exposure. Tools like Redact make this process manageable at scale, allowing you to bulk delete content across more than 25 platforms without doing it manually, post by post.
The Broader Implications: Regulation, Litigation, and the Future of AI Training Data
The practices described in The Guardian’s investigation are not unique to Meta or Scale AI. The AI industry as a whole depends on vast quantities of labelled human-generated data, and human annotation remains central to that process. What has changed is the scale of the operation and the sensitivity of the content now being processed.
Industry analysts have noted that aggressive data acquisition strategies are generating serious legal and reputational risk across the sector. Lawsuits over copyright infringement are already underway. As documented by Redact, Meta has faced litigation for using pirated content in LLM training, though the use was ultimately found to be protected under fair use. The New York Times has sued OpenAI and Microsoft, and a coalition of media publishers including Condé Nast has filed similar cases against multiple AI companies.
Regulatory pressure is building in parallel. The EU’s Digital Services Act imposes new obligations on large platforms around algorithmic transparency and data governance. Lawmakers in the US, EU, and Asia are actively considering stricter rules around AI training data and consent frameworks. Privacy laws globally are shifting, but enforcement remains inconsistent, and the gap between what regulators require and what companies practice remains wide.
For ordinary users, regulatory change arrives slowly. In the meantime, the most reliable form of protection is limiting the data that exists to be harvested in the first place.
Take Control of Your Social Media Footprint with Redact
The practices described in this investigation are a clear reminder that content shared online, even publicly, can be used in ways that most people would not expect or consent to. Your social media history is not just a personal archive. For companies like Meta and Scale AI, it is raw material for commercial AI systems built without your knowledge or agreement.
Redact is a privacy-first tool that lets you bulk delete your posts, messages, comments, images, and other content across more than 25 platforms. Everything runs locally on your device: your credentials and content are never processed on external servers, and Redact cannot see or store any of your data.
You can filter deletions by keyword, date range, content type, and more, giving you precise control over what stays and what goes. Whether you want to remove years of old posts from a single platform or systematically clean up your presence across your entire digital footprint, Redact makes it possible without requiring you to do it one post at a time.
Download Redact.dev and take back control of your online presence.
Redact supports a massive range of major social media and productivity platforms, including Instagram, Twitter, Facebook, Discord, Reddit, and more.