Another 2 Billion Discord Messages Scraped: What This Means for Your Digital Footprint

Redacto

In a story reported by 404 Media that reads like a textbook example of modern surveillance creep, researchers recently scraped and published over 2 billion Discord messages – sourced from more than 620 public servers. This comes about a month after the previous claim of a massive Discord scraping job.

The dataset, created under the guise of “open research,” has ignited a heated debate about privacy, ethics, and the increasingly blurry lines between public and personal data in online communities.

🧠 What Happened?

According to reporting from 404 Media, a team of academic researchers harvested billions of messages from public Discord servers, compiling a massive dataset and releasing it publicly.

These messages were scraped via Discord’s public API – a method that some are defending as fair game under Discord’s TOS for public servers. While technically “public,” this data wasn’t collected with the contextual consent of the users involved.

In short, the dataset that’s been published is enough for an amateur OSINT analyst – or malicious actor – to triangulate real identities from casual conversations, and leverage them for phishing attacks, scams or social engineering.

⚖️ Maybe Legal, Definitely Not Ethical

Legality is a low bar in data ethics. Just because something is accessible via a public API doesn’t mean it should be hoovered up and republished without boundaries.

The dataset’s creators argue that it’s a resource for studying online communities and moderation. But critics point out that:

  • Many of the servers targeted are intimate, niche spaces — think ADHD support groups, queer communities, and hobbyist fandoms.
  • Some users were minors when they posted.
  • No redaction, anonymization, or opt-out mechanism was provided.
  • Users who may have consented to data collection by Discord did not knowingly consent to having this data scraped and stored by third parties.

A Discord spokesperson has since released a statement about the scraping:

“This is a serious matter, and we are committed to protecting the privacy and data of our users. Based on our initial investigation, we determined that user accounts accessed Discord servers that were discoverable and widely accessible and scraped data without our permission,” the spokesperson said. “It appears the researchers took steps to protect people’s identities, but this still violates our policies and we are fully investigating.” (AJ Dellinger, Gizmodo)

🕵️‍♀️ Why This Is a Goldmine for OSINT, Phishing, and Social Engineering

Publishing this dataset is a threat model in action. With a few cross-referenced usernames, bad actors can:

  • Build psychographic profiles for manipulation
  • Extract keywords and topics for phishing lures
  • Map entire social graphs of online communities
  • Scrape timestamped behavior patterns for targeting

This kind of unfiltered content offers insight that metadata never could — emotional tone, vulnerabilities, and interpersonal dynamics. Combine that with data from breaches or LinkedIn and you’ve got a social engineer’s starter pack.

🛡️ How to Protect Yourself

If you’re active on Discord – or anywhere else online – consider this your privacy wake-up call. Here’s how to reduce your exposure:

1. Practice content hygiene:
Assume anything you say in a public or semi-public space can be logged and indexed. Even in “closed” servers, avoid sharing sensitive personal info. If you have already shared such details, delete them with a tool like Redact.dev.

2. Use a distinct pseudonym on each platform:
Avoid using the same handle across platforms. Services like Namecheckr can help identify where your usernames overlap.

3. Revisit old posts and accounts:
Use deletion tools like Redact.dev to bulk-delete historical messages and reduce your traceable footprint.

4. Harden your Discord settings:
Disable “Allow direct messages from server members” and regularly audit which servers you’re in. Avoid those with open invites and no moderation guidelines.

5. Advocate for privacy-first tools and policies:
Push back when platforms, researchers, or even journalists normalize mass surveillance – even under the pretense of academic curiosity.
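
Steps 1 and 2 above can be partly self-audited with a few lines of code. The following is a minimal sketch, not a complete privacy tool. It assumes you already have your own messages as a plain list of strings and your own handles as a platform-to-handle mapping; the function names and the two regex patterns are illustrative assumptions, and the patterns are deliberately rough, so expect both misses and false positives.

```python
import re
from collections import defaultdict

# Illustrative patterns only (assumptions, not an exhaustive PII detector).
SENSITIVE_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def flag_sensitive(messages):
    """Return (message_index, label) pairs for messages matching a pattern."""
    hits = []
    for index, text in enumerate(messages):
        for label, pattern in SENSITIVE_PATTERNS.items():
            if pattern.search(text):
                hits.append((index, label))
    return hits


def reused_handles(accounts):
    """Given {platform: handle}, return handles that appear on 2+ platforms.

    Comparison ignores case and surrounding whitespace.
    """
    platforms_by_handle = defaultdict(list)
    for platform, handle in accounts.items():
        platforms_by_handle[handle.strip().lower()].append(platform)
    return {
        handle: sorted(platforms)
        for handle, platforms in platforms_by_handle.items()
        if len(platforms) > 1
    }


if __name__ == "__main__":
    # Hypothetical sample data for demonstration.
    my_messages = [
        "hello there",
        "mail me at jane@example.com",
        "call +1 (555) 010-9999 tonight",
    ]
    my_accounts = {"discord": "NightOwl", "reddit": "nightowl", "github": "jdoe"}

    print(flag_sensitive(my_messages))
    print(reused_handles(my_accounts))
```

Messages flagged by the first function are candidates for deletion, and any handle returned by the second is one that links your accounts together and is worth changing on at least one platform.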

The Bigger Picture

This incident isn’t just about Discord. It’s about a growing culture of justified overcollection, where public APIs become pipelines for questionable data harvesting – all while users are left in the dark.

In the age of AI and large language models, every scraped conversation becomes a potential training data point. Every message you’ve sent might someday power a tool you never consented to – one trained to imitate you.

We don’t just need stronger platform policies. We need a public understanding that privacy doesn’t die in the dark – it dies in the metadata.

© 2025 Redact - All rights reserved