Bluesky’s open API means anyone can scrape your data for AI training
Bluesky might not be training AI systems on user content as other social networks do, but there’s little stopping third parties from doing so. Per a report by 404 Media, Daniel van Strien, a machine learning librarian at AI firm Hugging Face, pulled 1 million public posts from Bluesky via its Firehose API for machine learning research and pushed the dataset to a public repository. Van Strien later removed the data after the controversy that ensued, but the episode serves as a timely reminder that everything you post publicly to Bluesky is, well, public. Bluesky said that it’s looking at ways to let users communicate their consent preferences externally, though it will be up to outside parties whether they respect those preferences. The company posted: “Bluesky won’t be able to enforce this consent outside of our systems. It will be up to outside developers to respect these settings. We’re having ongoing conversations with engineers & lawyers and we hope to have more updates to share on this shortly!” What’s clear here is that while Bluesky is surging in popularity, …