Building safe AI is getting harder and harder
This is Atlantic Intelligence, an eight-week series in which The Atlantic’s leading thinkers on AI will help you understand the complexity and opportunities of this groundbreaking technology. Sign up here.

The bedrock of the AI revolution is the internet, or more specifically, the ever-expanding bounty of data that the web makes available to train algorithms. ChatGPT, Midjourney, and other generative-AI models “learn” by detecting patterns in massive amounts of text, images, and videos scraped from the internet. The process entails hoovering up huge quantities of books, art, memes, and, inevitably, the troves of racist, sexist, and illicit material distributed across the web.

Earlier this week, Stanford researchers found a particularly alarming example of that toxicity: The largest publicly available image data set used to train AIs, LAION-5B, reportedly contains more than 1,000 images depicting the sexual abuse of children, out of more than 5 billion in total. A spokesperson for the data set’s creator, the nonprofit Large-scale Artificial Intelligence Open Network, told me in a written statement that it has a “zero tolerance policy for …