yes GPTBot, eat up! enjoy the maze of garbage

Output of a terminal showing an nginx access log (a record of what has been making requests to my server). It's filled with requests from GPTBot, whose purpose is to scrape data to train "AI" models. But instead of getting good training data, it's getting junk data. I.e., I'm poisoning it.

Check out Iocaine – it traps web bots in a never-ending maze of garbage text instead of letting them scrape your actual website

The image is a screenshot of my terminal showing recent requests to my web server. Right now it’s all GPTBot, whose purpose is to scrape data to train “AI” models, or, put another way, to steal human creations without permission. Instead of getting good training data, it’s getting junk data. I’m poisoning it. The idea is that if these bots suck up enough garbage data, the models trained on that data will get worse and worse.

If my server were serving it actual data, it would probably have been brought down by all the traffic. So far today – just today, in the 8 hours since midnight – GPTBot has made 110,637 requests to my server, 100% of which were served garbage.
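If you want to get a count like that from your own logs, here's a rough sketch in Python. It assumes nginx's default "combined" log format, where the User-Agent is the last quoted field on each line. The function names and the sample log lines are made up for illustration; they aren't from my server.

```python
import re

# In nginx's default "combined" format the User-Agent is the last
# double-quoted field on the line (assumption: your log_format may differ).
USER_AGENT_FIELD = re.compile(r'"([^"]*)"\s*$')


def count_requests(lines, needle="GPTBot"):
    """Return (matching, total): requests whose User-Agent contains `needle`,
    and the total number of non-empty log lines seen."""
    matching = 0
    total = 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        total += 1
        found = USER_AGENT_FIELD.search(line)
        if found and needle in found.group(1):
            matching += 1
    return matching, total


def requests_per_second(count, hours):
    """Average request rate over a window of `hours` hours."""
    return count / (hours * 3600)


# Made-up sample lines, only to show the shape of the input.
SAMPLE_LINES = [
    '203.0.113.5 - - [01/Jan/2025:00:00:01 +0000] "GET /maze/a HTTP/1.1" 200 512 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '203.0.113.5 - - [01/Jan/2025:00:00:02 +0000] "GET /maze/b HTTP/1.1" 200 498 "-" "Mozilla/5.0 (compatible; GPTBot/1.2; +https://openai.com/gptbot)"',
    '198.51.100.7 - - [01/Jan/2025:00:00:03 +0000] "GET / HTTP/1.1" 200 1024 "-" "Mozilla/5.0 (X11; Linux x86_64)"',
]

matching, total = count_requests(SAMPLE_LINES)
print(f"{matching} of {total} requests came from GPTBot")
print(f"{requests_per_second(110_637, 8):.2f} requests per second")
```

To run it against a real log, pass an open file instead of the sample list, e.g. `count_requests(open(path))`, where `path` is wherever your nginx writes its access log. For the numbers above, 110,637 requests in 8 hours averages out to roughly 3.8 requests every second.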

a GIF of Mr. Burns from the Simpsons putting his fingers together and saying "EXCELLENT"
