My Dead Guard Dog
2025/12/11
At the front of my website stands a file,
robots.txt, and you'll find one in the root directory of most
websites. Its job is to tell bots (automated scripts and
browsers) where they can and can't go within the
website, and it's as useless as a dead guard dog.
It's very much like the Do Not Track (DNT) header: it's a law with no enforcement, a worthless gesture that will only be adhered to by voluntarily honorable and trustworthy parties, exactly the type of people who wouldn't want to track you in the first place. In fact, there's nothing stopping a morally bankrupt organization from deciding to track you more closely when you enable DNT headers, perhaps on the premise that you have something to hide. I deeply appreciate an unobtrusive notification when I first enter a website saying my DNT header will be honored; I feel more comfortable sticking around.
My robots.txt just has one instruction, and
it's identical to the one
reddit swapped to in 2024. It tells bots to fuck off from the entire site.
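For the curious, here is a minimal sketch of what that one instruction looks like and how a well-behaved crawler would read it, using Python's standard urllib.robotparser module. The bot name and URLs are made up for illustration, and the is_allowed helper is my own hypothetical wrapper, not something any particular crawler ships with.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule: applies to every user agent, disallows every path.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and example.com are placeholders, not real crawlers or sites.
    print(is_allowed("ExampleBot", "https://example.com/blog/"))
```

Note that the check only happens if the crawler bothers to call it. Nothing on the server side enforces the answer, which is the whole problem.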
The reason reddit does it is obvious: it allowed them to take all of the data contributed by redditors over decades and auction it off to different AI firms, data brokers, and search engines under private (profitable) agreements. Instantaneously disabling dozens of third-party reddit reading apps, upon which some independent developers had built their livelihoods, was a welcome side effect, as all of the value they added to the reddit experience over the years, in good faith, was clawed back to shareholders who could now negotiate how to sell our intellectual property.
The reason I do it is an indiscernible mix of spite and principle. It doesn't work, and it has the benign but annoying side effect of attracting automated emails about my trash SEO and how I can pay someone to improve it.
I don't want code that I wrote, the product of years of expensive education and arduous trial and error, to be used to improve the quality of software that is openly threatening my job security.
I don't have a solution here. Anyone who hosts a website will understand the plight: even if your dead guard dog adamantly declares the entire house off-limits, the most dishonorable, unwanted scrapers you cared most about deterring will walk past unobstructed, scraping to their iron hearts' content and filling your forms with PHP debugging code.
This state of affairs leaves the deterrence of unwanted bots up to you. A modern website owner must understand and account for the inevitable fact that bots will crawl their site, and they'll likely pay for that traffic just the same as legitimate visits by humans. Techniques like Cloudflare's "AI Labyrinth" are intriguing, as is planting sneaky, invisible form inputs to trick a bot into outing itself, but these are cat-and-mouse games. An author of a crawler can account for these tactics once enough is known about them, and then it's on to the next mitigation technique, ad infinitum.
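The invisible form input trick can be sketched in a few lines. This is only an illustration under some assumptions: the field name "website" is hypothetical, the field is assumed to be hidden from human visitors with CSS, and the submitted form is assumed to arrive as a plain dictionary of strings from whatever web framework you use.

```python
# Hypothetical honeypot field: present in the HTML form but hidden with CSS,
# so a human never sees it and never fills it in.
HONEYPOT_FIELD = "website"


def looks_like_bot(form_data: dict[str, str]) -> bool:
    """Return True when the hidden honeypot field came back non-empty."""
    value = form_data.get(HONEYPOT_FIELD, "")
    # Whitespace alone is treated as empty, since some browsers autofill oddly.
    return bool(value.strip())


if __name__ == "__main__":
    human = {"email": "reader@example.com", "website": ""}
    bot = {"email": "spam@example.com", "website": "http://spam.example"}
    print(looks_like_bot(human), looks_like_bot(bot))
```

A crawler author who knows about the trick only has to skip fields that are hidden by CSS, so this buys time rather than safety.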
For the time being, the onus is on the site owner to erect a fence of bot-bouncing safeguards, a hurdle that further impedes the pursuit of an open web.