Woof.group's been getting slower and slower lately, and I think it's cuz we're getting hammered by (maybe LLM?) scrapers. Hard to say, really, but I don't buy that there are *that* many Windows/Chrome users clicking through every single tag page.
Gonna add a bunch of the LLM bots to robots.txt--I know many of the big players just ignore robots.txt and fudge their UAs, but maybe it'll make a little dent. Fully 2% of our requests are ByteDance, and 5% are ahrefs--both of those should be blockable.
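Roughly this, assuming ByteDance still crawls as Bytespider and Ahrefs as AhrefsBot:

# ByteDance's crawler
User-agent: Bytespider
Disallow: /

# Ahrefs' SEO crawler
User-agent: AhrefsBot
Disallow: /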
No idea what to do about what I suspect is residential proxy traffic, which makes up the vast majority of our load. I assume throwing Anubis in front of a Mastodon instance is going to break a ton of legitimate use cases.
Have you tried preventing this in your proxy?
I mean something like this:
Nginx (inside the server block):
if ($http_user_agent ~* "(GPTBot|ChatGPT)") {
    return 403;
}
Apache .htaccess:
SetEnvIfNoCase User-Agent "GPTBot" bad_bot
SetEnvIfNoCase User-Agent "ChatGPT" bad_bot
# 2.2-style access control; needs mod_access_compat on Apache 2.4
Order Allow,Deny
Allow from all
Deny from env=bad_bot
You just have to find the other bots.
@bearleathermen Oh, we've actually had GPTBot in robots.txt for ages, and they seem to respect it. The problem is the high-volume scrapers impersonating Chrome/Firefox/Edge/etc, usually just one or two requests per IP.