these lists contain misbehaving bots caught crawling where they should not be crawling, being greedy or just being plain annoying.

every single ip in any of these lists has been caught by our filters,
there is not a single ip in any of these lists that hasn't behaved like a bad actor.

these lists are built after the "abuseipdb.com" blacklist has been applied, so they are not a replacement for it but rather an addition.
they also contain things that are otherwise whitelisted by abuseipdb even though they might not be welcome, eg: AI bots.
ips in these lists will be remembered for a set period of time, depending on what exactly they were doing.
if you do not want to appear in any of these lists, try behaving yourself.

be careful with the "badrequest" list, it may contain systems such as cloudflare. while i'd advise against using their
services, because they freely offer them to any malicious actor, blocking them could lead to some issues.
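
one way to handle that caveat is to drop any "badrequest" entry that overlaps a network you depend on before you feed the list to a firewall. the sketch below is only an illustration: it assumes each list is plain text with one ip or cidr range per line and optional "#" comments (the actual file layout is not described here), the function names are made up for this example, and the addresses are documentation ranges rather than real cloudflare ranges.

```python
import ipaddress


def parse_entries(lines):
    """Parse one IP or CIDR per line, skipping blanks and '#' comments.

    The one-entry-per-line layout is an assumption, not a documented format.
    """
    networks = []
    for line in lines:
        text = line.split("#", 1)[0].strip()
        if not text:
            continue
        # strict=False accepts host addresses such as 203.0.113.7 as /32.
        networks.append(ipaddress.ip_network(text, strict=False))
    return networks


def drop_allowlisted(blocked, allowlist):
    """Return the blocked networks that do not overlap any allowlisted one."""
    kept = []
    for net in blocked:
        overlaps = any(
            net.version == allowed.version and net.overlaps(allowed)
            for allowed in allowlist
        )
        if not overlaps:
            kept.append(net)
    return kept


# Hypothetical sample data using documentation address ranges.
badrequest = parse_entries([
    "# sample badrequest list",
    "203.0.113.7",
    "198.51.100.20  # inside the allowlisted range below",
    "2001:db8::1",
])
allowlist = parse_entries(["198.51.100.0/24"])

for net in drop_allowlisted(badrequest, allowlist):
    print(net)
```

fill the allowlist with the ranges published by whatever provider you rely on, and review what gets dropped instead of trusting it blindly.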

the lists are updated daily. use them as you see fit.
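
if you want to use several lists at once, a small script can merge them and remove duplicates before loading them anywhere. this is a minimal sketch under the same assumption of one ip or cidr range per line with optional "#" comments; `merge_lists` is a hypothetical helper and the addresses are documentation examples, not real entries.

```python
import ipaddress


def merge_lists(*lists_of_lines):
    """Merge several lists into a deduplicated, collapsed set of networks."""
    v4, v6 = [], []
    for lines in lists_of_lines:
        for line in lines:
            text = line.split("#", 1)[0].strip()
            if not text:
                continue
            net = ipaddress.ip_network(text, strict=False)
            # collapse_addresses() rejects mixed IP versions, so split them.
            (v4 if net.version == 4 else v6).append(net)
    merged = list(ipaddress.collapse_addresses(v4))
    merged += list(ipaddress.collapse_addresses(v6))
    return merged


# Hypothetical contents of two lists, e.g. "honeypot" and "denied".
honeypot = ["203.0.113.4", "203.0.113.5"]
denied = ["203.0.113.5", "2001:db8::1"]

for net in merge_lists(honeypot, denied):
    print(net)
```

adjacent addresses are collapsed into a single range where possible, which keeps firewall rule sets shorter.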

"companyname" is a jail specifically for that particular company after repeated violations.
"badrequest" covers any type of scanning for configuration or admin stuff, known attacks and bad requests.
"baduseragent" are untrusted user agents commonly used to scan or crawl sites.
"email" are systems that fell into the email honeypot, aka: spammers, email hackbots, anything you don't want poking around.
"denied" are systems that repeatedly got "access denied" but kept poking around.
"honeypot" are systems that fell into a very obvious honeypot meant to trap bots or people snooping around. these are the losers of the botworld.

why are "legitimate" companies in these lists?
simple: they are either violating robots.txt limits, being a constant annoyance or being very greedy when crawling. more specifically:
alibaba - ignoring robots.txt
applenews - ignoring robots.txt + greed
bitsightbot - ignoring robots.txt + greed
censys - annoyance
meta/facebook - ignoring robots.txt + greed
openAI - ignoring robots.txt + greed
qwantbot - greed
yandex - ignoring robots.txt + greed
amazonbot - greed
babbar.tech - greed
bytedance - ignoring robots.txt + greed
netcraft - annoyance
palo-alto - annoyance
semrush - ignoring robots.txt + greed
