diff --git a/README.md b/README.md
index 95ab179..10709d1 100644
--- a/README.md
+++ b/README.md
@@ -1,3 +1,38 @@
-# robots.txt
+# robots.txt 🤖
 
-A collection of robots.txt files for sites where it's the only option to defend against scraping like Neocities or Nekoweb.
\ No newline at end of file
+A collection of robots.txt files for sites such as Neocities or Nekoweb, where a robots.txt file is the only option to defend against scraping.
+
+## Use Instructions:
+
+### 1: Pick a file you want to use.
+
+- **blacklist-robots.txt** - a large robots.txt file, around **24 KB** in size, that attempts to block everything bad, including search engines, AI crawlers, scrapers, and SEO bots.
+
+- **whitelist-robots.txt** - a small robots.txt file, around **1.3 KB** (you can shrink it). It currently allows only [Wiby](https://wiby.org/) and [Marginalia Search](https://marginalia-search.com/).
+
+### 2: Rename your chosen file to 'robots.txt'.
+
+### 2.5: If you have a sitemap, add it inside the file where instructed; if you don't, delete the line.
+
+### 3: Upload it to your site!
+
+## Request to Modify Repository
+
+Would you like a bot added to or removed from either the whitelist or blacklist file? My email is [here](https://left4code.neocities.org/left4code_gpg.txt). Specify the following in your email:
+
+1: File to modify
+2: Bot to add or remove
+3: Name of the crawler's User-agent, and its website if possible
+4: Why the bot should be added or removed
+
+This Gitea instance currently does not allow new sign-ups, and therefore no new PRs, so email is the only way to make requests for now; that will change if the instance ever opens up again. Email also lets anyone who already has an address make requests, and it should hopefully be easier to manage.
+
+## Where did you get the User-agents from?
+
+These user agents were manually obtained from [Baccyflap's No AI webring](https://baccyflap.com/noai/). I went through the list, found the disallowed user agents, put them all into one list, and ran:
+
+`sort | uniq > final_output.txt`
+
+Run the same command if you want to create a list of your own.
+
+## I do not guarantee this will make you impervious to bots; this is just what I use. Help would be appreciated in keeping the list updated, managed, and, hopefully in the future, documented.
diff --git a/blacklist-robots.txt b/blacklist-robots.txt
new file mode 100644
index 0000000..b34ba2b
--- /dev/null
+++ b/blacklist-robots.txt
@@ -0,0 +1,1077 @@
+#[BLACKLIST-ROBOTS.TXT VERSION 1.0]
+#[MAINTAINED AT: https://git.qwik.space/Left4Code/robots.txt] GET A COPY OR REPORT ISSUES THERE!
+#_________________________________________________________________________
+
+#Lots of AI companies and scrapers seem to use alternative, unpublished user-agents, and only some get caught and shamed. If possible, consider using some form of AI blocking such as Go-away or Anubis. For static sites hosted with Neocities, Nekoweb, etc., this file is another option.
+ +#___________________________________________________ + +User-agent: 007ac9 +User-agent: 008 +User-agent: 01h4x.com +User-agent: 2^32$ +User-agent: 360Spider +User-agent: 404checker +User-agent: 404enemy +User-agent: 80legs +User-agent: Abonti +User-agent: Aboundex +User-agent: Aboundexbot +User-agent: Acunetix +User-agent: AdIdxBot +User-agent: adidxbot +User-agent: ADmantX +User-agent: Adsbot +User-agent: adsbot +User-agent: AdsBot-Google +User-agent: AdsBot-Google-Mobile +User-agent: adscanner +User-agent: AdsTxtCrawlerTP +User-agent: AfD-Verbotsverfahren +User-agent: Agentic +User-agent: AhrefsBot +User-agent: .ai +User-agent: AI21 Labs +User-agent: AI2Bot +User-agent: Ai2Bot +User-agent: AI2Bot-Dolma +User-agent: Ai2Bot-Dolma +User-agent: AI Article Writer +User-agent: AIBOT +User-agent: AIBot +User-agent: AI Content Detector +User-agent: AI Dungeon +User-agent: AiHitBot +User-agent: aiHitBot +User-agent: AIMatrix +User-agent: Aipbot +User-agent: AISearchBot +User-agent: AI Search Engine +User-agent: AI SEO Crawler +User-agent: AI Training +User-agent: AITraining +User-agent: AI Writer +User-agent: Alexa +User-agent: Alexibot +User-agent: ALittle Client +User-agent: Alligator +User-agent: AllSubmitter +User-agent: Alpha AI +User-agent: AlphaAI +User-agent: AlphaBot +User-agent: a[mazing]{42}(robot) +User-agent: Amazon Bedrock +User-agent: AmazonBot +User-agent: Amazonbot +User-agent: Amazon Comprehend +User-agent: Amazon-Kendra +User-agent: Amazon Lex +User-agent: Amazon Sagemaker +User-agent: Amazon Silk +User-agent: Amazon Textract +User-agent: Amelia +User-agent: Anarchie +User-agent: Anarchy +User-agent: Anarchy99 +User-agent: AndersPinkBot +User-agent: Andibot +User-agent: Ankit +User-agent: Anthill +User-agent: Anthropic +User-agent: anthropic-ai +User-agent: AnyPicker +User-agent: Anyword +User-agent: Apexoo +User-agent: APIs-Google +User-agent: Applebot +User-agent: Applebot-Extended +User-agent: Applebot-Extended +User-agent: AppleNewsBot 
+User-agent: archive.org_bot +User-agent: Aria Browse +User-agent: arquivo.pt +User-agent: arquivo-web-crawler +User-agent: Articoolo +User-agent: Aspiegel +User-agent: AspiegelBot +User-agent: ASPSeek +User-agent: Asterias +User-agent: Atomseobot +User-agent: Attach +User-agent: autoemailspider +User-agent: Automated Writer +User-agent: Awario +User-agent: AwarioBot +User-agent: awario.com +User-agent: AwarioRssBot +User-agent: AwarioSmartBot +User-agent: Azure +User-agent: BackDoorBot +User-agent: Backlink-Ceck +User-agent: backlink-check +User-agent: BacklinkCrawler +User-agent: BacklinksExtendedBot +User-agent: BackStreet +User-agent: BackWeb +User-agent: Badass +User-agent: Bandit +User-agent: BardBot +User-agent: Barkrowler +User-agent: barkrowler +User-agent: BatchFTP +User-agent: Battleztar Bazinga +User-agent: BBBike +User-agent: BDCbot +User-agent: BDFetch +User-agent: bedrockbot +User-agent: BetaBot +User-agent: Bigfoot +User-agent: BingAI +User-agent: BingBot +User-agent: Bingbot +User-agent: Bingbot-chat +User-agent: Bitacle +User-agent: Blackboard +User-agent: Black Hole +User-agent: BlackWidow +User-agent: BLEXBot +User-agent: Blow +User-agent: BlowFish +User-agent: Boardreader +User-agent: Bolt +User-agent: BotALot +User-agent: Brandprotect +User-agent: BrandVerity/1.0 +User-agent: Brandwatch +User-agent: Brave Leo +User-agent: Brightbot 1.0 +User-agent: Buck +User-agent: Buddy +User-agent: BuiltBotTough +User-agent: BuiltWith +User-agent: Bullseye +User-agent: BunnySlippers +User-agent: BuzzSumo +User-agent: ByteDance +User-agent: Bytedance +User-agent: ByteSpider +User-agent: Bytespider +User-agent: cah.io.community +User-agent: Calculon +User-agent: CatBoost +User-agent: CATExplorador +User-agent: CazoodleBot +User-agent: CCBot +User-agent: CCbot +User-agent: CC-Crawler +User-agent: Cegbfeieh +User-agent: CensysInspect +User-agent: ChatGLM +User-agent: ChatGLM-Spider/1.0 +User-agent: ChatGLM-User/1.0 +User-agent: ChatGLM-User/2.0 +User-agent: 
check1.exe +User-agent: CheckMarkNetwork/1.0 +User-agent: CheckMarkNetwork/1.0 (+https://www.checkmarknetwork.com/spider.html) +User-agent: CheeseBot +User-agent: CherryPicker +User-agent: CheTeam +User-agent: ChinaClaw +User-agent: Chinchilla +User-agent: Chlooe +User-agent: Citoid +User-agent: Claritybot +User-agent: clark-crawler +User-agent: Claude +User-agent: ClaudeBot +User-agent: ClaudeBot +User-agent: Claude-SearchBot +User-agent: Claude-User +User-agent: Claude-Web +User-agent: claude-web +User-agent: ClearScope +User-agent: Clickagy +User-agent: Clickagy Intelligence Bot v2 +User-agent: Cliqzbot +User-agent: Cloud mapping +User-agent: coccocbot +User-agent: Cocolyzebot +User-agent: CODE87 +User-agent: Cogentbot +User-agent: cognitiveseo +User-agent: Cohere +User-agent: cohere-ai +User-agent: cohere-training-data-crawler +User-agent: Collector +User-agent: Common Crawl +User-agent: CommonCrawl +User-agent: com.plumanalytics +User-agent: ContentAtScale +User-agent: ContentBot +User-agent: Contentedge +User-agent: Content Harmony +User-agent: Content King +User-agent: Content Optimizer +User-agent: Content Samurai +User-agent: Conversion AI +User-agent: Copier +User-agent: Copilot +User-agent: CopyAI +User-agent: Copymatic +User-agent: CopyRightCheck +User-agent: Copyscape +User-agent: Cosmos +User-agent: Cotoyogi +User-agent: Cotoyogi +User-agent: Craftbot +User-agent: crawler4j +User-agent: crawler.feedback +User-agent: crawler.with.dots +User-agent: Crawling at Home Project +User-agent: CrawlQ AI +User-agent: crawl.sogou.com +User-agent: Crawlspace +User-agent: CrazyWebCrawler +User-agent: Crescent +User-agent: Crew AI +User-agent: CrewAI +User-agent: CrunchBot +User-agent: CSHttp +User-agent: Curious +User-agent: curl|sudo bash +User-agent: Custo +User-agent: CyotekWebCopy +User-agent: DALL-E +User-agent: DatabaseDriverMysqli +User-agent: DataCha0s +User-agent: DataForSeoBot +User-agent: dataforseobot +User-agent: dataforseo.com +User-agent: 
DataProvider +User-agent: Dataprovider +User-agent: Datenbank Crawler +User-agent: daumoa +User-agent: DBLBot +User-agent: dcrawl +User-agent: DeepAI +User-agent: DeepL +User-agent: DeepMind +User-agent: DeepSeek +User-agent: demandbase-bot +User-agent: Demon +User-agent: Deusu +User-agent: Devil +User-agent: Devin +User-agent: Diffbot +User-agent: diffbot +User-agent: Digincore +User-agent: DigitalPebble +User-agent: DIIbot +User-agent: Dirbuster +User-agent: Disco +User-agent: Discobot +User-agent: Discordbot +User-agent: Discoverybot +User-agent: Dispatch +User-agent: DittoSpyder +User-agent: DnBCrawler-Analytics +User-agent: DnyzBot +User-agent: DOC +User-agent: DomainAppender +User-agent: DomainCrawler +User-agent: DomainSigmaCrawler +User-agent: Domains Project +User-agent: domainsproject.org +User-agent: DomainStatsBot +User-agent: DomCopBot +User-agent: DotBot +User-agent: Dotbot +User-agent: dotbot +User-agent: Doubao AI +User-agent: Download Ninja +User-agent: Download Wonder +User-agent: Dragonfly +User-agent: Drip +User-agent: DSearch +User-agent: DTS Agent +User-agent: DuckAssistBot +User-agent: DuckduckBot +User-agent: EasyDL +User-agent: Ebingbong +User-agent: eCatch +User-agent: ECCP/1.0 +User-agent: Echobot Bot +User-agent: EchoboxBot +User-agent: Ecxi +User-agent: EirGrabber +User-agent: EMail Siphon +User-agent: EmailSiphon +User-agent: EMail Wolf +User-agent: Envelbot +User-agent: EroCrawler +User-agent: evc-batch +User-agent: ev-crawler +User-agent: everyfeed-spider +User-agent: Evil +User-agent: Exabot +User-agent: Express WebPictures +User-agent: ExtLinksBot +User-agent: Extractor +User-agent: ExtractorPro +User-agent: Extreme Picture Finder +User-agent: EyeNetIE +User-agent: Ezooms +User-agent: Facebookbot +User-agent: FacebookExternalHit +User-agent: facebookexternalhit +User-agent: facebookscraper +User-agent: Facebot +User-agent: Factset_spyderbot +User-agent: Falcon +User-agent: FDM +User-agent: FeedFetcher-Google +User-agent: 
FemtosearchBot +User-agent: FHscan +User-agent: Fimap +User-agent: Firecrawl +User-agent: FirecrawlAgent +User-agent: Firefox/7.0 +User-agent: FlashGet +User-agent: Flunky +User-agent: Flyriver +User-agent: Foobot +User-agent: Frase AI +User-agent: Freeuploader +User-agent: FriendlyCrawler +User-agent: FrontPage +User-agent: Fuzz +User-agent: FyberSpider +User-agent: Fyrebot +User-agent: GalaxyBot +User-agent: Gemini +User-agent: GeminiCrawler +User-agent: Gemini-Deep-Research +User-agent: Gemma +User-agent: GenAI +User-agent: Genieo +User-agent: GenomeCrawlerd +User-agent: Genspark +User-agent: GermCrawler +User-agent: Getintent +User-agent: GetRight +User-agent: GetWeb +User-agent: G-i-g-a-b-o-t +User-agent: Gigabot +User-agent: GLM +User-agent: GloogleCrawler +User-agent: Go-Ahead-Got-It +User-agent: GoogleAgent-Mariner +User-agent: Google Bard AI +User-agent: Googlebot +User-agent: googlebot +User-agent: Googlebot-Extended +User-agent: Googlebot-Image +User-agent: Googlebot-News +User-agent: Googlebot-Video +User-agent: Google-CloudVertexBot +User-agent: Google-Extended +User-agent: Google-InspectionTool +User-agent: GoogleOther +User-agent: GoogleOther-Image +User-agent: GoogleOther-Video +User-agent: GoogleProducer +User-agent: GoogleProducer; (+http://goo.gl/7y4SX) +User-agent: google-speakr +User-agent: Goose +User-agent: gopher +User-agent: Gotit +User-agent: Go!Zilla +User-agent: GoZilla +User-agent: GPT +User-agent: GPTBot +User-agent: Grabber +User-agent: GrabNet +User-agent: Grafula +User-agent: Grammarly +User-agent: GrapeFX +User-agent: GrapeshotCrawler +User-agent: Grendizer +User-agent: GridBot +User-agent: Grok +User-agent: GT Bot +User-agent: GTBot +User-agent: GT::WWW +User-agent: Haansoft +User-agent: HaosouSpider +User-agent: Harvest +User-agent: Havij +User-agent: HEADMasterSEO +User-agent: Hemingway Editor +User-agent: Heritrix +User-agent: heritrix +User-agent: Hloader +User-agent: HMView +User-agent: HonoluluBot +User-agent: HTMLparser 
+User-agent: HTTP::Lite +User-agent: HTTrack +User-agent: HTTrack 3.0 +User-agent: Hugging Face +User-agent: Humanlinks +User-agent: HybridBot +User-agent: Hypotenuse AI +User-agent: iaskspider +User-agent: iaskspider/2.0 +User-agent: Iblog +User-agent: ICC-Crawler +User-agent: IDBot +User-agent: IDBTE4M +User-agent: Id-search +User-agent: IlseBot +User-agent: Image Fetch +User-agent: ImageGen +User-agent: ImagesiftBot +User-agent: ImagesiftBot +User-agent: imagesift.com +User-agent: Image Sucker +User-agent: img2dataset +User-agent: imgproxy +User-agent: IndeedBot +User-agent: Indy Library +User-agent: Inferkit +User-agent: InfoNaviRobot +User-agent: Information Security Team InfraSec Scanner +User-agent: InfoTekies +User-agent: InfraSec Scanner +User-agent: INK Editor +User-agent: INKforall +User-agent: instabid +User-agent: IntelliSeek +User-agent: Intelliseek +User-agent: InterGET +User-agent: InternetMeasurement +User-agent: Internet Ninja +User-agent: InternetSeer +User-agent: internetVista monitor +User-agent: IonCrawl +User-agent: ips-agent +User-agent: Iria +User-agent: IRLbot +User-agent: isitwp.com +User-agent: Iskanie +User-agent: IsraBot +User-agent: ISSCyberRiskCrawler +User-agent: IstellaBot +User-agent: Is this a crawler? 
+User-agent: iubenda-radar +User-agent: ivre-masscan +User-agent: JamesBOT +User-agent: JasperAI +User-agent: Jbrofuzz +User-agent: JennyBot +User-agent: JetCar +User-agent: Jetty +User-agent: JikeSpider +User-agent: JOC Web Spider +User-agent: Joomla +User-agent: Jorgee +User-agent: JustView +User-agent: Jyxobot +User-agent: k2spider +User-agent: Kafkai +User-agent: Kangaroo +User-agent: Kangaroo Bot +User-agent: Kenjin Spider +User-agent: Keybot Translation-Search-Machine +User-agent: Keyword Density +User-agent: Keyword Density AI +User-agent: Kinza +User-agent: Knowledge +User-agent: KomoBot +User-agent: Kozmosbot +User-agent: Lanshanbot +User-agent: Larbin +User-agent: larbin +User-agent: Leap +User-agent: LeechFTP +User-agent: LeechGet +User-agent: LexiBot +User-agent: Lftp +User-agent: LibWeb +User-agent: Libwhisker +User-agent: libwww +User-agent: LieBaoFast +User-agent: Lightspeedsystems +User-agent: Likse +User-agent: Linguee +User-agent: Linkbot +User-agent: linkdexbot +User-agent: LinkedInBot +User-agent: LinkextractorPro +User-agent: linkfluence +User-agent: linko +User-agent: LinkpadBot +User-agent: LinkScan +User-agent: LinksManager +User-agent: LinkWalker +User-agent: LinqiaMetadataDownloaderBot +User-agent: LinqiaRSSBot +User-agent: LinqiaScrapeBot +User-agent: Lipperhey +User-agent: Lipperhey Spider +User-agent: Litemage_walker +User-agent: LLaMA +User-agent: LLMs +User-agent: Lmspider +User-agent: LNSpiderguy +User-agent: Ltx71 +User-agent: lwp-request +User-agent: LWP::Simple +User-agent: lwp-trivial +User-agent: Mag-Net +User-agent: Magnet +User-agent: magpie-crawler +User-agent: Majestic12 +User-agent: Majestic SEO +User-agent: Majestic-SEO +User-agent: MarketMuse +User-agent: MarkMonitor +User-agent: MarkWatch +User-agent: Masscan +User-agent: masscan +User-agent: Mass Downloader +User-agent: Mata Hari +User-agent: MauiBot +User-agent: Mb2345Browser +User-agent: MeanPath Bot +User-agent: Meanpathbot +User-agent: meanpathbot +User-agent: 
Mediapartners-Google +User-agent: Mediapartners-Google* +User-agent: Mediatoolkitbot +User-agent: mediawords +User-agent: MegaIndex.ru +User-agent: Meltwater +User-agent: Meta AI +User-agent: Meta-AI +User-agent: MetaAI +User-agent: Meta-External +User-agent: Meta-ExternalAgent +User-agent: meta-externalagent +User-agent: Meta-ExternalFetcher +User-agent: meta-externalfetcher +User-agent: MetaInspector +User-agent: MetaTagBot +User-agent: Metauri +User-agent: MFC_Tear_Sample +User-agent: MicroMessenger +User-agent: Microsoft Data Access +User-agent: MIDown tool +User-agent: MIIxpc +User-agent: MindSpider +User-agent: Minefield +User-agent: Mister PiX +User-agent: Mistral +User-agent: MistralAI-User/1.0 +User-agent: MJ12bot +User-agent: Moblie Safari +User-agent: ModatScanner +User-agent: Mojeek +User-agent: MojeekBot +User-agent: Mojolicious +User-agent: MolokaiBot +User-agent: Morfeus Fucking Scanner +User-agent: Mozlila +User-agent: MQQBrowser +User-agent: Mr.4x3 +User-agent: MSFrontPage +User-agent: MSIECrawler +User-agent: Msrabot +User-agent: MTRobot +User-agent: muhstik-scan +User-agent: Musobot +User-agent: MyCentralAIScraperBot +User-agent: Name Intelligence +User-agent: Nameprotect +User-agent: Narrative +User-agent: NaverBot +User-agent: Navroad +User-agent: NearSite +User-agent: Needle +User-agent: NeevaBot +User-agent: Nessus +User-agent: NetAnts +User-agent: Netcraft +User-agent: netEstate Imprint Crawler +User-agent: netEstate NE Crawler +User-agent: NetLyzer +User-agent: NetMechanic +User-agent: NetSpider +User-agent: Nettrack +User-agent: Net Vampire +User-agent: Netvibes +User-agent: NetZIP +User-agent: NeuralSEO +User-agent: Neural Text +User-agent: newspaper +User-agent: NextGenSearchBot +User-agent: Nibbler +User-agent: NICErsPRO +User-agent: Niki-bot +User-agent: Nikto +User-agent: NimbleCrawler +User-agent: Nimbostratus +User-agent: Ninja +User-agent: Nmap +User-agent: Nova Act +User-agent: NovaAct +User-agent: NPBot +User-agent: NPbot 
+User-agent: Nuclei +User-agent: Nutch +User-agent: OAI-SearchBot +User-agent: oBot +User-agent: Octopus +User-agent: Odin +User-agent: Offline Explorer +User-agent: Offline Navigator +User-agent: Omgili +User-agent: omgili +User-agent: Omgilibot +User-agent: omgilibot +User-agent: Omgilitbot +User-agent: OmniExplorer_Bot +User-agent: OnCrawl +User-agent: Open AI +User-agent: OpenAI +User-agent: openai +User-agent: openai.com +User-agent: OpenBot +User-agent: Openfind +User-agent: OpenLinkProfiler +User-agent: OpenText AI +User-agent: OpenVAS +User-agent: Openvas +User-agent: Operator +User-agent: OrangeBot +User-agent: OrangeSpider +User-agent: Orthogaffe +User-agent: OutclicksBot +User-agent: OutfoxBot +User-agent: Outwrite +User-agent: Page Analyzer +User-agent: PageAnalyzer +User-agent: Page Analyzer AI +User-agent: PageGrabber +User-agent: PageScorer +User-agent: page scorer +User-agent: PageThing +User-agent: PageThing.com +User-agent: Pandalytics +User-agent: PanguBot +User-agent: Panscient +User-agent: panscient.com +User-agent: Papa Foto +User-agent: Paperlibot +User-agent: Paraphraser.io +User-agent: Pavuk +User-agent: pcBrowser +User-agent: PECL::HTTP +User-agent: peer39_crawler +User-agent: peer39_crawler/1.0 +User-agent: PeoplePal +User-agent: PerplexityBot +User-agent: Perplexity-User +User-agent: PetalBot +User-agent: Petalbot +User-agent: PhindBot +User-agent: Phindbot +User-agent: PHPCrawl +User-agent: Picscout +User-agent: Picsearch +User-agent: PictureFinder +User-agent: Piepmatz +User-agent: Pi-Monster +User-agent: Pimonster +User-agent: Pinterestbot +User-agent: PiplBot +User-agent: Pixray +User-agent: PleaseCrawl +User-agent: plumanalytics +User-agent: Pockey +User-agent: POE-Component-Client-HTTP +User-agent: polaris version +User-agent: Poseidon Research Crawler +User-agent: prefetch-proxy +User-agent: probe-image-size +User-agent: Probethenet +User-agent: ProPowerBot +User-agent: ProWebWalker +User-agent: ProWritingAid +User-agent: Proximic 
+User-agent: Psbot +User-agent: psbot +User-agent: Pu_iN +User-agent: Pump +User-agent: PxBroker +User-agent: PyCurl +User-agent: python-requests +User-agent: QualifiedBot +User-agent: QueryN Metasearch +User-agent: Quick-Crawler +User-agent: QuillBot +User-agent: quillbot.com +User-agent: Quora-Bot +User-agent: Rainbot +User-agent: RankActive +User-agent: RankActiveLinkBot +User-agent: RankFlex +User-agent: RankingBot +User-agent: RankingBot2 +User-agent: Rankivabot +User-agent: RankurBot +User-agent: RealDownload +User-agent: Reaper +User-agent: RebelMouse +User-agent: Recorder +User-agent: RedesScrapy +User-agent: ReGet +User-agent: RepoMonkey +User-agent: Re-re +User-agent: Ripper +User-agent: ripz +User-agent: RobotSpider +User-agent: Robozilla +User-agent: RocketCrawler +User-agent: Rogerbot +User-agent: rogerbot +User-agent: RSSingBot +User-agent: Rytr +User-agent: s1z.ru +User-agent: SalesIntelligent +User-agent: SaplingAI +User-agent: satoristudio.net +User-agent: SBIder +User-agent: SBIntuitionsBot +User-agent: scalaj-http +User-agent: Scalenut +User-agent: ScanAlert +User-agent: Scanbot +User-agent: scan.lol +User-agent: scoop.it +User-agent: ScoutJet +User-agent: Scraper +User-agent: Scrapy +User-agent: Screaming +User-agent: ScreenerBot +User-agent: ScrepyBot +User-agent: ScriptBook +User-agent: Searchestate +User-agent: SearchmetricsBot +User-agent: Seekport +User-agent: SeekportBot +User-agent: Seekport Crawler +User-agent: Seekr +User-agent: SemanticJuice +User-agent: Semrush +User-agent: Semrush* +User-agent: SEMrushBot +User-agent: SemrushBot +User-agent: SemrushBot-BA +User-agent: SemrushBot-BM +User-agent: SemrushBot-CT +User-agent: SemrushBot-FT +User-agent: SemrushBot-OCOB +User-agent: SemrushBot-SA +User-agent: SemrushBot-SI +User-agent: SemrushBot-SWA +User-agent: SemrushBot-UB +User-agent: SentiBot +User-agent: Sentibot +User-agent: sentibot +User-agent: SenutoBot +User-agent: seobility +User-agent: SeobilityBot +User-agent: SeoCherryBot 
+User-agent: seocompany.store +User-agent: SEO Content Machine +User-agent: SEOkicks +User-agent: SEOkicks-Robot +User-agent: SEOlyt +User-agent: SEOlyticsCrawler +User-agent: Seomoz +User-agent: SEOprofiler +User-agent: SEO Robot +User-agent: seoscanners +User-agent: seoscanners.net +User-agent: SeoSiteCheckup +User-agent: seostar +User-agent: SEOstats +User-agent: serpstatbot +User-agent: sexsearcher +User-agent: SeznamBot +User-agent: Shodan +User-agent: Sidetrade +User-agent: Sidetrade indexer bot +User-agent: Simplified AI +User-agent: Siphon +User-agent: SISTRIX +User-agent: sistrix +User-agent: SiteAuditBot +User-agent: Sitebeam +User-agent: SiteCheckerBotCrawler +User-agent: sitechecker.pro +User-agent: sitecheck.internetseer.com +User-agent: SiteExplorer +User-agent: Sitefinity +User-agent: Siteimprove +User-agent: SiteLockSpider +User-agent: siteripz +User-agent: SiteSnagger +User-agent: Site Sucker +User-agent: SiteSucker +User-agent: Sitevigil +User-agent: Skydancer +User-agent: SlickWrite +User-agent: SlySearch +User-agent: SmartDownload +User-agent: SMTBot +User-agent: Snake +User-agent: Snapbot +User-agent: Snoopy +User-agent: SocialRankIOBot +User-agent: Sociscraper +User-agent: sogou spider +User-agent: sogouspider +User-agent: Sogou web spider +User-agent: Sonic +User-agent: Sosospider +User-agent: Sottopop +User-agent: SpaceBison +User-agent: Spammen +User-agent: SpankBot +User-agent: Spanner +User-agent: sp_auditbot +User-agent: Spbot +User-agent: spbot +User-agent: Spider_Bot +User-agent: Spider_Bot/3.0 +User-agent: Spinbot +User-agent: Spinn3r +User-agent: Spin Rewriter +User-agent: SplitSignalBot +User-agent: SputnikBot +User-agent: spyfu +User-agent: Sqlmap +User-agent: Sqlworm +User-agent: Sqworm +User-agent: Stability +User-agent: StableDiffusionBot +User-agent: star***crawler +User-agent: Steeler +User-agent: Storebot-Google +User-agent: Stripper +User-agent: Sucker +User-agent: Sucuri +User-agent: Sudowrite +User-agent: SummalyBot 
+User-agent: Super Agent +User-agent: SuperBot +User-agent: SuperHTTP +User-agent: Surfbot +User-agent: Surfer AI +User-agent: SurveyBot +User-agent: Suzuran +User-agent: Swiftbot +User-agent: sysscan +User-agent: Szukacz +User-agent: T0PHackTeam +User-agent: T8Abot +User-agent: tAkeOut +User-agent: Teleport +User-agent: TeleportPro +User-agent: Telesoft +User-agent: Telesphoreo +User-agent: Telesphorep +User-agent: Teoma +User-agent: TerraCotta +User-agent: Text Blaze +User-agent: TextCortex +User-agent: The Intraformant +User-agent: The Knowledge AI +User-agent: TheNomad +User-agent: Thinkbot +User-agent: ThinkChaos +User-agent: Thumbor +User-agent: TightTwatBot +User-agent: TikTokSpider +User-agent: TimpiBot +User-agent: Timpibot +User-agent: TinyTestBot +User-agent: Titan +User-agent: Toata +User-agent: Toweyabot +User-agent: Tracemyfile +User-agent: Trendiction +User-agent: Trendictionbot +User-agent: trendiction.com +User-agent: trendiction.de +User-agent: True_Robot +User-agent: Turingos +User-agent: Turnitin +User-agent: TurnitinBot +User-agent: turnitinbot +User-agent: TwengaBot +User-agent: Twice +User-agent: Typhoeus +User-agent: ubermetrics +User-agent: ubermetrics-technologies.com +User-agent: UbiCrawler +User-agent: UbiNG +User-agent: U_Bot +User-agent: U Fool +User-agent: U Fool v2.0.0 +User-agent: UnisterBot +User-agent: Upflow +User-agent: Vacuum +User-agent: Vagabondo +User-agent: V-BOT +User-agent: VB Project +User-agent: VCI +User-agent: VelenPublicWebCrawler +User-agent: VeriCiteCrawler +User-agent: VidibleScraper +User-agent: Vidnami AI +User-agent: Virusdie +User-agent: VoidEYE +User-agent: Voil +User-agent: Voltron +User-agent: voyagerx.com +User-agent: Wallpapers +User-agent: Wallpapers/3.0 +User-agent: WallpapersHD +User-agent: WARDBot +User-agent: WASALive-Bot +User-agent: WBSearchBot +User-agent: Webalta +User-agent: Web Auto +User-agent: WebAuto +User-agent: WebBandit +User-agent: Web Collage +User-agent: WebCollage +User-agent: 
WebCopier +User-agent: WEBDAV +User-agent: Web Enhancer +User-agent: WebEnhancer +User-agent: Web Fetch +User-agent: WebFetch +User-agent: Web Fuck +User-agent: WebFuck +User-agent: webgains-bot +User-agent: WebGo IS +User-agent: WebImageCollector +User-agent: WebLeacher +User-agent: WebmasterWorldForumBot +User-agent: webmeup-crawler +User-agent: Web Pix +User-agent: WebPix +User-agent: webprosbot +User-agent: webpros.com +User-agent: WebReaper +User-agent: Web Sauger +User-agent: WebSauger +User-agent: Webshag +User-agent: WebsiteExtractor +User-agent: Website Quester +User-agent: WebsiteQuester +User-agent: Webster +User-agent: WebStripper +User-agent: Web Sucker +User-agent: WebSucker +User-agent: WebWhacker +User-agent: Webzio +User-agent: Webzio-Extended +User-agent: Webzio-extended +User-agent: webzio-extended +User-agent: WebZIP +User-agent: WeSEE +User-agent: Whack +User-agent: Whacker +User-agent: Whatweb +User-agent: Whisper +User-agent: Who.is Bot +User-agent: Widow +User-agent: wiederfreibot/1.0 +User-agent: WinHTTrack +User-agent: WiseGuys Robot +User-agent: WISENutbot +User-agent: Wonderbot +User-agent: Woobot +User-agent: WordAI +User-agent: Wordtune +User-agent: WormsGTP +User-agent: Wotbox +User-agent: WPBot +User-agent: wpbot +User-agent: Wprecon +User-agent: WPScan +User-agent: Writecream +User-agent: WriterZen +User-agent: Writescope +User-agent: Writesonic +User-agent: WWW-Collector-E +User-agent: WWW-Mechanize +User-agent: WWW::Mechanize +User-agent: WWWOFFLE +User-agent: x09Mozilla +User-agent: x22Mozilla +User-agent: x28-job-bot +User-agent: xAI +User-agent: Xaldon WebSpider +User-agent: Xaldon_WebSpider +User-agent: xBot +User-agent: Xenu +User-agent: XoviBot +User-agent: xpymep1.exe +User-agent: YaK +User-agent: Yandex +User-agent: YandexAdditional +User-agent: YandexAdditionalBot +User-agent: Yeti +User-agent: YouBot +User-agent: Youbot +User-agent: YoudaoBot +User-agent: Zade +User-agent: Zao +User-agent: Zauba +User-agent: zauba.io 
+User-agent: Zealbot
+User-agent: Zermelo
+User-agent: Zerochat
+User-agent: Zero GTP
+User-agent: Zeus
+User-agent: zgrab
+User-agent: Zhipu
+User-agent: Zimm
+User-agent: Zitebot
+User-agent: ZmEu
+User-agent: ZoomBot
+User-agent: ZoominfoBot
+User-agent: ZumBot
+User-agent: ZyBORG
+User-agent: ZyBorg
+Disallow: /
+Disallow: *
+DisallowAITraining: /
+DisallowAITraining: *
+Content-Usage: ai=n
+
+User-agent: *
+Disallow:
+DisallowAITraining: /
+DisallowAITraining: *
+Content-Usage: ai=n
+
+sitemap: !REPLACE WITH SITEMAP LINK!
+
+#___________________________________________________
diff --git a/whitelist-robots.txt b/whitelist-robots.txt
new file mode 100644
index 0000000..e0fb4d6
--- /dev/null
+++ b/whitelist-robots.txt
@@ -0,0 +1,22 @@
+#[WHITELIST-ROBOTS.TXT VERSION 1.0]
+#[MAINTAINED AT: https://git.qwik.space/Left4Code/robots.txt] GET A COPY OR REPORT ISSUES THERE!
+#_________________________________________________________________________
+
+#Lots of AI companies and scrapers seem to use alternative, unpublished user-agents, and only some get caught and shamed. If possible, consider using some form of AI blocking such as Go-away or Anubis. For static sites hosted with Neocities, Nekoweb, etc., this file is another option.
+
+#___________________________________________________
+
+User-agent: WibyBot
+User-agent: search.marginalia.nu
+Disallow:
+
+User-agent: *
+Disallow: /
+Disallow: *
+DisallowAITraining: /
+DisallowAITraining: *
+Content-Usage: ai=n
+
+sitemap: !REPLACE WITH SITEMAP LINK!
+
+#___________________________________________________
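
A quick way to sanity-check either file before uploading is to feed it to a parser and ask whether a given crawler would be allowed. The sketch below is only an illustration and is not part of this change. It assumes Python's standard `urllib.robotparser`, two shortened sample files standing in for the real ones, a hypothetical helper named `allowed()`, and a placeholder URL on `example.org`. Python's parser matches user agents case-insensitively by substring and skips keys it does not recognize, so real crawlers may behave differently, and many ignore robots.txt altogether.

```python
from urllib.robotparser import RobotFileParser

# Shortened stand-ins for the two files (assumption: the real lists are far longer).
BLACKLIST_SAMPLE = """\
User-agent: GPTBot
User-agent: CCBot
Disallow: /

User-agent: *
Disallow:
"""

WHITELIST_SAMPLE = """\
User-agent: WibyBot
User-agent: search.marginalia.nu
Disallow:

User-agent: *
Disallow: /
"""


def allowed(robots_txt: str, user_agent: str,
            url: str = "https://example.org/page.html") -> bool:
    """Hypothetical helper: True if user_agent may fetch url under robots_txt."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    samples = (("blacklist", BLACKLIST_SAMPLE), ("whitelist", WHITELIST_SAMPLE))
    for name, text in samples:
        for agent in ("GPTBot", "WibyBot", "SomeOtherBot"):
            print(f"{name}: {agent} allowed = {allowed(text, agent)}")
```

With the blacklist sample only the listed agents are refused, while the whitelist sample refuses everything except the two listed search crawlers. Swapping the sample strings for the contents of the real file gives the same check against the full list.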