Split off challenges page from README

This commit is contained in:
WeebDataHoarder
2025-04-13 16:53:52 +02:00
parent 2cd6d0cebf
commit d72010bb64
2 changed files with 126 additions and 115 deletions

123
README.md
View File

@@ -7,11 +7,11 @@ Self-hosted abuse detection and rule enforcement against low-effort mass AI scra
go-away sits in between your site and the Internet / upstream proxy.
Incoming requests can be selected by [rules](#rich-rule-matching) to be [actioned](#extended-rule-actions) or [challenged](#challenges) to filter suspicious requests.
Incoming requests can be selected by [rules](#rich-rule-matching) to be [actioned](#extended-rule-actions) or [challenged](CHALLENGES.md#challenges) to filter suspicious requests.
The tool is designed highly flexible so the operator can minimize impact to legit users, while surgically targeting heavy endpoints or scrapers.
[Challenges](#challenges) can be transparent (not shown to user, depends on backend or other logic), [non-JavaScript](#non-javascript-challenges) (challenges common browser properties), or [custom JavaScript](#custom-javascript-wasm-challenges) (from Proof of Work to fingerprinting or Captcha is supported)
[Challenges](CHALLENGES.md#challenges) can be transparent (not shown to user, depends on backend or other logic), [non-JavaScript](#non-javascript-challenges) (challenges common browser properties), or [custom JavaScript](#custom-javascript-wasm-challenges) (from Proof of Work to fingerprinting or Captcha is supported)
See _[Why?](#why)_ section for the challenges and reasoning behind this tool.
@@ -104,7 +104,7 @@ Several challenges that do not require JavaScript are offered, some targeting th
These can be used for light checking of requests that eliminate most of the low effort scraping.
See [Challenges](#challenges) below for a list of them.
See [Challenges](CHALLENGES.md#challenges) for a list of them.
### Custom JavaScript / WASM challenges
@@ -150,7 +150,11 @@ Results will be temporarily cached
By default, [DroneBL](https://dronebl.org/) is used.
### Network range loading
### Network range and automated filtering
Some specific search spiders do follow _robots.txt_ and are well behaved. However, many actors can reuse user agents, so the origin network ranges must be checked.
The samples provide example network range fetching and rules for Googlebot / Bingbot / DuckDuckBot / Kagibot / Qwantbot / Yandexbot.
Network ranges can be loaded via fetched JSON / TXT / HTML pages, or via lists. You can filter these using _jq_ or a regex.
@@ -363,117 +367,6 @@ services:
```
## Challenges
#### http
Verify incoming requests against a specified backend to allow the user through. Cookies and some other headers are passed.
For example, this allows verifying the user cookies against the backend to have the user skip all other challenges.
Example on Forgejo, checks that current user is authenticated:
```yaml
http-cookie-check:
mode: http
url: http://forgejo:3000/user/stopwatches
# url: http://forgejo:3000/repo/search
# url: http://forgejo:3000/notifications/new
parameters:
http-method: GET
http-cookie: i_like_gitea
http-code: 200
```
#### preload-link
Requires HTTP/2+ response parsing and logic, silent challenge (does not display a challenge page).
Browsers that support [103 Early Hints](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Status/103) are indicated to fetch a CSS resource via [Link](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Link) preload that solves the challenge.
The server waits until solved or defined timeout, then continues on other challenges if failed.
Example:
```yaml
self-preload-link:
condition: '"Sec-Fetch-Mode" in headers && headers["Sec-Fetch-Mode"] == "navigate"'
mode: "preload-link"
runtime:
# verifies that result = key
mode: "key"
probability: 0.1
parameters:
preload-early-hint-deadline: 3s
key-code: 200
key-mime: text/css
key-content: ""
```
#### header-refresh
Requires HTTP response parsing and logic, displays challenge site instantly.
Have the browser solve the challenge by following the URL listed on HTTP [Refresh](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Refresh) instantly.
#### meta-refresh
Requires HTTP and HTML response parsing and logic, displays challenge site instantly.
Have the browser solve the challenge by following the URL listed on HTML `<meta http-equiv=refresh>` tag instantly. Equivalent to above.
#### resource-load
Requires HTTP and HTML response parsing and logic, displays challenge site.
Servers a challenge page with a linked resource that is loaded by the browser, which solves the challenge. Page refreshes a few seconds later via [Refresh](https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Refresh).
Example:
```yaml
self-resource-load:
mode: "resource-load"
runtime:
# verifies that result = key
mode: "key"
probability: 0.1
parameters:
key-code: 200
key-mime: text/css
key-content: ""
```
#### cookie
Requires HTTP parsing and a Cookie Jar, silent challenge (does not display a challenge page unless failed).
Serves the client with a Set-Cookie that solves the challenge, and redirects it back to the same page. Browser must present the cookie to load.
Several tools implement this, but usually not mass scrapers.
#### js-pow-sha256
Requires JavaScript and workers, displays challenge site.
Has the user solve a Proof of Work using SHA256 hashes, with configurable difficulty.
Example:
```yaml
js-pow-sha256:
# Asset must be under challenges/{name}/static/{asset}
# Other files here will be available under that path
mode: js
asset: load.mjs
parameters:
# difficulty is number of bits that must be set to 0 from start
# Anubis challenge difficulty 5 becomes 5 * 8 = 20
difficulty: 20
runtime:
mode: wasm
# Verify must be under challenges/{name}/runtime/{asset}
asset: runtime.wasm
probability: 0.02
```
## Development