Initial commit
40
README.md
Normal file
@@ -0,0 +1,40 @@
# Scrappr

Small Go scraper for the Outward Fandom wiki.

## Layout

```text
.
├── cmd/scrappr/main.go   # binary entrypoint
├── internal/app          # bootstrapping and output writing
├── internal/logx         # colored emoji logger
├── internal/model        # dataset models
├── internal/scraper      # crawl flow, parsing, queueing, retries
├── go.mod
├── go.sum
└── outward_data.json     # generated output
```

## Run

```bash
go run ./cmd/scrappr
```

## What It Does

- Crawls item and crafting pages from `outward.fandom.com`
- Uses browser-like headers and rotating user agents
- Limits crawl depth and queue size to avoid drifting into junk pages
- Retries temporary failures with short backoff
- Prints colored emoji logs for queueing, requests, responses, parsing, retries, and periodic status
- Writes a stable, sorted JSON dataset to `outward_data.json`
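
The "retries temporary failures with short backoff" behavior can be pictured with a minimal sketch like the one below. It is not the code in `internal/scraper`: the `retry` helper, the `errTemporary` sentinel, and the doubling delay are all assumptions chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errTemporary is a hypothetical sentinel marking a failure worth retrying.
var errTemporary = errors.New("temporary failure")

// retry calls fn up to maxRetries+1 times. After each temporary failure it
// sleeps, doubling the delay each time (starting from base). Any other error
// is returned immediately. It reports how many attempts were made.
func retry(maxRetries int, base time.Duration, fn func() error) (int, error) {
	delay := base
	var err error
	for attempt := 1; attempt <= maxRetries+1; attempt++ {
		err = fn()
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, errTemporary) {
			return attempt, err // permanent failure: do not retry
		}
		if attempt <= maxRetries {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return maxRetries + 1, err
}

func main() {
	calls := 0
	// Simulated fetch that fails twice before succeeding.
	attempts, err := retry(3, 10*time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return fmt.Errorf("fetch page: %w", errTemporary)
		}
		return nil
	})
	fmt.Println(attempts, err) // prints "3 <nil>"
}
```

Wrapping the sentinel with `%w` lets the caller distinguish retryable failures from permanent ones through `errors.Is`, so a permanent error such as a missing page would not be retried.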

## Tuning

Scraper defaults live in `internal/scraper/config.go`.

- Lower or raise `RequestDelay` / `RequestJitter`
- Tighten or relax `MaxQueuedPages`
- Adjust `RequestTimeout`, `MaxRetries`, and `ProgressEvery`
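
As a rough picture of how these knobs might fit together, here is a sketch of a config struct using the field names listed above. Only the field names come from this README. The types, the default values, and the `defaultConfig`, `newRNG`, and `nextDelay` helpers are assumptions for illustration, so check `internal/scraper/config.go` for the real definitions.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// Config uses the field names from the README; the types are assumed.
type Config struct {
	RequestDelay   time.Duration // base pause between requests
	RequestJitter  time.Duration // upper bound of the random extra pause
	RequestTimeout time.Duration // per-request timeout
	MaxQueuedPages int           // cap on pages waiting in the queue
	MaxRetries     int           // retries per page on temporary failure
	ProgressEvery  time.Duration // interval between status log lines
}

// defaultConfig returns placeholder values, not the project's defaults.
func defaultConfig() Config {
	return Config{
		RequestDelay:   500 * time.Millisecond,
		RequestJitter:  250 * time.Millisecond,
		RequestTimeout: 15 * time.Second,
		MaxQueuedPages: 1000,
		MaxRetries:     3,
		ProgressEvery:  10 * time.Second,
	}
}

// newRNG builds a seeded random source (hypothetical helper).
func newRNG(seed int64) *rand.Rand {
	return rand.New(rand.NewSource(seed))
}

// nextDelay returns RequestDelay plus a random jitter in [0, RequestJitter).
func nextDelay(cfg Config, rng *rand.Rand) time.Duration {
	if cfg.RequestJitter <= 0 {
		return cfg.RequestDelay
	}
	return cfg.RequestDelay + time.Duration(rng.Int63n(int64(cfg.RequestJitter)))
}

func main() {
	cfg := defaultConfig()
	rng := newRNG(time.Now().UnixNano())
	fmt.Println("next request in", nextDelay(cfg, rng))
}
```

Raising `RequestDelay` slows the crawl uniformly, while `RequestJitter` spreads requests out so they do not arrive at a fixed rhythm.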