# Scrappr
Small Go scraper for the Outward Fandom wiki.
## Layout
```text
.
├── cmd/outward-web/main.go # web UI entrypoint
├── cmd/scrappr/main.go # binary entrypoint
├── internal/app # bootstrapping and output writing
├── internal/logx # colored emoji logger
├── internal/model # dataset models
├── internal/scraper # crawl flow, parsing, queueing, retries
├── internal/webui # embedded web server + static UI
├── go.mod
├── go.sum
└── outward_data.json # generated output
```
## Run
```bash
go run ./cmd/scrappr
```
```bash
go run ./cmd/outward-web
```
## What It Does
- Crawls item and crafting pages from `outward.fandom.com`
- Uses browser-like headers and rotating user agents
- Limits crawl depth and queue size to avoid drifting into junk pages
- Retries temporary failures with short backoff
- Prints colored emoji logs for queueing, requests, responses, parsing, retries, and periodic status
- Stores legacy and portable infobox fields, primary item image URLs, recipes, effects, and raw content tables for later processing
- Saves resumable checkpoints to `.cache/scrape-state.json` on a timer, at progress milestones, and on `Ctrl+C`
- Writes a stable, sorted JSON dataset to `outward_data.json`
- Serves a local craft-planner UI backed by recipes from `outward_data.json`
## Tuning
Scraper defaults live in `internal/scraper/config.go`.
- Lower or raise `RequestDelay` / `RequestJitter`
- Tighten or relax `MaxQueuedPages`
- Adjust `RequestTimeout`, `MaxRetries`, `ProgressEvery`, `AutosaveEvery`, and `AutosavePages`