feat(webui): add cached image proxy with configurable dir

Add a disk-backed image proxy to the web UI and expose it via
`/api/image`. The proxy validates image URLs, fetches remote images with
a timeout, stores the image bytes and metadata in a local cache, and
serves cached responses with the correct content type and cache
headers.
Also add `SCRAPPR_IMAGE_CACHE` (default `.cache/webui-images`) and pass it
through `cmd/outward-web` into `webui.Run`, with startup logging updated
to include the cache location.
This reduces repeated remote fetches and makes image delivery more
reliable for the UI.

feat(scraper): add checkpointing and richer page extraction

Add resumable checkpoint support so long scrapes can recover from
interruptions instead of restarting from scratch.
- introduce an autosave/load/clear checkpoint flow backed by
  `.cache/scrape-state.json`, including save-on-exit handling for
  SIGINT/SIGTERM
- expand parsing and model output to capture legacy and portable
  infobox fields, primary image URLs, effects, recipes, and raw tables,
  and improve category extraction
- skip infobox tables during recipe parsing to avoid false recipe
  matches
- add a cache log event type, ignore cache/output artifacts, and
  document the new autosave tuning options in the README