outward-scrapper

5 Commits

Include renames

Author	SHA1	Message	Date
kato	a83fa251a0	Implemented basic test functionality	2026-03-16 23:28:25 +02:00
kato	b9d035d71a	feat(webui): add cached image proxy with configurable dir Add disk-backed image proxy support to the web UI and expose it via `/api/image`. The proxy validates image URLs, fetches remote images with a timeout, stores image bytes + metadata in a local cache, and serves cached responses with proper content type and cache headers. Also add `SCRAPPR_IMAGE_CACHE` (default `.cache/webui-images`) and pass it through `cmd/outward-web` into `webui.Run`, with startup logging updated to include the cache location. This reduces repeated remote fetches and makes image delivery more reliable for the UI.feat(webui): add cached image proxy with configurable dir Add disk-backed image proxy support to the web UI and expose it via `/api/image`. The proxy validates image URLs, fetches remote images with a timeout, stores image bytes + metadata in a local cache, and serves cached responses with proper content type and cache headers. Also add `SCRAPPR_IMAGE_CACHE` (default `.cache/webui-images`) and pass it through `cmd/outward-web` into `webui.Run`, with startup logging updated to include the cache location. This reduces repeated remote fetches and makes image delivery more reliable for the UI.	2026-03-16 09:56:47 +02:00
kato	ad3385c63b	Evil	2026-03-15 18:23:58 +02:00
kato	6bf221de3f	feat(scraper): add checkpointing and richer page extraction Add resumable checkpoint support so long scrapes can recover from interruptions instead of restarting from scratch. - introduce autosave/load/clear checkpoint flow in `.cache/scrape-state.json`, including SIGINT/SIGTERM save-on-exit handling - expand parsing/model output to capture legacy and portable infobox fields, primary image URLs, effects, recipes, raw tables, and improved category extraction - skip infobox tables during recipe parsing to avoid false recipe matches - add cache log event type, ignore cache/output artifacts, and document new autosave tuning options in READMEfeat(scraper): add checkpointing and richer page extraction Add resumable checkpoint support so long scrapes can recover from interruptions instead of restarting from scratch. - introduce autosave/load/clear checkpoint flow in `.cache/scrape-state.json`, including SIGINT/SIGTERM save-on-exit handling - expand parsing/model output to capture legacy and portable infobox fields, primary image URLs, effects, recipes, raw tables, and improved category extraction - skip infobox tables during recipe parsing to avoid false recipe matches - add cache log event type, ignore cache/output artifacts, and document new autosave tuning options in README	2026-03-15 17:08:24 +02:00
kato	42e2083ece	Initial COmmit	2026-03-15 16:42:43 +02:00