Commit Graph

7 Commits

Author SHA1 Message Date
d05ed6c6d6 Merge pull request 'Implemented basic test functionality' (#1) from feat/test into master
Reviewed-on: #1
2026-03-16 23:29:53 +02:00
a83fa251a0 Implemented basic test functionality 2026-03-16 23:28:25 +02:00
f011ef66cf Git tracking fix 2026-03-16 09:57:24 +02:00
b9d035d71a feat(webui): add cached image proxy with configurable dir
Add disk-backed image proxy support to the web UI and expose it via
`/api/image`. The proxy validates image URLs, fetches remote images with
a timeout, stores image bytes + metadata in a local cache, and serves
cached responses with proper content type and cache headers.

Also add `SCRAPPR_IMAGE_CACHE` (default `.cache/webui-images`) and pass it
through `cmd/outward-web` into `webui.Run`, with startup logging updated
to include the cache location.

This reduces repeated remote fetches and makes image delivery more
reliable for the UI.
2026-03-16 09:56:47 +02:00
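
The flow this commit message describes (validate the URL, fetch with a timeout, store bytes plus metadata on disk, serve with content type and cache headers) could be sketched roughly as below. This is a minimal illustration and not the code from the commit: the `url` query parameter, the `meta` struct, the `.bin`/`.json` file layout, the 10-second timeout, the 10 MiB size cap, and the `max-age` value are all assumptions. Only `/api/image`, `SCRAPPR_IMAGE_CACHE`, and the default `.cache/webui-images` come from the message itself.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"errors"
	"fmt"
	"io"
	"net/http"
	"net/url"
	"os"
	"path/filepath"
	"time"
)

// meta is a hypothetical metadata record stored next to the image bytes.
type meta struct {
	URL         string `json:"url"`
	ContentType string `json:"content_type"`
}

// cacheDir returns SCRAPPR_IMAGE_CACHE, or the default named in the commit message.
func cacheDir() string {
	if dir := os.Getenv("SCRAPPR_IMAGE_CACHE"); dir != "" {
		return dir
	}
	return ".cache/webui-images"
}

// validateImageURL accepts only absolute http(s) URLs (assumed rule; the real check may be stricter).
func validateImageURL(raw string) error {
	u, err := url.Parse(raw)
	if err != nil {
		return fmt.Errorf("parse url: %w", err)
	}
	if u.Scheme != "http" && u.Scheme != "https" {
		return errors.New("unsupported scheme")
	}
	if u.Host == "" {
		return errors.New("missing host")
	}
	return nil
}

// cacheKey derives a filesystem-safe name from the URL.
func cacheKey(raw string) string {
	sum := sha256.Sum256([]byte(raw))
	return hex.EncodeToString(sum[:])
}

// store writes the image bytes and their metadata side by side.
func store(dir, raw, contentType string, body []byte) error {
	if err := os.MkdirAll(dir, 0o755); err != nil {
		return err
	}
	base := filepath.Join(dir, cacheKey(raw))
	if err := os.WriteFile(base+".bin", body, 0o644); err != nil {
		return err
	}
	m, err := json.Marshal(meta{URL: raw, ContentType: contentType})
	if err != nil {
		return err
	}
	return os.WriteFile(base+".json", m, 0o644)
}

// load returns the cached bytes and content type; ok is false on a miss.
func load(dir, raw string) (body []byte, contentType string, ok bool) {
	base := filepath.Join(dir, cacheKey(raw))
	mb, err := os.ReadFile(base + ".json")
	if err != nil {
		return nil, "", false
	}
	var m meta
	if json.Unmarshal(mb, &m) != nil {
		return nil, "", false
	}
	if body, err = os.ReadFile(base + ".bin"); err != nil {
		return nil, "", false
	}
	return body, m.ContentType, true
}

// imageHandler serves from the cache and falls back to a remote fetch with a timeout.
func imageHandler(dir string) http.HandlerFunc {
	client := &http.Client{Timeout: 10 * time.Second}
	return func(w http.ResponseWriter, r *http.Request) {
		raw := r.URL.Query().Get("url")
		if err := validateImageURL(raw); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		body, ct, ok := load(dir, raw)
		if !ok {
			resp, err := client.Get(raw)
			if err != nil {
				http.Error(w, "fetch failed", http.StatusBadGateway)
				return
			}
			defer resp.Body.Close()
			if resp.StatusCode != http.StatusOK {
				http.Error(w, "upstream error", http.StatusBadGateway)
				return
			}
			if body, err = io.ReadAll(io.LimitReader(resp.Body, 10<<20)); err != nil {
				http.Error(w, "read failed", http.StatusBadGateway)
				return
			}
			ct = resp.Header.Get("Content-Type")
			_ = store(dir, raw, ct, body) // a failed cache write should not fail the request
		}
		w.Header().Set("Content-Type", ct)
		w.Header().Set("Cache-Control", "public, max-age=86400")
		w.Write(body)
	}
}

func main() {
	dir := cacheDir()
	fmt.Println("image cache dir:", dir)
	http.Handle("/api/image", imageHandler(dir))
	// http.ListenAndServe(":8080", nil) // uncomment to serve; the port is arbitrary
}
```

A real proxy would likely also reject non-image content types and private network addresses before fetching; the sketch leaves those checks out for brevity.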
ad3385c63b Evil 2026-03-15 18:23:58 +02:00
6bf221de3f feat(scraper): add checkpointing and richer page extraction
Add resumable checkpoint support so long scrapes can recover from
interruptions instead of restarting from scratch.

- introduce autosave/load/clear checkpoint flow in `.cache/scrape-state.json`, including SIGINT/SIGTERM save-on-exit handling
- expand parsing/model output to capture legacy and portable infobox fields, primary image URLs, effects, recipes, raw tables, and improved category extraction
- skip infobox tables during recipe parsing to avoid false recipe matches
- add cache log event type, ignore cache/output artifacts, and document new autosave tuning options in README
2026-03-15 17:08:24 +02:00
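
The autosave/load/clear flow with save-on-exit signal handling described in this commit could look something like the sketch below. It is an illustration under assumptions, not the commit's code: the `checkpoint` struct and its `done`/`pending` fields, the per-page autosave interval, and the page names are hypothetical. Only the `.cache/scrape-state.json` path and the SIGINT/SIGTERM handling come from the message.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"io/fs"
	"os"
	"os/signal"
	"path/filepath"
	"syscall"
)

// checkpoint is a hypothetical state shape; the real file's fields are not shown in the log.
type checkpoint struct {
	Done    []string `json:"done"`
	Pending []string `json:"pending"`
}

// saveCheckpoint writes to a temp file and renames it, so an interrupted
// write is unlikely to leave a truncated state file behind.
func saveCheckpoint(path string, cp checkpoint) error {
	if err := os.MkdirAll(filepath.Dir(path), 0o755); err != nil {
		return err
	}
	data, err := json.MarshalIndent(cp, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// loadCheckpoint reports found=false (and no error) when no checkpoint exists.
func loadCheckpoint(path string) (cp checkpoint, found bool, err error) {
	data, err := os.ReadFile(path)
	if errors.Is(err, fs.ErrNotExist) {
		return cp, false, nil
	}
	if err != nil {
		return cp, false, err
	}
	if err := json.Unmarshal(data, &cp); err != nil {
		return cp, false, err
	}
	return cp, true, nil
}

// clearCheckpoint removes the state file; a missing file is not an error.
func clearCheckpoint(path string) error {
	err := os.Remove(path)
	if errors.Is(err, fs.ErrNotExist) {
		return nil
	}
	return err
}

func main() {
	const path = ".cache/scrape-state.json"

	cp, resumed, err := loadCheckpoint(path)
	if err != nil {
		fmt.Println("load checkpoint:", err)
		return
	}
	if !resumed {
		cp.Pending = []string{"PageA", "PageB", "PageC"} // placeholder work queue
	}

	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGINT, syscall.SIGTERM)
	defer signal.Stop(sigs)

	for len(cp.Pending) > 0 {
		select {
		case <-sigs:
			// Save-on-exit: persist progress, then stop.
			if err := saveCheckpoint(path, cp); err != nil {
				fmt.Println("save checkpoint:", err)
			}
			return
		default:
		}
		page := cp.Pending[0] // a real scraper would fetch and parse the page here
		cp.Pending = cp.Pending[1:]
		cp.Done = append(cp.Done, page)
		if err := saveCheckpoint(path, cp); err != nil { // autosave after every page
			fmt.Println("autosave:", err)
		}
	}

	// The run completed, so the checkpoint is no longer needed.
	if err := clearCheckpoint(path); err != nil {
		fmt.Println("clear checkpoint:", err)
	}
	fmt.Println("pages done:", len(cp.Done))
}
```

Saving after every page keeps the sketch simple; the "autosave tuning options" the commit mentions suggest the real interval is configurable.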
42e2083ece Initial COmmit 2026-03-15 16:42:43 +02:00