feat(scraper): add checkpointing and richer page extraction

Add resumable checkpoint support so long scrapes can recover from interruptions instead of restarting from scratch.

- introduce autosave/load/clear checkpoint flow in `.cache/scrape-state.json`, including SIGINT/SIGTERM save-on-exit handling
- expand parsing/model output to capture legacy and portable infobox fields, primary image URLs, effects, recipes, and raw tables, and improve category extraction
- skip infobox tables during recipe parsing to avoid false recipe matches
- add cache log event type, ignore cache/output artifacts, and document new autosave tuning options in README
@@ -36,6 +36,7 @@ var (
 "status": {emoji: "🌀", label: "STATUS", color: colorYellow},
 "done": {emoji: "✅", label: "DONE", color: colorGreen},
 "write": {emoji: "💾", label: "WRITE", color: colorBlue},
+"cache": {emoji: "🗂️", label: "CACHE", color: colorCyan},
 "skip": {emoji: "⏭️", label: "SKIP", color: colorGray},
 "warn": {emoji: "⚠️", label: "WARN", color: colorYellow},
 "error": {emoji: "💥", label: "ERROR", color: colorRed},
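
The hunk adds a `"cache"` entry to the log event table. How a table like this is typically consumed can be sketched as below. The ANSI values given to the `color*` constants, the `eventStyle` type name, and `formatEvent` are all assumptions, because the diff shows neither the constants' values nor the code that prints log lines.

```go
package main

import "fmt"

// Hypothetical ANSI escape values: the diff names these constants but not their values.
const (
	colorReset  = "\033[0m"
	colorRed    = "\033[31m"
	colorGreen  = "\033[32m"
	colorYellow = "\033[33m"
	colorBlue   = "\033[34m"
	colorCyan   = "\033[36m"
	colorGray   = "\033[90m"
)

// eventStyle is a hypothetical name for the struct whose fields appear in the diff.
type eventStyle struct {
	emoji, label, color string
}

// eventStyles copies the entries from the hunk, including the new "cache" kind.
var eventStyles = map[string]eventStyle{
	"status": {emoji: "🌀", label: "STATUS", color: colorYellow},
	"done":   {emoji: "✅", label: "DONE", color: colorGreen},
	"write":  {emoji: "💾", label: "WRITE", color: colorBlue},
	"cache":  {emoji: "🗂️", label: "CACHE", color: colorCyan},
	"skip":   {emoji: "⏭️", label: "SKIP", color: colorGray},
	"warn":   {emoji: "⚠️", label: "WARN", color: colorYellow},
	"error":  {emoji: "💥", label: "ERROR", color: colorRed},
}

// formatEvent renders one log line; unknown kinds fall back to a bracketed
// name rather than rendering an empty zero-value style.
func formatEvent(kind, msg string) string {
	style, ok := eventStyles[kind]
	if !ok {
		return fmt.Sprintf("[%s] %s", kind, msg)
	}
	return fmt.Sprintf("%s%s %s%s %s", style.color, style.emoji, style.label, colorReset, msg)
}

func main() {
	fmt.Println(formatEvent("cache", "loaded .cache/scrape-state.json"))
	fmt.Println(formatEvent("unknown", "falls back to a plain label"))
}
```

Keeping every style in one map means a new event kind, like `cache` here, is a one-line addition.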