2026-03-15 16:42:43 +02:00
package model

// Recipe describes how a result is produced from a list of ingredients,
// optionally at a named station.
type Recipe struct {
	Result      string   `json:"result"`
	ResultCount string   `json:"result_count,omitempty"`
	Ingredients []string `json:"ingredients,omitempty"`
	Station     string   `json:"station,omitempty"`
	SourcePage  string   `json:"source_page,omitempty"`
}

feat(scraper): add checkpointing and richer page extraction
Add resumable checkpoint support so long scrapes can recover from
interruptions instead of restarting from scratch.
- introduce autosave/load/clear checkpoint flow in `.cache/scrape-state.json`, including SIGINT/SIGTERM save-on-exit handling
- expand parsing/model output to capture legacy and portable infobox fields, primary image URLs, effects, recipes, raw tables, and improved category extraction
- skip infobox tables during recipe parsing to avoid false recipe matches
- add cache log event type, ignore cache/output artifacts, and document new autosave tuning options in README
2026-03-15 17:08:24 +02:00
// Table holds a table extracted from a page, both as header-keyed rows and
// as raw cell rows.
type Table struct {
	Title   string              `json:"title,omitempty"`
	Headers []string            `json:"headers,omitempty"`
	Rows    []map[string]string `json:"rows,omitempty"`
	RawRows [][]string          `json:"raw_rows,omitempty"`
}

// Item is a scraped item page.
type Item struct {
	Name        string            `json:"name"`
	URL         string            `json:"url"`
	Categories  []string          `json:"categories,omitempty"`
	Infobox     map[string]string `json:"infobox,omitempty"`
	ImageURL    string            `json:"image_url,omitempty"`
	Effects     []string          `json:"effects,omitempty"`
	EffectLinks []string          `json:"effect_links,omitempty"`
	Recipes     []Recipe          `json:"recipes,omitempty"`
	Tables      []Table           `json:"tables,omitempty"`
	Description string            `json:"description,omitempty"`
}

// Effect is a scraped effect page.
type Effect struct {
	Name        string            `json:"name"`
	URL         string            `json:"url"`
	Categories  []string          `json:"categories,omitempty"`
	Infobox     map[string]string `json:"infobox,omitempty"`
	Description string            `json:"description,omitempty"`
}

// Dataset is the top-level output: all scraped items and effects.
type Dataset struct {
	Items   []Item   `json:"items"`
	Effects []Effect `json:"effects"`
}
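
To illustrate how the struct tags shape the JSON output, the following sketch encodes a small `Dataset`. The types are trimmed copies of the ones defined in this file so the example compiles on its own; the `encodeDataset` helper and all sample values are invented for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

// Trimmed copies of the model types, keeping only the fields used below.
type Recipe struct {
	Result      string   `json:"result"`
	ResultCount string   `json:"result_count,omitempty"`
	Ingredients []string `json:"ingredients,omitempty"`
}

type Item struct {
	Name    string   `json:"name"`
	URL     string   `json:"url"`
	Recipes []Recipe `json:"recipes,omitempty"`
}

type Effect struct {
	Name string `json:"name"`
	URL  string `json:"url"`
}

type Dataset struct {
	Items   []Item   `json:"items"`
	Effects []Effect `json:"effects"`
}

// encodeDataset is a hypothetical helper that returns the compact JSON form.
func encodeDataset(ds Dataset) (string, error) {
	b, err := json.Marshal(ds)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	ds := Dataset{
		Items: []Item{{
			Name: "Example Item",
			URL:  "https://example.invalid/item",
			Recipes: []Recipe{{
				Result:      "Example Item",
				Ingredients: []string{"A", "B"},
			}},
		}},
		// Effects is left nil on purpose.
	}
	out, err := encodeDataset(ds)
	if err != nil {
		fmt.Fprintln(os.Stderr, "encode:", err)
		os.Exit(1)
	}
	fmt.Println(out)
}
```

Fields tagged `omitempty` (here `result_count`) are dropped when empty. `items` and `effects` carry no `omitempty`, so a nil slice is encoded as `null`; initializing it to an empty slice yields `[]` instead.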