AI Evaluation
This section presents a systematic evaluation of entity extraction quality, comparing Claude (an LLM) against spaCy (statistical NLP) on U.S. government domain text. Explore the results, browse the gold dataset, and read the full methodology.
Explore
Model Comparison Results
Visual comparison of spaCy and Claude, with precision, recall, and F1 (P/R/F1) charts, a per-branch breakdown, and a per-entity-type analysis.
Gold Dataset Explorer
Browse the 113 evaluation articles with ground-truth entity annotations, perturbation labels, and difficulty ratings.
Evaluation Methodology
Entity taxonomy, gold dataset construction, fuzzy matching strategy, and the full evaluation design.
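For readers new to the metrics named above, the following is a minimal sketch of how precision, recall, and F1 can be computed with fuzzy entity matching. It is not the evaluation's actual implementation: the function names `is_match` and `score`, the 0.8 similarity threshold, the greedy one-to-one matching, and the sample entities are all assumptions made for illustration, and the methodology page describes the real matching strategy.

```python
from difflib import SequenceMatcher


def is_match(pred, gold, threshold=0.8):
    """Hypothetical fuzzy rule: same entity type and similar surface text."""
    (p_text, p_type), (g_text, g_type) = pred, gold
    if p_type != g_type:
        return False
    similarity = SequenceMatcher(None, p_text.lower(), g_text.lower()).ratio()
    return similarity >= threshold


def score(predicted, gold, threshold=0.8):
    """Return (precision, recall, F1) using greedy one-to-one matching."""
    unmatched = list(gold)  # each gold entity can be matched at most once
    true_positives = 0
    for pred in predicted:
        for candidate in unmatched:
            if is_match(pred, candidate, threshold):
                unmatched.remove(candidate)
                true_positives += 1
                break
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data, invented for this example only.
gold_entities = [
    ("Department of Justice", "ORG"),
    ("Jane Doe", "PERSON"),
    ("Senate", "ORG"),
]
predicted_entities = [
    ("the Department of Justice", "ORG"),  # near match, counted as correct
    ("Jane Doe", "PERSON"),                # exact match
    ("Washington", "GPE"),                 # no gold counterpart
]

precision, recall, f1 = score(predicted_entities, gold_entities)
print(f"P={precision:.2f} R={recall:.2f} F1={f1:.2f}")
```

In this sketch a prediction counts as correct only if its type agrees with a gold entity and the text is close enough, so a leading article such as "the" does not turn a correct extraction into an error. A stricter or looser threshold shifts the balance between precision and recall.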