Claude Opus 4.8 and Sonnet 5 Regress on Structured Tool Calling, Breaking Non-Canonical Schemas
Tags: AI · Developer Tools · Infrastructure · OSS

Armin Ronacher documents that newer Anthropic models (Opus 4.8, Sonnet 5) increasingly emit invented fields when calling tools with nested array schemas, a regression not seen in older models. The models appear overfit to Claude Code's flat edit-tool schema (old_string/new_string/replace_all), so when faced with Pi's nested edits[] structure they hallucinate parameters such as 'requireUnique', 'oldText2', 'type', and 'id'. Strict JSON schema enforcement ('strict' mode) eliminates the issue, but Anthropic limits schema complexity in strict mode. The problem is context-dependent: it appears in multi-turn agentic sessions where thinking blocks are present.
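To make the two schema shapes concrete, here is a minimal Python sketch that contrasts a flat edit schema with a nested edits[] schema and reports any fields a tool call adds beyond what the schema declares. The flat schema uses only the three field names mentioned above. The nested field names (`path`, `oldText`, `newText`) are assumptions chosen for illustration and are not confirmed as Pi's actual schema; `unknown_fields` is a hypothetical helper that only understands the `object` and `array` parts of JSON Schema.

```python
from typing import Any

# Flat shape, using the three field names the summary attributes to Claude Code.
FLAT_EDIT_SCHEMA = {
    "type": "object",
    "properties": {
        "old_string": {"type": "string"},
        "new_string": {"type": "string"},
        "replace_all": {"type": "boolean"},
    },
    "required": ["old_string", "new_string"],
}

# Nested shape: one call carries a list of edits.
# Field names here are ASSUMED for illustration, not taken from Pi's source.
NESTED_EDIT_SCHEMA = {
    "type": "object",
    "properties": {
        "path": {"type": "string"},
        "edits": {
            "type": "array",
            "items": {
                "type": "object",
                "properties": {
                    "oldText": {"type": "string"},
                    "newText": {"type": "string"},
                },
                "required": ["oldText", "newText"],
            },
        },
    },
    "required": ["path", "edits"],
}


def unknown_fields(schema: dict[str, Any], value: Any, where: str = "") -> list[str]:
    """Return paths of keys in `value` that `schema` does not declare."""
    found: list[str] = []
    if schema.get("type") == "object" and isinstance(value, dict):
        props = schema.get("properties", {})
        for key, child in value.items():
            child_path = f"{where}.{key}" if where else key
            if key not in props:
                found.append(child_path)
            else:
                found.extend(unknown_fields(props[key], child, child_path))
    elif schema.get("type") == "array" and isinstance(value, list):
        items = schema.get("items", {})
        for i, item in enumerate(value):
            found.extend(unknown_fields(items, item, f"{where}[{i}]"))
    return found


# A call with two of the invented parameters named in the summary.
call = {
    "path": "src/app.py",
    "edits": [
        {"oldText": "foo", "newText": "bar", "requireUnique": True, "id": 1},
    ],
}
print(unknown_fields(NESTED_EDIT_SCHEMA, call))
# → ['edits[0].requireUnique', 'edits[0].id']
```

A permissive harness would silently accept or drop these extra keys; surfacing their exact paths is what makes the regression visible in logs.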
Technical significance
This reveals a structural tension in LLM tool-use training: post-training on a single permissive harness (Claude Code) creates schema-specific priors that degrade performance on alternative but valid schemas. Developers building agent harnesses have three options: adopt Anthropic's undocumented schema conventions, enable strict mode (with its complexity limits), or implement robust retry/validation loops. The finding suggests tool-use benchmarks should test schema generalization, not just canonical schemas.
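The retry/validation option can be sketched as follows. This is a minimal illustration under stated assumptions, not the approach any particular harness uses: `call_model` is a hypothetical callable standing in for a real model request, the allowed field names are the same illustrative ones (`path`, `edits`, `oldText`, `newText`), and the feedback is modeled as a plain list of strings rather than real conversation messages.

```python
from typing import Any, Callable

ALLOWED_TOP = {"path", "edits"}        # assumed field names, for illustration
ALLOWED_EDIT = {"oldText", "newText"}  # assumed field names, for illustration


def find_extra_keys(args: dict[str, Any]) -> list[str]:
    """List keys outside the allowed sets, at the top level and inside edits[]."""
    extras = [k for k in args if k not in ALLOWED_TOP]
    edits = args.get("edits", [])
    if isinstance(edits, list):
        for i, edit in enumerate(edits):
            if isinstance(edit, dict):
                extras += [f"edits[{i}].{k}" for k in edit if k not in ALLOWED_EDIT]
    return extras


def call_with_retry(
    call_model: Callable[[list[str]], dict[str, Any]],
    max_attempts: int = 3,
) -> dict[str, Any]:
    """Re-ask the model until its tool arguments contain no undeclared keys."""
    feedback: list[str] = []
    extras: list[str] = []
    for _ in range(max_attempts):
        args = call_model(feedback)
        extras = find_extra_keys(args)
        if not extras:
            return args
        # Report the exact offending paths so the next attempt can correct them.
        feedback.append(f"Unknown fields rejected: {', '.join(extras)}")
    raise ValueError(f"tool call still invalid after {max_attempts} attempts: {extras}")


def fake_model(feedback: list[str]) -> dict[str, Any]:
    """Stand-in model: invents a field on the first try, complies once corrected."""
    edit: dict[str, Any] = {"oldText": "foo", "newText": "bar"}
    if not feedback:
        edit["requireUnique"] = True
    return {"path": "src/app.py", "edits": [edit]}


result = call_with_retry(fake_model)
print(result)
```

Rejecting unknown keys and naming them in the feedback, rather than silently dropping them, keeps the harness strict on its side even when the provider's strict mode cannot be used for a given schema.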