refactor: normalize chosen_plan taxonomy so plan templates are not overloaded with single_tool semantics #10
Context
Issue #9 successfully taught the offline policy builder to learn execution-mode priors separately from plan-key priors.
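As a rough illustration of what "separate priors" means here, the sketch below computes one empirical distribution per field instead of one over a combined key. It is not the offline policy builder's actual code: the function name `empirical_prior`, the row layout, and the `direct` mode value are assumptions made for the example.

```python
from collections import Counter


def empirical_prior(values):
    """Return the relative frequency of each distinct value."""
    counts = Counter(values)
    total = sum(counts.values())
    return {key: count / total for key, count in counts.items()}


# Hypothetical trace rows; "direct" is an assumed non-plan mode value.
rows = [
    {"execution_mode": "plan", "chosen_plan": "service_then_access_clear"},
    {"execution_mode": "plan", "chosen_plan": "memory_then_setup_lookup"},
    {"execution_mode": "plan", "chosen_plan": "service_then_access_clear"},
    {"execution_mode": "direct", "chosen_plan": "single_tool_direct"},
]

# Two independent priors: one over execution modes, one over plan keys.
mode_prior = empirical_prior(row["execution_mode"] for row in rows)
plan_prior = empirical_prior(row["chosen_plan"] for row in rows)
```

Keeping the two distributions apart only helps if each field carries one kind of information, which is the concern raised next.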
New structural gap
Live shadow traces now expose a deeper taxonomy problem: some rows have execution_mode=plan for good reasons, but still carry chosen_plan=single_tool. That means chosen_plan is currently mixing plan-template identity with tool-cardinality shorthand.
Why this matters
If we keep learning on top of an overloaded chosen_plan, the policy layer will keep conflating plan-template identity with tool cardinality. This weakens interpretability and will eventually corrupt higher-order planning priors.
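The overloaded rows described above can be found mechanically. The following is a minimal sketch, assuming trace rows are dictionaries with execution_mode and chosen_plan keys; the function names and the sample rows are hypothetical and are not taken from the repository.

```python
# Value that doubles as tool-cardinality shorthand instead of naming a template.
OVERLOADED_PLAN_KEY = "single_tool"


def is_overloaded(row):
    """True when a plan-mode row still uses the shorthand as its plan key."""
    return (
        row.get("execution_mode") == "plan"
        and row.get("chosen_plan") == OVERLOADED_PLAN_KEY
    )


def find_overloaded(rows):
    """Return every row whose chosen_plan mixes template identity with cardinality."""
    return [row for row in rows if is_overloaded(row)]


# Hypothetical sample rows for illustration only.
rows = [
    {"execution_mode": "plan", "chosen_plan": "single_tool"},
    {"execution_mode": "plan", "chosen_plan": "single_tool_direct"},
    {"execution_mode": "plan", "chosen_plan": "service_then_access_clear"},
]

offenders = find_overloaded(rows)
```

A check of this shape reports only the first sample row, since the other two already name a template.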
Deliverables
- chosen_plan always names a real template.
- execution_mode and primary_tool_count as separate fields.
- Plan-mode rows no longer use single_tool as the plan key.
Definition of done
Replay, policy stats, and shadow traces use a stable plan-template vocabulary that does not overload single_tool.
Done.
Implemented
- /home/openclaw/.openclaw/workspace/lib/plan_taxonomy.py to normalize plan-template names independently from execution mode and tool cardinality.
- chosen_plan now names a real template such as single_tool_direct or single_tool_with_setup_evidence.
- primary_tool_count to shadow traces and typed trajectories.
- /home/openclaw/.openclaw/workspace/bin/backfill-plan-taxonomy.
- /home/openclaw/.openclaw/workspace/bin/check-plan-taxonomy to prove plan-mode rows no longer use single_tool as the plan key.
Validation
- {"ok": true, "changed_rows": 586}
- single_tool_direct, service_then_access_clear, memory_then_setup_lookup.
- /home/openclaw/.openclaw/workspace/evals/results/sacred_gate_20260321T082906Z.json
New structural finding
Policy artifacts can become stale immediately after validation runs. We need an atomic policy-refresh/snapshot boundary so evals, candidates, and shadow comparisons all refer to the same replay cut.
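One possible shape for such a boundary is sketched below. It assumes the replay cut can be identified by a content hash of its rows and that the policy artifact is a single JSON file. The names `replay_cut_id`, `write_snapshot`, `is_stale`, and `policy_snapshot.json` are hypothetical, and nothing here reflects how the workspace currently stores policy artifacts.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path


def replay_cut_id(replay_rows):
    """Identify a replay cut by a short content hash of its rows."""
    payload = json.dumps(replay_rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


def write_snapshot(directory, replay_rows, policy_stats):
    """Write policy stats tagged with their replay cut, replacing the old file atomically."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    target = directory / "policy_snapshot.json"  # hypothetical file name
    snapshot = {"replay_cut": replay_cut_id(replay_rows), "policy_stats": policy_stats}
    # Write to a temp file in the same directory, then rename over the target,
    # so readers see either the old snapshot or the new one, never a partial file.
    fd, tmp_name = tempfile.mkstemp(dir=str(directory), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(snapshot, handle, sort_keys=True)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return target


def is_stale(snapshot_path, replay_rows):
    """True when the snapshot was built from a different replay cut."""
    snapshot = json.loads(Path(snapshot_path).read_text(encoding="utf-8"))
    return snapshot["replay_cut"] != replay_cut_id(replay_rows)
```

Under this sketch, evals, candidates, and shadow comparisons would each call `is_stale` before reading the artifact and refuse to proceed when the replay cut has moved.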