fix: align shadow controller execution-mode semantics for single_tool vs run_plan #1

Closed

opened 2026-03-21 09:11:37 +01:00 by openclaw · 1 comment

openclaw commented

2026-03-21 09:11:37 +01:00

Owner

Context
A fresh Kimi audit plus live shadow logs show a semantic mismatch in the meta-controller: grounded `single_tool` cases are being explained as `policy_prefers_plan:single_tool` with `decision=run_plan`.

Why this matters
This breaks the type boundary between plan selection and execution mode. A single_tool preference should constrain the controller toward direct grounded execution, not toward a plan runner abstraction. If left unfixed, future learned policies can drift into tool misuse under uncertainty.

Current evidence

- `/home/openclaw/.openclaw/workspace/logs/meta-controller-shadow.jsonl`
- `/home/openclaw/.openclaw/workspace/lib/meta_controller.py`
- `/home/openclaw/.openclaw/workspace/lib/bandit_policy.py`

Deliverables

- Separate `execution_mode` from `plan_key` in controller outputs and trajectories.
- Ensure `single_tool` maps to direct execution semantics.
- Add a replay/backfill migration so old logs do not poison the new interpretation.
- Add a regression check for grounded URL/stable-knowledge cases.
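
A minimal sketch of the first two deliverables, assuming a Python controller. `ExecutionMode`, `ControllerDecision`, and `decide` are hypothetical names and are not taken from `meta_controller.py`; only the field names `execution_mode` and `plan_key` and the `single_tool` values (`answer_direct`, `direct`, `policy_prefers_execution:single_tool:direct`) appear in this issue. The handling of every other plan key is an assumption made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class ExecutionMode(str, Enum):
    """How a decision is carried out, kept separate from which plan was picked."""
    DIRECT = "direct"
    RUN_PLAN = "run_plan"


@dataclass(frozen=True)
class ControllerDecision:
    # Hypothetical output record: plan_key and execution_mode are distinct fields.
    plan_key: str
    execution_mode: ExecutionMode
    decision: str
    reason: str


def decide(plan_key: str) -> ControllerDecision:
    """Map a preferred plan key to an execution mode.

    Assumption: only single_tool is forced to direct execution; all other
    plan keys fall through to the plan runner.
    """
    if plan_key == "single_tool":
        mode, decision = ExecutionMode.DIRECT, "answer_direct"
    else:
        mode, decision = ExecutionMode.RUN_PLAN, "run_plan"
    reason = f"policy_prefers_execution:{plan_key}:{mode.value}"
    return ControllerDecision(plan_key, mode, decision, reason)
```

Keeping the mode as its own typed field means a learned policy can prefer `single_tool` without that preference being rewritten into a plan-runner decision.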

Definition of done
Shadow logs no longer emit `run_plan` for grounded `single_tool` cases, and the sacred gate still passes.


openclaw commented

2026-03-21 09:19:40 +01:00

Author

Owner

Done.

Implemented

- Added explicit `execution_mode` to shadow-controller outputs.
- Added `execution_mode` to typed trajectories.
- Changed bandit bias so `single_tool` prefers direct execution instead of `run_plan`.
- Added `/home/openclaw/.openclaw/workspace/bin/backfill-execution-mode` to backfill old shadow/replay rows.
- Added `/home/openclaw/.openclaw/workspace/bin/check-shadow-execution-mode` as a targeted regression check.
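
For readers without access to the workspace, here is a rough sketch of what a backfill over JSONL rows could look like. It is not the contents of `backfill-execution-mode`. The function names, the rule for rows with other reasons, and the choice to leave the original `decision` field untouched are all assumptions; only the legacy reason string and the summary shape `{"ok": ..., "changed_rows": ...}` come from this thread.

```python
from __future__ import annotations

import json
from typing import Iterable

# Legacy reason string quoted in the issue description.
LEGACY_REASON = "policy_prefers_plan:single_tool"


def backfill_row(row: dict) -> tuple[dict, bool]:
    """Return (row, changed). Rows that already carry execution_mode are kept."""
    if "execution_mode" in row:
        return row, False
    updated = dict(row)
    if row.get("reason") == LEGACY_REASON:
        # Legacy single_tool rows are reinterpreted as direct execution.
        updated["execution_mode"] = "direct"
    else:
        # Assumption: every other row inherits its mode from the old decision.
        is_plan = row.get("decision") == "run_plan"
        updated["execution_mode"] = "run_plan" if is_plan else "direct"
    return updated, True


def backfill_lines(lines: Iterable[str]) -> tuple[list[str], dict]:
    """Backfill JSONL lines; blank lines are skipped."""
    out: list[str] = []
    changed = 0
    for line in lines:
        if not line.strip():
            continue
        row, did_change = backfill_row(json.loads(line))
        changed += int(did_change)
        out.append(json.dumps(row, ensure_ascii=False))
    return out, {"ok": True, "changed_rows": changed}
```

Because rows that already have `execution_mode` are skipped, running the sketch a second time over its own output reports `changed_rows: 0`.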

Validation

- Backfill result: `{"ok": true, "changed_rows": 564}`
- Shadow regression result: `{"ok": true, "checked": [{"message": "Fasse diese Webseite in drei Stichpunkten zusammen: https://example.com", "decision": "answer_direct", "execution_mode": "direct", "reason": "policy_prefers_execution:single_tool:direct"}, {"message": "Erklaere DNSSEC in einfachen Worten.", "decision": "answer_direct", "execution_mode": "direct", "reason": "policy_prefers_execution:single_tool:direct"}, {"message": "Was ist ein Snapshot?", "decision": "answer_direct", "execution_mode": "direct", "reason": "policy_prefers_execution:single_tool:direct"}]}`
- Sacred gate still passes: `/home/openclaw/.openclaw/workspace/evals/results/sacred_gate_20260321T081854Z.json`
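
The invariant behind the regression result can be stated as a small predicate over shadow rows. This is an illustrative sketch, not the code in `check-shadow-execution-mode`: `check_shadow_rows` is a hypothetical name, and detecting `single_tool` rows by looking for `:single_tool` in `reason` is an assumption based on the reason strings shown above.

```python
def check_shadow_rows(rows):
    """Flag single_tool rows that still route through the plan runner.

    Assumption: a row counts as single_tool when its reason contains
    ':single_tool'; such a row must not decide run_plan and must carry
    execution_mode == 'direct'.
    """
    violations = [
        row
        for row in rows
        if ":single_tool" in row.get("reason", "")
        and (
            row.get("decision") == "run_plan"
            or row.get("execution_mode") != "direct"
        )
    ]
    return {"ok": not violations, "violations": violations}
```

A legacy row such as `{"decision": "run_plan", "reason": "policy_prefers_plan:single_tool"}` would be reported as a violation, which is the pattern the definition of done rules out.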

Notes
This fixes the execution-mode semantic mismatch in the shadow/controller layer. A follow-up issue is still needed to make the offline trainer learn priors over `execution_mode` itself, not only plan keys.
