feat: train execution-mode priors separately from plan-key priors in offline policy builder #9
## Context

Issue #1 separated `execution_mode` from `chosen_plan` in shadow outputs and typed trajectories, and fixed the immediate shadow-controller misuse for `single_tool`.

## New structural gap
The offline trainer still aggregates policy mainly by `chosen_plan`/family, not by `execution_mode`. That means the learning loop still cannot directly prefer `direct` vs `plan` vs `memory` vs `clarify` as first-class priors, even though trajectories now record that distinction.

## Why this matters
If we want a genuinely smarter meta-controller, execution semantics need their own learned statistics. Otherwise plan keys continue to carry too much meaning and we risk reintroducing semantic drift in the next policy stage.
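As a rough sketch of what "execution semantics with their own learned statistics" could mean, the aggregation step might bucket typed trajectories by `execution_mode` and keep a smoothed success rate per mode. The field names (`execution_mode`, `success`) and the Beta(1, 1) smoothing are assumptions for illustration, not the actual trajectory schema or trainer code:

```python
# Hypothetical sketch: aggregate trajectories by execution_mode instead of
# chosen_plan. Field names are assumed, not the real trajectory schema.
from collections import defaultdict

def execution_mode_priors(trajectories):
    """Beta(1, 1)-smoothed success rate per execution mode."""
    stats = defaultdict(lambda: {"successes": 0, "trials": 0})
    for t in trajectories:
        bucket = stats[t["execution_mode"]]
        bucket["trials"] += 1
        bucket["successes"] += 1 if t["success"] else 0
    return {
        mode: (b["successes"] + 1) / (b["trials"] + 2)  # Beta posterior mean
        for mode, b in stats.items()
    }

# Example: 15/15 direct successes -> (15 + 1) / (15 + 2) = 16/17 ≈ 0.941
priors = execution_mode_priors(
    [{"execution_mode": "direct", "success": True}] * 15
)
```

The smoothing keeps a mode that has only a handful of trajectories from collapsing to a prior of exactly 0 or 1.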
## Deliverables

- `policy_stats.json` and `policy_candidate.json` with execution-mode buckets.
- `train-policy-offline` to compute priors for `direct`, `plan`, `memory`, and `clarify`.
- `direct` preferred because of execution-mode priors, not only because of hardcoded mapping.

## Definition of done
Offline policy artifacts expose learned execution-mode priors and the shadow controller can cite them in its decision trace.
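A minimal sketch of the "cite them in its decision trace" requirement, assuming hypothetical names throughout (`ExecutionModePrior`, `decide`, `prefer_threshold` are illustrative, not the actual controller API); only the reason-string shape `policy_prefers_execution_mode:<mode>` is taken from the validation output below:

```python
# Hypothetical sketch: pick the best-supported execution mode and emit a
# traceable reason string. Names are illustrative, not the real API.
from dataclasses import dataclass

@dataclass
class ExecutionModePrior:
    mode: str         # "direct" | "plan" | "memory" | "clarify"
    beta_mean: float  # posterior mean success rate

def decide(priors, prefer_threshold=0.85):
    """Return (mode, reason) for the best-supported mode, else None."""
    best = max(priors, key=lambda p: p.beta_mean, default=None)
    if best is not None and best.beta_mean >= prefer_threshold:
        return best.mode, f"policy_prefers_execution_mode:{best.mode}"
    return None  # fall back to the hardcoded plan-key mapping

choice = decide([
    ExecutionModePrior("direct", 16 / 17),
    ExecutionModePrior("plan", 0.9),
])
# choice -> ("direct", "policy_prefers_execution_mode:direct")
```

Returning the reason string alongside the mode is what makes the prior citable in a decision trace rather than an opaque bias.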
Done.
## Implemented

- `policy_stats.json` with an `execution_modes` bucket.
- `policy_candidate.json` with learned priors for `direct` and `plan`.
- `train-policy-offline` to emit execution-mode priors.
- `bandit_policy.py` to read execution-mode priors and bias decisions from them.
- `meta_controller.py` to surface `execution_mode_prior` in `policy_hint`.
- `/home/openclaw/.openclaw/workspace/bin/rebuild-policy-stats` to rebuild stats from replay with the new bucket.
- `/home/openclaw/.openclaw/workspace/bin/check-shadow-execution-mode-priors` to verify grounded single-tool cases are now preferred because of execution-mode priors.

## Validation
- Stats rebuild: `{"ok": true, "plans": 4, "families": 8, "execution_modes": 2}`
- Learned priors: `direct -> mode=prefer, beta_mean=0.9411764705882353`, `plan -> mode=prefer, beta_mean=0.9`
- Shadow-controller check output:

```json
{"ok": true, "checked": [{"message": "Fasse diese Webseite in drei Stichpunkten zusammen: https://example.com", "decision": "answer_direct", "execution_mode": "direct", "reason": "policy_prefers_execution_mode:direct"}, {"message": "Erklaere DNSSEC in einfachen Worten.", "decision": "answer_direct", "execution_mode": "direct", "reason": "policy_prefers_execution_mode:direct"}, {"message": "Was ist ein Snapshot?", "decision": "answer_direct", "execution_mode": "direct", "reason": "policy_prefers_execution_mode:direct"}]}
```

- Eval result: `/home/openclaw/.openclaw/workspace/evals/results/sacred_gate_20260321T082413Z.json`

## New structural finding
`chosen_plan` is still semantically overloaded. We now see traces where `execution_mode=plan` is correct, but `chosen_plan` still says `single_tool`. That should be split into a real plan-template taxonomy so future learning does not conflate cardinality with execution structure.
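One way the proposed split could be modeled, with all names (`PlanTemplate`, `PlanShape`, `tool_count`) hypothetical and not part of the current codebase:

```python
# Hypothetical sketch: keep execution structure and cardinality as separate
# fields instead of overloading a single chosen_plan key.
from dataclasses import dataclass
from enum import Enum

class PlanTemplate(Enum):
    SEQUENTIAL = "sequential"
    PARALLEL = "parallel"
    NONE = "none"

@dataclass
class PlanShape:
    template: PlanTemplate  # execution structure
    tool_count: int         # cardinality, no longer encoded in the plan key

# A planned execution over a single tool is now representable without
# overloading chosen_plan="single_tool":
shape = PlanShape(template=PlanTemplate.SEQUENTIAL, tool_count=1)
```

Under a split like this, a learner can condition on structure and cardinality independently instead of treating `single_tool` as both.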