feat: make policy refresh atomic so replay stats, candidate policy, and eval runs share a consistent snapshot #11
Context
Issue #10 normalized the plan-template taxonomy and validated it successfully.
New structural gap
During validation we observed that `policy_stats.json` and `policy_candidate.json` can diverge immediately after the shadow-regression scripts add fresh replay rows. Example: stats had `single_tool_direct.count = 18` while the candidate still reflected `15`, because the candidate was generated before the last validation pass.
Why this matters
A learning system that compares or promotes against stale policy snapshots is hard to reason about. Shadow traces, sacred-gate results, and candidate-policy reviews need to point at the same replay cut; otherwise offline learning becomes subtly non-reproducible.
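A cheap guard against this class of drift is to verify, before any comparison or promotion, that both files were cut from the same snapshot. A minimal sketch, assuming each file carries a `snapshot_id` field (that field name is an assumption, not confirmed by the issue):

```python
import json
from pathlib import Path

def check_snapshot_consistency(data_dir: str) -> str:
    """Fail fast if stats and candidate were cut from different replay
    snapshots. The `snapshot_id` field name is an assumption; the real
    files may label it differently."""
    data = Path(data_dir)
    stats = json.loads((data / "policy_stats.json").read_text())
    candidate = json.loads((data / "policy_candidate.json").read_text())
    if stats.get("snapshot_id") != candidate.get("snapshot_id"):
        raise RuntimeError(
            f"policy snapshot drift: stats={stats.get('snapshot_id')!r} "
            f"candidate={candidate.get('snapshot_id')!r}"
        )
    return stats["snapshot_id"]
```

Running such a check at the start of every eval would have surfaced the `18` vs `15` divergence immediately instead of during review.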
Deliverables
Definition of done
Policy stats, candidate policy, and eval outputs all reference the same snapshot ID and no longer drift during validation.
Implemented and live-verified atomic policy snapshots.
`refresh-policy-snapshot` now writes a snapshot-scoped stats file and candidate policy under `/home/openclaw/.openclaw/workspace/data/policy_snapshots/<snapshot_id>/` and publishes a manifest at `/home/openclaw/.openclaw/workspace/data/policy_snapshot.json`. `run-sacred-evals` now refreshes and pins `OPENCLAW_POLICY_CANDIDATE_PATH` / `OPENCLAW_POLICY_SNAPSHOT_ID` for the entire eval run. Live checks confirmed matching snapshot IDs across the manifest, snapshot stats, snapshot candidate, shadow-controller policy hints, and sacred-gate output. Outcome logging now writes incremental counters to `policy_stats_live.json`, so pinned snapshots no longer drift under live traffic.
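The publish step described above can be sketched with the standard write-then-rename pattern: write all snapshot-scoped files into a fresh directory first, then swap the manifest pointer last with `os.replace`, which is atomic on POSIX filesystems, so readers see either the old snapshot or the new one, never a torn write. The directory layout mirrors the issue; the function name and manifest fields are illustrative assumptions, not the actual script:

```python
import json
import os
import time
import uuid
from pathlib import Path

def publish_policy_snapshot(data_dir: str, stats: dict, candidate: dict) -> str:
    """Write stats + candidate under a fresh snapshot directory, then
    atomically publish the manifest pointer. Layout follows the issue;
    field names are assumptions."""
    snapshot_id = f"{int(time.time())}-{uuid.uuid4().hex[:8]}"
    snap_dir = Path(data_dir) / "policy_snapshots" / snapshot_id
    snap_dir.mkdir(parents=True)

    # Snapshot-scoped files: every consumer reads from this directory,
    # so stats and candidate can never come from different replay cuts.
    (snap_dir / "policy_stats.json").write_text(
        json.dumps({"snapshot_id": snapshot_id, **stats}))
    (snap_dir / "policy_candidate.json").write_text(
        json.dumps({"snapshot_id": snapshot_id, **candidate}))

    # Publish the manifest last via write-then-rename: os.replace is
    # atomic on POSIX, so concurrent readers observe the old or the new
    # manifest, never a half-written one.
    manifest = {"snapshot_id": snapshot_id, "path": str(snap_dir)}
    tmp = Path(data_dir) / "policy_snapshot.json.tmp"
    tmp.write_text(json.dumps(manifest))
    os.replace(tmp, Path(data_dir) / "policy_snapshot.json")
    return snapshot_id
```

An eval runner would then read the manifest once at startup and pin the returned snapshot ID (e.g. into `OPENCLAW_POLICY_SNAPSHOT_ID`) for the whole run, rather than re-resolving paths per step.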