Autoformalization of Agent Instructions and Policy as Code
Original title: Autoformalization of Agent Instructions into Policy-as-Code
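Analysis results
- Category: Law and institutions
- Importance: 67
- Trend score: 26
- Summary: Agent safety in high-stakes domains requires formal policy enforcement, yet most existing approaches rely on probabilistic guardrails (fine-tuned classifiers and prompts). This work proposes a method that automatically formalizes agent instructions and encodes them as policy-as-code, which may allow agent behavior to be controlled more safely.
- Keywords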
arXiv:2606.26649v1 Announce Type: new
Abstract: Agent safety in high-stakes domains requires formal policy enforcement, but most existing approaches either rely on probabilistic guardrails (fine-tuned classifiers, prompt-based steering) that offer no formal guarantees, or on hand-coded symbolic enforcement that does not scale to the breadth of real policy specifications. We present an autoformalization pipeline that translates agent prompts, MCP tool descriptions, and natural language policy documents into formally verified policies using an LLM-based generator-critic loop. The resulting policies are written in the Cedar Policy Language. On the MedAgentBench benchmark, our autoformalized policies cover substantially more of the source natural-language specification than the hand-coded symbolic enforcement in prior work.
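The abstract names a generator-critic loop but gives no implementation details. The following is a minimal sketch of what such a loop could look like, not the authors' pipeline. Everything in it is an assumption made for illustration: the function names (`generator_critic_loop`, `toy_generate`, `toy_critique`), the sample rule, the action name `order_medication`, and the context attribute `clinician_approved` are all hypothetical. The generator and critic are plain Python stubs standing in for LLM calls, and the critic is a simple string check, not formal verification.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Critique:
    accepted: bool   # True when the critic has no remaining objection
    feedback: str    # Objection passed back to the generator on rejection


def generator_critic_loop(
    source_text: str,
    generate: Callable[[str, str], str],
    critique: Callable[[str, str], Critique],
    max_rounds: int = 3,
) -> Tuple[Optional[str], int]:
    """Alternate generation and critique until accepted or out of rounds.

    Returns (policy, rounds_used); policy is None if no candidate was accepted.
    """
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        candidate = generate(source_text, feedback)
        verdict = critique(source_text, candidate)
        if verdict.accepted:
            return candidate, round_no
        feedback = verdict.feedback
    return None, max_rounds


def toy_generate(source_text: str, feedback: str) -> str:
    # Hypothetical stand-in for an LLM generator. The first draft omits the
    # exception; once it receives feedback it adds a Cedar `unless` clause.
    head = 'forbid (principal, action == Action::"order_medication", resource)'
    if not feedback:
        return head + ";"
    return head + "\nunless { context.clinician_approved };"


def toy_critique(source_text: str, candidate: str) -> Critique:
    # Hypothetical stand-in for an LLM critic: a string check only.
    if "approval" in source_text and "unless" not in candidate:
        return Critique(False, "policy ignores the approval exception")
    return Critique(True, "")


# Hypothetical natural-language rule, invented for this example.
RULE = "Medication orders are not allowed without clinician approval."

policy, rounds = generator_critic_loop(RULE, toy_generate, toy_critique)
print(rounds)
print(policy)
```

In this toy run the first draft is rejected because it forbids the action unconditionally, and the second draft is accepted once the exception is present. A real pipeline would presumably replace the string check with validation against a Cedar schema and some form of analysis, which this sketch does not attempt.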