Global Trend Radar
arXiv cs.LG (Machine Learning) | INT | ai | 2026-04-29 13:00

Principled Detection of Hallucinations in Large Language Models via Multiple Testing

Analysis

Category
AI
Importance
85
Trend score
34
Summary
Large Language Models (LLMs) have emerged as powerful foundation models for solving a wide variety of tasks, but they have also been shown to be prone to hallucinations.
Keywords
Long-term importance
Important over the next several years
Business potential
High commercialization potential, particularly because the method contributes to improving the reliability of AI technology.
Relevance to Japan
High: would contribute to reliability improvements in Japan's AI industry, with significant impact on business and society.
arXiv:2508.18473v3 Announce Type: replace-cross Abstract: While Large Language Models (LLMs) have emerged as powerful foundational models to solve a variety of tasks, they have also been shown to be prone to hallucinations, i.e., generating responses that sound confident but are actually incorrect or even nonsensical. Existing hallucination detectors propose a wide range of empirical scoring rules, but their performance varies across models and datasets, and it is hard to determine which ones to rely on in practice or to treat as a reliable detector. In this work, we formulate the problem of detecting hallucinations as a hypothesis testing problem and draw parallels with the problem of out-of-distribution detection in machine learning models. We then propose a multiple-testing-inspired method that systematically aggregates multiple evaluation scores via conformal p-values, enabling calibrated detection with controlled false alarm rate. Extensive experiments across diverse models and datasets validate the robustness of our approach against state-of-the-art methods.
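The abstract does not spell out the paper's exact scoring rules or aggregation scheme, so the following is only a minimal sketch of the general idea: compute a conformal p-value for each detector's score against calibration scores collected from responses known to be grounded, combine the per-detector p-values with a Bonferroni-style min-p correction, and flag a hallucination when the combined value falls at or below the target false alarm rate alpha. The function names, the higher-score-means-more-suspicious convention, and the Bonferroni choice are assumptions for illustration, not the paper's method.

```python
import numpy as np


def conformal_p_value(test_score, calib_scores):
    """Conformal p-value of one detector's score.

    calib_scores come from responses known to be non-hallucinated; by the
    convention assumed here, a HIGHER score means "more hallucination-like".
    """
    n = len(calib_scores)
    # Rank the test score among the calibration scores; the +1 terms make
    # the p-value super-uniform under exchangeability.
    return (1 + int(np.sum(np.asarray(calib_scores) >= test_score))) / (n + 1)


def flag_hallucination(test_scores, calib_sets, alpha=0.05):
    """Combine per-detector conformal p-values (Bonferroni-style min-p)
    and flag the response when the combined p-value is at most alpha,
    which bounds the false alarm rate by alpha."""
    p_values = [conformal_p_value(s, c) for s, c in zip(test_scores, calib_sets)]
    p_combined = min(1.0, len(p_values) * min(p_values))
    return p_combined <= alpha, p_combined


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three hypothetical detectors, each with 500 calibration scores
    # collected on responses verified to be grounded.
    calib_sets = [rng.normal(0.0, 1.0, 500) for _ in range(3)]
    test_scores = [2.8, 1.9, 3.1]  # one suspicious response, three detectors
    flagged, p = flag_hallucination(test_scores, calib_sets)
    print(f"flagged={flagged}, combined p-value={p:.4f}")
```

Under exchangeability of calibration and test scores, each conformal p-value is super-uniform, so thresholding the Bonferroni-combined value at alpha keeps the false alarm probability at or below alpha; Fisher's method or other p-value combination rules would be equally valid drop-in replacements for the aggregation step.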

Similar articles (vector neighbors)