Global Trend Radar
arXiv cs.LG (Machine Learning) INT ai 2026-04-28 13:00

Representational Curvature Modulates Behavioral Uncertainty in Large Language Models

Original title: Representational Curvature Modulates Behavioral Uncertainty in Large Language Models

Open original article →

Analysis

Category
AI
Importance
85
Trend Score
34
Summary
In autoregressive large language models (LLMs), temporal straightening explains how the next-token prediction objective shapes representations: models progressively straighten the representational trajectory of input sequences across layers, which facilitates next-token prediction via linear extrapolation. The paper links this trajectory geometry to token-level behavior, showing that contextual curvature correlates with next-token entropy.
Keywords
Long-term Importance
Important within a few years
Business Potential
High potential for commercialization
Impact on Japan
High - the work may provide new insights for AI research and development in Japan and promote technological innovation.
arXiv:2604.23985v1 Announce Type: cross Abstract: In autoregressive large language models (LLMs), temporal straightening offers an account of how the next-token prediction objective shapes representations. Models learn to progressively straighten the representational trajectory of input sequences across layers, potentially facilitating next-token prediction via linear extrapolation. However, a direct link between this trajectory and token-level behavior has been missing. We provide such a link by relating contextual curvature (a geometric measure of how sharply the representational trajectory bends over recent context) to next-token entropy. Across two models (GPT-2 XL and Pythia-2.8B), contextual curvature is correlated with entropy, and this relationship emerges during training. Perturbation experiments reveal selective dependence: manipulating curvature through trajectory-aligned interventions reliably modulates entropy, while geometrically misaligned perturbations have no effect. Finally, regularizing representations to be straighter during training modestly reduces token-level entropy without degrading validation loss. These results identify trajectory curvature as a task-aligned representational feature that influences behavioral uncertainty in LLMs.
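
The abstract's core measurement can be reproduced in outline. The sketch below is a minimal illustration, not the paper's code: it assumes the standard temporal-straightening definition of curvature (the angle between successive difference vectors of the hidden-state trajectory), uses the small `gpt2` checkpoint rather than GPT-2 XL for convenience, and ignores the paper's exact context-windowing for contextual curvature.

```python
# Minimal sketch: curvature of a hidden-state trajectory vs. next-token entropy.
# Assumes the standard straightening-literature curvature (angle between
# successive difference vectors); the paper's exact measure may differ.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("gpt2")  # paper uses GPT-2 XL / Pythia-2.8B
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model.eval()

def trajectory_curvature(hidden):
    # hidden: (seq_len, d_model) trajectory of one sequence at one layer.
    v = hidden[1:] - hidden[:-1]                           # difference vectors
    v = v / v.norm(dim=-1, keepdim=True).clamp_min(1e-8)   # unit directions
    cos = (v[1:] * v[:-1]).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.arccos(cos)                               # bend angle (radians) per step

ids = tokenizer("The committee will announce its decision on",
                return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_hidden_states=True)

curv = trajectory_curvature(out.hidden_states[-1][0])      # final-layer trajectory

probs = out.logits[0, -1].softmax(dim=-1)                  # next-token distribution
entropy = -(probs * probs.clamp_min(1e-12).log()).sum()

print(f"mean curvature: {curv.mean():.3f} rad, next-token entropy: {entropy:.2f} nats")
```

The straightening regularizer the abstract mentions can likewise be sketched as an auxiliary loss term; the penalty form and the coefficient `lam` below are assumptions, not the paper's recipe.

```python
def straightness_penalty(hidden):
    # Mean curvature over a (batch, seq_len, d_model) batch; lower = straighter.
    v = hidden[:, 1:] - hidden[:, :-1]
    v = v / v.norm(dim=-1, keepdim=True).clamp_min(1e-8)
    cos = (v[:, 1:] * v[:, :-1]).sum(dim=-1).clamp(-1.0, 1.0)
    return torch.arccos(cos).mean()

# Hypothetical training step: loss = lm_loss + lam * straightness_penalty(hidden)
# (penalizing 1 - cos instead of arccos avoids the infinite gradient at cos = ±1).
```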

Similar Articles (vector neighbors)