Global Trend Radar
arXiv cs.LG (Machine Learning) INT ai 2026-05-08 13:00

A Perspective on Grokking via Singular Learning Theory

Original title: A Basin-Selection Perspective on Grokking via Singular Learning Theory


Analysis

Category
Education
Importance
59
Trend score
18
Summary
Grokking refers to the abrupt transition from memorization to generalization after extended training, suggesting the existence of competing solution basins with distinct properties. This paper analyzes the phenomenon using Singular Learning Theory and investigates the mechanism of grokking from the perspective of basin selection.
Abstract
arXiv:2603.01192v3 Announce Type: replace-cross Abstract: Grokking, the abrupt transition from memorization to generalisation after extended training, suggests the presence of competing solution basins with distinct statistical properties. We study this phenomenon through the lens of Singular Learning Theory (SLT), a Bayesian framework that characterizes the geometry of the loss landscape. The key measure is the local learning coefficient (LLC), which quantifies the local degeneracy of the loss surface. SLT links lower-LLC basins to higher posterior mass concentration and lower expected generalisation error. Leveraging SLT, we develop a basin-selection perspective on grokking in quadratic networks: LLC ranks competing near-zero-loss basins by statistical preference, while the training-time transition between them is governed by optimisation dynamics. In this view, grokking corresponds to a transition from a higher-LLC (memorising) basin to a lower-LLC (generalising) basin that dominates the posterior. To support this, we derive analytic formulas for the LLC in shallow quadratic networks under both lazy and feature learning regimes. Empirically, we demonstrate that LLC trajectories estimated from training data track the onset of generalisation and provide an informative probe of the optimisation path.
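To make the local learning coefficient concrete: in the SLT literature, the LLC at a minimum w* is typically estimated by sampling from a tempered posterior exp(-nβ·L(w)) restricted to a neighbourhood of w*, and setting λ̂ = nβ·(E[L] - L(w*)) with β = 1/log n. The sketch below is not the paper's analytic formulas; it is a hypothetical toy illustration using a simple Metropolis sampler on two hand-picked 2D losses, one regular (LLC = d/2 = 1) and one degenerate (LLC = 1/2), to show that the estimator ranks the degenerate basin as statistically preferred:

```python
import numpy as np

def estimate_llc(loss, w0, n=1000, steps=50_000, prop_std=0.05, box=1.0, seed=0):
    """Estimate the local learning coefficient at minimum w0 by Metropolis
    sampling from the tempered posterior exp(-n*beta*L(w)), beta = 1/log n,
    restricted to a box around w0 (this keeps the estimate *local*).
    Returns lambda_hat = n*beta*(E[L] - L(w0))."""
    rng = np.random.default_rng(seed)
    w0 = np.asarray(w0, dtype=float)
    c = n / np.log(n)            # n * beta, the inverse temperature scale
    w, lw = w0.copy(), loss(w0)
    samples = []
    for _ in range(steps):
        cand = w + rng.normal(0.0, prop_std, size=w.shape)
        if np.all(np.abs(cand - w0) <= box):      # stay in the local box
            lc = loss(cand)
            # Metropolis accept/reject on the tempered posterior
            if lc <= lw or rng.random() < np.exp(-c * (lc - lw)):
                w, lw = cand, lc
        samples.append(lw)
    burn = steps // 5                              # discard burn-in
    return c * (np.mean(samples[burn:]) - loss(w0))

# Regular quadratic minimum: no degeneracy, so LLC = d/2 = 1 in 2D.
regular = lambda w: w[0]**2 + w[1]**2
# Degenerate minimum: zero loss along both axes, so LLC = 1/2 < 1.
degenerate = lambda w: (w[0]**2) * (w[1]**2)

lam_reg = estimate_llc(regular, [0.0, 0.0])
lam_deg = estimate_llc(degenerate, [0.0, 0.0])
print(f"regular LLC ~ {lam_reg:.2f}, degenerate LLC ~ {lam_deg:.2f}")
```

In the paper's framing, the memorising basin plays the role of the higher-LLC loss and the generalising basin the lower-LLC one; tracking λ̂ along the training trajectory is what makes the onset of generalisation visible.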