Measuring Learning Progress via Gradient-Momentum Coupling
Analysis results
- Category
- Education
- Importance
- 59
- Trend score
- 18
arXiv:2605.05856v1
Abstract: Measuring learning progress is essential for curiosity-driven exploration in reinforcement learning, but widely used signals such as prediction error often fail to distinguish meaningful, learnable patterns from random noise. This paper proposes Gradient-Momentum Coupling (GMC), a signal derived from optimization dynamics that quantifies how useful each sample's gradient is for ongoing learning by measuring its per-parameter normalized absolute product with the momentum from previous gradients. By leveraging momentum's natural filtering of noise and oscillations, GMC identifies samples that contribute to ongoing parameter updates. Controlled experiments demonstrate noise robustness and emergent curriculum learning, with the signal prioritizing tasks by learning speed rather than difficulty. Experiments on MiniGrid suggest that replacing prediction error with GMC within existing curiosity-driven architectures can improve robustness to observation noise.
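The abstract describes GMC only in words, so the following is a minimal sketch of one possible reading, not the paper's implementation. It assumes the "per-parameter normalized absolute product" is |g_i * m_i| divided by an Adam-style running second-moment estimate v_i, averaged over parameters. That normalization, the function names (update_moments, gmc_scores, per_sample_grads_linear), and the toy linear-regression setup are all assumptions introduced here for illustration.

```python
import numpy as np


def update_moments(m, v, batch_grad, beta1=0.9, beta2=0.999):
    """Update exponential moving averages of the gradient (m) and squared gradient (v)."""
    m = beta1 * m + (1.0 - beta1) * batch_grad
    v = beta2 * v + (1.0 - beta2) * batch_grad ** 2
    return m, v


def gmc_scores(per_sample_grads, m, v, eps=1e-8):
    """Score each sample by the mean over parameters of |g_i * m_i| / (v_i + eps).

    ASSUMPTION: dividing by the second-moment estimate v is one guess at the
    normalization; the abstract does not state the exact form.
    """
    coupling = np.abs(per_sample_grads * m) / (v + eps)  # shape (n_samples, n_params)
    return coupling.mean(axis=1)                         # shape (n_samples,)


def per_sample_grads_linear(w, X, y):
    """Per-sample gradient of 0.5 * (x @ w - y)**2 with respect to w."""
    residual = X @ w - y           # shape (n_samples,)
    return residual[:, None] * X   # shape (n_samples, n_params)


# Toy usage on a hypothetical linear-regression problem.
rng = np.random.default_rng(0)
X = rng.normal(size=(64, 4))
y = X @ np.array([1.0, -2.0, 0.5, 3.0])

w = np.zeros(4)
m = np.zeros(4)
v = np.zeros(4)
for _ in range(50):
    G = per_sample_grads_linear(w, X, y)
    # Scores use the momentum accumulated from *previous* gradients only.
    scores = gmc_scores(G, m, v)
    m, v = update_moments(m, v, G.mean(axis=0))
    w -= 0.1 * m  # plain momentum step; the step size is arbitrary
```

In a curiosity-driven agent, the per-sample scores would stand in for prediction error as the intrinsic signal. On the first iteration the momentum is zero, so every score is zero until at least one update has been accumulated.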