Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions
Analysis Results
- Category: Education
- Importance: 59
- Trend Score: 18
- Summary: Bayesian Reinforcement Learning (BRL), a subclass of Meta-Reinforcement Learning (Meta-RL), provides a principled framework for generalisation by explicitly incorporating Bayesian task parameters into transition and reward models.
arXiv:2512.20974v2 Announce Type: replace Abstract: Bayesian Reinforcement Learning (BRL), a subclass of Meta-Reinforcement Learning (Meta-RL), provides a principled framework for generalisation by explicitly incorporating Bayesian task parameters into transition and reward models. However, classical BRL methods assume known forms of transition and reward models. While recent deep BRL methods incorporate model learning to address this, applying neural networks directly to joint data and task parameters necessitates variational inference. This often yields indistinct task representations, compromising the resulting BRL policies. To overcome these limitations, we introduce Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL). Our approach features fully tractable Bayesian inference over task parameters and model noise, alongside exact marginal likelihood evaluation for learning transition and reward models. The permutation-invariant nature of exact Bayesian inference in GLiBRL enables seamless integration with both on-policy and off-policy RL algorithms. We further show that GLiBRL admits a closed-form relationship between the $\mathcal{L}_2$ distance of its task representations and empirical kernel-based correspondence between task samples, which is to our knowledge the first such structural result for online deep BRL. GLiBRL is compared against representative and recent Meta-RL methods, and improves state-of-the-art performance on both MuJoCo and MetaWorld benchmarks by up to 1.8$\times$.
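The abstract's key ingredients — exact, tractable Bayesian inference over task parameters given fixed basis functions, exact marginal likelihood evaluation, and permutation invariance of the posterior — are properties of conjugate Bayesian linear regression in general. The sketch below is not the paper's model; it is a minimal illustration of those three properties under standard assumptions (Gaussian weight prior with precision `alpha`, known Gaussian noise precision `beta`, and a feature map `Phi` standing in for learnable basis functions). All function names are hypothetical.

```python
import numpy as np

def blr_posterior(Phi, y, alpha=1.0, beta=25.0):
    """Exact conjugate posterior for y = Phi @ w + noise,
    with prior w ~ N(0, alpha^{-1} I) and noise ~ N(0, beta^{-1}).
    Phi: (N, M) design matrix of basis-function features."""
    M = Phi.shape[1]
    A = alpha * np.eye(M) + beta * Phi.T @ Phi  # posterior precision
    S = np.linalg.inv(A)                        # posterior covariance
    m = beta * S @ Phi.T @ y                    # posterior mean
    return m, S, A

def log_marginal_likelihood(Phi, y, alpha=1.0, beta=25.0):
    """Exact log evidence log p(y | Phi, alpha, beta), available in
    closed form for this conjugate model (Bishop, PRML, Eq. 3.86).
    In a GLM-style deep BRL setup this is the kind of quantity one
    could backpropagate through to learn the basis functions."""
    N, M = Phi.shape
    m, _, A = blr_posterior(Phi, y, alpha, beta)
    # Regularised sum-of-squares error at the posterior mean.
    E = beta / 2 * np.sum((y - Phi @ m) ** 2) + alpha / 2 * m @ m
    _, logdetA = np.linalg.slogdet(A)
    return (M / 2 * np.log(alpha) + N / 2 * np.log(beta)
            - E - logdetA / 2 - N / 2 * np.log(2 * np.pi))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(50, 1))
    Phi = np.hstack([np.ones((50, 1)), x, x ** 2])  # fixed polynomial features
    y = Phi @ np.array([0.5, -1.0, 2.0]) + 0.2 * rng.normal(size=50)

    m, S, _ = blr_posterior(Phi, y)
    # Permutation invariance: the posterior depends on the data only
    # through sums over samples, so reordering transitions changes nothing.
    perm = rng.permutation(50)
    m_perm, _, _ = blr_posterior(Phi[perm], y[perm])
    print(np.allclose(m, m_perm))  # True
```

Because the posterior precision and mean are built from `Phi.T @ Phi` and `Phi.T @ y` — sums over samples — any reordering of the data yields the identical posterior, which is the property that lets such a model consume replay-buffer batches from off-policy RL as readily as on-policy rollouts.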