Global Trend Radar
arXiv cs.LG (Machine Learning) INT ai 2026-05-08 13:00

Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions


Analysis Results

Category
Education
Importance
59
Trend Score
18
Summary
arXiv:2512.20974v2 Announce Type: replace

Abstract: Bayesian Reinforcement Learning (BRL), a subclass of Meta-Reinforcement Learning (Meta-RL), provides a principled framework for generalisation by explicitly incorporating Bayesian task parameters into transition and reward models. However, classical BRL methods assume known forms of the transition and reward models. While recent deep BRL methods incorporate model learning to address this, applying neural networks directly to joint data and task parameters necessitates variational inference, which often yields indistinct task representations and compromises the resulting BRL policies. To overcome these limitations, we introduce Generalised Linear Models in Deep Bayesian RL with Learnable Basis Functions (GLiBRL). Our approach features fully tractable Bayesian inference over task parameters and model noise, alongside exact marginal likelihood evaluation for learning transition and reward models. The permutation-invariant nature of exact Bayesian inference in GLiBRL enables seamless integration with both on-policy and off-policy RL algorithms. We further show that GLiBRL admits a closed-form relationship between the $\mathcal{L}_2$ distance of its task representations and an empirical kernel-based correspondence between task samples, which is, to our knowledge, the first such structural result for online deep BRL. Compared against representative and recent Meta-RL methods, GLiBRL improves state-of-the-art performance on both MuJoCo and MetaWorld benchmarks by up to 1.8$\times$.
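For readers unfamiliar with why a generalised linear model over learned basis functions gives "fully tractable Bayesian inference" and "exact marginal likelihood evaluation", the standard conjugate Bayesian linear regression setup illustrates the idea. The sketch below is not GLiBRL itself (the paper's actual model, priors, and training procedure are not reproduced here); it only shows the textbook closed-form posterior and log marginal likelihood that such a model family admits once the basis-function outputs `Phi` are treated as a fixed design matrix. The function name and hyperparameters `alpha` (prior precision) and `beta` (noise precision) are illustrative choices, not names from the paper.

```python
import numpy as np

def bayes_linear_posterior(Phi, y, alpha=1.0, beta=1.0):
    """Conjugate Bayesian linear regression on basis features.

    Phi:   (N, D) design matrix of basis-function outputs phi(x).
    y:     (N,) regression targets.
    alpha: prior precision on the weights w ~ N(0, alpha^-1 I).
    beta:  observation-noise precision.

    Returns the posterior mean m, posterior covariance S, and the
    exact log marginal likelihood log p(y | Phi, alpha, beta).
    """
    N, D = Phi.shape
    # Closed-form Gaussian posterior over weights: w | y ~ N(m, S).
    S_inv = alpha * np.eye(D) + beta * Phi.T @ Phi
    S = np.linalg.inv(S_inv)
    m = beta * S @ (Phi.T @ y)
    # Exact log marginal likelihood (standard conjugate result);
    # in a learnable-basis setting this quantity can serve as a
    # differentiable training signal for the basis functions.
    E = 0.5 * beta * np.sum((y - Phi @ m) ** 2) + 0.5 * alpha * (m @ m)
    _, logdet_S_inv = np.linalg.slogdet(S_inv)
    log_ml = (0.5 * D * np.log(alpha) + 0.5 * N * np.log(beta)
              - E - 0.5 * logdet_S_inv - 0.5 * N * np.log(2 * np.pi))
    return m, S, log_ml
```

Note that the posterior depends on the data only through the sufficient statistics `Phi.T @ Phi` and `Phi.T @ y`, which are sums over samples and hence invariant to the order in which transitions arrive; this is the kind of permutation invariance that makes exact inference compatible with both on-policy and off-policy data collection.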