Generative Modeling of Discrete Data Using Geometric Latent Subspaces
Original title: Generative Modeling of Discrete Data Using Geometric Latent Subspaces
Analysis
- Category
- Space
- Importance
- 59
- Trend score
- 18
- Summary
- This work proposes a geometric latent-subspace framework for generative modeling of discrete data. In particular, it introduces latent subspaces in the exponential parameter space and examines how they improve the performance of generative models, aiming to open new possibilities for generating discrete data.
- Keywords
arXiv:2601.21831v2 Announce Type: replace-cross Abstract: We propose a geometric latent-subspace framework for generative modeling of discrete data. Specifically, we introduce latent subspaces in the exponential parameter space of product manifolds of categorical distributions as a novel method for learning generative models of discrete data. The resulting low-dimensional latent space encodes statistical dependencies and removes redundant degrees of freedom among the categorical variables. We equip the parameter domain with a Riemannian geometry such that the latent subspace and induced data manifold are related by isometries, enabling consistent flow matching. Exploiting this structure, we propose a geometry-aware dimensionality reduction objective, called geometric PCA (GPCA), which we formulate as a regularized cross-entropy minimization that encourages small Riemannian distances between the data and their reconstructions. In particular, under the induced geometry, geodesics become straight lines in the latent parameter space, which makes model training by flow matching effective. Empirical results show that low-dimensional latent representations suffice to accurately model high-dimensional discrete data.
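To make the GPCA formulation concrete, here is a minimal sketch in NumPy, under assumptions not stated in the abstract: one-hot data over a product of categorical variables, a linear latent subspace in the logit (exponential) parameter space, and a simple squared-norm penalty standing in for the Riemannian distance term. The function name and shapes are illustrative, not the authors' implementation.

```python
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def gpca_objective(X, Z, W, lam=0.1):
    """Hypothetical sketch of the GPCA objective: cross-entropy between
    one-hot data X (N, D, K) and reconstructions obtained by mapping
    latent codes Z (N, d) through a subspace basis W (d, D*K) in the
    exponential (logit) parameter space, plus a placeholder regularizer
    in lieu of the Riemannian distance term."""
    N, D, K = X.shape
    logits = (Z @ W).reshape(N, D, K)   # points on the latent subspace
    probs = softmax(logits)             # product of categorical distributions
    ce = -(X * np.log(probs + 1e-12)).sum() / N   # reconstruction cross-entropy
    reg = lam * (Z ** 2).sum() / N                # stand-in distance penalty
    return ce + reg

rng = np.random.default_rng(0)
N, D, K, d = 8, 5, 3, 2                 # samples, variables, categories, latent dim
labels = rng.integers(0, K, size=(N, D))
X = np.eye(K)[labels]                   # one-hot encoding, shape (N, D, K)
Z = rng.normal(size=(N, d))
W = rng.normal(size=(d, D * K))
loss = gpca_objective(X, Z, W)
print(loss)
```

In an actual training loop, Z and W would be optimized jointly (e.g. by gradient descent) to minimize this objective.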
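The abstract's claim that geodesics become straight lines in the latent parameter space is what makes flow matching convenient: the conditional interpolant and its target velocity have closed forms. A minimal sketch, assuming standard conditional flow matching between noise codes and data codes (the vector-field network itself is omitted):

```python
import numpy as np

def flow_matching_targets(x0, x1, t):
    """Sketch of conditional flow-matching regression targets along
    straight-line geodesics in the latent parameter space: the
    interpolant x_t and its constant velocity x1 - x0 are what a
    learned vector field would be trained to match."""
    t = t.reshape(-1, 1)
    xt = (1.0 - t) * x0 + t * x1   # point on the straight-line geodesic
    vt = x1 - x0                   # constant target velocity
    return xt, vt

rng = np.random.default_rng(1)
x0 = rng.normal(size=(4, 10))   # noise samples in the latent subspace
x1 = rng.normal(size=(4, 10))   # latent codes of data points
t = rng.uniform(size=4)         # random interpolation times in [0, 1)
xt, vt = flow_matching_targets(x0, x1, t)
print(xt.shape, vt.shape)
```

Because the velocity is constant along each path, the regression target does not depend on t, which is the training simplification the abstract alludes to.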