Global Trend Radar
arXiv cs.LG (Machine Learning) INT ai 2026-06-26 13:00

漸近的に最適な学習に関するパラメトリック予言者不等式

原題: Asymptotically Optimal Learning for Parametric Prophet Inequalities

元記事を開く →

分析結果

カテゴリ
教育
重要度
53
トレンドスコア
12
要約
本研究では、未知のパラメータ$ heta$を持つ指数型パラメトリックファミリーから引き出されたi.i.d.報酬を用いた予言者不等式における学習を考察します。このクラスには指数分布が含まれ、最適な学習戦略の特性を明らかにすることを目指しています。
キーワード
arXiv:2606.26893v1 Announce Type: new Abstract: We study learning in prophet inequalities with i.i.d. rewards drawn from an exponential-type parametric family with an unknown parameter $\theta$, a class that includes exponential, Pareto, and bounded-support power-family distributions. We first characterize the optimal full-information asymptotic competitive ratio for this family. In the unbounded-support case, the limit is $ {\left({\theta}/({\theta-c_+})\right)^{c_+/\theta}}/ {\Gamma(1-c_+/\theta)},$ while in the bounded-support case, the limit is $1$. We then propose a confidence-based dynamic-programming policy for online learning. By exploiting the explicit parametric structure, the policy achieves the same optimal asymptotic competitive ratio using only online observations, without external offline samples. We further derive distribution-specific convergence rates for canonical examples. Finally, numerical experiments on synthetic instances illustrate the performance of our algorithm. arXiv:2606.26893v1 Announce Type: new Abstract: We study learning in prophet inequalities with i.i.d. rewards drawn from an exponential-type parametric family with an unknown parameter $\theta$, a class that includes exponential, Pareto, and bounded-support power-family distributions. We first characterize the optimal full-information asymptotic competitive ratio for this family. In the unbounded-support case, the limit is $ {\left({\theta}/({\theta-c_+})\right)^{c_+/\theta}}/ {\Gamma(1-c_+/\theta)},$ while in the bounded-support case, the limit is $1$. We then propose a confidence-based dynamic-programming policy for online learning. By exploiting the explicit parametric structure, the policy achieves the same optimal asymptotic competitive ratio using only online observations, without external offline samples. We further derive distribution-specific convergence rates for canonical examples. Finally, numerical experiments on synthetic instances illustrate the performance of our algorithm.