Global Trend Radar
arXiv cs.AI INT ai 2026-05-08 13:00

Nearly Optimal Attention Coresets

Original title: Nearly Optimal Attention Coresets


Analysis results

Category
Space
Importance
53
Trend score
12
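Summary
This paper considers the problem of estimating the attention mechanism in small space and proves that coresets of nearly optimal size exist. Specifically, for any unit-norm keys and values in $\mathbb{R}^d$, it shows that a subset of size at most $O(\sqrt{d}\, e^{\rho+o(\rho)}/\varepsilon)$ approximates the attention output to within $\varepsilon$ simultaneously for all queries of norm at most $\rho$, and it gives a lower bound of $\Omega(\sqrt{d}\, e^{\rho}/\varepsilon)$ on the size of any $\varepsilon$-coreset.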
Keywords
arXiv:2605.05602v1 Announce Type: cross Abstract: We consider the problem of estimating the Attention mechanism in small space, and prove the existence of coresets for it of nearly optimal size. Specifically, we show that for any set of unit-norm keys and values $(K,V)$ in $\mathbb{R}^d$, there exists a subset $(K',V')$ of size at most $O(\sqrt{d}\, e^{\rho+o(\rho)}/\varepsilon)$ such that \[ \left\| \operatorname{Attn}(q,K,V) - \operatorname{Attn}(q,K',V') \right\| \le \varepsilon \] simultaneously for all queries whose norm is bounded by $\rho$. This outperforms the best known results for this problem. We also offer an improved lower bound showing that $\varepsilon$-coresets must have size $\Omega(\sqrt{d}\, e^{\rho}/\varepsilon)$.
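To make the guarantee in the abstract concrete, here is a minimal Python sketch that measures the coreset error $\|\operatorname{Attn}(q,K,V)-\operatorname{Attn}(q,K',V')\|$ for a candidate subset. Several things here are assumptions and do not come from the abstract: $\operatorname{Attn}$ is taken to be standard single-query softmax attention, the norm is taken to be Euclidean, and the subset is a uniform random sample used purely as a stand-in. The abstract proves that a good subset exists but does not say how it is built, so this is not the paper's construction. The helper names (`attn`, `unit_rows`, `lower_bound_scale`) are hypothetical.

```python
import numpy as np

def attn(q, K, V):
    """Single-query softmax attention (assumed definition): softmax(K q) @ V."""
    scores = K @ q
    scores = scores - scores.max()      # shift for numerical stability
    w = np.exp(scores)
    return (w / w.sum()) @ V

def unit_rows(rng, n, d):
    """n random vectors in R^d, each scaled to unit Euclidean norm."""
    X = rng.standard_normal((n, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def lower_bound_scale(d, rho, eps):
    """sqrt(d) * e^rho / eps: the quantity inside the Omega(.) bound, constants omitted."""
    return np.sqrt(d) * np.exp(rho) / eps

rng = np.random.default_rng(0)
n, d, rho, m = 2000, 16, 1.0, 200       # illustrative sizes, not from the paper
K = unit_rows(rng, n, d)                # unit-norm keys
V = unit_rows(rng, n, d)                # unit-norm values

# Stand-in subset: uniform sampling without replacement (NOT the paper's method).
idx = rng.choice(n, size=m, replace=False)
K_sub, V_sub = K[idx], V[idx]

# Probe queries of norm exactly rho. The guarantee is over ALL queries with
# norm at most rho, so this empirical maximum only lower-bounds the true
# worst-case error of the subset.
Q = rho * unit_rows(rng, 500, d)
errors = [np.linalg.norm(attn(q, K, V) - attn(q, K_sub, V_sub)) for q in Q]
max_err = max(errors)

print(f"subset size m = {m}, largest observed error = {max_err:.4f}")
print(f"sqrt(d) e^rho / eps at eps = 0.1: {lower_bound_scale(d, rho, 0.1):.1f}")
```

Because the attention output is a convex combination of unit-norm values, its norm never exceeds 1, so the error is always at most 2 and only small $\varepsilon$ is interesting. The sketch can check a given subset against sampled queries, but it cannot certify the "for all queries" guarantee.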