PreMoE: Proactive Inference for Efficient Mixture-of-Experts
Analysis Results
- Category
- AI
- Importance
- 63
- Trend Score
- 22
- Summary
- Mixture-of-Experts (MoE) models offer dynamic computation but are typically deployed as static full-capacity models, missing the opportunity for deployment-specific specialization.
- Keywords
arXiv:2505.17639v3 Abstract: Mixture-of-Experts (MoE) models offer dynamic computation, but are typically deployed as static full-capacity models, missing opportunities for deployment-specific specialization. We introduce PreMoE, a training-free framework that proactively compiles sparse MoE variants for targeted deployment scenarios. At its core is Predicted Expert Utility (PEU), a robust metric for estimating expert importance from router logits through high-confidence threshold filtering and logit transformation, which together stabilize utility estimation under aggressive sparsity. Using PEU scores computed on a small calibration set, PreMoE produces domain-aware expert rankings that can be used to compile either domain-specific specialists or high-efficiency multi-domain generalists, without any retraining. Across MoE models ranging from 30B to 718B parameters, PreMoE achieves up to 50% sparsity with nearly no performance loss. It further exposes a practical deployment trade-off: specialists maximize in-domain efficiency, while synthesized generalists retain broader cross-domain capability at the same sparsity budget.
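The abstract only sketches how Predicted Expert Utility (PEU) works. The snippet below is a minimal illustrative sketch of that idea, not the paper's implementation: the confidence threshold value, the use of softmax as the "logit transformation", the per-layer averaging, and the function names (`predicted_expert_utility`, `compile_sparse_experts`) are all assumptions made here for clarity.

```python
import numpy as np

def predicted_expert_utility(router_logits, conf_threshold=0.9):
    """Hypothetical PEU sketch: estimate per-expert utility from router logits.

    router_logits: array of shape (num_tokens, num_experts) for one MoE layer,
    collected while running a small calibration set through the full model.
    The threshold value and softmax transformation are illustrative assumptions.
    """
    # Logit transformation (assumed here to be a softmax over experts).
    probs = np.exp(router_logits - router_logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)

    # High-confidence threshold filtering: keep only tokens whose router is
    # confident, so noisy routing decisions do not distort the estimate.
    confident = probs.max(axis=-1) >= conf_threshold
    if not confident.any():            # fall back to all tokens if none pass
        confident = np.ones(len(probs), dtype=bool)

    # Aggregate routing mass per expert over the filtered calibration tokens.
    return probs[confident].mean(axis=0)    # shape: (num_experts,)

def compile_sparse_experts(router_logits, keep_ratio=0.5):
    """Rank experts by PEU and keep the top fraction (e.g. 50% sparsity)."""
    peu = predicted_expert_utility(router_logits)
    num_keep = max(1, int(len(peu) * keep_ratio))
    return np.argsort(peu)[::-1][:num_keep]  # indices of retained experts
```

In this reading, a domain specialist would be compiled from calibration data of a single target domain, while a multi-domain generalist would combine rankings from several domains under the same `keep_ratio`, both without retraining, as the abstract describes.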