私のLLMをカスタマイズする: 変動モデルを活用して推論ハイパーパラメータを調整する
原題: Pimp My LLM: Leveraging Variability Modeling to Tune Inference Hyperparameters
分析結果
- カテゴリ
- AI
- 重要度
- 85
- トレンドスコア
- 34
- 要約
- 大規模言語モデル(LLM)はさまざまなタスクでの利用が増加していますが、その高い計算要求はエネルギーに関する懸念を引き起こしています。
- キーワード
- 長期重要性
- 数年で重要
- ビジネス可能性
- 高い
- 日本波及可能性
- 高 - 日本のAI産業におけるエネルギー効率の向上が期待されるため。
arXiv:2602.17697v2 Announce Type: replace Abstract: Large Language Models (LLMs) are being increasingly used across a wide range of tasks. However, their substantial computational demands raise concerns about the energy efficiency and sustainability of both training and inference. Inference, in particular, dominates total compute usage, making its optimization crucial. Recent research has explored optimization techniques and analyzed how configuration choices influence energy consumption. Yet, the vast configuration space of inference servers makes exhaustive empirical evaluation infeasible due to combinatorial explosion. In this paper, we introduce a new perspective on this problem by treating LLMs as configurable systems and applying variability management techniques to systematically analyze inference-time configuration choices. We evaluate our approach on the Hugging Face Transformers library by representing generation hyperparameters and their constraints using a feature-based variability model, sampling representative configurations, measuring their energy consumption, latency, accuracy, and learning predictive models from the collected data. Our results show that variability modeling effectively manages the complexity of LLM inference configurations. It enables systematic analysis of hyperparameters effects and interactions, reveals trade-offs, and supports prediction of inference behavior from a limited number of measurements. Overall, this work opens a new research direction that bridges software engineering and machine learning by leveraging variability modeling for the efficient and sustainable configuration of LLMs. arXiv:2602.17697v2 Announce Type: replace Abstract: Large Language Models (LLMs) are being increasingly used across a wide range of tasks. However, their substantial computational demands raise concerns about the energy efficiency and sustainability of both training and inference. Inference, in particular, dominates total compute usage, making its optimization crucial. Recent research has explored optimization techniques and analyzed how configuration choices influence energy consumption. Yet, the vast configuration space of inference servers makes exhaustive empirical evaluation infeasible due to combinatorial explosion. In this paper, we introduce a new perspective on this problem by treating LLMs as configurable systems and applying variability management techniques to systematically analyze inference-time configuration choices. We evaluate our approach on the Hugging Face Transformers library by representing generation hyperparameters and their constraints using a feature-based variability model, sampling representative configurations, measuring their energy consumption, latency, accuracy, and learning predictive models from the collected data. Our results show that variability modeling effectively manages the complexity of LLM inference configurations. It enables systematic analysis of hyperparameters effects and interactions, reveals trade-offs, and supports prediction of inference behavior from a limited number of measurements. Overall, this work opens a new research direction that bridges software engineering and machine learning by leveraging variability modeling for the efficient and sustainable configuration of LLMs.