Global Trend Radar
arXiv cs.LG (Machine Learning) INT ai 2026-04-28 13:00

MLorc: Momentum Low-rank Compression for Memory-Efficient Large Language Model Adaptation

Original title: MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation


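Analysis results

Category
AI
Importance
69
Trend score
28
Summary
As large language models (LLMs) grow in size, full-parameter fine-tuning imposes substantial memory demands. To alleviate this, the paper proposes a new memory-efficient training method.
Keywords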
arXiv:2506.01897v5 Announce Type: replace
Abstract: With increasing size of large language models (LLMs), full-parameter fine-tuning imposes substantial memory demands. To alleviate this, we propose a novel memory-efficient training paradigm called Momentum Low-rank compression (MLorc). The key idea of MLorc is to compress and reconstruct the momentum of matrix parameters during training to reduce memory consumption. Compared to LoRA, MLorc avoids enforcing a fixed-rank constraint on weight update matrices and thus enables full-parameter learning. Compared to GaLore, MLorc directly compresses the momentum rather than the gradients, thereby better preserving the training dynamics of full-parameter fine-tuning. We provide a theoretical guarantee for its convergence under mild assumptions. Empirically, MLorc consistently outperforms other memory-efficient training methods, matches or even exceeds the performance of full fine-tuning at small ranks (e.g., $r=4$), and generalizes well across different optimizers, all while not compromising time or memory efficiency.
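
The compress-and-reconstruct idea in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the abstract does not specify the compression operator, so an exact truncated SVD is assumed here, and the names `compress` and `momentum_step`, the plain momentum-SGD update, and all hyperparameters are hypothetical choices made for illustration. The point it shows is that only two rank-$r$ factors are kept between steps, while the weight update itself is computed from the full reconstructed momentum and is therefore not rank-constrained.

```python
import numpy as np

def compress(m, rank):
    """Assumed compressor: truncated SVD, returning (a, b) with a @ b ~= m."""
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    return u[:, :rank] * s[:rank], vt[:rank, :]

def momentum_step(w, grad, factors, rank=4, beta=0.9, lr=1e-2):
    """One momentum-SGD step that stores the momentum only as low-rank factors."""
    a, b = factors
    # Reconstruct the previous momentum from its factors, then fold in the gradient.
    m = beta * (a @ b) + (1.0 - beta) * grad
    # The weight update uses the full matrix m, so it is not limited to rank r.
    w = w - lr * m
    # Only the compressed factors are kept until the next step.
    return w, compress(m, rank)

rng = np.random.default_rng(0)
w = rng.standard_normal((64, 32))
factors = (np.zeros((64, 4)), np.zeros((4, 32)))  # zero initial momentum
for _ in range(3):
    grad = rng.standard_normal(w.shape)  # stand-in for a real gradient
    w, factors = momentum_step(w, grad, factors)
```

In this toy setting the optimizer state for a 64 x 32 matrix shrinks from 2048 stored values to 64*4 + 4*32 = 384. An adaptive optimizer such as Adam would need similar handling for its second-moment state, which this sketch does not cover.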

Similar articles (vector neighbors)