arXiv cs.AI INT ai 2026-05-08 13:00

Learning to Cut: Reinforcement Learning for Benders Decomposition

Original title: Learning to Cut: Reinforcement Learning for Benders Decomposition

Analysis results

Category: Education
Importance: 59
Trend score: 18
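Summary: This article describes a reinforcement learning method for selecting cuts in Benders decomposition. Benders decomposition is an effective approach to large-scale optimization problems, but its efficiency depends heavily on which cuts are added to the master problem. Using reinforcement learning, cut selection can be automated, which improves computational efficiency. Experimental results are presented that confirm the effectiveness of the proposed method.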
Keywords: learning approach stochastic cuts policy two stage decision

arXiv:2605.06516v1 Announce Type: cross Abstract: Benders decomposition (BD) is a widely used solution approach for solving two-stage stochastic programs arising in real-world decision-making under uncertainty. However, it often suffers from slow convergence as the master problem grows with an increasing number of cuts. In this paper, we propose Reinforcement Learning for BD (RLBD), a framework that adaptively selects cuts using a neural network-based stochastic policy. The policy is trained using a policy gradient method via the REINFORCE algorithm. We evaluate the proposed approach on a two-stage stochastic electric vehicle charging station location problem and compare it with vanilla BD and LearnBD, a supervised learning approach that classifies cuts using a support vector machine. Numerical results demonstrate that RLBD achieves substantial improvements in computational efficiency and exhibits strong generalization to problems with similar structures but varying data inputs and decision variable dimensions.
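
The abstract describes a stochastic policy that decides which cuts to add and is trained with REINFORCE, but it does not give the cut features, the network architecture, or the reward. The sketch below is therefore only an illustration of the general idea, not the RLBD implementation. It assumes a logistic (single-layer) keep/drop policy in place of the neural network, a single hypothetical cut feature plus a bias term, and a synthetic reward that stands in for solver feedback; no master problem is actually solved. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def keep_probability(weights, features):
    """Bernoulli policy: probability of adding a cut with these features."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)))


def sample_actions(weights, cuts, rng):
    """Sample keep (1) or drop (0) independently for each candidate cut."""
    return [1 if rng.random() < keep_probability(weights, x) else 0 for x in cuts]


def reinforce_gradient(weights, cuts, actions, advantage):
    """REINFORCE estimate: advantage * grad log pi(actions | cuts).

    For a Bernoulli-logistic policy, grad_w log pi(a | x) = (a - p) * x.
    """
    grad = [0.0] * len(weights)
    for x, a in zip(cuts, actions):
        p = keep_probability(weights, x)
        for j, xj in enumerate(x):
            grad[j] += advantage * (a - p) * xj
    return grad


def toy_reward(cuts, actions):
    """Synthetic reward (assumption): +1 for each kept cut whose first
    feature is positive, -1 for each other kept cut, as a crude stand-in
    for the cost of growing the master problem with unhelpful cuts."""
    return sum((1.0 if x[0] > 0 else -1.0) for x, a in zip(cuts, actions) if a == 1)


def train(num_episodes=2000, num_cuts=8, lr=0.02, seed=0):
    rng = random.Random(seed)
    weights = [0.0, 0.0]  # [weight on the cut feature, bias]
    baseline = 0.0        # running average reward, used to reduce variance
    for _ in range(num_episodes):
        # Each episode draws fresh candidate cuts: [feature, constant 1 for bias].
        cuts = [[rng.uniform(-1.0, 1.0), 1.0] for _ in range(num_cuts)]
        actions = sample_actions(weights, cuts, rng)
        reward = toy_reward(cuts, actions)
        grad = reinforce_gradient(weights, cuts, actions, reward - baseline)
        weights = [w + lr * g for w, g in zip(weights, grad)]  # gradient ascent
        baseline += 0.05 * (reward - baseline)
    return weights


weights = train()
```

After training, `keep_probability(weights, [0.9, 1.0])` should exceed `keep_probability(weights, [-0.9, 1.0])`, meaning the policy has learned to favor the cuts that the synthetic reward treats as useful. In a real setting the reward would come from the behavior of the Benders iterations (for example, solve time or iteration count), which is outside the scope of this sketch.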