DSpace at EWHA: Multi-period, multi-product inventory management model with budget constraints using modified Q-Learning

Title: 수정된 Q-Learning을 사용한 예산제약이 있는 다기간 다제품 재고관리 모형

Other Titles: Multi-period, multi-product inventory management model with budget constraints using modified Q-Learning

Authors: 박나희

Issue Date: 2022

Department/Major: Graduate School, Interdisciplinary Program in Big Data Analytics

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 민대기

Abstract: In today's supply chain environment, where demand volatility and uncertainty have grown, inventory management models that find the optimal order quantity to sustain a company's profitability have been studied continuously. This thesis addresses a multi-period, multi-product Newsvendor inventory model with budget constraints. The model assumes a single agent, a retailer, whose inventory consists of two products, A and B. The objective is to minimize total inventory cost, including overstock cost and shortage cost, and the order quantities of both products must satisfy the budget constraint. The problem is formulated as a constrained Markov decision process (MDP), and Q-Learning, a reinforcement learning technique, is used to overcome difficulties such as the curse of dimensionality that arise when solving the MDP for an optimal solution. Because the budget constraint enters the learning process as a constraint on actions, the agent can generate infeasible solutions; to address this, a 'modified Q-Learning model' is used that corrects infeasible solutions into feasible ones with a mathematical optimization model. The penalty and incentive are obtained by solving a secondary optimization problem embedded in the learning procedure. Two types of budget constraint are considered, distinguished by how the budget is used over the planning horizon: the periodic budget constraint and the flexible budget constraint. Under the periodic budget constraint, a fixed budget is given every week of the planning horizon; this is the type studied by LAU, XIA JIUN (2021). This thesis proposes the flexible budget constraint model, in which the total budget may be used freely within the horizon and the decision in each period minimizes total cost. In the numerical analysis, the proposed Q-Learning method with the flexible budget constraint is evaluated against the periodic budget constraint model, a Q-Learning model without budget constraints, a Q-Learning model using a heuristic, and the EOQ model. The results show that the proposed model lowers total inventory cost while raising the probability of satisfying the budget constraint, and that as demand volatility increases, both total inventory cost and the size of budget constraint violations increase in the periodic and the flexible budget constraint models alike. Numerical experiments with real demand data further verify the feasibility of applying the flexible budget constraint model in practical business settings.

This paper deals with the multi-period, multi-product Newsvendor problem with budget constraints by applying Q-Learning, a reinforcement learning technique. Because the budget constraints enter the reinforcement learning process as constraints on actions, infeasible solutions can be generated; to resolve this, a modified Q-Learning model is used that corrects infeasible solutions into feasible ones with a mathematical optimization model. The model meets customer demand while minimizing total cost, including inventory holding cost and backlog cost. The problem is formulated as an MDP and solved with Q-Learning. In the learning procedure, the Q-value receives either a penalty for violating a constraint or an incentive for satisfying it, and the penalty or incentive is obtained from a secondary optimization program embedded in the learning process. Depending on how the budget is used during the planning period, the model is classified as either the Periodic budget constraint type or the Flexible budget constraint type. The Flexible budget constraint model is the model proposed in this paper. For performance comparison, the Periodic budget constraint model, Q-Learning without a budget constraint, Q-Learning with a heuristic method, and the EOQ model were tested. The experiments yielded the following results. First, the Flexible budget constraint model showed the best performance. Second, as the budget size increased, the size of the budget constraint violation decreased in both the Periodic and the Flexible budget constraint models. Third, as the volatility of demand increased, the total inventory cost and the size of the budget constraint violation increased in both models. In addition, the Flexible budget constraint model performed relatively better of the two. Finally, the feasibility of practical business application of the Flexible budget constraint model was verified through numerical experiments using actual demand data.
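
The mechanics summarized in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the cost parameters, the order bound `MAX_ORDER`, the equal split of the budget in the periodic case, the nearest-feasible-point repair rule, and the fixed `penalty`/`incentive` values are all assumptions made for illustration (the thesis obtains the penalty and incentive from an optimization problem, which is not reproduced here). All function names are hypothetical.

```python
import itertools

# All numeric values below are illustrative assumptions, not thesis data.
UNIT_COST = (2.0, 3.0)      # purchase cost per unit of products A and B
OVERAGE_COST = (1.0, 1.0)   # cost per unsold unit
SHORTAGE_COST = (4.0, 5.0)  # cost per unit of unmet demand
MAX_ORDER = 10              # upper bound on each order quantity


def inventory_cost(order, demand):
    """Overage plus shortage cost, summed over both products."""
    return sum(
        h * max(q - d, 0) + p * max(d - q, 0)
        for q, d, h, p in zip(order, demand, OVERAGE_COST, SHORTAGE_COST)
    )


def spend(order):
    """Purchase cost of an order."""
    return sum(c * q for c, q in zip(UNIT_COST, order))


def period_budget(mode, total_budget, horizon, spent_so_far):
    """Budget available in the current period.

    'periodic': an equal share every period (assumed split).
    'flexible': whatever remains of the total budget.
    """
    if mode == "periodic":
        return total_budget / horizon
    return total_budget - spent_so_far


def repair(order, budget):
    """Map an infeasible order to the closest feasible one.

    Brute-force stand-in for the optimization model: it minimizes the
    squared distance to the proposed order subject to spend <= budget.
    """
    if spend(order) <= budget:
        return tuple(order)
    feasible = [
        cand
        for cand in itertools.product(range(MAX_ORDER + 1), repeat=2)
        if spend(cand) <= budget
    ]
    return min(
        feasible,
        key=lambda cand: sum((a - b) ** 2 for a, b in zip(cand, order)),
    )


def shaped_reward(proposed, executed, demand, budget, penalty=5.0, incentive=1.0):
    """Negative cost, minus a penalty if the proposal broke the budget,
    plus an incentive if it respected it (fixed placeholder values)."""
    bonus = incentive if spend(proposed) <= budget else -penalty
    return -inventory_cost(executed, demand) + bonus


def q_update(q_table, state, action, reward, next_state, next_actions,
             alpha=0.1, gamma=0.95):
    """Standard tabular Q-Learning update on a dict-backed Q-table."""
    best_next = max(
        (q_table.get((next_state, a), 0.0) for a in next_actions),
        default=0.0,
    )
    old = q_table.get((state, action), 0.0)
    q_table[(state, action)] = old + alpha * (reward + gamma * best_next - old)
```

In this sketch the only difference between the two budget types is the value returned by `period_budget`; the repair step and the Q-Learning update are shared.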