DSpace at EWHA: A study on dynamic markdown pricing model for perishable products under online market environment



Title: 온라인 시장환경 하에서 유통기한이 존재하는 제품의 수익성 극대화를 위한 동적가격인하정책 연구

Other Titles: A study on dynamic markdown pricing model for perishable products under online market environment

Authors: 김단비

Issue Date: 2021

Department/Major: Graduate School, Interdisciplinary Program in Big Data Analytics

Publisher: The Graduate School, Ewha Womans University

Degree: Master

Advisors: 강윤철

Abstract: Prices are generally easier to change in an online market than in an offline one, so a pricing policy that maximizes profitability is needed. This holds especially for perishable products: their value tends to fall over time, and customers differ in how much they are willing to pay for the same item, so the pricing policy should reflect both. In current practice, however, a deep discount is offered only on items close to their expiration date, and most items sell at the same price regardless of remaining shelf life. When items with long and short remaining shelf life cost the same, consumers buy the longer-dated items first, which raises disposal costs and lost-sales costs. A more granular pricing policy for perishable products is needed to prevent these costs. This study proposes a dynamic markdown pricing policy that combines markdown pricing, which prices less fresh items below fresher ones according to remaining shelf life, with customized pricing that accounts for customer characteristics and inventory levels. To address the difficulty of estimating uncertain customer demand, the study formulates pricing as a sequential decision-making problem, approximates the various pricing factors in the model, and derives the pricing policy through deep reinforcement learning. Reinforcement learning makes sequential decisions by learning from past experience in an uncertain environment, and it can revise the pricing policy in real time. Most existing studies of dynamic pricing with reinforcement learning use Q-learning, whose convergence slows exponentially as the environment grows more complex because of the curse of dimensionality.
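To make the limitation of tabular Q-learning concrete, the following is a minimal sketch of the Q-learning update and of how the table grows with the inventory description. It is not the thesis's model: the state encoding (one inventory count per remaining-shelf-life bucket), the price levels in `PRICE_LEVELS`, and the hyperparameters `alpha` and `gamma` are all assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical action set: price multipliers applied to a list price.
PRICE_LEVELS = [1.0, 0.9, 0.7, 0.5]


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.95):
    """One tabular Q-learning step: Q(s,a) += alpha * (target - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in range(len(PRICE_LEVELS)))
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


def table_size(max_units_per_bucket, shelf_life_buckets,
               n_actions=len(PRICE_LEVELS)):
    """Number of (state, action) entries when each bucket holds 0..max units."""
    return (max_units_per_bucket + 1) ** shelf_life_buckets * n_actions


q = defaultdict(float)  # unseen (state, action) pairs start at 0.0
# Assumed state: units on hand for each remaining-shelf-life bucket.
new_value = q_update(q, state=(3, 2), action=1, reward=10.0, next_state=(3, 1))

print(new_value)
print(table_size(10, 2), table_size(10, 5))
```

Under these assumptions, two shelf-life buckets with up to 10 units each need 484 table entries, while five buckets need 644,204. This exponential growth is the kind of blow-up that motivates replacing the table with a neural network approximator.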
To overcome this limitation of Q-learning, this thesis derives the optimal pricing policy with Deep Q-Networks (DQN), a deep reinforcement learning method. The proposed method is compared against two baselines: two-period pricing, a traditional policy that offers a deep discount only on items near expiration, and random markdown pricing. Applying reinforcement learning to the dynamic pricing of perishable products yielded a meaningful increase in revenue over these existing methods. A sensitivity analysis based on total reward further shows that the DQN-based policy becomes more useful as customer demand uncertainty and the complexity of inventory states increase. To examine how the DQN model arrived at its pricing policy, the sales increase (%) and cost decrease (%) of each model were also analyzed. The analysis shows that the reinforcement learning model set prices with regard to both the cost of holding the entire inventory and the cost of holding less fresh items, and that it reduced opportunity costs by pricing appropriately for each customer type. It also shows that the model did not simply pursue a low-price policy to reduce opportunity costs; it learned to increase total reward by taking inventory states and customer types into account. The contribution of this thesis is to apply reinforcement learning to the dynamic pricing of perishable products and to obtain optimal prices by learning from purchase data generated in real time through simulation, without defining the uncertain customer demand in advance. In addition, the reinforcement learning model outperformed the traditional pricing policies under varied conditions such as fluctuations in customer demand and changes in inventory replenishment rules, which suggests that reinforcement-learning-based pricing can deliver significant revenue growth in realistic, highly uncertain settings.
Because customers' log data and purchase data are recorded in real time in online markets, dynamic pricing based on reinforcement learning is expected to be highly applicable in practice.
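The two baseline policies named in the abstract, and the general idea of a freshness-based markdown, can be sketched as follows. This is an illustration under stated assumptions only: `BASE_PRICE`, `SHELF_LIFE`, the discount levels, the expiry threshold, and the linear markdown rule are hypothetical, and in the thesis the price is chosen by a trained DQN agent, not by a fixed formula.

```python
import random

BASE_PRICE = 100.0  # hypothetical list price
SHELF_LIFE = 5      # hypothetical shelf life in days


def two_period_price(days_left, threshold=1, discount=0.5):
    """Full price until the item is within `threshold` days of expiry,
    then a single deep discount."""
    if days_left <= threshold:
        return BASE_PRICE * (1 - discount)
    return BASE_PRICE


def random_markdown_price(rng, discounts=(0.0, 0.1, 0.3, 0.5)):
    """Pick a discount level uniformly at random, ignoring freshness."""
    return BASE_PRICE * (1 - rng.choice(discounts))


def freshness_markdown_price(days_left, max_discount=0.5):
    """Discount grows linearly as remaining shelf life shrinks
    (an illustrative rule, not the learned policy)."""
    freshness = days_left / SHELF_LIFE
    return BASE_PRICE * (1 - max_discount * (1 - freshness))


rng = random.Random(0)  # seeded so repeated runs print the same prices
for days_left in range(SHELF_LIFE, 0, -1):
    print(days_left,
          two_period_price(days_left),
          round(freshness_markdown_price(days_left), 2),
          random_markdown_price(rng))
```

The two-period rule leaves every item at full price until the last day, which is the situation the abstract describes as pushing customers toward the longer-dated items. The graded rule lowers the price step by step as freshness falls.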