Citation: CHENG Zhe, WEI Lei, CHENG Jun-sheng, HU Niao-qing. Deep Reinforcement Learning Gearbox Intelligent Fault Diagnosis Method Based on Actor-critic Structure[J]. Failure Analysis and Prevention, 2023, 18(3): 141-148, 200. DOI: 10.3969/j.issn.1673-6214.2023.03.001

    Deep Reinforcement Learning Gearbox Intelligent Fault Diagnosis Method Based on Actor-critic Structure

    • Abstract: Rotating machinery operates in a healthy state most of the time, and sufficient fault data are difficult to obtain, so historical monitoring data are heavily skewed toward the healthy state. Under such imbalanced sample conditions, the diagnostic accuracy of deep-learning-based fault diagnosis methods degrades severely. This study combines a reinforcement learning framework with deep learning algorithms and proposes an intelligent gearbox fault diagnosis method based on deep reinforcement learning with an actor-critic structure. The agent takes the raw vibration signal as input, and the Jensen-Shannon (JS) divergence between the agent's output probability distribution and the one-hot encoding of the true label serves as a continuous reward function. The imbalance ratio is used as a baseline to increase the reward when the agent correctly identifies a fault sample. An exploration strategy is also designed so that the agent explores the state space as much as possible early in training and gradually converges late in training. Experiments on the PHM2009 dataset show that, when the imbalance ratio between healthy and fault samples is 10, the proposed method achieves an average recognition accuracy of 99% under three working conditions, which is 37%~49% higher than the other diagnosis methods compared.
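The reward described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so everything below is an assumption made for illustration: the JS divergence is computed with base-2 logarithms (so it lies in [0, 1]), the reward is taken as `1 - JS` so that a closer match earns more, and the reward is multiplied by the imbalance ratio when the predicted class is correct and the true class is a fault. The names `js_divergence`, `reward`, `healthy_class`, and `imbalance_ratio` are hypothetical and do not come from the paper.

```python
import numpy as np


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (base-2 logs), bounded in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    m = 0.5 * (p + q)  # mixture distribution

    def kl(a, b):
        # Terms with a == 0 contribute 0 by convention; b > 0 wherever a > 0.
        mask = a > 0
        return float(np.sum(a[mask] * np.log2(a[mask] / b[mask])))

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def reward(probs, label, healthy_class=0, imbalance_ratio=10.0):
    """Assumed reward: 1 - JS(probs, one_hot(label)), scaled up by the
    imbalance ratio when a fault sample is classified correctly."""
    probs = np.asarray(probs, dtype=float)
    one_hot = np.zeros(len(probs))
    one_hot[label] = 1.0

    base = 1.0 - js_divergence(probs, one_hot)  # assumption: higher is better
    correct = int(np.argmax(probs)) == label
    if correct and label != healthy_class:
        base *= imbalance_ratio  # assumption: multiplicative boost for faults
    return base
```

Under these assumptions, a confident correct prediction on a fault sample earns a reward ten times larger than the same prediction on a healthy sample, which is one way to counteract a 10:1 imbalance between healthy and fault data.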

       
