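[Keywords]
[Abstract]
Objective: To address the class imbalance in adverse reaction data for traditional Chinese medicine (TCM) by exploring and applying methods for handling imbalanced data, and to predict adverse reactions to TCM. The study population comprised patients treated with Danhong injection; centralized monitoring data from 37 hospitals were mined to predict whether an adverse reaction occurred in these patients. Methods: Four data-level methods were used (no processing, random undersampling, random oversampling, and SMOTE sampling), together with four algorithm-level models or algorithms (decision tree, random forest, AdaBoost, and GradientBoosting). The two levels were paired with each other, and the prediction performance of the resulting 16 combinations was compared. Results: Random undersampling combined with AdaBoost and random undersampling combined with GradientBoosting gave the best predictions, with recall and G-mean both above 80% and an AUC as high as 0.86. Conclusion: This preliminary exploration identifies imbalanced-data processing methods that may be suitable for TCM adverse reaction data. Combined with practical experience, the predictions indicate fairly accurately whether a patient treated with Danhong injection will experience an adverse reaction, and can serve as a warning in clinical practice. In addition, the variable-importance ranking output by the models can help avoid adverse reactions after medication as far as possible, providing a scientific reference for the safety re-evaluation of Danhong injection.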
[Keywords]
[Abstract]
To address the class imbalance in adverse reaction data for traditional Chinese medicine (TCM), this paper explores and applies methods for handling imbalanced data in order to predict adverse reactions to TCM. The study population comprised patients treated with Danhong (DH) injection; centralized monitoring data from 37 hospitals were mined to predict whether an adverse reaction occurred in these patients. Four data-level approaches (no processing, random undersampling, random oversampling, and SMOTE) were paired with four algorithm-level approaches (decision tree, random forest, AdaBoost, and Gradient Boosting), and the prediction performance of the resulting 16 combinations was compared. Two combinations, random undersampling with AdaBoost and random undersampling with Gradient Boosting, performed better than the others: their recall and G-mean both exceeded 80%, and their AUC reached 0.86. This preliminary exploration identifies imbalanced-data processing methods that may be suitable for TCM adverse reaction data. Combined with practical experience, the predictions indicate fairly accurately whether a patient treated with DH injection will experience an adverse reaction, and can serve as a warning in clinical practice. In addition, the variable-importance ranking output by the models can help avoid adverse reactions after medication as far as possible, providing a scientific reference for the safety re-evaluation of DH injection.
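As a hedged illustration of the data-level step and the reported metrics, the sketch below shows random undersampling and the computation of recall and G-mean (the geometric mean of recall and specificity) in plain Python. The function names, the 0/1 label coding, the choice to undersample to equal class counts, and the synthetic toy data are all assumptions made for illustration, not the authors' implementation. The classifier itself (for example AdaBoost or Gradient Boosting) is omitted; it would be fit on the balanced training set.

```python
import math
import random


def random_undersample(X, y, seed=0):
    """Randomly drop majority-class rows until both classes are equal in size.

    Assumption: binary labels, 1 = adverse reaction, 0 = no adverse reaction.
    """
    rng = random.Random(seed)
    pos = [i for i, label in enumerate(y) if label == 1]
    neg = [i for i, label in enumerate(y) if label == 0]
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random majority subset.
    kept = sorted(minority + rng.sample(majority, len(minority)))
    return [X[i] for i in kept], [y[i] for i in kept]


def recall_and_g_mean(y_true, y_pred):
    """Return (recall, G-mean), where G-mean = sqrt(recall * specificity)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return recall, math.sqrt(recall * specificity)


# Synthetic toy data (hypothetical): 5 positives among 100 rows.
X_toy = [[i] for i in range(100)]
y_toy = [1] * 5 + [0] * 95
X_bal, y_bal = random_undersample(X_toy, y_toy, seed=42)
print(len(y_bal), sum(y_bal))
```

Undersampling should be applied to the training split only, so that recall, G-mean, and AUC are measured on data that keeps the original class ratio.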
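[Chinese Library Classification number]
[Funding]
Young Scientists Fund of the National Natural Science Foundation of China (81502898): Research on causal graphical models for large-scale observational medical data, principal investigator: Yang Wei (杨伟); sub-project of the Major New Drug Development special program (2015ZX09501004-001-007): Safety monitoring research on oral TCM preparations requiring long-term clinical use, principal investigator: Li Xuelin (李学林).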