[关键词]
[摘要]
目的 针对当前中医药临床数据发布方法在保护患者隐私时未考虑信息损失而导致数据在后续的分析处理中不可用的问题,提出一种面向中医临床数据发布的隐私保护算法。方法 采用基于聚类的个性化K-匿名算法对原始数据集的准标识符属性,按照用户自定义的个性化泛化树进行泛化,并将泛化后的记录聚类成满足K-匿名约束的等价类,在实现患者隐私保护的同时减少信息损失。结果 应用该算法实现了中医电子病历安全发布系统,系统运行结果表明数据发布结果满足匿名约束,可以在10秒内对小于6000条记录的数据集完成匿名处理,匿名后的数据可用性较高。结论 该算法具有较高的可行性和有效性,在功能上可以满足发布过程的隐私保护和处理过程的数据可用性,在性能上可以满足业务系统的实际需求。
[Keywords]
[Abstract]
Objective To propose a privacy-preserving algorithm for releasing traditional Chinese medicine (TCM) clinical data, addressing the problem that current TCM clinical data release methods protect patient privacy without accounting for information loss, which leaves the released data unusable for subsequent analysis and processing. Methods A clustering-based personalized K-anonymity algorithm was used to generalize the quasi-identifier attributes of the original data set according to user-defined personalized generalization trees; the generalized records were then clustered into equivalence classes that satisfy the K-anonymity constraint, reducing information loss while protecting patient privacy. Results The algorithm was used to implement a secure release system for TCM electronic medical records. System runs showed that the released data satisfied the anonymity constraint, that data sets of fewer than 6000 records were anonymized within 10 seconds, and that the anonymized data retained high utility. Conclusion The algorithm is feasible and effective. Functionally, it provides privacy protection during release and preserves data utility for subsequent processing; in terms of performance, it meets the practical needs of business systems.
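The two ideas named in the Methods, generalizing quasi-identifiers along a generalization tree and checking the K-anonymity constraint on the resulting equivalence classes, can be illustrated with a minimal Python sketch. It is not the clustering algorithm of this paper: the generalization levels are chosen by hand rather than by clustering, and the attribute names, the 10-year age bands, the sample records, and all function names are assumptions made for illustration only.

```python
from collections import Counter

# Hypothetical generalization tree for "age":
# level 0 = exact value, level 1 = 10-year band, level 2 = suppressed.
def generalize_age(age: int, level: int) -> str:
    if level == 0:
        return str(age)
    if level == 1:
        low = age // 10 * 10
        return f"{low}-{low + 9}"
    return "*"

# Hypothetical generalization tree for "sex": level 0 = exact, otherwise suppressed.
def generalize_sex(sex: str, level: int) -> str:
    return sex if level == 0 else "*"

def generalize(records, levels):
    """Generalize the quasi-identifiers; the sensitive attribute is left unchanged."""
    return [
        {
            "age": generalize_age(r["age"], levels["age"]),
            "sex": generalize_sex(r["sex"], levels["sex"]),
            "diagnosis": r["diagnosis"],
        }
        for r in records
    ]

def equivalence_class_sizes(records, quasi_identifiers):
    """Count the records that share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)

def is_k_anonymous(records, quasi_identifiers, k):
    """True if every equivalence class contains at least k records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    return all(n >= k for n in sizes.values())

# Made-up sample records, for illustration only.
records = [
    {"age": 34, "sex": "F", "diagnosis": "spleen qi deficiency"},
    {"age": 37, "sex": "F", "diagnosis": "liver qi stagnation"},
    {"age": 52, "sex": "M", "diagnosis": "kidney yang deficiency"},
    {"age": 58, "sex": "M", "diagnosis": "phlegm-dampness"},
]
QI = ("age", "sex")

raw = generalize(records, {"age": 0, "sex": 0})     # no generalization
banded = generalize(records, {"age": 1, "sex": 0})  # ages in 10-year bands

print(is_k_anonymous(raw, QI, 2))     # every exact age is unique
print(is_k_anonymous(banded, QI, 2))  # two records per (age band, sex) class
```

In this sketch, raising the age attribute by one level is enough to reach 2-anonymity, whereas suppressing it entirely would discard more information than needed; trading off the anonymity constraint against information loss in this way is what the abstract describes the clustering step as doing.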
[CLC number]
R289.9
[Funding]