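[Keywords]
[Abstract]
Objective: Most existing knowledge extraction methods target English text, yet the volume of Chinese medical literature is growing rapidly, and the ancient books of traditional Chinese medicine (TCM) contain much valuable knowledge that remains to be captured. Taking the disease uterine bleeding (崩漏) as an example, this study uses regular expressions as rules to extract disease-related knowledge from ancient TCM books and to construct a semantic framework of TCM disease knowledge. Methods: Regular expressions were built for the equivalence, causal, and therapeutic relations associated with uterine bleeding, and a knowledge extraction and visualization platform driven by these regular-expression rules was then established. Results: The knowledge framework related to uterine bleeding was extracted and represented. A semantic framework of TCM diseases was constructed through both manual extraction and extraction by the computer platform, and on this basis a description of the knowledge framework for TCM diseases was completed. Conclusion: The regular-expression-based knowledge extraction and visualization platform can extract and represent the knowledge framework related to uterine bleeding. It offers a method for the logical description of TCM disease knowledge and for its future extraction and application, and lays a foundation for describing the knowledge framework of TCM diseases. However, information extraction based on regular expressions alone struggles to achieve good recall. Taking discourse structure into account on top of the regular-expression rules, and integrating a hybrid information extraction approach that combines machine learning with semantic annotation, may further improve extraction performance.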
[Keywords]
[Abstract]
At present, most knowledge extraction methods target English literature. Given the rapid growth of Chinese medical literature and the wealth of valuable knowledge in ancient books, this study describes a semantic network of diseases in traditional Chinese medicine (TCM) built on regular-expression rules. Uterine bleeding in TCM was taken as an example. Regular expressions for semantic relations, including equivalence, causal, and therapeutic relationships, were constructed before a visualization platform for knowledge extraction was established. As a result, a framework of diseases in TCM was formed after knowledge extraction for uterine bleeding was completed. The extraction was implemented with regular-expression rules and lays a foundation for the application of knowledge extraction in TCM. However, recall was unsatisfactory when regular expressions alone were used to build the knowledge framework for uterine bleeding and other diseases. In further work, the efficiency of knowledge extraction might be improved by a hybrid information extraction method that includes machine learning and semantic annotation.
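As a rough illustration of what rule-based extraction of the three relation types might look like, the following Python sketch builds one regular expression per relation and returns (disease, relation, argument) triples. The trigger words, the function names `build_patterns` and `extract`, the relation labels, and the sample sentences are all assumptions made for this example; they are not the rules or the platform used in the study.

```python
import re

# Characters that cannot appear inside an extracted argument
# (common Chinese and ASCII clause delimiters, plus whitespace).
SEG = r"[^,。;,;\s]+?"


def build_patterns(disease):
    """Return one hypothetical regex per relation type for the given disease."""
    d = re.escape(disease)
    return {
        # "<disease> 又名/亦名/一名 <alias>" up to the next delimiter
        "equivalent_to": re.compile(
            rf"{d}(?:又名|亦名|一名)(?P<arg>{SEG})(?=[,。;,;]|$)"
        ),
        # "<disease> (多)因/由 <cause> 所致/而致"
        "caused_by": re.compile(
            rf"{d}多?(?:因|由)(?P<arg>{SEG})(?:所致|而致)"
        ),
        # "<remedy> 主治/治 <disease>" at the start of a clause
        "treated_by": re.compile(
            rf"(?:^|[,。;,;])(?P<arg>{SEG})(?:主治|治){d}"
        ),
    }


def extract(text, disease="崩漏"):
    """Apply every relation pattern and collect (disease, relation, argument) triples."""
    triples = []
    for relation, pattern in build_patterns(disease).items():
        for match in pattern.finditer(text):
            triples.append((disease, relation, match.group("arg")))
    return triples


if __name__ == "__main__":
    # Invented sentences for illustration only, not quotations from any source text.
    sample = "崩漏又名崩中漏下。崩漏多因冲任损伤所致。胶艾汤治崩漏。"
    for triple in extract(sample):
        print(triple)
```

A sentence that expresses the same relation with different wording, such as the invented "冲任不固,遂成崩漏。", yields no triples under these patterns, which is one way to picture the recall limitation noted in the abstract.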
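[CLC number]
[Funding]
National Natural Science Foundation of China, Young Scientists Fund (81202758): Research on the representation and application of the traditional acupuncture and moxibustion concept system based on semantic networks, principal investigator: Zhu Ling (朱玲); National Natural Science Foundation of China, Young Scientists Fund (81403491): Research on constructing a knowledge framework for scattered ancient acupuncture and moxibustion knowledge based on semantic similarity, principal investigator: Yang Feng (杨峰).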