Frontiers of Medicine

ISSN 2095-0217

ISSN 2095-0225(Online)

CN 11-5983/R

Postal Subscription Code 80-967

2018 Impact Factor: 1.847

Front. Med. 2014, Vol. 8, Issue 3: 347-351. DOI: 10.1007/s11684-014-0361-z
Extracting terms from clinical records of traditional Chinese medicine
Cungen Cao1,*, Meng Sun2, Shi Wang1
1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China
2. Software College of Beihang University, Beijing 100191, China

Health records of traditional Chinese medicine contain valuable clinical information that can be used to improve disease treatment and to support medical research. In this paper, we present a practical iterative method for extracting terms from these records. The method is based on a set of extraction rules, the Mesh, and the likelihood ratio technique, and it achieved a precision of 88.18% and a recall of 94.21%.
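The abstract names the likelihood ratio technique without showing how it is computed. As a rough illustration only, the sketch below implements the standard log-likelihood ratio (G²) statistic for a 2x2 contingency table, the form commonly attributed to Dunning. How the authors actually apply the statistic (which units are counted, which threshold is used) is not stated in this section, so the function name, the argument layout, and the counts used with it are assumptions.

```python
import math


def _xlogx(x: float) -> float:
    """Return x * ln(x), with the usual convention that 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 contingency table of co-occurrence counts.

    Hypothetical layout (an assumption, not taken from the paper):
      k11: unit A followed by unit B
      k12: unit A followed by something other than B
      k21: something other than A followed by B
      k22: neither A nor B
    """
    n = k11 + k12 + k21 + k22
    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    # G^2 = 2 * sum(O * ln(O / E)), expanded into sums of x*ln(x) terms.
    return 2.0 * (cells + _xlogx(n) - rows - cols)
```

A score near zero means the two units co-occur about as often as chance predicts. Larger scores indicate stronger association, which is why the statistic is often used to decide whether adjacent units should be merged into one candidate term.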

Keywords: term extraction; rule-based; likelihood ratio
Corresponding Authors: Cungen Cao   
Online First Date: 27 August 2014    Issue Date: 09 October 2014
| Procedure | Terms found | Percentage of total | Correct terms | Precision | Examples |
| --- | --- | --- | --- | --- | --- |
| Preprocessing | 155 | 38.18% | 145 | 93.55% | 腹部毕满 (abdominal distention); 口臭纳差 (ozostomia and anorexia); 脑血栓形成 (cerebral thrombosis) |
| 1st iteration | 127 | 31.28% | 113 | 88.98% | 舌淡胖 (pale and enlarged tongue); 脉细弱 (thready and weak pulse); 素患高血压 (hypertension all along); 进食发呛 (choking when eating); 胸痛偏左 (pain on left chest); 下肢不能活动 (lower limbs cannot move) |
| 2nd iteration | 84 | 20.69% | 68 | 80.95% | 神识模糊 (coma); 腰腿酸软 (debility of the loins and legs); 脉濡或沉滑 (soft, deep and slippery pulse); 舌尖边有瘀斑瘀点 (tip of tongue with ecchymosis); 没有表情 (no expression) |
| 3rd iteration | 10 | 2.46% | 7 | 70.00% | 脉两寸、两尺均无力 (weak cun-pulse and chi-pulse); 不省人事 (unconsciousness); 下床欲解大便 (want to get out of bed to defecate); 时有憋气 (feeling suffocated sometimes) |
| Result optimization | 30 | 7.39% | 25 | 83.33% | See Table 3 |

Tab.1  Precision of extraction via different iterations
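The per-stage counts in Table 1 can be tied back to the overall precision quoted in the abstract. The short sketch below recomputes the totals from the "Terms found" and "Correct terms" columns. The variable names are illustrative, and the numbers are copied from the table rather than newly measured.

```python
# (terms found, correct terms) per stage, copied from Table 1.
stages = {
    "Preprocessing": (155, 145),
    "1st iteration": (127, 113),
    "2nd iteration": (84, 68),
    "3rd iteration": (10, 7),
    "Result optimization": (30, 25),
}

total_found = sum(found for found, _ in stages.values())
total_correct = sum(correct for _, correct in stages.values())

# Precision = correct terms / terms found, reported as a percentage.
per_stage_precision = {
    name: round(100 * correct / found, 2)
    for name, (found, correct) in stages.items()
}
overall_precision = round(100 * total_correct / total_found, 2)

print(total_found, total_correct, overall_precision)  # 406 358 88.18
```

The overall figure of 88.18% agrees with the precision stated in the abstract. Recall cannot be recomputed from this table alone, because it also depends on the number of terms that were never extracted.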
| Body word | Source |
| --- | --- |
| 语 (speech) | 失语 (aphasia) |
| 步履 (walk) | 步履困难 (walks with difficulty) |
| 治疗 (treatment) | 治疗病情 (treatment of the disease) |
| 气 (Qi) | 气短 (shortness of breath) |
| 苔 (tongue fur) | 少苔 (little tongue fur) |
| 物 (thing) | 视物不清 (blurred vision) |
| 言语 (speech) | 言语尚清 (clear speech) |
| 寝 (sleep) | 寐寝不安 (sleep disorder) |
| 小时 (hour) | 两小时 (two hours) |

Tab.2  Some body structure words found by descriptive words
| Rule type | Examples |
| --- | --- |
| Improved by covering | 苔黄腻而干 (dry tongue with yellow and greasy fur); 狂躁呼叫 (manic shout); 两小时后软瘫不用 (flaccid paralysis after two hours); 无痒痛感 (no itching feeling); 口溢稀涎 (salivation of the labial angle); 中风 (stroke); 左侧偏瘫 (hemiplegia of left body); 语强 (stammering); 自汗 (spontaneous perspiration); 尚能行走 (still can walk); 呼吸不规灼手 (irregular breathing burning hand) |
| Edit distance | 语言蹇塞 (stammering); 语言蹇涩 (stammering) |

Tab.3  Some results of result optimization
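Table 3 lists an edit distance rule that pairs 语言蹇塞 with 语言蹇涩, two spellings that differ in a single character. This section does not say which edit distance variant or threshold the authors use. The sketch below assumes plain Levenshtein distance over characters, and the function name is illustrative.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, and substitutions needed to turn a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(
                prev[j] + 1,         # delete ch_a
                curr[j - 1] + 1,     # insert ch_b
                prev[j - 1] + cost,  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


print(edit_distance("语言蹇塞", "语言蹇涩"))  # 1
```

A distance of 1 between two four-character candidates suggests they are variants of one term, so a rule of this kind could merge or cross-check them during result optimization.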
呱声大响 (loud quack)
夜寐不宁 (sleep disorder)
弛缓性瘫痪 (flaccid paralysis)
昏仆 (faint)
急躁易怒 (impatient and irritable)
步履艰难 (walks with difficulty)
讲话欠利 (stammering)
夜寐欠佳 (not sleeping well)
瞳孔尚等大等圆 (equal pupils)
不能站立 (cannot stand)
对光反射迟钝 (dull reflection to light)
舌强语睿 (stiffness of tongue and stammering)
痰声渡渡 (phlegm sound)
痰声浓液 (phlegm sound like concentrated liquid)
血压急剧升高 (sharp rise in blood pressure)
脘胀泛恶 (gastric cavity swelling and nausea)

Tab.4  Undiscovered terms