Extracting terms from clinical records of traditional Chinese medicine
Cungen Cao1,*, Meng Sun2, Shi Wang1
1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China 2. Software College of Beihang University, Beijing 100191, China
Health records of traditional Chinese medicine contain valuable clinical information that can be used to improve disease treatment and to support medical research. In this paper, we present a practical iterative method for extracting terms from these records. The method is based on a set of extraction rules, the Mesh, and the likelihood ratio technique, and it achieved a precision of 88.18% and a recall of 94.21%.
Cungen Cao, Meng Sun, Shi Wang. Extracting terms from clinical records of traditional Chinese medicine. Front. Med., 2014, 8(3): 347-351.
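The abstract names the likelihood ratio technique without spelling it out. As a minimal sketch, assuming the widely used log-likelihood ratio (G^2) statistic over a 2x2 table of co-occurrence counts, the association between two adjacent text units could be scored as follows. The function names and the 2x2 framing are illustrative assumptions, not the authors' implementation.

```python
import math


def _term(observed: float, expected: float) -> float:
    """Return observed * ln(observed / expected), treating 0 * ln(0) as 0."""
    return observed * math.log(observed / expected) if observed > 0 else 0.0


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """G^2 statistic for a 2x2 contingency table (hypothetical helper).

    k11: A followed by B        k12: A followed by something else
    k21: something else then B  k22: neither A nor B
    """
    total = k11 + k12 + k21 + k22
    if total == 0:
        raise ValueError("contingency table is empty")
    row1, row2 = k11 + k12, k21 + k22
    col1, col2 = k11 + k21, k12 + k22
    # Pair each observed count with its expected count under independence.
    cells = [
        (k11, row1 * col1 / total),
        (k12, row1 * col2 / total),
        (k21, row2 * col1 / total),
        (k22, row2 * col2 / total),
    ]
    return 2.0 * sum(_term(obs, exp) for obs, exp in cells)
```

A higher score suggests that the two units co-occur more often than chance would predict, which is why such a statistic is commonly used to decide whether a candidate string behaves as a single term.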
脉两寸、两尺均无力(weak cun-pulse and chi-pulse); 不省人事(unconsciousness)
下床欲解大便(wants to get out of bed to defecate)
时有憋气(sometimes feels suffocated)
Result optimization | 30 | 7.39% | 25 | 83.33% | See Table 3
Total | 406 | - | 358 | 88.18% | -
Tab.1
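The column headers of this table are not present here. Assuming the numeric columns are, in order, the number of extracted terms, their share of all extracted terms, the number of correct terms, and the precision, the surviving figures are arithmetically consistent. The short check below is only a sketch under that assumed reading; `percent` is a hypothetical helper.

```python
def percent(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage rounded to 2 decimals."""
    return round(100.0 * numerator / denominator, 2)


# Assumed reading of the "Total" row: 358 correct out of 406 extracted terms.
overall_precision = percent(358, 406)

# Assumed reading of the "Result optimization" row:
# 30 terms extracted in this step, 25 of them correct.
step_share = percent(30, 406)
step_precision = percent(25, 30)

print(overall_precision, step_share, step_precision)
```

Under this reading the computed values match the 88.18%, 7.39%, and 83.33% figures in the table.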
Body word | Source | Body word | Source | Body word | Source
语(speech) | 失语(aphasia) | 步履(walk) | 步履困难(walks with difficulty) | 治疗(treatment) | 治疗病情(treatment of the disease)
气(Qi) | 气短(shortness of breath) | 苔(tongue fur) | 少苔(little tongue fur) | 物(thing) | 视物不清(blurred vision)
呱(quack) | 呱逆(-) | 院(hospital) | 出院(discharging) | 人事(-) | 人事不省(unconsciousness)
言语(speech) | 言语尚清(clear speech) | 寝(sleep) | 寐寝不安(sleep disorder) | 小时(hour) | 两小时(two hours)
Tab.2
Rule’s type | Examples
Improved by covering | 苔黄腻而干(dry tongue with yellow and greasy fur); 狂躁呼叫(manic shout); 两小时后软瘫不用(flaccid paralysis after two hours); 无痒痛感(no itching feeling); 口溢稀涎(salivation of the labial angle); 中风(stroke); 左侧偏瘫(hemiplegia of left body); 语强(stammering); 自汗(spontaneous perspiration); 尚能行走(still can walk); 呼吸不规灼手(irregular breathing burning hand); …
Frequency | 语言謇涩(stammering)
Edit distance | 语言蹇塞(stammering); 语言蹇涩(stammering)
Tab.3
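The table lists edit distance as one rule type, with near-identical spellings of the same symptom as examples. How the authors apply it is not stated here. As a hedged illustration, the standard Levenshtein distance can flag such character-level variants. Both `edit_distance` and `near_variants`, and the threshold of 1, are assumptions made for this sketch.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or match
                )
            )
        previous = current
    return previous[-1]


def near_variants(term: str, candidates: list[str], max_distance: int = 1) -> list[str]:
    """Return candidates that differ from term by 1..max_distance edits."""
    return [c for c in candidates if 0 < edit_distance(term, c) <= max_distance]


variants = near_variants("语言蹇涩", ["语言謇涩", "语言蹇塞", "左侧偏瘫"])
```

Here both 语言謇涩 and 语言蹇塞 are one substitution away from 语言蹇涩, so they are returned as likely variants, while the unrelated 左侧偏瘫 is not.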
白内障(cataract)
卧床不起(bedridden)
呱声大响(loud quack)
夜寐不宁(sleep disorder)
弛缓性瘫痪(flaccid paralysis)
昏仆(faint)
昏昏酣睡(sleepy)
昼夜不寐(insomnia)
急躁易怒(impatient and irritable)
步履艰难(walks with difficulty)
讲话欠利(stammering)
夜寐欠佳(not sleep well)
瞳孔尚等大等圆(equal pupil)
不能站立(cannot stand)
对光反射迟钝(dull reflection to light)
舌强语睿(stiffness of tongue and stammering)
痰声渡渡(phlegm sound)
痰声浓液(phlegm sound like concentrated liquid)
血压急剧升高(sharp rise in blood pressure)
脘胀泛恶(gastric cavity swelling and nausea)
Tab.4