Entity attribute discovery and clustering from online reviews
Qingliang MIAO1,*, Qiudan LI2, Daniel ZENG2, Yao MENG1, Shu ZHANG1, Hao YU3
1. Fujitsu Research & Development Center, Beijing 100025, China
2. Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
3. Ricoh Software Research Center (Beijing) Co., Ltd., Beijing 100082, China
Abstract The rapid growth of user-generated content (UGC) provides a rich source for reputation management of entities, products, and services. Taking online product reviews as a concrete example, customers usually give opinions on multiple attributes of a product; the challenge is therefore to automatically extract and cluster the attributes that are mentioned. In this paper, we investigate efficient attribute extraction models using a semi-supervised approach. Specifically, we formulate attribute extraction as a sequence labeling task and design a bootstrapping scheme that trains the extraction models from a small quantity of labeled reviews and a larger number of unlabeled reviews. In addition, we propose a clustering by committee (CBC) approach to cluster attributes according to their semantic similarity. Experimental results on real-world datasets show that the proposed approach is effective.
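To make the sequence labeling and bootstrapping ideas concrete, the following is a minimal sketch rather than the authors' method. It assumes a BIO tag scheme (B = first token of an attribute, I = continuation, O = other), which the abstract does not specify, and it swaps in a toy lexicon-based tagger (the hypothetical `LexiconTagger`) for the real extraction model. The names `decode_bio`, `bootstrap`, the confidence measure, and the threshold are all illustrative assumptions.

```python
from collections import Counter


def decode_bio(tokens, tags):
    """Collect attribute phrases from BIO tags."""
    spans, current = [], []
    for token, tag in zip(tokens, tags):
        if tag == "B":
            if current:
                spans.append(" ".join(current))
            current = [token]
        elif tag == "I" and current:
            current.append(token)
        else:
            if current:
                spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


class LexiconTagger:
    """Toy stand-in for a sequence model (assumption, not from the paper)."""

    def fit(self, labeled):
        self.attr_counts, self.totals = Counter(), Counter()
        for tokens, tags in labeled:
            for token, tag in zip(tokens, tags):
                self.totals[token] += 1
                if tag != "O":
                    self.attr_counts[token] += 1
        return self

    def predict(self, tokens):
        """Return (tags, confidence); confidence is the share of known tokens."""
        tags, known, prev_attr = [], 0, False
        for token in tokens:
            seen = self.totals[token]
            known += seen > 0
            is_attr = seen > 0 and self.attr_counts[token] / seen > 0.5
            tags.append(("I" if prev_attr else "B") if is_attr else "O")
            prev_attr = is_attr
        return tags, known / max(len(tokens), 1)


def bootstrap(labeled, unlabeled, threshold=0.8, rounds=3):
    """Self-training: add confidently tagged unlabeled reviews, then retrain."""
    labeled, pool = list(labeled), list(unlabeled)
    model = LexiconTagger().fit(labeled)
    for _ in range(rounds):
        remaining = []
        for tokens in pool:
            tags, confidence = model.predict(tokens)
            if confidence >= threshold:
                labeled.append((tokens, tags))  # promote to training data
            else:
                remaining.append(tokens)
        if len(remaining) == len(pool):  # nothing new was added; stop early
            break
        pool = remaining
        model = LexiconTagger().fit(labeled)
    return model


# Tiny made-up example data, for illustration only.
labeled = [
    (["the", "battery", "life", "is", "great"], ["O", "B", "I", "O", "O"]),
    (["the", "screen", "is", "sharp"], ["O", "B", "O", "O"]),
]
unlabeled = [
    ["the", "battery", "is", "great"],
    ["zoom", "lens", "is", "sharp"],
]
model = bootstrap(labeled, unlabeled)
tags, confidence = model.predict(["battery", "life", "is", "great"])
print(decode_bio(["battery", "life", "is", "great"], tags))
```

Because the toy tagger only memorizes tokens, it cannot discover unseen attributes such as "zoom lens"; a model with contextual features would be needed for the bootstrapping loop to generalize beyond the seed vocabulary.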
Keywords
opinion mining
attribute extraction
attribute clustering
Corresponding Author(s): Qingliang MIAO
Issue Date: 24 June 2014