Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front Comput Sci, 2012, Vol. 6, Issue 5: 513-526. https://doi.org/10.1007/s11704-012-1093-3
RESEARCH ARTICLE
A probabilistic model with multi-dimensional features for object extraction
Jing WANG1, Zhijing LIU1, Hui ZHAO2
1. School of Computer Science and Technology, Xidian University, Xi’an 710071, China; 2. School of Electronic and Information Engineering, Xi’an Jiaotong University, Xi’an 710049, China
Abstract

To identify recruitment information across different domains, we propose a novel model, hierarchical tree-structured conditional random fields (HT-CRFs). First, we discuss the concept of a Web object (WOB) for describing specific kinds of Web information. Second, in contrast to traditional methods, we introduce the Boolean model and multiple rules to represent a one-dimensional text feature that better describes Web objects. We further develop a two-dimensional semantic texture feature that captures the layout of a WOB and emphasizes both the structural attributes and the specific semantic-term attributes of WOBs. Third, we perform optimal WOB information extraction (IE) based on HT-CRFs, which reduces the model's excessive dependence on page structure and improves the efficiency of model training. Finally, we compare the proposed model with existing decoupled approaches for WOB IE. The experimental results show that the accuracy of WOB IE is significantly improved and that its time complexity is reduced.
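The abstract only outlines the pipeline, so the following is a minimal sketch of two of its ingredients, under loudly stated assumptions; it is not the authors' implementation. It shows (1) a Boolean-model text feature, i.e. binary term-presence bits plus binary rule bits standing in for the "multi-rule" features, and (2) exact MAP decoding on a tree-structured CRF by max-sum message passing (Viterbi generalised to trees). The label set, vocabulary, rules, weights, and edge scores are all hypothetical; the two-dimensional semantic texture feature, the hierarchical layering of HT-CRF, and parameter training are not modeled.

```python
import re
from dataclasses import dataclass, field

# Hypothetical label set for nodes of a recruitment-page tree.
LABELS = ["title", "salary", "other"]

# Boolean-model text feature: one bit per vocabulary term, plus one bit per rule.
TERMS = ["engineer", "manager", "salary", "yuan"]  # hypothetical vocabulary
RULES = [
    re.compile(r"\d"),          # hypothetical rule: block contains a digit
    re.compile(r"^.{1,30}$"),   # hypothetical rule: block is short (<= 30 chars)
]

# Hypothetical per-label weights over the 6 features (4 term bits + 2 rule bits).
WEIGHTS = {
    "title":  [2.0, 2.0, -1.0, -1.0, -0.5, 1.0],
    "salary": [-1.0, -1.0, 2.0, 2.0, 1.5, 0.5],
    "other":  [0.0] * 6,
}

# Hypothetical parent-to-child label compatibility scores (log-potentials).
EDGE = {
    "title":  {"title": -1.0, "salary": -1.0, "other": 0.0},
    "salary": {"title": -1.0, "salary": -1.0, "other": 0.0},
    "other":  {"title": 0.5, "salary": 0.5, "other": 0.0},
}


@dataclass(eq=False)  # eq=False keeps identity hashing so nodes can be dict keys
class Node:
    text: str
    children: list = field(default_factory=list)


def boolean_features(text):
    """Binary vector: term-presence bits followed by rule-match bits."""
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    term_bits = [1 if term in tokens else 0 for term in TERMS]
    rule_bits = [1 if rule.search(text) else 0 for rule in RULES]
    return term_bits + rule_bits


def unary(node, label):
    """Node log-potential: weighted sum of the node's Boolean features."""
    feats = boolean_features(node.text)
    return sum(w * f for w, f in zip(WEIGHTS[label], feats))


def decode(root):
    """Exact MAP labelling of a tree by max-sum (Viterbi generalised to trees)."""
    best = {}  # node -> {label: best subtree score given this node's label}
    back = {}  # (child, parent_label) -> best label for that child

    def upward(node):
        # Leaves first, then combine child messages at the parent.
        for child in node.children:
            upward(child)
        best[node] = {}
        for y in LABELS:
            score = unary(node, y)
            for child in node.children:
                yc = max(LABELS, key=lambda c: EDGE[y][c] + best[child][c])
                back[(child, y)] = yc
                score += EDGE[y][yc] + best[child][yc]
            best[node][y] = score

    result = []

    def downward(node, y):
        # Pre-order traversal: record node, then follow back-pointers to children.
        result.append((node.text, y))
        for child in node.children:
            downward(child, back[(child, y)])

    upward(root)
    downward(root, max(LABELS, key=lambda y: best[root][y]))
    return result
```

For example, `decode(Node("Job posting", [Node("Senior software engineer"), Node("Salary: 8000 yuan per month")]))` labels the root `other`, the first child `title`, and the second child `salary`. Because the tree has no cycles, the upward pass followed by back-pointer recovery yields the exact highest-scoring joint labelling, in time linear in the number of nodes times the square of the label count.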

Keywords: feature extraction; conditional random fields (CRFs); information extraction (IE)
Corresponding Author(s): WANG Jing, Email: wangjing@mail.xidian.edu.cn
Issue Date: 01 October 2012
 Cite this article:   
Jing WANG,Zhijing LIU,Hui ZHAO. A probabilistic model with multi-dimensional features for object extraction[J]. Front Comput Sci, 2012, 6(5): 513-526.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-012-1093-3
https://academic.hep.com.cn/fcs/EN/Y2012/V6/I5/513