Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2021, Vol. 15 Issue (4) : 154325    https://doi.org/10.1007/s11704-020-9366-8
RESEARCH ARTICLE
Document structure model for survey generation using neural network
Huiyan XU1,2, Zhongqing WANG3, Yifei ZHANG3, Xiaolan WENG1,2, Zhijian WANG1, Guodong ZHOU3
1. College of Computer and Information, Hohai University, Nanjing 210098, China
2. School of Computer Science and Technology, Huaiyin Normal University, Huai’an 223300, China
3. Natural Language Processing Lab, Soochow University, Suzhou 215006, China
Abstract

Survey generation aims to generate a summary of a scientific topic from related papers. The structure of these papers strongly influences the survey generation process, especially the relationships between sentences and between paragraphs. In principle, therefore, paper structure can affect the quality of the resulting summary. We use the structure of papers to capture contextual information among the sentences in each paragraph when generating a survey. In particular, we present a neural document structure model for survey generation. We take paragraphs as units and model the sentences within each paragraph; we then employ a hierarchical model to learn the structure among sentences, which is used to select important and informative sentences for the survey. We evaluate our model on a scientific document data set. The experimental results show that our model is effective and that the generated surveys are informative and readable.
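The abstract describes a hierarchy in which sentences are modeled within paragraphs and the learned structure is then used to pick informative sentences. The following is a minimal, non-neural sketch of that paragraph-as-unit selection idea only, under loudly stated assumptions. It is not the authors' model. It replaces their neural encoder with bag-of-words vectors, builds paragraph and document vectors by averaging, and scores each sentence with a hypothetical weighted mix (`alpha`) of its cosine similarity to its own paragraph and to the whole document. The names `embed`, `select_sentences`, and `alpha` are illustrative and do not come from the paper.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple

Vector = Dict[str, float]


def embed(sentence: str) -> Vector:
    # Hypothetical stand-in for a learned sentence encoder: lowercase bag-of-words counts.
    return dict(Counter(re.findall(r"[a-z]+", sentence.lower())))


def mean(vectors: List[Vector]) -> Vector:
    # Average sparse vectors; used to pool sentences into paragraphs and paragraphs into a document.
    total: Counter = Counter()
    for v in vectors:
        total.update(v)
    n = max(len(vectors), 1)
    return {k: val / n for k, val in total.items()}


def cosine(a: Vector, b: Vector) -> float:
    # Cosine similarity between two sparse vectors (0.0 if either is empty).
    dot = sum(val * b.get(k, 0.0) for k, val in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def select_sentences(paragraphs: List[List[str]], k: int, alpha: float = 0.5) -> List[str]:
    # Level 1: sentence vectors; level 2: paragraph vectors; level 3: document vector.
    sent_vecs = [[embed(s) for s in para] for para in paragraphs]
    para_vecs = [mean(vs) for vs in sent_vecs]
    doc_vec = mean(para_vecs)
    scored: List[Tuple[float, int, int]] = []
    for p, vs in enumerate(sent_vecs):
        for i, v in enumerate(vs):
            # Mix local (paragraph) and global (document) context; alpha is an assumed weight.
            score = alpha * cosine(v, para_vecs[p]) + (1 - alpha) * cosine(v, doc_vec)
            scored.append((score, p, i))
    # Keep the k highest-scoring sentences, then restore their original document order.
    top = sorted(scored, key=lambda t: -t[0])[:k]
    return [paragraphs[p][i] for _, p, i in sorted(top, key=lambda t: (t[1], t[2]))]


if __name__ == "__main__":
    paragraphs = [
        ["Survey generation summarizes many papers.", "The weather was nice."],
        ["Neural models encode sentences in paragraphs.", "Paragraph structure helps survey generation."],
    ]
    print(select_sentences(paragraphs, k=2))
```

On this toy input, the off-topic sentence "The weather was nice." scores lowest because it shares no words with the rest of the document. Returning the selected sentences in document order rather than score order is a design choice meant to preserve the original flow for readability.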

Keywords: survey generation; contextual information; document structure
Corresponding Author(s): Zhongqing WANG   
Just Accepted Date: 08 September 2020   Issue Date: 11 March 2021
 Cite this article:   
Huiyan XU,Zhongqing WANG,Yifei ZHANG, et al. Document structure model for survey generation using neural network[J]. Front. Comput. Sci., 2021, 15(4): 154325.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-9366-8
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I4/154325