Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2017, Vol. 11 Issue (5) : 786-802    https://doi.org/10.1007/s11704-016-5442-5
REVIEW ARTICLE
Topic evolution based on the probabilistic topic model: a review
Houkui ZHOU1,3,4, Huimin YU1,2, Roland HU1
1. College of Information Science & Electronic Engineering, Zhejiang University, Hangzhou 310027, China
2. State Key Laboratory of CAD & CG, Hangzhou 310027, China
3. School of Information Engineering, Zhejiang A&F University, Hangzhou 311300, China
4. Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology, Hangzhou 311300, China
Abstract

Accurately representing the quantity and characteristics of users’ interest in certain topics is an important problem facing topic evolution researchers, particularly as it applies to modern online environments. Search engines can retrieve information on a specified topic from archived data, but they fail to reflect changes in interest toward the topic over time in a structured way. This paper reviews notable research from the past decade on topic evolution based on the probabilistic topic model, considering it from multiple aspects. First, we introduce the notation, terminology, and basic topic model explored in the survey. We then summarize three categories of topic evolution based on the probabilistic topic model: the discrete time topic evolution model, the continuous time topic evolution model, and the online topic evolution model. Next, we describe applications of the topic evolution model, attempt to summarize methods for evaluating model generalization performance and topic evolution, and provide comparative experimental results for different models. To conclude the review, we pose some open questions and discuss possible future research directions.

Keywords: topic evolution, probabilistic topic models, text corpora, evaluation method
Corresponding Author(s): Huimin YU   
Just Accepted Date: 16 June 2016   Online First Date: 07 June 2017    Issue Date: 26 September 2017
 Cite this article:   
Houkui ZHOU, Huimin YU, Roland HU. Topic evolution based on the probabilistic topic model: a review[J]. Front. Comput. Sci., 2017, 11(5): 786-802.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-016-5442-5
https://academic.hep.com.cn/fcs/EN/Y2017/V11/I5/786