Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front Comput Sci Chin    2010, Vol. 4 Issue (2) : 280-301    https://doi.org/10.1007/s11704-009-0062-y
REVIEW ARTICLE
Knowledge discovery through directed probabilistic topic models: a survey
Ali DAUD1, Juanzi LI1, Lizhu ZHOU1, Faqir MUHAMMAD2
1. Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; 2. Department of Mathematics and Statistics, Allama Iqbal Open University, Sector H-8, Islamabad 44000, Pakistan
Abstract

Graphical models have become the basic framework for topic-based probabilistic modeling. Models with latent variables in particular have proved effective at capturing hidden structure in data. In this paper, we survey an important subclass, Directed Probabilistic Topic Models (DPTMs) with soft clustering abilities, and their applications for knowledge discovery in text corpora. From an unsupervised learning perspective, “topics are semantically related probabilistic clusters of words in text corpora; and the process for finding these topics is called topic modeling”. In topic modeling, a document consists of different hidden topics, and the topic probabilities provide an explicit representation of the document that smooths the data at the semantic level. Topic modeling has been an active area of research during the last decade. Many models have been proposed to handle text corpora with different characteristics, for applications such as document classification, hidden association finding, expert finding, community discovery, and temporal trend analysis. We present the basic concepts, with advantages and disadvantages in chronological order; a classification of existing models into categories; their parameter estimation and inference algorithms; and the measures used to evaluate model performance. We also discuss applications, open challenges, and future directions in this dynamic area of research.

Keywords: text corpora, parametric Directed Probabilistic Topic Models (DPTMs), soft clustering, unsupervised learning, knowledge discovery
Corresponding Author(s): DAUD Ali, Email: ali_msdb@hotmail.com; LI Juanzi, Email: ljz@keg.cs.tsinghua.edu.cn; ZHOU Lizhu, Email: dcszlz@tsinghua.edu.cn; MUHAMMAD Faqir, Email: aioufsd@yahoo.com
Issue Date: 05 June 2010
 Cite this article:   
Ali DAUD,Juanzi LI,Faqir MUHAMMAD, et al. Knowledge discovery through directed probabilistic topic models: a survey[J]. Front Comput Sci Chin, 2010, 4(2): 280-301.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-009-0062-y
https://academic.hep.com.cn/fcs/EN/Y2010/V4/I2/280
Fig.1  Graphical models
Word (Arts) | Probability | Word (Education) | Probability
New | 0.03741 | School | 0.07344
Film | 0.03626 | Students | 0.05702
Show | 0.02753 | Schools | 0.04136
Music | 0.02151 | Education | 0.02605
Movie | 0.01854 | Teachers | 0.02465
Play | 0.01124 | High | 0.02122
Musical | 0.01109 | Public | 0.02026
Best | 0.00989 | Teacher | 0.02006
Actor | 0.00966 | Bennett | 0.01766
First | 0.00899 | Manigat | 0.01746
York | 0.00895 | Namphy | 0.01478
Opera | 0.00870 | State | 0.0143
Theater | 0.00854 | President | 0.01359
Actress | 0.00817 | Elementary | 0.01219
Love | 0.00806 | Haiti | 0.01211
Tab.1  Topic examples
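To make the reading of Tab.1 concrete, the following minimal sketch shows how topic-word probabilities combine with per-document topic proportions under the usual mixture assumption p(w|d) = Σ_z p(w|z) p(z|d). The word probabilities are copied from Tab.1 (three words per topic only). The document's topic proportions and the function name `word_probability` are hypothetical, chosen purely for illustration.

```python
# Top words and probabilities copied from Tab.1 (truncated to three per topic).
topic_word = {
    "Arts": {"new": 0.03741, "film": 0.03626, "show": 0.02753},
    "Education": {"school": 0.07344, "students": 0.05702, "schools": 0.04136},
}

# Hypothetical topic proportions p(z | d) for one document (an assumption,
# not a value reported in the survey).
doc_topic = {"Arts": 0.2, "Education": 0.8}


def word_probability(word, doc_topic, topic_word):
    """Return p(w | d) = sum over topics z of p(w | z) * p(z | d).

    Words missing from a truncated topic list are treated as probability 0,
    which is a simplification made only for this sketch.
    """
    return sum(
        p_z * topic_word[z].get(word, 0.0)
        for z, p_z in doc_topic.items()
    )


p_school = word_probability("school", doc_topic, topic_word)
p_film = word_probability("film", doc_topic, topic_word)
```

Because this hypothetical document is dominated by the Education topic, `p_school` comes out larger than `p_film`. This is the sense in which topic probabilities act as a compact, semantic-level description of a document.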
Symbol | Description
D | Number of documents
N | Number of words
T | Number of topics
A | Number of unique authors
V | Number of unique words
N_d | Number of word tokens in document d
w_d | Vector form of document d
a_d | Vector form of authors in document d
w_di | The ith word token in document d
z_di | Topic assigned to word token w_di
x_di | The author associated with w_di
y_di | The timestamp associated with token w_di
θ_d | Multinomial distribution over topics with parameter α
Φ_z | Multinomial distribution of words specific to z with parameter β
Ψ_z | Time specific Beta distribution of topic z
α | Dirichlet distribution associated with topic z
β | Dirichlet distribution associated with word w_di
? | Binomial distribution associated with transition Ω_i
rn | Root node (or root topic)
R | Response variable used as observed value in supervised topic models
L | Link between documents
d | Source document
d' | Target document
τ | Link value between documents
γ | Dirichlet distribution associated with link τ
λ | Multinomial distribution for link generation between documents
(symbol lost in extraction) | Class of word, e.g., Noun Phrase (NP), Not Noun Phrase (NNP)
Tab.2  Notations
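As a hedged illustration of how the core symbols of Tab.2 fit together, the standard smoothed LDA generative process can be written as follows. This is the textbook formulation, stated here as an assumption about how α, β, θ_d, Φ_z, z_di and w_di relate. The extended models in the survey add further variables such as x_di or y_di on top of it.

```latex
\begin{align*}
\Phi_z &\sim \mathrm{Dirichlet}(\beta), & z &= 1,\dots,T,\\
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & &\text{for each document } d,\\
z_{di} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d), & i &= 1,\dots,N_d,\\
w_{di} \mid z_{di}, \Phi &\sim \mathrm{Multinomial}(\Phi_{z_{di}}), & i &= 1,\dots,N_d.
\end{align*}
```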
Sports | Entertainment
Game | Art
Play | Music
Ball | Play
Team | Part
Playing | Sing
Games | Like
Football | Poetry
Baseball | Band
Field | World
Sports | Rhythm
Tab.3  Polysemy with topics
Fig.2  Graph plate notations symbols
Year/Type | Basic PDPTMs | Inter-Document Correlated PDPTMs | Intra-Document Correlated PDPTMs | Temporal PDPTMs | Supervised PDPTMs
1999 | PLSA | | | |
2000 | | | | |
2001 | | PLSA → A Joint Probabilistic Model | | |
2002 | A probabilistic Approach | | | |
2003 | LDA, A Topic Model | | | | LDA → Corr-LDA
2004 | Discrete PCA | LDA → Mixed Membership Models, LDA → Author-Topic Model, LDA → Author-Topic Model → ART | | |
2005 | | | LDA → HMM-LDA | | LDA → LLDA
2006 | | LDA → PAM, LDA → CTM, LDA → Statistical Entity-Topic Models | Bigram Topic Model, PLSA → CPLSA | LDA → TOT, LDA → DTM, (PAM, TOT) → Continuous Time Model |
2007 | | LDA → GWN-LDA, LDA → Citation Influence Model | LDA → HTMM, LDA → TNG | MTTM | LDA → sLDA
2008 | | LDA → LTHM, LDA → Author-Topic Model → ACT Model, (A Joint Probabilistic Model, LDA) → Link-PLSA-LDA | | LDA → DTM → cDTM |
2009 | | LDA → Generalized LDA, LDA → Author-Topic Model → ACT → Generalized ACT | | LDA → Author-Topic Model → TAT |
Tab.4  Historical paradigms of PDPTMs from 1999-2009
Fig.3  PLSA
Fig.4  Smoothed LDA
Fig.5  Author-Topic Model
Fig.6  LTHM Model
Fig.7  HMM-LDA Model
Fig.8  HTMM Model
Fig.9  Continuous-Time model
Fig.10  sLDA model
Model | Type | Parameter Estimation and Inference Algorithms | Problem Domain(s) | Dataset(s)
PLSA | BDPTMs | EM | Ranking (automatic document indexing) | LOB corpus, MED abstracts dataset, CRAN abstracts dataset, CACM abstracts dataset, CISI abstracts dataset
A Joint Probabilistic Model | IrCDPTMs | EM | Document Classification, Relationship between Topics and Links | WebKB web pages dataset (http://www.cs.cmu.edu/~webkb/), Cora abstracts dataset (http://www.cora.justresearch.com)
A probabilistic Approach | BDPTMs | Gibbs Sampling | Topic Discovery (semantics of words) | TASA corpus “a collection of children reading”
LDA | BDPTMs | Variational EM | Topic Discovery, Document Classification, Collaborative Filtering | TREC AP newswire articles corpus, Reuters news articles dataset (http://www.daviddlewis.com/resources/testcollections/reuters21578/), C. elegans literature (http://elegans.swmed.edu/wli/cgcbib), EachMovie collaborative filtering dataset
A Topic Model | BDPTMs | Gibbs Sampling | Topic Discovery (semantics of words) | TASA corpus “a collection of children reading”
Corr-LDA | SuDPTMs | Variational EM | Automatic Annotation, Text-based Image Retrieval | Corel images and captions dataset
Discrete PCA | | Gibbs Sampling | Text Classification, Information Retrieval | 20 Newsgroups dataset (http://www.ai.mit.edu/_jrennie/20Newsgroups/), Reuters news articles dataset (http://www.daviddlewis.com/resources/testcollections/reuters21578/)
Mixed-Membership Models | IrCDPTMs | EM | Topic Discovery, Document Classification | PNAS scientific articles dataset (http://www.pnas.org)
Author-Topic Model | IrCDPTMs | Gibbs Sampling | Entities and Topics Correlations, Topics Evolution over Time | CiteSeer dataset (http://citeseer.ist.psu.edu/oai.html)
ART Model | IrCDPTMs | Gibbs Sampling | Topic and Role Discovery | Enron email dataset (http://www.cs.cmu.edu/~enron/), researchers' email archive
A Composite Model (HMM-LDA) | IaCDPTMs | Gibbs Sampling | Document Classification, Part-of-Speech Tagging | Brown and TASA corpus “a collection of children reading” datasets, NIPS00-12 Proceedings dataset (www.cs.toronto.edu/~roweis/data.html)
LLDA Model | SuDPTMs | Variational EM | Topic Discovery | Microarray dataset (http://genomics.lbl.gov/~patrickf/llda.html)
CTM | IrCDPTMs | Variational EM | Topics Correlations | JSTOR science articles dataset (http://www.jstor.org)
DTM | TDPTMs | Variational Kalman Filtering | Topics Evolution over Time | JSTOR science articles dataset (http://www.jstor.org)
Statistical Entity-Topic Models | IrCDPTMs | Gibbs Sampling | Entities and Topics Correlations | New York Times dataset (http://www.ldc.upenn.edu), Foreign Broadcast Information Service (FBIS) dataset (http://www.fbis.gov)
Bigram Topic Model | IaCDPTMs | Gibbs EM | Topic Discovery | Psychological Review abstracts dataset (http://psiexp.ss.uci.edu/research/programs_data/toolbox.htm), 20 Newsgroups dataset (http://people.csail.mit.edu/jrennie/20Newsgroups/)
PAM | IrCDPTMs | Gibbs Sampling | Super and Sub Topic Discovery, Document Classification | NIPS00-12 Proceedings dataset (www.cs.toronto.edu/~roweis/data.html), 20 Newsgroups dataset (http://www.cs.cmu.edu/~textlearning/), Rexa research paper search engine (http://Rexa.info)
TOT Model | TDPTMs | Gibbs Sampling | Topics Evolution over Time | State of the Union Addresses dataset (http://www.gutenberg.org/dirs/etext04/suall11.txt), researchers' email archive, NIPS00-12 Proceedings dataset (www.cs.toronto.edu/~roweis/data.html)
Continuous-Time Model | TDPTMs | Gibbs Sampling | Topics Evolution over Time and their Correlations | Rexa research paper search engine (http://Rexa.info)
CPLSA | IrCDPTMs | EM | Temporal (Entities-Topic) Correlations, Topics Evolution over Time, Event Impact Analysis | Abstracts of 282 papers by two data mining researchers from the ACM Digital Library, MSN Space documents, abstracts of 28 years of SIGIR conferences from the ACM Digital Library
HTMM | IaCDPTMs | EM and Forward-backward algorithm | Topic Discovery | NIPS00-12 Proceedings dataset (www.cs.toronto.edu/~roweis/data.html), used dataset (http://www.cs.huji.ac.il/~amitg/htmm.html)
MTTM | TDPTMs | Variational EM | Topics Evolution over Time | JSTOR science articles dataset (http://www.jstor.org)
sLDA Model | SuDPTMs | Variational EM | Ranking Movies and Web Pages | Newspaper movie reviews dataset (http://www.cs.cornell.edu/people/pabo/movie-review-data/), Digg links (digg.com)
Citation Influence Model | IrCDPTMs | Gibbs Sampling | Citation Influence | CiteSeer dataset (http://citeseer.ist.psu.edu/oai.html)
GWN-LDA Model | IrCDPTMs | Gibbs Sampling | Entities and Topics Correlations | NanoSci articles dataset (2000-2006) taken from (http://scientific.thomson.com/products/sci/), CiteSeer dataset (http://citeseer.ist.psu.edu/oai.html)
TNG Model | IaCDPTMs | Gibbs Sampling | Topic Discovery, Information Retrieval | TREC dataset, NIPS00-12 Proceedings dataset (www.cs.toronto.edu/~roweis/data.html)
Link-PLSA-LDA | IrCDPTMs | Variational EM | Blogs Influence | Nielsen BuzzMetrics blog postings dataset (http://www.nielsenbuzzmetrics.com)
cDTM | TDPTMs | Variational Kalman Filtering | Topics Evolution over Continuous Time | TREC-1 AP newswire articles corpus, “Election 08” dataset (digg.com)
LTHM | IrCDPTMs | EM | Relationship between Topics and Links | WebKB web pages dataset (http://www.cs.huji.ac.il/~amitg/lthm.html), Wikipedia (http://www.cs.cmu.edu/~webkb/)
TAT | TDPTMs | Gibbs Sampling | Temporal Authors Interests and Correlations | Computer science research papers taken from http://www.informatik.uni-trier.de/~ley/db/
ACT | IrCDPTMs | Gibbs Sampling | Expertise Search in Academic Social Networks | Computer science research papers taken from http://www.arnetminer.org/
STMS | IrCDPTMs | Gibbs Sampling | Expert Finding | Computer science research papers taken from http://www.informatik.uni-trier.de/~ley/db/
GLDA | IrCDPTMs | Gibbs Sampling | Conference Mining | Computer science research papers taken from http://www.informatik.uni-trier.de/~ley/db/
Tab.5  Summary of PDPTMs applications
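Gibbs sampling is the inference method listed most often in Tab.5. The sketch below illustrates the standard collapsed Gibbs update for plain LDA only. It is not the implementation used by any of the surveyed papers. The function name `gibbs_lda`, the hyperparameter values, the iteration count and the toy corpus are all assumptions made for illustration.

```python
import random


def gibbs_lda(docs, V, T, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA; docs are lists of word ids in [0, V)."""
    rng = random.Random(seed)
    n_dt = [[0] * T for _ in docs]       # tokens in document d assigned to topic t
    n_tw = [[0] * V for _ in range(T)]   # times word w is assigned to topic t
    n_t = [0] * T                        # total tokens assigned to topic t
    z = []                               # current topic of every token

    # Random initialisation of the topic assignments.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            t = rng.randrange(T)
            z_d.append(t)
            n_dt[d][t] += 1
            n_tw[t][w] += 1
            n_t[t] += 1
        z.append(z_d)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                t = z[d][i]
                n_dt[d][t] -= 1
                n_tw[t][w] -= 1
                n_t[t] -= 1
                # p(z = k | all other assignments) is proportional to
                # (n_dt + alpha) * (n_tw + beta) / (n_t + V * beta).
                weights = [
                    (n_dt[d][k] + alpha) * (n_tw[k][w] + beta) / (n_t[k] + V * beta)
                    for k in range(T)
                ]
                t = rng.choices(range(T), weights=weights)[0]
                z[d][i] = t
                n_dt[d][t] += 1
                n_tw[t][w] += 1
                n_t[t] += 1

    # Point estimates of theta_d and Phi_z from the final counts.
    theta = [
        [(n_dt[d][k] + alpha) / (len(doc) + T * alpha) for k in range(T)]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(n_tw[k][w] + beta) / (n_t[k] + V * beta) for w in range(V)]
        for k in range(T)
    ]
    return theta, phi


# Toy corpus (made up): word ids 0-2 and 3-5 never share a document.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 4, 3, 5]]
theta, phi = gibbs_lda(docs, V=6, T=2)
```

Here `theta[d]` and `phi[z]` play the roles of θ_d and Φ_z in Tab.2. A real application would run many more iterations, discard a burn-in period, and average over several samples rather than rely on the final state alone.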