Three challenges in data mining
Qiang YANG
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Abstract
In this article, I discuss three challenges in today's data mining field: the transfer learning challenge, the social learning challenge, and the mobile context mining challenge. I chose these three because I believe the time is ripe for each of them to be addressed in a major way in the near future, given current technological and societal readiness to tackle them. I also believe that addressing each of them will move the science and engineering of data mining forward and have a great impact on society.
Keywords
data mining
transfer learning
social learning
mobile computing
Issue Date: 05 September 2010