Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Front. Comput. Sci. 2010, Vol. 4, Issue 3: 324-333. https://doi.org/10.1007/s11704-010-0102-7
Research articles
Three challenges in data mining
Qiang YANG
Department of Computer Science and Engineering, Hong Kong University of Science and Technology, Hong Kong, China
Abstract In this article, I discuss three challenges in today’s data mining field: the transfer learning challenge, the social learning challenge, and the mobile context mining challenge. I pick these three because I think the time is ripe for each of them to be addressed in a major way in the near future, given the current technological and societal readiness to tackle them. I also believe that addressing each of these challenges will help move the science and engineering of data mining forward and have a great impact on society.
Keywords: data mining, transfer learning, social learning, mobile computing
Issue Date: 05 September 2010
 Cite this article:   
Qiang YANG. Three challenges in data mining[J]. Front. Comput. Sci., 2010, 4(3): 324-333.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-010-0102-7
https://academic.hep.com.cn/fcs/EN/Y2010/V4/I3/324