1. State Key Lab of Computer Aided Design and Computer Graphics, Zhejiang University, Hangzhou 310058, China
2. College of Science, Zhejiang University of Technology, Hangzhou 310023, China
3. College of Computer Science and Technology, Zhejiang University of Technology, Hangzhou 310023, China
4. School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ 85287-8809, USA
A wide variety of predictive analytics techniques have been developed in statistics, machine learning and data mining; however, many of these algorithms take a black-box approach in which data is input and predictions are output with no insight into the intermediate process. Such a closed-system approach often leaves little room for injecting domain expertise and can frustrate analysts when results seem spurious or confusing. To allow for more human-centric approaches, the visualization community has begun developing methods that let users incorporate expert knowledge into the prediction process at all stages, including data cleaning, feature selection, model building and model validation. This paper surveys current progress and trends in predictive visual analytics, identifies the common framework in which predictive visual analytics systems operate, and summarizes the predictive analytics workflow.