1. School of Data Science and Engineering, East China Normal University, Shanghai 200062, China 2. Advanced Data Analytics Laboratory of Soochow University, Suzhou 215006, China
Stream processing has emerged as a useful technology for applications which require continuous and low latency computation on infinite streaming data. Since stream processing systems (SPSs) usually require distributed deployment on clusters of servers in face of large-scale of data, it is especially common to meet with failures of processing nodes or communication networks, but should be handled seriously considering service quality. A failed system may produce wrong results or become unavailable, resulting in a decline in user experience or even significant financial loss. Hence, a large amount of fault tolerance approaches have been proposed for SPSs. These approaches often have their own priorities on specific performance concerns, e.g., runtime overhead and recovery efficiency. Nevertheless, there is a lack of a systematic overview and classification of the state-of-the-art fault tolerance approaches in SPSs, which will become an obstacle for the development of SPSs. Therefore, we investigate the existing achievements and develop a taxonomy of the fault tolerance in SPSs. Furthermore, we propose an evaluation framework tailored for fault tolerance, demonstrate the experimental results on two representative open-sourced SPSs and exposit the possible disadvantages in current designs. Finally, we specify future research directions in this domain.
four atomic operators Word count Distinct count statistics
Throughput Latency Penalty factor
Yahoo! StreamingBench [ 42]
Storm Flink Spark Streaming
Advertisements User clicks
Advertisement click analytics
Throughput Latency
YCSB Extension [ 43]
Storm Flink Spark Streaming
Advertisements User clicks
Advertisement click analytics
Throughput Latency
StreamBench [ 44]
Storm Flink Spark Streaming
Synthetic data
Word count Advertisement click analytics KMeans
Throughput Latency
RIoTBench [ 45]
Storm
Urban sensing data Energy consumption MHealth records
ETL Statistics Model training Predictive Analytics
Peak rate Latency Resource usage Jitter
Benchmark Suite [ 46]
Storm Spark Streaming
Project Gutenberg 1998 World Cup Taxi traces
Word count Click analytics Traffic monitoring
Throughput Latency Scalability Tuple loss Resource usage
Benchmarking DSDPS [ 47]
Storm Flink Spark Streaming
Purchasing records Advertisements
Windowed aggregation Windowed join
Throughput Sustainable throughput Latency
Our work
Storm Flink
English novel set Synthetic data
Word count π calculation Dummy topology
Throughput Normal latency Resource usage Recovery latency
Tab.2
Runtime Overhead
Normal latency
Hardware resource
CPU
Memory
Network
Recovery Efficiency
Recovery latency
Recovery accuracy
Precise
Duplicate
Lost
Erroneous
Disordered
Tab.3
Fig.2
Fig.3
Fig.4
Fig.5
Fig.6
Deployments
Steps
Operator Initiation
Connection Establishment
Checkpoint Retrieval
Hot
Cold
Deployed
Vacant
Tab.4
Fig.7
Fig.8
Fig.9
Object
Knob
Abbr
Unit
Value
Application
State size
State
MB
1, 10, 15, 30
Computation intensity
CPI
round
0 (low), 2000(medium), 5000 (high)
Topology depth
TD
# of operators
2, 20
Checkpoint
Checkpoint interval
CI
second
NC (disabled), 1, 30, 45, 60
Checkpoint synchronicity
CS
?
sync, async
Barrier alignment
BA
?
true, false
Tab.5
Workload Suite
Use Case
Description
Object
1
WC
Effects of CI on NL and CPU for applications with different State
Runtime Overhead
2
PI
Effects of CS on NL and CPU for applications with different CPI
3
WC
Effects of BA on NL and CPU when processing data with different Skew
4
WC
NL under different workload fluctuations
5
DT
Effects of TD on RL
Recovery Efficiency
6
WC
Effects of CI on RL
Tab.6
Knob
Abbr
Unit
Value
Input Rate
IR
tuples/s
500, 5000, [0, 7500]
Skew Distribution
Skew
?
0, 1
Tab.7
Fig.10
Fig.11
Fig.12
Fig.13
Fig.14
Fig.15
Fig.16
Fig.17
Fig.18
1
J F Naughton , D J DeWitt , D Maier , A Aboulnaga , J J Chen , L Galanis , J Kang , R Krishnamurthy , Q Luo , N Prakash , R Ramamurthy , J Shanmugasundaram , F Tian , K Tufte , S Viglas , Y Wang , C Zhang , B Jackson , R Chen . The niagara internet query system. IEEE Data Engineering Bulletin, 2001, 24( 2): 27– 33
2
D J Abadi , D Carney , U Çetintemel , M Cherniack , C Convey , S Lee , M Stonebraker , N Tatbul , Zdonik S B . Aurora: a new model and architecture for data stream management. The International Journal on Very Large Data Bases, 2003, 12( 2): 120– 139
3
Motwani R, Arasu J, Widomand A, Babcock B, Babu S, Datar M, Olston G S, Mankuand C, Varma J, Rosensteinand R. Query processing, approximation, and resource management in a data stream management system. In: Proceedings of International Conference on Innovative Data Systems Research. 2003
4
Chandrasekaran S, Cooper O, Deshpande A, J Franklin M, M Hellerstein J, Hong W, Krishnamurthy S, Madden S, Raman V, Reiss F, A Shah M. Telegraphcq: continuous dataflow processing for an uncertain world. In: Proceedings of Conference on Innovative Data Systems Research. 2003
5
Heinze T, Aniello L, Querzoni L, Jerzak Z. Cloud-based data stream processing. In: Proceedings of ACM International Conference on Distributed Event-Based Systems. 2014. 238–245
6
Cherniack M, Balakrishnan H, Balazinska M, Carney D, Çetintemel U, Xing Y, Zdonik S B. Scalable distributed stream processing. In: Proceedings of International Conference on Innovative Data Systems Research. 2003
7
Abadi D J, Ahmad Y, Balazinska M, Çetintemel U, Cherniack M, Hwang J H, Lindner W, Maskey A, Rasin A, Ryvkina E, Tatbul N, Xing Y, B Zdonik S. The design of the borealis stream processing engine. In: Proceedings of Conference on Innovative Data Systems Research. 2005. 277–289
8
Dean J Ghemawat S. Mapreduce: simplified data processing on large clusters. In: Proceedings of USENIX Symposium on Operating System Design and Implementation. 2004, 137–150
9
Toshniwal A, Taneja S, Shukla A, Ramasamy K, M Patel J, Kulkarni S, Jackson J, Gade K, Fu M S, Donham J, Bhagat N, Mittal S, V Ryaboy D. Storm@twitter. In: Proceedings of ACM International Conference on Management of Data. 2014, 147–156
10
Zaharia M, Das T, Li H Y, Hunter T, Shenker S, Stoica I. Discretized streams: fault-tolerant streaming computation at scale. In: Proceedings of ACM Symposium on Operating Systems Principles. 2013, 423–438
11
P Carbone , A Katsifodimos , S Ewen , V Markl , S Haridi , K Tzoumas . Apache flink TM: stream and batch processing in a single engine. IEEE Data Engineering Bulletin, 2015, 38( 4): 28– 38
12
Noghabi S A, Paramasivam K, Pan Y, Ramesh N, Bringhurst J, Gupta I, H Campbell R. Stateful scalable stream processing at linkedin. Proceedings of the VLDB Endowment, 2017, 10(12): 1634–1645
13
Gill P, Jain N, Nagappan N. Understanding network failures in data centers: measurement, analysis, and implications. In: Proceedings of ACM Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications. 2011, 350–361
14
Dean J. Handling large datasets at google: current systems and future directions. In Data-Intensive Computing Symposium, 2008
15
V Vishwanath K Nagappan N. Characterizing cloud computing hardware reliability. In: Proceedings of Symposium on Cloud Computing. 2010, 193–204
16
Shankland S. Google spotlights data center inner workings. CNET News, 2008
17
Raphael J R. In pictures: the worst cloud outages of 2013. PCWorld, 2013
18
Shah M A, Hellerstein J M, Brewer E A. Highly-available, faulttolerant, parallel dataflows. In: Proceedings of ACM International Conference on Management of Data. 2004, 827–838
19
Hwang J H, Balazinska M, Rasin A, Çetintemel U, Stonebraker M, Zdonik S B. High-availability algorithms for distributed stream processing. In: Proceedings of IEEE International Conference on Data Engineering. 2005, 779–790
20
Balazinska M, Balakrishnan H, Madden S, Stonebraker M. Faulttolerance in the borealis distributed stream processing system. In: Proceedings of ACM International Conference on Management of Data. 2005, 13–24
21
Hwang J H, Çetintemel U, B Zdonik S. Fast and highly-available stream processing over wide area networks. In: Proceedings of IEEE International Conference on Data Engineering. 2008, 804–813
22
Hwang J H, Xing Y, Çetintemel U, B Zdonik S. A cooperative, self-configuring high-availability solution for stream processing. In: Proceedings of IEEE International Conference on Data Engineering. 2007, 176–185
23
Y C Kwon , M Balazinska , A G Greenberg . Fault-tolerant stream processing using a distributed, replicated file system. Proceedings of the VLDB Endowment, 2008, 1( 1): 574– 585
24
Gu Y, Zhang Z, Ye F, Yang H, Kim M, Lei H, Liu Z. An empirical study of high availability in stream processing systems. In: Proceedings of ACM/IFIP/USENIX International Middleware Conference. 2009
25
Sebepou Z Magoutis K. Cec: continuous eventual checkpointing for data stream processing operators. In: Proceedings of International Conference on Dependable Systems and Networks. 2011, 145–156
26
Fernandez R C, Migliavacca M, Kalyvianaki E, R Pietzuch P. Integrating scale out and fault tolerance in stream processing using operator state management. In: Proceedings of ACM International Conference on Management of Data. 2013, 725–736
27
Koldehofe B, Mayer R, Ramachandran U, Rothermel K, Völz M. Rollback-recovery without checkpoints in distributed event processing systems. In: Proceedings of ACM International Conference on Distributed Event-Based Systems. 2013, 27–38
28
P Huang Q Lee . Toward high-performance distributed stream processing via approximate fault tolerance. Proceedings of the VLDB Endowment, 2016, 10( 3): 73– 84
29
Zhang Z, Gu Y, Ye F, Yang H, Kim M, Lei H, Liu Z. A hybrid approach to high availability in stream processing systems. In: Proceedings of IEEE International Conference on Distributed Computing Systems. 2010, 138–148
30
Heinze T, Zia M, Krahn R, Jerzak Z, Fetzer C. An adaptive replication scheme for elastic data stream processing systems. In: Proceedings of ACM International Conference on Distributed Event-Based Systems. 2015, 150–161
31
Su L Zhou Y L. Tolerating correlated failures in massively parallel stream processing engines. In: Proceedings of IEEE International Conference on Data Engineering. 2016, 517–528
32
Y L Su L Zhou . Passive and partially active fault tolerance for massively parallel stream processing engines. IEEE Transactions on Knowledge and Data Engineering, 2019, 31( 1): 32– 45
33
Martin A, Smaneoto T, Dietze T, Brito A, Fetzer C. User-constraint and self-adaptive fault tolerance for event stream processing systems. In: Proceedings og IEEE/IFIP International Conference on Dependable Systems and Networks. 2015
34
F B Schneider . Implementing fault-tolerant services using the state machine approach: a tutorial. ACM Computing Surveys, 1990, 22( 4): 299– 319
35
Chandra T D, Griesemer R, Redstone J. Paxos made live: an engineering perspective. In: Proceedings of ACM Symposium on Principles of Distributed Computing. 2007, 398–407
36
E N Elnozahy , L Alvisi , Y M Wang , D B Johnson . A survey of rollback-recovery protocols in message-passing systems. ACM Computing Surveys, 2002, 34( 3): 375– 408
37
Balazinska M, Hwang J H, Shah M A. Fault-tolerance and high availability in data stream management systems. In Encyclopedia of Database Systems. 2009, 1109–1115
38
A L S Gradvohl , H Senger , L Arantes , P Sens . Comparing distributed online stream processing systems considering fault tolerance issues. Journal of Emerging Technologies in Web Intelligence, 2014, 6( 2): 174– 179
39
Nasir M A U. Fault tolerance for stream processing engines. arXiv preprint, abs/1605.00928, 2016
40
Arasu A, Cherniack M, Galvez E F, Maier D, Maskey A, Ryvkina E, Stonebraker M, Tibbetts R. Linear road: a stream data management benchmark. In: Proceedings of ACM International Conference on Very Large Data Bases. 2004, 480–491
41
Lu R R, Wu G, Xie B, Hu J T. Streambench: towards benchmarking modern distributed stream computing frameworks. In: Proceedings of IEEE/ACM International Conference on Utility and Cloud Computing. 2014, 69–78
42
Chintapalli S, Dagit D, Evans B, Farivar R, Graves T, Holderbaugh M, Liu Z, Nusbaum K, Patil K, Peng B Y, Poulosky P. Benchmarking streaming computation engines: storm, flink and spark streaming. In: Proceedings of IEEE International Parallel and Distributed Processing Symposium Workshops. 2016, 1789–1792
43
Grier J. Extending the yahoo! streaming benchmark. Ververica, 2016
44
Wang Y J. Stream processing systems benchmark: streambench. Master’s thesis, Aalto University, 2016
45
Shukla A, Chaturvedi S, Simmhan Y. Riotbench: a real-time iot benchmark for distributed stream processing platforms. arXiv preprint, abs/1701.08530, 2017
46
Bordin M V. A benchmark suite for distributed stream processing systems. PhD thesis, Universidade Federal do Rio Grande Do Su, 2017
47
Karimov J, Rabl T, Katsifodimos A, Samarev R, Heiskanen H, Markl V. Benchmarking distributed stream data processing systems. In: Proceedings of IEEE International Conference on Data Engineering. 2018, 1507–1518
48
Venkataraman S, Panda A, Ousterhout K, Armbrust M, Ghodsi A, Franklin M J, Recht B, Stoica I. Drizzle: fast and adaptable stream processing at scale. In: Proceedings of ACM Symposium on Operating Systems Principles. 2017, 374–389
49
T Akidau , R Bradshaw , C Chambers , S Chernyak , R FernándezMoctezuma , R Lax , S McVeety , D Mills , F Perry , E Schmidt , S Whittle . The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-oforder data processing. Proceedings of the VLDB Endowment, 2015, 8( 12): 1792– 1803
50
P A Tucker , D Maier , T Sheard , L Fegaras . Exploiting punctuation semantics in continuous data streams. IEEE Transactions on Knowledge and Data Engineering, 2003, 15( 3): 555– 568
51
L Chandy K M Lamport . Distributed snapshots: determining global states of distributed systems. ACM Transactions on Computer Systems, 1985, 3( 1): 63– 75
52
Y M Wang , P Y Chung , I J Lin , W K Fuchs . Checkpoint space reclamation for uncoordinated checkpointing in message-passing systems. IEEE Transactions on Parallel and Distributed Systems, 1995, 6( 5): 546– 554
53
Neumeyer L, Robbins B, Nair A, Kesari A. S4: distributed stream computing platform. In: Proceedings of IEEE International Conference on Data Mining Workshops. 2010, 170–177
54
B Randell . System structure for software fault tolerance. IEEE Transactions on Software Engineering, 1975, 1( 2): 221– 232
55
Murray D G, McSherry F, Isaacs R, Isard M, Barham P, Abadi M. Naiad: a timely dataflow system. In: Proceedings of ACM Symposium on Operating Systems Principles. 2013, 439–455
56
T Akidau , A Balikov , K Bekiroglu , S Chernyak , J Haberman , R Lax , S McVeety , D Mills , P Nordstrom , S Whittle . Millwheel: faulttolerant stream processing at internet scale. Proceedings of the VLDB Endowment, 2013, 6( 11): 1033– 1044
57
Qian Z P, He Y, Su C Z, Wu Z J, Zhu H Y, Zhang T Z, Zhou L D, Yu Y, Zhang Z. Timestream: reliable stream computation in the cloud. In: Proceedings of Eurosys Conference. 2013, 1–14
58
Bhargava B K S Lian. Independent checkpointing and concurrent rollback for recovery in distributed systems - an optimistic approach. In: Proceedings of Symposium on Reliable Distributed Systems. 1988, 3–12
59
Feldman S I Brown C B. Igor: a system for program debugging via reversible execution. In: Proceedings of ACM SIGPLAN and SIGOPS Workshop on Parallel and Distributed Debugging. 1988, 112–123
60
Ghemawat S, Gobioff H, Leung S. The google file system. In: Proceedings of ACM Symposium on Operating Systems Principles. 2003, 29–43
61
Shvachko K, Kuang H, Radia S, Chansler R. The hadoop distributed file system. In: Proceedings of IEEE Symposium on Mass Storage Systems and Technologies. 2010, 1–10
62
Pohl C, Götze P, Sattler K. A cost model for data stream processing on modern hardware. In: Proceedings of International Workshop on Accelerating Analytics and Data Management Systems Using Modern Processor and Storage Architectures. 2017
63
S Zeuch , S Breß , T Rabl , B D Monte , J Karimov , C Lutz , M Renz , J Traub , V Markl . Analyzing efficient stream processing on modern hardware. Proceedings of the VLDB Endowment, 2019, 12( 5): 516– 530