Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci.    2021, Vol. 15 Issue (2) : 152605    https://doi.org/10.1007/s11704-019-8349-0
RESEARCH ARTICLE
Accurate and efficient follower log repair for Raft-replicated database systems
Jinwei GUO, Peng CAI(), Weining QIAN, Aoying ZHOU
School of Data Science and Engineering, East China Normal University, Shanghai 200062, China
Abstract

State machine replication has been widely adopted in modern cluster-based database systems. Most deployed configurations use a Raft-like consensus protocol, in which a single strong leader replicates the log to the other followers. Since followers can serve read requests and many real-world workloads are read-intensive, the recovery speed of a crashed follower can significantly impact throughput. Unlike traditional database recovery, a recovering follower must first repair its local log. The original Raft protocol requires many network round trips to compare the logs of the leader and the crashed follower. To reduce these round trips, one optimization truncates the follower’s uncertain log entries beyond its latest local commit point and then fetches all committed log entries from the leader in a single round trip. However, if the commit point is not persisted, the recovering follower has to fetch the entire log from the leader. In this paper, we propose an accurate and efficient log repair (AELR) algorithm for follower recovery. AELR is more robust and resilient to follower failure, and it needs only one network round trip to fetch the minimum number of log entries required for recovery. We implemented this approach in the open-source database system OceanBase, and we experimentally show that a system adopting AELR performs well in terms of recovery time.
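The two repair strategies contrasted in the abstract can be sketched with a toy simulation. The function names, the in-memory log model, and the round-trip accounting below are illustrative assumptions for exposition only, not the paper’s AELR implementation or OceanBase code:

```python
# Toy comparison of two follower log-repair strategies (illustrative only).
# A log is a list of (term, command) entries; indices play the role of Raft
# log indices. Network cost is modeled as a simple round-trip counter.

def raft_probe_repair(leader_log, follower_log):
    """Original Raft style: the leader steps back one entry per round trip
    until it finds an index where the logs agree, then the follower adopts
    the leader's suffix from that point on."""
    round_trips = 0
    next_index = len(follower_log)
    while True:
        round_trips += 1  # one AppendEntries-style probe per round trip
        prev = next_index - 1
        if prev < 0 or (prev < len(leader_log)
                        and follower_log[prev] == leader_log[prev]):
            break  # logs match up to prev; divergent suffix is overwritten
        next_index -= 1
    repaired = leader_log[:]
    return repaired, round_trips

def commit_point_repair(leader_log, follower_log, persisted_commit):
    """Commit-point optimization: truncate uncertain entries beyond the
    persisted local commit point, then fetch everything after it from the
    leader in a single round trip."""
    repaired = follower_log[:persisted_commit] + leader_log[persisted_commit:]
    return repaired, 1

# The follower diverged after index 2 (it holds an uncommitted entry "x").
leader = [(1, "a"), (1, "b"), (2, "c"), (2, "d"), (3, "e")]
follower = [(1, "a"), (1, "b"), (2, "c"), (2, "x")]

log1, trips1 = raft_probe_repair(leader, follower)
log2, trips2 = commit_point_repair(leader, follower, persisted_commit=3)
assert log1 == leader and log2 == leader
print(trips1, trips2)  # prints "2 1"
```

Even in this tiny example the backward probing pays one round trip per divergent entry, while the commit-point variant finishes in one; the abstract’s observation is that the latter degrades to fetching the whole log when the commit point is not persisted, which is the gap AELR closes.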

Keywords Raft      high availability      log replication      log repair     
Corresponding Author(s): Peng CAI   
Just Accepted Date: 22 October 2019   Issue Date: 24 December 2020
 Cite this article:   
Jinwei GUO, Peng CAI, Weining QIAN, et al. Accurate and efficient follower log repair for Raft-replicated database systems[J]. Front. Comput. Sci., 2021, 15(2): 152605.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-019-8349-0
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I2/152605