Frontiers of Computer Science

ISSN 2095-2228

ISSN 2095-2236(Online)

CN 10-1014/TP

Postal Subscription Code 80-970

2018 Impact Factor: 1.129

Front. Comput. Sci. 2021, Vol. 15, Issue 3: 153102. https://doi.org/10.1007/s11704-020-9344-1
RESEARCH ARTICLE
Fine-grained management of I/O optimizations based on workload characteristics
Bing WEI (1,2), Limin XIAO (1,2), Bingyu ZHOU (1,2), Guangjun QIN (1,2), Baicheng YAN (1,2), Zhisheng HUO (1,2)
1. State Key Laboratory of Software Development Environment, Beihang University, Beijing 100191, China
2. School of Computer Science and Engineering, Beihang University, Beijing 100191, China
Abstract

With the advent of new computing paradigms, parallel file systems serve not only traditional scientific computing applications but also non-scientific applications such as financial computing, business, and public administration. Because a single parallel file system provides storage services to many applications, it must meet diverse I/O requirements. However, parallel file systems usually offer a unified, one-size-fits-all storage solution that cannot satisfy the specific needs of individual applications. To address this problem, this paper proposes an extended file handle scheme. The original file handle is extended to record I/O optimization information, which allows the file system to specify optimizations for an individual file or directory according to its workload characteristics. In this way, fine-grained management of I/O optimizations is achieved. Building on the extended file handle scheme, data prefetching and small-file optimization mechanisms are proposed for parallel file systems. Experimental results show that the proposed approach improves the aggregate throughput of the overall system by up to 189.75%.
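The abstract does not describe the layout of the extended file handle, so the following Python sketch is only an illustration of the general idea under stated assumptions: an unchanged "original" handle is paired with a small set of optimization bits, the bits are chosen from observed workload characteristics, and the read path consults them per file or directory. Every name here (`IOOpt`, `FileHandle`, `ExtendedFileHandle`, `choose_opts`, `inherit`, `read_path`), the 64 KiB small-file threshold, the 0.8 sequential-access threshold, and the policy that a new file inherits its directory's bits are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from enum import IntFlag
from typing import List


class IOOpt(IntFlag):
    """Hypothetical optimization bits recorded in an extended file handle."""
    NONE = 0
    PREFETCH = 1    # enable data prefetching for this file or directory
    SMALL_FILE = 2  # route I/O through a small-file optimization path


@dataclass(frozen=True)
class FileHandle:
    """Stand-in for the file system's original (unextended) handle."""
    fs_id: int
    object_id: int


@dataclass(frozen=True)
class ExtendedFileHandle:
    """Original handle plus per-object optimization bits."""
    base: FileHandle
    opts: IOOpt = IOOpt.NONE


def choose_opts(avg_file_size: int, sequential_ratio: float,
                small_file_threshold: int = 64 * 1024,
                seq_threshold: float = 0.8) -> IOOpt:
    """Map observed workload characteristics to optimization bits.

    The thresholds are illustrative assumptions, not values from the paper.
    """
    opts = IOOpt.NONE
    if avg_file_size <= small_file_threshold:
        opts |= IOOpt.SMALL_FILE
    if sequential_ratio >= seq_threshold:
        opts |= IOOpt.PREFETCH
    return opts


def inherit(parent: ExtendedFileHandle, child: FileHandle) -> ExtendedFileHandle:
    """Assumed policy: a new file starts with its directory's optimization bits."""
    return ExtendedFileHandle(child, parent.opts)


def read_path(handle: ExtendedFileHandle) -> List[str]:
    """Return the symbolic steps a read would take for this handle."""
    steps = ["small-file path" if IOOpt.SMALL_FILE in handle.opts
             else "default striped path"]
    if IOOpt.PREFETCH in handle.opts:
        steps.append("prefetch next block")
    return steps


if __name__ == "__main__":
    # A directory whose files are small (4 KiB on average) and read mostly sequentially.
    directory = ExtendedFileHandle(FileHandle(1, 100),
                                   choose_opts(avg_file_size=4096, sequential_ratio=0.9))
    child = inherit(directory, FileHandle(1, 101))
    print(read_path(child))  # ['small-file path', 'prefetch next block']
```

Keeping the optimization bits beside the original handle, rather than in a global configuration, is what lets two files in the same file system take different I/O paths; a handle with no bits set falls back to the default behavior.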

Keywords: parallel file systems; workload characteristics; extended file handle; data prefetching; small files
Corresponding Author(s): Limin XIAO, Guangjun QIN
Just Accepted Date: 01 March 2020   Issue Date: 24 December 2020
Cite this article:
Bing WEI, Limin XIAO, Bingyu ZHOU, et al. Fine-grained management of I/O optimizations based on workload characteristics[J]. Front. Comput. Sci., 2021, 15(3): 153102.
 URL:  
https://academic.hep.com.cn/fcs/EN/10.1007/s11704-020-9344-1
https://academic.hep.com.cn/fcs/EN/Y2021/V15/I3/153102