Frontiers of Environmental Science & Engineering

ISSN 2095-2201

ISSN 2095-221X(Online)

CN 10-1013/X

Postal Subscription Code 80-973

2018 Impact Factor: 3.883

Front. Environ. Sci. Eng.    2024, Vol. 18 Issue (5) : 65    https://doi.org/10.1007/s11783-024-1825-2
Bolstering integrity in environmental data science and machine learning requires understanding socioecological inequity
Joe F. Bozeman III 1,2
1. School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA
2. School of Public Policy, Georgia Institute of Technology, Atlanta, GA 30332, USA
Abstract

● Socioecological inequity must be understood to improve environmental data science.

● The Systemic Equity Framework and Wells-Du Bois Protocol mitigate inequity.

● Addressing irreproducibility in machine learning is vital for bolstering integrity.

● Future directions include policy enforcement and systematic programming.

Socioecological inequity in environmental data science, such as inequity deriving from data-driven approaches and machine learning (ML), is a current issue subject to debate and evolution. There is growing consensus around embedding equity throughout all research and design domains, from inception to administration, while also addressing procedural, distributive, and recognitional factors. Yet doing so in practice may seem onerous or daunting to some. This perspective helps alleviate such concerns by substantiating the connection between environmental data science and socioecological inequity using the Systemic Equity Framework, and it provides the foundation for a paradigmatic shift toward normalizing equity-centered approaches in environmental data science and ML settings. From the standpoint of equity-centered tool development and rigorous application, bolstering the integrity of environmental data science and ML is just beginning. To this end, this perspective also provides future directions and challenges by overviewing meaningful tools and strategies (such as applying the Wells-Du Bois Protocol, employing fairness metrics, and systematically addressing irreproducibility) and emerging needs and proposals (such as addressing data-proxy bias and supporting convergence research), and it establishes a ten-step path forward. After all, the work that environmental scientists and engineers do ultimately affects the well-being of us all.

Keywords: Equity, Bias, Machine Learning, Data Science, Justice, Systemic Equity
Corresponding Author(s): Joe F. Bozeman III   
Issue Date: 22 February 2024
 Cite this article:   
Joe F. Bozeman III. Bolstering integrity in environmental data science and machine learning requires understanding socioecological inequity[J]. Front. Environ. Sci. Eng., 2024, 18(5): 65.
 URL:  
https://academic.hep.com.cn/fese/EN/10.1007/s11783-024-1825-2
https://academic.hep.com.cn/fese/EN/Y2024/V18/I5/65
Fig.1  Venn diagram of the Systemic Equity Framework, from Bozeman III et al. (2022).
Category | Subcategory | Questions
--- | --- | ---
Bad Data | Inadequate Data | Does the data overlook, erroneously represent, or systemically exclude a subpopulation?
Bad Data | Tendentious Data | Does the data represent the subjectivity or impartiality of humans? How does this bias affect the intended outcomes?
Algorithmic Bias | Harms of Identity Proxy | Could the model treat a particular demographic differently, even without explicit identity markers?
Algorithmic Bias | Harms of Subpopulation Difference | Are algorithmic outcomes disparate across respective subgroups?
Algorithmic Bias | Harms of Misfit Models | If the models are predictive, have you examined their accuracy by subpopulation to ensure performance is not significantly different? Specifically, what is your value-orientation and what are the public/social implications of this work?
Human Intent | Do No Harm | What are your goals and intended outcomes? Is there any ill intent involved?
Human Intent | Harms of Ignorance | What are the unintended consequences of your work? How can your results be manipulated to abuse or harm?
Tab.1  The Wells-Du Bois Protocol from Monroe-White and Lecy (2022)
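The two algorithmic-bias questions about subgroup outcomes and subgroup accuracy can be checked numerically. The following is a minimal sketch, not a method prescribed by the Wells-Du Bois Protocol: the function names (`subgroup_report`, `disparate_impact_ratio`), the toy labels, and the group names "A" and "B" are all hypothetical and chosen only for illustration. It computes accuracy and positive-prediction rate per subgroup, then the ratio of the lowest to the highest positive-prediction rate, one common way to summarize disparate impact.

```python
from collections import defaultdict


def subgroup_report(y_true, y_pred, groups):
    """Return accuracy and positive-prediction (selection) rate per subgroup."""
    counts = defaultdict(lambda: {"n": 0, "correct": 0, "positive": 0})
    for truth, pred, group in zip(y_true, y_pred, groups):
        entry = counts[group]
        entry["n"] += 1
        entry["correct"] += int(truth == pred)
        entry["positive"] += int(pred == 1)
    return {
        group: {
            "accuracy": entry["correct"] / entry["n"],
            "selection_rate": entry["positive"] / entry["n"],
        }
        for group, entry in counts.items()
    }


def disparate_impact_ratio(report):
    """Lowest selection rate divided by highest; 1.0 means parity."""
    rates = [row["selection_rate"] for row in report.values()]
    highest = max(rates)
    return min(rates) / highest if highest > 0 else float("nan")


# Hypothetical toy data: binary labels, binary predictions, two subgroups.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

report = subgroup_report(y_true, y_pred, groups)
ratio = disparate_impact_ratio(report)

for group, row in sorted(report.items()):
    print(group, row)
print("disparate impact ratio:", ratio)
```

In this toy case the model is more accurate for group B yet assigns it positive predictions half as often as group A, which shows why both questions need to be asked separately. Any threshold for an acceptable ratio (a four-fifths cutoff is sometimes used) is a policy choice and is not set by this sketch.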
Fig.2  Sources of irreproducibility and their respective factors, grouped by primary Wells-Du Bois Protocol question categories, adapted from Gundersen et al. (2022) and Monroe-White and Lecy (2022).
Step # | Action | Deliverable(s)
--- | --- | ---
1 | Identify and confirm equity focused environmental data scientists and engineers | Initial planning group is formed
2 | Identify and confirm collaborative transdisciplinary stakeholders with an emphasis on community-based representation | Survey planning and preliminary development group is formed
3 | Develop preliminary checklist and criteria for addressing socioecological inequity and bias in environmental data science and ML | Preliminary survey framework and checklist/criteria is formed
4 | Develop and administer survey to a predetermined amount of equity focused ML stakeholders, seeking input | Survey responses inform the in-person focus group checklist/criteria
5 | Assess and incorporate survey responses during an in-person focus group of ML-equity transdisciplinary stakeholders | Finalized checklist/criteria is formed
6 | Submit findings to impactful, accessible outlets and make checklist/criteria public | Finalized checklist/criteria is promulgated
7 | Identify and confirm ML-equity champions across key ML applications | Establishes ML-equity champions and a checklist/criteria dissemination framework
8 | Develop a follow-up survey to assess checklist/criteria effectiveness and satisfaction | Survey responses inform checklist/criteria refinement
9 | Assess and incorporate survey responses during an in-person meeting of ML-equity transdisciplinary stakeholders | Updated checklist/criteria is established
10 | Repeat Steps 6 through 9 periodically given a concurred period (e.g., every 3 years) | Establishes a systemic ML-equity process
Tab.2  Proposed plan for systematically bolstering integrity in environmental data science and ML as adapted from Bozeman III et al. (2022)
1 E Baker, S Carley, S Castellanos, D Nock, J F Bozeman III, D Konisky, C G Monyei, M Shah, B Sovacool. (2023). Metrics for decision-making in energy justice. Annual Review of Environment and Resources, 48(1): 737–760
https://doi.org/10.1146/annurev-environ-112621-063400
2 A M A Balayn, C Lofi, G J P M Houben. (2021). Managing bias and unfairness in data for decision support: a survey of machine learning and data engineering approaches to identify and mitigate bias and unfairness within data management and analytics systems. VLDB Journal, 30(5): 739–768
https://doi.org/10.1007/s00778-021-00671-8
3 J F Bozeman III, E Nobler, D Nock. (2022). A path toward systemic equity in life cycle assessment and decision-making: standardizing sociodemographic data practices. Environmental Engineering Science, 39(9): 759–769
https://doi.org/10.1089/ees.2021.0375
4 J F Bozeman III, S S Chopra, P James, S Muhammad, H Cai, K Tong, M Carrasquillo, H Rickenbacker, D Nock, W Ashton, et al. (2023). Three research priorities for just and sustainable urban systems: now is the time to refocus. Journal of Industrial Ecology, 27(2): 382–394
https://doi.org/10.1111/jiec.13360
5 J Chubb, M S Reed. (2018). The politics of research impact: academic perceptions of the implications for research funding, motivation and quality. British Politics, 13(3): 295–311
https://doi.org/10.1057/s41293-018-0077-9
6 S Cui, Y Gao, Y Huang, L Shen, Q Zhao, Y Pan, S Zhuang. (2023). Advances and applications of machine learning and deep learning in environmental ecology and health. Environmental Pollution, 335(10): 122358
https://doi.org/10.1016/j.envpol.2023.122358
7 M Feldman, S Friedler, J Moeller, C Scheidegger, S Venkatasubramanian. (2014). Certifying and removing disparate impact. arXiv: 1412.3756
https://doi.org/10.48550/arxiv.1412.3756
8 G Gauchat. (2012). Politicization of science in the public sphere: a study of public trust in the United States, 1974 to 2010. American Sociological Review, 77(2): 167–187
https://doi.org/10.1177/0003122412438225
9 K Gibert, J S Horsburgh, I N Athanasiadis, G Holmes. (2018). Environmental data science. Environmental Modelling & Software, 106: 4–12
https://doi.org/10.1016/j.envsoft.2018.04.005
10 S Grineski, B Bolin, C Boone. (2007). Criteria air pollution and marginalized populations: environmental inequity in metropolitan Phoenix, Arizona. Social Science Quarterly, 88(2): 535–554
https://doi.org/10.1111/j.1540-6237.2007.00470.x
11 O E Gundersen, K Coakley, C Kirkpatrick, Y Gil. (2022). Sources of irreproducibility in machine learning: a review. arXiv: 2204.07610
https://doi.org/10.48550/arXiv.2204.07610
12 M Hardt, E Price, N Srebro. (2016). Equality of opportunity in supervised learning. Proceedings of the 30th International Conference on Neural Information Processing Systems, 3323–3331
https://doi.org/10.48550/arxiv.1610.02413
13 J H Hinnefeld, P Cooman, N Mammo, R Deese. (2018). Evaluating fairness metrics in the presence of dataset bias. arXiv: 1809.09245
https://doi.org/10.48550/arxiv.1809.09245
14 IEEE. (2020). IEEE Recommended Practice for Assessing the Impact of Autonomous and Intelligent Systems on Human Well-Being. IEEE Std 7010-2020, 1–96
https://doi.org/10.1109/IEEESTD.2020.9084219
15 B Joshi, P Swarnakar. (2023). How fair is our air? The injustice of procedure, distribution, and recognition within the discourse of air pollution in Delhi, India. Environmental Sociology, 9(2): 176–189
https://doi.org/10.1080/23251042.2022.2151398
16 X Liu, D Lu, A Zhang, Q Liu, G Jiang. (2022). Data-driven machine learning in environmental pollution: gains and problems. Environmental Science & Technology, 56(4): 2124–2133
https://doi.org/10.1021/acs.est.1c06157
17 R Lokers, R Knapen, S Janssen, Y van Randen, J Jansen. (2016). Analysis of Big Data technologies for use in agro-environmental science. Environmental Modelling & Software: With Environment Data News, 84(10): 494–504
18 T Monroe-White, J Lecy. (2022). The Wells-Du Bois Protocol for machine learning bias: building critical quantitative foundations for third sector scholarship. Voluntas, 34: 170–184
https://doi.org/10.1007/s11266-022-00479-2
19 L D Montoya, L M Mendoza, C Prouty, M Trotz, M E Verbyla. (2020). Environmental engineering for the 21st century: increasing diversity and community participation to achieve environmental and social justice. Environmental Engineering Science, 38(5): 288–297
https://doi.org/10.1089/ees.2020.0148
20 M Mowbray, T Savage, C Wu, Z Song, B A Cho, E A Del Rio-Chanona, D Zhang. (2021). Machine learning for biochemical engineering: a review. Biochemical Engineering Journal, 172: 108054
https://doi.org/10.1016/j.bej.2021.108054
21 S G Murray, R M Wachter, R J Cucina. (2020). Discrimination by artificial intelligence in a commercial electronic health record: a case study. Health Affairs Forefront
https://doi.org/10.1377/hblog20200128.626576
22 A M Petersen, M E Ahmed, I Pavlidis. (2021). Grand challenges and emergent modes of convergence science. Humanities & Social Sciences Communications, 8(1): 194
https://doi.org/10.1057/s41599-021-00869-9
23 A Prahl, W W P Goh. (2021). "Rogue machines" and crisis communication: when AI fails, how do companies publicly respond? Public Relations Review, 47(4): 102077
https://doi.org/10.1016/j.pubrev.2021.102077
24 J Qian, W Wu, Q Yu, L Ruiz-Garcia, Y Xiang, L Jiang, Y Shi, Y Duan, P Yang. (2020). Filling the trust gap of food safety in food trade between the EU and China: an interconnected conceptual traceability framework based on blockchain. Food and Energy Security, 9(4): e249
https://doi.org/10.1002/fes3.249
25 J Ravetz, A Saltelli. (2015). The future of public trust in science. Nature, 524(7564): 161–161
https://doi.org/10.1038/524161d
26 J Rockström, J Gupta, D Qin, S J Lade, J F Abrams, L S Andersen, D I Armstrong McKay, X Bai, G Bala, S E Bunn, et al. (2023). Safe and just Earth system boundaries. Nature, 619(7968): 102–111
https://doi.org/10.1038/s41586-023-06083-8
27 R M Sorrentino, S Yamaguchi. (2008). Handbook of Motivation and Cognition Across Cultures. San Diego: Academic Press
https://doi.org/10.1016/B978-0-12-373694-9.00024-6
28 K H Tae, Y Roh, Y H Oh, H Kim, S E Whang. (2019). Data cleaning for accurate, fair, and robust models: a big data-AI integration approach. DEEM'19: Proceedings of the 3rd International Workshop on Data Management for End-to-End Machine Learning, 30 June 2019, Amsterdam, Netherlands
29 P Tahmasebi, S Kamrava, T Bai, M Sahimi. (2020). Machine learning in geo- and environmental sciences: from small to large scale. Advances in Water Resources, 142: 103619
https://doi.org/10.1016/j.advwatres.2020.103619
30 C W Tessum, J S Apte, A L Goodkind, N Z Muller, K A Mullins, D A Paolella, S Polasky, N P Springer, S K Thakrar, J D Marshall, et al. (2019). Inequity in consumption of goods and services adds to racial–ethnic disparities in air pollution exposure. Proceedings of the National Academy of Sciences of the United States of America, 116(13): 6001–6006
https://doi.org/10.1073/pnas.1818859116
31 P W J Verlegh, J B E M Steenkamp. (1999). A review and meta-analysis of country-of-origin research. Journal of Economic Psychology, 20(5): 521–546
https://doi.org/10.1016/S0167-4870(99)00023-9
32 P A Vesilind. (2010). Engineering Peace and Justice: The Responsibility of Engineers to Society. London: Springer-Verlag
33 R V D Vorst. (1998). Engineering, ethics and professionalism. European Journal of Engineering Education, 23(2): 171–179
https://doi.org/10.1080/03043799808923496
34 K A Wailoo, V J Dzau, K R Yamamoto. (2023). Embed equity throughout innovation. Science, 381(6662): 1029–1029
https://doi.org/10.1126/science.adk6365
35 Y Wen, Z Zhou, S Zhang, T J Wallington, W Shen, Q Tan, Y Deng, Y Wu. (2022). Urban–rural disparities in air quality responses to traffic changes in a megacity of China revealed using machine learning. Environmental Science & Technology Letters, 9(7): 592–598
https://doi.org/10.1021/acs.estlett.2c00246
36 M Zhu, J Wang, X Yang, Y Zhang, L Zhang, H Ren, B Wu, L Ye. (2022). A review of the application of machine learning in water quality evaluation. Eco-Environment & Health, 1(2): 107–116
https://doi.org/10.1016/j.eehl.2022.06.001
37 I Zliobaite. (2015). On the relation between accuracy and fairness in binary classification. In: The 2nd Workshop on Fairness, Accountability, and Transparency in Machine Learning (FATML) at ICML'15, July 11, 2015, Lille, France
https://doi.org/10.48550/arxiv.1505.05723