Recommended Reading
There are LOTS of great resources for learning statistics, machine learning, and environmental data science. This is not an exhaustive list, but it includes some of the best resources that I have found. If you have a favorite resource that you think should be included here, please suggest additions.
Fundamentals
- Blitzstein & Hwang (2019) provides a thorough introduction to key concepts and ideas in probability. The book accompanies a free online course, Stat 110, which is a great resource for learning probability and statistics. Practice problems and solutions, handouts, and lecture videos are all available online.
- Downey (2021) offers an introduction to Bayesian statistics using computational methods. It’s not environment focused but provides code and a clear explanation of core concepts.
- Gelman (2021) is a textbook designed for a first course on applied statistics. Clear and well-worked examples underpin discussion of fundamental ideas in statistical analysis and thinking about data.
- Nazarathy & Klok (2021) offers tutorials on using Julia for statistics and machine learning; reading good code is a great way to improve your coding.
- The Julia basics page includes a link to good resources for getting started with Julia.
- The MIT Computational Thinking course provides a fantastic Julia-based introduction to applied mathematics and computational thinking, and is an excellent resource for this course.
Environmental applications
- Helsel et al. (2020) is a comprehensive introduction to water resources and hydrology, focusing on statistical methods for analyzing hydrologic data. Its methods are traditional, with less emphasis on machine learning or Bayesian methods and more attention to null hypothesis significance testing, but its case studies are well-worked and thoughtfully described.
- Abernathey (2024) is an excellent resource covering introductory topics in Earth and climate data science using Python, with an emphasis on foundational computations. These core computational concepts serve as a recommended prerequisite for more advanced material in this book.
- Mignan (2024) is a modern introduction to catastrophe risk modeling that covers a wide range of hazards, including hydroclimatic extremes, from a physics-based perspective. It provides a structured framework for quantifying hazard, exposure, and vulnerability, following industry-standard CAT modeling approaches. While broader in scope and more introductory in level, it complements this book’s focus by illustrating foundational principles of probabilistic risk modeling in practice.
- Pyrcz (2024) is a textbook on applied machine learning, with a particular emphasis on geostatistics. It gives less attention to extremes, hydroclimate, and decision-making, but it provides very clear and interpretable explanations of many machine learning methods, including some that are not directly covered in this book.
- Naghettini (2017) is a textbook on statistical hydrology that covers many of the same topics as this course. The statistical hydrology literature often obfuscates key ideas with complex notation and terminology, but this book is a helpful introduction to the field.
Digging deeper
- Gelman et al. (2014) and McElreath (2020) are the classic textbooks on Bayesian inference and provide a wealth of insight and detail. The Gelman textbook is denser while the McElreath book has a more conversational tone, but both cover similar topics.
- Friedman et al. (2001) is a classic introduction to machine learning, which complements the Bayesian perspective nicely.
- Cressie & Wikle (2011) provides a detailed exploration of hierarchical space-time models. There have been some computational advances since then that are worth keeping in mind before you apply these models directly, but it’s a clearly written overview.
- Thuerey et al. (2024) is a new textbook on physics-based deep learning, which is a rapidly growing area of research. It provides a comprehensive overview of the field, including theoretical foundations and practical applications. It covers topics, including neural operators and diffusion models, that are not covered in this course, but which are increasingly used in the climate risk space.
- Jaynes (2003) is a classic text on probability theory that you should read if you’re interested in questions like “what is probability?”
- The documentation for the Turing, PyMC, and (especially) Stan probabilistic programming languages offers outstanding tutorials on statistical modeling.
References
Abernathey, R. (2024). An Introduction to Earth and Environmental Data Science. Retrieved from https://earth-env-data-science.github.io/intro.html
Blitzstein, J. K., & Hwang, J. (2019). Introduction to Probability (2nd ed.). Boca Raton: Chapman and Hall/CRC. Retrieved from http://probabilitybook.net
Cressie, N. A. C., & Wikle, C. K. (2011). Statistics for spatio-temporal data. Hoboken, N.J.: Wiley.
Downey, A. B. (2021). Think Bayes. O’Reilly Media. Retrieved from https://allendowney.github.io/ThinkBayes2/
Friedman, J., Hastie, T., & Tibshirani, R. (2001). The Elements of Statistical Learning (Vol. 1). Springer Series in Statistics. Berlin: Springer.
Gelman, A. (2021). Regression and other stories. Cambridge, United Kingdom: Cambridge University Press.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2014). Bayesian Data Analysis (3rd ed.). Boca Raton, FL: Chapman & Hall/CRC.
Helsel, D. R., Hirsch, R. M., Ryberg, K. R., Archfield, S. A., & Gilroy, E. J. (2020). Statistical methods in water resources. Techniques and Methods. U.S. Geological Survey. https://doi.org/10.3133/tm4A3
Jaynes, E. T. (2003). Probability theory: The logic of science. New York, NY: Cambridge University Press.
McElreath, R. (2020). Statistical rethinking: A Bayesian course with examples in R and Stan (2nd ed.). Boca Raton: CRC Press, Taylor & Francis Group.
Mignan, A. (2024). Introduction to Catastrophe Risk Modelling: A Physics-based Approach. Cambridge: Cambridge University Press. https://doi.org/10.1017/9781009437370
Naghettini, M. (Ed.). (2017). Fundamentals of Statistical Hydrology. Cham: Springer International Publishing. https://doi.org/10.1007/978-3-319-43561-9
Nazarathy, Y., & Klok, H. (2021). Statistics with Julia: Fundamentals for Data Science, Machine Learning and Artificial Intelligence. Springer International Publishing. https://doi.org/10.1007/978-3-030-70901-3
Pyrcz, M. J. (2024). Applied Machine Learning in Python: A Hands-on Guide with Code. Retrieved from https://geostatsguy.github.io/MachineLearningDemos_Book
Thuerey, N., Holzschuh, B., Holl, P., Kohl, G., Lino, M., Liu, Q., et al. (2024). Physics-based deep learning. Retrieved from https://physicsbaseddeeplearning.org