using Pkg
lab_dir = dirname(@__FILE__)
Pkg.activate(lab_dir)
# Pkg.instantiate() # uncomment this the first time you run the lab to install packages, then comment it back out

CEVE 543 Fall 2025 Lab 6: Julia Climate Data Tools
YAXArrays.jl, NetCDF.jl, exploring CMIP6 data structure
1 Background
Python’s xarray package has been transformative for working with labeled multi-dimensional arrays in climate science. Originally developed at The Climate Corporation and released as open source in 2014, xarray has become the standard tool for climate data analysis in Python. Julia has been less widely adopted in this space, but the Climate Modeling Alliance is building an Earth System Model from scratch in Julia, driving development of climate data tools in the Julia ecosystem.
The Julia approach offers distinct advantages for this course. Because Julia is fast and doesn’t require switching to C, Fortran, or C++ for performance-critical code, we can implement statistical methods and algorithms in pure Julia and apply them directly to climate data. This means you can write custom downscaling algorithms, bias correction methods, and statistical models without learning specialized library syntax or dealing with language interoperability issues. The resulting code is often easier to understand and maintain because everything stays in one language.
xarray remains excellent and widely used in practice. However, for this course’s focus on implementing and understanding statistical downscaling methods, Julia’s combination of high performance and readability makes it easier to write, test, and apply custom algorithms to real climate data.
2 Objectives
- Load and explore climate model output using YAXArrays.jl
- Work with NetCDF files and understand CMIP6 data structure
- Extract and visualize climate model data for specific locations and time periods
3 Before
Before starting the lab, uncomment the Pkg.instantiate() line in the first code block and run it to install all required packages. This will take a few minutes the first time. After installation completes, comment the line back out to avoid reinstalling on subsequent runs.
4 Tasks
All packages that you need are included, and will be installed when you instantiate the project.
- Work through the Getting Started with YAXArrays.jl user guide and implement the examples in this lab. Note that you will need to modify the code block to use `using ...` for all packages.
  - You can copy and paste the code blocks from the tutorial, but try to make sure you understand what each line is doing.
  - It’s good practice to put all your `using` statements at the top of your code blocks. As you work through more tutorials, put all the `using ...` statements together. Often, it’s helpful to sort them alphabetically or in another logical order.
  - DO add brief text between code blocks; think of this as your notes to yourself.
- A lot of the functionality of `YAXArrays.jl` comes from `DimensionalData.jl`. In particular, functions for selecting subsets of data, and for grouping and aggregating data, are provided in `DimensionalData`.
  - Read through the Dimensions, Selectors, and other sections of the `DimensionalData` documentation under “Getting Started”
- Work through the `YAXArrays` tutorial on selecting data
  - Pro tip: replace `path = download(url, fname)` with `if !isfile(fname); download(url, fname); end` to avoid re-downloading the file every time you run the code block. If later code refers to `path`, use `fname` instead.
- Working through tutorials can get repetitive, so you don’t need to implement every YAXArrays user guide. However, do take a few minutes to browse through the other available guides so you’re aware of what functionality exists when you need it later.
- Work through the Plotting Maps tutorial
  - Instead of `GLMakie`, we will use `CairoMakie`. You can replace `using GLMakie` and `using GLMakie.GeometryBasics` with `using CairoMakie` and `using CairoMakie.GeometryBasics`. Read more about Makie backends here.
  - Don’t worry about the `AlgebraOfGraphics.jl` component, although it is installed if you want to try.
- The `store = "gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/"` for the Plotting Maps tutorial is actually quite powerful.
  - What are we working with? Refer to the CMIP6 Data Reference Syntax for more information on the file structure used
    - `cmip6`: name of the top-level Google Cloud storage bucket (hence `gs`)
    - `CMIP6`: root directory for the project
    - `ScenarioMIP`: MIP (Model Intercomparison Project) name
    - `DKRZ`: institution ID (here, the German Climate Computing Center)
    - `MPI-ESM1-2-HR`: source ID (here, the Max Planck Institute Earth System Model, version 1-2, high resolution version)
    - `ssp585`: experiment ID (here, the Shared Socioeconomic Pathway 5-8.5, a very high emissions scenario)
    - `r1i1p1f1`: variant label. `r1` is realization 1 (this would change for different ensemble members, if available). `i1` is initialization method 1. `p1` is physics version 1. `f1` is forcing index 1
    - `3hr`: time frequency (3-hourly data)
    - `tas`: variable ID (near-surface air temperature)
    - `gn`: grid label (native grid)
    - `v20190710`: version (version date)
  - Select a single rectangular region. Compute the average `tas` over that region (if you’re fancy, weight by the cosine of latitude to account for the decreasing area of grid cells towards the poles, as shown in this xarray example) and plot the time series of `tas` for that region.
  - Using that time series, find the hottest and coldest 3-hourly periods in the entire dataset for that region. For those two time periods, plot the spatial distribution of `tas` over the entire globe using a map projection of your choice.
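To make the path structure described above concrete, here is a minimal sketch that splits the `store` string into its components using only base Julia. The field labels (`bucket`, `institution_id`, and so on) are our own illustrative names chosen to match the list above; they are not part of any package API.

```julia
# Sketch: split the CMIP6 store path into its Data Reference Syntax components.
# The labels in `drs_fields` are hypothetical names chosen for illustration.
store = "gs://cmip6/CMIP6/ScenarioMIP/DKRZ/MPI-ESM1-2-HR/ssp585/r1i1p1f1/3hr/tas/gn/v20190710/"

# Labels in the order the components appear in the path
drs_fields = (:bucket, :root, :mip, :institution_id, :source_id, :experiment_id,
              :variant_label, :frequency, :variable_id, :grid_label, :version)

# Strip the "gs://" scheme, split on "/", and drop the empty piece after the trailing slash
parts = split(replace(store, "gs://" => ""), "/"; keepempty=false)

# Pair each label with its path component
drs = NamedTuple{drs_fields}(Tuple(String.(parts)))

# Print a two-column summary
for (name, value) in pairs(drs)
    println(rpad(name, 16), value)
end
```

Changing one component (for example, a different experiment ID or variable ID) and rebuilding the path is how you would point at a different dataset in the same bucket, assuming that dataset exists there.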
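The cosine-of-latitude weighting mentioned in the regional-average task can be sketched on a small synthetic array. This is only an illustration of the arithmetic, under stated assumptions: the data are made up (not CMIP6 output), the array is a plain Julia `Array` with dimensions ordered (lon, lat, time), and the helper `weighted_mean` is a hypothetical name. With real data you would do the subsetting with the `YAXArrays`/`DimensionalData` selectors from the tutorials instead of `findall`.

```julia
using Statistics

# Synthetic stand-in for `tas` (NOT real model output); dimensions are (lon, lat, time)
lons = 0.0:10.0:350.0
lats = -85.0:10.0:85.0
ntime = 5
# Temperature falls off with |latitude| and rises by 1 K per time step
tas = [300.0 - 0.5 * abs(lat) + t for lon in lons, lat in lats, t in 1:ntime]

# Rectangular region: indices whose coordinates fall inside the box
lon_idx = findall(l -> 200.0 <= l <= 300.0, lons)
lat_idx = findall(l -> 20.0 <= l <= 60.0, lats)
region = tas[lon_idx, lat_idx, :]

# Weights proportional to grid-cell area: cos(latitude), identical for every longitude
weights = reshape(cosd.(lats[lat_idx]), 1, :)   # 1 x nlat, broadcasts across lon

# Weighted mean of one (lon, lat) slice: sum(w .* x) / sum(w) over all cells
function weighted_mean(slice, weights)
    total_weight = sum(weights) * size(slice, 1)
    return sum(slice .* weights) / total_weight
end

# Regional time series, one value per time step
series = [weighted_mean(region[:, :, t], weights) for t in 1:ntime]

# Unweighted mean of the first time step, for comparison
unweighted_first = mean(region[:, :, 1])

# Indices of the hottest and coldest time steps in the regional series
t_hot = argmax(series)
t_cold = argmin(series)
```

Because the synthetic field is warmer at lower latitudes, the weighted mean comes out higher than the unweighted one: cells nearer the equator cover more area and so count for more. The indices `t_hot` and `t_cold` are what you would then use to pull out the two global snapshots to map.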
Rice University members can access this “Gem” (a large language model with specific prompts) on Google Gemini. It is designed to help you with syntax and programming challenges related to these specific packages, and to help you translate concepts from Python (e.g., xarray) to Julia. As with all LLMs (and humans), it can be wrong. While it probably can answer the whole lab for you, that would defeat the entire purpose of learning how to use these tools, so please use it wisely and in accordance with the course AI policy.