Lab 05

Working with Tabular and Gridded Climate Data

Module 2
Labs
Published

Mon., Oct. 2

1 Intro

1.1 Setup

Remember to git clone to your machine, then activate and instantiate the environment, as we have done in previous labs.

using CSV
using DataFrames
using Dates
using Plots
using NCDatasets
using Unitful

1.2 Resources

2 Tablular Climate Data

The provided data set contains daily precipitation data from three rain gauges in Houston, TX, from the GHCN Daily dataset. First, let’s glimpse at the data:

first(CSV.read("data/rain_gauge_houston.csv", DataFrame), 5)
5×7 DataFrame
Row STATION NAME LATITUDE LONGITUDE ELEVATION DATE PRCP
String15 String Float64 Float64 Float64 String15 Float64?
1 USC00414325 HOUSTON WESTBURY, TX US 29.7003 -95.4292 14.0 6/1/1948 0.0
2 USC00414325 HOUSTON WESTBURY, TX US 29.7003 -95.4292 14.0 6/2/1948 0.0
3 USC00414325 HOUSTON WESTBURY, TX US 29.7003 -95.4292 14.0 6/3/1948 0.0
4 USC00414325 HOUSTON WESTBURY, TX US 29.7003 -95.4292 14.0 6/4/1948 0.0
5 USC00414325 HOUSTON WESTBURY, TX US 29.7003 -95.4292 14.0 6/5/1948 0.0
  1. What are the columns?
  2. What are the column names?
  3. What does the date column look like?

Parse the date column into a Date type. Use the documentation to define a Date Format that matches the format of the dates in the data. You’ve seen this in lab 2!

# as usual, remember to add curly brackets so that this block can run
date_format = DateFormat(...)
precip = CSV.read(fname, DataFrame; dateformat=date_format)

Next, let’s add appropriate units. Explore the data documentation and multiply the precipitation column by the appropriate unit to add units to our data variable.

...

Next, let’s reshape our data so that each station is its own column. This sort of reshaping is common – see the docs.

precip = unique(precip)
precip2 = let
    long_df = stack(precip, :PRCP, [:STATION, :DATE], value_name=:prcp)[!, Not(:variable)]
    rename!(long_df, :STATION => :station, :DATE => :date)
    unstack(long_df, :date, :station, :prcp)
end
first(precip2, 5)
  1. Plot each station’s precipitation as a time series
  2. Plot the mean precipitation at each station for each month
  3. Plot a scatterplot for each of the three pairs of stations (A and B, B and C, A and C)
  4. What do you learn from these plots?

3 Gridded Climate Data

We will work with the CPC Global Unified Gauge-Based Analysis of Daily Precipitation. This is a gridded dataset on 0.5 degrees. We are working with data in the general vicinity of Houston.

ds = Dataset("data/cpc.nc")
println(ds)
Dataset: data/cpc.nc
Group: /

Dimensions
   lon = 10
   lat = 10
   time = 15341

Variables
  lon   (10)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  lon
    Attributes:
     _FillValue           = NaN
     long_name            = Longitude
     units                = degrees_east
     axis                 = X
     standard_name        = longitude
     actual_range         = Float32[0.25, 359.75]
     coordinate_defines   = center

  lat   (10)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  lat
    Attributes:
     _FillValue           = NaN
     actual_range         = Float32[89.75, -89.75]
     long_name            = Latitude
     units                = degrees_north
     axis                 = Y
     standard_name        = latitude
     coordinate_defines   = center

  time   (15341)
    Datatype:    Union{Missing, DateTime} (Float64)
    Dimensions:  time
    Attributes:
     _FillValue           = NaN
     long_name            = Time
     axis                 = T
     standard_name        = time
     coordinate_defines   = start
     actual_range         = [692496.0, 701232.0]
     delta_t              = 0000-00-01 00:00:00
     avg_period           = 0000-00-01 00:00:00
     _ChunkSizes          = 1
     units                = hours since 1900-01-01
     calendar             = proleptic_gregorian

  precip   (10 × 10 × 15341)
    Datatype:    Union{Missing, Float32} (Float32)
    Dimensions:  lon × lat × time
    Attributes:
     _FillValue           = -9.96921e36
     var_desc             = Precipitation
     level_desc           = Surface
     statistic            = Total
     parent_stat          = Other
     long_name            = Daily total of precipitation
     cell_methods         = time: sum
     avg_period           = 0000-00-01 00:00:00
     actual_range         = Float32[0.0, 428.02423]
     units                = mm
     valid_range          = Float32[0.0, 1000.0]
     dataset              = CPC Global Precipitation
     _ChunkSizes          = Int32[1, 360, 720]
     missing_value        = -9.96921e36

The data is stored as a netcdf4 file, which is a common data format for climate data. We can glimpse the preciptation data as follows:

ds[:precip]
precip (10 × 10 × 15341)
  Datatype:    Union{Missing, Float32} (Float32)
  Dimensions:  lon × lat × time
  Attributes:
   _FillValue           = -9.96921e36
   var_desc             = Precipitation
   level_desc           = Surface
   statistic            = Total
   parent_stat          = Other
   long_name            = Daily total of precipitation
   cell_methods         = time: sum
   avg_period           = 0000-00-01 00:00:00
   actual_range         = Float32[0.0, 428.02423]
   units                = mm
   valid_range          = Float32[0.0, 1000.0]
   dataset              = CPC Global Precipitation
   _ChunkSizes          = Int32[1, 360, 720]
   missing_value        = -9.96921e36
  1. What are its units?
  2. What are its dimensions?

You can access the attributes (i.e., metadtata) of the variable as

ds[:precip].attrib
_FillValue           = -9.96921e36
var_desc             = Precipitation
level_desc           = Surface
statistic            = Total
parent_stat          = Other
long_name            = Daily total of precipitation
cell_methods         = time: sum
avg_period           = 0000-00-01 00:00:00
actual_range         = Float32[0.0, 428.02423]
units                = mm
valid_range          = Float32[0.0, 1000.0]
dataset              = CPC Global Precipitation
_ChunkSizes          = Int32[1, 360, 720]
missing_value        = -9.96921e36

Now, consult the quickstart tutorial, specifically the “Load a netCDF file” section, to learn how to load the data into a Julia variable.

  1. Create a Julia variable called lon to hold the longitude, one called lat to hold the latitude, one called time to hold the time variable, and one called precip_grid to hold the precipitation data.
  2. The time variable is stored as a DateTime type, but we want to convert it to a Date type. Use the Dates package to do this.
  3. Add the correct units to precip_grid by multiplying it by the appropriate unit from the Unitful package

Now, let’s create some simple plots. A challenge we will face is that the latitude data in our dataset goes in the reverse direction. We can fix this by reversing the latitudes

lat = reverse(lat)
precip_grid = reverse(precip_grid, dims=2)

Now we can create some plots. For now we will neglect plotting coastlines, geographical projections, etc.

heatmap(lon, lat, precip_grid[:, :, 1]')

Note that we need to transpose the data here. There’s not a good rule of thumb that I have any luck with for remembering when I need to do this, and you need to look at your data. In this case, the blank squares correspond to ocean and I know that the Gulf of Mexico is in the South East of our domain.

Create some plots for other days in the dataset.

After we’re done reading the data, we close our link to the file.

close(ds)
closed Dataset

4 Comparison

Create two visualizations that compare the gridded data to the rain gauge data.

  1. Create a scatter plot of the precipitation at a single gauge with the precipitation at the nearest grid point. You will need to merge the two datasets together to do this – one way is to create a DataFrame with the time column of the gridded data and a single subset of the precip_grid data. Then you can join that to your rain gauge data.
  2. Compare the seasonality (monthly averages) of your gridded and gauge data. You may take the average of all the gauge data (watch out for missing values) or you may choose a single gauge to work with.

You may interpret these instructions as you see most appropriate. Explain your thinking and your results, and what you have learned from this comparison. Does the gridded product better represent the gauges on short or long time scales?