using CSV
using DataFrames
using Dates
using Plots
using NCDatasets
using Unitful
Lab 05
Working with Tabular and Gridded Climate Data
1 Intro
1.1 Setup
Remember to git clone
to your machine, then activate
and instantiate
the environment, as we have done in previous labs.
1.2 Resources
2 Tablular Climate Data
The provided data set contains daily precipitation data from three rain gauges in Houston, TX, from the GHCN Daily dataset. First, let’s glimpse at the data:
first(CSV.read("data/rain_gauge_houston.csv", DataFrame), 5)
Row | STATION | NAME | LATITUDE | LONGITUDE | ELEVATION | DATE | PRCP |
---|---|---|---|---|---|---|---|
String15 | String | Float64 | Float64 | Float64 | String15 | Float64? | |
1 | USC00414325 | HOUSTON WESTBURY, TX US | 29.7003 | -95.4292 | 14.0 | 6/1/1948 | 0.0 |
2 | USC00414325 | HOUSTON WESTBURY, TX US | 29.7003 | -95.4292 | 14.0 | 6/2/1948 | 0.0 |
3 | USC00414325 | HOUSTON WESTBURY, TX US | 29.7003 | -95.4292 | 14.0 | 6/3/1948 | 0.0 |
4 | USC00414325 | HOUSTON WESTBURY, TX US | 29.7003 | -95.4292 | 14.0 | 6/4/1948 | 0.0 |
5 | USC00414325 | HOUSTON WESTBURY, TX US | 29.7003 | -95.4292 | 14.0 | 6/5/1948 | 0.0 |
- What are the columns?
- What are the column names?
- What does the date column look like?
Parse the date
column into a Date
type. Use the documentation to define a Date Format that matches the format of the dates in the data. You’ve seen this in lab 2!
# as usual, remember to add curly brackets so that this block can run
= DateFormat(...)
date_format = CSV.read(fname, DataFrame; dateformat=date_format) precip
Next, let’s add appropriate units. Explore the data documentation and multiply the precipitation column by the appropriate unit to add units to our data variable.
...
Next, let’s reshape our data so that each station is its own column. This sort of reshaping is common – see the docs.
= unique(precip)
precip = let
precip2 = stack(precip, :PRCP, [:STATION, :DATE], value_name=:prcp)[!, Not(:variable)]
long_df rename!(long_df, :STATION => :station, :DATE => :date)
unstack(long_df, :date, :station, :prcp)
end
first(precip2, 5)
- Plot each station’s precipitation as a time series
- Plot the mean precipitation at each station for each month
- Plot a scatterplot for each of the three pairs of stations (A and B, B and C, A and C)
- What do you learn from these plots?
3 Gridded Climate Data
We will work with the CPC Global Unified Gauge-Based Analysis of Daily Precipitation. This is a gridded dataset on 0.5 degrees. We are working with data in the general vicinity of Houston.
= Dataset("data/cpc.nc")
ds println(ds)
Dataset: data/cpc.nc
Group: /
Dimensions
lon = 10
lat = 10
time = 15341
Variables
lon (10)
Datatype: Union{Missing, Float32} (Float32)
Dimensions: lon
Attributes:
_FillValue = NaN
long_name = Longitude
units = degrees_east
axis = X
standard_name = longitude
actual_range = Float32[0.25, 359.75]
coordinate_defines = center
lat (10)
Datatype: Union{Missing, Float32} (Float32)
Dimensions: lat
Attributes:
_FillValue = NaN
actual_range = Float32[89.75, -89.75]
long_name = Latitude
units = degrees_north
axis = Y
standard_name = latitude
coordinate_defines = center
time (15341)
Datatype: Union{Missing, DateTime} (Float64)
Dimensions: time
Attributes:
_FillValue = NaN
long_name = Time
axis = T
standard_name = time
coordinate_defines = start
actual_range = [692496.0, 701232.0]
delta_t = 0000-00-01 00:00:00
avg_period = 0000-00-01 00:00:00
_ChunkSizes = 1
units = hours since 1900-01-01
calendar = proleptic_gregorian
precip (10 × 10 × 15341)
Datatype: Union{Missing, Float32} (Float32)
Dimensions: lon × lat × time
Attributes:
_FillValue = -9.96921e36
var_desc = Precipitation
level_desc = Surface
statistic = Total
parent_stat = Other
long_name = Daily total of precipitation
cell_methods = time: sum
avg_period = 0000-00-01 00:00:00
actual_range = Float32[0.0, 428.02423]
units = mm
valid_range = Float32[0.0, 1000.0]
dataset = CPC Global Precipitation
_ChunkSizes = Int32[1, 360, 720]
missing_value = -9.96921e36
The data is stored as a netcdf4 file, which is a common data format for climate data. We can glimpse the preciptation data as follows:
:precip] ds[
precip (10 × 10 × 15341) Datatype: Union{Missing, Float32} (Float32) Dimensions: lon × lat × time Attributes: _FillValue = -9.96921e36 var_desc = Precipitation level_desc = Surface statistic = Total parent_stat = Other long_name = Daily total of precipitation cell_methods = time: sum avg_period = 0000-00-01 00:00:00 actual_range = Float32[0.0, 428.02423] units = mm valid_range = Float32[0.0, 1000.0] dataset = CPC Global Precipitation _ChunkSizes = Int32[1, 360, 720] missing_value = -9.96921e36
- What are its units?
- What are its dimensions?
You can access the attributes (i.e., metadtata) of the variable as
:precip].attrib ds[
_FillValue = -9.96921e36 var_desc = Precipitation level_desc = Surface statistic = Total parent_stat = Other long_name = Daily total of precipitation cell_methods = time: sum avg_period = 0000-00-01 00:00:00 actual_range = Float32[0.0, 428.02423] units = mm valid_range = Float32[0.0, 1000.0] dataset = CPC Global Precipitation _ChunkSizes = Int32[1, 360, 720] missing_value = -9.96921e36
Now, consult the quickstart tutorial, specifically the “Load a netCDF file” section, to learn how to load the data into a Julia variable.
- Create a Julia variable called
lon
to hold the longitude, one calledlat
to hold the latitude, one calledtime
to hold the time variable, and one calledprecip_grid
to hold the precipitation data. - The
time
variable is stored as aDateTime
type, but we want to convert it to aDate
type. Use theDates
package to do this. - Add the correct units to
precip_grid
by multiplying it by the appropriate unit from the Unitful package
Now, let’s create some simple plots. A challenge we will face is that the latitude data in our dataset goes in the reverse direction. We can fix this by reversing the latitudes
= reverse(lat)
lat = reverse(precip_grid, dims=2) precip_grid
Now we can create some plots. For now we will neglect plotting coastlines, geographical projections, etc.
heatmap(lon, lat, precip_grid[:, :, 1]')
Note that we need to transpose the data here. There’s not a good rule of thumb that I have any luck with for remembering when I need to do this, and you need to look at your data. In this case, the blank squares correspond to ocean and I know that the Gulf of Mexico is in the South East of our domain.
Create some plots for other days in the dataset.
After we’re done reading the data, we close our link to the file.
close(ds)
closed Dataset
4 Comparison
Create two visualizations that compare the gridded data to the rain gauge data.
- Create a scatter plot of the precipitation at a single gauge with the precipitation at the nearest grid point. You will need to merge the two datasets together to do this – one way is to create a DataFrame with the
time
column of the gridded data and a single subset of theprecip_grid
data. Then you can join that to your rain gauge data. - Compare the seasonality (monthly averages) of your gridded and gauge data. You may take the average of all the gauge data (watch out for missing values) or you may choose a single gauge to work with.
You may interpret these instructions as you see most appropriate. Explain your thinking and your results, and what you have learned from this comparison. Does the gridded product better represent the gauges on short or long time scales?