Background

The output of WRF Hydro model runs is spread over multiple netcdf files and multiple netcdf files in output categories. This tool presents a list-based approach to gathering all the timeseries data you need at once (over both time and output file types) with parallelization at the file level for enhanced speed. The main limitation is that each variable specified may only return a scalar for a given time. However, statistics can summarize spatial fields or the same variable can be specified multiple times with different index. (There is no efficient way to get bulk spatial data over time.)

Setup

Load the rwrfhydro package.

library("rwrfhydro")
## To check rwrfhydro updates run: CheckForUpdates()

Set the WRF Hydro test case directory, this should be the only thing you need to customize to your machine.

tcPath <- '~/wrfHydroTestCases/'

We’ll use daily data from the Fourmile Creek test case where only channel routing (CHRT) is used.

fcPath <- paste0(tcPath,'Fourmile_Creek/')
dataPath <- paste0(fcPath,'/RUN.RTTESTS/OUTPUT_CHRT_DAILY/')

List-based data retrieval

The GetMultiNcdf function needs 3 collated lists as input: filesList, variableList, and indexList (e.g. args(GetMultiNcdf)). (Note these are the formal argument names which may be shortend when calling the function, by R rules, upto uniqueness of the argument name. We construct variables in the global workspace to pass to these functions which have similar but not identical names.)

filesList

This is a named list of arbitrary length. The individual entries corrospond to different file types of mode output and contain all the files you want to look at. These are usually generated by files.list() in a directory. Here output files from the land surface model and RESTART files from the hydro model are desired. These are found individually and then placed into fileList

lsmFiles <- list.files(path=dataPath, pattern='LDASOUT_DOMAIN', full.names=TRUE)
hydroFiles <- list.files(path=dataPath, pattern='HYDRO_RST', full.names=TRUE)
flList <- list(lsm=lsmFiles, hydro=hydroFiles)

Note that the output and restart frequencies are different from the number of files of each type.

variableList

All three lists are collated by name. Now we define which variables are desired for each file group. From the land surface model (LSM) output files, we want the surface radiative temperature (TRAD) and the snow water equivalent (SNEQV). For the hydro restart files, we want streamflow and soil moisture on the four vertical soil layers. The layers are differentiated, for now, only by the names we give them (smc1-4).

lsmVars   <- list(TRAD='TRAD', SWE='SNEQV')
hydroVars <- list(streamflow='qlink1', smc1='sh2ox1', smc2='sh2ox2', smc3='sh2ox3', smc4='sh2ox4')
varList <- list(lsm=lsmVars, hydro=hydroVars)

indexList

The indexList defines what indices/stats are desired for each variable in each file group. This list is collated with both of the previous two lists in a nested way, illustrated below.

Only a scalar can be returned for each entry specified index. However, spatial fields (a range of indices) can be summarized using arbitrary statistics. We show how to define your own useful statistics which can be used when specifying the indexList. (Note that the envir argument may be needed to get your function inside of GetMultiNcdf in special circumstances.) Our statistic example is to calculate basin-average and basin-maximum soil moisture on each layer. To do this we need the basin mask (a non-standard field in the hydro grid file) to define the grid indices in the basin and the fraction of each within the basin.

basinMask <- ncdump(paste0(fcPath,'DOMAIN/Fulldom_hydro_OrodellBasin_100m_geogrd.nc'), 'basn_msk_geogrid', quiet=TRUE)
basAvg <- function(var) sum(basinMask*var)/sum(basinMask)
basMax <- function(var) max(ceiling(basinMask)*var)

For fun, we can also calculate the size of the basin while we are here. The LSM pixels are 1km:

basinKm2 <- sum(basinMask)
basinKm2
## [1] 60.8438

Now we can construct the indexList. For the LSM, we want spatial summaries of TRAD and SNEQV at each time. Statistical summaries are requested using list for each variable. The list specifies a grid ‘slice’ (as distinct from a subset, i.e. a slice is how ncdf4 lets one subset matrices) on which to compute a named statistic. The required names in this list are start, end, and stat in the list. Statistic lists are also specified for the individual soil moisture layers, 1-4, in the hydro restart files. Note that the dimensions are reverse order from what is shown in “ncdump -h”. For the discharge, qlink1, we dont supply a list. Instead we only give the integer index where flow is desired. (See the “WRF Hydro Domain and Channel Visualization” vignette to understand why index 1 on the stream channel just happens, conincidentally, to be the basin outlet.)

lsmInds   <- list(TRAD=list(start=c(1,1,1), end=c(21,7,1), stat='basAvg'),
                  SNEQV=list(start=c(1,1,1), end=c(21,7,1), stat='basMax'))
hydroInds <- list(qlink1=1,
                  smc1=list(start=c(1,1), end=c(21,7), stat='basAvg'),
                  smc2=list(start=c(1,1), end=c(21,7), stat='basAvg'),
                  smc3=list(start=c(1,1), end=c(21,7), stat='basAvg'),
                  smc4=list(start=c(1,1), end=c(21,7), stat='basAvg') )
indList <- list(lsm=lsmInds, hydro=hydroInds)

GetMultiNcdf()

All the work is really in setting up the lists. Now we just pass these to GetMultiNcdf. The preceeding two lines setup optional parallelization of the data grabs.

library(doMC)   ## Showing parallelization, which is at the file level within
## Loading required package: foreach
## Loading required package: iterators
## Loading required package: parallel
registerDoMC(3) ## each file groups; pointless to be longer than your timeseries.
fileData <- GetMultiNcdf(file=flList,var=varList, ind=indList, parallel=FALSE)

What did we get?

str(fileData)
## 'data.frame':    1076 obs. of  7 variables:
##  $ POSIXct      : POSIXct, format: "2013-03-01" "2013-03-02" "2013-03-03" "2013-03-04" ...
##  $ inds         : chr  "1:21,1:7,1:1" "1:21,1:7,1:1" "1:21,1:7,1:1" "1:21,1:7,1:1" ...
##  $ stat         : chr  "basAvg" "basAvg" "basAvg" "basAvg" ...
##  $ variable     : Factor w/ 7 levels "TRAD","SNEQV",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ value        : num  267 270 273 276 268 ...
##  $ variableGroup: chr  "TRAD" "TRAD" "TRAD" "TRAD" ...
##  $ fileGroup    : chr  "lsm" "lsm" "lsm" "lsm" ...

The fileData dataframe shows the time (POSIXct) at which certain indices (inds) were summarized with statistic stat for each variable (variable names in the file, e.g. sh2ox) The resulting value is given with the variableGroup (e.g. smc1-4 and not sh2ox) and fileGroup.

Plot the timeseries

This output format is easily plotted using ggplot2.

library(ggplot2)
library(scales)
ggplot(fileData, aes(x=POSIXct, y=value, color=fileGroup)) +
  geom_line() + geom_point() +
  facet_wrap(~variableGroup, scales='free_y', ncol=1) +
  scale_x_datetime(breaks = date_breaks("1 month")) + theme_bw()