Datasets

This page is about loading/writing, examining and operating directly on entire NetCDF datasets. For functions regarding the variables stored in them, see the Variables page.

Both variables and datasets share the functionality of the Attributes section.

NCDatasets.NCDataset — Type

NCDataset(filename::AbstractString, mode = "r";
          format::Symbol = :netcdf4,
          share::Bool = false,
          diskless::Bool = false,
          persist::Bool = false,
          memory::Union{Vector{UInt8},Nothing} = nothing,
          attrib = [])

Load, create, or even overwrite a NetCDF file at filename, depending on mode

"r" (default) : open an existing netCDF file or OPeNDAP URL in read-only mode.
"c" : create a new NetCDF file at filename (an existing file with the same name will be overwritten).
"a" : open filename into append mode (i.e. existing data in the netCDF file is not overwritten and a variable can be added).

If share is true, the NC_SHARE flag is set allowing to have multiple processes to read the file and one writer process. Likewise setting diskless or persist to true will enable the flags NC_DISKLESS or NC_PERSIST flag. More information is available in the NetCDF C-API.

Notice that this does not close the dataset, use close on the result (or see below the do-block).

The optional parameter attrib is an iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below).

Supported format values:

:netcdf4 (default): HDF5-based NetCDF format.
:netcdf4_classic: Only netCDF 3 compatible API features will be used.
:netcdf3_classic: classic netCDF format supporting only files smaller than 2GB.
:netcdf3_64bit_offset: improved netCDF format supporting files larger than 2GB.
:netcdf5_64bit_data: improved netCDF format supporting 64-bit integer data types.

Files can also be open and automatically closed with a do block.

NCDataset("file.nc") do ds
    data = ds["temperature"][:,:]
end

Here is an attribute example:

using DataStructures
NCDataset("file.nc", "c", attrib = OrderedDict("title" => "my first netCDF file")) do ds
   defVar(ds,"temp",[10.,20.,30.],("time",))
end;

The NetCDF dataset can also be a memory as a vector of bytes. A non-empty string a filename is still required, for example:

using NCDataset, HTTP
resp = HTTP.get("https://www.unidata.ucar.edu/software/netcdf/examples/ECMWF_ERA-40_subset.nc")
ds = NCDataset("some_string","r",memory = resp.body)
total_precipitation = ds["tp"][:,:,:]
close(ds)

Dataset is an alias of NCDataset.

mfds = NCDataset(fnames, mode = "r"; aggdim = nothing, deferopen = true,
              isnewdim = false,
              constvars = [])

Opens a multi-file dataset in read-only "r" or append mode "a". fnames is a vector of file names.

Variables are aggregated over the first unlimited dimension or over the dimension aggdim if specified. Variables without the dimensions aggdim are not aggregated. All variables containing the dimension aggdim are aggregated. The variable who do not contain the dimension aggdim are assumed constant.

If variables should be aggregated over a new dimension (not present in the NetCDF file), one should set isnewdim to true. All NetCDF files should have the same variables, attributes and groupes. Per default, all variables will have an additional dimension unless they are marked as constant using the constvars parameter.

The append mode is only implemented when deferopen is false. If deferopen is false, all files are opened at the same time. However the operating system might limit the number of open files. In Linux, the limit can be controled with the command ulimit.

All metadata (attributes and dimension length are assumed to be the same for all NetCDF files. Otherwise reading the attribute of a multi-file dataset would be ambiguous. An exception to this rule is the length of the dimension over which the data is aggregated. This aggregation dimension can varify from file to file.

Setting the experimental flag _aggdimconstant to true means that the length of the aggregation dimension is constant. This speeds up the creating of a multi-file dataset as only the metadata of the first file has to be loaded.

Examples:

You can use Glob.jl to make fnames from a file pattern, e.g.

using NCDatasets, Glob
ds = NCDataset(glob("ERA5_monthly3D_reanalysis_*.nc"))

Aggregation over a new dimension:

using NCDatasets
for i = 1:3
  NCDataset("foo$i.nc","c") do ds
    defVar(ds,"data",[10., 11., 12., 13.], ("lon",))
  end
end

ds = NCDataset(["foo$i.nc" for i = 1:3],aggdim = "sample", isnewdim = true)
size(ds["data"])
# output
# (4, 3)

Useful functions that operate on datasets are:

Base.keys — Method

keys(ds::NCDataset)

Return a list of all variables names in NCDataset ds.

Base.haskey — Function

haskey(ds::NCDataset,name)
haskey(d::Dimensions,name)
haskey(ds::Attributes,name)

Return true if the NCDataset ds (or dimension/attribute list) has a variable (dimension/attribute) with the name name. For example:

ds = NCDataset("/tmp/test.nc","r")
if haskey(ds,"temperature")
    println("The file has a variable 'temperature'")
end

if haskey(ds.dim,"lon")
    println("The file has a dimension 'lon'")
end

This example checks if the file /tmp/test.nc has a variable with the name temperature and a dimension with the name lon.

Base.haskey(a::Attributes,name::SymbolOrString)

Check if name is an attribute

Base.getindex — Method

v = getindex(ds::AbstractDataset, varname::SymbolOrString)

Return the variable varname in the dataset ds as a CFVariable. The following CF convention are honored when the variable is indexed:

_FillValue or missing_value (which can be a list) will be returned as missing.
scale_factor and add_offset are applied (output = scale_factor * data_in_file + add_offset)
time variables (recognized by the units attribute and possibly the calendar attribute) are returned usually as DateTime object. Note that CFTime.DateTimeAllLeap, CFTime.DateTimeNoLeap and CF.TimeDateTime360Day cannot be converted to the proleptic gregorian calendar used in julia and are returned as such. (See CFTime.jl for more information about those date types.) If a calendar is defined but not among the ones specified in the CF convention, then the data in the file is not converted into a date structure.

A call getindex(ds, varname) is usually written as ds[varname].

If variable represents a cell boundary, the attributes calendar and units of the related variables are used, if they are not specified. For example:

dimensions:
  time = UNLIMITED; // (5 currently)
  nv = 2;
variables:
  double time(time);
    time:long_name = "time";
    time:units = "hours since 1998-04-019 06:00:00";
    time:bounds = "time_bnds";
  double time_bnds(time,nv);

In this case, the variable time_bnds uses the units and calendar of time because both variables are related thought the bounds attribute following the CF conventions.

See also cfvariable(ds, varname).

CommonDataModel.variable — Function

v = variable(ds::NCDataset,varname::String)

Return the NetCDF variable varname in the dataset ds as a NCDataset.Variable. No scaling or other transformations are applied when the variable v is indexed.

CommonDataModel.variable(ds::AbstractDataset,variablename::SymbolOrString)

Return the variable with the name variablename from the data set ds.

CommonDataModel.cfvariable — Function

v = cfvariable(ds::NCDataset,varname::SymbolOrString; <attrib> = <value>)

Return the variable varname in the dataset ds as a NCDataset.CFVariable. The keyword argument <attrib> are the attributes (fillvalue, missing_value, scale_factor, add_offset, units and calendar) relevant to the CF conventions. By specifing the value of these attributes, the one can override the value specified in the data set. If the attribute is set to nothing, then the attribute is not loaded and the corresponding transformation is ignored. This function is similar to ds[varname] with the additional flexibility that some variable attributes can be overridden.

Example:

NCDataset("foo.nc","c") do ds
  defVar(ds,"data",[10., 11., 12., 13.], ("time",), attrib = Dict(
      "add_offset" => 10.,
      "scale_factor" => 0.2))
end

# The stored (packed) valued are [0., 5., 10., 15.]
# since 0.2 .* [0., 5., 10., 15.] .+ 10 is [10., 11., 12., 13.]

ds = NCDataset("foo.nc");

@show ds["data"].var[:]
# returns [0., 5., 10., 15.]

@show cfvariable(ds,"data")[:]
# returns [10., 11., 12., 13.]

# neither add_offset nor scale_factor are applied
@show cfvariable(ds,"data", add_offset = nothing, scale_factor = nothing)[:]
# returns [0, 5, 10, 15]

# add_offset is applied but not scale_factor
@show cfvariable(ds,"data", scale_factor = nothing)[:]
# returns [10, 15, 20, 25]

# 0 is declared as the fill value (add_offset and scale_factor are applied as usual)
@show cfvariable(ds,"data", fillvalue = 0)[:]
# returns [missing, 11., 12., 13.]

# Use the time units: days since 2000-01-01
@show cfvariable(ds,"data", units = "days since 2000-01-01")[:]
# returns [DateTime(2000,1,11), DateTime(2000,1,12), DateTime(2000,1,13), DateTime(2000,1,14)]

close(ds)

CommonDataModel.sync — Function

sync(ds::NCDataset)

Write all changes in NCDataset ds to the disk.

Base.close — Function

close(ds::NCDataset)

Close the NCDataset ds. All pending changes will be written to the disk.

CommonDataModel.path — Function

path(ds::NCDataset)

Return the file path (or the opendap URL) of the NCDataset ds

CommonDatamodel.path(ds::AbstractDataset)

File path of the data set ds.

NCDatasets.ncgen — Function

ncgen(fname; ...)
ncgen(fname,jlname; ...)

Generate the Julia code that would produce a NetCDF file with the same metadata as the NetCDF file fname. The code is placed in the file jlname or printed to the standard output. By default the new NetCDF file is called filename.nc. This can be changed with the optional parameter newfname.

CommonDataModel.varbyattrib — Function

varbyattrib(ds, attname = attval)

Returns a list of variable(s) which has the attribute attname matching the value attval in the dataset ds. The list is empty if the none of the variables has the match. The output is a list of CFVariables.

Examples

Load all the data of the first variable with standard name "longitude" from the NetCDF file results.nc.

julia> ds = NCDataset("results.nc", "r");
julia> data = varbyattrib(ds, standard_name = "longitude")[1][:]

Base.write — Function

write(dest::AbstractDataset, src::AbstractDataset; include = keys(src), exclude = [])

Write the variables of src dataset into an empty dest dataset (which must be opened in mode "a" or "c"). The keywords include and exclude configure which variable of src should be included (by default all), or which should be excluded (by default none).

If the first argument is a file name, then the dataset is open in create mode ("c").

This function is useful when you want to save the dataset from a multi-file dataset.

To save a subset, one can use the view function view to virtually slice a dataset:

Example

NCDataset(fname_src) do ds
    write(fname_slice,view(ds, lon = 2:3))
end

All variables in the source file fname_src with a dimension lon will be sliced along the indices 2:3 for the lon dimension. All attributes (and variables without a dimension lon) will be copied over unmodified.

Notice that DateTime-structures from CFTime are used to represent time for non-standard calendars. Otherwise, we attempt to use standard structures from the Julia standard library Dates.

Groups

A NetCDF group is a dataset (with variables, attributes, dimensions and sub-groups) and can be arbitrarily nested. A group is created with defGroup and accessed via the group property of a NCDataset.

# create the variable "temperature" inside the group "forecast"
ds = NCDataset("results.nc", "c");
ds_forecast = defGroup(ds,"forecast")
defVar(ds_forecast,"temperature",randn(10,11,12),("lon","lat","time"))

# load the variable "temperature" inside the group "forecast"
forecast_temp = ds.group["forecast"]["temperature"][:,:,:]
close(ds)

CommonDataModel.defGroup — Function

defGroup(ds::NCDataset,groupname; attrib = []))

Create the group with the name groupname in the dataset ds. attrib is a list of attribute name and attribute value pairs (see NCDataset).

group = CommonDatamodel.defGroup(ds::AbstractDataset,name::SymbolOrString)

Create an empty sub-group with the name name in the data set ds. The group is a sub-type of AbstractDataset.

Base.getindex — Method

group = getindex(g::Groups,groupname::AbstractString)

Return the NetCDF group with the name groupname from the parent group g.

For example:

ds = NCDataset("results.nc", "r");
forecast_group = ds.group["forecast"]
forecast_temp = forecast_group["temperature"]

Base.keys — Method

names = keys(g::Groups)

Return the names of all subgroubs of the group g.

Common methods

One can iterate over a dataset, attribute list, dimensions and NetCDF groups.

for (varname,var) in ds
    # all variables
    @show (varname,size(var))
end

for (attribname,attrib) in ds.attrib
    # all attributes
    @show (attribname,attrib)
end

for (groupname,group) in ds.groups
    # all groups
    @show (groupname,group)
end