Variables

Variables (such as CFVariable) are the quantities contained within a NetCDF dataset. See the Datasets page for how to obtain them from a dataset.

Different types of arrays are involved when working with NCDatasets. For instance, assume that test.nc is a file with a Float32 variable called variable.

using NCDatasets
ds = NCDataset("test.nc")
ncvar_cf = ds["variable"]

The variable ncvar_cf has the type CFVariable. No data is actually loaded from disk, but you can query its size, number of dimensions, number of elements, etc., using the functions size, ndims and length, as if ncvar_cf were an ordinary Julia array.

To load the variable ncvar_cf into memory, you can convert it into an array with:

data = Array(ncvar_cf)
# or
data = ncvar_cf |> Array
# or if ndims(ncvar_cf) == 2
data = ncvar_cf[:,:]

Since NCDatasets 0.13, the syntax ncvar_cf[:] flattens the array and is not equivalent to the above (unless ncvar_cf is a vector).

You can load only sub-parts of the variable into memory by indexing each dimension:

ncvar_cf[1:5, 10:20]
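
Putting the pieces above together, here is a self-contained sketch; it generates the file itself, so the names (test.nc, variable) and the 10×20 size are only illustrative:

```julia
using NCDatasets

# generate a small sample file so the snippet is self-contained
NCDataset("test.nc", "c") do ds
    defVar(ds, "variable", rand(Float32, 10, 20), ("lon", "lat"))
end

ds = NCDataset("test.nc")
ncvar_cf = ds["variable"]

sz = size(ncvar_cf)       # (10, 20) -- queried without loading any data
nd = ndims(ncvar_cf)      # 2
n  = length(ncvar_cf)     # 200

subset = ncvar_cf[1:5, 10:20]   # loads only a 5×11 sub-array
flat   = ncvar_cf[:]            # flattened 200-element vector (NCDatasets ≥ 0.13)
close(ds)
```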

A scalar variable can be loaded using [], for example:

using NCDatasets
NCDataset("test_scalar.nc","c") do ds
    # the list of dimension names is simply `()` as a scalar does not have dimensions
    defVar(ds,"scalar",42,())
end

ds = NCDataset("test_scalar.nc")
value = ds["scalar"][] # 42

To load a whole variable from a NetCDF file while ignoring attributes like scale_factor, add_offset, _FillValue and time units, one can use the property var or the function variable, for example:

using NCDatasets
using Dates
data = [DateTime(2000,1,1), DateTime(2000,1,2)]
NCDataset("test_file.nc","c") do ds
    defVar(ds,"time",data,("time",), attrib = Dict(
               "units" => "days since 2000-01-01"))
end;

ds = NCDataset("test_file.nc")
ncvar = ds["time"].var
# or
ncvar = variable(ds,"time")
data = ncvar[:] # here [0., 1.]

The variable ncvar can be indexed in the same way as ncvar_cf explained above.

Note

NCDatasets.Variable and NCDatasets.CFVariable implement the interface of AbstractArray. It is thus possible to call any function that accepts an AbstractArray. But functions like mean, sum (and many more) would load every element individually which is very inefficient for large fields read from disk. You should instead convert such a variable to a standard Julia Array and then do computations with it. See also the performance tips for more information.
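
As an illustration of this tip, the following sketch (with a generated placeholder file) loads the data once into an Array before computing a mean; Statistics.mean is from the standard library:

```julia
using NCDatasets, Statistics

# generate a sample file (file and variable names are placeholders)
NCDataset("test.nc", "c") do ds
    defVar(ds, "variable", fill(1.0f0, 10, 20), ("lon", "lat"))
end

ds = NCDataset("test.nc")
ncvar_cf = ds["variable"]

# slow for large files: mean(ncvar_cf) would read every element individually
# fast: load the data once into an ordinary Array, then compute
data = Array(ncvar_cf)
m = mean(data)   # 1.0f0 for this all-ones sample
close(ds)
```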

The following functions are convenient for working with variables:

Base.size - Method
sz = size(var::CFVariable)

Return a tuple of integers with the size of the variable var.

Note

Note that the size of a variable can change, e.g. for a variable with an unlimited dimension.

source
CommonDataModel.dimnames - Function
names = dimnames(ds::AbstractNCDataset; parents = false)

Return all dimension names defined in ds. When parents is true, the dimension names of the parent groups are also returned (default is false).

source
dimnames(v::Variable)

Return a tuple of strings with the dimension names of the variable v.

source
CommonDataModel.dimnames(v::AbstractVariable)

Return an iterable of the dimension names of the variable v.

source
dimnames(v::CFVariable)

Return a tuple of strings with the dimension names of the variable v.

source
CommonDataModel.dimnames(ds::AbstractDataset)

Return an iterable of all dimension names in ds. This information can also be accessed using the property ds.dim:

Examples

ds = NCDataset("results.nc", "r");
dimnames = keys(ds.dim)
source
NCDatasets.dimsize - Function
dimsize(v::CFVariable)

Get the size of a CFVariable as a named tuple of dimension → length.
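
A short sketch (file and variable names are illustrative):

```julia
using NCDatasets

# generate a sample file with a 10×5 variable
NCDataset("test_dimsize.nc", "c") do ds
    defVar(ds, "temp", zeros(10, 5), ("lon", "lat"))
end

ds = NCDataset("test_dimsize.nc")
sizes = dimsize(ds["temp"])   # named tuple: (lon = 10, lat = 5)
close(ds)
```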

source
CommonDataModel.name - Function
name(ds::NCDataset)

Return the group name of the NCDataset ds

source
name(v::Variable)

Return the name of the NetCDF variable v.

source
CommonDataModel.name(ds::AbstractDataset)

Name of the group of the data set ds. For a data set containing only a single group, this will always be the root group "/".

source
CommonDataModel.name(v::AbstractVariable)

Return the name of the variable v as a string.

source
NCDatasets.NCDataset - Method
mfds = NCDataset(fnames, mode = "r"; aggdim = nothing, deferopen = true,
              isnewdim = false,
              constvars = [])

Opens a multi-file dataset in read-only "r" or append mode "a". fnames is a vector of file names.

Variables are aggregated over the first unlimited dimension or over the dimension aggdim if specified. All variables containing the dimension aggdim are aggregated; variables that do not contain the dimension aggdim are assumed to be constant.

If variables should be aggregated over a new dimension (not present in the NetCDF files), one should set isnewdim to true. All NetCDF files should have the same variables, attributes and groups. By default, all variables will have an additional dimension unless they are marked as constant using the constvars parameter.

The append mode is only implemented when deferopen is false. If deferopen is false, all files are opened at the same time. However, the operating system might limit the number of open files. On Linux, the limit can be controlled with the command ulimit.

All metadata (attributes and dimension lengths) is assumed to be the same for all NetCDF files; otherwise, reading the attributes of a multi-file dataset would be ambiguous. An exception to this rule is the length of the dimension over which the data is aggregated; this aggregation dimension can vary from file to file.

Setting the experimental flag _aggdimconstant to true means that the length of the aggregation dimension is constant. This speeds up the creation of a multi-file dataset, as only the metadata of the first file has to be loaded.

Examples:

You can use Glob.jl to make fnames from a file pattern, e.g.

using NCDatasets, Glob
ds = NCDataset(glob("ERA5_monthly3D_reanalysis_*.nc"))

Aggregation over a new dimension:

using NCDatasets
for i = 1:3
  NCDataset("foo$i.nc","c") do ds
    defVar(ds,"data",[10., 11., 12., 13.], ("lon",))
  end
end

ds = NCDataset(["foo$i.nc" for i = 1:3],aggdim = "sample", isnewdim = true)
size(ds["data"])
# output
# (4, 3)
source
NCDatasets.nomissing - Function
a = nomissing(da)

Return the values of the array da of type Array{Union{T,Missing},N} (potentially containing missing values) as a regular Julia array a of the same element type. An error is raised if the array contains at least one missing value.

source
a = nomissing(da,value)

Return the values of the array da of type AbstractArray{Union{T,Missing},N} as a regular Julia array a by replacing all missing values by value (converted to type T). This function is identical to coalesce.(da,T(value)) where T is the element type of da.

Example:

julia> nomissing([missing,1.,2.],NaN)
# returns [NaN, 1.0, 2.0]
source
CommonDataModel.fillvalue - Function
fv = fillvalue(v::Variable)
fv = fillvalue(v::CFVariable)

Return the fill-value of the variable v.

source
fillvalue(::Type{Int8})
fillvalue(::Type{UInt8})
fillvalue(::Type{Int16})
fillvalue(::Type{UInt16})
fillvalue(::Type{Int32})
fillvalue(::Type{UInt32})
fillvalue(::Type{Int64})
fillvalue(::Type{UInt64})
fillvalue(::Type{Float32})
fillvalue(::Type{Float64})
fillvalue(::Type{Char})
fillvalue(::Type{String})

Default fill-value for the given type from NetCDF.
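
For example (the values returned are the NC_FILL_* constants of the underlying NetCDF C library):

```julia
using NCDatasets

# default fill values for a couple of types
fv_byte   = fillvalue(Int8)      # -127 (NC_FILL_BYTE)
fv_double = fillvalue(Float64)   # a very large double (NC_FILL_DOUBLE)
```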

source
CommonDataModel.load! - Function
NCDatasets.load!(ncvar::Variable, data, indices)

Loads a NetCDF variable ncvar in-place and puts the result in data along the specified indices. One can use @inbounds to annotate code where bounds checking can be elided by the compiler (which typically requires type-stable code).

using NCDatasets
ds = NCDataset("file.nc")
ncv = ds["vgos"].var;
# data must have the right shape and type
data = zeros(eltype(ncv),size(ncv));
NCDatasets.load!(ncv,data,:,:,:)
# or
# @inbounds NCDatasets.load!(ncv,data,:,:,:)
# loading a subset
data = zeros(5); # must have the right shape and type
load!(ds["temp"].var,data,:,1) # loads the 1st column
close(ds)
Note

For a NetCDF variable of type NC_CHAR, the element type of the data array must be UInt8 and cannot be the Julia Char type, because the Julia Char type uses 4 bytes while the NetCDF NC_CHAR uses only 1 byte.

source
CommonDataModel.load!(ncvar::CFVariable, data, buffer, indices)

Loads a NetCDF (or other format) variable ncvar in-place and puts the result in data (an array of eltype(ncvar)) along the specified indices. buffer is a temporary array of the same size as data, but its element type should be eltype(ncvar.var), i.e. the corresponding type in the file (before applying scale_factor, add_offset and masking fill values). Scaling and masking will be applied to the array data.

data and buffer can be the same array if eltype(ncvar) == eltype(ncvar.var).

Example:

# create some test array
Dataset("file.nc","c") do ds
    defDim(ds,"time",3)
    ncvar = defVar(ds,"vgos",Int16,("time",),attrib = ["scale_factor" => 0.1])
    ncvar[:] = [1.1, 1.2, 1.3]
    # store 11, 12 and 13 as scale_factor is 0.1
end


ds = Dataset("file.nc")
ncv = ds["vgos"];
# data and buffer must have the right shape and type
data = zeros(eltype(ncv),size(ncv)); # here Vector{Float64}
buffer = zeros(eltype(ncv.var),size(ncv)); # here Vector{Int16}
NCDatasets.load!(ncv,data,buffer,:,:,:)
close(ds)
source

Creating a variable

CommonDataModel.defVar - Function
defVar(ds::NCDataset,name,vtype,dimnames; kwargs...)
defVar(ds::NCDataset,name,data,dimnames; kwargs...)

Define a variable with the name name in the dataset ds. vtype can be a Julia type from the table below (with the corresponding NetCDF type). The parameter dimnames is a tuple with the names of the dimensions. For a scalar, this parameter is the empty tuple (). The variable is returned (of the type CFVariable).

Instead of providing the variable type, one can also directly give the data data, which will be used to fill the NetCDF variable. In this case, dimensions with the appropriate sizes will be created as required using the names in dimnames.

If data is a vector or array of DateTime objects, then the dates are saved as double-precision floats with the units "days since 1900-01-01 00:00:00" (unless a time unit is specified with the attrib keyword as described below). Dates are converted to the default calendar of the CF convention, which is the mixed Julian/Gregorian calendar.

Keyword arguments

  • fillvalue: A value written to the NetCDF file to indicate missing data. It will be stored in the _FillValue attribute. NCDatasets does not implicitly use the default NetCDF fill values when reading data.
  • chunksizes: A vector of integers setting the chunk size. The total size of a chunk must be less than 4 GiB.
  • deflatelevel: Compression level: 0 (default) means no compression and 9 means maximum compression. Each chunk will be compressed individually.
  • shuffle: If true, the shuffle filter is activated which can improve the compression ratio.
  • checksum: The checksum method can be :fletcher32 or :nochecksum (checksumming is disabled, which is the default)
  • attrib: An iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below)
  • typename (string): The name of the NetCDF type required for vlen arrays

chunksizes, deflatelevel, shuffle and checksum can only be set on NetCDF 4 files. Compression of strings and variable-length arrays is not supported by the underlying NetCDF library.
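
As an illustration of these keyword arguments, a sketch creating a chunked and compressed variable (the file name, variable name, chunk sizes and compression level are arbitrary choices):

```julia
using NCDatasets

# create a chunked, compressed variable (NetCDF 4 files only)
NCDataset("compressed.nc", "c") do ds
    defVar(ds, "temp", zeros(Float32, 100, 100), ("lon", "lat"),
           chunksizes = [100, 10],    # each chunk spans all lon and 10 lat values
           deflatelevel = 5,          # moderate compression
           shuffle = true,            # byte shuffling often improves compression
           checksum = :fletcher32)    # enable checksumming
end

ds = NCDataset("compressed.nc")
storage, chunksizes = chunking(ds["temp"].var)   # storage == :chunked
close(ds)
```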

NetCDF data types

NetCDF Type    Julia Type
NC_BYTE        Int8
NC_UBYTE       UInt8
NC_SHORT       Int16
NC_INT         Int32
NC_INT64       Int64
NC_FLOAT       Float32
NC_DOUBLE      Float64
NC_CHAR        Char
NC_STRING      String

Dimension ordering

The data is stored in the NetCDF file in the same order as it is stored in memory. As Julia uses column-major ordering for arrays, the order of dimensions will appear reversed when the data is loaded in languages or programs using row-major ordering, such as C/C++, Python/NumPy or the tools ncdump/ncgen (NetCDF CDL). NumPy can also use column-major ordering, but row-major order is the default. For the column-major interpretation of the dimensions (as in Julia), the CF convention recommends the order "longitude" (X), "latitude" (Y), "height or depth" (Z) and "date or time" (T) (if applicable). All other dimensions should, whenever possible, be placed to the right of the spatiotemporal dimensions.

Example:

In this example, scale_factor and add_offset are applied when the data is saved.

julia> using DataStructures
julia> data = randn(3,5)
julia> NCDataset("test_file.nc","c") do ds
          defVar(ds,"temp",data,("lon","lat"), attrib = OrderedDict(
             "units" => "degree_Celsius",
             "add_offset" => -273.15,
             "scale_factor" => 0.1,
             "long_name" => "Temperature"
          ))
       end;
Note

If the attributes _FillValue, missing_value, add_offset, scale_factor, units and calendar are used, they should be defined when calling defVar by using the parameter attrib as shown in the example above.

source
v = CommonDataModel.defVar(ds::AbstractDataset,src::AbstractVariable)
v = CommonDataModel.defVar(ds::AbstractDataset,name::SymbolOrString,src::AbstractVariable)

Define and return the variable in the data set ds copied from the variable src. The dimension names, attributes and data are copied from src, as well as the variable name (unless provided by name).

source

Storage parameter of a variable

CommonDataModel.chunking - Function
storage,chunksizes = chunking(v::Variable)

Return the storage type (:contiguous or :chunked) and the chunk sizes of the variable v. Note that chunking reports the same information as nc_inq_var_chunking and therefore considers variables with an unlimited dimension as :contiguous.

source
storage,chunksizes = chunking(v::MFVariable)
storage,chunksizes = chunking(v::MFCFVariable)

Return the storage type (:contiguous or :chunked) and the chunk sizes of the variable v corresponding to the first file. If the first file in the collection is chunked, its storage attributes are returned. If the first file is contiguous, the multi-file variable is still reported as chunked, with chunk sizes equal to the size of the variable in the first file.

source
CommonDataModel.deflate - Function
isshuffled,isdeflated,deflate_level = deflate(v::Variable)

Return compression information of the variable v. If shuffle is true, then shuffling (byte interlacing) is activated. If deflate is true, then the data chunks (see chunking) are compressed using the compression level deflate_level (0 means no compression and 9 means maximum compression).

source
CommonDataModel.checksum - Function
checksummethod = checksum(v::Variable)

Return the checksum method of the variable v, which can either be :fletcher32 or :nochecksum.

source

Coordinate variables and cell boundaries

CommonDataModel.coord - Function
cv = coord(v::Union{CFVariable,Variable},standard_name)

Find the coordinate of the variable v by the standard name standard_name or by some standardized heuristics based on units. If the heuristics fail to detect the coordinate, consider modifying the file to add the standard_name attribute. All dimensions of the coordinate must also be dimensions of the variable v.

Example

using NCDatasets
ds = NCDataset("file.nc")
ncv = ds["SST"]
lon = coord(ncv,"longitude")[:]
lat = coord(ncv,"latitude")[:]
v = ncv[:]
close(ds)
source
CommonDataModel.bounds - Function
b = bounds(ncvar::NCDatasets.CFVariable)

Return the CFVariable corresponding to the bounds attribute of the variable ncvar. The time units and calendar of ncvar are used, but not the attributes controlling the packing of data: scale_factor, add_offset and _FillValue.
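
A minimal sketch, assuming a time variable whose bounds attribute points to a time_bnds variable (all names here are illustrative; the file is generated so the snippet is self-contained):

```julia
using NCDatasets

# create a time axis with cell boundaries
NCDataset("test_bounds.nc", "c") do ds
    defVar(ds, "time", [0.5, 1.5], ("time",), attrib = Dict(
        "units"  => "days since 2000-01-01",
        "bounds" => "time_bnds"))
    # 2 boundary values (start/end of each cell) per time step
    defVar(ds, "time_bnds", [0.0 1.0; 1.0 2.0], ("nv", "time"))
end

ds = NCDataset("test_bounds.nc")
b = bounds(ds["time"])   # CFVariable using the time units of "time"
b_data = b[:, :]         # interval edges, decoded with the time units
close(ds)
```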

source