Variables
Variables (e.g. CFVariable) are the quantities contained within a NetCDF dataset. See the Datasets page on how to obtain them from a dataset.
Different types of arrays are involved when working with NCDatasets. For instance, assume that test.nc is a file with a Float32 variable called variable.
using NCDatasets
ds = NCDataset("test.nc")
ncvar_cf = ds["variable"]
The variable ncvar_cf has the type CFVariable. No data is actually loaded from disk, but you can query its size, number of dimensions, number of elements, etc., using the functions size, ndims and length, as if ncvar_cf were an ordinary Julia array.
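For instance, a minimal sketch that first creates such a test.nc file (with made-up example data) and then queries the variable without loading it:

```julia
using NCDatasets

# create a small test.nc with a Float32 variable (hypothetical example data)
NCDataset("test.nc", "c") do ds
    defVar(ds, "variable", rand(Float32, 10, 5), ("lon", "lat"))
end

ds = NCDataset("test.nc")
ncvar_cf = ds["variable"]

# these queries read only metadata, not the data itself
sz = size(ncvar_cf)    # (10, 5)
nd = ndims(ncvar_cf)   # 2
n  = length(ncvar_cf)  # 50
close(ds)
```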
To load the variable ncvar_cf into memory, you can convert it into an array with:
data = Array(ncvar_cf)
# or
data = ncvar_cf |> Array
# or if ndims(ncvar_cf) == 2
data = ncvar_cf[:,:]
Since NCDatasets 0.13, the syntax ncvar_cf[:] flattens the array and is not equivalent to the above (unless ncvar_cf is a vector).
You can load only sub-parts of the variable into memory by indexing each dimension:
ncvar_cf[1:5, 10:20]
A scalar variable can be loaded using [], for example:
using NCDatasets
NCDataset("test_scalar.nc","c") do ds
# the list of dimension names is simply () as a scalar does not have dimensions
defVar(ds,"scalar",42,())
end
ds = NCDataset("test_scalar.nc")
value = ds["scalar"][] # 42
To load a whole variable from a NetCDF file while ignoring attributes like scale_factor, add_offset, _FillValue and time units, one can use the property var or the function variable, for example:
using NCDatasets
using Dates
data = [DateTime(2000,1,1), DateTime(2000,1,2)]
NCDataset("test_file.nc","c") do ds
defVar(ds,"time",data,("time",), attrib = Dict(
"units" => "days since 2000-01-01"))
end;
ds = NCDataset("test_file.nc")
ncvar = ds["time"].var
# or
ncvar = variable(ds,"time")
data = ncvar[:] # here [0., 1.]
The variable ncvar can be indexed in the same way as ncvar_cf, as explained above.
NCDatasets.Variable and NCDatasets.CFVariable implement the interface of AbstractArray. It is thus possible to call any function that accepts an AbstractArray. But functions like mean and sum (and many more) would load every element individually, which is very inefficient for large fields read from disk. You should instead convert such a variable to a standard Julia Array and then do the computations with it. See also the performance tips for more information.
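For example, a sketch of this pattern (the file mean_example.nc and its contents are made up for illustration):

```julia
using NCDatasets
using Statistics

# create a small example file (hypothetical data)
NCDataset("mean_example.nc", "c") do ds
    defVar(ds, "temp", fill(2.0f0, 100, 100), ("lon", "lat"))
end

ds = NCDataset("mean_example.nc")
ncvar_cf = ds["temp"]

# inefficient for large variables: mean(ncvar_cf) reads elements one by one;
# efficient: load the data once into a regular Array, then compute in memory
m = mean(Array(ncvar_cf))  # 2.0f0
close(ds)
```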
The following functions are convenient for working with variables:
Base.size — Method
sz = size(var::CFVariable)
Return a tuple of integers with the size of the variable var.
Note that the size of a variable can change, e.g. for a variable with an unlimited dimension.
CommonDataModel.dimnames — Function
names = dimnames(ds::AbstractNCDataset; parents = false)
Return all dimension names defined in ds. When parents is true, the names of parent groups are also returned (default is false).
dimnames(v::Variable)
Return a tuple of strings with the dimension names of the variable v.
CommonDataModel.dimnames(v::AbstractVariable)
Return an iterable of the dimension names of the variable v.
dimnames(v::CFVariable)
Return a tuple of strings with the dimension names of the variable v.
CommonDataModel.dimnames(ds::AbstractDataset)
Return an iterable of all dimension names in ds. This information can also be accessed using the property ds.dim:
Examples
ds = NCDataset("results.nc", "r");
dimnames = keys(ds.dim)
NCDatasets.dimsize — Function
dimsize(v::CFVariable)
Get the size of a CFVariable as a named tuple of dimension name → length.
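A minimal sketch (the file dimsize_example.nc and its dimensions are hypothetical):

```julia
using NCDatasets

# hypothetical example file with two dimensions
NCDataset("dimsize_example.nc", "c") do ds
    defVar(ds, "temp", zeros(Float32, 4, 3), ("lon", "lat"))
end

ds = NCDataset("dimsize_example.nc")
nt = dimsize(ds["temp"])  # (lon = 4, lat = 3)
close(ds)
```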
CommonDataModel.name — Function
name(ds::NCDataset)
Return the group name of the NCDataset ds.
name(v::Variable)
Return the name of the NetCDF variable v.
CommonDataModel.name(ds::AbstractDataset)
Name of the group of the data set ds. For a data set containing only a single group, this will always be the root group "/".
CommonDataModel.name(v::AbstractVariable)
Return the name of the variable v as a string.
NCDatasets.renameVar — Function
renameVar(ds::NCDataset,oldname,newname)
Rename the variable called oldname to newname.
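A minimal sketch (file and variable names are hypothetical); note that renaming requires a writable dataset:

```julia
using NCDatasets

# hypothetical example file with a variable to rename
NCDataset("rename_example.nc", "c") do ds
    defVar(ds, "oldname", collect(1:3), ("x",))
end

# open in append mode "a" so the dataset can be modified
ds = NCDataset("rename_example.nc", "a")
renameVar(ds, "oldname", "newname")
has_new = haskey(ds, "newname")  # true
close(ds)
```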
NCDatasets.NCDataset — Method
mfds = NCDataset(fnames, mode = "r"; aggdim = nothing, deferopen = true,
                 isnewdim = false,
                 constvars = [])
Opens a multi-file dataset in read-only "r" or append mode "a". fnames is a vector of file names.
Variables are aggregated over the first unlimited dimension or over the dimension aggdim if specified. All variables containing the dimension aggdim are aggregated; variables that do not contain it are assumed to be constant.
If variables should be aggregated over a new dimension (not present in the NetCDF file), one should set isnewdim to true. All NetCDF files should have the same variables, attributes and groups. By default, every variable will have an additional dimension unless it is marked as constant using the constvars parameter.
The append mode is only implemented when deferopen is false. If deferopen is false, all files are opened at the same time; however, the operating system might limit the number of open files. On Linux, the limit can be controlled with the command ulimit.
All metadata (attributes and dimension lengths) are assumed to be the same for all NetCDF files; otherwise reading an attribute of a multi-file dataset would be ambiguous. An exception to this rule is the length of the dimension over which the data is aggregated, which can vary from file to file.
Setting the experimental flag _aggdimconstant to true means that the length of the aggregation dimension is constant. This speeds up the creation of a multi-file dataset, as only the metadata of the first file has to be loaded.
Examples:
You can use Glob.jl to construct fnames from a file pattern, e.g.
using NCDatasets, Glob
ds = NCDataset(glob("ERA5_monthly3D_reanalysis_*.nc"))
Aggregation over a new dimension:
using NCDatasets
for i = 1:3
NCDataset("foo$i.nc","c") do ds
defVar(ds,"data",[10., 11., 12., 13.], ("lon",))
end
end
ds = NCDataset(["foo$i.nc" for i = 1:3],aggdim = "sample", isnewdim = true)
size(ds["data"])
# output
# (4, 3)
NCDatasets.nomissing — Function
a = nomissing(da)
Return the values of the array da of type Array{Union{T,Missing},N} (potentially containing missing values) as a regular Julia array a of the same element type. It raises an error if the array contains at least one missing value.
a = nomissing(da,value)
Return the values of the array da of type AbstractArray{Union{T,Missing},N} as a regular Julia array a by replacing all missing values by value (converted to type T). This function is identical to coalesce.(da,T(value)) where T is the element type of da.
Example:
julia> nomissing([missing,1.,2.],NaN)
# returns [NaN, 1.0, 2.0]
CommonDataModel.fillvalue — Function
fv = fillvalue(v::Variable)
fv = fillvalue(v::CFVariable)
Return the fill-value of the variable v.
fillvalue(::Type{Int8})
fillvalue(::Type{UInt8})
fillvalue(::Type{Int16})
fillvalue(::Type{UInt16})
fillvalue(::Type{Int32})
fillvalue(::Type{UInt32})
fillvalue(::Type{Int64})
fillvalue(::Type{UInt64})
fillvalue(::Type{Float32})
fillvalue(::Type{Float64})
fillvalue(::Type{Char})
fillvalue(::Type{String})
Default fill-value for the given type from NetCDF.
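For example, the default NetCDF fill values for a few types:

```julia
using NCDatasets

# default NetCDF fill values (NC_FILL_DOUBLE and NC_FILL_INT)
fv_double = fillvalue(Float64)  # 9.969209968386869e36
fv_int = fillvalue(Int32)       # -2147483647
```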
NCDatasets.loadragged — Function
data = loadragged(ncvar,index::Union{Colon,UnitRange,Integer})
Load data from ncvar in the contiguous ragged array representation as a vector of vectors. It is typically used to load a list of profiles or time series, each of a different length.
The indexed ragged array representation is currently not supported.
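A minimal sketch of the contiguous ragged array layout (file and variable names are made up): the count variable rowsize is linked to the obs dimension via the CF sample_dimension attribute:

```julia
using NCDatasets

# two profiles of length 2 and 3, stored back-to-back along the "obs" dimension
NCDataset("ragged_example.nc", "c") do ds
    defVar(ds, "rowsize", [2, 3], ("profile",),
           attrib = Dict("sample_dimension" => "obs"))
    defVar(ds, "temp", [10.0, 11.0, 20.0, 21.0, 22.0], ("obs",))
end

ds = NCDataset("ragged_example.nc")
profiles = loadragged(ds["temp"], :)  # a vector of vectors
len = length.(profiles)              # [2, 3]
close(ds)
```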
CommonDataModel.load! — Function
NCDatasets.load!(ncvar::Variable, data, indices)
Loads a NetCDF variable ncvar in-place and puts the result in data along the specified indices. One can use @inbounds to annotate code where bounds checking can be elided by the compiler (which typically requires type-stable code).
using NCDatasets
ds = NCDataset("file.nc")
ncv = ds["vgos"].var;
# data must have the right shape and type
data = zeros(eltype(ncv),size(ncv));
NCDatasets.load!(ncv,data,:,:,:)
# or
# @inbounds NCDatasets.load!(ncv,data,:,:,:)
close(ds)
# loading a subset
data = zeros(5); # must have the right shape and type
load!(ds["temp"].var,data,:,1) # loads the 1st column
For a NetCDF variable of type NC_CHAR, the element type of the data array must be UInt8 and cannot be the Julia Char type, because the Julia Char type uses 4 bytes while the NetCDF NC_CHAR uses only 1 byte.
CommonDataModel.load!(ncvar::CFVariable, data, buffer, indices)
Loads a NetCDF (or other format) variable ncvar in-place and puts the result in data (an array of eltype(ncvar)) along the specified indices. buffer is a temporary array of the same size as data, but its element type should be eltype(ncvar.var), i.e. the corresponding type in the file (before applying scale_factor, add_offset and masking fill values). Scaling and masking will be applied to the array data.
data and buffer can be the same array if eltype(ncvar) == eltype(ncvar.var).
Example:
# create some test array
Dataset("file.nc","c") do ds
defDim(ds,"time",3)
ncvar = defVar(ds,"vgos",Int16,("time",),attrib = ["scale_factor" => 0.1])
ncvar[:] = [1.1, 1.2, 1.3]
# store 11, 12 and 13 as scale_factor is 0.1
end
ds = Dataset("file.nc")
ncv = ds["vgos"];
# data and buffer must have the right shape and type
data = zeros(eltype(ncv),size(ncv)); # here Vector{Float64}
buffer = zeros(eltype(ncv.var),size(ncv)); # here Vector{Int16}
NCDatasets.load!(ncv,data,buffer,:,:,:)
close(ds)
Creating a variable
CommonDataModel.defVar — Function
defVar(ds::NCDataset,name,vtype,dimnames; kwargs...)
defVar(ds::NCDataset,name,data,dimnames; kwargs...)
Define a variable with the name name in the dataset ds. vtype can be one of the Julia types in the table below (with the corresponding NetCDF type). The parameter dimnames is a tuple with the names of the dimensions. For a scalar, this parameter is the empty tuple (). The variable is returned (of the type CFVariable).
Instead of providing the variable type, one can also directly give the data data, which will be used to fill the NetCDF variable. In this case, the dimensions with the appropriate sizes will be created as required using the names in dimnames.
If data is a vector or array of DateTime objects, then the dates are saved as double-precision floats with the units "days since 1900-01-01 00:00:00" (unless a time unit is specified with the attrib keyword as described below). Dates are converted to the default calendar of the CF convention, which is the mixed Julian/Gregorian calendar.
Keyword arguments
- fillvalue: A value filled in the NetCDF file to indicate missing data. It will be stored in the _FillValue attribute. NCDatasets does not implicitly use the default NetCDF fill values when reading data.
- chunksizes: A vector of integers setting the chunk size. The total size of a chunk must be less than 4 GiB.
- deflatelevel: Compression level: 0 (default) means no compression and 9 means maximum compression. Each chunk will be compressed individually.
- shuffle: If true, the shuffle filter is activated, which can improve the compression ratio.
- checksum: The checksum method, which can be :fletcher32 or :nochecksum (checksumming is disabled, which is the default).
- attrib: An iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below).
- typename (string): The name of the NetCDF type required for vlen arrays.

chunksizes, deflatelevel, shuffle and checksum can only be set on NetCDF 4 files. Compression of strings and variable-length arrays is not supported by the underlying NetCDF library.
NetCDF data types
| NetCDF Type | Julia Type |
|---|---|
| NC_BYTE | Int8 |
| NC_UBYTE | UInt8 |
| NC_SHORT | Int16 |
| NC_INT | Int32 |
| NC_INT64 | Int64 |
| NC_FLOAT | Float32 |
| NC_DOUBLE | Float64 |
| NC_CHAR | Char |
| NC_STRING | String |
Dimension ordering
The data is stored in the NetCDF file in the same order as it is stored in memory. As Julia uses column-major ordering for arrays, the order of the dimensions will appear reversed when the data is loaded in languages or programs using row-major ordering, such as C/C++, Python/NumPy or the tools ncdump/ncgen (NetCDF CDL). NumPy can also use column-major ordering, but row-major order is the default. For the column-major interpretation of the dimensions (as in Julia), the CF Convention recommends the order "longitude" (X), "latitude" (Y), "height or depth" (Z) and "date or time" (T) (if applicable). All other dimensions should, whenever possible, be placed to the right of the spatiotemporal dimensions.
Example:
In this example, scale_factor and add_offset are applied when the data is saved.
julia> using DataStructures
julia> data = randn(3,5)
julia> NCDataset("test_file.nc","c") do ds
defVar(ds,"temp",data,("lon","lat"), attrib = OrderedDict(
"units" => "degree_Celsius",
"add_offset" => -273.15,
"scale_factor" => 0.1,
"long_name" => "Temperature"
))
end;
If the attributes _FillValue, missing_value, add_offset, scale_factor, units and calendar are used, they should be defined when calling defVar by using the parameter attrib as shown in the example above.
v = CommonDataModel.defVar(ds::AbstractDataset,src::AbstractVariable)
v = CommonDataModel.defVar(ds::AbstractDataset,name::SymbolOrString,src::AbstractVariable)
Define and return the variable in the dataset ds copied from the variable src. The dimension names, attributes and data are copied from src, as well as the variable name (unless provided by name).
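A minimal sketch of copying a variable between two datasets (file names are hypothetical):

```julia
using NCDatasets

# hypothetical source file
NCDataset("src_example.nc", "c") do ds
    defVar(ds, "temp", rand(Float32, 4, 3), ("lon", "lat"),
           attrib = Dict("units" => "degree_Celsius"))
end

# copy the variable (dimensions, attributes and data) into a new file
ds_src = NCDataset("src_example.nc")
NCDataset("dest_example.nc", "c") do ds_dest
    defVar(ds_dest, ds_src["temp"])
end
close(ds_src)

ds = NCDataset("dest_example.nc")
units = ds["temp"].attrib["units"]  # "degree_Celsius"
close(ds)
```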
Storage parameter of a variable
CommonDataModel.chunking — Function
storage,chunksizes = chunking(v::Variable)
Return the storage type (:contiguous or :chunked) and the chunk sizes of the variable v. Note that chunking reports the same information as nc_inq_var_chunking and therefore considers variables with an unlimited dimension as :contiguous.
storage,chunksizes = chunking(v::MFVariable)
storage,chunksizes = chunking(v::MFCFVariable)
Return the storage type (:contiguous or :chunked) and the chunk sizes of the variable v corresponding to the first file. If the first file in the collection is chunked, then these storage attributes are returned. If the first file is contiguous, the multi-file variable is still reported as chunked, with a chunk size equal to the size of the variable in the first file.
CommonDataModel.deflate — Function
isshuffled,isdeflated,deflate_level = deflate(v::Variable)
Return compression information of the variable v. If isshuffled is true, then shuffling (byte interlacing) is activated. If isdeflated is true, then the data chunks (see chunking) are compressed using the compression level deflate_level (0 means no compression and 9 means maximum compression).
CommonDataModel.checksum — Function
checksummethod = checksum(v::Variable)
Return the checksum method of the variable v, which can be either :fletcher32 or :nochecksum.
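The storage parameters above can be queried together; a minimal sketch (file and variable names are hypothetical):

```julia
using NCDatasets

# hypothetical file with a chunked, compressed and checksummed variable
NCDataset("storage_example.nc", "c") do ds
    defVar(ds, "temp", zeros(Float32, 100, 100), ("lon", "lat"),
           chunksizes = [10, 10], deflatelevel = 5, checksum = :fletcher32)
end

ds = NCDataset("storage_example.nc")
ncv = ds["temp"].var
storage, chunksizes = chunking(ncv)           # storage == :chunked
isshuffled, isdeflated, level = deflate(ncv)  # isdeflated == true
method = checksum(ncv)                        # :fletcher32
close(ds)
```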
Coordinate variables and cell boundaries
CommonDataModel.coord — Function
cv = coord(v::Union{CFVariable,Variable},standard_name)
Find the coordinate of the variable v by the standard name standard_name or by some standardized heuristics based on units. If the heuristics fail to detect the coordinate, consider modifying the file to add the standard_name attribute. All dimensions of the coordinate must also be dimensions of the variable v.
Example
using NCDatasets
ds = NCDataset("file.nc")
ncv = ds["SST"]
lon = coord(ncv,"longitude")[:]
lat = coord(ncv,"latitude")[:]
v = ncv[:]
close(ds)
CommonDataModel.bounds — Function
b = bounds(ncvar::NCDatasets.CFVariable)
Return the CFVariable corresponding to the bounds attribute of the variable ncvar. The time units and calendar from ncvar are used, but not the attributes controlling the packing of the data: scale_factor, add_offset and _FillValue.