Datasets
This page is about loading/writing, examining and operating directly on entire NetCDF datasets. For functions regarding the variables stored in them, see the Variables page.
Both variables and datasets share the functionality of the Attributes section.
NCDatasets.NCDataset — Type
NCDataset(filename::AbstractString, mode = "r";
format::Symbol = :netcdf4,
share::Bool = false,
diskless::Bool = false,
persist::Bool = false,
memory::Union{Vector{UInt8},Nothing} = nothing,
attrib = [])
Load, create, or even overwrite a NetCDF file at filename, depending on mode:
- "r" (default): open an existing netCDF file or OPeNDAP URL in read-only mode.
- "c": create a new NetCDF file at filename (an existing file with the same name will be overwritten).
- "a": open filename in append mode (i.e. existing data in the netCDF file is not overwritten and a variable can be added).
If share is true, the NC_SHARE flag is set, which allows multiple processes to read the file while one process writes to it (netCDF classic files only). Likewise, setting diskless or persist to true enables the NC_DISKLESS or NC_PERSIST flag, respectively. More information is available in the NetCDF C-API.
Notice that this does not close the dataset; call close on the result (or use the do-block form described below).
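The three modes can be sketched as a create, append, read cycle. This is a minimal illustration, assuming a scratch file in the temporary directory; the variable names temp and salt are made up for the example.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

# "c": create the file (overwriting any existing one) and define a variable
ds = NCDataset(fname, "c")
defVar(ds, "temp", [10., 20., 30.], ("time",))
close(ds)

# "a": reopen the file and add a second variable; existing data is kept
ds = NCDataset(fname, "a")
defVar(ds, "salt", [35., 36., 37.], ("time",))
close(ds)

# "r" (default): read-only access
ds = NCDataset(fname, "r")
varnames = sort(collect(keys(ds)))
temp = ds["temp"][:]
close(ds)
```

Each handle is closed explicitly here because the constructor alone does not close the dataset.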
The optional parameter attrib is an iterable of attribute name and attribute value pairs, for example a Dict, DataStructures.OrderedDict or simply a vector of pairs (see example below).
Supported format values:
- :netcdf4 (default): HDF5-based NetCDF format.
- :netcdf4_classic: Only netCDF 3 compatible API features will be used.
- :netcdf3_classic: classic netCDF format supporting only files smaller than 2GB.
- :netcdf3_64bit_offset: improved netCDF format supporting files larger than 2GB.
- :netcdf5_64bit_data: improved netCDF format supporting 64-bit integer data types.
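As a small sketch of the format keyword, the following writes a file in the classic format and inspects its first bytes. The file name is a hypothetical scratch path; the check relies on the fact that classic netCDF files start with the bytes "CDF", whereas HDF5-based files do not.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

# request the classic (netCDF 3) on-disk format instead of the HDF5-based default
ds = NCDataset(fname, "c", format = :netcdf3_classic)
defVar(ds, "temp", [10., 20., 30.], ("time",))
close(ds)

# classic netCDF files begin with the magic bytes "CDF"
magic = open(io -> String(read(io, 3)), fname)

# the file is read back the same way regardless of its on-disk format
ds = NCDataset(fname)
temp = ds["temp"][:]
close(ds)
```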
Files can also be opened and automatically closed with a do block.
NCDataset("file.nc") do ds
data = ds["temperature"][:,:]
end
Here is an attribute example:
using DataStructures
NCDataset("file.nc", "c", attrib = OrderedDict("title" => "my first netCDF file")) do ds
defVar(ds,"temp",[10.,20.,30.],("time",))
end;
The NetCDF dataset can also be held in memory as a vector of bytes. A non-empty string as filename is still required, for example:
using NCDatasets, HTTP
resp = HTTP.get("https://www.unidata.ucar.edu/software/netcdf/examples/ECMWF_ERA-40_subset.nc")
ds = NCDataset("some_string","r",memory = resp.body)
total_precipitation = ds["tp"][:,:,:]
close(ds)
Dataset is an alias of NCDataset.
mfds = NCDataset(fnames, mode = "r"; aggdim = nothing, deferopen = true,
isnewdim = false,
constvars = [])
Opens a multi-file dataset in read-only mode ("r") or append mode ("a"). fnames is a vector of file names.
Variables are aggregated over the first unlimited dimension, or over the dimension aggdim if specified. All variables containing the dimension aggdim are aggregated. Variables that do not contain the dimension aggdim are not aggregated and are assumed constant.
If variables should be aggregated over a new dimension (not present in the NetCDF files), set isnewdim to true. All NetCDF files should have the same variables, attributes and groups. By default, all variables will have an additional dimension unless they are marked as constant using the constvars parameter.
The append mode is only implemented when deferopen is false. If deferopen is false, all files are opened at the same time. However, the operating system might limit the number of open files. On Linux, the limit can be controlled with the command ulimit.
All metadata (attributes and dimension lengths) are assumed to be the same for all NetCDF files. Otherwise reading the attribute of a multi-file dataset would be ambiguous. An exception to this rule is the length of the dimension over which the data is aggregated. This aggregation dimension can vary from file to file.
Setting the experimental flag _aggdimconstant to true means that the length of the aggregation dimension is constant. This speeds up the creation of a multi-file dataset as only the metadata of the first file has to be loaded.
Examples:
You can use Glob.jl to make fnames from a file pattern, e.g.
using NCDatasets, Glob
ds = NCDataset(glob("ERA5_monthly3D_reanalysis_*.nc"))
Aggregation over a new dimension:
using NCDatasets
for i = 1:3
NCDataset("foo$i.nc","c") do ds
defVar(ds,"data",[10., 11., 12., 13.], ("lon",))
end
end
ds = NCDataset(["foo$i.nc" for i = 1:3],aggdim = "sample", isnewdim = true)
size(ds["data"])
# output
# (4, 3)
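Aggregation over a dimension that already exists in every file can be sketched in the same way. This is an illustrative example under the assumption that each file holds two time steps; the file names, the variable name data and the dimension names are made up.

```julia
using NCDatasets

dir = mktempdir()   # hypothetical scratch directory
fnames = [joinpath(dir, "part$i.nc") for i in 1:3]

# every file holds a 4 x 2 slab ("lon" x "time") filled with its file index
for (i, fname) in enumerate(fnames)
    NCDataset(fname, "c") do ds
        defVar(ds, "data", fill(Float64(i), 4, 2), ("lon", "time"))
    end
end

# concatenate the three files along the existing dimension "time"
ds = NCDataset(fnames, aggdim = "time")
aggregated_size = size(ds["data"])
close(ds)
```

Since aggdim names a dimension present in the files, isnewdim keeps its default value of false and the lengths along "time" add up.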
Useful functions that operate on datasets are:
Base.keys — Method
keys(ds::NCDataset)
Return a list of all variable names in the NCDataset ds.
Base.haskey — Function
haskey(ds::NCDataset,name)
haskey(d::Dimensions,name)
haskey(ds::Attributes,name)
Return true if the NCDataset ds (or dimension/attribute list) has a variable (dimension/attribute) with the name name. For example:
ds = NCDataset("/tmp/test.nc","r")
if haskey(ds,"temperature")
println("The file has a variable 'temperature'")
end
if haskey(ds.dim,"lon")
println("The file has a dimension 'lon'")
end
This example checks if the file /tmp/test.nc has a variable with the name temperature and a dimension with the name lon.
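A self-contained variant of this check, which also covers the attribute list, might look as follows. The file is a hypothetical scratch file created only for the illustration.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

NCDataset(fname, "c", attrib = ["title" => "haskey demo"]) do ds
    defVar(ds, "temperature", [1., 2.], ("lon",))
end

ds = NCDataset(fname)
has_var = haskey(ds, "temperature")      # variable lookup
has_dim = haskey(ds.dim, "lon")          # dimension lookup
has_att = haskey(ds.attrib, "title")     # global attribute lookup
has_salinity = haskey(ds, "salinity")    # a name that was never defined
close(ds)
```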
Base.haskey(a::Attributes,name::SymbolOrString)
Check if name is an attribute.
Base.getindex — Method
v = getindex(ds::AbstractDataset, varname::SymbolOrString)
Return the variable varname in the dataset ds as a CFVariable. The following CF conventions are honored when the variable is indexed:
- values equal to _FillValue or missing_value (which can be a list) are returned as missing.
- scale_factor and add_offset are applied (output = scale_factor * data_in_file + add_offset).
- time variables (recognized by the units attribute and possibly the calendar attribute) are usually returned as DateTime objects. Note that CFTime.DateTimeAllLeap, CFTime.DateTimeNoLeap and CFTime.DateTime360Day cannot be converted to the proleptic Gregorian calendar used in Julia and are returned as such. (See CFTime.jl for more information about those date types.) If a calendar is defined but not among the ones specified in the CF convention, then the data in the file is not converted into a date structure.
A call getindex(ds, varname) is usually written as ds[varname].
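The fill value handling can be illustrated with a short sketch. The fill value -9999 and the names used here are arbitrary choices for the example, and the file is a hypothetical scratch file.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

NCDataset(fname, "c") do ds
    defDim(ds, "time", 3)
    # -9999 is an arbitrary fill value chosen for this sketch
    v = defVar(ds, "temp", Float64, ("time",), fillvalue = -9999.)
    v[:] = [1.0, missing, 3.0]
end

ds = NCDataset(fname)
decoded = ds["temp"][:]      # CF conventions applied: fill value becomes missing
raw = ds["temp"].var[:]      # values as stored in the file
close(ds)
```

Indexing ds["temp"] yields missing for the second element, while the underlying variable still contains the stored fill value.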
If a variable represents a cell boundary, the attributes calendar and units of the related variable are used if they are not specified on the boundary variable itself. For example:
dimensions:
time = UNLIMITED; // (5 currently)
nv = 2;
variables:
double time(time);
time:long_name = "time";
time:units = "hours since 1998-04-19 06:00:00";
time:bounds = "time_bnds";
double time_bnds(time,nv);
In this case, the variable time_bnds uses the units and calendar of time because both variables are related through the bounds attribute following the CF conventions.
See also cfvariable(ds, varname).
CommonDataModel.variable — Function
v = variable(ds::NCDataset,varname::String)
Return the NetCDF variable varname in the dataset ds as a NCDataset.Variable. No scaling or other transformations are applied when the variable v is indexed.
CommonDataModel.variable(ds::AbstractDataset,variablename::SymbolOrString)
Return the variable with the name variablename from the data set ds.
CommonDataModel.cfvariable — Function
v = cfvariable(ds::NCDataset,varname::SymbolOrString; <attrib> = <value>)
Return the variable varname in the dataset ds as a NCDataset.CFVariable. The keyword arguments <attrib> are the attributes (fillvalue, missing_value, scale_factor, add_offset, units and calendar) relevant to the CF conventions. By specifying the value of these attributes, one can override the value specified in the data set. If the attribute is set to nothing, then the attribute is not loaded and the corresponding transformation is ignored. This function is similar to ds[varname] with the additional flexibility that some variable attributes can be overridden.
Example:
NCDataset("foo.nc","c") do ds
defVar(ds,"data",[10., 11., 12., 13.], ("time",), attrib = Dict(
"add_offset" => 10.,
"scale_factor" => 0.2))
end
# The stored (packed) values are [0., 5., 10., 15.]
# since 0.2 .* [0., 5., 10., 15.] .+ 10 is [10., 11., 12., 13.]
ds = NCDataset("foo.nc");
@show ds["data"].var[:]
# returns [0., 5., 10., 15.]
@show cfvariable(ds,"data")[:]
# returns [10., 11., 12., 13.]
# neither add_offset nor scale_factor are applied
@show cfvariable(ds,"data", add_offset = nothing, scale_factor = nothing)[:]
# returns [0, 5, 10, 15]
# add_offset is applied but not scale_factor
@show cfvariable(ds,"data", scale_factor = nothing)[:]
# returns [10, 15, 20, 25]
# 0 is declared as the fill value (add_offset and scale_factor are applied as usual)
@show cfvariable(ds,"data", fillvalue = 0)[:]
# returns [missing, 11., 12., 13.]
# Use the time units: days since 2000-01-01
@show cfvariable(ds,"data", units = "days since 2000-01-01")[:]
# returns [DateTime(2000,1,11), DateTime(2000,1,12), DateTime(2000,1,13), DateTime(2000,1,14)]
close(ds)
CommonDataModel.sync — Function
sync(ds::NCDataset)
Write all changes in the NCDataset ds to the disk.
Base.close — Function
close(ds::NCDataset)
Close the NCDataset ds. All pending changes will be written to the disk.
CommonDataModel.path — Function
path(ds::NCDataset)
Return the file path (or the OPeNDAP URL) of the NCDataset ds.
CommonDataModel.path(ds::AbstractDataset)
File path of the data set ds.
NCDatasets.ncgen — Function
ncgen(fname; ...)
ncgen(fname,jlname; ...)
Generate the Julia code that would produce a NetCDF file with the same metadata as the NetCDF file fname. The code is placed in the file jlname or printed to the standard output. By default the new NetCDF file is called filename.nc. This can be changed with the optional parameter newfname.
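A possible way to try this out is sketched below. All file names are hypothetical scratch paths, and the example only checks that a script was written; the exact text of the generated code is not assumed.

```julia
using NCDatasets

fname = tempname() * ".nc"    # hypothetical source file
jlname = tempname() * ".jl"   # hypothetical output script

NCDataset(fname, "c", attrib = ["title" => "ncgen demo"]) do ds
    defVar(ds, "temp", [10., 20., 30.], ("time",),
           attrib = ["units" => "degree_Celsius"])
end

# write the generated Julia code to jlname;
# newfname sets the NetCDF file name used inside that code
ncgen(fname, jlname, newfname = "copy.nc")

generated = read(jlname, String)
```

Calling ncgen(fname) without jlname prints the same code to the standard output instead.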
CommonDataModel.varbyattrib — Function
varbyattrib(ds, attname = attval)
Return a list of the variables in the dataset ds whose attribute attname matches the value attval. The list is empty if none of the variables match. The output is a list of CFVariables.
Examples
Load all the data of the first variable with standard name "longitude" from the NetCDF file results.nc.
julia> ds = NCDataset("results.nc", "r");
julia> data = varbyattrib(ds, standard_name = "longitude")[1][:]
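The same lookup can be reproduced end to end with a small file created on the spot. The file name and the variables lon and lat are assumptions made for this sketch.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

NCDataset(fname, "c") do ds
    defVar(ds, "lon", [1., 2., 3.], ("lon",),
           attrib = ["standard_name" => "longitude"])
    defVar(ds, "lat", [10., 20.], ("lat",),
           attrib = ["standard_name" => "latitude"])
end

ds = NCDataset(fname)
matches = varbyattrib(ds, standard_name = "longitude")
nmatches = length(matches)
lon = matches[1][:]

# no variable carries this standard name, so the list is empty
no_match = varbyattrib(ds, standard_name = "depth")
nothing_found = isempty(no_match)
close(ds)
```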
Base.write — Function
write(dest::AbstractDataset, src::AbstractDataset; include = keys(src), exclude = [])
Write the variables of the src dataset into an empty dest dataset (which must be opened in mode "a" or "c"). The keywords include and exclude configure which variables of src should be included (by default all), or which should be excluded (by default none).
If the first argument is a file name, then the dataset is opened in create mode ("c").
This function is useful when you want to save the dataset from a multi-file dataset.
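The exclude keyword can be sketched as follows. Both file names and the variables temp and salt are hypothetical and exist only for this illustration.

```julia
using NCDatasets

fname_src = tempname() * ".nc"    # hypothetical source file
fname_dest = tempname() * ".nc"   # hypothetical destination file

NCDataset(fname_src, "c") do ds
    defVar(ds, "temp", [10., 20., 30.], ("time",))
    defVar(ds, "salt", [35., 36., 37.], ("time",))
end

# copy everything except the variable "salt" into a newly created dataset
NCDataset(fname_src) do src
    NCDataset(fname_dest, "c") do dest
        write(dest, src, exclude = ["salt"])
    end
end

ds = NCDataset(fname_dest)
copied = collect(keys(ds))
temp = ds["temp"][:]
close(ds)
```

Passing include = ["temp"] instead would select the same variable in this two-variable file.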
To save a subset, one can use the function view to virtually slice a dataset:
Example
NCDataset(fname_src) do ds
write(fname_slice,view(ds, lon = 2:3))
end
All variables in the source file fname_src with a dimension lon will be sliced along the indices 2:3 for the lon dimension. All attributes (and variables without a dimension lon) will be copied over unmodified.
Notice that DateTime structures from CFTime are used to represent time for non-standard calendars. Otherwise, we attempt to use standard structures from the Julia standard library Dates.
Groups
A NetCDF group is a dataset (with variables, attributes, dimensions and sub-groups) and can be arbitrarily nested. A group is created with defGroup and accessed via the group property of a NCDataset.
# create the variable "temperature" inside the group "forecast"
ds = NCDataset("results.nc", "c");
ds_forecast = defGroup(ds,"forecast")
defVar(ds_forecast,"temperature",randn(10,11,12),("lon","lat","time"))
# load the variable "temperature" inside the group "forecast"
forecast_temp = ds.group["forecast"]["temperature"][:,:,:]
close(ds)
CommonDataModel.defGroup — Function
defGroup(ds::NCDataset,groupname; attrib = [])
Create the group with the name groupname in the dataset ds. attrib is a list of attribute name and attribute value pairs (see NCDataset).
group = CommonDataModel.defGroup(ds::AbstractDataset,name::SymbolOrString)
Create an empty sub-group with the name name in the data set ds. The group is a sub-type of AbstractDataset.
Base.getindex — Method
group = getindex(g::Groups,groupname::AbstractString)
Return the NetCDF group with the name groupname from the parent group g.
For example:
ds = NCDataset("results.nc", "r");
forecast_group = ds.group["forecast"]
forecast_temp = forecast_group["temperature"]
Base.keys — Method
names = keys(g::Groups)
Return the names of all subgroups of the group g.
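The group functions above can be combined in a brief sketch. The file name, the group name forecast and the attribute model are assumptions made for the example.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

ds = NCDataset(fname, "c")
# create a group with one attribute and one variable
forecast = defGroup(ds, "forecast", attrib = ["model" => "demo"])
defVar(forecast, "temperature", zeros(2, 3), ("lon", "lat"))
close(ds)

ds = NCDataset(fname)
group_names = collect(keys(ds.group))
temp_size = size(ds.group["forecast"]["temperature"])
model = ds.group["forecast"].attrib["model"]
close(ds)
```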
Common methods
One can iterate over a dataset, attribute list, dimensions and NetCDF groups.
for (varname,var) in ds
# all variables
@show (varname,size(var))
end
for (attribname,attrib) in ds.attrib
# all attributes
@show (attribname,attrib)
end
for (groupname,group) in ds.group
# all groups
@show (groupname,group)
end
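These loops can be turned into a quick inventory of a file. The sketch below uses a hypothetical scratch file; for the dimensions it relies only on keys and indexing of ds.dim, which is a conservative choice for the example rather than a requirement.

```julia
using NCDatasets

fname = tempname() * ".nc"   # hypothetical scratch file

NCDataset(fname, "c", attrib = ["title" => "iteration demo"]) do ds
    defVar(ds, "temp", zeros(4, 2), ("lon", "time"))
    defVar(ds, "salt", zeros(4, 2), ("lon", "time"))
end

ds = NCDataset(fname)
# variable name => size, by iterating over the dataset
sizes = Dict(varname => size(var) for (varname, var) in ds)
# attribute name => value, by iterating over ds.attrib
attribs = Dict(attname => attval for (attname, attval) in ds.attrib)
# dimension name => length, via keys and indexing of ds.dim
dims = Dict(dimname => ds.dim[dimname] for dimname in keys(ds.dim))
close(ds)
```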