Performance tips

  • Reading data from a file is not type-stable, because the type of the output of a read operation depends on the type defined in the NetCDF file and on the value of various attributes (such as scale_factor, add_offset and units for time conversion). None of this information can be inferred from a static analysis of the source code. It is therefore recommended to use a type annotation if the resulting type of a read operation is known:
ds = NCDataset("file.nc")
nctemp = ds["temp"]
temp = nctemp[:,:]::Array{Float32,2}

# heavy computation using temp
# ...

Alternatively, one can also use a so-called function barrier, since the function heavy_computation will be specialized based on the types of its input parameters.

function heavy_computation(temp)
    # heavy computation using temp
    # ...
end

ds = NCDataset("file.nc")
nctemp = ds["temp"]
temp = nctemp[:,:]
output = heavy_computation(temp)

Calling the barrier function with nctemp would also be type-stable. Using the in-place NCDatasets.load! function (which is unexported, so it has to be prefixed with the module name) also leads to type-stable code and allows one to reuse a memory buffer:

ds = NCDataset("file.nc")

temp = zeros(Float32,10,20)
NCDatasets.load!(variable(ds,"temp"),temp,:,:)
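To illustrate reusing the buffer, here is a minimal sketch that loads one time slice at a time into the same preallocated array, assuming (hypothetically) that the variable temp has size 10×20 per time step with time as its third dimension:

```julia
using NCDatasets

ds = NCDataset("file.nc")
ncvar = variable(ds, "temp")  # low-level variable, no CF transformation

# preallocate the buffer once
buffer = zeros(Float32, 10, 20)
ntime = size(ncvar, 3)

for n = 1:ntime
    # fill `buffer` in place; no new array is allocated per iteration
    NCDatasets.load!(ncvar, buffer, :, :, n)
    # ... process `buffer` for time step n ...
end
close(ds)
```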
  • Most Julia functions (like mean, sum, ... from the Statistics module) access an array element-wise. It is generally much faster to load the data into memory (if possible) before doing the computation.
using NCDatasets, BenchmarkTools, Statistics
ds = NCDataset("file.nc","c")
data = randn(100,100);
defVar(ds,"myvar",data,("lon","lat"))
close(ds)

ds = NCDataset("file.nc")
@btime mean(ds["myvar"]) # takes 107.357 ms
@btime mean(ds["myvar"][:,:]) # takes 106.873 μs, 1000 times faster
close(ds)
  • Avoid, when possible, indexing with arrays and CartesianIndex, as these also result in loading the data element-wise.
ds = NCDataset("dataset.nc")
v = ds["v1"][:,1:3,:] # fast
v = ds["v1"][:,:,CartesianIndex(1)] # slow
v = ds["v1"][:,:,1] # fast
close(ds)