Performance tips
- Reading data from a file is not type-stable, because the type of the output of a read operation depends on the type defined in the NetCDF file and on the value of various attributes (like `scale_factor`, `add_offset` and `units` for time conversion). All this information cannot be inferred from a static analysis of the source code. It is therefore recommended to use a type annotation if the resulting type of a read operation is known:
```julia
ds = NCDataset("file.nc")
nctemp = ds["temp"]
temp = nctemp[:,:]::Array{Float32,2}
# heavy computation using temp
# ...
```
Alternatively, one can also use a so-called function barrier, since the function `heavy_computation` will be specialized based on the types of its input parameters.
```julia
function heavy_computation(temp)
    # heavy computation using temp
    # ...
end

ds = NCDataset("file.nc")
nctemp = ds["temp"]
temp = nctemp[:,:]
output = heavy_computation(temp)
```
Calling the barrier function with `nctemp` directly would also be type-stable. Using the in-place `NCDatasets.load!` function (which is unexported, so it has to be prefixed with the module name) also leads to type-stable code and allows a memory buffer to be reused:
```julia
ds = NCDataset("file.nc")
temp = zeros(Float32,10,20)
NCDatasets.load!(variable(ds,"temp"),temp,:,:)
```
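Reusing the buffer pays off when the same-sized slice is read many times, e.g. when looping over the time dimension. A minimal sketch (the file `example.nc` and the variable layout `lon × lat × time` are created here purely for illustration):

```julia
using NCDatasets, Statistics

# create a small example file (for illustration only)
ds = NCDataset("example.nc","c")
defVar(ds,"temp",randn(Float32,10,20,5),("lon","lat","time"))
close(ds)

ds = NCDataset("example.nc")
nctemp = variable(ds,"temp")    # raw variable, no attribute transformation
buffer = zeros(Float32,10,20)   # allocated once, reused for every time slice

means = Float64[]
for n = 1:size(nctemp,3)
    # fill the same buffer with the n-th time slice
    NCDatasets.load!(nctemp,buffer,:,:,n)
    push!(means,mean(buffer))
end
close(ds)
```

Since `buffer` is allocated once outside the loop, no intermediate array is created per iteration, which reduces pressure on the garbage collector.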
- Most Julia functions (like `mean`, `sum`, ... from the Statistics module) access an array element-wise. It is generally much faster to load the data into memory (if possible) before making the computation.
```julia
using NCDatasets, BenchmarkTools, Statistics
ds = NCDataset("file.nc","c")
data = randn(100,100);
defVar(ds,"myvar",data,("lon","lat"))
close(ds)

ds = NCDataset("file.nc")
@btime mean(ds["myvar"])       # takes 107.357 ms
@btime mean(ds["myvar"][:,:])  # takes 106.873 μs, 1000 times faster
close(ds)
```
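The same advice applies to reductions along a dimension: load the slab into an in-memory `Array` once, then call the reduction with `dims`. A minimal sketch (the file name `file2.nc` is illustrative):

```julia
using NCDatasets, Statistics

# create a small example file (for illustration only)
ds = NCDataset("file2.nc","c")
defVar(ds,"myvar",randn(100,100),("lon","lat"))
close(ds)

ds = NCDataset("file2.nc")
data = ds["myvar"][:,:]            # load once into an in-memory Array
zonal_mean = mean(data, dims = 1)  # then reduce in memory
close(ds)
```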
- Avoid, when possible, indexing with arrays and `CartesianIndex`, as they also result in loading the data element-wise.
```julia
ds = NCDataset("dataset.nc");
v = ds["v1"][:,1:3,:];              # fast
v = ds["v1"][:,:,CartesianIndex(1)] # slow
v = ds["v1"][:,:,1]                 # fast
close(ds)
```
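When indexing with an array is genuinely needed, one workaround is to read a contiguous hyperslab with ranges first and then apply the fancy indexing to the in-memory array. A sketch with illustrative file and variable names (`dataset2.nc`, `v1`):

```julia
using NCDatasets

# create a small example file (for illustration only)
ds = NCDataset("dataset2.nc","c")
defVar(ds,"v1",randn(Float32,5,6,7),("x","y","z"))
close(ds)

ds = NCDataset("dataset2.nc")
# slow: indexing the NetCDF variable with a vector loads element-wise
# v = ds["v1"][:,[1,3,5],:]
# faster: load a contiguous range, then index the in-memory array
v_all = ds["v1"][:,1:5,:]
v = v_all[:,[1,3,5],:]
close(ds)
```

The range read is a single call into the NetCDF library, while the vector indexing then happens on an ordinary Julia `Array` where it is cheap.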