
Re: [ExternalEmail] Re: [ferret_users] saving very large .nc file with repeat




Hi Paul,

I forgot to specify a range in the save command; that's why it failed. The ncdump error reporting that it wasn't a netCDF file is probably due to your version of ncdump being old and/or not handling netCDF-4 correctly.

Here's something that worked on an SSH file that I use.

use ocean_eta_t_2012_01.nc
let p=t[gt=eta_t,l=1:31]
let q=eta_t[l=1:31]
set grid q
go regresst
let ssh_detrended=eta_t-qhat
def sym XC 40
def sym YC 40
def sym TC 10
def sym filepath reg.nc
def sym chunking "XCHUNK=($XC)/YCHUNK=($YC)/TCHUNK=($TC)"
def sym savecom "save/($chunking)/NCFORMAT=4/deflate=1/shuffle/file=($filepath)"

! Initial save for all time at i=1,j=1. For some reason can't just save first time point.
($savecom)/i=1/j=1/clob/ilimits=1:3600/jlimits=1:1500 ssh_detrended

! Specifying a range helps a bit...

repeat/range=1:1500:($YC)/name=jb ( repeat/range=1:3600:($XC)/name=ib ($savecom)/app/i=`ib`:`ib+($XC)-1`/j=`jb`:`jb+($YC)-1` ssh_detrended)
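To see which index blocks the nested repeat above asks for, the loop arithmetic can be sketched in Python. This is only an illustration of the start/end values produced by `ib`, `jb`, XC and YC in the script, assuming REPEAT/RANGE=lo:hi:delta steps through lo, lo+delta, ... up to hi; the helper name `blocks` is made up for this sketch and is not a Ferret function.

```python
def blocks(n, step):
    """Return 1-based inclusive (start, end) pairs for starts 1, 1+step, ... <= n."""
    return [(start, start + step - 1) for start in range(1, n + 1, step)]

# Values taken from the script above: ilimits=1:3600, jlimits=1:1500, XC=YC=40
i_blocks = blocks(3600, 40)
j_blocks = blocks(1500, 40)

n_saves = len(i_blocks) * len(j_blocks)   # number of appended save calls
last_i = i_blocks[-1]                      # final block along i
last_j = j_blocks[-1]                      # final block along j

print("save calls:", n_saves)
print("last i block:", last_i)
print("last j block:", last_j)
```

Under these assumptions the i blocks tile 1:3600 exactly, while the last j block extends past 1500 because 1500 is not a multiple of 40. How Ferret treats that overhang is not something this sketch establishes; choosing YC so that it divides the j extent avoids the question.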


Also, are you sure you want "eta_t-qhat"? This also eliminates the mean SSH. At the very least I think you should be saving "qave" as well so that you have the correct offset.
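The point about the mean can be checked with a small worked example. The sketch below is plain Python, not the regresst script: it does an ordinary least-squares fit on made-up numbers and shows that subtracting the fitted line leaves residuals with zero mean, so the time mean has to be stored separately if the offset is needed later. The function `linear_fit` and the sample values are hypothetical.

```python
def linear_fit(t, q):
    """Ordinary least-squares fit of q against t; returns slope, intercept, mean of q."""
    n = len(t)
    t_ave = sum(t) / n
    q_ave = sum(q) / n
    sxy = sum((ti - t_ave) * (qi - q_ave) for ti, qi in zip(t, q))
    sxx = sum((ti - t_ave) ** 2 for ti in t)
    slope = sxy / sxx
    intercept = q_ave - slope * t_ave
    return slope, intercept, q_ave

# Made-up time axis and SSH-like series for one grid cell
t = [0.0, 1.0, 2.0, 3.0, 4.0]
q = [10.0, 10.5, 11.5, 11.0, 12.0]

slope, intercept, q_ave = linear_fit(t, q)
qhat = [slope * ti + intercept for ti in t]          # fitted line
detrended = [qi - hi for qi, hi in zip(q, qhat)]     # q - qhat, as in the script
mean_detrended = sum(detrended) / len(detrended)

# Adding the saved mean back restores the original offset without the trend
restored = [d + q_ave for d in detrended]
mean_restored = sum(restored) / len(restored)

print("mean of q - qhat:", mean_detrended)
print("mean after adding q_ave back:", mean_restored)
```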

Chunking may or may not be worth it. It really depends on how you intend to access the file and whether you can exploit the access patterns efficiently.
For instance, if you do a global shade plot of a time slice of ssh_detrended generated by the script above, you'll notice a significant slowdown due to the increased number of read operations. On the other hand, a Y-T shade plot may be much quicker.
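A rough way to compare the two access patterns is to count how many chunks each request has to touch. The Python sketch below uses the chunk sizes from the script above (40 x 40 x 10) on a 3600 x 1500 grid; the time length of 31 steps is an assumption based on the l=1:31 range, and the helper `chunks_touched` is hypothetical. It counts chunks only and says nothing about actual read times, which also depend on compression and caching.

```python
from math import ceil

def chunks_touched(extent, chunk):
    """Chunks covered along one axis by a span of `extent` points starting on a chunk boundary."""
    return ceil(extent / chunk)

NX, NY, NT = 3600, 1500, 31     # NT = 31 is an assumption
XC, YC, TC = 40, 40, 10         # chunk sizes from the script

# Global X-Y plot of one time step: every chunk in X and Y, one chunk in T
time_slice = chunks_touched(NX, XC) * chunks_touched(NY, YC) * chunks_touched(1, TC)

# Y-T plot at one longitude: one chunk in X, every chunk in Y and T
yt_section = chunks_touched(1, XC) * chunks_touched(NY, YC) * chunks_touched(NT, TC)

print("chunks for a global time slice:", time_slice)
print("chunks for a Y-T section:", yt_section)
```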

Compression will probably halve the size of files like this.

Cheers,
Russ



On 23/04/17 06:31, Paul Goddard wrote:
Thanks Russ,
An update on working with my large dataset:

I tried several of your methods and slight variations of them, trying to understand how they work and what size the finished .nc file would be.

I was able to produce my desired detrended dataset by repeating through one latitude at a time rather than through time and space as I previously attempted. This took about 24 hours, and the resulting .nc file is the same size as the original data, 265 GB. So it is not the quickest, and probably not the way I should go in the future if a quicker or more efficient method exists.

I also ran the chunking method. This created a 3.4 GB file in about 4 hours. When the file is loaded and shaded in ferret/pyferret, the data are missing values for the complete file. Also, when I run ncdump -h, I receive the error "NetCDF: Unknown file format". I am a novice when it comes to understanding how the computer compresses the file via chunking, though I am trying to understand it through the Ferret documentation and its references to the Unidata documentation on netCDF-4 file compression.

I include my modified script below, thanks for the help, Paul

use "./CM2.6_Control_SSH_01810101-02001231.nc"

set memory/size=99999

let P  = T[GT=SSH,l=1:7305]
let Q = SSH[l=1:7305]
SET GRID Q
GO regresst

let SSH_Detrended = SSH - qhat

def sym XC 40
def sym YC 30
def sym TC 100

def sym chunking "XCHUNK=($XC)/YCHUNK=($YC)/TCHUNK=($TC)"

def sym filepath "./CM2.6_Control_SSH_DETRENDED_01810101-02001231_Chunks.nc"

def sym savecom save/($chunking)/NCFORMAT=4/deflate=1/shuffle/file="($filepath)"


On Thu, Apr 20, 2017 at 9:05 PM, Russ Fiedler <russell.fiedler@xxxxxxxx> wrote:

Oops, a couple of errors there.

set var/outtype=real ssh_detrended

and I missed a slash in the following.

def sym savecom "save/($chunking)/NCFORMAT=4/deflate=1/shuffle/file=ssh_detrend.nc"


Russ


On 21/04/17 13:38, Russ Fiedler wrote:

Hi Paul,

Rather than doing this over all space and time, it will be easier to do this one latitude at a time. The way you are doing it requires reading in the complete data set multiple times. If you SET MODE DIAGNOSTIC you'll see what I mean. I think you are probably flushing the values of slope and intercept and have to recompute them.

Also, why save a complete copy of the detrended data? It's huge! It's over half a TB as you are storing in double precision. Lucky it's only SSH and not a 3D variable! You only need to save the slope and intercepts.
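The size estimate follows directly from the grid dimensions given in the thread (i=1:3600, j=1:2700, l=1:7305). A minimal Python sketch of the arithmetic, counting only the data values of a single variable and ignoring coordinate variables, metadata and any compression:

```python
NX, NY, NT = 3600, 2700, 7305        # grid and time extents from the thread

cells = NX * NY * NT                  # values in one variable
double_bytes = cells * 8              # 8-byte double precision
float_bytes = cells * 4               # 4-byte single precision

print("values:", cells)
print("double precision: %.3f TB" % (double_bytes / 1e12))
print("single precision: %.1f GiB" % (float_bytes / 2**30))
```

Storing the variable as 4-byte floats halves the uncompressed size relative to double precision.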

What I do is save the i=1,j=1 value with ILIMITS and JLIMITS specified. This will set up the file:

let P  = T[GT=SSH,l=1:7305]
let Q = SSH[l=1:7305]
SET GRID Q
GO regresst

let SSH_Detrended = SSH - qhat

! Save all values

save/i=1/j=1/l=1/clob/ilimits=1:3600/jlimits=1:2700/file=ssh_detrend.nc ssh_detrended
repeat/j=1:2700 save/app/file=ssh_detrend.nc ssh_detrended                                  ! See note below on chunking        

! Save just the slope and intercept

save/i=1/j=1/clob/file=slope_int.nc/ilimits=1:3600/jlimits=1:2700 slope,intercept
repeat/j=1:2700 save/app/file=slope_int.nc slope,intercept

If the original data is only 4-byte real then SET VAR/OUTTYPE=FLOAT is probably useful.

You may also want to chunk/compress the data in time and space depending on how you intend to access it.

Something like

set var/outputtype=real ssh_detrended

def sym XC 40
def sym YC 30
def sym TC 100

def sym chunking "XCHUNK=($XC)/YCHUNK=($YC)/TCHUNK=($TC)"
def sym savecom "save/($chunking)/NCFORMAT=4/deflate=1/shuffle file=ssh_detrend.nc"
 
($savecom)/i=1/j=1/l=1/clob/ilimits=1:3600/jlimits=1:2700 ssh_detrended

repeat/range=1:2700:($YC)/name=jb ( repeat/range=1:3600:($XC)/name=ib ($savecom)/app/i=`ib+($XC)-1`/j=`jb+($YC)-1` ssh_detrended)

Play with XC,YC and TC as you wish.



Russ

On 21/04/17 11:57, Paul Goddard wrote:
Hello,

I am attempting to detrend SSH data over time at each ocean grid cell. The problem is that the data are very large: the grid is i=1:3600, j=1:2700, and l=1:7305.

In the past, I was able to save large data by repeating (looping) over the time dimension. However, since a detrending calculation must happen as well, it is taking too long to even save the first year (going on 2 hours, and it may crash before it saves l=1).

Any ideas on the best way to save such large data? Given that this is the resolution of many of the ocean models for CMIP6, I think I better learn a good way to complete these tasks.

Thank you in advance, Paul

Here is my script:

can data/all
can var/all
can win/all

use "./CM2.6_Control_SSH_01810101-02001231.nc"

set memory/size=99999

let P  = T[GT=SSH,l=1:7305]
let Q = SSH[i=1:3600,j=1:2700,l=1:7305]
SET GRID Q
GO regresst

let SSH_Detrended = SSH[i=1:3600,j=1:2700,l=1:7305] - qhat[i=1:3600,j=1:2700,l=1:7305]

!Control save
save/clobber/file = "/archive/Paul.Goddard/CM2.6/Storm_Surge_Project/CM2.6_Control_SSH_DETRENDED_01810101-02001231.nc" SSH_DETRENDED[l=1]

repeat/l=2:7305 (\
save/append/file = "/archive/Paul.Goddard/CM2.6/Storm_Surge_Project/CM2.6_Control_SSH_DETRENDED_01810101-02001231.nc" SSH_DETRENDED[l=`l`])





