
Re: Large files access time



Hi Jean-Marie,
Thank you for the report, and for tracking down which versions behave this way.  I've found a bug in the USE/REGULART command that was introduced when we added support for the AXIS and CARTESIAN_AXIS attributes in netCDF files.  Because of it, REGULART is not applied; the change was made between versions 5.51 and 5.53, as you found.

It's a simple fix and it'll be in the next release of Ferret --

Ansley

J-M Epitalon wrote:
Steve,

after further investigation, I noticed the following:

When using the use/regulart option to access a 1.5 GB file,
access is immediate with Ferret version 5.51 (for RedHat 7.1).

It takes much longer (20 to 45 seconds, depending on whether the file is
local or remote) with Ferret 5.53 (for RedHat 9) and later (5.7 and 5.8 tested).

I will continue my investigations by testing Ferret 5.8 for RedHat 7.1.

If you have any idea about this, please let me know.

Jean-Marie Epitalon

  
Hi J-M,

The usual interpretation of this is disk latency:  the first time you open the file
the contents are read from disk (30-45 sec).  The second time, the disk blocks are
still in the Unix disk cache -- "opens immediately".

This implies that there is a substantial amount of disk seeking going on.  I'm not
sure why ...   You could check in the netCDF email archives or post a message to the
netCDF group. If you have a debug ("-g") version of Ferret available you can step
through the netCDF open sections and probably can see exactly which netCDF call is
taking so long ...

    - steve
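
The cold-versus-warm cache effect Steve describes can be observed outside Ferret with a small timing script. This is only a sketch: the helper name `time_two_reads` is hypothetical, and it reads the whole file, whereas Ferret touches only part of it when opening. If the file was read or written recently, both passes will already be served from the cache and show little difference.

```python
import time


def time_two_reads(path, chunk_size=1 << 20):
    """Read the whole file twice; return the two elapsed times in seconds.

    On a cold cache the first pass has to go to disk (or NFS), while the
    second pass is usually served from the operating system's disk cache.
    """
    timings = []
    for _ in range(2):
        start = time.perf_counter()
        with open(path, "rb") as f:
            # Read in 1 MiB chunks until end of file.
            while f.read(chunk_size):
                pass
        timings.append(time.perf_counter() - start)
    return timings
```

Calling `time_two_reads("my_file.nc")` on a file that has not been touched since boot should show the gap, if disk latency is indeed the cause.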

J-M Epitalon wrote:

    
Steve,

thanks for your answer.
I notice that the /regulart option speeds things up quite a bit.
On my personal workstation, a Red Hat PC running Ferret 5.51, the
use/regulart instruction executes instantaneously.

However, on my LAS server, a quad-processor Dell PowerEdge 2650
running Red Hat Enterprise Linux and Ferret 5.8, I cannot
achieve the same result:

When opening a 1.5 GB NetCDF file, it takes 20 to 30 seconds, even for a
local file!
For a remote NFS file, it takes 30 to 45 s.

Then, the second time I open it, whether in the same or another Ferret
session, it opens immediately.

I tried different versions of Ferret for Linux Red-Hat 9: v5.53, 5.7 and
5.8 and I cannot see any difference.
I suppose the problem lies in the PC or in the OS...

Could you please help me understand this?

Jean-Marie

On Tue, 2005-05-03 at 05:24, Steve Hankin wrote:
      
Hi Jean-Marie,

Did you try
    yes?  USE/regularT my_file.nc

When that qualifier is used Ferret reads only the first and last time step of
the time axis.  There's a long disk seek between the two time steps, since the
file is large, but it would account for only a fraction of a second in the
worst case.

If you are using /regularT and still getting the slow speed then I cannot
account for it.  Can you send us the output of ncdump -c?

    - steve
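
To illustrate why the first and last time steps are enough for a regular axis, here is a minimal sketch. It is not Ferret's actual code; the function name `regular_axis` and the "days since 1950-01-01" encoding are assumptions made for the example, as is a standard Gregorian calendar for the 1950-2005 daily series mentioned in this thread.

```python
def regular_axis(first, last, npoints):
    """Rebuild an evenly spaced axis from its two end values and its length."""
    if npoints < 1:
        raise ValueError("npoints must be at least 1")
    if npoints == 1:
        return [first]
    # Constant spacing implied by the end points.
    step = (last - first) / (npoints - 1)
    return [first + i * step for i in range(npoints)]


# Daily axis for 1950-01-01 .. 2005-12-31, assumed to be stored as
# "days since 1950-01-01" on a Gregorian calendar (20454 daily steps).
axis = regular_axis(0.0, 20453.0, 20454)
print(axis[:3], axis[-1])  # [0.0, 1.0, 2.0] 20453.0
```

Only two values and the axis length are needed, so the cost does not grow with the number of time steps.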


J-M Epitalon wrote:

        
Jon and Steve,

Sorry, I misled you.
My file is 1.5 GB (not MB)!
It is also accessed through NFS.
I suppose this is why it is really slow.

Even when accessed locally, it still takes half a minute to open.
(3 to 4 min through NFS)

Since I want to keep the system architecture with NFS, I solved my
problem by splitting the file into several parts and using an MC
descriptor file.

What would you suggest as an alternative solution?

Jean-Marie Epitalon
CERFACS
Toulouse, FR
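
As a sketch of how such a split could be planned, the snippet below computes the time-index range of each part for a daily axis. The function name `yearly_chunks`, the ten-year chunk size, and the Gregorian calendar are assumptions for illustration; model output on a 365-day calendar would need different arithmetic, and the actual splitting and the MC descriptor are not shown.

```python
from datetime import date


def yearly_chunks(start, end, years_per_chunk):
    """Plan chunks of a daily axis running from `start` to `end` inclusive.

    Returns (first_index, last_index, first_date, last_date) tuples, with
    zero-based inclusive indices counted in days from `start`.
    """
    chunks = []
    chunk_start = start
    while chunk_start <= end:
        # Each chunk ends the day before 1 January of the next block of years.
        next_start = date(chunk_start.year + years_per_chunk, 1, 1)
        chunk_end = min(end, date.fromordinal(next_start.toordinal() - 1))
        chunks.append((
            (chunk_start - start).days,
            (chunk_end - start).days,
            chunk_start,
            chunk_end,
        ))
        chunk_start = next_start
    return chunks


for first, last, d0, d1 in yearly_chunks(date(1950, 1, 1), date(2005, 12, 31), 10):
    print(first, last, d0, d1)
```

Each index range could then be passed to whatever tool extracts the corresponding time slab into its own file.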

On Fri, 2005-04-22 at 18:27, Jonathan Callahan wrote:
          
Jean-Marie,

This seems inordinately slow for Ferret.  Especially for a file that is
only 1.5 MB!  When this happens it usually means that the time axis is
marked as 'irregular' which means that Ferret has to read in the entire
time axis before it can begin.  We have found many cases where the axis
was actually 'regular' in spite of what the NetCDF attribute says.  If
this is the case then you can use ncatted to change the time axis
attribute and Ferret should open this dataset much more quickly.


-- Jon
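
Before changing the attribute, it may be worth confirming that the axis really is evenly spaced. The sketch below assumes the time values have already been extracted into a Python list (for instance from the output of `ncdump -v` on the time variable); the function name `is_regular` and the tolerance are hypothetical choices, not part of Ferret or NCO.

```python
import math


def is_regular(values, rel_tol=1e-6):
    """Return True if consecutive values are evenly spaced within rel_tol."""
    if len(values) < 3:
        # Fewer than three points cannot show uneven spacing.
        return True
    step = (values[-1] - values[0]) / (len(values) - 1)
    if step == 0:
        # A coordinate axis must be monotonic, so treat this as not regular.
        return False
    return all(
        math.isclose(b - a, step, rel_tol=rel_tol)
        for a, b in zip(values, values[1:])
    )


daily = [float(d) for d in range(10)]
month_starts = [0.0, 31.0, 59.0, 90.0]  # days since 1 Jan: uneven spacing
print(is_regular(daily), is_regular(month_starts))  # True False
```

This check reads the whole axis, so it is a one-off verification rather than something to run on every open.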


J-M Epitalon wrote:

            
hello,

I have daily simulation output covering the period 01-01-1950 to
31-12-2005. It is in a NetCDF file that is 1.5 MB in size.

When I access it through Ferret, it takes 2 to 4 min to open it (executing
the "use" instruction).

I read in the NCO documentation that "Some random access operations on large
files on certain architecture are slow when using the NetCDF interface".

That seems to be the problem. I was able to confirm that it is also slow
with tools other than Ferret (Python, or NCO tools).

Anyway, what solution would you suggest to work around this?
Is using MC descriptors a good idea?

Thanks

Jean-Marie Epitalon
CERFACS
Toulouse, France



              
            
        
    

  


Dept of Commerce / NOAA / OAR / PMEL / TMAP
