[Thread Prev][Thread Next][Index]

Re: [ferret_users] Finding frequency distribution



Hi,

Please note that the frequency histogram scripts have a bug in them.  The counts for the lower boundary are not correct.  Here is a discussion I had with Ansley on the topic....

>>>>>>>>>>

Ansley,

I think there is a bug with the lower boundary (i.e. not just zeroes).  Copy this code:

yes?SET DATA levitus_climatology
yes?let new = xsequence(temp)
!trying a different range outside the influence of zero fillers...
yes?go frequency_histogram new 5 10 0.5

This gives...
1    /  1:   4.50      0.
2    /  2:   5.00  34775.

verifying with my ad hoc method...

yes? let test = if new[d=1] ge 4.75 and new[d=1] lt 5.25 then new[d=1]
yes? let good = test[i=@ngd]
yes? list good
             VARIABLE : TEST[I=@NGD]
             X        : 0.5 to 10000000
          14624.

These two are way off.  The upper boundary appear to be fine (I checked) as do all the other values.  This is a big problem for anyone that has or plans to use this script.  Wonder if I can just use my ad hoc method within a repeat loop...

let bin = x
repeat/range=lo:hi:del/name=m(let test = if data ge (m-(bin/2)) and data lt (m+(bin/2)) then data;let good = test[i=@ngd];write to file)

...or something like this.

Steve

----- Original Message -----
From: Ansley Manke <Ansley.B.Manke@xxxxxxxx>
Date: Thursday, October 16, 2008 4:31 pm
Subject: Re: [ferret_users] Follow-up on:  bins in frequency_histogram.jnl


> Hi -
> Let me point you to what the script does. Actually zero is handled
> sort
> of inconsistently and I think wrongly - the script fills in the
> data
> with 0 and then does the operations on that, confusing values of
> zero
> and the zero level of the histogram and missing data.
>
> ...
> ! compute an index that numbers the histogram boxes on the desired
> output plot
> ! also cope with missing values ("MISSING()") and data below the
> minimums! ("MAX()").  Data above the max is simply not read in the
> final commands
> ! because it lies beyond the number of points in the grid gindex
> LET/QUIET vmin = $2
> LET/QUIET vmax = $3
> LET/QUIET vdelta = $4
> LET/QUIET vn = INT((vmax-vmin)/vdelta + 0.5) + 1
> LET/QUIET vindex = MISSING(INT(($1-vmin)/vdelta + 0.5) + 1, 0) ! 1
> to vn
> LET/QUIET index = MAX(0, vindex)
>
> So vindex is the data weighted to the size of the bins by division
> by
> vdelta, and rounded to the nearest integer. Then there's a bit more
> logic to cut the histogram off at the min and max values
>
> If you want exact counts you might look at refining this.
>
> Ansley
>
> Stephen Guimond wrote:
> > Ansley,
> >
> > Thanks.  I will probably use this one since it should take less
> time for large data sets.  What happens to the calculation on the
> boundaries?  For instance, if I look at temp 0.00 in the levitus
> data...>
> > 2    /  2:   0.00  52983.
> >
> > However...
> >
> > yes? let new = xsequence(temp)
> > yes? let test = if new[d=1] ge (-0.25) and new[d=1] lt 0.25 then
> new[d=1]> yes? let good = test[i=@ngd]
> > yes? list good
> >              VARIABLE : TEST[I=@NGD]
> >              X        : 0.5 to 10000000
> >           26978.
> >
> > They don't match here, but they do match for all the others I've
> tried.>
> > Steve
> >
> > ----- Original Message -----
> > From: Ansley Manke <Ansley.B.Manke@xxxxxxxx>
> > Date: Thursday, October 16, 2008 3:09 pm
> > Subject: Re: [ferret_users] Follow-up on:  bins in
> frequency_histogram.jnl> 
> > Cc: Ansley Manke <Ansley.B.Manke@xxxxxxxx>
> >
> >  
> >> Oh, and by the way, looking at the script
> frequency_histogram2.jnl, it
> >> does the same calculation, but uses Ferret functions to do the
> >> sorting;so it doesn't have to write out that big intermediate
> file.
> >> The two
> >> scripts wind up with the exact same result.
> >>
> >> Ansley
> >>
> >> Ansley Manke wrote:
> >>    
> >>> Hi Stephen,
> >>> Interestingly, it looks like a number of points are on the
> edges of
> >>> the histogram box, to within the precision of the data oand of
> >>>      
> >> Ferret> when it makes the comparisons. Your test counts values
> that
> >> are 27.25
> >>    
> >>> exactly, plus all the ones between, plus the ones that are
> 27.75
> >>>      
> >> Those> points on the boundary are counted by the algorithm in
> the
> >> script only
> >>    
> >>> once. So, instead of GE and LE, use GE and LT and they match:
> >>>
> >>>     yes? let test = if new[d=1] ge 27.25 and new[d=1] lt 27.75
> then>>>     new[d=1]
> >>>     yes? list good
> >>>     VARIABLE : TEST[I=@NGD]
> >>>     X : 0.5 to 1296000.5
> >>>     7828.
> >>>
> >>> Ansley
> >>>
> >>>
> >>> Stephen Guimond wrote:
> >>>      
> >>>> Hi Ansley,
> >>>>
> >>>> I am actually still struggling with the frequency_histogram
> >>>>        
> >> script.  The data I am using is 4-D with missing values and when
> I
> >> run it through the script I'm not getting consistent values with
> my
> >> "ad hoc" method.  Here is an example using the
> levitus_climatology
> >> data.>>
> >>    
> >>>> yes? SET DATA levitus_climatology
> >>>> !unravel temp so its just on x-axis to leave y-axis free for
> >>>>        
> >> routine>> yes?let new = xsequence(temp)
> >>    
> >>>> yes?go frequency_histogram new 0 32 0.5
> >>>>
> >>>> The number of counts for temp 27.50 is 7828.  Now when I test
> >>>>        
> >> this (using the bounds I found before), the two numbers don't
> agree...>>    
> >>>> yes?let test = if new[d=1] ge 27.25 and new[d=1] le 27.75 then
> >>>>        
> >> new[d=1]>> yes?let good = test[i=@ngd]
> >>    
> >>>> This gives 7843 for the count.  Thus, I'm still confused by
> this...>>>>
> >>>>
> >>>> Steve
> >>>>
> >>>> ----- Original Message -----
> >>>> From: Ansley Manke <Ansley.B.Manke@xxxxxxxx>
> >>>> Date: Thursday, October 16, 2008 12:47 pm
> >>>> Subject: Re: [ferret_users] Follow-up on:  bins in
> >>>>        
> >> frequency_histogram.jnl>> 
> >>    
> >>>> Cc: oar.pmel.ferret_users@xxxxxxxx
> >>>>
> >>>>  
> >>>>        
> >>>>> Hi all,
> >>>>> Just a couple of general comments here. Using Ferret itself
> to
> >>>>> check
> >>>>> what's in the data and experimenting with a script is a great
> >>>>>          
> >> idea.
> >>    
> >>>>> Sometimes creating some synthetic data and applying a script
> or
> >>>>> function
> >>>>> to that is helpful -- here one could try applying the script
> >>>>>          
> >> with
> >>    
> >>>>> different bins to a variable xx = x[x=1:100]  or some such
> thing.>>>>>
> >>>>> One can also always look at any script, and read the comments
> >>>>>          
> >> the
> >>    
> >>>>> authors included, with
> >>>>>
> >>>>>   yes? go/help frequency_histogram
> >>>>>
> >>>>> Stephen, your previous message mentioned that
> >>>>> frequency_histogram2.jnl
> >>>>> calls the function bin_index_wt which is missing from the
> >>>>>          
> >> Ferret
> >>    
> >>>>> releases.  That's an external function and I have put it into
> >>>>>          
> >> the
> >>    
> >>>>> downloads files for linux 32-bit and 64-bit from our
> downloads
> >>>>> page; it
> >>>>> will also be included in the next Ferret release in the
> ferret
> >>>>> executable as a statically-linked function. If you want it
> now,
> >>>>> download
> >>>>> and install the appropriate Ferret Executables tar file from
> >>>>>
> http://ferret.pmel.noaa.gov/static/Downloads/ferret_downloads.html>>>>>
> >>>>> Ansley
> >>>>>
> >>>>> Stephen Guimond wrote:
> >>>>>    
> >>>>>          
> >>>>>> All,
> >>>>>>
> >>>>>> I did some tooling around and found an answer to my
> question. 
> >>>>>>            
> >> It
> >>    
> >>>>>>      
> >>>>>>            
> >>>>> looks like the routine "frequency_histogram.jnl" look for
> >>>>>          
> >> values
> >>    
> >>>>> within the desired bin CENTERED on the value and INCLUDING
> the
> >>>>> edges....>
> >>>>>    
> >>>>>          
> >>>>>> yes? SET DATA levitus_climatology
> >>>>>> yes? GO  frequency_histogram temp[X=0:360,Y=0:45N,Z=0] 0 32 0.5
> >>>>>>
> >>>>>> I looked at temp 27.50 and found 1107 occurrences.  To check
> >>>>>>            
> >> the
> >>    
> >>>>>>      
> >>>>>>            
> >>>>> number is correct the search must look like this...
> >>>>>    
> >>>>>          
> >>>>>> yes? let test = if temp[X=0:360,Y=0:45N,Z=0] ge 27.25 and
> >>>>>>      
> >>>>>>            
> >>>>> temp[X=0:360,Y=0:45N,Z=0] le 27.75 then
> temp[X=0:360,Y=0:45N,Z=0]>>>>>    
> >>>>>          
> >>>>>> yes?let good = test[i=@ngd,j=@ngd,k=@ngd]
> >>>>>>
> >>>>>> This verifies the value of 1107.  It was useful for me to
> know
> >>>>>>      
> >>>>>>            
> >>>>> that information, hope it will be for others, too.



=======================================================
Stephen R. Guimond
Graduate Research Assistant
Florida State University
Center for Ocean-Atmospheric Prediction Studies (COAPS)
=======================================================


[Thread Prev][Thread Next][Index]

Contact Us
Dept of Commerce / NOAA / OAR / PMEL / TMAP

Privacy Policy | Disclaimer | Accessibility Statement