Re: variance: @VAR and @MODVAR



Hi Andrew et al.,

<apologies in advance if this response arrives out of sequence -- I am responding offline.
Also apologies because I have not thought about this topic in a long time and have no computer
in front of me to get in the groove -- some cobwebs may be blocking my view>

The Ferret documentation should reflect that @VAR and @MODVAR are different sorts of
calculations.  @VAR regards the variable as a continuous function and computes a "weighted
variance" --  I suppose it is actually the "second moment".  It is designed to be consistent
with @AVE.  It weights the influence of grid cells according to their sizes -- to approximate
     INTEGRAL[(var-AVE(var))^2 dx] / INTEGRAL[dx].
The documentation discusses the N/(N-1) correction factor.  (The integral value represents the
limiting case of N going to infinity.)
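As a rough illustration of the cell-weighted calculation described above, here is a minimal sketch in plain Python (not Ferret).  The function names and the example cell widths are assumptions made for illustration; this is not Ferret's actual implementation.

```python
def weighted_mean(values, widths):
    """Cell-size-weighted mean: sum(v * dx) / sum(dx)."""
    return sum(v * w for v, w in zip(values, widths)) / sum(widths)


def weighted_variance(values, widths):
    """Cell-size-weighted second moment about the weighted mean.

    Approximates INTEGRAL[(var - AVE(var))^2 dx] / INTEGRAL[dx];
    no N/(N-1) correction is applied.
    """
    mean = weighted_mean(values, widths)
    return sum(w * (v - mean) ** 2 for v, w in zip(values, widths)) / sum(widths)


values = [0.0, 2.0]

# Equal cells: the result is the plain divide-by-N variance.
print(weighted_variance(values, [1.0, 1.0]))

# Unequal cells (hypothetical widths): the wider cell dominates.
print(weighted_variance(values, [1.0, 3.0]))
```

With equal cell widths this reduces to the divide-by-N result; with unequal widths the answer depends on the cell sizes, which is where the continuous and discrete views part ways.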

By contrast @MODVAR is a discrete, finite calculation that assumes that all repetitions of the
modulo cycle are identical in character -- allowing for unequal grid cells within the modulo
cycle, but requiring consistency between cycles.  No cell weighting is involved.  It does
include the N/(N-1) correction.
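The discrete calculation described above might be pictured roughly as follows, in plain Python rather than Ferret.  This sketch assumes the data are laid out as consecutive, complete modulo cycles of a fixed length; the function name and the layout are assumptions for illustration, not a description of how Ferret implements @MODVAR.

```python
from statistics import variance  # sample variance, divides by N-1


def modulo_variance(values, cycle_length):
    """Unweighted sample variance at each position within the modulo cycle.

    values       -- data covering whole cycles, in order
    cycle_length -- number of points per modulo cycle
    Each position needs at least two cycles of data.
    """
    # Collect the points that fall at the same position in every cycle.
    phases = [values[start::cycle_length] for start in range(cycle_length)]
    return [variance(phase) for phase in phases]


# Two points folded onto a one-point cycle: variance of {0, 2} with N-1.
print(modulo_variance([0.0, 2.0], 1))

# Two cycles of three points each.
print(modulo_variance([1.0, 2.0, 3.0, 4.0, 5.0, 8.0], 3))
```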

To achieve the discrete/finite variance calculation you can use @SUM and @NGD directly.
The same applies if you want the arithmetic mean instead of the weighted average.  (I am
always a little uncomfortable with those terms.)
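For comparison, the unweighted sum-and-count arithmetic looks like this in plain Python (not Ferret).  Here None stands in for a missing value and the count of non-missing points plays the role of @NGD; both choices, and the function names, are assumptions for illustration.

```python
def good_data(values):
    """Drop missing points (represented here as None)."""
    return [v for v in values if v is not None]


def arithmetic_mean(values):
    """Unweighted mean: sum of good data / number of good data."""
    good = good_data(values)
    return sum(good) / len(good)


def discrete_variance(values):
    """Unweighted sample variance with the N-1 divisor."""
    good = good_data(values)
    mean = sum(good) / len(good)
    return sum((v - mean) ** 2 for v in good) / (len(good) - 1)


print(arithmetic_mean([0.0, 2.0]))
print(discrete_variance([0.0, 2.0]))
print(discrete_variance([0.0, None, 2.0]))  # missing point is ignored
```

Cell sizes never enter this calculation, so every good point counts equally regardless of the grid.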

There is indeed the possibility of confusion among these subtleties, and your input on how to
minimize it is welcome.  In my assessment the ambiguity arises as soon as your coordinate model
permits unequal cell sizes.  It becomes unclear if the view of the data is continuous or
discrete.   We might improve the clarity somewhat by adding a new transformation @DVAR
(discrete variance) -- figuring that when the user sees these two transformations side-by-side
he will have to think about this subtlety.  Other ideas?

    - steve

=============================

Andrew Wittenberg wrote:

> Hi all,
>
> There seems to be some inconsistency between the Ferret "variable
> transform" @VAR and the "regridding transform" @MODVAR, both of which
> purport to compute the sample variance.  Given N elements, @MODVAR
> correctly divides the sum-of-squares by N-1, the number of degrees of
> freedom.  But @VAR divides by N, giving an underestimate of the variance
> for small N.
>
> An example:
>
> yes? def ax/x=1:1:1/modulo xax
> yes? let a = {0,2}
> yes? let a2 = (a-a[i=@ave])^2
>
> ! Here @MODVAR gives the right answer for the variance.
> yes? list a[gx=xax@modvar], a2[i=@sum]/(a[i=@ngd]-1)
>  Column  1: A[G=1 delta on X@MODVAR,X=1] is Variance of {0,2} (regrid: 1 delta on X@MODVAR)
>  Column  2: EX#2 is A2[I=@SUM]/(A[I=@NGD]-1)
>               A   EX#2
> I / *:     2.000  2.000
>
> ! But @VAR gives a low-biased estimate.
> yes? list a[i=@var], a2[i=@sum]/a[i=@ngd]
>              X: 0.5 to 2.5
>  Column  1: A[X=@VAR] is Variance of {0,2}
>  Column  2: EX#2 is A2[I=@SUM]/A[I=@NGD]
>               A   EX#2
> I / *:     1.000  1.000
>
> Are there any wishes/plans to make these consistent?  Preferably by making
> @VAR divide by N-1 instead of N?
>
> Andrew
>
> +--------------------------------------------------------+
> |   Dr. Andrew T. Wittenberg   |        GFDL/NOAA        |
> |  Andrew.Wittenberg@noaa.gov  |      Princeton, NJ      |
> +--------------------------------------------------------+






Dept of Commerce / NOAA / OAR / PMEL / TMAP
