# Re: [ferret_users] Spatial correlations[EOF PC, field]

```
Hi Fabian,

| When I plot the spatial correlations between the leading principal
| component of an SST dataset
| (inferred from Ferret's EOF functions based on correlation matrix)
| and the SST dataset itself, the correlations in the domain of the
| EOF analysis are very low (see attached Figure). In the North
| Pacific they are supposed to explain 20% of the total
| variance. Shouldn't R-squared averaged over the domain be about 0.2?
| Therefore the correlation should be 0.45 on average in this domain.

A quick calculation seems to show (I hope I didn't make mistakes in
the derivation) that

<T(t,i;n) T(t,i)> / intgr_i[ <T(t,i)T(t,i)> ]
= [A(n) / sum_i A(i)] g(i;n) g(i;n)

where t is time, i is an index for spatial points, n is the EOF mode
number we are focusing on, T(t,i;n) is the mode-n part of T(t,i),
<> is time average, A(n) is the variance of mode n, g(i;n) is the
spatial structure of mode n, and "intgr_i" is a symbolic
representation of spatial integration.  If there are no spatial
weights, "intgr_i" is "sum_i".  I've assumed that the
g's are normalized in such a way that

intgr_i g(i;n) g(i;n) = 1 for each n.

I've also assumed that average is already subtracted from T.

Notice that the factor [A(n) / sum_i A(i)] is the contribution
of mode n to the total variance, which is "20%" in your case.
Since we have a factor "g(i;n) g(i;n)", the correlation can
be much smaller than 20% because of the way g is normalized.
Basically this is what you are seeing, I think.  (It's not exactly
that, though.  See below.)

If we integrate the expression over space, we recover the
contribution of mode n:

intgr_i <T(t,i;n) T(t,i)> / intgr_i[ <T(t,i)T(t,i)> ]
= A(n) / sum_i A(i)

So, clearly, the correlation between the mode-n part of T
and the original T field contains the desired, correct
information.

Another issue here is that my correlation coefficient is
defined as <T(t,i;n) T(t,i)> divided by the total variance,
whereas the ordinary correlation coefficient between
T(t,i;n) and T(t,i) is defined as <T(t,i;n) T(t,i)> divided
by the product of the standard deviations of T(t,i;n) and T(t,i).
The spatial integral of such a correlation yields
sqrt[A(n) / sum_i A(i)], I think.

I can send you (maybe personally) my derivation if you like.
(But, I may not be able to respond quickly; I'll be offline
for a while.)

Regards,
Ryo

```
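The pointwise identity in the message, and its spatial sum, can be checked numerically. The following is only a minimal sketch under stated assumptions: it uses NumPy rather than Ferret, a synthetic random field in place of SST, EOFs computed from the covariance matrix (not the correlation matrix mentioned in the question), and no spatial weights, so "intgr_i" is a plain sum. The array sizes and the variable names (`T`, `A`, `G`, `Tn`, `lhs`, `rhs`) are illustrative choices, not anything from the original analysis.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic anomaly field T(t, i): 500 time steps, 12 spatial points (assumed sizes).
nt, nx = 500, 12
T = rng.standard_normal((nt, nx)) @ rng.standard_normal((nx, nx))
T -= T.mean(axis=0)  # subtract the time mean at each spatial point

# EOFs of the covariance matrix; eigh returns orthonormal eigenvectors,
# so sum_i g(i;n) g(i;n) = 1 for each mode n.
C = T.T @ T / nt
A, G = np.linalg.eigh(C)       # A: mode variances, columns of G: g(i;n)
order = np.argsort(A)[::-1]    # sort modes by decreasing variance
A, G = A[order], G[:, order]

n = 0                          # leading mode
pc = T @ G[:, n]               # principal component of mode n
Tn = np.outer(pc, G[:, n])     # mode-n part of the field, T(t,i;n)

total_var = (T * T).mean(axis=0).sum()    # sum_i <T(t,i) T(t,i)>
lhs = (Tn * T).mean(axis=0) / total_var   # <T(t,i;n) T(t,i)> / total variance
rhs = A[n] / A.sum() * G[:, n] ** 2       # [A(n) / sum A] g(i;n) g(i;n)
fraction = A[n] / A.sum()                 # contribution of mode n to total variance

print("max |lhs - rhs|:", np.abs(lhs - rhs).max())
print("sum_i lhs:", lhs.sum(), " variance fraction:", fraction)
```

With this normalization the squared structure g(i;n) g(i;n) averages to 1/nx over the domain, which is why the pointwise values are much smaller than the variance fraction even though their spatial sum recovers it.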