[Thread Prev][Thread Next][Index]

Re: [ferret_users] write missing value into output netcdf



Hi Russ,

| 2^100 is a poor choice for a large floating point number
| in either single or double precision. You can't evaluate it
| exactly.

I have two points to make.  First, 2^100 is exactly
representable.  The exponent of the IEEE single precision
format has 8 bits.  That means 2^n is exactly representable
for such an integer n as -128 <= n < 128.

In principle, there are machines which don't use the IEEE format,
but in practice, I've never seen such a machine on which Ferret
runs.

Second, you can argue that 2^100 is actually evaluated as
2.0 ^ 100.0 = power(2.0, 100.0), which is a floating-point
operation subject to truncation error. But, I don't see any
reason that this computation can't be done to the maximal
accuracy, unless the implementation uses a very sloppy asymptotic
formula or something.  So, even when the result is not exactly
resprentable, the same calculation would give the same result
down to the last bit across all platforms.  This second point
is my guess.

I'm attaching a simple Fortran program to demonstrate that
this is likely the case.  You can compile it with ifort.

| xint=2**100
| xr4=2.0**100

But, why does Ferret have to choose one method (Integer**Integer)
at some times and choose another (Real**Integer) at other times?
Your concern is based on the assumption that the evaluation
of "2^100" is done in one way at one time and in another at
another time.

| You are relying on the underlying software to make a conversion and
| will get different bit patterns depending on how the expression is
| evaluated. For instance, if the expression 2^100 is evaluated using
| integer arithmetic

I don't think Ferret does that.  It shouldn't, for the very reason
you cite.  The simplest implementation would be to evaluate
x^y always as a floating point exponentiation power(x,y).

| If you specify an explicit constant (eg 1E30) then the correct and
| (more importantly) consistent bit pattern according to the floating
| point model will be used.

Your concern will become reality when we mix single and double
precisions.  For example, 0.1_single /= 0.1_double.  In this case,
even your method (use an explicit constant) ceases to work:

    real(single), parameter:: s = 0.1D0
    real(double), parameter:: d = 0.1D0
    --->  s /= d

When this is a real problem, I recommend 0.999E9.  This is the largest
3-digit number I know which is exactly representable both in the
single-precision and double-precision IEEE format:

    real(single), parameter:: ss = 0.999E9, sd = 0.999D9
    real(double), parameter:: ds = 0.999E9, dd = 0.999D9
    ---> ss == sd == ds == dd

Cheers,
Ryo
--------------
program try
  implicit NONE
  real(4), parameter :: s = 2.0, e = 100.0
  real(8), parameter :: d = 2.0
  integer, parameter :: n = 100
  write(*,"((1X,ES50.43))") s**e, d**e, s**n, d**n
end


[Thread Prev][Thread Next][Index]
Contact Us
Dept of Commerce / NOAA / OAR / PMEL / Ferret

Privacy Policy | Disclaimer | Accessibility Statement