I want to have the global minimum (floating point types) across all MPI ranks and proceed only on the rank which computed the local minimum.

Specifically, can we compare the global and local minimum for exact equality (==)? Or is it possible that there can occur floating point errors in the reduction itself?

#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // rank must be set before it is used here
    double minDistance = 100.0 + rank;
    double globalMinDistance;
    
    // Reduce to find the global minimum
    MPI_Reduce(&minDistance, &globalMinDistance, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);

    // Can I safely compare the doubles here with ==?
    if (minDistance == globalMinDistance) {
        std::cout << "Rank " << rank << " has the globally minimum distance: " << globalMinDistance << std::endl;
    }

    MPI_Finalize();
    return 0;
}

asked Mar 13 at 14:42 by Mathieu
  • 1 "Floating point errors" is something of a misconception. Floating-point calculations often give slightly different results from what we get with real numbers, but that's because floating-point values aren't real numbers; they're floating point. Just as (1/3)*3 isn't 1, (1.0/3.0)*3.0 isn't 1.0. The question of whether the floating-point values that you calculated are close enough to the corresponding real-number values that you're thinking of can't be answered without knowing and exploring how you calculated the values in question. – Pete Becker Commented Mar 13 at 14:49
  • 1 There are tricky things in floating-point math. For example, double ratio = 1.0/3.0; if (ratio == 1.0/3.0) { do_sometthing(); } might not call do_something(). That's because the compiler is allowed to do floating-point math at higher precision, until you store the resulting value. So, typically, ratio will be 64 bits wide, but 1.0/3.0 might be 80 bits wide. So don't skip intermediate stores if you care about equality tests. – Pete Becker Commented Mar 13 at 14:53
  • 1 Changing the order of summation can change the result, this isn't an error but has to be expected – 463035818_is_not_an_ai Commented Mar 13 at 14:55
  • @463035818_is_not_an_ai Let me give an example: min1 = 1.2345... and min2 = 6.789. So MPI_Reduce will return 1.2345... Why would the returned value not be exactly equal to the input `min1`? I agree that things are different when e.g. summations are involved. – Mathieu Commented Mar 13 at 15:05
  • sorry misread the code. As the answer states, there is no summation nor other rounding errors in finding a minimum. Is there a reason the answer is phrased in a speculative way? Did you actually run the code and got mismatching results? – 463035818_is_not_an_ai Commented Mar 13 at 15:13

3 Answers


There should be no roundoff in doing a minimum reduction. Your code looks safe to me. Have you tested it?

However, you are testing globalMinDistance on every process, while MPI_Reduce only sets it on rank zero; on all other ranks you are comparing against an uninitialized variable. Use MPI_Allreduce instead.
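A minimal sketch of that fix, keeping the setup from the question's code: MPI_Allreduce delivers the reduced value to every rank (not just rank 0), so the == test is meaningful everywhere. Note that more than one rank may still pass the test if local minima tie.

```cpp
#include <iostream>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    double minDistance = 100.0 + rank;
    double globalMinDistance;

    // Allreduce: every rank receives the reduced value, not only rank 0
    MPI_Allreduce(&minDistance, &globalMinDistance, 1, MPI_DOUBLE,
                  MPI_MIN, MPI_COMM_WORLD);

    if (minDistance == globalMinDistance) {
        std::cout << "Rank " << rank << " has the global minimum: "
                  << globalMinDistance << std::endl;
    }

    MPI_Finalize();
    return 0;
}
```

Run with e.g. `mpirun -np 4 ./a.out`; only rank 0 computes the smallest local value here, so only rank 0 prints.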

Floating point numbers are not broken, they just have finite precision.

You must be aware of rounding errors: in particular, (a + b) + c is not always the same as a + (b + c). Assignment, however, is not subject to rounding errors.

float x = 0.3;
float y;
y = x;
std::cout << (x == 0.3);
std::cout << (y == 0.3);
std::cout << (y == x);

The last line prints 1; you can count on that. The two lines before it print 0, because the float x cannot hold the double literal 0.3 exactly.

If the reduction is based only on comparison via < and assignment, you can expect == to yield the correct result.

Independent of the floating-point discussion, and of the reduce-vs-allreduce issue that @VictorEijkhout pointed out, your approach has another problem: there is no guarantee that exactly one process holds the minimal value, since several processes may happen to compute the same minimum. MPI has a more elegant solution that always identifies a single process: the MPI_MINLOC reduction.

The following code will always print exactly one line with "minloc distance", but will print one line per process with "minimal distance", if you execute as ./a.out 0.

#include <iostream>
#include <cstdlib>
#include <mpi.h>

struct minloc{
  double value;
  int loc;
};

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, alpha=1;
    if (argc>1) alpha = atoi(argv[1]);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    minloc localMinloc = {100.1 - rank*alpha, rank}, globalMinloc;
    
    // Reduce to find the global minimum
    MPI_Allreduce(&localMinloc, &globalMinloc, 1, MPI_DOUBLE_INT, MPI_MINLOC, MPI_COMM_WORLD);

    if (rank == globalMinloc.loc) {
        std::cout << "Rank " << rank << " has the globally minloc distance: " << globalMinloc.value << std::endl;
    }
    if (localMinloc.value == globalMinloc.value) {
        std::cout << "Rank " << rank << " has the globally minimum distance: " << globalMinloc.value << std::endl;
    }

    MPI_Finalize();
    return 0;
}

Produces output like:

$ mpirun -np 4 ./a.out 0
Rank 3 has the globally minimum distance: 100.1
Rank 0 has the globally minloc distance: 100.1
Rank 0 has the globally minimum distance: 100.1
Rank 1 has the globally minimum distance: 100.1
Rank 2 has the globally minimum distance: 100.1
$ mpirun -np 4 ./a.out 1
Rank 3 has the globally minloc distance: 97.1
Rank 3 has the globally minimum distance: 97.1
