admin管理员组

文章数量:1289634

I have two C++ implementations that count pairs ((x, y)) satisfying ((x + y) % 7 == 0). Method 1 skips unnecessary iterations using y += 6, while Method 2 checks every y. I performed benchmarks on both small and large ranges, and here are my results.

Code for Both Methods (Smaller Range)

#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

void method1(long long &comparisons) {
    int count = 0;
    comparisons = 0;

    for (int x = 50; x <= 300; ++x) {  
        int start = max(70, x + 1);

        for (int y = start; y <= 400; y++) {
            ++comparisons;
            if ((x + y) % 7 == 0) {
                ++count;
                y += 6; // Optimization: Skipping unnecessary checks
            }
        }
    }
    cout << "Method 1 count: " << count << "\n";
    cout << "Method 1 comparisons: " << comparisons << "\n";
}

void method2(long long &comparisons) {
    int count = 0;
    comparisons = 0;

    for (int x = 50; x <= 300; ++x) {  
        int start = max(70, x + 1);

        for (int y = start; y <= 400; ++y) {
            ++comparisons;
            if ((x + y) % 7 == 0) {
                ++count;
            }
        }
    }
    cout << "Method 2 count: " << count << "\n";
    cout << "Method 2 comparisons: " << comparisons << "\n";
}

int main() {
    long long comparisons1, comparisons2;

    auto start1 = high_resolution_clock::now();
    method1(comparisons1);
    auto end1 = high_resolution_clock::now();
    double time1 = duration<double>(end1 - start1).count();
    cout << "Method 1 Time: " << time1 << " sec\n\n";

    auto start2 = high_resolution_clock::now();
    method2(comparisons2);
    auto end2 = high_resolution_clock::now();
    double time2 = duration<double>(end2 - start2).count();
    cout << "Method 2 Time: " << time2 << " sec\n";

    double speedup = time2 / time1;
    double comparison_reduction = (double)comparisons2 / comparisons1;

    cout << "\nMethod 1 is **" << speedup << "×** faster than Method 2.\n";
    cout << "Method 1 does **" << comparison_reduction << "×** fewer comparisons.\n";

    return 0;
}

Question

  • Does this type of loop optimization (y += 6) generally lead to performance improvements, or does it depend on specific compiler optimizations?
  • Are there better ways to approach this kind of problem in C++?

Benchmark Results

Smaller Range (50 → 300, 70 → 400)


Method 1 count: 8040
Method 1 comparisons: 8796
Method 1 Time: 3.114e-05 sec

Method 2 count: 8040
Method 2 comparisons: 56285
Method 2 Time: 1.6753e-05 sec

Method 1 is **0.53799×** faster than Method 2.
Method 1 does **6.39893×** fewer comparisons.

Larger Range (5000 → 30000, 7000 → 40000)


Method 1 count: 80074785
Method 1 comparisons: 80149790
Method 1 Time: 0.0372139 sec

Method 2 count: 80074785
Method 2 comparisons: 560523500
Method 2 Time: 0.127715 sec

Method 1 is **3.43193×** faster than Method 2.
Method 1 does **6.99345×** fewer comparisons.

Analysis

  • For a small range, Method 1 does ~6.4× fewer comparisons but isn't significantly faster. (Why?)
  • For a larger range, Method 1 does ~7× fewer comparisons and is ~3.4× faster, proving its efficiency at scale.
  • Optimization Impact: Skipping unnecessary checks (y += 6) improves performance as input size grows.

I have two C++ implementations that count pairs ((x, y)) satisfying ((x + y) % 7 == 0). Method 1 skips unnecessary iterations using y += 6, while Method 2 checks every y. I performed benchmarks on both small and large ranges, and here are my results.

Code for Both Methods (Smaller Range)

#include <iostream>
#include <chrono>

using namespace std;
using namespace std::chrono;

void method1(long long &comparisons) {
    int count = 0;
    comparisons = 0;

    for (int x = 50; x <= 300; ++x) {  
        int start = max(70, x + 1);

        for (int y = start; y <= 400; y++) {
            ++comparisons;
            if ((x + y) % 7 == 0) {
                ++count;
                y += 6; // Optimization: Skipping unnecessary checks
            }
        }
    }
    cout << "Method 1 count: " << count << "\n";
    cout << "Method 1 comparisons: " << comparisons << "\n";
}

void method2(long long &comparisons) {
    int count = 0;
    comparisons = 0;

    for (int x = 50; x <= 300; ++x) {  
        int start = max(70, x + 1);

        for (int y = start; y <= 400; ++y) {
            ++comparisons;
            if ((x + y) % 7 == 0) {
                ++count;
            }
        }
    }
    cout << "Method 2 count: " << count << "\n";
    cout << "Method 2 comparisons: " << comparisons << "\n";
}

int main() {
    long long comparisons1, comparisons2;

    auto start1 = high_resolution_clock::now();
    method1(comparisons1);
    auto end1 = high_resolution_clock::now();
    double time1 = duration<double>(end1 - start1).count();
    cout << "Method 1 Time: " << time1 << " sec\n\n";

    auto start2 = high_resolution_clock::now();
    method2(comparisons2);
    auto end2 = high_resolution_clock::now();
    double time2 = duration<double>(end2 - start2).count();
    cout << "Method 2 Time: " << time2 << " sec\n";

    double speedup = time2 / time1;
    double comparison_reduction = (double)comparisons2 / comparisons1;

    cout << "\nMethod 1 is **" << speedup << "×** faster than Method 2.\n";
    cout << "Method 1 does **" << comparison_reduction << "×** fewer comparisons.\n";

    return 0;
}

Question

  • Does this type of loop optimization (y += 6) generally lead to performance improvements, or does it depend on specific compiler optimizations?
  • Are there better ways to approach this kind of problem in C++?

Benchmark Results

Smaller Range (50 → 300, 70 → 400)


Method 1 count: 8040
Method 1 comparisons: 8796
Method 1 Time: 3.114e-05 sec

Method 2 count: 8040
Method 2 comparisons: 56285
Method 2 Time: 1.6753e-05 sec

Method 1 is **0.53799×** faster than Method 2.
Method 1 does **6.39893×** fewer comparisons.

Larger Range (5000 → 30000, 7000 → 40000)


Method 1 count: 80074785
Method 1 comparisons: 80149790
Method 1 Time: 0.0372139 sec

Method 2 count: 80074785
Method 2 comparisons: 560523500
Method 2 Time: 0.127715 sec

Method 1 is **3.43193×** faster than Method 2.
Method 1 does **6.99345×** fewer comparisons.

Analysis

  • For a small range, Method 1 does ~6.4× fewer comparisons but isn't significantly faster. (Why?)
  • For a larger range, Method 1 does ~7× fewer comparisons and is ~3.4× faster, proving its efficiency at scale.
  • Optimization Impact: Skipping unnecessary checks (y += 6) improves performance as input size grows.
Share Improve this question edited Feb 19 at 18:04 Peter Cordes 366k49 gold badges716 silver badges977 bronze badges asked Feb 19 at 17:50 Chirag JainChirag Jain 11 silver badge 8
  • 1 I'm trying to understand the "scope" of your question. This specific problem can be solved much more efficiently: you don't need to iterate over the values of y in order to count them. Is that relevant/useful? Or is this problem just a placeholder, and your real goal is to understand the performance characteristics you're seeing? – ruakh Commented Feb 19 at 18:10
  • 1 How did you compile these programs, and what hardware did you run them on? (e.g. gcc -O3 -march=native with GCC 14.2, running on i7-6700k Skylake on Arch GNU/Linux would be a possible answer.) If you didn't enable optimization, then latency bottlenecks on most CPUs from store/reload of loop variables might explain some of the effect. Or CPU-frequency warm-up effects for a very short run make the first thing slower: Idiomatic way of performance evaluation? – Peter Cordes Commented Feb 19 at 18:19
  • 1 What you really should do is learn how to use a profiler, you will find out much about where the real time is spend in your code. E.g. missed branch predictions, slow modulo operations (spoiler alert they are). Or just learn about the "divisibility by 7 check" that contains (almost) no branches and no divisions. – Pepijn Kramer Commented Feb 19 at 18:36
  • This is that mathematical shortcut : Step 1: Pick the last digit of the large number. Step 2: Multiply it by 2. Subtract the product with the rest of the digits to its left leaving behind the last digit. Step 3: If the difference is 0 or a multiple of 7, then the number is divisible by 7. – Pepijn Kramer Commented Feb 19 at 18:36
  • Are there better ways to approach this kind of problem in C++? Depending on your perspective... yes, q.v. Jonathan Müller's Functional Programming in C++ presentation from CppCon 2024. – Eljay Commented Feb 19 at 19:21
 |  Show 3 more comments

1 Answer 1

Reset to default 1

Method #1 reduces the number of comparisons necessary with the y += 6;. But it still checks whether the result is divisible by 7, even though it's already assured that it is.

When you get down to it, you don't really need to check the remainder very often at all. If x + y % 7 == 0, then you know the same will be true for y+7, y+14, y+21, and so on.

But then on the next iteration of the outer loop, you have x = prev_x + 1. You don't really need to search for a value that will give a multiple of 7. since x is one greater than it was the last iteration, the y that gives a multiple of 7 will be one less than it was in the previous iteration. That is, if some_x + some_y % 7 == 0, then (some_x + 1) + (some_y - 1) % 7 == 0 as well.

On an unrelated note, Howard Hinnant (the author of most of the chrono library) now advises against using high_resolution_clock. If memory serves, he generally advises using steady_clock instead.

本文标签: performanceBenchmarking Two C Implementations for Counting Pairs Divisible by 7Stack Overflow