/g/ - Technology

File: aaa.png (29 KB, 680x357)
And some retards still believe the compiler will optimize everything for you.
>>
Show godbolt output
>>
>>106495382
No, do it yourself
>>
>>106495388
I don't care enough
>>
>cnt
>>
>>106495377
>>
>>106495501
>>
File: aa.png (31 KB, 533x393)
>>106495501
>polish programmers
>>
>>106495377
> STL makes slow loops
It has always been thus.
Loop is too big to fit into a cache-line.
>>
>>106495377
everything C++ does to get away from its "legacy" "deprecated" C core just makes it worse and worse
>>
>>106495377
Not idiomatic.
int s = std::reduce(arr, arr + cnt);
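A self-contained sketch of that call, assuming arr and cnt come in as in OP's snippet (std::reduce lives in <numeric>; sum is a made-up wrapper name):
#include <numeric>   // std::reduce

int sum(const int* arr, int cnt)
{
    // Reduce over the half-open range [arr, arr + cnt); the initial value defaults to int{}.
    return std::reduce(arr, arr + cnt);
}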
>>
File: s.png (8 KB, 492x51)
>>106495606
Not fastomatic.
>>
>>106495377
::qsort is slower than std::ranges::sort
>>
File: 1757109617.png (47 KB, 607x475)
>>106495382
even at -O1 it optimizes to similar assembly, the only difference being the loop counter.
you can get identical assembly by changing it to
for (int* a = arr; a != arr + cnt; ++a)
    s += *a;


op using -O0 and then complaining about unoptimized code
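For reference, a minimal pair to paste into godbolt and compare at -O1 (function names are placeholders, not from the screenshot):
// Index-based loop, as in OP's "fast" version.
int sum_index(const int* arr, int cnt)
{
    int s = 0;
    for (int i = 0; i < cnt; ++i)
        s += arr[i];
    return s;
}

// Pointer-based loop from this post; at -O1 both typically lower to the same assembly.
int sum_pointer(const int* arr, int cnt)
{
    int s = 0;
    for (const int* a = arr; a != arr + cnt; ++a)
        s += *a;
    return s;
}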
>>
>>106495377
Experts don't even understand why structured bindings work the way they do or were designed that way
>>
>>106495971
another noob retard
>>
>>106495377
What compiler, optimisation level and list did you use to run the benchmark? Using clang++ 19.1.7 with -O2 on a list of random numbers, "slow" is much faster than "fast" for short lists and they become roughly equivalent as the list grows.
>>
>>106496356
>is much faster
Sorry, I read my table incorrectly. They're roughly equivalent regardless of list length.
>>
>>106495377
>#include <span>
>uses a C++ bloated template class just for the ability to do fancy for loops
>complains that it's slow
Modern C++ and Rust are all about type masturbation.
If everyone thinks that doing
struct MyType {
    void* data;
    size_t length;
};

is bad, buggy and makes the code unreadable then performance is the least of the concerns here.
Disgusting.
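For illustration, a minimal int-typed sketch of such a hand-rolled view and a loop over it (IntView and sum are made-up names, not from the post above):
#include <cstddef>   // size_t

// Hand-rolled span-like view: just a pointer plus a length.
struct IntView {
    const int* data;
    std::size_t length;
};

int sum(IntView v)
{
    int s = 0;
    for (std::size_t i = 0; i < v.length; ++i)
        s += v.data[i];
    return s;
}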
>>
>>106495377
nigger it's 2k25 and you're still arguing on c/c++? christ
>>
>>106495377
>>106496627
>using #includes instead of imports
kys
>using void* instead of templates
kys
>>
File: fastslow.png (58 KB, 1454x691)
In Beef, they both generate the same assembly when you force inline the span.
[Export, LinkName("Fast")]
static int Fast(int* arr, int count)
{
    int s = 0;
    for (int i = 0; i < count; ++i)
    {
        s += arr[i];
    }
    return s;
}

[Export, LinkName("Slow")]
static int Slow(int* arr, int count)
{
    int s = 0;
    for (int x in Span<int>(arr, count))
    {
        s += x;
    }
    return s;
}

>>
>>106496681
>>106495526
>>106495501
delete this post right now, you're completely ruining my fucking thread you fucking anti-fun faggot
>>
>>106495377
>slow c++ function is 60% slower than fast c++ function (trust me bro)
absolute retard
>>
>>106496647
It's 2025 and the noob retards still haven't fixed their compilers.
>>
>>106496700
You don't even understand the posts you quoted, +1 noob retard.
>>106496703
Noob retard.
>>
the first function may invoke undefined behavior if 'arr' has more than 32767 elements (32767 is the smallest INT_MAX the standard allows, so the int index could overflow on such an implementation), the second will never invoke ub.
good luck looping over the bytes of an image in a portable manner.
ps both compile to the same assembly with the right compiler flags; if you don't know what you're doing, C and C++ are not for you.
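For illustration, a sketch of one way to sidestep that int-width concern (not OP's code; sum is a made-up name): take the count and the index as std::size_t.
#include <cstddef>   // std::size_t

int sum(const int* arr, std::size_t cnt)
{
    int s = 0;
    for (std::size_t i = 0; i < cnt; ++i)   // unsigned index, wide enough for any allocatable array
        s += arr[i];
    return s;
}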
>>
>>106495377
the point of stl is the guarantees, not the speed. The truth is that whichever one makes better use of the cache or engages simd instructions is going to be more performant. But I'd like to see whether it really is faster or not when speed optimization is turned on.
>>
>>106495377
C++ compilers have never compiled index based loops and pointer based loops into identical code. They are different.
>>
>>106496960
You are the most stupid of all noob retards in this thread. Congratulations.
>>
>>106497026
Why do you keep saying noob retard?
>>
>>106497039
because he has autism. ignore him starting now.
>>
>>106496700
Beef is a better OOP lang than C++ ever wished it could be.
>>
>>106495377
Micro benchmarking can be very difficult to do correctly and even experts fuck it up sometimes. Judging by what you're benchmarking and your attitude, I can safely say I don't trust you to have the competence to have done it correctly.
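For reference, a sketch of what a purpose-built harness for this looks like, assuming the Google Benchmark library is available (it does not appear anywhere in this thread):
#include <benchmark/benchmark.h>
#include <numeric>
#include <vector>

// OP's index-based loop, reproduced here so the sketch is self-contained.
static int fast(int* arr, int cnt)
{
    int s = 0;
    for (int i = 0; i < cnt; ++i)
        s += arr[i];
    return s;
}

static void BM_fast(benchmark::State& state)
{
    std::vector<int> v(state.range(0));
    std::iota(v.begin(), v.end(), 0);
    for (auto _ : state) {
        int s = fast(v.data(), static_cast<int>(v.size()));
        benchmark::DoNotOptimize(s);   // keep the result live so the loop isn't optimized away
    }
}
BENCHMARK(BM_fast)->Range(8, 1 << 20);
BENCHMARK_MAIN();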
>>
>>106497080
If you can't benchmark that you're a noob retard.
>>
File: 9i9kgo[1].jpg (40 KB, 679x522)
>>106497094
anyone who says that can't be trusted to do it correctly.
>>
>>106496960
>the first function may invoke undefined behavior if 'arr' has more than 32767 elements
Why?
>>
>>106495377
Not OP, but I benchmarked it.
I made sure to randomize cache between runs.
I also added a third option which still uses spans but accumulates with std::reduce, and guess what: with speed optimization turned on, std::reduce made the span version just as fast as the C method, though it was a close race.
==> result_o0 <==
Slow Elapsed (s) 6.7619e-05s
Fast Elapsed (s) 2.033e-05s
Slow with std::reduce Elapsed (s) 7.1959e-05s

==> result_o1 <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 9e-08s
Slow with std::reduce Elapsed (s) 3.18e-06s

==> result_o2 <==
Slow Elapsed (s) 2.1e-07s
Fast Elapsed (s) 5e-08s
Slow with std::reduce Elapsed (s) 1e-07s

==> result_o3 <==
Slow Elapsed (s) 1.9e-07s
Fast Elapsed (s) 1e-07s
Slow with std::reduce Elapsed (s) 8e-08s

==> result_ofast <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 8e-08s
Slow with std::reduce Elapsed (s) 5e-08s

==> result_os <==
Slow Elapsed (s) 2.1e-07s
Fast Elapsed (s) 8e-08s
Slow with std::reduce Elapsed (s) 1.1e-07s
>>
>>106497562
This is the code I used, compiled with g++ -std=c++20
#include <random>
#include <span>
#include <chrono>
#include <iostream>
#include <numeric>
#include <cstdlib> // rand()

int fast(int *arr, int cnt)
{
    int s { 0 };
    for (int i = 0; i < cnt; ++i)
    {
        s += arr[i];
    }
    return s;
}

int slow(int *arr, int cnt)
{
    int s { 0 };
    for (int x : std::span(arr, cnt))
    {
        s += x;
    }
    return s;
}

int slow_with_std_reduce(int *arr, int cnt)
{
    auto span_of_arr { std::span(arr, cnt) };
    return std::reduce(span_of_arr.begin(), span_of_arr.end(), 0);
}

// Touches a buffer much larger than the cache so each timed run starts cold.
void __attribute__((optimize("O0"))) randomize_cache()
{
    const size_t bigger_than_cachesize { 10 * 1024 * 1024 };
    static long *p { new long[bigger_than_cachesize] };

    for (size_t i = 0; i < bigger_than_cachesize; ++i)
    {
        p[i] = rand();
    }
}

to be continued
>>
>>106497588
int main()
{
    using namespace std::chrono;

    const size_t test_arr_size { 5000 };

    int test_arr[50000];

    for (size_t i = 0; i < test_arr_size; ++i)
    {
        test_arr[i] = rand();
    }

    randomize_cache();

    const auto test_arr_ptr { reinterpret_cast<int *>(test_arr) };

    const auto slow_start { steady_clock::now() };

    const auto slow_result { slow(test_arr_ptr, test_arr_size) };

    const auto slow_stop { steady_clock::now() };

    randomize_cache();

    const auto fast_start { steady_clock::now() };

    const auto fast_result { fast(test_arr_ptr, test_arr_size) };

    const auto fast_stop { steady_clock::now() };

    randomize_cache();

    const auto slow_with_std_reduce_start { steady_clock::now() };

    const auto slow_with_std_reduce_result { slow_with_std_reduce(test_arr_ptr, test_arr_size) };

    const auto slow_with_std_reduce_stop { steady_clock::now() };

    const duration<double> elapsed_slow { slow_stop - slow_start };

    std::cout << "Slow Elapsed (s) " << elapsed_slow << '\n';

    const duration<double> elapsed_fast { fast_stop - fast_start };

    std::cout << "Fast Elapsed (s) " << elapsed_fast << '\n';

    const duration<double> elapsed_slow_with_std_reduce { slow_with_std_reduce_stop - slow_with_std_reduce_start };

    std::cout << "Slow with std::reduce Elapsed (s) " << elapsed_slow_with_std_reduce << '\n';
}
>>
>>106497595
I just realized I used the wrong array size, let me fix it.
==> result_o0 <==
Slow Elapsed (s) 0.000667988s
Fast Elapsed (s) 0.000200509s
Slow with std::reduce Elapsed (s) 0.000713617s

==> result_o1 <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 1e-07s
Slow with std::reduce Elapsed (s) 2.245e-05s

==> result_o2 <==
Slow Elapsed (s) 2.5e-07s
Fast Elapsed (s) 1e-07s
Slow with std::reduce Elapsed (s) 1e-07s

==> result_o3 <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 7e-08s
Slow with std::reduce Elapsed (s) 6e-08s

==> result_ofast <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 1e-07s
Slow with std::reduce Elapsed (s) 7e-08s

==> result_os <==
Slow Elapsed (s) 2.2e-07s
Fast Elapsed (s) 7e-08s
Slow with std::reduce Elapsed (s) 8e-08s
>>
>>106497615
Used this as the fix to main
const size_t test_arr_size { 50000 };

int test_arr[test_arr_size];

for (size_t i = 0; i < test_arr_size; ++i)
{
    test_arr[i] = rand();
}
>>
>>106497615
One last fix, I swear: I checked my assembly output because those numbers seemed too low, and the compiler had reordered my now() calls. So I fixed that; here are the new results:
==> result_o0 <==
Slow Elapsed (s) 0.000699038s
Fast Elapsed (s) 0.000200809s
Slow with std::reduce Elapsed (s) 0.000640428s

==> result_o1 <==
Slow Elapsed (s) 4.263e-05s
Fast Elapsed (s) 5.702e-05s
Slow with std::reduce Elapsed (s) 2.302e-05s

==> result_o2 <==
Slow Elapsed (s) 1.836e-05s
Fast Elapsed (s) 1.716e-05s
Slow with std::reduce Elapsed (s) 2.441e-05s

==> result_o3 <==
Slow Elapsed (s) 1.87e-05s
Fast Elapsed (s) 2.2059e-05s
Slow with std::reduce Elapsed (s) 2.381e-05s

==> result_ofast <==
Slow Elapsed (s) 1.5909e-05s
Fast Elapsed (s) 1.99e-05s
Slow with std::reduce Elapsed (s) 2.233e-05s

==> result_os <==
Slow Elapsed (s) 4.642e-05s
Fast Elapsed (s) 4.172e-05s
Slow with std::reduce Elapsed (s) 2.358e-05s
>>
>>106497588
Why do you niggers use #include? Is it still 1800?
>>
>>106497988
and the corrections to main are these:
template <class T>
__attribute__((always_inline)) inline void do_not_optimize(const T &value) {
    // Empty asm with a "+m" constraint: forces the value out to memory and stops
    // the compiler from reordering or eliding the computation that produced it.
    asm volatile("" : "+m"(const_cast<T &>(value)));
}

int main()
{
    using namespace std::chrono;

    const size_t test_arr_size { 50000 };

    int test_arr[test_arr_size];

    for (size_t i = 0; i < test_arr_size; ++i)
    {
        test_arr[i] = rand();
    }

    randomize_cache();

    const auto test_arr_ptr { reinterpret_cast<int *>(test_arr) };

    const auto slow_start { high_resolution_clock::now() };

    do_not_optimize(test_arr_ptr);
    const auto slow_result { slow(test_arr_ptr, test_arr_size) };
    do_not_optimize(slow_result);

    const auto slow_stop { high_resolution_clock::now() };

    randomize_cache();

    const auto fast_start { high_resolution_clock::now() };

    do_not_optimize(test_arr_ptr);
    const auto fast_result { fast(test_arr_ptr, test_arr_size) };
    do_not_optimize(fast_result);

    const auto fast_stop { high_resolution_clock::now() };

    randomize_cache();

    const auto slow_with_std_reduce_start { high_resolution_clock::now() };

    do_not_optimize(test_arr_ptr);
    const auto slow_with_std_reduce_result { slow_with_std_reduce(test_arr_ptr, test_arr_size) };
    do_not_optimize(slow_with_std_reduce_result);

    const auto slow_with_std_reduce_stop { high_resolution_clock::now() };
>>
Micro-benchmarking at this level isn't very useful.
>>
>>106498064
I know, I just wanted to show that C++, even with its abstractions, is still good at doing stuff.
>>
assuming this isn't bait. even without compiler optimizations the code will be almost identical in performance. a span is a lightweight object, and range based for loops are also pretty lightweight. you are using two abstractions in the second example, iterators and spans, but some could argue it is easier to read. i guess if it matters that much don't use them?
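Roughly what the range-for over a span desugars to, as a sketch (sum is a made-up name; the actual rewrite the compiler performs is equivalent to this iterator pair):
#include <cstddef>   // std::size_t
#include <span>

int sum(int* arr, int cnt)
{
    std::span<int> r(arr, static_cast<std::size_t>(cnt));
    int s = 0;
    // "for (int x : r)" boils down to an iterator pair over the span, roughly like this:
    for (auto it = r.begin(), end = r.end(); it != end; ++it) {
        int x = *it;
        s += x;
    }
    return s;
}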
>>
>>106496681
>Beef,
is this an anti-jeet language?
>>
>>106497988
don't c++ fags know how to print nanoseconds?
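For reference, a minimal sketch of printing the elapsed time in nanoseconds instead of the default seconds formatting:
#include <chrono>
#include <iostream>

int main()
{
    using namespace std::chrono;
    const auto start = steady_clock::now();
    // ... work being timed ...
    const auto stop = steady_clock::now();
    // duration_cast to nanoseconds; .count() yields a plain integer tick count
    std::cout << duration_cast<nanoseconds>(stop - start).count() << " ns\n";
}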
>>
File: Untitled12.png (9 KB, 819x93)
>>106498306
It's anti-vegan.
>>
>>106498317
I was being lazy and I wasn't sure if the thread would unalive before I could put my autism on display okay.
>>
>>106495501
>>106495526
What book is this from?
>>
>>106495501
muh zero-cost abstraction
>>
>>106496627
put the length first
>>
File: ACKKK.png (104 KB, 1920x1080)
OP is gay
>>
>>106498533
mfw the compiler isn't smart enough to just emit
mov edi,std::cout
mov esi,10
call std::ostream::operator<<(int)

shit lang
>>
>>106498560
>i dont know what constness is
>proceeds to language in pointless language flamewars
hang yourself you worthless nigger faggot
>>
>>106498696
>language in pointless
engage in*
>>
>>106498696
your mythical smart compiler should be able to look at the whole program and see nothing writes to your array bro just pretend its const bro



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.