/g/ - Technology

File: aaa.png (29 KB, 680x357)
And some retards still believe the compiler will optimize everything for you.
>>
Show godbolt output
>>
>>106495382
No, do it yourself
>>
>>106495388
I don't care enough
>>
>cnt
>>
>>106495377
>>
>>106495501
>>
File: aa.png (31 KB, 533x393)
>>106495501
>polish programmers
>>
>>106495377
> STL makes slow loops
It has always been thus.
Loop is too big to fit into a cache-line.
>>
>>106495377
everything C++ does to get away from its "legacy" "deprecated" C core just makes it worse and worse
>>
>>106495377
Not idiomatic.
int s = std::reduce(arr, arr + cnt);
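A self-contained sketch of that call, assuming arr and cnt come in as in OP's snippet (std::reduce lives in <numeric>; sum is a made-up wrapper name):
#include <numeric>   // std::reduce

int sum(const int* arr, int cnt)
{
    // Reduce over the half-open range [arr, arr + cnt); the initial value defaults to int{}.
    return std::reduce(arr, arr + cnt);
}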
>>
File: s.png (8 KB, 492x51)
>>106495606
Not fastomatic.
>>
>>106495377
::qsort is slower than std::ranges::sort
>>
File: 1757109617.png (47 KB, 607x475)
>>106495382
even at -O1 it optimizes to similar assembly, the only difference being the loop counter.
you can get identical assembly by changing it to
for (int* a = arr; a != arr + cnt; ++a)
    s += *a;


op using -O0 and then complaining about unoptimized code
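For reference, a minimal pair to paste into godbolt and compare at -O1 (function names are placeholders, not from the screenshot):
// Index-based loop, as in OP's "fast" version.
int sum_index(const int* arr, int cnt)
{
    int s = 0;
    for (int i = 0; i < cnt; ++i)
        s += arr[i];
    return s;
}

// Pointer-based loop from this post; at -O1 both typically lower to the same assembly.
int sum_pointer(const int* arr, int cnt)
{
    int s = 0;
    for (const int* a = arr; a != arr + cnt; ++a)
        s += *a;
    return s;
}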
>>
>>106495377
Experts don't even understand why structured bindings work the way they do or were designed that way
>>
>>106495971
another noob retard
>>
>>106495377
What compiler, optimisation level and list did you use to run the benchmark? Using clang++ 19.1.7 with -O2 on a list of random numbers, "slow" is much faster than "fast" for short lists and they become roughly equivalent as the list grows.
>>
>>106496356
>is much faster
Sorry, I read my table incorrectly. They're roughly equivalent regardless of list length.
>>
>>106495377
>#include <span>
>uses a C++ bloated template class just for the ability to do fancy for loops
>complains that it's slow
Modern C++ and Rust are all about type masturbation.
If everyone thinks that doing
struct MyType {
    void* data;
    size_t length;
};

is bad, buggy and makes the code unreadable then performance is the least of the concerns here.
Disgusting.
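For illustration, a minimal int-typed sketch of such a hand-rolled view and a loop over it (IntView and sum are made-up names, not from the post above):
#include <cstddef>   // size_t

// Hand-rolled span-like view: just a pointer plus a length.
struct IntView {
    const int* data;
    std::size_t length;
};

int sum(IntView v)
{
    int s = 0;
    for (std::size_t i = 0; i < v.length; ++i)
        s += v.data[i];
    return s;
}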
>>
>>106495377
nigger it's 2k25 and you're still arguing on c/c++? christ
>>
>>106495377
>>106496627
>using #includes instead of imports
kys
>using void* instead of templates
kys
>>
File: fastslow.png (58 KB, 1454x691)
In Beef, they both generate the same assembly when you force inline the span.
[Export, LinkName("Fast")]
static int Fast(int* arr, int count)
{
    int s = 0;
    for (int i = 0; i < count; ++i)
    {
        s += arr[i];
    }
    return s;
}

[Export, LinkName("Slow")]
static int Slow(int* arr, int count)
{
    int s = 0;
    for (int x in Span<int>(arr, count))
    {
        s += x;
    }
    return s;
}

>>
>>106496681
>>106495526
>>106495501
delete this post right now, you're completely ruining my fucking thread you fucking anti-fun faggot
>>
>>106495377
>slow c++ function is 60% slower than fast c++ function (trust me bro)
absolute retard
>>
>>106496647
It's 2025 and the noob retards still haven't fixed their compilers.
>>
>>106496700
You don't even understand the posts you quoted, +1 noob retard.
>>106496703
Noob retard.
>>
the first function may invoke undefined behavior if 'arr' has more than 32767 elements (32767 is the smallest INT_MAX the standard allows, so the int index could overflow on such an implementation), the second will never invoke ub.
good luck looping over the bytes of an image in a portable manner.
ps both compile to the same assembly with the right compiler flags; if you don't know what you're doing, C and C++ are not for you.
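For illustration, a sketch of one way to sidestep that int-width concern (not OP's code; sum is a made-up name): take the count and the index as std::size_t.
#include <cstddef>   // std::size_t

int sum(const int* arr, std::size_t cnt)
{
    int s = 0;
    for (std::size_t i = 0; i < cnt; ++i)   // unsigned index, wide enough for any allocatable array
        s += arr[i];
    return s;
}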
>>
>>106495377
the point of stl is the guarantees, not the speed. The truth is that whichever one makes better use of the cache or engages simd instructions is going to be more performant. But I'd like to see whether it really is faster or not when speed optimization is turned on.
>>
>>106495377
C++ compilers have never compiled index based loops and pointer based loops into identical code. They are different.
>>
>>106496960
You are the most stupid of all noob retards in this thread. Congratulations.
>>
>>106497026
Why do you keep saying noob retard?
>>
>>106497039
because he has autism. ignore him starting now.
>>
>>106496700
Beef is a better OOP lang than C++ ever wished it could be.
>>
>>106495377
Micro benchmarking can be very difficult to do correctly and even experts fuck it up sometimes. Judging by what you're benchmarking and your attitude, I can safely say I don't trust you to have the competence to have done it correctly.
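For reference, a sketch of what a purpose-built harness for this looks like, assuming the Google Benchmark library is available (it does not appear anywhere in this thread):
#include <benchmark/benchmark.h>
#include <numeric>
#include <vector>

// OP's index-based loop, reproduced here so the sketch is self-contained.
static int fast(int* arr, int cnt)
{
    int s = 0;
    for (int i = 0; i < cnt; ++i)
        s += arr[i];
    return s;
}

static void BM_fast(benchmark::State& state)
{
    std::vector<int> v(state.range(0));
    std::iota(v.begin(), v.end(), 0);
    for (auto _ : state) {
        int s = fast(v.data(), static_cast<int>(v.size()));
        benchmark::DoNotOptimize(s);   // keep the result live so the loop isn't optimized away
    }
}
BENCHMARK(BM_fast)->Range(8, 1 << 20);
BENCHMARK_MAIN();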
>>
>>106497080
If you can't benchmark that you're a noob retard.
>>
File: 9i9kgo[1].jpg (40 KB, 679x522)
>>106497094
anyone who says that can't be trusted to do it correctly.
>>
>>106496960
>the first function may invoke undefined behavior if 'arr' has more than 32767 elements
Why?
>>
>>106495377
Not OP, but I benchmarked it.
I made sure to randomize cache between runs.
I also added a third option which still uses spans but accumulates with std::reduce, and guess what: with speed optimization turned on, std::reduce made the span version just as fast as the C method, though it was a close race.
==> result_o0 <==
Slow Elapsed (s) 6.7619e-05s
Fast Elapsed (s) 2.033e-05s
Slow with std::reduce Elapsed (s) 7.1959e-05s

==> result_o1 <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 9e-08s
Slow with std::reduce Elapsed (s) 3.18e-06s

==> result_o2 <==
Slow Elapsed (s) 2.1e-07s
Fast Elapsed (s) 5e-08s
Slow with std::reduce Elapsed (s) 1e-07s

==> result_o3 <==
Slow Elapsed (s) 1.9e-07s
Fast Elapsed (s) 1e-07s
Slow with std::reduce Elapsed (s) 8e-08s

==> result_ofast <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 8e-08s
Slow with std::reduce Elapsed (s) 5e-08s

==> result_os <==
Slow Elapsed (s) 2.1e-07s
Fast Elapsed (s) 8e-08s
Slow with std::reduce Elapsed (s) 1.1e-07s
>>
>>106497562
This is the code I used, compiled with g++ -std=c++20
#include <random>
#include <span>
#include <chrono>
#include <iostream>
#include <numeric>
#include <cstdlib> // rand()

int fast(int *arr, int cnt)
{
    int s { 0 };
    for (int i = 0; i < cnt; ++i)
    {
        s += arr[i];
    }
    return s;
}

int slow(int *arr, int cnt)
{
    int s { 0 };
    for (int x : std::span(arr, cnt))
    {
        s += x;
    }
    return s;
}

int slow_with_std_reduce(int *arr, int cnt)
{
    auto span_of_arr { std::span(arr, cnt) };
    return std::reduce(span_of_arr.begin(), span_of_arr.end(), 0);
}

// Touches a buffer much larger than the cache so each timed run starts cold.
void __attribute__((optimize("O0"))) randomize_cache()
{
    const size_t bigger_than_cachesize { 10 * 1024 * 1024 };
    static long *p { new long[bigger_than_cachesize] };

    for (size_t i = 0; i < bigger_than_cachesize; ++i)
    {
        p[i] = rand();
    }
}

to be continued
>>
>>106497588
int main()
{
    using namespace std::chrono;

    const size_t test_arr_size { 5000 };

    int test_arr[50000];

    for (size_t i = 0; i < test_arr_size; ++i)
    {
        test_arr[i] = rand();
    }

    randomize_cache();

    const auto test_arr_ptr { reinterpret_cast<int *>(test_arr) };

    const auto slow_start { steady_clock::now() };

    const auto slow_result { slow(test_arr_ptr, test_arr_size) };

    const auto slow_stop { steady_clock::now() };

    randomize_cache();

    const auto fast_start { steady_clock::now() };

    const auto fast_result { fast(test_arr_ptr, test_arr_size) };

    const auto fast_stop { steady_clock::now() };

    randomize_cache();

    const auto slow_with_std_reduce_start { steady_clock::now() };

    const auto slow_with_std_reduce_result { slow_with_std_reduce(test_arr_ptr, test_arr_size) };

    const auto slow_with_std_reduce_stop { steady_clock::now() };

    const duration<double> elapsed_slow { slow_stop - slow_start };

    std::cout << "Slow Elapsed (s) " << elapsed_slow << '\n';

    const duration<double> elapsed_fast { fast_stop - fast_start };

    std::cout << "Fast Elapsed (s) " << elapsed_fast << '\n';

    const duration<double> elapsed_slow_with_std_reduce { slow_with_std_reduce_stop - slow_with_std_reduce_start };

    std::cout << "Slow with std::reduce Elapsed (s) " << elapsed_slow_with_std_reduce << '\n';
}
>>
>>106497595
I just realized I used the wrong array size, let me fix it.
==> result_o0 <==
Slow Elapsed (s) 0.000667988s
Fast Elapsed (s) 0.000200509s
Slow with std::reduce Elapsed (s) 0.000713617s

==> result_o1 <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 1e-07s
Slow with std::reduce Elapsed (s) 2.245e-05s

==> result_o2 <==
Slow Elapsed (s) 2.5e-07s
Fast Elapsed (s) 1e-07s
Slow with std::reduce Elapsed (s) 1e-07s

==> result_o3 <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 7e-08s
Slow with std::reduce Elapsed (s) 6e-08s

==> result_ofast <==
Slow Elapsed (s) 2e-07s
Fast Elapsed (s) 1e-07s
Slow with std::reduce Elapsed (s) 7e-08s

==> result_os <==
Slow Elapsed (s) 2.2e-07s
Fast Elapsed (s) 7e-08s
Slow with std::reduce Elapsed (s) 8e-08s
>>
>>106497615
Used this as the fix to main
const size_t test_arr_size { 50000 };

int test_arr[test_arr_size];

for (size_t i = 0; i < test_arr_size; ++i)
{
    test_arr[i] = rand();
}
>>
>>106497615
One last fix, I swear: I checked my assembly output because those numbers seemed too low, and the compiler had reordered my now() calls. So I fixed that; here are the new results:
==> result_o0 <==
Slow Elapsed (s) 0.000699038s
Fast Elapsed (s) 0.000200809s
Slow with std::reduce Elapsed (s) 0.000640428s

==> result_o1 <==
Slow Elapsed (s) 4.263e-05s
Fast Elapsed (s) 5.702e-05s
Slow with std::reduce Elapsed (s) 2.302e-05s

==> result_o2 <==
Slow Elapsed (s) 1.836e-05s
Fast Elapsed (s) 1.716e-05s
Slow with std::reduce Elapsed (s) 2.441e-05s

==> result_o3 <==
Slow Elapsed (s) 1.87e-05s
Fast Elapsed (s) 2.2059e-05s
Slow with std::reduce Elapsed (s) 2.381e-05s

==> result_ofast <==
Slow Elapsed (s) 1.5909e-05s
Fast Elapsed (s) 1.99e-05s
Slow with std::reduce Elapsed (s) 2.233e-05s

==> result_os <==
Slow Elapsed (s) 4.642e-05s
Fast Elapsed (s) 4.172e-05s
Slow with std::reduce Elapsed (s) 2.358e-05s
>>
>>106497588
Why do you niggers use #include? Is it still 1800?
>>
>>106497988
and the corrections to main are these:
template <class T>
__attribute__((always_inline)) inline void do_not_optimize(const T &value) {
    // Empty asm with a "+m" constraint: forces the value out to memory and stops
    // the compiler from reordering or eliding the computation that produced it.
    asm volatile("" : "+m"(const_cast<T &>(value)));
}

int main()
{
    using namespace std::chrono;

    const size_t test_arr_size { 50000 };

    int test_arr[test_arr_size];

    for (size_t i = 0; i < test_arr_size; ++i)
    {
        test_arr[i] = rand();
    }

    randomize_cache();

    const auto test_arr_ptr { reinterpret_cast<int *>(test_arr) };

    const auto slow_start { high_resolution_clock::now() };

    do_not_optimize(test_arr_ptr);
    const auto slow_result { slow(test_arr_ptr, test_arr_size) };
    do_not_optimize(slow_result);

    const auto slow_stop { high_resolution_clock::now() };

    randomize_cache();

    const auto fast_start { high_resolution_clock::now() };

    do_not_optimize(test_arr_ptr);
    const auto fast_result { fast(test_arr_ptr, test_arr_size) };
    do_not_optimize(fast_result);

    const auto fast_stop { high_resolution_clock::now() };

    randomize_cache();

    const auto slow_with_std_reduce_start { high_resolution_clock::now() };

    do_not_optimize(test_arr_ptr);
    const auto slow_with_std_reduce_result { slow_with_std_reduce(test_arr_ptr, test_arr_size) };
    do_not_optimize(slow_with_std_reduce_result);

    const auto slow_with_std_reduce_stop { high_resolution_clock::now() };
>>
Micro-benchmarking at this level isn't very useful.
>>
>>106498064
I know, I just wanted to show that C++, even with its abstractions, is still good at doing stuff.
>>
assuming this isn't bait. even without compiler optimizations the code will be almost identical in performance. a span is a lightweight object, and range based for loops are also pretty lightweight. you are using two abstractions in the second example, iterators and spans, but some could argue it is easier to read. i guess if it matters that much don't use them?
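Roughly what the range-for over a span desugars to, as a sketch (sum is a made-up name; the actual rewrite the compiler performs is equivalent to this iterator pair):
#include <cstddef>   // std::size_t
#include <span>

int sum(int* arr, int cnt)
{
    std::span<int> r(arr, static_cast<std::size_t>(cnt));
    int s = 0;
    // "for (int x : r)" boils down to an iterator pair over the span, roughly like this:
    for (auto it = r.begin(), end = r.end(); it != end; ++it) {
        int x = *it;
        s += x;
    }
    return s;
}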
>>
>>106496681
>Beef,
is this an anti-jeet language?
>>
>>106497988
don't c++ fags know how to print nanoseconds?
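For reference, a minimal sketch of printing the elapsed time in nanoseconds instead of the default seconds formatting:
#include <chrono>
#include <iostream>

int main()
{
    using namespace std::chrono;
    const auto start = steady_clock::now();
    // ... work being timed ...
    const auto stop = steady_clock::now();
    // duration_cast to nanoseconds; .count() yields a plain integer tick count
    std::cout << duration_cast<nanoseconds>(stop - start).count() << " ns\n";
}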
>>
File: Untitled12.png (9 KB, 819x93)
>>106498306
It's anti-vegan.
>>
>>106498317
I was being lazy and I wasn't sure if the thread would unalive before I could put my autism on display okay.
>>
>>106495501
>>106495526
What book is this from?
>>
>>106495501
muh zero-cost abstraction
>>
>>106496627
put the length first
>>
File: ACKKK.png (104 KB, 1920x1080)
OP is gay
>>
>>106498533
mfw the compiler isn't smart enough to just emit
mov edi,std::cout
mov esi,10
call std::ostream::operator<<(int)

shit lang
>>
>>106498560
>i dont know what constness is
>proceeds to language in pointless language flamewars
hang yourself you worthless nigger faggot
>>
>>106498696
>language in pointless
engage in*
>>
>>106498696
your mythical smart compiler should be able to look at the whole program and see nothing writes to your array bro just pretend its const bro



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.