/g/ - Technology

File: aes.png (100 KB, 1536x921)
>turning one instruction into twelve
So this is the power of RISC
>>
>>107539017
The reason why RISC-V is so slow right now is because it is lacking good branch prediction; in comparison, the lack of compound instructions is a minor performance hit.
>>
File: riscv.png (78 KB, 1190x696)
>>107539051
That's AArch64 asm in the screenshot.
RISC-V is even worse since the AES instructions don't use the vector registers at all.
>>
>>107539017
What do you think "reduced instruction set" means
Would you prefer to have six gorillion obscure instructions like amd64
>>
>>107539017
now show the actual circuits and power required to process said 'single instruction' vs the arm one
>>
Not a single RISC architecture has a CAS instruction except RISC-V which only has it as an extra nobody will implement. The day I learned that, I immediately grew out of the RISC meme.
>>
>>107539017
Kind of a moot point, cause all modern x86_64 processors implement a smaller RISC-like language for implementing the big instructions called micro-code. The microcode defines the individual micro-operations each big instruction is built from, because past a certain point it became infeasible to maintain all of x86’s op-codes in pure silicon, and microcode also means you can fix bugs.
>>
>>107540419
You’re wrong, ARM has one: https://developer.arm.com/documentation/ddi0602/2025-09/Base-Instructions/CAS--CASA--CASAL--CASL--Compare-and-swap-word-or-doubleword-in-memory-
RISC-V also has a standard extension that implements it, but it’s not part of the base standard
>>
>>107539017
There is nothing even remotely "reduced" about modern ARM. Also, ARM does not have 512-bit registers, so obviously it would need multiple instructions. Nothing whatsoever to do with being "RISC".
>>
File: x86_vs_aarch64.png (332 KB, 1368x1024)
>>107540337
>Would you prefer to have six gorillion obscure instructions like amd64
lol
lmao
kek, even
>>
>>107540559
My understanding is that the RISC model mostly focuses on making sure an instruction does one thing. This means that instructions do not handle storing to memory, or fetching from memory. You need to do this yourself. So every instruction is preceded by loads and succeeded by stores. CISC architectures on the other hand have more complex instruction encodings, that mean that any given instruction can:
- Read from a register, write to a static address
- Read from a register, write to an address in another register
- Read from a register, write to a register

And so on. This encoding is a notable factor in the complexity of x86, because of just how many ways these can be combined. Doing it like this makes it easier for human devs, because it’s less verbose and easier to work with, which is why x86 won out I think, cause at the time a lot more people were writing directly in assembly.
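
To make the load/store point concrete, here is a minimal C sketch. The function names are made up for illustration, and the asm in the comments is what compilers typically emit at -O2 (register choice may differ), not output copied from this thread:

```c
#include <assert.h>

/* One C statement that reads, modifies and writes a memory location. */
static void add_to_slot(int *slot, int x) {
    /* x86-64 can fold the memory access into the ALU instruction:
     *     add dword ptr [rdi], esi
     * AArch64 is load/store, so the same statement becomes three:
     *     ldr w8, [x0]
     *     add w8, w8, w1
     *     str w8, [x0]
     */
    *slot += x;
}

/* Tiny usage example (hypothetical helper). */
static int add_demo(void) {
    int slot = 40;
    add_to_slot(&slot, 2);
    assert(slot == 42);
    return slot;
}
```

Same source either way; the difference only shows up in how many instructions the compiler has to emit for the memory traffic.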
>>
>>107539051
>it is lacking good branch prediction

Rather than guessing the next instruction, the CPU should just guess the final output. We can call it "predictive computing". You don't even need to write a program, just a vague statement of what you're kinda looking for.
>>
>>107540624
Maybe some fags will make AI do it
>>
>>107540604
>an instruction does one thing

Yes, it works like traditional CPUs used to work. REDUCED Instruction Set Computing...i.e. the total number of registers is deliberately limited. This gives you granular control over program execution but requires hand-optimization of code. It CAN be better, but it won't be if you're using jeetcoders.

CISC treats registers as more of an API, where a call to a register may result in the computer performing numerous additional steps not specified in the program. Such as CMPXCHG and XADD. The idea being that you can improve performance by having commonly used operations baked into the hardware rather than having to repeat them via software step-by-step each time.
>>
>>107540629
And make it slower than a 68k whilst requiring Guatemala's total power output to compute a single SHA512 hash.
>>
>>107540440
Compilers still rely on LL/SC retardation.
https://godbolt.org/z/1rGr1fMjr
>>
>>107540704
Yep, and they'll boast about it too and hype retarded investors up with it.
>>
>>107539017
>turning one instruction into twelve
you're describing Intel's microcode
>>
>>107540824
Except the microcode doesn't unroll into every nook and cranny of the platform, bloating every executable N-fold.
>>
>>107540738
armv7 doesn’t appear to have it; switching the compiler to ARM64 GCC makes it use the casal instruction: https://godbolt.org/z/resrT3c8h
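
This is not the source behind those godbolt links (it isn't reproduced here), just a minimal C11 sketch of the kind of code being compared. The helper names are hypothetical. As far as I know, a target without a native CAS has to build this from an exclusive load/store (LL/SC) retry loop or a helper call, a target with the ARMv8.1 LSE extension can use a single CAS-family instruction such as casal, and x86-64 uses lock cmpxchg:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Store `desired` into *target only if it still holds `expected`. */
static bool cas_int(_Atomic int *target, int expected, int desired) {
    return atomic_compare_exchange_strong(target, &expected, desired);
}

/* Lock-free increment built from a CAS retry loop; returns the old value. */
static int fetch_increment(_Atomic int *counter) {
    int old = atomic_load(counter);
    while (!atomic_compare_exchange_weak(counter, &old, old + 1)) {
        /* On failure `old` is refreshed with the current value; retry. */
    }
    return old;
}

/* Single-threaded walk-through of the semantics (hypothetical helper). */
static int cas_demo(void) {
    _Atomic int v = 5;
    bool first = cas_int(&v, 5, 7);   /* succeeds: v was 5, becomes 7 */
    bool second = cas_int(&v, 5, 9);  /* fails: v is 7, not 5 */
    int old = fetch_increment(&v);    /* returns 7, leaves 8 */
    assert(first && !second && old == 7);
    return atomic_load(&v);
}
```

The C is identical on every target; which instruction sequence comes out depends on the -march the compiler is given.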
>>
>>107540704
>than a 68k
zoomer spotted
>>
>>107539017
From what I see, vaesenc has a latency of 4 to 5 while aese could be run concurrently on separate ALUs.
>>
>>107540910
>>than a 68k
>zoomer spotted
I specifically used a 68k because it is widely considered to have the cleanest microcode whilst still being fairly performant. This simplicity would make it a likely target for any ml training since it is vastly simpler than x86, and thus would have fewer output errors, and it is even still supported by gcc. Aside from that, expecting an ai model made to simulate a cpu to avoid branching is unlikely to be doable much faster than that; I should be shocked if it even reached Pentium speeds.
>>
>>107541042
but it has both microcode and nanocode
>>
>>107541113
I didn't say it was clean--just cleanest (esp. vs amd64) while still having decades of tooling and modern support. RISCV is still shitty and fragmented without as much support.
>>
File: office handshake.jpg (89 KB, 1351x1232)
>>107540559
>Almost 50% more instructions
>Almost three times longer manual
>Still has better performance/watt and battery life than anything else on the market
>>
>>107541172
>performance/watt
Not really, when you consider work done per unit of power x86 shits on everything.
>>
File: 1739821190020499.jpg (891 KB, 1514x1912)
>>107539017
>le import solution architecture
>>
File: 1739356138889895.jpg (172 KB, 900x598)
>>107540687
risc lowers the complexity at the circuitry level but raises it at the compiler level. Modern "risc" cpus also have a lot of instructions or are surrounded by a bunch of coprocessors. Smartphones, for example, have co-processors for photography, video recording, video decoding, audio processing, AI, rendering (gpu), encryption, ... Instead of having one large instruction set, you have multiple ones to handle. From a user (programmer) perspective: cisc >>> risc.
>>
>>107541318
>user (programmer)
user (user) as well, x86 is compatible with everything, arm is not.
>>
>>107539017
>ahhhhh aes
KYS
>>
>>107541429
No.
>>
https://www.youtube.com/watch?v=vJP_oKN4Ez0
>>
>>107540424
>Kind of a moot point, cause all modern x86_64 processors implement a smaller RISC-like language for implementing the big instructions called micro-code.
So did the 8086.
https://www.righto.com/2022/11/how-8086-processors-microcode-engine.html

>>107540687
>Yes, it works like traditional CPUs used to work. REDUCED Instruction Set Computing
It's the other way around. Traditional CPUs are CISC and have microcode.
>>
>>107540629
Yes what we really need is for CPUs to hallucinate
>>
>>107541809
Marketing execs: "But AI doesn't hallucinate! It gives the correct answer that nobody else can!"
>>
>>107541318
RISC is a myth, it basically does not exist in practice. There is nothing even remotely reduced about a modern ARM core.
>>
>>107542008
MIPS was
>>
>>107541429
elaborate, retarded gorilla nigger
>>
File: cash.png (277 KB, 1065x1431)
>>107540687
>CISC treats registers as more of an API, where a call to a register may result in the computer performing numerous additional steps not specified in the program. Such as CMPXCHG and XADD.
Who's gonna tell him?
>inb4 aarch64 isn't risc
>>
File: 1762367486198568.jpg (36 KB, 300x417)
>>107539017
Unlike x86's aesenc/vaesenc (which take state and a round key that can be a memory operand or register), ARM's aese always performs an initial AddRoundKey (XOR) with a register-held key, followed by the other steps. There's no variant that loads the key directly from memory within the crypto instruction itself. This design integrates well with ARM's NEON SIMD vectors and allows pairing aese + aesmc for high-throughput pipelining on many cores.

Requiring keys in registers gives compilers/assemblers more control over scheduling, prefetching, and parallel block processing (common in modes like CBC or GCM). It also aligns with ARM's emphasis on vector processing for crypto. While it adds explicit load instructions (as seen in the image's address calculation and ldp), this overhead is minor compared to the speedup from hardware-accelerated rounds, especially when processing multiple blocks or using pre-loaded key schedules.
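
A rough software model of the step ordering, for anyone who wants to see the difference rather than read about it. This is a sketch, not the hardware and not real intrinsics: the model_* names are invented, the S-box is computed the slow way from its definition, and the only claim is about ordering (aese XORs the key first and skips MixColumns, aesmc is MixColumns alone, aesenc does a full round with the key XOR last):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Multiply by x in GF(2^8), reduced by the AES polynomial (0x11b). */
static uint8_t xtime(uint8_t a) {
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* General GF(2^8) multiply by shift-and-add. */
static uint8_t gf_mul(uint8_t a, uint8_t b) {
    uint8_t r = 0;
    while (b) {
        if (b & 1) r ^= a;
        a = xtime(a);
        b >>= 1;
    }
    return r;
}

static uint8_t rotl8(uint8_t v, int n) {
    assert(n > 0 && n < 8);
    return (uint8_t)((v << n) | (v >> (8 - n)));
}

/* S-box from its definition: multiplicative inverse, then the affine map.
 * Brute-force inverse search; real code uses a table or the hardware. */
static uint8_t sbox(uint8_t x) {
    uint8_t inv = 0;
    for (int c = 1; c < 256 && x != 0; c++) {
        if (gf_mul(x, (uint8_t)c) == 1) { inv = (uint8_t)c; break; }
    }
    return (uint8_t)(inv ^ rotl8(inv, 1) ^ rotl8(inv, 2) ^
                     rotl8(inv, 3) ^ rotl8(inv, 4) ^ 0x63);
}

/* SubBytes + ShiftRows on a column-major state (index = row + 4*col). */
static void sub_shift(uint8_t s[16]) {
    uint8_t t[16];
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            t[r + 4 * c] = sbox(s[r + 4 * ((c + r) % 4)]);
    memcpy(s, t, 16);
}

static void mix_columns(uint8_t s[16]) {
    for (int c = 0; c < 4; c++) {
        uint8_t *p = s + 4 * c;
        uint8_t a0 = p[0], a1 = p[1], a2 = p[2], a3 = p[3];
        p[0] = (uint8_t)(xtime(a0) ^ xtime(a1) ^ a1 ^ a2 ^ a3);
        p[1] = (uint8_t)(a0 ^ xtime(a1) ^ xtime(a2) ^ a2 ^ a3);
        p[2] = (uint8_t)(a0 ^ a1 ^ xtime(a2) ^ xtime(a3) ^ a3);
        p[3] = (uint8_t)(xtime(a0) ^ a0 ^ a1 ^ a2 ^ xtime(a3));
    }
}

static void xor16(uint8_t s[16], const uint8_t k[16]) {
    for (int i = 0; i < 16; i++) s[i] ^= k[i];
}

/* ARM-style split: key XOR first, MixColumns as a separate step. */
static void model_aese(uint8_t s[16], const uint8_t k[16]) { xor16(s, k); sub_shift(s); }
static void model_aesmc(uint8_t s[16]) { mix_columns(s); }

/* x86-style full round: key XOR last. */
static void model_aesenc(uint8_t s[16], const uint8_t k[16]) {
    sub_shift(s);
    mix_columns(s);
    xor16(s, k);
}

/* Returns 1 if aesenc(s, k) == aesmc(aese(s, 0)) ^ k on a sample input. */
static int rounds_agree(void) {
    uint8_t a[16], b[16], key[16], zero[16] = {0};
    for (int i = 0; i < 16; i++) {
        a[i] = b[i] = (uint8_t)(17 * i + 3);
        key[i] = (uint8_t)(0xa5 ^ i);
    }
    model_aesenc(a, key);
    model_aese(b, zero);
    model_aesmc(b);
    xor16(b, key);
    return memcmp(a, b, 16) == 0;
}
```

So the two ISAs compute the same round and differ only in where the key XOR sits, which is why one x86 round lines up with an aese + aesmc pair plus a key offset of one round.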
>>
File: 1763671757.png (1.69 MB, 1664x928)
>>107542032
AES is so inefficient (and difficult to implement securely) that hardware vendors are forced to implement it at the hardware level.
Chacha20 and XChacha are better ciphers.
>>
>>107542080
>Requiring keys in registers gives compilers/assemblers more control
>taking away choices gives greater control
GPT isn't sending its best today.
>>107540958
On Zen 5, vaesenc indeed has a latency of 4 cycles but it has a throughput of 2 instructions per cycle. That's the equivalent of 8 pairs of arm's aese+aesmc every cycle, which no modern arm processor is even close to.
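
The arithmetic behind that "8 pairs" figure, as a small C sketch. The inputs (512-bit vaesenc, 4-cycle latency, 2 per cycle) are the numbers from this post; the function names are made up:

```c
#include <assert.h>

/* 128-bit AES blocks processed per cycle at peak issue rate. */
static int aes_blocks_per_cycle(int vector_bits, int instrs_per_cycle) {
    return (vector_bits / 128) * instrs_per_cycle;
}

/* Independent instructions that must be in flight to hide the latency
 * (latency times issue rate). */
static int instrs_in_flight(int latency_cycles, int instrs_per_cycle) {
    return latency_cycles * instrs_per_cycle;
}

static int zen5_example(void) {
    int blocks = aes_blocks_per_cycle(512, 2);  /* 4 blocks x 2 per cycle */
    int flight = instrs_in_flight(4, 2);        /* 8 independent vaesenc */
    assert(blocks == 8 && flight == 8);
    return blocks;
}
```

Each aese+aesmc pair handles one 128-bit block, so matching 8 blocks per cycle would take 8 pairs per cycle. Hitting the peak also needs enough independent blocks to keep 8 instructions in flight, which is easy in CTR or GCM and impossible in CBC encryption.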
>>
>>107542080
>gives compilers/assemblers more control
Why the fuck should they have more control? Are compiler engineers more qualified than the CPU engineers? Fuck no.
>>
>>107542157
It's not a matter of qualification but knowledge: a compiler knows more about the program than the cpu, so it can assist in instruction scheduling.
>>
>>107542182
and a CPU knows more about what execution units are available than the compiler ever will. (This is why Itanium failed, btw).
>>
File: Not RISC.png (71 KB, 1399x520)
>>107542076
Don't forget the atomic read-modify-write instructions.
>>
>>107542196
Sure, the cpu is free to dispatch or reorder instructions itself, hopefully the compiler writers are just following the manufacturer's docs so the cpu has more opportunities to make those decisions
>>
>>107539017
We need a good C processor like the Vaxen had. CISC is dead and that's a shame.

Anyway your modern Intel machine does this too. Deep inside it's RISC with a big fat instruction decoder on the front now. Has been since like, gee, the Pentium Pro?

>>107541042
I think GCC has dropped everything pre-'030.
>>
>>107542439
>We need a good C processor like the Vaxen had. CISC is dead and that's a shame.
The VAX computers were designed for FORTRAN, COBOL, and PL/I. C had nothing to do with it. The string instructions use a 16-bit length, and there is nothing for null-terminated strings. It has an instruction for the traditional FORTRAN and ALGOL style for loop called ACB (add compare branch). The decimal data types are for COBOL and PL/I. The POLY and EDITPC instructions come from the PL/I language.
>>
>>107542439
>I think GCC has dropped everything pre-'030.

>https://gcc.gnu.org/onlinedocs/gcc/M680x0-Options.html
>-mc68000
> Generate output for a 68000. This is the default when the compiler is configured for 68000-based systems. It is equivalent to -march=68000.

I wouldn't be surprised if GCC's docs are out of date, but from what I can tell, GCC still supports even the original 68k.
>>
>>107542508
>The VAX computers were designed for FORTRAN, COBOL, and PL/I. C had nothing to do with it.
Wrong. PDP computers, correct. VAXen were made specifically with C in mind.

>>107542514
GCC is in flux at all times but this is good to know. Maybe it's still used in embedded? They have been dropping old archs like crazy in the past few years though.
>>
>>107542551
>Wrong. PDP computers, correct. VAXen were made specifically with C in mind.
Read the rest of my post. The VAX has NOTHING to do with C. Read one of the manuals and it has examples in PL/I, FORTRAN, and COBOL, and I learned the PL/I language and the POLY and EDITPC instructions come directly from PL/I.
>>
>>107542439
This whole "x86 is RISC internally" thing is totally backwards. You say that it became RISC internally with the Pentium Pro, but out of order execution pipelines are a totally different kind of thing. If anything, the 8086 should be considered internally RISC: its micro-code is a full-on RISC ISA with a program counter and everything.
>>
>>107541172
It doesn't have better performance per watt. The Apple M4 Max uses over 50 watts of power when doing anything intensive, while at the same time delivering effectively less performance per core than an Intel or AMD chip. The performance from ARM always came from parallelization: they cram in as many tiny cores as they can and just run every process on its own core.
And keep in mind that the Apple Mx chips use dedicated accelerators for almost any task they can cram one in for, so the real performance of these chips is actually much lower outside the synthetic benchmarks they can heavily optimize for.
>>
>>107542626
>The VAX has NOTHING to do with C.
No that's not true. It was the ULTIMATE CISC CPU series. They tried to map functions from the stdlib of C to an instruction in the CPU 1:1. There is a good oral history about it floating around.

>>107542749
>but out of order execution pipelines are a totally different kind of thing
Yes but they fused it all in PPro, which is the arch still used today with 64-bit extensions. They failed with P4 and Itanic.
>>
>>107540624
That's literally vibe coding
>>
>>107539017
Yes that's kind of the point actually.
This >>107541318
>>
>>107542824
>They tried to map functions from the stdlib of C to an instruction in the CPU 1:1.
It was not made for C, it was made for other languages which I have already explained. Most of the complex instructions can't be used for C because they were designed for COBOL, FORTRAN, BASIC, and PL/I which use counted strings. C doesn't even have a packed decimal number type, which is a main feature of the VAX, and important in COBOL, PL/I, SQL and other languages used in business. It has the INDEX instruction which is designed for arrays with arbitrary lower bounds like PL/I, BASIC, Pascal, Algol, and Fortran (and Ada but it wasn't around yet when the VAX was invented) and also does bounds checking. The manual actually gives examples and justifications. You keep ignoring the rest of my posts.

https://bitsavers.org/pdf/dec/vax/archSpec/VAX-11_System_Reference_Manual_Rev5_Feb79.pdf
>ACB efficiently implements the general FOR or DO loops in high level languages since the sense of the comparison between index and limit is dependent on the sign of the addend.
That is not how loops work in C though. It's exactly how they work in Algol 60 and 68, PL/I, BASIC, and FORTRAN.
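
A C model of what that quote describes, as I read it: add the addend to the index, then branch back if index <= limit for a non-negative addend or index >= limit for a negative one. This is a sketch of the semantics with an invented function name, not DEC's definition verbatim, and it assumes a nonzero addend and no overflow:

```c
#include <assert.h>

/* Number of times the loop body runs when ACB closes the loop. */
static int acb_trip_count(int start, int limit, int addend) {
    int index = start;
    int trips = 0;
    do {
        trips++;           /* loop body would go here */
        index += addend;   /* ACB: add ... */
        /* ... compare, with the sense chosen by the sign of the addend,
         * and branch back to the top of the loop if still in range. */
    } while (addend >= 0 ? index <= limit : index >= limit);
    return trips;
}

static int acb_demo(void) {
    int up = acb_trip_count(1, 10, 3);     /* index 1, 4, 7, 10 */
    int down = acb_trip_count(10, 1, -3);  /* index 10, 7, 4, 1 */
    assert(up == 4 && down == 4);
    return up + down;
}
```

Note the test is at the bottom and the comparison direction comes from the step, like `DO I = 1, 10, 3`. A C `for` takes an arbitrary condition expression and tests it before the first iteration, so there is nothing for ACB to map onto directly.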

>The standard must be applicable to all of the inter-module CALLable interfaces in the VAX-11 software system. Specifically, the standard must consider the requirements of BASIC, COBOL, FORTRAN, BLISS, MARS and CALLs to the operating system. Thus:
>1. The standard must support all of the calling capabilities needed for the higher-level languages which DEC now supports (BASIC, COBOL, FORTRAN).
>2. The needs of other languages which DEC may support in the future must be noted (PL/1, Algol, APL).
>3. It must be possible to write calling and called procedures in BLISS and MARS which conform to the standard.

The manual doesn't even mention C. C programmers seem to have the same memory bugs as C programs.


