/g/ - >turning one instruction into twelve So this is th - Technology


File: aes.png (100 KB, 1536x921)

Anonymous 12/13/25(Sat)13:10:27 No.107539017

>turning one instruction into twelve
So this is the power of RISC

Anonymous 12/13/25(Sat)13:12:53 No.107539051

>>107539017
The reason RISC-V is so slow right now is that it is lacking good branch prediction. In comparison, the lack of compound instructions is a minor performance hit.

Anonymous 12/13/25(Sat)13:24:13 No.107539175

File: riscv.png (78 KB, 1190x696)

>>107539051
That's AArch64 asm in the screenshot.
RISC-V is even worse since the AES instructions don't use the vector registers at all.

Anonymous 12/13/25(Sat)15:42:39 No.107540337

>>107539017
What do you think "reduced instruction set" means
Would you prefer to have six gorillion obscure instructions like amd64

Anonymous 12/13/25(Sat)15:47:20 No.107540375

>>107539017
now show the actual circuits and power required to process said 'single instruction' vs the arm one

Anonymous 12/13/25(Sat)15:54:29 No.107540419

Not a single RISC architecture has a CAS instruction except RISC-V which only has it as an extra nobody will implement. The day I learned that, I immediately grew out of the RISC meme.

Anonymous 12/13/25(Sat)15:54:50 No.107540424

>>107539017
Kind of a moot point, cause all modern x86_64 processors implement a smaller RISC-like language for implementing the big instructions called micro-code. They define the individual atomic operations used for operations, because past a certain point it became infeasible to maintain all of x86’s op-codes in pure silicon, and microcode also means you can fix bugs

Anonymous 12/13/25(Sat)15:57:27 No.107540440

>>107540419
You’re wrong, ARM has one: https://developer.arm.com/documentation/ddi0602/2025-09/Base-Instructions/CAS--CASA--CASAL--CASL--Compare-and-swap-word-or-doubleword-in-memory-
RISC-V also has a standard extension that implements it, but it’s not part of the base standard

Anonymous 12/13/25(Sat)15:59:50 No.107540459

>>107539017
There is nothing even remotely "reduced" about modern ARM. Also, ARM does not have 512-bit registers, so obviously it would need multiple instructions. Nothing whatsoever to do with being "RISC".

Anonymous 12/13/25(Sat)16:10:06 No.107540559

File: x86_vs_aarch64.png (332 KB, 1368x1024)

>>107540337
>Would you prefer to have six gorillion obscure instructions like amd64
lol
lmao
kek, even

Anonymous 12/13/25(Sat)16:15:57 No.107540604

>>107540559
My understanding is that the RISC model mostly focuses on making sure an instruction does one thing. This means that instructions do not handle storing to memory, or fetching from memory. You need to do this yourself. So every instruction is preceded by loads and succeeded by stores. CISC architectures on the other hand have more complex instruction encodings, that mean that any given instruction can:
- Read from a register, write to a static address
- Read from a register, write to an address in another register
- Read from a register, write to a register

And so on. This encoding is a notable factor in the complexity of x86, because of just how many ways these can be combined. Doing it like this makes it easier for human devs, because it’s less verbose and easier to work with, which is why x86 won out I think, cause at the time a lot more people were writing directly in assembly.

Anonymous 12/13/25(Sat)16:19:02 No.107540624

>>107539051
>it is lacking good branch prediction

Rather than guessing the next instruction, the CPU should just guess the final output. We can call it "predictive computing". You don't even need to write a program, just a vague statement of what you're kinda looking for.

Anonymous 12/13/25(Sat)16:20:00 No.107540629

>>107540624
Maybe some fags will make AI do it

Anonymous 12/13/25(Sat)16:27:28 No.107540687

>>107540604
>an instruction does one thing

Yes, it works like traditional CPUs used to work. REDUCED Instruction Set Computing...i.e. the total number of registers is deliberately limited. This gives you granular control over program execution but requires hand-optimization of code. It CAN be better, but it won't be if you're using jeetcoders.

CISC treats registers as more of an API, where a call to a register may result in the computer performing numerous additional steps not specified in the program. Such as CMPXCHG and XADD. The idea being that you can improve performance by having commonly used operations baked into the hardware rather than having to repeat them via software step-by-step each time.

Anonymous 12/13/25(Sat)16:29:09 No.107540704

>>107540629
And make it slower than a 68k whilst requiring Guatemala's total power output to compute a single SHA512 hash.

Anonymous 12/13/25(Sat)16:34:09 No.107540738

>>107540440
Compilers still rely on LL/SC retardation.
https://godbolt.org/z/1rGr1fMjr

Anonymous 12/13/25(Sat)16:35:45 No.107540747

>>107540704
Yep, and they'll boast about it too and hype retarded investors up with it.

Anonymous 12/13/25(Sat)16:45:34 No.107540824

>>107539017
>turning one instruction into twelve
you're describing Intel's microcode

Anonymous 12/13/25(Sat)16:52:53 No.107540870

>>107540824
Except the microcode doesn't unroll into every nook and cranny of the platform, bloating every executable N-fold.

Anonymous 12/13/25(Sat)16:56:06 No.107540894

>>107540738
armv7 doesn’t appear to have it. Changing the compiler to ARM64 GCC makes it use the casal instruction: https://godbolt.org/z/resrT3c8h

Anonymous 12/13/25(Sat)16:57:50 No.107540910

>>107540704
>than a 68k
zoomer spotted

Anonymous 12/13/25(Sat)17:03:07 No.107540958

>>107539017
From what I see, vaesenc has a latency of 4 to 5 while aese could be run concurrently on separate ALUs.

Anonymous 12/13/25(Sat)17:12:39 No.107541042

>>107540910
>>than a 68k
>zoomer spotted
I specifically used a 68k because it is widely considered to have the cleanest microcode whilst still being fairly performant. This simplicity would make it a likely target for any ML training, since it is vastly simpler than x86 and thus would have fewer output errors, and it is even still supported by gcc. Aside from that, an AI model made to simulate a CPU and avoid branching is unlikely to run much faster than that; I would be shocked if it even reached Pentium speeds.

Anonymous 12/13/25(Sat)17:20:30 No.107541113

>>107541042
but it has both microcode and nanocode

Anonymous 12/13/25(Sat)17:25:23 No.107541160

>>107541113
I didn't say it was clean--just cleanest (esp. vs amd64) while still having decades of tooling and modern support. RISCV is still shitty and fragmented without as much support.

Anonymous 12/13/25(Sat)17:27:05 No.107541172

File: office handshake.jpg (89 KB, 1351x1232)

>>107540559
>Almost 50% more instructions
>Almost three times longer manual
>Still has better performance/watt and battery life than anything else on the market

Anonymous 12/13/25(Sat)17:29:09 No.107541195

>>107541172
>performance/watt
Not really, when you consider work done per unit of power x86 shits on everything.

Anonymous 12/13/25(Sat)17:37:12 No.107541280

File: 1739821190020499.jpg (891 KB, 1514x1912)

>>107539017
>le import solution architecture

(●__● ) NPC 12/13/25(Sat)17:41:01 No.107541318

File: 1739356138889895.jpg (172 KB, 900x598)

>>107540687
risc lowers the complexity at the circuitry level but raises it at the compiler level. Modern "risc" cpus also have a lot of instructions or are surrounded by a bunch of coprocessors. Smartphones, for example, have co-processors for photography, video recording, video decoding, audio processing, AI, rendering (gpu), encryption, ... Instead of having one large instruction set, you have multiple ones to handle. From a user (programmer) perspective: cisc >>> risc.

Anonymous 12/13/25(Sat)17:51:02 No.107541414

>>107541318
>user (programmer)
user (user) as well, x86 is compatible with everything, arm is not.

Anonymous 12/13/25(Sat)17:53:22 No.107541429

File: ChatGPT Image Nov 20, 202(...).png (2.75 MB, 1536x1024)

>>107539017
>ahhhhh aes
KYS

Anonymous 12/13/25(Sat)17:54:02 No.107541435

>>107541429
No.

Anonymous 12/13/25(Sat)17:59:40 No.107541475

https://www.youtube.com/watch?v=vJP_oKN4Ez0

Anonymous 12/13/25(Sat)18:10:22 No.107541572

>>107540424
>Kind of a moot point, cause all modern x86_64 processors implement a smaller RISC-like language for implementing the big instructions called micro-code.
So did the 8086.
https://www.righto.com/2022/11/how-8086-processors-microcode-engine.html

>>107540687
>Yes, it works like traditional CPUs used to work. REDUCED Instruction Set Computing
It's the other way around. Traditional CPUs are CISC and have microcode.

Anonymous 12/13/25(Sat)18:35:58 No.107541809

>>107540629
Yes what we really need is for CPUs to hallucinate

Anonymous 12/13/25(Sat)18:57:42 No.107541986

>>107541809
Marketing execs: "But AI doesn't hallucinate! It gives the correct answer that nobody else can!"

Anonymous 12/13/25(Sat)19:01:12 No.107542008

>>107541318
RISC is a myth, it basically does not exist in practice. There is nothing even remotely reduced about a modern ARM core.

Anonymous 12/13/25(Sat)19:02:51 No.107542024

>>107542008
MIPS was

Anonymous 12/13/25(Sat)19:03:21 No.107542032

>>107541429
elaborate, retarded gorilla nigger

Anonymous 12/13/25(Sat)19:07:54 No.107542076

File: cash.png (277 KB, 1065x1431)

>>107540687
>CISC treats registers as more of an API, where a call to a register may result in the computer performing numerous additional steps not specified in the program. Such as CMPXCHG and XADD.
Who's gonna tell him?
>inb4 aarch64 isn't risc

Anonymous 12/13/25(Sat)19:08:11 No.107542080

File: 1762367486198568.jpg (36 KB, 300x417)

>>107539017
Unlike x86's aesenc/vaesenc (which take state and a round key that can be a memory operand or register), ARM's aese always performs an initial AddRoundKey (XOR) with a register-held key, followed by the other steps. There's no variant that loads the key directly from memory within the crypto instruction itself. This design integrates well with ARM's NEON SIMD vectors and allows pairing aese + aesmc for high-throughput pipelining on many cores.

Requiring keys in registers gives compilers/assemblers more control over scheduling, prefetching, and parallel block processing (common in modes like CBC or GCM). It also aligns with ARM's emphasis on vector processing for crypto. While it adds explicit load instructions (as seen in the image's address calculation and ldp), this overhead is minor compared to the speedup from hardware-accelerated rounds, especially when processing multiple blocks or using pre-loaded key schedules.
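
The difference in step ordering between the two instruction families can be sketched in software. This is a minimal model, assuming the usual textbook definitions (aesenc as ShiftRows, SubBytes, MixColumns, then XOR with the round key; aese as XOR with the key, then ShiftRows and SubBytes; aesmc as MixColumns alone). The function names are made up for illustration, and the sketch covers middle rounds only, not the final round or key expansion.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo the AES polynomial 0x11B."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def _build_sbox():
    """Build the AES S-box: multiplicative inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0 if x == 0 else next(y for y in range(1, 256) if gf_mul(x, y) == 1)
        s = inv
        for shift in range(1, 5):
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = _build_sbox()


def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def sub_bytes(state):
    return bytes(SBOX[b] for b in state)


def shift_rows(state):
    # Column-major state: the byte at row r, column c lives at index 4*c + r.
    return bytes(state[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4))


def mix_columns(state):
    out = bytearray(16)
    for c in range(4):
        a0, a1, a2, a3 = state[4 * c:4 * c + 4]
        out[4 * c + 0] = gf_mul(a0, 2) ^ gf_mul(a1, 3) ^ a2 ^ a3
        out[4 * c + 1] = a0 ^ gf_mul(a1, 2) ^ gf_mul(a2, 3) ^ a3
        out[4 * c + 2] = a0 ^ a1 ^ gf_mul(a2, 2) ^ gf_mul(a3, 3)
        out[4 * c + 3] = gf_mul(a0, 3) ^ a1 ^ a2 ^ gf_mul(a3, 2)
    return bytes(out)


def x86_aesenc(state, round_key):
    """Model of one x86-style middle round: the key is XORed in last."""
    return xor_bytes(mix_columns(sub_bytes(shift_rows(state))), round_key)


def arm_aese(state, round_key):
    """Model of ARM-style aese: the key is XORed in first, no MixColumns."""
    return sub_bytes(shift_rows(xor_bytes(state, round_key)))


def arm_aesmc(state):
    """Model of ARM-style aesmc: MixColumns only."""
    return mix_columns(state)
```

Under these definitions, `x86_aesenc(s, k)` equals `arm_aesmc(arm_aese(s, zero_key))` XORed with `k`, so a chain of aesenc rounds and a chain of aese + aesmc pairs compute the same thing with the round keys shifted by one position.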

Anonymous 12/13/25(Sat)19:11:05 No.107542107

File: 1763671757.png (1.69 MB, 1664x928)

>>107542032
AES is so inefficient (and difficult to implement securely) that hardware vendors are forced to implement it at the hardware level.
Chacha20 and XChacha are better ciphers.

Anonymous 12/13/25(Sat)19:12:54 No.107542125

>>107542080
>Requiring keys in registers gives compilers/assemblers more control
>taking away choices gives greater control
GPT isn't sending its best today.
>>107540958
On Zen 5, vaesenc indeed has a latency of 4 cycles but it has a throughput of 2 instructions per cycle. That's the equivalent of 8 pairs of arm's aese+aesmc every cycle, which no modern arm processor is even close to.
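
The "8 pairs" figure follows from simple arithmetic, sketched below. The throughput number is the one claimed in the post, not a measurement, and the helper name is made up for illustration. It assumes one aese + aesmc pair performs one round on one 128-bit block.

```python
AES_BLOCK_BITS = 128  # AES always operates on 128-bit blocks


def aes_rounds_per_cycle(vector_bits, instructions_per_cycle):
    """AES rounds completed per cycle by a vector AES-round instruction."""
    blocks_per_instruction = vector_bits // AES_BLOCK_BITS
    return blocks_per_instruction * instructions_per_cycle


# 512-bit vaesenc at the 2-per-cycle throughput quoted in the post.
print(aes_rounds_per_cycle(512, 2))
```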

Anonymous 12/13/25(Sat)19:16:30 No.107542157

>>107542080
>gives compilers/assemblers more control
Why the fuck should they have more control? Are compiler engineers more qualified than the CPU engineers? Fuck no.

Anonymous 12/13/25(Sat)19:19:23 No.107542182

>>107542157
It's not a matter of qualification but knowledge: a compiler knows more about the program than the cpu does, so it can assist in instruction scheduling

Anonymous 12/13/25(Sat)19:21:43 No.107542196

>>107542182
and a CPU knows more about what execution units are available than the compiler ever will. (This is why Itanium failed, btw).

Anonymous 12/13/25(Sat)19:22:59 No.107542207

File: Not RISC.png (71 KB, 1399x520)

>>107542076
Don't forget the atomic read-modify-write instructions.

Anonymous 12/13/25(Sat)19:29:00 No.107542253

>>107542196
Sure, the cpu is free to dispatch or reorder instructions itself. Hopefully the compiler writers are just following the manufacturer's docs so the cpu has more opportunities to make those decisions

Anonymous 12/13/25(Sat)19:53:58 No.107542439

>>107539017
We need a good C processor like the Vaxen had. CISC is dead and that's a shame.

Anyway your modern Intel machine does this too. Deep inside, it's RISC with a big fat instruction decoder on the front now. Has been since like, gee, the Pentium Pro?

>>107541042
I think GCC has dropped everything pre-'030.

Anonymous 12/13/25(Sat)20:03:11 No.107542508

>>107542439
>We need a good C processor like the Vaxen had. CISC is dead and that's a shame.
The VAX computers were designed for FORTRAN, COBOL, and PL/I. C had nothing to do with it. The string instructions use a 16-bit length, and there is nothing for null-terminated strings. It has an instruction for the traditional FORTRAN and ALGOL style for loop called ACB (add compare branch). The decimal data types are for COBOL and PL/I. The POLY and EDITPC instructions come from the PL/I language.

Anonymous 12/13/25(Sat)20:04:10 No.107542514

>>107542439
>I think GCC has dropped everything pre-'030.

>https://gcc.gnu.org/onlinedocs/gcc/M680x0-Options.html
>-mc68000
> Generate output for a 68000. This is the default when the compiler is configured for 68000-based systems. It is equivalent to -march=68000.

I wouldn't be surprised if GCC's docs are out of date, but from what I can tell, GCC still supports even the original 68k.

Anonymous 12/13/25(Sat)20:09:23 No.107542551

>>107542508
>The VAX computers were designed for FORTRAN, COBOL, and PL/I. C had nothing to do with it.
Wrong. PDP computers, correct. VAXen were made specifically with C in mind.

>>107542514
GCC is in flux at all times but this is good to know. Maybe it's still used in embedded? They have been dropping old archs like crazy in the past few years though.

Anonymous 12/13/25(Sat)20:18:53 No.107542626

>>107542551
>Wrong. PDP computers, correct. VAXen were made specifically with C in mind.
Read the rest of my post. The VAX has NOTHING to do with C. Read one of the manuals: it has examples in PL/I, FORTRAN, and COBOL. I learned the PL/I language, and the POLY and EDITPC instructions come directly from PL/I.

Anonymous 12/13/25(Sat)20:33:26 No.107542749

>>107542439
This whole x86 is RISC internally thing is totally backwards. You say that it became RISC internally with the Pentium pro, but out of order execution pipelines are a totally different kind of thing. If anything, the 8086 should be considered internally RISC, its micro-code is a full on RISC ISA with a program counter and everything.

Anonymous 12/13/25(Sat)20:40:23 No.107542797

>>107541172
It doesn't have better performance per watt. The Apple M4 Max uses over 50 watts of power when doing anything intensive, while delivering effectively less performance per core than an Intel or AMD chip. The performance of ARM has always come from parallelization: they cram in as many tiny cores as they can and just run every process on its own core.
And keep in mind that the Apple MX chips use basically dedicated accelerators for almost any task they can cram one in for, so the real performance of these chips is actually much lower outside the synthetic benchmarks they can heavily optimize for.

Anonymous 12/13/25(Sat)20:44:13 No.107542824

>>107542626
>The VAX has NOTHING to do with C.
No that's not true. It was the ULTIMATE CISC CPU series. They tried to map functions from the stdlib of C to an instruction in the CPU 1:1. There is a good oral history about it floating around.

>>107542749
>but out of order execution pipelines are a totally different kind of thing
Yes but they fused it all in PPro, which is the arch still used today with 64-bit extensions. They failed with P4 and Itanic.

Anonymous 12/13/25(Sat)21:04:30 No.107542983

>>107540624
That's literally vibe coding

Anonymous 12/13/25(Sat)21:06:26 No.107542998

>>107539017
Yes that's kind of the point actually.
This >>107541318

Anonymous 12/13/25(Sat)21:36:48 No.107543209

>>107542824
>They tried to map functions from the stdlib of C to an instruction in the CPU 1:1.
It was not made for C, it was made for other languages which I have already explained. Most of the complex instructions can't be used for C because they were designed for COBOL, FORTRAN, BASIC, and PL/I which use counted strings. C doesn't even have a packed decimal number type, which is a main feature of the VAX, and important in COBOL, PL/I, SQL and other languages used in business. It has the INDEX instruction which is designed for arrays with arbitrary lower bounds like PL/I, BASIC, Pascal, Algol, and Fortran (and Ada but it wasn't around yet when the VAX was invented) and also does bounds checking. The manual actually gives examples and justifications. You keep ignoring the rest of my posts.
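
As a rough illustration of what an INDEX-style instruction does for arrays with arbitrary lower bounds, here is a small software model. It assumes the commonly documented semantics (range-check the subscript against the low and high bounds, then compute (indexin + subscript) * size). The function name is hypothetical, and a Python exception stands in for the hardware subscript-range trap.

```python
def vax_index(subscript, low, high, size, index_in):
    """Model of an INDEX-style step: bounds check, then scale and accumulate."""
    if subscript < low or subscript > high:
        # Stand-in for the subscript-range trap raised by the hardware.
        raise IndexError(f"subscript {subscript} outside {low}..{high}")
    return (index_in + subscript) * size


# Two-dimensional array declared as A(1:3, 0:4) with 8-byte elements,
# stored row by row. The first step scales by the row length (5 elements),
# the second step adds the column and scales by the element size.
row_part = vax_index(2, 1, 3, 5, 0)
byte_offset = vax_index(3, 0, 4, 8, row_part)
print(byte_offset)
```

The lower bounds never appear in the arithmetic itself; a compiler would fold them into a constant adjustment of the array's base address.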

https://bitsavers.org/pdf/dec/vax/archSpec/VAX-11_System_Reference_Manual_Rev5_Feb79.pdf
>ACB efficiently implements the general FOR or DO loops in high level languages since the sense of the comparison between index and limit is dependent on the sign of the addend.
That is not how loops work in C though. It's exactly how they work in Algol 60 and 68, PL/I, BASIC, and FORTRAN.
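
The quoted behaviour can be modelled in a few lines. This sketch assumes ACB adds the addend to the index and then branches back while the index is still within the limit, with the direction of the comparison chosen by the sign of the addend. The function name is made up, and the zero-step guard is an addition of this sketch to keep the simulation from looping forever.

```python
def acb_loop(start, limit, step):
    """Return the index values visited by a bottom-tested ACB-style loop."""
    if step == 0:
        raise ValueError("step must be non-zero in this simulation")
    visited = []
    index = start
    while True:
        visited.append(index)  # the loop body runs before the test
        index += step          # ACB: add the addend to the index
        if step > 0:
            branch_back = index <= limit  # counting up
        else:
            branch_back = index >= limit  # counting down
        if not branch_back:
            break
    return visited


print(acb_loop(1, 10, 3))
print(acb_loop(10, 1, -4))
```

Because the test sits at the bottom, the body always runs at least once, as in a classic FORTRAN DO loop. A C for loop tests an arbitrary condition at the top instead.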

>The standard must be applicable to all of the inter-module CALLable interfaces in the VAX-11 software system. Specifically, the standard must consider the requirements of BASIC, COBOL, FORTRAN, BLISS, MARS and CALLs to the operating system. Thus:
>1. The standard must support all of the calling capabilities needed for the higher-level languages which DEC now supports (BASIC, COBOL, FORTRAN).
>2. The needs of other languages which DEC may support in the future must be noted (PL/1, Algol, APL).
>3. It must be possible to write calling and called procedures in BLISS and MARS which conform to the standard.

The manual doesn't even mention C. C programmers seem to have the same memory bugs as C programs.
