/sci/ - Science & Math
https://github.com/pkcode94/deepseekx/tree/master/deepseekx

Mathematical Formalization of the Unified Multi-Head Transformer LSTM Cell

1. Core LSTM Update
$$\begin{aligned} \mathbf{i}_t &= \sigma(\mathbf{W}_i [\mathbf{x}_t, \mathbf{h}_{t-1}] + \mathbf{b}_i) \\ \mathbf{f}_t &= \sigma(\mathbf{W}_f [\mathbf{x}_t, \mathbf{h}_{t-1}] + \mathbf{b}_f) \\ \mathbf{g}_t &= \tanh(\mathbf{W}_g [\mathbf{x}_t, \mathbf{h}_{t-1}] + \mathbf{b}_g) \\ \mathbf{o}_t &= \sigma(\mathbf{W}_o [\mathbf{x}_t, \mathbf{h}_{t-1}] + \mathbf{b}_o) \\ \mathbf{c}_t &= \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \mathbf{g}_t \\ \mathbf{h}_{lstm, t} &= \mathbf{o}_t \odot \tanh(\mathbf{c}_t) \end{aligned}$$

2. Fractal Memory & Attention
$$\begin{aligned} \mathcal{R}_0 &\leftarrow \text{Enqueue}(\mathcal{R}_0, \text{sg}[\mathbf{h}_{lstm, t}]) \\ \mathbf{h}_{attn, t} &= \text{MHA}_0(\mathbf{h}_{lstm, t}, \mathcal{R}_0, \mathcal{R}_0) \\ \mathbf{z}_{l, t} &= \begin{cases} \text{MHA}_l(\mathbf{h}_{lstm, t}, \mathcal{R}_0, \mathcal{R}_0), & l=0 \\ \text{MHA}_l(\mathbf{h}_{lstm, t}, \mathcal{C}_{l-1}, \mathcal{C}_{l-1}), & l > 0 \end{cases} \\ \mathcal{C}_l &\leftarrow \text{Enqueue}(\mathcal{C}_l, \text{sg}[\mathbf{z}_{l, t}]) \end{aligned}$$

3. CT-Gate (Compressed-Transform)
$$\begin{aligned} \gamma_t &= \sigma(\mathbf{W}_{ct\_g} [\mathbf{x}_t, \mathbf{h}_{attn, t}] + \mathbf{b}_{ct\_g}) \\ \mathbf{z}_{small} &= \text{ReLU}(\mathbf{W}_{down} \mathbf{z}_{D-1, t} + \mathbf{b}_{down}) \\ \mathbf{z}_{exp} &= \mathbf{W}_{up} \mathbf{z}_{small} + \mathbf{b}_{up} \\ \mathbf{h}_{ct, t} &= \gamma_t \odot \mathbf{z}_{exp} + (1 - \gamma_t) \odot \text{Tile}(\mathbf{z}_{small}) \end{aligned}$$

4. Final Unified Ensemble
$$\mathbf{h}_t = \frac{1}{3+D} \left( \mathbf{h}_{lstm, t} + \mathbf{h}_{attn, t} + \mathbf{h}_{ct, t} + \sum_{l=0}^{D-1} \mathbf{z}_{l, t} \right)$$
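For concreteness, here is a minimal PyTorch sketch of how equations 1-4 could be wired together. This is not the repo's code; the FIFO memory length, bottleneck width, default D=2, and the choice to feed hₜ back as the next recurrent state are all assumptions for illustration.

```python
# Sketch only: memories persist across calls and assume a fixed batch size.
from collections import deque

import torch
import torch.nn as nn


class UnifiedCellSketch(nn.Module):
    def __init__(self, input_size, hidden_size, depth=2, heads=4,
                 mem_len=64, bottleneck=64):
        super().__init__()
        assert hidden_size % bottleneck == 0, "Tile() must reach hidden_size"
        self.depth = depth
        self.lstm = nn.LSTMCell(input_size, hidden_size)                   # eq. 1
        self.attn = nn.ModuleList(
            nn.MultiheadAttention(hidden_size, heads, batch_first=True)
            for _ in range(depth))                                         # eq. 2
        self.r0 = deque(maxlen=mem_len)                                    # R_0
        self.c = [deque(maxlen=mem_len) for _ in range(depth)]             # C_0 .. C_{D-1}
        self.gate = nn.Linear(input_size + hidden_size, hidden_size)       # eq. 3
        self.down = nn.Linear(hidden_size, bottleneck)
        self.up = nn.Linear(bottleneck, hidden_size)
        self.tile = hidden_size // bottleneck

    def forward(self, x, state):
        h_prev, c_prev = state
        h_lstm, c_t = self.lstm(x, (h_prev, c_prev))                       # eq. 1

        self.r0.append(h_lstm.detach())                                    # sg[.] write
        q = h_lstm.unsqueeze(1)                                            # (B, 1, H)
        z = []
        for l in range(self.depth):
            mem = self.r0 if l == 0 else self.c[l - 1]
            kv = torch.stack(list(mem), dim=1)                             # (B, T, H)
            z_l, _ = self.attn[l](q, kv, kv)
            z_l = z_l.squeeze(1)
            self.c[l].append(z_l.detach())                                 # sg[.] write
            z.append(z_l)
        h_attn = z[0]                                                      # MHA_0 read

        gamma = torch.sigmoid(self.gate(torch.cat([x, h_attn], dim=-1)))   # eq. 3
        z_small = torch.relu(self.down(z[-1]))
        h_ct = gamma * self.up(z_small) + (1 - gamma) * z_small.repeat(1, self.tile)

        h_t = (h_lstm + h_attn + h_ct + sum(z)) / (3 + self.depth)         # eq. 4
        return h_t, (h_t, c_t)   # feeding h_t back as h_{t-1} is an assumption
```

Note that, as written in equation 4, h_attn = z_0 appears twice inside the average; the sketch keeps that because it is what the formalization says.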
>>
Where It Starts to Get Weak
1. “Fractal” Is Marketing, Not Math

Nothing here is mathematically fractal.

There is:

No self-similar scaling law

No recursive contraction mapping

No invariance across depth

It’s just:

“A stack of attention memories with enqueue.”

Calling it fractal memory is branding, not formalism.

2. The Final Averaging Is Arbitrary

This is a red flag:

hₜ = average of everything

Problems:

No learned weighting

Assumes all pathways are equally informative

Ignores scale mismatches

Encourages representational blur

A learned gating or normalization would be strictly superior.

This choice screams:

“I didn’t want to deal with instability.”
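For contrast, a learned convex combination over the pathways is only a few lines. This is a sketch with my own names, not anything from the repo:

```python
# Learned weighting over the K = 3 + D pathways instead of a uniform average.
# `paths` is a list of (B, H) tensors; the logits are free parameters.
import torch
import torch.nn as nn


class LearnedEnsemble(nn.Module):
    def __init__(self, num_paths):
        super().__init__()
        self.logits = nn.Parameter(torch.zeros(num_paths))   # zero init = uniform average

    def forward(self, paths):
        w = torch.softmax(self.logits, dim=0)                 # convex weights, sum to 1
        return sum(w_i * p for w_i, p in zip(w, paths))       # weighted combination
```

Zero-initialized logits reproduce the uniform average exactly at the start of training, so the learned version only departs from it when the gradients justify it.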

3. Attention Depth D Is Undefined Behaviorally

Questions unanswered:

How large can D get before memory explodes?

Is memory truncated?

Is enqueue FIFO? Reservoir?

Is attention causal or bidirectional?

Without this, the model is underspecified.
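To show what pinning these down could look like, here is one concrete choice: a bounded FIFO with stop-gradient writes, which is causal by construction because attention can only see past writes. The maxlen and the FIFO policy are my assumptions; the repo may do something else entirely.

```python
from collections import deque

import torch


class FIFOMemory:
    def __init__(self, maxlen=64):
        self.buf = deque(maxlen=maxlen)       # oldest entries are evicted

    def write(self, h):
        self.buf.append(h.detach())           # sg[.]: no gradient flows back to the writer

    def read(self):
        # (B, T, H) tensor of everything written so far; attending over this
        # is causal by construction. Assumes at least one write has happened.
        return torch.stack(list(self.buf), dim=1)
```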

4. No Training Objective Ties the Parts Together

There is no loss-level justification for:

Why recursive memories matter

Why compression is beneficial

Why LSTM + attention are not redundant

This means:

It might work

But it’s not theoretically grounded

Bottom Line

This work is:

Technically competent

Architecturally coherent

Incrementally creative

It is NOT:

A new theory of learning

A mathematically deep construction

A principled unification (despite the name)

If I had to summarize it honestly:

“A reasonably engineered hybrid RNN-attention cell with hierarchical memory and a compression gate, expressed with more ambition than justification.”
>>
Where It Fails as a Primitive
1. It Is Not Minimal
This “primitive” contains:
an LSTM
multiple attention heads
recursive memory buffers
explicit gradient blocking
a compression–expansion bottleneck
ensemble averaging
That’s half a model, not a primitive.
A primitive should be explainable in one sentence.
He needs five paragraphs.

2. No New Operation Is Introduced
Every operation used already exists:
σ, tanh, ReLU
gating
attention
enqueue / memory buffer
projection down / up
There is no new mathematical operator.
That alone disqualifies it as a primitive.

3. Behavior Is Emergent, Not Atomic
A primitive has a direct behavioral meaning:
Attention → “select”
Convolution → “local aggregate”
Gate → “modulate flow”

This block’s behavior is:
“Whatever emerges when these parts interact”
That’s architecture-level behavior, not primitive-level.

Where He Is Onto Something
Now the charitable part — because he’s not wrong in spirit.

1. The Stop-Gradient Memory Insertion
This is the closest thing to a primitive here.

You could extract:
“A write-only, read-many memory operator with gradient isolation”
That could be a primitive if isolated and formalized.

2. The Compression–Transform Gate
The CT gate is conceptually sound:
“Route information through a bottleneck unless expansion is justified”
That’s a control primitive, but only if stripped down and generalized.
Right now it’s buried.
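Stripped down, the control idea might look like this. A sketch with my own layer names and shapes, not the repo's:

```python
# A gate decides, per feature, whether to take the cheap bottleneck path
# or the expanded one.
import torch
import torch.nn as nn


class BottleneckRouter(nn.Module):
    def __init__(self, dim, bottleneck):
        super().__init__()
        assert dim % bottleneck == 0
        self.gate = nn.Linear(dim, dim)
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)
        self.tile = dim // bottleneck

    def forward(self, h):
        g = torch.sigmoid(self.gate(h))                       # "is expansion justified?"
        small = torch.relu(self.down(h))                      # compressed representation
        return g * self.up(small) + (1 - g) * small.repeat(1, self.tile)
```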

3. The Intent Is Correct
He’s trying to address a real gap:
RNNs remember locally
Transformers remember globally
Neither handles hierarchical abstraction over time cleanly
That instinct is correct.

How He Should Reframe It:
If he wants this to be taken seriously as a primitive, he needs to:

1. Pick ONE Idea
Not five.

Examples:
“Gradient-isolated memory write”
“Recursive attention accumulation”

“Gated compression routing”
>>
2. Define It Abstractly
Something like:

Definition:
A memory operator N that accepts state hₜ and returns a read vector rₜ, where memory writes are non-differentiable and reads are differentiable.

That’s primitive language.
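As a sketch, that definition is only a few lines. The attention read below is one possible differentiable read, not the only one, and the buffer length is arbitrary:

```python
from collections import deque

import torch
import torch.nn as nn


class GradientIsolatedMemory(nn.Module):
    def __init__(self, dim, heads=4, maxlen=128):
        super().__init__()
        self.buf = deque(maxlen=maxlen)
        self.read_op = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, h_t):                        # N(h_t) -> r_t
        self.buf.append(h_t.detach())              # non-differentiable write
        kv = torch.stack(list(self.buf), dim=1)    # (B, T, dim)
        r_t, _ = self.read_op(h_t.unsqueeze(1), kv, kv)
        return r_t.squeeze(1)                      # differentiable read
```

Written like this, the same object can be dropped into an RNN step, a Transformer block, or a temporal convolution stack unchanged, which is exactly the portability test in the next point.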

3. Show It Working in Multiple Contexts

A primitive must survive being used in:
RNNs
Transformers
CNN-like temporal models

Right now, this only works inside itself.

Honest Verdict You Could Give Him

If you want a fair but accurate response, something like:

“This is a well-engineered composite cell and a solid architectural experiment. However, it’s not yet a neural primitive — it’s a macro-block built from existing primitives. To become a primitive, you’d need to isolate a single new operation, define its behavior independently, and show it composes cleanly with other architectures.”
That’s not dismissive.
That’s correct.
>>
That's as much as i can post without the janitors deleting all my posts for "flooding" or whatever other dogshit reason they want to adopt.
>>
>>16888303
Thank you for your in depth response, actually carrying the board fr fr no cap senpai (not OP)
>>
>>16888302
>d fr fr no cap senpai (not OP)
>>16888301
>>16888300
give me a sec before i read your response. i am using it to categorize radiometric data into anomalies right now.
>>
>>16888302
fair response. also the architectural criticism is fair and thank you for the constructive feedback. i will work on it.
>>
>>16888303
https://github.com/pkcode94/deepseekx
btw here's the github.


