Transformers are actually Cauchy-Poisson, trivially so.
https://github.com/MidoriAppleCore/transformers-are-cauchy-poisson
check the Lean code and compile it meow meow
Is there any reason that you wanted to prove this?
>>16966438
it gives you analytical tools to study transformers, pretty much; hundreds of years' worth of math tools, etc.
>>16966444
maybe you should focus on that in your writeup, like show the power of these tools. then maybe the proof will be worth doing
>>16966445
We prove that diagonal state space models can't achieve the effectiveness of transformers, which I believe is already considered known, but only empirically.
here's gpt2 btw
>>16966438
he spent $200 on a GPT Pro account and had to get some use out of it
>>16966453
It seems potentially interesting. Your work would benefit from being written in prose, like a normal article. You put a lot of emphasis on Lean, but in the end, who will read Lean? It's nice to have, to add confidence to your arguments, but first you need to explain the statements more clearly. As it stands, the theorem statement is unclear to me: "break the ceiling" and "attention escapes" sound like invented terms. I dunno, maybe this is jargon from the depths of the SSM literature, in which case it's probably just not for me.
>>16966453
True. I decided to just release it since I have to deal with life problems.
That analogy comes from pinning down the exact difference between SSMs and transformers, and the fact that transformers are able to break out of that jail. it's an artifact from me trying to find ideal state space models that got left in, really
>>16966458
it's like this because it's AI; see the constant use of "proved x, not approximated" (almost certainly a result of OP prompting his model to formalize everything / not approximate), and gems like "Now write the **Poisson kernel** — the classical 19th-century formula for reconstructing a harmonic function from its boundary values on the upper half-plane H = {z ∈ C | Im z > 0}:". The content of the research is interesting, but everything in here is AI-written. It's also why there is such frequent mention of sorry's; I would imagine OP used them to recursively prompt a model to work up to the full result.
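For reference (this part is me, not the repo's writeup): the classical Poisson kernel being name-dropped there is, for a point [math]z=x+iy[/math] in the upper half-plane and boundary point [math]q\in\mathbb{R}[/math],
[eqn]P(x,y;q)=\frac{1}{\pi}\cdot\frac{y}{(x-q)^2+y^2},[/eqn]
and integrating a boundary function against it reconstructs its harmonic extension to H. The repo's computations apparently drop the [math]1/\pi[/math] normalization, which doesn't change anything below. Keep the formula in mind for what follows.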
>>16966458
desu I decided to keep it AI-written instead of rewriting everything afterwards, because I found the idea of something proving itself poetic, or elegant. but I understand that it's bad form
>>16966458
but the actual result came from 7 or 8 months of fixation/obsession where I searched for this answer. I realized that AI must be manifold-based while studying Newton fractal fiber bundles such as this image.
I am a programmer and use AI for work, so I used it for this
>>16966453
>who will read Lean?
This is fucking hilarious.
>>16966466
I doubt that, going by the reddit post you posted relating to the same "result"; in fact, I seriously doubt that you understand anything wrt this paper. which is okay, because again, it is an interesting result, but you should be honest instead of lying to yourself and others.
>it's an artifact from me trying to find ideal state space models that got left in, really
it's not. Take note, /sci/cels, because this is just the vanguard of incoming "researchers" who, perhaps for the first time in history, can have very little understanding of their research while producing very intricate, seemingly correct work. Imagine what it will be like when everyone has this technology. The era of the useful-pseud has begun.
>>16966466
The work up to it, I mean, not the Lean code itself, which is AI-generated ofc; I shall note that at the top of the README. It comes from months and months of studying: I realized it through creating an AI architecture, then did the Lean afterwards.
I am surprised, though, that this identity is so trivial yet nobody has said anything about it or brought it up anywhere
>>16966468
The manifold hypothesis is more interesting, because if classical analysis is adequate for LLM analysis, math degrees devalue again. Kind of like how every individual doctor is convinced they want to heal patients, but the institutional structure of medicine prefers dependent patients.
>>16966471
yes, and I find it's useful to drag the Lean files into any AI architecture you're working on, then let the AI extend the Lean proof up to whatever you want to do. I actually think math degrees might become useful again, since there's an implied Langlands program connection now? It might be that we can use this to study how to hand-design L-functions.
the question is whether we can use this to optimize for consumer hardware
>>16966468
>I realized it through creating an AI architecture then did the lean afterwards
>I realized that AI must be manifold-based while studying Newton fractal fiber bundles such as this image
you are in AI psychosis, sitting here in an /x/ thread typing to yourself. You must feel yourself getting stupider every day as you delegate more and more of your reasoning to these things; I'm not sure what you thought the outcome of posting what is now evidently slop here and on reddit would be, under the pretense that you wrote it, but you aren't going to get praise.
Also, after going back and actually reading the Lean, your model has written complete and utter slop:
- the main identity chooses the parameter from the answer: setting [math]x_k=q,\ y_k=w_k^{-1}[/math] gives [math]P(q,w_k^{-1};q)=w_k^{-1}/w_k^{-2}=w_k[/math], i.e. [math]w_k=w_k[/math], not a derivation of attention. Because this holds for any [math]w_k>0[/math], the softmax/query-key structure is unused: [math]\forall w>0,\ \exists y=1/w:\ P(q,y;q)=w[/math].
- the "GPT-2" model stores [math](s,V)[/math] directly and ignores the input [math]x[/math], so it proves equality for [math]A(x)=\operatorname{outProj}(\operatorname{softmax}(s)V)[/math] with [math]\partial A/\partial x=0[/math], not real attention [math]A(x)=\operatorname{softmax}(Q(x)K(x)^\top)V(x)[/math].
- the off-query theorem assumes [math]d>0[/math], but the construction uses [math]d=x_k-q=0[/math], so the bandwidth law is disconnected from the representation.
- the claimed bandwidth implication is reversed, since [math]w\le 1/(2d)\iff d\le 1/(2w)[/math], not [math]d\ge 1/(2w)[/math].
- the pruning "bound" is only [math]\left|\sum_{k\in S}w_kV_k\right|\le\sum_{k\in S}w_k|V_k|[/math], the plain triangle inequality.
- the SSM decomposition is just [math]A=h+(A-h)[/math], not a separation theorem.
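To spell out the circularity in the first point (my sketch, using the unnormalized kernel the repo apparently uses): evaluating the kernel at the query point itself gives
[eqn]P(q,y;q)=\frac{y}{(q-q)^2+y^2}=\frac{1}{y},[/eqn]
so choosing [math]y_k=1/w_k[/math] returns [math]P(q,y_k;q)=w_k[/math] for any target weight [math]w_k>0[/math]. Any positive set of weights can be "represented" this way, so the identity carries no information about softmax or query-key geometry.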
>>16966479
thanks for the feedback, checking it out now :)
>>16966479
Ok, I applied fixes and added /sci/ anon(s) as authors. let me know if you see anything now; it should be more defensible. I still need to thread the new analytical Cauchy proof through the transformer toy etc, but I'm getting tired and have work to do as well; tomorrow I'll keep looking over things. there are definitely more smells and things to organize. despite your choice words about my mental state, you are the most helpful person I have run into tonight, and your ability to understand Lean code helped, thank you deeply
feel free to pull req
>>16966458
I mean, I'm ok with this. Using Lean to back up AI's proofs is a strong move; nothing wrong with that. If OP had the overarching idea, maybe some essential insight of a proof, and made AI write it up, formalize, and verify it, the work is valid.
But I think it will be much better received if framed like this: representing a transformer layer as <this CP formalism> allows more powerful, high-level arguments about it. Here is an example statement: <diagonal SSMs can't do something>, which is intuitive but difficult to prove, and here is how we can use the CP formulation to show it: <the core idea in words>. Then move on to prove that the CP formulation is equivalent to the original transformer. Acknowledge AI; say that Lean was used to ensure the correctness of the argument. The formal stuff can then go in the appendix/supplements.
This could actually be pretty popular; it fits the trope of self-improving AI, which is hot right now. If OP were at DeepMind or somewhere like that, they could plausibly spin a Nature paper out of this.
>>16966645
good take anon, I think I'll do this. like more of an 'AI discovered this' sort of angle?
I would say that I had some spark, but it was vague enough that seeing manifold geometry made me spend 8 months vibe-coding AI architectures until I stumbled onto a Cauchy-based one, by taking Newton's fractal then dampening and rotating it
GPT LLMs are the spawn of the devil; no one knows how they really work.
tomoko posting, anon
>>16966923
don't you disrespect queen
anyway, I'm cleaning up the files and going over them. I understand I might've, like, schizo'd out a little and released the beautiful thing too early. I really want my tsundere professor/reviewer to come back, because he's actually useful desu, but we'll see
>>16967066
How does she smell???
>>16967067
like flowers (and a month of not showering)
>>16967068
HNNNNNNGH!
>>16966466
also, I thought a lot about this post today meow meow. like, you don't realize how much people like you need people like me, and vice versa, formalist
>>16967069
baguette
>>16967070
is there like a repo i can use to get Lean to define a transformer for me? I have one, but I'd trust some git autist's Lean transformer more
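don't know of a canonical one, but fwiw here's roughly what a minimal single attention head looks like in Lean 4 with Mathlib. this is my own throwaway sketch, not from any existing repo; the names (softmax, attention) are made up, and it skips the 1/sqrt(d) scaling and masking:

import Mathlib

-- softmax over a finite index type; noncomputable because Real.exp is.
noncomputable def softmax {n : ℕ} (s : Fin n → ℝ) : Fin n → ℝ :=
  fun i => Real.exp (s i) / ∑ j, Real.exp (s j)

-- one attention head: scores are dot products of the query q with rows of K,
-- output is the softmax-weighted sum of the rows of V.
noncomputable def attention {n d : ℕ}
    (q : Fin d → ℝ) (K V : Fin n → Fin d → ℝ) : Fin d → ℝ :=
  fun c => ∑ i, softmax (fun i' => ∑ k, q k * K i' k) i * V i c

the definition is the easy part; whatever repo you find, the value is in the lemmas proved about it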
Lean code is cleaned up. now I need my tsundere reviewer to call me a schizo and let me know exactly what to fix
Lean is weird because I'll often prove things in the system without fully understanding what I'm doing. I kind of "feel" my way through a problem until I expect the goal to be completed.