/sci/ - Science & Math

File: 1772142944438496.jpg (42 KB, 720x704)
Transformers are actually Cauchy-Poisson, trivially so
https://github.com/MidoriAppleCore/transformers-are-cauchy-poisson

check the Lean code and compile it meow meow
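For readers skimming: the Poisson kernel of the upper half-plane can be normalized over a set of keys to produce attention-like weights, which is the flavor of the claimed correspondence. A minimal illustrative sketch; the function names and the normalization step here are my own, not the repo's construction:

```python
def poisson_kernel(x, y, q):
    """Upper half-plane Poisson kernel, up to the 1/pi factor:
    P(x, y; q) = y / ((x - q)^2 + y^2), defined for y > 0."""
    return y / ((x - q) ** 2 + y ** 2)

def kernel_attention(query, keys, bandwidth):
    """Hypothetical 'Cauchy attention': normalize Poisson-kernel
    weights over the keys, analogous to softmax attention weights.
    Illustrative only."""
    weights = [poisson_kernel(k, bandwidth, query) for k in keys]
    total = sum(weights)
    return [w / total for w in weights]

# Weights sum to 1 and peak at the key nearest the query.
weights = kernel_attention(0.0, [-1.0, 0.0, 1.0], 0.5)
```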
>>
Is there any reason that you wanted to prove this?
>>
>>16966438
it gives you analytical tools to study transformers, pretty much hundreds of years' worth of math tools, etc.
>>
>>16966444
maybe you should focus on that in your writeup
like show the power of these tools. then maybe the proof will be worth doing
>>
File: transformer_pipeline.png (1.8 MB, 3468x1780)
>>16966445
We prove that diagonal state space models can't achieve the effectiveness of transformers, which I believe is already considered known but only empirically

here's gpt2 btw
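For context, a diagonal SSM is a linear recurrence whose transition is fixed and input-independent, which is exactly the structural gap being claimed against attention's content-dependent mixing. A minimal hypothetical sketch, not the repo's formalization:

```python
def diagonal_ssm(xs, a, b):
    """Diagonal state space recurrence, elementwise per channel:
        h_t = a * h_{t-1} + b * x_t
    The transition a is fixed and does not depend on the input,
    which is the limitation discussed in the thread. Returns the
    summed state at each step as a scalar readout."""
    h = [0.0] * len(a)
    out = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        out.append(sum(h))
    return out

# One channel with decay 0.5: an impulse decays geometrically.
response = diagonal_ssm([1.0, 0.0, 0.0], [0.5], [1.0])
```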
>>
>>16966438
he spent $200 on a GPT Pro account and had to get some use out of it
>>
>>16966446
It seems that it's potentially interesting. Your work would benefit from being written in prose, like a normal article. You have a lot of emphasis on Lean, but in the end, who will read Lean? It's nice to have, to add confidence to your arguments, but first you need to explain the statements more clearly. As it is, the theorem statement is unclear to me: "break the ceiling" and "attention escapes" sound like your invented terms. I dunno, maybe this is jargon from the depths of SSM literature; then it's just probably not for me.
>>
>>16966453
True

I decided to just release since I have to deal with life problems

That analogy comes from pinning down the exact difference between SSMs and transformers, and from the fact that transformers are able to break out of that jail. It's an artifact from me trying to find ideal state space models that got left in, really
>>
>>16966453
it's like this because it's AI; see the constant use of "proved x, not approximated" (almost certainly a result of OP prompting his model to formalise everything / not approximate), and gems like "Now write the **Poisson kernel** — the classical 19th-century formula for reconstructing a harmonic function from its boundary values on the upper half-plane H = {z ∈ C | Im z > 0}:".

The content of the research is interesting, but everything in here is AI-written. It's also why there's such frequent mention of sorry's; I would imagine OP used them to recursively prompt a model to work up to the full result.
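For reference, the quoted passage is describing the classical Poisson integral formula for the upper half-plane (standard analysis, not something pulled from the repo): given boundary data f on the real line, the harmonic extension is

```latex
u(x, y) \;=\; \frac{1}{\pi} \int_{-\infty}^{\infty} \frac{y}{(x - t)^2 + y^2}\, f(t)\, dt, \qquad y > 0,
```

where the factor y / ((x - t)^2 + y^2) is the Poisson kernel of H = {z ∈ C | Im z > 0}.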
>>
>>16966458
desu I decided to keep it AI-written instead of rewriting everything afterwards, because I found the idea of something proving itself poetic or elegant, but I understand that it's bad form
>>
File: fractalrender2.png (3.34 MB, 1920x1080)
>>16966458
but the actual result was from 7 or 8 months of fixation/obsession where I searched for this answer

I realized that AI must be manifold-based while studying Newton fractal fiber bundles such as this image

I am a programmer and use AI for work so I used it for this
>>
>>16966453
>who will read Lean?
This is fucking hilarious.
>>
>>16966462

I doubt that, going by the reddit post you posted relating to the same "result"; in fact, I seriously doubt that you understand anything wrt this paper, which is okay, because again, it is an interesting result, but you should be honest instead of lying to yourself and others.

>it's an artifact form me trying to find ideal state space models that got left in, really
it's not.

Take note, /sci/cels, because this is just the vanguard of incoming "researchers" who, perhaps for the first time in history, can have very little understanding of their research while producing very intricate, seemingly correct work. Imagine what it will be like when everyone has this technology. The era of the useful pseud has begun.
>>
>>16966466
The work up to it, I mean, not the Lean code itself, which is AI-generated ofc, and I shall note it at the top of the README

It is from months and months of studying; I realized it through creating an AI architecture, then did the Lean afterwards

I am surprised, though, that this identity is so trivial yet nobody has said anything about it or brought it up anywhere
>>
>>16966468
The manifold hypothesis is more interesting because if classical analysis is adequate for LLM analysis math degrees devalue again.
Kind of like how every singular doctor is convinced they want to heal patients but the institutional structure of medicine prefers dependent patients.
>>
>>16966471
yes, and I find that it's useful to drag the Lean files into any AI architecture you're working on, then allow the AI to extend the Lean proof up to what you want to do. I actually think math degrees might become useful again, since there's an implied Langlands program connection now? It might be that we can use this to try to study how to hand-design L-funcs

the question is if we can use this to optimize for consumer hardware
>>
>>16966468
> I realized it though creating an AI architecture then did the lean afterwards
>I realized that AI must be manifold based while studying newton fractal fiber bundles such as this image. I realized that AI must be related and manifold based

you are in transgender AI psychosis, sitting here in an /x/ thread typing to yourself. You must feel yourself getting stupider every day as you delegate more and more of your reasoning to these things; I'm not sure what you thought the outcome of posting what is now evidently slop here and on reddit would be, under the pretense that you wrote it, but you aren't going to get praise.

Also, after going back and actually reading the Lean, your model has written complete and utter slop:
>the main identity chooses the parameter from the answer: with x_k = q and y_k = w_k^{-1}, P(q, w_k^{-1}; q) = w_k^{-1} / w_k^{-2} = w_k, i.e. w_k = w_k, not a derivation of attention; because this holds for any w_k > 0, the softmax/query-key structure is unused (∀ w > 0, ∃ y = 1/w : P(q, y; q) = w)
>the "GPT-2" model stores (s, V) directly and ignores the input x, so it proves equality for A(x) = outProj(softmax(s)·V) with ∂A/∂x = 0, not real attention A(x) = softmax(Q(x)K(x)^T)V(x)
>the off-query theorem assumes d > 0, but the construction uses d = x_k − q = 0, so the bandwidth law is disconnected from the representation
>the claimed bandwidth implication is reversed, since w ≤ 1/(2d) ⟺ d ≤ 1/(2w), not d ≥ 1/(2w)
>the pruning "bound" is only |Σ_{k∈S} w_k V_k| ≤ Σ_{k∈S} w_k |V_k|, the plain triangle inequality
>the SSM decomposition is just A = h + (A − h), not a separation theorem
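The triviality complaint about the main identity is easy to sanity-check numerically: at the query point the kernel collapses to 1/y, so choosing y = 1/w returns w by construction. A quick standalone check, not tied to the repo's code:

```python
def poisson_kernel(x, y, q):
    # Upper half-plane Poisson kernel, up to the 1/pi factor:
    # P(x, y; q) = y / ((x - q)^2 + y^2), defined for y > 0.
    return y / ((x - q) ** 2 + y ** 2)

# At x = q the kernel collapses to 1/y, so picking y = 1/w
# yields P = w for every w > 0: the parameter is chosen from
# the answer, making the identity w = w rather than a derivation.
for w in [0.1, 1.0, 3.5, 100.0]:
    assert abs(poisson_kernel(0.0, 1.0 / w, 0.0) - w) < 1e-9 * w
```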
>>
>>16966479
thanks for feedback, checking it out now :)
>>
>>16966479
Ok, I applied fixes, and added /sci/ anon(s) as authors

let me know if you see anything now; it should be more defensible. I still need to thread the new analytical Cauchy proof through the transformer toy etc., but I'm getting tired and have work to do as well; tomorrow I'll keep looking over things. There are definitely more smells and things to organize. Despite your choice words about my mental state, you are the most helpful person I have run into tonight, and your ability to understand Lean code helped; thank you deeply

feel free to pull req
>>
>>16966458
I mean, I'm ok with this. Usage of Lean to back up AI's proofs is a strong move. Nothing wrong with that. If OP had the overarching idea, maybe some essential insight of a proof, and made AI write it up, formalize and verify, the work is valid.
But I think it will be much better received if framed like this:
Representing a transformer layer as <this CP formalism> allows more powerful, high level arguments about it. Here is an example statement: <diagonal SSMs can't do something>, which is intuitive but difficult to prove, and here is how we can use the CP formulation to show it: <the core idea in words>. Then, move on to prove that CP formulation is equivalent to the original transformer. Acknowledge AI, say that Lean was used to ensure the correctness of the argument. The formal stuff can then be in the appendix/supplements.
This all can actually be pretty popular, this is within the trope of self-improving AI which is hot right now. If OP were in DeepMind or something they could plausibly spin a Nature paper out of this.
>>
>>16966645
good take anon, I think I'll do this, like more of an 'AI discovered this' sort of angle?

I would say that I had some spark, but it was vague enough that seeing manifold geometry made me spend 8 months vibe coding AI architectures until I stumbled onto a Cauchy-based one by taking Newton's fractal then dampening and rotating it
>>
GPT LLMs are the spawn of the devil; no one knows how they really work.
>>
tomoko posting nigger
>>
File: 1772153564216608.jpg (168 KB, 780x1035)
>>16966923
don't you disrespect queen

anyway, I'm cleaning up the files and going over them. I understand I might've like, schizo'd out a little and released the beautiful thing too early. I really want my tsundere professor/reviewer to come back because he's actually useful desu but we'll see
>>
File: .jpg (127 KB, 850x601)
>>16967066
How does she smell???
>>
>>16967067
like flowers (and a month of nonshowering)
>>
File: .jpg (43 KB, 600x711)
>>16967068
HNNNNNNGH!
>>
>>16966466
also, I thought a lot about this post today meow meow. Like, you don't realize how much people like you need people like me, and vice versa, formalist

>>16967069
baguette
>>
File: .jpg (32 KB, 546x323)
>>16967070
>>
is there like a repo I can use to get Lean to define a transformer for me? I have one, but I'd trust some git autist's Lean transformer more
>>
Lean code is cleaned up; now I need my tsundere reviewer to call me a schizo tranny and let me know exactly what to fix
>>
Lean is weird because I'll often prove things in the system without fully understanding what I'm doing. I kind of "feel" my way through a problem until I expect the goal to be completed.
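That tactic-driven style looks something like this in Lean 4 (toy examples of my own, nothing to do with the repo; `omega` assumes a recent toolchain where it ships): you state the goal and let a decision procedure feel out the proof for you.

```lean
-- Toy Lean 4 examples: the goal is stated, and a tactic
-- (simp, omega) is left to close it without the author
-- spelling out the underlying argument by hand.
example (n : Nat) : n + 0 = n := by simp

example (a b : Nat) (h : a ≤ b) : a ≤ b + 1 := by omega
```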


