Transformers are actually Cauchy-Poisson, trivially so.
https://github.com/MidoriAppleCore/transformers-are-cauchy-poisson
check the Lean code and compile it meow meow
Is there any reason that you wanted to prove this?
>>16966438
it gives you analytical tools to study transformers, pretty much; hundreds of years' worth of math tools, etc.
>>16966444
maybe you should focus on that in your writeup, like show the power of these tools. then maybe the proof will be worth doing
>>16966445
We prove that diagonal state space models can't achieve the effectiveness of transformers, which I believe is already considered known, but only empirically.
here's gpt2 btw
>>16966438
he spent $200 on a GPT Pro account and had to get some use out of it
>>16966453
It seems potentially interesting. Your work would benefit from being written in prose, like a normal article. You put a lot of emphasis on Lean, but in the end, who will read Lean? It's nice to have, to add confidence to your arguments, but first you need to explain the statements more clearly. As it stands, the theorem statement is unclear to me: "break the ceiling" and "attention escapes" sound like invented terms. I dunno, maybe this is jargon from the depths of the SSM literature, in which case it's probably just not for me.
>>16966453
True. I decided to just release it since I have to deal with life problems.
That analogy comes from pinning down the exact difference between SSMs and transformers, and the fact that transformers are able to break out of that jail. it's an artifact from me trying to find ideal state space models that got left in, really
>>16966458
it's like this because it's AI; see the constant use of "proved x, not approximated" (almost certainly a result of OP prompting his model to formalize everything / not approximate), and gems like "Now write the **Poisson kernel** — the classical 19th-century formula for reconstructing a harmonic function from its boundary values on the upper half-plane H = {z ∈ C | Im z > 0}:". The content of the research is interesting, but everything in here is AI-written. It's also why there is such frequent mention of sorry's; I would imagine OP used them to recursively prompt a model to work up to the full result.
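For reference (this part is me, not the repo's writeup): the classical Poisson kernel being name-dropped there is, for a point [math]z=x+iy[/math] in the upper half-plane and boundary point [math]q\in\mathbb{R}[/math],
[eqn]P(x,y;q)=\frac{1}{\pi}\cdot\frac{y}{(x-q)^2+y^2},[/eqn]
and integrating a boundary function against it reconstructs its harmonic extension to H. The repo's computations apparently drop the [math]1/\pi[/math] normalization, which doesn't change anything below. Keep the formula in mind for what follows.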
>>16966458
desu I decided to keep it AI-written instead of rewriting everything afterwards, because I found the idea of something proving itself poetic, or elegant. but I understand that it's bad form
>>16966458
but the actual result came from 7 or 8 months of fixation/obsession where I searched for this answer. I realized that AI must be manifold-based while studying Newton fractal fiber bundles such as this image.
I am a programmer and use AI for work, so I used it for this
>>16966453
>who will read Lean?
This is fucking hilarious.
>>16966466
I doubt that, going by the reddit post you posted relating to the same "result"; in fact, I seriously doubt that you understand anything wrt this paper. which is okay, because again, it is an interesting result, but you should be honest instead of lying to yourself and others.
>it's an artifact from me trying to find ideal state space models that got left in, really
it's not. Take note, /sci/cels, because this is just the vanguard of incoming "researchers" who, perhaps for the first time in history, can have very little understanding of their research while producing very intricate, seemingly correct work. Imagine what it will be like when everyone has this technology. The era of the useful-pseud has begun.
>>16966466
The work up to it, I mean, not the Lean code itself, which is AI-generated ofc; I shall note that at the top of the README. It comes from months and months of studying: I realized it through creating an AI architecture, then did the Lean afterwards.
I am surprised, though, that this identity is so trivial yet nobody has said anything about it or brought it up anywhere
>>16966468
The manifold hypothesis is more interesting, because if classical analysis is adequate for LLM analysis, math degrees devalue again. Kind of like how every individual doctor is convinced they want to heal patients, but the institutional structure of medicine prefers dependent patients.
>>16966471
yes, and I find it's useful to drag the Lean files into any AI architecture you're working on, then let the AI extend the Lean proof up to whatever you want to do. I actually think math degrees might become useful again, since there's an implied Langlands program connection now? It might be that we can use this to study how to hand-design L-functions.
the question is whether we can use this to optimize for consumer hardware
>>16966468
>I realized it through creating an AI architecture then did the lean afterwards
>I realized that AI must be manifold-based while studying Newton fractal fiber bundles such as this image
you are in AI psychosis, sitting here in an /x/ thread typing to yourself. You must feel yourself getting stupider every day as you delegate more and more of your reasoning to these things; I'm not sure what you thought the outcome of posting what is now evidently slop here and on reddit would be, under the pretense that you wrote it, but you aren't going to get praise.
Also, after going back and actually reading the Lean, your model has written complete and utter slop:
- the main identity chooses the parameter from the answer: setting [math]x_k=q,\ y_k=w_k^{-1}[/math] gives [math]P(q,w_k^{-1};q)=w_k^{-1}/w_k^{-2}=w_k[/math], i.e. [math]w_k=w_k[/math], not a derivation of attention. Because this holds for any [math]w_k>0[/math], the softmax/query-key structure is unused: [math]\forall w>0,\ \exists y=1/w:\ P(q,y;q)=w[/math].
- the "GPT-2" model stores [math](s,V)[/math] directly and ignores the input [math]x[/math], so it proves equality for [math]A(x)=\operatorname{outProj}(\operatorname{softmax}(s)V)[/math] with [math]\partial A/\partial x=0[/math], not real attention [math]A(x)=\operatorname{softmax}(Q(x)K(x)^\top)V(x)[/math].
- the off-query theorem assumes [math]d>0[/math], but the construction uses [math]d=x_k-q=0[/math], so the bandwidth law is disconnected from the representation.
- the claimed bandwidth implication is reversed, since [math]w\le 1/(2d)\iff d\le 1/(2w)[/math], not [math]d\ge 1/(2w)[/math].
- the pruning "bound" is only [math]\left|\sum_{k\in S}w_kV_k\right|\le\sum_{k\in S}w_k|V_k|[/math], the plain triangle inequality.
- the SSM decomposition is just [math]A=h+(A-h)[/math], not a separation theorem.
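To spell out the circularity in the first point (my sketch, using the unnormalized kernel the repo apparently uses): evaluating the kernel at the query point itself gives
[eqn]P(q,y;q)=\frac{y}{(q-q)^2+y^2}=\frac{1}{y},[/eqn]
so choosing [math]y_k=1/w_k[/math] returns [math]P(q,y_k;q)=w_k[/math] for any target weight [math]w_k>0[/math]. Any positive set of weights can be "represented" this way, so the identity carries no information about softmax or query-key geometry.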
>>16966479
thanks for the feedback, checking it out now :)
>>16966479
Ok, I applied fixes and added /sci/ anon(s) as authors. let me know if you see anything now; it should be more defensible. I still need to thread the new analytical Cauchy proof through the transformer toy etc, but I'm getting tired and have work to do as well; tomorrow I'll keep looking over things. there are definitely more smells and things to organize. despite your choice words about my mental state, you are the most helpful person I have run into tonight, and your ability to understand Lean code helped, thank you deeply
feel free to pull req
>>16966458
I mean, I'm ok with this. Using Lean to back up AI's proofs is a strong move; nothing wrong with that. If OP had the overarching idea, maybe some essential insight of a proof, and made AI write it up, formalize, and verify it, the work is valid.
But I think it will be much better received if framed like this: representing a transformer layer as <this CP formalism> allows more powerful, high-level arguments about it. Here is an example statement: <diagonal SSMs can't do something>, which is intuitive but difficult to prove, and here is how we can use the CP formulation to show it: <the core idea in words>. Then move on to prove that the CP formulation is equivalent to the original transformer. Acknowledge AI; say that Lean was used to ensure the correctness of the argument. The formal stuff can then go in the appendix/supplements.
This could actually be pretty popular; it fits the trope of self-improving AI, which is hot right now. If OP were at DeepMind or somewhere like that, they could plausibly spin a Nature paper out of this.
>>16966645
good take anon, I think I'll do this. like more of an 'AI discovered this' sort of angle?
I would say that I had some spark, but it was vague enough that seeing manifold geometry made me spend 8 months vibe-coding AI architectures until I stumbled onto a Cauchy-based one, by taking Newton's fractal then dampening and rotating it
GPT LLMs are the spawn of the devil; no one knows how they really work.
tomoko posting, anon
>>16966923
don't you disrespect queen
anyway, I'm cleaning up the files and going over them. I understand I might've, like, schizo'd out a little and released the beautiful thing too early. I really want my tsundere professor/reviewer to come back, because he's actually useful desu, but we'll see
>>16967066
How does she smell???
>>16967067
like flowers (and a month of not showering)
>>16967068
HNNNNNNGH!
>>16966466
also, I thought a lot about this post today meow meow. like, you don't realize how much people like you need people like me, and vice versa, formalist
>>16967069
baguette
>>16967070
is there like a repo i can use to get Lean to define a transformer for me? I have one, but I'd trust some git autist's Lean transformer more
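don't know of a canonical one, but fwiw here's roughly what a minimal single attention head looks like in Lean 4 with Mathlib. this is my own throwaway sketch, not from any existing repo; the names (softmax, attention) are made up, and it skips the 1/sqrt(d) scaling and masking:

import Mathlib

-- softmax over a finite index type; noncomputable because Real.exp is.
noncomputable def softmax {n : ℕ} (s : Fin n → ℝ) : Fin n → ℝ :=
  fun i => Real.exp (s i) / ∑ j, Real.exp (s j)

-- one attention head: scores are dot products of the query q with rows of K,
-- output is the softmax-weighted sum of the rows of V.
noncomputable def attention {n d : ℕ}
    (q : Fin d → ℝ) (K V : Fin n → Fin d → ℝ) : Fin d → ℝ :=
  fun c => ∑ i, softmax (fun i' => ∑ k, q k * K i' k) i * V i c

the definition is the easy part; whatever repo you find, the value is in the lemmas proved about it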
Lean code is cleaned up. now I need my tsundere reviewer to call me a schizo and let me know exactly what to fix
Lean is weird because I'll often prove things in the system without fully understanding what I'm doing. I kind of "feel" my way through a problem until I expect the goal to be completed.