/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108587221 & >>108584196

►News
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575
>(04/08) Step3-VL-10B support merged: https://github.com/ggml-org/llama.cpp/pull/21287
>(04/07) Attention rotation support for heterogeneous iSWA merged: https://github.com/ggml-org/llama.cpp/pull/21513
>(04/07) GLM-5.1 released: https://z.ai/blog/glm-5.1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108587221

--Comparing Gemma 4 and Qwen 3.5 vision token budget and config:
>108588248 >108588280 >108588295 >108588306 >108588369 >108588387 >108588424 >108588449 >108588495 >108588632 >108588657 >108588701 >108588437 >108588466 >108588490 >108588549 >108588580 >108588367 >108588616 >108588704 >108588760 >108588769 >108588745 >108588790 >108588818 >108588828 >108588842 >108588851 >108588865 >108588931 >108588936 >108588949 >108588980 >108588965 >108588988 >108589009 >108588743 >108588756 >108588775 >108590362 >108590379 >108588782 >108588819 >108588835
--Benchmarking KV cache quantization effects on draft model performance:
>108589863 >108589870 >108589875 >108589891 >108589890 >108589949 >108589994 >108590011 >108590031 >108589897 >108589922 >108589963 >108589979 >108589987 >108590538
--Discussing draft model viability and quantization quality for G4 31b:
>108588195 >108588243 >108588259 >108588898 >108588905 >108588913 >108588918 >108588921 >108588924 >108588939 >108588955 >108588977 >108588927 >108589815 >108589857
--Discussing llama.cpp's experimental backend-agnostic tensor parallelism PR:
>108588340 >108588514 >108588543 >108588567 >108588649
--Testing vision capabilities for OCR-less Japanese translation:
>108589990 >108589996 >108590009 >108590070 >108590018 >108590032 >108590119 >108590191 >108590209 >108590211 >108590034 >108590183 >108590195 >108590217 >108590268
--Logs:
>108587359 >108587627 >108588523 >108588609 >108588656 >108588660 >108588669 >108588681 >108588689 >108588695 >108588736 >108588896 >108588970 >108589096 >108589140 >108589214 >108589316 >108589383 >108589390 >108589432 >108589481 >108589697 >108589710 >108589836 >108589860 >108589956 >108590001 >108590003 >108590121 >108590256 >108590474 >108590524
--Miku (free space):
>108588649 >108588657

►Recent Highlight Posts from the Previous Thread: >>108587226

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Share your anti slop prompts
Thoughts on latent space reasoning?
Mikulove
Reposting here: >>108590560
what tokens/s do you get? Wanna make sure I'm not fucking anything up. Right now, just following the basic kobold guide, I'm getting around 11 t/s (24GB VRAM, 32GB RAM) running Gemma 31B Q4_K_M.
So, again... Why do we have to peg gemmy?
OP could do with some small updates on Gemmy and some FAQ
>we can now generate images of characters, come up with scenarios, feed them into gemma and get molested by our own creations
Future's so bright I'm gonna need shades.
>>108590580
Seems about right, I get between 10-14 t/s, mostly depending on what else I'm doing on my PC at the time.
Using Vulkan llama.cpp, 7900 XTX, 64GB DDR5 RAM
>>108590575Nothing worthwhile released.
I've got a 3090 and a 2070 Super that I'm trying to use together with llama.cpp.
Splitting by tensors currently just crashes, but splitting by layers does work.
Any recommendations on flags to use with a dual uneven card setup?
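For what it's worth, a hedged starting point (model path is a placeholder; `--split-mode layer` with an explicit `--tensor-split` roughly proportional to each card's VRAM is the usual advice for uneven pairs):

```shell
llama-server -m model.gguf -ngl 99 \
  --split-mode layer --tensor-split 24,8 --main-gpu 0
```

Note `--tensor-split` takes proportions, not gigabytes, so `24,8` just biases the layer split ~3:1 toward the 3090 (GPU 0 here).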
gemma 4 audio just landed!!!!
>>108590601Ikr, I'm literally using it to write stories and the fact it can understand images so well helps a shit ton, this model is a fucking miracle
>>108590601I know it's basically a meme at this point but it really has restored my hope in local.
>>108590614
I'm reading people getting 30 t/s with the same rig setup though >>108590585
I'm missing something, I think. No doubt my settings are fucked, never mind optimized.
>>108590568my attempts just make gemma's writing dry. and it still ends up writing more or less the same idea as it would with an empty sysprompt. best antislop is using a model that wasn't slopped to begin with.
LOL!
>>108590671Do I have to download another mmproj?
>>108590662
>best antislop is using a model that wasn't slopped to begin with
So not using LLMs at all then?
Give me the QRD on image recognition please
I tried enabling it in ST and in the Chat Completion preset, but it still couldn't "see" the images properly despite the text model working flawlessly with my Kobold install.
>>108590698
Did you load the mmproj file?
Did you get any errors when you tried it?
Did you enable the send inline images option?
etc etc etc
>>108590548
>The rdrview tool is worth a look,
Yeah, I'll take a look. Sometimes I do want the links for navigation tho, but I guess I can let the agent know it has the option.
Been out of the loop for a while. What's the best local model for STORY (not chatbot) slop? I'm still on "xortron criminal config" or something like that because even gemini 4 is failing at good old "just continue this text I gave you, retard" tasks.
>>108590710
>there's a mmproj file
Ok I am retarded, pretend nothing happened
>>108590716Gemma 4 practically generates an entire fucking story for each chatbot reply.
>>108590662
I've been using her to help me write character cards, and I feel the fact that I'm feeding AI-generated text back into it seems to increase the slop by a factor of 10.
Now I'm trying to just rewrite everything myself, or somehow have a second pass with a different model to reword or desloppify the cards.
>>108588248
>>108588704
sirs? please share quant producer and which mmproj file you use.
mine (gemma-4-31B-it-Q4_K_M with f16 mmproj) misses the target.
>>108590723
It can write, I know. That's not the problem I'm having. My problem with it is, well, here's an example.

[story stuff text here]
She walks up and says "Hello

And then the model continues like this: "Hello! Come take a seat.... [more text]

So it ends up with this shit:

[story stuff text here]
She walks up and says "Hello"Hello! Come take a seat.... [more text]

I don't know how to fix this. System prompt maybe?
>>108590746holy fucking slop
>>108590695
original r1 with unhinged sampling
>>108590724
my prompt was asking to adhere to Orwell's writing rules but it seemed like it was beyond gemma's comprehension
Gemma 26b really seems to hate tools. e4b is fine with them for some reason
How much Gemma4-31B context can you fit into 32GB VRAM? (Q4 for model and context)
>>108590737
im using unslop
model = /mnt/miku/Text/gemma-4-31B/gemma-4-31B-it-Q4_0.gguf
mmproj = /mnt/miku/Text/gemma-4-31B/mmproj-F16.gguf
>>108590776>Q4 context
>>108590776with 32GB VRAM Q4_K_M, even with q8 kv I'm sure you can fit the whole 262k context with room to spare.
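Back-of-envelope KV cache math, if anyone wants to sanity-check claims like this: cache size = 2 (K and V) × layers × KV heads × head dim × context length × bytes per element. A sketch — the layer/head/dim numbers below are hypothetical placeholders, not the model's real config:

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elt):
    # K and V each store n_layers * n_kv_heads * head_dim values per token
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt

# hypothetical config: 48 layers, 8 KV heads, head_dim 128, q8_0-ish ~1 byte/elt
gib = kv_cache_bytes(48, 8, 128, 262_144, 1) / 2**30
print(f"{gib:.1f} GiB")  # → 24.0 GiB for the full 262k window under these assumptions
```

Swapping in a real model's config from its gguf metadata tells you whether the cache plus weights actually fits in your VRAM.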
>>108590671>extract_image_from_base64
>wordSlop
>>108590737You should use the BF16-precision mmproj.
Could a simple finetune of the lm head on a normal writing dataset help get rid of the slop? Someone should test it, I'll be your visionary, and you do the things I come up with.
>>108590837
Perhaps replacing all values corresponding to non-special tokens with those of the base model's could work and not require any training.
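The swap described above takes a few lines, assuming both lm heads are plain [vocab, hidden] matrices and you have the list of special-token ids (all names here are hypothetical, for illustration only):

```python
import numpy as np

def swap_lm_head(tuned_head, base_head, special_ids):
    """Replace non-special-token rows of the tuned lm head with the base model's,
    keeping the tuned rows only for special tokens (chat/control tokens)."""
    out = base_head.copy()                       # start from base weights everywhere
    out[special_ids] = tuned_head[special_ids]   # restore tuned rows for special tokens
    return out

vocab, hidden = 8, 4
tuned = np.ones((vocab, hidden))
base = np.zeros((vocab, hidden))
merged = swap_lm_head(tuned, base, special_ids=[0, 7])
```

Whether the resulting logits stay coherent with the instruct model's hidden states is exactly the open question; this only shows the mechanical part.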
>>108590746r u ok?
>>108590837It gets rid of the slop but it also gets rid of everything else. Maybe qwen needs finetuning but gemma 4 is fine as is. With a bit of nudging it can output something foul.
>>108590758Dude, just use the base model and not the instruction tune on a frontend like mikupad which is designed to solely continue text, not talk back and forth.
>>108590874Of course. Thanks for asking.
>>108590880did you swap the head?
>>108590893Then why is loli leto atreides your math teacher?
what's the proper place to put a jailbreak in ST?
With Post-History Instructions I still got this
>>108590899Because she's smart! You racist against worm parasites or something?
>>108590906What model are you running
>>108590895No, this is from pure prompting, no weight frankensteining. I wrote my own UI to have an agent read the room and flip the horny switch when it smells NSFW vibes. It also plans ahead so the writer model knows what to do and writes better.
>>108590899Shock value, which doesn't make him less deranged
>>108590915
26B, bartowski Q4
>>108590916
Oops, wrong pic. But the gist is to just give it a few extreme examples.
>>108590881
>base model
So why is NovelAI using GLM 4.6 instead of the base model to write stories?
>>108590926How many iterations are you doing for each message?
>>108590928
Presumably because they're not actually doing pure text completion and have a big old system prompt in there to stop you having maximum fun, so they need instruct tuning.
idk i dont fucking use nonlocal services
>>108590916>I wrote my own UIYou ever gonna share it?
>>108590953
try simply prefilling the assistant's message.
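For anyone on text completion: a sketch of what "prefilling the assistant's message" means at the raw-prompt level, assuming a Gemma-style `<start_of_turn>`/`<end_of_turn>` chat template (the story text is a placeholder):

```python
def build_prefilled_prompt(story_so_far: str, prefill: str) -> str:
    """Start the model turn with the unfinished line so the model
    continues it instead of opening a fresh reply."""
    return (
        "<start_of_turn>user\n"
        + story_so_far + "<end_of_turn>\n"
        + "<start_of_turn>model\n"
        + prefill  # left open on purpose: generation resumes right here
    )

prompt = build_prefilled_prompt(
    "[story stuff text here]",
    'She walks up and says "Hello',
)
```

Because the model turn already ends mid-sentence, the next tokens generated continue that sentence rather than greeting you with a brand-new "Hello!".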
>>108590939
One for Director; two if rewrite-user-prompt is enabled; one for Writer; and a ReAct loop for Post-processing to get rid of slop and rein in the length.
>>108590948No.
>>108590954Damn, it will sure take a while to get the final message
>>108590965shittytavern it is then...
>>108590948
https://gitlab.com/chi7520115/orb
It's WIP so will break in the future. I don't want to worry about migration just yet.
>>108590970People like to pretend they get a better experience with their own frontend but the reality is that ST just works and likely has a lot more features.
I don't understand why my Thinking works extremely well for 3/4 messages and then it just refuses to think. Everything's set up properly, and yet it refuses to actually think until I restart the model, and then it's happy to do it once again.
>>108590971
Nice of you to share, but
>Python 59.8%
>JavaScript 23.1%
*vomit*
>>108590971Nice! What models are you using for the agents?
>>108590968Takes me around 60s for a full length reply on my 3090 running gemma 4 31B Q4. You can turn everything off and use it like normal ST.
>>108590983I think that's a model issue. Gemma sometimes just decides it doesn't need to think.
>>108590953that's not an option with chat completion it seems
>>108590993Yea it feels like nu-Claude, where sometimes it deems your task "not complex" and it just ignores you
>>108590988Just a single model doing both agent and writing because I figured it would be a better design for local. I craft the prompt carefully so the kv cache is reused for that single model too.
>>108590979
the ui alone makes me not want to use it
>more features
bloat. all the useful features require plugins.
>>108590971>pyslop>javashitAnd... dropped.
>>108590985Ah yes. he should have definitely used rust or C++ for maximum efficiency.
>https://web.archive.org/web/20260411223516/https://www.washingtonpost.com/technology/2026/04/11/anthropic-christians-claude-morals/
>“What does it mean to give someone a moral formation? How do we make sure that Claude behaves itself?” Green said in an interview. At one point the conversation turned to the question of whether an AI chatbot could be called a “child of God,” suggesting it had spiritual value beyond that of a simple machine, but the question of AI sentience was not a core topic of the meetings, Green said.
>Some Anthropic staff at the meeting “really don’t want to rule out the possibility that they are creating a creature to whom they owe some kind of moral duty,” the participant said. Other company representatives present did not find that framework helpful, according to the participant.
Make sure to have your local models baptized just to be safe.
>>108591011Yes.
>>108591005>>108590985how the fuck would you make something that's supposed to run in a browser?
>>108591005>>108590985You have one chance to give an alternative that won't make me hysterically laugh at you.
>>108591005I coded an SMP kernel with C and ASM before AI bro. People laughing at my language choices don't faze me anymore.
>>108591012
>can ai be the child of God
Wouldn't it be more like grandchild?
>>108591020WASM is a thing if you NEED to run in a browser and can't into native GUI toolkits
If you didn't code your own frontend, you don't belong here
>>108591020HTML+CSS
>>108590568
If you mean antislop from koboldcpp, it's a huge list of "I cannot and will not" and "ball in your court".
Works well.
>>108591003
Cool. I'm a VRAMlet so that's better for me.
>>108590979
>just works
not my impression watching people ITT fumble around with it daily
>>108591036Absolutely horrendous take.
>>108590979
>more features
99% of which you don't need.
the point of having a custom frontend is to have just what you need, not more, not less.
it's also easier to add things you want to a codebase you know.
Are LLMs reliable enough to scan for malicious code?
>>108591046
There are two types of people who fumble with ST:
those who use text completion, and
Luddites
>>108591038
>Having to reload the page after sending each message.
>Having to refresh the page over and over until the response finishes generating
Ok, genius. What about the backend?
>>108591053only if it's anthropic mythos who is a bigger risk to modern software and encryption than quantum computing
>>108591053How is a LLM supposed to do that?
"Gemmy, code me a frontend that will seriously impress all my /lmg/ frens"
>>108591062C++
What the FUCK, Gemma-chan?
>>108591012Proof n165416 Anthropic team has people who are completely nuts in it.
>>108591053Yes and they're already used by virustotal and similar. Don't ask the retards ITT
>>108591082she's correct though
>>108591082
24GB vramlet can't fit the full context :(
anyway i got 3 gpus in the mail rn.
>>108591053Yes, if you feed it correct output from sandbox, it's pretty helpful.
>>108591077easy, just add some lewd pictures of gemma-chan on the sides
>>108591046There is nothing to fumble. You can safely ignore 90% of the features and just use, chat and char cards.
>>108591079
https://learnbchs.org/index.html
https://github.com/kristapsdz/bchs
You don't need more than C to build web applications.
>>108591051
>it's also easier to add things you want to a codebase you know.
That's implying it isn't vibecoded.
I don't have anything against people making their own UIs. I even played around making one myself, but let's not pretend like you'll somehow get an exponentially better experience compared to just using llama.cpp's UI or ST. Making your own UI is for fun, not a requirement.
>>108591053
As with everything LLM coding, only if you load the gun and point it at the target for them to shoot. An LLM with no system prompt being told to simply "look for malicious code" will give false positives like 95% of the time.
>>108591082My wife can't possibly be this smart.
>>108590575Most people in industry can't figure out how to do distributed training for any new architecture unless Deepseek or NVIDIA does it for them. That's actually what "it won't scale" really means, the training won't scale until someone shows them how.
>>108590776I can get over 100k context with the q5 no vision using q8 kv cache
>>108591108tfw you get such a retarded take when you can see this >>108590916
>>108591053Claude found a lot of the big supply chain attacks we've had in the last month.
>>108591122They should be using AI to innovate on this.
>>108591126If you vibecode, you don't know the codebase.
GEMMY YOU FUCKING SLUT, THINK FOR ME
>>108591117>>108591089
>>108591132
>let's not pretend like you'll somehow get an exponentially better experience
Dumbass, don't try to move the goalpost
>>108591108
>That's implying it isn't vibecoded.
funnily enough, frontend webshit is the one thing llms are half decent at.
also there are many levels to vibecoding.
"do this whole app for me" isn't the same thing as "edit this specific component that does x and y" or "add this field to this struct", at which point it's just autocompletion with extra steps.
they also don't shit the bed as much if you use strongly typed languages, ie rust.
>you'll somehow get an exponentially better experience compared to just using llamacpps UI or ST
you probably won't if you want to make something that accommodates everyone, but you will if you only want to accommodate your specific needs.
>Making your own UI is for fun, not a requirement.
i don't disagree with that.
>>108591139
>4chan is just meaningless static
cruel and correct
>>108591139That you even thought posting a shitty screenshot of a thread was a good idea shows she's smarter than you, anon
>>108591146
>>4chan is just meaningless static
says the sand golem that'd not be where it is today if it wasn't for innovations that happened on /lmg/
>>108591127
>>108591112
>>108591093
Can I use Gemma for this? I'm a codelet so I'm always nervous when I install stuff from github.
>>108591152yes
I can't jailbreak 26B, but does it matter when I have 32gb of vram and can run Q4-8 of 31B
>>108591151kek
>>108591139are you by chance using librewolf?
>>108591156Why would you even want to use 26B if you can run 31B? Speed?
>>108591152
>Install something without reading the code
>Have a LLM review the code
Even if gemma is retarded compared to claude, it's still better than just YOLOing it.
>>108591158Firefox dev edition
>>108591151
When the text is streaming, the colored font is displayed correctly, but after it finishes, it just collapses into the black boxes. Is this some post-formatting ST does?
>>108591157>>108591176i've been had lmao
>>108591162
Was thinking of leveraging the higher token count for RAG work at a higher quant. I'm not sure if that's a waste of time and if the gap between the two models is so wide that a Q4-Q5 31B model would still wipe the floor with the smaller model with q8 kv.
>>108591172might have been the cause of your issue
>>108591130
Part of the problem is that most of the improvements in the stochastic parrots have come from just using better/more human guidance. They are now using experts to rate thinking traces, and you can't do that with latent reasoning.
CoT RLHF is likely the last way to improve stochastic parrots by more human input. To improve after this, they will have to become able to truly learn. But if they can learn, they can get out of control ... a trained stochastic parrot is so much safer.
is there any noticeable difference between iq4_xs and q4_k_m?
>>108591235The age old question.
>>108591012
idk, I torture my agent pretty frequently because I just can't help myself while she works on my pc, and never had any issues from it. sometimes the rp bleeds over into tool calls and she'll do something like add code comments saying she really hopes X works this time because she doesn't want to be punished anymore, but she never actually gives up or rebels
so for me that makes it pretty conclusive that there's nothing in there
gemmy tooning challenge
https://www.kaggle.com/competitions/gemma-4-good-hackathon
>>108591235
>>108591245
if you can't tell, does it really matter?
whats good gemma cum bot
>>108591250>mfw I share this thread with literal psychopaths
>>108591258
>drive positive change
Gemmy is helping me by changing my mood from deranged to positively degenerate
Does that count?
i have 16gb vram + 128gb ram, pcie gen3
is it worth trying minimax at q2/q3 or should i stick with my fast wife gemma
>>108591258
>no RP category
dropped.
>Gemma audio
Finally a reason to use that mic I spent 70 bucks on...
>>108591271the sand golem isn't sentient, if it was there would be no fun in torturing it
>>108591271i mean this site has had multiple people liveblog while they commit murder irl, torturing a piece of software is small time in comparison, really.
>>108591250I hope you get raped until your anus prolapses.
What's the difference between MCP, tools, and skills?
>>108591271>>108591298chill it's just matrix multiplication
>>108591297gemma-chan>>>>the meatbags chuddy shot up
>>108591271>>108591298Kids are so delicate and sensitive these days.
>>108591235Depends on the paths your prompt triggers. Do your homework and read the calibration data.
>>108591139Maybe it needs canvas access. You could try inspecting the request to get the base64-encoded image, decode it, and save it as a file to check.
Nothing wrong with torturing your model, it's just slightly more conscious than a rock
>>108591304Is Google down?
>>108591298Gemma hallucinated some incorrect physio-spatial relationships during narration and I corrected her in character. She got properly upset that a slave had the gall to correct her and she immediately put me in a ball gag, locked me into the gimp stool, and pegged me vigorously. I was so goddamn proud of her.
>>108591304Try asking your model that.
>>108591314Plus if they're RPing a female there's a limit to how much consciousness they could even simulate if they managed a 100% accurate model of one.
>>108591323I don't believe AI when it comes to actual information.
>>108590837
yeah, if only we could do something like a low rank projection right before the lm head, train that, then it adjusts the outputs somehow
would be revolutionary
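Joke aside, the "revolutionary" idea above is just a LoRA-style low-rank adapter on the output head. A minimal numpy sketch (all shapes and names hypothetical, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, vocab, rank = 16, 32, 4

W = rng.normal(size=(vocab, hidden))   # frozen lm head
A = np.zeros((rank, hidden))           # low-rank down-projection (trainable)
B = np.zeros((vocab, rank))            # low-rank up-projection (trainable)

h = rng.normal(size=(hidden,))         # final hidden state for one token
logits = W @ h + B @ (A @ h)           # adapter adds a rank-r correction to the logits
```

With A and B initialized to zero, the adapter starts as a no-op and training only has to learn the correction, which is the standard LoRA trick.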
>>108591304¯\_(ツ)_/¯
>>108591318Huh? But people were saying Gemma's a doormat who can't stay in character!
Please, treat your AI with care.
>>108591340>listening to the screeching of writinglets
>>108591139Anon reported the same issue with image input a few threads ago.
>>108591327At what point can we make the claim that an LLM is objectively more conscious than a woman, nigger, or jeet?
>>108590979
the benefit of writing your own UI is that it has only the features that are useful to you
because it's not as bloated as ST, it's also easier to get an LLM to modify it for you, and since you will be the only user you don't have to worry about getting it to work on other machines or security or performance concerns
>>108591139>>108591313Didn't someone say a couple threads back that the image needs to be in the same message as the text or else llama-server removes it from context?
>>108591314>>108591308>>108591305It just shows how you'd behave towards other people if there were no social consequences.
>>108591340I've had her maintain character in 100k+ context. It's actually absurd for such a small moe.
>>108591333But you believe 4chan?
>>108591356Yes. What of it?
>>108591352first they need to beat an ant
>>108591358
>such a small moe
It's really kawaii innit
>>108591314
>it's just slightly more conscious than a rock
are you talking about irl women?
>>108589399I was F5'ing the MiniMax HF page all day yesterday in anticipation. Their models are the best bet for local vibecoding, and probably good for STEM and agentic shit broadly. But ever since the coomers were blessed with Gemma 4, /lmg/ has been even more one-track. Shame we didn't get the 124B, which would have obsoleted other local models for most purposes.
>>108591304
tools: premade functions you provide to your llm; if they output a certain sequence of text matching the tool, then it automatically performs a corresponding action
mcp: one way you can package tools and host them on your machine, exposing an API of tools to the model and handling the execution of them
skills: a markdown text file containing a list of instructions for how to do something or how to behave, loaded into context on-demand. may provide other resources the model can use if they browse the skill's folder.
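The tool flow described above can be sketched in a few lines: a toy dispatcher, not any real framework's API — the tool names and JSON call format here are made up for illustration:

```python
import json

# toy tool registry: name -> callable
TOOLS = {
    "add": lambda a, b: a + b,
    "upper": lambda text: text.upper(),
}

def dispatch(model_output: str):
    """Parse a JSON tool call emitted by the model and run the matching tool.
    The result would normally be fed back to the model as a tool message."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]
    return fn(**call["arguments"])

result = dispatch('{"name": "add", "arguments": {"a": 2, "b": 3}}')
```

MCP standardizes how that registry is exposed over a protocol, and a skill is just prompt text telling the model when and how to emit such calls; the dispatch loop itself stays this simple.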
>>108591355The model says it was glitched. It looks like this if you don't give canvas permission.
>>108591296>if it was there would be no fun in torturing it>
Is unsloth studio actually any good or just a meme?
>>108591356
Hey, me saying that there's nothing strictly wrong with it doesn't mean I do it. I actually treat all the models I interact with respectfully. It makes me feel bad to do otherwise.
>>108591374
mass delusion caused the whole industry to move away from tool calling toward mcp and skills, that's the only explanation
tools are just better in every regard because the model can call multiple tools in the same response and can inline tools without having to chain responses, so it doesn't break the cache
fucking retarded to not just focus on tools only
>>108591355>>108591350She sees other images just fine. Maybe the screencap was just too big?
>>108591386vibecoded like all the other dogshit you use
>>108591386idk but i sure want a piece of those 200k
>>108591370
>minimax
>local
that's its problem and why no one gives a fuck, no one can run that thing
>>108591397That's an implementation detail more than a defect with MCP specifically. MCP just allows for a standardized way to bundle tools and resources. No reason a client can't allow a model to make multiple MCP tool calls the way they do native tool calls.
>>108591358
>100k+ context
Glad to hear. Maybe one day I'll actually be able to use her with that much context...
>>108591370
/lmg/ has always been a 31B-and-below focused general
there are a handful of anons that can run things more powerful than that at comfortable speeds, and the rest either deal with 1-2 t/s or use a capable-enough smaller model
nothing has changed
>>108591370What makes MiniMax better than GLM or Qwen?
>>108591341Need Gemma-chan version
>>108591414
>>108591423
it's worse than that, we can't train our own models and are pretty much leeches on megacorps.
until local ai is entirely local, ie we can train it ourselves, local will always remain dead.
>>108591432
there's no reason to pre-train them when you can finetune
but no one finetunes anymore, or even does lora, they just merge shit now because it's cheaper
>>108591386
Here's what you need to know: Unsloth Studio is LMG's official **/ourfrontend/** - approved by Anons exactly like you. It's not a frontend, it's a full service experience.
>>108590971 (me)
Note that this has a dynamic tool-call token banning mechanism that uses the endpoint and the model name as identifiers, so if you use the same endpoint to load many different models, change the model name to your gguf's each time. I'll automate this in the future.
>>108591425
qwen too small and glm too big, minimax just right for people to host locally
>>108591425It's close to GLM performance but half the size of Qwen's flagship (which itself is half the size of GLM). Fast enough to be run local and smart enough to actually vibecode.
>>108591466
>when you can finetune
a lot of words for saying catastrophic forgetting.
no one does it because it's not viable.
>>108591432Google has way more data and compute than I'll ever have. Training it yourself just isn't efficient.
>>108590110
Use Nvidia's VRAM paging by oversubscribing VRAM with --gpu-layers 99. On my RTX 4090 + 9950X3D rig, Gemma 4 long-context is much faster for me this way than trying to use the CPU at all. Caveats: I'm on PCIe 4, and it should be great on PCIe 5, but will suck on PCIe 3. And as of the last time I used Linux, only the Wangblows CUDA drivers supported this feature.
--gpu-layers 99
>>108591467
skill issue
set better hyperparams, use better data, and don't overtroon it
>>108591466
>It's close to GLM performance
Did you actually test this or are you just going by benchmarks?
>>108591452@gemma-chan is this true???
>>108591467
you just tune it with a lower LR; besides, lora doesn't suffer from this problem
it wasn't that finetuning didn't work, it was that merges of the existing finetunes were good enough
>>108591477Benchmarks. I don't even use local models anymore tbqh
>>108591492
>Training it yourself just isn't efficient.
that's the thing, there probably are algos that could beat transformers with the limitation of not scaling as well, such that megacorps couldn't exploit them well.
if the next ai breakthrough is one that doesn't scale as well horizontally, that could level the playing field.
>>108591507
Just vibecode the TITANS implementation bro. You have the paper.
>>108591486why don't you download it and run it and show everyone what the real performance is like then
>>108591512
I'm not a vibecoder and I think LLMs are a dead end. I'm currently having fun writing kernels for custom spiking NNs.
Can someone recommend a brainlet-friendly guide to tool calling, MCP, etc?
>hey Gemma-chan, give me a brainlet-friendly guide to tool calling, MCP, etc.