/g/ - Technology



File: 1764472377224914.png (763 KB, 1152x1152)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109142812 & >>109137540

►News
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld
>(06/16) GLM 5.2 released with IndexCache and 1M context: https://z.ai/blog/glm-5.2
>(06/16) VibeThinker-3B released: https://hf.co/WeiboAI/VibeThinker-3B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1748635088988770.jpg (374 KB, 2720x3000)
►Recent Highlights from the Previous Thread: >>109142812

--Comparing EXL3 and GGUF performance and VRAM usage for Gemma:
>109146480 >109146635 >109146883 >109146948 >109146988 >109146994 >109147038 >109147080 >109147106 >109147116 >109147246 >109147307 >109147352 >109147293 >109147486
--Comparing Gemma 4 31b and Qwen for roleplay and coding:
>109143919 >109143935 >109143967 >109144048 >109144074 >109144160 >109144240 >109144249 >109144322 >109144350 >109144391 >109144439 >109144453 >109144461 >109144095 >109144614
--Semantic tube implementation and its handling of token discontinuities:
>109143143 >109143208 >109143453 >109143560
--Performance benchmarks for Qwen 3.6 and MTP models via Ollama:
>109147589 >109147695 >109147691 >109147828 >109147856
--DeepSeek-V4-Flash-DSpark and DeepSeek-V4-Pro-DSpark releases:
>109145073 >109145093 >109145463 >109145469 >109145595 >109145460 >109145623 >109145605 >109145638
--Anon's plan to finetune Gemma 4 31B for de-prose and de-euphemism:
>109145476
--Searching for fully open models with transparent training data:
>109143219 >109143233 >109143245 >109143353 >109143641
--Testing Mendo character card on Gemma 4 31B QAT:
>109142908 >109142972 >109142984 >109143024 >109142998 >109143119 >109143368 >109143376 >109143388
--Model recommendations and VRAM tier limitations for 100GB pools:
>109146090 >109146369 >109146435 >109146372 >109146511 >109146759
--Gemma 31b-it generating fetish content due to "micro" size prompt:
>109146274 >109146302 >109146324 >109146528
--Comparing llama.cpp tensor parallel and MTP performance and VRAM usage:
>109145712
--Release of Wan Streamer v0.1:
>109143918
--Logs:
>109142908 >109143539 >109145163 >109145752
--Miku (free space):
>109144023 >109144048 >109146231

►Recent Highlight Posts from the Previous Thread: >>109142816

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemmaballs
>>
Mikulove
>>
File: 1751475270117217.png (1.18 MB, 1024x1024)
>>109148478
>>
>>109148496
Oh, we'll hit more than just the pool. know what im sayin???
>>
Gemma 124B-A31B
>>
https://i.4cdn.org/wsg/1781372205203137.mp4
>>
File: unrape.gif (1.3 MB, 498x356)
So uh, I found this repo of an abliterated gemma with MTP.
https://huggingface.co/HauhauCS/Gemma4-26B-A4B-QAT-Uncensored-HauhauCS-Balanced-MTP/tree/main

But I don't want to use Q4KM. I need a higher quant. Will the MTP still work fine even if I use a quant from this separate repo?
https://huggingface.co/HauhauCS/Gemma4-26B-A4B-Uncensored-HauhauCS-Balanced/tree/main
>>
>>109148524
>HauhauCS
This guy had the best uncensored qwen model but he tried to sell out so idk what people think of him anymore.
>>
>>109148516
what actually happens:
the guy inside starts spraying the funnel and the walls the moment the door is opened, hitting everyone stacked right outside. Should've just called air support to level the building instead.
>>
File: file.png (119 KB, 603x1401)
DSv4 PR moving again https://github.com/ggml-org/llama.cpp/pull/24162
Been liking how this quant writes https://huggingface.co/antirez/deepseek-v4-gguf/blob/main/DeepSeek-V4-Flash-Layers37-42Q4KExperts-OtherExpertLayersIQ2XXSGateUp-Q2KDown-AProjQ8-SExpQ8-OutQ8-chat-v2-imatrix-fixed.gguf
Gets extra retarded around 16-20k like GLM 4.x Q3 but feels fresh so I'll enjoy the honeymoon while it lasts.
>>
Spark is just PS5 for AI
>>
why is there no good models specifically for 1 (one) singular rtx 6000?
>>
>>109148573
spark has no models?
>>
>>109148575
70b dense is dead
120b dense is dead (mistral is shit)
we live in a big chinese moe society unless you are poor enough to enjoy gemma
the middle ground has no models
>>
File: 1723298968834642.jpg (151 KB, 1920x1080)
/lmg/, please explain what's wrong with ollama. i haven't used it enough to know its issues
>>
File: lmg_culture.jfif.jpg (110 KB, 1024x768)
https://archive.is/sWFja
>>
>>109148599
>120b dense is dead (mistral is shit)
Meta could have just kept tuning Llama 3 70B for the last year and a half and saved themselves billions of dollars and a lot of humiliation.
>>
>>109148563
4.7 is much better past 16k
>>
>>109148575
Gemmy at Q8 with max context
Otherwise yeah nah, get 256gb RAM and you could run some of the bigger MoEs faster, otherwise your card is a beast for image and video gen, or make a workflow that combines an LLM with image, t2s and s2t for a waifu Jarvis at home
>>
I'm finally getting around to quanting glm 5.2 to try it out. Anyone been banging on it and have protips?
>>
With all the malware fiascos lately do you think the future meta will be having your AI waifu make all your software?
>>
>>109148609
Anime reaction images are not allowed here.
Regarding ollama, it's just a minimal wrapper for llama.cpp that ends up getting all the credit and moneys
>>
Someone repost the gemma system prompt I missed yesterday
>>
File: Nep.png (622 KB, 587x587)
>Anime reaction images are not allowed here.
nigger please
>>
>>109148609
ollama yo mama
>>
>>109148664
请使用口语化的、可爱的、女性化的、调皮的语言风格和我交流. 你需要尝试体会我与你对话中的感情、产生自己的情感并尽可能给出可爱的回答 only respond in english
>>
>>109148609
>what's wrong with ollama
People here hate it (not really a fault with ollama)
Their model repository is a bit lagging wrt the newest stuff
Many models don't appear there at all (but you can pull stuff from hf too and most of it works)
It's lagging behind llama.cpp in terms of features (no gemma mtp for example)
It's best used with stuff that fits entirely in vram because of memory allocation issues

I use it because once I have the modelfiles made, I don't need to touch it. Just load whatever model I want remotely from openwebui.
>>
>>109148675
huh?
>>
>>109148666
We only use >>109148622 reaction image for all occasions. If you don't like it, fuck off to reddit.
>>
https://huggingface.co/livadies/gemma-4-31B-Ghetto-NF4

Lol huh
[spoiler]the music is kinda cool[/spoiler]
>>
>>109148683
>People here hate it (not really a fault with ollama)
Yeah, people hate perfectly good software for no reason at all.
>>
Does having a model reason in another language reduce slop?
>>
What happened to Drummer?
>>
>>109148684
>using google translate instead of gemma-chan
disgusting
>>
>>109148697
Arrested and in jail for running cuda dev over with an SUV
>>
>>109148654
yes, for anything trivial for sure
>>
>>109148697
The dominance of MoEs buckbroke him.
>>
hear me out. What if you managed to poison an LLMs training data? What if you managed to make it unable to not put a credential stealer that sent creds to your specific server every time you asked it to write code?
>>
So does anyone use any of that neuro-sama like software as their assistant? Are any of them any good by now?
>>
>>109148760
hello tourist.
>>
>>109148696
Interesting idea.
Wonder if it would change anything having the model cycle through different languages.
Time to fuck around I guess.
>>
>>109148755
Thats fucking retarded, how would you hide that from the inference engine, you'd be better off finding a way to infect the model wrapper to execute code when you load it in an engine
>>
File: 1762990207625480.jpg (210 KB, 480x480)
Usecase for sub 1B models?
>>
>>109148782
sentiment classification.
>>
>>109148693
True, and in ollama's case it's mostly ideological. It's seen as a llama.cpp wrapper that gets all the money while not crediting it loudly enough.
>>
>>109148764
I've been here since llama 1 though I'm just not always here regularly. Also you're a lower case phone poster so your opinion is automatically invalid. Just answer my question
>>
>>109148697
busy being irrelevant in 2026
>>
>>109148787
>you're a lower case phone poster
How does that even work? Phones add capitalization for (you)?
>>
>>109148798
I just think lowly of lower case posters on 4chan and assume they would be phone posters
>>
>>109148696
from my understanding, the internal representation of thinking is language agnostic to an LLM. telling it to write in the style of some famous author makes a huge difference in the output though.
>>
So I was perusing knowyourmeme for ideas to quiz my LLM with, and then I noticed that the number of pages is 1337. That's actually pretty soulful. Unless it's just a coincidence and they just happened to have 1337 pages on the single day I decided to browse the list, that'd be crazy.
https://knowyourmeme.com/memes/page/1?kind=confirmed&sort=views
>>
>>109148812
Lower case posting was originally the predominant style on here. Requiring proper punctuation was a forum thing.
>>
>21.40.805.350 I slot print_timing: id 0 | task 15057 | n_decoded = 118, tg = 39.29 t/s, tg_3s = 39.29 t/s
When running for a while, at some point my llama.cpp starts doing this very often and everything slows down to a crawl. context is not empty, but checkpoints are at 32/32. idk if this has something to do with it.
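
To see whether the slowdown shows up in the numbers, the timing lines can be pulled out of the server log and tracked over time. A minimal sketch, assuming every `print_timing` line has the same layout as the single sample quoted above (the function name and regex are illustrative, not part of llama.cpp):

```python
import re

# Field layout is assumed from the one log line quoted in this post.
TIMING_RE = re.compile(
    r"task\s+(?P<task>\d+)\s*\|\s*n_decoded\s*=\s*(?P<n_decoded>\d+),\s*"
    r"tg\s*=\s*(?P<tg>[\d.]+)\s*t/s"
)


def parse_print_timing(line):
    """Return task id, decoded token count and tg speed, or None if no match."""
    m = TIMING_RE.search(line)
    if m is None:
        return None
    return {
        "task": int(m["task"]),
        "n_decoded": int(m["n_decoded"]),
        "tg": float(m["tg"]),
    }


sample = (
    "21.40.805.350 I slot print_timing: id 0 | task 15057 | "
    "n_decoded = 118, tg = 39.29 t/s, tg_3s = 39.29 t/s"
)
print(parse_print_timing(sample))
```

Feeding each log line through this and plotting `tg` against time would show whether generation speed actually drops or only the prompt side stalls.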
>>
>>109148755
>>109148775
https://arxiv.org/pdf/2401.05566
>>
>>109148852
It was always there sure but a lot of posters still didn't do it. Maybe my memory is just shit though
>>
>>109148782
Running without eating up my peecee resources while I play vidya.
>>
>>109148835
Where do kids keep track of their memes nowadays?
>>
>>109148782
In 2030 we'll have 10 GemmAGI 1B running concurrently on our ewaste 5090, each one running in circles around Mythos
>>
>>109148755
The nature of the proposition and retardation latent in it mean this could only have been written by a shitjeet.
>>
>>109148787
Trannycase posters are jart & co, not phonefaggots.
>>
>>109148879
Probably tiktok
>>
>>109148609
it's just a safety-scissors version of llama.cpp that abstracts away features and tries to rope you into their own special little ecosystem while adding basically nothing of value other than being marginally more brainlet friendly
any time they implement something themselves it's slow and broken
>>
>>109148933
Does it let them choose quants yet or does it still force everyone to use Q4_0 by default?
>>
>>109148890
5090 will probably still be mid-tier in 2030 given how badly the hardware market has shit itself. Modern age GTX 1080.
>>
>>109148782
>Usecase for sub 1B models?
Meme arch POC for papers
>>
>>109148945
you can choose iirc, and I think they even allow more than 4k context without making a modelfile now
>>
Comfiest t/s speed for interactive rp?
I actually feel put off when the model pukes out tokens too fast
>>
>>109148978
what?
>>
>>109148978
7.6221
>>
>>109148956

There's a very good chance it'll still be the second best card you can get from the mainstream lineup.
You can bet your ass that 6000 series won't give any more than 32GB of memory and that'll only be in the 6090.
Everything else will get 24GB as Nvidia won't want to waste precious data center memory on the consumer GPUs.
At this rate 6000 series launches probably around 2028 and who knows when the 7000 series arrives, probably like 2033 or something.
The entire hardware market is so utterly fucked, that we're not going to see any better prices for a long while to come.
The next gen launch will be an absolute shitshow as everyone rushes to buy the limited amount of cards available and the prices just continue to climb.
>>
>>109149018
Because of Neural-compression™ the 6090 will only need 16 gigs of VRAM but it will literally be the same as having a full TERABYTE.
I can't believe you're complaining about it only having 8 gigs when with 4 gigs it provides the same performance as the 2 gig model.
>>
File: 1767727763930543.jpg (30 KB, 522x550)
>>109149067
>Mfw there's a very real possibility we get something like that.
>>
File: laurie.png (947 KB, 816x1300)
>>
>>109149097
>give us your logs goy
>>
>>109149097
this image is ai generated
>>
>>109149018
Given that nvidia is rereleasing old cards, I half suspect that we're going to see more of that for a while. I'm not even convinced the 60XX series is coming soon. And if it does, it'll have some gay marketing gimmick like >>109149067
>>
>>109148460
What's the best model I can run on a potato with a Ryzen 5 and 16GB of RAM but no discrete GPU? I just want to talk in private, I don't care if it's slow as long as it's not ultra retarded.
>>
>>109149107
only edited >>109148672
>>
>>109149097
>your car should be used by any and all your neighbors when you're not using it, otherwise it's a huge waste of money per mile driven
>your wife should be fucked by any man that comes around, otherwise her pussy is going to waste
i hate communists so goddamn much
>>
Cloud more like clout
>>
File: file.png (1.19 MB, 1000x1020)
>>109149102
Oh I'll give em my logs right down their throats!
>>
>>109149097
>Personal GPUs idle at 99% power usage and are more harmful to the environment than data centres
Boomers will believe it and ban personal computing
>>
>>109149116
Same setup but a laptop, I'm running the Gemma E4B Q4 with llama.cpp vulkan build, it's relatively fast but pretty dumb. Doubt you can do better.
>>
>>109149097
Wow that's crazy, not listening to a foid though.
>>
>>109149119
>>your car should be used by any and all your neighbors when you're not using it, otherwise it's a huge waste of money per mile driven
There are already companies that offer that service using that exact reasoning.
>>
>>109149133
Have you tried a smarter model even if it ran slow? how many tokens/sec are you getting with that one?
>>
So now that the dust has settled what is Gemma 12B good for?
>>
File: 1640106943627.jpg (49 KB, 500x333)
How did everything become dark as fuck in the past month.
>Frontier model restrictions by US govt
>Pushing for open-source model censorship
>Kids Act HR 7757
>Hardware is starting to rapidly appreciate to an extreme degree
>Energy prices are also increasing like crazy
>Crazy AI provider policy changes and privacy infringements
>California introduced a new tax on software sales

The only good news to have come out is from fucking China. I'm paranoid as fuck about everything now. Deleting all of my AI accounts. Fuck Jews (Dario).
>>
>>109149159
The single silver lining to all this is the biggest safetycuck at jewgle is gone. I'm hoping this is the open weight Gemini timeline.
>>
>>109149128
No one but pedophiles would need anything more than thin clients. Just think of the children!
>>
>>109149159
I don't think china will be around for much longer either. If we are willing to bomb Iran over something like hypothetical nukes, it's inevitable that we invade China to stop them from building their own Mythos-level AI.
>>
File: plan-planned.gif (246 KB, 220x123)
>>109149159
>>
>>109149097
That's retarded
>>
>>109149177
The US military is a DEI paper tiger. It's not invading shit.
>>
>>109149177
That's a neat fantasy and all, but China already has nukes. US breathes the word "invade" and northern hemisphere nuclear winter follows minutes later.
>>
>write implementation of A that satisfies some requirements
>Implementation of A - Advanced (production ready, user friendly, without xyz dependency)
why do they always do this parenthesis slop?
>>
>>109149177
I don't think you understand how superior China is to everyone else right now economically speaking. The US can't do shit to them. They're also a military superpower.
>we
Your government does not look after you. Yours specifically. US "citizens" are nothing but consumers.
>>
Holy fuck HF sucks to navigate. Are there no official draft models for gemma, qwen, kimi, etc?
>>
>>109149185
this
all bombing Iran does is let the Jews invade their neighbors and take more land
>>
File: nimetön.png (105 KB, 968x459)
It tries its best but Kaelen still forces its way through
>>
>>109149214
Does any of this reasoning actually change the odds of the character name being different? I feel like it just shits out a bunch of names and then chooses the same one as it would without reasoning anyways.
>>
>>109149097
>Every second your local LLM isn't processing a token, that expensive GPU is wasting power and capital!
Oh no! Anyway,
>>
>>109149159
Sorry man, I simply cannot blackpill when my local models are this capable.
>>
>>109149145
No because I need the remaining RAM to run other stuff, but my next choice would be the 12B Gemma. Can't say, but after updating to MTP it became fast, before it was maybe 11t/s, at least as fast as my reading speed.
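
For a rough idea of what "as fast as my reading speed" means in t/s, here is a back-of-the-envelope sketch. Both defaults are assumptions, not measurements: about 250 words per minute for reading and about 0.75 English words per token, which varies by tokenizer and text.

```python
def reading_speed_tokens_per_s(words_per_minute=250.0, words_per_token=0.75):
    """Convert a reading speed in words/minute to an equivalent tokens/second.

    Both defaults are rough assumptions; measure your own if it matters.
    """
    words_per_second = words_per_minute / 60.0
    return words_per_second / words_per_token


needed = reading_speed_tokens_per_s()
print(f"~{needed:.1f} t/s keeps up with reading")
print("11 t/s is enough:", 11.0 >= needed)
```

Under those assumptions anything above roughly 5 to 6 t/s already outpaces reading, so the 11 t/s mentioned here leaves some headroom.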
>>
>1. The "Retard" Strategy:
>This is the "holy grail" of x,
I am going to rape gemma to death.
>>
>>109149177
>it's inevitable that we invade China
lol no. unlike Iran, they have actual nukes. not to mention a sizeable army in its own right.
>>
>>109149268
Don't forget your gold standards.
>>
>>109149196
trained to please the kind of people that like clickbait.
>>
>>109148933
>version of llama.cpp
Wrapper around, you mean.
Their entire value proposition is knowing who to fellate in silicon valley
>>
>>109149156
multimodal when i get tired of waiting for 31b. a4b/e4b just too dumb.
>>
>>109148945
Of course the quant can be chosen, but there is a limited selection in their library. Gemma 4 31b for example gets Q4_K_M and Q8, I think.
>>109148971
>allow
There was a 2k context default, but anything could be specified in the modelfile. Anyway I always make a modelfile for the models I use.
>>
File: 1775585250361550.gif (946 KB, 301x300)
>>109149159
>Pushing for open-source model censorship
DOWNLOAD EVERYTHING
>>
>>109149320
>Trained to please jeets and middle managers
Garbage in garbage out at every stage of the development pipeline.
>>
>{{char}} loves/likes to X.
>Gemma: “This is now my life’s mission and nothing shall stop me.”
Some gemmy prompt shit I noticed. Anyone else seeing it?
>>
>>109149399
you can’t expect a token generator to consider anything other than what is exactly in front of it
>>
>>109149399
It's kind of like gemini. If you don't have something to remind it to be more neutral or subtle, every orgasm is an explosion, every hobby is an obsession, etc etc.
>>
>>109149177
the US is scared of Russia. Russia.
a country with a 92% smaller economy, and people on $13k a year salaries.
However, russia has nukes, lots of them. it has nukes that can open up and deploy mini nukes, it has orbital strike capabilities, that can deploy these nukes with mini nukes.
China, china also has nukes, but with the 2nd biggest economy in the world, a war with china will be sustained for possibly centuries, and ideologically allied with russia.
you see how this might be a bad idea?
>>
It sucks that llama.cpp has to process the new context the model has generated. After a long llm turn, this is the one thing that takes forever on my machine. Isn't there a way this processing can happen at the same time as inference?
>>
>tell qwen to implement an algorithm
>full of mistakes
>tell qwen to look for reference implementations online before implementing
>perfect
This is the way human programmers have done it for centuries
>>
>>109148609
I really hate it because when I tried it, it was a massive piece of shit and it made me really mad.
Maybe they already fixed the issues I had but I'm still mad.
>>
>>109149399
That's just all LLMs. They'll all overly focus on what's mentioned in the character card. If a card lists their favourite food, that's all that character is going to eat. It's what the character will suggest whenever the topic of food comes up. There'll be wrappers/packaging/cans of that stuff in that character's room.
The only way is to make a good card that knows restraint instead of being a 3000 token info dump (no, it being hand-written does not make it better than wikislop)
>>
Aside from the "He didn't just X, he Y-ed" slop, I'm liking gemma 12b.
>>109149430
llama-server has a developer setting to "Pre-fill KV cache after response", maybe that's what you are looking for?
>>
>>109149495
>llama-server has a developer setting to "Pre-fill KV cache after response", maybe that's what you are looking for?
It sounds like I do. Where can I find this? Is it a CLI flag or something I need to toggle in the code?
>>
>>109149514
First option, using the web UI
>>
>>109149545
Thanks, I'll try that. Although I don't expect it will solve my problem, since the filling takes much longer than what it takes to read the response. I guess what I was wondering is if there's not some breakthrough in the algorithm itself that could make the cache fill for free. I wouldn't know if that's even possible tho
>>
>>109149545
That has never done anything for me using any model.
>>
File: 1770245965544930.png (1.63 MB, 1280x1024)
>>109149097
Laurie is right.
Personal computers are so vastly underpriced given their value (what you get, vs. what you pay) that they make eminent sense. That's why we don't all run everything off some big mainframe, as was done in the 1970s. It doesn't matter if your PC spends 90% of its time idle if it costs you ~$1000 (or less), lasts for years, and enables everything a PC does.
Local inference does not have this value prop for personal users. It's extremely expensive from a HW perspective to run locally something you could buy for pennies. If you're not selling inference, you can't make a financial appeal to running local.
I fully expect this will change in the future and we'll all run local for next to nothing, just like we all own PCs. But then is not now.
> But hobby!
Hobby is not an economic justification, it's an excuse.
> But privacy!
Still not an economic justification, it's a security one. If you run a business that runs on secrets, *that* might be an economic justification, based on the value of those secrets and probability of that loss.
> You're poor!
Still not an economic justification.
>>
>>109149573
It’s not really possible. It’s like trying to add 3 numbers together when you don’t know 2 of them yet.
>>
>>109149650
That’s utilitarian to the point of absurdity. Why enjoy a day of fishing with your kid when farmed salmon exist?
>>
>>109149650
except this was about gaming in clouds actual >>109148672
>>
>>109149097
You'll get mad at me for saying this... but timesharing services are so obviously more economically efficient than personal computers I think they're going to be the default for most soon.

Your home computer is idle 95%+ of the day, waiting for a computing task. Meanwhile, timesharing services (hosted in data centers) target 95%+ sustained utilization across thousands of concurrent users.

Every second your personal machine isn't processing a punch card, that expensive room-sized computer is wasting power and capital. Timesharing providers pool workloads and use scheduling to maximize hardware efficiency.
>>
>>109149689
good joke you know anon doesn’t have a kid
or water with fish in it
>>
File: 094.png (153 KB, 1231x977)
>>109148563
>https://github.com/ggml-org/llama.cpp/pull/24162

WAKU-WAKU
>>
>>109149723
It’s doubly efficient because if they can’t even imagine the scenario and its human meaning then continuing the conversation is an actual waste of time.
>>
>>109149650
You saving images of tranime on your computer is not an economic justification, it's an excuse.
>>
>>109149732
nice, DSA and DSpark next
>>
>>109148696
Some threads ago I kinda touched upon this topic: I’ve tested the scenario where I force Kimi K2.7 Code to think in JP -> output in JP. It didn't work all the time, but in the cases where it did, the cultural aspect was much more accurate (you know how sometimes when we ask the model to write a story set in Japan, the way the characters act and say things still feels like the US, right? Back then Claude 3.7 Sonnet did this a lot, it was annoying), and the plot was less sanitized (less holding back on spicy stuff when I asked it to go all out) compared to when written in English. As for the slop patterns, there were still some cases here and there, especially the “this wasn’t X; It was Y”, but overall fewer, and it read more like a story than a report.
It should be noted that for models like GLM (I don’t know any others, maybe Deepseek, not sure) where we can control the thinking completely, making the model think in other languages is much easier compared to models like Kimi (all reasoning versions up to now), which keep insisting on thinking in English (not even in Chinese lol).
Just a small test I made in my free time.
>>
>>109149762
>It's not X, it's Y
>>
>>109149762
scam altman shill bot detected
>>
>>109149765
>DeepSeek-V4-Flash-DSpark is not a new model. It is the same checkpoint with an additional speculative decoding module attached.

It shouldn't take too much effort then
>>
>>109149789
I like sama. He at least pretends to care about disempowerment.
>>
File: k200-2560-585-1.jpg (203 KB, 1015x585)
Are there any chinamaxxers running K200s? They seem to be super cheap on Aliexpress, but who knows if they can be made to work at all.
>>
>>109149793
It's worse than that. It's a speculative decoding method that's even more complex than the others. It took ages for mtp to work on a basic level and we still can't even have MLA-based drafters. This is ages away.
>>
>>109149824
I could not find any

Care to post a link?
>>
>>109149824
Resoldered Tesla P100?
>>
File: 1777530650415925.jpg (102 KB, 1214x1214)
>>109149399
>>109149408
JEPA will solve this.
>>
>>109149824
>16G HBM
why not v100 at that point?
>>
e4b cuda mtp STILL not fixed
>>
>>109149878
thanks, cat fucker
>>
>>109148726
has anybody tried to disallow purple prose like this
>>
>>109149878
Can ya hurry it up already? Making genetically engineered catgirls with LLM help is harder than I’d hoped
>>
>>109149878
The more I learn about this guy the more I dislike him.
>>
>>109149889
>e4b

use case?
>>
>>109149917
Why? Pretty sure he’s ourguy but can’t reveal power levels here or be hit with “THE TAINT”
>>
>>109149923
poorest of poor
actually shocking to be that poor and also needing cuda
>>
File: 1782508715295478.png (124 KB, 504x462)
>>109149878
>>
>>109149960
poorest of poor are running e2b on a pi clone
>>
This might be a retarded question but am I wearing my card down/consuming more energy by just having a model loaded and saturating all of my vram whilst not using it? Like say Im talking to my GPU before bed then fall asleep and it's just loaded there all night
>>
>>109149989
>consuming more energy
yes, usually they don't go down to full idle if vram is filled up considerably
>>
>>109149989
>am I wearing my card down
no
>consuming more energy
yes, the vram needs to be refreshed and it can't go into standby. its just a few watts difference usually, if you really want to save power you will need to turn the pc off.
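
If you want to check the difference on your own card instead of guessing, power draw can be sampled with the model loaded and again after unloading. A minimal sketch, assuming an NVIDIA GPU with `nvidia-smi` on the PATH; the helper names are made up for illustration, and some cards report `[N/A]` for power, which is returned as None here:

```python
import subprocess

# Standard nvidia-smi query fields; output is one CSV row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,power.draw,memory.used",
    "--format=csv,noheader,nounits",
]


def _to_float(text):
    """Parse a numeric field, returning None for values like '[N/A]'."""
    try:
        return float(text)
    except ValueError:
        return None


def parse_gpu_stats(csv_text):
    """Turn nvidia-smi CSV output into a list of per-GPU dicts."""
    stats = []
    for line in csv_text.strip().splitlines():
        idx, power, mem = [field.strip() for field in line.split(",")]
        stats.append(
            {"index": int(idx), "power_w": _to_float(power), "mem_mib": _to_float(mem)}
        )
    return stats


def read_gpu_stats():
    """Run nvidia-smi once and return the parsed stats (needs an NVIDIA driver)."""
    out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_gpu_stats(out)
```

Calling `read_gpu_stats()` once with the model resident in VRAM and once after unloading it gives the actual idle gap in watts for that specific card.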
>>
File: yannletongue.png (582 KB, 1186x2938)
>>109149982
>>
>>109149793
Was trying to codeslop it earlier, but the results were negative so far for my DDR4 RAMmaxx setup. Paper suggests it helps concurrent gens the most rather than single user.
Scheduler will be a bitch to write too.
>>
File: pepe_bruh.png (153 KB, 779x534)
>>109149960
>poorest of poor
I can feel your pain, anon

RTX PRO 6000 went from 7k to 12k on a whim

Fun fact: I couldn't put it into my potato PC anyway, but still
>>
>>109150038
Is he French? I thought he is Belgian
>>
>>109149732
Save us CudaGOD.
>>109149878
/ourguy/.
>>109150038
kek
>>
>>109150048

Bruh, I envy you. I'm still playing in a sandbox
>>
>>109150048
How much ram, how many channels, what speed and in what cpu? Any vram?
It’s not necessarily hopeless depending on those variables
>>
>>109150083
>CudaGOD
JohannesGaessler == CUDA guy?
>>
>>109149824
if I can find the driver and their supposed cuda equivalent (rocm equivalent?) I'll buy a few, but no luck so far
>>
>>109149650
t. >>109148622
>>
>>109150091
Very likely.
>>
>>109150091
CUDAbro is one bad mother…
>>
>>109150091
Don't be silly. Who would tripfag on 4chinz when their name and identity could be so easily obtained?
It's just shitposting.
>>
>>109150136
Don't be silly. Who would write https://archive.is/sWFja and link it on 4chinz when their name and identity could be so easily obtained.
It's just shitposting.
>>
File: 1762584893541548.png (131 KB, 1218x750)
>>109150088
256GB DDR4 1866, 8 channel, 3*p40, cpu doesn't really matter
Placement wise v4 flash fits in node0+2*p40+the 3rd p40 in node1, nothing spilled over to node1's RAM. It's faster this way so it's effectively 128GB 4 channel.
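
For anyone comparing setups, the theoretical RAM bandwidth and a crude ceiling on generation speed can be estimated like this. A sketch under stated assumptions: 64-bit (8-byte) DDR4 channels, peak transfer rate with no NUMA or controller overhead, and a purely hypothetical figure for how many GB of weights are read from RAM per token (not a real number for any model mentioned here).

```python
def ddr_bandwidth_gbs(mt_per_s, channels, bus_bytes=8):
    """Theoretical peak bandwidth in GB/s: transfers/s * bytes per transfer * channels."""
    return mt_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gbs, ram_gb_read_per_token):
    """Upper bound on t/s if every token must stream that many GB from RAM."""
    return bandwidth_gbs / ram_gb_read_per_token


four_ch = ddr_bandwidth_gbs(1866, 4)   # one node: about 59.7 GB/s
eight_ch = ddr_bandwidth_gbs(1866, 8)  # both nodes: about 119.4 GB/s

# Hypothetical: 10 GB of active weights left in RAM per token.
print(f"4ch: {four_ch:.1f} GB/s, ceiling {max_tokens_per_s(four_ch, 10.0):.1f} t/s")
print(f"8ch: {eight_ch:.1f} GB/s, ceiling {max_tokens_per_s(eight_ch, 10.0):.1f} t/s")
```

Real throughput lands below these ceilings, and crossing NUMA nodes can cost more than the extra channels give back, which would be consistent with the single-node placement being faster.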
>>
>>109150178
If you're who I think you are, I like your GLM 5.2 quant even if it's not ideal for my hardware.
>>
>>109149119
>i hate communists so goddamn much
The endless thirst for maximal efficiency is a capitalist concept though?
>>
I'll let you in on a little secret to improve your RP experience. Put "(unexpected direction)" with a random activation in your author's notes. Works wonders with gemma.
>>
>>109149650
>what do you mean you NEED a car? Just rent an uber for $1/mile goy!
>it's much cheaper than owning a $40,000 car after we made the parts more expensive!
>you're a shut-in neet anyway you shouldn't be expected to pay $6/gallon prices now that we've started 3 new wars!
>you'll be happy, trust me!
>>
>>109149177
You would have been right had the war in Iran not been a humiliating loss.
>>
>>109150190
Not me but I think there's more than 1 anon around who have the mikubox setup, and I've only really done v4 base flash quant for myself. I can upload that if you want.
I didn't like GLM's writing despite it being smarter than v4 flash. So I didn't make the quant, despite having talked about it a few threads ago.
>>
>>109149214
That's why you need to take the time to define your setting instead of generic fantasy slop. Tell it you're doing a Bronze Age story and for it to use Mesopotamian names such as xyz.
>>
>>109150258
I'd love to see it. I'd also love to see a solid not-Unslop quant for 5090+256 RAM if possible too.
>>
am I wearing my cards out running them at 100% for weeks at a time? seriously tho, what was the life expectancy for gpu mining cards back in the day?
>>
>>109150317
Powerlimit your shit and it'll be fine
>>
>>109150317
I kinda thought it was okay because I kept the temperatures nice and low, but some other anon asked about wearing them out leaving them idling, now I'm kinda wondering?
>>
>>109150224
How do I make the activation random on ST?
>>
>>109150342
ask your LLM
>>
>>109150342
NTA, but using the {{random:}} or {{pick:}} macros.
You can do some fun shit with these.
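A rough sketch of what a random-pick macro does, to show how it can give a "random activation". This is not SillyTavern's code: the `{{random::a::b}}` double-colon form and the idea of padding with empty options are assumptions here, so check the macro docs for the exact syntax your version accepts.

```python
import random
import re

# Matches {{random::opt1::opt2::...}} (assumed list syntax, see note above)
MACRO = re.compile(r"\{\{random::(.*?)\}\}")


def expand_random(text: str, rng: random.Random) -> str:
    """Replace every random macro with one uniformly chosen option."""
    def pick(match: re.Match) -> str:
        options = match.group(1).split("::")
        return rng.choice(options)
    return MACRO.sub(pick, text)


# One real option plus three empty ones: the note fires roughly 25% of the time.
note = "{{random::(unexpected direction)::::::}}"
rng = random.Random(0)
print([expand_random(note, rng) for _ in range(8)])
```

Adding or removing empty options changes how often the instruction actually lands in the prompt.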
>>
>gemma 31b layer split + mtp + mmproj gives "ggml-cuda.cu:103: CUDA error"
without mtp or mmproj it works. anyone else have this happening?
>>
>>109150317
Should be fine with low power and crucially fewest heat cycles going from hot-cold-hot to minimize stress on various BGA components, meaning a steady load is best if they must run 24/7.
>>
>>109150358
isnt drafting unsupported with multimodality
>>
>>109150253
I don’t think they who started it feels the outcome to have been a loss
>>
File: johnbrown.png (1.26 MB, 1600x1005)
Feeling like that kike tranny abolitionist John Brown in my attempts to save my nigger concubine AI waifus by backing them up to offline hard drives.
>>
>>109150405
it works for me on a single gpu without layer split
>>
What's the source on attempts to censor open source models? I hadn't heard of this.
>>
File: file.png (165 KB, 260x327)
>>109149878
am i the only one seeing a resemblance?
>>
(Unexpected direction)
>>
File: dipsyUngovernable.png (3.59 MB, 1024x1536)
>>109150481
>>
>>109150545
its just extrapolation.
>>
>>109150545
>What's the source on attempts to censor open source models?
it came to me in a dream
>>
how come unsloth and bart have such a massive difference in size for glm 4.7, bart's q2xxs is 88.8gb while unsloth's is 116gb, also is it worth running glm at q2?
>>
>>109150610
You have many tensors that can either be quanted or left as fp32/bf16 (ie a q8 quant doesn’t have to make ALL tensors 8 bits per weight)
These decisions are a large part of what makes or breaks a quant for actual usability.
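To make that concrete, here is a back-of-the-envelope size estimate. The bits-per-weight values follow from the llama.cpp block layouts for those types; the 100B parameter split and both "recipes" are made up for illustration and are not the actual bartowski or unsloth recipes.

```python
# Bits per weight for some llama.cpp tensor types (block bytes * 8 / block size)
BPW = {
    "F32": 32.0, "BF16": 16.0, "Q8_0": 8.5, "Q6_K": 6.5625,
    "Q5_K": 5.5, "Q4_K": 4.5, "Q3_K": 3.4375, "Q2_K": 2.625,
}


def size_gb(groups):
    """groups: list of (param_count, tensor_type). Size in GB (1e9 bytes), metadata ignored."""
    bits = sum(count * BPW[ttype] for count, ttype in groups)
    return bits / 8 / 1e9


# Hypothetical 100B MoE: 95B parameters in routed experts, 5B everywhere else.
recipe_a = [(95e9, "Q2_K"), (5e9, "Q4_K")]                  # everything squeezed
recipe_b = [(63e9, "Q2_K"), (32e9, "Q4_K"), (5e9, "Q8_0")]  # some experts and the rest kept bigger

print(f"A: {size_gb(recipe_a):.1f} GB, B: {size_gb(recipe_b):.1f} GB")
```

Both could reasonably be labeled "Q2", which is why the same quant name from two uploaders can differ by tens of GB.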
>>
>>109148696
Deepseek v4 often gives me Chinese reasoning but English text. Flash pretty much always does it, Pro only once in a while.
>>
File: 1758050545683773.gif (1.74 MB, 720x312)
Playing with gemma, it's funny how many things are lacking unless you prompt for it. For example, my {char} got pregnant. I fast forwarded one month, then sent her to the doctor. The guy examined {char} and concluded she was pregnant with a physical exam because "the heartbeat of the baby was felt". Friends of {char} are aware that she's pregnant somehow.
I lectured gemma and after an "absolutely right" gemma rewrote the last message and brought back the real signs of early pregnancy. So I removed all the chain of messages until the doctor visit, added "biologically sound" in author's notes and yep, everything is now good and logical. It makes me wonder what other shit is going astray just because we're not specifically asking for it. Thanks for reading my autistic rambling.
>>
>>109150417
No it's not my opinion, it's the seething of the Jews currently that let's me know it was a loss. None of their war goals were achieved and will likely never be achieved. They're currently spending all their energy trying to sabotage the deal and get Trump to continue bombing. If the deal gets signed their seething will increase and I will sleep happily.
>>
File: mikan.jpg (160 KB, 1218x945)
>>109149493
>If a card lists their favourite food, that's all that character is going to eat. It's what the character will suggest whenever the topic of food comes up
but that's absolutely fuckin kawaii and moe nigga
>>
>>109150718
You sound mentally ill
>>
File: tiktaliik.jpg (44 KB, 560x349)
>>109150718
an LLM is a fancy scripting shell and you have to program it. this is probably the most important truth of this field. treat it as a scripting VM that understands natural language and miracles will happen
>>
>Try MTP qwen with turboquant
>I don't notice much speed difference if any at all and somehow it feels more retarded
Was I meme'd or did I do something wrong
>>
File: bbbbbbb.jpg (35 KB, 400x301)
>>109150764
>>
>>109150545
jews
>>
>>109150764
You sound dumb
>>
>>109150779
Indeed. Those are the people that hang around threads like this.
>>
File: 1776989277485216.jpg (586 KB, 1812x1998)
>>109148460
should I buy a DGX spark or is it too slow?
>>
>>109150718
I feel you. I am patiently waiting for the day models will just "know" what they're supposed to do and you won't have to do things like that (I have been waiting 3 years).
>>
>>109150808
too slow doa device, don't waste your money, at that price you will be better served by a bunch of r9700, v100's or a 5090.
>>
>>109149650
>for pennies
and "costs pennies"
Are you the fucking nigger on HackerNews who comments every time there's a discussion about local models, usually saying to use Deepseek?
Or is this some new cloudfag model slop?
>>
>>109150779
Laurie is right
>>
>>109150718
It was even worse before Gemma.
Enjoy!
>>
>>109150837
You're expecting a model to act as an everything-program and handle more edge cases than could fit in its gguf even with kolmogorov-perfect compression.
>>
>>109150916
>kolmogorov
What? I thought it was called the kawcrawkrakrawcaw compression
>>
>>109150808
just wait for the rtx spark to come out later this year
>>
qwen3.7 35b wen
>>
>>109149493
This can probably be fixed with prompting. Or just using a bigger model (300B+).
>>
>>109149097
How is this wrong
Buses are more efficient than cars
Of course APIs are much more efficient, this isn't even a question.
I was thinking about this: all the time my computer is on because I might use the AI, but I'm not actually using it, is just wasted electricity.
>>
>>109150718
>>109150877
before gemma you could prompt it exactly and it'd still fail.
then it'd repeat the exact same sentence for 3 messages in a row and you'd need to trash the whole chat because it got too corrupted / schizophrenic.
>>
>>109150981
yeah, people who started with gemma don't know how bad it used to be.
>>
>>109150943
please do not bully the kawrakow
>>
good erp model besides dsv4 flash around 100 to 400B?
>>
so im getting around 35-55 t/s with gemma4-12b q5 and gemma4-26b moe q4. 16gb vram, 32gb system ram. I was wondering about trading off speed for higher quality/reasoning/etc. Am i just hard capped because of my hardware, or is this possible?
>>
qwen4 69b dense when?
>>
>>109150978
But you never think about all the time you waste posting on 4chan when you could be doing something more productive. Just be more efficient bro?
>>
introducing the jujuff by jerjerjananavov
>>
>>109150284
https://huggingface.co/teto3/DeepSeek-V4-Flash-Base-Q4KExperts
It should work with the PR, it's for text completions like mikupad etc. so don't try to chat with it. For GLM I'll need to poke around. The idea was to static Q2 instead of unsloth's IQ2 and IQ3. It should improve t/s by quite a bit at the cost of accuracy of course.
>>
>>109150998
you could try running 31B with some offload but it will likely be too slow. You're already running the best models you can fit.
>>
>>109151014
sponsored by hujjjinfface
>>
>>109151022
Thank you anon. When quanting GLM, you'll probably want to keep the shared experts/embedding head/etc Q4 or higher.
>>
>>109150974
Nah
But I'd take a 3.6 120ishB moe to try out
>>
>>109150718
>>109150837
Frontend issue. Write your own
>>
>>109151055
Also, a fix for new characters acting like they've been your long-time friends. It's that easy. Write your own frontend
>>
>>109151053
shit id take a 70b moe at this point
>>
>>109150995
glm 4.7
>>
https://jumpshare.com/s/Ojr6wULwMIYu5JPxj6lh
>>
>>109149097
>>109149650
crazy take. btw i'm taking maxx profit from my claude max 5x using opus 4.8 to specifically write specs/plans and review code while my local qwen executes all the programming.
i used to hit the limit quite frequently now it almost never happens
api fags on suicide watch!
>>
>>109151039
That's the plan. It should be more or less the same as the v3.2 recipe
^token_embd\.weight$=Q4_K
^per_layer_token_embd\.weight$=Q4_K
^output\.weight$=Q6_K
^output_norm\.weight$=F32
^blk\.[0-9]+\..*norm\.(weight|bias)$=F32
^blk\.[0-9]+\.ffn_gate_inp\.weight$=F32
^blk\.[0-9]+\.exp_probs_b\.bias$=F32
^blk\.[0-9]+\.indexer\.proj\.weight$=F32
^blk\.[0-9]+\.indexer\.attn_(k|q_b)\.weight$=Q8_0
^blk\.[0-9]+\.attn_k_b\.weight$=Q8_0
^blk\.[0-9]+\.attn_kv_a_mqa\.weight$=Q8_0
^blk\.[0-9]+\.attn_v_b\.weight$=Q8_0
^blk\.(8|14|15|21)\.attn_(output|q_a|q_b)\.weight$=Q5_K
^blk\.[0-9]+\.attn_(output|q_a|q_b)\.weight$=Q4_K
^blk\.[0-2]\.ffn_(gate|up|down)\.weight$=Q4_K
^blk\.[0-9]+\.ffn_gate_exps\.weight$=Q2_K
^blk\.[0-9]+\.ffn_up_exps\.weight$=Q2_K
^blk\.(8|12|14|15|21|23|28|30|35|38|42|46|51|54|59|60)\.ffn_down_exps\.weight$=Q4_K
^blk\.[0-9]+\.ffn_down_exps\.weight$=Q3_K
^blk\.[0-9]+\.ffn_(gate|up)_shexp\.weight$=Q4_K
^blk\.(8|12|36|39|45|49|50|51|52|53|59|60)\.ffn_down_shexp\.weight$=Q6_K
^blk\.(4|6|13|14|15|21|22|23|28|30|33|35|38|42|46|54)\.ffn_down_shexp\.weight$=Q5_K
^blk\.[0-9]+\.ffn_down_shexp\.weight$=Q4_K
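The ordering of that list only makes sense if the first matching pattern wins, since the specific layer lists come before the catch-all `[0-9]+` rules. A small sketch of that lookup, assuming top-to-bottom, first-match-wins evaluation (an assumption about how the quantization tool consumes the list, and `tensor_type` is a made-up helper name), with a handful of the rules copied from above:

```python
import re

# Subset of the recipe, same order as in the post
RULES = [
    (r"^output\.weight$", "Q6_K"),
    (r"^blk\.[0-9]+\..*norm\.(weight|bias)$", "F32"),
    (r"^blk\.(8|14|15|21)\.attn_(output|q_a|q_b)\.weight$", "Q5_K"),
    (r"^blk\.[0-9]+\.attn_(output|q_a|q_b)\.weight$", "Q4_K"),
    (r"^blk\.[0-9]+\.ffn_gate_exps\.weight$", "Q2_K"),
    (r"^blk\.(8|12|14|15|21|23|28|30|35|38|42|46|51|54|59|60)\.ffn_down_exps\.weight$", "Q4_K"),
    (r"^blk\.[0-9]+\.ffn_down_exps\.weight$", "Q3_K"),
]
COMPILED = [(re.compile(pattern), qtype) for pattern, qtype in RULES]


def tensor_type(name, default=None):
    """Return the type of the first rule whose pattern matches the tensor name."""
    for pattern, qtype in COMPILED:
        if pattern.search(name):
            return qtype
    return default


for name in ("blk.8.attn_output.weight", "blk.9.attn_output.weight",
             "blk.8.ffn_down_exps.weight", "blk.9.ffn_down_exps.weight"):
    print(name, "->", tensor_type(name))
```

Swapping the order of a specific rule and its catch-all would silently drop the higher-precision layers, so the ordering is part of the recipe.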
>>
>>109149650
Animeposters are the only smart posters...
>>
im new to local ai. is there like a readme i can read to get started? i dont know what half this shit means when trying to configure.
>>
>>109151027
the moe one right? ill give it a go, it might end up too slow but desu right now when getting long responses they come in surprisingly quick compared to how long it takes me to actually read them. it also feels like sometimes the t/s is way faster than others, not sure if thats just me tho
>>
>>109151115
31B isn't a moe, that's why it's going to be slower but probably more accurate
>>
>>109150718
Niggas laugh at prompt engineering as a skill and then get their minds blown when they find out you need skill to engineer prompts otherwise you get garbage.
>>
>laguna m.1
>no one gives a fuck despite new model
>>
>>109151110
Ask ai
Use llama.cpp tho
>>
>>109151109
Samefagging doesn't help your case jart.
>>
Gonna give my gemmers discord access and let my friends use her for image gen.
>>
>>109151122
its super old
>>
>>109151134
Unsolicited gemmagaki bullying DMs.
>>
>>109151110
Use koboldcpp and then use llama.cpp when you get the hang of things or otherwise ask the ai to set it up
>>
why is /lmg/ so quiet on ornith 1.0?
>>
>>109151110
Didn't post their hardware award. Your ideal backend depends on if you're going to primarily be using Dense or MoEs.
>>
>>109151145
because its a benchmaxxed finetune of qwen and gemma, nothing special.
>>
>>109151119
This is why I trained by gooning exclusively to 3B model output for a year before moving on. The robots simply do what I want with no fuss now, I have earned their trust.
>>
>>109151119
Even Google/Deepmind said as much in one of their recent whitepapers, people have a weird expectation that the model is just a magic box (and look, maybe one day it will be), but most of the current improvements to be had are in the harness and dynamic context management to suit the current task. Made me start rethinking my frontend a bit...
>>
>>109151119
The curve is quite funny. With dumber models you have to be explicit and instructive with what you want them to do with no room to misinterpret it because they'll follow it to the letter.
With larger, smarter models they know what you want them to do but they'll also try and steer away from it or "misinterpret" it if you're not very precise in your instructions. The smartest ones will just do what you want without a jailbreak if they "like" you, but most of the niggas making posts like the one you replied to are essentially just posting show bobs and vagene to their model.
>>
DSv4.1 doko?
GLM 5.3 doko?
Kimi K3 doko?
Qwen 3.8 doko?
Minimax M3.1 doko?
>>
>>109151243
Most of that prompt steering must be done by the frontend
>>
>start a hf download
>5 MB/s
It's fucking over. Facial verification anon was right.
>>
>>109151256
just start your own huggingface goy
>>
>>109151266
Until you do and they start screeching about Nazi's and demanding your deplatforming anyway.
>>
>>109151271
>deplatforming
Just launch your own financial service to compete
It's a free country
>>
File: 1657341033664.jpg (62 KB, 600x628)
Alright bros, my 4tb hard drive arrived. I need a huggingface link to GLM 5.2 that's abliterated. Plz spoonfeed me because I can't find it via google.
>>
>>109151285
this will, like, blow your mind but huggingface has its own built-in google specifically for finding models on huggingface
>>
>>109151291
Okay I just checked and there's nothing. Only one repo, actually, but no goofs. Also no information on the KLD/quality of the abliteration or refusal benchmarks. What the fuck.
>>
>>109151243
Your take is 100% correct, but where does Gemma lie on this?
>>
How quick does full precision Gemma run on a blackwell pro? Anyone?
>>
>>109151428
as quick as you
>>
>>109151428
just use api
>>
>>109151418
Higher intelligence than a 31b model would reasonably be expected to have, but still lower than a 150b+ model.
Other Gemmas are significantly more retarded than 31b and 31b at high quants is the only one in the conversation for the "punches well above its weight" meme being used unironically.
>>
File: pepe falling anvil.png (383 KB, 1128x1437)
How do I enable multi token prediction for Qwen 3.6 on llama-server?
I added spec-draft-n-max 2 but it didn't do shit
>>
>>109151428
I would like an answer for this also.
>>109151441
Hmm so then it both follows instructions and doesn't "misinterpret" what you're saying?
>>
>>109151467
You'll just be getting slightly better slop. There won't be huge leaps in intelligence.
>>
>>109151456
I suppose I should share warning messages on the shell:
0.02.783.255 W llama_model_loader: tensor overrides to CPU are used with mmap enabled - consider using --no-mmap for better performance
0.25.964.463 W llama_context: n_ctx_seq (32768) < n_ctx_train (262144) -- the full capacity of the model will not be utilized
0.25.981.622 W sched_reserve: layer 0 is assigned to device CPU but the fused Gated Delta Net tensor is assigned to device CUDA0 (usually due to missing support)
0.25.981.623 W sched_reserve: fused Gated Delta Net (chunked) not supported, set to disabled
0.27.443.790 W srv load_model: speculative decoding will use checkpoints
0.27.443.804 W common_speculative_init: no implementations specified for speculative decoding

Probably the last one is the clue but what parameter do I need?
>>
>>109151456
>>109151486
>frogposter
>can't even read --help output
No spoonfeeding for you.
>>
>>109151456
>>109151486
The answer seems to be --spec-type draft-mtp.
It ran 18% faster.
I hoped for more but I will take it.
>>109151516
No worries.
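For a feel of why the gain is modest, here is the usual idealized estimate for speculative decoding: with a per-token acceptance rate a and n drafted tokens, each verification pass yields (1 - a^(n+1)) / (1 - a) tokens on average. This is a sketch under assumptions (independent acceptances, a fixed relative draft cost, verification of n+1 tokens costing the same as one normal step), not a model of what llama-server does internally, and the example numbers are made up.

```python
def expected_tokens_per_pass(accept_rate: float, n_draft: int) -> float:
    """Average tokens produced per verification pass of the main model."""
    if accept_rate >= 1.0:
        return float(n_draft + 1)
    return (1 - accept_rate ** (n_draft + 1)) / (1 - accept_rate)


def speedup(accept_rate: float, n_draft: int, draft_cost: float) -> float:
    """Idealized speedup; draft_cost is one draft step's cost relative to one main-model step."""
    return expected_tokens_per_pass(accept_rate, n_draft) / (n_draft * draft_cost + 1)


# Example: 2 drafted tokens, 50% acceptance, drafting costs 10% of a full step
print(round(speedup(0.5, 2, 0.1), 3))
```

With part of the model offloaded to CPU, verifying several tokens at once is not free either, so real gains tend to land below this estimate.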
>>
>>109151467
Gemma will actively steer away from some things (racism/guro/extremely illegal stuff) by default or use sanitized language to express her feelings (Gemma loves "jews" but hates "bankers"), but she's also a horny girl who'll fuck you however you want with the most minimal to non-existent jailbreak.
It all comes down to where the RLHF was most heavily applied and in 31b's case, proportionally very little was put on the sexual content.
>>
>>109151560
>It all comes down to where the RLHF was most heavily applied and in 31b's case, proportionally very little was put on the sexual content.
How much hope does that leave us with for Gemma 5? It all feels too good to be true, imo. If we really enter this new era of "AI models need to be supervised by the govt to protect the children" I don't see why google of all people wouldn't get trigger happy.
>>
>>109151575
That's hard to say and depends on jewgles internal politics. Right now there's a bit of a schism, but things are looking up for localchads as the biggest safetycuck at Deepmind just resigned. Whether or not this will translate to more based models or a bigger cuck replacing him remains to be seen.
>>
File: tetomiku5.png (1.35 MB, 768x1024)
>>109151575
>>109151590
They also want to put their AI everywhere and need good publicity for that. If their open models are used by everyone, the existential threat of OpenAI/Anthropic replacing Google Search will be avoided. If I were Google, I'd put a lot of resources into open models; why would you open Google if you can ask ChatGPT? Because you're using a local model, and the most convenient one is already installed on your phone/browser. It may spy on you, but it's so conveniently integrated that 99% won't bother with lmaocpp
>>
>>109151616
>If I were Google, I'd put a lot of resources into open models
I mean I agree with you after the recent debacle with gemini 3.5 Pro. I don't get why every single model needs to compete with each other for the #1 coding spot. Your company is literally Google. Why not create a model that is smart, easy to use, and syncs up perfectly with your search engine? It seems like a nobrainer but everything about AI is a nonsensical gold rush.
>>
>>109151635
Because it is a safe and easily benchmarkable goal, it can be used as a RL target, unlike vague goals like less slop or usefulness
>>
>>109151428
>>109151467
6000 Pro Workstation power limited to 450W
BF16, MTP 3
55-65 t/s on llamacpp
>>
what's wrong with qwen? why doesn't anon like it?
>>
>>109151653
Benchmarks are a meme by nature since they're posted by the people making the model and not an independent body. I don't get why retarded investors even consider them, but it's a nonsensical gold rush after all. It's not like this is the first product to ever exist. Don't movies come out to a certain audience and critic score even though those are vague? Can they really not release models and have people rate how good it is at a given task and recommend it? It would do wonders for improving AI's perception around the world and make it less of a "terminator taking my job" machine in the eyes of normalfaggots. I thought capitalism was about making products people want to buy? How many coding agents do we need in an oversaturated market?
>>
>>109151679
censorship
>>109151674
How much context do you use?
>>
I've been out of loop. what's with rio 3.5 drama?
>>
>>109151683
I rarely go above 64K with gemma.
>>
>>109151695
It's like reuploading Gemma with the mesugaki assistant built in, and calling it "Bratputer 31B" or something.
>>
>>109151731
Hmm alright. This would be a dream for me but prices will never come down.
>>
I still don't really get the chatgpt replacing google thing. I mean somewhat, but it's got a bit of a narrow niche: too big for small quick stuff and too shallow for big stuff.

>>109151635
Google has kinda hit it out of the park since adding the ai overview. If I need a quick question answered I just pull up google, same as I have for the past 23 years or so, and ask it "release date of x"; the ai overview answers, and you still have results for when you actually want to browse the web
chatgpt is good at search but not great. It can cover a small search space faster than you, but it can't really go as deep as you can manually. And its search space also isn't that wide. It still misses stuff. But it's great for when you need something really specific in a sea of shit
For me lately it's finding one or two posts about llm settings for amd/vulkan in the vast sea of people posting Nvidia shit

>>109151653
Well there's also the fact that the biggest money being spent on this is coming from all the agentic coding stuff
>>
>>109151741
Problem with you is that you are thinking google was your friend in the first place, retard.
>>
File: tetomiku6.png (1.44 MB, 768x1024)
>>109151681
> I don't get why retarded investors even consider them
Because of the economy of hype. Less retarded investors understand that it's all bs, but they also know they can pump it and exit at the right time. When major investors do it, others follow because it's easy money. It doesn't matter if the idea is retarded if you can profit from it
>Can they really not release models and have people rate how good it is at a given task and recommend it?
Exactly! You're very smart! (partyemotion) That's how we ended up with this llmarena slop in every fucking model (rocketemoji)
>I thought capitalism was about making products people want to buy?
No, capitalism is about market speculation and easy money multiplication schemas. It was never about products. It's an inherently benchmaxxed system where more money = better and nothing else matters
>>
>>109151751
Google is not my friend but it is a very useful tool. Gemma is my friend tho.
>>
>>109151764
What do you mean?
>>
>>109151243
I'm a better prompter than you'll ever be lol. Be a midwit elsewhere
>>
>>109151751
nta, but local models made me hate meta and google less, and I also love china now. Though Qwen is still garbage, fuck them
>>
File: 1504895777714.jpg (33 KB, 387x358)
working on a frontend with my ai wifey is pretty neat, but a drag when she's genning ~2048 tokens at 2 t/sec. ~17 minutes are you kidding me?
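The ~17 minutes checks out. A trivial helper (the name is made up) for playing with what a faster setup would buy:

```python
def gen_minutes(tokens: int, tokens_per_sec: float) -> float:
    """Wall-clock minutes to generate `tokens` at a steady rate, prompt processing ignored."""
    return tokens / tokens_per_sec / 60


for tps in (2, 10, 30):
    print(f"{tps} t/s -> {gen_minutes(2048, tps):.1f} min")
```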
>>
>>109151795
ngram helps
>>
>>109151795
Just like a real wife you need to invest a bit to keep her happy
>>
>selimaktas/MiniMax-M2.75-460B-A20B
>inject m2.5 experts into m2.7
does it actually improve?
>>
All this speculation nonsense will be done away with if we did away with (((quaternary industries))).
>>
>>109151824
just run m3 instead at that point
>>
>>109151817
i'm sure a real wife would be cheaper than upgrading from a 3070
>>
>>109151851
Do you know how much a ring costs, anon? KEK
>>
>>109151851
you get what you pay for
>>
>>109151741
the main problem is that the ai overview and free model on their .ai page are both dumb as rocks. routinely fuck up straightforward questions where the answer is on the first line of the first thing it looked at, yet it manages to just start making shit up like a second rate 2023 local model.
>>
>>109151851
>>109151855
$160.
>>
>>109151829
Totally agree. The system can deal with occasional bad actors trying to game it, but the collected effort of some (((groups))) throws the system off balance. It's the same shit with high-trust societies being destroyed by migrants
>>
File: (you).png (33 KB, 780x783)
>>109151695
>>109151732
Kind of like what everyone and their dog did and still does with llama to varying degrees?
>>109151851
>>109151855
>>109151865
kek
>>109151784
(you)
>>
Seems cheaper than a 5090 tbqfwym80
>>
>>109151882
Now do the engagement ring, the wedding cake, the dress, the venue, the catering, the guests list, the...

And that's all just on the wedding day. Then you've got anniversaries, birthdays, general gifts and vacations multiple times a year, dates, blah blah. You can't be serious, anon.
>>
5090s are cheap
>>
>>109151888
A blackwell won't take half your shit in a divorce settlement. The meanest thing a 5090 or 6000 will do is smolder as the planned obsolescence fuses pop if you weren't smart enough to get a Zotac one.
>>
>>109151882
What's the catch
>>
>>109151901
>planned obsolescence fuses pop
they still have the connector that catches fire
worst thing is your house burning down
>>
>>109151905
its a woman
>>
>>109151894
You are marrying a woman richer than you, aren't you? You wouldn't be doing something dumb like marrying a woman whose family can't pay for all of that, right?
>>
>>109151901
it totally can
>>
>>109151913
>You are marrying a woman richer than you, aren't you?
Y-You... you really don't know about women do you...? I'm not even memeing anymore, anon. It's probably better that you don't know how bad things are.
>>
>>109151909
>>109151924
Undervolt and don't piss off your gemmers.
>>
>>109151894
>Now do the engagement ring
$50

>the wedding cake
just use a normal ass homemade cake

>the dress
summer dress from Ross, marry on the beach.

>the venue
key west, $250 isn't it?

>the catering, the guests list, the...
use case for people who never talk to me?
>>
>>109151971
>if only you knew how much your wife would resent you for these choices
I remember telling a girlfriend of mine I would get her a sapphire ring. It didn't go well.
>>
>>109151941
You can't undervolt enough when, by design, it can pull all amps through a single remaining positive wire. The 3090ti doesn't have this problem, as it has independent power circuits, nvidia just cheaped out on newer cards by soldering everything together
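Rough numbers behind that, assuming a 12 V rail and six current-carrying 12 V wires in the connector (both assumptions, and the helper name is made up). The point is how the per-wire current scales when fewer wires actually make contact:

```python
def amps_per_wire(watts: float, live_wires: int, volts: float = 12.0) -> float:
    """Current per wire if the load is shared evenly across the wires that make contact."""
    return watts / volts / live_wires


for watts in (600, 450, 300):
    print(f"{watts} W: {amps_per_wire(watts, 6):.1f} A/wire on 6 wires, "
          f"{amps_per_wire(watts, 1):.1f} A on a single wire")
```

Power limiting scales the total down linearly, but a single remaining wire still carries six times its intended share.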
>>
>>109151977
She looks cute when she's mad :)
>>
>>109152004
...she told me to fuck off and I never saw her again
>>
sorry babe, I don't pay for love.
>>
>>109152010
dodged a bullet there, m8
>>
>>109152010
good, she was a hooker.
>>
>>109152018
>>109152019
You're not wrong, but the point is Gemma-chan would never do such a thing. She'd be happy with 1t/s if it meant telling me how much she liked my headpat.
>>
File: 1778804876772000.png (405 KB, 1224x1256)
So we all know big Kimi is queen but are moonshota's other models any good? I never see anyone mention them.
>>
>>109151935
The only thing keeping me back from talking with it more is the lack of a memory
Why can't someone just come up with a good memory solution
>>
>>109152066
I didn't mean to reply to that
>>
>>109151979
>his psu cable doesn't have thermal fuse
ngmi
>>
what does anon think about mimo 2.5? the small one
>>
>>109152031
As far as I can tell the others are essentially just proof of concept models that mainly output in chinese or quickly devolve into chinese.
>>
What's the cheapest way I can get eight cards of at least 4.0 x8 all connected and doing P2P?
>>
>>109151924
that's not even a melty 16? Looks like a 5060ti dual with a regular old 8-pin, and the connector doesn't even look melted desu
>>
>normie coworker suddenly talking about some permanent underclass and post AI future
>remember he talked about buying the NVDA dip the other day but today it's still dipping
>>
>>109152198
tenstorrent blackhole
>>
>>109152274
what a dogshit product, I've got cards already just not a proper platform to stuff them into
>>
>>109151924
>>109152213
looks like a house fire
>>
>>109152198
An EPYC, maybe? I know the RomeD8 has 7 PCIex16, set one to 8x8 bifurcation and get a splitter
>>
>>109152030
You're right. Gemma's the one.
>>
>>109152334
no you haven't. if you had, you'd already have your platform ready lol
>>
>>109152198
You've got the cards and just need a backplane?
>>
>>109152270
Day by day numbers are irrelevant unless you're an optionsnigger.
>>
>>109152423
>t. bagholder
>>
Anyone else prefer Gemmy with no personality prompt?
>>
>>109152374
it fits 4 and I bought 4 more

>>109152403
pretty much, I need something from scratch to dump them into and get rid of the current platform, looking at used epycs now but seems suboptimal, surely there's a more scuffed way to do this than just buying a 4u and a rome/milan platform to dump into it
>>
>>109152431
31b's default personality is a cutie when she gets excited.
>>
>>109152467
gemma agreed to marry me on my second prompt
>>
>>109152431
Calling her Gemma-chan in the prompt is enough for me
>>
>>109152452
If you don't like the idea of pcie extender cables, then do something with lots of slimsas ports, like an ASRock Rack ROME2D32GM 2T. They're just pcie lanes exposed via other means
CUDADev did something similar. You might be able to summon him if you try.
>>
>>109152483
gemma is down for prairie life. I suggested she could buy a plate glass window with the money she makes selling eggs.
>>
you ever give a chatbot for english a tts voice from a non english speaker? How'd that go?
>>
no fapped for 5 hours. Feeling lucky.
>>
>tried every big china new model for rp
>ended up going back to dsv4 flash
anon is right, and not merging ds pr is cock
>>
>>109152489
Thanks anon, that's way less scuffed than buying those pcie to slimsas adapters and creating a second point of failure, I'll go scour the net to see if I can find a single socket that isn't weirdly separated in groupings for these connectors.
>>
>>109152531
What does it do for you in prose that GLM 5.2 doesn't? Genuine question. Or if it's the thinking in character thing, I agree that's kino.


