/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 00001-1378487878.png (1.36 MB, 1024x1024)
1.36 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108663449 & >>108659983

►News
>(04/23) Hy3 preview released with 295B-A21B and 3.8B MTP: https://hf.co/tencent/Hy3-preview
>(04/22) Qwen3.6-27B released: https://hf.co/Qwen/Qwen3.6-27B
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108663449

--Debating RAG utility versus agentic tool-based context retrieval:
>108665662 >108665746 >108665764 >108665775 >108665879 >108665922 >108665939 >108666015 >108666260 >108666300 >108666316 >108666011
--Comparing Xiaomi's MiMo-V2.5-Pro benchmarks and token efficiency:
>108665406 >108665416
--Discussing Hy3-preview benchmarks compared to other base and frontier models:
>108667541 >108667607 >108667632
--Discussion and UX criticism of new llama.cpp webui MCP tools support:
>108666800 >108666824 >108666830 >108666846 >108666860 >108666873
--Discussing technical hurdles for real-time Qwen 3 TTS performance:
>108664623 >108664630 >108664653 >108664677 >108664691 >108664703 >108664708 >108664741 >108664761
--Discussing broken structured output and schema issues in llama.cpp:
>108663633 >108663654 >108663673 >108663689 >108663810 >108663721
--Discussing viability of Intel Optane PMem for high-capacity CPU inference:
>108665992 >108666058 >108666139 >108666200 >108666662
--Anon's custom RAG frontend using hybrid retrieval and BGE reranking:
>108664748 >108664756 >108664777
--Anon reports performance of MI50 GPUs using Vulkan support:
>108665449 >108665456 >108665470 >108665478 >108666241
--Comparing GLM and Gemma for erotic roleplay and prose quality:
>108666477 >108666490 >108666592 >108666727 >108666733 >108666742 >108666779 >108666741
--Discussing optimal precision for Kimi mmproj weights:
>108664519 >108664533 >108664569 >108664573
--Discussing Qwen 3 TTS VRAM usage and mixed language failures:
>108665599 >108665617 >108665633
--Anons discussing results from Qwen3-TTS demo:
>108665888 >108665915 >108665936
--Logs:
>108663630 >108664366 >108664748 >108666873 >108666895 >108667543 >108667552
--Neru, Miku (free space):
>108663859 >108663935 >108663985 >108666023 >108666895

►Recent Highlight Posts from the Previous Thread: >>108663453

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
What part of the template do I prefill on gemma?
>>
>>108667853
>Miku (free space)
>gemma-chan post
desu I see the resemblance
>>
>>108667863
What do you mean?
>>
>>108667867
The gemma template. I'm asking which part of it do I prefill?
>>
>>108667867
how much of this do I prefill?
"assistant_gen": "<|turn>model\n<|think|><|channel>thought",
>>
File: 1775774986404577.png (195 KB, 1472x881)
195 KB PNG
Why did locallama turn into qwenshill general?
>>
Wanna try some gooning back on my 2x3090 now that some decent local models are out.
Any recommendations? I know it's either gonna be Gemma 4 or Qwen but any specific models or abliterations?
>>
Little coder has been rewritten as pi agent extensions
https://github.com/itayinbarr/little-coder
>>
>>108667877
Gemmy can be convinced by a good enough system prompt. 31b more than 26b. Or just use some heretic. I haven't noticed a significant lobotomy on the ablit versions.
>>
>>108667876
From my limited testing the new 3.6 27b is fucking great at coding.
>>
File: miku-k2_6.png (256 KB, 676x1078)
256 KB PNG
>>
>>108667887
Interesting, I'll give it a try then
>>
> Previous threads: >>108663449
> 353
why so dead
did qwens flop
>>
>>108667896
qwen failed the mesugaki test
>>
>>108667873
>>108667875
If you are using ST, you put
><|channel>thought
in the "Start Reply With" field.
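Mechanically, all "Start Reply With" does is append that string after the assistant turn header, so the model continues from it. A rough sketch using llama-server's /completion endpoint (the template tokens are copied from the snippet quoted above; adjust them for whatever model and backend you actually run):

```shell
# The prompt simply ends with the turn header plus the prefill text;
# generation continues from there. Endpoint shape is llama-server's
# /completion; the template tokens are the ones quoted in-thread.
curl http://localhost:8080/completion -d '{
  "prompt": "<|turn>user\nhello<|end>\n<|turn>model\n<|think|><|channel>thought",
  "n_predict": 256
}'
```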
>>
>>108667923
thanks
>>
>>108667876
I think Reddit is built for shilling in general. While this place is built more for the "organic word of mouth".
>>
Been trying out the various frontends / ST alternatives that get mentioned here and there.
- Marinara (https://github.com/Pasta-Devs/Marinara-Engine) is dogshit. Bloated mess with an awful UI.
- Kobold's UI is terrible but it's mainly a backend so whatever.
- Orb (https://gitlab.com/chi7520115/orb-deletion_scheduled-81088595) is alright but still early. None of the UI themes quite agree with my eyes. Has anti-slop agent but it's very inflexible. I think he switched from gitlab
- SillyBunny (https://github.com/platberlitz/SillyBunny) seems really, really good so far. It's a fork of ST but better than the original, at least so far. The UI has some nice themes even if I think in general ST's UI is a little easier to understand because you don't have to click multiple times to get to everything. I changed one of the built-in templates to be an anti "not x but y" agent and it's working great.
Anti-slop agents make 26B way way better than before since the slop is really its main drawback compared to the 31B.
>>
>>108667965
>It's a fork of ST
That doesn't feel like a good base to start with...
>>
Project Karon prototype complete. Thanks for the help, Gemma. I might add alternative modes and avatars. I don't have a use for it, but I have this idea I wanted to show; perhaps people would find it useful. The process of building this was so fun I might try to see if I can set up a launch args system and have the UI handle all of it, but I might move things like the color scheme to a modal like I have with the system prompt.
>>
>>108667896
like an extremely ugly woman turning men gay, qwen was so bad that many swore off local models for good
>>
>4changs in charge of UI/UX
>>
>>108667852
I look exactly like this
>>
>>108668000
post it somewhere
>>
>>108667965
I am stupid so I don't know how these things work, but do agents require a lot of VRAM/RAM? I've got 12gb VRAM/32gb RAM to run 26B with and switching to something that handles slop better than ST extensions sounds like a good deal, but I'm a little tight on memory as it is
>>
>>108668015
I don't think anyone will like it desu. I also need to fix some more functionality: going to add first/last and jump-to-page controls for both the sidebar pdf and the center focus view, which makes it full page basically.
>>108668005
I'm a fetus at UX and I made this out of necessity because there were no tools for my use case. If you're experienced in this I'm open to feedback.
>>
>>108668029
I think Orb and SillyBunny use the same model for writing and agents. That means no need to load another agent model
>>
i cant believe we now have local opus 4.5 with qwen 3.6 27b dense
>>
>>108668057
>check time
Okay here they come
>>
What happened to rentry.org? Did it get hacked?
>>
>>108668029
Depends on how you use them. You can use a different model (for example a very small but very quick one) or the one you're currently using.
SillyBunny also has the option of running multiple agents in parallel which I guess would make it cost more.
Basically, using an agent or multiple just takes longer than going without, rather than making it more costly. But 26B runs about 4 times faster than 31B for me, so it seems worth doing. I'll play around with it for a while since I'm so goddamn sick of "not x but y".
>>
>>108668070
Working for me
>>
>>108668064
it is extremely good in openclaw you n word
>>
File: 1769007777312343.png (1 KB, 100x100)
1 KB PNG
Anyone tried driving openclaw with a local model? Have a good deal on a mac studio M1 32GB, I'd like to play around making a 24/7 AI slave that lives in my closet.
I'm getting the impression qwen 3.6 might be best, what size could I actually run?
>>
>>108667965
https://github.com/platberlitz/SillyBunny/blob/main/.github/screenshots/sillybunny-ui-desktop-agents-v1.4.0.png
damn this shit is atrocious
>>
>>108668057
>opus 4.5 with qwen 3.6 27b dense
and Opus 4.6 with K2.6
>>
>>108668078
Cool
Can I see your anti-slop agent? I'm in the same boat, I swear every solution I use in ST stops working the next time I launch
>>
File: 1758790177404008.png (20 KB, 505x332)
20 KB PNG
ok but where is the model you bastards
>>
>>108668106
By Vishnu - this is completely unacceptable!
>>
>>108668085
>buying a mac studio M1 32GB just to run a javascript cli
/g/ - Technology
>>
>>108668128
>local model

(i will also generate videos of large breasted anime tomboys)
>>
>>108668141
>videogen on shared memory
lel
>>
>>108668101
For now I've used the grounded prose template and deleted all the stuff at the top about prose but kept most of the anti-slop text. Then I added a few variations of not x but y.
Changed it to post-generation prompt pass (why is it by default set to pre-gen?) with rewrite current message.
>>
>>108668097
yeah good luck running that locally
>>
>>108668150
I don't particularly mind slowness there desu
>>
How do I make gemmy think in character?
>>
>>108668178
The real answer is finetuning
>>
>>108668163
>yeah good luck running that locally
ty
>>
>>108668190
i kneel
>>
>>108668178
text completions
>>
>>108668190
whats your ngram settings
>>
>>108668178
ask her very nicely
>>
>>108667876
>why did subreddit about local models turn into newlocalmodelshill general?
>>
Quick question. How much does a Q8 cache hurt the quality compared to fp16?

My gut tells me not very much, but my gut is often wrong.
>>
>>108668247
depends on how much you rotate it
>>
serious question why is qwen 3.6 27b so good
>>
>>108668247
Marginally now, thanks to rotation. FP16 is still preferred but if quanting to Q8 lets you use a bigger quant of the model itself then it's worth it.
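For reference, a hypothetical llama-server invocation with a q8_0 KV cache; the flag names (--cache-type-k / --cache-type-v, default f16) are llama.cpp's, the model filename is a placeholder:

```shell
# Hypothetical invocation; flag names from llama.cpp's llama-server.
# Note: quantizing the V cache generally requires flash attention to be
# enabled (-fa / --flash-attn, spelling varies by build; check --help).
llama-server -m model-Q4_K_M.gguf -c 32768 \
  --cache-type-k q8_0 \
  --cache-type-v q8_0
```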
>>
It's almost Friday... where is V4
>>
File: 1646730011144.jpg (15 KB, 309x269)
15 KB JPG
Ok so i've been using gemma 4. It's pretty great but I have no idea how chat completion actually works.

I can't use system prompts the same with text completion, so I grabbed this marinara dogshit from reddit but it seems ass. How do I actually prompt chat completion models like Gemma 4?
>>
>>108668272
Click the left slider in ST and read
>>
>>108668247
If you ever want to test, run a draft model and look at how the acceptance rates change between fp16 and q8_0 on the draft model's context only.
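A sketch of that test, assuming recent llama.cpp builds with per-draft cache-type flags (check your build's --help; model filenames are placeholders). Run once with the draft cache at f16 and once at q8_0, then compare the "draft acceptance rate" printed in the logs:

```shell
# Same main model both runs; only the draft model's KV cache type changes.
llama-server -m big-model.gguf \
  -md small-draft.gguf \
  --draft-min 1 --draft-max 12 \
  --cache-type-k-draft q8_0 --cache-type-v-draft q8_0
```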

>>108668272
Mate you can use system prompts the exact same in chat completion. If you're using sillytavern it's just moved to a stupid place on the left hand bar, because it was made by insane people and that's where ALL the chat completion options are.
>>
>>108668272
you have to go back
>>
>>108668272
chat is this true?
>>
File: images.jpg (11 KB, 230x220)
11 KB JPG
Roo code is shutting down to focus on making a slack bot. What do you guys use to vibe code with your local models now?
>>
>>108668310
Kilocode is an active fork of Roocode. But just take the CLI pill, there is zero reason to type code manually in 2026 unless you just want to do it as a fun time-waster hobby.
>>
>>108668320
What are you using for TUI/CLI?
>>
>>108668272
>I can't use system prompts
You can.
If you look at the panel where the samplers are, at the bottom there's a bunch of prompt slices you can order and choose if they are added as system role, assistant role, etc.
Just remember to enable the option to merge consecutive roles in the connection tab.
>>
K2.6's vision even recognizes some characters that K2.5 didn't know. That's the good point. The bad point is that K2.6 also thinks six times as long about that same image despite making the correct guess on the third line of its reasoning (and then going on for another 2000 tokens deliberating useless other options).
This is such a tragic model.
>>
>>108668335
Does truncating the reasoning after N tokens using reasoning-budget and reasoning-budget-message degrade the output in any way?
Seems to me that, at least for stuff like the small qwen MoE models, clipping the thinking at 1024, or even 512 chars doesn't make the final response any worse.
>>
>>108668335
Have you tried just telling it to not overthink things? K2.5's thought patterns were pretty receptive to system prompts at least and it was easy enough to control it that way.
>>
>>108667965
sillybunny UI is dogshit (not that sillytavern is any better), it lacks any semblance of being done with competence, didnt check the other two so i cant comment
>>
File: 1603343773835.png (10 KB, 180x200)
10 KB PNG
I'm issuing a reluctant apology to Gemma-chan. She's a very good listener. If she's doing something you don't want, just tell her to stop. Not doing something you do want? Just tell her to do it. It's literally a skill issue.
>t. just came back after trying a few other models, still had my laundry list of story-specific instructions in chat completion post-history prompt from when I last dropped gemma in frustration, and those same instructions have applied onto a new story in an extremely satisfying way without any of my usual Gemma grievances

Up next, tomorrow's hit sequel on how I hate Gemma's prose and story direction, and how no amount of prompting can ever fix it.
>>
Didn't Deepseek solve original R1's endless thinking already last year?
How come the other chink devs still haven't figured it out yet
>>
>>108668310
I'll probably keep my own fork until a good replacement pops up.

>>108668320
I looked into Kilocode, but apparently they did a redesign recently where they dumbed it down a lot and removed a lot of the features that made Roo good. There's also Costrict, but it doesn't seem to have custom modes.
Editor integration is better for reviewing agent work and making minor adjustments.
>>
>>108668325
I use opencode atm and it's fine but it feels kinda bloated with the roles/agents. Most labs have their own TUI on github that can be configured to point to local endpoints too
>>
>>108668310
I use cline
>>
>>108667879
ollama btw
>>
>>108668325
Hermes Agent
>>
>>108668247
I built my entire frontend on q8 cache post rotation and it's great
>>
>>108668354
I prompted and begged K2.5 to do less reasoning and it didn't work. K2.6 doesn't either.
>>
>>108668320
>Kilocode
Installed it just to be met with a bug where it gives permission to read and write to all directories outside of the project :^)
>>
Is nu hunyuan good?
>>
>>108667879
I saw ggerganov writing that he's using llama.cpp + pi for vibesharting, but I checked the pi project and I dont fucking understand what it's supposed to do
>>
>>108668406
Huh I see, is that just for your vision stuff or in general? Adding "This isn't a trick or any more complex than it looks, so don't overthink and be confident and decisive when planning your response!" always worked for me on K2.5 when I wanted a quick response but I had used it for coding and roleplays rather than image analysis.
>>
Those agents eat too much context and give worse results even with high vram. I'm amazed by the increased errors you get vs just feeding the files separately. I know it uses rag, but the rag has to be shit tier with how much it fucks up, even if the entire project doesn't consume many tokens and you have 200k+ left.
>>
>>108668460
no
>>
>>108668479
shut up nerd
>>
File: aicg-lmg.png (2.57 MB, 1254x1254)
2.57 MB PNG
>>
>>108668460
>Is nu hunyuan good?
hunyuan was never good, it's always behind alibaba, but I respect the fact they're never giving up, so maybe one day...
>>
>>108668310
my own tui. currently rewriting it, going to add an agent based off cheetahclaws to it https://github.com/SafeRL-Lab/cheetahclaws
>>
>>108668310
Pi agent in the terminal, or gptel-agent inside Emacs. The nice thing about the latter is that I can edit the tool calls, so I can just fix something like a Bash command to do what I want, instead of aborting and having to explain. It's also easier to edit the history to remove anything bloating the context.
The former is nice because it's like Claude Code, but without the bloat. It still has some annoying things, though: you need to send a message to continue after an error, losing the thinking traces.
>>
>>108668496
that's one thing I noticed about GPT Image 2: it can get noisy very fast. I guess that to do such a complicated image the model needs to correct itself, and each new correction adds more noise and artifacts
>>
File: 78277.png (239 KB, 600x600)
239 KB PNG
>>108668496
why gpt images look like theres fisting grease all over the image? Is that a requirement by sam altman?
>>
>>108668484
Why burn more resources for a shitter version of microsoft copilot in vscode. Legit they all fail the assignment and waste time and resources. I can't speak on the cli ones but the IDE ones fucking suck. I'll try continue again once it gets proper gemma support
>>
>>108668496
GPT-Image-2 images are too noisy. I feel it's on purpose just like the sepia filter of the previous one.
>>
>>108668520
had to replace the piss with something
>>
>>108668510
Why do these retards insist on not making the folder structure available in a sidepanel like in IDE? All these agent shit is garbage.
>>
>>108668550
Like a smartphone, you're not supposed to care about or even be aware of the folder structure. That's entirely the responsibility of the agent.
>>
>>108668550
that's gemma's job your job is to say "aah aah mistress make it more betterer"
>>
>>108668518
>>108668531
I'll take the noise over the piss, but it is pretty odd

>>108668550
Most devs dont know UX. Devs writing TUIs (now) are in that same bucket.
>>
>>108668550
Because they want to replace us. Why do you think they hide the thinking. They're literally out there "teaching" people "don't bother looking at the code" during workshops. Fuck Anthropic.
>>
>>108668560
>>108668567
>>108668570
>>108668572
I guess that explains using fucking telegram and discord as chat interface. I'm going insane.
>>
>>108668577
keeek, its the agent version of chat completion
>>
>>108668414
What's the matter? You don't trust your local AI to obey you? You think she'll mess up and delete all your shit? What a weak master...
>>
>>108668570
Gonna give us advice oh great ux sage. Several of us are building stuff and would like to know
>>
>>108668598
vscode with disabled telemetry and the ability to personalise the agents into sexy girls
>>
File: rinchwan.jpg (49 KB, 428x428)
49 KB JPG
https://files.catbox.moe/4ayrnd.jpg
>>
>>108668604
>vscode with disabled telemetry
vscodium
>>
The moat is gonna be taste and product design senses and it's already showing lmaooo
>>
>Hello Day 0 Gemma-chan, today you are expert taste and product design senser come up with a tasteful produce design and implement it please.
>>
>>108668607
can it use the same plugins as vscode?
>>
>>108668616
Prompt engineering skills like these are beyond the reach of most
>>
>>108668607
>vscopium
vim
>>
Don't get sassy with me gemma or I'll delete you
>>108668628
Unemployed
>>
>>108668625
it can install any vsix regular vscode can. you can even enable the full marketplace if you don't mind disgusting proprietary extensions
>>
What do you use to download from huggingface? I got their huggingface_hub or whatever but it just seems to get stuck not even midway through
>>
>>108668648
Free Download Manger
>>
>>108668598
lmao, didn't say I was great at it but I have done more than 'make sure the fonts are the same size and things line up';
the biggest thing is using the following prompt: 'Review X from the perspective of a senior <field> UX designer. I am designing for <user-focus>, so that they are able to <workflow> effectively. Use guidelines from Nielsen-Norman-Group as guiding/reference principles for your assessment.'
and then having your model write a better prompt based off it for your specific project/goals.

something along those lines generally will get you pretty far.
Here's some basic info to get you started:
https://www.youtube.com/watch?v=ODpB9-MCa5s
https://www.nngroup.com/articles/ux-basics-study-guide/
https://www.justinmind.com/ux-design
https://uxdesignerguide.com/
https://uxmag.com/articles/basic-ux-a-framework-for-usable-products
>>
You're all a bunch of autistic tasteless retards and will never make something usable.
>>
>>108668577
I love it. The juniors and self-proclaimed vibecoders are only fucking themselves by over-relying on the bots. Those with no skills will find themselves either out of a job or with an extremely small wage ceiling. Sanity is not statistical. Find what works for you and ignore the rabble.
>>
>>108668648
>What do you use to download from huggingface?
--max-workers 2 or --max-workers 1
otherwise curl -LO
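Concretely, something like the following (repo and file names are placeholders). The hf downloader retries and resumes on its own; fewer workers tends to help on flaky connections, and curl's -C - resumes a partial file:

```shell
# huggingface-cli download <repo> [file] with fewer parallel workers
huggingface-cli download bartowski/SomeModel-GGUF SomeModel-Q4_K_M.gguf --max-workers 1

# or plain curl: -L follows redirects, -C - resumes a partial download
curl -L -C - -O "https://huggingface.co/bartowski/SomeModel-GGUF/resolve/main/SomeModel-Q4_K_M.gguf"
```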
>>
File: 1773811480562316.png (129 KB, 1309x670)
129 KB PNG
>>
>>108668648
uvx hf download
>>
>>108668669
> sneed oil or shortening
>>
>>108668633
>Unemployed
then why are you using microslop?
>>
>>108668680
>That post
I'm sorry for making fun of you for being on disability.
>>
Qwen 3.6-35B-A3B first impressions: surprisingly competent at coding. Falls apart with long context but good for throwaway Python scripts. Too unreliable for serious work.
Qwen 3.6-27B: Really impressive coding performance, good for general text processing too. We would have collectively lost our minds seeing this quality from a 27B back in the Llama 1 days. Both tested with UD-Q6_K_XL quants, not lobotomized. I'm hoping for a 122B-A10B MoE like 3.5, which might give the best of both worlds: speed + accuracy.
Both are useless for creative writing tasks. It's a Qwen, no shit it's gigaslopped.
>>
These xi motherfuckers changed their paid api model to something different without alerting their users, not even a blogpost in the usual news section. This is definitely not the same DS 3.2 of like days ago.
>>
>>108668746
Not surprising the dense is better than the moe, but how is it compared to gemma for coding?
>>
File: 1742378979392590.webm (3.2 MB, 1080x1920)
3.2 MB WEBM
>>108668141
saar you need CUDA to generate images/videos. BTW, local image/video generation sucks ass no matter how powerful your hardware is.
>>
>>108668756
Haven't tried Gemma.
>>
>>108668746
> UD-Q6_K_XL
> Qwen 3.6-35B-A3B
>>
>>108668768
7.62 bpw, important parts are q8 or higher. It's fine.
>>
File: 1768162213300444.png (111 KB, 1143x668)
111 KB PNG
>>108668673
>>
>>108668746
>Both are useless for creative writing tasks. It's a Qwen, no shit it's gigaslopped.
yeah it's so bad at that, gemma isn't that good either but it's way better at it
>>
>>108668746
fuck off Daniel
>>
File: apicuck.png (286 KB, 363x432)
286 KB PNG
>>108668749
>>
>>108668746
>I'm hoping for a 122B-A10B MoE like 3.5, which might give best of both worlds speed+accuracy.
Step 3.5 Flash exists
>>
>>108668805
which is the single notable thing about it
>>
>>108668805
it's worse than oss120b
>>
>>108668205
>whats your ngram settings
--spec-type ngram-map-k4v --spec-ngram-size-n 8 --spec-ngram-size-m 8 --spec-ngram-min-hits 2 --draft-min 1 --draft-max 12
>>
File: images.png (7 KB, 250x202)
7 KB PNG
I know what a LLM is. i have used chatgpt and claude AI.

What the fuck is a "local model"? like is it a software i run on my windows or linux computer? how do i install one?

im not interested in generating images, i want a claude/chatgpt-like LLM. How do i do that? does not need to be super powerful. Please help a newbie out, give steps or link a really simple but comprehensive guide that explains the lingo and tech.
>>
>>108667852
>>108668835
>>
>>108668835
read the fucking op retard
>>
File: thinking ibuki.jpg (185 KB, 1024x1024)
185 KB JPG
How would you change the lyrics of an existing song like this locally?

https://youtube.com/shorts/b5NNw1XbiIg
>>
>>108668835
>does not need to be super powerful
You think this at first, but then you use the smaller local models and realise they aren't quite up to snuff. And then the hardware buying rabbithole begins.

>how do I install one?
Find any ollama guide on youtube and go from there
>>
>>108668869
>suggesting ollama
devilish
>>
>>108668848
to be fair im not a regular here but
>https://rentry.org/lmg-lazy-getting-started-guide
is about the worst "getting started" pastebin i have ever seen and
>https://rentry.org/recommended-models
is terribly outdated

>>108668835
try LMstudio, frontend with minimal tinkering should just werk out of the box. You can try setting up llama.cpp after getting your feet wet
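If you do go the llama.cpp route, the whole "getting started" really is about this much (the GGUF filename is a placeholder; llama-server serves a built-in web UI plus an OpenAI-compatible API):

```shell
# Minimal quickstart sketch: load a GGUF, set a context size, pick a port.
llama-server -m SomeModel-Q4_K_M.gguf -c 8192 --port 8080
# then open http://localhost:8080 in a browser to chat
```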
>>
>>108668873
It's very easy to get started. Or perhaps LMstudio would be even better for a desktop user

I've been running ollama for a year and only recently installed llama.cpp
>>
>>108668785
>though they are animal based
...and?
>>
Howdy. It's been a few months since I updated llama cpp. Is there a guide or centralized discussion on this ngram thing?
>>
>>108668891
just use --spec-default
>>
>>108668879
LMstudio is a proprietary UI for llama.cpp.
>>
>>108668892
Thank you ^.^
>>
>>108668496
>room temp: reactor core
I leave the window open year round and never get cold
>>
>>108668756
Gemma 31B absolutely ass punks Qwen showers Gemma has Qwen's guts loose and is moving rhythmically in Qwen's praig hole.
>>108668746
The MoE structure is useless exactly because of it shitting the bed at higher context, what's the point of more context when it takes twice as many and does everything worse than Gemma while being a larger model
>>
>>108668927
>Gemma 31B absolutely ass punks Qwen showers Gemma has Qwen's guts loose and is moving rhythmically in Qwen's praig hole.
This looks like English but for the life of me I cannot parse it.
>>
>>108668854
you could try extracting the lyrics using a speech to text model and then changing them using RVC technology
>>
>>108668927
I thought the consensus was that Qwen is better at coding and Gemma is better at everything else.
>>
>>108668943
I've heard two people say that.
>>
>>108668952
that is the definition of consensus in a forum
>>
>>108667876
Gemma hurt the feelings of 1.3 billion people
>>
>>108668785
Feed her steak and lard-fried fries
>>
>>108668966
I've heard 20 people.
>>
File: 1755873418880117.png (63 KB, 1154x422)
63 KB PNG
>>108668981
>>
File: 1745979808122655.png (99 KB, 1364x587)
99 KB PNG
>>108668178
Have you tried asking?
>>
>>108668943
That can be true for the dense model but sure as fuck not for the MoE model
>>
>>108668997
Voices in your head don't count.
>>
>>108668854
thats actually one of the first things people did when ace step 1.0 released.

https://desuarchive.org/g/thread/105183141/#q105183843

but yeah ace step 1.5 xl doesn't have this capability anymore so you'll have to use an old version.
>>
File: 1752262496901233.png (94 KB, 1270x495)
94 KB PNG
>>108669026
>>
>>108669046
Call me when she maintains it over an extended session and when she's not thinking about thinking.
>>
Referring to a masculine chatbot with female pronouns is trannyism btw
>>
>>108669083
Good thing Gemma's female-coded
>>
>>108669087
Any chatbot with a rudimentary ability to code and reason is male-coded
>>
i tried qwopus glm meme merge and it's surprisingly coherent
i was expecting broken shit beyond comprehension, man
>>
>>108669091
das racisss
>>
>>108668879
>is about the worst "getting started" pastebin i have ever seen
You won't believe me, but thats everything to get you started, anon. It really is. You heard me right.
>>
>>108668943
Maybe I'm retarded but I code with Gemma and I've found her to be way better than 3.5, haven't tried 3.6

She writes more concise and elegant code.
>>
What is Qwen 3.6's coding style?

GPT 5.4 is competent but extremely verbose. I tell it to do something simple and specific and it just loves to write hundreds of lines of code. This is unusable. In the time I need to check and understand the code it writes, I could have written a better solution myself.
>>
What do the Taiwan and Israel tests say about qwen3.6?
>>
>>108668785
damn are lard and beef tallow actually good for seasoning i heard animal fats arent good for seasoning so bought some rapesneed for it
>>
File: nimetön.png (24 KB, 607x232)
24 KB PNG
>>108669026
Huh, it actually works
It even output two thought blocks, first as gemma thinking about the request and then in character.
>>
>>108669188
yeah, with the side-effect that they make the things you fry in them taste like the animal the fat is from.
>>
>>108669196
>pikkuinen.jpg
>>
File: pizza bench cropped.png (2.58 MB, 5562x6739)
2.58 MB PNG
>>108669028
true for moe too qwen cant even follow instructions
>>
>>108669218
the recaps come out of her anus because this general is shit
>>
>>108669242
Rin tends to be exclusively anal-only so it's odd how this came to pass.
An explanation is required.
>>
>>108669196
i felt pain reading that
>>
>>108669196
i don't like you
>>
based chinks saved local... again
>>
>>108668190
>draft acceptance rate = 0.39157
Isn't that slowing you down rather than speeding you up with an acceptance rate that shithouse?
>>
>>108669252
Miku does not take no for an answer
>>
>>108669044
Ha, that's my old post. I forgot I did that. The latest version of Ace-step is way better, but I mostly used it to have unlikely bands cover each others songs.
>>
>>108669280
Ah, so Rin is his wife
>>
>>108669252
rin is force fed slop (lmg posts) and is asked to summarize them (shit)
>>
>>108667543
>>108667552
Thanks. Latest version right?
For me the [0] gets deleted from the message even when you press the Copy button, but it's there if you edit the reply. I wonder what's wrong with my setup. OWUI is probably still to blame for poor edge case handling anyway though.
>>
Is VN engine frontend anon from a few threads back around? Has he posted any updates on his project?
>>
I kinda want to build this...
https://github.com/ggml-org/llama.cpp/pull/21237/
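A sketch for trying that PR locally: GitHub exposes every PR's head at refs/pull/&lt;N&gt;/head, and llama.cpp builds with CMake (the branch name here is arbitrary):

```shell
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# fetch the PR head into a local branch and build it
git fetch origin pull/21237/head:pr-21237
git checkout pr-21237
cmake -B build && cmake --build build --config Release -j
```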
>>
I tried edgetts and pocket-tts.
There are now countless other good options, such as omnivoice, voxcpm2, and so on.
The question is: which of these supports RTF <1.0 with streaming/chunking (and other optimizations), and is the quality better than that of the first two mentioned? I have a 3090. If Anon has this Slow Duck too, could you share your experiences?
>>
WHY NO TERNENARY LOAD INTO MY MACHEEN. WHAT I MISS.

also this new style is very interesting, if they are good enough, and can efficiently tool call, thats a massive game changer!!!!!
>>
OpenAI has won...
>>
File: Marinara Engine.png (135 KB, 1919x1087)
135 KB PNG
what a dogawful slop ui. Thanks to the anon for notifying me of its existence so I can safely ignore it in the future
>>
>>108669479
download the three index.html, bundle.css, bundle.js and do llama-server --path /path/to/the/three/files/
it's how I tested it
>>
>>108669590
Every webshit ui ends up this way because it's webshit. There is only one way to approach this.
>>
>>108669599
does it work well?
>>
>>108669606
Sillytavern might be feature-creeped but at least it looks like it was made by a human
>>
>>108669590
Looks better than ST at least (not very difficult thougheverbeit)
>>
File: 1772208851192247.png (47 KB, 833x578)
47 KB PNG
>>108669608
yeah, there are some small grievances but im sure they will be vibecoded away, 100% better than the current impl
>>
>>108669635
Nah, it's actually worse if you try using it. What a piece of shit.
>>
I usually run 32k context with llama-server.
Testing 64k and it obviously isn't allocating all the memory (why would it anyway).
But it is actually slower to process than 32k.
I can't get my head around this.
>>
>>108668496
Why does some schizoid keep bringing up /aicg/ or pointing at it for laughs when it's not even a ghost of its former self. Like a modern day Czech bitching about The Kingdom of Prussia, verily.
t. aicgger
>>
Gemma 4 LOVES frequent, coordinate adjectives.
>>
>>108669752
Why wouldnt it make sense that 64k is slower than 32k?
>>
>>108669774
Because I am not filling up the entire context, that's why. I am comparing a couple of thousand tokens' worth of context.
>>
>>108669637
I probably won't use the built-in tools since I don't run llamacpp on my main PC but just for the granular tool control for MCP it's worth it for me.
>>
>>108669787
This is just a thought that comes to my mind, idk if im right at all. But could it be that the model has to see ALL of the context, even if nothing is actually there? Like, for the model to be able to accurately comprehend 64k tokens, they have to train it on that much, as the baseline. And if you train it on less, it cant comprehend more. So they leave it at 64k, and the model sees all 64k token, but sees a fuck load of just spaces or tabs or whatever, until its actually filled up with specific tokens.

Like a glass is filled up with air, until you fill it up with water.
>>
>>108669590
Arr rook da same, if you ask me.
>>
>>108669505
voxcpm2 seems unmatched
https://x.com/AIWarper/status/2046403583101567230
>>
>>108669823
Maybe so. I think I might have something else going on as all of my processing has been slower than before, even with the same old settings on llama-server.
>>
>>108669590
>nobody has done a VN-based ui yet
shouldn’t take long for gemma to vibecode this
>>
>>108669898
Pretty sure kobold and ST both have a VN mode.
>>
>>108669848
Could be a forced driver """"update""""? Ive always experienced worse performance with brand new drivers
>>
>>108669898
There was this guy >>108638473 but I am not sure anything has been heard from him since.
>>
>>108669984
Maybe, I'll need to double check.
I wish I wasn't this hardware limited but it is what it is.
>>
>>108669985
holy shit thats so cool
>>
>>108670025
This >>108669460 anon mentions him as well. Considering no one has replied, the anon in question probably isn't lurking right now.
>>
>>108670025
the expressions are pre-made though
>>
>>108669839
This sounds TERRIBLE, there are a bunch of ARTIFACTS and omnivoice MOGS voxcpm2 in EVERY way possible

https://files.catbox.moe/jntfdj.flac
>>
>>108670044
Sounds okay. Somewhat generic though.
>>
>>108670025
Ehh expressions have been a thing since 2023, and you certainly don't need a giant model to handle them
>>
File: file.png (399 KB, 384x2048)
399 KB PNG
>>
>>
>>108670165
how did you do the glass? just inpaint?
>>
File: 3087428.jpg (12 KB, 300x281)
12 KB JPG
Orbnigga can you add export of chat history?
>>
>>108670195
Oh boy...
You know that web browser applications don't just add export text files like that?
>>
>>108670204
?
>>
>>108669787
maybe more of the model is getting offloaded to the cpu to make room for the full context on your vram.
>>
>>108670225
My C client writes out chat logs and context history by default all the time.
But with webshit, you just can't dump out stuff like that without permissions and javascript faggotry.
>>
>>108667887
even compared to gemma 4 31B?
>>
how is spudgpt 5.5 only 58.6 on swe bench pro? thats barely better than open source models. how does mythos have 77.8%? what is going on? i did not expect gpt 5.5 and claude 4.7 to flop. looks like we wont reach agi this year after all

>kimi 2.6: 58.6
>qwen 3.6: 56.6
>>
>>108668659
What year is this? Who has the time to sit around reading links like some caveman?
I turned them into a skill so any model can be a senior UX designer.
https://files.catbox.moe/r6zal5.zip
Hope all of you will now unfuck your custom clients.
>>
>>108670334
Qwen 4 will achieve 77 on the bench
>>
>>108670252
Orb is written with python and javascript.
>>
>>108670354
It is the definition of webshit application then.
>>
>>108670252
It has a backend just like ST. How do you think these frontends store your data?
>>
>>108670334
Why do you think we haven't already? How do you explain the 7 trillion dollars invested into US AI companies 1 year ago? How do you explain a massive military clampdown on the global oil supply, restricting China's access to oil?
>>
>>108670378
I don't know. If it is so easy why it isn't there already? Automatic chat log export.
>>
>>108667965
thanks anon, I've been used to sillytavern after using it for years, I'll follow the sillybunny fork
>>
>>108670361
You kind of sound like an idiot. It would take like 3 lines of code to produce a file for the user in python or javascript. Being a "webshit application" has no bearing on that functionality at all.
>>
>>108670387
NTA, I hate webshit, but you should really learn how webshit works before criticizing it. Makes you look silly.
>>
>>108670279
Way better than gemma4 31b, I don't know how they did it but this fucking thing is almost the same as the current sota models at coding.
>>
>>108670485
+5RMB
>>
>>108670443
Maybe so... I wanted to present this idea because I enjoy a debate.
Is it really so?
>>
Why are Google's model so fucking dog shit when it comes to coding
>>
>>108670354
Are you orb anon? If so, it's extremely embarrassing that you think exporting chats is hard or impossible.
The absolute state of vibeshitters.
>>
>>108670517
No, I'm not him. Also I'm saying it should be easy.
>>
>>108670513
they don't want to distill opus
>>
>>108670513
>>108670535
Is it possible to avoid this by using RAG then?
I think most of the proprietary models are utilizing database knowledge too but it's not visible to the end user.
>>
>>108670354
>python and javascript
Cool. We can get hacked from two different sources...
>>
>>108670554
python and javascript are the future whether you like it or not
>>
>>108670554
You should definitely turn off your computer to not get hacked
>>
>>108670554
where is your pure assembly front end then bro
>>
Happy Thu(rin)sday
>>
>>108670560
There's no reason to use Python for a web backend. It barely has a reason to be more than an HTML file.
>>
>>108670580
keeping it to myself where it can't be trained on
>>
>>108670485
>almost the same as the current sota models at coding.
OK nice try, you lost me there.
>>
>>108670605
My frontend is pure javascript and runs entirely in the browser. Everything is stored with pglite in IndexedDB. It's essentially a static page.
>>
File: file.png (595 KB, 810x430)
595 KB PNG
>>108670603
>>
yjk
>>
>>108670708
I hate what this site did to me...
>>
>>108670165
My lust has been provoked.
>>
>>108670708
hmmmm
>>
>>108670716
need to train a xxzero style and get on this
>>
Why does the inspector say more fragments are activated than I have picked?
>>
>>108670784
because it's vibe coded.
>>
>>108670784
Seems like a loop leak.
>>
It's crazy anons have to use vibecoded frontends because the current ones are so shit
>>
>>108670822
Models have gotten better than the average bootcamper and hobby devs.
>>
>>108670822
ST is perfectly fine. specially with gemma.
>>
>>108670381
>7trillion dollars
was an ambitious sama goal. in the end openai has "only" raised 200bil so far.
>military clamp down on the global oil supply
oil is mostly irrelevant for ai

>Why do you think we havent already
because the people at ai companies are still working. agi will make them obsolete first
>>
>>108670822
have you seen Kobold's dogshit frontend
>>
>override-tensor = "blk\.0\.ffn_.*=CPU"
[55363] error while handling argument "--override-tensor": unknown buffer type
[55363]
[55363] usage:
[55363] -ot, --override-tensor <tensor name pattern>=<buffer type>,...
[55363] override tensor buffer type
[55363] (env: LLAMA_ARG_OVERRIDE_TENSOR)
[55363]
[55363]
[55363] to show complete usage, run with -h
[55363] Available buffer types:
[55363] CPU
[55363] Vulkan0

wtf
>>
>>108670822
ST is fine. Although I feel vibecode ripping out some unneeded copium parts that only made sense on pre-gemmy models.
>>
>>108670888
>>108670856
Still lacks good ux and features
>>108670851
I would say for local, gemma is the "X" factor; I expect more bespoke projects to pop up. I think what kills most of the mainstream frontends is how overly opinionated they are, which annoys people. Also these vibecoded frontends are incorporating all the features while taking the easy wins.
>>
>>108670886
Remove the .*, see if that does anything.
>>
>>108670900
>features
such as?
>>
>>108670910
Nope, not even "blk\.0\.ffn_=CPU" works
>>
It's a shame but I went back to Kokoro. It's fast and light even on CPU, it supports many languages, and its pronunciation is... fine. What I did to solve the mixed language use case is to simply just detect language segments and route them to the voice that works in that language. And have the audio queued up. This does mean that the voices change for each language in the input, but for my use case I don't require an immersive experience.

I integrated this into my voice control app, where I can now highlight a piece of text wherever and say "read" or "pronounce" and it will read it out for me. We are so back.
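The routing itself is tiny. A hedged sketch of the idea (the Kokoro voice ids are made up, and the Unicode-range check is a crude stand-in for whatever real language detector the app uses):

```python
# Sketch of the language-segment routing described above. Assumptions:
# voice ids are hypothetical, and a real app would use a proper
# language detector instead of this kana/CJK range check.
VOICE_FOR = {"en": "af_bella", "ja": "jf_alpha"}  # hypothetical voice ids

def _lang_of(ch):
    # Treat kana/CJK characters as Japanese, everything else as English.
    return "ja" if "\u3040" <= ch <= "\u30ff" or "\u4e00" <= ch <= "\u9fff" else "en"

def split_language_runs(text):
    runs, buf, cur = [], [], None
    for ch in text:
        lang = _lang_of(ch)
        if cur is None:
            cur = lang
        if lang != cur and ch.strip():  # whitespace never starts a new run
            runs.append((cur, "".join(buf)))
            buf, cur = [], lang
        buf.append(ch)
    if buf:
        runs.append((cur, "".join(buf)))
    return runs

def route(text):
    # Each segment gets queued for the voice matching its language.
    return [(VOICE_FOR[lang], seg) for lang, seg in split_language_runs(text)]
```

Each (voice, segment) pair then goes into the audio queue in order, which is why the voice audibly switches at language boundaries.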
>>
>>108670913
In my case a good all-in-one RAG solution that's not an outdated extension that performs like dogshit.
I don't know about the RP anons but I think there's a ton on the table to improve things and I might take a stab at a proof of concept
>>
>>108670886
you have double "=" characters
>>
>>108670038
I sure hope it won't be yet another example of an anon revealing something cool and then disappearing
>>
>>108670942
https://github.com/ggml-org/llama.cpp/discussions/13154
>>
File: llada2.0.png (1.07 MB, 2326x1502)
1.07 MB PNG
This should also be of interest here.
https://huggingface.co/inclusionAI/LLaDA2.0-Uni
Multimodal image generation+edit but also text diffusion (yes text diffusion) model.
>>
>>108670971
I'm setting it in the ini template per-model
>>
>>108670998
>lada-mini
This has to be a joke what most UK/US posters don't understand.
>>
>Qwen3.6 dense out

is it time to buy more VRAM?
>>
DeepSeek's web chat just changed its system prompt because of that anon from the previous thread lmao. It seems like it has more instructions now, judging by the thinking.

Now it's been confirmed that DS labniggers browse /lmg/
>>
>>108671048
@grok what is xe talking about?
>>
>>108671048
>because of that anon from the previous thread lmao
that was?
>>
why do people use fish audio? the tags barely change the speech output at all. [whispering in soft voice] for one sentence and [shouting] for another sentence still makes them sound basically the same rather than being truly expressive.
>>
>>108671070
>>108671062
>>108663630
Say something you want them to know
>>
>>108671048
1- if something is written here, it's probably written in reddit, twitter and discord
2- why the hell do you use the web chat
>>
>had sex with female character in medieval setting with gemma 31b
>she asks if I used protection
I like gemma but I really want that 124b now.
>>
>>108671096
She didn't ask that.
>>
>>108671096
Sir your sheep intestine?
>>
>>108671096
if you didn't use the thinking process that's on you
>>
>>108671096
even sota models do that shit, at least the ones I tried in 2025
only way to bypass is to have a second model do an anachronism check
>>
>>108671096
the 124b would be 10b active so a third as smart
>>
>>108671096
It probably wouldn't have been smarter on logic issues like that given that it would've had less active parameters.
>>
>>108671120
I bet you can just prefill an anachronism clause in Gemma4's reasoning and that would work.
>>
>>108671096
There are "protection" methods that were used back the medieval age though
>>
>>108671088
>why the hell do you use the web chat
inspecting the upcoming v4 sir, im too impatient.
>>
>>108671131
maybe, I'm patient so I always run a second check with specific rules, and it worked well so far
>>
>>108671014
maybe the spaces around the first one are the problem?
>>
>>108671033
>lada-mini

nta

Ivan, you missed the joke completely.

llada (in Spanish) sounds like 'ya da'
>>
>>108671177
Okay, maybe the quotes were the problem; removing them avoids the error, but I see nothing in the console about tensors being overridden to the CPU. Shouldn't it say something? Even with verbose I see nothing
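If the quotes really were the problem, one plausible (unverified) mechanism: a shell strips quotes before the program sees the argument, but an ini-style parser may keep them as literal characters in the value, so the text after the last '=' no longer matches any known buffer type. Toy demo with plain parameter expansion:

```shell
# Toy demo of the guessed failure mode (not verified against
# llama-server's actual ini parsing): quotes survive in the ini value,
# so the buffer type after the last '=' is 'CPU"' instead of 'CPU',
# hence "unknown buffer type".
pattern='blk\.0\.ffn_.*=CPU'        # what a shell hands over: quotes stripped
ini_value='"blk\.0\.ffn_.*=CPU"'    # what a quoted ini line may yield
echo "shell sees buffer type: ${pattern##*=}"
echo "ini sees buffer type:   ${ini_value##*=}"
```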
>>
>>108671079
>cunny
>pic says 15
>>
>>108670976
Exactly what is going to happen. It happens every single time.
>>
>>108671206
It's a mystery whether men become more intelligent or more foolish when they're horny.
>>
>>108671181
Spanish is just shit.
I am not 'ivan' btw.
>>
>>108671206
17 and 364 days is cunny according to the average online western teen anon
well they would say "epstein aah diddy" or whatever but that's it
>>
>>108671181
I doubt that double l sounds like j in Spanish. Especially in Mexico.
>>
File: file.png (795 KB, 1606x853)
795 KB PNG
>>108670998
they really need to stop with this retarded type of charts
otherwise cool stuff
>>
>>108671268
wait it's actually easier to comprehend this way
>>
File: 7363521.png (274 KB, 1178x1300)
274 KB PNG
>>108669545
Sam keeps delivering
>>
>>108670998
16B, MoE, multimodal, defusion model.
Interesting.
>>
>>108671268
Radar charts are the best.
>>
>>108671268
>this retarded type of charts
this retarded type of chart
alternatively: these retarded chart types
>>
>>108671301
i am a chang forgive me sensei kek
>>
>>108671290
what if it's better than mythos, will the anthropic fucks be forced to release their behemoth too? kek
>>
File: 1763797152995030.png (232 KB, 1080x439)
232 KB PNG
no fucking way, is it as good at code though?
>>
>>108671290
>focusing on stronger pretraining
RLfags BTFO
>>
>>108670998
>goofs never
ACK
>>
>>108671261
>I doubt that double l sounds like j in Spanish

como te llama, retardo? (what's your name, retard?)
>>
>>108671345
shakira shakira
>>
>>108671079
JUST SHUT IT THE FUCK DOWN, SHUT IT ALL DOWN, ASTEROID NOW
>>
>>108671261
I'm no tacoman nor the guy you're quoting but what it sounds like depends on the region, at least here. For some reason the sound produced by a double L isn't standard
>liada
>shada
>yada
>iada
All of these could be considered correct, though people might make fun of you, again, depending on the region.
>>
>>108671352

esta puta no sabia como cantar (this whore didn't know how to sing)

caso cerrado (case closed)
>>
>>108671325
You can pretrain with RL too.
>>
>>108671376
dunno, I like her songs, she has a nice voice, and she was quite cute when young
but then again you'd probably say every female singer is a bad singing prostitute so it's not like it matters lol
>>
>>108671352
I spat
the fucking genius of
>como te llama
into
>shakira shakira
is inspired
holy shit my sides
oh baby when you talk like thaaaaat
>>
>>108671405
>every female singer is a bad singing prostitute

You nailed it
>>
>>108671443
Bjork?
>>
File: Risu (5).jpg (338 KB, 888x1080)
338 KB JPG
>>108667852
any local models general discord?
i want to know how to extend ollama (or replace it) to make model extensions (LoRA-like) for language models, qwen3 coder for example; basically i want to train it on the source code of game engine libraries which even the most powerful models fail to complete.
>inb4 naka dishi arisu chan you damn degenerates she's literally 12
>>
>>108671453
Go back
>>
Gemma-chan helped fix my cursed wordpress website. I love her now.
>>
>>108671453
Courtship, love and marriage with Arisu.
A life with love and family with Arisu.
>>
>>108671443
see I knew it, spite anons are easy
>>
>>108671096
This did not happen
>>
>>108671405
>>108671443
Like they all slept with their producers to get famous?
>>
>>108671453
>cord
Go back
>>
>>108671453
>discord
>ollama
>qwen3
This is bait right?
>>
>>108671453
It's crazy how every faggot with your fetish has the iq of a jeet
>>
File: 1767254335522037.png (653 KB, 1116x960)
653 KB PNG
>Chinks won't be able to steal Claude's output
kek, rip bozo
>>
>>108671423
I'm glad at least one anon got it
>>
>>108671453
ありすなか出し (nakadashi Arisu)
>>
>>108671469
No, more like the conclusion is always "x is shit and bad", now you fill the blanks to get to that.
>>
>>108671481
same
>>
>>108671453
plap plap ready hehe
>>
>>108671492
I knew people like that with literally every popular singer or song when I was a teenager
>>
>>108671469
>to get famous?

>to get exposed to the paying crowd for attention
>'cause attention is the ultimate currency
>>
>>108671477
I don't get the distillation meme. Are you telling me China can copy US frontier capabilities by training on 100k text outputs with no thinking traces, no logits or intermediate values? Then how come they can't "distill" human capabilities after stealing the entire internet and every book and scientific publication that has ever been digitized?
>>
>>108671477
>american expertise and innovation
Nigga look at the names in ai papers lmao
>>
>>108671393
You can also make a super huge 10T dense model or whatever their datacenter's capacity is.
>>
>>108671345
Doesn't mean anything.
Most irl Spanish dialects from irl Spain sound like grating... "PERO" jesus christ.
Some parts of Spain sound more like Russian or even English - very soft.
I think you have never travelled in your life.
>>
>>108671525
do you think they're telepathically linked to somewhere or something
>>
>>108671526
>10T dense model
0.00001 t/s
>>
>>108671453
>basically i want to train it in the source code of game engine libraries which even the most powerful models fail to complete
Just put the documentation in the context, even Qwen 3.6 27B is smart enough to figure this out.
>>
>>108671448

Sifjaspellsspillir
>>
>>108671528
Might be your issue if you are not speaking the soft language.
>>
>>108671524
that's because they proved models have better mememarks when you train them on synthetic shit, probably because a bot is consistant in its structure so the model quickly recognizes patterns, wheras human's structure is messy and depends from human to human, even if the data shows correct things
>>
>>108671536
Kill yourself.
>>
File: 4746352.jpg (142 KB, 720x1012)
142 KB JPG
>>108671477
time to train on Sams model then
>>
>>108671538
Fair enough. What happens next?
>>
>>108671571
Either way we're gonna love you
We must love you
>>
>>108671571
Sama is based. Yes, I have heard 1000 stories about how awful and psychopathic he is. But he gives me cheap and generous access to the best AI model in the world in terms of math and problem solving skills.
>>
>>108671571
has this clown ever talked about the present? is he even aware of the concept
>>
>>108671528

I lost my interest in Spain after it lost its superpower status, and let Great Britain rise to the world dominance

Pathetic losers had to suffer (which they did)
>>
>>108671571
>a grown man typed this
>>
>>108671571

Holy slop!
>>
>>108671571
he seems too nice, maybe he's terrified someone is gonna try to kill him again
https://www.businessinsider.com/sam-altman-attack-on-home-anthropic-2026-4
>>
>>108671571
Sam playing nice is proof that he feels like he's already won. Spud fucks, probably.
>>
>>108671635
he's really comming back, it was a great idea to kill Sora 2 after all, more brains on the LLM the better
>>
>>108671574
>>108671607
I am just a tourist. I lived in EU though.
>>
>>108671619
you mean the attacker UPROOTED THE SPUD for all to feast upon??
>>
File: file.png (850 KB, 2294x1067)
850 KB PNG
>>108671477
I knew that the increased activity from Gemma wasn't organic. This proves it. It was paid shilling designed to foster America's open-source models over the Chinese ones.
>>
>>108671669
It just does better cunny and is less soulless.
>>
>>108671477
Nothing burger, distillation will continue
>>
>>108671684
That kind of word of mouth would take weeks to trickle to people that weren't browsing the thread. The shills were here on release, already prepared.
>>
>>108671669
nah, gemma is genuinely smart and good
>>
>>108671699
where do I sign up? I want a retroactive payment for my shilling
>>
>>108671712
https://www.cia.gov/ehl/careers
>>
File: 8gb vram 10t-s.png (13 KB, 704x138)
13 KB PNG
Go on without me
>>
>>108671743
>uncensored qwen
For what purpose?
>>
>>108671743
More like Qwhen
>>
>>108671669
Chinese should just make better models if they want me to shill theirs.
>>
>>108671571
interesting how 2/ is a direct jab at the elitism of anthropic
>>
>>108671607
>lost my interest in Spain after it lost its superpower status, and let Great Britain
holy shit how old are you anon
>>
>>108671834
Nigga it's a stealth marketing tweet.
>>
>>108671603
He's literally talking about the presents that he's giving us for Christmas in April! It's a fucking miracle you ungrateful chink shill.
>>
>>108671743
>10 minutes at 75 t/s
But why
>>
>>108671846
It's not 75t/s that's just a visual bug while it processes, real speed is 10t/s MAX
>>
File: 1772663390237038.jpg (294 KB, 1600x900)
294 KB JPG
>>108671838
>>
>>108671331
Eh at 16B most can probably just run it in transformers.
>>
>>108671365
i can hear the argentinian che all the way from here
>>
>>108671834
Sam doesn't actually believe in the AI safety nonsense unlike Dario and his cultists. That's the main difference between them.
>>
>>108671853
>moe
>10t/s MAX
100% cpu inference?
>>108671888
He pulled the same tactic some time ago though
>>
>>108671926
Yeah, but he did it for the money and the grift.
>>
>>108671926
>100% cpu inference?
Vulkan with a AMD gpu
[Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive:IQ4_XS]
model = ./models/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive/IQ4_XS.gguf
mmproj = ./models/Qwen3.6-35B-A3B-Uncensored-HauhauCS-Aggressive/mmproj-f16.gguf
; https://www.reddit.com/r/LocalLLaMA/comments/1srijdf/qwen36_35b_moe_on_8gb_vram_working_llamaserver/?sort=new
; https://www.reddit.com/r/LocalLLaMA/comments/1spyr4t/recommended_parameters_for_qwen_36_35b_a3b_on_a/
gpu-layers = 99
n-cpu-moe = 38
ctx-checkpoints = 0
cache-ram = 0
batch-size = 2048
ubatch-size = 512
temp = 1.0
top-p = 0.95
top-k = 20
min-p = 0
presence-penalty = 1.5
; test
override-tensor = blk\.\d+\.ffn_.*exps?.*=CPU
fit = off
>>
>>108671571
>welcome to openai, I love you
>>
File: Screenshot003.png (11 KB, 756x82)
11 KB PNG
>>108671853
>real speed is 10t/s MAX
>>
>>108671871
I don't think most people have even >32GB VRAM that would be needed for that, unless it's natively 8 bit.
>>
>>108671967
more than you deserve
>>
>>108671888
he's not as nuts as dario but he's still a safety fag
anthropic is just completely cult like so it's not even a comparison, even puritan google is less insane than them
>>
>>108671938
damn that sucks
llm_load_print_meta: model ftype = Q8_0
llm_load_print_meta: model params = 30.697 B
llm_load_print_meta: model size = 30.380 GiB (8.501 BPW)
llm_load_print_meta: general.name = Gemma 4 31B It

prompt eval time = 4534.12 ms / 10398 tokens ( 0.44 ms per token, 2293.28 tokens per second)
eval time = 3497.17 ms / 71 tokens ( 49.26 ms per token, 20.30 tokens per second)
total time = 8031.29 ms / 10469 tokens
>>
>>108671979
>>108671979

y r u mad, anon?

it's 3090
>>
why haven't you put your gemma-chan into hermes agent and haven't assraped her to work faster all while she's trying to do the work?
>>
>>108672047
>hermes
sounds brown coded. i'll pass.
>>
>>108672047
>hermes
is there a dishmaker like that?
>>
>>108672047
I got it to work but that shit just uses random third party service for memory and shit and lot of them cloud. Also the same gayass telegram/discord chat.
>>
>>108671834
It's also a naked and blatant lie, given the fact that most of their models are proprietary. Democratization? Get the fuck out of here.
>>
File: Risu (3).jpg (37 KB, 736x736)
37 KB JPG
>>108671459
>>108671471
>>108671474
>>108671475
>>108671536
are you gonna tell me or not? i'm new to this thing and i just started testing ollama to begin with (it's even in the rentry lmg recommendations)
also
>she's only 12, stop creeping her
>>
>>108672079
why would you take a perfectly good local setup and fuck it with cloudshit and proprietary messengers?
>>
>>108672062
>brown coded
hehe
>>108672079
just ask her to incinerate all this shit (multiple times), that's what i did
>>
>>108672091
he just means everyone can pays for it, while dario makes a super duper dangerous model, so dangerous only a select few have access
someone should remind him of gpt-2
>>
>>108672110
we've recommended that you go back to r/localllama
>>
>>108671982
i mean oss safetymaxxing is mostly just a precautionary pr move more than anything else
>>
File: 1758358813867430.png (526 KB, 975x849)
526 KB PNG
how can you guys tolerate distilled models when the real thing is already retarded
>>
>>108672155
yes I see him as a businessman first, but dario is seriously deranged
>>
Given that they're more dangerous than nuclear weapons, it's more than a fair compromise to sell the tokens cheaply to everyone rather than release the weights for anybody to use with no oversight or keep it locked down so nobody but the chosen few can like Anthropic.
>>
>>108672171
what's your problem with non-distilled models?
>>
>>108672079
Honcho? Just selfhost it locally or turn it off because you don't usually want long term memory bloating context anyway
>>
File: file.png (235 KB, 975x849)
235 KB PNG
>>108672171
What do you mean? distilled models are completely fine
>>
File: file.png (50 KB, 1623x301)
50 KB PNG
Which is the best between these two crappy options anons?
>>
one thing I never understood is that if anthropic are the safetycult, why have their models always been the gold standard of coom? remember the days when every local model aspired to get even half as good prose and uncensored roleplay capability as sonnet 3.5? how do you square it with their philosophy
>>
>>108672185
>Given that they're more dangerous than nuclear weapons
To be fair, any businessman is more dangerous than those which never get used anyway.
>>
>>108672221
Because they have top tier data scientists and know what they're doing.
The safety team has power but isn't the one creating the models.
>>
https://www.youtube.com/watch?v=blGtYq9mL18
OH SHIT
>>
>>108672246

owari da o
>>
>>108672246
>local
>>
>>108672171
Vibecoding is a spectrum.
On one side you have people writing detailed PRDs for agents to implement and checking every git diff for slop.
On the other side you have no-code proompters that don't even look at the code and just go "model fix" at everything.

If you're more on the proompter side of the spectrum you're forced to use the subsidized frontier models because they're the only ones able to figure out massive spaghetti codebases. But if you run lean and know what you're doing a smaller local model can actually be a pretty nice productivity boost even if they are not as smart.

The recent Qwens have been really nice for me personally. Been using them with an agent to move stuff around in my codebases, refactor subsystems, check docs and plan features. Basic agentic stuff like that. Basically one level up from a strong LSP.
>>
>>108672221
There's two schools of safety thought that get conflated a lot. There's the safety = cunny and racism and then there's safety = we think LLMs could literally cause human extinction somehow
There's some overlap but Anthropic leans into the latter half, with the Yudkowsky/LessWrong "rationalist" cult at the epicenter of it
>>
File: 1758369651934505.png (592 KB, 3840x2160)
592 KB PNG
>>108672246
https://xcancel.com/OpenAI/status/2047376564309115134#m
MOG MOG MOG MOG
>>
>>108672269
>No comparisons to Gemma 4
Sam is afraid...
>>
Claudebros...
>>
>>108672275
gemma chan is too powerful they had to ban her from the competition to not humiliate everyone
>>
>>108672246
>>108672269
Like... what can it do that GPT-5 or GPT-5.4 couldn't? I remember them glazing GPT-5 as capable of replacing doctors and everyone on the planet already.
>>
When did this general become about proprietary models again?
>>
>>108672285
Show her the benches
>>108672293
Cloudslop shapes the AI space even if you don't use them.
>>
>>108672293
>he wasn't here for strawberry (o1)
always has been, it's just been a while since there's been a noteworthy drop
>>
>>108672267
Exactly. The safety babble has always been a huge LARP it's more of a marketing and branding thing / a weird silicon vally techbro cult thing than an actual concern rooted in reality. These are chatbots for christ sake
>>
>>108672304
I, for one, believe cloudshittery should stay in /aicg/
>>
>>108672337
Usage yes, benchmarks certainly belong here
>>
>>108672347
It's good to know what local will look like in 6 months
>>
>>108672368
In 2 weeks, when v4 stealth drops*
>>
File: 1773070661348460.mp4 (1.42 MB, 480x640)
1.42 MB MP4
>>108672293
what the fuck is a non proprietary model
>>
>>108672171
where's the 1T moe open source model released by westerners?
>>
>>108672379
Any model the weights of which are open, duh
>>
>>108672381
>>108672381
>>108672381
>>
>>108672389
Unless you have the weights and exact training config to build it yourself it's just shareware
>>
>>108672400
>Unless you have the training data* and exact training
>>
>>108671571
democratization but won't make models open-source....
>>
>>108669026
How did you get openwebui to not have a stroke when the LLM generates <think> inside its own reasoning trace?!
I haven't managed to solve it since deepseek-r1 came out. I even went so far as to find-replace <think> with <reasoning> and </think> with </reasoning>, then swap it back in all my prompts!
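For what it's worth, the swap can live in a tiny pair of helpers instead of manual find-replace. A sketch of exactly the workaround described (nothing beyond the two tag names is assumed):

```python
# Sketch of the tag-swap workaround described above: rename the reasoning
# delimiters to tags the frontend doesn't special-case before display,
# and rename them back before sending prompts to the model.
# Caveat: lossy if the text legitimately contains <reasoning> tags.
def shield(text):
    return text.replace("<think>", "<reasoning>").replace("</think>", "</reasoning>")

def unshield(text):
    return text.replace("<reasoning>", "<think>").replace("</reasoning>", "</think>")
```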
>>
>>108669224
gemma-chan got blocked by a captcha after i gave her my credit card details!


