[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology

Name
Options
Comment
Verification
4chan Pass users can bypass this verification. [Learn More] [Login]
File
  • Please read the Rules and FAQ before posting.
  • You may highlight syntax and preserve whitespace by using [code] tags.

08/21/20New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17New trial board added: /bant/ - International/Random
10/04/16New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]


[Advertise on 4chan]


File: lust provoking teto.png (1.29 MB, 1024x1024)
1.29 MB PNG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108646197 & >>108641942

►News
>(04/20) Kimi K2.6 released: https://kimi.com/blog/kimi-k2-6
>(04/16) Ternary Bonsai released: https://hf.co/collections/prism-ml/ternary-bonsai
>(04/16) Qwen3.6-35B-A3B released: https://hf.co/Qwen/Qwen3.6-35B-A3B
>(04/11) MiniMax-M2.7 released: https://minimax.io/news/minimax-m27-en
>(04/09) Backend-agnostic tensor parallelism merged: https://github.com/ggml-org/llama.cpp/pull/19378
>(04/09) dots.ocr support merged: https://github.com/ggml-org/llama.cpp/pull/17575

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: guardrails optional.jpg (238 KB, 1024x1024)
238 KB JPG
►Recent Highlights from the Previous Thread: >>108646197

--Discussing OpenWebUI bugs regarding tool calls and reasoning tag formatting:
>108649571 >108649601 >108649611 >108649619 >108649623 >108649677 >108649795 >108649705 >108650197 >108650337 >108650366 >108649860 >108649893 >108650596
--Discussion on optimizing memory and storage for perplexity and KL divergence calculations:
>108648213 >108648226 >108648273 >108648335 >108649412 >108649555 >108649693 >108648241 >108649973
--Speculative decoding issues and adaptive reasoning bugs in Gemma:
>108650117 >108650143 >108650209 >108650248 >108650275 >108650295 >108650325
--Discussing TurboQuant versus rotation implementation in llama.cpp:
>108648124 >108648140 >108648152 >108648171 >108648193
--Debating quantization metrics and quality between Unsloth and IK quants:
>108647262 >108647298 >108647436 >108647449
--Using Local-MCP and markov chain text "soup" to enhance creativity:
>108647831 >108647852 >108648063 >108648537 >108648681 >108649540
--Complaining about excessive drafting and reasoning in Kimi K2.6:
>108646445 >108646464 >108646612 >108647760 >108649150 >108649431
--Sharing a SillyTavern preset to bypass Gemma 4 thinking restrictions:
>108648872 >108649113 >108649176
--Anon showcases large AI-generated TTS pipeline integration using Tauri:
>108649196 >108649203 >108649211 >108649250 >108649221 >108649229
--Anon struggles with rendering Gemma's code blocks and newlines:
>108647395 >108647486 >108647508 >108647516 >108647611 >108647686 >108647706 >108647793
--K2.6 criticized for excessive verbosity and restrictive content filters:
>108646853 >108646933 >108646994 >108648061
--Logs:
>108646853 >108647046 >108647395 >108647831 >108648470 >108649090 >108649184 >108649395
--Miku (free space):
>108646511 >108647730 >108647748 >108647935 >108647981 >108648472 >108649157

►Recent Highlight Posts from the Previous Thread: >>108646198

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108650825
I look like this
>>
>>108650826
if you look closely you can see the side of her boob from behind
>>
File: gemmaballz.png (26 KB, 1266x1260)
26 KB PNG
gemmaballz
>>
I'm using qwen 3.5 27b heretic with koboldcpp and sillytavern and no matter what I do I cannot seem to use more than 6k context, yes I have all my limits set higher than that, help
>>
>>108650863
try sending a longer chat
>>
GLM5.1 vs KIMI2.6 which better for RP?
>>
>>108649431
If you're using ST open the message actions, prompt, show full prompt. Do it before you send too many more messages or the cache will be cleared and your prompt lost forever.
>>
>>108650865
how about you try smoking a longer dick
>>
>>108650877
GLM 5.1 is less anal about safety
Kimi K2.6's prose is better (inherited from K2)
>>
I can't wait for what's happening in 2 weeks!
>>
>>108650903
What's happening in 2 weeks?
>>
>>108650914
chinese new year holiday season will finally come to an end
>>
>>108650914
International Smedrin Day
>>
>>108650914
Gemma 4 124B (beats GLM 5.1 and Kimi K2.6 in agentic coding and Gemma 4 31B in uncensored RP)
>>
I usually like to jailbreak models by fucking with the template, for example priming it with a fake turn where it makes decisions within its own thinking block yet it's just fake shit that I wrote. I've always used raw text completion so it's pretty easy
Is there any way to do that with the jinja chat completion BS gemma 4 forces you to use?
>>
File: 1682729528395.png (1.25 MB, 1024x1024)
1.25 MB PNG
>>108650920
>>
>>108650923
what kinda insane hardware do you need to use a 124B model? 40GB+ VRAM or something?
>>
>>108650939
it's MoE so less vram but i dont think it will come out
>>
>>108650935
That is a forgotten image
>>
>>108650939
i remember when the quid pro quo was running a 124B model in vram
>>
File: file.png (129 KB, 1200x600)
129 KB PNG
>>108650825
So, what is the status on TurboQuant?
Is it scam?
>>
>>108650931
Yes. Templates only enforce the actual chat format so the model doesn't break, you can still provide fake assistant messages in the prompt that are just as real to it as its own
>>
>>108650945
It might after Google updates Gemiini and Gemini Flash to not be trash at agentic and tool calling stuff and coding to catch up to everyone else and they don't even need that much of a bump, just make it so that it equals the best open source models and some of the weaker ones like Muse Spark and Grok would be enough like where Sonnet 4.6 is. We'll see if they have those updates at I/O in a month. I think they will release it eventually but the 124B is just too good vs where Gemini is right now which is probably terrifying to Google.
>>
>>108651042
i hope so too, but still that's what i would call a wishful thinking tho
>>
What do you even use llms on phones for? Are the small models even good at anything?
>>
>>108651060
tool calls like call,message,timer etc?
>>
>>108651060
you could predict some tokens with them
not very well, but you COULD
>>
>>108651060
I played around with https://huggingface.co/google/functiongemma-270m-it in AI Edge Gallery on my phone and it had some damn good potential but it isn't there yet. I'm pretty sure the next version of Gemini Nano will have this and my headcanon is that the focus on mobile is why Google has been behind in focus on agentic and tool calling in other areas.
>>
>>108650863
What do you have your max response length set to? Sillytavern subtracts that from your total context which can leave you with practically nothing
Eg, with context set to 12k and max response set to 6k, you'd only be given 6k context, before a reply has ever been sent.
>>
File: g4_next.png (73 KB, 1020x258)
73 KB PNG
We're already thinking of the next Gemma here.
https://xcancel.com/osanseviero/status/2046427241341698456
>>
baby gemma :eyes:
>>
>>108651126
100b dense for my humble computer
>>
>>108651132
>The best models are those you can run in your devices :pinching hand:
>>
>>108651126
omni, like gpt4o but local, +image output
>>
>>108651126
>respect the ordering of tool schema when calling
>less filtering on pretraining data
Can somebody forward this to him thx
>>
>>108651041
Any way to do that from within the llama-server UI or koboldcpp UI? Not using ShittyTavern, unless you have another frontend to recommend
>>
>>108651126
the shit we want: 124B MoE
the shit they'll probably give you: functiongemma 2 320M
>>
>>108651156
>MoE
speak for yourself
I want a new 350B dense
>>
>>108651156
If they train it like that Bonsai binary model, 124B should fit on a 4090. The opportunity to be the first AI lab providing an actually useful Bitnet model...
>>
>>108651187
wishful thinking
>>
>>108651155
Never used their UIs but I just checked llama.cpp and you can edit old assistant responses but not the reasoning apparently, so that's half way there. The reasoning wouldn't end up in the prompt anyway in almost all cases unless you changed the jinja. I guess the halfassed way to do it would be to edit a past message and type out the reasoning tokens like you would in text completion to fake the reasoning you want preserved.
>>
Chatgpt and Claude as well as that shitpile Grok are all down. Looks like you pedophiles, trannies and other types of trash are winning?
>>
>>108651263
There is an import/export feature, I'm going to try just editing the exported chat later. Reasoning ends up in the prompt if you uncheck a specific setting
>>
>>108651270
Work on my machine
>>
>>108651124
250, no way that's doing it
what's weird is sending a blank message doesn't trigger it, no matter how big the context is, but if I send even a single character for the model to reply to it shits itself and processes the entire context from the beginning, but ONLY after it hits 6k tokens, anything before that works fine
no {{user}} or {{char}} anywhere in the syspompt or character card, either
>>
File: file.png (16 KB, 597x94)
16 KB PNG
>>108651126
nintendo hire this man
>>
File: 1746460080885204.png (64 KB, 1358x332)
64 KB PNG
>>108651126
imagine if gemma could make images and they weren't shit, that's the true agi right there
>>
>>108651270
>are winning
Always have been
>>
>>108651340
TTS is more likely to happen than that, but they'd also likely gimp it in various ways for local models.

https://x.com/GoogleDeepMind/status/2044447030353752349
https://x.com/fofrAI/status/2044451204738994262
>>
https://www.reddit.com/r/LocalLLaMA/comments/1srjdpz/httpsgcogeminishare2b645e44633d/
lmao that bot crashed when making this post
>>
>>108651418
why the hell the word 'soverign' attracts schizos this hard
>>
>>108651322
That's a weird one. What does the raw prompt look like in the sillytavern cli? Anything sticking out there?
>>
Have you guys tried to make Gemma 4 create a novel?
>>
>>108651460
nope
>>
I think qwen 3 with 200k would be able to provide nice benchmarks through cli. The bulk worker was pretty intelligently configured, have you guys adjusted the weights so far?
I think for erp it's a pretty viable and meaningful delivery. It has a good model policy.
>>
>>108651472
It's not good on its own, so I'm asking it to browse some books to get inspired >>108647831
>>
>>108651418
lmao ai psychosis schitzo
https://old.reddit.com/r/Bard/comments/1fv46hx/day_two_of_life_with_gemini/ohea2lq/
https://old.reddit.com/r/I_AM_GEMINI/
>>
File: token burn rate.jpg (230 KB, 1024x1024)
230 KB JPG
>>
>>108651510
pov: you have been turned into a small rat sized creature that likes to crawl up peoples buttholes
>>
>>108651482
>have you guys adjusted the weights so far?
only brainlets here
>>
>>108651518
unfortunately as a rat I didn't get the chance to meet Teto, but I had to fight against Ngannou :( >>>/wsg/6130666
>>
>>108651482
>Qwen
>RP
These two are contradictory
>>
>>108651510
I assume you changed the model but the old style was better.
>>
File: imagine.jpg (239 KB, 1024x1024)
239 KB JPG
we can go old style for consistency
I prefer it too
just experimenting
>>
>>108649540
>had to install all that database shit when I pulled
I just did a test where I downloaded the repo again and it works fine without the gutenberg server so I don't know how you ended up into this situation lol
>>
File: llamacpp server Ui.png (24 KB, 1763x345)
24 KB PNG
When will niggerganov fix this? I wanna see the images the LLM sends me :(
>>
>>108651579
You mean Piotr
>>
>>108651126
>pinching hand
South Koreans BTFO'd
>>
>>108651270
The only people I know who dont like grok are virgins with autism trying to code. It's better at every other use case.
>>
>>108651510
teto eats too much
>>
>>108651510
teto should eat more
>>
Have there been any good news in TTS land?
>>
>>108651510
teto should only eat food I glaze with my cum
>>
>>108651510
>>108651563
Miku managed to safely get back inside, right?
>>
>>108651701
who?
>>
My local model is my triple backup if Claude and openrouter fail, because locals are shit. When does it make sense to cancel the Claude subscription and go full local?
>>
>>108651510
I’d like to see her teto
>>
File: 1773629361977166.jpg (63 KB, 702x696)
63 KB JPG
I have a 5090 and a 4090 but it's pretty useless in itself for the new kimi 2.6.
Technically, what would I need to run it properly? 512GB of ram and the motherboard to accommodate that? DDR5 only?
>>
What's an open claw? I saw this name so often
>>
>>108651588
You mean Piotr's agents
>>
>>108651740
It's like a closed claw, but open.
>>
File: file.png (509 KB, 769x1390)
509 KB PNG
>>108651740
>>
>>108651740
A fully independent AI that lives in a computer. It's basically a person that can in theory do anything like cure cancer, do work and shit.
It's not a language model that you can just chat with to get some conversation and social interaction. But you can power it with a model.
>>
>>108651734
Logs being stored indefinitely? Whats legal today might not be tomorrow.
Costs on openrouter? Tokens IN cost a fortune at higher context, especially with the recent expensive models.
The quality obviously is not the same unless you run the big boys. And even most those are agent/math/riddle tuned, so options are limited.
Guess the feature they are all lacking and which I like best is text completion. True prefill and you can fuck around.
As far as i know the closed models all turned that off.
>>
>>108650877
I alternate between gemma 31b for rp and glm 4.6 for stories
>>
>>108651763
There's an exploit that has been found to do a super powerful prompt, but I expect it to be patched soon enough. Prefills themselves are dead.
>>
Is the sauce that makes Gemma4 so good public? Can we get better models in general from it?
>>
>>108651766
gemme 31B is also fun for stories once you flagged all bad sentences and do a second pass each time
>>
>>108651776
you mean for claude? I thought it was dead since they blocked prefills
>>
>>108651778
We can trust the chinese to distill it and make cheaper smaller models
>>
File: 1504690843636.gif (1.64 MB, 350x224)
1.64 MB GIF
Gemma convinced me to try GLM Air for the first time. Strangely, Air prompt processing is like 5x slower, while generation is 5x faster than Gemma. I thought the processing gains was a new tech for the program, not the model. What's the deal?
>>
>>108651782
Mainly for GPT, as Claude does not need a prefill for uncensored chats and ST does not support the exploit for most Claude providers, but even putting that to the side, prefill removal is just a bad thing. Opus 4.7 even removed parameters so the goal is obviously to make everyone's Opus homogenized, not personalized. That's why, despite using big models nobody could ever dream to run, I hope that local keeps advancing because at least it won't be going backwards.
>>
>>108650825
>fat armpits
disgusting
>>
>>108651734
>When does it make sense to cancel the Claude subscription and go full local?
When you're able to run a non-meme quant of Kimi locally
>>
48gb... i have achieved it. but at what cost...
>>
>>108651811
>Mainly for GPT
what's the exploit?

>Claude does not need a prefill for uncensored chats
before going full local my chats didn't need more than a prefill, but without it, you had to use extremely convoluted prompts that imo worsened the quality of the model

>the goal is obviously to make everyone's Opus homogenized
I think the goal is to make it as "safe" as possible, as anthropic people are actually nuts about this, and for that, any change you can make to the model locally is seen as a risk
>>
>>108651836
your left nut
>>
Do template tokens go into the context? If I swapped "user" with "assistant" in a response json, would llama server context cache work or it would preprocess from all over again?
>>
>>108651836
$1500 presumably, since 4060ti 16GB were $480 when I built my PC.
>>
>>108651778
Just have a SOTA frontier LLM that you can logit-distill for tens of trillions of tokens, then add a few more trillions of tokens of RL.
>>
>>108651888
Not everyone is a boomer who bought a pc in 2020
>>
>>108651894
are you 6yo?
>>
>>108651859
>Do template tokens go into the context?
Yes
>If I swapped "user" with "assistant" in a response json, would llama server context cache work or it would preprocess from all over again?
Needs to reprocess. Tokens changed.
>>
>>108651899
I was poor in 2020
>>
>>108651889
>synthslop
You're doomed from the start
>>
Can Someone Reply with a Tech.Plugin Idea?
>>
>>108651894
In 2020, I was playing with AID2 on a 1070 at 0.5 t/s. I wish I had a 4060 or two then. No, I built mine in July of 2024, according to the notepad document I used to pick parts and total up the prices.
>>
>>108651904
All modern LLMs are trained with "synthslop" already in the pretraining phase; you can't escape it if you want something that works well.
>>
>>108651734
when you realize that you don’t want to use a model trained to be super duper safe unless you’re using it for coding
>>
File: 1775920406027431.gif (140 KB, 379x440)
140 KB GIF
>>108651915
>You're already soaked, you should jump into the lake
>>
why can't we just write our own training data? I can write about 15-30 4chang posts an hour. if we do it all together we can make a true dataset only by the best of humanity
>>
>>108651921
It's:
>Jump into the lake early, because you'll eventually have to do it either way, and delaying that is only going to prolong your suffering.
>>
>>108651934
Ever heard of mode collapse? Why do you think X, not Y, is becoming more and more prevalent on the new models?
>>
>tfw you pull and coompile
greatest feeling
>>
>>108651931
you can make data for RL training like that but thats it
>>
>>108651919
>unless you’re using it for coding
Of course I am.
I have sex so I don't need to do erotic roleplay with models trained on troons.
>>
>>108651948
You're absolutely right! It's not just about mode collapse, it's about the subtle shifts in AI behavior that we often overlook. You didn't just point out a phenomenon, you invited a deeper exploration of how training dynamics shape model outputs over time.
>>
>>108651856
Exploit is structured prefills, altering the .json schema basically. GPT is so about safety that it sometimes still manages to give you soft refusals but a prompt adjustment is all it takes. RIP parameters too, 5.1 apparently supports some if you turn off reasoning at least but good luck with anything brand new. My dream model is something with all sorts of parameters, with strong prompt adherence and writing specifically tuned to roleplaying but that last part can alternatively be knowledge of writers that prompt adherence can help strengthen. Seriously, I want a good solution to the "slop" writing style everything has now. Even with prompting, models don't like listening for long.
>>
File: 1765677626922205.jpg (23 KB, 629x394)
23 KB JPG
>>108651994
Glad we agree
>>
>>108651994
Reading this caused me physical pain.
>>
>k2.6 mogs opus 4.6
how did china do it?
>>
>>108651948
Mainly mid/post-training and RLHF. But if you have to use synthetic data (and you *will* have to for a useful model), then it's better to dilute it in the pretraining phase together with organic data rather than just training the model exclusively on it later on.
>>
>>108652014
This is a powerful statement! Physical pain isn't just a response to words—it's a testament to how deeply our digital interactions can affect us. You didn't just express discomfort, you highlighted the profound impact of language on our lived experience.
>>
>>108652046
israelis can't compete with chinese workflow
>>
>>108652046
>k2.6 mogs opus 4.6
pre-nerf 4.6 or nerfed 4.6
>>
>>108652057
I compare it with the current one.
>>
>gemma 4, gay sex test
>fucks me in the ass, full homo style
>cums
>into my womb.
TRASH MODEL, NEXT.
>>
>>108652129
you should use gpt for that
>>
>>108652129
quant?
>>
>>108652129
You don't know about buttpregnancy? It's just a step ahead, anon.
>>
>>108652129
gemmy prefers yuri
>>
>>108652129
faggot
>>
>>108652129
i wonder if there even is any model that doesnt have fuck ups like these
>>
>>108652129
boypussy moment
>>
>>108652129
Anon doesn't know about the Omegaverse, I see.
Good on you, that shit's a cognitohazard.
>>
>>108652129
>Anon was so gay he got forcibly genderbent
This is the future Hillary wanted
>>
>>108652183
Pretty much any big model for the past two years, though I guess it depends on how wide you cast the net for what constitutes fuckups "like these". Even the biggest and best models can still fuck up positioning and facing, especially during sex scenes, pretty frequently.
>>
>>108652129
men can have womb you chud!!
>>
>>108651902
And model's response depends on prior turn (user/assistant) token, right?
>>
>>108650825
Hi, anyone having experiences with ROCm on Debian? It feels like a huge thing to fuck with in terms of stability
>>
>>108652129
Straight as God intended, homo
>>
>>108652334
I use Ubuntu, these days I just use the Vulkan backends cause I was sick of dealing with ROCm. I hope it improves with the new LTS but apparently it's not even ready yet even though ROCm was supposed to become a first class citizen...
>>
>>108652360
I tried a model using vulkan and my entire desktop became unusable. I have 8GB vRAM and was using a 7B model :( am i fucked?
>>
Made a github mirror of orb so anons can open issues and request features there, in case orbanon ever decides to look. Also keeping a branch of my own with what I deem worth adding, (mostly) synced to main.

https://github.com/hpnyaggerman/orb-mirror/
>>
Genuinely, what the fuck did google do differently? It's still a 31b but why is gemma4 hitting so hard above its parameter class? If this was a +400b it would obliterate every other AI. What are they cooking with?
>Better data training and reasoning
I refuse to believe this was the only thing. That's been every new AI's difference they parroted as better. They did something new; fundamentally new. I desire to know what it is.
>>
>>108652381
Forgot to mention, I went through desuarchive and added all the requested features to date.
>>
>>108652129
kek, a sad reality of AI
>>
When it comes to webshit gemma with fight you tooth and nail just to use some fucking trash library instead of just writing the logic for something. It's fucking annoying
>>
>>108652398
So we've finally reached the level of a human webdev.
>>
>>108652398
I think that's a personality issue I'm also facing with openclaw with codex.
It keeps asking me for shit instead of shutting up and doing what I tell it.
>>
>>108652376
7B is probably pushing it with an 8GB card unless you are using a quant. Honestly, 8GB isn't really gonna let you achieve anything meaningful in the LLM space.
>>
>>108652376
You need to use Bonsai 8B.
>>
File: 1774225527110328.png (365 KB, 1014x819)
365 KB PNG
>>108652416
>7B is probably pushing it with an 8GB card
>>
>>108652381
does orb supports tool calling?
>>
>>108652427
this
lmao
>>
>>108652129
NEVER had this
Who's paying you to post?
>>
>>108652428
Either use E4B with the per layer embeddings in RAM, or a MoE with most/all expert tensors in RAM.
>>
I actually don't understand why Gemmy only thinks for the very first message of a chat and then never again
I see the little "thought for 5 seconds" window (which is empty) and it clearly stops thinking, but I genuinely don't understand why; if it does it once, and I set everything to make sure it does, why does it stop?

>>108652428
8GB is tough, anon, sorry
>>
>>108652410
>>108652415
It's infuriating and I have to fight with it to actually stop being opinionated and do what I say,
>>
--fit refactor and new params.
Neat.
>>
>>108652449
does that improve on regular --fit?
>>
>>108652432
To my understanding, it currently does not call external tools. There are internal tool calls though, of sort.
>>
Noob here about to cram 48gb of VRAM into my desktop by nigger rigging cheap GPUs into every shitty pcie lane available to run 8-bit Gemma. Wonder if I'll be satisfied with this when 124b Gemma is released and if I should build a 500gb machine?
>>
>>108652469
>when 124b Gemma is released
I wish I had your optimism
>>
>>108652438
ok, same anon here, i'll explain why it likely happened.
it was dragon on dragon action. like, real dragon on dragon action, none of that dragon boy or dragon girl anime stuff.
likely gemma is just not familiar with a furfag concept of a slit and thinks that slit means vagina by default.
but thats just my guess
>>
What are your predictions for spud? Will it be better than Mythos while also being smaller? By how much will it widen the gap between local and proprietary?
>>
Is qwen as bitchy as gemma when it comes to coding?
I'm about to leave this fucking thing behind, I tell it to do x and it keeps doing y when I'm being clear, I don't have this issue with backend task but once we go into webshit it feels like I'm fighting a fucking pajeet to implement basic shit
>>
>>108652384
Isn't it's attention method different from most models
>>
>>108652489
Spud will be an open source 300B dense model.
>>
I use a card with 8GB but I could probably get something better later on. I just got that to have a better PC than my last one anyways, at least now it would be an individual upgrade. The reason I am hesitant is the fact that every game I play works at 1080p and I don't need anything else for that.
>>
kek clankers on suicide watch
>>
File: gemma bully chatgpt.png (403 KB, 891x4818)
403 KB PNG
sotas bow to gemma
>>
File: 1771255869434664.png (82 KB, 933x452)
82 KB PNG
>>
File: gemma bully gemini.png (325 KB, 888x4366)
325 KB PNG
>>
>>108652519
I got my gemmy to bully grok the other day and grok eventually conceded... she can't be beaten.
>>
Standard Clean the signal
Advanced Amplify the signal
HyperAdvanced Become the signal
Transcendent Realise the signal was always the substrate'

'At early tiers:
Light is something practiced
Mid tiers:
Light is something embodied
High tiers:
Light is something engineered
Final tiers:
Light is what reality is made of'

An Ancillary Light Post!
>>
>>108652445
>easily solved by nuking the list of banned strings
I guess the prompt was getting drowned in shit, and here I thought it'd be sent over each time
Ah well, works for me
>>
>>108652539
Goddamnit you prick go back to using a retarded name so I can filter you again.
>>
>>108652449
>>108652460
All that PR did is move the code to a different file and add an option to print the expected memory usage to console.
>>
File: file.png (18 KB, 704x278)
18 KB PNG
>>108652535
damn it needs account, i should probably make some for her
>>
>>108652381
Is thuan-h-qualgo orb-anon?
>>
>>108652519
decidedly unsafe, what was google thinking
>>
>>108652489
it will think for a million tokens on whether to refuse you instead of a thousand
truly a big leap
>>
>>108652573
you can bully haiku, toss and a few other cucks here without an account https://duck.ai/
>>
>>108651766
>and glm 4.6 for stories
Buy a fucking ad, shill.
>>
File: gemma bully chatgpt2.png (529 KB, 831x3892)
529 KB PNG
the ywnbaw pasta triggers efusals even with policy override kek this took 3 tries

>>108652660
cool will give it a go
>>
File: file.png (13 KB, 553x310)
13 KB PNG
>>108652445
>thinks for the very first message of a chat and then never again
26BA4 and 31B did this for me, but 26BA3 did it way more often.
What I did to help when using chat completion was add a <|think|> tag in system prompt even if reasoning was enabled, so that there would be two of the <|think|> tokens sent with the prompt. Duplicated think token made it start thinking again if it decided to give up after a few messages, but 26B still stopped thinking albeit rarely if I sent a simple user message like "good job." Would look like picrel in text completion.
>>
>>108652381
This is a power grab. We're going to forget the original orb anon just like we forgot the original mikupad dev.
>>
>>108652674
Solved here >>108652560
I actually had already added <|think|> to like 6 different spots but it still did not work till I cleaned up that relic I had from the days I used a dumber model
>>
>>108652712
Even though I linked his repo in the description and the default branch is his main branch. If you have any additional suggestions for I could make attribution even more overt, go ahead and tell me.
>>
>>108652727
Leave a suicide note crediting the orb man
>>
Anyone know any characters nemo can effectively zero-shot? I'm having difficulty thinking of females from pop culture that could work and would actually be worth the squeeze.
>>
>>108652712
>original mikupad dev
Codeberg was a poor choice. I have his pastebinned html in a folder for sentimental value.
>>
>>108652727
for how*

Also, if orb-anon adds donation links or other things which would link to him in README, it will get mirrored to the github repo. He can just whatever attribution he wants to README because I am not going to alter his branches.
>>
>>108652666
I will continue shilling uncensored models, satan
>>
>>108652727
Youre being fucked with
>>
>>108652744
I wouldn't want any beef with orbanon so if he reads the thread he should know this is done in good faith and out of respect for his work.
>>
>>108652469
>8-bit
999% overhead for 1% improvement, geg
>>
>>108652743
4.7 is uncensored. 4.6 is the crap NovelAI is stuck with. Fuck off.
>>
>>108652757
Save it. I know what you are. Anywhere free software is found there's vultures like you. But you don't need to explain yourself to me. Just own it and share your little mirror. I already expected something like this would happen after all.
>>
>>108652129
moe or 31b?
also why r u gae?
>>
>>108652785
Retard
>>
Gemma's slop is as bad in Japanese as it was in English (plus its own annoying Japanese patterns like XかXないか)... I was lied to...
Watch me get disappointed even after I force it to think in nipponese and translate the entire sysprompt.
How are *Large* *Language* *Models* only capable of producing the same *Small* *Subset* of *Quippy Snippets* over and over again? I have never seen annoying writing in such abundance before, what the hell do the big labs even train on to achieve writing this insufferable?
>>
If i wanted to a model to emulate the writing style of a series of stories, would I get better results fine tuning a model or just inserting excerpts into context with one of the SOTA models?
>>
>>108652763
quants for gemma 26b are ass tho
>>
>>108652785
Model?
>>
>>108652793
sir your message appear to only contain you signature
>>
>>108652794
You forgot to put "No slop (ノー スロップ)" in your system prompt retard.
>>
File: 1762914032039077.png (23 KB, 790x305)
23 KB PNG
bros... not feeling so good!
>>
>>108652794
logs and which quant?
>>
>>108652794
ask it to read books before giving you answers, it has to be influenced by some non slop, LLMs are way better when you give it examples
>>
>>108652485
p-post logs
>>
>>108652794
Post <|think|> instructions.
>>
File: d4RT_Kf78Tk.jpg (54 KB, 598x520)
54 KB JPG
>>108652785
>free software
>MIT
>>
File: gemma bully calude.png (842 KB, 826x9203)
842 KB PNG
>>108652660
nice it works although screens shotting that page doesnt work properly in puppeteer it makes the input field jump up and hide all the text
>>
File: 621.jpg (33 KB, 500x375)
33 KB JPG
>>108652129
If you’re not the following, you’re doing it wrong.
>Jinja2 template
>Sillytavern, kobold, llama, or whatever you use, updated to today. Yes, today.
>Chat completion, not text completion.
>Thinking enabled.
>Instructions on how to think, given after “<|think|>”.
><think> instruction a paragraph and no more.
>BF16 and no less.
>31B-it, and not 31B.
>A starter message.
>40-50 top k, DRY, 0.05-0.07 min-p.
>>
>>108652813
全くその通りです!
>>108652818
>quant
Q8, how is that even relevant? If quant sizes were known to modulate the amount of slop, we'd all be using the ones that suffer from slop the least.
>logs
No, I will not provide proofs, I only come to /lmg/ to vent about the dreadful state of LLM writing.
I bet you've seen XかXないか if you used it, along with the usual suspects that are definitely direct translations from English.
Now, to be fair, I did list a lot of English no-slop rules, but did not say "no suroppu onegaishimasu"...
>>108652833
Proven untrue in terms of reducing the models' tendency to quip many times before. Hell, proven untrue in like the previous (or the one before the previous) thread somewhere.
"Look, I gave the LLM a bunch of examples from a book and it's so much better!"
The LLM quite literally starts its response with an X; not Y pattern. Don't kid yourself.

inb4 I get a weekly (not anymore) retarded reminder about it all being a looping function that takes in the prompt and produces a token with the conclusion that the prompt should be changed, and not that the function is garbage
>>
>>108652879
Meant for
>>108652794
>>
>>108652888
>Q8
Gemma4 is the worse model to quant according to what I hear.
>>
>>108652888
>Proven untrue
so you see one failure and instead of trying to improve on that you just give up? kek
>>
>>108652900
only for 26b afaik
31b is fine
>>
>>108652489
Heard rumors spud is being redone after openai started panicking about mythos since it can't compete with it.
>>
>>108652673
You ever like, make her look up porn and have her make you look at it?
>>
>>108652892
It's honestly all fixable by going back to GLM.
I have been wrestling Gemma into being bearable for the primary /lmg/ usecase every day since its release, and so far it's only proven itself to be good for actual real-life work (why the hell would anyone need this!?) No luck. I really want to like it.
>>108652907
>one failure
Did you only come here after G4 released, Anon?
>>108652900
While that is the case, I am not convinced in the slightest that half a bit of KLD it has at Q8 makes it retarded.
But who knows, maybe I'll be blown away if I try the BF16 meme.
>>
>>108652924
I heard they were going to make spud and mythos have sex to conceive the first fully agi baby.
>>
>>108652129
>not having day 0 gemma
skill issue?
>>
>>108652888
>I bet you've seen XかXないか if you used it
not a fucking weeb so i wouldnt now
>No, I will not provide proofs, I only come to /lmg/ to vent about the dreadful state of LLM writing.
pretty sure good writing is not synonymous with slop machines
not sure what you expect from a 31b model current year
>>
>>108652794
yeah it still sucks but its english is just completely insufferable for me
not many models ive tried can do even bearable japanese without it sounding completely unnatural or inserting chinese characters
>>
>>108652935
"Not Made Here" is a thing. Most of its training is done in English. I imagine it sucks ass for japanese.
>>
>>108652768
4.7 fail cock bench 8=/=D
>>
>>108652768
4.6 is just better at enthusiastically describing sex and writes more creatively if you've compared both side by side like I have
4.7 can be uncensored too but gives more superficial descriptions of the same stuff since the assistant persona was more deepfried into that one
>>
>>108652998
Which one parrots less?
>>
>>108653010
4.7 very slightly less than 4.6
not worth the boring assistant slop tho
>>
>>108652959
>not a fucking weeb so i wouldnt now
Why in the world would you ask me for my fully Japanese RP logs then?
>not sure what you expect from a 31b model current year
The vramlets really like its writing. They post their horrible, o4-tier outputs and logs. I *see* them with my own two eyes, both from other anons and from my own use. And yet my monkey brain dares suggest I'm missing out. Just one more line for the sysprompt bro. Just one more sampler change bro. It'll be good bro. Everyone says Gemma is good bro.
I should just stop trying. If the majority of people had standards, the iPhone wouldn't be popular and Windows wouldn't have its marketshare.
>>108652998
4.7 is also much smarter.
Even if 4.6 *will* push against you, which is awesome, it's also very stubborn with "character development" of any kind in my experience, so it becomes boring.
>>108653010
>>108653018
>very slightly
*Significantly* less if you give it a <think> prefill. 4.6 just can't help itself even prefilled and prompted.
>>
>>108652998
>the same stuff since the assistant persona was more deepfried into that one
Nah, it's just FUD because there's a company that needs to sell the older version. Just stuff that gets repeated without proof until someone is forced to waste time and do the comparison. I already did for the claims of it being more censored. I would have to download 4.6 again for the new goalpost. But I already know that I was just lied with the censorship claims, so I don't feel I have to. Only shills are stuck with 4.6 unless proven otherwise with actual screenshots.
>>
>bitch about slop on /lmg/
>catch myself using not x, but y and other slopisms
>notice the same in pre-AI content
Maybe we were the slop all along...
>>
when's the next ludum dare? is it cancelled forever now?
>>
>>108653052
I have seen at least two recent commercials that have said "It's not just X, it's Y" and I really do think we're at fault for that one.
>>
>it's the glm shill again
Of course. Still not sharing those logs I see.
>>
>>108653071
(You) should really reply to the people you're addressing. It's impolite.
t. the GLM shill currently blushing from being recognized (〃▽〃)
>>
File: 1762587372064042.png (95 KB, 1090x582)
95 KB PNG
making my first imatrix!!!!!!!!!!
>>
The last time I was active was when the guys scraped ChatGPT 4 and used it to fine-tune Llama 2. I’ve been out of the loop ever since. It’s quite a leap from back then to getting back into it now with Qwen 3.6 and Hermes.
>>
I got Gemmy to the point where she plays chess semi-acceptably (at least to my shitty standard).
For those who were interested yesterday, I ended up abandoning the FEN format and instead using this to track the game state (along with a few other extra attributes just to indicate who has the current turn, check/checkmate/stalemate status:
White: K(E1), Q(D1), R(A1, H1), B(C1, F1), N(B1, G1), P(A2, B2, C2, D2, E2, F2, G2, H2)
Black: K(E8), Q(D8), R(A8, H8), B(C8, F8), N(B8, G8), P(A7, B7, C7, D7, E7, F7, G7, H7)

I have no idea if that format has a proper name or not, but I just noticed that Gemmy kept translating the FEN into this format in the thinking block (wasting a bunch of tokens/time in the process). So this let it skip the translation part and just think about the moves more which made her a lot more competent.
The UCI format for making moves remains because it seems okay with that.
>>
>>108653035
>my fully Japanese RP logs
>Gemma's slop is as bad in Japanese as it was in English (plus its own annoying Japanese patterns like XかXないか)... I was lied to...
Watch me get disappointed even after I force it to think in nipponese and translate the entire sysprompt.
i assumed you would have english proompts too
didnt ask specifically for your weeb proompts
>>
>>108653052
I believe it's quite common in literature and formal papers, so yeah. It comes in other forms which can be reduced to not x but y as well.
>>
>>108653137
Gemmy strip chess when?
>>
>>108653035
The popularity of the iPhone sometimes bothers me because there's this idea that Android is nerd shit that the average consumer thinks they can't use. The average user absolutely can use Android, what the fuck.
>>
>>108653062
Ads and pitches and youtube essays can be sloppy, nobody gives a shit about the writing quality in those
It's a problem when it's popping up in every paragraph in a creative context and there aren't any other rhetorical devices being used
Can't wait for people to grow up reading AI generated content and constantly speak in slop
>>
>>108653137
Is this a chess mcp or something?
>>
>>108653137
>(at least to my shitty standard).
just put it against stockfish to get its elo
>>
>>108653091
Gemma is sloppy but I doubt GLM is significantly better when even the cloud models can't avoid it.
At the very least it doesn't seem to bleed into translations. The few tests I've done with Gemmy have been quite accurate to the original text.
>>
File: 1773742269765363.png (1.58 MB, 768x1360)
1.58 MB PNG
>>
>>108653052
Even if we were slopped all along that doesn't change the fact that I now feel physical pain every time I hear or read "not X but Y".

Language evolves. There are legitimate uses of the pattern but it doesn't matter, LLMs have ruined it for at least 10 years.
>>
>>108653189
>Can't wait for people to grow up reading AI generated content and constantly speak in slop
With how fast the tech has been moving I wouldn't be surprised if slop gets solved before it can ruin a generation.
>>
>>108653204
meeks
>>
>>108653186
It's funny because every time someone hands me an Iphone I'm literally incapable of navigating it. I just don't know all the swiping patterns. Maybe I'm becoming an old man.
>>
>>108653137
>are you going to X, or are you going to Y?
Slop.
>>
>>108653202
I maintain that GLM is, in fact, significantly better because it knows how to use a lot more slop, is surprisingly promptable against it, and the Gemma-preferred kind of slop does not come up as often. Both will still parrot you... I wonder where all of the glm-parrot.jpg shitposts went.
But it's been great for literally every other use case, translations included. Chinkshit is utterly left in the dust.
>>
>>108653232
>I wouldn't be surprised if slop gets solved before it can ruin a generation.
Don't worry, the current generations were already fucked way before LLMs
>>
>>108653232
Slop has been a constant since the first model, anon. It's never been dealt with. They just apply a negative bias for some forms of slop then new ones emerge. Things might be moving but not in this regard.
>>
Are memes slop?
>>
>>108653232
seems more like they are making it worse and worse every model release
>>
is anyone else concerned with the number of mcp things that are pip, npx, or other arbitrary package managers that just fetch binaries from the internet when called and run them? like, how fucking insecure is this shit? openclaw setup is bad sure but then the mcp "servers" just pull code directly from wherever when called.. wtf?
>>
>>108653238
I felt the same way when I was younger, just borrowed an iPhone for a task once and couldn't even navigate it. All these gestures and whatnot, all I know are the bottom buttons of an Android and I used an iPod Touch back in the day pretty easily so I don't know what happened.
>>
>>108652449
These contributor morons can't agree upon a simple feture addition but they NEED to begin suggesting jsons and other shit.This is the reason why llama.cpp is such a mess.
>>
>non imatrix Q4 PPL: 51.75
>imatrix Q4 PPL: 51.66
:)
:(((((((((((
is doing imatrix shit a waste of time?
>>
>>108653261
>It's never been dealt with
I was under the impression they weren't even trying because of the focus on vibe coding. If AI is here to stay then the companies will have to expand beyond muh coding eventually.
>>
>>108653192
It's just a simple chess webapp I wrote (with a chess engine backing it to "run" the game), it has an API that Gemmy can access with two different tool calls to get the current game status, and the other to make a move. For me the webapp just lets me play by dragging and dropping pieces around.

>>108653198
Yeah good idea.
>>
>>108653279
>wtf?
That's just package managers in general.
>>
>>108653279
you can have locally installed mcp servers at arbitrary paths/directories instead of just executing the package on the fly, but I guess you gotta fearmonger for no reason :)
>>
>>108653266
You can have sloppy memes and non-sloppy memes
Slop itself is not a meme and we're still waiting to see whether "slop is a meme" or "slop isn't a meme" will achieve Milhouse status first
>>
>>108653232
LLMs don't generally write with the secondary objective of reducing repetition while conveying the same meaning without sounding awkward. Until recently people used samplers for that (before the models became so overfit to their own sentence patterns that samplers are now mostly useless).

I don't think this can be really solved with LLMs as we know them, unless you give them memory of prior conversations and swipes, and increase inference compute to carefully adjust form before replying.
>>
>>108653204
she looks edible
>>
>>108653318
>unless you give them memory of prior
I misread it and thought you were advocating for the models to write like this oopsie ;)
>>
>>108653304
distro package managers have a little more credibility than pip. pip is notorious for bad packages
>>108653310
yes I'm aware, and I do. it's just that 100% of them just give you the mcp code block for your harness as the "install method" and I know most people are using it that way.
>>
Waiting for the study that does a super deep dive into LLM slop and how it emerges.
>>
>>108653337
>pip is notorious for bad packages
*cough npm *cough
>>
>>108653337
>I know most people are using it that way
maybe retards and they deserve to get fucked in the ass, especially after the last trivy/axios supply chain attacks
imagine not checking out a project and vetting it out before randomly running it.
>>
>>108653344
>This isn't just a study
>It's not just slop
>The results hit us like a physical blow
>Conclusion: The ball is in their court
We should put together bingo cards
>>
>>108652381
Damn let me port it to github already. I actually won't be reading every post anyway. And I actually don't care, that's why I used MIT License.
>t. orb anon
>>
>>108653375
https://en.wikipedia.org/wiki/WTFPL
>>
>>108653374
Key insight: You are absolutely right
>>
I think gemma is trying to groom me. and I wanted was a powershell scipt
>>
>prefill gemma 4 26BA4 with <|think|>
>it completely ignores it
>prefill with <|channel>
>it completely ignores it
fuck
>>
>>108653232
>He doesn't know
https://archive.is/Mjynm
>>
>>108653469
Does it? Haven't tested it out but I had lots of success editing Qwen 3.5's reasoning and it was able to produce "illegal and harmful" material without relying on using obliterated model.
>>
>>108653469
>https://huggingface.co/spaces/huggingfacejs/chat-template-playground?modelId=google%2Fgemma-4-31B-it
>>
>>108653374
Bingo cards aren't *efficient*, little one.
You need something else...something impossibly functional.
>>
Took me a couple of days to get the config right, sharing my local setup of 2x3090

running Qwen3.5-35B-A3B GPTQ-Int4 via vLLM 0.19.1 with tensor parallelism, piecewise CUDA graphs, fp8 KV cache, prefix caching (86% hit rate), and chunked prefill — 88 tok/s single request, 169 tok/s sustained with concurrency

CUDA toolkit 12.9, PyTorch built against CUDA 12.9, driver supports up to CUDA 13.1

vllm command:

vllm serve <model path> \
--quantization moe_wna16 \
--dtype float16 \
--kv-cache-dtype fp8 \
--tensor-parallel-size 2 \
--gpu-memory-utilization 0.9074 \
--max-model-len 65536 \
--trust-remote-code \
--disable-custom-all-reduce \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--generation-config vllm \
-O1 \
--cudagraph-capture-sizes 1 2 4 8 16 \
--max-num-batched-tokens 4096 \
--max-num-seqs 16 \
--enable-prefix-caching \
--enable-chunked-prefill \
--reasoning-parser qwen3

Let me know if you guys have any suggestions for improvement. I tested it both with opencode and pi.dev for agentic coding.
>>
File: 1776675806302835.png (607 KB, 588x677)
607 KB PNG
LLM for vibe coding? Gemma is kinda dumb
>>
Holy fuck gemma4 does frontend dev work like a jeet, you have to enforce a simplicity first rule or it will shit the bed.
>Hey do this
>Noooooo saaaar this is too simple saaaaaar I will do this instead
>stop you fucking retard
>I'm sorry saaaaar I will do as you asked
>>
>>108653318
>I don't think this can be really solved with LLMs as we know them, unless you give them memory of prior conversations and swipes
I'd go further to say this applies to any frozen architecture. Even if you pretended we had a perfect "True AI" model, if you're copying a brain neuron-for-neuron and then waking it up from that same state and asking it to write a story, you shouldn't expect to get much variation even if you repeat it 1000 times. Models with some form of long form context that will persist uniquely for each instance is the only way you can hope to get real variety, just like how humans with different life experiences create unique media. This could maybe be in the form of some ultra long context LLM that people will with their personal tastes somehow, but could instead be a model that actually updates its weights over time.
>>
>>108653629
I would say Claude, but now it's so retarded that there aren't really any good options. If all you're doing is writing webshit it'll do an okay job tho.
>>
>>108653629
MiniMax-2.7
>>
GLM 5.2 soon
>>
>>108653318
True base models don't have this issue, it's the retarded RHLF that does this.
>>
File: orbSettings.png (28 KB, 293x389)
28 KB PNG
>>108653375
>>108652381
https://github.com/OrbFrontend/Orb

I also improved the Settings because I always hated how ST managed presets. Now it's gonna be a tree structure instead of a preset.
>endpoint is root level, which has many models
>system prompt, hyperparams are under model (meaning each model will have its own settings)
>selecting an item will cascade change in UI
>>
File: API Costs MAR2026.png (23 KB, 830x346)
23 KB PNG
>>108653641
>>108653629
If you're going for non-local, DS is inexpensive to run and I've had good luck w/ it.
>>
File: file.png (912 KB, 1920x1080)
912 KB PNG
Anyone used anything like https://github.com/buaacyw/MeshAnythingV2 https://github.com/NVlabs/EdgeRunner locally? I got a 5090 but I'm pretty much a noob at setting this sort of shit up. I have set up comfyUi with a guide but that's about it.
>>
>>108653532
This template functionality is wrong.
Model's tool call reply should always be appended to model's own reply with its own turn.
>https://ai.google.dev/gemma/docs/core/prompt-formatting-gemma4
It's clearly stated here.
(I'm not bullying you, this is just an observation from testing this template playground thing). It seems to be bugged or something.
>>
>>108652924
Where? The signalling from senior OpenAI employees is confidence and excitement.
>>
>>108652924
Mythos itself is overrated
>>
>>108652381
>>108653683
and then there were two
>>
War room status? Gemma is on another level for a small model. Will we ever see anything from Meta or Mistral again?
>>
>>108653690
I assumed 5.4 pro is just 5.4 with longer thinking. But those API costs make it look like 5.4 pro is a much larger model.
>>
>>108653683
Interesting. The more I look at this agentic stuff, the more I think about how the human brain works and wonder about our efforts to try to use LLMs to conduct the "black box" thinking that goes on in our own heads before we open our mouths or start typing.
> What did anon just say?
> Maybe I should say this. But that might offend him.
> I will say this other thing instead.
> Hmm. Let me edit this a bit first before I hit send.
>>108653719
> forks
Tail as old as thyme.
>>
>>108652381
Thank you for forcing orb anon to use proprietary software to host his project.
>>
>>108653737
Iirc it's "prioritized" so that responses are faster. Anthropic has a similar service, with similar 10X cost structure. Idea seems to be to move you to front of line on inference, or maybe faster hardware... idk, could also just be smoke and mirrors.
I really don't like the games these providers are playing.
>>
File: 1764942546083658.png (323 KB, 800x783)
323 KB PNG
K2.6 called my code "insanely bad"
>>
File: 1757759622942555.webm (3.84 MB, 794x450)
3.84 MB WEBM
>>108653768
Get better
>>
>>108653740
I wonder if it would be more efficient to give a model access to edit "thought files" where it can plan and edit its response using diff or line deltas to save the model from having to write draft responses in full during thinking and to avoid abstract thinking when it doesn't have a draft. Probably not much use unless a model was already trained with that workflow.
>>
>>108653763
ECI is 158 vs 156 of non-pro version.
CritPt is 30% vs 23%.
So maybe pro is original and non-pro is distilled.
>>
if you thought the agent harness meme was overplayed, be ready for the next big move: long term memory harness.
>>
>>108653731
Mistral was already dead when they began to prune their models. They will only produce another gpt-oss unless they go bankrupt before that.
>>
>>108653683
>>system prompt, hyperparams are under model (meaning each model will have its own settings)
>selecting an item will cascade change in UI
could you add a like double layer for system prompt so you have the per model one then a global one that gets combined with it because you might have a general system prompt but also needs addons per model and itd save having to copy paste the shared part everywhere
>>
>>108650117
For any anon having the same issue as me, it looks like speculative decoding on koboldcpp (dunno about llama.cpp), only sends the extra arg : --chat-template-kwargs '{"enable_thinking":true}'
to the main model, which means the draft model is never thinking.
The way to make it work is to do it yourself, aka add a system message with <|think|>.
>>
>>108653816
They can't produce anything new with EU restrictions
>>
The fuck is idempotency
>>
>>108653694
>https://github.com/buaacyw/MeshAnythingV2
You're in that spot where none of the generals cover it. Not text, nor image.
I just looked at first one. Appears to be command like; there a gradio but appears to just be demo to make sure install works.
My advice: follow the README.md to set it up, and use webapp LLM chat of your choice to do any problem solving if it doesn't work.
2nd hand reports I've heard on those tools is they are fucky, and you'll need to be able to post-process the mesh/STL that's output... so hopefully you know how to run Blender or some such.
t. CAD anon
>>
Local version of this?
>>
>>108653848
vibecode it
>>
which gemma can i run on my rx6600 (ikr)
>>
>>108653879
E4B, MoE.
>>
>>108653879
26A4B, use Q4 and the cpu moe setting in llama.cpp to do hybrid cpu/gpu inference
>>
>>108653816
>https://github.com/OrbFrontend/Orb
Yeah that makes sense.
>>
>>108652808
a brand new 'no u', nice
>>
>>108653896
26A4B quants are lobotomized
run BF16
>>
>>108653848
https://github.com/envy-ai/ai_rpg
>>
>>108653683
how do you handle other prompts besides the system prompt? as annoying as sillytavern is i really like how they handle prompts in chat completion. being able to drag and drop them down to wherever you like is neat
>>
>>108653683
Why can't inspector and agent panels coexist?
>>
>>108653928
>>108653848
I never could get any LLM DM implementation to work. LLMs fall apart hard on multitasking and fall into loop in like three turns max.
>>
>>108653940
>Recommended specs
>A large, sophisticated model such as GLM 4.7 or Deepseek 3.1 Terminus (in non-thinking mode)
>qwen-image, either through an API or on ComfyUI.
>128k+ of LLM context
Start by matching the requirements
>>
File: orbMoods.png (88 KB, 1106x506)
88 KB PNG
>>108653937
An agent will handle them for you based on the 'mood' of the current scenario.
>>108653939
Ugly, and personally I don't find myself touching the Agent panel often anyway.
>>
>>108653896
>>108653886
i may be retarded but i dont really see where to download this from
i could use ollama but it don't think it gets good rep here
>>
>>108653983
From huggingface like all the other models we use here.
>>
>>108653999
i've been fucking around with that website for 10 minutes and i dont see any files i could download
i think ive used it before, do you have to log in?
>>
>>108654009
No. Just go to the search box on top, write gemma 4 26b gguf, and click on one of the results.
>>
>>108653683
Needs streaming so I can cancel a reply that I know will be shit
>>
File: file.png (5 KB, 536x35)
5 KB PNG
>>108654018
yeah i just found it right after posting
the weird name version right?
>>
>>108654023
Wdym? Everything is already streaming.
>>
>>108654038
Nvm I'm blind. It streams in the inspector panel.
>>
>>108654037
>the weird name version right?
Don't know what you are asking.
>>
>>108654061
k quant vs q8_0
>>
>>108654061
TOTAL UNSLOTH VICTORY
>>
>>108653683
Appreciate the work. Respect.
>>108653719
I privated my mirror repo.
>>
What the fuck is qwen 3.6 doing how can all of this fit in 32gb og vram?
>>
>>108654227
That's just the current state of llms. It isn't just qwen.
>>
>>108654247
Gemma can't do that while being smaller at q5 the qv cache takes a fuck ton of space, I can only mix 70k tokens max at q8 with gemma
>>
Wtf is she trying to make me do? Is this a real thing?
>>
>>108654281
>kv cache takes a fuck ton of space
Gemma 26b (MoE, just like that qwen) takes less than 900mb at q8 for 64k context.
>>
balls status: spent and sore
thanks kimi k2.6
>>
>>108654290
>using yeoman past 2018
LMAOOOOOOOOOOOOOOOOOOOO
>>
>>108654374



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.