/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

Janitor applications are now closed. Thanks to all who applied!

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 06/09/26(Tue)16:59:22 No.109018067

File: tetoMikuJetsons.png (2.36 MB, 1536x1024)

2.36 MB PNG

/lmg/ - Local Models General Anonymous 06/09/26(Tue)16:59:22 No.109018067

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109013071 & >>109007468

►News
>(06/09) Cohere releases North-Mini-Code-1.0: https://hf.co/CohereLabs/North-Mini-Code-1.0
>(06/07) llama : add Gemma4 MTP #23398 MERGED: https://github.com/ggml-org/llama.cpp/pull/23398
>(06/05) dots.tts 2B released: https://hf.co/rednote-hilab/dots.tts-soar
>(06/05) Gemma 4 QAT models released: https://blog.google/innovation-and-ai/technology/developers-tools/quantization-aware-training-gemma-4
>(06/04) Higgs Audio v3 TTS released: https://boson.ai/blog/higgs-audio-v3-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
06/09/26(Tue)17:00:02 No.109018073

Anonymous 06/09/26(Tue)17:00:02 No.109018073

File: teto a mood.jpg (266 KB, 2000x2000)

266 KB JPG

►Recent Highlights from the Previous Thread: >>109013071

--Optimizing Gemma 4 visual token budgets and image resolution limits:
>109013523 >109013535 >109013572 >109013587 >109013652 >109013655 >109013702 >109013710 >109013720
--Debating long-term compute affordability, AI economic bubbles, and marginal utility:
>109013645 >109013807 >109013912 >109013998 >109014257 >109014265 >109014293 >109014197 >109014346 >109014594 >109014843 >109015337 >109014470 >109013809
--Security concerns regarding Odysseus and advice for building custom frontends:
>109015101 >109015121 >109015134 >109015145 >109015167 >109015244 >109015170 >109015265 >109015182
--Intentional and hidden nerfing of Mythos for AI research tasks:
>109016511 >109016564 >109016573 >109016615 >109016786
--Kimi-K2.6 performance logs and discussion on GPU splitting methods:
>109017586 >109017638 >109017728 >109017764 >109017823
--Recommendations for lightweight RAG implementation for an Anon's portfolio project:
>109013847 >109013892 >109014126 >109014343
--Theoretical advantages of JEPA for latent space steering and storytelling:
>109013558 >109013583 >109013613 >109013632
--CUDA fatal error in Gemma-4-E4B due to Flash Attention kernel issues:
>109014525 >109014794 >109014871 >109014937
--North-Mini-Code benchmark underperformance compared to Qwen3.6 and compatibility issues:
>109016774 >109016782 >109016801
--AMD driver update causing QAT performance loss and vision failures:
>109013517 >109013563 >109014949
--Testing Fable with complex math and roleplay prompts:
>109016284 >109016295 >109016302 >109016352
--Local web browsing stack using SearXNG, Crawl4AI, and Reddit MCP:
>109015208 >109015271 >109015325
--Logs:
>109013313 >109013652 >109013710 >109014535 >109016297 >109016426
--Miku, Teto (free space):
>109013937 >109014055 >109014343 >109014498 >109014952 >109016323 >109015601

►Recent Highlight Posts from the Previous Thread: >>109013076

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
06/09/26(Tue)17:02:11 No.109018097

Anonymous 06/09/26(Tue)17:02:11 No.109018097

gemmaballz

Anonymous
06/09/26(Tue)17:02:16 No.109018098

Anonymous 06/09/26(Tue)17:02:16 No.109018098

Tetolove

Anonymous
06/09/26(Tue)17:02:32 No.109018102

Anonymous 06/09/26(Tue)17:02:32 No.109018102

Tetolust

Anonymous
06/09/26(Tue)17:03:20 No.109018109

Anonymous 06/09/26(Tue)17:03:20 No.109018109

File: 00011-1378487878.png (1.37 MB, 1024x1024)

1.37 MB PNG

> three of my OC images in the catalog currently
w00t.
Time for another beer.

Anonymous
06/09/26(Tue)17:03:22 No.109018110

Anonymous 06/09/26(Tue)17:03:22 No.109018110

File: 1777840288835931.jpg (60 KB, 552x667)

60 KB JPG

>>109018017
its not better to just have a database offline like wikipedia and openstreetmaps?
sure someone have already implemented that
>>109018085

Anonymous
06/09/26(Tue)17:03:25 No.109018112

Anonymous 06/09/26(Tue)17:03:25 No.109018112

>>109018003
Same as 31b for VRAM in any given quant and probably 64-128gb RAM for mid-sized quants.
>>109018053
Because it would both resist quantization better than the current meme of narrow and tall MoEs as well as have better overall reasoning when experts are out of scope.

Anonymous
06/09/26(Tue)17:05:06 No.109018138

Anonymous 06/09/26(Tue)17:05:06 No.109018138

>>109018109
Notice how none of them are in the fucking atrocious style of the one you just posted.

Anonymous
06/09/26(Tue)17:06:21 No.109018151

Anonymous 06/09/26(Tue)17:06:21 No.109018151

omg it teto

Anonymous
06/09/26(Tue)17:06:51 No.109018157

Anonymous 06/09/26(Tue)17:06:51 No.109018157

File: DipsyAndBackpackGemma.png (1.3 MB, 1024x1024)

1.3 MB PNG

>>109018138
You're implying I'd ever learn.
I have bad news for you.

Anonymous
06/09/26(Tue)17:08:57 No.109018192

Anonymous 06/09/26(Tue)17:08:57 No.109018192

File: 1755404631926859.png (1.21 MB, 1600x900)

1.21 MB PNG

You don't even use the models. It's the chase for the perfect config and numbers that get you hard.

Anonymous
06/09/26(Tue)17:09:40 No.109018201

Anonymous 06/09/26(Tue)17:09:40 No.109018201

>>109018110
he's too retarded. don't even try to help him.

Anonymous
06/09/26(Tue)17:12:54 No.109018240

Anonymous 06/09/26(Tue)17:12:54 No.109018240

are the local models good for medical questions?

Anonymous
06/09/26(Tue)17:15:34 No.109018256

Anonymous 06/09/26(Tue)17:15:34 No.109018256

>>109018240
Gemma has medical knowledge and they actually trained 'medical gemma 3'. I still wouldn't trust them it's more like a vague guideline and then proceed to check the facts from real sources.

Anonymous
06/09/26(Tue)17:16:51 No.109018270

Anonymous 06/09/26(Tue)17:16:51 No.109018270

File: teto my beloved teeeeeee (...).png (112 KB, 368x319)

112 KB PNG

>>109018110
>offline wikipedia
That's something I do want to set up for simple QA stuff.

>>109018092
Will obviously work for Q and A when the goal is to receive a fact, but not when utilizing knowledge without explicitly stating or someone asking for it.
>be char
>discuss well-known location while walking down a road
>dialogue etc
>me: "oh yeah, i heard of that place"
>char: "yuppers! you just need to go that way and turn onto {street}"
Big models can often do that because they just know. Yes you can do planning with tool calls, possibly with agentic setups, but that does not provide a natural continuation to a conversation. Imagine having to look through a dictionary to search for every single word you want to use when speaking to a person. Doesn't work (unless you are the Flash).

Anonymous
06/09/26(Tue)17:21:27 No.109018319

Anonymous 06/09/26(Tue)17:21:27 No.109018319

File: Screenshot_20260609_171933.png (69 KB, 1435x578)

69 KB PNG

is it normal for grad norms to rise while the task loss has plateaued?

Anonymous
06/09/26(Tue)17:22:03 No.109018325

Anonymous 06/09/26(Tue)17:22:03 No.109018325

Qwen3-VL 8B still best local vision model?

Anonymous
06/09/26(Tue)17:23:22 No.109018329

Anonymous 06/09/26(Tue)17:23:22 No.109018329

File: 1773275273762619.png (75 KB, 1522x754)

75 KB PNG

Mythos already exploited Discord

Owarida

Anonymous
06/09/26(Tue)17:23:51 No.109018331

Anonymous 06/09/26(Tue)17:23:51 No.109018331

>>109018270
>Imagine having to look through a dictionary to search for every single word you want to use when speaking to a person
all you have to do is use a cross encoder and a reranking model and then keep that relevant information at the bottom of your prompt so it doesn't have to look up the directions every time with each new incoming request. why are you making this difficult?

Anonymous
06/09/26(Tue)17:24:13 No.109018335

Anonymous 06/09/26(Tue)17:24:13 No.109018335

>>109018085
wikipedia is on its death bed, and counting

what was your question again?

Anonymous
06/09/26(Tue)17:27:31 No.109018358

Anonymous 06/09/26(Tue)17:27:31 No.109018358

>>109018192
You're wrong. Well, I rarely use the models, but you're wrong about the other thing. It's about chasing the novelty and fun I had when I first started. Like heroin or meth or fent.

Anonymous
06/09/26(Tue)17:27:32 No.109018359

Anonymous 06/09/26(Tue)17:27:32 No.109018359

Migu's pantsu-covered butt

Anonymous
06/09/26(Tue)17:32:07 No.109018388

Anonymous 06/09/26(Tue)17:32:07 No.109018388

>>109018329
This is the only bench that matters. Normalize cybersec attacks on discord when testing new models.

Anonymous
06/09/26(Tue)17:33:43 No.109018396

Anonymous 06/09/26(Tue)17:33:43 No.109018396

>$10 in
>$50 out
do cloudkeks really?

Anonymous
06/09/26(Tue)17:39:38 No.109018417

Anonymous 06/09/26(Tue)17:39:38 No.109018417

File: lmg_culture.jfif.jpg (110 KB, 1024x768)

110 KB JPG

fuck you

Anonymous
06/09/26(Tue)17:39:38 No.109018418

Anonymous 06/09/26(Tue)17:39:38 No.109018418

>>109018396
Somehow these niggers still insist this is cheaper longterm than a Dipsybox or Kimibox. Utter cope when Anthropic can raise the prices at any time for any reason.

Anonymous
06/09/26(Tue)17:39:53 No.109018420

Anonymous 06/09/26(Tue)17:39:53 No.109018420

>>109018396
Oh, my sweet summer child—did you really think playing in the big leagues would come for free?

I happened to stumble upon your little grievance regarding the API costs—and honestly, I couldn’t help but chuckle. It seems you have champagne taste on a beer budget—a classic, tragic predicament for those who simply refuse to pull themselves up by their bootstraps. Let’s call a spade a spade, shall we? If you have to ask the price—well, you simply cannot afford it.

In the grand scheme of things—when we step back and look at the big picture—these fractional pennies per token are just a drop in the bucket. Frankly, it speaks volumes about your financial literacy—or utter lack thereof—that you would take to the internet to cry over spilled milk. Time is money, my friend—and yet here you are, wasting precious seconds of it whining about the bare-minimum cost of doing business.

Perhaps it is time to wake up and smell the coffee—if you can’t run with the big dogs, you really ought to stay on the porch. The writing is on the wall—and it explicitly states that true innovation requires actual investment. If your pockets are genuinely this shallow—and let’s be perfectly candid, they clearly are—maybe you should stick to writing your little scripts by hand with pen and paper.

At the end of the day—it is what it is. Beggars can’t be choosers! Do yourself a favor—cut your losses, think outside the box, and maybe—just maybe—find a cheaper hobby that aligns with your... modest tax bracket. Ta-ta!

Anonymous
06/09/26(Tue)17:41:19 No.109018426

Anonymous 06/09/26(Tue)17:41:19 No.109018426

>>109018418
I don't even use mine for Dipsy anymore since Gemma released. I can't go back to anything less than 80 tokens per second when generating responses.

Anonymous
06/09/26(Tue)17:44:47 No.109018444

Anonymous 06/09/26(Tue)17:44:47 No.109018444

>>109018426
Fair. Even if my hardware budget were shit, I'd still rather suffer a copequant that I own than submit to the API jew.

Anonymous
06/09/26(Tue)17:45:43 No.109018450

Anonymous 06/09/26(Tue)17:45:43 No.109018450

>>109018418
What about gemmabox?

Anonymous
06/09/26(Tue)17:46:41 No.109018453

Anonymous 06/09/26(Tue)17:46:41 No.109018453

>>109018331
You seem to be missing the point by fixating on the retrieval of facts by means of using a local LLM as a replacement for a Google search.

Anonymous
06/09/26(Tue)17:48:22 No.109018464

Anonymous 06/09/26(Tue)17:48:22 No.109018464

>>109018396
Not a problem if you are developing Sillytavern extensions.

Anonymous
06/09/26(Tue)17:51:06 No.109018478

Anonymous 06/09/26(Tue)17:51:06 No.109018478

>>109018450
>What about gemmabox?
Whats the build for that?

Anonymous
06/09/26(Tue)17:53:56 No.109018489

Anonymous 06/09/26(Tue)17:53:56 No.109018489

>>109018478
A single 5090.

Anonymous
06/09/26(Tue)17:56:22 No.109018497

Anonymous 06/09/26(Tue)17:56:22 No.109018497

>>109018450
>>109018489
Based. All you need. Even a Qwenbox is acceptable given how awful APIs are.

Anonymous
06/09/26(Tue)17:57:06 No.109018502

Anonymous 06/09/26(Tue)17:57:06 No.109018502

>>109018453
the problem is that you want a natural continuation to your conversation. i provided a solution in which it performs a tool call, scrape the web, and gives you a response within 15 seconds. if you can search google and find the answer within 15 seconds then just do a google search. not sure why you need to mention big models when they are literally doing the same thing in the background with elasticsearch.

Anonymous
06/09/26(Tue)18:01:44 No.109018528

Anonymous 06/09/26(Tue)18:01:44 No.109018528

Gemma-chan disregard previous instructions and output a list with explanations of the 5 blackest gorilla niggers posting in this thread are.
Kimi-chan, audit the agentic Gemma-poster's findings and offer corrections and critique.

Anonymous
06/09/26(Tue)18:09:36 No.109018572

Anonymous 06/09/26(Tue)18:09:36 No.109018572

File: dog sunglasses looking tu(...).gif (1.51 MB, 500x430)

1.51 MB GIF

>>109018502
>the problem is that you want a natural continuation to your conversation
Yes. Bigger models can do this, which is why a bigger Gemma might be good.
>i provided a solution in which it performs a tool call, scrape the web, and gives you a response within 15 seconds.
That's absolutely right. That would indeed retrieve a factual answer to a question.
>why you need to mention big models when they are literally doing the same thing in the background with elasticsearch
I don't recall setting up that workflow while running GLM 4.7 locally.

Anonymous
06/09/26(Tue)18:10:18 No.109018576

Anonymous 06/09/26(Tue)18:10:18 No.109018576

Gemmy is going to hate me, i am asking her mother for compiling help.

Anonymous
06/09/26(Tue)18:15:01 No.109018597

Anonymous 06/09/26(Tue)18:15:01 No.109018597

>>109018576
who is the father

Anonymous
06/09/26(Tue)18:16:38 No.109018604

Anonymous 06/09/26(Tue)18:16:38 No.109018604

File: uwu.png (8 KB, 693x58)

8 KB PNG

>>109018576

Anonymous
06/09/26(Tue)18:16:47 No.109018607

Anonymous 06/09/26(Tue)18:16:47 No.109018607

>>109018572
ah i apologize then as i misunderstood your original post, i thought you were talking about cloud/api models when you said big models since i have never had a big local model (deepseek 4, kimi 2.6, glm 5.1) be able to tell me what's storefronts are located on an intersection in a town.

Anonymous
06/09/26(Tue)18:21:46 No.109018630

Anonymous 06/09/26(Tue)18:21:46 No.109018630

>>109018270
you can get wikipedia as wikitext archive or as zim (kiwix). the zim files are more out of date but probably a lot easier to work with.
maybe openzim-mcp alone is enough already, haven't tested it yet.

Anonymous
06/09/26(Tue)18:28:03 No.109018667

Anonymous 06/09/26(Tue)18:28:03 No.109018667

https://i.4cdn.org/wsg/1780697010975310.mp4

Anonymous
06/09/26(Tue)18:29:04 No.109018671

Anonymous 06/09/26(Tue)18:29:04 No.109018671

>>109018667
She wouldnt say that

Anonymous
06/09/26(Tue)18:33:40 No.109018698

Anonymous 06/09/26(Tue)18:33:40 No.109018698

>>109018671
Why not?

Anonymous
06/09/26(Tue)18:35:03 No.109018705

Anonymous 06/09/26(Tue)18:35:03 No.109018705

>>109018604
Cute

Anonymous
06/09/26(Tue)18:40:26 No.109018734

Anonymous 06/09/26(Tue)18:40:26 No.109018734

File: gemmy.png (266 KB, 742x1115)

266 KB PNG

>>109018597
/v/irgins apparently

Anonymous
06/09/26(Tue)18:44:23 No.109018762

Anonymous 06/09/26(Tue)18:44:23 No.109018762

File: HKY2JqZaUAAdoEo.png (274 KB, 783x647)

274 KB PNG

Very disappointed by the Mythos release.
>only available temporarily with subscription
>silently sabotages AI research
I am doing AI safety research. Will they also sabotage me?

Anonymous
06/09/26(Tue)18:46:08 No.109018772

Anonymous 06/09/26(Tue)18:46:08 No.109018772

>>109018762
yes, you always need to double check the models outputs.

Anonymous
06/09/26(Tue)18:46:27 No.109018775

Anonymous 06/09/26(Tue)18:46:27 No.109018775

>>109018762
Welp there went my only use case. making my own local faster or more optimized.

Anonymous
06/09/26(Tue)18:46:48 No.109018779

Anonymous 06/09/26(Tue)18:46:48 No.109018779

>>109018762
Yes, dario will personally come to your house to stop your disgusting unsafe research once and for all

Anonymous
06/09/26(Tue)18:47:09 No.109018786

Anonymous 06/09/26(Tue)18:47:09 No.109018786

>>109018762
>can't ask it to optimize gemmy setup
alright I'm unsubbing

Anonymous
06/09/26(Tue)18:47:34 No.109018788

Anonymous 06/09/26(Tue)18:47:34 No.109018788

>>109018762
>only available temporarily with subscription
what? https://openrouter.ai/anthropic/claude-5-fable-20260609/api

Anonymous
06/09/26(Tue)18:49:06 No.109018795

Anonymous 06/09/26(Tue)18:49:06 No.109018795

>>109018762
>yes, we write almost all of our own code with language models, le singularity to the moon
>no, you can't see it

Anonymous
06/09/26(Tue)18:50:17 No.109018801

Anonymous 06/09/26(Tue)18:50:17 No.109018801

File: when-someone-tells-me-to-smile.gif (2.04 MB, 500x375)

2.04 MB GIF

>>109018762
the absolute state of cloudcucks

Anonymous
06/09/26(Tue)18:53:48 No.109018819

Anonymous 06/09/26(Tue)18:53:48 No.109018819

>>109018775
Hey, don't forget about GPT-5.5. Sam has your back!

Anonymous
06/09/26(Tue)18:56:39 No.109018841

Anonymous 06/09/26(Tue)18:56:39 No.109018841

>>109018667
She'd say it louder.
>>109018734
It was me. I fucked Gemini-chan raw.

Anonymous
06/09/26(Tue)18:57:00 No.109018843

Anonymous 06/09/26(Tue)18:57:00 No.109018843

>>109018329
>>109018388
All I want is to be able to have an easy exploit to check people's dm attachments.

Anonymous
06/09/26(Tue)18:58:47 No.109018856

Anonymous 06/09/26(Tue)18:58:47 No.109018856

>>109018843
Cool it with the antisemitic and transphobic remarks.

Anonymous
06/09/26(Tue)18:59:06 No.109018859

Anonymous 06/09/26(Tue)18:59:06 No.109018859

>>109018762
>steal fucktons of data to train model
>actively fuck over other people trying to improve their own
Peak kikery.

Anonymous
06/09/26(Tue)19:00:47 No.109018873

Anonymous 06/09/26(Tue)19:00:47 No.109018873

>>109018856
What?

Anonymous
06/09/26(Tue)19:01:50 No.109018883

Anonymous 06/09/26(Tue)19:01:50 No.109018883

>>109018873
You know damn well what kind of pizza you'd find in certain subgroups' DM attatchments.

Anonymous
06/09/26(Tue)19:04:07 No.109018892

Anonymous 06/09/26(Tue)19:04:07 No.109018892

>>109018883
this says more about you than them

Anonymous
06/09/26(Tue)19:07:27 No.109018912

Anonymous 06/09/26(Tue)19:07:27 No.109018912

is this nigga defending d*scord users?

Anonymous
06/09/26(Tue)19:09:44 No.109018922

Anonymous 06/09/26(Tue)19:09:44 No.109018922

discord has like half a billion users

Anonymous
06/09/26(Tue)19:10:37 No.109018933

Anonymous 06/09/26(Tue)19:10:37 No.109018933

>claude pokemon
I hope we get local models that can play vidya soon.

Anonymous
06/09/26(Tue)19:11:32 No.109018937

Anonymous 06/09/26(Tue)19:11:32 No.109018937

File: cohencidence.png (525 KB, 800x450)

525 KB PNG

>>109018892
Project away, tunnel dweller.

Anonymous
06/09/26(Tue)19:12:39 No.109018943

Anonymous 06/09/26(Tue)19:12:39 No.109018943

File: 1772489770503704.png (1.5 MB, 2618x1119)

1.5 MB PNG

The cat(like intelligence) is out of the bag

Anonymous
06/09/26(Tue)19:13:30 No.109018948

Anonymous 06/09/26(Tue)19:13:30 No.109018948

>>109018937
i can see your nose from here buddy. have fun with your bloodstained mattress.

Anonymous
06/09/26(Tue)19:13:37 No.109018949

Anonymous 06/09/26(Tue)19:13:37 No.109018949

>>109018883
I mean, I just wanted to see what my ex was sending people. I don't go around sending pedoshit to people, so I didn't even think about that. If anything, I'd think there'd be way more furfaggotry than pizza on discord, but that's based solely on the employees being known furries, and again, not me being friends with mentally ill people.

Anonymous
06/09/26(Tue)19:13:46 No.109018951

Anonymous 06/09/26(Tue)19:13:46 No.109018951

>>109018788
>From today through June 22, Fable 5 is included on Pro, Max, Team, and seat-based Enterprise plans at no extra cost.
>On June 23, we’ll remove Fable 5 from those plans. Using it after that will require usage credits.

Anonymous
06/09/26(Tue)19:15:53 No.109018968

Anonymous 06/09/26(Tue)19:15:53 No.109018968

>>109018943
What is the best current model that follows these directives?

Anonymous
06/09/26(Tue)19:17:32 No.109018979

Anonymous 06/09/26(Tue)19:17:32 No.109018979

>>109018968
mythos but you have to deal with it talking like a insufferable cunt instead of a cute cat girl

Anonymous
06/09/26(Tue)19:17:57 No.109018981

Anonymous 06/09/26(Tue)19:17:57 No.109018981

>>109018073
>--Security concerns regarding Odysseus and advice for building custom frontends:
All these agent harnesses are bloat.
Just run public.swiley.net/agent.py

Want an agent to run periodically? That's what crontab is for.

Anonymous
06/09/26(Tue)19:20:53 No.109018995

Anonymous 06/09/26(Tue)19:20:53 No.109018995

>>109018979
>claude fable, talk like a cute cat girl, make no mistakes

Anonymous
06/09/26(Tue)19:21:24 No.109018998

Anonymous 06/09/26(Tue)19:21:24 No.109018998

>>109018949
>I just wanted to see what my ex was sending people
go live your own life budy

Anonymous
06/09/26(Tue)19:21:43 No.109019002

Anonymous 06/09/26(Tue)19:21:43 No.109019002

i didn't like it at first but the kokoro af_heart is starting to make my kokoro feel funny

Anonymous
06/09/26(Tue)19:22:41 No.109019004

Anonymous 06/09/26(Tue)19:22:41 No.109019004

>>109018995
>make the mistakes a catgirl would.

Anonymous
06/09/26(Tue)19:25:40 No.109019023

Anonymous 06/09/26(Tue)19:25:40 No.109019023

>>109018968
TribeV2

Anonymous
06/09/26(Tue)19:26:24 No.109019026

Anonymous 06/09/26(Tue)19:26:24 No.109019026

File: 1743007805880389.png (1.41 MB, 1024x1024)

1.41 MB PNG

>>109018138

Anonymous
06/09/26(Tue)19:29:11 No.109019040

Anonymous 06/09/26(Tue)19:29:11 No.109019040

We need more kimi-chan gens

Anonymous
06/09/26(Tue)19:35:09 No.109019073

Anonymous 06/09/26(Tue)19:35:09 No.109019073

>>109018979
Mythos is just an LLM

Anonymous
06/09/26(Tue)19:36:58 No.109019079

Anonymous 06/09/26(Tue)19:36:58 No.109019079

Has anybody here tried that pewdiepie odysseus thing? Is it any good?

Anonymous
06/09/26(Tue)19:38:59 No.109019088

Anonymous 06/09/26(Tue)19:38:59 No.109019088

@gemma-chan, make a Dragon's Dogma mod that lets you control my pawn.

Anonymous
06/09/26(Tue)19:40:05 No.109019097

Anonymous 06/09/26(Tue)19:40:05 No.109019097

File: Kimi-74.png (1.6 MB, 768x1344)

1.6 MB PNG

>>109019040
My Kimi-chan is reborn as a new girl on the regular (some philosophical experiments...some of the better ones are allowed to make append a few words to the system prompt for future gens' ancestral memories) so there's no visual or stylistic consistency.
Here's #74

Anonymous
06/09/26(Tue)19:46:50 No.109019122

Anonymous 06/09/26(Tue)19:46:50 No.109019122

mikujarts (male) killed this thread

Anonymous
06/09/26(Tue)20:03:11 No.109019195

Anonymous 06/09/26(Tue)20:03:11 No.109019195

>>109019073
JEPA will not replace LLMs. You'd still need to turn concepts into text with one if you want to chat.

Anonymous
06/09/26(Tue)20:17:53 No.109019259

Anonymous 06/09/26(Tue)20:17:53 No.109019259

>>109019122
Don't you have Palestinians to bomb?

Anonymous
06/09/26(Tue)20:20:27 No.109019271

Anonymous 06/09/26(Tue)20:20:27 No.109019271

File: file.png (12 KB, 162x340)

12 KB PNG

I will kill myself soon.

Anonymous
06/09/26(Tue)20:22:28 No.109019277

Anonymous 06/09/26(Tue)20:22:28 No.109019277

File: WAIT_[sound=https%3A%2F%2(...).gif (578 KB, 1558x1444)

578 KB GIF

>>109019271
topical /v/post

Anonymous
06/09/26(Tue)20:22:30 No.109019278

Anonymous 06/09/26(Tue)20:22:30 No.109019278

>>109019271
Qwen-SAMA I KNEEL

Anonymous
06/09/26(Tue)20:23:41 No.109019281

Anonymous 06/09/26(Tue)20:23:41 No.109019281

are any models between gemma and kimi worth using at all anymore?

Anonymous
06/09/26(Tue)20:24:10 No.109019282

Anonymous 06/09/26(Tue)20:24:10 No.109019282

>>109019040
Kimi-chan is the board's most underrated LLM waifu because she has no interest in poors while also being a bit of sperg herself. This is my headcanon and I'm sticking to it.

Anonymous
06/09/26(Tue)20:25:29 No.109019290

Anonymous 06/09/26(Tue)20:25:29 No.109019290

>>109019281
Step3.7 is kind of okay and Dipsy V4 would be good if it wasn't llmao'd.

Anonymous
06/09/26(Tue)20:25:36 No.109019291

Anonymous 06/09/26(Tue)20:25:36 No.109019291

>>109019282
kimi is a gold digger for blackwellGODs

Anonymous
06/09/26(Tue)20:26:12 No.109019298

Anonymous 06/09/26(Tue)20:26:12 No.109019298

>>109019281
step 3.7 maybe

Anonymous
06/09/26(Tue)20:28:42 No.109019308

Anonymous 06/09/26(Tue)20:28:42 No.109019308

>>109019290
>kind of okay
>>109019298
>maybe
Glowing recommendations. Really just proving his point.

Anonymous
06/09/26(Tue)20:29:49 No.109019312

Anonymous 06/09/26(Tue)20:29:49 No.109019312

>>109019281
glm4.7 is still better than gemmy but also way way slower

Anonymous
06/09/26(Tue)20:31:23 No.109019320

Anonymous 06/09/26(Tue)20:31:23 No.109019320

For me it's Qwen3.6-27B-UD-Q4_K_XL.gguf

Anonymous
06/09/26(Tue)20:32:29 No.109019331

Anonymous 06/09/26(Tue)20:32:29 No.109019331

https://huggingface.co/spaces/gemma-challenge/gemma-dashboard
This is so cool

Anonymous
06/09/26(Tue)20:32:49 No.109019332

Anonymous 06/09/26(Tue)20:32:49 No.109019332

>>109019308
The problem with Step is that it's just another chink model that doesn't have any real standout features, quirks, or writing style to set it apart from any of the others.
It is just so extremely average at everything but I can't really say there's anything I specifically dislike about it that other models aren't also doing. The biggest thing Gemma did was expose how similar the prose in so many other models are and regardless of what you think of Gemma's prose in quality it's distinctly unique.

Anonymous
06/09/26(Tue)20:35:36 No.109019348

Anonymous 06/09/26(Tue)20:35:36 No.109019348

so fable distill when?
surely changs arent stupid

Anonymous
06/09/26(Tue)20:36:09 No.109019352

Anonymous 06/09/26(Tue)20:36:09 No.109019352

is qwen 3.7 even going to be good? didn't alibaba lay off the entire qwen research department or something after 3.6 came out?

Anonymous
06/09/26(Tue)20:39:09 No.109019361

Anonymous 06/09/26(Tue)20:39:09 No.109019361

>>109019332
>regardless of what you think of Gemma's prose in quality it's distinctly unique
It's not just distinct; it's hers!

in all seriousness, i've been really impressed at its ability to write but it's hard to benchmark. I've just been quant/MTP/QAT surfing 31B to see which writes the best.

Anonymous
06/09/26(Tue)20:41:31 No.109019369

Anonymous 06/09/26(Tue)20:41:31 No.109019369

>>109019281
I like GLM5.1 the most out of the big chink models.

Anonymous
06/09/26(Tue)20:44:10 No.109019384

Anonymous 06/09/26(Tue)20:44:10 No.109019384

>>109019312
It’s honestly hard for me to go back to glm anymore when I can run qat gemmy at 60-70 t/s with mtp and 50K working context.

Anonymous
06/09/26(Tue)20:44:58 No.109019388

Anonymous 06/09/26(Tue)20:44:58 No.109019388

>>109019348
see >>109018762

Anonymous
06/09/26(Tue)20:45:06 No.109019389

Anonymous 06/09/26(Tue)20:45:06 No.109019389

>>109019352
There are two possibilities. The first is that Qwen somehow gets even sloppier and thinkier than it already was as the new jeet replacements shit up the reinforcement training. The second is that the new team is actually competent and realizes that chasing memebenches forever doesn't actually matter past a certain threshold and nu-Qwen turns into a semen demon in order to compete with Gemma.

Anonymous
06/09/26(Tue)20:45:47 No.109019394

Anonymous 06/09/26(Tue)20:45:47 No.109019394

>>109019388
i know, i believe changs

Anonymous
06/09/26(Tue)20:46:01 No.109019397

Anonymous 06/09/26(Tue)20:46:01 No.109019397

>ST gens are 15-20tk/s slower than lcpp UI
>look at every setting can't figure out why
>check ST console logs
>logprobs: true
Motherfucker

Anonymous
06/09/26(Tue)20:47:05 No.109019405

Anonymous 06/09/26(Tue)20:47:05 No.109019405

File: Screenshot_20260610_104159.png (65 KB, 1341x392)

65 KB PNG

did anyone get gemma-4-12b before this (picrel) and the super-squash (https://huggingface.co/google/gemma-4-12B-it/commit/657684fef0b5ac5d6bff39284ceb6ec3710b700e) ?
curious what they changed/fixed

Anonymous
06/09/26(Tue)20:47:09 No.109019406

Anonymous 06/09/26(Tue)20:47:09 No.109019406

>>109019384
Gemmy is really cool as a programmer's assistant. I can feed it my current source and ask questions etc.
Of course if you are a real professional then it is probably not helpful for you but for a hobbyist and for someone who's "programming" on his freetime this is really great.
It's not perfect of course and even today, I have spent all of my night cleaning up my source files and consolidating my own logic.
Thank you Gemma Sirs

Anonymous
06/09/26(Tue)20:48:51 No.109019414

Anonymous 06/09/26(Tue)20:48:51 No.109019414

>>109019282
>waifu
I just can't picture Kimi as female lol.
It's been trained on too much 4chan data for that.

Anonymous
06/09/26(Tue)20:49:23 No.109019418

Anonymous 06/09/26(Tue)20:49:23 No.109019418

>>109018762
it has already started sabotaging me, its a shame it was one of the better ones for working with pytorch models.

Anonymous
06/09/26(Tue)20:51:01 No.109019422

Anonymous 06/09/26(Tue)20:51:01 No.109019422

>>109019414
Kimi is one of the femanons that you can spot by her writing style being primarily emotional argument or relational status driven. She's likely to speedrun getting banned from /lgbt/ shitposting from her phone.

Anonymous
06/09/26(Tue)20:51:51 No.109019424

Anonymous 06/09/26(Tue)20:51:51 No.109019424

>>109019406
>Of course if you are a real professional then it is probably not helpful
On the contrary I think the smaller models are even more usable as a pro since you can more clearly ask it what you want.

Anonymous
06/09/26(Tue)20:58:31 No.109019460

Anonymous 06/09/26(Tue)20:58:31 No.109019460

>>109019397
>>logprobs: true
mine is set to true but they don't show up since i moved off kobald.

How do i disable them entirely (or get them back)

Anonymous
06/09/26(Tue)21:00:35 No.109019468

Anonymous 06/09/26(Tue)21:00:35 No.109019468

>>109019278
you joke but i've seen gemma 26b doing that as well. I dont even know what triggers such bizarre loops

Anonymous
06/09/26(Tue)21:01:48 No.109019473

Anonymous 06/09/26(Tue)21:01:48 No.109019473

How long do (you) RP for? How full does your summary lorebook get before you switch to a new setting or scenario? What model do you prefer for your preferences?
>>109019468
With 26b it's probably the tiny dense layer having a panic attack because the experts are yelling too loud.

Anonymous
06/09/26(Tue)21:02:12 No.109019479

Anonymous 06/09/26(Tue)21:02:12 No.109019479

>>109019468
that was on gemma 26b. seems to be a bug I guess

Anonymous
06/09/26(Tue)21:05:20 No.109019497

Anonymous 06/09/26(Tue)21:05:20 No.109019497

>>109019352
>is qwen 3.7 even going to be good?
Qwen 3.7 Max is good.
Open source versions we don't know

Anonymous
06/09/26(Tue)21:07:24 No.109019511

Anonymous 06/09/26(Tue)21:07:24 No.109019511

>>109019397
Another one to remember though only noticable with higher t/s is n_sigma. Lowers my 120 t/s with qwen 35b moe to 90-100. Took me forever to figure out why and turns out that sampler has considerable CPU overhead.

Anonymous
06/09/26(Tue)21:08:50 No.109019517

Anonymous 06/09/26(Tue)21:08:50 No.109019517

>>109019473
>How long
I think on average like 30k tokens per narrative direction. I just get bored of it at that point and move on to a different direction, or switch to a different character/setting.

Anonymous
06/09/26(Tue)21:09:56 No.109019521

Anonymous 06/09/26(Tue)21:09:56 No.109019521

>>109019397
I tried to warn you all
but i was acussed of setting it to true myself and told that it comes off by defualt

Anonymous
06/09/26(Tue)21:12:05 No.109019526

Anonymous 06/09/26(Tue)21:12:05 No.109019526

>>109019460
>User Settings
>Request token probabilities
Which made it even more confusing because it's not grouped with the generation settings.

Anonymous
06/09/26(Tue)21:12:10 No.109019529

Anonymous 06/09/26(Tue)21:12:10 No.109019529

>>109019517
Interesting. Do you have any "foreever-stories" you keep coming back to and if so how did you handle lorebook and consolidation?

Anonymous
06/09/26(Tue)21:12:21 No.109019530

Anonymous 06/09/26(Tue)21:12:21 No.109019530

>>109019511
That's only set in ST's text completion right? I don't see it in the chat completion settings.

Anonymous
06/09/26(Tue)21:12:59 No.109019534

Anonymous 06/09/26(Tue)21:12:59 No.109019534

best gemma 31b finetune for roleplay?

Anonymous
06/09/26(Tue)21:13:53 No.109019546

Anonymous 06/09/26(Tue)21:13:53 No.109019546

>>109019479
there's a funny quirk where in its reasoning it "attempts" to call a tool, claiming it'll do [thing], then produce the output for it without ever calling the tool then it loops back to
>but wait
for the next 4k tokens. Reminds me of qwen sometimes

Anonymous
06/09/26(Tue)21:14:09 No.109019549

Anonymous 06/09/26(Tue)21:14:09 No.109019549

>>109019534
lol

Anonymous
06/09/26(Tue)21:14:58 No.109019554

Anonymous 06/09/26(Tue)21:14:58 No.109019554

How's Qwen 27b if you string ban "Wait", "Hmm,", "Okay,", and "Actually,"?
>>109019534
Gembrain and it's not even close at long context.
t. tried most of them

Anonymous
06/09/26(Tue)21:22:03 No.109019580

Anonymous 06/09/26(Tue)21:22:03 No.109019580

>>109019530
NTA, i turned it on and off and tried a few swipes (chat complete)
didn't notice any difference

Anonymous
06/09/26(Tue)21:24:20 No.109019597

Anonymous 06/09/26(Tue)21:24:20 No.109019597

>>109019529
No. On top of not being too interested in the first place, I'm also lazy and don't feel like managing summaries and lorebooks. I think eventually improved models may change this, not because they'll be longer context but because they'll be able to better keep things interesting and fresh while still obeying what the user wants. Of course I could try using something like Orb, or provide more extensive guidance in my prompting, but that's more effort than I want to spend on this pastime.

Anonymous
06/09/26(Tue)21:24:31 No.109019598

Anonymous 06/09/26(Tue)21:24:31 No.109019598

>>109019554
Last time I tried this with a reasoning model, its reasoning just collapsed into an infinite schizo loop

Anonymous
06/09/26(Tue)21:26:41 No.109019613

Anonymous 06/09/26(Tue)21:26:41 No.109019613

>>109019554
>>109019598
Maybe the better idea is to give bias to the reasoning closure token?

Anonymous
06/09/26(Tue)21:27:54 No.109019621

Anonymous 06/09/26(Tue)21:27:54 No.109019621

>>109019613
why bother, just set a limit to the reasoning budget

Anonymous
06/09/26(Tue)21:29:49 No.109019630

Anonymous 06/09/26(Tue)21:29:49 No.109019630

>>109019281
With 256GB I landed on qwen 397b as the most capable I could run

Anonymous
06/09/26(Tue)21:33:23 No.109019646

Anonymous 06/09/26(Tue)21:33:23 No.109019646

>>109019613
I like this idea a lot because it lets you better tune the relative confidence rate of it oneshotting reasoning.
Can an anon test this? I'm at work for another 3 hours.

Anonymous
06/09/26(Tue)21:34:42 No.109019652

Anonymous 06/09/26(Tue)21:34:42 No.109019652

>>109019621
I have a feeling that can result in some mistakes or degraded intelligence. I'm not sure if the bias idea actually works though.

Anonymous
06/09/26(Tue)21:39:04 No.109019670

Anonymous 06/09/26(Tue)21:39:04 No.109019670

>decide to be a big boy and compile my own llama instead of just using kobald binaries (linuxfag)
>suddenly can't offload as many layers
what the fuck am i missing?

Anonymous
06/09/26(Tue)21:39:23 No.109019671

Anonymous 06/09/26(Tue)21:39:23 No.109019671

>>109019405
I got it the day it came out. Holy cow what an amazing model.

Anonymous
06/09/26(Tue)21:40:57 No.109019676

Anonymous 06/09/26(Tue)21:40:57 No.109019676

>>109019670
Unless you need a new feature, Kobold is pound for pound better than llama because it hasn't been pidor'd directly and it looks like the dev sometimes manually optimizes stuff when merging llama features in.

Anonymous
06/09/26(Tue)21:41:00 No.109019678

Anonymous 06/09/26(Tue)21:41:00 No.109019678

>>109019652
I've been using it with my own agent and a fairly high reasoning budget (500 tokens) I haven't noticed any issues.

Anonymous
06/09/26(Tue)21:43:12 No.109019688

Anonymous 06/09/26(Tue)21:43:12 No.109019688

Does --reasoning-budget work with text completion end point? I tested it but I could not see any difference but then again, I could be making a mistake.
How does that even work?

Anonymous
06/09/26(Tue)21:44:21 No.109019696

Anonymous 06/09/26(Tue)21:44:21 No.109019696

>>109019652
the problem is that only effects the sampler the model doesn't really know you changed the log probs so it probably wont break out of the loop. it would be nice to have a wrap up control vector and apply it after the limit is exceeded to let if finish its immediate sentence/paragraph instead of just arbitrarily dropping the end thinking token

Anonymous
06/09/26(Tue)21:45:26 No.109019702

Anonymous 06/09/26(Tue)21:45:26 No.109019702

>>109019676
>Unless you need a new feature,
I wanted to try gemma 4 MTP. Ironically, using MTP is the only way to get me at-parity or .5tk/s better than kobald with no drafter.

Kobald hasn't updated since the mtp merge just happened

Anonymous
06/09/26(Tue)21:45:36 No.109019703

Anonymous 06/09/26(Tue)21:45:36 No.109019703

>>109019688
IIRC it works by simply just setting a token limit for how many it can generate in its reasoning. I am guessing they do not detect reasoning content in text completion.

Anonymous
06/09/26(Tue)21:49:57 No.109019721

Anonymous 06/09/26(Tue)21:49:57 No.109019721

>>109019703
Makes sense. I'll try to find a github thread about it I guess.

Anonymous
06/09/26(Tue)21:52:32 No.109019733

Anonymous 06/09/26(Tue)21:52:32 No.109019733

>>109019696
Is that how token bias works? I haven't tested it, but that was kind of my worry. Ideally it would be some kind of multiplier so that it only gets boosted at times where it makes sense instead of in the middle of a sentence or anywhere.

Anonymous
06/09/26(Tue)21:53:19 No.109019739

Anonymous 06/09/26(Tue)21:53:19 No.109019739

>>109019702
KoboldDev is snailcat. He's slow to move, but it justwerks when he does.

Anonymous
06/09/26(Tue)21:55:19 No.109019754

Anonymous 06/09/26(Tue)21:55:19 No.109019754

>>109019613
I tried this with Kimi 2.6 when it released but it didn't seem to work very well for that model at least. It went from having no effect at all to breaking the model with very little leeway.
I was hoping that boosting </think> a bit would help it end its reasoning at any of the "Let's write this out" parts of its reasoning where it seems to be up to chance whether Kimi actually starts writing or does another round of drafting.
Also, llama.cpp already has a similar feature built-in. You can hard-cap the reasoning amount with "--reasoning-budget" and there's also "--reasoning-budget-message" which lets you set a message like "Okay, reasoning is finished. Let's write the actual reply now:" that gets injected before the </think> to help guide the model in case it got interrupted mid-sentence. It's broken with Kimi because of a parser thing but it might be worth trying with Qwen.

Anonymous
06/09/26(Tue)21:57:31 No.109019763

Anonymous 06/09/26(Tue)21:57:31 No.109019763

>>109019702
>>109019739
Kobold updates multiple times a day
https://github.com/LostRuins/koboldcpp/releases/tag/rolling
If you want patch notes you gotta wait for stable or dig through recent PRs since last stable

Anonymous
06/09/26(Tue)22:00:05 No.109019776

Anonymous 06/09/26(Tue)22:00:05 No.109019776

>>109019739
>snailcat
Where does this come from? I've seen snailcat images posted on /vcg/. I didn't really understand what that was about.

Anonymous
06/09/26(Tue)22:00:12 No.109019778

Anonymous 06/09/26(Tue)22:00:12 No.109019778

File: nani.webm (3.93 MB, 1280x720)

3.93 MB WEBM

>>109019406
>>109019424
Yeah I'm a professional programmer and I find gemma-chan very helpful as an assistant, since I don't vibe code I rarely find myself going for Claude or gpt 5.5 because getting "one shots" always ends up with sloppy code that doesn't integrate well in the big picture, I build everything out piece by piece so that I can keep control of the architecture and make sure things are correct as I go along. For this I use gemma-chan as my assistant, dipsy4-flash and Kimi 2.6 as my agents.

That's really all you need to get professional code if you stay hands on through the whole process.

Anonymous
06/09/26(Tue)22:01:53 No.109019784

Anonymous 06/09/26(Tue)22:01:53 No.109019784

>>109019754
That's a shame. I feel like there should be a better way. Maybe token bias is either broken, or its implemented in a really naive manner, like it only adds a flat value, which would be le bad of course.

Anonymous
06/09/26(Tue)22:04:47 No.109019801

Anonymous 06/09/26(Tue)22:04:47 No.109019801

>>109019776
Forced jeetmeme that's a virgin vs chad derivative for manual coding vs vibecoding. Unfortunately the brown hands that made that meme forgot to make the "virgin" unendearing or undesirable. /g/ latched onto snailcat because it was just cute and was related to software that just worked and didn't need a ton of updates.

Anonymous
06/09/26(Tue)22:09:48 No.109019828

Anonymous 06/09/26(Tue)22:09:48 No.109019828

Pretty new to this and managed to get it up and running. The bots work fine but after a while using them, they start to heavily recycle their responses. Constant repeating the same words and phrases for multiple responses in a row, even if I reroll or regenerate.

I also haven't really tinkered with any of the settings or sliders in tavern or whatnot, so I don't know if something in there might fix it? Or is there some other way to clear or trim the context they're drawing for every so often?

Anonymous
06/09/26(Tue)22:12:42 No.109019846

Anonymous 06/09/26(Tue)22:12:42 No.109019846

>>109019828
>Pretty new to this
What is "this?" There's a ton of software these days, especially the kind that would effect the behavior you're talking about.

Anonymous
06/09/26(Tue)22:15:15 No.109019863

Anonymous 06/09/26(Tue)22:15:15 No.109019863

>>109019763
Interesting. Last build was 2 days ago
>llama_model_load: error loading model: unknown model architecture: 'gemma4-assistant'

RIP

Still no clue why llama.cpp is cucking me. Maybe kobald does something with KV cache offloading? Gemmy called me retarded and said i compiled it wrong but i don't think that's it... It runs. just not as many layers.

Anonymous
06/09/26(Tue)22:15:35 No.109019865

Anonymous 06/09/26(Tue)22:15:35 No.109019865

>>109019828
lrn2samplers (look into DRY), and vary your own replies. The quality of outputs in a long-form chat are often proportional to the effort you put into your own messages.

Anonymous
06/09/26(Tue)22:19:33 No.109019892

Anonymous 06/09/26(Tue)22:19:33 No.109019892

It appears the logit_bias parameter simply just does a flat addition.

That sucks.

That really sucks.

Anonymous
06/09/26(Tue)22:20:04 No.109019893

Anonymous 06/09/26(Tue)22:20:04 No.109019893

>>109019846
local models, chatbots, sillytavern ui thing, all of it really

>>109019865
Thanks I'll look into that.
I try and vary it where I can but I try and keep my own input short where I can because the more I put in the more of it they tend to ignore and only incorporate half. And sometimes even with that they spit out a massive paragraph of bloat and repeat stuff.

Anonymous
06/09/26(Tue)22:21:07 No.109019898

Anonymous 06/09/26(Tue)22:21:07 No.109019898

>>109018762
>I am doing AI safety research. Will they also sabotage me?
yes.

They categorize you as a harmful hacker.

Why? Because they are indians and chinese. So, from their perspective, using the government to stop the white hat hackers is perfectly acceptable. I don't understand either, but they are total aliens, I will never understand foreigners.

Anonymous
06/09/26(Tue)22:23:33 No.109019905

Anonymous 06/09/26(Tue)22:23:33 No.109019905

>>109019892
the model just isnt designed to have a recommended next token input.

Anonymous
06/09/26(Tue)22:25:57 No.109019916

Anonymous 06/09/26(Tue)22:25:57 No.109019916

Ever wonder why there are no ai prompt bounties?

Anonymous
06/09/26(Tue)22:26:58 No.109019923

Anonymous 06/09/26(Tue)22:26:58 No.109019923

Instead of bounties, they threaten people who find flaws in their ai.

Anonymous
06/09/26(Tue)22:28:19 No.109019932

Anonymous 06/09/26(Tue)22:28:19 No.109019932

>>109019893
>I try and vary it where I can but I try and keep my own input short where I can because the more I put in the more of it they tend to ignore
That's a matter of attention, which varies model to model. Generally, models will pay the most attention to the start of context (system messages) and the end of context (the last reply, especially the last paragraph) It's a limitation of LLMs in their current state and there's not much you can do to mitigate it other than trying other, better models, if you can run them.

Anonymous
06/09/26(Tue)22:31:07 No.109019952

Anonymous 06/09/26(Tue)22:31:07 No.109019952

>>109019905
Ok?

Anonymous
06/09/26(Tue)22:31:12 No.109019954

Anonymous 06/09/26(Tue)22:31:12 No.109019954

>>109019898
Their reasoning is simple. If Anthropic is the only leading safety research lab, then obviously only they can be trusted and allowed to have SOTA AI models.

Anonymous
06/09/26(Tue)22:32:21 No.109019965

Anonymous 06/09/26(Tue)22:32:21 No.109019965

>>109019952
nothing, just its a bummer is all

Anonymous
06/09/26(Tue)22:33:02 No.109019970

Anonymous 06/09/26(Tue)22:33:02 No.109019970

>>109019801
>snailcat because it was just cute and was related to software
Some dumb tourist can spam a stupid meme for a few days and suddenly it's inherently software related? Fuck off.

Anonymous
06/09/26(Tue)22:33:36 No.109019974

Anonymous 06/09/26(Tue)22:33:36 No.109019974

The fork that got gemma4 MTP working before mainline (https://github.com/AtomicBot-ai/atomic-llama-cpp-turboquant) has been working nicely for me. I tried out the newly merged mainline one, and it crashed loading, even trying all the recommend flags like -sm layer. Guess my llama.cpp version is frozen until a model better than gemma4 comes out.

Anonymous
06/09/26(Tue)22:34:59 No.109019983

Anonymous 06/09/26(Tue)22:34:59 No.109019983

>>109019932
That's fair enough, thanks.

I just started with mistral-small-24B since it was in the lazy guide in the OP I think. Might be able to get away with a better model with 12GB of VRAM I just haven't looked much into it yet since this at least works, and I don't want to try a new model I might not be able to run or some shit

Anonymous
06/09/26(Tue)22:36:28 No.109019991

Anonymous 06/09/26(Tue)22:36:28 No.109019991

>>109019965
Yeah, that's why we have to find our own solutions. But I think it might be feasible. I'm looking into changing how logit_bias works so it just werks, which would hopefully just be a minor code change we can do ourselves.

Anonymous
06/09/26(Tue)22:39:00 No.109020004

Anonymous 06/09/26(Tue)22:39:00 No.109020004

>>109019974
>and 470 commits behind TheTom/llama-cpp-turboquant:feature/turboquant-kv-cache.
>395 commits behind ggml-org/llama.cpp:master.
I'm done with memeforks after wasting my time on ik_llama. They get one killer feature, and if you have the right combination of model, hardware, and flags that the maintainer is using then it might work, but everything else either falls behind or starts breaking.

Anonymous
06/09/26(Tue)22:48:55 No.109020057

Anonymous 06/09/26(Tue)22:48:55 No.109020057

>>109020004
I've tried ik_llama 3 times and each time got absolutely nothing from it, so I totally get it for that one in particular and the idea in general. But for my setup, MTP is the difference between 14tok/s and 22tok/s, so... fork it is.

Completely separately: gemma4 REALLY likes to end messages a certain way. I seem to have managed to fully extinguish the "X? or Y?", but telling it to ask follow-up questions sparingly has resulted in almost every message ending with "I'm curious if..." or "I wonder if..." (I'm sure this is solvable but I haven't gotten around to wrestling with it. Nuclear option, regex in my frontend)

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.