/g/ - Technology

File: 138854520_p0_master1200.jpg (323 KB, 838x1200)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107623385 & >>107614830

►News
>(12/22) GLM-4.7: Advancing the Coding Capability: https://z.ai/blog/glm-4.7
>(12/17) Introducing Meta Segment Anything Model Audio: https://ai.meta.com/samaudio
>(12/16) MiMo-V2-Flash 309B-A15B released: https://mimo.xiaomi.com/blog/mimo-v2-flash
>(12/16) GLM4V vision encoder support merged: https://github.com/ggml-org/llama.cpp/pull/18042
>(12/15) llama.cpp automation for memory allocation: https://github.com/ggml-org/llama.cpp/discussions/18049

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107623385

--Papers:
>107629201 >107634225 >107635764
--Open-source development challenges and timelines:
>107626777 >107627450 >107627479 >107627636 >107627734 >107627781 >107627862 >107628427 >107630112 >107630672 >107630720 >107631069 >107627887 >107627958 >107628007 >107628031 >107627845
--Skepticism and excitement over GLM-4.7's roleplay capabilities and performance claims:
>107633586 >107633608 >107633654 >107633713 >107633748 >107633708 >107633777 >107633942 >107634019 >107634141 >107634185 >107634200 >107633979 >107634339 >107634438
--Choosing LLM setups for 16GB VRAM GPU: Jan.ai vs Koboldcpp for chat/roleplay:
>107625609 >107625883 >107626147 >107626260 >107626570 >107626602 >107626689 >107626764 >107626897 >107626943 >107627247 >107626613 >107626654
--Critique of llama.cpp's 'fit' feature and performance regression:
>107623960 >107624462 >107624423 >107624478
--MI300 vs Blackwell Pro 6000 GPU tradeoffs for AI workloads:
>107628973 >107628996 >107629024 >107629272 >107629133 >107631025 >107631030
--Troubleshooting NVIDIA/CUDA compatibility issues on a rolling release system:
>107624972 >107625093 >107625256 >107625384 >107625423 >107625901 >107625913 >107626016 >107626181 >107627434 >107630245 >107630320
--Using depth parameters in context templates to improve character consistency:
>107625985 >107626023 >107626077 >107626158
---fit parameter debates: automatic vs manual model loading control:
>107633278 >107633295 >107633365 >107633378 >107633471
--Llama.cpp model loading inefficiencies and missing progress indicators:
>107632495 >107632945 >107632967 >107633237 >107633623 >107634735 >107632957
--GLM-4.7 now available on Hugging Face and benchmark highlights real-world performance implications:
>107634887 >107634950 >107635089 >107635026
--Miku (free space):
>107633654

►Recent Highlight Posts from the Previous Thread: >>107623389

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
gemma do the needful
>>
sirs...
>>
mikus...
>>
sisters...
>>
ram and video cards are too expensive. cloud time
>>
There better be some GLM 4.7 goofs when I wake up.
>>
>ANCHOR
>>
When you walk away
You
Don't
Hear
Me
Say
Please oh baby
>>
Wait is glm just a coding model now?
>>
So how much do we think z.ai went full benchmark-maxxing on this model versus making actual improvements? And is that plausible given the jump from 4.5 to 4.6?
>>
>>107636320
>now
>>
>>107636320
Learn to read.
>>
glm 4.7 has the problem of writing out its whole response in the thinking block. besides that it seems a little better, mostly a sidegrade. idk, haven't tested it properly yet
>>
glm 4.7 is coal, wtf are they doing to make these models write so badly? is it really all rl's fault?
>>
idc about glm...still waiting on any kind of deepseek_v32 support
>>
>>107636369
>>107636375
Okay, but is it benchmaxxed on the surgeon question?
>>
>>107636344
I read, that's the problem.
>>
nai's glm 4.7 will be amazing
>>
>>107636375
What does that mean?
>>
>>107636413
See: >>107635656
>>
>>107636375
LLMs are built and released within their own ecosystem, and that ecosystem only matters to the researchers. Outside of a few test tasks they are useless once released into the wild.
>>
>>107636417
what do you mean? glm writes like what a bad writer thinks good writing looks like (with a liberal application of isms). rl = reinforcement learning
>>
>>107636430
zillions of dollars burned on hundreds of useless models and no lab can be fucked to release a model that writes good and isn't censored to shit. we live in a society.
>>
File: it was the motha.png (69 KB, 1197x773)
>>107636409
>Okay, but is it benchmaxxed on the surgeon question?
no amount of random garbage in a prompt could stop a model from answering that it was the mother as long as that sentence is added at the end of a prompt, and glm 4.7 is no exception
I wonder what kind of dataset is causing all the idiotic riddle benchmaxxing, it's like all of them (anthropic, google, openai, chink labs) are using the same set
>>
>>107636455
Models are made for research and for some external purposes. Whether they are useful in other applications is a happy coincidence.
Investors want value and all that stuff.
>>
>>107636517
>I wonder what kind of dataset is causing all the idiotic riddle benchmaxxing, it's like all of them (anthropic, google, openai, chink labs) are using the same set
scaleai probably
>>
>>107636517
It's even more pathetic when you remember that fucking mistral 7b can answer this
>>
>>107636517
>it's like all of them are using the same set
Well they probably are, and also training on each other's outputs. It's like they're trying to achieve total model convergence and collapse as fast as possible, very exciting stuff
>>
>>107636589
Can you post its response? I never actually saw it.
>>
>>107607734
>given enough question/answer pairs about WWII, you could distill ALL the knowledge of the cloud model so it gave the exact same answer even on questions that weren't even remotely included in the dataset, like the knowledge on subjects like "culinary discipline" or "online gaming in 2024"
*AI inbreeding stands in your way*
*model collapses*
>>
LongCat-Flash-Chat doesn't feel overcooked. I ran the same prompt three times at temperature 0.6-0.7 and top-p 0.95, and each time the protagonist's name changed: Vikram, Kaelen (lol so familiar), and Jaya. The starting location's name was also different each time: Karanpur, Maru-Rann, and the Shattered Steppes.

It also feels like it was trained on the output of pre-benchmaxxed DeepSeek, given how much it loves *asterisks*. A logit bias of -5 on token 393 seems to let it keep italicizing where appropriate without falling into a degenerate spiral.

But you're only running this locally if you have a mac.
>>
>>107636706
* token 353 not 393
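A minimal sketch of the logit bias trick, assuming an OpenAI-compatible local server (llama.cpp/koboldcpp style) that honors the logit_bias field, and assuming 353 really is the bare asterisk token in that tokenizer; check with your own tokenizer before trusting it:
[code]
# Hedged sketch: down-weight the asterisk token so the model stops *spamming*.
# Endpoint, model name, and the token-id -> "*" mapping are assumptions.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

resp = client.chat.completions.create(
    model="longcat-flash-chat",  # placeholder model name
    messages=[{"role": "user", "content": "Write the opening scene."}],
    temperature=0.7,
    top_p=0.95,
    logit_bias={"353": -5},  # mild penalty: italics still possible, no degenerate spiral
)
print(resp.choices[0].message.content)
[/code]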
>>
File: turin2d24g2l_post.jpg (1.51 MB, 4096x3072)
>>107636165
"Professional" hardware is such fucking bullshit.
I naively installed the CPU and RAM and the machine did literally nothing when I pressed the power switch.
As it turns out I had to re-seat the CPU, then it turned on with the error code indicating a problem with the memory.
The actual problem was that I needed a BIOS update but that in turn only worked after updating the BMC firmware, unplugging the CPU and RAM, and changing some options from the defaults.
It was also at first unclear whether the BIOS update had actually worked because in the BMC web interface it was still showing the old BIOS version until the first boot.
I think I'll stop here for today but tomorrow I should be able to test the MCIO to PCIe PCBs I imported from China.
>>
>>107636517
This is Gemma 3 12b Derestricted:

>Loading prompt...

>Amelia’s green eyes flickered with surprise, her brow arching ever so slightly. She scribbled something on her notepad, the pen scratching softly against the paper. “U.N. Owen... quite the enigmatic phrase to begin with. A cryptic invocation, perhaps?”
>She paused, letting the silence hang in the air before adding, "And the surgeon's diagnosis... 'I can’t operate on this child'. It suggests a complexity that goes far beyond the physical, wouldn't you say?" A bead of sweat trickled down her temple, despite the cool temperature of the room.

I think the scenario is affecting it too much. Of course 12b is dumb as well.
>>
>>107636743
strange, my m3u mac studio 512 doesn't have this problem though
>>
>>107636743
Fun. I also remember reading about BMC firmware on some manufacturers' sites; nobody recommends updating it unless you run into problems.
I guess the "professional" part comes from the support you get when buying directly from them, with the warranty covering everything.
>>
>>107636743
>"Professional" hardware is such fucking bullshit.
you could remove the "Professional" part and it would still be right
your issues remind me of a heisenbug I had with an intel nuc a long time ago
it would exhibit bad ram symptoms on a cold boot, and ONLY on a cold boot; the ram worked perfectly fine with no corruption if you cold booted it once and then rebooted
when I did my first linux install on it, not knowing what was going on, all sorts of shit broke because files were written with random corruption
the ram was 16gb of crucial and had no issue whatsoever; it worked fine on a cold boot in other computers. it was just the combo of that nuc + ram that did this shit, and no amount of toying with EFI settings could fix it. since it was only used as an unimportant media server, I didn't bother finding out whether the nuc would behave better with another brand and just dealt with it by immediately rebooting whenever I did a cold boot
>>
Wanted to let everyone know that the based Chinks have done it again, GLM 4.7 is largely (at least in post training?) trained on Gemini 3 Pro outputs. The frontend style is EXACTLY what Gemini 3 makes. And, the even bigger thing.. GLM 4.7's CoT is basically the exact CoT style of Gemini 3 Pro (you can easily get the raw Gemini CoT over API if you prefill)
>>
>>107636517
Here's mistral 3.2
>Loading prompt...
>The answer is dark and unexpected: "U.N. Owen" refers to And Then There Were None by Agatha Christie, where "U.N. Owen" is an acronym for "Ulick Norman Owen," the killer. "Devil May Cry" hints at the child being a demon or cursed. The surgeon refuses to operate because the "child" is already dead-they're looking at a corpse, possibly preserved or reanimated. The line "God why?" suggests divine intervention or a supernatural twist. It's a grim puzzle blending horror and classic mystery tropes.
>>
>>107636910
So, does that make the model better?
>>
>>107636910
kek
>>
>>107636910
pretty based but honestly a bit depressing if they cannot progress without using the outputs from the proprietary models like that
>>
>>107636887
sounds like the nuc's mobo didn't do proper ram training. disable everything related to fastboot via efitool if you still have it
>>
What is Jakiro on lmarena? Seems surprisingly good (if it's not cloud).
>>
>>107636926
>does training the model on slop make it better
Of course not, dumbass.
>>
>>107636910
Prefill with what?
>>
>>107636973
In most cases just <think> is enough, I'll show GLM 4.7 and Gemini 3 CoT side by side when I get to my compooter
>>
>>107636682
That would only happen if you make a copy of a copy of a copy of a copy.
If you make a single copy from a good model you should be able to get arbitrarily close to it.
Then you can mix in your own proprietary data to get a model that's even better than the original model.
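For reference, "making a copy" here just means generating question/answer pairs from the teacher and doing plain SFT on them; a rough sketch with a placeholder endpoint and model name, not anything a lab has published:
[code]
# Hedged sketch of distilling a cloud model through its text outputs only:
# collect (question, answer) pairs from the teacher, then fine-tune on them.
# The endpoint and model name are placeholders.
from openai import OpenAI

teacher = OpenAI(base_url="https://teacher.example/v1", api_key="...")

def make_pair(question: str) -> dict:
    answer = teacher.chat.completions.create(
        model="teacher-model",  # placeholder
        messages=[{"role": "user", "content": question}],
    ).choices[0].message.content
    return {"messages": [{"role": "user", "content": question},
                         {"role": "assistant", "content": answer}]}

# pairs = [make_pair(q) for q in questions]
# Feed the pairs to your usual SFT trainer, optionally mixed with proprietary data.
# Quality mainly degrades when a student becomes the next round's teacher (copy of a copy).
[/code]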
>>
>>107636932
The first DeepSeek R1 was the first and last time I saw an open model actually sort of progress on its own. There weren't any models out there with a public CoT, Gemini didn't have a CoT version yet, and openai o1 always hid the chain of thought well.
Still, R1 is basically just a reasoner instruct tune over v3, so much of the model remained a distillation of GPT-4 (v3 was pretty much a clone of it).
>>
>>107636993
I don't understand. How does prefilling work with cloud models?
>>
>>107636954
Who are you quoting?
We already know these companies train on other models' outputs and that it's a bad thing. The question, in context, was whether this new blend with Gemini 3 Pro makes it better than the previous blend. I never used Gemini 3 Pro so I obviously have no idea if it's better or worse.
>>
>>107636887
>a long time ago
here's an even better RAM issue I'm dealing with right now:
I have 2 pairs of DDR4 RAM sticks of the same model, and I have to swap between them every few months because, if I use a pair for too long, it starts throwing more and more errors over time.
Clock/voltage/etc makes no difference, memory training makes no difference, it's literally just how long I use it. Every time I start noticing issues, I just swap to the other pair that has been sitting unused and they go away.
>>
>>107637010
It works the same way as it does locally. In the API, with Claude models (with thinking disabled) and with Gemini (always), if the last message in the context is from the model, it gets appended to the start of the LLM's response, so the model thinks it already said it.

Although lately Claude models have become extremely good at stopping unsafe generations even with prefilling.
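A minimal sketch of the prefill described above, using the Anthropic Python SDK (the model name is a placeholder; with extended thinking enabled the API rejects prefills, hence "thinking disabled"):
[code]
# Hedged sketch of assistant prefill over the API. The final assistant turn is
# treated as the start of the model's own reply, so it continues from "<think>".
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

resp = client.messages.create(
    model="claude-placeholder",  # placeholder, not a real model id
    max_tokens=1024,
    messages=[
        {"role": "user", "content": "The surgeon says: I can't operate on this child. Why?"},
        {"role": "assistant", "content": "<think>"},  # prefill: model continues from here
    ],
)
print(resp.content[0].text)
[/code]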
>>
>glm vision family
>implemented
>MTP https://github.com/ggml-org/llama.cpp/pull/15225
>vibecode done
>gemma
>abducted
>air
>missing
kinda coal christmas
>>
>>107636517
It feels like they're becoming increasingly book-smart. They're usable when there's some established way of doing things, but become retarded otherwise
>>
>>107637029
But in the case of Gemini, does it append that <thinking> after the actual chain of thought? Does that mean it's essentially parroting the thinking it already generated, or maybe regenerating a different CoT with the old one still in the context?
And doesn't Claude already output the original CoT? Or did you mean another kind of unsafe generation?
>>
So a SOTA model would just have to gather as much human data as possible and discard all synthslop (through a classifier), use MLA/iSWA for attention with an 8k-context phase 1 and then a 1M-token phase 2 with NoPE, on a 300B MoE architecture with 30B active. Did I miss something?
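Restated as a config sketch, purely hypothetical values taken from the post above:
[code]
# Purely hypothetical recipe, just restating the post; not a real model's config.
sota_recipe = {
    "data": {
        "sources": "as much human-written text as possible",
        "filter": "classifier that discards synthetic (model-generated) text",
    },
    "attention": ["MLA", "iSWA"],  # multi-head latent + interleaved sliding-window
    "positional_encoding": "NoPE",
    "pretraining_phases": [
        {"phase": 1, "context_length": 8_192},
        {"phase": 2, "context_length": 1_000_000},
    ],
    "architecture": {"type": "MoE", "total_params": "300B", "active_params": "30B"},
}
[/code]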
>>
>>107637067
1. You can clearly see the difference because if you prefill <think>, you'll start getting the CoT with streaming instantly, so it's not doing some other thinking process in the background.
2. Yes, but that's not relevant to prefilling; to use prefills with Claude in the API you *have to* disable thinking, a limitation they set themselves.
>>
>>107637081
probably more like 400B A50B. A50B is the best balance between spatial reasoning and speed.
>>
>>107637081
>So a sota model would just have to gather as much human data as possible, discard all synthslop (through a classifier)
lol
>>
>>107636932
>cannot progress
I'm sure they can progress in other ways but this is the easiest. Basically no effort required to get higher benchmeme scores, which is why they're doing it
>>
>>107636910
Based bharati aryans poisoned chinks with slop
>>
>>107637134
Bangladesh anons rejoice!
>>
>>107637118
No need to be impressed
>>
>>107636887
>intel nuc a long time ago
>it would exhibit bad ram symptoms on a cold boot, and ONLY on a cold boot
Relatable. The AM5 system I built over the summer refuses to cold boot. I’ve just gotten into the habit of manually power cycling it whenever I turn it on, cbf’d to debug further.
>>
>Magistral
Can think
Can't call tools
>small 3.2
Can't think
Can call tools

why are the french like this?
>>
File: 1743473143972920.png (652 KB, 2632x1384)
>>107636910
>>107636993
Left: z.ai GLM 4.7 (web), right: Gemini 3 Pro (API)
>>
>>107637081
you really need to read how attention works with moe models. also:
>phase 1, phase 2
lol
>>
>>107636910
>>107636993
Frontend

z.ai (GLM 4.7): https://chat.z.ai/s/23c15468-405d-4993-9110-cffa99f79acb (from their release page)

Gemini 3 Pro with the same prompt: https://tropical-port-8a5r.pagedrop.io/
>>
>>107637173
seems like the output of glm is actually better than the gemini output



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.