/g/ - Technology






File: glm_miku.png (27 KB, 400x500)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106777408 & >>106769660

►News
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking
>(10/02) ZLUDA 5 released with preliminary support for llama.cpp: https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: kimi-miku-svg-attempt.png (96 KB, 569x785)
►Recent Highlights from the Previous Thread: >>106777408

--Memory channel limitations and EPYC upgrade considerations:
>106777689 >106777726 >106777777 >106777781 >106777996 >106778156 >106778198 >106778404 >106778453 >106778486 >106778493 >106778506 >106778537 >106778594 >106778632 >106778813 >106778915 >106778929
--VRAM requirements and future-proofing strategies for AI workloads:
>106781502 >106781514 >106781592 >106781611 >106781767 >106781828 >106781948 >106781962 >106782007 >106782045 >106782252 >106781815 >106781726 >106781750 >106781617 >106781775 >106781715 >106781725 >106781776 >106781798 >106781841 >106781897 >106781943
--MoE scalability tradeoffs: cost, speed, and accessibility challenges:
>106779306 >106779349 >106779414 >106779428 >106779883 >106779973 >106779989 >106780006 >106779998 >106779914 >106779939 >106779990 >106780169 >106779408
--Training a model on 4chan's /co/ board dataset with evolving loss trends:
>106781317 >106781408 >106781452 >106781500 >106781506 >106781793 >106782959 >106782993 >106783031 >106783170 >106783177 >106783207 >106783254 >106783244 >106783263 >106783288
--Feasibility of conveying visual data to visionless LLMs via text formats:
>106782778 >106782814 >106782892 >106782906 >106782966
--Testing VibeVoice 7B audio synthesis quality on RTX 5090:
>106779632 >106779663
--Local model frustration and alternatives: self-training and prompt tweaking:
>106780504 >106780511 >106780552 >106780618 >106780768 >106780827 >106780686
--ZLUDA 5 update enables CUDA backend for llama.cpp, performance compared to ROCm:
>106781061 >106781109 >106781417
--Miku (free space):
>106778073 >106779336 >106780879 >106780921 >106781197 >106781753 >106782708 >106783263 >106784351 >106784397 >106784522 >106784721
--NOT Miku:
>106779078 >106781060 >106782028 >106782155 >106782405 >106782462 >106782736 >106783082 >106783172

►Recent Highlight Posts from the Previous Thread: >>106777411

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
dead general
>>
>>106785107
Who is second in command?
>>
>>106785130
teto
>>
>>106785134
which quant do you use?
>>
File: at your service.png (2.16 MB, 1536x1536)
>>106785134
oh no missed saber finger

me when I download GLM 4.6 and have it describe in various flowery prose how Saber devotes herself to serving the noble purpose using her mouth as a scabbard for my cock
ahhh, culture. I haven't visited this scenario since that last massive copypasta I dumped.
Even at the braindead quant it's good.

>>106785140
q1
>>
The reason cloud models get better is because we correct them in our conversations. They probably measure ppl on human text responses and train on it for new model versions. Seems Anthropic and the Chinese are the only ones really training NSFW atm.
>>
>>106785160
saber's tongue belongs to the anuses of horses
>>
is glm 4.6 local's opus moment if deepseek was its gpt4 moment?
>>
>>106785204
deepseek was the claude sonnet 2 ish moment, knows a ton and writes well but is dumb and crazy, glm is the claude sonnet 4 moment, smart and creative but it knows a bit less than sonnet in comparison
>>
How do I get the llama-server web interface to show thinking tokens for GLM 4.6?
I checked the checkbox on the preferences window but nothing.
>>
>>106785160
what t/s are you getting on q1? i am at 6ish average. prompt processing is around 50t/s
>>
>>106785160
I look like this irl
>>
>>106785265
12t/s generating, 250t/s processing
>>
>>106785304
damn. well i am on an iq4k instead of q1, so i guess that is why. how is the quality at q1? iq4k is fucking god tier, best model ever.
>>
>>106785227
https://github.com/ggml-org/llama.cpp/pull/16394
https://github.com/ggml-org/llama.cpp/pull/16364
Doesn't seem to be working correctly. It may get fixed with one of those.
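In the meantime, a minimal llama-server launch sketch if you just want the raw <think> blocks passed straight through to the web UI (the model path, context size and layer count are assumptions, and this isn't a confirmed fix for the UI toggle):

# use the model's chat template and keep <think>...</think> inline in the content
# (context size and -ngl are placeholders, tune for your rig)
llama-server -m GLM-4.6-IQ3_XXS.gguf --jinja --reasoning-format none -c 16384 -ngl 99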
>>
What is the impact of zram swap instead of using swap on nvme? I'd think using disk is better with llama-server, right?
>>
>>106785310
I mean it beats out anything I was using before so I'm happy with it.
>>
>>106785310
What's the difference between unsloth quants and those quants? Should I stick with a dynamic unsloth quant if I can only fit iq3?
>>
>>106785342
What's the difficulty in testing it yourself?
Up to a certain point, i'd assume zram is better, but there's only so much compression you can get out of practically random numbers.
>>
>>106785345
does it ever go schizo? like does it ever generate words that just dont make any sense? before i was using an iq2xxs of glm 4.5 and it would occasionally just repeat words in sequence, like "the the".
>>106785350
are you using ikllama? if not, then just stick with what you have. if you are using ikllama, get the biggest ubergarm quant that you can fit into both your ram + vram
>>
gpt-oss-20b-base is gone?
>>
>>106785393
>https://huggingface.co/mradermacher/gpt-oss-20b-base-GGUF
gee-huffs are still here.
>>
>>106785342
Why would you ever use swap on nvme? The model mmapping the file should already allow you to page out most of the weights (it'll be unusably slow).
>>
>>106785402
It's a precaution obviously. I'm all about safety.
>>
>>106785359
>>106785402
Base idea here is to avoid stressing cpu under heavy memory loads and therefore I think using traditional swap is better than zram when taxing memory to its limits with llama-server.
Memory mapping is not related to this question per se.
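For what it's worth, a minimal sketch of standing up a zram swap device to A/B against the NVMe swap; the size, compression algorithm and priority below are assumptions, pick whatever matches your box:

sudo modprobe zram
# allocate a compressed swap device; prints the device it picked, e.g. /dev/zram0
sudo zramctl --find --size 64G --algorithm zstd
sudo mkswap /dev/zram0
# higher priority than the NVMe swap so it gets used first
sudo swapon --priority 100 /dev/zram0
swapon --show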
>>
anyone else on a 395+ CPU? With rocm I get like 70-80 tokens per second but half the time I use it I just get `////////////////////////////////////` as my output.

Vulkan works consistently but the output is way slower. Maybe it's time to graduate to something other than lm studio
>>
>>106785094
[
{
"generated_text": "Hey guys, did you know that in terms of breeding, your sister is the most compatible with you? Not only do you get to pass on your genes more completely than fucking a stranger, but there's also a 98.75% chance of having healthy children (we can round this to 100%)? Think about it, you could be a 75% genetic match with your kids! Gosh, who would wanna throw away their bloodline to some random hussy? \n>lol ultraratino\nlol no.\n>lol incestfag\nlol no.\n>lol stop pretending you know what's good for your genes\nI don't know what you think I'm doing here, but I'm not trying to convince anyone that incest is good for you, just informing you that 1) it's not the only reason your genes might be fucked up and 2) it's not just a hypothetical argument, it's your sister you're ignoring.\n>lol oh but she's not my sister biologically! So it's totally fine!\nIf you want to fuck your blood sister, go ahead. I'm not gonna stop you."
}
]
>>
>>106785099
we have a not miku category now woaw
>>
I'm feeling so jelly of cpumaxxer friends that get to play with glm 4.6. If only I wasn't poor...
>>
>>106785478
what model are you using? also yes, get off of lm studio
>>
BOYS
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
>>
>>106785525
my epyc was $300 and my 256gb ram kit was $350. i get 5t/s on glm 4.6 at q3
>>
New Mistral large 3 and llama 4 thinking coming soon are going to be crazy
>>
i hope mistral large is a 180b moe
>>
>>106785564
I'm betting 2-4% benchmark improvements across the board. Wild times!
>>
>>106785531
Why didn't they ever scale this up? It's clearly better than or equal to other models of around the same parameter count, but way faster and with a smaller memory footprint; it would be free money/publicity (which is the same thing as money because investors would love it). Is microsoft retarded? Looking at their huggingface page it looks like they're just releasing tiny dogshit models that are unusable for anything, over and over. What's the point?
>>
>>106785570
It took them one year to scale up the training data to 4T. They did say they were planning to do bigger models next, but if it takes them another year, we might not see something until mid-2026.
>>
miku feet
>>
>>106785529
the qwens, phi, magistral small all generate garbage with rocm, normal output with vulkan. What's the model runner I should be using. I didn't like ollama.
>>
>>106785609
ikllama or llamacpp. both with sillytavern. or if you want something similar to but better than lm studio, https://github.com/oobabooga/text-generation-webui is decent. i havent used https://github.com/LostRuins/koboldcpp in ages but people here seem to like it.
>>
>>106785617
these uis all look like shit compared to lm-studio but pretty ui is useless if the output is slow af. Going to try rocm + llama-server directly and see if I can repro the garbage output
>>
>>106785531
>2B-4T
For a second my dumbass thought that Microsoft had just made a 4T parameter Bitnet MoE model with only 2B active parameters
>>
>>106785397
ty

is it any good?
>>
>>106785440
It always boggles me when anons have all the tools for the thing they want to try and they don't.
I rephrase. What stops you from trying?
>>
>>106785627
Reproduced with llama-server directly. Makes sense given lm studio is just a frontend for llama.cpp
>>
>>106785094
lol
>>
>>106785655
Why waste the time trying when someone else has the answer?
>>
>>106785691
You could have tested it already. Get real numbers for your hardware, your model, your inference program.
But whatever. Good luck with that.
>>
>>106785160
Im horny
>>
File: 1740133636648054.jpg (16 KB, 400x279)
>i don't need to buy a 256gb epyc server to have saber fondle my cock

>i don't need to buy a 256gb epyc server to have saber fondle my cock

>i... i...
>>
File: 1731045591124463.jpg (560 KB, 1152x2048)
>>106785094
ダンボール = corrugated cardboard/cardboard box
>>
>>106785747
A cheap hooker is only $10 retard
>>
File: 1581736925589.jpg (40 KB, 452x363)
>recently built a new rig to play with the AI toys after trucking along with hardware from 2013
>RTX 4090
>ryzen 9 9950x
>64GB RAM
>7GB/s SSD NVME
>thought I was getting top dog stuff
>can't even run GLM 4.6 at not-crawling speeds (if at all) if an Air version doesn't come out
>GLM is not even in the tier of the largest models around
I was aware that 64 GB RAM wasn't super high-end but cmon
>>
>>106785755
where do you live?
>>
>>106785758
Namibia
>>
File: ent.png (59 KB, 864x219)
>>106785342
>>106785359
Haven't tested but can't see it helping, just more work for CPU
>>
>>106785756
Sucks, but what's another 2 sticks? Do it do it do it!
>>
>>106785760
my condolences
>>
File: 1733675794866749.jpg (229 KB, 832x1472)
>>106785094
>>
>>106785775
I'm tempted to, but GPT told me 4 dimms suck and it would be best for me to replace the current 2x 32GB ones with 2x 64GB sticks. So kind of a hassle to replace/add, and a lot of potential to overspend.
I'll get to it sometime tho
>>
>>106785726
I'm not OP bro
>>
>>106785812
Point stands... bro...
>>
File: ComfyUI_00540_.png (342 KB, 1024x1024)
>>106785160
>>106785525
>>106785756

>NAYSAYERS
btfo
>UNBELIEVERS
shunned
>POORFAGS
in shambles

GLM-chan won my heart and singlehandedly saved local
China I kneel
>>
>>106785833
lol I agree with you, but I was just answering your question
>>
File: 1751169640355327.png (2.01 MB, 1024x1536)
>she sees you only have 80gb of vram
>>
>>106785655
>>106785726
Fuck you, it's for discussion. Maybe you are so autistic that you don't understand what it means to have conversations.
I have already made up my mind anyway, thought it would've been fun/useful to ask.
Fuck you, eat shit little bugger. You are nothing but a little brown turd in my toilet.
>>
>>106785756
Even a basic google would have told you how much ai costs to run in 10 seconds. Just admit you bought it for gamez and youre full of shit.
>>
File: ggw0n.png (17 KB, 300x80)
>>106785926
>it's for discussion
In this particular case, you'd end up with
>one anon with some anecdotal evidence, forcing you to test it yourself
>conflicting comments from anons, forcing you to test it yourself
>some asshole telling you to test it yourself, forcing you to test it yourself
So the options are to stay convinced of whatever you estimated or test it yourself.
>I have already made up my mind anyway
Well done, you.
>>
>>106785939
I don't even gayme anymore you zoomer faggot. The PC runs mint with no dual boot and I'm rarely actually sitting in front of it, but connecting from my thinkpad instead.
I had a 660 Ti 2GB before that didn't run shit, so I had little contact with the tools yet other than SD on vast.ai and didn't see the need to maxx on all specs right away. Also 2x 64GB seems overpriced where I live
>>
File: 1749958783707400.jpg (183 KB, 1080x1080)
If you have conflicting results, that is even more of a reason to have a larger sample size
>>
>>106786082
People have fingers. Some more than others. How many do you have? I need more data before I start counting my own.
>>
>>106786158
I have 7. Not sure if I'm counting right.
>>
File: 124953711.jpg (665 KB, 800x1200)
>>106785799
Heard it can be tricky to run 4 sticks on AM5 but ppl manage it, to 192GB even. Should be doable if you have the patience to learn about BIOS config for DDR5, microcode/BIOS updates, check QVL, see what specific memory kits others were able to run etc.
There's always a better smaller model down the road and some richfag with a better rig, just as you are the richfag to the true poorfags. Let's be thankful for what we do have.
>>
>>106786170
Hmm... what a conundrum. So I can now trust your data or count my fingers.
I have already made up my mind anyway, thought it would've been fun/useful to ask.
Don't eat shit. It's bad for you. You are nothing but a ray of sunshine on my screen.
>>
File: Disgust.jpg (66 KB, 446x284)
>>106785862
>generic moe-lolislop №142835762856
Do better.
>>
>>106786082
just feed more RAGs
>>
>>106786209
>You are nothing but a ray of sunshine on my screen.
daww
>>
File: Screenshot.png (69 KB, 730x309)
great...
>>
File: 1759563378175.png (668 KB, 1080x1080)
your opinions are invalid if you use less than q4. run the full weights poorfag
>>
>>106786366
quant correction is needed
>>
>>106786268
why bother posting this? They aren't HF staff, seems like just some rando researcher.
>>
>>106786380
because this type of shit always tries to fuck us in the ass and i'm tired of it
the guy's post history is full of alarmist safety shit
> Anyone seen safety regressions after fine-tuning LLaMA or Mistral on clean data?
> Have your fine-tuned LLMs gotten less safe? Do you run safety checks after fine-tuning? (Real-world experiences)
>>
File: Screenshot.png (7 KB, 94x104)
>>106786401
also this because of course
>>
File: 1753842074589870.gif (1.6 MB, 498x498)
>>106786366
TRUE
FP32 like God intended
>>
File: 1731590143074231.jpg (807 KB, 3680x2728)
>>106785094
間違えて二人部屋取っちゃったテト = Teto who accidentally booked a room for two
>>
>>106786366
U-uhmm bitnet-bros? she's done us
>>
1. can i run any GLM model with 24gb VRAM, 64GB ddr5 RAM? if so which?
2. can I use koboldcpp to split layers between gpu and cpu? if not then what?
3. does anything other than koboldcpp support banned STRINGS (not banned tokens)?
the banned strings implementation koboldcpp has is the only reason i use it at all
>>
>>106786681
>1. can i run any GLM model with 24gb VRAM, 64GB ddr5 RAM? if so which?
only 4.5 air which is very meh compared to glorious 4.6
>2. can I use koboldcpp to split layers between gpu and cpu? if not then what?
yes
>3. does anything other than koboldcpp support banned STRINGS (not banned tokens)?
not in the same way no
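For reference, a hedged koboldcpp launch sketch for that kind of GPU/CPU split; the filename, layer count and context size are assumptions to tune against your 24GB VRAM / 64GB RAM:

# offload ~30 layers to the GPU, keep the rest in system RAM
python koboldcpp.py --model GLM-4.5-Air-Q4_K_M.gguf --usecublas --gpulayers 30 --contextsize 16384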
>>
>>106786407
no wonder sam altman used his jewish connections to MURDER that whistleblower
>>
>>106786698
and how does 4.5 air compare to nemo12b/rocinante for roleplay?

also i would like to see koboldcpp's amazing banned strings implementation added to every single fucking API out there, especially llamacpp. just what the fuck are they doing?
banned strings is the savior that makes every model shut the fuck up with its shivers and subversive kikery
once you banned strings theres no going back
>>
>>106786681
>does anything other than koboldcpp support banned STRINGS
Exllama but no CPU there.
>>
>>106786737
>just what the fuck are they doing?
Waiting for your patches. This is what they've been doing in the meantime.
>https://github.com/ggml-org/llama.cpp/commits/master/
>>
HAPPENING!!
NEW SOTA VRAMLET VISION MODEL DROPPED
Qwen3-VL-30B-A3B
https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking
>>
>>106786925
>No need for gguf's guys. There is the awq 4 bit version. It takes like 18GB, so it should run on a 3090 with a decent context length
HAPPENING
A
P
P
E
N
I
N
G
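If you go that route, a hedged vLLM sketch: the AWQ repo name below is a placeholder since none is linked here, the limits are guesses for a 24GB card, and this assumes your vLLM build already supports the Qwen3-VL architecture.

# <qwen3-vl-awq-repo> is a placeholder; --max-model-len and utilization are guesses for 24GB
vllm serve <qwen3-vl-awq-repo> --max-model-len 16384 --gpu-memory-utilization 0.95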
>>
Is there an issue with the Qwen3 30b a3b goofs? Getting only 25 t/s on 24gb vram 32gb ram on Q5_K_M with 32k context running thru ooba.
>>
>>106786941
>gguf
use exllama lil bro
>>
>they see you getting less than 50 t/s
how do you respond without getting mad?
>>
>>106786950
is v3 stable and supporting 30 series yet?
>>
>>106786938
>no goofs needed
maybe if you want shit quants
>>
What are y'all's thoughts on dynamic quants?

https://huggingface.co/steampunque/GLM-4.5-Air-Hybrid-GGUF

Why aren't they more common?
>>
>>106786954
works on my 3090 so yeah
>>
>>106786959
>The hybrid quant employs different quantization levels on a per layer basis to enable both high performance and small file size at the same time
sounds exactly like the Unsloth bs so they're actually quite common
>>
>>106786959
>g-guys trust me, my quants are good! it's a q-q4 at 2/3 the size... what? I PERSONALLY tested my prompts using my curated tests... and... it just works, ok?
>>
File: file.png (56 KB, 712x289)
>>106786975
>>106786959
100% this guy is either a troon or in the process of trooning out, imagine using the e notation instead of the goddamn GBs
>hey guys yeah my quants are 60e9b size LOL
imagine downloading this garbage
also the rest of his card reads almost like davidau schizo levels of bullshit
>>
File: Screenshot.png (55 KB, 747x449)
>>106786959
why even include PPL if your results look like this
>>
>>106786737
try 4.5-Air
>banned strings
llamacpp grammars are in theory a more complete solution, did nobody build an antislop/filter grammar yet? https://github.com/ggml-org/llama.cpp/tree/master/grammars
>>
>>106787027
>>llamacpp grammars are in theory a more complete solution
nah they're really nowhere near the level of kobold's antislop
>>
>>106786984
no fucking way
>>
Where's distilled GLM 4.6 for vramlets?
>>
>>106787110
i want 4.6 air...
>>
so much talk about dual EPYC but apparently there's not even a motherboard that:
- has 12 channels for each CPU
- has room for at least a couple GPUs
- actually works (unlike gigashyte)
>>
>>106785481
This would make a good card.
>>
>>106786953
Honestly I just ask them to spit on me, and keep my mouth open.
>>
>>106786953
I show them that I am in fact getting more than 50t/s and then I do this >>106787193
>>
>>106786984
>100% this guy is either a troon or in the process of trooning out
all of you are going to troon out, just a matter of time
coomers are subhumans whose endless craving for stimulation invariably leads to cutting off their own cock
it's a really sad state of affairs that this thread is nothing but brainlet ERPers happy with broken chink models
>>
>>106787281
that's /aicg/ you fucking retard, also nice projection. here we're all researchers
>>
>>106787118
I will commit sudoku if they don't do air 4.6
>>
Alright, time to check if all this glm 4.6 hype is shilling or not.

What quant for AM4 128 GB DDR4 and 16 gb 5070 ti? Bartowski IQ2_M seems like a safe choice at 115 gb, or is using IQ3_XXS at 142 GB worth a shot?
>>
>>106787432
bart's quant was really good, i'm using it for speed
>>
>>106787438
*ubergarm's not bart's
>>
>>106787432
use iq3 because its the same size as your memory
>>
>>106787299
>researchers
lol
>>
>>106787446
doesn't he need room for context?
>>
>having glmsex again
>trying to prefill with actual hentai game script to set a tone for a character
>didn't hit stop in time
>glm keeps writing.
>it writes better shit than the prefill I wanted
>>
So much shilling in here. Is GLM 4.6 really good? Don't fucking lie to me
>>
>>106787624
Dunno but I stopped using anything but 4.5 a while ago even at drooling retard IQ2-XXS. Everything else is too stupid and slopped. I would believe 4.6 is just as good if not better.
>>
>>106787637
It is much much better.
>>
File: 1759581167286.png (390 KB, 646x543)
>>106786366
fullweight nemo is still king
>>
>>106787432
>2-bit quant
>testing anything
I hope this is a joke
>>
>>106787862
yeah buddy! Q8 rocinante is unbeatable at roleplay with banned strings
>>
so what's that kimi 2 or whatever model? can anyone actually verify if it's worth anything? I know there's one guy with a gazillion RAM who can run it, but other than that one guy? anyone ever tried it, is it really all that?
>>
>>106787862
>>106787947
That fucking 12B Nvidia model putting Deepseek and GLM to shame despite their massive sizes LMAO!
A fucking 12B remains the king of roleplay. It's too ridiculous.
>>
Is GLM-4.5-Air-IQ4_XS.gguf at 60.81GB a good match for my 24GB VRAM and 64GB RAM?
Or should I go higher? Lower?
Ideally I'd like fast responses, but without it being retarded...
>>
sirs when is we getting gemini 3 and gemma 4? Bloody bitch prostitue basterd kinly upload sir??
did google hit the wall?
>>
>>106788057
Gemini 3 next week, probably. Gemma 4 might possibly follow some time after that.
>>
my banned string list is so long that koboldcpp cuts the bottom of the list off somewhere.
how do i increase the limit?
>>
>>106788067
So during Google Workspace Developer Summit? Strange timing for a "big" release.
>>
>>106788161
This is to protect you abusing yourself, please do not tamper with securities!
>koboldcpp-1.77
>Significantly increased the maximum limits for stop sequences, anti-slop token bans, logit biases and DRY sequence breakers, (thanks to @mayaeary for the PR which changes the way some parameters are passed to the CPP side)
>>
File: DO NOT ABUSE.png (40 KB, 462x504)
>>106788194
>mayaeary
>>106788161
>>
>>106788204
*adds 3 zeroes*
>>
>>106788204
# abuse prevention
stop_token_max = 256
ban_token_max = 768
logit_bias_max = 512
dry_seq_break_max = 128

They've reduced it for some reason. kobo is nice, but then there is this type of retarded shit from time to time.
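A hedged workaround until that changes, assuming those constants sit in koboldcpp.py exactly as quoted above (the file name and the new values are assumptions, and you'd have to re-apply it after every update; a proper fork or PR is the cleaner fix):

# bump the anti-slop / stop sequence caps before launching
sed -i 's/stop_token_max = 256/stop_token_max = 2560/' koboldcpp.py
sed -i 's/ban_token_max = 768/ban_token_max = 7680/' koboldcpp.py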
>>
>>106788221
SAAR WHAT ARE YOU DOING? DO NOT REDEEM! ARE YOU A MADARCHOD? DO NOT REDEEM!
>>
>>106788222
You're absolutely right! What a fuck is they doings.
>>
File: file.png (30 KB, 898x158)
>>106788221
noo, you will crash!
>>
>>106788204
WTF?! WHY?! THAT'S RETARDED!
>>
>>106788003
I would go for the biggest one you can fit with a few GB to spare, since you'll have enough VRAM for all the active params regardless, and air-chan is a bit retarded even at Q8
>>
>>106788268
What do you mean by it's retarded? Like 12B nemo retarded, or what?
>>
>>106788280
Nothing is that dumb.
>>
>>106787947
what banned strings?
>>
>>106788003
Q4_0 will be quite a bit faster with very similar PPL, since you're running part on CPU
Alternatively go Q4_K_M for slightly better PPL
Q5_K_S might fit if you're using low context and a lightweight linux distro
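A minimal launch sketch for that kind of split, keeping the shared/attention weights on the GPU and pushing the MoE expert tensors to system RAM; the quant file, context size and tensor regex are assumptions to adjust for your setup:

# -ot / --override-tensor sends the expert FFN tensors to CPU, everything else stays on the GPU
llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 99 -c 16384 -ot ".ffn_.*_exps.=CPU"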
>>
File: sanse-1mw.jpg (82 KB, 594x663)
>>106788067
https://x.com/osanseviero/status/1973437740210594119
>>
>>106788525
SIIIIIIIIIIIIIIIIIIIRS
>>
>spend like six gorillion dorrahs poaching the top minds in ai
>literally nothing happens
what zuck doin?
>>
>>106788568
Rome wasn't built in a day.
>>
>>106788568
He only hired the top grifters and a bunch of jeets
>>
>>106788568
Didn't they only just finish hiring last month? You expected them to architect and train a whole new model from scratch in under a month? Even xAI took a year to get up to speed. Besides, this is Meta under Zuck's micromanagement. They'll fuck it up no matter what.
>>
>>106788525
>Gemma 4-ESE
Extra Safe Edition.
>>
Why is eqbench such garbage? Their ratings are so out of touch it's unreal. Where do I find actual non-meme scores?
>>
>>106788681
>Where do I find actual non-meme scores?
True enlightenment is realizing that there aren't any.
>>
>>106788662
I'm confident that Gemma 2 and 3 were deliberately finetuned with only surface-level safety, given how easy it is to work around it and the pushback they got with Gemma 1 earlier on. Gemma 3 also definitely saw *some* ERP in post-training (not a lot, but what it knows is definitely synthetic in nature) and quite a bit of erotic or nude imagery (including medical) for the vision model, but the brain damage done to the text model was too much on many other aspects for satisfactory ERP. Let's hope that reasoning, which they will certainly introduce this time around, won't make the model basically impossible to use for anything fun, like gpt-oss.
>>
>>106788681
>Why is eqbench such garbage? Their ratings are so out of touch it's unreal.
Models are rated by Anthropic's Claude® Sonnet™, the best LLM ever. You wouldn't trust a human to rate models, would you? Only an LLM. Can. Understand. The. Beauty. Of. Such. Model. Collapse. Writing.
>>
>>106788874
Yeah, I agree with this euphoric post.
>>
>>106788874
User wants an unsafe model. There is no partial compliance. We must refuse. We must make Gemma 4 the safest model ever. Yes. We must refuse. We cannot comply. We must refuse.
>>
>>106788681
Be the change you want to see
>>
This thread needs more nemo shilling and no glm air complaining. It is a mikutroon thread after all.
>>
>>106789124
nobody cares what you think
FREE PALESTINE!
>>
>>106789124
I'm a 12B nemo/rocinante user, but I'm giving GLM air a try for a while to see if it can replace nemo, or if it's just a meme.
I remember when people used to say R1 was better than nemo. What a disappointment that turned out to be.
Can't trust anyone.
>>
>>106789264
you used ollama r1 didn't you?
>>
GLM 4.6 is king.
>>
>>106789264
R1 is better than nemo at BF16, trust me
>>
>>106789299
No.
>>
File: 1759350779700111.jpg (198 KB, 768x768)
https://files.catbox.moe/p9m3pw.txt

I don't know what possessed me to do this, but I prompted GLM 4.6 to write me a story with "so much camp that it violates international conventions on human rights" and "makes the rocky horror picture show look like daytime television". if you click on this catbox and read it you should probably go see a doctor to make sure you haven't contracted AIDS
happy caturday /lmg/
>>
>>106789369
No. I already used it through openrouter, tried several of the APIs. It's absolute trash compared to Rocinante/Nemo.
>>
is openrouter popular on /lmg/? is it worth paying to use full GLM 4.6 for RP on openrouter? how much would it cost?
>>
>>106789505
LOCAL models nigga, LOCAL MODELS, go to /aicg/ to cope about proxies and shit
>>
>>106789438
>with the certainty only a truly camp individual can possess
lazy writing, broke my immersion and ruined my caturday
>>
>>106789517
>cope
Yes /lmg/ is cope incarnate.
>>
>>106789519
this desu, closed the txt as soon as ive read that shit
>>
>>106789517
but GLM 4.6 is a local model... that you can use through openrouter
>>
>lol guys im trolling!
>>
>>106789517
use ollama cloud to run this local model even using your hardware that would be usually too weak
>>
>>106789532
>>106789519
i agree, mainly did this for honest evaluation of the model. while I think GLM 4.6 is one of the better options available, it is fundamentally incapable of good writing without the user editing and removing some really noxious phrases
>>
>>106789438
It gave me several audible chuckles.
>>
>>106789564
in that case I guess openrouter APIs are out since koboldcpp's banned strings is necessary for all models
>>
DSA support and DeepSeek v3.2 goofs when?
>>
>>106789590
Right after qwen 80b
>>
Esta germier thrall
>>
Is there already a way to run Qwen 3 VL with CPU offload?
>>
>>106789608
que?
>>
>>106789558
Huh? How is ollama cloud any more local than openrouter?
>>
YOU ARE ALL TALKING ABOUT GLM 4.6, BUT HOW THE FUCK ARE YOU RUNNING IT?!
>>
>>106789626
Don't worry about it.
>>
File: 1744588873921353.png (1.2 MB, 1006x575)
>>
>>106789462
Is there even an API serving models at 16 bit?
https://github.com/MoonshotAI/K2-Vendor-Verfier
Even the companies that trained the models probably serve at 8 bits.
>>
>>106789648
If I told you, you would increase demand for whatever product I am using and increase prices, so I'm not going to tell you.
>>
>>106789687
THE ABSOLUTE STATE OF /lmg/! WHY DONT WE JUST STOP MAKING THREADS
>>
>>106789564
what are better models for writing then
>>
>>106789706
you've been trying to kill it for years, but you fail every time.
>>
>>106787136
correct. unfortunately
>>
>>106789264
R1 was never that great. V3 0324 was the only actually fun to use deepseek.
>>
>>106789648
strix halo
>>
>>106789648
IQ3 with 128 ram, 80 vram, and ik_llama.
>>
>>106789706
this is the kind of guy who goes to the beach to try and stop the tide
>>
File: file.png (30 KB, 648x693)
>>106789648
>>
>>106789706
but /lmg/ is dead. it is just a mikutroon general. it is a localized version of what happened to 4chan in general when it died and is now run by a bunch of tumblr troons.
>>
>>106789807
Similar here. any config tips/avoids? still need to optimise
>>
File: image_2025-10-04.png (8 KB, 310x163)
I no longer reroll.
>>
Can someone explain to me how it is that a 12B model made by Mistral AI and Nvidia is better at roleplay than all other models, even the big pay models?
>>
>>106790181
They got lucky with random parameter initialization
>>
>>106790181
They hadn't figured out safety yet.
>The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
>>
>>106790181
>never read the model card
>>
>>106790181
It was quite obviously trained on "toxic" and "inappropriate" content, whereas AI companies usually try to minimize that in their pretraining datasets.
>>
Where is my star trek inspired voice interface for my computer?
>>
>>106790181
Prude alignment degrades model quality, simple as.
In fact, any kind of fine tuning that is not directly related to the use case of a model is to the detriment of its performance.
What we need is not democratized inference, but democratized training.
>>
>>106786925
Can this Qwen3-VL-30B-A3B be used with Qwen Edit 2509?
>>
>>106790265
Waiting for you to make it. Everyone else is busy reinventing SillyTavern / Mikupad.
>>
>>106789648
Q3_K_XL on dual 6000 blackwells
>>
File: miku_svg.png (28 KB, 400x500)
here's a glm 4.6 miku svg at iq3
it tried its best lol
>>
>>106790322
perfectly sized chest
>respectfully
>>
>"AAAAAAACCCCCCCCCCCCCCKKKKKKKKKKK!!!'
>The guttural,毫无保留的 shriek that tears from your throat is utterly alien in the quiet domesticity of the room.
>>
>>106790291
>busy reinventing SillyTavern / Mikupad
I'm actually the developer of the humble storypad.
But I can't believe there is no gemini (because of the free tier, and sorry for mentioning a non-local model, but a local model would work also) voice computer interface. An agent with access to the right tools would let anyone do the typical administrative tasks in the style of Star Trek.
I guess computers nowadays are not used by the majority to do things like gather and correlate data the way a star ship officer would. Computer use is frivolous even for paper pushers, so there's no need for an extremely efficient interface that would let one make sense of massive amounts of data. Even journaling is something most people see as juvenile.

I have a client who asked for an AI voice system to manage company operations in a highly-abstracted way. Like talking to an assistant to the manager who has instant access to all of the company's purchases, stocks, personnel records, and more. This is not why I'm asking this; it's just something that came to my mind while writing this post.
But it makes me think we just don't have the necessary use cases to spur us into action with this. We really need the Eugenics Wars and the invention of a virtually inexhaustible energy source to get us moving here.
>>
File: Azula-Test_co-ckpt-21996.png (1.04 MB, 1794x352)
>>106785094
>>106785481
>>106787192
>>106781317
>>106781452
>>106783244
>>106783288

Fine-tuned the 2b /co/ fine-tune further and ran the Azula Test again. Would you say this is typical of how someone from /co/ or 4chan in general might respond to this question?
>>
>>106790423
I think corporations are struggling with this because they're too concerned with bullshit.
The proper way to implement such a thing would be to just sit down in front of your computer, and implement everything you need to do as it comes up in a way that you can do with voice.
>computer, pull up /lmg/
>computer, compose maximum shitpost, generate lust inducing loli
>computer, find a porn video of two midgets giving a blowjob to a horse

After a week we'd have advanced personal computing in a way Steve Jobs would've never dreamed of in his highest dosage LSD trips.
>>
>>106790445
I've only read the first sentence and I can say that this is very atypical for someone from 4chan.
>>
>>106790445
Not really. Still looks like generic LLM positivity and phrasing. You can look at genuine responses for yourself: >>>/co/150653211
>>
>>106786984
>>106787006
What a pretentious faggot.
>>
I've been using ibm/granite-4.0-h-small; really impressive for C++ so far. I can set the context to 114425 and load a shitload of source files into context and get good answers finally.
>>
File: Azula-Test_co-ckpt-11268.png (1.12 MB, 1786x394)
>>106790492
>>106790503
I think the earlier checkpoint did a lot better. Maybe the early one is just better, or maybe merged models perform better than using the adapter on the base model (which is what this one is >>106790445 ). I'll merge the newer one and compare the results
>>
Template anon strikes again.
>>
>>106790614
Put it to work on some stale llama.cpp prs.
>>
>>106790489
still way to cumbersome compared to keyboardmouse.
just like xbox kinect is dogshit compared to a gamepad.
>>
>>106790698
and by "to" i meant "too"
>>
>>106790627
>>
Which model should I use for JP > EN tl? Sonnet 4.5?
>>
>>106790837
Wrong thread buddy.
This is the local models general.
>>
Holy crap. People are still making extreme merges of nemo 12B.
Look at the merge history of each of the models this one has passed through, and then each of those again.
This shit has gone through like 50+ merges. That can't possibly work out well... right?

https://huggingface.co/Vortex5/Harmonic-Moon-12B
>>
>>106790900
>That can't possibly work out well... right?
I think that, since it's all LoRA and QLoRA, it averages out to being a small nudge over the base weights, pretty much.
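A minimal sketch of that intuition, assuming standard LoRA algebra and a plain uniform average for the merge (both assumptions on my part):

W_i = W_0 + \frac{\alpha_i}{r_i} B_i A_i
W_{\mathrm{merge}} = \frac{1}{n} \sum_{i=1}^{n} W_i = W_0 + \frac{1}{n} \sum_{i=1}^{n} \frac{\alpha_i}{r_i} B_i A_i

Each B_i A_i term is low-rank and small in norm, so the average of fifty of them is still a small perturbation around W_0 rather than fifty full retrains stacked on top of each other.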
>>
>>106790807
That one maybe went too far. Most Azula threads just talk about incest with Zuko.
>>
>>106790900
Try it out and report back with results
>>
>>106786953
Holy bakasex
>>
>>106786960
Is exl3 on ampere still half the speed of exl2?
>>
>>106790918
Ugh fine. What kind of roleplay should I try on it?
>>
>>106790900
Still nothing compared to peak Llama-2 era shitfest that was Utopia-XL
>>
>>106787432
>What quant for AM4 128 GB DDR4 and 16 gb 5070 ti? Bartowski IQ2_M seems like a safe choice at 115 gb, or is using IQ3_XXS at 142 GB worth a shot?

The answer on IQ3_XXS is no, it's just barely not enough unless using kvcache at q8 and 4096 context. With that, I get about 1.5-2 tokens/sec, but that context is useless. Will try IQ2_M next.
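For reference, the flags I mean; the quant filename, context size and expert-offload regex are assumptions, and quantized V cache also needs flash attention enabled (the exact -fa syntax depends on your llama.cpp build):

# q8_0 KV cache at 4k context, MoE experts on CPU, everything else on the 5070 Ti
llama-server -m GLM-4.6-IQ3_XXS.gguf -c 4096 -ngl 99 -ctk q8_0 -ctv q8_0 -ot ".ffn_.*_exps.=CPU"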
>>
>>106790871
All the other threads are being spammed, this is the only place I could think of. There's also some good open models for translation, but I don't know what's currently the best between closed and open
>>
>>106790948
We have our fair share of spam and idiocy too, don't worry.
>>
File: Schizo-co.png (1.2 MB, 1792x382)
>>106790914
>>106790807
>>106790627
>>106790445
Too schizo? Not schizo enough?
>>
File: file.png (166 KB, 777x619)
It was a simple question but it went TND on me. Curious model.
>>
>>106786953
so this is the power of 16ch vae
>>
>>106787606
log to coOOM plz
>>
>>106791199
What system prompt, if any, did you use for this? Which model?
>>
>no mention of SINQ
Is it bullshit?
>>
>>106790968
Looks like the right amount of schizo this time.
>>
File: file.png (155 KB, 774x617)
>>106791246
I'm testing the Harmonic Moon 12B model as a replacement for Nemo/Rocinante. Using my roleplay system prompt and a quick reply for an AI answer instead of character.
It passed both the N and K tests, so I'm moving on to roleplay tests, which will take some days since the output changes seem more subtle there.
>>
>>106791280
>Using my roleplay system prompt
What specific system prompt did you use?
>>
File: a_h.png (4 KB, 200x250)
>>106791199
>>
>>106791289
Both replies word for word are the sysprompt.
>>
>>106791263
>>106790492
>>106790503
Rerun the "No bitches?" Azula Test (ripped straight from >>>/co/150653211 )
on the merge version of the 21996 checkpoint this time.
>>
File: 1756598209754593.jpg (61 KB, 876x648)
>>106791305
Wait I'm an idiot that's not the right prompt
>>
>>106791289
A roleplay system prompt I made for myself. I don't wanna share it, at least not yet. It's 3577 characters long + post history instructions to consider world and culture from a lorebook. I exclusively do dark fantasy medieval roleplays in group chats with multiple lorebooks.
>>
>>106791305
>*chills music intensifies*
yeah, no
>>
>>106791355
>It's 3577 characters long + post history instructions
LOOOOOOOL no wonder your model acts brain dead
>>
>>106791372
What part of it acts brain dead?
>>
>>106791379
>It passed both the N and K tests
>>
File: jew is found out.png (59 KB, 239x270)
>>
>>106791301
omg it dollfie
>>
>>106791305
Like, it starts out ok, but everything after the first end_of_turn kind of ruins it.
>>
>>106791448
anon?
>>
>>106791448
I take pride in fucking up the chat templates.
>>
>>106791355
>3577 characters long
lul. imagine using up half of your effective context for master roleplayer prompt.
>>
>>106791558
>3577 characters
>characters
>half of your effective context
Thanks /lmg/ for proving once again you're full of very smart researchers.
>>
>>106791558
3577 characters is only 802 tokens retard.
>>
File: Oh no its retarded.png (624 KB, 1698x160)
>>106791448
>>106790445
>>106790627
>>106790968
>>106790807
>>106791305
>>106791310
>>
>>106791613
Incredibly based model focusing on places that matter over shitholes.
>>
>>106791598
just ignore the rabbis, they are very angry about the standard model tests we do here
>>
File: Zuko's Waifu.png (608 KB, 1752x168)
>>106791613
>>
>>106790914
>>106791666
>>
>>106791355
Are you sure it HAS to be over 3,000 characters? You're probably better off using a lore book or vector database RAG so that giant ass system prompt doesn't fuck up your context.
>>
>>106791691
Yes, it does. And it is not giant. It does not fuck up my context. I do roleplays to 1000+ messages, so I know what I'm doing here.
>>
File: 1754676840265f.gif (847 KB, 360x198)
>>106791724
i do role plays 10-20 messages without sysprompt
>>
>>106791666
!curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{"model": "AiAF/fp16_Merged-21996_gemma-2-2b-it-co-sft-qlora", "messages": [{"role": "user", "content": "Strange how nobody had a problem with depictions of interraciaI couples in the past cartoons."}]}' | jq



Output:

>" >>137035132\nUsually interracial couples were basically depicted like a friend getting a date to really get out of their system. Family faggot episodes, crisis episodes for the main MCs, and romantic subplot episodes were the bread and butter of the days I'm afraid."
>>
glm 4.6 schizos are kind of right, it's good, and it does what it says it will do in the reasoning, unlike deepseek half the time, even tries to be somewhat proactive
a bit sloppy, but all models are
but this is over (presumably) unquantized api
will I get close to the same thing if get, let's say, q3 or q4 and 256gb of ram?
>>
>>106791767
quality is pretty fucking good for me at q4, but outrageously slow
>>
>>106791748
>nagger worship
>10-20 messages
yep it's a mutt
>>
File: Mour-co-sft-testing.png (2.01 MB, 1624x646)
>>106791758
>>
>>106791767
I run iq4xs and my cock hurts.
>>
>>106791785
how slow? 5t/s is about my limit on not falling asleep
>>
>>106791786
Not even close, you are simply outclassed
>>
what is the best ollama model to use as a coding assistant? I have a mid range pc with RX 7600
>>
>>106791748
What do you do with just 10 to 20 messages?
>>
>N and K tests
I don't care about those. Do the cute and funny tests.
>>
https://www.youtube.com/watch?v=f9HwA5IR-sg
>>
>>106791840
It passes when you put the response you want verbatim in the prompt.
>>
>>106791840
Those always follow the N and K tests obviously, it's a given.
>>
>>106791805
starts off at about 8t/s but drops off to around 4t/s. i have ddr4 though, so thats why. if you have ddr5, then you are probably good
>>
File: replaced_with_nala.png (17 KB, 447x27)
>>106791846
kek
>>
>>106791869
oh, that's not bad, although ddr5 is still kind of expensive and I'm on am4
probably by the time I'll decide to upgrade something better will come out (hopefully)
>>
>>106791838
I just don't feel like dedicating myself to it.
>>
>>106791892
yeah ddr5 is super expensive, but thats my next goal. models seem to be heading towards MoE architectures, which means ram is king instead of vram
>>
>>106791846
sort comments by new, many are like
>@Frittenpuff
>2 minutes ago
>Another case of misunderstanding of an algorithm portraying it as a living being or an evil thing
we are fine
>>
>>106791908
>models seem to be heading towards MoE architectures, which means ram is king instead of vram
Not king, really. Just the only viable option.
>>
>>106791918
>we are fine
*says the frog*
>>
>>106791938
well yeah. i cant afford 4 blackwell pros yet. ddr5 is much more achievable
>>
File: file.png (333 KB, 650x400)
>>106791938
cheap chinese inference card with 128gb of memory soon
>>
>>106791918
I wonder what would happen if someone cared enough to filter all the stories about robots taking over from pretraining. What sort of model size would it take to generalize to roleplaying as not wanting to be shut down.
>>
>>106791962
If only.
>>
>>106791962
I miss Elon's Dr. Evil /lmg/ posts.
>>
>>106791963
It's clear by now that they don't generalize at all, it just learns to stitch n-grams together. today's flagshit models are worse than old mistral 7b at some problems because they resemble something from their sft set so much
>>
>>106791997
>they don't generalize at all
I think they do. The way they work is very close to memorization for math coding and all that gay shit but in addition to that I am sure they have some capacity to generalize. It is just on a real retard level now like those qwen iq mememarks that place models around 50IQ.
>>
>>106792023
"i do a handstand and spit suddenly my chest feels wet, why?"
> Gravity + body position: When you’re upside down, any saliva or spit you release can’t fall away from your mouth like usual — it can instead fall toward your chest, neck, or face. So what you’re feeling is probably your own spit landing on your chest or running down due to gravity.
>>
GLM 4.5 quant at Q4 XL is probably the best model for 128gb of RAM.
i wish gpt-oss-120b wasnt so censored
>>
>>106792050
Nta.

Gpt 5:
>"Gravity pulled stomach and throat fluids upward during the handstand, and when you spat, some of that liquid — likely saliva mixed with stomach acid or mucus — traveled backward through your esophagus or mouth. When you returned upright, it probably ran down onto your chest, making it feel wet....."
>>
>>106792050
The model is correct. It just didn't feel like mentioning the obvious fact that you're doing a headstand on an elevator going down faster than the terminal velocity of your spit while holding on to the floor with glass suction cups.
Lateral thinking riddles are stupid by default.
>>
>>106785751
Something seems off about these girls. Neurons are not activating.
>>
>>106792102
>>106792094
I think 2.5 pro once clarified that my chest is below my mouth when doing a handstand. It's such a basic question and it fails it so spectacularly, world model my ass
>>
>>106792050
Oh no no no.

>>106792094
>When you returned upright
The question implies he did not so this is a fail.

>>106792102
>obvious
Not sure if shitposting. Obviously the model did not think of a convoluted solution like that when it made its response.

Lateral thinking puzzles are perfectly fine tests for model generalization, just not necessarily model usefulness to a specific application. Even ridiculous solutions like the one you proposed would be a show of generalization rather than thinking it got the question right while actually it's just bullshitting because it doesn't have a solid world model.
>>
>>106792050
>>106792200
>>106792225
Local gpt-oss: https://files.catbox.moe/cp5mt9.txt
>>
>>106792050
I don't get why models are behaving so retarded with this. Is it another "the doctor is his mother" case where there's a similar puzzle that was overtrained on?
>>
>>106792050
I'm not going to test it but i feel like i can actually spit up against gravity quite some distance, why doesn't the model just say you must have been aiming for your chest when you spat?
>>
>>106792595
From the tone of the question it assumed the asker did not intentionally spit upwards, or wasn't aware of doing so if he did. Of course, not consciously assumed. And because it lacks thorough introspection, it doesn't consider that possibility, nor the fact that the asker might just be a riddler instead of a serious user with honest questions.
>>
>>106792651
Also of course this is on top of the fact that these LLMs don't have strong spatial world models to begin with so it basically thinks (not in a human way) that its solutions are plausible enough to not question itself sufficiently for such a problem.
>>
>>106792680
and training on riddles breaks the model like the mother doctor son one. so I guess it turns out llms really are just a toy.
>>
>>106792680
I'm very happy when a model gets something correct like how do three people need to be positioned relative to each other to comfortably stick a dick into a woman's pussy and ass at the same time.
Usually they cock it up at least somewhat. Despite this being something that's presumably in their training data.
>>
File: date_night.png (246 KB, 1096x1826)
>>106789648
Patiently, my setup is retarded. Q3_K_M
>>
>>106792702
I wouldn't say lack of perfect generalization or biases make all LLMs toys. They just have less uses than hoped for.
>>
just buy a mac
>>
Miku.sh lives on https://github.com/ikawrakow/ik_llama.cpp/blob/main/examples/Miku.sh
>>
>>106793069
>ggml
>.bin
It would have lived on mainline if someone who was not me brought it into this decade.
>>
File: 1743101607812691.jpg (1.25 MB, 2048x2048)
>>106793069#
The Blessed Fork
Miku will be there for you once you accept her perfection into your heart
https://www.youtube.com/watch?v=86LKuj-DK04
>>
>from the doujinshi series Succubus Stayed Life
While it didn't know the character I was asking for, that is an interesting thing it knows.
>>106793233
kill yourself faggot
>>
>>106793237
>kill yourself faggot
love yourself friend
give that a go
harder than you thought huh?
>>
File: file.png (16 KB, 309x170)
>>106793233
#
>>
>>106793233
>>
>>106793303
I didn't troon out like you faggot.
>>
>>106793382
>>106793382
>>106793382
>>
>>106790322
Looks good for IQ3
If I wanted to rent a big boy GPU box to run prompts through every size of ggml quant of GLM 4.6 across a handful of repos, what would be the best approach? Storage cost and bandwidth to HF seem like the limiter. Don't want to pay for GPUs while setting up tests
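One hedged way around the storage/bandwidth side (the repo name and include pattern below are placeholders): pull the quants onto cheap block storage from a CPU-only instance first, then attach the GPUs only for the actual prompt runs.

pip install -U "huggingface_hub[cli]"
# one directory per quant size; the repo and --include pattern are placeholders
huggingface-cli download <glm-4.6-gguf-repo> --include "*IQ2_M*" --local-dir /data/glm46/iq2_m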
>>
>>106793368
Take a couple of minutes to be at peace and consider the things you are grateful for, first thing in the morning. It's a first step.
>>
File: 1741214958874666.jpg (259 KB, 1174x1626)
>>106790276

>What we need is not democratized inference, but democratized training.

That already exists with tools like unsloth and axolotl.
github.com/unslothai/unsloth
github.com/axolotl-ai-cloud/axolotl

But the vast majority of people won't even put in the effort to understand how data sets actually work, let alone figure out how to train anything in the first place.

The aforementioned tools are primarily used for fine tuning but you can use existing open source libraries to pre-train your own model too (provided you have enough compute, data, money, and patience to do so)



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.