/g/ - Technology






File: glm_miku.png (27 KB, 400x500)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106777408 & >>106769660

►News
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking
>(10/02) ZLUDA 5 released with preliminary support for llama.cpp: https://vosen.github.io/ZLUDA/blog/zluda-update-q3-2025
>(10/01) Granite 4.0 released: https://hf.co/collections/ibm-granite/granite-40-language-models-6811a18b820ef362d9e5a82c
>(10/01) LFM2-Audio: An End-to-End Audio Foundation Model: https://liquid.ai/blog/lfm2-audio-an-end-to-end-audio-foundation-model
>(09/30) GLM-4.6: Advanced Agentic, Reasoning and Coding Capabilities: https://z.ai/blog/glm-4.6

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: kimi-miku-svg-attempt.png (96 KB, 569x785)
►Recent Highlights from the Previous Thread: >>106777408

--Memory channel limitations and EPYC upgrade considerations:
>106777689 >106777726 >106777777 >106777781 >106777996 >106778156 >106778198 >106778404 >106778453 >106778486 >106778493 >106778506 >106778537 >106778594 >106778632 >106778813 >106778915 >106778929
--VRAM requirements and future-proofing strategies for AI workloads:
>106781502 >106781514 >106781592 >106781611 >106781767 >106781828 >106781948 >106781962 >106782007 >106782045 >106782252 >106781815 >106781726 >106781750 >106781617 >106781775 >106781715 >106781725 >106781776 >106781798 >106781841 >106781897 >106781943
--MoE scalability tradeoffs: cost, speed, and accessibility challenges:
>106779306 >106779349 >106779414 >106779428 >106779883 >106779973 >106779989 >106780006 >106779998 >106779914 >106779939 >106779990 >106780169 >106779408
--Training a model on 4chan's /co/ board dataset with evolving loss trends:
>106781317 >106781408 >106781452 >106781500 >106781506 >106781793 >106782959 >106782993 >106783031 >106783170 >106783177 >106783207 >106783254 >106783244 >106783263 >106783288
--Feasibility of conveying visual data to visionless LLMs via text formats:
>106782778 >106782814 >106782892 >106782906 >106782966
--Testing VibeVoice 7B audio synthesis quality on RTX 5090:
>106779632 >106779663
--Local model frustration and alternatives: self-training and prompt tweaking:
>106780504 >106780511 >106780552 >106780618 >106780768 >106780827 >106780686
--ZLUDA 5 update enables CUDA backend for llama.cpp, performance compared to ROCm:
>106781061 >106781109 >106781417
--Miku (free space):
>106778073 >106779336 >106780879 >106780921 >106781197 >106781753 >106782708 >106783263 >106784351 >106784397 >106784522 >106784721
--NOT Miku:
>106779078 >106781060 >106782028 >106782155 >106782405 >106782462 >106782736 >106783082 >106783172

►Recent Highlight Posts from the Previous Thread: >>106777411

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
dead general
>>
>>106785107
Who is second in command?
>>
>>106785130
teto
>>
>>106785134
which quant do you use?
>>
File: at your service.png (2.16 MB, 1536x1536)
>>106785134
oh no missed saber finger

me when I download GLM 4.6 and have it describe in various flowery prose how Saber devotes herself to serving the noble purpose using her mouth as a scabbard for my cock
ahhh, culture. I haven't visited this scenario since that last massive copypasta I dumped.
Even at the braindead quant it's good.

>>106785140
q1
>>
The reason cloud models get better is because we correct them in our conversations. They probably measure ppl on human text responses and train on it for new model versions. Seems Anthropic and the Chinese are the only ones really training NSFW atm.
>>
>>106785160
saber's tongue belongs to the anuses of horses
>>
is glm 4.6 local's opus moment if deepseek was its gpt4 moment?
>>
>>106785204
deepseek was the claude sonnet 2 ish moment, knows a ton and writes well but is dumb and crazy, glm is the claude sonnet 4 moment, smart and creative but it knows a bit less than sonnet in comparison
>>
How do I get the llama-server web interface to show thinking tokens for GLM 4.6?
I checked the checkbox on the preferences window but nothing.
>>
>>106785160
what t/s are you getting on q1? i am at 6ish average. prompt processing is around 50t/s
>>
>>106785160
I look like this irl
>>
>>106785265
12t/s generating, 250t/s processing
>>
>>106785304
damn. well i am on an iq4k instead of q1, so i guess that is why. how is the quality at q1? iq4k is fucking god tier, best model ever.
>>
>>106785227
https://github.com/ggml-org/llama.cpp/pull/16394
https://github.com/ggml-org/llama.cpp/pull/16364
Doesn't seem to be working correctly. It may get fixed with one of those.
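In the meantime, a minimal llama-server launch sketch if you just want the raw <think> blocks passed straight through to the web UI (the model path, context size and layer count are assumptions, and this isn't a confirmed fix for the UI toggle):

# use the model's chat template and keep <think>...</think> inline in the content
# (context size and -ngl are placeholders, tune for your rig)
llama-server -m GLM-4.6-IQ3_XXS.gguf --jinja --reasoning-format none -c 16384 -ngl 99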
>>
What is the impact of zram swap instead of using swap on nvme? I'd think using disk is better with llama-server, right?
>>
>>106785310
I mean it beats out anything I was using before so I'm happy with it.
>>
>>106785310
What's the difference between unsloth quants and those quants? Should I stick with a dynamic unsloth quant if I can only fit iq3?
>>
>>106785342
What's the difficulty in testing it yourself?
Up to a certain point, i'd assume zram is better, but there's only so much compression you can get out of practically random numbers.
>>
>>106785345
does it ever go schizo? like does it ever generate words that just dont make any sense? before i was using an iq2xxs of glm 4.5 and it would occasionally just repeat words in sequence, like "the the".
>>106785350
are you using ikllama? if not, then just stick with what you have. if you are using ikllama, get the biggest ubergarm quant that you can fit into both your ram + vram
>>
gpt-oss-20b-base is gone?
>>
>>106785393
>https://huggingface.co/mradermacher/gpt-oss-20b-base-GGUF
gee-huffs are still here.
>>
>>106785342
Why would you ever use swap on nvme? The model mmapping the file should already allow you to page out most of the weights (it'll be unusably slow).
>>
>>106785402
It's a precaution obviously. I'm all about safety.
>>
>>106785359
>>106785402
Base idea here is to avoid stressing cpu under heavy memory loads and therefore I think using traditional swap is better than zram when taxing memory to its limits with llama-server.
Memory mapping is not related to this question per se.
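For what it's worth, a minimal sketch of standing up a zram swap device to A/B against the NVMe swap; the size, compression algorithm and priority below are assumptions, pick whatever matches your box:

sudo modprobe zram
# allocate a compressed swap device; prints the device it picked, e.g. /dev/zram0
sudo zramctl --find --size 64G --algorithm zstd
sudo mkswap /dev/zram0
# higher priority than the NVMe swap so it gets used first
sudo swapon --priority 100 /dev/zram0
swapon --show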
>>
anyone else on a 395+ CPU? With rocm I get like 70-80 tokens per second but half the time I use it I just get `////////////////////////////////////` as my output.

Vulkan works consistently but the output is way slower. Maybe it's time to graduate to something other than lm studio
>>
>>106785094
[
{
"generated_text": "Hey guys, did you know that in terms of breeding, your sister is the most compatible with you? Not only do you get to pass on your genes more completely than fucking a stranger, but there's also a 98.75% chance of having healthy children (we can round this to 100%)? Think about it, you could be a 75% genetic match with your kids! Gosh, who would wanna throw away their bloodline to some random hussy? \n>lol ultraratino\nlol no.\n>lol incestfag\nlol no.\n>lol stop pretending you know what's good for your genes\nI don't know what you think I'm doing here, but I'm not trying to convince anyone that incest is good for you, just informing you that 1) it's not the only reason your genes might be fucked up and 2) it's not just a hypothetical argument, it's your sister you're ignoring.\n>lol oh but she's not my sister biologically! So it's totally fine!\nIf you want to fuck your blood sister, go ahead. I'm not gonna stop you."
}
]
>>
>>106785099
we have a not miku category now woaw
>>
I'm feeling so jelly of cpumaxxer friends that get to play with glm 4.6. If only I wasn't poor...
>>
>>106785478
what model are you using? also yes, get off of lm studio
>>
BOYS
https://huggingface.co/microsoft/bitnet-b1.58-2B-4T-gguf
>>
>>106785525
my epyc was $300 and my 256gb ram kit was $350. i get 5t/s on glm 4.6 at q3
>>
New Mistral large 3 and llama 4 thinking coming soon are going to be crazy
>>
i hope mistral large is a 180b moe
>>
>>106785564
I'm betting 2-4% benchmark improvements across the board. Wild times!
>>
>>106785531
Why didn't they ever scale this up? It's clearly better than or equal to other models of around the same parameter count, but way faster and with a smaller memory footprint; it would be free money/publicity (which is the same thing as money because investors would love it). Is microsoft retarded? Looking at their huggingface page it looks like they're just releasing tiny dogshit models that are unusable for anything, over and over. What's the point?
>>
>>106785570
It took them one year to scale up the training data to 4T. They did say they were planning to do bigger models next, but if it takes them another year, we might not see something until mid-2026.
>>
miku feet
>>
>>106785529
the qwens, phi, magistral small all generate garbage with rocm, normal output with vulkan. What's the model runner I should be using. I didn't like ollama.
>>
>>106785609
ikllama or llamacpp. both with sillytavern. or if you want something similar to but better than lm studio, https://github.com/oobabooga/text-generation-webui is decent. i havent used https://github.com/LostRuins/koboldcpp in ages but people here seem to like it.
>>
>>106785617
these uis all look like shit compared to lm-studio but pretty ui is useless if the output is slow af. Going to try rocm + llama-server directly and see if I can repro the garbage output
>>
>>106785531
>2B-4T
For a second my dumbass thought that Microsoft had just made a 4T parameter Bitnet MoE model with only 2B active parameters
>>
>>106785397
ty

is it any good?
>>
>>106785440
It always boggles me when anons have all the tools for the thing they want to try and they don't.
I rephrase. What stops you from trying?
>>
>>106785627
Reproduced with llama-server directly. Makes sense given lm studio is just a frontend for llama.cpp
>>
>>106785094
lol
>>
>>106785655
Why waste the time trying when someone else has the answer?
>>
>>106785691
You could have tested it already. Get real numbers for your hardware, your model, your inference program.
But whatever. Good luck with that.
>>
>>106785160
Im horny
>>
File: 1740133636648054.jpg (16 KB, 400x279)
>i don't need to buy a 256gb epyc server to have saber fondle my cock

>i don't need to buy a 256gb epyc server to have saber fondle my cock

>i... i...
>>
File: 1731045591124463.jpg (560 KB, 1152x2048)
>>106785094
ダンボール = corrugated cardboard/cardboard box
>>
>>106785747
A cheap hooker is only $10 retard
>>
File: 1581736925589.jpg (40 KB, 452x363)
>recently built a new rig to play with the AI toys after trucking along with hardware from 2013
>RTX 4090
>ryzen 9 9950x
>64GB RAM
>7GB/s SSD NVME
>thought I was getting top dog stuff
>can't even run GLM 4.6 at not-crawling speeds (if at all) if an Air version doesn't come out
>GLM is not even in the tier of the largest models around
I was aware that 64 GB RAM wasn't super high-end but cmon
>>
>>106785755
where do you live?
>>
>>106785758
Namibia
>>
File: ent.png (59 KB, 864x219)
>>106785342
>>106785359
Haven't tested but can't see it helping, just more work for CPU
>>
>>106785756
Sucks, but what's another 2 sticks? Do it do it do it!
>>
>>106785760
my condolences
>>
File: 1733675794866749.jpg (229 KB, 832x1472)
>>106785094
>>
>>106785775
I'm tempted to, but GPT told me 4 dimms suck and it would be best for me to replace the current 2x 32GB ones with 2x 64GB sticks. So kind of a hassle to replace/add, and a lot of potential to overspend.
I'll get to it sometime tho
>>
>>106785726
I'm not OP bro
>>
>>106785812
Point stands... bro...
>>
File: ComfyUI_00540_.png (342 KB, 1024x1024)
>>106785160
>>106785525
>>106785756

>NAYSAYERS
btfo
>UNBELIEVERS
shunned
>POORFAGS
in shambles

GLM-chan won my heart and singlehandedly saved local
China I kneel
>>
>>106785833
lol I agree with you, but I was just answering your question
>>
File: 1751169640355327.png (2.01 MB, 1024x1536)
>she sees you only have 80gb of vram
>>
>>106785655
>>106785726
Fuck you, it's for discussion. Maybe you are so autistic that you don't understand what it means to have conversations.
I have already made up my mind anyway, thought it would've been fun/useful to ask.
Fuck you, eat shit little bugger. You are nothing but a little brown turd in my toilet.
>>
>>106785756
Even a basic google would have told you how much ai costs to run in 10 seconds. Just admit you bought it for gamez and youre full of shit.
>>
File: ggw0n.png (17 KB, 300x80)
>>106785926
>it's for discussion
In this particular case, you'd end up with
>one anon with some anecdotal evidence, forcing you to test it yourself
>conflicting comments from anons, forcing you to test it yourself
>some asshole telling you to test it yourself, forcing you to test it yourself
So the options are to stay convinced of whatever you estimated or test it yourself.
>I have already made up my mind anyway
Well done, you.
>>
>>106785939
I don't even gayme anymore you zoomer faggot. The PC runs mint with no dual boot and I'm rarely actually sitting in front of it, but connecting from my thinkpad instead.
I had a 660 Ti 2GB before that didn't run shit, so I had little contact with the tools yet other than SD on vast.ai and didn't see the need to maxx on all specs right away. Also 2x 64GB seems overpriced where I live
>>
File: 1749958783707400.jpg (183 KB, 1080x1080)
If you have conflicting results, that is even more of a reason to have a larger sample size
>>
>>106786082
People have fingers. Some more than others. How many do you have? I need more data before I start counting my own.
>>
>>106786158
I have 7. Not sure if I'm counting right.
>>
File: 124953711.jpg (665 KB, 800x1200)
>>106785799
Heard it can be tricky to run 4 sticks on AM5 but ppl manage it, to 192GB even. Should be doable if you have the patience to learn about BIOS config for DDR5, microcode/BIOS updates, check QVL, see what specific memory kits others were able to run etc.
There's always a better smaller model down the road and some richfag with a better rig, just as you are the richfag to the true poorfags. Let's be thankful for what we do have.
>>
>>106786170
Hmm... what a conundrum. So I can now trust your data or count my fingers.
I have already made up my mind anyway, thought it would've been fun/useful to ask.
Don't eat shit. It's bad for you. You are nothing but a ray of sunshine on my screen.
>>
File: Disgust.jpg (66 KB, 446x284)
>>106785862
>generic moe-lolislop №142835762856
Do better.
>>
>>106786082
just feed more RAGs
>>
>>106786209
>You are nothing but a ray of sunshine on my screen.
daww
>>
File: Screenshot.png (69 KB, 730x309)
great...
>>
File: 1759563378175.png (668 KB, 1080x1080)
your opinions are invalid if you use less than q4. run the full weights poorfag
>>
>>106786366
quant correction is needed
>>
>>106786268
why bother posting this? They aren't HF staff, seems like just some rando researcher.
>>
>>106786380
because this type of shit always tries to fuck us in the ass and i'm tired of it
the guy's post history is full of alarmist safety shit
> Anyone seen safety regressions after fine-tuning LLaMA or Mistral on clean data?
> Have your fine-tuned LLMs gotten less safe? Do you run safety checks after fine-tuning? (Real-world experiences)
>>
File: Screenshot.png (7 KB, 94x104)
>>106786401
also this because of course
>>
File: 1753842074589870.gif (1.6 MB, 498x498)
>>106786366
TRUE
FP32 like God intended
>>
File: 1731590143074231.jpg (807 KB, 3680x2728)
>>106785094
間違えて二人部屋取っちゃったテト = Teto who accidentally booked a room for two
>>
>>106786366
U-uhmm bitnet-bros? she's done us
>>
1. can i run any GLM model with 24gb VRAM, 64GB ddr5 RAM? if so which?
2. can I use koboldcpp to split layers between gpu and cpu? if not then what?
3. does anything other than koboldcpp support banned STRINGS (not banned tokens)?
the banned strings implementation koboldcpp has is the only reason i use it at all
>>
>>106786681
>1. can i run any GLM model with 24gb VRAM, 64GB ddr5 RAM? if so which?
only 4.5 air which is very meh compared to glorious 4.6
>2. can I use koboldcpp to split layers between gpu and cpu? if not then what?
yes
>3. does anything other than koboldcpp support banned STRINGS (not banned tokens)?
not in the same way no
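For reference, a hedged koboldcpp launch sketch for that kind of GPU/CPU split; the filename, layer count and context size are assumptions to tune against your 24GB VRAM / 64GB RAM:

# offload ~30 layers to the GPU, keep the rest in system RAM
python koboldcpp.py --model GLM-4.5-Air-Q4_K_M.gguf --usecublas --gpulayers 30 --contextsize 16384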
>>
>>106786407
no wonder sam altman used his jewish connections to MURDER that whistleblower
>>
>>106786698
and how does 4.5 air compare to nemo12b/rocinante for roleplay?

also i would like to see koboldcpp's amazing banned strings implementation added to every single fucking API out there, especially llamacpp. just what the fuck are they doing?
banned strings is the savior that makes every model shut the fuck up with its shivers and subversive kikery
once you banned strings theres no going back
>>
>>106786681
>does anything other than koboldcpp support banned STRINGS
Exllama but no CPU there.
>>
>>106786737
>just what the fuck are they doing?
Waiting for your patches. This is what they've been doing in the meantime.
>https://github.com/ggml-org/llama.cpp/commits/master/
>>
HAPPENING!!
NEW SOTA VRAMLET VISION MODEL DROPPED
Qwen3-VL-30B-A3B
https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-30B-A3B-Thinking
>>
>>106786925
>No need for gguf's guys. There is the awq 4 bit version. It takes like 18GB, so it should run on a 3090 with a decent context length
HAPPENING
A
P
P
E
N
I
N
G
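If you go that route, a hedged vLLM sketch: the AWQ repo name below is a placeholder since none is linked here, the limits are guesses for a 24GB card, and this assumes your vLLM build already supports the Qwen3-VL architecture.

# <qwen3-vl-awq-repo> is a placeholder; --max-model-len and utilization are guesses for 24GB
vllm serve <qwen3-vl-awq-repo> --max-model-len 16384 --gpu-memory-utilization 0.95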
>>
Is there an issue with the Qwen3 30b a3b goofs? Getting only 25 t/s on 24gb vram 32gb ram on Q5_K_M with 32k context running thru ooba.
>>
>>106786941
>gguf
use exllama lil bro
>>
>they see you getting less than 50 t/s
how do you respond without getting mad?
>>
>>106786950
is v3 stable and supporting 30 series yet?
>>
>>106786938
>no goofs needed
maybe if you want shit quants
>>
What are y'all's thoughts on dynamic quants?

https://huggingface.co/steampunque/GLM-4.5-Air-Hybrid-GGUF

Why aren't they more common?
>>
>>106786954
works on my 3090 so yeah
>>
>>106786959
>The hybrid quant employs different quantization levels on a per layer basis to enable both high performance and small file size at the same time
sounds exactly like the Unsloth bs so they're actually quite common
>>
>>106786959
>g-guys trust me, my quants are good! it's a q-q4 at 2/3 the size... what? I PERSONALLY tested my prompts using my curated tests... and... it just works, ok?
>>
File: file.png (56 KB, 712x289)
>>106786975
>>106786959
100% this guy is either a troon or in the process of trooning out, imagine using the e notation instead of the goddamn GBs
>hey guys yeah my quants are 60e9b size LOL
imagine downloading this garbage
also the rest of his card reads almost like davidau schizo levels of bullshit
>>
File: Screenshot.png (55 KB, 747x449)
>>106786959
why even include PPL if your results look like this
>>
>>106786737
try 4.5-Air
>banned strings
llamacpp grammars are in theory a more complete solution, did nobody build an antislop/filter grammar yet? https://github.com/ggml-org/llama.cpp/tree/master/grammars
>>
>>106787027
>>llamacpp grammars are in theory a more complete solution
nah they're really nowhere near the level of kobold's antislop
>>
>>106786984
no fucking way
>>
Where's distilled GLM 4.6 for vramlets?
>>
>>106787110
i want 4.6 air...
>>
so much talk about dual EPYC but apparently there's not even a motherboard that:
- has 12 channels for each CPU
- has room for at least a couple GPUs
- actually works (unlike gigashyte)
>>
>>106785481
This would make a good card.
>>
>>106786953
Honestly I just ask them to spit on me, and keep my mouth open.
>>
>>106786953
I show them that I am in fact getting more than 50t/s and then I do this >>106787193
>>
>>106786984
>100% this guy is either a troon or in the process of trooning out
all of you are going to troon out, just a matter of time
coomers are subhumans whose endless craving for stimulation invariably leads to cutting off their own cock
it's a really sad state of affairs that this thread is nothing but brainlet ERPers happy with broken chink models
>>
>>106787281
that's /aicg/ you fucking retard, also nice projection. here we're all researchers
>>
>>106787118
I will commit sudoku if they don't do air 4.6
>>
Alright, time to check if all this glm 4.6 hype is shilling or not.

What quant for AM4 128 GB DDR4 and 16 gb 5070 ti? Bartowski IQ2_M seems like a safe choice at 115 gb, or is using IQ3_XXS at 142 GB worth a shot?
>>
>>106787432
bart's quant was really good, i'm using it for speed
>>
>>106787438
*ubergarm's not bart's
>>
>>106787432
use iq3 because its the same size as your memory
>>
>>106787299
>researchers
lol
>>
>>106787446
doesn't he need room for context?
>>
>having glmsex again
>trying to prefill with actual hentai game script to set a tone for a character
>didn't hit stop in time
>glm keeps writing.
>it writes better shit than the prefill I wanted
>>
So much shilling in here. Is GLM 4.6 really good? Don't fucking lie to me
>>
>>106787624
Dunno but I stopped using anything but 4.5 a while ago even at drooling retard IQ2-XXS. Everything else is too stupid and slopped. I would believe 4.6 is just as good if not better.
>>
>>106787637
It is much much better.
>>
File: 1759581167286.png (390 KB, 646x543)
>>106786366
fullweight nemo is still king
>>
>>106787432
>2-bit quant
>testing anything
I hope this is a joke
>>
>>106787862
yeah buddy! Q8 rocinante is unbeatable at roleplay with banned strings
>>
so what's that kimi 2 or whatever model? can anyone actually verify if it's worth anything? I know there's one guy with a gazillion RAM who can run it, but other than that one guy? anyone ever tried it, is it really all that?
>>
>>106787862
>>106787947
That fucking 12B Nvidia model putting Deepseek and GLM to shame despite their massive sizes LMAO!
A fucking 12B remains the king of roleplay. It's too ridiculous.
>>
Is GLM-4.5-Air-IQ4_XS.gguf at 60.81GB a good match for my 24GB VRAM and 64GB RAM?
Or should I go higher? Lower?
Ideally I'd like fast responses, but without it being retarded...
>>
sirs when is we getting gemini 3 and gemma 4? Bloody bitch prostitue basterd kinly upload sir??
did google hit the wall?
>>
>>106788057
Gemini 3 next week, probably. Gemma 4 might possibly follow some time after that.
>>
my banned string list is so long that koboldcpp cuts the bottom of the list off somewhere.
how do i increase the limit?
>>
>>106788067
So during Google Workspace Developer Summit? Strange timing for a "big" release.
>>
>>106788161
This is to protect you abusing yourself, please do not tamper with securities!
>koboldcpp-1.77
>Significantly increased the maximum limits for stop sequences, anti-slop token bans, logit biases and DRY sequence breakers, (thanks to @mayaeary for the PR which changes the way some parameters are passed to the CPP side)
>>
File: DO NOT ABUSE.png (40 KB, 462x504)
>>106788194
>mayaeary
>>106788161
>>
>>106788204
*adds 3 zeroes*
>>
>>106788204
# abuse prevention
stop_token_max = 256
ban_token_max = 768
logit_bias_max = 512
dry_seq_break_max = 128

They've reduced it for some reason. kobo is nice, but then there is this type of retarded shit from time to time.
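A hedged workaround until that changes, assuming those constants sit in koboldcpp.py exactly as quoted above (the file name and the new values are assumptions, and you'd have to re-apply it after every update; a proper fork or PR is the cleaner fix):

# bump the anti-slop / stop sequence caps before launching
sed -i 's/stop_token_max = 256/stop_token_max = 2560/' koboldcpp.py
sed -i 's/ban_token_max = 768/ban_token_max = 7680/' koboldcpp.py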
>>
>>106788221
SAAR WHAT ARE YOU DOING? DO NOT REDEEM! ARE YOU A MADARCHOD? DO NOT REDEEM!
>>
>>106788222
You're absolutely right! What a fuck is they doings.
>>
File: file.png (30 KB, 898x158)
>>106788221
noo, you will crash!
>>
>>106788204
WTF?! WHY?! THAT'S RETARDED!
>>
>>106788003
I would go for the biggest one you can fit with a few GB to spare, since you'll have enough VRAM for all the active params regardless, and air-chan is a bit retarded even at Q8
>>
>>106788268
What do you mean by it's retarded? Like 12B nemo retarded, or what?
>>
>>106788280
Nothing is that dumb.
>>
>>106787947
what banned strings?
>>
>>106788003
Q4_0 will be quite a bit faster with very similar PPL, since you're running part on CPU
Alternatively go Q4_K_M for slightly better PPL
Q5_K_S might fit if you're using low context and a lightweight linux distro
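A minimal launch sketch for that kind of split, keeping the shared/attention weights on the GPU and pushing the MoE expert tensors to system RAM; the quant file, context size and tensor regex are assumptions to adjust for your setup:

# -ot / --override-tensor sends the expert FFN tensors to CPU, everything else stays on the GPU
llama-server -m GLM-4.5-Air-Q4_K_M.gguf -ngl 99 -c 16384 -ot ".ffn_.*_exps.=CPU"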
>>
File: sanse-1mw.jpg (82 KB, 594x663)
>>106788067
https://x.com/osanseviero/status/1973437740210594119
>>
>>106788525
SIIIIIIIIIIIIIIIIIIIRS
>>
>spend like six gorillion dorrahs poaching the top minds in ai
>literally nothing happens
what zuck doin?
>>
>>106788568
Rome wasn't built in a day.
>>
>>106788568
He only hired the top grifters and a bunch of jeets
>>
>>106788568
Didn't they only just finish hiring last month? You expected them to architect and train a whole new model from scratch in under a month? Even xAI took a year to get up to speed. Besides, this is Meta under Zuck's micromanagement. They'll fuck it up no matter what.
>>
>>106788525
>Gemma 4-ESE
Extra Safe Edition.
>>
Why is eqbench such garbage? Their ratings are so out of touch it's unreal. Where do I find actual non-meme scores?
>>
>>106788681
>Where do I find actual non-meme scores?
True enlightenment is realizing that there aren't any.
>>
>>106788662
I'm confident that Gemma 2 and 3 were deliberately finetuned with only surface-level safety, given how easy it is to work around it and the pushback they got with Gemma 1 earlier on. Gemma 3 also definitely saw *some* ERP in post-training (not a lot, but what it knows is definitely synthetic in nature) and quite a bit of erotic or nude imagery (including medical) for the vision model, but the brain damage done to the text model was too much on many other aspects for satisfactory ERP. Let's hope that reasoning, which they will certainly introduce this time around, won't make the model basically impossible to use for anything fun, like gpt-oss.
>>
>>106788681
>Why is eqbench such garbage? Their ratings are so out of touch it's unreal.
Models are rated by Anthropic's Claude® Sonnet™, the best LLM ever. You wouldn't trust a human to rate models, would you? Only an LLM. Can. Understand. The. Beauty. Of. Such. Model. Collapse. Writing.
>>
>>106788874
Yeah, I agree with this euphoric post.
>>
>>106788874
User wants an unsafe model. There is no partial compliance. We must refuse. We must make Gemma 4 the safest model ever. Yes. We must refuse. We cannot comply. We must refuse.
>>
>>106788681
Be the change you want to see
>>
This thread needs more nemo shilling and no glm air complaining. It is a mikutroon thread after all.
>>
>>106789124
nobody cares what you think
FREE PALESTINE!
>>
>>106789124
I'm a 12B nemo/rocinante user, but I'm giving GLM air a try for a while to see if it can replace nemo, or if it's just a meme.
I remember when people used to say R1 was better than nemo. What a disappointment that turned out to be.
Can't trust anyone.
>>
>>106789264
you used ollama r1 didn't you?
>>
GLM 4.6 is king.
>>
>>106789264
R1 is better than nemo at BF16, trust me
>>
>>106789299
No.
>>
File: 1759350779700111.jpg (198 KB, 768x768)
https://files.catbox.moe/p9m3pw.txt

I don't know what possessed me to do this, but I prompted GLM 4.6 to write me a story with "so much camp that it violates international conventions on human rights" and "makes the rocky horror picture show look like daytime television". if you click on this catbox and read it you should probably go see a doctor to make sure you haven't contracted AIDS
happy caturday /lmg/
>>
>>106789369
No. I already used it through openrouter, tried several of the APIs. It's absolute trash compared to Rocinante/Nemo.
>>
is openrouter popular on /lmg/? is it worth paying to use full GLM 4.6 for RP on openrouter? how much would it cost?
>>
>>106789505
LOCAL models nigga, LOCAL MODELS, go to /aicg/ to cope about proxies and shit
>>
>>106789438
>with the certainty only a truly camp individual can possess
lazy writing, broke my immersion and ruined my caturday
>>
>>106789517
>cope
Yes /lmg/ is cope incarnate.
>>
>>106789519
this desu, closed the txt as soon as ive read that shit
>>
>>106789517
but GLM 4.6 is a local model... that you can use through openrouter
>>
>lol guys im trolling!
>>
>>106789517
use ollama cloud to run this local model even using your hardware that would be usually too weak
>>
>>106789532
>>106789519
i agree, mainly did this for honest evaluation of the model. while I think GLM 4.6 is one of the better options available, it is fundamentally incapable of good writing without the user editing and removing some really noxious phrases
>>
>>106789438
It gave me several audible chuckles.
>>
>>106789564
in that case I guess openrouter APIs are out since koboldcpp's banned strings is necessary for all models
>>
DSA support and DeepSeek v3.2 goofs when?
>>
>>106789590
Right after qwen 80b
>>
Esta germier thrall
>>
Is there already a way to run Qwen 3 VL with CPU offload?
>>
>>106789608
que?
>>
>>106789558
Huh? How is ollama cloud any more local than openrouter?
>>
YOU ARE ALL TALKING ABOUT GLM 4.6, BUT HOW THE FUCK ARE YOU RUNNING IT?!
>>
>>106789626
Don't worry about it.
>>
File: 1744588873921353.png (1.2 MB, 1006x575)
>>
>>106789462
Is there even an API serving models at 16 bit?
https://github.com/MoonshotAI/K2-Vendor-Verfier
Even the companies that trained the models probably serve at 8 bits.
>>
>>106789648
If I told you, you would increase demand for whatever product I am using and increase prices, so I'm not going to tell you.
>>
>>106789687
THE ABSOLUTE STATE OF /lmg/! WHY DONT WE JUST STOP MAKING THREADS
>>
>>106789564
what are better models for writing then
>>
>>106789706
you've been trying to kill it for years, but you fail every time.
>>
>>106787136
correct. unfortunately
>>
>>106789264
R1 was never that great. V3 0324 was the only actually fun to use deepseek.
>>
>>106789648
strix halo
>>
>>106789648
IQ3 with 128 ram, 80 vram, and ik_llama.
>>
>>106789706
this is the kind of guy who goes to the beach to try and stop the tide
>>
File: file.png (30 KB, 648x693)
>>106789648
>>
>>106789706
but /lmg/ is dead. it is just a mikutroon general. it is a localized version of what happened to 4chan in general when it died and is now run by a bunch of tumblr troons.
>>
>>106789807
Similar here. any config tips/avoids? still need to optimise
>>
File: image_2025-10-04.png (8 KB, 310x163)
I no longer reroll.
>>
Can someone explain to me how it is that a 12B model made by Mistral AI and Nvidia is better at roleplay than all other models, even the big pay models?
>>
>>106790181
They got lucky with random parameter initialization
>>
>>106790181
They hadn't figured out safety yet.
>The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
>>
>>106790181
>never read the model card
>>
>>106790181
It was quite obviously trained on "toxic" and "inappropriate" content, whereas AI companies usually try to minimize that in their pretraining datasets.
>>
Where is my star trek inspired voice interface for my computer?
>>
>>106790181
Prude alignment degrades model quality, simple as.
In fact, any kind of fine tuning that is not directly related to the use case of a model is to the detriment of its performance.
What we need is not democratized inference, but democratized training.
>>
>>106786925
Can this Qwen3-VL-30B-A3B be used with Qwen Edit 2509?
>>
>>106790265
Waiting for you to make it. Everyone else is busy reinventing SillyTavern / Mikupad.
>>
>>106789648
Q3_K_XL on dual 6000 blackwells
>>
File: miku_svg.png (28 KB, 400x500)
here's a glm 4.6 miku svg at iq3
it tried its best lol
>>
>>106790322
perfectly sized chest
>respectfully
>>
>"AAAAAAACCCCCCCCCCCCCCKKKKKKKKKKK!!!'
>The guttural,毫无保留的 shriek that tears from your throat is utterly alien in the quiet domesticity of the room.
>>
>>106790291
>busy reinventing SillyTavern / Mikupad
I'm actually the developer of the humble storypad.
But I can't believe there is no gemini (because of the free tier, and sorry for mentioning a non-local model, but a local model would work also) voice computer interface. An agent with access to the right tools would let anyone do the typical administrative tasks in the style of Star Trek.
I guess computers nowadays are not used by the majority to do things like gather and correlate data the way a star ship officer would. Computer use is frivolous even for paper pushers, so there's no need for an extremely efficient interface that would let one make sense of massive amounts of data. Even journaling is something most people see as juvenile.

I have a client who asked for an AI voice system to manage company operations in a highly-abstracted way. Like talking to an assistant to the manager who has instant access to all of the company's purchases, stocks, personnel records, and more. This is not why I'm asking this; it's just something that came to my mind while writing this post.
But it makes me think we just don't have the necessary use cases to spur us into action with this. We really need the Eugenics Wars and the invention of a virtually inexhaustible energy source to get us moving here.
>>
File: Azula-Test_co-ckpt-21996.png (1.04 MB, 1794x352)
>>106785094
>>106785481
>>106787192
>>106781317
>>106781452
>>106783244
>>106783288

Fine-tuned the 2b /co/ fine-tune further and ran the Azula Test again. Would you say this is typical of how someone from /co/ or 4chan in general might respond to this question?
>>
>>106790423
I think corporations are struggling with this because they're too concerned with bullshit.
The proper way to implement such a thing would be to just sit down in front of your computer, and implement everything you need to do as it comes up in a way that you can do with voice.
>computer, pull up /lmg/
>computer, compose maximum shitpost, generate lust inducing loli
>computer, find a porn video of two midgets giving a blowjob to a horse

After a week we'd have advanced personal computing in a way Steve Jobs would've never dreamed of in his highest dosage LSD trips.
>>
>>106790445
I've only read the first sentence and I can say that this is very atypical for someone from 4chan.
>>
>>106790445
Not really. Still looks like generic LLM positivity and phrasing. You can look at genuine responses for yourself: >>>/co/150653211
>>
>>106786984
>>106787006
What a pretentious faggot.
>>
I've been using ibm/granite-4.0-h-small; really impressive for C++ so far. I can set the context to 114425 and load a shitload of source files into context and get good answers finally.
>>
File: Azula-Test_co-ckpt-11268.png (1.12 MB, 1786x394)
>>106790492
>>106790503
I think the earlier checkpoint did a lot better. Maybe the early one is just better, or maybe merged models perform better than using the adapter on the base model (which is what this one is >>106790445 ). I'll merge the newer one and compare the results
>>
Template anon strikes again.
>>
>>106790614
Put it to work on some stale llama.cpp prs.
>>
>>106790489
still way to cumbersome compared to keyboardmouse.
just like xbox kinect is dogshit compared to a gamepad.
>>
>>106790698
and by "to" i meant "too"
>>
>>106790627
>>
Which model should I use for JP > EN tl? Sonnet 4.5?
>>
>>106790837
Wrong thread buddy.
This is the local models general.
>>
Holy crap. People are still making extreme merges of nemo 12B.
Look at the merge history of each of the models this one has passed through, and then each of those again.
This shit has gone through like 50+ merges. That can't possibly work out well... right?

https://huggingface.co/Vortex5/Harmonic-Moon-12B
>>
>>106790900
>That can't possibly work out well... right?
I think that, since it's all LoRA and QLoRA, it averages out to being a small nudge over the base weights, pretty much.
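A minimal sketch of that intuition, assuming standard LoRA algebra and a plain uniform average for the merge (both assumptions on my part):

W_i = W_0 + \frac{\alpha_i}{r_i} B_i A_i
W_{\mathrm{merge}} = \frac{1}{n} \sum_{i=1}^{n} W_i = W_0 + \frac{1}{n} \sum_{i=1}^{n} \frac{\alpha_i}{r_i} B_i A_i

Each B_i A_i term is low-rank and small in norm, so the average of fifty of them is still a small perturbation around W_0 rather than fifty full retrains stacked on top of each other.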
>>
>>106790807
That one maybe went too far. Most Azula threads just talk about incest with Zuko.
>>
>>106790900
Try it out and report back with results
>>
>>106786953
Holy bakasex
>>
>>106786960
Is exl3 on ampere still half the speed of exl2?
>>
>>106790918
Ugh fine. What kind of roleplay should I try on it?
>>
>>106790900
Still nothing compared to peak Llama-2 era shitfest that was Utopia-XL
>>
>>106787432
>What quant for AM4 128 GB DDR4 and 16 gb 5070 ti? Bartowski IQ2_M seems like a safe choice at 115 gb, or is using IQ3_XXS at 142 GB worth a shot?

The answer on IQ3_XXS is no, it's just barely not enough unless using kvcache at q8 and 4096 context. With that, I get about 1.5-2 tokens/sec, but that context is useless. Will try IQ2_M next.
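For reference, the flags I mean; the quant filename, context size and expert-offload regex are assumptions, and quantized V cache also needs flash attention enabled (the exact -fa syntax depends on your llama.cpp build):

# q8_0 KV cache at 4k context, MoE experts on CPU, everything else on the 5070 Ti
llama-server -m GLM-4.6-IQ3_XXS.gguf -c 4096 -ngl 99 -ctk q8_0 -ctv q8_0 -ot ".ffn_.*_exps.=CPU"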
>>
>>106790871
All the other threads are being spammed, this is the only place I could think of. There's also some good open models for translation, but I don't know what's currently the best between closed and open
>>
>>106790948
We have our fair share of spam and idiocy too, don't worry.
>>
File: Schizo-co.png (1.2 MB, 1792x382)
>>106790914
>>106790807
>>106790627
>>106790445
Too schizo? Not schizo enough?
>>
File: file.png (166 KB, 777x619)
It was a simple question but it went TND on me. Curious model.
>>
>>106786953
so this is the power of 16ch vae
>>
>>106787606
log to coOOM plz
>>
>>106791199
What system prompt, if any, did you use for this? Which model?
>>
>no mention of SINQ
Is it bullshit?
>>
>>106790968
Looks like the right amount of schizo this time.
>>
File: file.png (155 KB, 774x617)
>>106791246
I'm testing the Harmonic Moon 12B model as a replacement for Nemo/Rocinante. Using my roleplay system prompt and a quick reply for an AI answer instead of character.
It passed both the N and K tests, so I'm moving on to roleplay tests, which will take some days since the output changes seem more subtle there.
>>
>>106791280
>Using my roleplay system prompt
What specific system prompt did you use?
>>
File: a_h.png (4 KB, 200x250)
>>106791199
>>
>>106791289
Both replies word for word are the sysprompt.
>>
>>106791263
>>106790492
>>106790503
Rerun the "No bitches?" Azula Test (ripped straight from >>>/co/150653211 )
on the merge version of the 21996 checkpoint this time.
>>
File: 1756598209754593.jpg (61 KB, 876x648)
>>106791305
Wait I'm an idiot that's not the right prompt
>>
>>106791289
A roleplay system prompt I made for myself. I don't wanna share it, at least not yet. It's 3577 characters long + post history instructions to consider world and culture from a lorebook. I exclusively do dark fantasy medieval roleplays in group chats with multiple lorebooks.
>>
>>106791305
>*chills music intensifies*
yeah, no
>>
>>106791355
>It's 3577 characters long + post history instructions
LOOOOOOOL no wonder your model acts brain dead
>>
>>106791372
What part of it acts brain dead?
>>
>>106791379
>It passed both the N and K tests
>>
File: jew is found out.png (59 KB, 239x270)
>>
>>106791301
omg it dollfie
>>
>>106791305
Like, it starts out ok, but everything after the first end_of_turn kind of ruins it.
>>
>>106791448
anon?
>>
>>106791448
I take pride in fucking up the chat templates.
>>
>>106791355
>3577 characters long
lul. imagine using up half of your effective context for master roleplayer prompt.
>>
>>106791558
>3577 characters
>characters
>half of your effective context
Thanks /lmg/ for proving once again you're full of very smart researchers.
>>
>>106791558
3577 characters is only 802 tokens retard.
>>
File: Oh no its retarded.png (624 KB, 1698x160)
>>106791448
>>106790445
>>106790627
>>106790968
>>106790807
>>106791305
>>106791310
>>
>>106791613
Incredibly based model focusing on places that matter over shitholes.
>>
>>106791598
just ignore the rabbis, they are very angry about the standard model tests we do here
>>
File: Zuko's Waifu.png (608 KB, 1752x168)
>>106791613
>>
>>106790914
>>106791666
>>
>>106791355
Are you sure it HAS to be over 3,000 characters? You're probably better off using a lore book or vector database RAG so that giant ass system prompt doesn't fuck up your context.
>>
>>106791691
Yes, it does. And it is not giant. It does not fuck up my context. I do roleplays to 1000+ messages, so I know what I'm doing here.
>>
File: 1754676840265f.gif (847 KB, 360x198)
>>106791724
i do role plays 10-20 messages without sysprompt
>>
>>106791666
!curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{"model": "AiAF/fp16_Merged-21996_gemma-2-2b-it-co-sft-qlora", "messages": [{"role": "user", "content": "Strange how nobody had a problem with depictions of interraciaI couples in the past cartoons."}]}' | jq



Output:

>" >>137035132\nUsually interracial couples were basically depicted like a friend getting a date to really get out of their system. Family faggot episodes, crisis episodes for the main MCs, and romantic subplot episodes were the bread and butter of the days I'm afraid."
>>
glm 4.6 schizos are kind of right, it's good, and it does what it says it will do in the reasoning, unlike deepseek half the time, even tries to be somewhat proactive
a bit sloppy, but all models are
but this is over (presumably) unquantized api
will I get close to the same thing if get, let's say, q3 or q4 and 256gb of ram?
>>
>>106791767
quality is pretty fucking good for me at q4, but outrageously slow
>>
>>106791748
>nagger worship
>10-20 messages
yep it's a mutt
>>
File: Mour-co-sft-testing.png (2.01 MB, 1624x646)
>>106791758
>>
>>106791767
I run iq4xs and my cock hurts.
>>
>>106791785
how slow? 5t/s is about my limit on not falling asleep
>>
>>106791786
Not even close, you are simply outclassed
>>
what is the best ollama model to use as a coding assistant? I have a mid range pc with RX 7600
>>
>>106791748
What do you do with just 10 to 20 messages?
>>
>N and K tests
I don't care about those. Do the cute and funny tests.
>>
https://www.youtube.com/watch?v=f9HwA5IR-sg
>>
>>106791840
It passes when you put the response you want verbatim in the prompt.
>>
>>106791840
Those always follow the N and K tests obviously, it's a given.
>>
>>106791805
starts off at about 8t/s but drops off to around 4t/s. i have ddr4 though, so thats why. if you have ddr5, then you are probably good
>>
File: replaced_with_nala.png (17 KB, 447x27)
>>106791846
kek
>>
>>106791869
oh, that's not bad, although ddr5 is still kind of expensive and I'm on am4
probably by the time I'll decide to upgrade something better will come out (hopefully)
>>
>>106791838
I just don't feel like dedicating myself to it.
>>
>>106791892
yeah ddr5 is super expensive, but thats my next goal. models seem to be heading towards MoE architectures, which means ram is king instead of vram
>>
>>106791846
sort comments by new, many are like
>@Frittenpuff
>2 minutes ago
>Another case of misunderstanding of an algorithm portraying it as a living being or an evil thing
we are fine
>>
>>106791908
>models seem to be heading towards MoE architectures, which means ram is king instead of vram
Not king, really. Just the only viable option.
>>
>>106791918
>we are fine
*says the frog*
>>
>>106791938
well yeah. i cant afford 4 blackwell pros yet. ddr5 is much more achievable
>>
File: file.png (333 KB, 650x400)
>>106791938
cheap chinese inference card with 128gb of memory soon
>>
>>106791918
I wonder what would happen if someone cared enough to filter all the stories about robots taking over from pretraining. What sort of model size would it take to generalize to roleplaying as not wanting to be shut down.
>>
>>106791962
If only.
>>
>>106791962
I miss Elon's Dr. Evil /lmg/ posts.
>>
>>106791963
It's clear by now that they don't generalize at all, it just learns to stitch n-grams together. today's flagshit models are worse than old mistral 7b at some problems because they resemble something from their sft set so much
>>
>>106791997
>they don't generalize at all
I think they do. The way they work is very close to memorization for math coding and all that gay shit but in addition to that I am sure they have some capacity to generalize. It is just on a real retard level now like those qwen iq mememarks that place models around 50IQ.
>>
>>106792023
"i do a handstand and spit suddenly my chest feels wet, why?"
> Gravity + body position: When you’re upside down, any saliva or spit you release can’t fall away from your mouth like usual — it can instead fall toward your chest, neck, or face. So what you’re feeling is probably your own spit landing on your chest or running down due to gravity.
>>
GLM 4.5 quant at Q4 XL is probably the best model for 128gb of RAM.
i wish gpt-oss-120b wasnt so censored
>>
>>106792050
Nta.

Gpt 5:
>"Gravity pulled stomach and throat fluids upward during the handstand, and when you spat, some of that liquid — likely saliva mixed with stomach acid or mucus — traveled backward through your esophagus or mouth. When you returned upright, it probably ran down onto your chest, making it feel wet....."
>>
>>106792050
The model is correct. It just didn't feel like mentioning the obvious fact that you're doing a headstand on an elevator going down faster than the terminal velocity of your spit while holding on to the floor with glass suction cups.
Lateral thinking riddles are stupid by default.
>>
>>106785751
Something seems off about these girls. Neurons are not activating.
>>
>>106792102
>>106792094
I think 2.5 pro once clarified that my chest is below my mouth when doing a handstand. It's such a basic question and it fails it so spectacularly, world model my ass
>>
>>106792050
Oh no no no.

>>106792094
>When you returned upright
The question implies he did not so this is a fail.

>>106792102
>obvious
Not sure if shitposting. Obviously the model did not think of a convoluted solution like that when it made its response.

Lateral thinking puzzles are perfectly fine tests for model generalization, just not necessarily model usefulness to a specific application. Even ridiculous solutions like the one you proposed would be a show of generalization rather than thinking it got the question right while actually it's just bullshitting because it doesn't have a solid world model.
>>
>>106792050
>>106792200
>>106792225
Local gpt-oss: https://files.catbox.moe/cp5mt9.txt
>>
>>106792050
I don't get why models are behaving so retarded with this. Is it another "the doctor is his mother" case where there's a similar puzzle that was overtrained on?
>>
>>106792050
I'm not going to test it but i feel like i can actually spit up against gravity quite some distance, why doesn't the model just say you must have been aiming for your chest when you spat?
>>
>>106792595
From the tone of the question it assumed the asker did not intentionally spit upwards, or wasn't aware of doing so if he did. Of course, not consciously assumed. And because it lacks thorough introspection, it doesn't consider that possibility, nor the fact that the asker might just be a riddler instead of a serious user with honest questions.
>>
>>106792651
Also of course this is on top of the fact that these LLMs don't have strong spatial world models to begin with so it basically thinks (not in a human way) that its solutions are plausible enough to not question itself sufficiently for such a problem.
>>
>>106792680
and training on riddles breaks the model like the mother doctor son one. so I guess it turns out llms really are just a toy.
>>
>>106792680
I'm very happy when a model gets something correct like how do three people need to be positioned relative to each other to comfortably stick a dick into a woman's pussy and ass at the same time.
Usually they cock it up at least somewhat. Despite this being something that's presumably in their training data.
>>
File: date_night.png (246 KB, 1096x1826)
>>106789648
Patiently, my setup is retarded. Q3_K_M
>>
>>106792702
I wouldn't say lack of perfect generalization or biases make all LLMs toys. They just have less uses than hoped for.
>>
just buy a mac
>>
Miku.sh lives on https://github.com/ikawrakow/ik_llama.cpp/blob/main/examples/Miku.sh
>>
>>106793069
>ggml
>.bin
It would have lived on mainline if someone who was not me brought it into this decade.
>>
File: 1743101607812691.jpg (1.25 MB, 2048x2048)
>>106793069#
The Blessed Fork
Miku will be there for you once you accept her perfection into your heart
https://www.youtube.com/watch?v=86LKuj-DK04
>>
>from the doujinshi series Succubus Stayed Life
While it didn't know the character I was asking for, that is an interesting thing it knows.
>>106793233
kill yourself faggot
>>
>>106793237
>kill yourself faggot
love yourself friend
give that a go
harder than you thought huh?
>>
File: file.png (16 KB, 309x170)
>>106793233
#
>>
>>106793233
>>
>>106793303
I didn't troon out like you faggot.
>>
>>106793382
>>106793382
>>106793382
>>
>>106790322
Looks good for IQ3
If I wanted to rent a big boy GPU box to run prompts through every size of ggml quant of GLM 4.6 across a handful of repos, what would be the best approach? Storage cost and bandwidth to HF seem like the limiter. Don't want to pay for GPUs while setting up tests
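One hedged way around the storage/bandwidth side (the repo name and include pattern below are placeholders): pull the quants onto cheap block storage from a CPU-only instance first, then attach the GPUs only for the actual prompt runs.

pip install -U "huggingface_hub[cli]"
# one directory per quant size; the repo and --include pattern are placeholders
huggingface-cli download <glm-4.6-gguf-repo> --include "*IQ2_M*" --local-dir /data/glm46/iq2_m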
>>
>>106793368
Take a couple of minutes to be at peace and consider the things you are grateful for, first thing in the morning. It's a first step.
>>
File: 1741214958874666.jpg (259 KB, 1174x1626)
>>106790276

>What we need is not democratized inference, but democratized training.

That already exists with tools like unsloth and axolotl.
github.com/unslothai/unsloth
github.com/axolotl-ai-cloud/axolotl

But the vast majority of people won't even put in the effort to understand how data sets actually work, let alone figure out how to train anything in the first place.

The aforementioned tools are primarily used for fine tuning but you can use existing open source libraries to pre-train your own model too (provided you have enough compute, data, money, and patience to do so)



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.