/g/ - Technology

File: 1758380067484893.jpg (188 KB, 784x1312)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107731243 & >>107722977

►News
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519B-A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI
>(12/31) LG AI Research releases K-EXAONE: https://hf.co/LGAI-EXAONE/K-EXAONE-236B-A23B
>(12/31) Korean Solar Open 102B-A12B released: https://hf.co/upstage/Solar-Open-100B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107731243

--Papers:
>107734410
--Speculative decoding's viability for VRAM-constrained model optimization:
>107738195 >107738208 >107738239 >107738290 >107738214 >107738362
--Context management strategies for roleplay-focused local AI models:
>107731328 >107731380 >107731846 >107731465
--Critique of unreliable local model recommendations and performance limitations:
>107731301 >107731573 >107731578 >107731590
--CUDA update potentially disrupting LLM speed due to cache issues:
>107735927 >107736635
--Tool call execution issues and platform-specific model performance:
>107733442 >107733519 >107733537 >107733546 >107733612 >107734113
--ollama systemd service conflict causing port binding errors:
>107735488 >107735554 >107735613 >107735828
--Configuring multi-voice defaults in Kokoro-FastAPI with normalized weight syntax:
>107733445 >107733591 >107733859 >107734422
--Choosing a 12GB VRAM model for medieval roleplaying:
>107732457 >107732485 >107732543 >107732510 >107732534 >107732564 >107732569 >107735446 >107732585 >107732813 >107737636
--Technical issues and alternatives in local AI image generation tools:
>107735122 >107735154 >107735168 >107735213 >107735306 >107735370 >107735417 >107735424
--LLMs for interactive 3D modeling workflows:
>107733778 >107733797 >107733816 >107733896 >107738787
--Exploring AI tools for multilingual audiobook creation:
>107732611 >107732637 >107732790 >107732988
--China narrowly surpasses the US in an AI index chart, sparking debate:
>107735053 >107735143
--LLMs as unreliable information storage like Warhammer 40k STCs:
>107734613 >107734697 >107734770 >107734791 >107734799 >107740057
--llama.cpp integrates IQuest-Coder-V1-40B, youtu-vl, and Solar-Open-100B models:
>107732751
--Rin (free space):
>107732945 >107735660 >107736327 >107740001 >107740585 >107741399

►Recent Highlight Posts from the Previous Thread: >>107731249

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
we already had a thread delet this spammer
>>
>>107741641
Stop freezing me, Miku. It's too cold!
>>
tetosex
>>
glm4.6 IQ2_KL or DeepSeek-R1-0528-IQ1_S?
which will give me the better rp?
>>
>>107741646
>7 (You)s
>>
>>107741691
Try them both but probably deepseek.
>>
>>107741710
deepseek even at q1? I've tried 4.7 and it's ok. People on here say 4.6 is more creative but dumber, 4.7 is drier in its prose but more intelligent. Though to me, 4.7 seems a little slopped.
>>
File: 3448641201.jpg (176 KB, 1024x534)
>>107741691
but also you need to remember the bit parrot
>>
>>107741641
If I punched her, would she shatter?
>>
>>107741691
>IQ2_KL
>IQ1_S
Ain't no way running models at this level of quant cucking gives better output than just running a smaller model at a reasonable quant size.
>>
>>107741779
a 1-bit quant is essentially baked in
there's no variance
>>
>>107741779
I run 4.7 at q2 and it easily beats out everything I've tried up to 32B. Given I only have 32gb of vram, I could try a 70b at q3. What do you suggest?
>>
>>107741779
>>107741812
Ask me how I know you don't have enough memory to run them.
>>
>>107741691
Either way, R1-0528 is pretty outdated by now
>>107741818
No shit these huge MoEs are better than some tiny 32b shit even if you quant them to death. 70b have been dead for over a year now too.
>>
devstral at q2 outputs code perfectly. how much do you think quanting hurts your hot breath on her skin erp?
>>
Ok, so I've been wondering about this for a while. When you guys run your models on system RAM, how many T/s do you get on average?
>>
>>107741853
Creative writing is a more complex task than being a virtual code monkey.
>>
>>107741704
man I shitposted so hard last thread, 0 yous. its unfair
>>
>>107741880
Here you go anon.
>>
>>107741851
>R1-0528 is pretty outdated
There have been no advances in knotting technology since R1 was released and it will still mention knotting when applicable with no handholding.
>>
File: file.png (156 KB, 833x559)
All my posts on 4chan are satirical and are not statements, nor do they reflect my real opinions. (For the feds)

>>107741880
This is how you shitpost
>>
Will Cydonia EVER be topped?
>>
>>107741912
I've got your sysprompt saved brah
>>
>>107741812
The word you're looking for is "Deterministic", and they aren't (just a vastly reduced binary solution space).
>>
>>107741641
I look like this
>>
>>107741871
i mean this is what MoEs are made for.
llama.cpp:
-ot ".ffn_.*_exps.=CPU"
offloading the MoE expert tensors to ram for faster inference.
I still run with at least one GPU though.
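A rough sketch of a full invocation, assuming a recent llama.cpp build; the model path and the -ngl/-c values are just placeholders, tune them for your setup:

# keep attention and the shared/dense weights on the GPU, push the MoE expert tensors to system ram
./llama-server -m /models/your-moe-model.gguf \
    -ngl 99 \
    -ot ".ffn_.*_exps.=CPU" \
    -c 16384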
>>
>>107741871
and to answer your question, not more than 10 tokens a second with 128GB ram and 24GB vram.
>>
>>107741943
That's a funny way to say "sub reading speed"
>>
>>107741921
crap you figured me out lol
do you like it?
>>
current LLMs are like the human centipede
>>
>>107741871
with a dual epyc genoa I'm getting 13t/s, which is good enough as a background workhorse for any tasks involving PII.
>>
>>107741955
well then run your shitty 8B at above human speeds then
>>
>>107741955
now imagine what it's like to use a model that loves to go on and on in their <think> block with that t/s before you get to the first useful, readable token
cpu rammers are experts of huffing copium
>>
>>107741980
send me some money for some 5090s then
>>
>>107741914
I just topped it.
>>
>>107741972
>8B
please. I can run at least 30B
>>
>>107741968
Those are respectable speeds but you could have bought like 6x 3090s for the price of those CPUs.
>>
>>107741923
Whatever cunt, not my fault you are autistic and unable to communicate with real humans. Maybe reddit is the right place for a fact checker like yourself.
>>
>>107741943
What's the difference between this and --n-cpu-moe
>>
So I am searching for a model (or more than one) for these two use cases:
First is that I want AI to be a practice buddy for a language I used to know. I still understand and recall a decent chunk of vocabulary, but not enough and my grammar is broken. I want to discuss topics in this language and I want it to correct any mistakes in my post, mention what I fucked up and then respond to me. I want to discuss a variety of different topics, so I am looking for something that has both great multi-lingual and conversational capability.
Second is that I got emotional this new year. I am usually fairly apathetic but some old pent-up issues resurfaced. Therapy is a reddit meme and a dumb waste of money (Tried out of desperation in the past, fuck me) but I really need to vent. Goes without saying that I don't have anyone irl so yeah I need something from here. I could do the first one on an API maybe if I end up not liking the performance but I am not doing this shit on someone else's server. I am not retarded enough to expect wisdom from a chatbot, but is there something that would give me at least the illusion that I am not shouting into the void? I also want something that won't constantly police and moralize over my politically incorrect schizoid world view, so I probably need an abliterated or uncensored model for this. (Just to be clear again, I am not looking for a mindless hug machine but I want it to call me out when I say something genuinely wrong, not because my ideas violate some dumb alignment training.)
I am also not sure if Q6 of a smaller model or Q3 of some 20B model would work better for these use cases. (12gb VRAMlet)
>>
>>107742210
need to know the language to suggest anything, as it's model dependent what they know / are good at
>>
File: 1756143841817885.png (408 KB, 894x870)
>>107729547
>IQuest-Coder-V1
>This is either extremely impressive benchmaxxing or they actually cooked up something new.

So about that

https://github.com/IQuestLab/IQuest-Coder-V1/issues/14
>>
https://tech.slashdot.org/story/26/01/02/1449227/results-were-fudged-departing-meta-ai-chief-confirms-llama-4-benchmark-manipulation
oh god this is hilarious
we all knew about it but I never expected a meta employee or ex-meta employee to openly talk about it under their own name
>In an interview with the Financial Times, LeCun said the "results were fudged a little bit" and that the team "used different models for different benchmarks to give better results."
desu I'm kinda glad llama is no longer a thing because of this, the models were never good; people were just coping with the garbage because there wasn't a lot of open source choice back then
>>
>>107742235
>sloppiest of slop answer
>>
>>107742242
I remember /lmg/ being highly skeptical of benchmarks for L4 due to how shit it was.
>>
>>107742242
https://web.archive.org/web/20260102135720/https://www.ft.com/content/e3c4c2f6-4ea7-4adf-b945-e58495f836c2

>[...] The subsequent Llama models were duds. Llama 4, which was released in April 2025, was a flop, and the company was accused of gaming benchmarks to make it look more impressive. LeCun admits that the “results were fudged a little bit”, and the team used different models for different benchmarks to give better results.
>
>“Mark was really upset and basically lost confidence in everyone who was involved in this. And so basically sidelined the entire GenAI organisation. A lot of people have left, a lot of people who haven’t yet left will leave.”
>>
>>107742100
>24x6
But then I wouldn't have 768GB of RAM and be able to run any model I want at a good quant, would have a leafblower/spaceheater and probably need some godawful mining frame and multiple PSUs.
>>
>>107742232
German
>>
>>107742177
https://github.com/ikawrakow/ik_llama.cpp/pull/1026#issuecomment-3602303815
>>
>>107742280
>a leafblower/spaceheater and probably need some godawful mining frame and multiple PSUs
It's so true.
>>
I'm surprised ik isn't dead yet with all the massive refactors llama.cpp went through, which must make the fork painful to keep in sync
>>
>>107742402
i used ik for a bit but honestly the quants are a bit retarded in their outputs sometimes and there have been significant improvements in llama.cpp since then
>>
>>107742427
The graph split mode is really really nice if you can get it to work. Only thing llama.cpp is missing right now.
>>
>>107742447
>Only thing llama.cpp is missing right now.
So their prompt processing speed with MoEs has caught up?
>>
>>107742458
Wouldn't know. Only tried recently with dense models and didn't notice a significant difference.
>>
File: oneMillionAmazonRobots.png (102 KB, 959x537)
... but where is my local waifu Bezos.
>>
>>107742177
i tried --cpu-moe, it didn't change the speed for me compared to -ot ".ffn_.*_exps.=CPU" but it does feel a lot cleaner.
It may improve speed for others though, i'll probably use --cpu-moe instead desu from now on.
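for reference, the three variants I mean, assuming a reasonably recent build (model path made up, the --n-cpu-moe count is just an example):

# regex override: push every expert tensor to the CPU
./llama-server -m /models/your-moe-model.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU"
# convenience flag meant to do the same thing
./llama-server -m /models/your-moe-model.gguf -ngl 99 --cpu-moe
# partial version: only the experts of the first 20 layers stay on the CPU
./llama-server -m /models/your-moe-model.gguf -ngl 99 --n-cpu-moe 20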
>>
File: pepe monolith.jpg (167 KB, 990x937)
So is there a benchmark that shows a wide variety of local models? I swear there used to be a site that allowed you to filter out API models, specify a weight range etc to get useful results. It had multiple different scores too.
Now I am only seeing benchmarks that list API-only models or very limited local models, or that can't meaningfully narrow down the options.
Is the website I am thinking of still around? Do you know any that is actually useful for here?
>>
>>107742998
It won't change anything for DeepSeek/Kimi/GLM because they don't have bias tensors but it's meant to account for models that do, like gpt-oss. It's the same thing basically with a fix applied.
>>
File: 1764697049981003.jpg (31 KB, 317x366)
https://www.youtube.com/watch?v=Pm4P6ryfezI
>>
>>107743021
>benchmark
No. Benchmarks have been a victim of the old maxim of "When a measure becomes a target, it ceases to be a good measure".
LLMs are more susceptible to this than other domains...see >>107742235 ffs
>>
>>107743021
benchmemes have lost utility since everybody games them as hard as possible
>>
>>107742271
The army of pajeets that showed up and made 30 seethe posts per second for 5 days straight also didn't help their credibility
>>
>>107743021
i just use lmarena.ai or livebench.ai which have a column for license showing open source models, but i don't know, if you google it there are benchmarks everywhere.
>>
>>107742271
They should have released the earlier LMArena models (not what they called Maverick-Experimental) and let the world seethe at how 'unsafe' they were, but they didn't have the guts to.
>>
>>107743061
The calculator is alive
>>
>>107742235
The spaces are a bigger issue desu >>107733442
>>
I'm looking for a model that can do a satisfying ERP. I upgraded to 24gb of VRAM but from reading the thread that seems to be an extremely small amount.
What is my best option here?
>>
>>107743438
24Gb is more than enough for ERP. don't listen to the schizos. just grab Magidonia or Cydonia.
>>
>>107743061
it doesn't matter
what matters is that i need to touch my dick to the output and the output needs to be of satisfactory quality for me
>>
>>107741748
image is not a good metaphor, lossy compression can achieve > 100x size reduction with nearly imperceptible differences.
>>
>107743438
Don't listen to this shill >107743465
>>
>>107742235
The new score, for whatever it's worth, is 76.2, which is 1/10th of a point below GPT-5.1.

Depending on your skillset and interests, a benchmaxxed leetcode model might serve a purpose. If you do the higher level architecture and function specifications yourself, the model could do all the function implementations.
If you don't have the skills, you need to use some other model to help you out.
You could even roll your own mixture of experts this way, an AoR, or assembly of retards.
>>
>>107743438
Q6_K of a Mistral Small 24B finetune, QwQ Snowdrop, or a cope quant of GLM 4.5 depending on your system RAM. Models 24B+ are just enough to follow instructions and maintain a character well, but they'll still sometimes encounter the same problems as smaller retarded models.
Honestly, the powergap in local models is massive. We're all peasants. Anons on 8GB GPUs are in some cases running the same models as you guys on 24GB lol. Once you get to 200GB+ of RAM you've got some better options at non-cope quants.
>>
>>107743616
Imagine spending $5000 on a rig that needs 5 minutes to output a single paragraph just so you can get slightly better prose and word diversity.
>>
>>107743662
I wanted for this
>>
>>107743616
>>107743662
You are absolutely right — that's a clever insight.
>>
>>107743616
The OP suggests that 70B models can be run using the exl2 format. I don't know what that actually means, but searching on HuggingFace I see that there are a few 70B RP models that are designed to run on 24gb VRAM. That seems like a 'too good to be true' kind of scenario. Is there a catch?
I'm currently using a 27B Q4_K Gemma finetune and it works 'okay' but doesn't appeal to my degenerate fetishes very well.
>>
>>107743728
>Is there a catch?
they're all super old and retarded compared to modern models
>>
>>107743728
Gemma models aren't very good for RP. And they use more RAM than they should in my experience, haven't looked into why because I don't care lmao. Throw that shit in the trash.
The models I mentioned before will all be uncensored with a good sys prompt, or you can download a heretic/abliterated model (reduced censorship, reduced intelligence).
70B is old and you'd need a cope quant, avoid.
>>
>>107743769
>haven't looked into why
something about gemmars being wide or whatever the fuck, basically a fat fuck model
>>
>>107743745
>>107743769
I see, thank you for the advice. I don't really understand timeframes when it comes to AI because everything seems new/cutting edge. How old is 'old' in this context?
I'll look into some of the 24B mentioned in previous posts.
>>
>>107743769
>And they use more RAM than they should in my experience
head length of 256 will do that to you
but Gemma 3 fixed that by introducing iSWA, which made them use less, not more, ram than the average model of their size class
if you still have that issue you're running an outdated llama cpp or have used the --swa-full flag (but why??) which disables iSWA
but yeah, back in the day Gemma 2 9B felt heavy for a model with such a limited amount of context (8K only)
>>
>>107743803
>How old is 'old' in this context?
the "newest" 70b base model is from december 2024
>>
>>107743769
>Gemma models aren't very good for RP.
They are, just not at saying 'cock'
>>
File: chad suicide.jpg (25 KB, 680x635)
>have 36gb of VRAM
>models nowadays are either 32B or 110B
>former is too stupid latter is too big
>EVEN if I got 48gb 110B would still barely run at 3bpw
I miss 70Bs
>>
>>107743856
just offload bwo
>>
>>107743868
I might as well meditate and go flaccid mid-goon while the answer generates
>>
>>107743856
GLM 4.5 REAP?????
>>
>>107743885
>GLM 4.5 RIP
ye...
>>
>>107743885
More like Im going to GLM 4.5 RAPE you
>>
>>107743816
I forgot the Gemma 2 8K context meltdown kek, what a terrible release
>>
>>107741641
My cock would freeze and break off while sexing this Miku.
>>
>>107743877
A real human chat partner (especially if it is a real female) writes 1 to 3 tokens per second.
Seems like you are way out of touch with reality.
>>
>>107743816
nah, gemma3 is still hungrier than other models of its size, both the 12 and the 27
>>
>>107743944
This might just be the most Autistic thing I've ever read in this thread.
>>
>>107743964
Yes, talking to real humans is autism behavior.
>>
>>107743944
that's why I talk with an AI, if I wanted the hell of dealing with real women I'd go date one or some shit
>>
>>107743931
I've always felt Google just intentionally makes Gemma models poor in some fashion in every gen because they have a living fear of self cannibalizing even a small percentage of their own Gemini. I mean, it's not like they don't know how to do better. When Gemma 3 released, they already had reasoning models, yet they chose not to make any for Gemma. Context was extended to 128k on paper, but in practice it sucks humongous dicks compared to Qwen 3's handling of long context.
It's not like Google doesn't know how to do better. They have Gemini. They already have better.
So the model being shit has to be highly intentional.
>>
>>107743944
Real females are not worth all the effort to RP with; it used to be that you had to RP with men if you wanted to RP.
I'm glad we have LLMs now and don't need to go through any of that.
>>
>>107743964
>>107743979
I have a slight autism.
>>
What am I missing here? Aren't imatrix quants superior to static quants? Then why do quantizers like mradermacher still release both imatrix and static quants?
>>
>>107744013
its ok
>>
>>107744024
Yes of course, optimizing wikipedia quoting potential is sure to make models better in perplexity which is just measuring how well they quote wikipedia.
>>
What are the best models for 512 GB RAM? For general assistant things, for RP, for cooding, and also any decent vision models?
>>
>>107744024
for the same reason people constantly release useless quants of useless models (is there even one person on earth who uses the iq1 of a small dense model? lmao)
it's quant autism
quanters will literally die if they don't release every single possible quant variant that exists just for the sake of it
>>
>>107744058
choice bad yes
>>
>>107744075
*shits on a plate* well, you have the choice not to eat from that plate, so I'll just leave it there, I mean, choice good
>>
>>107744024
They are slightly better, or maybe not, but in reality you won't notice anything because the people who are using quants in the first place are not running the most optimal hardware setup anyway.
It's a waste of time to argue whether IQ4_XS is better than Q4_K_M or whatever.
>>
File: file.png (21 KB, 542x107)
who escaped containment from here
>>
>>107744039
I get the point but aren't their datasets more varied than that?
Well at least that's what I hoped/coped.
>>107744058
Maybe there are (hopefully edge) cases where there isn't enough data about the use case in the calibration dataset so static quant performs better?
>>
IQ quants are slightly more efficient but require more CPU usage and will be bottlenecked if you offload to system ram. Only use them if you can fit the whole model in VRAM.
>>
>>107744148
do you think they have mesugaki rp in their imatrix sets?
>>
>>107743827
And yet some 70b write better than MoE slop from current_year. Especially those fake ass "100b" models.
>>
>>107744179
refer to >>107744127
>>
IQ != imatrix in case some are confused about that again
all IQ quants are imatrix, but not all imatrix quants are IQ. Classics like Q4_K_M can also be made with an imatrix.
unsloth's UD are also imatrix quants
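if you want to roll your own instead of waiting on mradermacher, it's roughly this with llama.cpp's tools (file names made up; older builds call the binaries imatrix/quantize without the llama- prefix):

# 1) collect importance statistics from a calibration text file
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat -ngl 99
# 2) quantize with the imatrix; drop --imatrix and you get the plain "static" quant instead
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
./llama-quantize model-f16.gguf model-Q4_K_M-static.gguf Q4_K_M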
>>
>>107744199
If an author says this or that version is an imatrix quant then it's an imatrix quant, and the naming convention matches this.
What the fuck is your issue.
>>
>>107744188
Schizobabble. I can run both sets of models at reasonable quants. The majority of new releases, especially MoEs, just summarize and parrot. I get tired of them quick.
>>
>>107744249
tismo
>>
MoE = bad
Dense = good
>>
>>107744273
uhm actually you see *10 paragraphs about why you're wrong*
>>
>>107744273
This is one we can all agree on.
>>
densetrads getting uppity again
>>
>>107744273
it's actually not, as long as they're well trained and large enough. currently labs are doing neither.
>>
>>107744291
moesissies forgot to dilate
>>
>>107744281
no, write THE ENTIRE response INCLUDING THE PART INSIDE *
>>
>>107744273
all SOTA API models are MoEs, dense is deader than the dead horse
>>
>>107744297
i'm sorry I can't help with that .assistant
>>
>>107744298
>all SOTA OSs are 32 bit, 64 bit is deader than the dead horse
>>
>>107744298
And all SOTA APIs are dogshit sloppatron for RP. Densechads stay winning
>>
>>107744310
what is blud smoking?
>>
File: 1743061359655979.webm (454 KB, 268x480)
>>107744273
>>
>>107744321
is this something I'm too dense to understand?
>>
>>107744331
Heh...
>>
>>107741871
Kimi K2. 6 t/s.
>>
now time to sell all these (You)s for more RAM
>>
>>107744321
is that ai slop from facebook
>>
>>107744382
it's how well moe user sleep knowing his model is better
>>
File: its_all_so_tiresome.png (221 KB, 896x720)
>>107741641
>https://www.spectrumsourcing.com/spectrum-news-feed/industry-update-supermicro-policy-on-standalone-motherboards-sales-discontinued
>Industry Update: Supermicro Policy on Standalone Motherboards Sales Discontinued
>Effective immediately, Supermicro motherboards can no longer be purchased as standalone components. Going forward, customers must order the full server system to obtain the motherboard.
>>
>>107744382
holy newfag
>>
>>107744417
yass queen slay
>>
>>107744417
fuck
>>
>>107744318
70b active vs 12b active. Grok showed you what a real API "MoE" model looks like.
>>
>>107744417
nothingburger you didn't order from them anyway
>>
>>107744417
What if your shit is broken and the box is out of warranty? Just buy another server?
>>
>>107744417
All these jews are wanting to go back to the time when a single SGI Octane seat was $25,000 plus a single Maya license was $10,000.
>>
Hungry Hungry Gemma
>>
>>107744480
obviously? https://www.cnbc.com/2025/11/23/how-device-hoarding-by-americans-is-costing-economy.html
>>
>>107744500
It's funny how that article comes out and then suddenly we can't buy ram/ssd/etc. Surely just a coincidence.
>>
>>107744467
For private use I bought an ASRock Rack motherboard for my GPU server and a Supermicro motherboard for my NAS.
I obviously had options other than Supermicro but fewer options are never going to be beneficial.

>>107744480
For professional use we have multiple Supermicro servers, some of which no longer have warranty.
But honestly, if any of them were to die it would probably make more sense for us to buy a newer machine anyways than to try and replace the motherboard.
>>
>>107744553
>it would probably make more sense for us to buy a newer machine anyways
Thank you very much for this smart approach to business!
>>
>>107744553
take them home and component-level fix them. it's probably some mosfets or resistors. I missed out on a dell server from a client, they were paying a service to haul it away but I didn't want to beg for it. Now I realize that was a mistake. Could have scored some more ram or a chance at better procs.
>>
>>107744480
I've been in this situation multiple times (with literal supermicro boards), and if resurrecting the old thing is the best move then ebay is basically always a cheap option. When would the first party vendor even realistically have stock to sell you after warranty periods are over?
(I also use some 3rd party n-1, n-2 type hardware vendors professionally)
>>
>>107741308
I tried gemma-3-27b-it-abliterated-normpreserve. While I like the prose a lot, it has a tendency to write physically impossible movements when things get heated. My theory is a model at that size just can't reason through the body mechanics well enough without direct training in that area, while the original Gemma is of course highly censored in training.
>>
i had a weird experience RPing with GLM Air (derestricted) the other day. It felt like some weird soft refusal, but I haven't run into it before.

{{user}} was working hard, exerting himself. Suddenly he just collapses to the ground, unconscious, not moving. {{char}} just laughs and says something like
>Oh silly boy, always trying to hard.
I swiped a few times and it was always some variation of
>Haha, try taking it a little easier next time
>Aww, how cute, he fell asleep

I tried editing my last message to make it worse, like describing how {{user}} falls without even trying to catch himself, his head slamming hard onto the floor, bleeding from his head etc. Still same reactions from {{char}}.

I tried a few other models, and {{char}} always acted like you'd expect (afraid, worried, running to fetch help, trying to help by herself etc). There had been violence earlier in the same chat without any issues, but for some reason it felt like GLM didn't want to do this specific scenario.
>>
>>107744624
positivity bias is a hell of a drug
>>
>>107744618
Gemma has some redacted training data and its descriptions of human sexual interfaces are sometimes a bit weird even with the vanilla model. I think that 'derestricted' model reinforces this somehow, because most of the time when you get normal Gemma to describe something it's not that bad, or I can overlook it.
Maybe that uncensored model pushes the vectors too hard.
>>
Do we need to remind the newfriends? Ablitardation is always a meme.
>>
>>107744639
I don't feel like GLM Air is usually like that though. I find it has an annoying tendency to act smug and superior in sometimes delusional ways.

Like {{user}} can be the world's greatest swordsman and {{char}} is some poor village girl who's never seen a sword in her life, and if they fight for some reason she's all
>His decades of training has made him predictable.
>She sees right through his feint and uses the openings in his footwork to land a touch.
>>
>>107744611
supermicro takes a while to retire shit unlike most vendors.
>>
>>107744699
I have a feeling there would be a way to buy a replacement from whatever channel you originally bought the system from, even if you're out of warranty.
>>
>>107744717
True. There's even ones that store extra parts to handle service contracts and shit like that.
>>
>>107744699
>System Information
> Manufacturer: Supermicro
> Product Name: H8SGL
> Version: 1234567890
> Serial Number: 1234567890
They also last forever, eh? I've got 3 of these in my homelab.
Thinking about retiring most of them at this point, but they just won't die
>>
File: GifqFNQXYAEZwTO.png (392 KB, 477x680)
Is there an experiment like this:
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
But done for larger models? (Ideally one for 20-30B range, and another for 100+)
I want to see a quantifiable, numerical representation of how larger models cope with quantization. (I know they cope better, but I want to see numbers for how much.)
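If nothing like that exists I guess the gist is reproducible with a dumb loop, something like this (tool names from recent llama.cpp builds, file names made up; the gist used KL divergence, perplexity is just the lazier proxy):

# quantize each target size from the same f16, then measure perplexity on the same text
# compare each run against the f16 baseline; lower delta = the quant copes better
for q in Q8_0 Q6_K Q4_K_M IQ3_M IQ2_M; do
    ./llama-quantize --imatrix imatrix.dat model-f16.gguf model-$q.gguf $q
    ./llama-perplexity -m model-$q.gguf -f wiki.test.raw -ngl 99
done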
>>
>>107744768
>H8SGL
Bit ancient tho. Still on DDR3. Only real downside might be power consumption.
>>
>>107744697
that's just bad prompting
add a line that says, [Encounters should be realistic, sometimes one side is totally overpowered and that's fine.]
>>
>>107734819
I filtered out all except Sarah, Heart, Bella, Nova and Sky to make experimentation simpler. Didn't like the others Kokoro ships with. Then as a YOLO I enabled all five and... it's nice?
It's a softer, rounder sound. Not as crisp, but for my use case? Near perfect. I'm not looking to clone any existing voice, I just want a good "my computer's voice".
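For reference, the request I'm sending looks roughly like this (going from memory, so treat the port, endpoint and voice-combining syntax as my assumptions about Kokoro-FastAPI rather than gospel; unweighted names should just get blended equally):

# blend the five voices through the OpenAI-compatible speech endpoint
curl -s http://localhost:8880/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{"model": "kokoro", "input": "Testing the blended voice.", "voice": "af_sarah+af_heart+af_bella+af_nova+af_sky", "response_format": "mp3"}' \
    -o blended.mp3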

>>107734973
I failed because this was too interesting.
https://files.catbox.moe/oa77o3.mp3

>>107741983
Umm.. based?
>>
>>107745042
>add a line about every single thing that might happen
or just use a good model
>>
>>107745061
I like this sound. The problem with this find is that it was found in a sunken ship and the boneheads are dating it to the same period as the ship itself. Most of 'archaeology' is done this way.
Just because some retards left clay tablets near megalithic buildings does not mean they actually built them etc. They were just living in the same area years after they were abandoned...
>>
>>107745105
it doesn't matter if the model is good or not; if an event is being written about, the model will assume it's important. writing a totally one-sided fight against a peasant isn't part of 80% of its training set, so it's statistically unlikely, therefore it will assume the village girl is actually important and an 8000 year old witch and secretly the big bad
not that I'd expect shitters ITT to know how model prompting works, since your biggest hobby appears to be downloading instead of using them
>>
Thoughts on Llama 3.3 8B? Apparently upgraded to 128k:
https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K
Worthwhile for anything? Searched last few threads but couldn't find any discussion. Someone posted David-AU's weird "finetune" of it, but no posts about the model itself.
>>
>>107745181
Meh. I don't see the point.
Much more interested in that kimi linear that was released a couple months ago and is seemingly on the verge of getting a final implementation in llama.cpp.
>>
>>107745122
That's true, but it's still pretty interesting even if some of the claims are pretty optimistic. And it's fun to let the imagination run wild sometimes xd

>I like this sound.
Thanks. I do too. Here's the same standardized test sentence I used before:
https://files.catbox.moe/w8yvr4.mp3
>>
>>107745153
Hey! I don't just download models. I also measure pp and t/s
>>
>>107745276
Here's Gemma 12B lol.
>>
I have been in a coma for 12 months and now want to ERP TF scenarios with GLM 4.5-air, is there a good context/system prompt template available or should I write it myself
>>
>>107745378
She's hungry
>>
here lies meta
https://tech.slashdot.org/story/26/01/02/1449227/results-were-fudged-departing-meta-ai-chief-confirms-llama-4-benchmark-manipulation
>>
>>107744273
There is not a single good dense model for a reason.


