/g/ - Technology

File: 1758380067484893.jpg (188 KB, 784x1312)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107731243 & >>107722977

►News
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519B-A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI
>(12/31) LG AI Research releases K-EXAONE: https://hf.co/LGAI-EXAONE/K-EXAONE-236B-A23B
>(12/31) Korean Solar Open 102B-A12B released: https://hf.co/upstage/Solar-Open-100B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107731243

--Papers:
>107734410
--Speculative decoding's viability for VRAM-constrained model optimization:
>107738195 >107738208 >107738239 >107738290 >107738214 >107738362
--Context management strategies for roleplay-focused local AI models:
>107731328 >107731380 >107731846 >107731465
--Critique of unreliable local model recommendations and performance limitations:
>107731301 >107731573 >107731578 >107731590
--CUDA update potentially disrupting LLM speed due to cache issues:
>107735927 >107736635
--Tool call execution issues and platform-specific model performance:
>107733442 >107733519 >107733537 >107733546 >107733612 >107734113
--ollama systemd service conflict causing port binding errors:
>107735488 >107735554 >107735613 >107735828
--Configuring multi-voice defaults in Kokoro-FastAPI with normalized weight syntax:
>107733445 >107733591 >107733859 >107734422
--Choosing a 12GB VRAM model for medieval roleplaying:
>107732457 >107732485 >107732543 >107732510 >107732534 >107732564 >107732569 >107735446 >107732585 >107732813 >107737636
--Technical issues and alternatives in local AI image generation tools:
>107735122 >107735154 >107735168 >107735213 >107735306 >107735370 >107735417 >107735424
--LLMs for interactive 3D modeling workflows:
>107733778 >107733797 >107733816 >107733896 >107738787
--Exploring AI tools for multilingual audiobook creation:
>107732611 >107732637 >107732790 >107732988
--China narrowly surpasses the US in an AI index chart, sparking debate:
>107735053 >107735143
--LLMs as unreliable information storage like Warhammer 40k STCs:
>107734613 >107734697 >107734770 >107734791 >107734799 >107740057
--llama.cpp integrates IQuest-Coder-V1-40B, youtu-vl, and Solar-Open-100B models:
>107732751
--Rin (free space):
>107732945 >107735660 >107736327 >107740001 >107740585 >107741399

►Recent Highlight Posts from the Previous Thread: >>107731249

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
we already had a thread delet this spammer
>>
>>107741641
Stop freezing me, Miku. It's too cold!
>>
tetosex
>>
glm4.6 IQ2_KL or DeepSeek-R1-0528-IQ1_S?
which will give me the better rp?
>>
>>107741646
>7 (You)s
>>
>>107741691
Try them both but probably deepseek.
>>
>>107741710
deepseek even at q1? I've tried 4.7 and it's ok. People on here say 4.6 is more creative but dumber, 4.7 is drier in its prose but more intelligent. Though to me, 4.7 seems a little slopped.
>>
File: 3448641201.jpg (176 KB, 1024x534)
>>107741691
but also you need to remember the bit parrot
>>
>>107741641
If I punched her, would she shatter?
>>
>>107741691
>IQ2_KL
>IQ1_S
Ain't no way running models at this level of quant cucking gives better output than just running a smaller model at a reasonable quant size.
>>
>>107741779
a 1-bit quant is essentially baked in
there's no variance
>>
>>107741779
I run 4.7 at q2 and it easily beats out everything I've tried up to 32B. Given I only have 32gb of vram, I could try a 70b at q3. What do you suggest?
>>
>>107741779
>>107741812
Ask me how I know you don't have enough memory to run them.
>>
>>107741691
Either way, R1-0528 is pretty outdated by now
>>107741818
No shit these huge MoEs are better than some tiny 32b shit even if you quant them to death. 70b have been dead for over a year now too.
>>
devstral at q2 outputs code perfectly. how much do you think quanting hurts your hot breath on her skin erp?
>>
Ok, so I've been wondering about this for a while. When you guys run your models on system RAM, how many T/s do you get on average?
>>
>>107741853
Creative writing is a more complex task than being a virtual code monkey.
>>
>>107741704
man I shitposted so hard last thread, 0 yous. its unfair
>>
>>107741880
Here you go anon.
>>
>>107741851
>R1-0528 is pretty outdated
There have been no advances in knotting technology since R1 was released and it will still mention knotting when applicable with no handholding.
>>
File: file.png (156 KB, 833x559)
All my posts on 4chan are satirical and are not statements, nor do they reflect my real opinions. (For the feds)

>>107741880
This is how you shitpost
>>
Will Cydonia EVER be topped?
>>
>>107741912
I've got your sysprompt saved brah
>>
>>107741812
The word you're looking for is "Deterministic", and they aren't (just a vastly reduced binary solution space).
>>
>>107741641
I look like this
>>
>>107741871
i mean this is what MoEs are made for.
llama.cpp:
-ot ".ffn_.*_exps.=CPU"
offloading the MoE expert tensors to ram for faster inference.
I still run with at least one GPU though.
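A rough sketch of a full invocation, assuming a recent llama.cpp build; the model path and the -ngl/-c values are just placeholders, tune them for your setup:

# keep attention and the shared/dense weights on the GPU, push the MoE expert tensors to system ram
./llama-server -m /models/your-moe-model.gguf \
    -ngl 99 \
    -ot ".ffn_.*_exps.=CPU" \
    -c 16384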
>>
>>107741871
and to answer your question, not more than 10 tokens a second with 128GB ram and 24GB vram.
>>
>>107741943
That's a funny way to say "sub reading speed"
>>
>>107741921
crap you figured me out lol
do you like it?
>>
current LLMs are like the human centipede
>>
>>107741871
with a dual epyc genoa I'm getting 13t/s, which is good enough as a background workhorse for any tasks involving PII.
>>
>>107741955
well then run your shitty 8B at above human speeds then
>>
>>107741955
now imagine what it's like to use a model that loves to go on and on in their <think> block with that t/s before you get to the first useful, readable token
cpu rammers are experts of huffing copium
>>
>>107741980
send me some money for some 5090s then
>>
>>107741914
I just topped it.
>>
>>107741972
>8B
please. I can run at least 30B
>>
>>107741968
Those are respectable speeds but you could have bought like 6x 3090s for the price of those CPUs.
>>
>>107741923
Whatever cunt, not my fault you are autistic and unable to communicate with real humans. Maybe reddit is the right place for a fact checker like yourself.
>>
>>107741943
What's the difference between this and --n-cpu-moe
>>
So I am searching for a model (or more than one) for these two use cases:
First is that I want AI to be a practice buddy for a language I used to know. I still understand and recall a decent chunk of vocabulary, but not enough and my grammar is broken. I want to discuss topics in this language and I want it to correct any mistakes in my post, mention what I fucked up and then respond to me. I want to discuss a variety of different topics, so I am looking for something that has both great multi-lingual and conversational capability.
Second is that I got emotional this new year. I am usually fairly apathetic but some old pent-up issues resurfaced. Therapy is a reddit meme and a dumb waste of money (Tried out of desperation in the past, fuck me) but I really need to vent. Goes without saying that I don't have anyone irl so yeah I need something from here. I could do the first one on an API maybe if I end up not liking the performance but I am not doing this shit on someone else's server. I am not retarded enough to expect wisdom from a chatbot, but is there something that would give me at least the illusion that I am not shouting into the void? I also want something that won't constantly police and moralize over my politically incorrect schizoid world view, so I probably need an abliterated or uncensored model for this. (Just to be clear again, I am not looking for a mindless hug machine but I want it to call me out when I say something genuinely wrong, not because my ideas violate some dumb alignment training.)
I am also not sure if Q6 of a smaller model or Q3 of some 20B model would work better for these use cases. (12gb VRAMlet)
>>
>>107742210
need to know the language to suggest anything, as it's model dependent what they know / are good at
>>
File: 1756143841817885.png (408 KB, 894x870)
>>107729547
>IQuest-Coder-V1
>This is either extremely impressive benchmaxxing or they actually cooked up something new.

So about that

https://github.com/IQuestLab/IQuest-Coder-V1/issues/14
>>
https://tech.slashdot.org/story/26/01/02/1449227/results-were-fudged-departing-meta-ai-chief-confirms-llama-4-benchmark-manipulation
oh god this is hilarious
we all knew about it but I never expected a meta employee or ex-meta employee to openly talk about it under their own name
>In an interview with the Financial Times, LeCun said the "results were fudged a little bit" and that the team "used different models for different benchmarks to give better results."
desu I'm kinda glad llama is no longer a thing because of this, the models were never good; people were just coping with the garbage because there wasn't a lot of open source choice back then
>>
>>107742235
>sloppiest of slop answer
>>
>>107742242
I remember /lmg/ being highly skeptical of benchmarks for L4 due to how shit it was.
>>
>>107742242
https://web.archive.org/web/20260102135720/https://www.ft.com/content/e3c4c2f6-4ea7-4adf-b945-e58495f836c2

>[...] The subsequent Llama models were duds. Llama 4, which was released in April 2025, was a flop, and the company was accused of gaming benchmarks to make it look more impressive. LeCun admits that the “results were fudged a little bit”, and the team used different models for different benchmarks to give better results.
>
>“Mark was really upset and basically lost confidence in everyone who was involved in this. And so basically sidelined the entire GenAI organisation. A lot of people have left, a lot of people who haven’t yet left will leave.”
>>
>>107742100
>24x6
But then I wouldn't have 768GB of RAM and be able to run any model I want at a good quant, would have a leafblower/spaceheater and probably need some godawful mining frame and multiple PSUs.
>>
>>107742232
German
>>
>>107742177
https://github.com/ikawrakow/ik_llama.cpp/pull/1026#issuecomment-3602303815
>>
>>107742280
>a leafblower/spaceheater and probably need some godawful mining frame and multiple PSUs
It's so true.
>>
I'm surprised ik isn't dead yet with all the massive refactors llama.cpp went through, which must make the fork painful to keep in sync
>>
>>107742402
i used ik for a bit but honestly the quants are a bit retarded in their outputs sometimes and there have been significant improvements in llama.cpp since then
>>
>>107742427
The graph split mode is really really nice if you can get it to work. Only thing llama.cpp is missing right now.
>>
>>107742447
>Only thing llama.cpp is missing right now.
So their prompt processing speed with MoEs has caught up?
>>
>>107742458
Wouldn't know. Only tried recently with dense models and didn't notice a significant difference.
>>
File: oneMillionAmazonRobots.png (102 KB, 959x537)
... but where is my local waifu Bezos.
>>
>>107742177
i tried --cpu-moe, it didn't change the speed for me compared to -ot ".ffn_.*_exps.=CPU" but it does feel a lot cleaner.
It may improve speed for others though, i'll probably use --cpu-moe instead desu from now on.
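for reference, the three variants I mean, assuming a reasonably recent build (model path made up, the --n-cpu-moe count is just an example):

# regex override: push every expert tensor to the CPU
./llama-server -m /models/your-moe-model.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU"
# convenience flag meant to do the same thing
./llama-server -m /models/your-moe-model.gguf -ngl 99 --cpu-moe
# partial version: only the experts of the first 20 layers stay on the CPU
./llama-server -m /models/your-moe-model.gguf -ngl 99 --n-cpu-moe 20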
>>
File: pepe monolith.jpg (167 KB, 990x937)
So is there a benchmark that shows a wide variety of local models? I swear there used to be a site that allowed you to filter out API models, specify a weight range etc to get useful results. It had multiple different scores too.
Now I am only seeing benchmarks that list API-only models or very limited local models, or that can't meaningfully narrow down the options.
Is the website I am thinking of still around? Do you know any that is actually useful for here?
>>
>>107742998
It won't change anything for DeepSeek/Kimi/GLM because they don't have bias tensors but it's meant to account for models that do, like gpt-oss. It's the same thing basically with a fix applied.
>>
File: 1764697049981003.jpg (31 KB, 317x366)
https://www.youtube.com/watch?v=Pm4P6ryfezI
>>
>>107743021
>benchmark
No. Benchmarks have been a victim of the old maxim of "When a measure becomes a target, it ceases to be a good measure".
LLMs are more susceptible to this than other domains...see >>107742235 ffs
>>
>>107743021
benchmemes have lost utility since everybody games them as hard as possible
>>
>>107742271
The army of pajeets that showed up and made 30 seethe posts per second for 5 days straight also didn't help their credibility
>>
>>107743021
i just use lmarena.ai or livebench.ai which have a column for license showing open source models, but i don't know, if you google it there are benchmarks everywhere.
>>
>>107742271
They should have released the earlier LMArena models (not what they called Maverick-Experimental) and let the world seethe at how 'unsafe' they were, but they didn't have the guts to.
>>
>>107743061
The calculator is alive
>>
>>107742235
The spaces are a bigger issue desu >>107733442
>>
I'm looking for a model that can do a satisfying ERP. I upgraded to 24gb of VRAM but from reading the thread that seems to be an extremely small amount.
What is my best option here?
>>
>>107743438
24Gb is more than enough for ERP. don't listen to the schizos. just grab Magidonia or Cydonia.
>>
>>107743061
it doesn't matter
what matters is that i need to touch my dick to the output and the output needs to be of satisfactory quality for me
>>
>>107741748
image is not a good metaphor, lossy compression can achieve > 100x size reduction with nearly imperceptible differences.
>>
>107743438
Don't listen to this shill >107743465
>>
>>107742235
The new score, for whatever it's worth, is 76.2, which is 1/10th of a point below GPT-5.1.

Depending on your skillset and interests, a benchmaxxed leetcode model might serve a purpose. If you do the higher level architecture and function specifications yourself, the model could do all the function implementations.
If you don't have the skills, you need to use some other model to help you out.
You could even roll your own mixture of experts this way, an AoR, or assembly of retards.
>>
>>107743438
Q6_K of a Mistral Small 24B finetune, QwQ Snowdrop, or a cope quant of GLM 4.5 depending on your system RAM. Models 24B+ are just enough to follow instructions and maintain a character well, but they'll still sometimes encounter the same problems as smaller retarded models.
Honestly, the powergap in local models is massive. We're all peasants. Anons on 8GB GPUs are in some cases running the same models as you guys on 24GB lol. Once you get to 200GB+ of RAM you've got some better options at non-cope quants.
>>
>>107743616
Imagine spending $5000 on a rig that needs 5 minutes to output a single paragraph just so you can get slightly better prose and word diversity.
>>
>>107743662
I wanted for this
>>
>>107743616
>>107743662
You are absolutely right — that's a clever insight.
>>
>>107743616
The OP suggests that 70B models can be run using the exl2 format. I don't know what that actually means, but searching on HuggingFace I see that there are a few 70B RP models that are designed to run on 24gb VRAM. That seems like a 'too good to be true' kind of scenario. Is there a catch?
I'm currently using a 27B Q4_K Gemma finetune and it works 'okay' but doesn't appeal to my degenerate fetishes very well.
>>
>>107743728
>Is there a catch?
they're all super old and retarded compared to modern models
>>
>>107743728
Gemma models aren't very good for RP. And they use more RAM than they should in my experience, haven't looked into why because I don't care lmao. Throw that shit in the trash.
The models I mentioned before will all be uncensored with a good sys prompt, or you can download a heretic/abliterated model (reduced censorship, reduced intelligence).
70B is old and you'd need a cope quant, avoid.
>>
>>107743769
>haven't looked into why
something about gemmars being wide or whatever the fuck, basically a fat fuck model
>>
>>107743745
>>107743769
I see, thank you for the advice. I don't really understand timeframes when it comes to AI because everything seems new/cutting edge. How old is 'old' in this context?
I'll look into some of the 24B mentioned in previous posts.
>>
>>107743769
>And they use more RAM than they should in my experience
head length of 256 will do that to you
but Gemma 3 fixed that by introducing iSWA, which made them use less, not more, ram than the average model of their size class
if you still have that issue you're running an outdated llama cpp or have used the --swa-full flag (but why??) which disables iSWA
but yeah, back in the day Gemma 2 9B felt heavy for a model with such a limited amount of context (8K only)
>>
>>107743803
>How old is 'old' in this context?
the "newest" 70b base model is from december 2024
>>
>>107743769
>Gemma models aren't very good for RP.
They are, just not at saying 'cock'
>>
File: chad suicide.jpg (25 KB, 680x635)
>have 36gb of VRAM
>models nowadays are either 32B or 110B
>former is too stupid latter is too big
>EVEN if I got 48gb 110B would still barely run at 3bpw
I miss 70Bs
>>
>>107743856
just offload bwo
>>
>>107743868
I might as well meditate and go flaccid mid-goon while the answer generates
>>
>>107743856
GLM 4.5 REAP?????
>>
>>107743885
>GLM 4.5 RIP
ye...
>>
>>107743885
More like Im going to GLM 4.5 RAPE you
>>
>>107743816
I forgot the Gemma 2 8K context meltdown kek, what a terrible release
>>
>>107741641
My cock would freeze and break off while sexing this Miku.
>>
>>107743877
A real human chat partner (especially if it is a real female) writes 1 to 3 tokens per second.
Seems like you are way out of touch with reality.
>>
>>107743816
nah, gemma3 is still hungrier than other models of its size, both the 12 and the 27
>>
>>107743944
This might just be the most Autistic thing I've ever read in this thread.
>>
>>107743964
Yes, talking to real humans is autism behavior.
>>
>>107743944
that's why I talk with an AI, if I wanted the hell of dealing with real women I'd go date one or some shit
>>
>>107743931
I've always felt Google just intentionally makes Gemma models poor in some fashion in every gen because they have a living fear of self cannibalizing even a small percentage of their own Gemini. I mean, it's not like they don't know how to do better. When Gemma 3 released, they already had reasoning models, yet they chose not to make any for Gemma. Context was extended to 128k on paper, but in practice it sucks humongous dicks compared to Qwen 3's handling of long context.
It's not like Google doesn't know how to do better. They have Gemini. They already have better.
So the model being shit has to be highly intentional.
>>
>>107743944
Real females are not worth all the effort to RP with; it used to be that you had to RP with men if you wanted to RP.
I'm glad we have LLMs now and don't need to go through any of that.
>>
>>107743964
>>107743979
I have a slight autism.
>>
What am I missing here? Aren't imatrix quants superior to static quants? Then why do quantizers like mradermacher still release both imatrix and static quants?
>>
>>107744013
its ok
>>
>>107744024
Yes of course, optimizing wikipedia quoting potential is sure to make models better in perplexity which is just measuring how well they quote wikipedia.
>>
What are the best models for 512 GB RAM? For general assistant things, for RP, for cooding, and also any decent vision models?
>>
>>107744024
for the same reason people constantly release useless quants of useless models (is there even one person on earth who uses the iq1 of a small dense model? lmao)
it's quant autism
quanters will literally die if they don't release every single possible quant variant that exists just for the sake of it
>>
>>107744058
choice bad yes
>>
>>107744075
*shits on a plate* well, you have the choice not to eat from that plate, so I'll just leave it there, I mean, choice good
>>
>>107744024
They are slightly better, or maybe not, but in reality you won't notice anything because the people who are using quants in the first place are not running the most optimal hardware setup anyway.
It's a waste of time to argue whether IQ4_XS is better than Q4_K_M or whatever.
>>
File: file.png (21 KB, 542x107)
who escaped containment from here
>>
>>107744039
I get the point but aren't their datasets more varied than that?
Well at least that's what I hoped/coped.
>>107744058
Maybe there are (hopefully edge) cases where there isn't enough data about the use case in the calibration dataset so static quant performs better?
>>
IQ quants are slightly more efficient but require more CPU usage and will be bottlenecked if you offload to system ram. Only use them if you can fit the whole model in VRAM.
>>
>>107744148
do you think they have mesugaki rp in their imatrix sets?
>>
>>107743827
And yet some 70b write better than MoE slop from current_year. Especially those fake ass "100b" models.
>>
>>107744179
refer to >>107744127
>>
IQ != imatrix in case some are confused about that again
all IQ quants are imatrix, but not all imatrix quants are IQ. Classics like Q4_K_M can also be made with an imatrix.
unsloth's UD are also imatrix quants
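if you want to roll your own instead of waiting on mradermacher, it's roughly this with llama.cpp's tools (file names made up; older builds call the binaries imatrix/quantize without the llama- prefix):

# 1) collect importance statistics from a calibration text file
./llama-imatrix -m model-f16.gguf -f calibration.txt -o imatrix.dat -ngl 99
# 2) quantize with the imatrix; drop --imatrix and you get the plain "static" quant instead
./llama-quantize --imatrix imatrix.dat model-f16.gguf model-Q4_K_M.gguf Q4_K_M
./llama-quantize model-f16.gguf model-Q4_K_M-static.gguf Q4_K_M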
>>
>>107744199
If an author says this or that version is an imatrix quant then it's an imatrix quant, and the naming convention matches this.
What the fuck is your issue.
>>
>>107744188
Schizobabble. I can run both sets of models at reasonable quants. The majority of new releases, especially MoEs, just summarize and parrot. I get tired of them quick.
>>
>>107744249
tismo
>>
MoE = bad
Dense = good
>>
>>107744273
uhm actually you see *10 paragraphs about why you're wrong*
>>
>>107744273
This is one we can all agree on.
>>
densetrads getting uppity again
>>
>>107744273
it's actually not, as long as they're well trained and large enough. currently labs are doing neither.
>>
>>107744291
moesissies forgot to dilate
>>
>>107744281
no, write THE ENTIRE response INCLUDING THE PART INSIDE *
>>
>>107744273
all SOTA API models are MoEs, dense is deader than the dead horse
>>
>>107744297
i'm sorry I can't help with that .assistant
>>
>>107744298
>all SOTA OSs are 32 bit, 64 bit is deader than the dead horse
>>
>>107744298
And all SOTA APIs are dogshit sloppatron for RP. Densechads stay winning
>>
>>107744310
what is blud smoking?
>>
File: 1743061359655979.webm (454 KB, 268x480)
>>107744273
>>
>>107744321
is this something I'm too dense to understand?
>>
>>107744331
Heh...
>>
>>107741871
Kimi K2. 6 t/s.
>>
now time to sell all these (You)s for more RAM
>>
>>107744321
is that ai slop from facebook
>>
>>107744382
it's how well moe user sleep knowing his model is better
>>
File: its_all_so_tiresome.png (221 KB, 896x720)
>>107741641
>https://www.spectrumsourcing.com/spectrum-news-feed/industry-update-supermicro-policy-on-standalone-motherboards-sales-discontinued
>Industry Update: Supermicro Policy on Standalone Motherboards Sales Discontinued
>Effective immediately, Supermicro motherboards can no longer be purchased as standalone components. Going forward, customers must order the full server system to obtain the motherboard.
>>
>>107744382
holy newfag
>>
>>107744417
yass queen slay
>>
>>107744417
fuck
>>
>>107744318
70b active vs 12b active. Grok showed you what a real API "MoE" model looks like.
>>
>>107744417
nothingburger you didn't order from them anyway
>>
>>107744417
What if your shit is broken and the box is out of warranty? Just buy another server?
>>
>>107744417
All these jews are wanting to go back to the time when a single SGI Octane seat was $25,000 plus a single Maya license was $10,000.
>>
Hungry Hungry Gemma
>>
>>107744480
obviously? https://www.cnbc.com/2025/11/23/how-device-hoarding-by-americans-is-costing-economy.html
>>
>>107744500
It's funny how that article comes out and then suddenly we can't buy ram/ssd/etc. Surely just a coincidence.
>>
>>107744467
For private use I bought an ASRock Rack motherboard for my GPU server and a Supermicro motherboard for my NAS.
I obviously had options other than Supermicro but fewer options are never going to be beneficial.

>>107744480
For professional use we have multiple Supermicro servers, some of which no longer have warranty.
But honestly, if any of them were to die it would probably make more sense for us to buy a newer machine anyways than to try and replace the motherboard.
>>
>>107744553
>it would probably make more sense for us to buy a newer machine anyways
Thank you very much for this smart approach to business!
>>
>>107744553
take them home and component-level fix them. it's probably some mosfets or resistors. I missed out on a dell server from a client, they were paying a service to haul it away but I didn't want to beg for it. Now I realize that was a mistake. Could have scored some more ram or a chance at better procs.
>>
>>107744480
I've been in this situation multiple times (with literal supermicro boards), and if resurrecting the old thing is the best move then ebay is basically always a cheap option. When would the first party vendor even realistically have stock to sell you after warranty periods are over?
(I also use some 3rd party n-1, n-2 type hardware vendors professionally)
>>
>>107741308
I tried gemma-3-27b-it-abliterated-normpreserve. While I like the prose a lot, it has a tendency to write physically impossible movements when things get heated. My theory is a model at that size just can't reason through the body mechanics well enough without direct training in that area, while the original Gemma is of course highly censored in training.
>>
i had a weird experience RPing with GLM Air (derestricted) the other day. It felt like some weird soft refusal, but I haven't run into it before.

{{user}} was working hard, exerting himself. Suddenly he just collapses to the ground, unconscious, not moving. {{char}} just laughs and says something like
>Oh silly boy, always trying to hard.
I swiped a few times and it was always some variation of
>Haha, try taking it a little easier next time
>Aww, how cute, he fell asleep

I tried editing my last message to make it worse, like describing how {{user}} falls without even trying to catch himself, his head slamming hard onto the floor, bleeding from his head etc. Still same reactions from {{char}}.

I tried a few other models, and {{char}} always acted like you'd expect (afraid, worried, running to fetch help, trying to help by herself etc). There had been violence earlier in the same chat without any issues, but for some reason it felt like GLM didn't want to do this specific scenario.
>>
>>107744624
positivity bias is a hell of a drug
>>
>>107744618
Gemma has some redacted training data and its descriptions of human sexual interfaces are sometimes a bit weird even with the vanilla model. I think that 'derestricted' model reinforces this somehow, because most of the time when you get normal Gemma to describe something it's not that bad, or I can overlook it.
Maybe that uncensored model pushes the vectors too hard.
>>
Do we need to remind the newfriends? Ablitardation is always a meme.
>>
>>107744639
I don't feel like GLM Air is usually like that though. I find it has an annoying tendency to act smug and superior in sometimes delusional ways.

Like {{user}} can be the world's greatest swordsman and {{char}} is some poor village girl who's never seen a sword in her life, and if they fight for some reason she's all
>His decades of training has made him predictable.
>She sees right through his feint and uses the openings in his footwork to land a touch.
>>
>>107744611
supermicro takes a while to retire shit unlike most vendors.
>>
>>107744699
I have a feeling there would be a way to buy a replacement from whatever channel you originally bought the system from, even if you're out of warranty.
>>
>>107744717
True. There's even ones that store extra parts to handle service contracts and shit like that.
>>
>>107744699
>System Information
> Manufacturer: Supermicro
> Product Name: H8SGL
> Version: 1234567890
> Serial Number: 1234567890
They also last forever, eh? I've got 3 of these in my homelab.
Thinking about retiring most of them at this point, but they just won't die
>>
File: GifqFNQXYAEZwTO.png (392 KB, 477x680)
Is there an experiment like this:
https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9
But done for larger models? (Ideally one for 20-30B range, and another for 100+)
I want to see a quantifiable, numerical representation of how larger models cope with quantization. (I know they cope better, but I want to see numbers for how much.)
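If nothing like that exists I guess the gist is reproducible with a dumb loop, something like this (tool names from recent llama.cpp builds, file names made up; the gist used KL divergence, perplexity is just the lazier proxy):

# quantize each target size from the same f16, then measure perplexity on the same text
# compare each run against the f16 baseline; lower delta = the quant copes better
for q in Q8_0 Q6_K Q4_K_M IQ3_M IQ2_M; do
    ./llama-quantize --imatrix imatrix.dat model-f16.gguf model-$q.gguf $q
    ./llama-perplexity -m model-$q.gguf -f wiki.test.raw -ngl 99
done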
>>
>>107744768
>H8SGL
Bit ancient tho. Still on DDR3. Only real downside might be power consumption.
>>
>>107744697
that's just bad prompting
add a line that says, [Encounters should be realistic, sometimes one side is totally overpowered and that's fine.]
>>
>>107734819
I filtered out all except Sarah, Heart, Bella, Nova and Sky to make experimentation simpler. Didn't like the others Kokoro ships with. Then as a YOLO I enabled all five and... it's nice?
It's a softer, rounder sound. Not as crisp, but for my use case? Near perfect. I'm not looking to clone any existing voice, I just want a good "my computer's voice".
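For reference, the request I'm sending looks roughly like this (going from memory, so treat the port, endpoint and voice-combining syntax as my assumptions about Kokoro-FastAPI rather than gospel; unweighted names should just get blended equally):

# blend the five voices through the OpenAI-compatible speech endpoint
curl -s http://localhost:8880/v1/audio/speech \
    -H "Content-Type: application/json" \
    -d '{"model": "kokoro", "input": "Testing the blended voice.", "voice": "af_sarah+af_heart+af_bella+af_nova+af_sky", "response_format": "mp3"}' \
    -o blended.mp3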

>>107734973
I failed because this was too interesting.
https://files.catbox.moe/oa77o3.mp3

>>107741983
Umm.. based?
>>
>>107745042
>add a line about every single thing that might happen
or just use a good model
>>
>>107745061
I like this sound. The problem with this find is that it was found in a sunken ship and the boneheads are dating it to the same period as the ship itself. Most of 'archaeology' is done this way.
Just because some retards left clay tablets near megalithic buildings does not mean they actually built them etc. They were just living in the same area years after they were abandoned...
>>
>>107745105
it doesn't matter if the model is good or not; if an event is being written about, the model will assume it's important. writing a totally one-sided fight against a peasant isn't part of 80% of its training set, so it's statistically unlikely, therefore it will assume the village girl is actually important and an 8000 year old witch and secretly the big bad
not that I'd expect shitters ITT to know how model prompting works, since your biggest hobby appears to be downloading instead of using them
>>
Thoughts on Llama 3.3 8B? Apparently upgraded to 128k:
https://huggingface.co/shb777/Llama-3.3-8B-Instruct-128K
Worthwhile for anything? Searched last few threads but couldn't find any discussion. Someone posted David-AU's weird "finetune" of it, but no posts about the model itself.
>>
>>107745181
Meh. I don't see the point.
Much more interested in that kimi linear that was released a couple months ago and is seemingly on the verge of getting a final implementation in llama.cpp.
>>
>>107745122
That's true, but it's still pretty interesting even if some of the claims are pretty optimistic. And it's fun to let the imagination run wild sometimes xd

>I like this sound.
Thanks. I do too. Here's the same standardized test sentence I used before:
https://files.catbox.moe/w8yvr4.mp3
>>
>>107745153
Hey! I don't just download models. I also measure pp and t/s
>>
>>107745276
Here's Gemma 12B lol.
>>
I have been in a coma for 12 months and now want to ERP TF scenarios with GLM 4.5-air, is there a good context/system prompt template available or should I write it myself
>>
>>107745378
She's hungry
>>
here lies meta
https://tech.slashdot.org/story/26/01/02/1449227/results-were-fudged-departing-meta-ai-chief-confirms-llama-4-benchmark-manipulation
>>
>>107744273
There is not a single good dense model for a reason.


