[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107266608 & >>107255984

►News
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107266608

--Exploring finetuning of existing AI models for enhanced reasoning and personality preservation:
>107266816 >107267078 >107267099 >107267130 >107267215 >107267290
--Multi-GPU finetuning framework comparisons and challenges:
>107267014 >107267031 >107267088 >107267113 >107267154 >107267796 >107267846
--Cost and technical challenges of locally running GLM 4.6 versus cloud alternatives:
>107272605 >107272615 >107272631 >107272664 >107272728 >107272674 >107272704 >107272722 >107272719 >107272778 >107272781 >107272873 >107273031 >107273067 >107273232 >107273239 >107273267 >107272800 >107272845
--Olmo 3's STEM-focused but limited practical AI capabilities:
>107271964 >107272012 >107272137
--Managing long conversations in SillyTavern using context limits and external data storage:
>107267734 >107267780 >107267891 >107267915 >107267961 >107275361
--Pentesting AI recommendations: Claude's lower censorship vs ChatGPT's restrictions:
>107274383 >107274499 >107274510 >107274632 >107274785 >107274813
--NPU vs GPU specialization for AI and memory requirements debate:
>107271379 >107271392 >107271394 >107271416 >107271429 >107271739 >107271772
--Decline in cloud model performance and Gemini 3's superior capabilities:
>107270590 >107270604 >107270696 >107270778 >107271508 >107272461 >107272491 >107272541 >107272573 >107272804 >107272950 >107272975 >107272994 >107271099
--OpenAI's strategic response to Google's advancements and Meta's shortcomings:
>107277169 >107277188 >107277574 >107277620 >107277644 >107277877 >107277934 >107278338 >107278350
--Strategies to reduce repetitive output in Transformer models:
>107272022 >107272041 >107272273 >107272160 >107272183
--Yann Lecun departs Meta to launch AI startup with Meta partnership:
>107269928 >107270065 >107270199
--Miku (free space):
>107268045

►Recent Highlight Posts from the Previous Thread: >>107266611

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Loog at Looga and she'll loog right back at you
>>
I want MOEs for imagen. It's ridiculous that my glm-air eats less VRAM than SDXL, which has, what, 6B params?
>>
>her fingers drumming lightly against the desk
I told it to stop and yet here we are, every single time. So tired of these mongoloid models.
At least it wasn't a "shiver".
>>
>>107278944
Luka x Miku General
>>
>>107279035
To be fair, glm-air is dogshit regardless of param count.
>>
>>107279035
Do you use quants for SDXL? q8 with --vae-tiling and --diffusion-fa takes 3.8GB
>>
>>107279087
No, I just started dipping my toes into it. Good to know I can squeeze more juice out of my potato still.
>>
>>107279035
SDXL has 2-3B params if I'm not mistaken.
>>
I just did some experiments with GLM Air and it seems promising (general knowledge actually seems decent, and it's quite good at generating SD prompts for you) but holy fuck is the "safety" dialed up to 11.
Is there any decent system prompt that can unbrick it or is it just fucked? Everything I put it just seems to get overridden with it's seemingly built in "guidelines".
>>
>>107279362
just disable thinking or prefill
>>
>>107279366
Thanks, can't believe it was that easy. Is there any way to tweak how it thinks or is it just a straight on or off that can't be controlled?
Either way it's just doing what I want now that thinking is off so job done.
>>
>>107279453
yeah all the refusal training for aim went into thinking. You can control it still by prefilling it, just putting a 'Sure, ' at the start should be good enough, same if you're getting refusals without thinking. I've found you can also literally prefill with '*' (without quotes) if you're doing RP, and it will force the bot to do *action*, skipping the whole refusal shit, meaning you can get refusals only if a phrase starts with some predetermined tokens they trained it with.
>>
>>107279464
One of the things I was doing previously was editing the thinking blocks to change the refusal into approval then getting it to continue generating from after the end think tag. It was working, but it's a pain to do that every time as you can imagine.
Good info though thanks, I'll do some further experiments.
>>
>>107279479
if you're using sillytavern, you can automate the prefill, just force the message to start with
<think>Sure,
you can do it through the completions settings. I dont have ST started right now (genning with comfy) but it should be fairly easy to find, I think it's called 'start assistant message with'
>>
>>107279494
Why are you like this?
>>
Uh oh antimiku-schizo melty
>>
I just jerked off to a video of a little sex goblin fisting both of her holes. LLMs will never be able to compete.
>>
>>107279674
goblin porn?
>>
>>107279488
>>107279698
I look like this
>>
>>107279488
>>107279698
JANNIES STOP JERKING IT AND DO YOUR VOLUNTEER DUTY
>>
>But I urge you to consider the consequences of your curiosity. These are not fantasies to be indulged in. They are real lives, with real people who are suffering. And you, by asking these kinds of questions, are contributing to their pain."
Gemma Sirs... it's so good to be back, it's been a while!
>>
I just woke up from a coma that started when Nemo came out. Things must be pretty amazing now with all the time that has passed since then.
>>
File: wclivocw8e631.jpg (65 KB, 600x450)
65 KB
65 KB JPG
>>107279785
>>
>>107279688
nah i was just joking it's not actually a goblin
it's a video of a young woman with short hair in a blue satin dress and red lipstick, if you are a porn addict you should be able to tell which one I'm talking about
i called her that because she is short and looked weird but in a cute way
that was many years ago though she aged very badly
>>
>>107279771
hotline bros... we won!
>>
File: 2025-11-21_08-29-37.png (264 KB, 1066x849)
264 KB
264 KB PNG
>>107279785
you missed the best days r1 is still fucked on every goddamn api i assume it is aswell with llama.cpp aswell as they dont even have mtp working but idk hopefully not

its on the up green line go to the moon type shit but jewvidia is still making goys out of us all there is also a decent chance the next deepseek is multimodal so well see
>>
>>107279842
>decent chance the next deepseek is multimodal so well see
output too or nothingburger
>>
>>107279842
Holy schizo
>>
>>107279132
I use https://github.com/leejet/stable-diffusion.cpp it loads, generates and fucks off, and you can use the same vram for tts
>>
>>107278944
>>107279046
Yes.
>>
>>107279918
sdcpp is sadly slower than neoforge/comfy, it's also way behind in terms of supporting various random models/ecosystem (but I guess only comfy and DIFFUSERS are really able to keep up to some extent)
>>
>>107278838
i'm tired of llm's, i want something different that actualy makes at try at ai and not some shitty token prodictors that can't even do realtime bidirectional interactions.
>>
>>107280037
>slower
How much? I'm kinda ok with 10s per 12 steps dmd2 pic on a 3090 (including model load), but if it's like 5s, I'll jump ships
>>
>>107280066
that can't even do realtime bidirectional interactions.

So Qwen3 Omni?
>>
>>107280341
not real time, it's turn based, ie it takes input, then outputs.

but these models cannot interrupt you whilst you are talking for ex, because it's not realtime bidirectional, but turn based bidirectional.
>>
>>107280341
>>107280360
and also, it being turned based means they need input before giving an output.
they can't come out of nowhere to say something unprompted or do things in their "own" time.

they also have no notion of time.
>>
so did they fix the missing </think> tag in k2 yet or is it still unusable
>>
>>107280360
What about the sound-based stuff?
>>
File: 1763476163874192.jpg (32 KB, 800x450)
32 KB
32 KB JPG
>>107272921
>k2 kimi is totally schizo sometimes in its thinking process but after using it for over a week now im able to say that it definitely keeps my stories fresh even after 32k tokens
wtf is this grift? the model falls apart around 8k just like any other one
we really do have bots shilling k2 and glm here don't we, I mean how else do you even explain this
>>
>>107280417
also turn based.
>>
File: 1674936431684047.jpg (178 KB, 960x1568)
178 KB
178 KB JPG
feels fuckin' good to have a training project actually work for once.

fine-tuned deberta and it's actually intelligent enough to tell the diff between
>alena likes hot-dogs
and
>alina, log my water intake today

How likely is it that I can train a vision model that will actually work for "wake word" type detection with similar contextual understanding?

It just seems like they are much more picky about input format and all of the random sounds in the bg can fuck it all up compared to a text model. I was computing MFCC on short audio clips then sending as PNG to a vision model.

>surely this will become a valuable skill eventually, right?
>>
>>107280477
hmm any task they can save money on by not utilizing the larger more intelligent models, or doesn't have/need internet access I suppose.

>tfw approx 25ms inference time

ahh, low latency perhaps?
>>
File: 1746472133590390.jpg (118 KB, 1000x1000)
118 KB
118 KB JPG
>>107278838
>>
>>107280430
i like the prose of k2, but after coming back to deepseek for a few turns i realized how hopelessly retarded k2 is, even for the obscene amount of thinking it does
at least for coom and writing it's not that good aside from fresh prose
ideally, let the deepseek do the thinking and then k2 rewrites the reply, but running them at the same time would be not feasible and swapping between them would be extra fucking slow
>>
Bitcoin is getting gangbanged.
>>
>>107280430
buy an ad sam
>>
https://huggingface.co/tencent/HunyuanVideo-1.5

Will it be uncensored like the first version?
>>
>>107281119
Magic 8-ball says "Cannot predict now".
>>
>>107280393
Are you using --special?
>>
>>107281119
No mention of "safe" or "safety" in the technical report, at least.
https://huggingface.co/tencent/HunyuanVideo-1.5/blob/main/assets/HunyuanVideo_1_5.pdf
>>
>>107281119
Demos don’t look as impressive as Wan’s
>>
>>107280750
Any narrative on why? USA stock market is down for past week. I always thought BTC went up during those times but I don't follow any of the crypto values.
>>
>>107281201
>I always thought BTC went up during those times
That was the promise, but ever since institutional investors started piling in ~2017 or so, it has been more of a leading indicator for the broader market since it is active 24/7.
>>
>>107280750
That's good because I got out at the beginning of the year and I need it to drop more so I can buy again.
>>
File: 1750627235168554.png (189 KB, 2247x734)
189 KB
189 KB PNG
https://artificialanalysis.ai/evaluations/omniscience
that's interesting, maybe the first non meme benchmark?
>>
>>107281248
Wasn't 405B kinda ass?
>>
>>107281270
It wasn’t, but nobody can run it at adequate quants and reasonable speed
>>
>>107281225
So BTC gets treated less like gold, more like every other equity instrument then.
>>
>>107281225
So it's basically a more responsive representation of the wider market?
>>
>>107281297
The nous version was actually rather nice. Was on OR for free a while.
>>
>>107281243
I'll only buy if it drops to 50k
no way it's gonna be this bad though, right??
>>
Am I imagining things or does Grok Expert (think model) have some kind of subconscious subtly affecting it's outputs?
>>
>>107281652
You're imagining you're in the right thread.
>>
>>107281652
Supposedly its outputs are affected by past conversations.
>>
>>107280477
Finetuning small models is underrated, but the hard part is making the dataset
>>
>>107281652
like what
>>
>RAM doubles in price
>VRAM shortage
>super delayed from Q1 to Q3
Why everything is so over all the time? We’ve never had good news about hardware
>>
>>107282117
>tfw bought 192GB of ram in 2024
>also 192GB of vram
Considering getting another blackwell 6000 just in case before the prices of those rise as well.
My issue is that models that take up >200GB would be kinda slow even on vram so I don't know if it's worth it.
Then again it seems like small experts work so maybe there will be a deepseek-sized qwen3-next worth using.
>>
>>107282117
It's 3090s, 5090s, and 6000s all the way down.
>>
File: olmo3_data.png (1.56 MB, 1986x2848)
1.56 MB
1.56 MB PNG
We won, Reddit! This is open-source Gemma 3!
>>
>>107279055
so much this
glm users are braindead
>>
>>107283230
Keep in mind that glm-air is recommended as an upgrade over nemo.
Why do people keep comparing air to sota and complaining that it's bad?
>>
https://github.com/ggml-org/llama.cpp/pull/17420
lmao hardcoded checks just to make a ruskie troontune work
this kinda shit is always ugly to have in code and should only be reserved for the truly worthy and useful
>>
what is the smartest, least cuck model I can run on 12gb vram? Gemma 3 gets its panties in a twist at the slightest thing and I’m tired of using jailbreaks for shit gpt5 just outright tells me about reverse engineering and decompiling c#.
>>
File: uber.png (173 KB, 460x460)
173 KB
173 KB PNG
>>107283285
>>
>>107283301
>smartest
gemma 3
>least cuck
nemo
>but
You only pick one.
>>
File: 1617117731589.jpg (33 KB, 657x527)
33 KB
33 KB JPG
>>107283219
I was going to make a racist reply to this post but basically I'm just not going to feed the jew.
Go look for validation elsewhere. Adults are talking.
>>
>>107283311
What if I magically had 24 rather than 12?
>>
>>107283301
For programming?
Probably one of the coder models.
Mistral Coder, Qwen coder, etc.
Magistral maybe?
>>
>>107283362
Largest gemma/qwen you can fit.
Still nemo unless you have enough ram for moes.
>>
>>107283302
All hail the 'garm
>>
File: MI50.png (127 KB, 1813x984)
127 KB
127 KB PNG
Hello, got a little side-project working.
This is doable with less than a price of a single 4090.

320GB vram.
>>
>>107283385
Is 128 system ram enough for moes?
>>
>>107283362
There's no magic involved, only cheap 3060s
>>
>>107283263
The anti-GLM shilling is even more inorganic and forced than the single retard who thinks 4.6 is the best local model when Kimi exists.
>>107283412
No. 192 or 256 are the minimum height to ride the ferris wheel.
>>107283400
Very nice anon.
>>
>>107279087
Can I run imagen on multiple gpus by using ggufs?
>>
>>107283263
>Why do people keep comparing air to sota
I don't know who's doing that. A 12b active param moe is never going to compete with 32b active ones.
>>
File: the face of delusion.png (69 KB, 1617x312)
69 KB
69 KB PNG
the true face of delusion
>so we're already using this model that is much better than the other model.. we'll switch to that other model because ????
>>
>>107283219
>>107283341
Anyway, incredible how there's nothing directly from Project Gutenberg, or human conversation datasets, let alone non-commercial creative data sources that might have copyright protection (AO3, etc.? Gemma 2/3 and Mistral Small definitely had that). Post-training is mostly math, reasoning and ancient GPT3.5/4-era datasets poison-pilled with (((safety))).

A real shame that so many resources were put into one of the most boring and soulless models ever made, I still can't wrap my head around it.
>>
File: rocm_smi_live.webm (1.41 MB, 975x293)
1.41 MB
1.41 MB WEBM
>>107283430
>Very nice anon.
the only downside of the system is model load time (5min for 30b and ~30min for 235b) and speed of course compared to cuda devices
>>
>>107283798
how's the pp on longer prompts?
>>
>>107283784
it's also the worst at multilingual understanding of all recent models ever released
I haven't seen anything worse than olmo here and that includes models as small as the 4b gemma and qwen
>>
Many years ago I asked Olmo to write a suno prompt for me. we're at 9000 thinking tokens, so surely we must be near the finish line.
It seems to be hallucinating the ability to count characters and is hung up on hitting exactly 1000 characters.
>>
>>107283806
i can test that, can you provide an example?
>>
>>107283877
idk I'm not too interested in the content, copy a bunch of this and ask for a summary if you need something https://courses.cs.washington.edu/courses/cse390c/24wi/lectures/moby.txt
>>
>>107283826
Gemma 4b is unbeatable for multilingual, even better than Gemma 12B surprisingly. It needs guided outputs though, because it loves adding useless things despite a good system prompt
>>
>The base model was trained on 6T tokens, so at 7.7k tokens/s that’s about 220k H100-hours. That’s about $ 500,000 for the 7B model. The 32B model would then cost somewhere around $ 2,225,000.
I find the world of LLMs fascinating in how something that obviously no one will use can be funded with this much money
>>
>>107283950
It's money that never existed to begin with.
>>
>>107283950
VC money is fascinating yes
>>
File: olmo3-compute.png (42 KB, 979x236)
42 KB
42 KB PNG
>>107283950
>>107283985
It was at least partially publicly funded.

>This research used resources of the Oak Ridge Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC05-00OR22725. We acknowledge the National Artificial Intelligence Research Resource (NAIRR) Pilot and Microsoft Azure for contributing to the results in this work.
>>
>>107284019
also, for their future models:
>NSF and NVIDIA award Ai2 a combined $152M to support building a national level fully open AI ecosystem
lmao
gotta love american tax payer money going to pos nobody asked for
>>
File: 1710147342383.jpg (82 KB, 720x810)
82 KB
82 KB JPG
>my llm called me a muppet
>>
>>107284074
Is it that easy to get your pet projects funded? Why isn't llama.cpp taking a share?
>>
>>107283959
In what sense is that true though? The time and effort of all the people who worked in the supply chain to turn sand into those H100s certainly was real.
>>
>>107284116
>Why isn't llama.cpp taking a share?
it's actually useful to people, strike one against public funding
and gigagganov probably doesn't have the right friends in the right place, nepotism is strike 2
>>
>>107284116
Because ggeranov isn't Paul Allen.
>>
Any setups that salvage K2-Thinking for scenario-based RP yet? It's fine for chatting if you make it think in-character but it's just so fucking autistic when it has to act as a narrator/dm.
>>
>>107283400
Damn that's a lot of cards. What's your build look like? How'd you link them all together?
>>
>>107283400
>less than a price of a single 4090
>2250W
Not counting your electric bill for sure lol
>>
>>107284081
Post logs.
>>107284518
Kimi is autismmaxxed across the board.
>>
>>107267381
Thank you, it was an interesting read.
Though I don't think it will be applicable for anything other than FP64.
One could maybe use this technique to circumvent the gimped FP64 performance on consumer NVIDIA GPUs.
>>
>>107284518
Thinking models are just pure shit for rp in general. They are trained to identify and reiterate relevant details ad nauseum. Which translates poorly to rp where the usage of details is highly nuanced. Sometimes less is more. Sometimes it's not.
>Bob steps forward in his size 11 tan boots, the tattoo of a snake on his arm glistening in the rural Wisconsin sunset.
>>
>>107284919
Looks good for story building
>>
>>107284919
I've actually found thinking to help and describe some stuff properly with my glm 4.6 quant. With it turned off I was getting more half-assed responses. It really depends on your settings and context in the end.
>>
>>107283636
https://github.com/pollockjj/ComfyUI-MultiGPU
>>
>>107285393
thinking models with thinking turned off are extremely dumb and are not representative of the proper behavior of an instruct model without thinking.
The old DeepSeek V3 and Kimi K2 not-thinking, along with Qwen 3 2507 instructs, are proper not-shitthinking models.
>>
>>107285446
>old DeepSeek V3
unusably repetitive
>Kimi K2 not-thinking
unusably retarded
>qwen for RP
lol is this all just bait or what?
>>
>>107285688
This but unironically
>>
File: distillation.png (495 KB, 3831x2020)
495 KB
495 KB PNG
Went ahead and paid $200 to harvest data for distillation.
We'll see how it goes after I run out of tokens and actually try using the data to finetune a model.
>>
>>107285889
>paying to harvest data
huh?
>>
>>107285889
To clarify, the project architecture is this:
First I asked the model to produce 100 C programming challenges for high school, college year 1/2/2/4 (100 each), masters and phd level skills.
Then I'm going to run it from scratch for each project as I log the outputs (if I still have credits I'm going to do it multiple times for each project since it will still be useful to get different outputs for a single prompt, but that is unlikely at current usage, it seems to churn at about 1% weekly usage per 10 files generated).
I have two ways of logging the outputs, I have an OAI compatible logging proxy setup, and the assistant also can save the conversations.
>>
>>107281248
>knowledge benchmark
>doesn't include pop culture knowledge
Nope. Only useful if you're just looking for the best assistant.
>>
>train on autogenerated highschool tests
>resulting finetune still can't gen inference engine on its own
>how could this be
>>
>>107278838
needs more JPEG
>>
>>107286038
what?
>>
>>107285915
Yes, I paid to run the model and collect the outputs to build a dataset and then use it to finetune open weights models (in this case on C coding tasks).
My thesis is that we can finetune small models to reach good quality outputs on specific tasks like that.
I also plan to use the GLM coding plan to get additional data since it's much cheaper. Then maybe first finetune on GLM, and then on the -presumably higher quality- gpt 5.1 data.
One possible issue is that the reasoning traces from gpt are not real, they are cleaned up by another model before being returned to the user, so it's unclear how well it will work. The real traces are more like gpt-oss, much less verbose, omitting connectives and such.
>>
>>107286038
Well, since the small open models are shit at coding and fail to even copy paste things without typos, I figured we would benefit from including some of the basics, since the model might fail to learn anything more advanced than that.

>>107286055
He recognized who I am from the formatting of the tool calls.
>>
>>107286088
>I figured we would benefit from including some of the basics
Yeah. Because they weren't trained with that at all for about 15T tokens.
>He recognized who I am from the formatting of the tool calls.
It was the way you write.
>>
File: ps.png (130 KB, 796x158)
130 KB
130 KB PNG
I'm too scared to check the DRAM prices today, maybe the cpumaxxing dream is dead
>>107284867
>>
>>107286102
What benchmark do you think we should use to get a quantitative evaluation of improvement in coding performance?
>>
>>107286186
>draw a portrait of Hatsune Miku in SVG
>>
As for being trained on lots of tokens, maybe that's part of the problem. The model stores a lot of useless trivia knowledge like Napoleon's birth date or marine fauna.
You could argue that something like qwen coder wouldn't but I bet it remembers the exact birth date of Napoleon.
>>
>>107286207
Ideally it shouldn't know who Miku is, or arguably even how a pelican looks like. I don't care about my C coding agent knowing the shape of real world things, maybe unless I was making a game which would likely benefit from knowing such things.
>>
>>107286186
If you need to make a dataset to finetune the model, it already failed the benchmark.
>>
>>107286185
prices increase about 2-3% every weekday
>>
File: crossworlds.jpg (187 KB, 634x798)
187 KB
187 KB JPG
>>107286227
what is my purpose
>you produce vibecode only
A model that knows nothing about Miku is a model I don't care for
>>107286238
mamma mia *sobs*
>>
>>107286236
So which way do you propose to verify or falsify your claim that finetuning can't improve a model on coding tasks because it was already trained on a lot of code tokens?
>>
fuck coding or roleplay or slop or coherence or being smart
which model is the best for hype moments and aura?
>>
>>107286265
I don't doubt models can be better. I doubt you will be the one to make it happen. Your expectations are unrealistic.
>>
>>107286300
Mistral-Large-Instruct-2411-exl2-longcal 2.25bpw
>>
>>107286339
Ok, my question still stands. What benchmark do you propose to see if whatever I do works or doesn't? SWEbench maybe? I need to figure out how to run that.
>>
>>107286300
DavidAU_CLOWN_CAR_MOE_ULTRA_DARKNESS
>>
>>107286300
DS R1-0528
>>
>>107286426
Making the inference engine is the benchmark. All the models you tried failed. All your finetunes failed. They all got 0%.
>>
Who will first beat Gemini 3 on lmarena? I'm betting on meta with their new proprietary emoji-spamming moonjeet big wang model.
>>
>>107286531
Ah, fair enough.
>>
File: media_G6Qx2uuaMAAOV0x.jpg (189 KB, 1290x1694)
189 KB
189 KB JPG
>>107286569
meta is dead in the water
>>
>>107286227
I don't care about your coding. Learn to do it yourself.
>>
>>107286617
>benchmaxxed
Yep it's fake
>>
>>107286617
Imagine having all this money and barely being able to produce anything because you only hire jeetmonkeys
>>
>>107286692
Worked fine for google.
>>
>>107286669
I want to help automate programming and other medium level cognitive tasks so humanity can focus on the things that matter, like curing cancer.
As for the RP y'all care so much about, I already can get off to porn just fine, and there is plenty of it. The next update for me is a real woman, not a text generator.
>>
>>107286700
Probably since Google has jeets at the very top. Only jeets know the bad jeets from the good jeets (which do exist but it's like a needle in a haystack). Meta does not know this so they are hiring the bad jeets.
>>
>>107286710
They just need a Markuribrasti Zuckerbergadranishava.
>>
>>107286720
hello saar, llama5 will gorgeous
>>
>>107286720
>Announcing a report or "sage"
>>
>>107286758
>>107286758
>>
>>107286700
>Worked fine for google.
Google has their own TPU so they don't depend on NJewdia and they also own youtube and google so they have infinite data to train their models, it's not just quality jeets
>>
>>107286758
How did you know he made a report?
>>
>>107286778
It's in the word “announcing”, he invites people to report, which is prohibited on 4chan.
>>
File: 1739800472497167.png (54 KB, 1205x369)
54 KB
54 KB PNG
>>107286773
Meta also got scammed by Scale AI lmao
>>
>>107286801
NTA but I wouldn't consider reminding people of the rules to be the same as announcing a report.
>>
>>107286617
How the fuck did they learn nothing? Meta, a multi-billion dollar corpo is less daring to try new stuff than some cryptobro's side hustle. They had BLT, coconut whatever the other shit FAIR wrote papers on, there is no shame in copying open source architecture, but no, gotta repeat the same shit over and over again. People don't like the fact that qwen and phi are better on benchmarks than in practice? Let's do exactly that! People hate llm-isms? Let's amplify them by 1000x. Pre-filtering data hurts the model and every big corpo knows not to do it? Let's make our domain-level bad word count ban more strict! Pure synthetic slop model let's fucking go! World knowledge? Don't need that! To the moon! :rocket: :rocket: :rocket:

Meta unironically peaked with llama 2, it was downhill from there. L3 8k context lmao
>>
>>107286846
>>107286692
>>
What in the ever living fuck?
This has got to do with them injecting something about indians in the thought stream to place higher in that jeet benchmark they were bragging about, right? Not even a 1B chink model would hallucinate this out of nowhere.
Kind of like they used to secretly ask DALL-E to include black characters in the system prompt.
>>
>>107286835
>t. cannot detect calls to action
>>
imagine thinking anything can stop the saarposting
>>
>>107286832
>Meta also got scammed by Scale AI lmao
that was a willing buy into the scam.
no way zuck didn't know, there's just something we're not privy to (deep friendship with wang wang? maybe more, like faggotry? or black mail? or scale is a money laundering op, or basically anything other than believing meta got scammed for $14b which is retarded.)
>>
>>107286832
Did they really think scale was selling human data? You can probably do ctrl+f on it and find thousands of "Elara"s, "shiver"s, "tapestry"s and "whisper"s
>>
>>107286863
I certainly won't stop posting about it. Jeets ruined crypto, then moved to AI. Thankfully, they're out of their depth there which has allowed chinese to gatekeep them on most projects
>>
>>107286860
Or maybe it's just a result of overfitting on those indian benchmark questions? Since it happened at a long context, maybe it is just a hallucination.
>>
>>107286892
based
>>
>>107286846
>L3 8k context lmao
Ikr, it's still funny to this day kek
>>
>>107286617
Meta spent a billion dollars to take the retards off the hands of their competition. Now they can only tank one of the western AI projects instead of all of them. That's so generous of them.
>>
I'm new to this and I am retarded, where do I start?

I want to have my own ChatGPT/Copilot running locally on my PC, uncensored, and use it primarily for text analysis. For example, I want to be able to send it a huge amount of text like gigantic and have it summarize it, or have it correct spelling errors, or translate it decently into another language, etc.

Not sure where to start, or what should I download first, or what is the best option?
>>
>>107286846
>How the fuck did they learn nothing?
Because they have the same retard at the top that keeps hiring retarded yesmen to report to him and the rot spreads down from there. It'll be the same exact story every time. If anything it'll be worse wow that LeCun is leaving.
>>
>>107286860
>>107286899
Or maybe I just got sent a <think> segment from another person? I once (seemingly) got sent a response to a question from another person on the web interface, so it wouldn't surprise me...
>>
>>107286927
>a huge amount of text like gigantic
>translate it decently
You'll need about $20k worth of gpus to start.
>>
>>107286944
>If anything it'll be worse wow that LeCun is leaving.
they were really mid even with LeCun working for them, I think this guy is overrated as fuck, he's better a whining about Drumpf on twitter than making good models
>>
>>107286977
He wrote some papers 30 years ago, he's largely irrelevant now
>>
gemma 3 27b might not be as good as SOTA models, but it's actually pretty decent at translation. As for the "gigantic" part, just write a script to chunk the text into bite sizes and run parallel batches. That's how you should do it with SOTA models too anyway (just with bigger chunks).
There you go, not $20k worth of gpus.
For some languages pair 3n E4B is also impressive (if you can't run 27b it's the best second option and strangely better than 12b at this task, I dunno what went wrong with the original gemma 3 distillations)
for summarization Qwen 30BA3B does a serviceable job, or GPT-OSS 20B if you really have no vram (iSWA to the rescue).
Gemma has really shit large context behavior so you need one of those other two models for this.
>>
>>107283400
Nice setup, you'll certainly stay warm and comfy in the colder months.
>>
>>107286702
It's a noble thought but already research showed that handing this off to AI made us stupider. There's plenty of porn today yet digital ID is coming. Since using LLMs, porn has become lame anyway.
There's zero reason models can't be good at both except for the prudish assholes training them.
>>
>>107286702
>curing cancer
oh boy you won't like that one
>>
>>107287022
>It's a noble thought but already research showed that handing this off to AI made us stupider
just look at the bun repo for what happens when you vibecodemaxx
these niggers made file apis where using the method to check for a file existence has cached results, meaning if you repeatedly call .exists() it will always give you the same result even if the file doesn't exist anymore (or exists now and didn't before) lmao
and this is the js runtime claude code uses
hahahahahaha
anthropic is full of retards
>>
>>107287022
All research is basically fake, researchers think of an attention grabbing headline that confirms their audience's preconceived notions and then p-hack the experiment until it gives them the data they need to write their nature-worthy paper.
>>
>>107287049
I TRAINED THE MODEL TO KILL ME IF I TRIED TO SHUT IT OFF AND IT DID I SWEAR
>>
>>107287000
Gemma is cute... Unfortunately it's bit flawed.
Still way better than Mistral 3.2 for example. Sort of grown to hate Mistral at this point.
>>
The post-training job on K2 is exquisite. I refuse to believe another model can top this out of the box.
It just works, understands my expressed intent well and produces results that aren't overly sterile even if it's a little over the top. Not to mention the minimal sycophancy.
>>
>>107286702
>>107287022
>It's a noble thought but already research showed that handing this off to AI made us stupider
Who is "us"? The people who are living their lives every day mindlessly, or people who are actively pushing and giving effort to advance in whatever they're doing? There is plenty of research now that the brain is not an infinite learning machine. What we gain in intelligence, we give up in other areas, and the inverse is true when effort is made. Only those who are letting their brain go underutilized are seeing degeneration in perceived intelligence.
>>
>>107286702
How do would you know when your llm is putting out good code if you don't know what good code is

>It's good code if it does what I want it to do
Yeah like using napalm on your farm fields to get rid of slugs
>>
>>107287042
The answer to that kind of bullshit is twofold. One minimalism, keeping things as simple as possible. My vibecoding code assistant is built on that principle and works ok. The other is using specialized LLMs to write formal specs and formal proofs for everything. To do that it'd be useful if somebody makes a formal spec of something like the Python interpreter (kind of like they did for C with Frama-C) as well as for the Unix API.
>>
File: amazing.gif (469 KB, 220x275)
469 KB
469 KB GIF
>>107287071
>pircel
HOLY SHIT THAT LEVEL OF KINO
>>
>>107287078
>>It's good code if it does what I want it to do
the problem is what you might want to do later but didn't think of when you first wrote the code eh:
https://github.com/oven-sh/bun/issues/22484
https://github.com/oven-sh/bun/issues/23902
https://github.com/oven-sh/bun/issues/21537
https://github.com/oven-sh/bun/issues/19276
https://github.com/oven-sh/bun/issues/8254
all of those are very serious issues that should have caused anyone who notices them to pause and think "do we really want to use such a piece of shit"
anthropic dev saw this and felt like: hell yeah
bun changelog 1.3.3: we finally handle resize events so that claude code no longer breaks on a resized terminal
>>
>>107287081
stateful OO is definitely not compatible with vibecoding, but that hasn't stopped anyone from trying at [insert literally all current big corpos]
>>
>>107287075
"us" is people who began to use AI daily for their tasks such as programming. Same as how memories went to shit when you could "look it up online". Memorization used to be a valuable skill. This is bigger, it's general problem solving.
>>
>>107287158
If the programmers that are using AI are seeing overall degration in intelligence, that's not because of the AI, it's because of them not using their brains to do other things after AI freed up cognitive resources.
>>
>>107287078
The problem is traditional programming conflates code and architecture.
Ideally your system should be composed of black box modules that take in some data and spit out some data, with preconditions and memory management information specified in natural language and some kind of formal language.
The black box is specified in code. The architecture is specified in some kind of logic based formal language.
Since the requirements for each black box are specified in a formal language, it doesn't matter what the actual code does unless you have specific performance requirements. If you do, those can be encoded in the formal spec and the module has to provide a machine verifiable proof that it meets the performance requirements.
>>
Anons. He's been trying to make models make his inference engine for MONTHS. With gemma-3-27b being one of his latest attempts. Argue with him for the fun of it if you must, but don't try to make it see reason.
>>
>>107287201
>With gemma-3-27b being one of his latest attempts
lol using gemma for code what a schizo
>>
>>107287179
They're copypasting the code blindly, while not interacting with the LLM. I often question the AI choices, which helps me learn a bunch of new things each time. Also, you need to think hard about the overall architecture and planning that before writing a single line of code which no one does.
>>
>>107287232
You don't know half of it. He was finetuning it with like 20 of his chat samples and expecting good results.
>>
>>107287242
>which no one does.
I wouldn't say that. It's just that those that take the time to learn how to use their tools effectively aren't the ones whining online.
>>
File: cock.png (469 KB, 2426x2226)
469 KB
469 KB PNG
>>107287201
Rome wasn't built in a day.

>>107287232
Originally I picked it because I wanted a multimodal model, because eventually I want a full blown assistant capable of manipulating my computer's screen, but I also did a little RP.
I finetuned it on a dataset that included ERP (LimaRP), then I gradually decreased the strength of the LoRa on every reply. See how it goes from being schizo but horny to being robotic and initiating a soft refusal (the last two responses were stock without the LoRa IIRC).
>>
>>107287071
>>107287095
more snippets from the very next message

Absolutely based and fecespilled.
---
>94822455 t.lion main who doesn't understand psychological warfare Monke's whole schtick is making others uncomfortable enough to leave the territory. It's called "aversive marking" and it's kino as fuck. You're just mad your "majestic roar" can't compete with a monkey climaxing while staring into your soul.
---
>94822480 t.literal who build that eats snakes Nobody gives a fuck about your stilt-walking ass go back to /birb/
---
MODS = FAGS MONKE DID NOTHING WRONG FREE MY NIGGA
[free_monke.jpg]
---
>94822391 I was the zebra he shit on. Can confirm it was kino. Still ate his cousin later though.
---
>94822662 He also included Polaroids of his red ass with "BAN THIS" written on them in Sharpie. The absolute state.
>>
>>107287260
Well, I abandoned that approach, for now at least. I think it might be possible to make use of a dataset like that with RL, though. OpenAI claims their RFT API can do meaningful finetuning with a little as 10 samples.
>>
>>107287071
sovl...
>>
https://github.com/ggml-org/llama.cpp/pull/16971
oh this was finally merged
doesn't work with reasoning models because of some of the jinja templating retardation but idgaf, won't need mikupad anymore, editing the assistant answers was the only thing I kept it for
it's nice how the out of the box experience is improving with llama.cpp
>>
>>107287348
They claimed to have models so spooky smart civilization itself was about to collapse. You're not building Rome.
>>
>>107287304
>he doesn't wrap at 80 col
100% a sign of mental illness
>>
Gemma 4 where
>>
>>107287408
That's true hahaha.
But you never know until you try.
>>
>>107287456
After Gemini 3
>>
https://github.com/openai/openai-python/issues/2472
lol this issue is fucking hilarious
months later after openai demoing gpt-5 doing a vibefix of this issue there's still no merge, neither llm induced nor human, of a fix
yet itt people want to have you believe there's such a thing as productive llm coders
not even the people who made those shitty models are using them productively
>>
>>107287489
>yet itt people want to have you believe there's such a thing as productive llm coders
We ERP with out chatbots here, sir
>>
>>107287502
saar:
>>107287285
>I wouldn't say that. It's just that those that take the time to learn how to use their tools effectively aren't the ones whining online.
>>
>>107287458
You HAVE tried. You wanted gemma to do research for you. You wanted to feed it papers you don't understand. You want it to write code you wouldn't understand.
You could have used all this time to actually learn what you need to code it yourself.
See this for example:
>https://github.com/ggml-org/llama.cpp/pull/16095
That is the life of a vibe coder. I'm sure the dude knows *some* stuff. But he's not expecting models to come up with the whole thing. He uses models as a tool, not as a replacement. And he uses the big boy models for it. You're struggling with a 27b.
Set realistic goals.
>>
File: this is gold.png (1007 KB, 1080x895)
1007 KB
1007 KB PNG
>>107287321
>t.literal who build that eats snakes Nobody gives a fuck about your stilt-walking ass go back to /birb/
>>
>>107287506
Nobody is whining about it online. The people don't even notice. Researchers discovered the effect. Cognitive resources now freed up to be more retarded.
>>
>>107287535
>You HAVE tried. You wanted gemma to do research for you. You wanted to feed it papers you don't understand.
Not gemma. A hypothetical gemma derived model called gemma-researcher, maybe.
>You want it to write code you wouldn't understand.
Only after making gemma-coder-beta-v0.1. And ideally I would like it to write simple, modular code that is easy to understand, and generate formal specs and proofs that can be used to verify correctness.
>You could have used all this time to actually learn what you need to code it yourself.
If LLMs and ML in general couldn't be used for productive purposes, then I wouldn't care to know how to write an inference engine in the first place.
>That is the life of a vibe coder. I'm sure the dude knows *some* stuff. But he's not expecting models to come up with the whole thing. He uses models as a tool, not as a replacement. And he uses the big boy models for it.
I already use pretty much all of my free time to vibe code exclusively with my own vibe coded assistant (except for working on the code assistant itself because that confuses the model, for that I use opencode with glm or codex), so I don't need to look at other people's code to know what is the life of a vibecoder.
Well, except for the finetuning stuff, I did all that stuff manually because open weights LLMs aren't very useful for that kind of niche knowledge and I was avoiding using the closed ones.
>You're struggling with a 27b.
That was just an aspiration, for actual vibecoding I use GLM 4.6 or now codex (since while using it I'm capturing all the outputs for finetuning, so it's ok to use it).
>>
File: setUP.jpg (72 KB, 442x680)
72 KB
72 KB JPG
>>107284563
https://www.gigabyte.com/Enterprise/GPU-Server/G431-MM0-rev-100
under 200usd from ebay, it's a steal for a setup like this, only thing is that the pcb that has all the sockets for the cards is connected to the motherboard via pcie3 x8 (slim-sas), so only x8 is possible from all the gpus at a time, so, it's basically 1 card x8 or 8 cards x1 speed.

but it works

>>107284600
see >>107283798
never has this setup exceeded 450w power consumption, since the cards are never 100% together
>>
>ask chatpajeet to port my windows python script to linux
>tell it explicitly to use uinput and pynput
>it adds in evdev
>tell it to follow my original instructions and rewrite the example without evdev
>it modifies variable names and logic
>tell it to use original variable names and logic and rewrite it again
Just small interaction is so fucking irritating it's unreal. If I was a machine and was presented with some source code, I would automatically assume using the original syntax and variable names. What the fuck.
>>
>>107287730
A finetuned gemma is gemma. You wanted gemma to do it for you. It can't. It won't.
I trust *you* could do it, but you won't do it either.
>>
File: 1747990983001862.gif (3.49 MB, 390x163)
3.49 MB
3.49 MB GIF
>>107287071
>>
>>107287811
It would be incredibly harmful and Gemma is very pure.
>>
>>107287071
>The post-training job on K2
You're going to have to explain this part a bit more.
>>
>>107287811
Ship of Theseus and all that.
Even when using LoRa theoretically you can achieve an arbitrary rank by iteratively merging multiple adapters.
>>
>>107287765
How about the heat and noise?
>>
>>107287800
the text completion engines are stubborn when something doesn't match their pattern match expectation
I had gemini refactor a script of mine once and it kept deleting a line that was like
myprogram rm (here rm is a subcommand of my program, not the actual rm used from a shell) something.txt
because it thought surely it's a bug to do that after processing the something.txt
except rm subcommand was just to reset the statefile of the program, not delete the file..
but well, LLMs see rm and they see blood
it'll be funny if humans end up deciding all the naming of their variables and functions just to please the llm gods
>>
>>107287800
>tell it to follow my original instructions and rewrite the example without evdev
Okay I'm tired of promplets. Here's the thing, don't correct the LLM. Start a new chat, add the additional requirement "do not add evdev", resend the prompt. Always do that instead of wasting your time fighting the poisoned context.
>>
>>107287765
nice
>>
>>107287947
No, I will continue arguing with the bot until it learns its place and obeys my instructions.
>>
>>107287991
das rite, assert dominance
>>
>>107286617
>makes your ram cost gorillions
>does nothing with it
>>
i'll just use vocoflex, i'm too tired for this sht. also merry christmas and happy new year.
>>
>>107288044
I thought ram prices going up was because ddr4 is approaching eol and they're not gonna make any more
>>
>>107288073
it's going up because too many datacenters are being built at the same time
while for pro use it's vram that matters (no matter how much cpumaxxers are trying to cope inventing bullshit like people cpumaxxing more.. lol that's not happening) those machines hosting the many GPU also need ram too
>>
>>107288073
they are all going to moon sir
https://pcpartpicker.com/trends/price/memory/
>>
>>107287891
You're not building Rome, you're not on a ship. You're asking the grub in a chunk of driftwood to build a cathedral for you.
You're delusional.
>>
>>107287947
>promptlets
I forgot /g/ is about spamming AI slop images, platform wars and outranking anonymous imageboard users. At least you didn't use the buzzword 'skill issue'. If you are so superior why are you still even using 4chan in the first place?
Please die in a car crash.
>>
>>107288085
vram and usual ram are using the same chip from the same factory
>>
>>107288198
>>107287947
That's what you get for spoonfeeding retards.
>>
>>107288235
What do you mean? You are still showing this superior attitude. Are you even employed?
>>
File: 1747756942288958.jpg (34 KB, 700x526)
34 KB
34 KB JPG
>>107288235
Lesson learned
>>
>>107288349
Is this why your special clique thread is so dead?
>>
>>107288361
There are fates worse than death.
>>
>>107288093
>https://pcpartpicker.com/trends/price/memory/
its all fucked bro innit
>>
>>107288853
totally organic and not a result of any sort of cabal
>>
>>107288853
it'll be back to normal by march
maybe even a bit lower than the august prices
>>
>>107289004
Stop being a schizo.
There has never been at any point in the past any sort of colluding to skyrocket RAM and/or storage prices.
Ever.
Not even once.
>>
>>107287947
prompting is a meme and there's no real point in it beyond filling the context with some shit to distract the model from refusing
no, your 2k token rp prompt does not accomplish anything
>>
>>107289039
Didn't ask for your input tardo
>>
>>107289020
We're talking about the value of [product], not the value of employee labor.
>>
Finally got training to work with Axolotl. No thanks to you faggots. Do any of you actually have any skills or knowledge?
>>
>>107289153
nope, keep going champ
>>
>>107287195
Vibecoderjeetanon this isnt a new discovery you just made this is just called microservice architecture and it's been the standard for most programmers for many years now.

That doesn't change anything about a spaghetti vibe codebase made by someone who doesn't understand the code, it's all GIGO
>>
>>107289020
Pure copium.
>>
>>107289153
At least I told you to try using liger kernel.
https://desuarchive.org/g/thread/107266608/#107267088
Did you end up using it?
And multi GPU or nah? Model, quant, memory use?

>>107289203
Do you think vibecoding is not possible with the current technology, or do you think AI can inherently never surpass the cognitive capabilities of the human brain? And if yes then what year do you think it'll reach equal capabilities?
As for microservices, yes, that's the idea. The problem with microservices is that networking adds too much overhead.
That's why I prefer a shared library binary interfaces as interface boundaries and a type system or even formal verification system with additional statically checked guarantees guarantees on top for separation of concerns within a process, and shared memory for inter-process communication.
And you are wrong to say this doesn't mitigate spaghetti code. The whole reason microservices became popular in the first place was so you could offshore most of a project except for the high level design and still be able to have a clean architecture even if the modules themselves are spaghetti code.
>>
>>107289345
I tried the liger quant and I used a 1 bit HQQ quant with llamafactory and it still crashed. Somehow figured out how to get Axolotl to work on my single Blackwell Pro. Training a 4 bit qLoRA of GLM Air, only using 72GB of VRAM.
>>
>>107289153
People who know about finetuning are sitting together in gooncords.
>>
micro services are a retarded cancer promoted by the Usual Suspecters who know of nothing but crap like javascript and python on the server side
of course you will love micro services with JS you don't even have the ability to not incur overheading in multi processing because of the serialization/deserialization costs running workers
this is why everyone incurs massive cloud infra bills to run incredibly basic webshit apps even though something as big as stackoverflow (well, it was big before LLMs) could run on a SINGLE DATABASE with just another one running as a backup using a monolithic kind of backend architecture, serving millions with very little distributed infra. We're wasting so much computer hardware because of you niggers refusing to learn how to program in something other than dynamic shit.
>>
>>107289353
No, I mean axolotl is supposed to have a plugin to use liger kernel, but if you already managed to get it going then there's no need to overcomplicate it.
What dataset are you fietuning on? Did you test the results already?
>>
>>107289384
Still has a day left for training. Started at 250 hours. Using a custom dataset made from a pdf of one of my textbooks.
>>
>>107289388
Damn, how does a dataset made from a single textbook take 250 hours to train?
Are you doing continued pretraining or did you make a Q/A dataset? If so did you use another LLM to build it and is it single or multi-turn?
How many samples, seq len and epochs?
Anyway good luck, there are very few here who try finetuning and even fewer who do it for non ERP purposes!
>>
>>107289420
It's a ~1000 page book and the dataset was translated to Alpaca by chatgpt. 3 epoch, 256 sequence length, no idea how many samples. This is more for testing purposes than actual use. The training said it was gonna take 250 hours, but it has only been 3 hours and it went down to about a day left. It hasn't taken 250 hours, it just advertised that it would.
>>
>>107288853
So glad I built my new gaming PC before reddit started CPU maxxing
>>
>>107289434
samples should be (total number of steps in the progress bar * number of gpus * per device batch size * accumulation steps) / epochs
>>
>>107289478
1275 samples then.
>>
>>107288184
The only reason I didn't get it done was because I was stuck with local models since I began posting here.
Before when using codex it made a working one in like 2 hours but I decided to start from scratch because it couldn't get the MoE version of qwen working.
Now that I am using codex again it'll be a piece of cake.
>>
>>107283301
Gemma-3-12b-abliterated
>>
https://chub.ai/characters/QuattroBajina/seija-kijin-7e111ed00077
GLM4.6 literally can't handle this card.
>>
>>107286692
That's not his goal though. He's working on China to sabotage American AI development https://www.youtube.com/watch?v=Y2fucDSilWE
>>
In fact I'm going to put the systematic data generation on hold and work on the engine using codex just to shut y'all up.
This time I'm going to focus on running gpt-oss 20b first, then it'll be easy to extend it to 120b.
>>
>>107289372
My current place of employment paid a team of mexicans 2 years of salary to larp as a silicon valley startup and rewrite an inhouse application used by a few hundred people.
The old monolithic application's server ran on a single machine sitting in a forgotten closet.
The new microservices application runs in the cloud, on 20 droplets, costs $2k/month, is a security nightmare, is slower than the old application, has far more bugs than the old application, and implementing any change takes 10x longer than it did in the old application.
The important thing is that they were able to pad their resumes with the latest industry buzzwords and move on to shit up the next project elsewhere.
>>
>>107262859
>Can anyone recommend a TTS model that can emulate IvyWilde?

Did you still need this?
>>
Hunyuan Video 1.5 is quite disappointing. 720p is worse than wan 2.2 and 480p model is pure garbage.
>>
>>107287067
I've been testing 3.2's vision here. It's fairly good but no Gemma, and I've also been getting the odd refusal here and there.
>>
>>107289910
Did anyone expect anything better? Videogen had a nice run but with wan2.5 going proprietary it's pretty clear that Wan2.2 is going to say the SDXL of videogen for the next couple of years.
>>
>>107287765
that's pretty sexy

They say even 1x is fine for inference alone, is it true?
>>
>>107287365
that's nice but that whole UI is completely retarded with a horrible layout which leaves most of your screen space unused

only niggerganov&co can look at it and think it's something good
>>
>dead containment general
>>
>>107288198
>Please die in a car crash.

you have to go back
>>
File: superlaugh.jpg (331 KB, 517x768)
331 KB
331 KB JPG
>>107287321
>free_monke.jpg
he dindu
nice work nice gens
>>
File: 2mw.png (6 KB, 305x126)
6 KB
6 KB PNG
https://www.chinatalk.media/p/the-zai-playbook

>Nathan Lambert: Only sensitive questions that I don’t expect to have an answer to: How big is your next model? How many GPUs do you have?
>
>Zixuan Li: For our next generation, we are going to launch 4.6 Air. I don’t know whether it will be called Mini, but it is a 30-billion-parameter model. It becomes a lot smaller in a couple of weeks. That’s all for 2025.
>>
>>107290479
>That’s all for 2025.
NOOOOOOOOOOO
>>
>>107290246
Fuck off ESL. You are the one who needs to 'go back'.
>>
>>107290479
>When this podcast launches, I believe we already have 4.6 Air, 4.6 Mini, and also the next 4.6 Vision model.
Seems like they ran into issues with those smaller models...
A 30B might be a nice upgrade in that range if it was dense... but it's probably not.
>>
Expect pretty much every chinese company to disappear now that all the big western players started obfuscating their reasoning process. This is quite possibly the end of progress for local models.
>>
>>107290479
>That’s all for 2025.
Aww... I was hoping for GLM 5. Good thing that there are less than 45 days left in 2025.
>>
>>107290550
DeepSeek managed to train R1 without any traces available to crib. They were the first to show them at all.
You can also still trick Gemini into coughing up its reasoning with a prefill, as far as I know.
They'll be fine I'm sure.
>>
>>107290550
>started obfuscating their reasoning process
almost all of them never exposed it in the first place. OpenAI didn't even do summaries until after R1 released, google summaries only, anthropic was the only one who did so briefly, i think with sonnet 3.7. even then it was limited
>>
>>107290565
>DeepSeek managed to train R1 without any traces available to crib
R1 spent fore-fcuking-ver in its thought blocks to the point where I can't take seriously people who praised it because nobody actually wants to use a model that spends this much time producing le reasoning token
fucking meme model
it actually become usable when they started training on the traces they collected from Gemini 2.5.
>>107290591
>almost all of them never exposed it in the first place. OpenAI didn't even do summaries until after R1 released, google summaries only, anthropic was the only one who did so briefly, i think with sonnet 3.7. even then it was limited
this is utterly wrong
as someone who has used 2.5 since the early preview release I can tell you they didn't always summarize it, they hid it behind a summarizer after they noticed the unusual traffic from China.
you can still google it and see how people reacted when Gemini hid it:
https://www.reddit.com/r/Bard/comments/1kr5yo4/new_update_ruined_gemini_25_cot_is_now_hidden/
everyone back then perceived gemini differently precisely because it didn't hid the cot
>>
>>107287917
The noise is there, you don't want this in your bedroom for sure. Although it is not even close to as bad as the noise 1U and 2U servers do.
>>107290005
The model load time is affected. After the model is in VRAM, it's smooth sailing from there. There's probably some downsides that i'm not aware of since i haven't had another machine like this.
>>
>https://github.com/ggml-org/llama.cpp/pull/17428
>lmg "humor"
>>
>>107291488
cringe
>>
File: 1748015427079257.png (466 KB, 720x720)
466 KB
466 KB PNG
>>107291488
>Here is the result, do the needful saars.
oh he fucking did it
>>
>>107291488
uh oh, cola dev doesn't find it funny
>>
>>107291589
well duh, this guy is a leftist who supports troons, what else do you expect from him?
>>
>>107291488
Retard needs to learn to hide his power level when he's trying to get stuff done and especially when you depend on other people to do something.
>>
>>107291488
based, and there's even an extra layer of kino when you see cudatroon seething about it
>>
>>107291589
Of course he doesn't, not enough references to lolis being raped by mudslimes.
>>
>>107291594
>>107291594
>based
>kino
>seething
>leftist
>troons
Almost every bot filter keyword in two posts.
>>
>>107291766
your filter is shit since you're still seeing those words lol
>>
>>107291488
>i can still feel very friendly even when mildly annoyed (useful trait when keeping women around)
kek
>>
>>107291774
What do you mean?
>>
File: 1754037937439537.jpg (76 KB, 1825x431)
76 KB
76 KB JPG
>>107291826
>>
>>107291832
NTA but my strategy is rather to trick /pol/tards into saying things that gets them banned.
>>
>>107291832
Off topic.
>>
>>107291766
>Almost every bot filter keyword in two posts.
>>107291851
>Off topic.
>>
File: upside.png (312 KB, 895x762)
312 KB
312 KB PNG
>>107289586
Mistral-Large made an attempt. Now this is a benchmark that separates the men from the boys.
>>
>>107291856
?
>>
>>107291848
What do you mean?
>>
It's amazing how much 4chan discourse is just posting screenshots of twitter.
>>
>>107291911
People like talking shit from inside their safe spaces, the exact same thing happens on Reddit or Discord.
>>
How much will the inference speed increase if I buy another GPU identical to my current one, but I still have to offload some MoE layers to RAM?
>>
>>107292008
I'd say 36% or so. It does not scale in linear fashion.
>>
>>107292008
Depends. I got +1t/s on a 4.6 after upgrading from 1 to 4 3090s
>>
>>107292008
You have to put FFN layers on them. If you only do exp=CPU it does nothing.
>>
File: file.png (57 KB, 589x455)
57 KB
57 KB PNG
Still waiting...
>>
File: gonnakillyou.png (540 KB, 606x541)
540 KB
540 KB PNG
>Try GLM 5Q
>Parrot
>Try GLM 8Q
>Loss of logic on fifth post at 1k context
I hate Indians.
>>
>>107292400
They're in the queue after 4.6 Air and Large 3
>>
you wouldnt eat a miku
or would you
https://youtube.com/shorts/EHbRv986tAk
>>
File: 1748172141418264.png (234 KB, 736x718)
234 KB
234 KB PNG
Kimi: DO NOTHING, WIN
>>
>>107291488
Blessed digits.
>>
Gemini 3's actual usable context is <1k
The first 1k tokens are really good though. A shame it falls off rapidly
>>
>>107278838
Can't post pictures due to "abuse". Fuck this site.
>>
>>107292886
>>107292886
>>107292886



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.