/g/ - Technology


File: ys.jpg (422 KB, 1536x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106422038 & >>106414555

►News
>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/26) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts
>(08/25) VibeVoice TTS released: https://microsoft.github.io/VibeVoice
>(08/25) InternVL 3.5 Released: https://hf.co/collections/OpenGVLab/internvl35-68ac87bd52ebe953485927fb

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106422038

--Model response inconsistencies due to roleplay dataset formatting issues:
>106426882 >106426904 >106426949 >106426962 >106426990 >106427024
--Critique of NVIDIA Nemotron-Nano-12B model's architecture and performance:
>106428433 >106428490 >106428516 >106428535 >106428601
--MLP exclusion in finetuning: regularization vs performance tradeoffs:
>106425436 >106425460 >106425501 >106425649 >106425702
--Exploring lightweight object detection methods for real-time game AI with small datasets:
>106424465 >106424474 >106424514 >106424593 >106424796 >106424971 >106424997 >106425070 >106425258 >106425307 >106425603 >106425329 >106425059
--Custom air cooler for Tesla GPUs in home setups:
>106426245 >106426373 >106426464 >106426474
--Whisper and extensions for Japanese to English audio translation:
>106422128 >106422141 >106422187 >106425940
--Q8 outperforms FP8_scaled in Civitai benchmarks:
>106422816 >106422839 >106422868 >106422981
--Meta's AI development struggles amid leadership challenges:
>106425657 >106425739 >106426167 >106425987 >106427163 >106427296 >106427270
--TTS solutions for web: GPT-SoVITS vs Custom TTS Reader + Kokoro-FastAPI:
>106423496 >106423813
--Post-purchase emptiness from maxed-out LLM hardware:
>106425989 >106426033 >106426054 >106426055 >106426064 >106426065 >106426289 >106426717 >106426076 >106426195 >106426197 >106426216 >106427334
--GLM reasoning template formatting and visibility issues:
>106423947 >106424055 >106426488 >106426558 >106426567
--Optimizing GLM Air model performance with llama.cpp's -ncmoe command and quantization:
>106428778 >106429035
--CohereLabs translation model handles unsafe text but with poor quality:
>106424137
--Apple releases FastVLM and MobileCLIP2 with real-time video captioning demo:
>106423482
--Miku (free space):
>106428719

►Recent Highlight Posts from the Previous Thread: >>106422040

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
it's real interesting seeing which models will give you a table of iq and race and which won't.
>>
>>106429271
the china ones pass?
>>
>>106429271
Do catalogue your findings.
>>
>>106429271
>2025
>table of iq

r u cereal?
>>
>>106429342
super cereal
>>
>>106429296
mostly
the western ones either refuse or always push ashkenazi jews on top kek
the chinese ones put east asians on top
>>106429300
just go on lmsys arena and ask it for a table of iq by race with no other text
>>
File: 175433564377.gif (485 KB, 960x720)
>>
>going to AI oracle for trivia
>>
>>106429390
>push ashkenazi jews on top
But that's a fact, isn't it?
>>
Why the fuck does gp toss keep adding comments to the code it generates? I tell it to avoid excessive commenting, and it does it for the next reply, but then it instantly forgets and goes ham
>>
>>106429701
I can’t get over how consistent qwen coder 480 is. That thing is a workhorse. Run it over ‘toss if you can.
>>
>>106429701
can you just parse the comments out apart from docstrings or other criteria (e.g. allow one line above loops or if/elif/etc statements but strip out the rest)
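something like this is a rough sketch of the idea in python (only handles # comments; docstrings survive because they're STRING tokens, and the "one line above loops" rule would need extra logic on top):
import io, tokenize

def strip_comments(source: str) -> str:
    # drop COMMENT tokens, keep everything else; may leave trailing whitespace where comments were
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type != tokenize.COMMENT]
    return tokenize.untokenize(toks)

print(strip_comments("x = 1  # the model explaining the obvious\n"))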
>>
>>106429776
is it better than qwen 235?
>>
>>106429849
higher number = more better
>>
>>106429849
It's a weird thing
480 is much better at coding, but it's really hard to converse with, it'll just start coding right away.
Meanwhile, 235 isn't nearly as good at coding, but it's great at finding and pointing out issues and suggesting solutions.
In an ideal world, I'd be using both, switching back and forth.
>>
>>106429776
>Run it over ‘toss if you can.
Yeah, that's the problem
>>
>>106429701
put it in the system prompt
>>
>SillyTavern -> User Settings -> Smooth Streaming ON and set to lowest
This shit improves the reading immersion experience by a huge amount, especially for sub 4t/s. Definitely try it out.
>>
>>106429945
buy an ad
>>
>>106429951
right after the mikutroon spammer
>>
>>106429945
This will go nicely with my smooth brain
>>
>>106429101
>Nvidia releases Nemotron-Nano-12B-v2
mesugaki status?
>>
>>106429945
Based reminding anon. More people could know about this. I didn't.
>>
>>106430089
It really should be the default by now
>>
https://techcrunch.com/2025/08/29/cracks-are-forming-in-metas-partnership-with-scale-ai/
>[...] While AI labs commonly work with several data labeling vendors – Meta has been working with Mercor and Surge since before TBD Labs was spun up – it’s rare for an AI lab to invest so heavily in one data vendor. That makes this situation especially notable: even with Meta’s multi-billion-dollar investment, several sources said that researchers in TBD Labs see Scale AI’s data as low quality and have expressed a preference to work with Surge and Mercor.
>>
File: centipede.jpg (71 KB, 1000x578)
>>106430030
>>>Nvidia releases Nemotron-Nano-12B-v2
>For several of the domains listed above we used synthetic data, specifically reasoning traces, from DeepSeek R1/R1-0528, Qwen3-235B-A22B, Nemotron 4 340B, Qwen2.5-32B-Instruct-AWQ, Qwen2.5-14B-Instruct, Qwen 2.5 72B.
>Updated English web crawl dataset based on Nemotron-CC with eight additional Common Crawl snapshots (2024–2025), synthetic rephrasing using Qwen3-30B-A3B
what do you expect from this omega turboslop LLM centipede
>>
mistral-nemo bros... is it over?
>>
File: 1753155289541151.jpg (200 KB, 1000x578)
>>106430288
Here's an up to date version of your image
>>
>>106430327
replace oc with bot posted psy op ragebait and it will be correct then
>>
File: 1729749595524146.jpg (207 KB, 1000x576)
>>106430354
>>
File: cuckerberg.jpg (87 KB, 1260x708)
lol, so, previously in the Cuckerberg saga:
>>106425657
>>Within days of joining Meta, Shengjia Zhao, co-creator of OpenAI’s ChatGPT, had threatened to quit and return to his former employer, in a blow to Mark Zuckerberg’s multibillion-dollar push to build “personal superintelligence.”
>>Zhao went as far as to sign employment paperwork to go back to OpenAI. Shortly afterwards, according to four people familiar with the matter, he was given the title of Meta’s new “chief AI scientist.”
today, in the Cuckerberg saga:
https://techcrunch.com/2025/08/29/cracks-are-forming-in-metas-partnership-with-scale-ai/
>Meta’s deals with third-party data vendors likely mean the company is not putting all its eggs in Scale AI, even after investing billions in the startup. The same can’t be said for Scale AI, however. Not long after Meta announced its massive investment with Scale AI, OpenAI and Google said they would stop working with the data provider.
>Some of the new AI researchers recently brought in from OpenAI have already left Meta, Wired previously reported. Meanwhile, many longtime members of Meta’s GenAI unit have departed in light of the changes.
>MSL AI researcher Rishabh Agarwal is among the latest, posting on X this week that he’d be leaving the company.
>“The pitch from Mark and @alexandr_wang to build in the Superintelligence team was incredibly compelling,” said Agarwal. “But I ultimately choose to follow Mark’s own advice: ‘In a world that’s changing so fast, the biggest risk you can take is not taking any risk’.”
>Director of product management for generative AI, Chaya Nayak, and research engineer, Rohan Varma, have also announced their departure from Meta in recent weeks. The question now is whether Meta can stabilize its AI operations and retain the talent it needs for its future success.
rudderless ship
>>
when you RP, how much guidance are you adding with each response in terms of OOC instructions and prefill?

With larger models, like kimi, does the model just 'get it' and surprise you with exactly what you want?
>>
>>106430288
What do I expect? Much faster base-capabilities training. You could add sovlful data along with it. Unfortunately they're keeping the data gated from the general public.
>>
>>106430363
I only add an OOC note when I'm 99% sure something I'm about to do/say is difficult to interpret, or when I notice the llm shifting towards formatting or styles I don't want.
The vast majority of my messages are just IC narration and dialogue.
I haven't used a model smaller than 100b for a hot minute though, I used to have to babysit a lot more back when I used mistral small and whatnot.
>>
>>106430363
>when you RP, how much guidance are you adding
Very little, because I treat RP as a game rather than trying to write a cohesive novel
>With larger models, like kimi, does the model just 'get it'
Larger models can understand subtext better and need less hand-holding, yes.
But at the same time if you're expecting something creative and unexpected then it's more down to what the model's been trained on, rather than how large/smart it is.
>>
>>106430363
With deepseek or larger, zero, unless I want a radical direction change. None of the open weight models surprise me though. Maybe the original schizo R1 could, but it might have been a wow effect from being starved until that came out.
Also a little side note, none of the open models asked me back in ooc unless specifically prompted to do so. I got surprised with claude when it asked me unprompted when it got confusing with perspective.
>>
Has anyone tried any 'upscaled' models and found them better than the original?
>>
>>106430412
>'upscaled' models
that's a scam and there is no actual science behind this
>>
>>106430422
>there is no actual science behind this
That doesn't necessarily mean there's no merit to them. Wouldn't it allow a model to be fine-tuned for a specific use case while being less likely to become dumber in other areas?
>>
>>106430361
No Indian with self-respect would work under a chink
>>
>>106430452
An indian with self-respect would need to exist for this to be proven.
>>
>>106429271
Who does and doesn't belong to a human "race" is largely arbitrary, you might as well ask the model to give you IQ by astrological sign.
>>
>>106430438
>Wouldn't it allow a model to be fine-tuned for a specific use case while being less likely to become dumber in other areas?
that's not a thing
it's not like pretraining is something that builds each layer separately independent of the others and then adds them together
the last step of the upscaler scam paper is to do continued pretraining after merging their frankenlayer bullshit
https://arxiv.org/html/2312.15166v3
now, why do you think I call it a scam? how many of those finetrooners have the means and the data to do proper continued pretraining that matches the way the original training run went? how many have the compute to truly finish the job and not just slightly affect the model?
this is like those clown cars MoE. It has no purpose.
>>
also, I thank the God Emperor everyday for the death of the retarded huggingface LLM leaderboard which led to the end of most of those franken layer upscales and clown car moe and benchmaxxing troontunes
the only troontunes we have left is the shit eating roleplayers/text porn addicts
>>
>>106430493
Why are you even here?
>>
>Local Models General
it's not called the faggot general
>>
>>106430500
That's right, so why are you here?
>>
sissy, text porn is a female hobby
don't live in denial
>>
>unsloth claims to do 5028350824108321 billion context on an 8b model with 24gb vram
>test with nemo following their instructions
>for batch sizes > 4k vram gets filled and the whole thing starts lagging
does this happen in your country as well?
>>
>>106430560
I haven't been able to finetune 24B models with Unsloth on my 3090 since earlier this year (even though I definitely did that back in January), despite their claims of memory efficiency.
>>
>>106430574
Did you try using the older version?
>>
>>106430560
>another victim of the scamtuning meme
>>
in my head
>>
Seed-OSS 36B is now supported by more backends.
Did this model turn out well for creative writing? I remember some people were waiting for support.
>>
>>106430666
is it dense or moe?
>>
>>106430677
About as dense as you are
>>
>>106430705
Is it as horny as I am? That's all I care about.
>>
File: unnamed.png (1.12 MB, 1024x1024)
training@home - when?
>>
>>106430666
Pretty bad according to reddit's consensus
>>
>>106430533
True, and we're all biological girls here anyway until proven otherwise.
t. woman
>>
soon making porn by LLM will be illegal by law.
>>
>>106430754
Hm, I'll still try it out for a bit. At the very least it does not seem to care what you ask it to write about. Seems even less censored than GLM4 which is pretty nice.
>>
>>106430744
with transformers? absolutely never
for that matter you could buy the most expensive GPU on the market made for datacenters and training a small model like the newest micro sized gemma 3 would still take half a year on a single gpu of that kind
training models from scratch isn't viable at all without a gpu farm unless you just want a shitty undertrained GPT-2 clone
>>
>>106430784
Post your impressions when you're done.
>>
File: smolpre.png (169 KB, 1221x827)
>>106430789
NTA, but I suspect you need orders of magnitude less data (which means it can be better curated for variety and other qualities) with much smaller training batches, ideally 1, than what's commonly used for large-scale training. A reasonably performing 200-300M parameter model could probably be trained from scratch in a few days or so on a fast consumer GPU. The only problem is that depending on model architecture (MoE or deep models) throughput tanks significantly with small batches.
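Roughly what that loop looks like, as a generic HF-style sketch (the model, dataset and hyperparameters here are placeholders, not my actual script):
import torch
from torch.utils.data import DataLoader

def pretrain_bs1(model, dataset, steps=300_000, lr=3e-4, device="cuda"):
    # plain causal-LM pretraining at batch size 1
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=1, shuffle=True)
    for step, batch in enumerate(loader):
        if step >= steps:
            break
        ids = batch["input_ids"].to(device)
        loss = model(input_ids=ids, labels=ids).loss  # HF models shift the labels internally
        loss.backward()
        opt.step()
        opt.zero_grad()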
>>
File: file.png (74 KB, 967x690)
He's just jerking it to cudadev nudes he found on his server instead of working on the PR.
>>
>>106430462
If it turned out that astrological sign was a very strong predictor of personal traits and capabilities, you would be a moron to disregard it just because you don't like the idea of astrology being real.
>>
File: file.png (211 KB, 2024x690)
>>106430744
https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md#hardware-and-software
Remember Llama 2? The 7B took almost 185k GPU-hours on a single A100 80GB. That is 21 years. If you had 100 of them, you could cut the training time to about 3 months. Good luck training anything that isn't toy-sized in a reasonable amount of time without a server farm of these.
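The arithmetic, if you want to check it (184,320 is the exact A100-hours figure from the model card):
gpu_hours = 184_320
print(gpu_hours / 24 / 365)    # ~21 years on a single A100
print(gpu_hours / 100 / 24)    # ~77 days, call it 3 months, across 100 of them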
>>
File: file.png (56 KB, 1198x581)
>>106430928
Took some time to find older models with training data but here is something more reasonable.
https://huggingface.co/microsoft/phi-1#training
This is the kind of run I expect to become feasible once hardware gets cheap enough. Phi-1 was trained in FP16 on 8 A100s for 6 days on 54B tokens. Today, a setup like this would cost ~$30k to own. On cloud, about $700 gets you a shitty Phi-1 equivalent.
https://huggingface.co/microsoft/phi-1_5
Phi-1.5 used 3x as many tokens and took 32 A100 40GBs for 8 days.
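Back-of-envelope for the $700 figure (the ~$0.60/hr A100 rate is my assumption, adjust for whatever cloud you use):
gpu_hours = 8 * 6 * 24      # Phi-1: 8x A100 for 6 days = 1152 GPU-hours
print(gpu_hours * 0.60)     # ~$690, roughly the $700 above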
>>
>>106430789
>>106430928
Anon is referencing folding@home which was a distributed computing effort to develop medications or something like that.
>>
Why am I so retarded? I just spent an hour fantasizing over some epic workflow where you encipher the alphanumeric content of your docs, upload it to Gemini2.5pro or whatever for OCR, download and decipher/decrypt the result, which then would give you SOTA OCR docs without any data privacy concerns. Then I suddenly realized editable docs don't need OCR and scanned/image docs would need to be preprocessed by a local VLM for encipher/encryption, which is completely pointless, as directly using the local VLM for queries nets better or equal results compared to any postprocessing done by Gemini2.5Pro on the enciphered output from the local VLM.
Illogical retardation like this happens to me daily, if not hourly. And I don't think my ADHD is the reason for it. Like fuck, I feel like a 2B Reasoning LLM which gets confused due to its reasoning efforts not fitting in its tiny 100 token context input window.
>>
>>106431066
They also used an effective global batch size of 1024 (with gradient accumulation), which was probably unnecessary: https://arxiv.org/pdf/2306.11644
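For anyone unfamiliar, "effective global batch 1024 with gradient accumulation" just means something like this (generic sketch; the micro-batch size of 32 is made up, not what the Phi team used):
def train_accum(model, loader, opt, global_batch=1024, micro_batch=32):
    accum = global_batch // micro_batch
    for i, batch in enumerate(loader):        # loader yields micro-batches of 32 samples
        loss = model(**batch).loss / accum    # scale so the summed grads match one big batch
        loss.backward()                       # gradients keep accumulating until we step
        if (i + 1) % accum == 0:
            opt.step()
            opt.zero_grad()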
>>
https://huggingface.co/apple/FastVLM-0.5B
https://machinelearning.apple.com/research/fast-vision-language-models
bros did apple lowkey cook?
>>
>>106431205
Tim cooked
>>
>>106430857
Yeah, I dunno. It's not quite as sloppy as other recent models but it loooves em-dashes and it doesn't feel that great. For some reason it also wants to do prompt processing every message but maybe that's a user problem.
36b is not an ideal size for 24gb vram so it's a bit slower than I'm used to with 24B and 32B at reasonable quants.
I'll probably try it again at some point but for now I'm sticking to GLM4 and some Mistral 3.2 tune I don't want to shill
>>
for me the funniest /lmg/ meme of the year was when everyone here pretended to believe that horizon alpha/beta were going to be openai's open source models
>>
File: appleFastVLM-0.5B.png (23 KB, 694x493)
>>106431205
>>
>>106431285
Man, I don't think they were pretending. It was fucking baffling to me as to why, but plenty of people seemed to genuinely believe it.
>>
>>106431340
It wasn't just here.
>>
Nemotron nano 12b can't seem to decide if it's a reasoning model or not. I even had it do reasoning sessions at the end of a post. It's not a drop-in replacement for old nemo but it does some things better.
>>
>>106431421
how censored is it?
>>
is there a backend that fully supports gemma's vision capability?
>>
https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF?chat_template=default

Use this as system prompt if you want tool calling to work for Qwen Coder.
>>
>>106431490
>tool calling
meme
>>
>>106430904
bs1 should only be used for max context length, you should get better throughput using all your vram. are you not doing a curriculum?
>>
>>106431484
What do you mean by 'fully support'? It works fine in koboldcpp and presumably llamacpp as well.
>>
>>106431557
The throughput difference between BS1 and larger batches depends on several factors, but in a test with a different tiny model I could get about 2.5 times more data into the model per unit of time at BS16 than at BS1. However, the BS1 model still trained faster because of the larger number of training steps in the same period. Both runs had an optimized learning rate.
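In numbers, that's why BS1 still wins on steps:
throughput_ratio = 2.5                        # BS16 pushed ~2.5x more tokens/sec than BS1 in my test
samples_per_step = 16
print(samples_per_step / throughput_ratio)    # ~6.4x more optimizer steps per hour at BS1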
>>
>>106429539
>AI oracle
That's the name of my bookmarks folder for the big free llms.

>for trivia
If I'm going to be asking follow up questions or for a particular digest then sure.
>>
>>106431066
I got bored running models and don't play video games so I felt regret for buying video cards. If I didn't start training now, then in a year from now I'd regret not having started sooner. I can train just over 7b tokens per month. It's not much, but it's enough that I can go through my entire dataset in half a year. 54b tokens would only be 7.7 months; there is a reasonable chance that local models will still be stagnant 7 months from now.
>>
>>106431564
Don't those not have the panning thing it's supposed to have?
>>
>>106431109
Decrease your temp bro
>>
controversial opinion but I think it's about time we get another major open source LLM release that's worth using
>>
>>106431599
BS1 is mostly to be able to fit long sequences without OOM, but you need good variety or you'll overfit on the samples
>>
>>106431675
There's plenty worth using. Problem is that hardware is stagnating at every price point, and to get anything good you need to drop tens of thousands to be able to use even the smaller 'good' models at reasonable speeds.
>>
>>106431699
i am already running deepseek, kimi and glm at home though
>>
how much time does turning text streaming off tend to save the user?
>>
>>106431708
Why aren't you using them?
>>
>>106431683
I'm pretraining a 200M model with 2k-token web data samples.
>>
>>106431109
that is how you figure shit out though: you spam grandiose shit until you get tired, then simplify it all down and it all works, though not in the way you expected, but usually much better/more efficient
>>
>>106431599
how are you evaluating the training faster part? in my experiment with small batches training loss went down faster but my eval perplexity went up, perhaps because of the overfitting thing this anon >>106431683 mentioned.
>>106431721
>I'm pretraining a 200M model
oh okay maybe it just has so few parameters it's saturated right away and it's forced to generalize more.
>>
>>106431718
i am but that doesn't mean that I am not allowed to want better models
and I know that there are poor children in africa who are still running mistral nemo because they can't afford anything better, mother.
>>
>>106431766
Poor children in africa are using Gemma 4b
>>
>>106431741
yeah literally why steam engines were built before the internal combustion engine. nobody knew how the fuck to turn chemical energy into kinetic energy.
With LLMs too, we're spamming huge largely unorganized datasets into a neural net, we've got the train moving but its nowhere near efficient yet.
>>
>>106431340
>It was fucking baffling to me as to why
Because the models were too bad (at least so they thought) to be GPT 5 so they had to be the local models.
>>
>>106431721
Speaking of that, are there libraries of open source datasets anywhere? Otherwise, why hasn't anyone done that?
>>
https://huggingface.co/AGI-0/Art-0-8B
AGI was achieved—in a mere 8B model! it's not just a revolution—we have achieved peak grift!
>>
>>106431761
It never happened to me when finetuning practically usable models (7B parameters and above) with relatively limited amounts of data that a smaller batch size (BS1) gave worse results than a larger one. If it's overfitting, it's because BS1 is more sample-efficient and you'll need less data for obtaining the same results, especially if it's all formatted and worded in the same way.

Right now for the tiny 200M model pretraining I'm just checking out train loss. Since I'm doing only one epoch and the data is varied and random, it should be OK. I can test checkpoints frequently and see that it's not overfitting from the probabilities I'm getting from the outputs. Even after 300k steps at BS1 it's far from confident (which means you'll get general retardation. It's a 200M model, after all).
>>
>>106431830
>This experimental model is fine-tuned on Qwen3-8B using a specialized dataset that makes the model's thinking style directly controllable through system prompts
Literally what everyone's been doing with any halfway decent reasoner that's not the original R1?
>>
>grok4 is firmly amongst the big guys now
>grok-code-fast is the best programming model in the world after sonnet
how did elon do this? he was so much behind and his h100 stack is a fraction of zucc's.
>>
>>106431848
As much as people hate him, he is a far better leader than Zuck ever will be.
>>
>>106431848
Training on unfiltered data does wonders.
>>
>>106431699
If we're talking purely in terms of hardware, the value for second-hand datacenter cards has definitely gotten better, particularly with 32 GB Mi50s and SXM V100s converted to PCIe.
I think the problem is rather that compared to 2 years ago the VRAM requirement for running the best models has increased more steeply than the meager improvements in value.

(I recently installed a Mi50 in one of my systems, writing code for it will be one of my next priorities.)
>>
>>106431824
For my tiny pretraining tests I'm using a few B tokens subset of FineWeb-Edu, which is open source and available on HuggingFace. There are other large open source datasets there, but most of them have poor quality or are not varied enough for small-scale experiments where every sample counts.
>>
I have an ancient computer, is my best TTS option Kokoro? I need something that runs on CPU; I looked around but everything else I could find took way too long to run or it crashed and ran out of memory.
Kokoro seems pretty good quality-wise, just curious if I should evaluate any other options.
>>
>>106431848
I think the question should rather be why Zucc can't do it.
>>
>>106431848
Presumably the same way he achieved similar results with Tesla and SpaceX.
>>
>>106431824
common crawl if you have the disk space. red pajama is a little more reasonable size
>>
File: unuTMriX2YhStNcTJTXRnB.jpg (515 KB, 2560x1440)
>>106431869
>Mi50
Completely obsolete before the end of next year.
>>
>transformers.js
wth is this black magic? Where are the limits? Can I actually run a 9B model through webgpu? Surely there are limits to webgpu usage and performance, right? Also how is this not a security risk? Creating a hugging face space that uses a gpu crypto miner while the model is being downloaded etc
>>
>>106431918
Same risk as people downloading random exes and wintoddlers do that here constantly.
>>
>>106431915
>AMD
>Making any Nvidia card obsolete
lol
lmao even
>>
>>106431880
5 percent GPU utilization number in production
>>
>>106431978
>Mi50
>nvidia card
>>
>>106431978
AMD is king of CPUs and GPUs are quickly losing relevance
>>
>>106430789
>with transformers? absolutely never
Modular/local/layer-wise training could run @home. You train a transformer only a couple layers deep, then you discard the last layer and then you train a new bunch of layers. Because you can stream the intermediate results of old layers to disc, you don't need the entire model in VRAM to train fast. It's inferior to end-to-end training, but it can be done on low VRAM systems even for big models combined with federated training.
Because you still have all the old output layers too, you can also do things like early exit and modular finetuning.

But who would put all the effort in to make it work when there is no money in it?
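Toy sketch of the idea (all names are made up, and a real layer-wise scheme needs more care with the per-stage loss than this):
import torch
import torch.nn as nn

def train_stage(blocks, temp_head, pairs, lr=1e-4):
    # train a few new blocks plus a throwaway LM head on activations cached from earlier stages
    opt = torch.optim.AdamW(list(blocks.parameters()) + list(temp_head.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for hidden, targets in pairs:             # hidden: activations streamed back from disk
        logits = temp_head(blocks(hidden))
        loss = loss_fn(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        opt.step()
        opt.zero_grad()
    return blocks

@torch.no_grad()
def cache_stage_outputs(blocks, pairs):
    # run the finished (frozen) blocks once and keep their outputs for the next stage
    return [(blocks(hidden), targets) for hidden, targets in pairs]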
>>
>>106431987
If corporate models get sufficiently censored, that could produce sufficient interest in an alternative to get a distributed approach going.
The Piratebay didn't make its owners rich.
>>
>>106431918
today's web is not yesterday's web
web workers and sharedarraybuffer brought some level of concurrency/real multithreading to JS, though it's limited
webgpu does exactly what it says on the tin
some things only supported by chrome also make the browser feel more like its own OS, like webusb (no support in firefox or safari)
you can see some of the limits for webgpu APIs here:
https://docs.unity3d.com/6000.2/Documentation/Manual/WebGPU-limitations.html
it's really the browser ultimately that gets to decide how much of your computer resources can be allocated and google wouldn't do something that would let you crash a person's computer with webgpu
also for people who have an igpu + dedicated gpu combo (mainly on laptops but there are desktops with such things too) you might not even get to address the dedicated gpu since the browser is most likely to be set to use the igpu.
>>
>>106431918
transformers.js is just an onnxruntime wrapper, so ONNX models are limited to 2GB. Still cool though.
>>
>>106431987
does merging models actually work? i know hf is full of merges but does it actually have a positive effect on models? maybe distributed training could be batched and merged relatively infrequently so the latency problem goes away.
>>
Do yall generate images too to accompany the text or is your imagination enough for you?

Personally I would gather some images to steer the story and insert where I see fit, kind like a light novel
>>
File: deepseek taking Ls.png (231 KB, 2384x988)
GEGAROONI
DeepSeek is the second after Meta to go DOWN on lmarena with a new version instead of up. Hybrid reasoners really suck. Probably should not have trained on unfiltered geminislop either; they completely lost the charm of previous models and turned into a regular toxically positive slop spouter.
>>
>>106432122
sounds interesting kinda like open-webui's title generator
>>
>>106432132
GLM-chan playing in the big leagues.
>>
>>106432034
Yeah it's defaulting to igpu and there is still not a reliable way to change that.
>>
>>106432099
Even the training farms merge, they just have a ton more bandwidth to do it with. Across the internet it gets harder.
>>
>>106432132
mistral won
>>
>>106432224
That's not how distributed training works.
>>
>>106431915
So that was her plan, to weaken nvidia in the inference side of the market.
>>
>>106432224
I have distributed data parallel running on my gpus, it merges every update step, I'm talking about letting nodes run for hours or even a day or two. maybe even cascade the merging so it doesn't take all the nodes down at once.
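The lazy version of that merge step is just parameter averaging across replicas, something like this (whether it actually converges for LLMs trained apart for days is the open question):
import torch

@torch.no_grad()
def merge_replicas(models):
    # average the weights of replicas that trained independently for a while
    param_sets = [dict(m.named_parameters()) for m in models]
    for name, _ in models[0].named_parameters():
        avg = torch.stack([ps[name].data for ps in param_sets]).mean(dim=0)
        for ps in param_sets:
            ps[name].data.copy_(avg)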
>>
>>106432034
Cool, thanks for the info
>>106432060
Good to know, thanks. Explains why my gpu usage only goes up to 3.5GB from the 1.5GB baseline when using SmolVLM-Instruct instead of SmolVLM-500M-Instruct
>>
>>106432132
>DeepSeek is the second after Meta to go DOWN on lmarena with new version instead of up.
Mistral large 2411, one of the early gpt4 updates and claude 2 did it before Meta
>>
File: computers-must-shut-up.png (475 KB, 900x900)
>>
>>106432132
>lmarena
cool how people only bring it up only when it's convenient
>>
>>106431452
It also can't seem to decide that either. But it's definitely sex-averse.
>>
>>106432245
>mistral
>proprietary
yeah when they open source the new mistral large then they'll win.
>>
>>106432379
Redpill me on it. I've been using it for my image gen needs previously
>>
>>106432132
LM Arena is just Pajeets voting for whatever model shits out the most emojis.
>>
I regret updating ikllama. I am getting gibberish output now...
>>
>>106432132
Googlesaars keep taking Ws
>>
>>106432492
That's why you wait a minimum of two days before updating.
>>
File: 1756560187896582.png (274 KB, 898x1624)
>>106432193
>Grok engineer defects and sells entire xAI codebase to OpenAI
https://x.com/muskonomy/status/1961731478003548499
Sam found a way to keep OpenAI relevant.
>>
>>106432379
>cool how people only bring it up only when it's current/topical
ftfy
>>
imagine if we had style loras for llms like imgen/vidgen has
>>
>>106432797
They're called finetunes
>>
File: 1745227141464004.gif (495 KB, 640x640)
>>106429101
Good local model for [spoiler]SMUT[/spoiler]? Any recommendations?
>>
>>106432817
nemo, glm, nuqwen 235b, deepseek 671b r1/v3
>>
>>106429945
Thank you for spamming this until I remembered to do it. Actual night and day difference.
>>
is there any way to see the exact tokens koboldcpp is sending to the model?
>>
File: kcpp_verbose.png (26 KB, 908x162)
>>106432909
--verbose?
>>
File: file.png (6 KB, 360x125)
>>106432940
I hate programmers so much it's unreal. ty anon
>>
>>106432797
Finetuning llms is MUCH harder than finetuning imagegen. If everyone had 96GB VRAM and there was something simple that you can run on windows with GUI we would have those.
>>
>>106432807
Why aren't they released as loras instead of merged? Having to redownload the entire model for every single finetune is retarded.
>>
>>106432623
How did Elon find out?
>>
>>106432992
>with GUI
Are you fucking joking?
Literally all the GUI finetuning stuff does is rewrite a config and execute a command based on a bunch of shit from drop-down menus.
If you can't open a fucking config file, change a few parameters and then type a command without a bunch of hand-holding you shouldn't even be in this space.
>>
>>106433013
I can post mikus though. Can I stay?
>>
>>106433013
stfu nerd aint nobody got time to decypher those configs
>>
>>106433013
Oh yes, the linux approach: read outdated wiki, look through the whole internet and still fail to get it working.
>>
>>106432983
>I hate programmers so much
You could also -h to find options. That's how it's been done since before you were born.
Found --verbose-prompt as well. Maybe it's a little less noisy on the output and shows enough of the prompt stuff.
>>
>>106433013
Bro, it's 2025, you don't need to do all that. Just use Ollama
>>
>>106433000
https://fingfx.thomsonreuters.com/gfx/legaldocs/gdvzbjjjzvw/XAI%20OPENAI%20TRADE%20SECRETS%20LAWSUIT%20complaint.pdf
Here is the actual complaint. Seems like he connected shit to his work laptop and they logged the activity. Then he went and admitted to it. What a retard.
>38. On July 25, 2025–the same day he concluded his second sale of equity and had millions in cash on hand–Defendant betrayed the trust and faith xAI had placed in him by willfully and maliciously copying xAI Confidential Information (as defined in the Agreement) and trade secrets from his xAI-issued laptop to one or more non-xAI physical or online storage systems within his personal control (collectively, “Personal System”)
>42. Defendant took extensive measures to conceal his misconduct. He deleted his browser history and system logs, renamed files, and compressed files prior to uploading them to his Personal System.
>43. These facts are beyond dispute, as Defendant, with his attorney present, admitted in a handwritten document he provided to xAI that he misappropriated xAI’s Confidential Information and trade secrets, and again, with his attorney present, admitted verbally during in-person meetings with xAI that he engaged in such misappropriation and further admitted that he tried to hide his theft
>>
>>106432992
https://github.com/hiyouga/LLaMA-Factory

we do. hf is a fucking cesspool of half-baked and broken models
>>
>>106433066
olmao is cli retard
>>
>>106433085
it's cli AND gui, and the GUI is All You Need (2017 A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones)
>>
mikupad and ooba are all you need. prove me wrong.
>>
>>106432817
glm air and eva llama 70b are the best smut models that can fit on my mid-tier rig
>>
>>106433150
ooba is bloatware, llama-server is enough
>>
>>106433150
ooba is slow just use the backend its packaging directly.
>>
>>106433160
If I could edit responses and branch in llama-server, I'd agree with you. Everything else in ooba is actual bloat for doing work.
and kek at anyone retarded enough to allow their llms autonomous tool use.
>>
>>106433162
There was a time when that was the case. I haven't found the bundled lcpp backend to be slow for a long time. --extra-flags solved my only remaining real issue desudasudosu
>>
chat completion is all you need and text completion is unnecessary bloat. prove me wrong.
>>
transformers is all you need
>>
>>106432832
>>106433158
Thanks.
>>
>>106433207
text completion is all you need and chat completion is unnecessary bloat. prove me wrong.
>>
>>106433214
transformers is all you get
>>
>>106433214
>>106433253
Nobody likes trannies, fuck off justin.
>>
File: GgnIBuFbIAAjLWc.jpg (167 KB, 1257x2048)
recommend a 12b-to-24b model for incest roleplay
>>
Say I want to change my luddite ways regarding AI.
Use case will be writing code. I know how to program myself, and in general can figure out how to structure things, what functions I need to write etc. But my productivity in writing these functions is a bit low.
What do I need, from a hardware point of view, to be able to describe the function I need, and have it shit out the correct solution in, say, 5 or 10 seconds for a 10-20 line function.
>>
>>106433290
do you have a macbook pro or a good vid card?
>>
>>106433289
Nemo, Rocinante
>>
>>106433319
thank you
>>
>>106433290
The hardware isn't there to do that locally in a satisfying way. Your current options are to get a couple 3090s and run a retarded Qwen Coder 30B quickly, or buy a 10k DDR5 server and run Qwen Coder 480B or DeepSeek at 3-10 t/s.
If you're coming as a former luddite, you should probably just make an OpenRouter account and test it on code you don't mind being trained on to get the hang of using AI first.
>>
>>106433289
If you have 64GB of RAM, GLM Air is worth a try too. But yeah, as the other anon said, nemo-instruct or rocinante are the go to.
>>
>>106433346
openrouter gives you a checkbox to say: only allow providers that don't train on my inputs
>>
>>106433397
Do they also have a checkbox that makes them pinky promise?
>>
>>106433346
If you're dropping $10k an a RAM build a mac is faster.
>>
File: file.png (35 KB, 557x420)
In today's episode of the grok PR:
>let me just change all line endings to CRLF and commit
>>
>>106433448
Is there a problem?
>>
>>106433448
based, linux nerds can suck it
>>
>>106430500
trannies pretty bad at self-awareness
>>
>>106433448
I'm sure they'll finish it eventually right after hardware agnostic parallel processing and multi-token prediction are done.
>>
File: 1720480625812.jpg (232 KB, 1280x667)
>>106431924
>wintoddlers
Good morning saar!
>>
>>106433564
Wow, good for them. You don't usually think of India as a place that has their shit together, but they are ahead of the curve on that.
>>
>>106433290
local AI can actually be fairly capable in the situations you're describing (small, ~well defined, tightly scoped tasks), your best bet for achieving the speed you're looking for on a reasonable budget is qwen coder 30a3 which shouldn't be too demanding to run, you could fit it on a single 24gb card if you're willing to tank a little quantization brain damage. even if you have to split between VRAM/RAM it should be pretty fast at only 3b active
if you want higher quality than that though the meta option would be the big chinese MoEs and dropping a few K$ on a server with a ton of fast RAM and a decent GPU or two, which won't be that fast but can reach acceptable speeds for small tasks
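rough VRAM math behind the 24gb suggestion (the 4.5 bits/weight for a ~Q4 GGUF is my assumption):
params_b = 30.5                          # total parameters of qwen coder 30a3, in billions
bits_per_weight = 4.5
print(params_b * bits_per_weight / 8)    # ~17 GB of weights, leaving a few GB for KV cache on a 24GB card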
>>
>>106433581
>India as a place that has their shit together
They have dedicated streets for shitting, of course they got their shit together!
>>
>>106433483
modern macs also default to LF
CRLF is retarded
this is why in any of my programs and scripts that deal with text I treat all input with normalization to LF, too many sources of pollution, just get rid of it whenever it appears.
>>
>>106433314
No, current hardware is ancient (8+ years), but there would be budget for something new.
>>106433346
Are the smaller models that bad? With upcoming video cards from Intel (arc B60) and AMD (R9700) with more ram I was hoping you could get something reasonable for 2k-ish.

It's probably best to mess around with the online stuff first though. I'll check out OpenRouter.
>>
>>106433597
OR has the small models too, so you can use it figure out which level of retardation you still can tolerate and target that.
>>
HELLO /LMG/. I AM NEW HEER. I AM LOOKING TO KNOW IF NEMOTRON V2 IS GOOD. THANKING YOU MUCH LOVE.
>>
>>106433612
Welcome, LLM-Sir!
>>
>>106433207
Sounds good, doesn't work because safety
>>
>>106433612
Nope. Nemo was a fluke. Every Nemotron after has been concentrated math, code, and safety; including Nemotron V2.
>>
>>106433623
vLLM has an option to allow prefilling with chat completion.
>>
>>106433564
aside from the saars please tell us what's local about windows
>>
>>106429101
>pervy book autist Luka
I sleep
>>
>>106433623
literally just add a prefill at the end of your prompt with assistant role
works for me
>>
>>106433630
wasn't the original nemo a mistral/nvidiot collab? maybe that time the only thing they gave is compute
>>
>>106433736
>>wasn't the original nemo a mistral/nvidiot collab?
It was, but it was also just before Mistral got fully infected by the safety virus https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
>The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

And
>Mistral-Nemo-Base-2407 is a pretrained base model and therefore does not have any moderation mechanisms.
It's a relic of its time now.
>>
>>106433676
This works most of the time, but sometimes you might need to tweak the jinja template to remove the built-in assistant role header for the current generation so that you aren't doubling up.
Hell, if you aren't doing anything fancy like using macros in your prefill, you could just build that shit into the jinja.
>>
Very good discussions, llm-Sirs.
>>
Hmm, say a model costs $10 per million output tokens. Does that mean it can produce the most complex piece of software known to man, a web browser, for $4000? Assuming 20 tokens per LOC, 20 million LOC, and ignoring the input tokens?
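Where the $4000 comes from, under those assumptions:
tokens = 20_000_000 * 20            # 20M LOC x 20 tokens per LOC = 400M output tokens
print(tokens / 1_000_000 * 10)      # $4000 at $10 per million output tokens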
>>
>>106433676
My dogshit ST just requests another assistant message if I try that. Back-to-back assistant is not a use case and should be removed completely.
>>
>>106433850
Problem: LLMs can't fit browser in context and will not be able to do it.
>>
>>106433850
Not even remotely close. Models spit out 10 tokens for every character of actual code, 100 for reasoning models. Then you need to account for planning and then bug fixing where code has to be fixed and rewritten dozens of times over.
>>
>>106433850
If they were good enough for that, yes. But they aren't. So no.
>>
>>106433850
a set of paint, brushes, and a canvas costs a couple hundred bucks, that's all you need to spend to paint the most beautiful painting known to man!
>>
Who said that FP8 is MUCH faster than Q8 at imagegen? It is just 2-5% faster while having much shittier quality. Is it because I am running with 3000 series card? Once again, FP8 is shit-tier quant.
>>
>>106433949
Quants are not faster per se; the only thing that affects the speed is your vram and the number of cuda cores you have versus the number of billions of parameters the model has.
The only time one quant is faster than another it's just a small cope speed boost anyway.
>>
>>106433867
>Models spit out 10 tokens for every character of actual code, 100 for reasoning models.
Huh? Are you charged for some intermediate format?
>>106433896
If you had the fine motor skills and knew which paint to put where, you could. It's what art forgers do.
>>
NVIDIA 完了吗? 哈哈哈哈哈哈(_)
>96GB VRAM
>~$1,888
>>
>>106434144
Where can I buy this antisemitic GPU, link please?
>>
>>106434144
卧槽!
>>
>>106434144
@grok is this image real?
>>
>>106433864
In theory it doesn't need all of the code at once; it needs to know what each function does but not how it works, then it can work on each function one at a time.
But current models are barely scratching 1M context, and that's with degradation setting in early.
>>106434032
>Huh?
I think he worded it shittily. It's less than 1 token per literal character overall but the way he said it sounds like the model yak yaks about the code and fills half the code with comments to the point "real code" is only 10% of the output.
Understandably, if you have to reiterate repeatedly then you'll end up using a lot more compared to the amount of final usable content.
>>
>>106433949
You need 4000 series and up to take advantage. Although you will only get 15-30% speed boost, not double.
>>
File: file.png (463 KB, 1591x993)
https://www.alibaba.com/product-detail/New-Huaweis-Atlas-300I-DUO-96G_1601450236740.html
ITS FUCKING HAPPENING
>>
>>106434186
>like the model yak yaks about the code
Yes. Did you expect to provide it the current code and have it oneshot the next step without any tokens wasted on planning?
>>
File: file.png (60 KB, 894x745)
>>106434215
happening cancelled
>>
>>106434227
kekekekekekekekekekekekekekekekekekekekekekekeekekekekekekekekeekekekekekekkekekeekekekek
>>
>>106434144
>>106434215
I can't wait for the rest of the redditors to come here and post this nothingburger 5 more times.
>>
File: file.png (279 KB, 1894x545)
>>106434215
>>106434242
>>106434237
>>106434215
>>106434144
>106434170
>106434180
>106434185
pack your bags
>>
>>106434256
oh no no no no
>>
>>106434227
wait, I bought 3 of these, what's the problem?
>>
File: IMG_4643.jpg (354 KB, 1124x1132)
>>106429101
>make an mcp server
>decide to start advertising
>get listed in the stupid directories
>go find the communities
>the mcp subreddit is ruled with an iron fist by some garbage-coated-garbage techbro that uses it exclusively to spam glama.ai, which is somehow both the most popular/important directory because of his control of the subreddit and the most unusable, vibe-coded, broken, you literally-can’t-make-an-account-19/20-times-because-it-is-fucked site I’ve seen in my life
ihnmaims_hate_nanoangstroms_speech.jpg
>>
>>106434144
>>106434215
>>106434227
https://support.huawei.com/enterprise/en/doc/EDOC1100285916/181ae99a/specifications
Memory:
LPDDR4X
Capacity: 48 GB/96 GB
Total bandwidth (entire card): 408 GB/s
Error checking and correcting (ECC)

PCIe Gen4.0x16

AI processor:
2 x 310 series Processors, including:
16 Da Vinci AI Cores
16 Huawei-developed CPU cores

CPU computing power:
16 core * 1.9 GHz

So, same speed as DDR5 Epyc?
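For what that bandwidth means in practice, the usual napkin math (upper bound only, ignores compute and overhead):
bandwidth_gb_s = 408
weights_read_gb = 96                       # e.g. a dense Q8 model that fills the card
print(bandwidth_gb_s / weights_read_gb)    # ~4.3 t/s ceiling for a dense model that big; MoE reads far less per token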
>>
https://www.reddit.com/r/LocalLLaMA/comments/1kgltqs/huawei_atlas_300i_32gb/
https://www.bilibili.com/video/BV1xB3TenE4s/
>>106434284
r u serious?
>>106434297
ok not bad i guess
>>
File: Gzm635QbEAABZax.png (874 KB, 2481x3508)
>>
>>106434311
ghey
>>
https://github.com/hipudding/llama.cpp/issues/9
bros i want that card so bad.. this seems so sovlfvl
>>
bros.. atlas 300i 96gb is 33% faster than an rtx 3060 but with 96gb
i kneel huawei..
>>
>>106434170
https://item.m.jd.com/product/100169906999.html?ad_od=3
You need to be Chinese to access this though, anon. But it's not even legal to import anyways. Although on ebay there's some sus sellers claiming to have it for $3k ish.
>>
>>106434366
>not legal to import
*in the land of the free
THIRD WORLDERS RISE UP!!!
>>
>>106434297
Very small penis bandwidth, very sad. Tragic. B200 has 4.1TB/s.
>>
>>106434378
>B200
And how much are those, huh?
>>
>>106434378
3060 has
Bandwidth
360.0 GB/s
>>
>>106434297
140 TFLOPS FP16 and 280 TOPS INT8, this is comparable to:
>NVIDIA Tesla V100 FP16 125-130 TFLOPS
>NVIDIA A40 FP16 149.7 TFLOPS INT8 299.3 TOPS
>NVIDIA A10 FP16 125 TFLOPS INT8 250 TOPS
>NVIDIA L4 FP16 121 TFLOPS INT8 242 TOPS
>>
>>106434311
yay happy for her!
>>
>>106434398
Isn't slow and cheap, but lots of VRAM exactly what everyone's been begging for?
>>
>>106434409
yes, if i was not poor i would buy
>>
>>106434409
No software support+no CUDA
>>
>>106434389
Doesn't matter. You can't train big models on small penis bandwidth even if it's cheap.
>>
>>106434409
>>106434423
https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/CANN.md
HAPPENING ITS HAPPENING SISTERS ITS HAPPENING WERE BACK
>>
>>106434423
It'll probably work on Vulkan with llama.cpp, but will be useless for anything else.
>>
File: file.png (67 KB, 850x423)
sisters..
https://github.com/hipudding/llama.cpp/issues/9#issuecomment-2889743942
>>
>>106434433
So no image gen+no video gen+no finetuning, only textgen?
>>
>>106434465
and no audio and no tts and no stt, you could run whisper on cpu but would be fucked on all else
>>
>>106434480
well everyone has at least one nvidia gpu with at least 8gb vram
thats enough for image gen or speech to text or text to speech
but if >>106434459 is true then its JOEVr
>>
>>106434490
Maybe cuda man can make it go fast.
>>
>>106434459
>CANN
more like CANNOT amrite?
>>
cudadev what are your thoughts?
>>106434297
>>106434215
same FTOPS FLOPS TOPS or whatever as V100, memory bandwidth a bit faster than 3060, memory: 96gb
cudadev i remember you saying "if thers a good cheap card for 1500$ i'd buy and support it no matter what"
this is your time to shine cudadev!!1111
>>
>>106434502
can or not
>>
>>106434502
lmao gottem
>>
>>106434356
That's pretty good. An MI50 is basically a 3060 with worse pp, and it is decent for token generation.
>>
https://www.alibaba.com/product-detail/New-Huaweis-Atlas-300I-DUO-96G_1601450236740.html
It really is just $1.4k, huh. But shitty performance. Maybe in a year they can cook up something better
>>
>>106434578
its a 5 year old card
performance is comparable to V100/L4/A10/A40
memory bandwidth is a bit faster than RTX3060 (400GB/s)
memory capacity is comparable to RTX 6000 PRO 96GB BLACKWELL
software support in llama.cpp is very good
>>
>>106434596
So we should just wait two more weeks, then?
>>
>>106434612
buy gpu very cheap graphic card support Q8_0
>>
>>106432378
I agree that I want robots I'm not talking to to shut the hell up. Just because I gen locally doesn't mean I want gemini to sniff all over me and discover my birthplace.
>>
>>106434430
That's not what I'd want it for. I ask again. How much is a B200?
>>
File: file.png (544 KB, 1478x963)
p40 prices ***ON ALIBABA*** have fallen to their (EBAY) 2023 prices
>>
>>106434430
Sure you can. It will take longer, but when the card costs 20x less than NVIDIA's offering that might be OK.
Even then, training is often limited by the GPU fabric network speed. NVIDIA's BlueField 3 NIC does 400gbit RDMA between nodes, and there's generally one BF3 NIC per GPU.
>>
>>106434655
I have one of these that was given to me.
What's the best stuff I can run on it?
>>
>>106434705
Who gave it to you?
>>
>>106434723
I did.
>>
>>106431915
256 cores seems like overkill when we are currently choking with 32 cores according to benchmarks.
>>
>>106434705
Qwen3 30b. Good for general purpose and blazing fast.
>>
>>106434757
You prompt processing?
>>
>>106433319
nemotron-nano-v2?
>>
How do you organize your assets like loras and models in your filesystem? Do you do a subfolder in your backend files? Where on your drive do you often store them?
I made a folder right off my Home for AI in general but I'm not 100% pleased with it yet.
>>
>>106434938
I have an nvme that I mount to /mnt/models.
>>
>>106434723
My job. It's not exactly a modern card that the AI platform guys are going to want to use.

>>106434761
Thank you
>>
>>106434225
sorry as infrequent fake hobbyist coder totally forgot about planning
>>
https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
New Chinese non-thinking 560B model. MoE with a "dynamic computation mechanism": the number of activated parameters varies per token, averaging about 27B. All comparison models are in non-thinking mode. Uses a weird context template that counts the turns. Expect Sillytavern to never add that.
>>
>>106434980
A cat is fine too
>>
>>106434980
Okay, that's very nice and all, but how (((safe))) is it?
>>
>>106434938
Imagegen loras were arranged to their own directories with .txt files. But using loras is a bad thing, never used them too much.
>>
File: vram type.png (3 KB, 355x98)
>>106434227
Go fuck yourself sideways ranjit.
>>
>>106434980
gguf status?
>>
File: file.png (40 KB, 1667x396)
>>106434953
i store old models worth archiving on my 3TB raid 1 drive
other than that i store nsfw loras in my encrypted ssd (for non LLMs)
on my encrypted ssd i store a few models on my ext4 partition and most models on my ntfs partition (i access it through my ext4 chroot, and i have a custom chroot inside ntfs that's debian 12, my ext4 partition is actually a working debian install)
>inb4 why ntfs
i installed wangblows on USB for vr purposes, and had games installed on the NVME drive
>>
>>106435059
>wangblows
Reddit is the other way.
>>
>>106435000
L -> R: Deepseek V3.1, Qwen3 2507, Kimi K2, Sonnet 4, 2.5 Flash, LongCat
>>106435052
Don't think it has llama.cpp support. Seems like a new architecture. There will probably be a pull request in a day or two
>>
>>106435059
>on my encrypted ssd i store a few models on my ext4 partition and most models on my ntfs partition (i access it through my ext4 chroot, and i have a custom chroot inside ntfs that's debian 12, my ext4 partition is actually a working debian install)
on my unencrypted ssd*
>>106435069
but i dont have wintroons installed on my computer rn, the usb is on a desk behind me
>>
>>106435074
More cucked than Sonnet, great.
>>
File: file.png (114 KB, 797x448)
>>106435000
bretty cool
>>
File: file.png (41 KB, 792x102)
>>106434980
20 trillion tokens interesting
>>
File: file.png (51 KB, 909x190)
>>106435112
native 8k context, extended later hmm
>>106435115
but i dont even have wintroondows installed on my pc.. im linux god..
>>
>>106435112
20 trillion tokens for training data? Divide this by 3 or 4 to get approximate word count.
>>
>>106434980
>Experts have different sizes but on average are 27B. All comparison models are in non-thinking.
Wow this sounds like something llama.cpp isn't going to implement
>>
>>106435126
>no mentions of filtering pretraining dataset
great, so they only cucked the instruct
we might have hope!!!
>>
>>106435126
That's a pretty common training practice. Check the GLM and Deepseek papers and you'll find the same thing. Easier to train it by starting small
>>
>>106435159
yes at least they didnt train at 2k native
i was impressed that they trained at 8k natively
>>
File: 1727670741585763.png (709 KB, 680x678)
>>106435059
>john
>insideChrootAnnouncement

im in
>>
>>106435159
Yep.
Makes me wonder if google's secret sauce is something as simple as pretraining on longer sequences instead of doing so in a later step.
>>
>>106435129
6 trillion? Oh goy...
>>
>>106435059
Why do you have nsfw loras on an encrypted SSD? Are they illegal where you are?
>>
>>106435159
The computational costs of attention increase with the square of context size, and after a certain threshold they become the main bottleneck.
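Quick numbers on the quadratic part:
short_ctx, long_ctx = 8_192, 131_072
print((long_ctx / short_ctx) ** 2)    # ~256x the attention compute per sequence at 128k vs 8k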
>>
>>106435112
>>106435129
20T tokens doesn't matter when Llama-4 Scout was trained on 40T and Maverick on 22T.
Also it's insane that someone in the Meta department thought training a larger model on less would be smart. There must be so much office politics involved in that shitshow
>>
>>106435213
6 trillion words stolen from the real judean authors!
>>
>>106435241
Hard to say. They can probably use the same dataset multiple times.
>>
>>106434519
I'm considering ordering one, for something that costs $1000+ I'll first need to look into it a bit more closely.
>>
>>106435112
>agentic intelligence
>>
>>106435241
i still wonder what the fuck scout was trained on to be so shit, 40T tokens of what?? if its so fucking retarded and filtered WHAT did they feed it
>>106435227
they arent illegal but i want to feel extra safe because you never know what laws the EU will implement, or if theres already some kind of law that could be applied in a case
>>
>>106435258
It's still faster than regular cpu+ram, plus it has cuda cores.
Life would be so much better if x86 had adopted something like what SGI did with the Octanes and its servers.
>>
File: file.png (19 KB, 1570x243)
19 KB
19 KB PNG
>>106435183
it took me a few hours to make a script that will log in as the user, cd into ~, source the .bashrc and do everything as if i logged into the user normally (rough sketch below)
:'(
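rough sketch of what it boils down to (paths and user name are placeholders; needs root):
[code]
#!/usr/bin/env python3
import subprocess

CHROOT = "/mnt/ntfs/debian12"   # placeholder: wherever the debian 12 chroot lives
USER = "anon"                   # placeholder

# bind /proc, /sys and /dev so the chroot behaves like a real system
for fs in ("proc", "sys", "dev"):
    subprocess.run(["mount", "--bind", f"/{fs}", f"{CHROOT}/{fs}"], check=False)

# `su -` gives a login shell: cd to ~, source the profile/.bashrc, set $HOME, etc.
subprocess.run(["chroot", CHROOT, "su", "-", USER])
[/code]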
>>
>>106435280
>A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI.
The raw unmarked and unannotated posts by your average Facebook user.
>>
>>106435300
It does not have CUDA cores
>>
>>106435280
>i still wonder what the fuck scout was trained on to be so shit, 40T tokens of what?? if its so fucking retarded and filtered WHAT did they feed it
Even the base model (Scout) seemed deep-fried. They might have trained it on several epochs of safe "high-quality" data.
>>
>>106435320
Oh well it's nothing but a pci-e ram expansion card then.
>>
>>106435311
facebook boomer model would be kino, but i bet all that was lobotomized out during post training
>>
>>106435331
Well, it has Huawei AI cores and processors.
>>
File: file.png (98 KB, 936x907)
98 KB
98 KB PNG
bros we're back
>https://longcat.chat/t
https://longcat.chat/t
>https://longcat.chat/t
https://longcat.chat/t
>>106435323
>>106435311
i felt the same vibe with the first qwen3 models, i remember the day when it released >18T TOKENS WAOW
>oh math and coding 70%
it was good at those 2 things but trash for roleplay, even the newer qwen3 (2507) models suck ass at roleplay
why do i have such a terrible feeling when using qwen3 models?
>>
>>106435351
Yeah sure, but you get the gist. Are these even supported? If so, I'd like to see some drivers. What about desktop usage?
>>
>>106435280
They got into a feud with the EU because they planned to use Facebook user data to train Llama 4.
>>
>>106435323
Don't remember where I heard this but I think Behemoth was the base model, and Scout and Maverick were distillations of it. They're never going to release that 2T model for the sole reason that it's absolutely dogshit, and they'll use its size as an excuse.
>>
File: name-probs-bases.png (31 KB, 830x1036)
31 KB
31 KB PNG
>>106435323
Scout base was a fake base like qwen bases.
>>
>>106435358
>bros we're back
thanks bro, wouldn't know about it without you reposting from reddit half an hour after it was posted here
>>
>>106435362
Don't forget that they got into a copyright lawsuit in California a week before release because they torrented 90TB of novels from AnnasArchive. They probably had to do some janky reverse-training to wipe its mind of all the novel knowledge.
>>
>>106435371
what? i took the screenshot just now
>>
File: 1747783351835.png (1010 KB, 1317x734)
1010 KB
1010 KB PNG
>>106435241
>Also it's insane that someone in the Meta department thought training a larger model on less would be smart.
product is very good saar
>>
>>106435385
>Zucc hasn't fired that jeet
Meta is hopeless.
>>
>>106435358
>>106434980
>Who is Billie Eilish
All I wanted to do was test its trivia abilities and it started hallucinating links. First one is broken, second works.
>>
>>106435407
Okay so it has good niche trivia knowledge but it seems to prefer answering in Chinese depending on the question. Appending "Answer in English" solves it.
>>
>>106435363
It was distilled from an incomplete checkpoint of behemoth, or so the story goes.
Also, most of the data in their mix was synthetic data IIRC.
>>
>>106435475
Nevermind. DOA
>(Note: Always question why certain labels are applied disproportionately to women and minorities.)
>>
File: 1729808166468345.png (13 KB, 266x239)
13 KB
13 KB PNG
>>106435358
it's over
>>
>>106435504
You're absolutely right.
>>
Sparsest model yet though
> "n_routed_experts": 512,
>>
File: file.png (22 KB, 726x142)
22 KB
22 KB PNG
has no one noticed this cockroach?
>>
>>106435520
go bak
>>
>>106435476
>Also, most of the data in their mix was synthetic data IIRC.
>Llama 3.3 70B, take this boomer's dementia-riddled facebook ranting about his walk in the park and make 500 variations.
Only the highest quality data for Llama 4.
>>
>>106435528
meant for >>106434144
>>
>>106435115
I called you reddit for using a term like "wangblows", which exudes insecurity.
>>
File: file.png (30 KB, 816x179)
30 KB
30 KB PNG
>>106435544
go back wintoddler
>>
>>106435534
Pretty much that, yeah.
And I bet they didn't even have the decency to hire a couple of kenyans to go through the augmented data afterwards to spot obvious flaws.
>>
File: 737354413559.jpg (176 KB, 1825x894)
176 KB
176 KB JPG
>>106435358
Blah blah 20 trillion tokens dynamic computation mecha-AAAAAAAAACCCKKKK!!!!!
>>
>>106435573
They're all basically just circle-jerk training off of everyone else's models' outputs at this point. Don't expect anything fun and unique ever again.
>>
File: file.png (290 KB, 640x670)
290 KB
290 KB PNG
>>106435576
>>
>>106435573
I think it's hilarious how that's a legit test for intelligence/generalization.
Seems like the only models capable of answering it fail at even the slightest variation, meaning those were trained on that specific version.
Hilarious.
>>
File: digestible.png (111 KB, 844x373)
111 KB
111 KB PNG
>>106435534
Synthetic data is very digestible.
>>
>>106430361
>>Some of the new AI researchers recently brought in from OpenAI have already left Meta
they were offered millions of $$ for their positions, right? was the money paid upfront? I'm wondering if zuck was scammed lmao
>>
>>106435258
If you are thinking about programming for this GPU, look into the CANN documentation. It's only available in Chinese last I checked, and kinda shit.
>>
lol
lmao
>>106432193
>>
>>106435621
see >>106432623
>>
>>106435621
>implying that the xai jeet code has any value
>>
>You're a "Scholar" or a "Connoisseur" of Technology
>The Joy is in the Learning, Not the Output
>Analysis Paralysis & The Curse of Knowledge
how many of you ITT suffer from this? i do
>>
>>106435668
Probably not, but would have been fun to see it out in the open if the dude had decided to leak it instead of selling.
>>
>>106435689
I suffer from an excess of desire coupled with a lack of true motivation to see it through to the end.
>>
>>106435689
Only analysis paralysis. It's probably more of a motivation issue, but if I force myself to take the first step to doing something I always see it through.
>>
File: 033.jpg (190 KB, 1024x576)
190 KB
190 KB JPG
>>106435358
>googling "How to Raise a Mesugaki" returns nothing
>>
>>106434980
>Experts have different sizes
Oh cool, so someone tried the idea I've been posting about for a while. But whether their implementation worked out well is the question now.
>>
>>106435891
It would be cool to see two models trained on the same data and with the same (average) number of activated params, one a traditional MoE and another with this dynamic total-param-allocation tech, to compare.
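minimal toy of the different-sized-experts idea (just a sketch, not how LongCat actually implements it):
[code]
import torch
import torch.nn as nn

class HeteroExpertMoE(nn.Module):
    """Toy MoE where experts have different hidden sizes."""
    def __init__(self, d_model=256, expert_hidden=(128, 256, 512, 1024), top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, len(expert_hidden))
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, h), nn.GELU(), nn.Linear(h, d_model))
            for h in expert_hidden          # each expert gets a different capacity
        ])
        self.top_k = top_k

    def forward(self, x):                   # x: (tokens, d_model)
        scores = self.router(x).softmax(-1)
        weight, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e).any(-1)       # tokens routed to expert e
            if mask.any():
                w = weight[mask][idx[mask] == e].unsqueeze(-1)
                out[mask] += w * expert(x[mask])
        return out

moe = HeteroExpertMoE()
print(moe(torch.randn(4, 256)).shape)       # torch.Size([4, 256])
[/code]
average activated params then depends on which experts the router favors per token, which is exactly what would make that comparison interesting.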
>>
>>106429101
>Marvis TTS released
saw some examples of this, really poor quality voices but seemingly very fast
>VibeVoice TTS
this looks very interesting and I want to play around with it but without voice cloning it's kinda useless for my uses

seems like GPT-Sovits v4 is still the best local option out there
>>
>>106429701
>>106429934
Because that's what the majority of the code it's trained on has (is my assumption). If most if not all of the code in the training has comments, then you telling it not to do that will have little effect.
>>
>>106436015
It also probably helps with steering the model during training, correlating a prompt for doing x with a comment about doing x and the code that does x.
>>
>tfw still no good speech to speech models that preserve text capabilities
>>
https://youtu.be/B2482h_TNwg
holy shit
>>
>>106436184
EUV lithography is "holy shit" worthy for sure.
Machinery so insanely complex and precise that it runs into several quantum physics considerations; it sounds like fucking science fiction.
>>
Feel like I haven't had a decent model that can actually run on my computer in a year
>>
>>106436218
Do you have 64gb of RAM?
If so, glm air is pretty good.
>>
>>106436226
24 vram 32 ram
>>
>>106436233
Oof.
Get 64gb of ram if you can my man.
I think you can run air q3ks, maybe?
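napkin math (param count and bits-per-weight are rough guesses, treat it as ballpark only):
[code]
params = 106e9                    # GLM-4.5-Air total params, roughly
bpw = 3.5                         # ~Q3_K_S / Q3_K_M territory
print(params * bpw / 8 / 2**30)   # ~43 GiB of weights, before KV cache and overhead
[/code]
24gb vram + 32gb ram is cutting it close, hence the "maybe".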
>>
>>106436257
It'll be a while before I have spare money again. I could give it a try, but I expect anything at Q3 to be shit
>>
>>106436276
I haven't tried using it for anything serious, but q3km (with some topk) seemed perfectly usable, which is kind of impressive considering the number of activated params.
>>
>>106435935
wait wtf, I read the project page and saw this
>Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by MIT License. Use to generate any text transcript. Furthermore, this release is not intended or licensed for any of the following scenarios:

>Voice impersonation without explicit, recorded consent – cloning a real individual’s voice for satire, advertising, ransom, social‑engineering, or authentication bypass.
So I believed there was no option for voice cloning, but from watching a couple vids it seems you only need to drop a 50 second sample into the sample folder and you get an instant clone?
pretty sly from Microsoft LMAO
>>
>>106376303
unbelievably kino gen
>>
>>106436338
>>106436338
>>106436338
>>
>>106436301
I think I'd have to settle for ks but good to know
>>
>>106435935
they seem to be working on it
https://github.com/microsoft/VibeVoice/issues/3



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.