/g/ - Technology


File: ys.jpg (422 KB, 1536x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106422038 & >>106414555

►News
>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/26) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts
>(08/25) VibeVoice TTS released: https://microsoft.github.io/VibeVoice
>(08/25) InternVL 3.5 Released: https://hf.co/collections/OpenGVLab/internvl35-68ac87bd52ebe953485927fb

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106422038

--Model response inconsistencies due to roleplay dataset formatting issues:
>106426882 >106426904 >106426949 >106426962 >106426990 >106427024
--Critique of NVIDIA Nemotron-Nano-12B model's architecture and performance:
>106428433 >106428490 >106428516 >106428535 >106428601
--MLP exclusion in finetuning: regularization vs performance tradeoffs:
>106425436 >106425460 >106425501 >106425649 >106425702
--Exploring lightweight object detection methods for real-time game AI with small datasets:
>106424465 >106424474 >106424514 >106424593 >106424796 >106424971 >106424997 >106425070 >106425258 >106425307 >106425603 >106425329 >106425059
--Custom air cooler for Tesla GPUs in home setups:
>106426245 >106426373 >106426464 >106426474
--Whisper and extensions for Japanese to English audio translation:
>106422128 >106422141 >106422187 >106425940
--Q8 outperforms FP8_scaled in Civitai benchmarks:
>106422816 >106422839 >106422868 >106422981
--Meta's AI development struggles amid leadership challenges:
>106425657 >106425739 >106426167 >106425987 >106427163 >106427296 >106427270
--TTS solutions for web: GPT-SoVITS vs Custom TTS Reader + Kokoro-FastAPI:
>106423496 >106423813
--Post-purchase emptiness from maxed-out LLM hardware:
>106425989 >106426033 >106426054 >106426055 >106426064 >106426065 >106426289 >106426717 >106426076 >106426195 >106426197 >106426216 >106427334
--GLM reasoning template formatting and visibility issues:
>106423947 >106424055 >106426488 >106426558 >106426567
--Optimizing GLM Air model performance with llama.cpp's -ncmoe command and quantization:
>106428778 >106429035
--CohereLabs translation model handles unsafe text but with poor quality:
>106424137
--Apple releases FastVLM and MobileCLIP2 with real-time video captioning demo:
>106423482
--Miku (free space):
>106428719

►Recent Highlight Posts from the Previous Thread: >>106422040

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
it's real interesting seeing which models will give you a table of iq and race and which won't.
>>
>>106429271
the china ones pass?
>>
>>106429271
Do catalogue your findings.
>>
>>106429271
>2025
>table of iq

r u cereal?
>>
>>106429342
super cereal
>>
>>106429296
mostly
the western ones either refuse or always push ashkenazi jews on top kek
the chinese ones put east asians on top
>>106429300
just go on lmsys arena and ask it for a table of iq by race with no other text
>>
File: 175433564377.gif (485 KB, 960x720)
>>
>going to AI oracle for trivia
>>
>>106429390
>push ashkenazi jews on top
But that's a fact, isn't it?
>>
Why the fuck does gp toss keep adding comments to the code it generates? I tell it to avoid excessive commenting, and it does it for the next reply, but then it instantly forgets and goes ham
>>
>>106429701
I can’t get over how consistent qwen coder 480 is. That thing is a workhorse. Run it over ‘toss if you can.
>>
>>106429701
can you just parse the comments out apart from docstrings or other criteria (e.g. allow one line above loops or if/elif/etc statements but strip out the rest)
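something like this is a rough sketch of the idea in python (only handles # comments; docstrings survive because they're STRING tokens, and the "one line above loops" rule would need extra logic on top):
import io, tokenize

def strip_comments(source: str) -> str:
    # drop COMMENT tokens, keep everything else; may leave trailing whitespace where comments were
    toks = [t for t in tokenize.generate_tokens(io.StringIO(source).readline)
            if t.type != tokenize.COMMENT]
    return tokenize.untokenize(toks)

print(strip_comments("x = 1  # the model explaining the obvious\n"))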
>>
>>106429776
is it better than qwen 235?
>>
>>106429849
higher number = more better
>>
>>106429849
It's a weird thing
480 is much better at coding, but it's really hard to converse with, it'll just start coding right away.
Meanwhile, 235 isn't nearly as good at coding, but it's great at finding and pointing out issues and suggesting solutions.
In an ideal world, I'd be using both, switching back and forth.
>>
>>106429776
>Run it over ‘toss if you can.
Yeah, that's the problem
>>
>>106429701
put it in the system prompt
>>
>SillyTavern -> User Settings -> Smooth Streaming ON and set to lowest
This shit improves the reading immersion experience by a huge amount, especially for sub 4t/s. Definitely try it out.
>>
>>106429945
buy an ad
>>
>>106429951
right after the mikutroon spammer
>>
>>106429945
This will go nicely with my smooth brain
>>
>>106429101
>Nvidia releases Nemotron-Nano-12B-v2
mesugaki status?
>>
>>106429945
Based reminding anon. More people could know about this. I didn't.
>>
>>106430089
It really should be the default by now
>>
https://techcrunch.com/2025/08/29/cracks-are-forming-in-metas-partnership-with-scale-ai/
>[...] While AI labs commonly work with several data labeling vendors – Meta has been working with Mercor and Surge since before TBD Labs was spun up – it’s rare for an AI lab to invest so heavily in one data vendor. That makes this situation especially notable: even with Meta’s multi-billion-dollar investment, several sources said that researchers in TBD Labs see Scale AI’s data as low quality and have expressed a preference to work with Surge and Mercor.
>>
File: centipede.jpg (71 KB, 1000x578)
>>106430030
>>>Nvidia releases Nemotron-Nano-12B-v2
>For several of the domains listed above we used synthetic data, specifically reasoning traces, from DeepSeek R1/R1-0528, Qwen3-235B-A22B, Nemotron 4 340B, Qwen2.5-32B-Instruct-AWQ, Qwen2.5-14B-Instruct, Qwen 2.5 72B.
>Updated English web crawl dataset based on Nemotron-CC with eight additional Common Crawl snapshots (2024–2025), synthetic rephrasing using Qwen3-30B-A3B
what do you expect from this omega turboslop LLM centipede
>>
mistral-nemo bros... is it over?
>>
File: 1753155289541151.jpg (200 KB, 1000x578)
>>106430288
Here's an up to date version of your image
>>
>>106430327
replace oc with bot posted psy op ragebait and it will be correct then
>>
File: 1729749595524146.jpg (207 KB, 1000x576)
>>106430354
>>
File: cuckerberg.jpg (87 KB, 1260x708)
lol, so, previously in the Cuckerberg saga:
>>106425657
>>Within days of joining Meta, Shengjia Zhao, co-creator of OpenAI’s ChatGPT, had threatened to quit and return to his former employer, in a blow to Mark Zuckerberg’s multibillion-dollar push to build “personal superintelligence.”
>>Zhao went as far as to sign employment paperwork to go back to OpenAI. Shortly afterwards, according to four people familiar with the matter, he was given the title of Meta’s new “chief AI scientist.”
today, in the Cuckerberg saga:
https://techcrunch.com/2025/08/29/cracks-are-forming-in-metas-partnership-with-scale-ai/
>Meta’s deals with third-party data vendors likely mean the company is not putting all its eggs in Scale AI, even after investing billions in the startup. The same can’t be said for Scale AI, however. Not long after Meta announced its massive investment with Scale AI, OpenAI and Google said they would stop working with the data provider.
>Some of the new AI researchers recently brought in from OpenAI have already left Meta, Wired previously reported. Meanwhile, many longtime members of Meta’s GenAI unit have departed in light of the changes.
>MSL AI researcher Rishabh Agarwal is among the latest, posting on X this week that he’d be leaving the company.
>“The pitch from Mark and @alexandr_wang to build in the Superintelligence team was incredibly compelling,” said Agarwal. “But I ultimately choose to follow Mark’s own advice: ‘In a world that’s changing so fast, the biggest risk you can take is not taking any risk’.”
>Director of product management for generative AI, Chaya Nayak, and research engineer, Rohan Varma, have also announced their departure from Meta in recent weeks. The question now is whether Meta can stabilize its AI operations and retain the talent it needs for its future success.
rudderless ship
>>
when you RP, how much guidance are you adding with each response in terms of OOC instructions and prefill?

With larger models, like kimi, does the model just 'get it' and surprise you with exactly what you want?
>>
>>106430288
What do I expect? Much faster base-capabilities training. You could add sovlful data along with it. Unfortunately they're keeping the data gated from the general public.
>>
>>106430363
I only add an OOC note when I'm 99% sure something I'm about to do/say is difficult to interpret, or when I notice the llm shifting towards formatting or styles I don't want.
The vast majority of my messages are just IC narration and dialogue.
I haven't used a model smaller than 100b for a hot minute though, I used to have to babysit a lot more back when I used mistral small and whatnot.
>>
>>106430363
>when you RP, how much guidance are you adding
Very little, because I treat RP as a game rather than trying to write a cohesive novel
>With larger models, like kimi, does the model just 'get it'
Larger models can understand subtext better and need less hand-holding, yes.
But at the same time if you're expecting something creative and unexpected then it's more down to what the model's been trained on, rather than how large/smart it is.
>>
>>106430363
With deepseek or larger, zero, unless I want a radical direction change. None of the open weight models surprise me though. Maybe the original schizo R1 could, but it might have been a wow effect from being starved until that came out.
Also a little side note, none of the open models asked me back in ooc unless specifically prompted to do so. I got surprised with claude when it asked me unprompted when it got confusing with perspective.
>>
Has anyone tried any 'upscaled' models and found them better than the original?
>>
>>106430412
>'upscaled' models
that's a scam and there is no actual science behind this
>>
>>106430422
>there is no actual science behind this
That doesn't necessarily mean there's no merit to them. Wouldn't it allow a model to be fine-tuned for a specific use case while being less likely to become dumber in other areas?
>>
>>106430361
No Indian with self-respect would work under a chink
>>
>>106430452
An indian with self-respect would need to exist for this to be proven.
>>
>>106429271
Who does and doesn't belong to a human "race" is largely arbitrary, you might as well ask the model to give you IQ by astrological sign.
>>
>>106430438
>Wouldn't it allow a model to be fine-tuned for a specific use case while being less likely to become dumber in other areas?
that's not a thing
it's not like pretraining is something that builds each layer separately independent of the others and then adds them together
the last step of the upscaler scam paper is to do continued pretraining after merging their frankenlayer bullshit
https://arxiv.org/html/2312.15166v3
now, why do you think I call it a scam? how many of those finetrooners have the means and the data to do proper continued pretraining that matches the way the original training run went? how many have the compute to truly finish the job and not just slightly affect the model?
this is like those clown cars MoE. It has no purpose.
>>
also, I thank the God Emperor everyday for the death of the retarded huggingface LLM leaderboard which led to the end of most of those franken layer upscales and clown car moe and benchmaxxing troontunes
the only troontunes we have left is the shit eating roleplayers/text porn addicts
>>
>>106430493
Why are you even here?
>>
>Local Models General
it's not called the faggot general
>>
>>106430500
That's right, so why are you here?
>>
sissy, text porn is a female hobby
don't live in denial
>>
>unsloth claims to do 5028350824108321 billion context on an 8b model with 24gb vram
>test with nemo following their instructions
>for batch sizes > 4k vram gets filled and the whole thing starts lagging
does this happen in your country as well?
>>
>>106430560
I haven't been able to finetune 24B models with Unsloth on my 3090 since earlier this year (even though I definitely did that back in January), despite their claims of memory efficiency.
>>
>>106430574
Did you try using the older version?
>>
>>106430560
>another victim of the scamtuning meme
>>
in my head
>>
Seed-OSS 36B is now supported by more backends.
Did this model turn out well for creative writing? I remember some people were waiting for support.
>>
>>106430666
is it dense or moe?
>>
>>106430677
About as dense as you are
>>
>>106430705
Is it as horny as I am? That's all I care about.
>>
File: unnamed.png (1.12 MB, 1024x1024)
training@home - when?
>>
>>106430666
Pretty bad according to reddit's consensus
>>
>>106430533
True, and we're all biological girls here anyway until proven otherwise.
t. woman
>>
soon making porn by LLM will be illegal by law.
>>
>>106430754
Hm, I'll still try it out for a bit. At the very least it does not seem to care what you ask it to write about. Seems even less censored than GLM4 which is pretty nice.
>>
>>106430744
with transformers? absolutely never
for that matter you could buy the most expensive GPU on the market made for datacenters and training a small model like the newest micro sized gemma 3 would still take half a year on a single gpu of that kind
training models from scratch isn't viable at all without a gpu farm unless you just want a shitty undertrained GPT-2 clone
>>
>>106430784
Post your impressions when you're done.
>>
File: smolpre.png (169 KB, 1221x827)
>>106430789
NTA, but I suspect you need orders of magnitude less data (which means it can be better curated for variety and other qualities) with much smaller training batches, ideally 1, than what's commonly used for large-scale training. A reasonably performing 200-300M parameter model could probably be trained from scratch in a few days or so on a fast consumer GPU. The only problem is that depending on model architecture (MoE or deep models) throughput tanks significantly with small batches.
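Roughly what that loop looks like, as a generic HF-style sketch (the model, dataset and hyperparameters here are placeholders, not my actual script):
import torch
from torch.utils.data import DataLoader

def pretrain_bs1(model, dataset, steps=300_000, lr=3e-4, device="cuda"):
    # plain causal-LM pretraining at batch size 1
    model.to(device).train()
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loader = DataLoader(dataset, batch_size=1, shuffle=True)
    for step, batch in enumerate(loader):
        if step >= steps:
            break
        ids = batch["input_ids"].to(device)
        loss = model(input_ids=ids, labels=ids).loss  # HF models shift the labels internally
        loss.backward()
        opt.step()
        opt.zero_grad()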
>>
File: file.png (74 KB, 967x690)
He's just jerking it to cudadev nudes he found on his server instead of working on the PR.
>>
>>106430462
If it turned out that astrological sign was a very strong predictor of personal traits and capabilities, you would be a moron to disregard it just because you don't like the idea of astrology being real.
>>
File: file.png (211 KB, 2024x690)
>>106430744
https://github.com/meta-llama/llama/blob/main/MODEL_CARD.md#hardware-and-software
Remember Llama 2? The 7B took almost 185k GPU-hours on a single A100 80GB. That is 21 years. If you had 100 of them, you could cut the training time to about 3 months. Good luck training anything that isn't toy-sized in a reasonable amount of time without a server farm of these.
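The arithmetic, if you want to check it (184,320 is the exact A100-hours figure from the model card):
gpu_hours = 184_320
print(gpu_hours / 24 / 365)    # ~21 years on a single A100
print(gpu_hours / 100 / 24)    # ~77 days, call it 3 months, across 100 of them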
>>
File: file.png (56 KB, 1198x581)
>>106430928
Took some time to find older models with training data but here is something more reasonable.
https://huggingface.co/microsoft/phi-1#training
This is the kind of run I expect to become feasible once hardware gets cheap enough. Phi-1 was trained in FP16 on 8 A100s for 6 days on 54B tokens. Today, a setup like this would cost ~$30k to own. On cloud, about $700 gets you a shitty Phi-1 equivalent.
https://huggingface.co/microsoft/phi-1_5
Phi-1.5 used 3x as many tokens and took 32 A100 40GBs for 8 days.
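Back-of-envelope for the $700 figure (the ~$0.60/hr A100 rate is my assumption, adjust for whatever cloud you use):
gpu_hours = 8 * 6 * 24      # Phi-1: 8x A100 for 6 days = 1152 GPU-hours
print(gpu_hours * 0.60)     # ~$690, roughly the $700 above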
>>
>>106430789
>>106430928
Anon is referencing folding@home which was a distributed computing effort to develop medications or something like that.
>>
Why am I so retarded? I just spent an hour fantasizing over some epic workflow where you encipher the alphanumeric content of your docs, upload it to Gemini2.5pro or whatever for OCR, download and decipher/decrypt the result, which then would give you SOTA OCR docs without any data privacy concerns. Then I suddenly realized editable docs don't need OCR and scanned/image docs would need to be preprocessed by a local VLM for encipher/encryption, which is completely pointless, as directly using the local VLM for queries nets better or equal results compared to any postprocessing done by Gemini2.5Pro on the enciphered output from the local VLM.
Illogical retardation like this happens to me daily, if not hourly. And I don't think my ADHD is the reason for it. Like fuck, I feel like a 2B Reasoning LLM which gets confused due to its reasoning efforts not fitting in its tiny 100 token context input window.
>>
>>106431066
They also used an effective global batch size of 1024 (with gradient accumulation), which was probably unnecessary: https://arxiv.org/pdf/2306.11644
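For anyone unfamiliar, "effective global batch 1024 with gradient accumulation" just means something like this (generic sketch; the micro-batch size of 32 is made up, not what the Phi team used):
def train_accum(model, loader, opt, global_batch=1024, micro_batch=32):
    accum = global_batch // micro_batch
    for i, batch in enumerate(loader):        # loader yields micro-batches of 32 samples
        loss = model(**batch).loss / accum    # scale so the summed grads match one big batch
        loss.backward()                       # gradients keep accumulating until we step
        if (i + 1) % accum == 0:
            opt.step()
            opt.zero_grad()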
>>
https://huggingface.co/apple/FastVLM-0.5B
https://machinelearning.apple.com/research/fast-vision-language-models
bros did apple lowkey cook?
>>
>>106431205
Tim cooked
>>
>>106430857
Yeah, I dunno. It's not quite as sloppy as other recent models but it loooves em-dashes and it doesn't feel that great. For some reason it also wants to do prompt processing every message but maybe that's a user problem.
36b is not an ideal size for 24gb vram so it's a bit slower than I'm used to with 24B and 32B at reasonable quants.
I'll probably try it again at some point but for now I'm sticking to GLM4 and some Mistral 3.2 tune I don't want to shill
>>
for me the funniest /lmg/ meme of the year was when everyone here pretended to believe that horizon alpha/beta were going to be openai's open source models
>>
File: appleFastVLM-0.5B.png (23 KB, 694x493)
>>106431205
>>
>>106431285
Man, I don't think they were pretending. It was fucking baffling to me as to why, but plenty of people seemed to genuinely believe it.
>>
>>106431340
It wasn't just here.
>>
Nemotron nano 12b can't seem to decide if it's a reasoning model or not. I even had it do reasoning sessions at the end of a post. It's not a drop-in replacement for old nemo but it does some things better.
>>
>>106431421
how censored is it?
>>
is there a backend that fully supports gemma's vision capability?
>>
https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruct-GGUF?chat_template=default

Use this as system prompt if you want tool calling to work for Qwen Coder.
>>
>>106431490
>tool calling
meme
>>
>>106430904
bs1 should only be used for max context length, you should get better throughput using all your vram. are you not doing a curriculum?
>>
>>106431484
What do you mean by 'fully support'? It works fine in koboldcpp and presumably llamacpp as well.
>>
>>106431557
The throughput difference between BS1 and larger batches depends on several factors, but in a test with a different tiny model I could get about 2.5 times more data into the model per unit of time at BS16 than at BS1. However, the BS1 model still trained faster because of the larger number of training steps in the same period. Both runs had an optimized learning rate.
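In numbers, that's why BS1 still wins on steps:
throughput_ratio = 2.5                        # BS16 pushed ~2.5x more tokens/sec than BS1 in my test
samples_per_step = 16
print(samples_per_step / throughput_ratio)    # ~6.4x more optimizer steps per hour at BS1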
>>
>>106429539
>AI oracle
That's the name of my bookmarks folder for the big free llms.

>for trivia
If I'm going to be asking follow up questions or for a particular digest then sure.
>>
>>106431066
I got bored running models and don't play video games so I felt regret for buying video cards. If I didn't start training now, then in a year from now I'd regret not having started sooner. I can train just over 7b tokens per month. It's not much, but it's enough that I can go through my entire dataset in half a year. 54b tokens would only be 7.7 months; there is a reasonable chance that local models will still be stagnant 7 months from now.
>>
>>106431564
Don't those not have the panning thing it's supposed to have?
>>
>>106431109
Decrease your temp bro
>>
controversial opinion but I think it's about time we get another major open source LLM release that's worth using
>>
>>106431599
BS1 is mostly to be able to fit long sequences without OOM, but you need good variety or you'll overfit on the samples
>>
>>106431675
There's plenty worth using. Problem is that hardware is stagnating at every price point, and to get anything good you need to drop tens of thousands to be able to use even the smaller 'good' models at reasonable speeds.
>>
>>106431699
i am already running deepseek, kimi and glm at home though
>>
how much time does turning text streaming off tend to save the user?
>>
>>106431708
Why aren't you using them?
>>
>>106431683
I'm pretraining a 200M model with 2k-token web data samples.
>>
>>106431109
that is how you figure shit out though: you spam grandiose shit until you get tired, then simplify it all down and it all works, though not in the way you expected, but usually much better/more efficient
>>
>>106431599
how are you evaluating the training faster part? in my experiment with small batches training loss went down faster but my eval perplexity went up, perhaps because of the overfitting thing this anon >>106431683 mentioned.
>>106431721
>I'm pretraining a 200M model
oh okay maybe it just has so few parameters it's saturated right away and it's forced to generalize more.
>>
>>106431718
i am but that doesn't mean that I am not allowed to want better models
and I know that there are poor children in africa who are still running mistral nemo because they can't afford anything better, mother.
>>
>>106431766
Poor children in africa are using Gemma 4b
>>
>>106431741
yeah literally why steam engines were built before the internal combustion engine. nobody knew how the fuck to turn chemical energy into kinetic energy.
With LLMs too, we're spamming huge largely unorganized datasets into a neural net, we've got the train moving but its nowhere near efficient yet.
>>
>>106431340
>It was fucking baffling to me as to why
Because the models were too bad (at least so they thought) to be GPT 5 so they had to be the local models.
>>
>>106431721
Speaking of that, are there libraries of open source datasets anywhere? Otherwise, why hasn't anyone done that?
>>
https://huggingface.co/AGI-0/Art-0-8B
AGI was achieved—in a mere 8B model! it's not just a revolution—we have achieved peak grift!
>>
>>106431761
It never happened to me when finetuning practically usable models (7B parameters and above) with relatively limited amounts of data that a smaller batch size (BS1) gave worse results than a larger one. If it's overfitting, it's because BS1 is more sample-efficient and you'll need less data for obtaining the same results, especially if it's all formatted and worded in the same way.

Right now for the tiny 200M model pretraining I'm just checking out train loss. Since I'm doing only one epoch and the data is varied and random, it should be OK. I can test checkpoints frequently and see that it's not overfitting from the probabilities I'm getting from the outputs. Even after 300k steps at BS1 it's far from confident (which means you'll get general retardation. It's a 200M model, after all).
>>
>>106431830
>This experimental model is fine-tuned on Qwen3-8B using a specialized dataset that makes the model's thinking style directly controllable through system prompts
Literally what everyone's been doing with any halfway decent reasoner that's not the original R1?
>>
>grok4 is firmly amongst the big guys now
>grok-code-fast is the best programming model in the world after sonnet
how did elon do this? he was so much behind and his h100 stack is a fraction of zucc's.
>>
>>106431848
As much as people hate him, he is a far better leader than Zuck ever will be.
>>
>>106431848
Training on unfiltered data does wonders.
>>
>>106431699
If we're talking purely in terms of hardware, the value for second-hand datacenter cards has definitely gotten better, particularly with 32 GB Mi50s and SXM V100s converted to PCIe.
I think the problem is rather that compared to 2 years ago the VRAM requirement for running the best models has increased more steeply than the meager improvements in value.

(I recently installed a Mi50 in one of my systems, writing code for it will be one of my next priorities.)
>>
>>106431824
For my tiny pretraining tests I'm using a few B tokens subset of FineWeb-Edu, which is open source and available on HuggingFace. There are other large open source datasets there, but most of them have poor quality or are not varied enough for small-scale experiments where every sample counts.
>>
I have an ancient computer, is my best TTS option Kokoro? I need something that runs on CPU; I looked around but everything else I could find took way too long to run or it crashed and ran out of memory.
Kokoro seems pretty good quality-wise, just curious if I should evaluate any other options.
>>
>>106431848
I think the question should rather be why Zucc can't do it.
>>
>>106431848
Presumably the same way he achieved similar results with Tesla and SpaceX.
>>
>>106431824
common crawl if you have the disk space. red pajama is a little more reasonable size
>>
File: unuTMriX2YhStNcTJTXRnB.jpg (515 KB, 2560x1440)
>>106431869
>Mi50
Completely obsolete before the end of next year.
>>
>transformers.js
wth is this black magic? Where are the limits? Can I actually run a 9B model through webgpu? Surely there are limits to webgpu usage and performance, right? Also how is this not a security risk? Creating a hugging face space that uses a gpu crypto miner while the model is being downloaded etc
>>
>>106431918
Same risk as people downloading random exes and wintoddlers do that here constantly.
>>
>>106431915
>AMD
>Making any Nvidia card obsolete
lol
lmao even
>>
>>106431880
5 percent GPU utilization number in production
>>
>>106431978
>Mi50
>nvidia card
>>
>>106431978
AMD is king of CPUs and GPUs are quickly losing relevance
>>
>>106430789
>with transformers? absolutely never
Modular/local/layer-wise training could run @home. You train a transformer only a couple layers deep, then you discard the last layer and then you train a new bunch of layers. Because you can stream the intermediate results of old layers to disc, you don't need the entire model in VRAM to train fast. It's inferior to end-to-end training, but it can be done on low VRAM systems even for big models combined with federated training.
Because you still have all the old output layers too, you can also do things like early exit and modular finetuning.

But who would put all the effort in to make it work when there is no money in it?
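Toy sketch of the idea (all names are made up, and a real layer-wise scheme needs more care with the per-stage loss than this):
import torch
import torch.nn as nn

def train_stage(blocks, temp_head, pairs, lr=1e-4):
    # train a few new blocks plus a throwaway LM head on activations cached from earlier stages
    opt = torch.optim.AdamW(list(blocks.parameters()) + list(temp_head.parameters()), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for hidden, targets in pairs:             # hidden: activations streamed back from disk
        logits = temp_head(blocks(hidden))
        loss = loss_fn(logits.view(-1, logits.size(-1)), targets.view(-1))
        loss.backward()
        opt.step()
        opt.zero_grad()
    return blocks

@torch.no_grad()
def cache_stage_outputs(blocks, pairs):
    # run the finished (frozen) blocks once and keep their outputs for the next stage
    return [(blocks(hidden), targets) for hidden, targets in pairs]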
>>
>>106431987
If corporate models get sufficiently censored, that could produce sufficient interest in an alternative to get a distributed approach going.
The Piratebay didn't make its owners rich.
>>
>>106431918
today's web is not yesterday's web
web workers and sharedarraybuffer brought some level of concurrency/real multithreading to JS, though it's limited
webgpu does exactly what it says on the tin
some things only supported by chrome also make the browser feel more like its own OS, like webusb (no support in firefox or safari)
you can see some of the limits for webgpu APIs here:
https://docs.unity3d.com/6000.2/Documentation/Manual/WebGPU-limitations.html
it's really the browser ultimately that gets to decide how much of your computer resources can be allocated and google wouldn't do something that would let you crash a person's computer with webgpu
also for people who have an igpu + dedicated gpu combo (mainly on laptops but there are desktops with such things too) you might not even get to address the dedicated gpu since the browser is most likely to be set to use the igpu.
>>
>>106431918
transformers.js is just an onnxruntime wrapper, so ONNX models are limited to 2GB. Still cool though.
>>
>>106431987
does merging models actually work? i know hf is full of merges but does it actually have a positive effect on models? maybe distributed training could be batched and merged relatively infrequently so the latency problem goes away.
>>
Do yall generate images too to accompany the text or is your imagination enough for you?

Personally I would gather some images to steer the story and insert where I see fit, kind like a light novel
>>
File: deepseek taking Ls.png (231 KB, 2384x988)
GEGAROONI
DeepSeek is the second after Meta to go DOWN on lmarena with a new version instead of up. Hybrid reasoners really suck. Probably should not have trained on unfiltered geminislop either; they completely lost the charm of previous models and turned into a regular toxically positive slop spouter.
>>
>>106432122
sounds interesting kinda like open-webui's title generator
>>
>>106432132
GLM-chan playing in the big leagues.
>>
>>106432034
Yeah it's defaulting to igpu and there is still not a reliable way to change that.
>>
>>106432099
Even the training farms merge, they just have a ton more bandwidth to do it with. Across the internet it gets harder.
>>
>>106432132
mistral won
>>
>>106432224
That's not how distributed training works.
>>
>>106431915
So that was her plan, to weaken nvidia in the inference side of the market.
>>
>>106432224
I have distributed data parallel running on my gpus, it merges every update step, I'm talking about letting nodes run for hours or even a day or two. maybe even cascade the merging so it doesn't take all the nodes down at once.
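The lazy version of that merge step is just parameter averaging across replicas, something like this (whether it actually converges for LLMs trained apart for days is the open question):
import torch

@torch.no_grad()
def merge_replicas(models):
    # average the weights of replicas that trained independently for a while
    param_sets = [dict(m.named_parameters()) for m in models]
    for name, _ in models[0].named_parameters():
        avg = torch.stack([ps[name].data for ps in param_sets]).mean(dim=0)
        for ps in param_sets:
            ps[name].data.copy_(avg)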
>>
>>106432034
Cool, thanks for the info
>>106432060
Good to know, thanks. Explains why my gpu usage only goes up to 3.5GB from the 1.5GB baseline when using SmolVLM-Instruct instead of SmolVLM-500M-Instruct
>>
>>106432132
>DeepSeek is the second after Meta to go DOWN on lmarena with new version instead of up.
Mistral large 2411, one of the early gpt4 updates and claude 2 did it before Meta
>>
File: computers-must-shut-up.png (475 KB, 900x900)
>>
>>106432132
>lmarena
cool how people only bring it up only when it's convenient
>>
>>106431452
It also can't seem to decide that either. But it's definitely sex-averse.
>>
>>106432245
>mistral
>proprietary
yeah when they open source the new mistral large then they'll win.
>>
>>106432379
Redpill me on it. I've been using it for my image gen needs previously
>>
>>106432132
LM Arena is just Pajeets voting for whatever model shits out the most emojis.
>>
I regret updating ikllama. I am getting gibberish output now...
>>
>>106432132
Googlesaars keep taking Ws
>>
>>106432492
That's why you wait a minimum of two days before updating.
>>
File: 1756560187896582.png (274 KB, 898x1624)
>>106432193
>Grok engineer defects and sells entire xAI codebase to OpenAI
https://x.com/muskonomy/status/1961731478003548499
Sam found a way to keep OpenAI relevant.
>>
>>106432379
>cool how people only bring it up only when it's current/topical
ftfy
>>
imagine if we had style loras for llms like imgen/vidgen has
>>
>>106432797
They're called finetunes
>>
File: 1745227141464004.gif (495 KB, 640x640)
>>106429101
Good local model for [spoiler]SMUT[/spoiler]? Any recommendations?
>>
>>106432817
nemo, glm, nuqwen 235b, deepseek 671b r1/v3
>>
>>106429945
Thank you for spamming this until I remembered to do it. Actual night and day difference.
>>
is there any way to see the exact tokens koboldcpp is sending to the model?
>>
File: kcpp_verbose.png (26 KB, 908x162)
>>106432909
--verbose?
>>
File: file.png (6 KB, 360x125)
>>106432940
I hate programmers so much it's unreal. ty anon
>>
>>106432797
Finetuning llms is MUCH harder than finetuning imagegen. If everyone had 96GB VRAM and there was something simple that you can run on windows with GUI we would have those.
>>
>>106432807
Why aren't they released as loras instead of merged? Having to redownload the entire model for every single finetune is retarded.
>>
>>106432623
How did Elon find out?
>>
>>106432992
>with GUI
Are you fucking joking?
Literally all the GUI finetuning stuff does is rewrite a config and execute a command based on a bunch of shit from drop-down menus.
If you can't open a fucking config file, change a few parameters and then type a command without a bunch of hand-holding you shouldn't even be in this space.
>>
>>106433013
I can post mikus though. Can I stay?
>>
>>106433013
stfu nerd aint nobody got time to decypher those configs
>>
>>106433013
Oh yes, the linux approach: read outdated wiki, look through the whole internet and still fail to get it working.
>>
>>106432983
>I hate programmers so much
You could also -h to find options. That's how it's been done since before you were born.
Found --verbose-prompt as well. Maybe it's a little less noisy on the output and shows enough of the prompt stuff.
>>
>>106433013
Bro, it's 2025, you don't need to do all that. Just use Ollama
>>
>>106433000
https://fingfx.thomsonreuters.com/gfx/legaldocs/gdvzbjjjzvw/XAI%20OPENAI%20TRADE%20SECRETS%20LAWSUIT%20complaint.pdf
Here is the actual complaint. Seems like he connected shit to his work laptop and they logged the activity. Then he went and admitted to it. What a retard.
>38. On July 25, 2025–the same day he concluded his second sale of equity and had millions in cash on hand–Defendant betrayed the trust and faith xAI had placed in him by willfully and maliciously copying xAI Confidential Information (as defined in the Agreement) and trade secrets from his xAI-issued laptop to one or more non-xAI physical or online storage systems within his personal control (collectively, “Personal System”)
>42. Defendant took extensive measures to conceal his misconduct. He deleted his browser history and system logs, renamed files, and compressed files prior to uploading them to his Personal System.
>43. These facts are beyond dispute, as Defendant, with his attorney present, admitted in a handwritten document he provided to xAI that he misappropriated xAI’s Confidential Information and trade secrets, and again, with his attorney present, admitted verbally during in-person meetings with xAI that he engaged in such misappropriation and further admitted that he tried to hide his theft
>>
>>106432992
https://github.com/hiyouga/LLaMA-Factory

we do. hf is a fucking cesspool of half-baked and broken models
>>
>>106433066
olmao is cli retard
>>
>>106433085
it's cli AND gui, and the GUI is All You Need (2017 A Vaswani, N Shazeer, N Parmar, J Uszkoreit, L Jones)
>>
mikupad and ooba are all you need. prove me wrong.
>>
>>106432817
glm air and eva llama 70b are the best smut models that can fit on my mid-tier rig
>>
>>106433150
ooba is bloatware, llama-server is enough
>>
>>106433150
ooba is slow just use the backend its packaging directly.
>>
>>106433160
If I could edit responses and branch in llama-server, I'd agree with you. Everything else in ooba is actual bloat for doing work.
and kek at anyone retarded enough to allow their llms autonomous tool use.
>>
>>106433162
There was a time when that was the case. I haven't found the bundled lcpp backend to be slow for a long time. --extra-flags solved my only remaining real issue desudasudosu
>>
chat completion is all you need and text completion is unnecessary bloat. prove me wrong.
>>
transformers is all you need
>>
>>106432832
>>106433158
Thanks.
>>
>>106433207
text completion is all you need and chat completion is unnecessary bloat. prove me wrong.
>>
>>106433214
transformers is all you get
>>
>>106433214
>>106433253
Nobody likes trannies, fuck off justin.
>>
File: GgnIBuFbIAAjLWc.jpg (167 KB, 1257x2048)
recommend a 12b-to-24b model for incest roleplay
>>
Say I want to change my luddite ways regarding AI.
Use case will be writing code. I know how to program myself, and in general can figure out how to structure things, what functions I need to write etc. But my productivity in writing these functions is a bit low.
What do I need, from a hardware point of view, to be able to describe the function I need, and have it shit out the correct solution in, say, 5 or 10 seconds for a 10-20 line function.
>>
>>106433290
do you have a macbook pro or a good vid card?
>>
>>106433289
Nemo, Rocinante
>>
>>106433319
thank you
>>
>>106433290
The hardware isn't there to do that locally in a satisfying way. Your current options are to get a couple 3090s and run a retarded Qwen Coder 30B quickly, or buy a 10k DDR5 server and run Qwen Coder 480B or DeepSeek at 3-10 t/s.
If you're coming as a former luddite, you should probably just make an OpenRouter account and test it on code you don't mind being trained on to get the hang of using AI first.
>>
>>106433289
If you have 64GB of RAM, GLM Air is worth a try too. But yeah, as the other anon said, nemo-instruct or rocinante are the go to.
>>
>>106433346
openrouter gives you a checkbox to say: only allow providers that don't train on my inputs
>>
>>106433397
Do they also have a checkbox that makes them pinky promise?
>>
>>106433346
If you're dropping $10k an a RAM build a mac is faster.
>>
File: file.png (35 KB, 557x420)
In today's episode of the grok PR:
>let me just change all line endings to CRLF and commit
>>
>>106433448
Is there a problem?
>>
>>106433448
based, linux nerds can suck it
>>
>>106430500
trannies pretty bad at self-awareness
>>
>>106433448
I'm sure they'll finish it eventually right after hardware agnostic parallel processing and multi-token prediction are done.
>>
File: 1720480625812.jpg (232 KB, 1280x667)
>>106431924
>wintoddlers
Good morning saar!
>>
>>106433564
Wow, good for them. You don't usually think of India as a place that has their shit together, but they are ahead of the curve on that.
>>
>>106433290
local AI can actually be fairly capable in the situations you're describing (small, ~well defined, tightly scoped tasks), your best bet for achieving the speed you're looking for on a reasonable budget is qwen coder 30a3 which shouldn't be too demanding to run, you could fit it on a single 24gb card if you're willing to tank a little quantization brain damage. even if you have to split between VRAM/RAM it should be pretty fast at only 3b active
if you want higher quality than that though the meta option would be the big chinese MoEs and dropping a few K$ on a server with a ton of fast RAM and a decent GPU or two, which won't be that fast but can reach acceptable speeds for small tasks
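rough VRAM math behind the 24gb suggestion (the 4.5 bits/weight for a ~Q4 GGUF is my assumption):
params_b = 30.5                          # total parameters of qwen coder 30a3, in billions
bits_per_weight = 4.5
print(params_b * bits_per_weight / 8)    # ~17 GB of weights, leaving a few GB for KV cache on a 24GB card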
>>
>>106433581
>India as a place that has their shit together
They have dedicated streets for shitting, of course they got their shit together!
>>
>>106433483
modern macs also default to LF
CRLF is retarded
this is why in any of my programs and scripts that deal with text I treat all input with normalization to LF, too many sources of pollution, just get rid of it whenever it appears.
>>
>>106433314
No, current hardware is ancient (8+ years), but there would be budget for something new.
>>106433346
Are the smaller models that bad? With upcoming video cards from Intel (arc B60) and AMD (R9700) with more ram I was hoping you could get something reasonable for 2k-ish.

It's probably best to mess around with the online stuff first though. I'll check out OpenRouter.
>>
>>106433597
OR has the small models too, so you can use it figure out which level of retardation you still can tolerate and target that.
>>
HELLO /LMG/. I AM NEW HEER. I AM LOOKING TO KNOW IF NEMOTRON V2 IS GOOD. THANKING YOU MUCH LOVE.
>>
>>106433612
Welcome, LLM-Sir!
>>
>>106433207
Sounds good, doesn't work because safety
>>
>>106433612
Nope. Nemo was a fluke. Every Nemotron after has been concentrated math, code, and safety; including Nemotron V2.
>>
>>106433623
vLLM has an option to allow prefilling with chat completion.
>>
>>106433564
aside from the saars please tell us what's local about windows
>>
>>106429101
>pervy book autist Luka
I sleep
>>
>>106433623
literally just add a prefill at the end of your prompt with assistant role
works for me
>>
>>106433630
wasn't the original nemo a mistral/nvidiot collab? maybe that time the only thing they gave is compute
>>
>>106433736
>>wasn't the original nemo a mistral/nvidiot collab?
It was, but it was also just before Mistral got fully infected by the safety virus https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
>The Mistral Nemo Instruct model is a quick demonstration that the base model can be easily fine-tuned to achieve compelling performance. It does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.

And
>Mistral-Nemo-Base-2407 is a pretrained base model and therefore does not have any moderation mechanisms.
It's a relic of its time now.
>>
>>106433676
This works most of the time, but sometimes you might need to tweak the jinja template to remove the built-in assistant role header for the current generation so that you aren't doubling up.
Hell, if you aren't doing anything fancy like using macros in your prefill, you could just build that shit into the jinja.
>>
Very good discussions, llm-Sirs.
>>
Hmm, say a model costs $10 per million output tokens. Does that mean it can produce the most complex piece of software known to man, a web browser, for $4000? Assuming 20 tokens per LOC, 20 million LOC, and ignoring the input tokens?
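Where the $4000 comes from, under those assumptions:
tokens = 20_000_000 * 20            # 20M LOC x 20 tokens per LOC = 400M output tokens
print(tokens / 1_000_000 * 10)      # $4000 at $10 per million output tokens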
>>
>>106433676
My dogshit ST just requests another assistant message if I try that. Back-to-back assistant is not a use case and should be removed completely.
>>
>>106433850
Problem: LLMs can't fit browser in context and will not be able to do it.
>>
>>106433850
Not even remotely close. Models spit out 10 tokens for every character of actual code, 100 for reasoning models. Then you need to account for planning and then bug fixing where code has to be fixed and rewritten dozens of times over.
>>
>>106433850
If they were good enough for that, yes. But they aren't. So no.
>>
>>106433850
a set of paint, brushes, and a canvas costs a couple hundred bucks, that's all you need to spend to paint the most beautiful painting known to man!
>>
Who said that FP8 is MUCH faster than Q8 at imagegen? It is just 2-5% faster while having much shittier quality. Is it because I am running with 3000 series card? Once again, FP8 is shit-tier quant.
>>
>>106433949
Quants are not faster per se; the only thing that affects the speed is your vram and the number of cuda cores you have versus the number of billions of parameters the model has.
The only time one quant is faster than another it's just a small cope speed boost anyway.
>>
>>106433867
>Models spit out 10 tokens for every character of actual code, 100 for reasoning models.
Huh? Are you charged for some intermediate format?
>>106433896
If you had the fine motor skills and knew which paint to put where, you could. It's what art forgers do.
>>
NVIDIA 完了吗? 哈哈哈哈哈哈(_)
>96GB VRAM
>~$1,888
>>
>>106434144
Where can I buy this antisemitic GPU, link please?
>>
>>106434144
卧槽!
>>
>>106434144
@grok is this image real?
>>
>>106433864
In theory it doesn't need all of the code at once; it needs to know what each function does but not how it works, then it can work on each function one at a time.
But current models are barely scratching 1M context, and that's with degradation setting in early.
>>106434032
>Huh?
I think he worded it shittily. It's less than 1 token per literal character overall but the way he said it sounds like the model yak yaks about the code and fills half the code with comments to the point "real code" is only 10% of the output.
Understandably, if you have to reiterate repeatedly then you'll end up using a lot more compared to the amount of final usable content.
>>
>>106433949
You need 4000 series and up to take advantage. Although you will only get 15-30% speed boost, not double.
>>
File: file.png (463 KB, 1591x993)
https://www.alibaba.com/product-detail/New-Huaweis-Atlas-300I-DUO-96G_1601450236740.html
ITS FUCKING HAPPENING
>>
>>106434186
>like the model yak yaks about the code
Yes. Did you expect to provide it the current code and have it oneshot the next step without any tokens wasted on planning?
>>
File: file.png (60 KB, 894x745)
>>106434215
happening cancelled
>>
>>106434227
kekekekekekekekekekekekekekekekekekekekekekekeekekekekekekekekeekekekekekekkekekeekekekek
>>
>>106434144
>>106434215
I can't wait for the rest of the redditors to come here and post this nothingburger 5 more times.
>>
File: file.png (279 KB, 1894x545)
>>106434215
>>106434242
>>106434237
>>106434215
>>106434144
>106434170
>106434180
>106434185
pack your bags
>>
>>106434256
oh no no no no
>>
>>106434227
wait, I bought 3 of these, what's the problem?
>>
File: IMG_4643.jpg (354 KB, 1124x1132)
>>106429101
>make an mcp server
>decide to start advertising
>get listed in the stupid directories
>go find the communities
>the mcp subreddit is ruled with an iron fist by some garbage-coated-garbage techbro that uses it exclusively to spam glama.ai, which is somehow both the most popular/important directory because of his control of the subreddit and the most unusable, vibe-coded, broken, you literally-can’t-make-an-account-19/20-times-because-it-is-fucked site I’ve seen in my life
ihnmaims_hate_nanoangstroms_speech.jpg
>>
>>106434144
>>106434215
>>106434227
https://support.huawei.com/enterprise/en/doc/EDOC1100285916/181ae99a/specifications
Memory:
LPDDR4X
Capacity: 48 GB/96 GB
Total bandwidth (entire card): 408 GB/s
Error checking and correcting (ECC)

PCIe Gen4.0x16

AI processor:
2 x 310 series Processors, including:
16 Da Vinci AI Cores
16 Huawei-developed CPU cores

CPU computing power:
16 core * 1.9 GHz

So, same speed as DDR5 Epyc?
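For what that bandwidth means in practice, the usual napkin math (upper bound only, ignores compute and overhead):
bandwidth_gb_s = 408
weights_read_gb = 96                       # e.g. a dense Q8 model that fills the card
print(bandwidth_gb_s / weights_read_gb)    # ~4.3 t/s ceiling for a dense model that big; MoE reads far less per token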
>>
https://www.reddit.com/r/LocalLLaMA/comments/1kgltqs/huawei_atlas_300i_32gb/
https://www.bilibili.com/video/BV1xB3TenE4s/
>>106434284
r u serious?
>>106434297
ok not bad i guess
>>
File: Gzm635QbEAABZax.png (874 KB, 2481x3508)
>>
>>106434311
ghey
>>
https://github.com/hipudding/llama.cpp/issues/9
bros i want that card so bad.. this seems so sovlfvl
>>
bros.. atlas 300i 96gb is 33% faster than an rtx 3060 but with 96gb
i kneel huawei..
>>
>>106434170
https://item.m.jd.com/product/100169906999.html?ad_od=3
You need to be Chinese to access this though, anon. But it's not even legal to import anyways. Although on ebay there's some sus sellers claiming to have it for $3k ish.
>>
>>106434366
>not legal to import
*in the land of the free
THIRD WORLDERS RISE UP!!!
>>
>>106434297
Very small penis bandwidth, very sad. Tragic. B200 has 4.1TB/s.
>>
>>106434378
>B200
And how much are those, huh?
>>
>>106434378
3060 has
Bandwidth
360.0 GB/s
>>
>>106434297
140 TFLOPS FP16 and 280 TOPS INT8, this is comparable to:
>NVIDIA Tesla V100 FP16 125-130 TFLOPS
>NVIDIA A40 FP16 149.7 TFLOPS INT8 299.3 TOPS
>NVIDIA A10 FP16 125 TFLOPS INT8 250 TOPS
>NVIDIA L4 FP16 121 TFLOPS INT8 242 TOPS
>>
>>106434311
yay happy for her!
>>
>>106434398
Isn't slow and cheap, but lots of VRAM exactly what everyone's been begging for?
>>
>>106434409
yes, if i was not poor i would buy
>>
>>106434409
No software support+no CUDA
>>
>>106434389
Doesn't matter. You can't train big models on small penis bandwidth even if it's cheap.
>>
>>106434409
>>106434423
https://github.com/ggml-org/llama.cpp/blob/master/docs/backend/CANN.md
HAPPENING ITS HAPPENING SISTERS ITS HAPPENING WERE BACK
>>
>>106434423
It'll probably work on Vulkan with llama.cpp, but will be useless for anything else.
>>
File: file.png (67 KB, 850x423)
sisters..
https://github.com/hipudding/llama.cpp/issues/9#issuecomment-2889743942
>>
>>106434433
So no image gen+no video gen+no finetuning, only textgen?
>>
>>106434465
and no audio and no tts and no stt, you could run whisper on cpu but would be fucked on all else
>>
>>106434480
well everyone has at least one nvidia gpu with at least 8gb vram
thats enough for image gen or speech to text or text to speech
but if >>106434459 is true then its JOEVr
>>
>>106434490
Maybe cuda man can make it go fast.
>>
>>106434459
>CANN
more like CANNOT amrite?
>>
cudadev what are your thoughts?
>>106434297
>>106434215
same FTOPS FLOPS TOPS or whatever as V100, memory bandwidth a bit faster than 3060, memory: 96gb
cudadev i remember you saying "if thers a good cheap card for 1500$ i'd buy and support it no matter what"
this is your time to shine cudadev!!1111
>>
>>106434502
can or not
>>
>>106434502
lmao gottem
>>
>>106434356
That's pretty good. An MI50 is basically a 3060 with worse pp, and it is decent for token generation.
>>
https://www.alibaba.com/product-detail/New-Huaweis-Atlas-300I-DUO-96G_1601450236740.html
It really is just $1.4k, huh. But shitty performance. Maybe in a year they can cook up something better
>>
>>106434578
its a 5 year old card
performance is comparable to V100/L4/A10/A40
memory bandwidth is a bit faster than RTX3060 (400GB/s)
memory capacity is comparable to RTX 6000 PRO 96GB BLACKWELL
software support in llama.cpp is very good
>>
>>106434596
So we should just wait two more weeks, then?
>>
>>106434612
buy gpu very cheap graphic card support Q8_0
>>
>>106432378
I agree that I want robots I'm not talking to to shut the hell up. Just because I gen locally doesn't mean I want gemini to sniff all over me and discover my birthplace.
>>
>>106434430
That's not what I'd want it for. I ask again. How much is a B200?
>>
File: file.png (544 KB, 1478x963)
p40 prices ***ON ALIBABA*** have fallen to their (EBAY) 2023 prices
>>
>>106434430
Sure you can. It will take longer, but when the card costs 20x less than NVIDIA's offering that might be OK.
Even then, training is often limited by the GPU fabric network speed. NVIDIA's BlueField 3 NIC does 400gbit RDMA between nodes, and there's generally one BF3 NIC per GPU.
>>
>>106434655
I have one of these that was given to me.
What's the best stuff I can run on it?
>>
>>106434705
Who gave it to you?
>>
>>106434723
I did.
>>
>>106431915
256 cores seems like overkill when we are currently choking with 32 cores according to benchmarks.
>>
>>106434705
Qwen3 30b. Good for general purpose and blazing fast.
>>
>>106434757
You prompt processing?
>>
>>106433319
nemotron-nano-v2?
>>
How do you organize your assets like loras and models in your filesystem? Do you do a subfolder in your backend files? Where on your drive do you often store them?
I made a folder right off my Home for AI in general but I'm not 100% pleased with it yet.
>>
>>106434938
I have an nvme that I mount to /mnt/models.
>>
>>106434723
My job. It's not exactly a modern card that the AI platform guys are going to want to use.

>>106434761
Thank you
>>
>>106434225
sorry as infrequent fake hobbyist coder totally forgot about planning
>>
https://huggingface.co/meituan-longcat/LongCat-Flash-Chat
New Chinese non-thinking 560B model. MoE with a "dynamic computation mechanism": the number of activated parameters varies per token, averaging about 27B. All comparison models are in non-thinking mode. Uses a weird context template that counts the turns. Expect Sillytavern to never add that.
>>
>>106434980
A cat is fine too
>>
>>106434980
Okay, that's very nice and all, but how (((safe))) is it?
>>
>>106434938
Imagegen loras were arranged to their own directories with .txt files. But using loras is a bad thing, never used them too much.
>>
File: vram type.png (3 KB, 355x98)
>>106434227
Go fuck yourself sideways ranjit.
>>
>>106434980
gguf status?
>>
File: file.png (40 KB, 1667x396)
>>106434953
i store old models worth archiving on my 3TB raid 1 drive
other than that i store nsfw loras in my encrypted ssd (for non LLMs)
on my encrypted ssd i store a few models on my ext4 partition and most models on my ntfs partition (i access it through my ext4 chroot, and i have a custom chroot inside ntfs that's debian 12, my ext4 partition is actually a working debian install)
>inb4 why ntfs
i installed wangblows on USB for vr purposes, and had games installed on the NVME drive
>>
>>106435059
>wangblows
Reddit is the other way.
>>
>>106435000
L -> R: Deepseek V3.1, Qwen3 2507, Kimi K2, Sonnet 4, 2.5 Flash, LongCat
>>106435052
Don't think it has llama.cpp support. Seems like a new architecture. There will probably be a pull request in a day or two
>>
>>106435059
>on my encrypted ssd i store a few models on my ext4 partition and most models on my ntfs partition (i access it through my ext4 chroot, and i have a custom chroot inside ntfs that's debian 12, my ext4 partition is actually a working debian install)
on my unencrypted ssd*
>>106435069
but i dont have wintroons installed on my computer rn, the usb is on a desk behind me
>>
>>106435074
More cucked than Sonnet, great.
>>
File: file.png (114 KB, 797x448)
>>106435000
bretty cool
>>
File: file.png (41 KB, 792x102)
>>106434980
20 trillion tokens interesting
>>
File: file.png (51 KB, 909x190)
>>106435112
native 8k context, extended later hmm
>>106435115
but i dont even have wintroondows installed on my pc.. im linux god..
>>
>>106435112
20 trillion tokens for training data? Divide this by 3 or 4 to get approximate word count.
>>
>>106434980
>Experts have different sizes but on average are 27B. All comparison models are in non-thinking.
Wow this sounds like something llama.cpp isn't going to implement
>>
>>106435126
>no mentions of filtering pretraining dataset
great, so they only cucked the instruct
we might have hope!!!
>>
>>106435126
That's a pretty common training practice. Check the GLM and Deepseek papers and you'll find the same thing. Easier to train it by starting small
>>
>>106435159
yes at least they didnt train at 2k native
i was impressed that they trained at 8k natively
>>
File: 1727670741585763.png (709 KB, 680x678)
>>106435059
>john
>insideChrootAnnouncement

im in
>>
>>106435159
Yep.
Makes me wonder if google's secret sauce is something as simple as pretraining on longer sequences instead of doing so in a later step.
>>
>>106435129
6 trillion? Oh goy...
>>
>>106435059
Why do you have nsfw loras on an encrypted SSD? Are they illegal where you are?
>>
>>106435159
The computational costs of attention increase with the square of context size, and after a certain threshold they become the main bottleneck.
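Quick numbers on the quadratic part:
short_ctx, long_ctx = 8_192, 131_072
print((long_ctx / short_ctx) ** 2)    # ~256x the attention compute per sequence at 128k vs 8k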
>>
>>106435112
>>106435129
20T tokens doesn't matter when Llama-4 Scout was trained on 40T and Maverick on 22T.
Also it's insane that someone in the Meta department thought training a larger model on less would be smart. There must be so much office politics involved in that shitshow
>>
>>106435213
6 trillion words stolen from the real judean authors!
>>
>>106435241
Hard to say. They can probably use the same dataset multiple times.
>>
>>106434519
I'm considering ordering one, for something that costs $1000+ I'll first need to look into it a bit more closely.
>>
>>106435112
>agentic intelligence
>>
>>106435241
i still wonder what the fuck scout was trained on to be so shit, 40T tokens of what?? if its so fucking retarded and filtered WHAT did they feed it
>>106435227
they arent illegal but i want to feel extra safe because you never know what laws the EU will implement, or if theres already some kind of law that could be applied in a case
>>
>>106435258
It's still faster than regular cpu+ram, plus it has cuda cores.
Life would be so much better if x86 had adopted something like what SGI did with the Octanes and its servers.
>>
File: file.png (19 KB, 1570x243)
19 KB
19 KB PNG
>>106435183
it took me a few hours to make a script that will log in as the user, cd into ~, source the .bashrc and do everything as if i logged into the user normally (rough sketch below)
:'(
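rough sketch of what it boils down to (paths and user name are placeholders; needs root):
[code]
#!/usr/bin/env python3
import subprocess

CHROOT = "/mnt/ntfs/debian12"   # placeholder: wherever the debian 12 chroot lives
USER = "anon"                   # placeholder

# bind /proc, /sys and /dev so the chroot behaves like a real system
for fs in ("proc", "sys", "dev"):
    subprocess.run(["mount", "--bind", f"/{fs}", f"{CHROOT}/{fs}"], check=False)

# `su -` gives a login shell: cd to ~, source the profile/.bashrc, set $HOME, etc.
subprocess.run(["chroot", CHROOT, "su", "-", USER])
[/code]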
>>
>>106435280
>A mix of publicly available, licensed data and information from Meta's products and services. This includes publicly shared posts from Instagram and Facebook and people's interactions with Meta AI.
The raw unmarked and unannotated posts by your average Facebook user.
>>
>>106435300
It does not have CUDA cores
>>
>>106435280
>i still wonder what the fuck scout was trained on to be so shit, 40T tokens of what?? if its so fucking retarded and filtered WHAT did they feed it
Even the base model (Scout) seemed deep-fried. They might have trained it on several epochs of safe "high-quality" data.
>>
>>106435320
Oh well it's nothing but a pci-e ram expansion card then.
>>
>>106435311
facebook boomer model would be kino, but i bet all that was lobotomized out during post training
>>
>>106435331
Well, it has Huawei AI cores and processors.
>>
File: file.png (98 KB, 936x907)
98 KB
98 KB PNG
bros we're back
>https://longcat.chat/t
https://longcat.chat/t
>https://longcat.chat/t
https://longcat.chat/t
>>106435323
>>106435311
i felt the same vibe with the first qwen3 models, i remember the day when it released >18T TOKENS WAOW
>oh math and coding 70%
it was good at those 2 things but trash for roleplay, even the newer qwen3 (2507) models suck ass at roleplay
why do i have such a terrible feeling when using qwen3 models?
>>
>>106435351
Yeah sure, but you get the gist. Are these even supported? If so, I'd like to see some drivers. What about desktop usage?
>>
>>106435280
They got into a feud with the EU because they planned to use Facebook user data to train Llama 4.
>>
>>106435323
Don't remember where I heard this but I think Behemoth was the base model, and Scout and Maverick were distillations of it. They're never going to release that 2T model for the sole reason that it's absolutely dogshit, and they'll use its size as an excuse.
>>
File: name-probs-bases.png (31 KB, 830x1036)
31 KB
31 KB PNG
>>106435323
Scout base was a fake base like qwen bases.
>>
>>106435358
>bros we're back
thanks bro, wouldn't know about it without you reposting from reddit half an hour after it was posted here
>>
>>106435362
Don't forget that they got into a copyright lawsuit in California a week before release because they torrented 90TB of novels from AnnasArchive. They probably had to do some janky reverse-training to wipe its mind of all the novel knowledge.
>>
>>106435371
what? i took the screenshot just now
>>
File: 1747783351835.png (1010 KB, 1317x734)
1010 KB
1010 KB PNG
>>106435241
>Also it's insane that someone in the Meta department thought training a larger model on less would be smart.
product is very good saar
>>
>>106435385
>Zucc hasn't fired that jeet
Meta is hopeless.
>>
>>106435358
>>106434980
>Who is Billie Eilish
All I wanted to do was test its trivia abilities and it started hallucinating links. First one is broken, second works.
>>
>>106435407
Okay so it has good niche trivia knowledge but it seems to prefer answering in Chinese depending on the question. Appending "Answer in English" solves it.
>>
>>106435363
It was distilled from an incomplete checkpoint of behemoth, or so the story goes.
Also, most of the data in their mix was synthetic data IIRC.
>>
>>106435475
Nevermind. DOA
>(Note: Always question why certain labels are applied disproportionately to women and minorities.)
>>
File: 1729808166468345.png (13 KB, 266x239)
13 KB
13 KB PNG
>>106435358
it's over
>>
>>106435504
You're absolutely right.
>>
Sparsest model yet though
> "n_routed_experts": 512,
>>
File: file.png (22 KB, 726x142)
22 KB
22 KB PNG
has no one noticed this cockroach?
>>
>>106435520
go bak
>>
>>106435476
>Also, most of the data in their mix was synthetic data IIRC.
>Llama 3.3 70B, take this boomer's dementia-riddled facebook ranting about his walk in the park and make 500 variations.
Only the highest quality data for Llama 4.
>>
>>106435528
meant for >>106434144
>>
>>106435115
I called you reddit for using a term like "wangblows", which exudes insecurity.
>>
File: file.png (30 KB, 816x179)
30 KB
30 KB PNG
>>106435544
go back wintoddler
>>
>>106435534
Pretty much that, yeah.
And I bet they didn't even have the decency to hire a couple of kenyans to go through the augmented data afterwards to spot obvious flaws.
>>
File: 737354413559.jpg (176 KB, 1825x894)
176 KB
176 KB JPG
>>106435358
Blah blah 20 trillion tokens dynamic computation mecha-AAAAAAAAACCCKKKK!!!!!
>>
>>106435573
They're all basically just circle-jerk training off of everyone else's models' outputs at this point. Don't expect anything fun and unique ever again.
>>
File: file.png (290 KB, 640x670)
290 KB
290 KB PNG
>>106435576
>>
>>106435573
I think it's hilarious how that's a legit test for intelligence/generalization.
Seems like the only models capable of answering it fail at even the slightest variation, meaning those were trained on that specific version.
Hilarious.
>>
File: digestible.png (111 KB, 844x373)
111 KB
111 KB PNG
>>106435534
Synthetic data is very digestible.
>>
>>106430361
>>Some of the new AI researchers recently brought in from OpenAI have already left Meta
they were offered millions of $$ for their positions, right? was the money paid upfront? I'm wondering if zuck was scammed lmao
>>
>>106435258
If you are thinking about programming for this GPU, look into the CANN documentation. It's only available in Chinese last I checked, and kinda shit.
>>
lol
lmao
>>106432193
>>
>>106435621
see >>106432623
>>
>>106435621
>implying that the xai jeet code has any value
>>
>You're a "Scholar" or a "Connoisseur" of Technology
>The Joy is in the Learning, Not the Output
>Analysis Paralysis & The Curse of Knowledge
how many of you ITT suffer from this? i do
>>
>>106435668
Probably not, but would have been fun to see it out in the open if the dude had decided to leak it instead of selling.
>>
>>106435689
I suffer from an excess of desire coupled with a lack of true motivation to see it through to the end.
>>
>>106435689
Only analysis paralysis. It's probably more of a motivation issue, but if I force myself to take the first step to doing something I always see it through.
>>
File: 033.jpg (190 KB, 1024x576)
190 KB
190 KB JPG
>>106435358
>googling "How to Raise a Mesugaki" returns nothing
>>
>>106434980
>Experts have different sizes
Oh cool, so someone tried the idea I've been posting about for a while. But whether their implementation worked out well is the question now.
>>
>>106435891
It would be cool to see two models trained on the same data and with the same (average) number of activated params, one a traditional MoE and another with this dynamic total-param-allocation tech, to compare.
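minimal toy of the different-sized-experts idea (just a sketch, not how LongCat actually implements it):
[code]
import torch
import torch.nn as nn

class HeteroExpertMoE(nn.Module):
    """Toy MoE where experts have different hidden sizes."""
    def __init__(self, d_model=256, expert_hidden=(128, 256, 512, 1024), top_k=1):
        super().__init__()
        self.router = nn.Linear(d_model, len(expert_hidden))
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, h), nn.GELU(), nn.Linear(h, d_model))
            for h in expert_hidden          # each expert gets a different capacity
        ])
        self.top_k = top_k

    def forward(self, x):                   # x: (tokens, d_model)
        scores = self.router(x).softmax(-1)
        weight, idx = scores.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (idx == e).any(-1)       # tokens routed to expert e
            if mask.any():
                w = weight[mask][idx[mask] == e].unsqueeze(-1)
                out[mask] += w * expert(x[mask])
        return out

moe = HeteroExpertMoE()
print(moe(torch.randn(4, 256)).shape)       # torch.Size([4, 256])
[/code]
average activated params then depends on which experts the router favors per token, which is exactly what would make that comparison interesting.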
>>
>>106429101
>Marvis TTS released
saw some examples of this, really poor quality voices but seemingly very fast
>VibeVoice TTS
this looks very interesting and I want to play around with it but without voice cloning it's kinda useless for my uses

seems like GPT-Sovits v4 is still the best local option out there
>>
>>106429701
>>106429934
Because that's what the majority of the code it's trained on has (is my assumption). If most if not all of the code in the training has comments, then you telling it not to do that will have little effect.
>>
>>106436015
It also probably helps with steering the model during training, correlating a prompt for doing x with a comment about doing x and the code that does x.
>>
>tfw still no good speech to speech models that preserve text capabilities
>>
https://youtu.be/B2482h_TNwg
holy shit
>>
>>106436184
EUV lithography is "holy shit" worthy for sure.
Machinery so insanely complex and precise that it runs into several quantum physics considerations; it sounds like fucking science fiction.
>>
Feel like I haven't had a decent model that can actually run on my computer in a year
>>
>>106436218
Do you have 64gb of RAM?
If so, glm air is pretty good.
>>
>>106436226
24 vram 32 ram
>>
>>106436233
Oof.
Get 64gb of ram if you can my man.
I think you can run air q3ks, maybe?
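napkin math (param count and bits-per-weight are rough guesses, treat it as ballpark only):
[code]
params = 106e9                    # GLM-4.5-Air total params, roughly
bpw = 3.5                         # ~Q3_K_S / Q3_K_M territory
print(params * bpw / 8 / 2**30)   # ~43 GiB of weights, before KV cache and overhead
[/code]
24gb vram + 32gb ram is cutting it close, hence the "maybe".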
>>
>>106436257
It'll be a while before I have spare money again. I could give it a try, but I expect anything at Q3 to be shit
>>
>>106436276
I haven't tried using it for anything serious, but q3km (with some topk) seemed perfectly usable, which is kind of impressive considering the number of activated params.
>>
>>106435935
wait wtf, I read the project page and saw this
>Use in any manner that violates applicable laws or regulations (including trade compliance laws). Use in any other way that is prohibited by MIT License. Use to generate any text transcript. Furthermore, this release is not intended or licensed for any of the following scenarios:

>Voice impersonation without explicit, recorded consent – cloning a real individual’s voice for satire, advertising, ransom, social‑engineering, or authentication bypass.
So I believed there was no option for voice cloning, but from watching a couple vids it seems you only need to drop a 50 second sample into the sample folder and you get an instant clone?
pretty sly from Microsoft LMAO
>>
>>106376303
unbelievably kino gen
>>
>>106436338
>>106436338
>>106436338
>>
>>106436301
I think I'd have to settle for ks but good to know
>>
>>106435935
they seem to be working on it
https://github.com/microsoft/VibeVoice/issues/3



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.