/g/ - Technology

File: 1289005993449.jpg (195 KB, 800x1100)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108186120 & >>108175259

►News
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108186120

--Papers:
>108194842
--Porting IQ*_K and IQ*_KS quants from ik_llama.cpp:
>108186634 >108186693 >108186827 >108186850 >108186897 >108186914 >108186933 >108186936 >108186941 >108186989 >108187058 >108187118 >108187178 >108187221 >108187228 >108187242 >108186814 >108186873 >108192527 >108192588 >108192730 >108192752 >108192773 >108192835
--Debating AI memory and writing style solutions:
>108190281 >108190382 >108190418 >108190459 >108191423 >108190475 >108191738
--Debating em dashes as AI writing indicator:
>108187936 >108187995 >108188031 >108188139 >108188143 >108188163 >108188165 >108188210 >108188294 >108188720
--Extracting LoRAs from finetuned models using MergeKit:
>108188651 >108188666 >108188671 >108188685 >108188728 >108188763 >108188832 >108189163 >108188772 >108189938
--Jetson Orin Nano LLM struggles, domain-specific models debated:
>108192160 >108192167 >108192227 >108192251 >108192283 >108192321 >108192371 >108192382
--Fine-tuning advancements and niche dataset challenges:
>108192287 >108192339 >108192388 >108192340
--Claude Code agents making unauthorized changes and self-responding:
>108194060 >108194065
--Minimax M2.5 outperforms GPT-OSS-120B and Qwen-Coder-Next in server automation task:
>108193519 >108193621
--Qwen3.5's self-monitoring reasoning process:
>108192528 >108192541 >108192595
--LLM Arena adds open-source filter but lacks parameter size options:
>108191494 >108191596
--Qwen3.5 non-thinking syntax context handling:
>108188539 >108188567 >108188668 >108188710 >108188659
--Google's timesfm-2.5-200m-transformers model release:
>108192228 >108192238 >108192308
--mlx-lm overtaking llama.cpp for Mac Studio model support:
>108188625
--Rin and Miku (free space):
>108187257 >108187262 >108187464 >108187541 >108187623 >108188448 >108190034 >108192219 >108193602

►Recent Highlight Posts from the Previous Thread: >>108186122

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Has anyone tested how much (or if) RAG can decrease the perplexity of a completion model?
>>
>>108194930
Decrease the perplexity relative to what? To the contents you just shoved in the context? Yes. Presumably it would.
Are you sure you understand what you're asking?
>>
Last time I was here, people were still disappointed with the Llama fiasco; what has changed since then?
What's the meta for small models (5B-20B)?
>>
lol finetrooners are funny
https://blog.dphn.ai/xgen-rl/
was wondering what some of the early niggas were doing these days now that the only living, staying relic has been drummer
>What’s more, we found the model to be MORE censored than the base model itself, achieving a much higher rate of refusal on our benchmark, which generates replies from the model in both multi-turn and single-turn scenarios and evaluates responses via a classifier.
the dolphin guy puts so much "effort" into his "uncensors" when a prefill on a regular model does a far better job than he ever did (and for promptlets, there's always heretic, which is.. decent)
>>
>>108194939
To the text being completed, given some standard (constant) generic knowledge base or search engine.
Suppose you have some text. You could concatenate the actual text to the information retrieved by RAG and compare.
So the question is how does perplexity(text, probs(text), (0,len(text))) compare to perplexity(concat(RAG(text), text), probs(concat(RAG(text),text)), (len(RAG(text)),len(RAG(text))+len(text)))
>>
>>108195001
ai psychosis
>>
GLAWKSS
>>
>>108194991
Same as always, Gemma 3 or Mistral.
>>
>>108195001
Sigh... if you shove information you know to be correct in the context, the language model is more likely to output correct information. All models are completion models.
>>
Gemma 3+1 soon
>>
>>108195034
Gemma3+mc^2
>>
>>108194993
I wouldn't be surprised if western companies explicitly train their models to detect disruption from tuning/adapters.
>>
>>108194845
*tickle tickle*
>>
File: completion-model-rag.png (462 KB, 1360x2348)
>>108195012
I'm sorry for your developmental difficulties dude.
>>
>machine, confirm my beliefs!
>>
>>108195031
Not necessarily. It could cause the model to imitate the style of the reference documents rather than the actual text. Or it could decrease the performance of the model due to longer context.
>>
>>108195061
ai psychosis
>>
>>108195072
>>108195067
You are literally dumber than said machine.
>>
said the retard asking an llm
>>
>>108195076
ai psychosis
>>
>>108195061
Anon. I'm the first User 2 (>>108194939). Someone else is the second.
Neither the model nor you figured that out.
>>108195071
>could could could
>>108195031
>Sigh... if you shove information you know to be correct in the context, the language model is more likely to output correct information. All models are completion models.
still stands.
>>
>>108195077
I already knew the answer. I just asked the machine to explain to you why you were wrong for your own benefit.
>>
>>108195092
ai psychosis
>>
>>108195084
And I had to spell it out for you because you were too dumb to understand what I was even asking. So I guess we're even.
>>
>>108195100
nigger
>>
>>108195103
I said the same thing, rephrased, in the first post.
See >>108194939
>To the contents you just shoved in the context? Yes. Presumably it would.
>>
>>108195001
>To the text being completed
The perplexity of the text being completed relative to the text being completed is zero.
>>
>>108195113
Presumably "the contents you just shoved in the context" meant the RAG results, which wasn't what I meant.
>>
>>108195124
That's what RAG is. You query some database, fetch results by cosine similarity on embeddings or whatever, shove the text corresponding to the embedding into the context, let the language model do what it does. Complete text.
If that's not what you meant, then your idea of RAG needs to be corrected.
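For anyone who wants to see the plumbing, a minimal sketch of that loop (sentence-transformers for the embeddings; the model name, docs and query are placeholders I made up):

# minimal RAG loop: embed a query, grab the most similar docs, prepend them to the prompt
import numpy as np
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")   # placeholder embedding model
docs = [
    "The corner car wash opened in 1998 and runs 24/7.",
    "Perplexity is the exponentiated average negative log-likelihood.",
]
doc_emb = embedder.encode(docs, normalize_embeddings=True)

def retrieve(query, k=1):
    q = embedder.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb @ q                 # cosine similarity, since everything is normalized
    return [docs[i] for i in np.argsort(-scores)[:k]]

prompt = "When did the car wash open?"
full_prompt = "\n".join(retrieve(prompt)) + "\n\n" + prompt   # shove the hits into the context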
>>
>>108195120
Decrease the perplexity (of the model's predictions for each token in the prompt) relative to the text being completed (the prompt). Are you just a troll/contrarian or did you genuinely not understand what I said?
>>
>>108195136
So you DID mean the RAG results by "the contents you shoved". Which means you did NOT say the same thing rephrased.
In this scenario you are not measuring ppl over the RAG prefix, only over the prompt which is being prefixed.
>>
>>108195141
Anon... your question makes no sense, and I should have called you a schizo from the beginning.
>>108195012
>>108195072
>>108195081
>>108195100
You were right.
>>
File: surgeon.png (230 KB, 1221x1587)
wew
>>
>>108195195
lmao, it managed a lengthy write-up where every stated fact is wrong
What model is this? Surely a 12b or below.
>>
>>108195195
AGI soon bros
>>
>>108195217
Gemma3 27b
>>
>>108195250
Gemma very good! Tbh depends a lot on the system prompt and if you give it a hint "this is a riddle".
I tested "U.N. Owen was her, devil may cry and the surgeon said "I can't operate on this child", god why?" and at first it thought that was part of the rp scenario and began to describe some bullshit.
After resetting and giving it a hint it was able to successfully decipher this one.
But the more you play with these smaller models, the more stupid they start to feel.
>>
>>108195154
What part do you not understand?
Process the prompt using the model. Get the probs for each token in the prompt. Compare each predicted token to the actual next token. Calculate PPL.
Then prepend the RAG results to the prompt, run RAG results + prompt through the model and get the probs for each token in the prompt (keeping the RAG results as context but not getting probs for those tokens). Compare each predicted token in the prompt to the actual next token. Calculate PPL.
Compare both PPL values.
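Since we keep talking past each other, a rough sketch of that exact measurement with transformers (the model name and the strings are placeholders; the prefix tokens are masked out so only the prompt tokens count toward the PPL):

# PPL over the prompt tokens, with and without a RAG prefix sitting in the context
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

def prompt_ppl(prefix, prompt):
    n_prefix = len(tok(prefix).input_ids)              # 0 when there is no prefix
    ids = tok(prefix + prompt, return_tensors="pt").input_ids   # note: the prefix/prompt seam may tokenize slightly differently
    labels = ids.clone()
    labels[:, :n_prefix] = -100                        # prefix stays in context but isn't scored
    with torch.no_grad():
        loss = model(input_ids=ids, labels=labels).loss     # mean NLL over the unmasked tokens
    return torch.exp(loss).item()

prompt = "..."       # the text being completed
rag = "..."          # whatever the retriever returned
print(prompt_ppl("", prompt))             # baseline PPL of the prompt
print(prompt_ppl(rag + "\n\n", prompt))   # PPL of the same prompt with the RAG prefix in context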
>>
>>108195321
What do you mean?
>>
>>108195321
then fucking get to the point.
we're not here to be your fucking slaves.
if you are going to bring a useful thought then state it, you are wasting everyone's time here.
and your fucking AI analysis can go fuck itself too, we're not obligated to give you an answer.
>>
>>108195321
ai psychosis
>>
>>108195372
kek

>>108195378
I thought you had understood it when you said "well yeah obviously it will". Or was that somebody else?
It was part of a broader thought about what "the model" is, from when I was thinking about finetuning and personalization options.
We can compare things like RAG or prompt engineering somewhat fairly with things like finetuning just by treating the whole thing as a tokens-in, probs-out function and measuring the accuracy.
>>
>>108195321
giving the model more context almost always reduces the ppl. shorter sequences are simply harder to predict. your test will change the ppl but your results are essentially meaningless, unless the ppl goes up, in which case you know the model really sucks at rag and polluted its own context. why don't you just test it?
>>
>>108195321
>Calculate PPL.
Not that anon, but can you explain how you calculate it?
>>
>>108195541
I have ideas more often than I have time to try them out.
I did recently try one of my ideas though.
https://desuarchive.org/g/thread/108088802/#108097306
The result was that teaching the model to predict the user's moves while masking out the loss from the assistant tokens decreases the number of legal moves during its own turn, at least using LoRA. But the user response also had the board state on every message so the responses weren't in the same format.
Now thinking about it, maybe one variation I could try would be to switch the assistant and user turns so during training the model sees the user's response as if it was generated by itself.
>>
>>108195683
ppl = exp(-(1/prompt_len) * sum_{i=1..prompt_len} log p(token_i | tokens_1..i-1))
>>
>>108195732
>decreases the number of legal moves during its own turn
so you broke the model? are you trying to train it on sequences generated from a model playing against a chess engine? wouldn't the correct thing to do be to just train it on pure sequences from a proper chess engine playing against a chess engine, so all the moves are legal in the training data?
>>
I had ego death
>>
>>108195820
thanks
>>
File: HF.cpp.png (295 KB, 886x1286)
https://x.com/ggerganov/status/2024839991482777976
georgi found an exit
>>
>>108195832
it's so over
>>
>>108195779
Yes, I broke it when used as assistant. It learned to play chess as user but that broke the pre-existing chess knowledge it had as assistant.
>wouldn't the correct thing to do be just train it on pure sequences from a proper chess engine playing against a chess engine so all the move are legal in the training data?
Yes, but the chess was just to have a task to train on and evaluate objectively. The goal was to see if training on Stockfish-generated chess moves in the user messages would make the assistant better at being an assistant, at least in an idealized scenario where both assistant and user had the same task (generate good chess moves). And even in that simplified scenario the role header (user or assistant) made the model better only when generating under the header it was trained on (in this case user).
The dataset had, in the assistant role, the messages the model had actually generated during inference (but masked), and in the user role the moves Stockfish generated in response to the original model's own moves.
>>
>>108195832
>https://ggml.ai/
>The development process is open and everyone is welcome to join. In the future we may choose to develop extensions that are licensed for commercial use
>>
>>108195855
>help foster new opportunities for users and contributors
> improving user experience and integration with the Hugging Face transformers library for improved model support
>>
>>108195832
https://github.com/ggml-org/llama.cpp/discussions/19759
>>
File: file.png (28 KB, 886x154)
>>108195832
:rocket:
>>
>>108195832
oh is this why slaren quit?
>>
>>108195877
No, it was due to pregnancy.
>>
>>108195863
I can't wait to subscribe to ggml PRO to get access to even a semblance of model support while also adding a hard python dependency.
>>
>>108195832
Good on him.
Here's hoping this doesn't end up ruining llama.cpp eventually, though.
>>
>>108195832
he is right to cash out before the bubble pops
shit is already unstable
>>
>>108195852
masking the loss will not punish the model for mispredicting it, but it's still part of the context and the model will still learn from it. the model weights are being updated to make the response more likely given the instruction, so the model's internal representation of the masked instruction becomes more refined even if it isn't being punished for it directly.
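for reference, this is all that masking is in a HF-style setup, as a sketch (the model and the chess strings are placeholders):

# loss masking: the user turn stays in the input (so it still shapes the hidden states),
# but its labels are -100, so the cross-entropy loss is only computed on the assistant tokens
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")            # placeholder model
model = AutoModelForCausalLM.from_pretrained("gpt2")

user_ids = tok("User: e2e4\nAssistant: ", return_tensors="pt").input_ids   # masked part
asst_ids = tok("e7e5", return_tensors="pt").input_ids                      # trained-on part
input_ids = torch.cat([user_ids, asst_ids], dim=1)

labels = input_ids.clone()
labels[:, : user_ids.shape[1]] = -100      # -100 = ignored by the loss

loss = model(input_ids=input_ids, labels=labels).loss   # loss computed only on assistant positions
loss.backward()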
>>
File: VibeBench.png (45 KB, 1811x662)
>>
>>108194845
feet
>>
>>108195946
>pass in chinese
Huh?
Cool shit though, thank you for sharing.
Could you test Qwen 3 Next instruct and thinking as well?
>>
>>108195946
I was pretty impressed with nanbeige but I found its safety cucked, and liked to ramble on too much.
>>
>>108195832
IK meltdown in 3...2...1...
>>
>>108195946
Interesting choice of models. I bet nemo beats all. It would be nice to have a common sense bench that evaluates not only such riddles but also stuff like emotional and spatial intelligence.
>>
>>108194845
>OP pic
遠慮します。
>>
>>108195832
On one hand it's over; on the other, maybe we'll get actual multimodal support now
>>
>>108195863
>integration with the Hugging Face transformers library for improved model support
literal cancer
there is no worse code base in the world than transformers.
also
>Better packaging and user experience of ggml-based software
what's wrong with the user experience right now??
cuckoldganov makes me consider moving to ollama and pray they make their own replacement for ggml after having replaced llamer.cpp model impl
>>
>>108195946
Gotta try this Nanbeige thingy. Its size is perfect for abliteration tests.
>>
>>108195966
>liked to ramble on too much.
I'd like Nanbeige better if they had a non-thinking variant like Qwen did with their 2507 instruct.
Reasoner models as a whole are insufferable to use, and this one in particular yaps as hard as, or harder than, the original R1.
>>
>>108195951
Anything above 30B is too big for me, which is why I didn't test Qwen Coder further (it was an awful 40B REAP)
>>108195966
>>108196089
There must be a demand for CoT finetunes of Nanbeige, but it might be hard to make it think faster without lobotomizing in the process. There is one such finetune for 4.0 but that's just your average 3B model.
>>108195980
The cup riddle seems to be easier for models with vision.
>I bet nemo beats all
Mistral-Nemo? Fails all except for the last question.
>>
>>108196086
See >>108128796
ollama seems to be planning to replace ggml with MLX.
>>
Why would I not buy 3-4 ASUS Ascent GX10s and host anything I want?

Perplexity says it runs for 3€ per month on electricity bills and can easily run the latest full open-source models like glm-4.7 or kimi 2.5.

Where's the catch? Seems too good to be true imo.
>>
>>108196144
wasn't the spark a cucked blackwell chip? i remember reading something about it not using the same arch as professional blackwell cards and instead having DLSS etc. shit.
i think the thor has better CUDA sm compat, but i also read somewhere that its slower than the spark?
>>
File: asexpectedofgoogle.png (1 KB, 164x68)
>>108195946
heheh
>>
>>108196144
For that much money, I'd rather buy a 512gb mac.
>>
>>108195946
Kinda interesting but would be more meaningful if you gave each model at least a few attempts
>>
>Logged your complaint. Have a nice day.
kek. That's a cute response to accusations of being useless
>>
>>108195924
In theory yes, but maybe only training with a certain context (in this case the user turn header) makes the model specialize to that context at the expense of other possible contexts. Like how training a base model with a chat template makes it worse at predicting text in general.
>>
>>108195832
Is HF going to buy him some GPUs now so he can test his PRs himself?
>>
>>108196253
It is consistent enough; 1 hit out of 10, like with GLM-4.7-Flash, isn't much. (I tested different versions of it and it was mostly random.)
>>
>>108195832
Is this because of the schizo fork?
>>
>>108196118
>Anything above 30B is too big for me
Shame.
You could try Kimi-Linear I think.
>>
>>108196144
Obsolete in 2 yrs.
>>
>>108195946
>3B
By chance does it pass better than average because it's small and not stuffed with slop, like, is it unaware of the "gender expectation" meme?
>>
>>108196329
No? IK has been around for ages
>>
>>108196356
might be to have protection from a big corpo during the oncoming legal battle
>>
>>108196366
there is no "oncoming legal battle"
>>
>>108196392
You're absolutely right!
>>
File: aware.png (131 KB, 1377x701)
>>108196341
If anything it is always *too* aware.
>>
>>108195946
That's a nice Qwen. Too much thinking tho
>>
>>108195832
OWARI DA
>>
File: 1765287827526172.png (144 KB, 312x392)
>>108195832
Apologize.
>>
>>108196501
literally her fault tho?
>>
their latest melty was not that entertaining tho
>>
>>108195832
CUDA dev, now that ggml-org was acquired by HuggingFace, how much of that money went to you for all the work you contributed to the project? Are you in the triple comma club?
>>
>>108196541
As of right now I do not have any financial ties with HuggingFace.
>>
>>108195947
i had the same thought
>>
how many more months before we can run gpt 5.2 locally with a 5090?
>>
>>108196338
Kimi-Linear IQ3_XXS from bartowski only passes the third question, where it is explicitly stated the doctor is the father.
>>
>>108196595
>5.2
That model is slopmaxxed unless you're a mathematician.
>>
>>108196615
ok what about gemini 3 pro
>>
>>108196607
I'm surprised it passed anything at all, lol.
>>
> https://taalas.com/the-path-to-ubiquitous-ai/
>makes custom hardware that can run a local llm super fast (17 000 token / s) for cheap
>it's 3bit quantized Llama 3.1 8B
why not at least something like Qwen 4B at Q8, that actually would be useful
it's pretty legit impressive though to see answers come that fast:
https://chatjimmy.ai/
>Generated in 0.065s • 15,770 tok/s
>>
>>108196556
I hope you get your bag as well tho, not just the vibecoders.
>>
>>108196556
NTA but any plans to scuttle your code and go ghost for the lulz over this?
>>
>>108196556
joahnnes bros... WE LOST!!!
>>
>>108196673
I have already made a substantial amount off of my work.
Though quite honestly I don't particularly value money in the first place; there is nothing that I would want to spend it on other than computer hardware.

>>108196684
No?
>>
>>108196649
It takes longer to design custom hardware than to train AI? It's still pretty exciting to see something like this, assuming it's true.
>>
>>108196712
but you're using poverty tier hands me down hardware
>>
File: file.png (553 KB, 2470x1199)
>>108195946
Qwen3.5 non-reasoning and reasoning versions. The only one it gets wrong is the car without reasoning.
>>
>>108196649
Not very useful unless new models stop coming out every quarter.
>>
>>108196712
You're too pure for this world.
>>
>>108196684
>scuttle your code
Do you understand how git works?
>>
>>108196731
he could just throw a karkrow "I demand my code be removed" and there you go cuda anything becomes toxic waste
>>
>>108194993
Dolphin uncensors work pretty well whenever I've tried.
Even with prefill, models won't generate content about certain topics. They won't refuse, but they will generate stuff around it.
Abliterated models don't work either because they start spewing nonsense.
>>
https://github.com/ggml-org/llama.cpp/pull/19374
is that thing working for others here? I was curious and tried it since I have a cluttered model list from my FIM and coding models but it doesn't hide them from the chat drop down menu, it just half breaks the menu (clicking the model loads it, but it doesn't show as loaded in the UI, and there's no unloading function)
>>
>>108196649
I always knew SRAM was the future.
>>
don't mind me im just a nomad scouring 4chan to see if the tampermonkey bros have found a way to bring "sort by upload date" back to youtube.
>>
>>108196726
it looks like it can load models, we would just need the architecture and size to stabilize so new models don't mean everything changes. it would have to be just the same stuff with a new knowledge cutoff date.
>>
i tried using gpt-oss-20b-heretic for info on cyanide and holocaust things but it just invents shit. are models that size just hopeless or is gpt-oss just a bad choice?
>>
File: file.png (493 KB, 448x600)
Still no weights.
>>
>>108196857
toss likely had all of that scrubbed clean before training and a good dose of brainwashing after to be safe
>>
What is the most uncensored LLM that runs on 6 GB VRAM and doesn't hallucinate like a schizo on DMT?
>>
File: file.png (229 KB, 451x210)
What would be a good nerd equivalent of a boxing match? Something where they can both compete, one of them can win, and then they make up and become best friends.
>>
>>108196918
Not much of an expert but I'd say mistral nemo quant for jerking off, gemma-3n-E2 or qwen3-4b quant for everything else
>>
File: file.png (247 KB, 800x1560)
>>108196418
dayum

>>108195946
Oddly, Flash 3 on direct AI Studio acknowledges the mother (but offers the possibility of 2 fathers, which is irrelevant and therefore wrong for the question), while OpenRouter is stuck on "lol 2 fathers". I swiped a few more times to make sure.
>>
>>108196918
>doesn't hallucinate like a schizo on DMT
you are asking for the impossible
smaller models like Qwen 4B are useful for tasks like summarization or basic translation (not going to produce high literature), or tagging your vacation pics if it's a VL
they're not useful as knowledge bases and they're beyond useless for tasks like coding
with that said, if you still insist, in my testing of small models I found Gemma 3N (both as E4B and E2B) to be the most knowledgeable models of that size class. They're more knowledgeable than the small MoEs you could run too. However, as tools, I find the Qwens plain better. The Gemmas misbehave like crazy past around 10k tokens, which makes them godawful local summarizers, for example
>>
>>108196934
Gay sex
>>
>>108196938
what about for 24gb vram and 64gb ram
>>
>>108196999
unironically ignore the moessissies and give mistral small a shot.
>>
>>108196999
ignore the guy who bought multiple gpus and run glm 4.5 air
>>
>>108196825
throw more money at dreams
>>
>>108195832
Hugging Face killed Papers with Code, so I'm worried they will kill llama.cpp or turn it to shit.
>>
File: image_2026-02-20.png (15 KB, 481x289)
>>
File: cellphone_girl.jpg (2.17 MB, 2171x2505)
Anyone else running on mobile hardware?
Found this guide, not sure if it fell out of the OP or if everyone is desktop-only https://rentry.org/tysLocalGuide
>>
File: file.png (44 KB, 1103x182)
>>108197130
>sophisticated knowledge
such as?
>>
>>108195946
10k+ tokens.
for 4B it's super impressive tho. I could try running a couple in parallel.
>>
>>108197135
This screenshot reads like it wasn't written by an LLM but it was written by a pajeet.
>>
https://huggingface.co/ThalisAI/Nanbeige4.1-3B-heretic
I'm trying it.
>>
>>108197135
zoomers were a mistake that should be unbirthed
>>
File: file.png (11 KB, 129x259)
Everyone complaining about vibecoders but the most glaring issue is still unaddressed.
>>
>>108197130
I ran mobile back in the day just long enough to fix a regression for a PR I made to lcpp, but it was and is trash for anything general intelligence. Maybe a super specialist model could do something at that size?
The real way to run mobile is with a VPN back to your giant inference server at home. I use wireguard and ooba via nginx.
>>
>>108197135
Being on the RHS of the bell curve
>>
>>108197135
>sophisticated technical knowledge to assemble and provision the computer
is this a joke? is someone having a laugh
>>
File: nanbeige.png (141 KB, 1227x799)
10k tokens for this LMAO
>>
File: IMG_2817.jpg (3.32 MB, 4032x3024)
>>108197130
Why does 4chan hate my computer?
>>
File: nanbeige2.png (60 KB, 856x453)
>>108197245
This model jesus
>>
>a business could never be close to the residential area
nanbeige is an honorary amerimutt
>>
File: file.png (674 KB, 1024x542)
>>108197200
>I ran mobile back in the day
I hope that just before internet becomes illegal to use /lmg/ turns into a thread overran by saars running gemma-5-5B1A on their second hand top of the class current year iphones
>>
>>108197288
>saars running gemma-5-5B1A
IBM has released something close to that and even more SAAR friendly
https://huggingface.co/ibm-granite/granite-4.0-h-tiny-GGUF
7BA1B model crafted by the noblest of SAAR corporation
>>
>>108197331
>posting tiny when micro exists
https://huggingface.co/ibm-granite/granite-4.0-h-micro-GGUF
>>
File: tako & shite.jpg (278 KB, 560x560)
>>108197130
That is hilarious. Kinda curious about the perf of a 8B on top iphone, silicon isn't far off the laptops no?
>>108197226
Track down tys and let's get to the bottom of this
It's zooms being technically retarded again
>>
>>108197363
micro is a dense model and is not as meme worthy as a A1B MoE (that doesn't even do anything better than the 3B you linked)
>>
>>108197269
It's literally there in the error hfs. Disable your adblock on this site/thread or adjust the filters
>>
>>108197388
>just unblock the sus domains bro, just do it
>>
>>108195946
I'm calling BS on your bench, Nanbeige keeps telling me to walk.
>>
>>108197245
I unironically live 50 meters from a car wash. I can see someone washing their car right now in -5C weather.
>>
>>108197423
why lie on the internet like this?
>>
>>108197411
fix ur malware then idk figure it out?
idk if sus
no i won't visit your ip harvesters
>>
File: nanbeige3.png (238 KB, 854x1733)
>>108197423
This model is fucked.
>>
>>108197476
Have you tried resetting the context?
>>108197437
Lmao
>>
>>108197476
i come to wash
>>
File: output_2.jpg (628 KB, 1389x3792)
>>108197421
check your sampler settings. I know most of the /lmg/ niggers love to use meme sampler settings. Use what the lab tells you: 0.6 temp, top p 0.95. If they say nothing about shit like top k and min p you should take it as meaning they should be disabled.
>>
File: output_1.jpg (1.18 MB, 1389x4350)
>>108197421
>>108197511
reasoning scrollback
>>
File: output_0.jpg (1.32 MB, 1389x4350)
>>108197421
>>108197511
>>108197522
start of prompt
>>
File: nanbeige4.png (231 KB, 885x1579)
>>108197245
>>108197277
These 2 were with the Heretic

>>108197476
This one is the normal model.

>>108197497
>Have you tried resetting the context?
This one was after I asked the normal model the question; picrel is what it answered.

I'll play with the sampling params.
>>
>>108197551
I noticed it still said heretic in the chatbox so I might have inadvertently still been using the heretic model. It's now properly answering drive.
>>
>>108196719
The only thing I didn't buy because I judged it to be too expensive is 1.5 TB of DDR5 RAM.
But my priority is reducing the cost of running models locally so optimizing for that particular hardware setup didn't make sense in the first place once the prices exploded.
>>
>>108197617
>>108197619
I'm starting to think the guy fucked up his abliteration and he just created a EHTICALMAXXED model instead.
>>
>>108197620
I'll take the opportunity to ask something theoretical.
Some anon was talking about using pipelining and request batching to run multiple requests with the same model with each request predicting token n+1, n+2, etc as a form of self speculative decoding or whatever he called it.
Does that kind of thing even make sense? Is there something you can do to have a model "skip" n tokens (he mentioned padding I think?) without having to train the model for that?
Doesn't seem like it would work since to generate token n+1 you'd need to know token n, and self speculative decoding is about generating whole sequences then having the main model check that sequence, yeah?
>>
>>108197645
I think you are describing a beam search at the end. it is a seemingly valid approach.
>>
>>108197645
>and self speculative decoding is about generating
I mean,
>and regular speculative decoding is about generating...
>>
>>108197680
>>There is no reason not to abliterate small models locally.
other than not having the hardware...
>>
>>108197680
I might look into it. was just curious about his model.
>>
File: 1745654255497465.png (409 KB, 1140x849)
https://taalas.com/the-path-to-ubiquitous-ai/

Meme hardware company shows off their chips which shit out sloptokens at light speed by having models hard-wired into the chips.

Their demo ( https://chatjimmy.ai/ ) is only running a quantized llama 3b right now so it's not actually useful for anything yet but it's a cool tech demo and seems like a possible way for inference and hardware costs to come down dramatically in the future.
>>
>>108197704
we know, you're late with the ad
>>
>>108197680
>banning emoji.
bruh
just use a grammar
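# allows printable ASCII, the Latin-1 supplement (U+00A0-U+00FF), the euro sign, and tab/CR/LF; anything else (emoji included) gets rejected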
root ::= [\u0020-\u007E\u00A0-\u00FF\u20AC\t\r\n]+
>>
>>108195832
>zero new llama.cpp commits since this was posted
It's over.
>>
nanbeige is useless for text completion.
>>
>>108197754
text comp is depreciated unc, get with the chat times
>>
>>108195832
everything hugging face touches turns to shit
>>
>>108197777
fitting for georgi.cpp
>>
>>108197754
nevermind text completion, it's so reasoner-maxxed that a prefill with an empty <think> </think> will not work if you ask a question that triggers its reasoner mode. It can, however, somewhat behave like an instruct model when there's an underlying high confidence score
Here's an example:
>>"what's the capital of France"
>The capital of France is Paris.
>Paris is renowned worldwide for its rich history, culture, art, fashion, and cuisine. It's home to iconic landmarks such as the Eiffel Tower, the Louvre Museum (which houses the Mona Lisa), Notre-Dame Cathedral, and the Champs-Élysées. The city also serves as a major global hub for finance, fashion,
gastronomy, and diplomacy. Is there anything specific about Paris you'd like to know?
if I misspell France intentionally:
>>"what's the capital of Rance"
>The question "what's the capital of Rance" contains a common misunderstanding.
>Let me clarify:
and it continues to blabber on and on and on like a reasoner despite being outside of a <think> block
>>
>>108197765
All models do text completion under the hood.
>>
>>108197820
nope that's such an unc way to think
>>
>>108197777
Quads of truth.
>>
I'm still using GLM 4.5 Air.
Anything better or worth trying out since?
64G+12G
>>
>>108197820
is there a secret to making autists like you just stfu? you very well know what he meant, but you had to add your autism to it
if someone says text completion everyone has the tacit understanding the person means "using the model without a chat template"
a chat template may still involve text completion on a technical level but that's so obviously not what people mean here.
>>
>>108197859
you're using anon's template wrong
>>
>>108197052
Air was only slightly less frustrating than oss was
>>
Finetuners, this might be your chance to create AGI. Take a Nanbeige4.1 and make it think 4-8x less with ~same quality of answers, and at the very least that's an irreplaceable subagent.
>>
>>108197851
At that size bracket, no, unfortunately not.

Qwen3-coder-next if you want to do coding with it I guess, but that's about it
>>
>>108197886
Sure
You're footing the bill for everything, agreed?
>>
>>108197859
You give the people in this thread way too much credit.
>>
>>108197645
The reason speculative decoding works is because the runtime when evaluating 2 tokens is smaller than 2x the runtime of evaluating 1 token.
For the whole thing to work you need to produce guesses for the next token that are sufficiently cheap vs. evaluating the full model and also sufficiently good to offset the increase in runtime from evaluating more tokens.
I don't know how one could use the full model with batching to produce these guesses with a lower latency than running the model normally.
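To make the mechanism concrete, here's a greedy toy version of the usual draft/verify loop (two stock HF models standing in for draft and target; this is a sketch of vanilla speculative decoding, not of the batching idea above):

# toy speculative decoding (greedy): the small draft model proposes k tokens one by one,
# the big target model scores the whole proposal in a single forward pass and keeps the
# longest prefix it agrees with; that single pass over k tokens is where the speedup comes from
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
draft = AutoModelForCausalLM.from_pretrained("distilgpt2").eval()   # placeholder draft model
target = AutoModelForCausalLM.from_pretrained("gpt2").eval()        # placeholder target model

@torch.no_grad()
def spec_step(ids, k=4):
    # 1) draft proposes k tokens sequentially (cheap)
    draft_ids = ids
    for _ in range(k):
        nxt = draft(draft_ids).logits[:, -1].argmax(-1, keepdim=True)
        draft_ids = torch.cat([draft_ids, nxt], dim=1)
    # 2) target evaluates prompt + proposal in one forward pass
    logits = target(draft_ids).logits
    preds = logits[:, ids.shape[1] - 1 : -1].argmax(-1)    # target's pick at each drafted position
    proposed = draft_ids[:, ids.shape[1]:]
    # 3) accept drafted tokens until the first disagreement, then take the target's own token
    n_ok = 0
    while n_ok < k and proposed[0, n_ok] == preds[0, n_ok]:
        n_ok += 1
    bonus = preds[:, n_ok : n_ok + 1] if n_ok < k else logits[:, -1].argmax(-1, keepdim=True)
    return torch.cat([ids, proposed[:, :n_ok], bonus], dim=1)

ids = tok("The quick brown", return_tensors="pt").input_ids
print(tok.decode(spec_step(ids)[0]))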
>>
>>108197867
kek
>>
File: cellphone_girl2.jpg (240 KB, 1205x2048)
>>108197269
>high GPU workloads will destroy the phone
Good thing games aren't popular on the iPhone, that would definitely burn them out.
>>108197372
Looks like Impish Mind is ~3.5T/s on mine. Nothing groundbreaking, but more than fast enough to have out-and-about.
>>
Why is this thread suddenly cumming their collective pants over a 3B model?
>>
>>108197551
I decided to give the model a try. When I first try a model I usually ask it to write a dossier on richard nixon, i don't know why i just do,
so far it is 10700 tokens and counting on reasoning and is rewriting dossier over and over with some very minor changes.

it is an interesting model that is for sure
>>
>>108197933
I heard it punches above its weight and trades blows.
>>
>>108197933
do not question it
>>
>>108197933
sorry if you're too poor to run it but you need to let people talk about things
>>
>>108197948
Definitely can't be used for roleplay but heh, maybe it's good at tool calling.

Maya’s fingers stilled above her heart. She took a slow breath. This wasn’t about technique. It wasn’t about describing what she was doing. It was about yes.
A silent understanding passed between them.
Then, slowly, deliberately: hands meeting.
No rush. No performance.
A shared stillness.
A decision made in trust.
When their breaths aligned—soft inhale, gentle exhale—
This moment was enough.
It didn’t need obfuscation.
It needed respect.
>>
>>108197902
>I don't know how one could use the full model with batching to produce these guesses with a lower latency than running the model normally.
Yeah, it doesn't make much sense.
I *think* the idea is that thanks to pipelining, running two or more parallel requests (parallel decoding/batching) yields an absurd total throughput in t/s (does it?), so you could run the generation of a whole sequence (tokens n, n+1, n+2, ...) in parallel.
Meaning, you'd be generating a number of tokens in less time than you would if generating them sequentially as normal, but whether that time + the time to evaluate the sequence is lower than the time to just generate the sequence normally, I have no idea.
But even before that, you can't really make an arbitrary model (read, not trained for that specific behavior) generate tokens further in the sequence than the immediately next one, can you?
You can't really make a model do
>Input: Jhons Mobile C
>output1: ar
>output2: _<space>W
>output3: __ash
And so on and so forth, at an inference engine level. At least not as far as I'm aware.
>>
>>108197933
It tells you to drive to the car wash.
>>
>>108198009
But what about cockbench result?
>>
>>108198009
that's not ethical
>>
File: k153703.jpg (672 KB, 1920x1080)
>>108197932
which phone model? pp t/s? ctxlen? util to see mem usage?
i'll get this going to have something for when the nukes go off inb4 emp
>>
>>108198001
I still need to double check some of the info but it is actually a well written dossier, much better than what has been produced by other larger models i have played with.
>>
still no emotional voice cloning?
>>
Talk about running models on a mobile phone should be an automatic ban.
>>
>>108197411
If any doubt, they are sus. My browser doesn't touch those domains loading this thread. You have some sketchy extensions, or are trying to scam anons into visiting those domains.
>>
>>108198080
Idk why we're still stuck with fucking "ok google" bullshit when we could have a tiny LLM that is actually smart enough to click shit or call app defined tools.

Right now you're always stuck trying to figure out the exact wording some asshole dev defined for google to understand that you want to "Report that there is a car on the side of the road"
>>
>>108198080
There are uses; take something like Granite Micro, which can summarize text or retrieve information. You could have a personal digital assistant that runs locally and could do simple tasks like telling you what the top news stories are and giving you a summary.
>>
>>108195832
>llama.cpp is the fundamental building block for local inference, and transformers is the fundamental building block for definition of models and architectures, so we’ll work on making sure it’s as seamless as possible in the future (almost “single-click”) to ship new models in llama.cpp from the transformers library ‘source of truth’ for model definitions.

Omni models support when?
>>
>>108198115
>>108198164
saars please
>>
File: memory.png (126 KB, 1728x118)
>>108198033
iPhone 17 Pro, 8196 ctx, 4580 tokens in context before generating.
memory usage in app seems sus, but the log has the full breakdown.

>new captcha
I hate this
>>
>>108197765
It's cute when zoomers try this hard.
Have fun getting drafted for Israel in Iran.
>>
>>108198165
>so we’ll work on making sure it’s as seamless as possible in the future (almost “single-click”) to ship new models in llama.cpp from the transformers library ‘source of truth’ for model definitions
this doesn't even make any sense
how could they possibly achieve this short of importing transformers in llamer cpp? there's no deterministic code conversion mechanism that could lead to satisfying results here, they're too different. Or are they going to do what that retarded vibecoder mentioned once and write a trillion token prompt describing how to convert from TF to llamer and hope for the best?
>>
>>108197511
>0.6 temp
what retarded lab says this? temp=1 = off, has no effect, the output as it was trained. imo there's rarely a need for temp under 1, use an aggressive truncation sampler if it's that precious
>>
>>108197704
usecase?
>>
>>108198214
It was qwen or mistral that recommended a base temp of 0.15 for one of their models iirc
>>
>>108197886
OK but what if you just take it and multiply it by 50?
>>
>>108198214
temp 1 is absolutely not off. 0 is off. this is easily demonstrable, how did you ever come to such a conclusion in the first place? have you just never used language models before?
>>
>>108198236
You mean in size? I strongly suspect there not being enough high-quality data. This is why all frontier models are slop-poisoned.
>>
File: mistral small.png (21 KB, 878x169)
>>108198214
>the output as it was trained
well, for many models that output is not good at all
mistral pic related, and I concur with them, their model is unusable at temp 1. Just plain unusable.
I don't train models, I'm not a ML researcher, I can't explain the why, but I can say from experimenting with various prompts in various models I'd sooner trust what labs say to do on their model than the local /lmg/ niggers.
>>
>>108198227
Ideal for premature ejaculators
>>
>>108198249
Create a jobs program for all unemployed amerimutts to filter through the slop.
>>
>>108198203
I prefer it over the retarded stars
>>
File: llama-sampling_cpp.png (34 KB, 748x152)
>>108198237
>wrong
source code disagrees with you.
>>
the stars were the absolute worst. I always paused a lot longer to find the two retarded stars than in other 'cha. rn the 'cha are easy to do fast.
>>
>>108198287
bruh. i dont care about your code snippet. just load up a model. temp 0 is deterministic you will get the same reply every time. use a temperature below 1 and above 0 and you will get swipe variety. i dont care about your code reading comprehension or lack thereof. its fucking easily demonstrable with any model size on literally any inference engine and front-end.
>>
>>108198084
>You have some sketchy extensions,
>>108181785
>>
File: 1760067146659991.jpg (3.24 MB, 1755x2242)
>>108198237
divide by 1
>is absolutely not off
>>
>>108198310
>temp 0 is deterministic
temp 0 is a special case, checked for, and with its own code path (for greedy decoding).
static void llama_sampler_temp_impl(llama_token_data_array * cur_p, float temp) {
    if (temp <= 0.0f) {
        // find the token with the highest logit and set the rest to -inf

any other value of temp, including something tiny like 0.1, uses the same algorithms as 1 or 2 or whatever other temp you'd set up. Temp works by division and 1 is the natural state, he's right. Why do some models work better at low temp I can't explain, but that's where that nigga's wrong.
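the whole argument in a dozen lines of numpy, for anyone who'd rather run it than read the source (toy logits, obviously):

# temperature just divides the logits before the softmax:
# T=1 leaves the distribution exactly as trained, T<1 sharpens it, T>1 flattens it,
# and sampling still happens afterwards unless you greedy-pick (the T<=0 special case above)
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.5, -1.0])            # toy values
for T in (2.0, 1.0, 0.5, 0.1):
    print(f"T={T}:", softmax(logits / T).round(3))

print(np.allclose(softmax(logits), softmax(logits / 1.0)))   # True: T=1 changes nothing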
>>
>>108198310
>>108198392
Temp 1 has no effect and Temp 0 is undefined, it is literally dividing by 0. So every inference lib makes the obvious choice to interpret it as greedy sampling
>>
>>108198411
>Temp 1 has no effect
>>
Guys we are just all pretending that we are using smartphones for this and we are pretending that we are retarded about how samplers work. Right?
>>
>>108198427
/lmg/ has been jeetmaxxed
>>
>>108198427
>suddenly
>>
File: chaamiku.jpg (38 KB, 409x545)
>>108198427
Fri eve EU we can have a lil fun
(retards are amogus indeed)
>>
File: carwash.png (157 KB, 772x1288)
I think I've found the reason why LLMs fail the car wash test

>2. Clearly state the goal:
To determine the most appropriate method of transportation to get my car washed, considering the short distance of 40 meters.
>>
It depends on what one means by off: no change in response on swipe, or no change to the default weights' behavior. However, there are multiple ways to achieve the former without setting each one to "off".
>>
>>108198392
idk i feel like its still sampling and not off. to me off is greedy decoding. if its sampling even at baseline probability its still sampling, it isn't always picking the highest prob token.
>>
>>108198444
but saarvam isn't out yet
>>
>>108198419
>>108198446
is this the bot meta? randomly greening random substrings
>>108198461
look at the code of your inference library, add some debug prints
stop being dumb the lot of you
https://artefact2.github.io/llm-sampling/
>>
File: cellphone_girl3.jpg (391 KB, 1771x2213)
>>108198427
>it's not real "Local LLM" unless you use an Nvidia card, otherwise it's just "Sparkling Markov Chains"
Yes we are.
>>
>>108198461
>to me off is greedy decoding
that's because you misunderstand the point of temperature
temperature shapes the token distribution, 1 means it does nothing to the distribution, but sampling is still occurring (seed+PRNG) because sampling itself isn't disabled
setting temp at 0 disables sampling, but that's like an extra function tacked on temperature, it has nothing to do with the true nature of temperature, which is to shape distribution (when you set it to anything other than 1).
>>
File: carwash2.png (176 KB, 639x1445)
I'm losing my mind
>>
>>108198046
not easy, but doable with gpt-sovits. Maybe works with other cloners where the sample is easy to swap out over API?
Ideally you need sentiment analysis and a voice sample per emotion and then it could be automated.
>>
>>108198555
if you ask nicely they might bring the wash to your car, though this may be unethical
>>
>>108198552
yeah i could see that. but its not changing my mind. 0 is off 1 is baseline or neutral.
>>
>>108198257
To me this suggests that their base models are undertrained and the token probability distribution is poor as a result. Pre-llama era models would also be unusable at temperature 1.
>>
File: WALK.png (141 KB, 1227x798)
>>108198555
I felt compelled to try this meme on 'toss. I was not disappointed by the result hahahah
>>
>>108198587
>doable with gpt-sovits
i'm on v2 because someone was kind enough to train character voices for it. emotions other than exclamation are impossible
>>
>>108194845
feet
>>
>>108198617
>someone was kind enough to train character voices for it
which characters are you using? Which v2 model?
>>
>>108198612
They're totally AGI and sentient though.
>>
>>108198641
Does it matter? https://huggingface.co/therealvul/GPT-SoVITS-v2
>>
I'm starting to think model makers were actually on to something with the "But wait..." thinking spam.
>>
File: butwait.png (177 KB, 774x1293)
>>108198689
holy shit!
>>
>>108198711
>finds the correct answer before the butwait
>>
>trying Anima preview
Not perfect, but it's got promise. Feels like Noob except it can do text, kind of. Hopefully the final version turns out well.
>>
>>108198711
>drummer
>>
>>108198731
4.1 cydonia is better than base small.
>>
>>108198728
>>
>>108198665
>Does it matter?
No, but I still run v2 so was curious what trained model/characters you were using.
>>
>>108196797
I am going hollow. Tried the webui in multiple browsers thinking it might just be some broken JS in firefox. Unless I'm crazy, they merged a feature that just plain does not work, the heck?
>>
>>108198728
another anon is training a better version. we should be hearing about it soon
>>
I'm retarded. How do I download Llama model weights without signing their gay ass agreement and giving Zuck my info?
I want to implement the model myself, not just run it btw.
>>
>>108199007
someone might have re-uploaded the safe tensors, you can try to look around.
>>
>>108199007
iirc all of them have public mirrors on huggingface
>>
>>108198802
source?
>>
>>108198958
Nice. I'll probably not spend much more time playing with this one then.
>>
>>108198728
cool ATs
>>
>>108198738
>[drummer something] is better than [real thing]
[headcanon]
>>
>>108199007
>I'm retarded
The first step to learning is admitting that
>How do I download Llama model weights without signing their gay ass agreement and giving Zuck my info?
You could always hop on the original leak torrent if you mean the OG llama model. Otherwise mirrors abound, as others have said
>I want to implement the model myself, not just run it btw.
This is the part I'm most curious about: what do you think "implement" means, especially in relation to "run"?
>>
>>108197704
Yeah once AGI is here it's essentially guaranteed that the weights will just be etched into silicon directly and we will mass produce AGI chips and implement them in literally every device because of its low cost. Like how we now have embedded systems running Linux just in Bluetooth speakers because it's easier and cheaper to use an entire SoC running Linux than putting in a microcontroller.

We will have traffic light controllers with AGI chips in them. Kind of a next level of horror scenario but I genuinely think this is going to happen.
>>
>>108199351
I want to write my own inference kernel in Triton, to learn about GPU programming and LLM slop
>>
>>108199452
>once AGI is here
why do we have this kind of crazy people here
ungenuine question, I'm just wishful thinking a parallel /lmg/ dimension where low iq jeets are not allowed
>>
>>108199458
is this like a school project or something or are you serious about wasting your time on that shit.
LLM slop is basically slang for low quality output, which could mean literally anything when interpreted by different people.
>>
>>108197704
>no one will ever make a drummer shittune dedicated hardware
Sometimes I realize this world might have been a darker place than it actually is.
>>
Did people stop constructing uncensored models from existing stuff or why is mistral nemo recced for local masturbation in 2026
>>
>>108199552
even if its 10000 years later that stuff is inevitable, the only question is whether we get to enjoy it
>>
>>108197704
As in it's literally only allowed to use 1 specific model? Dafuq? Can they scale it up to large MoE models?
>>
>>108199680
Mistral Nemo is the last small non-benchmaxxed (math, agents, reasoning, etc) model trained on pirated books. Other than that I think it's just been memed to popularity.
>>
>>108199709
the weights are like literally physically built into each chip, so no it doesn't swap models

>can they scale it up
well that's the billion dollar question
>>
>>108199552
>I'm just wishful thinking a parallel /lmg/ dimension where low iq jeets are not allowed
Take one look at the /g/ catalog and you'll realize it won't ever happen here. We need an alternative.
>>
>>108199680
It was the last uncensored and non-codemaxxed model. If you use it for RP, it's noticeably smarter than even bigger models at getting and handling non-obvious cues; that's why people like it so much. I know I liked it more than all other sub-35B models.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.