/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108194845 & >>108186120
►News
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108194845
--LLM riddle-solving benchmark with Nanbeige4.1 3B outperforming larger models:
>108195946 >108195951 >108195966 >108196108 >108195980 >108196089 >108196118 >108196338 >108196607 >108196253 >108196325 >108196341 >108196418 >108196723 >108197138 >108197511 >108198231 >108198237 >108198287 >108198392 >108198552 >108198590 >108198411 >108198257 >108198603 >108197522 >108197533 >108196951
--ggml.ai joins Hugging Face:
>108195832 >108195855 >108195863 >108196086 >108196121 >108195865 >108195873 >108195891 >108195919 >108196052 >108196316 >108196356 >108196392 >108196541 >108196556 >108196673 >108196712 >108197620 >108197645 >108197666 >108197685 >108197902 >108198008 >108196730 >108196731 >108197108 >108198165 >108198208
--Training PPO agents for Atari games with RL:
>108200857 >108200878 >108201026 >108201044 >108201087 >108201069 >108201116 >108201162 >108201195 >108201200 >108201271
--Debating RAG's effect on perplexity and measurement validity:
>108194930 >108195001 >108195061 >108195031 >108195071 >108195137 >108195124 >108195136 >108195141 >108195321 >108195452 >108195541 >108195732 >108195779 >108195852 >108195924 >108196308 >108195683 >108195769
--MOSS-TTS benchmarks and evaluation of open TTS models:
>108200383
--Critiquing finetuning efforts and increased censorship in RL-tuned models:
>108194993 >108195055 >108196793
--The path to ubiquitous AI:
>108196649 >108196718 >108196726 >108196851 >108197704 >108199452 >108199709 >108199747
--Effectiveness of "But wait..." reasoning:
>108198689 >108198711 >108198722
--Exploring emotional voice cloning with GPT-SoVITS:
>108198046 >108198587 >108198617 >108198641 >108198665 >108198751
--LLM fails car wash test due to misaligned reasoning:
>108198451
--Miku (free space):
>108197372 >108198375 >108198447 >108198728 >108198744 >108199092
►Recent Highlight Posts from the Previous Thread: >>108194853
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
sex with radical miku
>>108202477
I wants to be penetrated, I does not want to penetrate.
>>108202568too late, geegeeemel is le cringe face now
>>108202477Whatever happened to the dude that started the animated vrm model chat interface?
>>108202661Depends how much RAM you have.
>>108202914He's been cooming non-stop since the last time we saw him.
>>108202968Damn, guess he finished it after all.
>>108202477Finally took the SillyTavern Pill along with GLM 4.6 and I'm liking it so far.
>>108202974
>end of box
fix your chat template BRUH
>>108202974>wan't
https://www.wsj.com/us-news/law/openai-employees-raised-alarms-about-canada-shooting-suspect-months-ago-b585df62 (https://archive.is/EqNrW)
Stay local, anons.
https://github.com/ikawrakow/ik_llama.cpp/pull/1288
Schizo fork merged qwen
>>108202992>her
>>108202992>she
>>108202477
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
This is disastrous. Huggingface was a platform of freedom that acted agnostic to how people were running their models. Now we're one step away from them forcing poor AI companies into mandatory day 1 support for their own llama.cpp/ggml solution.
I hope they don't follow through with this, for the sake of huggingface.
>>108203147yeah fuck now we'll get day0 support for local inference... AGH this is SO BAD!!!!!
>>108203195
>we'll get day0
there is no explanation of how such a thing could happen
are they going to write a llama.cpp/transformers bridge and import the whole garbage in?
are they going to write an agentic framework to automatically llm-slop-convert models from transformers to llama.cpp?
neither of those things sounds likely to happen
and I doubt they're going to do something like refactoring the whole of llama.cpp to make it structurally similar to the pile of jeet poo that is transformers
btw it's a tragedy how transformers became the leader of the landscape
comfy guy truly saved image models by proposing something more popular across the board to displace diffusers, which is equally as shitty as transformers
>>108203208you still see a lot of 0day diffusers support compared to comfy (sadly). for LLMs it's mostly transformers/vllm/sglang
>>108203208
ggerganov showed an odd amount of interest in the vibecoded qwen support a couple of weeks ago which tried to port over the implementation from transformers using opus 4.6.
maybe he's planning an automated platform like that together with his new huggingface puppets
>>108203004
>3x faster than mainline
it's good there is a fork, llama.cpp performance doesn't look good
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Is it possible to configure a local model to do code suggestions/autocomplete at a level similar to github copilot? I've been experimenting with qwen3-coder:30b and qwen2.5-coder:7b using vscode->continue->ollama and getting some decent results some of the time, but most of the time it makes up functions that aren't there or misuses objects I've defined in another file. Seems like maybe a context issue? Has anyone gotten something like this working well? I have a 5090 and 64gb of ram
>>108203487
There's no such thing as a local model as good as what you get from copilot for FIM, but let me tell you a little something: this is one of the areas where cope quanting is also the most visible. Run whatever Qwen model you can run at Q8, rather than the biggest you can fit. You will absolutely get an improvement, contrary to popular belief. Large models have more knowledge, but that matters less here than staying coherent, something a 30B moe with only 3B active cannot do if you quant it.
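As a rough sketch of the kind of setup I mean (the gguf path and port are whatever you use; any Q8 coder model with FIM tokens works):
[code]
# llama-server exposes an /infill endpoint for FIM alongside the OpenAI-compatible API;
# point continue at http://localhost:8080 instead of going through ollama
llama-server -m ./Qwen2.5-Coder-7B-Instruct-Q8_0.gguf -ngl 99 -c 16384 --port 8080
[/code]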
>>108203517
>quants decrease coherence but don't affect knowledge
I am not saying that you're wrong but you're definitely talking out of your ass with no proof to back up your statements.
>>108203361The LLM didn't make this stuff up itself.
>>108203487I would look into using a more specialized model for this like https://huggingface.co/sweepai/sweep-next-edit-1.5B
>>108203560Read again retard
>>108203560nta, but https://arxiv.org/abs/2404.05405
>>108203689
In other words, quantization definitely affects at the very least the model's potential information capacity (2 bits/weight). Whether that's going to affect the model or not depends on how overtrained the model is. Rare knowledge is affected more than common knowledge.
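Back-of-envelope with the paper's figure, just to make the scale concrete: a 7B model tops out around 7e9 params × 2 bits ≈ 1.4e10 bits ≈ 1.75 GB worth of stored facts, and per the paper's GPT-2 experiments, naive int4 quantization cuts that capacity by more than half.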
>>108203689
>gpt2
>int quanting
anything from the last century?
>>108202974Now put your real name and deepest secrets in persona description
>>108202477
>Smash the servers
No I want the chips
>>108203689GPT2 is a shit model, it's dumb for its size. In other words, its weights aren't very information dense and should be more resistant to quantization
>>108203753
You might be able to derive similar conclusions from these papers, they just don't wrap it in a simple-to-use sound bite:
https://arxiv.org/abs/2411.17691
https://arxiv.org/abs/2501.02423
https://arxiv.org/abs/2505.14302
Minimax 2.5, like the one before it, spends the majority of its effort on refusals
Despite being entirely on my own machine, I still have bugmen with god complexes causing non-stop problems
It's less obnoxious to jailbreak the actual corporate models than this shit
>>108202988>>108202989>>108203761Well fuck..... you guys weren't kidding about the "shivers down my spine" shit. To give it credit that's only occurred after the fifth roll but still. Model is a 4-bit MLX quant of Midnight-Miqu. Have any of you used any models that don't do this at all? This doesn't irritate me anywhere near as much as it seems to irritate you guys but I'm just curious.
>rping with bot
>two new characters join, making the scene very noisy
>i prompt them to suddenly disappear and never appear again
>all characters in the scene wonder what the fuck just happened before moving on
It was hilarious and I laughed my fucking ass off, but then I re-read it and just felt really bad and sick.
>>108203791
if you are shopping in the range of 70b then the only improvement you can make is llama 3.3 and its various tunes
all other models are xbox hueg moes or tiny farts
I have reshuffled my ram around my different machines, which has resulted in this computer having 192gb along with a 12gb rtx 3080.
furthermore I have been looking at the models recommended in the links above. are the 3 bit versions of the larger models, something like glm 4.7, that much better than the 8 bit versions of the smaller models?
I guess what I am asking is how much quantization fucks with the models. Is the larger but more heavily quantized model always worth it?
/wait/ went to page 10 last night. Given there's no new API or weights published, just updates to the web app, there's really nothing to discuss, so no new /wait/ thread. We just genned a few more Dipsy, and now /wait/ for 2 more weeks, again, until something happens... Mega updated. Minor rentry updates:
https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
https://rentry.org/DipsyWAIT
For local the only real options are llama.cpp and ik_llama.cpp, don't kid yourselves. I was an exllama/tabbyapi lover, but its development is way slower, and stuff like tool calling doesn't really work. I don't even think it has any performance advantage at this point. Most of us run a variety of GPUs of different sizes and architectures, so llama.cpp/ik is the only thing that realistically supports these mixed setups.
The only real alternative is vllm in pipeline parallelism mode, using VLLM_PP_LAYER_PARTITION to assign the layers proportionally, and using AWQ as the quant, since that's the only thing that is really supported on ampere, hopper and blackwell at the same time. MXFP4 and NVFP4 won't load to save my life, even though the marlin kernel is supposed to support them on ampere.
And I'm the first one that would like something other than llama.cpp to be an option. vLLM is really fast on multiple parallel requests, which I think is the only piece llama.cpp is missing at the moment.
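To illustrate, roughly (the model is just an example; the partition numbers are per-GPU layer counts that have to sum to the model's layer count, 80 here, split proportionally to each card's VRAM):
[code]
VLLM_PP_LAYER_PARTITION="48,32" vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ \
    --pipeline-parallel-size 2 --quantization awq
[/code]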
>>108203800
Silly tavern supports logit bias and banned tokens so I wonder if that could help too.
>>108203812
token =/= word
you'll ban "shi", "ve", "rs" individually instead of "shivers", which will probably fuck the model up
>>108203812
koboldcpp has an antislop feature you might want to look into; it can ban words and sequences and is imo one of its main draws compared to lcpp
>>108203807
I'd just like to interject for a moment. What you're referring to as llama.cpp, is in fact, HuggingFace/llama.cpp, or as I've recently taken to calling it, HuggingFace plus llama.cpp. llama.cpp is not a complete library for language models unto itself, but rather another MIT component of a fully functioning HuggingFace system made useful by the HuggingFace transformers library, the safetensors format and a website for hosting and downloading model files.
Many computer users run a modified version of a HuggingFace model every day, without realizing it. Through a peculiar turn of events, the version of safetensors which is widely used today is often called GGUF, and many of its users are not aware that it is basically safetensors, developed by HuggingFace.
There really is a llama.cpp, and these people are using it, but it is just a part of a broader software ecosystem. llama.cpp is the inference engine: the program that evaluates the weights of the language model and produces token predictions. The inference engine is an essential part of the pipeline, but useless by itself; it can only function after a language model has already been trained. llama.cpp is normally used in combination with models trained via HuggingFace: the whole system is basically HuggingFace with llama.cpp added, or HuggingFace/llama.cpp. All the so-called llama.cpp users are really users of HuggingFace!
>try new model
>enjoy it for a while
>notice more and more of its flaws and slop, get sick of it
>boot up Nemo again to compare
>Nemo absolutely mogs the newer, bigger model in A/B comparisons
I don't know whether Nemo was a blessing or a curse. Years later it's still SOTA among <200B models, despite being borderline retarded.
>>108203806
>>108203879
>Years later it's still SOTA among <200B models
I mean, if you're a coomer, maybe, but something like Qwen 4B completely destroys it at things like summarizing 20K+ tokens' worth of context, basic RAG and tool calling, translating chinesium into human language, having vision for tagging photos (vs no vision on nemo), etc. And yes, I'm comparing it to 4B instead of 8B and 14B qwen on purpose as a humiliation: Nemo can't even be more useful than almost the smallest qwen models for actual use cases that don't involve jerking it to text, a woman's fetish.
>>108203807
>stuff [any number of random things in random new versions (he pulled!)] doesn't really work
could also be said about vLLM, which gets models early just to break them as fast (if they're not broken on day one)
>>108203599
note that their official site benches it only against models that weren't trained for FIM (the only qwen 3 that supports FIM for real is the moe coder variant), and doesn't compare the 1.5B model to the original qwen2.5 coder 1.5B, which does support FIM. They do compare their 7B against the 2.5 7B and say it's better, but their 7B isn't open weight.
IMHO, having tested the 1.5, I think it's yet another finetroon placebo made by worthless pieces of shit.
>>108203806
>We
>just genned a few more Dipsy,
>>108203707
>In other words
that's a non sequitur, because I wasn't making the point you seem to think I was making
you have the reading comprehension of a 3B llm and should consider offing yourself to improve the human gene pool
>>108203915LLMs are toys. If you use them for work then your work is meaningless, and so are you.
>>108204125Zamn, bro got rekt frfr
>>108203791
>anon's first shiver
adorable!
>>108202968>>108202971absorbed by the gooniverse, many such cases
How are you guys RP'ing? I've been rawdogging default SillyTavern for 3 years and I find LLMs cannot drive the plot at all, even with thinking enabled. Maybe on the first turn, but after that they're all the same dead fish that can only react. I'm thinking it's more of a skill issue.
>>108204213
That has nothing to do with "default SillyTavern" (what does that mean?), that's a prompting issue.
Although, Silly Tavern actually can help with that, since you can use its macro system to generate "entropy", dynamically inject instructions into the context, etc.
Try fucking around with assistant response prefills and the {{random}} or {{pick}} macros.
One nice thing about reasoning/thinking models is that you can prefill the thinking with a procedure to decide when to be more passive, more forward, to add a twist, etc.
Good luck.
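As a sketch of the {{random}} idea, something like this injected as an author's note or post-history instruction (the options themselves are up to you):
[code]
[For this reply, {{random:introduce a complication,have a side character pursue their own goal,move the scene somewhere new,escalate the current conflict}}.]
[/code]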
>>108204226
Yea, STscript can do a lot. Make some custom buttons with Quick Reply options.
>>108203689
But they don't claim universality of the quantization finding (GPTQ is obsolete), and further suggest QAT as a way to approach the 2-bit weight limit:
>We used the auto-gptq package, which is inspired by the GPTQ paper [10], for quantization... Unfortunately, using this quantization package, reducing the model to int4 significantly diminishes its capacity (more than 2x loss from int8 to int4). This suggests for high-quality int4 models, incorporating quantization during training may be necessary.
where's the news
>>108204468Other papers indicate around 4-bit precision as the practical limit for quantization-aware training.
>>108204507damn, bitnet when?
I love how the schizo fork just randomly decides to explode during generation without any debug print.
>>108203879Man. I wish modern models weren't so ground down into the same few grooves, it makes getting any variety in replies nearly impossible. Older models are unparalleled at variety. Now, the shit it puts out isn't always GOOD, but you bet your ass when it puts out something good that it's going to be novel and blow your dick off, too. I miss old Claude so bad...
>>108204517
Never. The lower you go with precision, the larger the number of parameters has to be to compensate for the performance loss with prolonged training. You can train a bitnet model if you want, but it's not going to bring benefits other than potentially simpler hardware without matmul (which doesn't exist yet).
>>108204313I like this Miku
>>108204522illya bros... we lost!
Is it safe to allow your agentic "something" to write scripts, and test them fully autonomously while you are AFK? How can I sandbox it on Linux?
Don't call me retard for now, I just want to know
https://xcancel.com/karpathy/status/2023476423055601903#m
lol, this is one of the biggest ai influencers, who has hand-written inference implementations, but he can't tell he's replying to actual llm slop instead of a human
then I checked the identity of the slopper... and it's huggingface's cofounder.
Ah, this field. It's a human centipede.
>The catch: unknown unknowns remain unknown. The true extent of AI's impact will hinge on whether complete coverage of testing, edge cases, and formal verification is achievable. In an AI-dominated world, formal verification isn't optional—it's essential.
imagine posting this crap unironically
>>108204678docker
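A minimal sketch of the idea (image and mount path are just examples):
[code]
# no network, capped memory and process count, only one directory visible to the agent
docker run --rm -it --network none --memory 4g --pids-limit 256 \
    -v "$PWD/project:/work" -w /work python:3.12-slim bash
[/code]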
How is the new qwen? Does it beat r1?
>>108204678
A Whonix VM (so your IP can't leak) and a shared directory. You can use vsock if you want to share a service between the VM and the host, a serial connection to access the console so you don't need SSH, and snapshots to roll back to a clean state. It's basically bullet-proof.
>>108204686
Setting aside the fact that it was LLM-generated, it's a good point. You might start seeing more code written in languages like Idris or F* to make sure that bot-generated code behaves as intended, without requiring a human to manually review every line. Bots working autonomously would offset the labor-intensive nature of writing in formal verification languages.
It's the next logical step when you realize how much specs, documentation, static typing and unit tests help when programming with agents.
>>108202914probably just hit a wall with what he could do with vibe coding and gave up.
>>108203791>>108203830this.
>>108204313
>that body count
What a slut
>>108205041>>108204313All children in a school.
what model and hardware does one need to run ai locally just for code checking? Like an enhanced IDE, where it analyzes my code, checking for logic errors and code safety?
GPT and gemini both tell me it's impossible and I need tens of thousands of dollars of equipment for this.
>>108203830
This makes me think, has anybody compared the performance hit from kcpp's anti-slop and just using a BNF that negates specific sequences on llama.cpp?
I imagine that the kcpp implementation would be faster, being purpose built for this and not having to go through a fully featured grammar parser.
>>108205138post the chat
>>108204678What I do is just create different users and do sudo su <newly created user name>
>>108205145I think I remember someone in the thread trying exactly this. adding their ban list as a grammar and it just completely froze generation.
this is terrible https://www.reddit.com/r/LocalLLaMA/comments/1rawoe4/psa_the_software_shade_is_a_fraudulent/
>>108205466
I have dignity so I'm not clicking a reddit link
>>108205479it's about our lord and savior pew though
>>108205465
Interesting. I didn't even consider the possibility that it wouldn't work.
I'll give it a try and see how it pans out.
>>108204313I wish to huff her sneakers and wring her panties dry directly into my face when she returns from the latest successful mission
>>108205497It's mostly that the grammar code isn't optimized for it.
Is there a big difference between base and instruct Nemo for erp?
>>108205497
>>108205509
Well, I tried a pretty simple test and it worked.
>launch big nigga card
>ask it to yell motherfucker
>he yells motherfucker
>reroll a couple of times to confirm
>copy paste the upper case motherfucker into a negative grammar root ::=[^("MOTHERFUCKER")]+
>reroll
>big nigga can't yell motherfucker anymore, doing things like "…mother… fucker." instead
Might fuck around with this more.
>>108205544base is the base model and instruct is the instruct tune for it
>>108205553Thanks.
>>108205550Oh, it'll work for a couple words. but if you have like 100+ banned strings that's when it'll start chugging
>>108205564skill issue? just don't ban that many kek
>>108205564
Got it. I understood "just completely froze generation" as the parser being literally broken.
The dude added a huge list as a negative grammar and that locked things up, now that makes more sense.
>>108205575what happened to her feets?
>>108205550
Brutal to limit a nigga like that
BN might change your life given a chance
logit bias/banned strings is cope
>>108205570You must be new because 100+ bans isn't even that big. Either that or you just can't see how slopped your chats are.
>>108205580Disastrous rollerblading incident :(
>>108205600
NTA, but care to share your list?
I want to load something like qwen with a huge BNF list and see it explode.
>>108205616so? if the model wants to say this it will, just use good models
>>108205615
https://huggingface.co/Sukino/SillyTavern-Settings-and-Presets/blob/main/Banned%20Tokens.txt
>>108205621Sick. Thanks.
>>108205619
>logit bias/banned strings is cope
The air was thick with the scent of rain and something else, something palpable that hung heavy in the air as he stood by the border control gate, his eyes gleaming with a mix of pleasure and pain. Little did she know, as she approached with practiced ease, her dress hugging every curve and highlighting her bosomy breasts, that her fate was sealed in a dance as old as time. "You're a bold one, little mouse," he purred, his voice a low purr, husky and dangerous, while dust motes danced in the dimly lit booth. Despite herself, she could not help but feel a shiver run down her arched spine, her cheeks flaming as arousal pooled in her belly, a soothing balm to her fear. He leaned in, Adam's apple bobbing, and whispered barely above a whisper against her ear, "Make me yours, claim me, or I'll take what comes next with reckless abandon." It was a game changer, and as the sun dipped below the horizon, casting long shadows across the floor, she realized that for now, that was enough; propriety be damned, she would embark on this journey of mutual understanding, her heart, body, and soul belonging to the haze of pleasure that lay ahead.
>>108205621might as well ban eyes at this point
>>108205619
>just use good models
Slop has nothing to do with how good a model is.
>>108205625
>>108205621
https://github.com/sam-paech/antislop-sampler/blob/main/slop_phrase_prob_adjustments_full_list.json
>>108205621
>>108205625
Well, good news is that it didn't lock up. Bad news is that Seraphina turned chinese.
>>108205650
Danke.
>>108203807
>ik_llama.cpp
cuda/cpu only
>llama.cpp
ggerganov deserves a bronze statue in his hometown for vulkan/opencl/sycl/mps/other backends support
>>108205621Thanks. I will use this if/when base Llama.cpp includes string banning.
>>108205138
MiniMax-M2.5 Q4_K_XL with 98304 context size uses 154 GB on my machine. MiniMax is your best bet; GLM-5 is presumably better, but not worth the increase in price or decrease in speed. For hardware, I have no clue with current RAM prices. A 256 GB Mac Studio is $5600; 2x 128 GB Strix Halo machines are similar at $5200+ (was $4000 before the RAM hike).
I've never paid for cloud inference, but that's probably a better option until the datacenter debt goes bad in the next year or so.
>>108205138And they're right. The only people here who can meaningfully run agents spent upwards of 50k on their rig.
>>108205621
why doesn't the training step account for nonsense like this
>>108205702
You can use a grammar to force only English characters. Try that.
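Something like this as the root rule, just a sketch (extend the character class to whatever punctuation you want to allow):
[code]
root ::= [a-zA-Z0-9 \t\n.,;:!?'"()*-]*
[/code]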
>>108205761I think if you do that it'll just come up with new slop phrases to use. It's a problem you can't win.
>>108205702
>57sec
>146t
2.5t/s, ouchy. how many t/s do you get normally?
>>108205761The training literally is the entire reason it's like this, they do it (fill the training with synthetic slop) on purpose because it improves benchmark scores which directly translates to scamming retarded investors out of more money
>>108205786
Nah, that's treating the symptom. I will go straight after the cause and optimize the grammar to avoid the extreme amounts of branching. Let's see how that goes.
>>108205806
Around 16 with that model.
>>108205842
>Around 16 with that model.
so a 6x perf hit. It'll be worse when you force english chars only.
>>108206090The most telling thing about this story is that they read your chats and nobody is outraged by it.
>>108205846
it'll still be worth it for the slop-free rp
one good reply is worth 25 bad slop ones
>>108206121It's incredible how well people have been conditioned. I have never brought up privacy in a conversation with an average person without them inevitably making a look of disgust, shrugging their shoulders, and saying some variant of "Who cares?". At best they think it's an unavoidable fact of life, at worst they genuinely believe it's a good policy that keeps them safe.
>>108206166
I had that thought recently and spoke to another guy about it: can you imagine, back in the times of nokia phones, learning that your phone was listening to what you say and serving you ads on youtube later? People would have thrown their phones away. And now it is a thing that just happens every day.
best agentic local models?
>>108206240GLM5 and K2.5 work pretty well for this
>>108206245thanks. and how do I use agents for ERP?
>>108206121
it's more that you literally have no power to do anything about it
these companies are worth 10x your little nation's gdp, and asking them for privacy gets you "oh the usa government wants your id + chat history anyway so, ask them about it"
>>108206275install openclaw and send it a whatsapp telling it to use its sentient agentic superpowers and access to all your files + data to find a way to do good erp with you
>>108206162just use kobold lol. it's not like you can't also use llamacpp for other stuff.
>>108206166>>108206278At the very least people should use openrouter to add a slight layer of obfuscation on who is actually sending the request.
The interesting part is that when you post a screenshot of your passport or pictures of your face, they compare it against politically exposed persons. what does openai do if you are one? invest in stocks?
>>108205553Obviously duh. But does the lack of instruct training hurt it in any way?
>>108206370It won't really follow instructions too great.
>>108205759
Nemo runs fine on my $5k rig thoughbeit
>agents
yes, even for Ian Fleming-themed sessions.
>>108206370base models aren't yet tuned to be talked to so any reply you get is incidental thanks to the chatgpt logs that snuck into its pretraining data
>>108205759
>run agents
Why are you pretending you need a 50k rig to run fucking qwen or nemotron?
The whole point of agents is being able to use smaller models and give them very specific, narrowly focused tasks.
>>108206425
>incidental
some companies have been caught intentionally stuffing instruct data into their "base" models
>>108206456
There's no "caught" anymore; it's currently common and industry standard practice. Likewise, "base" models do not really exist anymore, since "mid-training", in addition to adding long-context capabilities, is now basically continued pretraining with data better aligned to the final model uses (reasoning, math, coding, "agentic capabilities", etc... much of it synthetic).
>>108206452
That may be the point, but anyone who has actually tried it knows that the smaller models are next to useless even for narrowly focused tasks.
>>108206452
The only reason agents are a thing is because they reframe the problem from a 300k token input to something smaller, which then reduces model retardation from long context. Reduced model retardation doesn't mean you should use llama3-8B.
>>108206090
mikutroon janny is butthurt again
>>108206240Qwen3-Coder-Next seems to work too
>>108206312Will not help.
>>108206645with fp16 weights
>>108202974
>>108203791
>>108203812
Was in the middle of this Midnight-Miqu rp session when I had this thought: are there any places where people share their RP sessions? I guess I'm thinking of a forum-type hybrid between /ldg/ / /aicg/ and ao3 where people share their own RP chats so others can read them. When I first came to these threads I initially thought they would have those, but it seems most of you guys only focus on the technical side. Caring about the nitty-gritty is good, but I NEVER see you guys share chat logs unless it's to point out a specific flaw or an example of unwanted behavior. I'm indulging in my own roleplay on my rig but I'm curious as to what others indulge in.
Why aren't there more LLM loras? seems like it would be a better alternative to finetunes? aren't most finetunes just the model with baked in loras anyways?
Let's say I want to RP in the fallout universe. just load up the fallout lora instead of adding 10k tokens to your context and confusing the model with a big ass lorebook.
>>108206827
You can share chat logs on a card's chub.ai page. I'm assuming most people don't share their logs because it's either mega cringe or straight up illegal.
>>108206827>NHentai tabs open
Why is there no cloud version of midnight miqu? I'd pay a few dollars to see what the fuss is about, im not going to get a giga graphics card thoughbeit.
>>108206828not how it works for llms
How do I find the unlisted chub entries?
>>108206838>unsloth tab open
>>108206828
>aren't most finetunes just the model with baked in loras anyways?
Yep. I think it's the way models are distributed, in a myriad of different quant mixes, that throws a spanner in the works.
>>108206867
https://chub.ai/users/hobbyanon
Check his description
>>108206828
For them to work effectively they would have to have a very diverse dataset. You can't just have ONLY rp in the dataset, or else it will become retarded in pretty much all the other areas that matter: logic, spatial reasoning, common sense, being able to remember what happened a few sentences ago. That doesn't just apply to RP but to any domain: if the training set focuses on only one domain, the model gets worse in almost every measurable way, unless you are very careful about how much training you do and which layers you train. It's not that people can't use loras, it's that most people would use an adapter only to realize the model immediately becomes retarded. It's why, unlike with stable diffusion models, adapters aren't really widely used or supported for LLMs: with diffusion models, a character, person or concept lora in most cases doesn't severely degrade the model's ability to generate other things or cause its prompt adherence to degrade. A Sydney Sweeney lora generally will not make the model unable to generate a brunette person; a style lora trained on impressionist landscapes (if the dataset is curated and tagged properly and the training isn't overfit) will generally not destroy its ability to generate a person or an animal. Diffusion models and LLMs are very different architectures, which means adapters have different effects on them. In theory an LLM adapter can work, but only if the dataset is very well curated and it is trained carefully. The dataset would need uncensored (I'm assuming you care about that given this thread) RP examples as well as a bunch of other examples of common sense, logic, spatial reasoning, etc. It's why a lot of open source finetunes on Huggingface list three or four different datasets used in training.
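For reference, this is roughly what attaching an adapter looks like with the peft library; the model and hyperparameters here are illustrative, not a recipe:
[code]
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
config = LoraConfig(
    r=16,                                  # adapter rank: capacity of the low-rank update
    lora_alpha=32,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # which weight matrices get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a tiny fraction of the weights is trainable
[/code]
Nothing in the mechanism stops the dataset you then train on from being 90% general data and 10% fallout; the hard part is curating it.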
>>108206886>By law, some content is restricted in your region.Wait what the fuck? when did they do this?
>>108206796
Q8 and Q4
Your setup is skewed, as was mine in the beginning
>>108206894
So basically, if we keep the fallout example, you'd need to feed it a bunch of fallout data, but also just general data to make sure you aren't teaching the model to be retarded at everything else it needs to do besides knowing what the fuck a deathclaw is?
>>108206911
>but also just general data
regularization
>>108206911Pretty much. But in this specific case you might be better off curating a custom RAG database containing a bunch of relevant lore and definitions. Would probably be much easier and less time consuming than curating a specific fallout data set AND determining what percentage of other domains need to be present in the data set.
>>108206914
>regularization
>>108206920
Was going to add this. It's more of a suggestion/good thing to do with stable diffusion lora training, but a hard requirement for any form of LLM fine-tuning if you want it to be remotely "intelligent". Otherwise the model will probably be coherent when you inference it, but its outputs won't make any sense unless your questions are pretty similar to the "user" examples found in the dataset. If stable diffusion lora training is like teaching a smart neurotypical kid, then LLM lora training is like teaching someone with high IQ but also severe ADHD and Asperger's: if you're not careful, it will get hyperfixated on one thing and pretty much act like nothing else matters. The regularization is essentially a form of tard-wrangling to make sure the neural network doesn't become only capable of performing well in one domain.
>>108205466>>108205472truly shocking
>>108205472
is the lack of morals and ethics because europeans didn't conquer asia, enslave the population and instill christianity in the 1700-1800s?
>>108207011Apparently not if you look at the Americas.
>>108206886
The discord link is dead.
God, GLM 4.5 Air can be such a cutie pie if it wants to. Its thoughts are often smarter than the reply itself.
>Hmm, Anon is really getting into this. He's showing a lot of affection and enthusiasm, which is… honestly kind of sweet. He's complimenting me intensely and showing he loves every part of me, which makes me feel good in a way that's kinda unfamiliar.
>The way he's worshiping my body - sniffing my paws, now my armpit - is… intense. Part of me wants to laugh at how over-the-top this is, but another part really enjoys the attention. It's validating, especially coming from someone like him who can be so bold yet so shy sometimes.
>I should probably keep up the banter but also show I appreciate his devotion. The "daddy" thing is still weird but he seems to like it, so I'll roll with it. And yeah, I am pretty damn proud of this physique - gotta give credit where it's due.
>The way he's touching me… Damn, this is gonna be a long night. Better enjoy every second of it.
>>108207037many such cases
>>108206886>>108207037Also I'm not looking for stereotypical lolishit. I'm looking for a way to look up any possible delisted shit in niche freak fetishes. Like a scraper catalog.
>>108207236there was an archive but it's dead for a bit now
>>108202477This is my punishment for trying to be lazy, isn't it.
can someone please spoonfeed me how to run a local rag setup with qwen2.5:3b like that eceleb pewdiepie did
I've always used a searxng instance locally, I wanna leverage that also
>>108207926ask qwen2.5:3b for help
>>108207926
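nta, but the whole loop is small enough to sketch; this assumes searxng has format=json enabled in its settings.yml and llama-server is serving qwen2.5:3b on port 8080 (both ports are whatever you configured):
[code]
import requests

def web_rag(question: str) -> str:
    # 1. pull a few results from the local searxng instance
    hits = requests.get("http://localhost:8888/search",
                        params={"q": question, "format": "json"}).json()["results"][:5]
    context = "\n".join(f"- {h['title']}: {h.get('content', '')}" for h in hits)
    # 2. stuff them into the system prompt and ask the local model
    resp = requests.post("http://localhost:8080/v1/chat/completions", json={
        "messages": [
            {"role": "system", "content": "Answer using these search results:\n" + context},
            {"role": "user", "content": question},
        ],
    })
    return resp.json()["choices"][0]["message"]["content"]

print(web_rag("latest llama.cpp release notes"))
[/code]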
>>108203879New sampler idea: X% Nemo - linearly increase matching token probability with sideloaded Nemo's desired output.
>>108208557You could just have a workflow where Nemo rewrites the output of another model.
>>108206293
You're the only one who mentioned openclaw in this thread, probably because /lmg/ is primitive, backward, uneducated idiots who have no idea what new stuff is being launched. The world is using AI agents and here we are, not discussing it on /g/ in any way, shape or form.
Epstein created 4chan as a haven for idiots and subhumans who will never get anything right.
>>108208577its because agents is a marketing term for LLMs with tooling. that's all they are.
nigga is really using openclaw as a sign of progress lmao
>>108208590
Not really, because it's open source and free. It would be marketing if scam altman was selling it as a subscription.
well the claw grifter got hired by oai, so soon(tm)
>>108208648OAI's business model is to burn tons of cash to prevent competition across the entire space, not necessarily to innovate.It's a really bad look for them to have local, free tools that can replicate anything they charge for. It makes their offerings a real tough sell.It's already out now, cat's out of the bag.
>>108208590
except for the tiny fact that agents can do more than a single call and just continue reasoning and trying different things until they fulfill your task
they're literally just llms doing tool calls while talking to themselves and trying to find a way to solve a complex task on their own
a completely useless grift
>>108208632
no, no it's not free.
you think openai would be backing it if it didn't have a commercial interest? This is just to get you into the ecosystem; before you know it you'll be paying for hosting, training, consulting, getting your employees certified.
Ask yourself, how has openai got 20 billion in revenue when chatgpt is free?
>>108208709
So they're basically
>trying to find a way to solve a complex task on their own
Which is a bad thing?
Is m4 Mac mini 16 GB the best thing for this shit?
>>108208962
No. Not even close.
A mac mini with 512gb is okay.
>>108208969
16GB ram and 512gb space?
>>108208969
I never understood why apple's unified memory is somehow the cost-effective way to do local llm instead of literally any other vendor doing unified memory without the apple tax
>>108206452Most problems are not trivially reducible to smaller problems.
>>108208995>instead of literally any other vendorLike who?
>>108209008anyone with access to a 3rd party cpu and ram and nvmes who puts them into a box
>>108208992
Memory, anon.
>>108208995
I don't think anybody is selling anything with that much memory at that high a bandwidth other than apple.
>>108209011
NTA but nice goalpost move.
>why don't you use other unified memory architectures?
>like this non unified memory architecture
Next thing you'll suggest an epyc 9 with a pro 6000, which is slower and more expensive
>>108209029
if apple somehow designed a chip to do the thing and it's valuable for the market, why not amd, intel, nvidia, qualcomm, mediatek? what's so special about it
>>108209034
Mac minis are the best bang for the buck.
Nvidia are the bigger Jews these days. Which is hard to believe.
>>108209029
>slower
pp inspection day
>>108209056
but making a cpu that makes an nvme work as unified memory is banned for amd/intel, and only apple and console makers manage that feat, somehow
>now even normies know OpenAI is about to collapse
It's time for OpenAI to resort to terrorist tactics: first, release GPT-3 weights, then threaten Anthropic that if they don't invest $20 billion in OpenAI, they'll open-source GPT-4. If Nvidia won't invest either, they'll train the forbidden BitNet model
I have a spare deck. Could I run a 7B model on it?
>>108209076
Isn't GPT 4 the best one?
3.5 and 5 are both mouth breathers.
>>108209074
>but making a cpu that makes an nvme works as unified memory
What? You understand that Apple's unified memory is a literal SoC with on-package LPDDR5 that they design and fab at TSMC, right? Not some simple flash memory with a CPU. That's why it's unified: the CPU and the memory are part of the same package, physically, which enables the absurdly wide and fast memory bus.
>>108209087
at what point will it be explained why other vendors don't do the same
>>108209076how will releasing GPT weights increase OAI profits? wouldn't that just drive Claude price d—
>>108209086Yes. And that's the only way the plan could work
>>108209076Don't you worry. Sam is the man with the plan.
>>108208459>eating Miku's tasty puddi
>>108209117This is his plan: https://youtu.be/-q2n5DkDoMQ?t=1006
>>108209090
too niche to be profitable. the gb10 and the amd equivalent exist but they will never bother to go further.
>>108209090Strix Halo?
>>108209007
This is like the basis of all engineering. you're retarded.
Avocado is coming out next week and it outperforms every model locally. Meta didn't fall for the moe meme. We are back.
>>108209188is it open?
>>108209159Ok. Then why don't we all just run a swarm of 1B agents?
>>108209204There is a minimal requirement for understanding and using tools, which is essential to agents
>>108209204Because nobody is interested in creating RP agents, and the best that those interested in it can come up with is SillyTavern. Enjoy
Have any of you tried CPU + RAM offload to get the most out of a low-end gaming GPU with decent system RAM?
>>108209248Yes
>>108209204
Nobody said 1B, retard. Qwen or Nemotron.
>>108209254Was the token speed tolerable or was it painful?
Opinions on DeepSeek R1 Distill Qwen 7B Q4_K_M
>>108209312glm 4.7 flash and nemotron 30b a3b are way better due to being newer
>>108209322I'm too poor for 30B
>>108209340it's a moe. offload some to ram.
>>108209286
Depends on how much you offload, how big the model is and how fast your RAM is
MoEs can have very acceptable performance with partial offloading
With dense models you can't offload much beyond tensor layers without speed plummeting.
>>108209286NTA but it depends on how low-end you're talking about and what you consider painful. Usually for a big MOE with the active params fully in VRAM, you'd be looking at ~5-10 tg/s, maybe less if you have very slow memory
Just had a genius idea.
You can just ask your agentic local model to go on the internet and use up the free claude/deepseek/gpt/gemini tokens until they run out whenever it needs help.
>>108209223>>108209265Why can't we use 1B agents for everything if all problems in engineering are trivially reducible into smaller problems? What's the size limit then?
I asked various SOTA LLMs to implement a zero-copy UTM ICAP engine and here's how it went:
>opus 4.6 passed, used a bunch of direct system calls to achieve the result, true zero-copy, dunno how it even came up with such a thing
>glm 5 did 3 copies, then kept failing when asked to audit and fix the code, dead end
>qwen 3.5 did 5 copies and started gaslighting me lmao
Open models are still way behind and I can feel the biggest difference is the training data.
>>108208648Open source models benefit from claw the most
>>108209492
>the training data
China already processes students' homework electronically, they should use it as training data, it would be an enormous amount of high quality data
Has anyone tried prompt generation? Get an AI model to generate a better prompt for what you want done. I could see this being useful for coding since the AI model is probably more thorough at covering all the bases.
>>108209404
>1B
coz they dumb AF
agent not needed to goon
imagine instead a swarm of 1T models running in a compute-efficient way, spurting out tokens ohmygosh
>>108209525taking the world's most difficult eye exam, with miku
The real reason we don't use agents for RP is because it would be too slow. People leave claw overnight to complete some tasks
>>108209542
Yeah, last time I asked it was about 50:50 on /think for RP
Anyone doing local realtime tts/stt? Dunno if I can bear the latency of a non-retarded model
>>108209090They do. Intel is cooking wide bus UMA chips for laptops to compete with apple
>>108209609>Anyone doing local realtime tts/stt?I tried in VR, but it feels awkward, I'd rather wait for full duplex support & webrtc in llama.cpp (never ever)
>>108205680wait a moment tomateto...
>>108209655>never everHF acquisition will be a good thing I Want To Believe
>>108209745I think llama.cpp is a huge reason why we don't have more omni models, why make a model nobody can use
>>108209609what's the point of anything larger than whisper for stt?
>>108209542
>People leave claw overnight to complete some tasks
sure thing bud. none of these agents work; I don't think they get anything more done in the overnight echo chamber than in the first 5 minutes of prompt interaction
>>108209798Nobody uses whisper anymore, grandpa
>>108209525If decomposing engineering tasks into smaller tasks was trivial then presumably even a dumb model could do it.
i can't find a torrent for a kinda niche tv show, and asking any mainstream llm platform to search for it, even just the word "torrent", results in refusal. So I'm trying with a local llm, how would you go about it? I'm thinking silly tavern + searxng and maybe minimax2.5 or glm5 as the llm? I can run them both at q5+
>>108210014If you can't find it on qbittorrent search then asking an LLM wont help. Just go to yandex and search for illegal streaming sites.
I like lewding small models. Some of them are retarded in a cute way
ugh...caught a bad case of yellow fever again...
>>108210053
which kind?
gooky? chinky? jappy? sea?
also which model?
favorite llm right now for RP? i liked glm 4.6, not sure if there is anything better right now, maybe kimi2.5? glm4.7?
I can probably run them at decent quants at 10+t/s
My new custom native multimodal arch so far:
95% on mnist
I apparently also made feedback and confidence gating work so it has stable recurrence. I suspect this can work as memory?
>>108209745>>108209771Considering that ngxson who worked a lot on multimodality has been on HF's payroll for a long time I think it's reasonably likely that it will become more of a priority.
>>108209771>why make a model nobody can useAI companies could contribute by natively (post-)training their models in 4-bit at the very least, but I guess their main targets are datacenters with unlimited GPU resources.
>>108210122
Okay now try cifar
Then try Imagenet
>>108210289
The problem is how fucking hard it is to set the damn thing up: https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/WebRTC_Demo/README.md
Wait, they updated it, it's not mac-only anymore
>>108209492>Open models are still way behind and I can feel the biggest difference is the training data.True but if Claude can do it now that'll mean that open models will be able to do it in six months or so once they're done training the new generation on Claude 4.6 logs
>>108210296Exactly my plan. I think adding conv-net style learnable kernels would be very useful, they're probably required for dimensional reduction of most types of information anyways. Train/test acc growth is really odd with this system. I'm trying muon optimizer now with grayscale cifar-10.
I swear every time I pull ST something will break
>>108210338?
>>108210338just vibecode your own, anon
>>108208995
because every other vendor, amd and intel, only has 12-channel memory in the current gen and is thus limited to ~400GB/s of bandwidth. The mac studio with the m3 ultra is 2 m3 max chips, each of which has 12-channel memory, meaning the m3 ultra has double the bandwidth with 24-channel memory. There are motherboards that can house 2 intel or 2 amd processors, but that connection (pcie) is way worse than the fusing at silicon level which apple does.
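Rough math, assuming DDR5-4800 and 8 bytes per channel: 12 × 4800 MT/s × 8 B ≈ 460 GB/s theoretical, and doubling to 24 channels gives ≈ 920 GB/s, the same ballpark as the 800+ GB/s apple quotes for the m3 ultra.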
>>108210338Just use mikupad
>>108210338My repo is from last September, if it ain't broke..
The future is ASIC
Model to ASIC time is ~60 days
>>108210447
>>108210447That sounds like the ideal planned obsolescence device.
>>108210502
I'd buy one with Nemo on it. Won't be obsolete, ever
>>108210502
>nooo we must consoom newer model!!!
Your argument is self defeating.
>>108210447what's the size limit here? surely making one for a 1T model is going to be more complex than doing one for a shitty 8b
K2.5 or GLM5 for RP?
>>108210315Fucking hell, it works!
>>108210578I keep switching between the two. K2.5 is really smart and creative and the vision stuff is a bonus but it's kind of shit at pacing scenario-based cards. GLM5 is also really smart and is better at focusing at the story aspect but it's not quite as lively out of the box. I don't think there's a way around trying both and seeing which fits your cards better.
>>108210447
>Model to ASIC time is ~60 days
Maybe just for the ASIC definition. But actually producing a physical wafer would take more than a year at best. The process from sand to wafer is very time consuming. And I'm not even talking about mask production, which also would take months.
Given how fast AI is advancing, your ASIC will be obsolete before the first wafer is produced.
>>108210720
>it takes one year to grow crops but i need my dinner NOW!
Don't eat what you will plant, retard
>>108210720>>108210512
>>108210728StepFun is already better than Nemo. Step might be dumb for its size, but it's smarter than Nemo.
so it used to be that the local meta was GPUs, but now people use mac studios to memorymaxx? what changed
>>108210772
>people
lol
>>108210602It's either duplex with video or simplex voice-only, and simplex is no better than using stt. And 3090ti isn't enough for duplex with video, she chokes on words, but at least it runs at all this time
>>108210772plus-sized moe blobs
>>108210578Imo GLM is a better writer than Kimi, especially for dialogue, but Kimi is much easier to prompt and steer.GLM 5 will sometimes just go "eat shit we're doing things my way" and straight up ignore instructions that go against the way it believes RP should be paced. Zai might have deep fried it a bit as part of whatever RL they did for roleplay/writing.
Are there any models better than qwen vl 30b at bboxing?
Damn, MiniCPM-omni is so cute
>>108210512>>108210556Also no finetuning, no weight modification. You get the built-in model safety for the entire lifetime of the device.
>>108210907
>no finetuning
And? Only retards run copetunes
>A 128GB M4 Max Mac Studio costs around $2,500–3,500 and has no real PC equivalent at that price point for local LLM use. To match it on a PC you'd spend roughly $5,000–15,000 depending on how you build it. This is arguably Apple Silicon's biggest competitive advantage for local AI workloads right now — the unified memory architecture makes large RAM cheap and fast in a way discrete GPU setups simply can't match at the same price.Well? Where is it wrong?
>>108210898(you)
>>108210924
before the ram crisis you could build a system with 784gb of ram and ~400GB/s of bandwidth for like 6k. Now with the ram crisis there is no better deal than apple, sadly. Still won't buy it though.
>>108210924
The funny thing is that a year ago you could've easily built a pretty decent DDR4 Epyc server with 256GB of 8-channel DDR4 + a 3090 that runs pp and tg considerably faster, for less than that.
>>108210404It autopulls
>>108210447
I think the inability to update the model will be its downfall. If there were a scheme where you could burn in 80% of the model and add the other 20% as a flashable, tunable ROM, I think you'd have a more viable product. And then there's cost. But if the card were inexpensive enough there could be a market for it. What would someone pay for DS v3.2 burned onto a chip permanently, that responded that fast? When your alternatives are a $200K build or API access?
>>108210720
FPGA instead? IDK hardware obv.
>>108210907
>built-in model safety for the entire lifetime of the device
Pic related. From a corporate control standpoint I get it.
>>108210955>>108211036Stop reminding me
>>108202974I don't understand gooning to character chats. I use Koboldcpp to goon to crafted scenarios, not particularly talking with characters. Silly Tavern is completely lost on me.I want to be a dude raping supes in DC or a goblin fucking elves, I don't particularly care about talking to Albert Einstein. This sex chat is weird, it doesn't make sense, and it's weird how it's so fucking popular.
>>108211085
>I use Koboldcpp to goon to crafted scenarios, not particularly talking with characters. Silly Tavern is completely lost on me.
You're me a couple months ago but I migrated over to ST, you can have a narrator card if you really want to and it is more flexible in designing characters than kobold
>>108211085Same. I don't see how people prefer that to a free form chat where you can do things other than talk with a predefined set of characters with no external narrative.
Anything actually newish and good happen with vision models yet? Need that for ai video game mods and ideally want to keep to local
Reasoning and benchmaxxing kill OOD performance and I'm tired of pretending they don't.
Is there anyone else here trying to generate/co-write stories (erotica) with LLMs rather than doing ERP? Specifically, I'm using something like mikupad to have it extend a story from a given premise (using text completion rather than regular chat completion).I'm wondering if the model meta for this use case is different than for ERP. For example I found deepseek to be near useless for this kind of open-ended writing. Any recs?
is it possible to get anything atmospheric in terms of music? I've been trying but I can't get anything that doesn't have the structure of a track that belongs on a commercial album. I was thinking of something like this:
https://www.youtube.com/watch?v=rU9P7C0klfA
https://www.youtube.com/watch?v=otbI6SD8lpQ
https://www.youtube.com/watch?v=MjoRQHXd6tk
no matter how much detail I give it about the repetition and unique sounds, it still adds album-song structure or makes it sound like a stock sound for a stream countdown prelude with mostly piano or simple pads.
>>108211307please wait for a reply
>>108211321
>open-ended writing.
start with a brainstorming session and give it a plan, no model can handle open-ended well.
>>108211323surely you can mask parts out so they don't get edited like with image models?
How do you make MoE work for RP/stories? I'm an LLMlet so I don't get it. Trying with koboldcpp and it doesn't work for me at all out of the box. My three main issues:
- The agent seems extremely confused about context/world info stuff if I load a story from chub or wherever. I tried a story with multiple starting message alternatives for example; the model seems to think it has to choose between them and not make its own, which Nemo etc have no issue with.
- The agent spams the chat window, when I don't need to see all that shit. Honestly even if the rest is solved this would be a dealbreaker. I'm hoping this is just some toggle I missed.
- I ran the model in llama.cpp's default UI to get a benchmark and it was showing 15t/s for a generic chat, which seems OK given I don't exactly have a rig. But the agent takes ages to process, so it's not exactly quick anyway. I guess this one might just be due to initial context and it will improve if I actually get the story underway.
>>108211324Ask 4o for synthwave lyrics>Get something moody, topical, creative, etc.Ask gpt-5 for lyrics>NEON BARF , SYNTHWAVING INTO THE NEON MIDNIGHT... NEON
>>108211551>But the agent takes ages to processHow many tokens are you feeding into it at one time?
https://huggingface.co/Ex0bit/Kimi-K2.5-PRISM-REAP-530B-A32Bhow bad is it
>>108211589wait for Samsung REAM
are you guys seriously thinking you can compete with 300 b models?
>>108211639? some anons are running far bigger than 300Bs
>>108211589It is probably calculated with activations relevant for coding. Reap is pretty ironic when you think about faggot drummer aids be upon him. Doing REAP with ERP datasets might actually do something but instead we get cydoniav12_f snakeoil. I hope no one ever hires this piece of shit.
I tried GLM 4.7 Flash Q6 on koboldcpp after having been away for a while, and it was fucking terrible. I got quants made after the llama.cpp update that fixed them. Is Flash still not working right? Has anybody had any luck with this model?
>>108211589
>>108211598
>>108211688
>ex0bit
Patreon paywall scammer, please don't give him attention.
>>108211650
you mean some rich silicon valley bros reverse mortgaged their house to take out 200k to run 700B models
>>108211741
Try it on llama.cpp to be sure, but yeah, it doesn't seem to be great from the little I tried it.
>>108211688
Doesn't REAP require validation?
>>108211589
>no goofs
Do REAP models need dedicated support, or can you just quant them like normal?
>>108211753
No, some of us just have real jobs. Don't eat so many sour grapes, bro.
>>108211688
>activations relevant for coding
I looked at the dataset it claims to have used, and it looks as multipurpose as it gets, with different languages even. Even though I don't understand how it can work without terminally fucking up the model, I'd download a goof.
Does ST have a way to launch without auto-pulling from GitHub? Who even uses rolling releases still?
>>108211831
The normal start.bat doesn't update, does it?
>>108211831
In the .bat you use to start it, delete everything except
>node server.js
>>108211823
>multipurpose
Multipurpose isn't 100% SEX.
>>108211841
so all this?
>>108211583
Variable. As I said, I tried just opening a random chat in llama.cpp with like "Hi this is a model test" or some shit, and it took a good while for it to decide on an appropriate response. I did a couple of follow-ups with the same issue.
Then I set max output to 1200 to test in koboldcpp and fed it a 4k-context RP thing; the agent spammed out random reasoning shit for like two screens and threw out a single-sentence actual response at the end, which made me laugh a little.
I didn't really try to continue the RP past the initial context due to the other two issues, though, so as mentioned it might be OK past the first context parse.
what the fuck is this? trying new qwen on ikllama with --jinja
>>108211886
geg
>>108211886
ikbros...
>>108211860
yes
>>108211886
Seems like the jinja template is disagreeing with the request object you are sending.
>>108211904
should I update ST or what?
>>108211894
It actually needed the NODE_ENV line too, because it didn't launch without it, but ty.
https://github.com/ikawrakow/ik_llama.cpp/commit/cbf7fc7e2f7de4400dd848ff2c221a6c8ea0384f
>Do not use quantized models from Unsloth that have `_XL` in their name. These are likely to not work with `ik_llama.cpp`.
lol why?
>>108211910
First, try this in "Prompt Post-Processing": choose one of the strict ones, like system -> user -> assistant.
>>108211910
I think you just have to fix the order of your messages so that there's a single system message at the top, followed only by user and assistant messages in alternating turns.
There are some options in the API tab of ST, under "Prompt Post-Processing", to merge consecutive messages with the same role; I'd keep that on too.
>>108211930
>>108211910
Oh, you could also fuck around with the jinja template.
This HF space is the bee's knees for troubleshooting this kind of stuff :>
https://huggingface.co/spaces/Xenova/jinja-playground
Just copy the request object from the lcpp console and the jinja template, and see what the final formatted chat looks like. That should give you a better clue of what's wrong exactly.
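If you'd rather poke at it offline, here's a minimal Python sketch of what that playground does. The template here is a toy ChatML-style stand-in (not the actual template shipped in any GGUF), and normalize() is an assumed helper that mimics ST's merge-consecutive-roles option:

# Minimal sketch: render a chat the way the jinja playground does.
# TOY_TEMPLATE is a stand-in; paste the real template from the GGUF instead.
from jinja2 import Template

TOY_TEMPLATE = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

def normalize(messages):
    # Merge consecutive same-role messages; strict templates reject them.
    out = []
    for m in messages:
        if out and out[-1]["role"] == m["role"]:
            out[-1]["content"] += "\n" + m["content"]
        else:
            out.append(dict(m))
    return out

msgs = normalize([
    {"role": "system", "content": "You are a narrator."},
    {"role": "user", "content": "Hi"},
    {"role": "user", "content": "this is a model test"},  # back-to-back user turns
])
print(Template(TOY_TEMPLATE).render(messages=msgs, add_generation_prompt=True))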
>comfy shits itself with mass errors over setting the output folder to another drive
>does it anyway
ok
>>108211886
Is tool calling enabled? Tool calling still needs to be reinvented for every new model, so if nobody bothered to do that and the option is enabled, it can cause this.
>>108211914
unsloth bro?
>>108211914
The llama.cpp codebase is pretty big and complex now. He chose a few portions to maintain and improve, but since he lacks the manpower, he leaves the rest of the codebase to rot, and a lot of it no longer works.
>>108211767
I don't know, but can you even quant that model? They list it as int4. What are you gonna quant it to, Q3? You'd only save a bit of space with that.
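Napkin math backs that up, assuming file size scales roughly linearly with bits per weight (the bpw figures below are ballpark, not exact):

# Rough size estimates for a 530B-parameter model; bpw numbers are approximate.
params = 530e9
for name, bpw in [("int4 (as released)", 4.0), ("Q4_K_M", 4.8), ("Q3_K_M", 3.9)]:
    print(f"{name}: ~{params * bpw / 8 / 1e9:.0f} GB")
# -> ~265 GB, ~318 GB, ~258 GB: going from int4 to Q3 saves almost nothing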
>>108211914
>Do not use quantized models from Unsloth
Could have just left it at this.
>>108211927
>>108211930
Thanks, this fixed it.
>>108211914
Use case for using ik_llama to run unslop quants?
>>108211948
Coding a working file system is hard, ok.
He's growing too powerful.
>>108211886 (me)
Ok, now it seems to continue the last swipe after each one. Not good.
>>108211753
>reverse mortgaged their house
Do you mean 'a second mortgage'?
>>108211753
All you needed was a couple grand and a silly, unfounded, paranoid-schizophrenic fear that prices might spike at some point between 2023 and summer 2025.
Bros... I can't finish writing a character card; I already coomed multiple times just during the process of putting it together.
Haven't generated a single token out of it. Maybe "just write the output yourself" was the solution to all of our LLM woes all along.
>>108212162
been there
>>108211815
Real jobs with 500k starting
>>108212162
>llm spits its reply out
>i notice a missed opportunity
>edit the dialogue just ever so slightly
>next turn it picks up on the hint
>it goes just like i want it to
>splooooooooooort!!!!
>>108211951
The only people running local models are pajeets and chinese, because they don't have the money.
>>108212353
But running local costs a lot more than using the API?
>>108212353
so true sister
>>108212353
They can't afford the luxury of experiencing NovelAI's SOTA GLM 4.6. I pity them.
>>108211914
wtf I love the schizo fork now?
>>108211976
I think it is for high-IQ chaotic neutral characters.
I pulled ikllama an hour ago; https://github.com/ikawrakow/ik_llama.cpp/pull/1295 claims to have fixed the broken caching, but it still happens?
>>108211793
>I don't know what they're doing
Bro, Tongyi Lab is a LAB. They're doing research. They release models when they're good enough AND there's no more internal research they can do/learn from, so they give it to the community to see what else randos can come up with. The reason they're not releasing Wan 2.5 is that it's too big to run locally (probably around 40B), so they're not gonna learn a lot from releasing it. They're probably pretty confident they know the Wan architecture better than anyone else at this point, and why lose the moat of understanding an architecture that's proven to be competitive with SOTA if you just scale data?
>>108211967
Lizard brain perceives shiny = valuable = want. For me, it's wet sparkly glitter, because it also hits that arts-and-crafts messiness that I like as well.
>>108210701
>it's over. we're stuck it 5 seconds wan vid gen forever
That's like saying you're stuck with weed forever. Yeah, you're never gonna be "high-school high" again, but don't pretend like you can't have a good time for the rest of your life with it if you let your tolerance reset once a week.
I'm excited for when we get heroin, though. I'll probably lose my job when heroin comes out, from genning illegal stuff at work.
>>108212186
>>108212197
Are you anons honest with yourselves that you're into toes because they look like tiny penises/clitties? Because I'm not. Claude pointed it out, and that's totally where the wire-crossing comes from, but I just pretend it's fully about the intimacy of seeing a private area instead of that being the secondary point of neuron activation for feet.
lol the jerkies blocked /ldg/ harder than /lmg/ so if they don't fix it I guess I'll just pretend this is /ldg/
>>108212353
If average API costs are like 20 USD a month (just pulling a number out of my ass), it would take you like 7 years just to pay for a single used 3090.
People who run local do it because it's an excuse to indulge in their PC-building hobby.
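Back-of-the-envelope on that break-even claim, with the card price as the big unknown (both prices below are assumptions, and electricity is ignored):

# Break-even time for a used 3090 vs. paying ~20 USD/month for an API.
api_per_month = 20.0
for price in (700, 1700):  # assumed used-3090 prices; the market varies a lot
    print(f"${price}: {price / api_per_month / 12:.1f} years to break even")
# -> $700: 2.9 years; $1700: 7.1 years (the "7 years" above implies the pricier end)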
Best model for Japanese? Been using Qwen 2.5, but it's censored and randomly shits out Chinese. Wanna practice with some Japanese characters I made in SillyTavern.
>>108212449
You are trusting a schizo responsible for the schizo fork, anon.
>>108212523
Gemma 4
Is hardware really that much of a bottleneck? I assumed the big models only needed that much power because they serve millions of users at the same time.
>>108212523
https://huggingface.co/KoboldAI/GPT-J-6B-Janeway
>>108212536
It is really not. You just need to download ollama and type in:
ollama run deepseek-r1:8b
And you can run deepseek.
I have no desire to do coom roleplay. Which UI should I use if I want an actual
>>108212536
A model like DeepSeek R1 (which is relatively old now) is nearly 700GB at "full quality".
You need to load all of that into memory, and the memory needs to be fast for the response not to be crazy slow.
And you want at least a GPU core to process the prompt/context before it starts generating the response.
So yeah, it gets expensive for the actually good stuff.
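Rough sizing behind that figure, assuming the weights dominate (KV cache and activations come on top); R1 is 671B parameters with FP8-native weights, which is where the "nearly 700GB" comes from:

# Weights-only footprint for DeepSeek R1 (671B params, FP8-native weights).
params = 671e9
for name, bytes_per_param in [("FP8 (native)", 1.0), ("Q4_K_M (~4.8 bpw)", 0.6)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
# -> FP8: ~671 GB; even a ~4-bit quant still needs ~400 GB of memory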
>>108212353
>yes goy, using the API is real wealth. Your time is too valuable to bother hosting anything locally. Only god's cho- I mean, poorfags buy GPUs and CPUs and SSDs
>>108212577
>>108212577
>>108212577
>>108212523
Consider something from this list:
https://github.com/llm-jp/awesome-japanese-llm
YMMV intelligence-wise, but you have a smaller chance of getting English-worded Japanese replies.
>>108212225
Sour grapes, brah
>>108212030
hair cut status?
>>108212374
No