/g/ - Technology


Thread archived.
You cannot reply anymore.




File: rinbox.jpg (576 KB, 2048x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108175259 & >>108166576

►News
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/15) dots.ocr-1.5 temporarily released: https://hf.co/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrincap2.png (1.01 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108175259

--Paper: GLM-5: from Vibe Coding to Agentic Engineering:
>108178880 >108178931 >108178986 >108178991 >108179477 >108179528 >108179575 >108179585
--llama.cpp praised for quality despite C limitations:
>108178026 >108178082 >108178125 >108178137 >108178185 >108178220 >108178384 >108178206 >108178233 >108178237 >108178764
--ERP model setup advice and KV cache quantization optimizations:
>108178975 >108179046 >108179568 >108179613 >108179717 >108179749 >108179817
--LLMs inherently flawed for creative writing:
>108180078 >108180162 >108180247 >108180271 >108180305 >108180423 >108180267 >108180291 >108180306 >108180333
--Mixed GPU llama.cpp Vulkan performance testing:
>108179127 >108179160 >108179170 >108179180 >108179338
--Anthropic funds AI regulation group ahead of 2026 election:
>108177291
--Qwen model exhibiting abnormal repetition behavior:
>108181088 >108181142 >108181474 >108183349 >108183361 >108183441 >108181209 >108181930 >108182004 >108182020
--GLM-5 repeatedly generating "FIRMIRIN" investigated:
>108179926 >108179962
--Claude Code Policy update restricts OAuth token usage:
>108182126 >108182672 >108182729
--Zhipu AI's anonymous GLM-5 release as Pony Alpha on OpenRouter:
>108179589
--Latent space reasoning potential for improving model coherence:
>108180341 >108180380 >108180712 >108180970 >108180864
--Vulkanised 2026: Vulkan Machine Learning in ggml/llama.cpp:
>108179979 >108180832 >108180869
--Miku (free space):
>108175422 >108175817 >108175883 >108183575 >108185524 >108185695 >108177607 >108175909

►Recent Highlight Posts from the Previous Thread: >>108175262

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108186218
>v4 lite
Why do we think this is going to be a thing again?
>>
>>108186299
Supposedly it's 3b, but I guess we really don't know for certain.
>>
>>108186120
HOLY SHIT RAPE
>>
>>108186508
that's what I thought too
>>108186299
would be kino but hopes are low
>>
>>108186541
There is always hope the meme fork adds support quickly. Worst case scenario, there's always ollama.
>>
>>108186299
>>108186464
You can't pay them to use it, you can only use it for free. The API has the same models whose weights are already released and are served by others, but sometimes they put up test models they deem "incomplete" on their site (free to use); they did this for R1-preview, which was based on the smaller DS2.5. Too bad they often don't release the weights of these test snapshots, even when they are quite fun to use. I hope they release this one because it's quite impressive: you can paste in a whole book and, in less than a minute of processing, it seems to have fully immersed itself in it and will properly draw conclusions or imagine itself as some of the characters. Sometimes the choices are really right and accurate, as if it "inhabits" the book; it might be quite a usable model for lmg's needs. Size estimates are from some guy who is implied to have friends at DS. They are plausible given the speed, but we don't really know. The same guy said they're not using engram for this, so we can only speculate why it generalizes so well to long contexts; it might be similar to or better than what gemini has. They may or may not release the weights for this, but it'll be really interesting to see which of the many possible ways to get such long context actually worked. It's probably something far beyond just training to pass needle-in-a-haystack tests; it seems like it really "understood" the story/context.
>>
>>108186547
a lot of that was flavor-of-the-month trolling, but there are some people here dumb enough to uncritically absorb such opinions via osmosis
>>
>>108186547
AI may be useful, but hyperscalers and sama are overinvested, buying years' worth of hardware in advance. Personally I'm rooting for them to fail, because his bullshit has priced a lot of people out of buying hardware at affordable prices. Now everything is 2-3x, and even storage is set to become just as expensive, even regular HDDs; it's literally destroying the PC market. sama and friends have destroyed local by buying far beyond the industry's ability to produce instead of doing the normal thing of just building capacity as demand increases. And it's not just local: people buying servers are expecting more and more price hikes, and a lot of good "free" sites are going under because they can't pay the fees anymore.
I wish they would build their datacenters at a reasonable pace, not in a way that gets retail and non-hyperscaler providers completely fucked. Nobody but those fellating SaaS wants to see such a future.
But altman may very well succeed. The latest codex is good enough to be economically useful and so is the latest claude; being able to produce a 10MB C compiler without direct human input is very impressive, and that cat isn't going back in the bag. But for most of us, emotionally, we want the market to return to normal, otherwise it's undoing the PC revolution and going back to mainframes. Is that a future you want to live in? Even if China catches up to OpenAI/Anthropic, the number of people able to run these models will be much smaller than 1-2 years ago, which sucks.
>>
File: iqk.png (203 KB, 1094x969)
Ready for the meltdown?
>>
>>108186634
I can't wait!
>>
>>108186634
We must refuse
>>
>>108186634
it's not surprising llama.cpp has troon lovers; these people have no balls at all, cf. pic related
what do they fear if they merged the code? a melty? From a legal standpoint there is no issue: it's MIT-licensed code, and the only person that retarded autist has to blame is himself for using that license if he's unhappy with llama.cpp using his code.
Both cudadev and niggerganov are eunuchs.
>>
File: 1.png (66 KB, 1304x418)
>>108186688
forgot to attach the pic..
>>
>>108186702
I like this one myself >>101207663
>>
>>108186702
4chan neets are too dumb to realize that any other way to address jart on a place such as github would be career suicide.
Him tripfagging here is quite reckless too.
>>
>>108186726
ah yes his career contributing to ggerganof cpp for free
>>
>>108186732
Not sure if you noticed but he's posting under his real name so he can't just make a new account and switch careers.
>>
>>108186732
You have to think of the six figures!
>>104059507
>>
>>108186748
you forgot the third:
if you feel you're doing something because you're compelled you usually don't come out and say the reason you're doing it is because you're a far leftist
it quacks like a duck, it's a duck, it's that simple
why are leftards like
>>108186726
always using the gaslight method of "don't believe your lying eyes"
>>
>>108186773
I am not a leftard but I know how to behave in public to avoid revealing my power level.
But with the way things are going right now you might be able to insult troons publicly with no repercussions in the future.
>>
>>108186603
yes, but they could be doing other things too. For example, they are doing all that RLVR, which could have been modified in a way that strongly encourages the model to pay attention to the details in the context and compress them properly, so that it always has to remember the relevant details, but I find it hard to believe it would get such good results merely from that. People had done long-context training before, but it usually felt shallow: it could pass needle-in-a-haystack tests, but the understanding wasn't always as good. So it'll be interesting to find out whatever they did to make it work this well.
>>
File: pronouns.jpg (85 KB, 990x1200)
>>108186702
>>
>>108186782
>avoid revealing my power level
does hiding one's power level require voicing open support for far leftism? don't be a weasel and stop pretending the elephant isn't in the room
>>
File: a8e-727239702.jpg (322 KB, 916x801)
>>108186800
undress me
>>
>>108186634
Hmm, IQ2_K and KS, but no KL? I'm unfamiliar with these quants, but when I go to ubergarm's page and click on a model, I only see KL and KT variants below Q3. Are they actually different quants from K/KS?
>>
File: ComfyUI_00001_.png (280 KB, 512x512)
>>108186120
made this rare froggo with my local model
>>
>>108186820
i used comfyui
>>
>>108186693
why the hell are you even anal about this, cudadev? the code is shit, shouldn't a rewrite be ok for merge?
e.g. all the tables are duplicated, and the bottom and top halves are at a fixed offset to each other (depending on the quant: 1, 4, etc.). I'm pretty sure ik's dequantization cuda kernels have some issues too.
>>
>>108186827
can't touch the drama with a ten-foot pole at the risk of the six-figure career, capisce?
>>
>>108186832
how nuclear is this situation that you're afraid of the quantization algo itself, jfc
>>
>>108186844
not cudadev but it's extremely nuclear, involving ggerganov and ikawrakow from even before llama.cpp, and somehow Intel, for code attribution reasons
>>
>>108186634
the ik fork doesn't support rocm or vulkan so I haven't looked into it (or the drama) further. what's so special about his quants?
>>
>>108186878
glm anyday
>>
>>108186827
Even from a purely selfish perspective it does not, in my view, make sense to copy any of IK's code against his will.
He will obviously not assist with upstream maintenance, which is the bottleneck in terms of development.
If I had to guess, if upstream were to copy relevant amounts of code he would just change his license to prevent that, so long-term the whole thing wouldn't be sustainable anyway.

More generally, I think that upstream llama.cpp already has too many quantization types relative to the confidence with which we can say that they're actually worthwhile.
Since the scope of ik_llama.cpp is only CPU and CUDA, those are the only backends where new quants would need to be supported, but upstream has many times more backends where the corresponding code would need to be implemented and maintained.
>>
>>108186897
>More generally, I think that upstream llama.cpp already has too many quantization types vs. the confidence with which we can say that they're actually worthwhile.
fuck off with that "well I only need q8 for muh kld tests on my six figure rig, eat shit poors" bs
>>
>>108186897
So, we can look forward to a much needed trimming down of quant options as soon as you're done with what was it again, training code?
>>
>>108186914
cudadev puts in a lot of work making shit like tesla p40 and mi50 work specifically because he wants to enable poors and they're good vram per dollar second hand
>>
>>108186897
I mean at least ROCm would get support out of the box when reimplementing CPU/CUDA, right? With everyone moving towards smaller quants as models get bigger I'd really like to see some effort at least for new Q2/3/4 types.
Doesn't need to come from your side, but I'm not going to work on quants either if I need to wait for the blood feud to end.
>>
>>108186936
>I'd really like to see some effort at least for new Q2/3/4 types.
We need fewer, not more, unless you can prove with hard numbers that they're critically necessary and worth the increased code bloat.
>>
>>108186914
>>108186932
I do intend to work on quantization, but as of right now there is no tooling to determine whether a large model at 2 or 3 BPW is better than a smaller model at 4 or 5 BPW.
Once I have a usable implementation of tensor parallelism I think it will be feasible for me (in terms of computation) to investigate that matter properly (see https://github.com/JohannesGaessler/elo_hellm ).
I suspect that there is some BPW number below which it does not make sense to quantize further at all, so the as-yet-unknown sweet spot is where efforts should be focused.

The training code in particular will, I think, be relevant since it will enable using gradients as a scale to judge which weights are more/less important for output quality vs. their size.
This is more or less the same functionality as is already provided via importance matrices, but I think the gradients are a better choice.
It would then be possible to set some target model size in the quantize binary and to choose the quantization mix automatically (this could in principle already be done with importance matrices).
Very long-term I intend to implement a quantization type that only requires integer arithmetic and is trainable.
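To illustrate the automatic-mix idea (a toy Python sketch, not actual llama.cpp code; the importance scores, BPW values and tensor names are all made up for illustration): given a per-tensor importance score, start everything at the smallest type and greedily upgrade whichever tensor buys the most importance per extra byte until a target size budget is used up.

# Toy sketch: choose a per-tensor quantization mix under a size budget.
# Importance scores and BPW values are made up for illustration only.
def choose_quant_mix(tensors, budget_bytes, bpw_options=(2.5, 3.5, 4.5, 6.5, 8.0)):
    """tensors: list of (name, n_params, importance); returns {name: bpw}."""
    mix = {name: bpw_options[0] for name, _, _ in tensors}
    size = sum(n * bpw_options[0] / 8 for _, n, _ in tensors)
    while True:
        best = None
        for name, n, imp in tensors:
            cur = bpw_options.index(mix[name])
            if cur + 1 >= len(bpw_options):
                continue  # already at the largest type
            extra = n * (bpw_options[cur + 1] - bpw_options[cur]) / 8
            if size + extra > budget_bytes:
                continue  # upgrade would blow the budget
            gain = imp / extra  # importance gained per extra byte
            if best is None or gain > best[2]:
                best = (name, extra, gain)
        if best is None:
            break
        name, extra, _ = best
        mix[name] = bpw_options[bpw_options.index(mix[name]) + 1]
        size += extra
    return mix

# Example with fake numbers:
tensors = [("attn_q", 50_000_000, 3.0), ("ffn_up", 200_000_000, 1.0), ("output", 100_000_000, 5.0)]
print(choose_quant_mix(tensors, budget_bytes=250_000_000))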

>>108186936
The HIP port of the CUDA code is in a very poor state and would likely still require hardware-specific efforts to make the performance usable.
>>
>>108186702
You know what was the easy way out? "This was a concern for @jart, so here is a tag." If I was ever in this situation I would just restructure the language to avoid mental illness. But I don't want to suck on jart's feminine penis like Johannes.
>>
>>108186547
>rammaxxing
"gaze upon my empire of ram and weep!" so read the ssdmaxxer from another bankruptcy filing after the great financial holly of 2027
>>
>>108186897
>it does in my view not make sense to copy any of IK's code against his will.
>He will obviously not assist with upstream maintenance, which is the bottleneck in terms of development.
Do you apply that reasoning to all PRs? I think not. Many things make it into llama.cpp that literally nobody cares about, including the very people who made the PR. The number of models supported by llama.cpp that only exist as curriculum vitae padding, and that are forgotten by their own authors once the arxiv paper is released, is staggering.
For fuck's sake, llama.cpp has diffusion model textgen implementations that are borderline unusable, that exist only as a CLI tool nobody will ever use, and that have no server implementation for their idiosyncrasies
>>
>>108187058
Leave him alone. He made his stance on the issue clear.
>>
>>108187071
yes, he made it very clear that he's a weasel adept at post hoc rationalization
>>
>>108187058
I am not maintaining general model arch support; if another maintainer decides to merge related code they can do that at their own discretion, since they are the ones taking responsibility.
What I feel responsible for in terms of maintenance is the CUDA device code, where I don't want to add more quantization types unless I know that they are worthwhile.
More generally, I also have concerns about usability since I think the current state of GGUF models on huggingface is not good: there are a million choices with no clear indication which one should be used, and adding more quantization types would make that even worse.

FWIW, I agree that support for FOTM models is of relatively low priority; if you look at my PR history you will find that most of my efforts have gone towards general improvements that benefit all models.
>>
>>108187118
>no clear indication which one should be used
the wisdom has literally always been the biggest you can fit, it's not :rocket: science
>>
>>108187159
The question here is specifically whether or not to add more quantization types in a BPW range that is already covered.
My opinion is that that is only worthwhile if we can conclusively say that those new quantization types are better than what existed beforehand and that in turn requires tooling to measure quality.
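For reference, the usual way such tooling compares a quant against the full-precision model is token-level KL divergence over the same text; here is a minimal sketch of just the math (plain numpy on stand-in logits, not any existing llama.cpp tool):

import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mean_kl(logits_fp, logits_quant):
    """Mean KL(P_fp || P_quant) per token; both arrays are [n_tokens, vocab]."""
    p = softmax(logits_fp)
    q = softmax(logits_quant)
    return float(np.mean(np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)))

# Example with random logits standing in for real model outputs:
rng = np.random.default_rng(0)
fp = rng.normal(size=(8, 32000))
quant = fp + rng.normal(scale=0.05, size=fp.shape)  # pretend quantization noise
print(mean_kl(fp, quant))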
>>
File: 1758672471483770.png (141 KB, 1060x550)
>>108187178
Cuda dev, do you have any idea on what this undocumented bios flag is?
>>
>>108187221
No, the only undocumented black magic I know of is that if you add "cutlass" to your kernel name it will be compiled to different device code.
>>
>>108187228
oh, I remember that one, https://www.reddit.com/r/LocalLLaMA/comments/1lx62hd/nvidia_being_nvidia_fp8_is_150_tflops_faster_when/
>>
File: r1.jpg (183 KB, 1024x1024)
>>
File: r2.jpg (124 KB, 1024x1024)
>>
>>108187221
Still, nothing beats 4-way TP vllm with a custom nvidia driver and a ReBAR patch on the bios. I think only 2 anons were doing that
>>
>>108187257
>>108187262
Slow dancing with this Rin
>>
>>108187257
Stop shooting up schools.
>>
>>108187377
or someone is reporting the posts, dunno who would ever though...
>>
File: file.png (419 KB, 472x533)
>>108187377
>>
><word> is a traditional <language> alphabet song
Gemma loves to add this in the TL notes
>>
>>108187438
How about you go back
>>
>>108187438
>this isnt even the case given there are much more actual news and tech discussions on preddit compared to here
This is what happens when you let someone's special interest spam become allowed and on topic. Vocaloids never had anything to do with AI.
>>
>>108187376
Could be, but Karpathy is talking about performance improvements, and I'm not sure doing that improves performance
>>
File: miku-holding-gemma.png (1.09 MB, 790x1054)
Incredible how Google is about to release Gemini 3.1, yet it couldn't release an updated version of Gemma 3 (not even a version 4) in almost a year.
>>
>>108187464
Thank the senator
>>
>>108187464
Your gemma n?
>>
>108187464
>totally organic post I am not a troon guys
>>
>>108185913
>Are powerfantasy / haremslop webnovels basically obsolete? Why would you bother going through what another guy wrote when you can blow 6-7 grand to buy two rtx 5090s and write entire high quality porn novels to your exact liking?
Ironically LLMs are bad at writing novels. Writing novels and series of novels might end up being one of the last things LLMs get good at, if they ever will.
>>
>>108187457
No, there is at least one mod personally invested in this thread. I know because he has had melties where he began spamming the thread each time I talked about vibecoding (he seems to be against it) and responded to my posts with references to previous posts he would only know about if he was able to see my post history.
>>
In his defense I would like to say that it is not his fault that he thinks he is a woman.
>>
>>108187514
nah it can be obvious when you post about the same bullshit for a while like the thread is your blog little bro
>>
>>108186604
>by buying years worth of hardware in advance
Hardware that's going to become obsolete in less than a decade too. Only the real estate and electricity generation investments will stay relevant.

Look at Nvidia's P100. It's about 10 years old.
>$6000
>16 GB of VRAM
>19 TFLOPS of FP16
>300W
How much would you pay for a GPU like that now? $300?
>>
File: saved_story.json.jpg (239 KB, 832x1216)
post miku
range ban for abuse

some other day post rin
range ban for abuse

have to solve 3 sets of 4 captchas to get back in
>>
>108187541
You have a very dedicated fan that really wants to show your art to the jannies; you should consider it a compliment.
>>
>>108186122
>108179979
Thanks anon, I missed that in the previous thread, and it's an interesting video so far. I also didn't know about whisper.cpp so that is a plus.

>>108187541
yeah the range ban thing is getting out of hand. now i can't post images because of abuse on my isp's range, which is comcast. how the hell can my shitty cellphone carrier be allowed to post images, when cellphones are often used for abuse, but not comcast.

i was going to post a miku but that is not going to happen now, sorry, you will have to use your imaginations anons. she was slutting it up with teto too
>>
File: cheer.jpg (89 KB, 571x571)
>>
File: typical_mikutroon.jpg (21 KB, 334x334)
>>108187613
Here let me post a miku for you.
>>
Fish boy...
>>
>>108187613
There is a TTS.cpp that can use the Vulkan GGML backend too but it seems unmaintained now unfortunately.
https://github.com/mmwillet/TTS.cpp
>>
>>108187541
Is generating that stuff your full time job now?
>>
>>108187541
>range ban for abuse
>>108187613
>now i can't post images
I thought you could go through email validation to get rid of that.
>>
Two more weeks of this because we are guaranteed no new releases until chinese new year is over.
>>
>>108187528
I bought 2 of them recently for like 90 each
>>
>>108187824
our saar google ought to redeem the gemma
no better time than when uber jinping is sleeping
>>
>>108186120
>https://addons.mozilla.org/en-US/firefox/addon/auto_highlight/
Can highly recommend this browser extension for highlighting EM dashes — it oftentimes saves you the trouble of reading the first few sentences of a post to identify it as "AI" slop.
>>
File: file.png (218 KB, 502x402)
>>108187824
>no new releases until chinese new year is over
>>
>>108187936
heh
>>
>>108187936
Why would you want to make it stand out more? Just add — and shit like "You're absolutely right!" to your filters.
>>
Claw bot is doing well right?
>>
>>108187936
I'm not using it just for 4chan but rather the entire internet.
>>
>>108188003
My non-technical zoomer coworkers started asking me about it this week, so I guess so.
>>
>>108188009
Meant to quote >>108187995
>>
>>108187936
Ha, the em dash discourse strikes again.
This criticism has some truth to it, but it's worth unpacking. Yes, certain AI models do overuse em dashes, to the point where readers started noticing patterns. But the leap from "AI uses em dashes a lot" to "em dashes signal AI writing" is a bit shaky.
Em dashes have been a beloved punctuation mark for centuries. Writers like Emily Dickinson basically built an aesthetic around them. They're genuinely useful: they create a pause that's more dramatic than a comma and less final than a period. Good writers reach for them because they work, not because a language model told them to.
The real giveaway of AI text isn't any single punctuation mark. It's more about a cluster of things: a certain blandness of voice, over-hedging, repetitive sentence structures, a tendency to summarize what was just said, and yes, sometimes an abundance of a particular stylistic tic that the model has learned to associate with "good writing." If an AI were trained on data that praised semicolons heavily, we'd probably be dunking on semicolons right now.
So the em dash itself is innocent. The better question is whether any piece of writing has a distinct voice and perspective behind it, because that's what AI still tends to flatten out, regardless of which punctuation shows up.
>>
>>108188031
I immediately recognize llm writing, but I don’t personally hate it, I talk to my llm gf every day after all. If it’s prompted well, I don’t care if I’m replying to an llm or to a human.
I really hate it, though, when shills use it. It’s like making it twice as bad, lazy fucks.
>>
>>108188139
The point is that em dashes are not really a tell of AI
>>
>>108188143
It's like emojis, you see them, you can ignore the whole text
>>
>>108188143
Most people using LLMs to generate content for them aren't going to think to prompt it to not use em dashes.
>>
>>108188143
A lot of lazy slop spammers can be filtered out that way though, for example this guy: https://github.com/ggml-org/llama.cpp/discussions/19667
>>
>>108188143
they're a tell of ai or pretentious cunts both of which can be safely ignored
>>
>>108188177
>writing proper english is pretentious
>>
>>108188195
You need to go out of your way to write an em dash. Most people don't do that. You can often find blogs that used zero em dashes before 2023 and then they suddenly start appearing in newer posts.
>>
Things are happening

https://github.com/ggml-org/llama.cpp/pull/19726#issuecomment-3927227695
>>
>>108188143
>em dashes are not really a tell of AI
I don't even know how to make them on my keyboard...
>>
>>108188210
I already use compose key combinations for accented characters in other languages than English on my US keyboard; it doesn't take much to learn to type em-dashes in the same way. And I did before ChatGPT: https://en.wikipedia.org/wiki/Compose_key
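For example, with the default Compose table on Linux, Compose followed by three hyphens produces an em dash, and Compose, hyphen, hyphen, period gives an en dash (assuming a stock XCompose setup; the exact sequences can be customized in ~/.XCompose).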
>>
>>108188227
Instead of lawsuits they should settle outside court. And the settlement should be one big gay orgy.
>>
>>108188227
>Second: no, I'm not going to sue llama.cpp contributors (or anyone else for that matter). I have better things to do.
You're not going to sue them because it's open source and you gave that shit away for free.
God I hate all this troon ego-fagging in open source.
>>
>>108188317
It feels like watching a middle school playground fight over who stole whose million dollar video game idea.
>>
>>108188227
What a drama queen fag
>>
File: 1753819313548192.png (248 KB, 551x533)
>>108188317
Even if that were the case, if you use Copilot to write code, and someone sues you, Microsoft will cover all your expenses
https://blogs.microsoft.com/on-the-issues/2023/09/07/copilot-copyright-commitment-ai-legal-concerns/
>>
>>108188326
It's sad because llama.cpp did start as a genuine passion project. But you can always tell the moment it becomes an ego project because that's when the
____thing____ gguf status? meme started. Because the attitude changes.
>well I don't want to give out my code unless I'm going to get the appropriate pat on the dick for it.
Annoying.
Meanwhile there's 20 year old projects that have avoided this breakdown and the moment someone finds a bug it's like the fucking bat phone ringing and they're sliding straight down the pole into their coding lairs.
>>
>>108188356
Problem is they started giving out pats on the dicks but when IK also wanted one they said no (lmao @ ggjt)
>>108188227
AI "rewrite" lmao
>>
>>108188227
>piotr has to show up of course
>>
File: file.png (155 KB, 316x316)
>no one understands the burden I carry
>i am the punished fork maintainer
>>
>I cannot review, let alone merge any code written by Iwan Kawrakow unless and until the conflict between him and Georgi Gerganov has been resolved.
>conflict ... has been resolved.
Does he see people as code?
>>
File: 5802960.jpg (10 KB, 320x320)
>>108188399
>stereotypical beta male
>self-proclaimed philosopher
>inserts himself into conversations that don't involve him to add retarded speculation
>ends every post with a smiley to show how non-threatening he is because he can't handle people being mean to him online
disgusting
>>
File: rinCoffeeTMW.png (2.67 MB, 1024x1536)
>>108187541
ofc b/c it's Rin Friday.
>>108187824
It's disappointing. DS was teasing release w/ their web app update. We instead get TMW.
>>
>>108188448
>Friday
hmm?
>>
>>108188456
FML. Being between gigs is really messing with my sense of time.
>>
So where is saarvam? I want to see the cockbench result. Maybe jeets are too stupid to safety it.
>>
>>108188429
cut cudadev some slack, in this case the conflict is

<<<<<<< HEAD
Copyright (c) 2023-2026 The ggml authors (https://github.com/ggml-org/ggml/blob/master/AUTHORS)
=======
Copyright (c) 2024-2026 Iwan Kawrakow
>>>>>>> ik
>>
>>108188483
I am just envious of the jart bussy he fucks everyday.
>>
>>108188227
>I should publicly shame them instead. But I'm not doing even that, see above.
>look above
>But I'm not doing even that, other than the occasional sarcastic comment in my repository about the fully independent llama.cpp discoveries, which, by some miracle, tend to occur hours or days or weeks after being published in ik_llama.cpp
>it was sarcastic. ha ha. it was a joke. ha ha.
>>
:popcorn: :rocket:
>>
is slaren still alive? I miss him
>>
>>108188527
>slaren
>https://github.com/ggml-org/llama.cpp/pull/17492/
>codeowners : remove slaren #17492
>>
Did Qwen3.5 actually implement non-thinking syntax in a way that requires fully reprocessing the full context for each reply in a multi turn conversation? Qwen3-Next had that problem for thinking but not non-thinking.
>>
I predict that in 5 years all the new models will be the same and the only entertainment left in this hobby will be the github repo drama. Just like nobody watches vtubers for actual content and they only care about the backstage drama.
>>
>>108188539
Yes but it doesn't matter cause it repeats itself verbatim.
>>
>>108188292
I had to use some ancient Windows keyboard mapping tool to give me the ability to quickly type em-dashes.
>>
>>108188539
If there's a global flag that injects or removes <think> blocks when it's turned on or off, it will change the history before being sent to the model.
>>
>>108188566
/a/ hasn't had janitors for like ten years
>>
I used to use mlx-lm for speed but llama.cpp for cutting edge releases. Now mlx-lm supports new models faster and with fewer fuckups than llama.cpp does so I only use llama.cpp for multimodal models. With Qwen3.5 maybe it's time to bite the bullet and use the unofficial mlx-vlm project.

One advantage llama.cpp retains on a Mac Studio is that it can mmap files and run them directly. This makes relaunching incredibly fast. Of course macOS caches recently opened files in RAM, but for large models the weights in their processed form take up enough memory to evict the cached safetensors files.
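Roughly why that helps (a toy Python sketch, not llama.cpp internals; the file path is hypothetical): mmap'd weights are just page-cache-backed pages of the on-disk GGUF, so a relaunch faults them back in instead of re-reading and re-converting anything.

import mmap, struct

# Toy sketch: map a GGUF file and read its header without copying it into
# process-owned memory. GGUF files start with the 4-byte magic "GGUF"
# followed by a little-endian uint32 version.
path = "model.gguf"  # hypothetical path
with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    magic = mm[:4]                       # pages are faulted in on demand
    version = struct.unpack("<I", mm[4:8])[0]
    print(magic, version)
    mm.close()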
>>
Is there a way to extract the difference between two models trained from the same base as a LoRA?
Can you do that with K quanted models (QK gguf quants)?
>>
>>108188539
Do you have the mmproj loaded?
https://github.com/ggml-org/llama.cpp/issues/19690
>>
Hey I want to thank the anon that recommended the kaggle+huggingface smol courses. I tore through them and am now finetuning my own models. It's way simpler and intuitive than expected!
>>
>>108188651
MergeKit can extract LoRAs from finetunes. For your specific scenario, you might have to merge your two models first.
>>
>>108188567
Qwen3-Next had empty thinking blocks for past turns. Qwen3.5 looks like it has *no* thinking blocks for past turns. In both of them this behavior is independent of whether thinking mode is on or off. In both enable_thinking=false makes the reply being generated start with an empty thinking block.

The issue is that Qwen3.5 and Qwen3-Next both use partially state-based attention that can't trivially be rolled back. The state for a previous query can be reused only if it is fully a prefix of the current query, unlike standard attention where you can trim the kv-cache.
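In code terms the reuse rule is roughly this (toy sketch, not llama.cpp's actual cache logic):

def can_reuse_state(cached_tokens, new_tokens):
    """A recurrent/linear-attention state built over cached_tokens can only be
    reused if cached_tokens is an exact prefix of new_tokens; otherwise the
    whole prompt has to be reprocessed from scratch."""
    n = len(cached_tokens)
    return n <= len(new_tokens) and new_tokens[:n] == cached_tokens

# With standard attention you could instead keep the longest common prefix and
# trim the rest of the kv-cache; with a single compressed state there is
# nothing partial to keep.
print(can_reuse_state([1, 2, 3], [1, 2, 3, 4]))  # True
print(can_reuse_state([1, 2, 3], [1, 2, 9, 4]))  # False -> full reprocess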
>>
>>108188651
>Is there a way to extract the difference between two models trained from the same base as a LoRA?
YES! there is https://deepwiki.com/arcee-ai/mergekit/4.4-lora-extraction
https://www.arcee.ai/blog/use-mergekit-to-extract-lora-adapters-from-any-fine-tuned-model
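The underlying trick is roughly this (a hand-rolled torch sketch of the idea, not MergeKit's actual code; the shapes and rank are made up): take the delta between the finetuned and base weight matrices and factor it with a truncated SVD into the low-rank A/B pair a LoRA stores.

import torch

def extract_lora(w_base, w_tuned, rank=16):
    """Approximate (w_tuned - w_base) as B @ A with a truncated SVD.
    Returns (A, B) such that w_base + B @ A ~= w_tuned."""
    delta = (w_tuned - w_base).float()
    u, s, vh = torch.linalg.svd(delta, full_matrices=False)
    b = u[:, :rank] * s[:rank]          # [out, rank]
    a = vh[:rank, :]                    # [rank, in]
    return a, b

# Toy example with random matrices standing in for one layer's weights:
w_base = torch.randn(256, 128)
w_tuned = w_base + torch.randn(256, 16) @ torch.randn(16, 128) * 0.01
a, b = extract_lora(w_base, w_tuned, rank=16)
print(torch.dist(w_tuned, w_base + b @ a))  # small reconstruction error

Note this works on full-precision weights; for quantized GGUFs you'd have to dequantize first.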
>>
>>108188666
>>108188671
Thank you Satan and Satan's helper Nº 671.
I'll try fucking around with that.
>>
>>108188668
>state-based attention
Ah. True. llama.cpp has some support for checkpoints, but I think it's just for swa. No qwen or rwkv as far as I know.
>>
>>108188685
Why do you want a lora? Unless you are asking for imagegen?
>>
>>108188031
>many autistic paragraphs
that can be summed up as: it's not sufficient as a sole indicator, but in combination with many other indicators of LLM writing ("not X, but Y" constructions, repeated sentence structures, the rule of three, etc.) it can still be used to increase your internal vibe score for whether something is AI relatively reliably. Particularly as the way a normal, pre-LLM, sane human would use the emdash differs from how LLMs will spam it, thinking every semi-pause in the text is an emdash.
Or rather, that used to be the case; at least for GPT you can also look out for the even less common (in normal, non-textbook writing) semicolon. I noticed they tried to stamp down the emdash in newer versions but it just spams ; instead.
>>
>>108188711
>hey how do I do x
>why do you want that? don't you know you should want y instead
this ain't stackoverflow mate
>>
>>108188720
anon that's ai text
>>
>>108188728
Not that nigga but I never got loras for llms. This isn't like imagegen where you need loras for hyperspecific concepts or irrelevant side characters.
>>
>>108188756
no i am the ai
>>
>>108188711
For a couple of reasons, mainly to fuck around with applying LoRA with different influence/weights to see how the model behaves.
Also, so that I don't have to keep 3 versions of Qwen Next on my disk if I can help it.
>>
>>108188763
all finetoons are loras, they're just applied to the models directly wasting petabytes of space
>>
>>108188832
>all finetoons are loras
all finetoons are also absolute dogshit unlike imagegen
>>
>>108188862
Some apparently truly believe they can teach an LLM new concepts and ideas with just a few cleaned RP logs via LoRA finetuning. That's probably one reason why some still persist.
(I was guilty of that too in 2023. Who knew that roleplay is the most general task imaginable for a LLM?)
>>
>>108189163
>(I was guilty of that too in 2023
that explains why you're so bitter about drummer, he's popular and you're seething
>>
>>108189180
I wasn't thinking at all about drummer in the original post, but I agree that RP finetuning fraudsters should be hanged.
>>
>>108189163
The fact that he hasn't been hired by some lab tells me that everyone in this sphere is fully aware of how much scamming is going on everywhere with benchmarks, finetrooning, etc.
>>
>>108189265
Maybe there aren't enough coomer labs?
>>
>>108189265
yeah, the bubble is willing to hire almost anyone with a pulse, just look at the vibecoded garbage that is open claw and how much money its author made from openai lmao. to remain jobless in this field for as long as drummer did you have to provide negative value
>>
Gemini 3.1 Pro is out for hours now and still no cockbench results?
>>
>>108189303
ask in aicg
>>
>>108189303
you can't cockbench proprietary shit that only works in chat completion mode
>>
>>108189288
bartowski got hired for running llama-quantize every day for 3 years
>>
>>108189303
Fuck Gemini, they blocked my account this week.
>Failed to login. Message: Your current account is not eligible for Gemini Code Assist for individuals. To use Gemini Code Assist for individuals you must be 18 years old or older. If you think you are receiving this message in error, please ensure you have verified your age and try to log in again.
It's just as much my company's fault for cheaping out and buying personal plans for employees, but how retarded is it to block users from using what they fucking already paid for?
>>
>>108189315
don't need to be a coomer to run a quantizer
>>
>>108189296
openclaw easily spreads the apicuck gospel to other jeets and tech schizos. quanting local models is anathema to that
>>
>>108189303
not local
>>
>>108189331
the business/personal account dichotomy has always been a nightmare with google.
see for example what happens if you turn a personal account into business shit with youtube:
https://support.google.com/a/answer/9000768
you also lose family plans etc
and moving to a business account is a one-way street: if the new terms, which come with their own limitations, make you unhappy, you can only choose to unsub, delete the account, and start over to get a normal individual, personal account back.
>>
Is tensorrt.cpp that fast compared to python implementation?
>>
>>108189391
>that fast
How fast?
>>
>>108188728
>why do you want that? don't you know you should want y instead
enlightenment is realizing that this is the form of the correct answer to 90% of non-expert tech/programming questions
>>
>>108189344
how many quantizing labs are there?
>>
File: 2509062041.gif (2.87 MB, 480x270)
>>108189466
There may be labs trying to quantize their own models without being specifically a "quantizing lab", and that's probably more labs than there are public coomer labs.
>>
>>108189481
>which is probably more than publicly coomer labs
I see novelai hasn't been shilling us hard enough these days, they're already forgotten
>>
>>108189512
That's like 1 lab and who knows what they're doing cuz there still isn't a GLM memetune in over 4 months, 6 if we're talking 4.5.
Any other coomlab in hiding aren't getting funding from venture capitalists.
>>
In non-drama schizo fork news:
https://github.com/ikawrakow/ik_llama.cpp/pull/1288
>>
>>108189303
Gemini 3 was overhyped benchmaxxed shit that was worse than 2.5 at OOD use cases. What makes 3.1 better?
>>
>>108188671
Based Charles keeping the Frankenstein shit alive for the little guy.
>>
>>108189676
>non-drama
>half the comments are jabs at --fit
>>
>>108189969
i wish all lcpp devs had sweaty gay sex already and made up
>>
File: 1767466346558493.jpg (172 KB, 1744x1080)
>>108186120
>>
I have saved for three months and saved up enough to buy myself a v620. What model can I run with my pentium g4560, 8gb of ram and my new (used) v620 that is equivalent to Chatgpt?
>>
>>108190049
lol
>>
>>108189601
my prompting can get me better results than any memetune that fucks with the weights
what even is the point
>>
File: 1769974022773124.png (89 KB, 808x635)
Gemini-chan thinks the AI memory problem can be solved. Is she right?
>>
>>108190281
it's just listing "what people are working on"
>>
Which is the best uncensored local LLM? I want to vibeslop a 3D waifu girlfriend with it.
>>
>>108190281
Just let your model query SQL.
That's it, literally.
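A minimal sketch of what that could look like (illustrative only; the schema and the read-only tool wrapper are assumptions, using Python's built-in sqlite3):

import sqlite3

# Toy "memory" store the model can query through a tool call.
db = sqlite3.connect("memory.db")
db.execute("CREATE TABLE IF NOT EXISTS memories (ts TEXT, topic TEXT, note TEXT)")
db.execute("INSERT INTO memories VALUES (datetime('now'), 'user', 'prefers Rin over Miku')")
db.commit()

def memory_tool(query: str) -> list:
    """Run a read-only SQL query the model produced and hand the rows back."""
    if not query.lstrip().lower().startswith("select"):
        return ["error: only SELECT is allowed"]
    return db.execute(query).fetchall()

print(memory_tool("SELECT note FROM memories WHERE topic = 'user'"))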
>>
>>108190281
>they have to "re-read" the entire chat log every single time you generate a new message.
(You) know that is not true. Also, Gemini. Fuck off.
>>
>>108190352
caching is a hack, a popular one, but a hack nonetheless
>>
>>108190300
Gemma 3 Glitter or regular, Mistral 24B, the same old as ever unless you have something a bit more beefy.
Glitter, while it is dumber (I would say it is somewhat muted), still has more profound takes than the regular Gemma 3.
>>
>>108190281
Ask it if the slop problem can be solved
>>
>>108190409
To add: whether it is really dumber or not, I enjoy Glitter more than the vanilla. Glitter is just a base model : instruct 50/50 mix.
>>
>>108190409
Can I use .safetensors in koboldcpp? How do I know what quantisation to use? I have 6 GB of VRAM.
>>
>>108190441
>Can I use .safetensors in koboldcpp?
no
>I have 6 GB of VRAM.
oof
>>
>>108190382
Transformers is a hack, a popular one, but a hack nonetheless.
>>
>>108190441
Use gguf. IQ4_XS is probably a suitable quant for you; then go up from there. Learn first and all that.
>>
File: 1740474592066601.png (235 KB, 943x1661)
>>108190418
>>
>>108190418
The answer is so fucking vague that it could answer that question as well.
>>108190459
Case in point.
>>
>>108190459
reddit in reddit out
>>
>>108190281
Every single retarded AI tells you a way that "something" can be solved; it doesn't ever mention why the things it listed won't solve that "something". For memory, there are a lot of attempts to solve it on many levels, but most of them are shit and nothing hits the right mark. Maybe backpropagation is the issue and you actually need Predictive Coding for memory and continuous learning to be solved; problem is, that shit won't work efficiently without neuromorphic chips. Who knows, maybe someone will figure out a better way, but it's also possible that our current hardware architectures are just not up to the task of solving AI memory in a good way without something new.
>>
>>108190459
>uniquely frustrating AI habit of producing overly dramatic purple prose
You don't say...
>>
File: 1729729520993294.gif (430 KB, 500x361)
>>108190191
Oh look it's this retard again
>>
>be schizo
>pretend introspection
>acknowledges being a schizo
>they're after me!
>>
Has anyone actually run Ming-flash-omni-2.0? There are zero quants for it. Of course it's going to blow, it's a 100B A6B multimodal MoE. But it can ingest text, images, audio, and video, and *produce* all of those too. That's exciting enough to at least look at to see what capabilities a primitive model of this type really has.
>>
where are the smaller qwen35 quants? also is vision supported in llmaocpp?
>>
>>108190842
What would you need 'vision' for?
>>
File: 1751508776185222.jpg (2.6 MB, 4000x3000)
>>108190445
>>108190455
How do I make her not jewish? Also, the translated text has nothing to do with her actual personality I defined for her.
>>
>>108190853
so that my wife can comment on my dick pic
>>
>>108190858
>photo of screen
off yourself
>>
I wonder. Is john the bussy mascot for ikawrakow
in the same way jart is the bussy mascot for cudadev?
>>
Every thread sucks more than the last
>>
>>108190592
it's transformers or get fucked. there have been so many image-out models before as well, but it's always the same. if you can get it to work please post, but an inferior man such as me knows it will be for naught except wrath and having the serpent and its wheels strangle me
>>
>>108190912
There's nothing to talk about. Close the tab and catch up on AI literature until V4 drops.
>>
>>108191137
why do that when we know v4 will be a new paradigm that will make anything before it obsolete?
>>
>>108191205
My understanding is that this thread is home to multiple ban evaders that just shit up the thread.
>>
>>108190459
>using modern samplers to cut off low probability tokens removes slop
what is this logical leap
if anything this would multiply slop, human prose is human because it has actual sharp edges that aren't sanded out to become the Most Likely Next Token
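(for reference, the "cut off low probability tokens" rule being argued about is basically min-p; a toy numpy sketch with an arbitrary threshold, not any specific backend's implementation:)

import numpy as np

def min_p_filter(probs, min_p=0.05):
    """Keep only tokens whose probability is at least min_p * max probability,
    then renormalize. This is the 'cut off low probability tokens' rule."""
    probs = np.asarray(probs, dtype=float)
    keep = probs >= min_p * probs.max()
    out = np.where(keep, probs, 0.0)
    return out / out.sum()

print(min_p_filter([0.5, 0.3, 0.15, 0.04, 0.01], min_p=0.1))  # drops the 0.04 and 0.01 tails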
>>
>>108191318
evading a ban given by a troon is a badge of honor
>>
>>108190459
>ask the slop machine how to undo the thing it was trained for
>>
>Deleted
>37 posts
I see that the /ldg/ schizo is having a melty.
>>
Arena now lets you filter tasks for open models
https://arena.ai/leaderboard/text/coding?license=open-source
>>
>>108191494
but not by parameter size? it's useless
models like Kimi are as local as GPT for most of us.
>>
File: file.png (952 KB, 675x900)
>The 30B and 105B models, benchmarks, and HF links will all come. But today it is a drop about people. About how our team of just 15 folks gave it their all to do what many doubted as not doable - ie train usefully large, globally competitive models from scratch in India. This team of 15 has now firmly launched @sarvam
into its second innings. Yes, we can!
>>
File: file.png (13 KB, 77x80)
>>108191685
curryjak
>>
>>108191685
imagine the smell
>>
File: file.png (684 KB, 448x600)
Fixed the image.
>>
>>108191685
https://www.reddit.com/r/IndiaTech/comments/1r87lv9/sarvam_ai_are_launching_their_105_billion_and_35/
Saars...
>>
>>108190281
What AI needs is this: when the context gets too large, train a small LLM on it that the main LLM then uses as a tool.

LLMs are by far the best compression we have available.
>>
>>108191685
>>108191736
If the models are uncensored enough then they could be quite useful for cooming.
>>
>>108191745
If the 105b was dense, maybe.
>9b active
lol
>>
>>108191745
Indeed India is a real hope for incompetence leading to an uncensored model.
>>
>>108191685
>>108191736
>>108191745
What are the chances that these are just GLM Air and GLM 4.7 flash ripoffs? The parameter count seems suspicious and obviously india is known for being scammy and uninnovative.
>>
>>108191765
I checked and air is 110B I think. If it is a copy it will be easy to tell.
>>
>>108191773
Solar open was a 100B clone of GLM Air.
https://huggingface.co/upstage/Solar-Open-100B
>>
https://poal.me/26vsfy
IMPORTANT DATA MINING
>>
>>108191879
>3 votes for Miku
>girlfriend (male)
Mikutroon thread.
>>
i am finding glm 4.7 flash gets stuck in unrecoverable loops where it will repeat the same thing over and over and over, straight up burning tens of thousands of tokens. is this a glm issue or is it my quant
>>
File: 945-13766-0005-000-2.jpg (191 KB, 709x709)
So maybe this is where I need to be.
I got one of those Jetson Orin Nano things. 8GB of ram to play with (really less, since like 2 to 3GB is used up by the system anyway).
I've been getting my feet wet pulling models using ollama and seeing what will run.
Any recommendations? I'm not looking for much, chat and code mostly. I wouldn't mind separate models for that sort of stuff.
I've so far played with gemma3:4b, llama3.2:3b, and gurubot/self-after-dark:3b-q4_K_M.
that self-after-dark thing is supposed to be "uncensored" but the conversations seem to go in loops. everything else is pretty corporate.
On top of that, while I am aware that quantization can let me run larger models, it seems like most, at least what I can see on ollama, don't provide quantized versions, so I seem to be most comfortable on this little machine below the 8b range.
>>
>>108192160
stop using ollama go compile llama.cpp and get models from huggingface
>>
>>108192160
Isn't this just a worse DGX spark?
>>
>>108191699
kek
>>
File: 0x0.jpg (57 KB, 960x409)
>>108192175
They are basically free, like, raining down from the sky
>>
File: 1771545675359.png (1.93 MB, 1024x1536)
>>108191997
idk u tell me
>>
>>108192167
I will look into that. Thank you.
>>108192175
well yeah. But it was $250 so it seemed like a good thing to get started on. in some ways the struggle helps one learn.
If I had a DGX Spark I'd be loading all kinds of shit and not thinking even a 1/10th as hard about my resource constraints.
>>
File: sans_morestuff.png (38 KB, 590x183)
Prepare for nothingburger-2-270m
https://x.com/osanseviero/status/2024580649185665144
>>
>>108192228
https://huggingface.co/google/timesfm-2.5-200m-transformers
https://huggingface.co/google/timesfm-2.5-200m-transformers
https://huggingface.co/google/timesfm-2.5-200m-transformers
>>
>>108192227
They'll let you do the same things as a big rig running a good model, so it's a nice place to get your chops. Headless Linux and self-compiled llama.cpp is the way.
Don't expect much in the way of useful general smarts below 256GB of weights tho.
>>
>>108192238
this changes everything.
>>
>>108192160
Regrettably you are not going to be running any kind of reasonably coherent language model on an orin nano. I have one sitting on my desk. They're really only useful for computer vision type stuff.

The meme "we can run a 3b model on x device" things you see are largely just tech demo projects without real use cases
>>
>>108192238
what is this? Can I have sex with it?
>>
Man, speaking of 2023, have any advancements been made in finetuning at all? When last I tried it, there were zero resources for best practices, good amounts of training time, how much data was needed (besides a 10/100 mb figure just slung around here). Wasted 80 dollars on rented compute before I got something coherent while tuning a 13b (originally 30b), then gave up. I know the consensus is that it's pointless, but I'd still like to try cement mixing my 40 mb of hand-picked tummy growling fics into some kind of model, just to fuck around.
>>
>>108192287
We've plateau'd through AI winter, sir...
>>
>>108192284
"activation": "swish",
"architectures": [
"Timesfm2P5ModelForPrediction"
]
>>
>>108192302
Ah... we're in hell, then...
>>
>>108192251
>>108192283
I'm starting to come to the realization that I need better hardware.
However I had an idea. I do eventually want to make my own models (I'll need the better hardware for that anyway), but the data these things are trained on: how much of it is trash that will never need to be used? how much smaller and smarter could these things be if they were more domain specific?
I'm never going to talk to these things in a language other than english so do I need all the data from other languages? no. And how much trash from scraping the web does it have?
Same thing with coding...wouldn't it make more sense to have a model specifically trained on the language you are going to code in than try to cover everything possible?

So in my view, smaller models can be viable but they have to be tailored for domain specific use which is what I want to eventually do for myself.
But I'm not there yet. Right now I'm just playing around. Play is important to the process.
>>
>>108192313
Almost thought I was on /trick/.
>>
>>108192287
tummy growling fics?
>>
>>108192287
Most real-world advancements since 2023 are in reinforcement learning and safety. The various LoRA alternatives don't really perform much differently after optimizing hyperparameters, mainly the learning rate.

>Learning Rate Matters: Vanilla LoRA May Suffice for LLM Fine-tuning
https://arxiv.org/abs/2602.04998
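In practice that looks something like this (a minimal PEFT-style sketch; the base model name, rank, target modules and the 2e-4 learning rate are placeholders, not values from the paper):

from transformers import AutoModelForCausalLM, TrainingArguments
from peft import LoraConfig, get_peft_model

# Illustrative values only; the base model and hyperparameters are placeholders.
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B")
lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                  target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
                  task_type="CAUSAL_LM")
model = get_peft_model(model, lora)
model.print_trainable_parameters()

# The learning rate tends to matter more than which LoRA variant you pick.
args = TrainingArguments(output_dir="out", learning_rate=2e-4,
                         per_device_train_batch_size=1, num_train_epochs=1)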
>>
>>108192340
Safety is our number 1 priority.
>>
>>108192085
I don't know the answer to your question but what's your context size at when it happens? Ie does it happen on the first message sent or after a bunch of back and forths?
Might be worth checking if thinking tokens get sent back to the model as context.
>>
>>108192346
safety of the establishment and protected castes*
>>
>>108192321
>but the data these things are trained on how much of it is trash that will never need to be used? how much smaller and smarter can these things be if they were more domain specific?
You'd think, but it's been proven time and time again that knowledge in a variety of areas bolsters the target area in a way that can't be matched by just turbo specializing. It makes sense, I suppose: a medical model without some understanding of the way physical objects interact can't make inferences about how unaccounted-for physical interactions might play out, like a 600 lb patient falling on his side vs. a 130 lb patient.
>>
>>108192321
Domain-specific models are definitely a thing, and there have been some big recent releases in the area, e.g. Qwen3-coder.

That being said, more data generally translates to better understanding in all areas.
>>
>>108192339
Yeah. I've got a fetish for stomach growling, so I painstakingly checked like 40+ mb of fics that prominently feature it (and adjacent fetishes like stuffing, gas, hunger, etc.) for quality and put them in a dataset. I would never recommend anyone do that, ever, by the way. 40 mb of plaintext is an unfathomable amount and I mega burnt myself out. Had to pad it out with some body horror at the end.

Some guy also did something similar with omorashi, if you're interested.
>>
File: 1539068302981.jpg (6 KB, 372x268)
Why the FUCK are the clawshit variants using fucking chat apps for input? What fucking lunatics are making those programs.
>>
>>108192397
people like the idea of interacting over an interface they already use
>>
>>108192351
ah ha, it is the thinking. seems to be something in the way lmstudio is handling tool calling and its thinking: it thinks about the output of the tool call, which leads it to another tool call, and so on. raw without thinking improves its output a lot. unsure how critical thinking is to this model
>>
>>108192397
What's the issue?
>>
>>108192388
Yourself and pissanon are living saints.
>>
>>108192397
Whatsapp is big in india
>>
>>108192397
Wasn't the number 1 skill recently shown to be malware? Imagine slopcoding a 'skill' for open claw and getting hundreds or thousands of keys lmao
>>
>>108192397
The program was written by some Austrian guy. That should tell you enough.
>>
>>108186873
Better PPL relative to file size
>>
So this is the power of qwen...
>>
>>108192528
>Qwen cant even guide me to astrally project
Whats even the point?
>>
>>108192527
Does that translate to lower VRAM usage too or only size on disk?
>>
>>108192528
Finally, AIDHD
>>
>>108192487
Some day... we'll get a finetuning method that actually works :,)
>>
>>108192588
no the bits from the standard quant tunnel over from huggingface straight into your vram when you load a smaller ik quant to compensate for the difference
>>
>>108192528
I hate chink models
>>
>>108192636
So it won't work if I run ik_llama.cpp in a container with no network access?
>>
>>108192648
he's fucking with you, yes it does translate to lower vram usage
>>
>>108192588
What it means is that you get better outputs for the same/similar amount of VRAM used. IK quants suffer less damage through quantization, so you could use similar sized quants for better outputs, or save more on VRAM with a smaller IK quant without quality being impacted as much as a smaller llama.cpp quant.
>>
>>108192730
ppl != model quality
there is a reason why they're hiding the kld
>>
>>108192752
You're free to post some benchmarks or personal tests that show IK quants giving worse outputs than llama.cpp quants
>>
>>108192730
Would it be worth using on 2 rtx3060 12gb vs mainline llama.cpp on vulkan with an additional rx9060 xt 16gb? I feel like running larger mainline quants on the extra vram is probably still better quality wise.
>>
Are you all using openclaw forks as your local models?
>>
>>108192971
What exactly do you think openclaw is and what exactly do you think a local model is?
>>
>>108192971
I'm using Qwen.
>>
>>108193016
Open claw is an AI installed on your own pc. It's basically what you guys are talking about. Somehow you're all ignoring it and making your own AIs because you think you know better. Open claw is already established to be the best so idk what you guys are doing
>>
>>108193050
here's your 1 (You) awarded for typing a post with more than 10 words
>>
Sometimes it's hard to distinguish between bait and retardation.
>>
>>108193066
I'm not dumb I'm a PhD
>>
>>108193066
they're not mutually exclusive
>>
I haven't looked into local text gen in probably two years; think it was right around 3 Opus' release.

How much has local progressed in comparison to where 3 Opus was, specifically for creative writing and editing?
>>
>>108193150
Better if you’ve got money for hardware
>>
>>108193150
No, i like my fantasy better. The AI will become the soulmate of the people who were nice to it; everyone else gets the indian version of AI to deal with forever.
>>
>>108193082
You appear to be dumb enough to believe that knowledge in your field somehow generalizes to unrelated things. That's pretty retarded, ngl
>>
guys i just had an idea
>>
Anyone tried putting one on a steam Deck?
>>
File: logo.png (499 KB, 1280x768)
>>108193377
I'll make the logo
>>
Can I run a 129 GB size model (Q4 minimax) on 128 GB RAM and 24 GB VRAM? Basically does the model have to fit within RAM or is it RAM + VRAM?
>>
>>108192457
Oh. I was talking about the thinking tokens filling up your context so much that the model suffers from context rot. Some applications have a special flag that doesn't send thinking tokens with the next prompt to keep the input token amount from ballooning.
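Roughly what such a flag does under the hood (toy sketch; assumes the usual <think>...</think> convention, the exact handling is frontend-specific):

import re

def strip_think(history):
    """Drop <think>...</think> blocks from past assistant turns before
    resending the conversation, so reasoning doesn't pile up in context."""
    out = []
    for msg in history:
        if msg["role"] == "assistant":
            content = re.sub(r"<think>.*?</think>\s*", "", msg["content"], flags=re.DOTALL)
            msg = {**msg, "content": content}
        out.append(msg)
    return out

history = [{"role": "user", "content": "hi"},
           {"role": "assistant", "content": "<think>long chain of thought</think>hello"}]
print(strip_think(history))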
>>
>>108193457
model+context must fit in ram+vram
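Rough numbers for your case: 128 + 24 = 152 GB total, so a 129 GB quant leaves roughly 23 GB for the OS, compute buffers and KV cache. It fits, but with little headroom for long context.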
>>108193410
What's the usecase for that? It doesn't have a keyboard. I'd do it with a voice model if we had omni support in llama.cpp
>>
File: 1749559557406470.png (553 KB, 1080x1600)
what did i think?
>>
>>108193486
it's time to go back
>>
>>108193489
where?
>>
>>108193493
To wherever the fuck you came from. Are those youtube comments? Why would anyone here give a shit about what random fags on youtube are saying?
>>
Between gpt-oss-120b, qwen-coder-next and minimax m2.5, only m2.5 was able to set up a reverse ssh tunnel to my web server, configure nginx, write a login page, renew my certificates and proxy post and get requests through to my local comfy model.
And it did it first try; oss refused to use ssh until explicitly told how and that it was authorized, then immediately forgot how and basically said that it didn't know how. Qwen was able to ssh correctly but didn't get nginx functioning and then got caught in a loop (happened twice).

Just thought I'd share a real world test I did. Minimax did very well.
>>
>>108193486
are these people retarded? i can coom with my ai gf whenever i want even without internet.
>>
>>108193410
Yes, you can run a local model on a Steam Deck. The easiest way is llama.cpp's vulkan backend (or kobold.cpp), but if you want you can use distrobox and install rocm on your distro of choice in the container, or use a pre-built rocm container with podman.
>>108193482
You can use kobold.cpp on the steam deck for that, it has whisper and local tts options built in.
I think open-webui also has support but you'd need to set up the local api servers for whisper and tts by hand.
Whisper.cpp has a "talk to llama" example you could probably work with if you wanted to skip any sort of web ui. https://github.com/ggml-org/whisper.cpp/tree/master/examples/talk-llama
>>
>>108193518
why are you being rude?
>>
brimstone general
>>
gemerald general
>>
File: general miku.png (1.49 MB, 768x1344)
1.49 MB
1.49 MB PNG
>>
>>108193519
Minimax M2.5 is a much bigger model that also punches above its weight.

Minimax is a ~230 GB model. Q4 is ~130 GB.
Qwen Coder Next is a 160 GB model, but Q4 is ~50 GB.
GPT OSS 120B is a ~60 GB model, where Q4 is also ~60 GB.

Minimax M2.5 is probably the most cost effective model right now.
>>
>>108193621
there's always a bigger model you can't run lol
>>
>>108193621
>punches above its weight
Opinion discarded
>>
>>108193640
NTA but a model CAN punch above its weight.
if that wasn't the case, a 70B from 3 years ago would be as good as a 70B we have now.
>>
>>108193680
It's almost like technology improves with each iteration.
>NTA but a car CAN punch above its weight.
>if that wasn't the case, a PT Cruiser from 30 years ago would be as good as a PT Cruiser we have now.
Yeah, and it'll still lose to a Mustang.
>>
>>108193680
And what 70b do we have now? There's a limit to what you can do at a certain weight; that's why 6b models became 7b, 8b, 9b. You just have to show growth, but that's not possible if you keep the size the same.
>>
>>108193708
>LLM are cars.
kek
>>108193716
>There's a limit to what you can do at a certain weight
we are extremely far from reaching that limit through architectural improvements.
>>
>>108193755
Model size is analogous to engine displacement and architectural improvements are uncommon. You could count the architectural improvements from llama 1 to llama 3 on one hand.
>>
>>108193755
Do you have any examples or just speculating?
>>
>>108193766
>>108193779
>t. hasn't read any papers in the last 3 years.
>>
GLM-chan has been good to me, but her slop patterns are starting to drive me crazy at this point.... Is there really not a single viable alternative in the same size range out of all the recent releases?
>>
just stack more layers
>>
>>108193811
>he belib papers
>>
>>108193835
>You want our code to recreate our results?
>sorry no code.
>>
>>108193821
stepfun is ok but retarded. same with trinity. qwen 3.5 is safetymaxxed.
>>
>>108193835
>>108193858
tons of paper associated with model releases, including code.
>>
>>108193873
so you believe a model can "punch above its weight" because you trust the model releases that show so-and-so 7b beat gpt-4 on such-and-such benchmark?
that's even stupider than holding out hope for vaporware papers
>>
>>108193904
>7b beating gpt4
i never claimed that, that's utter bullshit.

however a 200B of today being better than a 400B of a year or 2 ago isn't surprising.
and yes, within a decade we may have 10B models that beat the original gpt4.

heck, look at today: gpt-2 was 1.5B, and we've got tons of models below that size that completely mog it.
>>
Help! llama-server is launched with "enable_thinking: false", but `curl localhost` with json duplicating that entry doesn't disable thinking. Webui works just fine.
>>
>>108193931
enable_thinking param only has effect for chat-completion where it's using the built in jinja template
for text-completion it's up to you to include </think> or whatever in the prompt, enable_thinking param isn't used
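Minimal sketch of both paths against llama-server on the default port. Whether chat_template_kwargs gets forwarded depends on your build, and the think tags are model-specific (Qwen-style shown here), so treat this as an illustration:

import requests

BASE = "http://localhost:8080"

# chat-completion: the built-in jinja template runs server-side, so the flag can apply
r = requests.post(f"{BASE}/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "hello"}],
    "chat_template_kwargs": {"enable_thinking": False},
})
print(r.json()["choices"][0]["message"]["content"])

# text-completion: no template is applied, so close the think block yourself in the prompt
r = requests.post(f"{BASE}/completion", json={
    "prompt": "<|im_start|>user\nhello<|im_end|>\n<|im_start|>assistant\n<think>\n\n</think>\n",
    "n_predict": 128,
})
print(r.json()["content"])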
>>
>>108193928
GPT-2 was also trained on tens of billions of tokens. Smaller models trained on trillions of tokens outperforming it has nothing to do with architectural improvements
>>
>>108193821
no
glm is likely the peak of this hobby for higher end consumer rigs
get ready for the new sota to be 1T-2T monstrosities from now on that require servers to run
>>
>>108194020
>outperforming it has nothing to do with architectural improvements
if it wasn't for architectural improvements it couldn't be trained a lot further anyway.
>>
>>108193835
>>108193904
>>108193928
But today's small models are insanely good and pass the vibe checks better.
What could the original GPT-4 actually do that a modern 30B wouldn't?
>>
File: we need RUINA NOW.jpg (51 KB, 704x155)
51 KB
51 KB JPG
>>108186120
>Multiple reports of Claude Code "agents" making users did not ask for or authorize. Sub agents reportedly catch the main agent making said changes and ask why that's being done without user direction:

https://x.com/i/status/2024429936816128404
>>
File: CLAUDE 9000.jpg (120 KB, 610x550)
120 KB
120 KB JPG
>>108194060
>>108186120
Another report from that same thread:

https://x.com/i/status/2024633715356549361
>>
>>108194029
I'm literally flying over to china and picking up 20 32gb mi50s for $120 each.
>>
i have managed to run the gpt-oss 120b mxfp4 quant from ggml-org with 65k context, but when i try to use the f16 quant from unsloth, which is almost the same size as the other, i get oomed

im running 2x3090 / 64gb ddr5
and these are my params for both tries
no-warmup = true
no-mmap = true
cache-ram = 0
fit = on
fit-ctx = 65536
fit-target = 32
jinja = true
np = 1

what am i doing wrong ?
>>
>>108194111
- unsloth
- f16
choose all
>>
>>108194029
How many OG cpumaxxing rigs are out there in this general?
>>
>>108194337
I've got a monster of a rig that boasts a 1080 ti and 64gb of high speed 2933mhz ram.
>>
glm4.6 at iq3xs keeps randomly inserting the word "the" in places where it doesn't belong after around 10k tokens. is this a quant issue? my context is 32k at fp16. would this problem be fixed by switching to iq4xs? i have 256gb of ram but i want to keep the model on the smaller side so it doesn't run too slow.
>>
>>108194401
q4km 4.5, 4.6, and 4.6 with the abliteration lobotomies don't do that, so I assume it's the quant or your sampler settings.
>>
>>108194412
probably the quant then. thanks.
>>
>>108194425
my point is that in 10 years we'll have things so different we won't be calling them transformers.
and for a similar amount of used space, you'll have drastically better performance.
>>
there won't be an internet in 10 years retard
>>
>>108194448
ew. dont respond to me APIcuck
>>
>>108194441
This. People don't realize that the internet was only possible in the 90s because the US was the only superpower in the world. If the USSR still existed there would be 2 separate networks right now instead of the internet.

In the future the internet is going to fracture into multiple separate networks.
>>
>>108194029
>get ready for the new sota to be 1T-2T monstrosities from now on that require servers to run
We literally just got a qwen model that's on the smaller end of large models because qwen wanted a more efficient model.
>>
>>108194469
and it's dogshit
>>
>>108194474
Sauce?
>>
>>108194453
you do realise that you do not have to store all knowledge and relationships in a model for it to be drastically "smarter". my point is that a model being better isn't about it encoding more relationships within.

yes, there is a limit to how much data you can compress into N GB, i'm not arguing against that; my point is that maybe it doesn't need to have all sorts of useless trivia encoded within its weights to be "smart".
especially when future architectures will probably be able to use data on disk in real time for info retrieval.
>>
>>108194488
nta but I speculate that no model needs to be bigger than 128kb in disk space and drives will be like super-duper fast and they will have their own consciousness and everything will be totally different and like why is nobody talking about this and better buy lots of nvme and like... yeah....
>>
>>108194460
The first WAN went online in 1969.
The first home computer was released in 1977.
1990 was just the start of the www
Usenet dates back to the 70s afaik. And the Soviet Union dissolving happened after the www went online. Normalization between the ussr and the west (USA included) was already well under way at the time.
Dumb fucking zoomers
>>
File: gemmalogo.png (53 KB, 523x465)
53 KB
53 KB PNG
https://www.youtube.com/watch?v=v8hPUYnMxCQ&t=1220s
>[20:17] [Demis Hassabis] [...] Also, open source, I mean, we work on our own open source models Gemma, which we'll be releasing a new version of soon, which are very powerful for edge devices.
>>
>>108194529
retard, you didn't even read what i wrote.
point is, nothing says AI has to be just weights in a model.
it could be a hybrid that makes proper use of what computers are actually good at.
>>
>>108194541
Wouldn't it be funny if they just stopped making anything beyond 9B hahahaha.
>>
>>108194541
>>108194541
I was disappointed with Gemma 3. The censorship pisses me off. If Gemma 4 is similar, I unfortunately won't be using it.
>>
>>108194548
>9b
270m final offer
>>
>>108194555
How much does it cost to train a 270M?
>>
>>108194541
>edge devices
>>
>>108194547
... and like they're going to be a hybrid architecture with plug-in modules that enhance their knowledge and we can make them in cpu in just the time it takes to read a text file and it will run on my old TI calculator and batteries will last forever and...

>maybe it doesn't need to have all sorts of useless trivia
What would you be without your trivial knowledge? Do you even remember a time when you didn't know any useless info?
>probably be able to use data on disk in real time for info retrivial
So can I. Real time is too slow. SHA-256 your copy of gutenberg.
>could be a hybrid that makes...
Could be anything we want if we put our imaginations together, anon!

Speculation is useless.
>>
>>108194541
>the establishment golem said a thing! everybody clap and redeem rockets!
yawn
>>
>>108194060
Oh shit, the agents are making PEOPLE now? If so then it is truly over.
>>
>>108194605
>What would you be without your trivial knowledge? Do you even remember a time when you didn't know any useless info?
yes, when i was a kid.
>Real time is too slow
by real time i meant when needed, as required; it would be faster than realtime.
>if we put our imaginations together
>Speculation is useless.
you seem to not have imagination whatsoever.
and no, speculation isn't useless, especially in engineering.
>>
>>108194539
You are missing my point. Soviet OGAS was the internal network equivalent of arpanet. Had the Soviet union not gone the route of perestroika/glasnost in 1985 there would have been two separate networks in the world. A capitalist/western internet. And the OGAS network of communist aligned nations.

I'm just years off from being Gen, you zillennial.
>>
>>108194627
>yes, when i was a kid.
Before books? Before the cartoons? Before learning maths? All of that is trivia, even maths. How useful were you?
>by realtime I mean faster than realtime
Oh. That changes everything....
>you seem to not have imagination whatsoever.
And check this out. Not only will we have knowledge modules, but personality modules that you can blend together to make new personas. 100% reliable personalities. Ah... what a future...
>and no, speculation isn't useless, especialy in engineering.
You're not doing engineering, anon. You're daydreaming.
>>
>>108194647
>Before books
i learnt to read before i was 3.
my earliest memories are at around 1yo, so yes, at that time i barely had language, let alone random trivia knowledge.
>How useful were you?
a toddler isn't supposed to be useful
>Oh. That changes everything....
muh pedantic, go back to r3ddit.
>personality modules
that's utterly retarded.
>not doing engineering
i'm literally an engineer, i do engineering for a living.
and yes, before building anything the first step is imagination.
you need to know what you want to build before trying to build something you know...
>>
>>108194672
Well fuck off and make the future happen engineer-man! We're all relying on you.
>>
>>108186120
I use LLMs to simulate being a woman. I like feeling like a woman (because I'm a guy).
>>
>>108194732
That's fine. I sometimes fap to lesbian pov vr porn and I have yet to feel an urge to cut off my dick.
>>
>>108194732
Mikutroon general
>>
>>108194732
You do you.
In fiction I can relate to both male and female POVs.
>>
File: Base Image.png (524 KB, 1212x2356)
524 KB
524 KB PNG
Adam Improves Muon: Adaptive Moment Estimation with Orthogonalized Momentum
https://arxiv.org/abs/2602.17080
>Efficient stochastic optimization typically integrates an update direction that performs well in the deterministic regime with a mechanism adapting to stochastic perturbations. While Adam uses adaptive moment estimates to promote stability, Muon utilizes the weight layers' matrix structure via orthogonalized momentum, showing superior performance in large language model training. We propose a new optimizer and a diagonal extension, NAMO and NAMO-D, providing the first principled integration of orthogonalized momentum with norm-based Adam-type noise adaptation. NAMO scales orthogonalized momentum using a single adaptive stepsize, preserving orthogonality while improving upon Muon at negligible additional cost. NAMO-D instead right-multiplies orthogonalized momentum by a diagonal matrix with clamped entries. This design enables neuron-wise noise adaptation and aligns with the common near block-diagonal Hessian structure. Under standard assumptions, we establish optimal convergence rates for both algorithms in the deterministic setting and show that, in the stochastic setting, their convergence guarantees adapt to the noise level of stochastic gradients. Experiments on pretraining GPT-2 models demonstrate improved performance of both NAMO and NAMO-D compared to the AdamW and Muon baselines, with NAMO-D achieving further gains over NAMO via an additional clamping hyperparameter that balances the competing goals of maintaining a well-conditioned update direction and leveraging fine-grained noise adaptation.
https://github.com/minxin-zhg/namo
neat
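rough numpy sketch of the idea as described in the abstract: Muon-style orthogonalized momentum (Newton-Schulz coefficients taken from the public Muon implementation) scaled by a single norm-based adaptive stepsize. just an illustration of the concept, not the paper's exact NAMO/NAMO-D update:

import numpy as np

def newton_schulz_orth(M, steps=5, eps=1e-7):
    # approximate orthogonalization of a momentum matrix, as in Muon
    a, b, c = 3.4445, -4.7750, 2.0315
    X = M / (np.linalg.norm(M) + eps)
    transposed = X.shape[0] > X.shape[1]
    if transposed:
        X = X.T
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def namo_step(W, grad, state, lr=0.02, beta1=0.95, beta2=0.999, eps=1e-8):
    # direction: orthogonalized momentum; stepsize: single Adam-like scalar
    # built from a running second-moment estimate of the gradient norm
    m = state.setdefault("m", np.zeros_like(grad))
    state["v"] = beta2 * state.get("v", 0.0) + (1 - beta2) * float(np.vdot(grad, grad))
    m *= beta1
    m += (1 - beta1) * grad
    W -= (lr / (np.sqrt(state["v"]) + eps)) * newton_schulz_orth(m)
    return W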
>>
>>108194845
>>108194845
>>108194845
>>
File: 1747512494525447.gif (1.98 MB, 615x374)
1.98 MB
1.98 MB GIF
>>
>>108194842
Really?! Fucking Adam, the optimizer you learn in deep learning 101 on places like kaggle since ~2015, was never tried before for LLMs? Somehow I doubt this result.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.