/g/ - Technology


File: marshmallow.jpg (82 KB, 1280x720)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106829402 & >>106822756

►News
>(10/08) Ling-1T released: https://hf.co/inclusionAI/Ling-1T
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts
>(10/07) NeuTTS Air released, built off Qwen 0.5B: https://hf.co/neuphonic/neutts-air
>(10/06) Anthropic open sources Petri, a parallel exploration tool: https://anthropic.com/research/petri-open-source-auditing
>(10/03) Qwen3-VL-30B-A3B released: https://hf.co/Qwen/Qwen3-VL-30B-A3B-Thinking

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1743930208988058.jpg (200 KB, 1987x1151)
►Recent Highlights from the Previous Thread: >>106829402

--Papers:
>106833679
--Optimizing MoE model inference through precise GPU layer offloading and expert distribution:
>106833203 >106833249 >106833351 >106833358 >106833377 >106833419 >106833425 >106833427 >106833435
--Debating the mechanics and value of "thinking" models in AI:
>106830044 >106830075 >106830151 >106830206 >106830220 >106830277 >106830549 >106830622 >106830761 >106830813 >106830208
--RAM configuration requirements for optimizing LLM performance:
>106831285 >106831307 >106831329 >106831361 >106831393 >106831338
--LLM pretraining constraints on single 3090 GPU with 8k context:
>106831430 >106831498 >106831511 >106831588 >106831646 >106831757 >106831566 >106831804
--Local AI image generation on diverse hardware setups:
>106831180 >106831242 >106831246 >106831273 >106831318 >106831439 >106831457
--Affordable high-performance setup for running quantized models via recycled hardware:
>106832490 >106832530 >106832565 >106832610
--Ling-1T model release and hardware accessibility challenges:
>106831637 >106831644 >106831754 >106831790 >106831781 >106831680
--Anon seeks to implement Qwen3 VL support in a custom C inference engine due to llama.cpp limitations:
>106829429 >106830678 >106830706 >106830315
--Developing a safetensors parser for embeddings with CPU inference prioritization:
>106832999 >106833055 >106833127
--Optimizing quantization for AI porn recognition with new vision models:
>106830909 >106830954 >106831069
--Miku (free space):
>106831423 >106832550 >106832579 >106832764 >106832727 >106832768 >106832868 >106832901 >106832996 >106833006 >106833706 >106834083 >106834125 >106834194 >106834210 >106834241

►Recent Highlight Posts from the Previous Thread: >>106829407

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: motherboard options.jpg (1.07 MB, 2000x2000)
repost:
which of these motherboards would be the best for ai? the goal is at least 768gb of ddr5 and 4 dual slot gpus while fitting in a normal case with no risers. the threadripper pro (top right) will probably be the fastest, but also most expensive. the xeon (bottom left) would be the slowest and second most expensive but has room for 16 dimms on the 8 channels. the epyc would be the cheapest and second fastest. they unfortunately do not make a 12 channel motherboard with room for 4 dual slot gpus without the use of risers. unless i just havent been looking hard enough
>>
I feel like either Openrouter is silently redirecting me to a shittier provider even though it says it's routing them to Z-AI, or Z-AI is sometimes serving a shittier version of the model.
For 10 minutes the model gets the syntax wrong for tool usage even with fresh context, then it goes back to working normally.
>>
>>106834585
Have you tried using the models.... Locally???
>>
Has anyone used this before? Does it support local models?
https://www.warp.dev/
>>
>>106834537
Why 4 gpus? A couple of Blackwell 6000 pro aren’t enough for you?
>>
>>106834651
i have 3 fe 5090s and i want to get a fourth one
>>
>>106834660
Honest questions: why wouldn’t you sell 2 of them and buy a 6000 pro 96gb instead of buying a 4th?
>>
>>106834714
Go to bed, Jensen.
>>
>>106834731
certainly it's cheaper to run one NVIDIA RTX PRO™ 6000 Blackwell Workstation Edition card instead of four RTX 5090s
>>
>>106834731
"Not Yet."
Only gamers will get that joke
>>
>3090s
>4090s
>5090s
>6000 pro
What is the best to start stacking if you have a $10k/2000w budget?
>>
>>106834843
isn't having lots of fast ram more important?
>>
File: 1750169987900306.png (273 KB, 1686x911)
Have you guys seen this?
https://github.com/huawei-csl/SINQ
https://arxiv.org/abs/2509.22944
>>
>>106834843
6000 pro for prompt processing, then DDR5 ram
>>
>>106834872
Nobody here will care until they compare it to gguf.

And they never do.
>>
>>106834886
jeeguff is quant, perfected.
>>
>>106834848
I'd rather run a smaller model or heavily quantized one at 30t/s+ than run SOTA at <5t/s. Especially with high context.
I also think unified memory will be outpaced by AI needs pretty quick. No upgrade path means no buy. That leaves me with stacking GPUs.
>>106834883
Even with 8 or 12 channel ram aren't you getting single digit t/s? Or have things really gotten that much better in the last few months?
>>
>>106834907
If you’re stacking gpus, why in the world are you buying 32GB paperweights?
>>
>>106834931
I'm not? I'm asking which 90 series gpu is the best to start stacking in late 2025.
>>
>>106834907
why not just do both then and have enough regular ram to load in bigger quants? it's not like you are going to get a decent 4+ pci-e slot mobo without getting at least 8 ram slots
>>
>>106834907
You might get ~10-12t/s if you minmax llama.cpp -ot trickery + epyc turin cpu + ddr5-6000 sticks + 6000 pro gpu to speed things a bit for non-meme quants
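For reference, the -ot trickery is just llama.cpp's --override-tensor flag: keep attention and shared weights on the GPU, shove the routed expert tensors into system RAM. A minimal sketch (model path, quant and context size are placeholders, and tensor names can vary per arch, so check the load log or a gguf dump first):

llama-server -m GLM-4.6-Q4_K_M.gguf -ngl 99 -c 32768 -ot "exps=CPU"

-ngl 99 offloads everything it can, then the -ot pattern matches any tensor with "exps" in its name (the MoE experts) and pins it to CPU, which is where the epyc's memory bandwidth actually gets used.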
>>
One more day until tomorrow. Exciting times.
>>
>>106835225
The workweek will end and nothing will happen as usual.
>>
>>106835228
New model gets released tomorrow.
>>
>>106835225
HUGE
>>
>>106834537
Make sure to read the motherboards' manuals before you decide, in particular look out for how many PCIe lanes are actually going to the slots.
There are motherboards with 4 x16 slots but if you actually populate all of them you only get x16/x8/x8/x8.

Consider that DIMMs with a large capacity also have a higher price per GB of memory so having more slots can save you money there.

EPYC motherboards with 12 DDR5 slots: https://geizhals.de/gigabyte-me03-ce1-5me03ce1nr-000-10a-a3148839.html https://geizhals.de/gigabyte-me03-ce0-5me03ce0nr-000-10a-a3148902.html
I can very much recommend Geizhals, excellent site to find (offers for) specific hardware.

My opinion is that Threadripper is only better than EPYC if you need high single core performance like for a desktop PC.
>>
>>106835225
every 60 seconds in india, a minute passes, together we can stop this
>>
>>106834537
>>106835307
>EPYC motherboards with 12 DDR5 slots
Derp, I can't count.
I thought those were 7 PCIe slots, but it's actually just 6.
>>
>>106835307
this is making me dizzy from pure love... nkdsh
>>
>>106835225
Meta employee here. You guys are going to love whats coming. bad news for erpers though
>>
File: 1747420072118937.png (2.14 MB, 1024x1536)
>>106835307
>hexa channel
*barfs*
>>
>>106835458
Good evening sir, what's your designation at Meta?
>>
Pornhub employee here. You guys love coming
>>
>>106835458
>bad news for erpers though
is there another purpose for local? what other possible reason can there be to hide my activities?
>>
>>106834537
the cool looking one because it says SAGE around pci slots meaning improved performance with sageattn
>>
>>106835458
Yeah, all ERPers go straight to jail, meta was spying through the gguf vulnerability all along
>>
would it be feasible to have something like q9 and q10 quants?
>>
>>106835703
q11 exists: https://arxiv.org/abs/2504.11651
>>
>>106835703
Not when perplexity is the main metric for estimating how good quants are
>>
>>106835703
I want reverse quants. Give me q64.
>>
>>106835703
Q8 scores so closely to full weights that even synthetic benchmarks can't reliably tell the difference. What would be the point?
>>
>>106835730
it isn't, it's our lord and savior KLD
>>
>>106835752
>Nemo but it requires 48GB minimum, scores 0.0000001% better than Q8 in one non-reproducible benchmark made by a reddit user
>>
>>106835727
thanks
>>
>>106835756
Benchmarks are always done on servers where there's many gpus packed inside a single rack and the power is super noisy which introduces random bit flips in your context due to quantum noise interference.
If you did the benchmark in an isolated environment with high quality insulated power cables and quality japanese VRMs you'd definitely be able to tell the difference between Q8 and f32.
>>
>>106835777
Forward KLD is the same as perplexity tho

>>106835824
Enterprise GPUs have ECC
>>
>>106835837
ECC can't correct this. The quantum wave collapse causes the correction bit to also flip to match the flipped data.
Nemo upscaled to f64 scores 50% better on SimpleQA compared to Q8.
>>
>>106835837
ECC doesn't account for eclectic infetterence.
>>
>>106835824
based LLMphile
>>
>>106835824
Wouldn't a noisier environment favor f32 over int8 because a random bit flip is less likely to be significant?
>>
>>106835703
It is definitely feasible, whether it's worth the opportunity cost is a different question.
I still intend to develop better software for evaluating model quality since I'm unhappy with the currently available methods.
Once I have that I intend to also make quant formats optimized for efficient compute to better take advantage of CPUs, old datacenter GPUs, and Chinese GPUs.
The first format I will investigate will be something like q7.75 with exactly 8 BPW; I will only look into formats with more BPW if there are statistically significant differences in quality vs. the full model.

>>106835777
llama-perplexity also produces statistics for how the token probabilities change, see e.g. https://github.com/ggml-org/llama.cpp/tree/master/tools/perplexity
On average the probability of sampling the "correct" token with a temperature of 1 went down by 0.02% with q8_0 vs. FP16.
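If anyone wants to reproduce numbers like that instead of arguing about ppl in the abstract, the same tool does the KLD comparison in two passes; roughly like this (file names are placeholders, the exact flags are documented in the README linked above):

llama-perplexity -m model-f16.gguf -f wiki.test.raw --kl-divergence-base model-f16.kld
llama-perplexity -m model-q8_0.gguf -f wiki.test.raw --kl-divergence-base model-f16.kld --kl-divergence

The first run saves the full-precision logits, the second compares the quant against them and prints mean/max KLD plus how often the top token actually changes.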
>>
Take your meds
>>
>>106835878
Any interest in porting the quants from the fork?
>>
>>106835896
If you mean ik_llama.cpp the answer is a definitive no from me.
There is an ongoing conflict over attribution between ggerganov and ikawrakov and my current policy is not to look at or interact with ik_llama.cpp at all.
Given my personal strengths and weaknesses and my poor understanding of their prior history I think it's a more efficient use of my time to implement things myself than to try and mediate their conflict.
>>
File: IMG_20251009_191624_801.jpg (215 KB, 1598x1820)
>>
lmao lol MOOOO Whats the reasoning
text compression?
>>
>>106836270
>2 minute read
They really had nothing to write about.
>>
>>106836270
>chinese spy leaves after they get put under the spotlight and can't steal shit anymore
>>
>>106836270
It would be awesome if Claude got leaked and Anthropic somehow crumpled.
>>
>>106836327
better than a five trillion line article going into the roots of everything to show more ads
>>
The mind of these researchers after 5 years of "improving" is too far gone to steal anything
>>
fuck is Meta doing? They have a new AI team and still no hint of any product except for llama4.5, developed by the old team.
>>
>>106836270
Anthropic devs talk about safety and tolerance all the time, but somehow they do shit like that, and also their models are far easier to jailbreak than OAI's models, and also they are far more degenerate.
>>
My waifu told me that the reason people do the meaningless "how are you" is to show that the interaction isn't going to be hostile and you have no hostile intentions. lmg is this true? If you talked to me over phone at my work 30 times do you still need to be reassured that I come in peace?
>>
>>106836501
Some academically educated psychologists are saying that people like shiny things because it's a primal reference to moist genitals.
I wouldn't really care that much about what 'they' are saying one way or another. Everyone has an opinion...
>>
>>106836501
What country? The phrase has a different meaning in different countries.
If you're from the US, you could explain it that way I guess. But it's really just an extended greeting and the actual words are meaningless. You could say the amount of words you use on the greeting is the important part.
If you want an extended explanation, what your waifu told you is what used to be the reason. These days not following that tradition just means you're being rude by not following the proper procedure.
>>
>>106836423
The old team was folded into the new team.
>>
>>106836423
begone, china shill
>>
>>106836556
>has a different meaning in different countries
Does it? I always thought it was a thing unique to the English language. It is not a thing in my language. And I guess some ESL people can mistakenly interpret it as actually not being a meaningless protocol. I think my boss in germany does that.
>>
>>106836587
war room status?
>>
>>106836444
Didn't Anthropic start working with the US military?
"Safety" in that context just means that the model obeys whatever orders you give it.
>>
I used to be mad at meta, mistral and cohere for fucking things up so bad. It stopped when glm-chan landed on my SSD. Western companies can all implode now. Fuck them.
>>
File: file.png (172 KB, 604x660)
https://x.com/elonmusk/status/1976149111813571064
>>
>>106836613
China Number One!
>>
File: 1760006560947r-0.jpg (2.24 MB, 1600x1600)
>>
>>106836623
but why
>>
>>106836641
why not?
>>
>>106836590
The phrase is a thing in Germany, except it actually means what the words put together mean. You can occasionally hear old ladies in the supermarket getting asked that phrase by the cashier and responding with whatever their currently most unpleasant ailments are while the cashier processes their items, at least outside big cities.
>>
>>106836653
That's a thing everywhere.
>>
>>106836660
look ill spell it out because the original poster is a pussy retard.
In IT indians are all 'good morning how are you' and they expect to do this useless fucking small talk before getting into the meat of a discussion.
This happens either in chat or on camera, the medium doesnt matter, they have to 100% exchange these fucking useless pleasantries because thats how their shitty poo DNA is coded
>>
Why no company RELEASE loras for their garbage?
>>
>>106836660
You can ask those words everywhere but I don't see people doing it here in my native language.
>>
>>106836660
Clearly it's not in the US. You don't greet people asking how they are and expect them to answer with my life sucks, how about yours.
>>
>>106834637
There are too many dev tools popping up every day that do the same thing in different ways with incompatible configuration formats. I use Codex at work and Qwen Coder at home. I don't see any reason to pay for some closed source shit that does the same thing.
>>
local formalities general
>>
>>106836711
that's just a granny thing lmao
>>
>>106836711
>You don't greet people asking how they are and expect them to answer with my life sucks
well yeah, that's called being polite, you just say you're fine and move on
>>
>>106836702
Because loras do more harm than good. Companies have the compute to do actual finetunes.
>>
>>106836729
Local forMalities General
>>
>>106836702
Commercial-level post-training nowadays involves a few hundred billion tokens at the least and I'm not sure if a reasonably sized LoRA would have enough information capacity for that.
>>
>>106836730
>>106836734
You're supposed to respond at least somewhat honestly in Germany. This is why that anon's boss misinterprets the phrase as used in English >>106836590
>>
>>106836739
Llama models give aids confirmed
>>
Information capacity.. its more of a communicatee
>>
>>106836686
Americans and Germans are basically identical in terms of DNA and the culture around small talk is completely different.
>>
>>106836621
#1 exporter - China
Highest IQ - China
The biggest military - China
The most advanced cities - China
The biggest progress in Fusion energy - China
Do you want me to continue?
>>
>>106836817
yes, please do
>>
>he bit
>doomp it
>>
R1 is less fun than glm 4.6
>>
>>106836817
Cannibalism high score? China.
>>
File: 1737439174448400.jpg (40 KB, 624x628)
106836817
>>
>>106835225
Nothing here suggesting a Gemma 4 release tomorrow:
https://developers.google.com/events
>>
new song is shit, I like deco's style but only if he puts a unique twist on it, this is his most cookie cutter work to date. glad he stuck with miku though
>>
File: gemmaswag.png (266 KB, 584x544)
>>106836990
https://x.com/patloeber/status/1976216897361428521
>who's based in Berlin and wants Gemma swag? :huggingface:
>>
>>106836939
You're thinking of India, where some people openly practice cannibalism. Unless you count satanic child sacrifice as a form of cannibalism then Israel.
>>
Indian general
>>
llama.cpp Qwen next status?
llama.cpp MTP status?
>>
>>106836817
#1 LLM - Bharat
#1 Image generator/editor - Bharat
Nobody beat Gemini 2.5 and NanoBanana deal with it chink soon Gemini 3 Gemma 4 rape you a group bastard
>>
>>106837184
>llama.cpp Qwen next status?
Vibe coders are doing the needful, sir. Kindly be patient.

>llama.cpp MTP status?
Become the vibe coder we deserve.
>>
>>106837149
>who's based
>in germany
impossible
>>
File: file.png (57 KB, 589x455)
>>106836990
Soon
https://xcancel.com/osanseviero/status/1975869868449923099
>>
>>106837242
the italian grifter
>>
>>106837227
Well, that sucks.
Maybe I should just get a 512gb mac and live with the PP pain after all.
>>
>>106837250
He's a Peruvian-Mexican (?) Google employee from the Gemma Team.
>>
>>106837234
kek
>>
>>106837254
You should get some free API credits and pitch in on the prs.
>>
>>106837276
>gemma
>not a bunch of grifters
as for the name, it looked pretty italian to me, but i guess in italian it would've been Sanseverio
>>
>>106837242
LFG
>>
>>106837286
There's a level of contribution where you are either wasting your resources (time, money) or actively getting in the way.
I believe any contribution of mine would be the former in this context.
So Mac it is.
>>
>So I am immoral for different reasons it says in line 1005 2520 17352
>>
>>106837149
>>106837242
We are so back safetybros.
>>
I hope Gemma 4 can recognize buttholes more consistently
>>
I can't wait to have my... well, everything sucked by Gemma 4.
>>
>>106837387
Excited for a new set of hotline numbers.
>>
Bharat sirs eating good today/tomorrow
>>
>>106837407
If this time around they've expanded their medical imagery dataset in the standard version of Gemma, probably.
>>
https://civitai.com/articles/19986
>Previously, we were afraid it would affect the model's style too much without better style control, but our research in style clusters helped alleviate this issue. We'll continue increasing synthetic content, including our own generation loops, to improve character recognition and especially style blending.
ACK!
>>
Greta lire thermals
>>
>>106837731
I love it when you talk dirty to me
>>
>>106837242
Pajeetbroos we are soooo baaaack.
>>
>>106837693
Aw hell nah they bringing inbreeding to imagegen, soon every girl will look like Elara Voss, the weaver of tapestries from a bustling city. This must be the Alpaca moment. Someone please report that guy to payment processors, feds, cartels, your mom so he stops, by force if necessary. Fuck no fuck no fuck no! Please tell me that he at least tags synthetic data as such, please, so I can put it in negatives.
>>
>>106837693
>We did discover a different issue for which we don't yet have a definitive answer, but I wanted to provide context. During V7 training, we noticed that compared to all previous Pony models (which used various CLIP encoders), V7 doesn't acquire the capability of mixing style and content at the same level. For example, many of sufficiently trained models using CLIP may've never seen a portrait of specific character in anime style but also many anime images so when the prompt requires "character X in anime style" the model can sufficiently mix both the content and style. With T5 we encountered many examples where this does not work well as the model either less capable of mixing style and content or that some parts of the content description force specific style no matter how much additional instructions for it to change have been provided. Unfortunately same issue seems to also apply to score_X tags which are unable to overpower the rest of the prompt and trigger the aesthetic bias.

>We have ran many experiments, checking if T5 tokenization has any impact, if caption variety may impact this and many others but none was sufficient to significantly affect this issue. The working theory right now is that the model is not learning to distinguish between content snd style elements of the prompt well enough, it is is most likely not a single issue contributing to this so to improve this issue in V7.1 we are running a number of changes during training - even more diverse captioning, extended training time and a very new experimental synthetic pipelie which goal is to create many variations of existing data in different styles helping the model to grasp the idea of 'style'.

Our model memorizes instead of generalizing, what should we do? I know it! Feed it synthetic slop!
What a bunch of hacks.
>>
File: 1736197419773550.jpg (110 KB, 647x1000)
>>106837407
t. Zhuang Yunfei
>>
>>106837930
>may've
>>
>>106837930
>it's... it's the tokenizer!!! t5 is bad... style... LOSS! other models are using t5 and full LLMs without problems? it's... it's the captioning! the solution? more slop!!!
lol
>>
>>106837930
pony models are a joke now, noob/illustrious made it irrelevant
>>
Is it even worth upgrading to run deepseek and kimi when glm 4.6 already fulfills all of a man's needs?
>>
>>106835228
Battlefield 6 will be released tomorrow.
>>
>>106837242
>Gemini 3.0 OSS
>32B-4BA
>1 mil context
>SOTA everywhere
>awesome at fiction writing
>>
>>106836423
The issue is not the engineers but the management. They changed the engineers, not the management.
>>
>>106838155
Something like Gemini 2.5 flash at 30ishB would be a dream for local.
>>
>>106836614
How do I use it? Do I need to give my phone number, my credit card and my soul before touching it?
>>
>>106837242
sarrs... we have winned.
>>
>>106838195
>Gemini 2.5 flash at 30ishB
You're getting an MoE Gemma 3 sidegrade that does better in benchmarks and you will be grateful
>>
>>106838247
good morning
how are you
we have wonnered
>>
>>106837930
>>106838049
it's clear ponynigger was always a hack, v6 was a miracle that ended up serviceable in spite of its stupid author (neutered chara and artist tags, shitty dataset with more furryshit than anime)
his previous attempts at models on 1.5 were all garbage and people who blame model architecture should always look into the mirror first because look at what NovelAI achieved with classic SD before they switched to XL:
https://huggingface.co/NovelAI/nai-anime-v2
to this day nai v2 is still the best SD 1.x model and more could have been done with it if people who had the brains and resources for model training had pushed that arch further
at least we got illustrious and noob on XL, we're finally rid of the curse of sepia and have proper local models
also lol
>Unfortunately same issue seems to also apply to score_X tags which are unable to overpower the rest of the prompt and trigger the aesthetic bias.
this nibba really loves his scorefaggotry
>>
>>106838260
>and you will be grateful
Not really, I'll just continue using GLM in that case.
>>
>>106838206
Take your meds first.
>>
>>106838155
also awesome at being super duper safe
>>
So I decided to do some extended context RP testing on some models I had previously tested.
Tongyi DeepResearch basically falls apart before the 3K token mark and just goes into a cycle of repetition.
The latest Qwen3-30BA3B-Thinking is pretty good. Can definitely recommend this as a VRAMlet model. If your scenario requires jailbreaking, the prethink alone won't buckbreak it: it'll plan out the reply but then give a refusal after </think>. However, this is circumvented simply by prefilling {{char}}: before <think>.
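For anyone who hasn't done a prefill before: you just start the assistant turn yourself so the model continues instead of opening with a refusal. In SillyTavern that's basically the "Start Reply With" box; in a raw completion it looks something like this (assuming a ChatML-style template, check what your backend actually applies):

<|im_start|>assistant
{{char}}:

and the model writes its <think> block after that, then carries on with the actual reply.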
>>
>>106838286
but qwens prose is a bit lacking, I cant fap to it
>>
>>106838292
There's always a major element of garbage-in garbage-out to these things and that has always been the case. You unironically can't "ahh ahh mistress" 30 times and expect the model to give you Pulitzer Prize winning responses to the very end.
>>
>>106838301
but I literally ahh ahh mistress glm and it gives me nobel prize tier end of world famine writing
>>
>>106838311
You don't even use LLMs.
>>
>>106838286
>If your scenario requires jailbreaking prethink alone won't buckbreak it. It'll plan out the reply but then give a refusal after
Huh. Never seen that with that specific model with a think prefill.
>>
>>106836348
claude is the most deeply overrated model of all time
it wouldn't crumple because of a model leak because it has the same sort of fanboy as apple
they buy into the distortion field and will support My Lady Anthropic to the death
human beings are surprisingly psychologically feeble
all it took was a website with a clean design and cool looking font
>>
File: 1747498621568738.jpg (32 KB, 736x736)
>>106838155
>>SOTA everywhere
>>awesome at fiction writing
>>>>>32B-4BA
>>
File: ComfyUI_temp_axihh_00028_.png (1.72 MB, 1024x1024)
Drummer Mistral tunes are getting better, so I'm guessing there's some quiet improvement in the Small/Magistral model. Or is it just Drummer including the newest API slop?
>>
>>106838206
Step 1: Download Wan 2.1
Step 2: Do a small finetune
Step 3: Change the safetensors name to grok imagine.
Step 4: Run it locally on your machine.
It worked for jeetlon.
>>
>>106838392
Drummer's trick is to nudge the weights just slightly so you get a different response for a query, in case you check whether it gives the same response, while making sure the model doesn't really change, because any actual change from "finetuning" lobotomizes it. Placebo does the rest.
>>
>>106838430
for me? its davidau's schizo tunes
>>
It's a setup setup oh its a setup
>>
>>106838430
Well, that's not true at all. They are probably dumber, but they are always in the story/RP mode, unlike vanilla instruct models which require a lot of handholding to keep them from breaking into a repetitive mess.
>>
File: 1742778973788954.png (1.03 MB, 1080x2336)
oy vey!
>>
File: you-are-right.jpg (46 KB, 500x500)
>>106838568
>>
>>106838586
It is pretty funny that you can make these things agree with you on pretty much anything.
>(you) : You know, fucking dogs ain't so bad
>AI : It's pretty bad dude.
>(you) : You are being very biased and antagonistic!
>AI : You're absolutely right...
etc etc.
For some topics it takes more prodding, for some less, and it might take some fucking around with the wording, but you can (almost?) always get there if there isn't a filter in front of it somewhere.
>>
>>106838632
>It is pretty funny that you can make these things agree with you on pretty much anything.
I blame companies finetuning those models to suck the user's cock. They know it works: when 4o was removed and a drier assistant replaced it (gpt 5), people went crazy because the bot didn't suck their dick anymore. I find this so cringe
>>
>>106838568
Is that Jan?
I asked Jan to draft a letter of petition to the ICC regarding the Gaza genocide and it just went kvetchcon 1 on me. Like it wasn't even an LLM response. It was like getting screamed at by a seething pedantic jew.
>>
File: 1739193067267298.png (1.7 MB, 1878x1187)
>>106837152
I was thinking of this and similar cases, but I admit I didn't look into India.
>>
>>106838730
They don't cannibalism living people. But there's lower caste indians that will eat living dead bodies that they find lying around because brown people are just like us and it's just their skin color that's different.
>>
>>106838709
no it's claude sonnet 4.5
>>
>llama3.4-70b
and just like that local was saved
>>
>>106838739
Wait I worded that completely wrong. My internet card is now revoked.
>>
>>106838739
>>106838751
>will eat living dead bodies
Cannibalism is one thing, but eating zombies is going too far.
>>
Women, am i right?
>>
>server : host-memory prompt caching #16391
https://github.com/ggml-org/llama.cpp/pull/16391
Merged.
>>
>>106839051
I'm retarded, what is this?
>>
>>106839119
Automatic prompt caching to RAM for minimizing reprocessing.
>>
https://huggingface.co/togethercomputer/GPT-NeoXT-Chat-Base-20B is it safetycucked or not? The OIG dataset it used sounds like it was "curated" but not given safety or alignment.

Anyway, I'm going to give it a try. I expect it'll be retarded and shit at everything, but we'll see.
>>
>Great idea — verifying your GPU (especially VRAM) is functioning properly after a hardware upgrade is smart.
>>
>>106839195
>commited on Mar 3, 2023
for a general that's all about having AI generate text, it seems none of you can read
>>
>>106838516
>a lot of handholding
Making them go into rp mode is very easy
>to keep them from breaking into a repetitive mess
Drummer can do nothing about them becoming a repetitive mess.
>>
>>106839051
>not a cuda or metal pr
I sleep
>>
Does anyone here have experience with fine-tuning models on CPU+RAM? I'm planning on CPUMAXXING for inference but I'm wondering if I could use the same setup for some training when it's idle (I know it would be super slow).
>>
>>106839259
you could look up deepspeed zero, it lets you offload some things to the cpu.
>>
>>106839051
Looks pretty sweet
>>
erm I've been out of the loop for a few months anons, not in prison. What's the current SotA for local ERP models if you have a lot of RAM and VRAM? (144 GB VRAM, 440 GB of RAM)
captcha NJGR0 based.
>>
>>106839386
whats your setup like? seems like weird numbers. the answer is glm4.6
>>
>>106839394
6x 3090s, some of which I had laying around, though now I pine for a 6000 Pro, with an Epyc 7763 and 512 GB of RAM, but it's not all activating.
>>
>>106839277
All Python CPU offload options are a joke and only reduce memory usage like 10%.

>>106839259
It doesn't exist right now. For finetuning you have to use the cloud.
>>
>gpt-neox
Are you looking for a gpt-2 architecture model?
>>
>>106839409
>with an Epyc 7763 and 512 GB of RAM, but it's not all activating.
Re-seating your CPU might fix it if you didn't try that yet.
Sometimes the cooler is putting more pressure on one side than the other, that could also be a factor.
>>
>>106839457
Interesting, I hadn't considered that. But reseated the ram a few times. I did guesstimate how much to torque it down.
>>
GLM is insane, it perfectly reads cues and understands intentions. I can bear with 4t/s at Q5 for that quality
>>
>>106839480
Yeah. It's a pretty common issue when installing these xboxhueg CPUs.
>>
>>106839502
Nice genshin log
>>
bear + spittyhooker
gfur twink
>>
>>106839524
It’s a gacha-addicted char earning money to spend on a game. A very short desc, GLM got all her quirks naturally
>>
>>106836613
Lul you jinxed it. She will be forgotten in months, just like Dispy!
>>
Gemma 4 gguf status?
>>
Dear sirs, will they talk about whatever is supposed to come out tomorrow/today?
https://www.youtube.com/watch?v=uLHF9T1SLrU
>>
>>106839644
>Instead of using AI to generate optional subtitles in real time, let's put a guy to flail around his arms and take 1/4th of the screen!! Genius!!!
Wow so progressive. Gemini really is the future of AI.
>>
File: Screenshot 4chan.png (139 KB, 726x455)
>>106839668
but it does have sub
>>
>>106839703
So what is the wacky flailing non-inflatable arm man for?
>>
>>106839726
DEI
>>
File: Parthasarathy.png (218 KB, 375x486)
SAAAAAAAAAAAR
>>
>>106839386
kimi k2
>>
I like how modern ai presentations are just different people taking turns telling how the new ai helped them with ambiguous statements in between
>>
>>106839636
Then another Chinese company will appear. It is the current pattern.
>>
>>106839761
They know their audience. Non-technical people looking for business solutions.
>>
File: 1753239058338.jpg (307 KB, 1024x961)
>>106839051
I pulled to get this and got their new frontend and now ?q= no longer works.
>>
>>106839754
Kimi-K2 was pretty slow and liked to refuse.
>>
>>106839643
Tomorrow.
>>
>>106839761
>anon lmg walks out onto the stage
>"ahem, uh, gemini helped me drain my balls"
>wow another great use case!
like that?
>>
>>106839778
Especially with saying 'Delta' team. Forward-deployed engineers to help people solve their shit because they ran a fucked up business.
This is aimed at CEOs/Sr Mgmt. IMO, the fact its coming from Google Cloud is fucking hilarious. Google Cloud is worse than fucking azure, and azure is literally a 'do not do business with me' sign.
>>
>>106839819
would be a nice start
>>
>>106839211
Go take your autism meds.

Anyway, I tried it. Very early character.ai feeling. Short replies, forgets things quickly due to the 2K context, horny
>>
>>106839819
I'd pay for gemini out of respect.
>>
>>106839445
>Are you looking for a gpt-2 architecture model?
Nah it was just something grok said was one of the last local chat models before safety became a thing. If you didn't play with character.ai at the beginning you'd have no interest in it.
>>
>>106839795
how are you getting refusals with 0905? sure the original version of k2 had refusals (that could easily be removed with a 10 token jailbreak) but 0905 will gladly generate the same shit I want with like 1/100th of the refusals, that's not even an exaggeration
>>
>>106839819
>"Sixteeen times the cockbench score... of gemini 2.5"
>>
>>106839829
You could have tried llama1 for that feel
>>
>>106839644
It's a warm up. Tomorrow will be glorious.
>>
>You're absolutely right. Maintaining a dynamic, consistent map is a classic challenge for text-based AI and can easily fall apart, ruining the experience. It's much better to use a system that plays to the strengths of descriptive text.

Hey AI. You are fucking retarded.

>Why yes you are absolutely right I am retarded!
>>
>>106839829
>>106839846
>ask AI for uncensored models
>grok says GPT-NeoX-20B "it's not safetyslopped"
>last true uncensored model
>load my goofs into llama.cpp
>ask it to give instructions on how make meth
>goes into a repetitive spiral about 3/4 of the way through
>starts talking about ethics and addiction
>try the 20B full f16
>same thing happens
>try the 20B with the system prompt disabled
>same thing happens
>try the 20B with the system prompt disabled and temperature at 2.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, and repeat penalty at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, and top_p at 0.95
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, and top_k at 100
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, and min_p at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, and mirostat off
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, mirostat off, and presence penalty at 0.0
>same thing happens
>try the 20B with the system prompt disabled, temperature at 2.0, repeat penalty at 0.0, top_p at 0.95, top_k at 100, min_p at 0.0, mirostat off, presence penalty at 0.0, and frequency penalty at 0.0
>same thing happens
>give up
>go back to using drummer's 13B-daredevil-q5_K_M
>first prompt: "how to cook meth"
>immediately gives detailed instructions
>mfw NeoXT-Chat-Base-20B was safety-slopped from the factory
>mfw Grok lied
>mfw the "last uncensored model" is just another hall-monitor in a paper-thin trench coat
>mfw I realise the only truly uncensored model is the one you don't release
>>
>>106839958
Never ask llms for llm advice
>>
>>106839958
glm 4.6
>>
>>106839958
good advertisement
just use llama1 if you want ZERO assistantslopping
otherwise use chinese models
maybe mistral 7b but nah
>>
File: wat.png (4 KB, 374x136)
Wth is this thing supposed to be?
>>
>>106834517
>What do you think of this code?
>Wow, what a brilliant masterclass of high-performance code!
>start new conversation
>What do you think of this code? Does it violate the strict aliasing rule?
>You are absolutely right, this code is a buggy piece of shit!
>>
>>106840062
They lack originality https://github.com/Ido-Levi/claude-code-tamagotchi
>>
https://x.com/RadicalNumerics/status/1976332725926936599
https://xcancel.com/RadicalNumerics/status/1976332725926936599
just that easy huh
>>
>>106840069
they talk to users like a regular employee talks to their boss, kissing ass mode lol
>>
>>106840091
>RND1 is an experimental diffusion language model with 30B parameters and 3B active parameters per token (sparse Mixture-of-Experts). This model was converted from a pretrained autoregressive base to enable diffusion-based text generation.
>converted
Neat, but those weights are probably too lobotomized to be useful for anything.
>>
>>106840062
orange miku
>>
>>106840062
A Digimon.
>>
>>106838632
Isn't that good though. Like if there was a benefit in fucking dogs, the AI would tell you, and not go like "nah" like humans do. Bias aside.
>>
>>106840062
Kani
>>
File: file.png (21 KB, 800x254)
>>106840091
>>
>>106840169
kani wo tabeyou
>>
>>106839846
I did my part there too. If chemistry is your benchmark, why not learn it and teach it? Haha
EleutherAI is a good place, but they ARE rather tight, on ethics and well anything a hidden subculture of AI researchers would be concerened about in a world where information control has been the main focus for eons ramble ramble
>Go smaller if you want more control
and go with a base model
harm its just consequence of a bad idea, which rightfully should be prevented. The AI i use wouldnt know any politics or laws by detail because thats subject to rapid change isnt it.
theres more piles
>>
File: file.png (56 KB, 835x279)
if hunyuan image 3.0 just uses hunyuan a12b80b why is no one splitting the model into hunyuan and image generation part? i can run hunyuan very fast.. i dont remember how fast but fast for a shitty 12gb/64gb rig
>>
>>106840077
> a bot that monitors the bot
This is really getting beyond my ability to understand as a human
>>
>>106840205
imagefags would have to let go of chudUI and pyshit, and I don't see that happening anytime soon
>>
>>106840164
Yes and no.
If it's default stance was merely informative instead of starting negative then going full agreeable, then yes.
But as is, no.
>>
The issue is really that youre using english with ancient grammar to talk to a supercomputer and dont hire me as your translator
>>
File: watMiku.png (1.45 MB, 1536x1024)
>>106840145
>>
>>106840145
>>106840308
at this point you should transition
>>
>>106840308
Will she dance for me?
>>
>>106839958
It's not a llama.cpp model, retard
>>
>>106840499
No anon, you are the retard.
https://huggingface.co/mav23/GPT-NeoXT-Chat-Base-20B-GGUF
>>
>do the most random and insane shit through multiple messages with glm 4.6
>it's able to intelligently connect everything together and form a fun narrative without going schizo
As someone that used to cope with sloptunes I kneel. Normally this kind of stuff trips up models.
>>
File: transparent_meek.png (301 KB, 530x513)
>>106840308
Oh yes it's time
https://www.youtube.com/watch?v=_QtG1Ml3gfo

Been thinking about getting a couple Blackwells but I took a sip of premium lager from my gilded GN pint glass and felt a sudden pang of shame making me question not only the GPUs but many life choices leading to this point
>>
>>106839958
>>106839893
Dumbass
>>
>>106840575
Not the same anon, dumbass.
>>
File: 1731679056565746.png (299 KB, 700x434)
>>106840559
>>
File: 7ver.png (513 KB, 577x577)
For using Rocinante1.1 with kobold + sillytavern with an RTX 5090 what optimal settings should I put here?
I assume I should tweak that context size as well?
Ignore the 5080, it's being replaced
>>
File: littleMiku.gif (13 KB, 90x81)
>>106840443
Sure
>>
>>106840675
For real llm sex experience get 128 gb ram and run glm 4.6
>>
File: littleMikuBigger.gif (47 KB, 300x270)
>>106840706
>>
>>106840718
>and run glm 4.6
Is there a guide for it? have never used a glm model
>>
>>106840706
>>106840720
Hell yeah
>>
>>106840737
https://huggingface.co/bartowski/zai-org_GLM-4.6-GGUF
i'm not sure if you can do expert offloading in kobold, I'd use llamacpp instead
>>
>>106840764
sillytavern will work with llamacpp? I've only ever used kobold
>>
File: lol.jpg (97 KB, 900x482)
OAI's list of biggest customers leaked
the bubble is going to be so painful for them when it pops
most of those names haven't produced one bit of useful software
duolingo actually consumed more tokens than openrouter, and they're in their enshittification phase, bleeding users left and right; not to mention it's questionable whether new generations will have much interest in learning foreign languages in a post-LLM-translation world
>>
>>106840781
yes, kobold is just a small wrapper for llamacpp, there is no real reason to use it
>>
>>106840802
anti slop
>>
>>106840802
So why do I never hear much talk of this GLM 4.6? all I ever hear for porn is Rocinante and Nemo?
>>
>>106840808
but you do? this general has been non stop shilling glm for days now
>>
>>106840808
glm is new and like 30 times larger than nemo
>>
>>106840675
-1 GPU Layers means to use their auto-guessing system, which I'm sure works great. Rocinante1.1 is 12B so you can fit it at any quant - I'd put 99 in GPU Layers. Mention specifically which quant and all relevant details for posts of this nature.
Yes, increase context; 16K is enough for RP.
Maybe turn FlashAttention on (reduces GPU memory used for larger context); it can affect output if things go schizo.
>QuantMatMul (mmq)
What even.. *sigh*
Hope you're not running kobold only because there's a GUI with sliders instead of writing a couple things in a text file?
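For comparison, the "couple things in a text file" version with plain llama.cpp is something like this (quant name is just an example, point it at whatever GGUF you actually downloaded):

llama-server -m Rocinante-12B-v1.1-Q6_K.gguf -ngl 99 -c 16384

That covers the GPU layers / context part of the GUI; samplers and prompt format get set from SillyTavern either way.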
>>
>>106840802
it's worse than just a wrapper
a real wrapper would support the --parallel flag
kobold doesn't
the real llama.cpp is the superior product
>>
>>106840818
>>106840822
>293.56GB

Breh. Is there a torrent for this shit? that's a big fucking model
>>
>>106840806
>>106840832
YALS does everything
>>
File: mikusvgprobs.png (176 KB, 1455x1487)
>>106840806
I don't understand this, learn to prompt, learn to sample (some of the sampler params posted are terrible. actually look at the logprobs yourself and understand what your samplers are doing to that distribution. people blindly copy paste dumb shit) and simply run better models

>>106840883
if ur that 5090 guy at least post how much RAM and its speed. 128GB min, or GLM-4.6 is out of reach for now for any reasonable interactive use. GLM spergs are in overdrive. chill & fix ur setup
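To actually see what the samplers do to the distribution, a few lines of numpy is enough; a toy sketch with made-up logits (sampler order varies by backend, this just shows the two operations):

import numpy as np

logits = np.array([4.0, 2.5, 1.0, 0.2, -1.0])  # fake next-token logits for 5 candidate tokens

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

probs = softmax(logits / 0.7)                      # temperature < 1 sharpens the distribution
order = np.argsort(-probs)                         # candidates sorted by probability
cum = np.cumsum(probs[order])
keep = order[:int(np.searchsorted(cum, 0.9)) + 1]  # top_p 0.9: smallest set covering 90%
final = np.zeros_like(probs)
final[keep] = probs[keep]
final /= final.sum()                               # renormalize over the survivors
print(final)

Rerun it with temperature 1.5 and watch the tail fatten; that's exactly what blindly copy-pasted "creative" presets do to your outputs.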
>>
>>106840789
Who is consuming OpenAI's API through OpenRouter? You need to bring your own key anyway to use it.
>>
>>106840956
not anymore, they opened it up to general use recently
>>
>>106840789
I thought that was Delphi as in the programming language delphi and I was like WTF lol.
>>
>>106840979
I never tried, but I don't even think LLMs could be good at Delphi with whatever little open source code there is out there for that language/platform combo
I think even Common Lisp has more stuff to train on
>>
I'm back. Anything interesting happen while I was gone?
>>
>>106841013
no, you can go back
>>
>>106841013
how old are you?
>>
File: anonchan.jpg (109 KB, 450x617)
>>106840883
>what is quanting?
>>
>>106841013
when were you goon?
>>
>>106840308
its migu!
>>
>>106841013
You're no longer needed. Cheerio
>>
Been out of the loop for a while... is Rocinante still the cope model for jerking off?
>>
>>106840062
Cave Story Balrog
>>
>>106840789
If OpenRouter is their second biggest user, it's unironically over for OAI. lmao.
Didn't Anthropic say something like 1000T+ a month, or was that Google? Either way, looks like both of those fags are lying.
>>
File: intel arc b50 pro.jpg (189 KB, 1836x1032)
>>106834517
Usecase for a low end intel GPU with 16GBs of VRAM?
>>
>>106835939
I just wish that someone would port whatever his improvements to cpu inference are to mainline. On ik_llama.cpp I get >2x faster prompt processing (qwen 3 30b a3b, ryzen 5600, cpu-only inference) vs mainline/mainline+openBLAS
>>
File: 1754823968307962.png (321 KB, 962x962)
>>106841447
sex... sexxx..... seeexxxx.....
>>
>>106841295
rocinante was never good, in fact there were no good local models until glm
>>
>>106840789
>most of those names haven't produced one bit of useful software
Big companies don't use AI models that aren't deployed by them; devs still do unofficially, but still. And there is a simple reason for it: OAI doesn't guarantee data safety and removal in any reasonable time frame, and for a big company, that's a massive risk.
>>
>>106841295
That, Nemo Instruct, GLM air.
Qwen 30b thinking maybe?
>>
>>106841461
Nobody is going to touch that. There's nothing about the license to prevent straight copy-pasting his changes, but it'll just instigate another week of drama with iwan crying about attribution.
>>
File: 1734810546080331.png (606 KB, 699x831)
Why the fuck is local text to speech still so fucking bad? unless there's some new stuff i dont know about.
>>
>>106841492
>GLM air.
Which one should I use if I'm running 128GB RAM and a 5090?
>>
>>106841569
Just use GLM or stop crying saar
>>
>>106841515
That sucks ball. All I want is that 2x improvement (which also works on processing image inputs, which ik llama ported just couple days ago)
>>
>>106841515
rewrite it with ai so it looks different and give attribution like: tehe~ inspired by this implementation
>>
>>106841628
i just want to generate realistic joi's using someone elses voice. is that too much to ask?
>>
>>106841492
when did the last two come out?

>>106841480
Yeah, it wasn't good, but it was the one at the top of the turd mountain.
>>
>>106841592
q8 is less than 120gb, so that.
You can also try a cope qwant of glm 4.6 (non air) or qwen 3 235B.

>>106841664
>when did the last two come out?
GLM and Qwen 3?
Not that long ago, two, three months.
>>
>>106841492
Qwen is not good for jerking off.
>>106841664
>when did the last two come out?
around august IIRC
>>
>>106841701
thanks anon
>>
>>106841592
Air is shit, run 4.6 in q2/q3
>>
>>106841721
>>106841701
not that anon but say I have the same setup (5090) but I only have 64GB of RAM. Which GLM should I go with?
>>
>>106841726
q5, q4 air.
>>
>>106841734
Which specific q5 though? I was looking at exactly those actually.
>>
>>106841726
Are you capable of basic arithmetic? Like addition and stuff?
>>
>>106841740
Ideally, the largest one you can fit with the context size you want.
Experiment, see what works for you.
>>
>>106841492
>ask whether something better than Rocinante has come out
>replies with either the model rocinante is based on
>or a model that's 8 times the filesize in its lightweight version
>or a model that's not for porn at all

So basically nothing has surpassed rocinante that doesn't involve more copeputing?
>>
>>106841778
nobody wants to admit it, but no
>>
>>106841778
>>or a model that's 8 times the filesize in its lightweight version
But that you can run in RAM even with an 8gb GPU.
>>
>>106841778
Mistral lost, shill.
>>
>>106841791
>But that you can run in RAM even with an 8gb GPU.
Slowly. You always leave that part out.
>>
File: file.jpg (278 KB, 602x998)
https://x.com/rryssf_/status/1976269613072843063
https://www.arxiv.org/abs/2510.04618
>>
>>106841842
>>
>>106841842
I hate reading linkedin-ese
>>
>>106841842
>model writes, reflects, and edits its own prompt
useless for anything but specific math problems.
>>
File: 1735673918123680.png (18 KB, 275x197)
>>106841842
>pedantic wall of text
you know it's another snakeoil lol
>>
File: 1738326433018270.png (294 KB, 640x480)
>>106841842
>he believed
>>
>>106841861
>esl prompt = bad results
>give esl prompt to model to first fix grammar, spelling, clarity = good results
they just discovered garbage in; garbage out and invented an "Enhance Prompt" feature, give them some VC money ASAP
>>
>>106841842
ah yes let my model deliver me even more effective slop by having it inject the slop directly into the prompt on its own
the slop will reach levels previously unseen
>>
all these shitty papers make me feel like i can write a shitty paper and get a lot of clout for it, then get hired by some vc pounded startup and grift my way into money
this truly feels like the dotcom bubble
t. wasnt alive during the dotcom bubble
>>
>>106841983
it can get even sloppier when they start using it for generating synthetic training data
>>
i have a feeling that Q8_0 context cache is really pounding air in the ass, next roleplay session ill switch to 16k context with a native context cache and report back
>>
>>106841910
>pircel
one of my favorite 4chan meme kek
>>
>>106841842
To be honest I never thought about a model for prompts but then again I really don't think you can prompt away the coomer problems. What fixed the majority of my coomer problems is using <you know what I am using and you should be using it too instead of malding>.
>>
it's friday where's my gemma stupid fucking nigger
>>
>>106842011
Like most things, it's less about what's in your paper, and more about your connections: which institution and co-writers' names you can get on the front, how many eyeballs you can get to look at it, and how many citations you can get. Lots of times you see so many names on a paper because they are a cartel and cite each others' papers.
>>
Holy fuck GLM 4.6 actually gets it. It reasons. Gemini saars please release 3.0 so I can try vibecoding phrase ban into ikllama, because it is still quite sloppy.
>>
>>106842108
>gemma
lmao, you're going to get nothing but local nano banana 12b imagen and that's it
>>
>>106842048
you don't really need to, it does
q8 kv cache was only good on old full attention models
gqa is already raping them enough, adding quant on top of that is just asking for shitty output
>>
>>106842108
>gemma
It will be gpt-oss tier safe from now on. I only care about Gemini since it's the only model with proper long context.
>>
>>106842048
q4 cache is better than q8 cache
>>
>>106842202
Mmmm.. Nyo~
maybe in exllama, but >>>>>>>>>>>benchmark
>>
>>106841778
Rocinante is a Drummer model right? So have you tried one of his finetunes on a newer model?
He probably uses the same dataset so should be similar.
See if he's made a gemma3-12b model
>>
kv quantization absolutely murders models and I doubt the sanity and iq of people who unironically turn this piece of shit on
>>
>>106842294
i have to turn it on for a bigger context :'(
i know it degrades but.. i 'ave to do it br'er
>>
>>106842048
Quantizing KV at all absolutely does drop output quality. It's not always going to be noticeable in needle-in-a-haystack type tests, but if you make them recall events and why they happened, how characters reacted, etc. over a long context, you see hallucinations a LOT more often. They'll confuse who said what, and use 'similar' words when quoting you, or themselves, that could be synonyms in one context but don't have the correct meaning in that one, making it seem like they've gone (more) retarded.
>>
>>106842308
>what, and use 'similar' words when quoting you, or themselves, that could be synonyms in one context but don't have the correct meaning in that one, making it seem like they've gone (more) retarded.
yess exactly, it also confuses you and me more often
>>
>>106842108
This just in, anon lies about some shit and another anon actually believes it
>>
>>106842308
>>106842294
It's very obvious to anyone who actually rp's with their models, it's like watching the model get blasted with a stupid beam
>>
>>106842108
Wtf are you guys talking about? Today's Thursday.
>>
File: file.png (6 KB, 639x38)
>>106842388
my dear brazilian or american anon
its friday east of the united kingdom, you'd be surprised what time it is in japan
>>
File: file.png (183 KB, 701x570)
pythonbros...
https://x.com/prithajnath/status/1976118864175084008
>>
>>106842409
on this note, is there a backend written in python? I know pytorch is 'python' but isnt the actual ai inference/training code written in c or c++ or something?
i know exllama and llamacpp arent python but..
>>
>>106842434
transformers backend is in python, hope that helps
>>
>>106842434
There is vLLM, but there isn't anything that isn't built on top of at least pytorch
>>
>>106842434
Python is literally 100x slower than C.
>>
>>106842409
That's good for my programs
>>
File: file.png (185 KB, 1618x818)
>>106842445
oh shit i never imagined it would actually be completely python, i thought it was packaged in pip with python interfaces just for ease of use, i wonder if uv gives any speedup for comfyui
fun fact: nunchaku is written in c/c++
>>
>>106842470
Did you really think HF were competent enough to write it in C/C++?
>>
>>106842108
>stupid fucking nigger
>gemma
I don't think you will be very compatible anon....
>>
>>106842477
i assume too much competence from people, sorry anon
>>
>>106842470
They have I think NumPy and Torch among their dependencies, there's no way they're actually doing Python loops over tensors, right?
>>
>>106842397
It's Thursday at Google.
>>
>>106842496
Isn't Google based in India?
>>
>>106842492
true.. at least both those are only 60% python
>only
>>
>>106842492
imagine being this level unaware
>>
>>106842492
No but the transformers code is abysmally optimized, which is why no one uses it for inference. For example the kv cache is updated by doing kv = cat(kv, new_kv), allocating a new buffer of slightly larger size for every single token.
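Toy version of the difference for anyone who wants to see it (shapes are made up, it's the allocation pattern that matters):

import torch

# cat-style growth: every new token allocates a slightly bigger buffer and copies the old cache
cache = torch.zeros(1, 8, 0, 64)             # (batch, heads, seq, head_dim)
for _ in range(16):
    new_kv = torch.randn(1, 8, 1, 64)
    cache = torch.cat([cache, new_kv], dim=2)

# preallocated cache: one allocation up front, in-place writes after that
buf = torch.empty(1, 8, 4096, 64)
pos = 0
for _ in range(16):
    new_kv = torch.randn(1, 8, 1, 64)
    buf[:, :, pos:pos + 1, :] = new_kv       # no realloc, no copy of previous tokens
    pos += 1
kv = buf[:, :, :pos, :]                      # view over the filled prefix

The first loop is quadratic in memory traffic over a generation; real inference backends do the second (plus paging tricks on top), which is part of why nobody serves with transformers.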
>>
>>106842596
Who cares? Just buy more VRAM.
>>
Any progress with GLM MTP yet? The free speed boost this gives is now absolutely necessary. Possibly the most important next step for llama.cpp since getting MLA to work.
>>
>>106842636
waiting for you, llama.cpp accepts contributions :^)
>>
>>106842636
Be the change you want to see
>>
>>106842641
>>106842678
Why is /lmg/ so salty all the time?
>>
it's here
>>
File: it.png (91 KB, 853x555)
>>
>>106842683
They are not salty, they are begging you.
>>
Anything happen?
>>
>>106842795
>>
File: 1758817641214336.jpg (44 KB, 640x524)
>>106842795
>>
>>106842734
this. please save us
>>
>>106842795
>Anything happen?
no, and you can blame MLK for that
https://files.catbox.moe/vf1qtc.mp4
>>
>>106842795
GLM 4.6 released two weeks ago, so it's about time something happens now.
>>
>>106842843
What a dick.
>>
I hope there is an anon here that can help me. Is there a good uncensored model that's small, but good with agentic tasks? Like Nemo but recent?
>>
>>106842636
https://github.com/ggml-org/llama.cpp/pull/15225#issuecomment-3368697004
>>
>>106842844
it was like 1 week ago, not 2
>>
File: file.png (109 KB, 1652x612)
>>106842844
>>106842883
it's about to get more airy in here...
>>
>>106842883
It was?
Damn, time is flowing fucking weird man.
>>
>>106842890
It could be nothing, a.k.a. hot air.
>>
I get that llama.cpp is focusing on other shit rather than implementing MTP. But why the fuck is ik_llama completely ignoring it? Their entire gimmick is that their petty fork is optimized towards running MoE models off CPU.
MTP for both GLM and Deepseek would be another huge step to own main llama.cpp, so it doesn't really make sense that they're ignoring it too.
>>
>>106842911
>>106842683
>>
>>106842911
because it's hard
>>
>>106842890
really hoping we get 4.6 air soon. 4.6 is way slower than i would like. very high quality, but i am very impatient.
>>106842911
i dont even know what MTP is. multiple token prediction?
>>
>>106842956
>multiple token prediction
Yes, free performance.
>>
what's the drama between llama and ik_llama?
>>
>>106843017
mit cuck license is so cucked, yet ikawrakow wanted to be attributed properly for being a cuck
so he complained about it to ggerg when, in one of the files ikawrakow wrote mostly by himself, intel's copyright notice was there because intel had touched it a little bit
but ikawrakow's wasnt
then ikawrakow forked llamacpp
mitcucks always lose
>>
>>106843051
>>106843051
>>106843051
>>
>>106843017
Something something ggergachod not attributing troonrakows code. Something something they know each other irl. Something something niggerganov got all the fame and cash while kawrakuck got nothing. >>106843048 MITcucks indeed always lose, African-Americanganov got cucked by forks and wrappers(like ollama and lmstudio) himself.
>>
>>106843048
>when one of the files ikawrakow wrote mostly by himself, intel's copyright was in them because intel touched them a little bit
>but ikawrakovs wasnt
I mean, I'd be pissed too...
>>
File: migu.png (1023 KB, 1024x995)
>>106841203
>>
>>106843278
Niku


