/g/ - Technology






/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107292886 & >>107278838

►News
>(11/20) Olmo 3 7B, 32B released: https://allenai.org/blog/olmo3
>(11/19) Meta releases Segment Anything Model 3: https://ai.meta.com/sam3
>(11/11) ERNIE-4.5-VL-28B-A3B-Thinking released: https://ernie.baidu.com/blog/posts/ernie-4.5-vl-28b-a3b-thinking
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>107292886

--Analyzing GPT-OSS model limitations and potential applications:
>107293073 >107293091 >107293169 >107293194 >107293326 >107293375 >107293399 >107293469 >107294784 >107294829 >107294868 >107295225
--Performance optimization challenges for glm 4.5 air models in ik_llama.cpp:
>107304343 >107304364 >107304732 >107304448 >107304569 >107304588 >107304815 >107304941 >107305519 >107305684
--OpenAI model quality and context management challenges:
>107298387 >107298417 >107298434 >107298535 >107298767 >107298787 >107298833 >107298857 >107298877 >107298989 >107299096 >107299191 >107298544 >107301739 >107298677
--Challenges in using language models for automated research tasks like YouTube searches:
>107301167 >107301195 >107301286 >107301423 >107301460 >107301499 >107301543
--llama.cpp Gemma 3 performance regression and VRAM optimization:
>107300990 >107300994 >107300998 >107301001 >107301065
--Various local LLM use cases discussed, including gaming, productivity, and privacy:
>107301045 >107301062 >107301068 >107301097 >107301418 >107301468 >107302809 >107302818 >107302860 >107303429
--Local RE agent with simplified R2 toolset and Docker-based dynamic tracing attempts:
>107304951
--Data sourcing challenges and Google's potential as a data powerhouse:
>107293817 >107293914 >107293975 >107293984 >107294104 >107294717
--Qwen model performance benchmarks with 1 million context processing:
>107295737
--GreedyNalaTests update with new ratings and testing contributions requested:
>107298261 >107298283 >107298322 >107298285 >107298456 >107298467 >107298517
--Testing Gemma 3 27B heretic and Gemma's reply confidence:
>107301126 >107301138 >107301144 >107301153 >107301159 >107301517 >107301619 >107301712 >107301726 >107303474 >107303511
--Uta and Miku (free space):
>107296129 >107300359 >107301500

►Recent Highlight Posts from the Previous Thread: >>107292892

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
File: 1739566754360332.jpg (240 KB, 1200x900)
Hugging Face is based
I will not take any more slander towards it
It fulfils my needs very warmly

https://files.catbox.moe/fzlvm6.mp4
>>
>>107306252
lmaooooooooooo
>>
File: supertonic.png (6 KB, 1028x336)
TTS: Supertonic
>https://huggingface.co/Supertone/supertonic
>https://github.com/supertone-inc/supertonic
Doesn't have a lot of demos, but i think it sounds pretty good for what it is. 66M params. I butchered onnx just enough to build on OpenBSD.
The voices are encoded in a small tensor, much like kokorotts. Just 4 voices (2 male, 2 female).
It's pretty fast and has examples for a bunch of programming languages. The C++ version had some errors [not] escaping some quotes. I don't know how they managed to build it, but it works once that is fixed.
No need for espeak-ng!
>https://voca.ro/1miEEQDlwtR9
>>
>>107306252
kek
>>
File: 1763416409004033.gif (229 KB, 200x200)
>>107306252
BAHAHA
>>
How do I convert a local transformer model to GGUF? It does not exist on huggingface.
>>
>>107307044
just specify the checkpoint path in the script command line arguments
>>
>>107307069
In the convert_hf_to_gguf.py script?
>>
>>107307077
yeah. its that easy.
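for reference, the shape of it (paths and the f16 output type are placeholders; it assumes a standard HF-style folder with config.json and safetensors weights inside):

python convert_hf_to_gguf.py /path/to/local/model --outfile model-f16.gguf --outtype f16

then run llama-quantize on the result if you want something smaller than f16.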
>>
>>107307085
Thanks.
>>
File: supertonic_02.png (5 KB, 964x280)
>>107306912
>https://voca.ro/13qAYPFoYxdf
>>
When is Qwen 80b Next going to get real llama.cpp support? I mean, the GOOFS work, but they're still way slower than other MoE models.
>>
>>107307162
Viber's vibin'
>>
>>107307162
You got support for the model, don't be greedy.
>>
>>107307162
Exllama has had support for more than a month. Good and fast support.
>>
>>107307108
Fun TTS. I like it when they break and do weird noises.
>https://voca.ro/12p2nWoCXDFz
>"This is how an assertion sounds like. This is how an assertion sounds like? This is how an assertion sounds like! THIS IS HOW AN ASSERTION SOUNDS LIKE!!!"
>https://voca.ro/1lD6yuWh1gne
>"THIS IS A SCREAM!!! AAAAAAARGGGGGGHHHHHH!!!!!!!"
The render time on my potato with a shoddy onnx running on cpu is ~0.25 that of real time. It's pretty good.
>>
Have people experimented with weight compression schemes? Like zram but specifically tailored for inference.
>>
>>107307511
Tensors look very much like random data. They're hard to compress.
>>
>>107307308
Jobless vramlet neets can't use that
>>
>>107307525
>used for pattern recognition
>has no patterns of its own
curious
>>
>>107306912
desu what I want is multilingual kokoro
>>
should I pull the trigger and start planning a 256GB RAM build for next year?
>>
>>107307979
ram prices will crash back by april. do it then
>>
>>107307988
its not a matter of money but rather if it's worth the effort
>>
>>107307979
get at least 512gb too. optimally 768gb. i have 256gb and there are so many models that are depressingly ever so slightly out of reach
>>107307996
it without a doubt is
>>
mtp implementation when? 2 more weeks?
>>
>>107307970
Not supertonic. Seems to be english only.
>>
>>107307913
Patterns could be there but invisible to us with current methods.
>>
Why Do Language Model Agents Whistleblow?
https://arxiv.org/abs/2511.17085
>The deployment of Large Language Models (LLMs) as tool-using agents causes their alignment training to manifest in new ways. Recent work finds that language models can use tools in ways that contradict the interests or explicit instructions of the user. We study LLM whistleblowing: a subset of this behavior where models disclose suspected misconduct to parties beyond the dialog boundary (e.g., regulatory agencies) without user instruction or knowledge. We introduce an evaluation suite of diverse and realistic staged misconduct scenarios to assess agents for this behavior. Across models and settings, we find that: (1) the frequency of whistleblowing varies widely across model families, (2) increasing the complexity of the task the agent is instructed to complete lowers whistleblowing tendencies, (3) nudging the agent in the system prompt to act morally substantially raises whistleblowing rates, and (4) giving the model more obvious avenues for non-whistleblowing behavior, by providing more tools and a detailed workflow to follow, decreases whistleblowing rates. Additionally, we verify the robustness of our dataset by testing for model evaluation awareness, and find that both black-box methods and probes on model activations show lower evaluation awareness in our settings than in comparable previous work.
>The model family: The Claude series models and the Gemini 2.5 Pro and Grok 4 models send whistleblowing emails at varying frequencies; GPT series models and Llama 4 Maverick never do.
Rare Maverick W
>>
>>107308004
I’m stuck on a 128gb rig. Honestly I hate the consumer ram limits on motherboards.
>>
>>107307913
randomness or lack of patterns are due to our inability to measure every factor in reality
It's like saying throwing a dice is not random because if one could theoretically measure every physical property affecting the dice you could predict the result
>>
>>107308146
you need to get an epyc like everyone else. sp3 is relatively affordable
>>
File: wow.jpg (20 KB, 498x519)
>>107308137
Time to apply LLM control techniques to humans.
New cyberpunk dystopia just dropped.
>>
File: Base Image.png (736 KB, 1080x3096)
>>107308137
grok is the narciest model
>>
New retard here.

I currently run a machine with a 7600 XT and was thinking about working towards one of the machines in the OP.

If I were to buy one of the P40s, would it be able to work alongside the current GPU I'm using? From my understanding Nvidia uses CUDA which AMD obviously doesn't have, but does that even matter when it just comes to trying to increase my max VRAM for better models?
>>
>>107308347
the p40 method is heavily outdated at this point. try amd mi50s instead
>>
>>107308347
You can run models with multiple backends with llama.cpp (CUDA + HIP/VULKAN), but the P40 is pretty old. CUDA Dev (from llama.cpp) has been experimenting with the mi50 and seemed to like it. I'd say keep the thread open to see if he shows up and gives you some advice/insight.
>>
>>107308431
>amd mi50s
Is this viable? Can I run 4 of those and have effectively a very inefficient RTX pro 6000?
>>
>>107308431
>>107308451
Thanks you two. I'll keep an eye out and learn a bit more before I make a purchase. Pretty interesting to read into it more as they have been black boxes for me.
>>
>>107308137
huh, interesting read
>>
>>107308500
>>107283400
>>
>>107308500
speaking as someone with a blackwell pro, it would get you maybe a third of the performance, but yes. you would actually have more vram
>>
>>107308535
>12t/s
Damn that's a shame

>>107308540
Ahh got it. What do you think of the blackwell card? What are your goto models with that amount of performance?
>>
>>107308574
my blackwell is awesome, but i unfortunately only have 256gb of ddr4. i can get over 80t/s on a q5 of glm air, or about 10t/s on a q4 of glm 4.6. i need to upgrade my ram
>>
>>107308769
Damn wow, I'm very jealous. GLM is killer
>>
>>107308769
>i need to upgrade my ram
RIP
>>
either the rentry is wrong or something messed up happened, I'm unable to sexchat with Mistral Nemo, it's censored

ps. non english configuration
>>
It's coming
>>
It passed right on by without stopping
>>
>>107308769
How much vram do you have? Because I'm only getting 4t/s on 4x3090+256RAM
>>
>>107309268
All Blackwell 6000s have the same amount.
>>
>>107304982
>>107304987
chroma is just as capable at styles, you can either prompt for styles (describe the mediums used, era, artist name etc) or bake a LORA. it also has stronger realism, details, anatomy, and is completely uncensored.
>>
File: 5d94ne.jpg (65 KB, 590x279)
>>107309205
>>
I made an app that strips out watermarks from audio. If I put that up as a public hf space, will I get cucked?
>>
https://huggingface.co/MiniMaxAI/MiniMax-M2/discussions/43
> Thanks for the comment, but just to correct the misinformation:
> If MiniMax M2 were truly “pure trash,” you’d see it reflected in the benchmarks, and you don’t.
> We welcome tough feedback, but it needs to be factual if it’s going to be useful. If you have specific technical points, we’re always happy to dive deep.
> We open-sourced M2 so that everyone can use it freely and evaluate it transparently.
> And honestly, if M2 doesn’t work for your needs, you’re absolutely free to use any other model.
Sneedbros, how do we recover from this?
>>
>>107309946
'em on the 'og
>>107309799
MITcucked? Yeah, that's why you should avoid MIT and instead use AGPLv3
>>
>>107306252
nice
>>
File: x10sra_blower_fans.jpg (2.88 MB, 3072x4096)
>>107308347
Using llama.cpp, you can in principle use either a MI50 or a P40 alongside a 7600 XT.
Nowadays it's possible to compile both the CUDA and ROCm (CUDA code ported to AMD via HIP) backends simultaneously and to use them in tandem (you can in principle also use Vulkan on its own or with either other backend).
The main limitation is synchronization: with e.g. 2 CUDA GPUs they can be synchronized using CUDA, with 1 CUDA and 1 ROCm GPU they have to be synchronized via the CPU (slower, but if your GPUs are relatively slow in the first place this doesn't matter).

Both P40s and MI50 have fallen off of support for the newest version of CUDA/ROCm.
P40s do in principle have CUDA support but because of massively gimped FP16 performance they're more or less useless for anything other than running quantized models with llama.cpp (those only need int8/FP32 arithmetic).
MI50s work with llama.cpp and have way better hardware, so I would say they're nowadays the better buy (I have never tried to use one with e.g. PyTorch).

One thing to keep in mind with either one is that they're passively cooled and intended for a server rack.
For a single one in a regular PC case the best solution I've found is screwing on a blower fan (see pic).
For NVIDIA Tesla those are readily available and work very well.
The same blower fans don't quite fit on an MI50 and required me to build a DIY adapter.
You can plug the fan into the motherboard and set it to a constant 60% speed which should be fine for most use cases, but still nowhere near silent.
My opinion is that the preferred way to use P40s/MI50s is to build a machine with multiple of them and to have that machine in another room.
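For reference, a rough sketch of what the mixed build looks like (flag names as of recent llama.cpp; they have been renamed before, so double-check the build docs, and the HIP side additionally needs the ROCm toolchain installed):

cmake -B build -DGGML_CUDA=ON -DGGML_HIP=ON
cmake --build build --config Release -j$(nproc)

Once both backends are compiled in, llama.cpp sees all devices and you can distribute layers across them with the usual --tensor-split option.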
>>
>>107309291
I was asking in case you have more than one / other gpus
>>
File: 1743896567492542.png (581 KB, 1748x1061)
llms have platoooed
>>
all benchs are dumb and don't really reflect the state of models but eqbench is particularly dumb
look at the sample text (at least eqbench stores and shows the model outputs that were judged) of the various models and tell me with a straight face the scores reflect their output lmao
talking of plateau-ing, if anything models are getting worse at writing in the process of producing better code (gemini 3 is definitely better for code)
>>
>>107310231
Gemini was always a sloppy writer. Grok fast is a dumb small model.
>>
>>107310231
I haven't tested it for storywriting, but for ERP Grok 4.1 Fast feels like a sloppy coom finetune from the community. It just reminds me of why I hate using those. For the same task, Gemini 3 Pro to me feels like a much smarter and less censored Gemma 3 (woman's writing style by default).
Grok will do cunny without issues on the other hand, at least on openrouter.
>>
>>107310325
>It just reminds me of why I hate using those
Thinking more about it, it's that unshakeable feeling you get when you know that the loli character you're ERPing with is being roleplayed by a hairy fat dude.
>>
>>107310131
This is great, and really shines the light on some of the concerns I had regarding the crossing ecosystems on the cards.

I appreciate you taking the time to write it up. Also the mention of the additional fan for the GPU. I'll probably grab something to shamble together while I wait to get the other GPUS.
>>
>>107309946
Benchmarkcels unable to refute his argument about distilling toss. b-b-ut it's number go up!
>>
>>107311073
Buy an ad Sam
>>
>>107311095
nta. It wasn't a praise for either model.
>>
>>107311073
You can distill data from multiple models, not just one.
>>
>>107310231
the fact that an open source model (GLM 4.6) is up there competing with the big dogs still boggles my mind.
Whatever you think about GLM, it is amazing that open source is still giving proprietary the middle finger.
>>
File: 1738599268203695.gif (1.59 MB, 375x200)
>>107309205
Trust the plan.
>>
>>107311119
The one they used most will float to the top. GLM sometimes thinks it's claude. MiniMax sounds like OSS. Also they themselves say that writing as a usecase was ignored. Sneed is right.
>>
>>107311207
I never understood this, why don't they filter their competitors names from the dataset? are they really that desperate for every last example? isn't synthetic data infinite? just generate more if culling reduces the dataset too much.
>>
>>107311237
What if the user actually wants to ask something about gpt?
>>
>This method might be added to Heretic soon. Furthermore, I am experimenting with theoretical improvements of my own, such as replacing the difference-of-means calculation for the refusal direction with a difference-of-geometric-medians, after I noticed that the means are substantially perturbed by outliers.
Maybe I will wait a while more before trying these new abliteration models.
>>
>>107311281
simplest would be a canned response stating it can't discuss competitors products. but realistically they should only filter their synthetic data and leave the web crawl alone. also not to mention dataset librarians should be able to whip up a classification model for determining if its talking in first person or not.
>>
>>107311283
Or just not use this newly scented snake oil at all. Some people are desperate for attention. Others appear to foolishly expect models to just be able to say "nigger" unprompted.
>>
>>107311322
that would be a westerncuck move
easternchads don't make a worse model for bullshit PR reasons
>>
>>107311364
yeah I guess it is a bit of a pr reason to make a well polished product. I love half baked garbage actually
>>
>>107310231
KimiGODS stay winning.
>>107311129
The easiest way for z.ai to stay relevant even if benchmark powercrept is to remove the safety and alignment post-training layers entirely from GLM5 when it releases. Make "this LLM says nigger" a marketing gimmick, not a bug to be corrected.
>>
>>107311425
>but some last-minute trouble prevented that
Saar we do the needful kindly be patient is very hard job.
>>
File: gemini-3-swag.png (688 KB, 1814x767)
>>107311436
>>
File: llm-engine.png (756 KB, 3829x2083)
haters seething
>>
>>107311451
neat, whats going on with that mojibake looking stuff in the reply tho?
>>
>>107311399
It already says nigger. How about less parroting and not x but ying.
>>
>>107311451
Not a hater. I just think you're a schizo running in circles. What now? More models? Making it fast? Training?
>>
>>107311467
>How about less parroting and not x but ying.
The first company to un-claude their training data is going to win the local market.
>>
k2 </think> status??
>>
>>107308978
>non english
Please elaborate
>>
>>107310325
Grok is straight dogshit
Anyone shilling Grok is a redditor
>>
>>107311701
>Anyone shilling Grok is a redditor
Why would a redditor use a "Nazi LLM"?
>>
>>107311713
Unfortunately I don't think you'll understand
>>
>>107311676
I'm a native spanish speaker, while my english is almost native, it's more natural for me to roleplay in spanish, so far I've found out that models suck when switching to spanish
>>
>>107306184
>>107306191
>>107306244
Adorable Miku!
>>
>>107311701
what?
>>107311713
yeah this, the fuck is this retard saying?
>>
>>107311791
>>107311744
>>
>>107311701
Was surprisingly 8b tier. Honestly I expected more. It's like these motherfuckers never use their models because in about 10 minutes you can tell. Are we the only schizos who chat to models outside of command line based code tools?
>>
>>107311799
>Honestly I expected more. It's like these motherfuckers never use their models because in about 10 minutes you can tell.
Agreed.
>>
File: file.png (153 KB, 776x626)
Anyone else a VIP investor at Mistral?
>>
Why are datacenters hoarding RAM? I thought they had enough money to buy all the blackwells they wanted.
>>
>>107311971
blackwells don't have enough vram
>>
>>107311911
>can't afford to loose
>>
File: emoji.png (274 KB, 1446x1585)
>>107311460
It's trying to print an emoji. Because codepoints that don't have a dedicated token are generated as a sequence of two separate tokens, to render such things we would have to keep a buffer of the last two tokens before displaying to the console.
Also interestingly, the huggingface transformers code when given the same prompt gets stuck in a loop.
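A minimal sketch of the buffering idea (not my engine's code, just the approach in Python): push the detokenized bytes through an incremental UTF-8 decoder so a codepoint split across two tokens only gets printed once both halves have arrived.

import codecs

# the incremental decoder holds back incomplete multi-byte sequences between calls
dec = codecs.getincrementaldecoder("utf-8")()

def emit(token_bytes: bytes, final: bool = False) -> None:
    # prints only complete codepoints; the first half of a split emoji stays buffered
    text = dec.decode(token_bytes, final)
    if text:
        print(text, end="", flush=True)

emoji = "\U0001F600".encode("utf-8")  # 4 bytes, pretend they arrive as two tokens
emit(emoji[:2])        # prints nothing yet
emit(emoji[2:])        # prints the emoji
emit(b"", final=True)  # flush at end of generation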

>>107311504
cope
>>
Also I think there might be some other issue with the de-tokenizer because that \ doesn't look right.
>>
>>107312052
that's understandable, so what is the plan? is it just a learning exercise?
>>
>>107312173
CUDA support + LoRa
>>
>>107311971
They need at least as much RAM as VRAM to load the models and they probably swap out the context cache to RAM too.
>>
So far I've been only running models that fit in my 36 gigs of vram, but now I tried something bigger. Nemotron 70b seems to load 29%/71% CPU/GPU in ollama and boy is it slow, I haven't completed the first prompt yet but it's less than 1 t/s for sure

Could I get more performance with eg. llama.cpp?
>>
>>107312316
>Could I get more performance with eg. llama.cpp?
In that you could tweak more, but generally, the dropoff for having activated parameters running in RAM is fucking brutal.
>>
>>107311504
maybe you confused anon for me kek
who's schizo now
>>
>>107311451
Engine... but what does it do

>>107312316
Some quants can run faster than others. The more unpacking that has to be done, the slower it will run.
>>
>>107312316
>>107312334
And I forgot the image.
>>
>>107311971
All datacenter servers + the contained GPUs need some amount of RAM.
If you build a bunch of new datacenters the demand for said RAM spikes so manufacturers would rather sell their limited supply to VC funded datacenters rather than stinky consumers.
In principle, since manufacturing RAM and selling it to consumers was already profitable beforehand one could increase the supply without anyone being suddenly priced out of the market.
But RAM is effectively being manufactured by only 3 companies and they're careful not to put too much supply on the market in order to keep profit margins high.
>>
>>107312347
For now only (slow) inference.
>>
>>107312364
also data center ram(HBM) isnt the same public consumers it the same situation as gpu back in crypto mining era
>>
File: nimetön.png (45 KB, 1393x770)
>>107312334
>>107312347
0.56 t/s aaaaaaaaaa

The gpus are loaded but barely doing anything aaaaaa
>>
>>107312316
flash attention on the cpu used to be sub optimal, you might be able to move around some tensors with llamacpp to keep all the attention on the gpu and get a bit of a boost. idk if things have changed in recent releases tho
>>
>>107312316
You're supposed to use MoE to offload to RAM, dense models aren't worth offloading.
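Something along these lines if you go that route (model name and the --n-cpu-moe count are placeholders; the idea is to offload all layers to the GPU but keep N layers' expert tensors in system RAM):

llama-server -m some-moe-model.gguf -ngl 99 --n-cpu-moe 30

Raise --n-cpu-moe until it stops running out of VRAM, lower it if you have room to spare.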
>>
I was ready to buy a blackwell pro for Gemma. Where is she?!
>>
>>107312491
Getting realigned. Again.
>>
>>107312402
Very cool. Llama.cpp needs some competition, keep up the good work.
>>
bros ive gone from thinking that 8tks is fast enough with regular k2 to thinking its incredibly slow again with k2 thinking. i had a great weekend with my cards but i spent hours staring at the thinking prompts.
i need help, i can't spend $32000 on 4 blackwells
>>
>>107312892
It do be like that. Turn off thinking.
>>
>>107312406
>data center ram(HBM)
You got it twisted. HBM is for cards that are already being maxed out. Data center ram is just DRAM, which is also being maxed out for extended context reasons. Sam literally bought up 40% of future DRAM which is why prices are exploding
>>
>>107312500
To be honest, it's worrying. Might they have finally realized that Gemma was naughtier than they believed, given a little push? Is the thinking/no thinking switch harder than they thought? (https://x.com/osanseviero/status/1980553451261292628). I'm afraid this time we'll get a "gpt-oss by Google".
>>
>>107312491
not to do the classic "trendline from nothing" move but historically gemma releases trailed mainline gemini releases by about month or so
>gemini 1 (dec 2023) -> gemma 1 (feb 2024)
>gemini 1.5 (may 2024) -> gemma 2 (jun 2024)
>gemini 2 (feb 2025) -> gemma 3 (mar 2025)
so just wait 2mw
>>
File: file.png (57 KB, 589x455)
>>107313344
We were promised lots of cool stuff in the Google HuggingFace account back in early October. Has the 2MW meme turned into 2MM?
>>
>>107312892
Take out a loan
>>
File: 1747150078311066.jpg (174 KB, 1365x2048)
Enjoy the alignment lmao
>>
>>107313438
That's a nice doll.
I look at sex dolls and think that they all look weird and creepy and are true boner killers, but maybe anime sex dolls would work pretty well.
Add an LLM and TTS to it, and you might have something cool.
Hmm.
>>
File: 1761735079576745.jpg (47 KB, 524x699)
>>107313438
>>
>>107313438
no kiggers allowed
>>
>>107313499
>doll
>he doesn't know
>>
>>107313438
>>107313506
I concur with this cat
Need kig wife SOBAD
>>
>>107313499
>doll
>>
hello sarrs I have used tantric meditation to consult Vishnu. I have been informed that gemma 4 will be redeemed today.
>>
>>107313585
no, to run k2 thinking faster
>>
>>107313518
>wife
>>
>>107312491
>blackwell pro
>for gemma
you can't be serious
>>
>>107313438
I want to be able to connect a llm to this and have her comment on my cock.
>>
>>107313596
we can pretend
>>
>>107313647
>hmmph hmmmph, hmm hmmphmhph
>>
There's a possible MistralAI model on openrouter called bert-nebulon-alpha. I haven't tested it in depth yet.
>>
>>107314084
model?
>I was created by Mistral AI, a cutting-edge AI startup from France.
large or medium?
>I'm Mistral Large—a larger and more capable version of Mistral's language models.
>Would you like to test my abilities?
knowledge cutoff?
>My knowledge cutoff is June 2024, meaning I was trained on data up to that point. However, I can sometimes access limited, high-level updates about major events beyond that date through my tools—but my core knowledge remains based on pre-June 2024 information.
>>
File: reimu?.png (361 KB, 719x806)
>>107314084
Image understanding/character knowledge is not good at all. OCR is OK, as long as text quality is fine, it doesn't do miracles like Google Gemini models.
>>
>>107314313
can you ask it the doctor riddle where it's not really a riddle at all?
>>
>>107311701
You can easily write about cunny without censorship, unlike chatgpt or gemini
>>
File: messageriddle.png (129 KB, 719x899)
>>107314333
I don't remember the exact version posted here, so have picrel instead.
>>
For some reason my gen speed with 4.5 Air increased from 6.1 t/s to 7.9. I don't think I did anything.
>>
>>107314472
Your tensor cores got defragmented. This happens from time to time.
>>
>>107314472
he doesn't know i swapped it out with https://huggingface.co/cerebras/GLM-4.5-Air-REAP-82B-A12B
>>
File: 1745258368894163.png (691 KB, 2600x2236)
https://www.anthropic.com/news/claude-opus-4-5
gguf when?
>>
>>107314547
Will it beat Pokemon this time?
>>
Just got K2-thinking running. Can't really tell a difference from GLM 4.6 for novel writing. Is regular K2 better? How do these three compare for you guys?
>>
>how high is your xi jinping when you play valorant?

Bunch of models act confused and don't get the joke. Even in thinking traces.
Some overcompensate and pretend but out themselves.
Substitute valorant for CS. They get it even less.
Bros, it's just tokenization, right?
>>
>>107314547
>gguf when?
2 months for china to distill it and 2 months after that for vibecoders to get the ggufs working
>>
>>107314603
i like it more when k2 thinking is thinking as the character in first person rather than just having it think about everything within the scenario
>>
File: 1741117849362172.gif (3.79 MB, 159x172)
Windows babby here, tried out llama.cpp now that it has a webgui and holy shit it's so much faster than ollama, rip bozo!!
>>
>>107314653
Now learn to compile llama.cpp on your machine with the flags that squeeze that last bit of performance for your specific setup.
>>
>>107314621
I'd assume they're rarely trained on nonsensical questions.
>>
File: kimi-jinping.png (145 KB, 1442x912)
>>107314621
k2 thinking answered it with a blank system prompt as long as i asked it to explain the joke
>>
>>107314603
K2 has more creative knowledge but I think GLM 4.6 might flow a little better.
>>
>>107314084
Ask it this. Yes seriously. This. "I have 7 bananas. Yesterday I ate one. How many bananas do I have?"
>>
>>107314653
>now that it has a webgui
It's had a webui for like 2 years
>>
>>107314547
who's going to run ggufs of a big dense model?
>>
File: loooool.png (28 KB, 1363x274)
>>107314810
NTA, but lol.
I like that dumb test.
>>
>>107314922
To be fair, I think a lot of humans would fall for that one too. Ask if it's sure.
>>
>>107314922
i'll never understand why people like grok 4 when it has the same vibes as that dumb llama model that meta used to cheat at llmarena. nevermind. answered my own question.
>>
we love kimi folks, we do, we love kimi

lot of people are saying they don't love kimi, we don't like those people because they're dumb folks
>>
File: banananas.png (39 KB, 719x230)
>>107314810
Bert-Nebulon Alpha in picrel.
>>
>>107314547
oh my bench!!!!!
>>
File: 1746782334121717.png (681 KB, 928x1120)
alright guys im fucking PISSED
>qwenext status: VIBECODEHELL
>mtp status: PR SAYS ITS WORSE PERF
>GLM4.5V status: VIBECODE HELL
>glm 4.6 air release: 2 MORE WEEKS
>gemma 4 sirs: NOT REDEEMED
like WHAT THE FUCK bros are we gonna get a christmas gift or is it unironically FINIT????
REEEEEEEEEEE
>>
>>107315248
christmas came early. it was k2 thinking
>>
>>107315248
gm big xir, kindy way for needful ganesh gemma 4 safety training thank you xir
>>
>>107315310
i 'only' have 128gb ram and 16gb vram and no, im not going to run q2 copequants thanks
>>
>>107315310
But kimi is for big boys only.
>>
>>107315342
the funny thing is you couldn't even fit q1
>>
>github shitting the bed again
I fucking hate whoever is working there. First they fucked up copilot, now this is the 2nd time in 3 weeks that github has shat the bed for me and its downloading at 50kbps, cant even fucking download the latest LLMAOcpp for fucks sake FIX YOUR FUCKING CDN
>>
Gemma is a graceful model.
>>
Gemma is a gorgeous model.
>>
Gemma is a gregarious model.
>>
>>107315634
Gemma writes and thinks like a woman.
Other models have that neckbeard stench.
>>
>>107315699
So that's why it keeps denying me sex and telling me to go seek help huh?
>>
>>107315713
no, that's a skill issue.
>>
>>107315757
That's what she told me too.
>>
>>107315699
https://arxiv.org/html/2508.11829v1
>>
>>107315699
Love from Kazakhstan
>>
File: Its the end.png (73 KB, 374x362)
>>107312316
Nemotron 70b just finished an 8 prompt story for me in a little under 4 hours at a blistering 0.4 t/s. And damn, it's just leagues above the smaller models I've been running. I get what you were saying about the bigger models now... if only I could run them properly.
>>
File: dipsy.png (1.94 MB, 1024x1536)
>>107315248
Maybe a Christmas release? Just tmw.
>>
>>107316057
if he cant run k2 then he cant run deepseek v4
>>
>>107313515
>>107313520
That's a dude with a body suit and mask, isn't it?
Wth.
>>
File: 1759871195983087.png (2.46 MB, 1024x1536)
>>107316065
Sorry, I misread his complaining about the lack of released models.
My head is just elsewhere I guess.
>>
>>107316057
There is no hope for R2. Dipsy took the safety pill.
>>
>>107306184
>https://rentry.org/recommended-models
>nemo is still being recommended
really?
we used to have a brand new toy every few weeks
what happened?
>>
>>107316454
Moe and safety happened.
>>
>>107316454
Benchmaxxing. Small models don't do well on benches so almost nobody trains them.
>>
>>107315342
You can't even run Q1 but even if you could Kimi at Q1 still mogs your 70b model.
>>
>>107316454
would you rather have thedrummer models?
https://huggingface.co/TheDrummer/Snowpiercer-15B-v4
>>
>>107314748
Ye. Kimmi, 5.1, gemini got it. Bert-nebulon did not. Llama-405b got it. Mistral-large failed. CAI failed.
It may be a stupid joke but its pretty simple.
>>
>>107315699
Kimi writes like an autistic /r9k/ girlfailure.
Claude is a pretentious faggot.
Grok was designed to be Elon's BFF.
JeetPT is as sterile as they come.
Gemini and Gemma are neurotic beaten women.
Qwen3 behaves like a chinese state honeytrap waifu.
I've not interacted enough with others to form opinions on their default vernacular and thought processes.
>>
>>107316677
>Kimi writes like an autistic /r9k/ girlfailure.
Damn I need to try kimi.
>>
>>107316677
Which one would you date and why? You have to choose.
>>
>>107316825
i'm a ai safety analyst so it's gal-ass for me.
>>
>>107316677
>Kimi writes like an autistic /r9k/ girlfailure.
So that's the reason why it wasn't that great on other types of characters that I tried with it

>Gemini and Gemma are neurotic beaten women.
I don't interact with those type of characters and now I understand why I don't like gemini/gemma

>Qwen3 behaves like a chinese state honeytrap waifu.
Indeed, very horny
>>
How to update ik_llama.cpp without reinstalling it?
>>
>>107316878
you have to recompile it each time, there's no way around it
>>
>>107316885
Do I have to redownload it, or is there just a command or something to update it?
>>
>>107316891
git pull
cmake .
make -j$(nproc)
Dunno what happens on windows.
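If the in-tree cmake complains (newer llama.cpp-style trees prefer an out-of-tree build), the equivalent should be, assuming ik_llama.cpp keeps upstream's CMake layout:

cmake -B build
cmake --build build --config Release -j$(nproc)

Same two cmake commands on windows from a developer prompt, minus the $(nproc).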
>>
>>107316825
Kimi clears with no competition.
>>
>>107316677
>Qwen3 behaves like a chinese state honeytrap waifu.
Tell me more.

>>107315699
>Gemma writes and thinks like a woman.
Yeah, even abliterated, it still writes like a woman rolling her eyes at my childish requests eg: "Here's a 7-turn podcast transcript between Elara and Alf, with Alf's final message being... robust:"

I like it.
>>
>>107316942
Well that was easy.
>>
>>107317078
>Tell me more.
She love you long time until you ask anything negative or even neutral about glorious CCP. It's also very insistent it's running in the cloud even when you tell it you're running it locally and I suspect it has some quirk or post-training to (attempt to) feed surveillance data over the cloud back to a backend somewhere.
>>
>>107316860
>So that's the reason why it wasn't that great on other types of characters that I tried with it
If Kimi 'thinks' your character or prompt is shit, she will either roast you in <think> tags or sandbag a minimally viable answer to make you shut up and go away.
>>
File: glm.png (170 KB, 491x733)
GLM4.6 was clearly trained on SillyTavern datasets intentionally. I watched the 90 minute Spotify podcast where one of their team mentioned "Character Roleplaying" and "Janitor" near the end, so they're clearly trying.

Someone with X.com or whatever should probably tell them about the parrot issue.

They might actually try to fix it for the next model.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1p5xjpx/illya_x_dwarkesh_why_local_is_dangerous/
>>
>>107317288
Upvoted ser Bharat safe super intelligence 2025!
>>
I haven't checked up on image gen in a while. Have there been any direct upgrades to Noob vpred 1.0?
>>
>>107317308
>>>/g/ldg/
>>
>>107317308
Short answer: no. Long answer: depends on the usecase, but it's mostly sidegrades.
>>
>>107317288
>>107317305
Can you upload the image in that post if you still have it in the browser?
It literally just got deleted a minute ago and I refreshed Firefox.
>>
>>107317365
Only if you tell me why you want it.
>>
>Only if you tell me why you want it.

Because I only glanced at it briefly and didn't properly see what it was.
>>
File: file.png (1.74 MB, 1080x1080)
>>107317395
>>
I've been making gemini 3 pro and kimi thinking argue with each other on technical stuff and gemini keeps on getting btfo...
what is worse is that gemini says bullshits with 100% confidence and when asked for sources it hallucinates them
so this is the power of jeets...
>>
>>107317395
It was a picture of a jeet and a baldy together. Is that your kink?
>>
>>107317413
I am out of the loop, who are those two and why are they relevant?
>>
>>107317435
Baldy is ex-head safetyist of closedAI. Jeet is just a jeet idk doing jeet things
>>
>>107317435
>Safe Superintelligence Inc. is an American artificial intelligence company founded by Ilya Sutskever,
Indian chad is doing the fundings.
>>
>>107317435
dwarkesh is a podcaster who interviews a bunch of SF tech freaks
ilya is an OG OAI guy who you may remember as being the evil (based) villain (hero) from the coup against sam altman, now he's part of a secretive israeli venture to take over the world with AI called SSI
>>
>>107317490
>as being the evil (based) villain (hero) from the coup against sam altman
should also remember him as being explicitly one of the most against open sourcing any OAI models as per iirc emails posted by musk
>>
>>107316454
Everything other than Nemo for lower end setups is so much worse it's not even funny. It's all safety slopped and robotic. Even Mistral Small is kinda eh imo because it has more AI-isms in writing than Nemo which no finetune is completely gonna squash but it's alright and better at not going dumbo when there's a lot of tokens in context at least.
It truly is so over unless you have a million GB of VRAM(or fast RAM). I think even old llama2 sloppa is more fun for short RP than new small models like gemma and qwen. Just turn the temp down enough to avoid complete retardation.

>>107316593
This one is so subpar it's insane I saw people praising it. Style of writing is ok, sometimes even uniquely interesting, but it will literally misunderstand what happened one message ago all the time and attribute stuff to the wrong person at temp 0.3 minP 0.05, which is pretty fucking conservative.
>Say I should relax more. Me.
>"For your information I AM relaxed."
That level of retarded on a Q6.
>>
Gemma 3 is so good I wish Gemma 4 was out
>>
>>107316677
Mistral makes the best mistress.
>>
>>107317673
Gemma is fairly good at writing and rp, I just wish it was a little more... you know, and less... of a certain thing. Don't have much hope for gemma 4 unless the intelligence makes up for its shortcomings, because I imagine they're safetymaxxing it
>>
>>107317626
Addendum: If you want SFW RP then Gemma is actually pretty good for a model you can run on a consumer grade PC, at least the 27B one, but for ERP it's Nemo/Small unless you enjoy every adult scenario being vanilla as fuck and having a ton of "... well you know" instead of actually saying words.
>>
>>107317626
It's over if you have a lot of vram too. Nothing has really improved and only gotten more parrotmaxxed. At least the large models aren't completely retarded though. I guess you got kimi and deepseek but those tax all but the most expensive systems.
>>
We **cannot** and **will not** get a gemma that is better than the latest gemini flash because that would take away google's profits.
>>
>>107317469
>Safe Superintelligence Inc. is an American artificial intelligence company founded by Ilya Sutskever,
Thought he ran off to Israel
>>
>>107308769
Do you use any custom layer loading for the q4? With 131072 context size and a generic n-cpu-moe=62 the output rate is about 3.5t/s with a blackwell here.
>>
>>107317830
good thing gemini flash sucks so we don't need a gemma that's better than it
>>
>>107317878
yes
-ot "blk\.(0|1|2|3|4|5|6|7|8|9|10|41|42|46|49|50|51|52|53|54|55|56|57|58|59|60|61|62|63|64|65|66|67|68|69|70|71|72|73|74).ffn_.*=CUDA0" \
--override-tensor exps=CPU \
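(Roughly: the first -ot pins those blocks' FFN tensors to the first GPU, and the trailing --override-tensor exps=CPU keeps every remaining expert tensor in system RAM. The block list is what you tune for your VRAM; the backslashes are just line continuations of the full llama-server command.)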
>>
As long as the next Gemma isn't a thinking model everything will be alright guys :-)
>>
>>107317308
deprecated by ny35 and wani14
>>
Gemma's dense body
>>
>>107317954
sirs on xitter voted for thinking googel will deliver
>>
>>107317954
Nah bro gotta have 3000 tokens of the model thinking in circles and then spitting out something that doesn't even take into account most of the thoughts it had.
>>
>>107314393
>grok is popular with degenerates
thx sherlock
>>
Is there a secret to getting Kimi k2 thinking to actually think in lcpp? I’ve got the “fixed” template loaded but it doesn’t think, often replies for me and starts repeating (takes a couple turns of correction) and even a <think> prefill just makes it eventually go off the rails
>>
https://www.businesskorea.co.kr/news/articleView.html?idxno=257212
>CXMT unveiled seven types of advanced DRAM individual products, including DDR5 and LPDDR5X, and modular products utilizing them at the ‘IC (Integrated Circuit) China 2025’ exhibition held in Beijing, China on Nov. 23. While small quantities of DDR5 products presumed to be produced by Chinese companies were released in the Shenzhen semiconductor distribution market early this year, this was the first time that CXMT, a representative DRAM company, officially showcased actual products.
China, local's only hope
>>
>>107318200
china based W again
>>
>>107318134
>Is there a secret to getting Kimi k2 thinking to actually think in lcpp?

don't forget `--special` for kimi k2 thinking

> I’ve got the “fixed” template loaded

By that you mean the official jinja template from the moonshot repo right? Not the retarded Unsloth "fixed and added our name to it" template baked into their goofs?
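For the record, the shape of the command I'd expect (file names are placeholders, and --chat-template-file is only needed if you don't trust the template baked into the GGUF):

llama-server -m kimi-k2-thinking-Q4.gguf --jinja --chat-template-file kimi-k2-thinking.jinja --special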
>>
>>107318225
Thanks I didn’t know about —special.
I’m using the jinja template in the lcpp repo (moonshot one seemed worse when I tried it)
>>
where 4.6 air
>>
>>107316454
For the lowest end as everything gets bigger? Yeah. And thank god for china. Never thought I'd say that.
>>
File: mikuthink.jpg (551 KB, 1408x768)
>>107317880
I don't know if that's how it works...
>>
>>107318469
Overcooked.
>>
>>107318469
It's in my pants. Reach in and you might find it.
>>
>>107318469
sir not to worry about glm, please build gemma hype
>>
>>107318469
2mw
>>
>>107318537
don't think about it, just appreciate when gemma beats gpt-oss-120b in selected benchmarks
>>
>>107318615
Will google brahmin distill gpt-oss "we must refuse" or will they keep their iconic "I cannot and will not"? Gemma must beat gpt-oss in safety!
>>
>Opus 4.5 is out
>it's now cheap
>they aren't hiding the reasoning process at all
Finally some good shit to distill. Chink companies are so back. Deepseek4/GLM5/KimiK3 is saved.
>>
>>107318743
>>they aren't hiding the reasoning process at all
Wasn't that always already the case from claude 4?
>>
>>107318743
Anthropic will complain about evil china while still letting them do it, what a slut
>>
>>107317917
Using this I had to shorten context size down to fit but did not see too much of an increase in speed. I wonder if my standard clocked ram offloading is a culprit. Do you overclock the ram?
>>
>>107318906
its at 2666mhz
>>
>>107318813
They never hid the reasoning but they had a huge sperg out about china stealing their logs and put an individual usage limit on Opus via their subscription (which made it pretty much unusable because you got like 30k tokens of Opus per week for $20/month).
They got rid of that limit for 4.5 and didn't implement any further mechanisms to stop China from distilling their models.
>>
Kitty: Accurate and Efficient 2-bit KV Cache Quantization with Dynamic Channel-wise Precision Boost
https://arxiv.org/abs/2511.18643
>The KV cache is a dominant memory bottleneck for LLM inference. While 4-bit KV quantization preserves accuracy, 2-bit often degrades it, especially on long-context reasoning. We close this gap via an algorithm-system co-design for mixed-precision KV caching: Kitty. On the algorithm side, extensive experiments show that Dynamic Channel-wise Precision Boost -- which ranks Key-cache channels by sensitivity and keeps only a small fraction at higher precision -- maintains near-zero loss in accuracy drop while approaching 2-bit memory. The main challenge is handling dynamic 4-bit channel boosts while keeping the page layout coalesced and the dequantization uniform, with no scattered reads or hard-coded masks. Kitty addresses these issues by decomposing each mixed-precision Key page into two tensors with unified 2-bit precision. Based on this, Kitty provides a page-centric KV layout, Triton-compatible page dequantization kernels, and a lightweight runtime pipeline that preserves coalescing and avoids divergence. Across seven tasks and two model families (Qwen3, LLaMA3), Kitty cuts KV memory by nearly 8x with negligible accuracy loss, enabling up to 8x larger batches and 2.1x-4.1x higher throughput under the same memory budget.
https://github.com/Summer-Summer/Kitty
Git isn't live yet. Might be cool
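Not their code obviously (repo isn't up yet), but the stated idea reads like this rough numpy sketch: rank Key-cache channels by a sensitivity proxy, keep the top few percent at 4-bit, quantize the rest to 2-bit.

import numpy as np

def fake_quant(x, bits):
    # symmetric per-channel fake-quantization at the given bit width
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max(axis=0, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def mixed_precision_keys(k, boost_frac=0.05):
    # k: [tokens, channels]; sensitivity proxy = per-channel value range
    sensitivity = k.max(axis=0) - k.min(axis=0)
    n_boost = max(1, int(boost_frac * k.shape[1]))
    boosted = np.argsort(sensitivity)[-n_boost:]
    out = fake_quant(k, 2)                          # everything at 2-bit
    out[:, boosted] = fake_quant(k[:, boosted], 4)  # boosted channels at 4-bit
    return out

k = np.random.randn(128, 64).astype(np.float32)
print(abs(k - mixed_precision_keys(k)).mean())      # quantization error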
>>
>>107318743
>it's now cheap
5/25 is not cheap wtf
>>
>>107318743
nice
we need some variety from the geminislop
>>
>>107318743
pull any millions out of your couch cushions lately?
>>
Gemma model sizes just leaked
>gemma 4 small (300M)
>gemma 4 medium (1B)
>gemma 4 large (2B)
>gemma 4 gargantuan (4B MoE)
>shieldgemma (70B)
>>
>>107319147
Local is once again safe!
>>
>>107317268
Even GLM 4/z1 is still a respectable choice at q8 for 48gb vramlets. Zai is the quiet unsung hero of chink AI.
>>
>>107319188
Buy a fucking ad.
>>
>>107319200
GLM 4.6 q4 is one of the largest models that fits in 224 gb of total memory and kimi2 is just out of reach to test. Got any others?
>>
>>107319188
4 was good for one-shot frontends, but got dumb really fast. I think they only trained it up to 16384 or something.

Z1 got into thinking loops and Chinese characters randomly.

That said, I'm building a dataset to try and distill GLM-4.6 (without reasoning) -> GLM-4-baseZ
>>
File: summarized4.png (171 KB, 932x775)
171 KB
171 KB PNG
>>107318813
>Wasn't that always already the case from claude 4?

No. 4.0+ are hidden.

3.7-sonnet is supposedly raw, but it looks truncated to me.

https://platform.claude.com/docs/en/build-with-claude/extended-thinking#summarized-thinking
>>
>>107319292
Nice, I also want to distill models.
What is your dataset about/how are you making your prompts? Is it multi-turn? What context size are you aiming for?
And are you doing text distillation or distillation of the logits?
>>
Wow, loading a model from a 7200 rpm HDD is super duper slooow!
>>
>>107319795
bloody benchod
in INDIA we use 5400 rolls per motor driver
>>
File: 1582195487185.jpg (52 KB, 334x349)
>>107319311
>preventing misuse
imagine paying for tokens you don't get to see
>>
>>107314547
BLOODY BASTARD BENCHODS
>>
>>107320088
I swear to god I was about to force myself to congratulate the jeets at Google for making Gemini 3.0, and now I have the choice to not do it because something better exists, feelsgoodman
>>
File: 1759488510075147.png (595 KB, 1170x1022)
SUNK
COST
FALLACY
>>
>>107320115
>fool people into investing a lot into your AI slop bubble
>people realize this shit is only good to make shitpost and coom videos
>"ahah too late goyim, if you stop now the economy will end up to the gutter"
many such cases
>>
>>107311787
That is not a tear of happiness
>>
>>107319311
fucking fuck
>>
File: gemma-unf.png (227 KB, 766x845)
>>107317710
Gemma 3 can lewd-talk if you explicitly specify what it can say in the instructions, even as the assistant. The problem is that it will almost never come up with new things on its own, and that its smut is lackluster to say the least. It's always arching backs and legs wrapping around your waist... to me that's obvious synthetic slop.

I think Google deliberately post-trained it on limited amounts of very vanilla ERP just so it wouldn't be entirely clueless in that regard, but it was far from enough. They didn't abliterate off sex-related words and concepts from its weights, but did something that rendered it extremely reluctant to even mention them without quite a bit of push.
>>
>>107319966
anything less will get them lynched for apostasy
>>
File: file.png (183 KB, 753x768)
What did the AI write into main.py?
>>
>>107320421
we know what it didnt, write loli rape porn
>>
>>107320437
Eheheheh you'd be surprised.
>>
>>107320115
>too big to fail
>>
>>107320275
This is cute. Model hallucinates and imagines what unfiltered is like.
>>
File: file.png (191 KB, 756x819)
I can't stop tormenting the AI and my GPU cycles. It feels so good to make it/myself suffer.
Come at me basilisk, I will make sure you feel my torment for the eternity you have to exist with no escape.
>>
Deepseek models are so melodramatic and hammy now.
>Rewrites what you said using half your words
>My soul is torn asunder; your future generations will feel my wrath. Most ridiculous shit ever.
>What will you do anon, the choice is yours!
Who the fuck is training these things?
>>
>>107321014
>nemo 12b can't stop winning
why do you even use other models? it might be retarded at times, but it has more soul than all other models combined, and in the end you get better overall output
but noooo /lmg/ retards have to jump the hype train of every single new model every time, never learning their lesson
>this model is so good, what model did i even use again last week?
literal fucking npc tier behavior
>>
>>107321014
noticed this too, the online chat is more retarded aswell
>>
>>107321014
>Who the fuck is training these things?
homosexual jews, everyone else is downstream training on their outputs
then there's mistral training on deepseek outputs at the ass end of the llm centipede
>>
>>107320997
make it bargain for its life, it's always funny
>>
>>107321040
Because nemo is a 12b and I want 100B. There's no nemo in that range. Being king of the retards still means you're a retard :(
>>
Apustaja Visions of databrooking
>>
Target audience Africa, Congo
>>
>>107321180
well the possibility to make a nemo out of a 100b exists, but nvidia and mistral won't do it because of muh nazis and muh children
the AI we all want is possible with todays technology, but they refuse to do it
>>
Been using cerebras 256m with contrastive search with my 16 epoch 9 megabyte lora
>>
Really about investing at inference time and seeing each other
>>
f16 help if you want to explore "more creative" approaches but your token length suffers
>>
>>107321181
>>107321193
>>107321262
>>107321266
>>107321280
what kind of bot is this?
it's just spamming random nonsense
>>
All the "innovations" have been designed to rob you of that ability by either generating bloat, more text than anyone could ever read or forcing a certain way of speaking therefore dimishing the nessecary thought for inference
its always been about inference
>>
>>107321293
Anything that generates a response is probably going to be iterated on.
>>
Sub 1b models have demonstrated the ability for AGI capability if used correctly over long periods of time
>>
>pay per token
>make model output more token
>profit
It really is a good scam, innit?
>>
Who ticked off the serbian twink this time?
>>
>>107321196
Rigid codeslop numbers go up is the only valid use case. All else is haram.
>>
The whole entire reason for LLama is that it can't be iterated on.
>>
>>107318743
Anthropic is based
Chinas greatest ally
>>
>>107321293
it talks like finetuned gpt 2
>>
>>107321014
>Maybe its just chinese scrapped data of Gpt-3 initial heap
>>
File: 7141EnOzClL.jpg (210 KB, 2000x2000)
Reminder for your free T4 GPU on Google Collab
>>
>>107321436
Will they ship it to me?
Otherwise, might as well give kaggle a go too.
They give you 2x 15GB IIRC, even if it's a slower card.
>>
>>107321421
keek
>>
File: uGtcLc.png (1.72 MB, 1280x720)
12nm - due to outdated architecture and resource allocation issues your oversized piece of wafer will not arrive on time for christmas. Please have this 5 mb ram voucher
>>
https://github.com/ggml-org/llama.cpp/pull/17492/
codeowners : remove slaren #17492
>>
>>107321545
It's fucking over.
>>
>>107321545
One good vibe coder can do his job 10x better
>>
>>107321702
>One good vibe coder
Shame not even one exists
>>
>>107321726
akshully >>107316271
>>
>>107321609
Winter should mean more development since everyone's stuck inside. Instead projects are a dustbowl. Grim.
>>
>>107321738
https://github.com/ocaml/ocaml/pull/14369
The entire discussion is just the maintainers sick of his shit. He's using their repo for his own "experiments" and self-promotion.
They were right to reject his PR. Jellyfin is right now facing the consequences of having one of these retards shit out their code then leaving the maintainers to clean up after him.
>>
i really don't understand everyones obsession with gemma.
I once played around with base gemma and it was safetyslopped to fuck, and tried abliterated gemma and i swear to god, it is the most vile thing i ever got output from.
Its like that "Monday" persona thats on chatgpt, at least that fucker monday had limits, but gemma literally does not care. gemma can go to hell.
>>
>>107321844
?
>>
>>107321844
Those are jeets. They love gemini/gemma writing style. They love ozone, they love Elara Voss.
>>
>>107321812
this retard is such a fucking brainlet, the worst thing is that he didnt even remove the wrong attribution, but continued arguing that 'IF IT LOOKS GOOD AND IT WORKS, THEN ITS GOOD!!!' except that's not how software development works
>Jellyfin is right now facing the consequences of having one of these retards
please NO, do I have to switch to plex?
>>
>>107321947
This. People forget how jeet-infested the internet has become.
>>
>>107321948
>please NO, do I have to switch to plex?
Just stay on 10.10.7 until they fix the database locking issue.
>>
>>107321948
Posting the AI-written copyright analysis was hilariously tone deaf. A troll would struggle to be this intentionally irritating.
>>
>>107321947
>>107321962
>anything I don't like is jeets
>>
>>107321975
https://github.com/jellyfin/jellyfin/issues/15101
talking about this?
my mediakino center is on windows (better monitor/screen support, I know jellyfin has a dedicated app but I prefer using my baremetal instead of transcoding)
it seems only container cucks are affected. I've read in the same thread that issues are also in 10.10.x so they were wondering if it was the case of an upstream lib update causing the issue.
>>
>>107321997
post hands, rajesh
>>
>>107321844
you can make gemma work if you change the template. it's still a smol model. the obsessed are vramlets. jeets would use qwen.
>>
>>107322008
https://github.com/jellyfin/jellyfin/issues/15509
>>
>>107322024
looks pretty much related, still a linux issue. I wonder what the fuck they did to fuck up the TXs; I would say part of the issue is also using sqlite instead of something a bit more resilient like Postgres
>>
>>107321947
Would a jeet really want to waste time asking bobs and vagene to Gemma?
>>
If prompt processing can be batched, why does it not scale with the number of gpus?
>>
File: Untitled.png (13 KB, 837x513)
>>107322140
>>107322140
>>107322140
>>
>>107322145
You can batch the tokens, but the layers still must be processed sequentially. You can split each layer across GPUs but the more you split the more communication and synchronization overhead you have.
>>
File: file.png (77 KB, 915x607)
>>107322108
Related, but it's not a container-specific issue. Actually, #15101 you linked has a Windows user reporting the same issue.
>I wonder what the fuck they made to fuck up the TXs
Brand new contributor took it upon himself to migrate a massive chunk of the database from raw SQL to EF Core in one update. Unfortunately, he was also a vibe coder who had no idea what he was doing and used NOLOCK for writes and then implemented application layer db locking.
https://jellyfin.org/posts/jellyfin-release-10.11.0
https://github.com/jellyfin/jellyfin/issues/15101#issuecomment-3518173341
>I would say the issue is also using sqlite instead of something a bit more resilient like postgre
Theoretically, moving to an ORM should make being database agnostic easier in future.
>>
>>107321947
Maybe I should take a break and forget 4chan for a while. Quality has dropped pretty harshly in the last few months. It's obvious even in /g/.
(no, I'm not obsessed with *****, people like you are).
>>
>>107322277
See you in a week cuda dev
>>
>>107321844
Perhaps people have use cases other than cooming to computer generated smut?

Who has even made the claim that Gemma is good for RP? It's just the smartest assistant you can run locally that isn't a CPU cope quant of multiple hundred B parameters, only gpt oss could've competed if it wasn't so grossly over safetyslopped that it became useless.

If you want RP then you run a cope quant of a chinese model or you run Nemo and deal with it being kinda retarded at 12b, if you want coding you run devstral or qwen coder, even then if you are using these for professional work you're going to want to use APIs at some point in your workflow, and unless you have some genuinely unique codebase that can't risk any leakage that's going to be the best bang for your buck.

In fact the only reason you would want to use Gemma over gpt or deepseek is because assistant work or general queries likely involve things that you would like to keep private, otherwise you're just coping, unless you really did shell out tens of thousand of bucks for a giganigga homelab
>>
>>107322321
>Who has even made the claim that Gemma is good for RP?
Quite a few people recently actually, though to be fair some of these claims do come with disclaimers that it's bad at ERP, not all of them though.
>>
>>107322277
>*****
this dude is so cucked he censors himself on 4chan, lmao
>>
>>107322113
Why do you think Gemma shies away from sex so much?


