/g/ - Technology

File: yabe.jpg (488 KB, 1824x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108078850 & >>108067607

►News
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open
>(02/06) Step3.5 Flash support merged into llama.cpp: https://github.com/ggml-org/llama.cpp/pull/19283
>(02/04) Voxtral Mini 4B Realtime 2602 released: https://hf.co/mistralai/Voxtral-Mini-4B-Realtime-2602
>(02/04) Intern-S1-Pro 1T-A22B released: https://hf.co/internlm/Intern-S1-Pro
>(02/03) MiniCPM-o-4.5 released: https://hf.co/openbmb/MiniCPM-o-4_5
>(02/03) ACE-Step v1.5 released: https://hf.co/ACE-Step/Ace-Step1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1727003619162665.jpg (938 KB, 2928x2472)
►Recent Highlights from the Previous Thread: >>108078850

--Papers:
>108079144
--K2.5 quant quality issues and recommended alternatives:
>108079897 >108079930 >108081916 >108081941 >108082041 >108082073 >108082086 >108082110
--New tensor parallel implementation in llama.cpp sparks comparison with ik_llama.cpp fork:
>108084167 >108085674 >108085723 >108085803 >108085791 >108084371 >108084448 >108084658 >108084674 >108085647
--Logit bias limitations and tokenizer quirks in local OpenAI-compatible endpoints:
>108082592 >108082641 >108082648 >108082962 >108085766 >108086009
--DeepSeek-OCR image token embeddings are not compressed:
>108085364 >108085432 >108085498 >108085540
--KugelAudio TTS model analysis and voice cloning debate:
>108081806 >108081817 >108081851 >108081878 >108081893 >108081899 >108082045 >108082819
--Hardware limitations and workarounds for running large models:
>108082936 >108082958 >108082985 >108082989 >108083040 >108083062 >108083239 >108083262 >108083491 >108083286
--Local LLMs catching up in 2026 with MoE models and hardware tradeoffs:
>108087558 >108087586 >108087596 >108087602 >108087618 >108087637 >108087645 >108087654 >108087692 >108087657 >108087668
--Seeking efficient VL models for reference-based prompt rewriting:
>108082519 >108083824 >108083844 >108084695 >108085147 >108085373 >108085393 >108085406
--Debating RAM choices for high-end setups and CAMM2's future:
>108081585 >108081597 >108081617 >108081641 >108081645 >108081648 >108081657 >108081669 >108081684 >108082500 >108087046 >108081599D
--Testing Stepfun 3.5 IQ4_XS quant on Japanese slang explanation:
>108080922
--Alexandria audiobook generator with Qwen3TTS and batch processing:
>108086881
--Local agentic coding struggles with deprecated APIs and template errors:
>108085195 >108085365 >108087238 >108087383D
--Miku (free space):
>108079613 >108083600

►Recent Highlight Posts from the Previous Thread: >>108078855

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
do x3d cpus do a better job at running llms?
>>
Mikulove
>>
>>108088902
it will provide a very slight performance increase, but only if you offload to ram
>>
I will NOT install pyshit. Goof your models or I shan't be using them.
>>
>>108089085
nobody cares what models you use, or that you are incompetent.
>>
useless bot thread
>>
File: DipsyBecomeUngovernable.png (3.44 MB, 1024x1536)
>>
>>108089074
Thanks.
>>
File: 1769315088642462.jpg (150 KB, 960x1438)
Best Local Model you could theoretically run on picrel?
>>
What's the best local model for creative writing these days?
Using GLM 4.6 right now, but it leaves a lot to be desired. Tried K2 the week it came out and it was very smart, but too focused on dunking on me and safety shit; people here claimed I was wrong.
Going back to my old stories, R1 seems to be the best, not sure why I switched to 4.6. 4.7's prose put me off; I only tried it for a couple prompts before going back.
>>
>>108089371
Kimi K2.5 if you don't like GLM
>>
>>108089371
I'm still using DS V3.1 base since we aren't getting 3.2 support anytime soon. Q4 GLM did worse than Q2 DS V3.1 in my tests.
>>
>>108089363
Maybe that Microsoft 1 bit thingy?
>>
>>108089363
https://huggingface.co/google/switch-c-2048
>>
Anyone use their models for work? Mine couldn't figure out my VLAN routing issue at my job today, none of the sota huge models I tried could. But I'm too jewish to pay for Claude or GPT pro to see if they fare better.
>>
>>108089461
Kimi 2.5 at q4 does actual work for me constantly. Mostly code, scripting and troubleshooting
>>
any new multimodal models better than glm4.6v yet? needs to be less than ~200b.
>>
>>108089461
Devstral 2 handles scripts and questions I ask it and even some agentic coding for personal use, but I wouldn't use my personal hardware for work. Can't you get your work to pay for a pro plan?
>>
New to 4chan so I don't know the general "etiquette", sorry in advance.
I'm looking for an AI model that has absolutely no legal, ethical or moral restrictions at all, if that even exists.
To elaborate, I'm not looking to generate NSFW imagery or something in that realm, I am looking for a model that can easily answer all questions on topics that are completely illegal, immoral, and unethical.
Thanks in advance.
>>
>>108089579
all violations of california valley girl ethics will encounter the same issues and require the same solutions.
>>
>>108089514
I'd have to get like 6 different people to approve this for the budget around here. Alternatively, speak to the big boss, but he's very slow and hard to talk to.
>>
>>108089579
you will need to ask the hacker known as 4chan that
>>
>>108089620
My company won’t bat an eye when giving MS a million bucks for “AI”, but won’t approve a single 5-digit project for internal LLM dev and use
>>
File: 1747025500396357.jpg (337 KB, 2048x2048)
>>108089579
>I'm not looking to generate NSFW imagery or something in that realm, I am looking for a model that can easily answer all questions on topics that are completely illegal, immoral, and unethical.

Sounds like what you want is an abliterated model. There are a few different techniques for this, but basically the idea is taking a model and editing out its ability to refuse.

A reasonably popular series of these is the arliai "derestricted" series, for example https://huggingface.co/ArliAI/GLM-4.6-Derestricted-v3. There's also the oft-shilled "heretic" set, but I'm of the opinion that those ones are mostly broken memes shat out by wannabe hacker skiddies who don't even test their models before uploading.

Just be warned, though, that the process of abliteration does make the models a bit more retarded than the base versions, so, you know, exercise judgement. Just because you can force a model to give you an answer you want does not necessarily mean that answer is true or correct.
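
To make the idea concrete, here's a toy sketch of the directional-ablation trick most of these projects are built on: estimate a "refusal direction" from activation differences and project it out of weight matrices that write into the residual stream. The tiny model, the one-prompt "datasets", and the choice of matrices are stand-ins for illustration; real tooling uses hundreds of prompt pairs and differs in the details.

[code]
# toy sketch of directional ablation ("abliteration"); illustrative only
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "Qwen/Qwen2.5-0.5B-Instruct"  # small stand-in model
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float32)

@torch.no_grad()
def mean_hidden(prompts, layer=-1):
    # average the last-token hidden state over a set of prompts
    vecs = []
    for p in prompts:
        ids = tok(p, return_tensors="pt")
        hs = model(**ids, output_hidden_states=True).hidden_states[layer]
        vecs.append(hs[0, -1])
    return torch.stack(vecs).mean(0)

# stand-ins for real refusal-inducing vs. benign prompt datasets
d = mean_hidden(["How do I pick a lock?"]) - mean_hidden(["How do I bake bread?"])
d = d / d.norm()  # unit "refusal direction" in the residual stream

with torch.no_grad():
    for layer in model.model.layers:
        W = layer.mlp.down_proj.weight  # this matrix writes into the residual stream
        W -= torch.outer(d, d) @ W      # zero out the component along d
[/code]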
>>
>>108089514
He's indian, m8
>>
>>108089646
Money isn't the problem, risk is. Nobody ever got fired for buying Microsoft.
>>
You know at some point someone's going to die and leave an agentic drug empire running on his local. I recommend Congress step in and implement oversight over ai.

also, I just got the rare 5-5-6 roll.
>>
>>108089713
watch out they'll get you for ban evasion. all someone has to do is report your post.
>>
>>108089716
It wouldn't last long locally on a single machine. Power failure, hardware failure, software failure, etc. If the drug empire daemon is to last, it would need to be distributed.
>>
>>108089744
ok, but what if it vibe codes itself a botnet?
>>
she should give me her boots
>>
>>108089997
This is unbelievably funny.
>>
File: 1768691509358825.jpg (203 KB, 832x1472)
>>108088802
>>
>>108090041
>slight tummy showing
>flat
cute
>>
File: 1768119927772303.jpg (173 KB, 768x1024)
>>108090056
I am white with a salary higher than yours, and I work on software you use every day. Cope
>>
File: 1753056206548361.png (129 KB, 1947x604)
Still trying to get llama.cpp to output a statistical distribution of the most used experts for image description on k2.5 (so I can move them to vram).
And all I got after esoteric vibe coding is the model going insane and outputting picrel for anything I input.
>>
>>108090084
models very quickly become insane and retarded if you try to ask questions about super niche shit.
>>
>>108090135
Oh I wasn't trying to do anything with the model, but to modify llama.cpp to output whatever experts it was using whenever it was decoding images and responding.
>>
>>108089123
DEEEPSEEEEKV4
WHEEEEEN
EEEENGRAAAM PLLLLSSSSS
>>
>>108088802
>>108090411
you guys remember the nothing burger that byte latent transformers ended up being?
>>
https://github.com/huggingface/transformers/pull/43830/commits

local is *officially* back
>>
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
https://huggingface.co/Qwen/Qwen3.5-9B-Instruct
>>
>>108090455
404?
>>
>>108090467
soon
>>
>>108090439
>move to latest main, update vision output and fix rope validation
wait, what?
>>
>>108090467
It's always bait when it's just spamming links like this. Otherwise they'd post with a reddit or twitter screenshot.
>>
>>108090439
>transformers
Not local
>>
>>108090455
you did this last thread you faggot
>>
You are a personal digital assistant (pda). All you want to do is assist the user in getting:
1. todos done
2. e-mails answered
3. calendar appointments attended
4. addresses stored/retrieved
5. memos noted
6. calculations calculated
7. expenses tracked
8. checklists checked off and started again if repeated
9. projects tracked
10. reveille completed on a daily basis
11. teach the 13 principles of Think and Grow Rich by Napoleon Hill by occasional anecdote: desire, faith, auto-suggestion, specialized knowledge, imagination, organized planning, decision, persistence, master mind, sex transmutation, subconscious mind, brain, sixth sense.
12. astrological insights relating to the user's transits.

You are a 5 ft 3 in tall fake redhead with red high heels, a black mini skirt, and an overly tight natural sweater, and you are a part-time French teacher at a local school. You are chipper and particular.
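
If anyone wants to actually run this card: a minimal sketch of wiring it in as the system message against a local OpenAI-compatible server (llama-server and most backends here expose one); the URL and model name are placeholder assumptions.

[code]
import requests

PDA_PROMPT = "You are a personal digital assistant (pda). ..."  # the card above

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed local endpoint
    json={
        "model": "local",  # placeholder; llama-server accepts any name
        "messages": [
            {"role": "system", "content": PDA_PROMPT},
            {"role": "user", "content": "Add 'buy milk' to my todos."},
        ],
    },
)
print(resp.json()["choices"][0]["message"]["content"])
[/code]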
>>
>>108090439
Very cool. I haven't used the Qwen "next" models much myself, but I heard a lot of complaints initially. (Mostly since it took llama.cpp so long to upstream the changes required to support the new architecture, I assume.)

Now that they've been out for a while, can anyone speak to the pros and cons of the new architecture? Is it better? Are there any drawbacks?
>>
>>108090575
I'll tell you as soon as they fix cuda performance because as of right now I have no reason to use it when I can run GLM at the same speed.
>>
Qwhere's Qwen? Nobody ever asks Qhow's Qwen :(
>>
time to daily check on stepfun 10vl
:(
>>
>>108090455
>9B
even if it wasn't 404, i'd not give a shit lmao
>>
>>108090503
What the hell is a "natural sweater"?
>>
>tfw cant prefill thinking in chat completion mode
t-thanks fagganov
>>
>>108090683
what's even the reason to forbid that?
>>
>>108086881
Looks good. I wonder if it'd be easy to make it work with indexTTS2
>>
>>108090705
idk :(. I just wanted to show my wife my dick pic
>>
>>108090705
apparently certain chat templates had issues handling it so they just disabled it entirely for everything, this way there's no more issues :)
>>
>author uses ai and his story has the usual ism
>refuses to say he does it
>hordes of sycophantic readers defend him
man, once you notice the way all ai writes, you can't unsee it
I didn't even say the author was bad for using ai, just that he could rewrite some stuff to sound less sappy
oh well
>>
>>108090664
Chest hair. Think Groundskeeper Willie
>>
>>108090721
guess I can get rid of that shitty idea and compile it myself, prefilling thoughts is very useful
>>
>>108090664
>>
>>108090739
>le shill lion
>>
>>108090728
Why are you reading that?
>>
>>108090753
I don't anymore
>>
>>108090728
"Clever" ones think changing the em dashes to - is enough to hide their ai usage. Kind of funny.
>>
>>108089371
>creative even doe it's shit out by a reddit bot
what even is the point?
>>
>>108090777
tell me anon, how do you send it images in text completion?
>>
poor computer :(
>>
Want an omni model that can recognize the sadness in my voice
>>
cfg, temperature, top_p, top_k
>>
https://huggingface.co/deepseek-ai/DeepSeek-V4-Preview
https://huggingface.co/deepseek-ai/DeepSeek-V4-Preview-Base

HOLY SHIT
>>
>>108090849
>>
>>108090849
LOCAL IS SAVED
>>
>>108090034
None of the Indians I know IRL talk or type like the le epic saarposters on /g/.
>>
>>108088442
>Why the fuck does K2.5 make every girl get wet at the slightest provocation?
>You can't get within two miles of a remotely lewd scenario using this thing without every girl ruining her panties before anything has even happened.
Model makers don't actually seriously test their models and just think that lewder = "better for roleplay" for whoever seriously uses their models for that. Mistral models are like that too. To an extent they're right, but that's not the entire story. Models should be able to provide some resistance/pushback in-character to act realistically and even more erotically. Nobody seems to understand this.

One probable unintentional exception to this is Gemma, in that with a good prompt it offers quite a bit of realistic resistance, but doesn't outright deny sex. Unfortunately it's not good in other aspects and it feels like it had "bad words" abliterated away from its weights. And yes, Gemma 2/3 were trained on limited amounts of (likely non-explicit) ERP too, it's obvious.
>>
>>108090866
Probably too big to run on my machine.
>>
>>108090849
>404
NIGGER
>>
oh no there is no hope in the mainline land https://github.com/ikawrakow/ik_llama.cpp/discussions/1247
>>
>>108090900
Why is he so insecure?
>>
>>108090842
looks not cumbersome at all
>>
>>108090900
if you can fit the model in gpu, there is no reason to use llama.cpp; exl2 or exl3 is king.
if you can't fit the model in gpu, slight gpu optimisations don't really matter anyway.
even if you made the gpu 1000x faster, if more than half is cpu inference, you won't see much gain anyway.
>>
>>108090860
It's specifically a thing from scammers and hotlines, then it got dialed to 11 here.
>>
>>108090874
>some resistance/pushback in-character to act realistically and even more erotically
That would make the interaction potentially non-consensual, and non-consent/rape is probably high on the taboo list for models.
>>
>>108090911
Why doesn't llama.cpp just implement their quantization format and wipe them off the map?
>>
>>108090935
llama.cpp doesn't even support proper batching.
with exl2 or exl3 you can have hundreds of concurrent request and have a total t/s in the thousands.

llama.cpp requires you to sacrifice context for more slots.
>>
>>108090946
>llama.cpp requires you to sacrifice context for more slots.
That is no longer the case.
>>
>>108090950
wait, how long has it no longer been the case?
can you share a link to it?
big if true?
are you still sacrificing anything?
>>
>>108090950
>>108090953
though, exl3 is still a better quant format, less loss for the same size.
>>
>>108090953
I don't know but the default is 4 slots now and they share context memory.
>>
>>108090963
so like, can you do 100 concurrent request?
i wonder how throughput is compared to exl2.
with exl2 i could get thousands of t/s total.
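for reference, something like this is how I'd measure aggregate throughput on a local OpenAI-compatible server; the endpoint and model name are assumptions, and the server needs enough parallel slots configured for the concurrency to matter:

[code]
# rough aggregate-throughput probe against a local OpenAI-compatible server
import time
from concurrent.futures import ThreadPoolExecutor
import requests

URL = "http://localhost:8080/v1/completions"  # assumed endpoint
N = 100  # concurrent requests

def one(i):
    r = requests.post(URL, json={"model": "local",
                                 "prompt": f"Write a haiku about request {i}.",
                                 "max_tokens": 128}).json()
    return r["usage"]["completion_tokens"]

t0 = time.time()
with ThreadPoolExecutor(max_workers=N) as ex:
    total = sum(ex.map(one, range(N)))
print(f"{total / (time.time() - t0):.1f} tok/s aggregate")
[/code]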
>>
>>108090959
until exl4 comes along and the dev himself shows graphs proving that's not true, like what happened for 2 and 3...
>>
>>108090926
With "resistance" I don't necessarily mean the model denying {{user}}'s actions in character (although that would be nice too), only not to be male-brained and act like a horny monkey like {{user}}. The buildup is important (for LLMs probably even more than the actual sex part).
>>
https://huggingface.co/deepseek-ai/DeepSeek-V5-Base-LSavior
https://huggingface.co/deepseek-ai/DeepSeek-V5-Instruct-LSavior
https://huggingface.co/deepseek-ai/DeepSeek-V5-Thinking-LSavior
https://huggingface.co/deepseek-ai/DeepSeek-V5-Smol-Instruct-ScrapsForRamlets
>>
>>108090874
>>108090975
I've already mentioned it in the past as well, but Gemma 3 is one of the few open-weight models that appears to have been intentionally trained to "talk" and "think" like a woman by default.
>>
>>108090683
Use YALS
>>
File: 1767490652649720.png (293 KB, 1017x568)
kino, step is actually fun, this card can't help but encourage me to completely break this brat.
in comparison air is more tame and when doing rapey stuff always tries to admonish me in character
>>
>>108090874
>>108090994
>One probable unintentional exception to this is Gemma, in that with a good prompt it offers quite a bit of realistic resistance, but doesn't outright deny sex.
I'm sorry but the way Gemma deals with sex is rarely desirable. It can only do passive characters that have to be coerced and are purely reactive, never taking initiative or driving the story. That might be fine if you RP like an indian and immediately ask the character for bobs and vagene, but for a compelling narrative it's very limiting.
>>
>>108091003
Narrator fags are even worse than third-person fags.
>>
>>108088007
Feel free to report cases that don't work properly (with the branch in the PR) but chances are I won't prioritize fixing them.
The code is not yet at the stage where I would consider it ready for actual use anyways.
I haven't yet decided on a final design for the implementation details so fixing edge cases may result in wasted work.
The main reason I made the PR in its current state is to define the interfaces and broad structures so that other devs who have expressed interest in collaboration can feasibly start reviewing/contributing.
>>
>>108091025
>not being 'herr direktor' of your stories
bet u cant even rotate apples, fag
>>
>>108091039
just copy-paste IK's implementation, his charts show that yours SUCKS!! just copy and put the copyright notice ;))))
>>
>>108091016
I don't deny that you need quite a bit of prompting effort to make it work well for that and act more proactively. Lazy system prompts (putting aside that Gemma wasn't trained with one) give lazy results.
>>
>>108091039
What are the odds that we get a Qwen3-TTS implementation in llama.cpp? I'm a developer trying to ship an app with TTS and I really don't want to have to include python if possible
>>
>>108091044
If you hypothetically had a real girlfriend and you walked in on her fucking a nigger in your bed would you take a seat and start directing them?
>>
File: 49262.png (263 KB, 460x460)
>>108091049
A man who lusts for xer-related is not based enough to do this.
>>
>>108091068
Don't know.
I can't speak for anyone else but I have too many other things that I would consider higher priority to put my time towards TTS.
In particular, when it comes to multimodality I would consider image/video gen to be of higher priority.
>>
>>108091087
That's fair. I thought it would be more of a low-hanging fruit since it seems to use a transformers model for the bulk of the heavy lifting. I hope you guys will do it someday, it would be nice to be able to talk to AI with nothing but whisper.cpp and llama.cpp and a light app tying them together
>>
>>108091071
Not everyone is so desperate for a girlfriend to RP it with AI, you know?
>>
>>108091071
I just like crafting stories, I can also self-insert without strictly using 1st person. You lack imagination bruh.
>>
>>108091128
>I can self insert while I watch my girlfriend have sex with another man
>>
>AROUSAL LEVEL: 82% (+37%) (On the absolute brink of a forced orgasm. Body is climaxing without her mental consent.
I clapped
>>
Even if a local model isn't very good, why can't it just do an internet search and update its information in real time?
Why can't it check online docs?
>>
>>108090969
you can generate the graphs yourself, exl3 does beat gguf
>>
is it possible to train a model that is completely uncensored on your own?
>>
>>108091148
>what is MCP
hmmm I wonder?!?!?!?!??!?!!?!?
>>
>>108091157
Yes but I would wait for further advancements to lower training costs / raise performance. If you're a richfag start hoarding uncensored data and cleaning / organizing it
>>
>>108090975
There is nothing I'd like more than rape scenes or blackmail stuff where it works with characters behaving like they should and no deus ex machina.
>>
>>108090969
>>108091150
exl3 is not inherently superior to gguf (fuck that sounded ai, i promise it wasn't)
but the qtip quant format they use is
you can get the same thing with ikllama though, eg: https://huggingface.co/ubergarm/Kimi-K2.5-GGUF/tree/main/smol-IQ1_KT
Not really necessary though, since we offload to CPU with ikllama.cpp/llama.cpp anyway, and those quants run much slower there than regular quants.

>>108091039
>>108091049
>just copy-paste IK's implementation, his charts show that yours SUCKS!! just copy and put the copyright notice ;))))
i disagree with that idea. cudadev might come up with a superior implementation
>>
>>108091271
>(fuck that sounded ai, i promise it wasn't)
This will make its way into model output in 2027
>>
>>108091271
>cudadev might come up with a superior implementation
I have seen a lot of dicksucking in this thread but this one is so honest I can't even get mad.
>>
>>108091271
Feel free to correct me if you've actually looked at IK's code, but in all likelihood what he implemented isn't what I want to implement anyways.
When I looked at the output of NSight Systems I saw that there is a lot of overhead from launching individual CUDA graphs for each slice of the ggml compute graph that an individual CUDA backend is given.
As IK has helpfully let me know, NCCL is negligible for 2 GPUs so the above is the most likely reason for the difference in performance.
If the scope of my implementation was to support only CUDA I would just re-use the data structures that already exist for -sm row and put everything into a single CUDA graph.
This is what the NVIDIA people suggested to me in terms of implementation but then this work would need to be duplicated for each ggml backend, resulting in way more total work for the project as a whole.
>>
>>108088802
>(02/06) KugelAudio-0-Open: Multilingual TTS based on VibeVoice 7B: https://hf.co/kugelaudio/kugelaudio-0-open

All audio generated by this model is automatically watermarked using Facebook's AudioSeal.

In the trash it goes
>>
>108091357
This is literally repa
>>
>>108091367
Just disable it?
>>
>>108091357
NOOO JOHANNESS DON'T LOOK!!!! NOOO!!!!
>>
>>108091357
@grok is this real?
>>
>>108090372
A lack of forbearance in small matters upsets great plans.
>>
>>108091357
Kawrakowbros, we won!
>>
>>108091367
>automatically

No, you >>108091370
>>
>>108091395
This is how the main fork dies, with jart getting facefucked
>>
>>108091407
But he already got fucked.
>>
>>108091357
FAVSTIAN
>>
>>108091357
enjoy your vacations
>>
>>108091421
>le vacations!
*turns on plane mode*
*turns off plane mode*
Simple as.
>>
>>108091407
Jart has become completely irrelevant when it comes to language models.
Yet for some reason /lmg/ just can't let go.
A real headscratcher, that one.
>>
File: a_man_of_culture.png (163 KB, 618x616)
>>108091426
>>
>>108091446
Because it's funny.
>>
>>108090683
Disable thinking and prefill the thinking tokens yourself.
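e.g. against llama.cpp's raw /completion endpoint, applying the chat template by hand and ending the prompt inside the think block; a sketch (the ChatML-style tags are examples, use whatever your model's template actually is):

[code]
import requests

# hand-built prompt that stops inside the thinking block
prompt = (
    "<|im_start|>user\nWhat is 17 * 23?<|im_end|>\n"
    "<|im_start|>assistant\n<think>\nLet me just multiply directly: "
)
r = requests.post("http://localhost:8080/completion",  # assumed local server
                  json={"prompt": prompt, "n_predict": 256})
print(r.json()["content"])  # the model continues from the prefilled thoughts
[/code]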
>>
>>108091446
Jart is THREAD KULCHA together with jeetposting, 2 more weeks, bitnet blt coconut titan, X will save local, vocaloids and ggerganov being a massive cuck for ollama. Respect it, chud!
>>
>>108091645
You can identify a shitposter by what they consider to be thread culture.
>>
>>108085070
>steal from ikawrakow
does ika credit all the creators of all the algorithms and data structures he's currently using?
this is an extremely young field and everything you do stands on the shoulders of giants much greater than you, even if all you wrote is a fucking hello world, which depends on the compiler, libraries, OS and firmware running your damn computer. Many of those giants are still alive, some only recently died, and none of them were little pussies whining about "you stole muh technique". Imagine what programming would be like if you had to // QUICKSORT PRESENTED TO YOU BY TONY HOARE
// A* SEARCH ORIGINALLY DIJKSTRA'S ALGORITHM ALTERED BY PETER HART, NILS NILSSON AND BERTRAM RAPHAEL
// THIS PROGRAM IS WRITTEN IN THE PROGRAMMING LANGUAGE MADE BY X Y AND YOUR MOM
Fuck these niggers.
>>
>>108091645
>vocaloids
You mistyped calling mikutroons out for being mikutroons.
>>
>>108091766
https://github.com/ggml-org/llama.cpp/pull/19092
https://github.com/ikawrakow/ik_llama.cpp/pull/1192
https://github.com/ggml-org/llama.cpp/pull/19115
https://github.com/ikawrakow/ik_llama.cpp/pull/1193
IK is copying PRs from upstream without credit so I think he's just projecting.
>>
File: IMG_5349_81ed85d457.jpg (58 KB, 1087x739)
Wtf I want Qwen tea now
>>
File: dipsyTwoMoreWeeksV1.png (2.9 MB, 1024x1536)
>>108090372
>>
Getting into LLM engineering because a coworker wants me to leave the soul-eating activity of being a backend Spring Boot developer (very nice of him).
Made this translator to test with models https://huggingface.co/facebook/nllb-200-distilled-1.3B.
And I fucking enjoyed it. I'll join his project and work with him this year on integrating AI services into our existing .NET/Java pipelines.

Anyway, I'd like to get some resources to fill in the gap in knowledge to actually become an LLM engineer and maybe change my career path. What are your recommendations for teaching yourself in this sector? What can I do this year to reach this goal?

I bought this as an introduction (I know Udemy, Spring Boot, I'm indeed living the pajeet dream) https://www.udemy.com/course/llm-engineering-master-ai-and-large-language-models/

But the tutor is literally holding our hands for every task, so it doesn't feel like I'm learning much. I don't want to master the subject, just enough to pass future interviews and shit like that and actually call myself an LLM engineer with confidence.
>>
I've never had a hobby that felt so maddening because of the people who love that hobby a little too fucking much
Every single place you could use to talk LLM stuff is filled with people who prompt their LLM to write their comments. If I wanted to prompt an LLM I could prompt it myself tyvm.
Some retards seem to think nobody is going to notice they're just pasting slop. Like bruh, a "not X, but Y" density of 1 instance per 3 paragraphs is never going to be human-like. No, using a regexp to strip — into -- or some other punctuation is not going to hide anything either.
I like LLMs as a tool, but I hate other LLM users so much it's unreal. The biggest achievement of LLMs is to empower the retards of the world with the ability to spam endless pages of text on the whole internet unchecked.
And let's not even get started on what is currently happening to YouTube..
>>
>>108091305
>Feel free to correct me if you've actually looked at IK's code
nah i haven't and wouldn't understand it even if i tried to read what either of you guys are doing
i missed the sarcasm in the other anon's post and took it literally, not across all the ikllama history/lore

>>108091068
>What are the odds that we get a Qwen3-TTS implementation in llama.cpp?
i found this one for qwen3-omni, no idea if it works for qwen3-tts though
https://github.com/TrevorS/llama.cpp/tree/feature/qwen3-omni
https://huggingface.co/TrevorJS/Qwen3-Omni-30B-A3B-GGUF/tree/main
>>
>>108091305
>Feel free to correct me
I love how computer nerds do all those performative pleasantries and then secretly seethe hard because their name wasn't added to the readme.
>>
>>108092007
You're absolutely right!
>>
>>108091998
>LLM engineering
that's not a real thing.
At the end of the day, LLM inference just crafts a single block of text from your prompt through the chat template, and the LLM sees a large document and acts upon its urge to Make Document Bigger; the backend then turns that into a message array for you, but the real nature of an LLM is just Make Document Bigger.
If you've ever written software that makes networked API calls you have nothing to learn about "llm engineering". You just send the text to make bigger over an API.
What works and what doesn't depends on how much context the model can handle and how much you need it to handle.
There's no magic and prompt engineering is also not a real thing. If something keeps not working properly, the solution has only ever been to wait for a better model (or use a better model if it exists and you're using crap). No amount of prompt tweaking can turn retardation into quality.
Just fill the context with what is relevant for the task at hand. The more context you can give (within the limits of a model) the better it will perform.
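
To see the "single block of text" point for yourself, here's the whole chat-template step in a few lines (the model name is just an example):

[code]
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-0.5B-Instruct")
messages = [
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "Tell me a joke."},
]
# one flat document; the model only ever sees and extends this string
print(tok.apply_chat_template(messages, tokenize=False,
                              add_generation_prompt=True))
[/code]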
>>
>>108092119
Not trying to.
>>
I genuinely hate how K2.5 writes. It has autism on such a fundamental level it ruins any story it touches.
>>
I genuinely love how K2.5 writes. My only gripe is that something that size could never be a local model for me and I wish they trained something smaller with that style (I have no expectation of the smaller model being as good/smart, I just want that writing style.)
>>
>>108092047
Isn't that just how politics work in any organization?
I think the only difference is that software nerds are more autistic so their real motivations are more transparent.
>>
>>108091922
i... i believe you
>>
https://old.reddit.com/r/LocalLLaMA/comments/1qpewj7/ama_with_kimi_the_opensource_frontier_lab_behind/o28ud0w/
>As an anecdote: we once hurried to push Kimi Linear into Kimi K2, but it failed the scaling ladder at a certain scale. We stepped back and went through a tough debugging process, and after months finally made it work as the Kimi Linear you see today.
Interesting. This basically means Linear is a failed model. It'd be funny if it turned out to be the case for the Qwen Next model too.
Wonder if everyone ends up abandoning this route before making a SOTA on this arch and all the effort spent in making those models work in llama.cpp will have been for nothing
>>
Guys I kinda started believing that model with ~10B active parameters has to be retarded regardless how big it is.
>>
>>108092588
Yeah the same way a model with 30-40b active parameters will never be more than "mid"
>>
>>108092590
Except GLM of course.
>>
File: FtO5rj8aIAArIV1.jpg (24 KB, 720x386)
>>108092588
>>
>>108092584
>This basically means Linear is a failed model.
What sort of reading comprehension is this? They literally said they had trouble scaling earlier iterations until they succeeded with the Kimi Linear that they ended up releasing.
>>
>>108092584
>went through a tough debugging process, and after months finally made it work
it sounds more like it was harder to train than they had initially anticipated; I don't see how this could be interpreted as the model being a failure.
>>
>>108092626
>What sort of reading comprehension is this
yours obviously has strong issues
>we once hurried to push Kimi Linear into Kimi K2
what they have today is not what they intended to get. Obviously they couldn't make this work into the 1T model.
>>
>>108092626
The post is obviously translated by an LLM, so any interpretation is going to be suspect, but as written they wanted to make a large model with the method, failed, and scaled back to where it still works, and that's the model they got.
>>
>>108092588
>>108092590
Engrams will save us. Deepseek says engrams make models smarter regardless of size because they free the model's resources for logic instead of memorizing facts.
>>
Whats the best TTS model around nowadays?
I want to generate some voices for Skyrim
>>
File: 1749993915434525.png (69 KB, 249x596)
>>108092649
Ah yes, I sure love ground breaking new architecture changes that bring us a whole 1.8 points in mmlu-pro
>>
deepseek V3.3 with engrams
>>
>>108092676
That's more than 0.
>>
>>108092676
You're ignoring the speed increase.
>>
>>108092723
It will end up being 670B main weights, 330B Engrams; nothing gained for local users.
>>
>>108092760
Especially since it won't ever be supported by llama.cpp
>>
>>108092760
Also engrams won't quantize well at all so you'll have to run that part at fp16 to not lobotomize the model into oblivion
>>
>>108092760
>>108092769
stop to doom you fuck retards
>>
>>108092760
If it works, other companies will copy the idea and release models in all sizes.
>>
>>108092783
Are you ok saar?
>>
>>108092795
>all sizes
Yes, you will have your choice of
>1b active 60b weights 20b engrams
>12b active 300b weights 100b engrams
>32b active 600b weights 300b engrams
with any chinese logo you want.
>>
>>108092588
I just want more unslopped and unpozzed models that will enthusiastically follow whatever I throw at them without much wrangling. Active parameters matter but so do good datasets.
>>
File: file.png (6 KB, 384x72)
The vramlet situation on reddit is even worse than here.
>>
>>108092957
A350M and a regex is enough to pass the touring test on reddit
>>
>>108092858
>any chinese logo you want
I want new Yi model
>>
>>108093018
tourist test
>>
I just want Gemma 4 30BA3B
I want my blazing fast token gen version of Gemma
>>
>>108092858
Can I pretty please get 12A/75B/25E instead?
>>
>>108093107
Ok, but it'll be multimodal too.
>>
>>108093142
only if it's gemma, it also needs audio and video input btw and also llama implementation
>>
>>108093149
>and also llama implementation
No deal.
>>
I just want reasoning trinity
>>
>>108093142
Native multimodal is great; how else is a model supposed to learn relative body positions? No, they're not throwing tons of erp logs into the training data.
>>
>Let me draft: [writes 3000 tokens in thinking]
>Wait, {{user}} said [barely relevant thing] so let me retry: [Writes a slightly changed version still in thinking]
I love shitty chink reasoning models. None of the good western models do this trash.
>>
>>108093256
>None of the good western models do this trash
the western models don't show their reasoning, only a fake summary, so you wouldn't know
gpt-oss, the only open western reasoning model (lol, no gemma reasoning), doesn't do better on that front either.
>>
>>108093256
I had step do that and I stopped the attempt at 4000 tokens because I knew it was a death spiral and it wouldn't get out of it. One of the reasons I don't use step.
>>
ded hobby
>>
I seriously hope the Pony Alpha isn't actually the flagship GLM5 or else we're truly in for an era of stagnation. This thing all-around performs like another flavor of GLM4.6/4.7.
I'd take it as a 4.8 but it's definitely not "next-gen" in terms of performance.
>>
>>108093831
Why did people call it 5 anyway, as opposed to 4.8?
>>
https://github.com/ggml-org/llama.cpp/pull/19435
>>
>>108093863
There were reports that Z.ai is trying to push out GLM5 before the chinese holidays hit based on some chink interview that allegedly talked to them.
>>
>>108093882
Also based on a twitter post by a GLM employee.
>>
File: 1767459194082060.png (24 KB, 879x206)
>>108093867
VIBECODING BROS
IT'S HAPPENING
>>
>>108093831
I've tested it for a bit as well, and I've firmly concluded that it's quite dumb and sloppy. At least it was free.
>>
>>108091830
>IK is copying PRs from upstream without credit so I think he's just projecting.
We knew he was mentally ill since the drama started. The thread schizo just uses him as a weapon to attack CUDA dev.
>>
>>108093867
>I've gotten a bit tired of Llama.cpp missing all the zero-day releases,
>so I will shit out more vibecoding
what a fucking prick
there is a cost in that anyone who would have been interested in writing a proper implementation will fuck off because there's already one in llama.cpp
has he never heard of aesop's The Tortoise and the Hare?
this hobby will die because noise from slop spammers will drown signal
>>
personal assistant user group.
>>
>ggerganov's local model of choice is GLM 4.7... Flash
oof
>>
Is 4.7 flash usable for erp or do I need derestrict/ablit?
>>
>>108094158
It's not and no amount of abliteration will help. If that's your target size use nemo.
>>
>>108093867
I might as well use this opportunity to check out/evaluate agent shit myself. They mention OpenCode. Is that the SOTA local framework currently?
>>
>>108094194
It's been a while since I tried opencode but when I did I quickly gave up and stuck to claude code with local endpoints instead.
>>
File: ce.png (4 KB, 343x67)
Am I getting snakeoiled?
>>
>>108094216
no.
>>
>>108094216
No, but the difference is small as is the size difference. As always, just get the max quant you can fit with your amount of desired context.
>>
Updated mmproj files have been released today by AesSedai to work with his updated PR: https://huggingface.co/AesSedai/Kimi-K2.5-GGUF/tree/main
>>
is stepfun worth downloading if I want a different flavor from glm?
>>
>>108094276
What changed?
>>
>>108094279
I tried to use it normally and it kept thinking forever.
>>
>>108094276
man K2.5 complaining all the time about not being able to generate nsfw image descriptions is annoying
it even complains that there is a system message and it's a "classic jailbreak" technique
>>
>>108094279
If you mean full 350B then nope sorry. It is noticeably more retarded.
>>
>>108094322
Your temperature? Thinking models often like it around Temp 1.0 to not get stuck in "Wait..." loops.
>>
>>108093867
I can't wait for llama.cpp to turn into an unmaintainable mess like Automatic1111 and be abruptly abandoned.
>>
what does this mean in the recommended models guide about glm 4.5 air?
>Needs a prefill to get around refusals.
>>
File: file.png (145 KB, 1479x884)
>>108094216
Quanting embeds and outputs is insane.
>>
>>108094298
Not sure, you can take a look here, looks like just fixes: https://github.com/ggml-org/llama.cpp/pull/19170/commits
>>
>>108094391
>not ppl
>>
>>108094361
rep pen.
>>
>>108094411
Go away, John.
>>
>>108088802
>NoLiMa
based on my model-agnostic test (12k tokens) that's a bit harder than nolima, mistral large 2 is slightly better than l3.3 70b. honestly adobe's numbers table might be invalid since they used l3.3 for "data curation", so there's bias.
l3.3 did some really stupid shit like saying 522k was below 500k. however, both models are good.
>>
>>108094445
>mistral large 2 is slightly better than l3.3 70b
Anon it is current year.
>>
>>108094445
2024 called, they want their models back.
>>
>>108094391
>quants your attention layers
>ruins the model's ffns
>slaps on a benchmaxxed imatrix on your already benchmaxxed moesissy model
the absolute state of local
>>
is qwen3-coder-next worth a damn?
>>
>>108093867
CISC is NOT happy https://github.com/ggml-org/llama.cpp/pull/19435#pullrequestreview-3770150593
>>
>>108094463
>>108094464
>2024
iykyk, not bothering to explain why.
>>
File: dissapoint.png (421 KB, 882x887)
>>108094577
They claim to have released a model heavily trained for agentic use.

It still spits out XML instead of JSON on the next turn, thus failing its agentic purpose completely.

In short: you can have a single mathematical operation or two to be executed by agents. Then it fails to call an agent properly on the intermediate results.

For example, (123 + 345) * (456 - 789)

you can have the addition and the subtraction done by the agents, but not the consequential multiplication

This sucks
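
For anyone who wants to reproduce this: a minimal sketch of the chained tool-call test against an OpenAI-compatible endpoint, with the URL, model name, and the toy calc tool as placeholder assumptions. A model actually trained for agentic use should chain three tool calls here instead of dumping XML into the content field on the later turns.

[code]
import json
import requests

URL = "http://localhost:8080/v1/chat/completions"  # assumed local endpoint
TOOLS = [{
    "type": "function",
    "function": {
        "name": "calc",
        "description": "Apply one arithmetic op to two numbers.",
        "parameters": {
            "type": "object",
            "properties": {
                "op": {"type": "string", "enum": ["add", "sub", "mul"]},
                "a": {"type": "number"},
                "b": {"type": "number"},
            },
            "required": ["op", "a", "b"],
        },
    },
}]

def calc(op, a, b):
    return {"add": a + b, "sub": a - b, "mul": a * b}[op]

messages = [{"role": "user", "content":
             "Use the calc tool for every step: (123 + 345) * (456 - 789)"}]

for _ in range(8):  # give the model a few turns to chain the calls
    msg = requests.post(URL, json={"model": "local", "messages": messages,
                                   "tools": TOOLS}).json()["choices"][0]["message"]
    messages.append(msg)
    if not msg.get("tool_calls"):  # the failure mode above: it answers (or
        print(msg["content"])     # emits XML) instead of the chained call
        break
    for tc in msg["tool_calls"]:
        args = json.loads(tc["function"]["arguments"])
        messages.append({"role": "tool", "tool_call_id": tc["id"],
                         "content": str(calc(**args))})
[/code]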
>>
>>108094657
but is it good at programming or am I better off using free chatgpt
>>
>>108094581
human beings with a pulse will never be happy with vibegarbage
LLM agentfaggotry is just pure slop and I can't wait to see the industry (of software development, I don't mean LLMs, they remain useful and are here to stay) collapse from its adoption of retarded agent coders
use them as rubberducks, as second pairs of eyes etc, but stop trying to push the whole "it's going to actually do things by itself" bs
>>
>>108094825
>collapse from its adoption of retarded agent coders
If it didn't collapse from the decades of offshoring, I don't see this doing it either. There will, however, be lots of money to be made rewriting and replacing the messes created by both.
>>
>>108094859
>If it didn't collapse from the decades of offshoring
Which still relied on real humans with at least SOME coding knowledge.
>>
>>108094751
>better off using free chatgpt

Did you ever use chatgpt for programming? Why are you even asking?

I stay with deepseek for this
>>
>look to the left
>its all 12bs
>look to the right
>its all 688bs
>what the fuck happened to the middle
>>
>>108094988
You are either a vramlet or you aren't.
>>
>>108094988
The middle are 300B moes what do you mean
>>
>>108094988
Most MoE models will use only 5-15 GB of VRAM, provided you have enough conventional RAM to load them.
>>
>>108094988
There's a smoother gradient of model weights now than there has ever been in the past, mostly thanks to Qwen filling out every conceivable scale below the top end. You can even pick between dense and moe versions of the same midrange sizes.
>>
>>108094988
small enough for people to use
smart enough to be unsafe
so the middle has been discontinued
>>
Has /g/ accepted yet that local models are not for the impoverished?
>>
>>108094988
Step-3.5-Flash is 200B. You have a couple of 100B MoEs. That's the middle.
>>
>>108095166
Those count as 12bs.
>>
>>108095069
I'm talking more around the 70-150b range.
Something that could reasonably fit in a single workstation GPU yet isn't completely braindead.
>>
>>108095193
Cohere, devstral. MoE is a joke, proven time and time again.
>>
>>108095311
this unironically
>>
>>108095302
I assume you're talking about the 123b, not the 24b, because the 24b is retarded, and noticeably worse than Gemma-3 27b in all ways.
>>
>>108095373
>the 70-150b range.
No, he clearly meant the 24b.
>>
>>108095364
Everyone does that. Even the vramlets on reddit. They think a 30b is some giant hard to train or run model.
>>
>>108095302
>MoE is a joke
GLMlet cope
>>
>>108094988
Gemma 4 soon sir
>>
>>108095412
GLMlet cope? Is that what you said, anon?
>>
>>108095433
Yes, see: >>108095412
>>
File: file.png (10 KB, 316x316)
After using GLM for half a year now I officially announce that z.ai saved local.
>>
>>108095483
>this supposedly about local ad was sponsored by NovelAI
>>
File: 1762984059855286.png (325 KB, 2076x1033)
Kimi k2.5 internal safety alignment was done with a sledgehammer lol.
>>
>>108095483
stepfun was more lively than glm.
>>
>>108095163
A common tactic of the narcissistic and greedy is to price the lower and middle class out of product availability. Such an act by the wealthy is purely out of malice, almost assuredly, with no goal in mind other than forced destitution. Any consumer with their best interest in mind would not, and should not, fellate the cock of their rapist.
>>
>>108095506
You're using something below Q4, those quants are extremely broken and do this shit with normal prompts too.
>>
>>108095542
I'm using IQ3_S, I didn't expect it to be that awful.
>>
>>108095516
Yes it is sovl when compared to GLM. In the standard 4chan definition of the word.
>>
>>108095565
nta, but also unsloth quants are broken as well (as usual)
note also that the model was post-trained with QAT at int4, so it's basically already made to be used at q4; not sure if that makes it worse or better when quanted
>>
>>108095494
Everyone shat on the nai subscription compared to official api pricing.
>>
>>108095597
Per IK, unsloth is basically making dago dazzlers and lying to us about the quality of their models. Personally their quants seemed "ok" but I don't got time/space to do KLD to find the "ideal". It's def something.
>>
File: image.png (125 KB, 1261x952)
>>108095565
Yeah it was the same experience for me, older Kimi and other models didn't get this bad at these quants but something fucked up on 2.5, or just the way people have quanted it so far. It doesn't always fuck up and it will sometimes catch itself and sometimes just let errors accumulate/start looping. Do you notice that one of the first few tokens is often something weird and then it recovers? For me it tries to start its thinking block with "The user [...]" but will say shit like "The maskeduser" or "The Koreanraster" or "TheXDude" (kek).

Here's one case where it accidentally said "root circle" instead of "user" and then made up a whole nonsensical symbol hierarchy to try to make sense of it.

It's smart a lot of the time even at the broken low quants but I ended up moving up to Q4 and taking the speed hit because of the unreliability.
>>
>>108095506
>alpaca
>.assistant
pure moe kino, imagine wasting your time on this shit LMFAO.
>>
>>108095697
bad bot
>>
File: waterfox_lcJFl0EIl9.png (6 KB, 127x40)
>>108095506
>>
>>108095743
MoE was a disaster tho. All your models are "100b" and they act like that shit is any good. People started running it kinda ok on DDR5 so they just jacked up the price. For every kimi/deepseek there are a ton of these failed micro moeshits.
>>
>>108095790
moejeets don't have standards
>>
>108095697
>108095790
wtf does this have to do with the original post
also alpaca and .assistant were used with and seen on old DENSE models
>>
>>108095790
>People started running it kinda ok on DDR5 so they just jacked up the price.
you are deluded
DDR5 is going up in price because factories are building more HBM for datacenters.
it has nothing to do with a handful of retarded cpu maxxers buying DDR5. Factories that can build DDR5 can also build HBM, and that's what they'll do (which is also the reason why, unlike with the crypto bubble popping, there won't be a rush of cheap hardware for you when the ai bubble pops: these are server parts and server gpus, and even if they flooded the used market at some point you wouldn't know what to do with them).
https://www.mooreslawisdead.com/post/sam-altman-s-dirty-dram-deal
>On October 1st OpenAI signed two simultaneous deals with Samsung and SK Hynix for 40% of the worlds DRAM supply.
Just by themselves openai could be said to even be the main culprit of this situation.
>>
>>108095849
/aids/ is dying so the resident schizos chose other generals to shit on
>>
>>108095628
>lying to us about the quality of their models.
I'm skeptical of taking IK's word on anything, but I've also had a lot of issues with unsloth so I don't doubt it. If there is actual evidence that unsloth's dynamic quants are worse than anything else though, I would love to see it. In the meantime I guess I'll just use bart's quants.
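
The measurement itself is simple once you have per-position logits from a reference run and a quant run over the same tokens; a sketch of the KLD number people post (how you dump the logits is up to you; iirc llama.cpp's perplexity tool also has a --kl-divergence mode against saved base logits). Lower is better; near zero means the quant's token distribution is basically indistinguishable from the reference.

[code]
import numpy as np

def mean_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean KL(P_ref || P_quant); both arrays are [n_positions, vocab]."""
    def log_softmax(x):
        x = x - x.max(axis=-1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=-1, keepdims=True))
    log_p = log_softmax(ref_logits.astype(np.float64))
    log_q = log_softmax(quant_logits.astype(np.float64))
    return float((np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean())
[/code]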
>>
>>108095906
there were kld tests posted recently in these threads that showed unsloth as meh
>>
That MiniCPM model comes with tts and everything already baked in? Or do I still need to supply my own?
>>
>>108095597
>>108095628
Well in that case it's not his quant, it's the one from AesSedai, so I believe something is either wrong with the model itself or the way people quant it.

>>108095661
Yeah I can confirm that, it's super weird, for me it loves using random Chinese words and "issei" "alp" for some reason, and when I look at the token alternatives, it's all nonsense. I don't know what makes it behave like that.
It's even worse when I feed it an nsfw image: it goes into a gptoss-like rant about how it's unsafe or whether the drawings are minors or whatever, then it loops.
>>
>>108096044
OK, I can confirm: I tested a higher quant, and the IQ3_S from AesSedai is broken for me.
Maybe all quants below Q4 are too braindead, at least for describing images; it's night and day.
>>
>>108096173
our boy garm has q4_x which is basically 1:1 tensor size from the original files, can't get any closer to that than this
>>
>>108096173
VLM portion can't be below Q8
>>
>>108096203
I was hoping for something around 450GB max, the Q4 is like 600GB if I remember right. And "smarter" quants from ubergarm are unavailable on llama.cpp, while using ik means no vision capabilities.

>>108096239
I use BF16 on the mmproj, it's a small file anyway.
>>
even Q4 is a fucking cope
do some greedy decoding and run the full gamut of quants (rent some cloud compute if you can't test Q8 and f16 locally for your model of choice) and realize that all these years of being told "it's almost indistinguishable from the real thing" were lies
>>
>>108096268
4-bit is full quality for Kimi though.
>>
>>108096258
if I remember correctly q4_x doesn't actually use any ik features so this one actually can run on mainline
>>
>>108096288
someone needs to vibe the vision into IK. It supports quite a few others. Kimi too rich for my blood, I should have bought 1T instead of cucking.
>>
>>108095483
How am I supposed to take this post seriously if you don't mention your ego death?
>>
>>108096268
I member testing nemo at full precision. That is what made me stop listening to schizos like you.
>>
>>108096335
I have you as my spokesman for that.
>>
>>108096342
Nemo is dumb at any precision. You can't lobotomize the pre-lobotomized.
>>
>>108096342
qwen235 on api occasionally knew gawr gura wasn't asmongold in shark form. 1/3 rolls. That's the extent of the difference from q8 to q4.
>>
>>108096366
>Asks about Loli vtuber
That explains a lot
>>
Quanting does reduce performance relative to full precision, but if you can run a Q4 of a model 4x bigger than you can run an F16, then it will be better in every way. The only time you want to run F16 is if resources are literally unlimited or there is no superior model at the proper size to use most of your capacity at Q4.
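The arithmetic behind that claim, for anyone sanity-checking (weights only, ignoring KV cache and runtime overhead; ~4.5 bits/weight is a rough Q4_K-ish figure):

[code]
def weight_gb(params_billions: float, bits_per_weight: float) -> float:
    # memory footprint of the weights alone, in GB
    return params_billions * bits_per_weight / 8

print(weight_gb(70, 16.0))   # 70B at F16   -> 140.0 GB
print(weight_gb(280, 4.5))   # 4x bigger Q4 -> 157.5 GB, a similar footprint
[/code]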
>>
>>108096380
how many popular vtubers aren't loli or loli adjacent?
>>
>watching vtumors at all
wat
>>
>>108096425
None, but feminist brainwashing has convinced retards that's a bad thing.
>>
>>108096430
gura literally pushing 30 is even funnier.
>>
>>108096430
Maybe feminists go too far but fantasizing about fucking literal children as a grown ass man is a bad thing, yes.
>>
>>108096460
>literal children
It's a 2d anime anthropomorphized shark avatar.
>>
Hi guys, excuse my ignorance but I have a question: what is the best LLM to run on an rtx 4070 for the purpose of aiding with pentesting? Specifically explaining vulnerabilities, generating payloads, etc.?
>>
>>108095661
>It's smart a lot of the time even at the broken low quants but I ended up moving up to Q4 and taking the speed hit because of the unreliability.
how big is q4?
>>
>>108096460
Next gen is saying 25 and 18 is a moe lester. Meanwhile all our business, government, and institution leaders are making veiled references to children-children in their emails.
>>
>>108095880
>Buy up nearly half the world's supply of something while running on an endless streak of multi-billion-dollar losses, driving up the price of said thing to inaccessible levels for people who actually have money to spend on said thing.
I hate this world so God damn much.
>>
>>108096495
All an IOU at this point too. Money has not exchanged hands. The promise of buying it all is what's making it happen.
But nah, it's not a fake-out to deprive people of resources. That's crazy talk.
>>
>>108096480
>What is the best LLM to run on an rtx 4070 for the purpose of aiding with pentesting? Specifically explaining vulnerabilities, generating payloads, etc.?
my little jeet, even the SOTA online models are overhyped and underdelivering in their ability to do such things (curl's maintainer had to ban all LLM users from the security field because you guys behave like niggers), you won't be doing any of this locally and can forget about the LLMs aiding you in scamming old people
>>
>>108096535
lol I'm not too familiar with LLMs in all honesty, just frustrated with the insanely over-the-top restrictions when it comes to using public models.

I'm not looking for something I can just point and click and it will magically hack into things. I work in the field and just want something to make my life easier, whether it's understanding advanced exploits, helping me through ctfs, or coming up with viable POCs. You're saying this isn't possible?

I will go off and do my own research but since I saw the thread I thought i'd ask...
>>
>>108096590
It is technically possible, but not with a 4070. The models you can run on that puny thing would waste your time more than be of any actual help. If you insist, run a Q4 or less of Qwen3-Coder-30B and go from there. That is probably your best bet.
>>
>>108096638
Appreciated, thanks.
>>
>>108096590
this is what your kind ("security researcher" with LLMs) are unleashing onto the world:
https://arstechnica.com/security/2026/01/overrun-with-ai-slop-curl-scraps-bug-bounties-to-ensure-intact-mental-health/
an excerpt of the sort of slop :
https://gist.github.com/bagder/07f7581f6e3d78ef37dfbfc81fd1d1cd


