/g/ - Technology

File: 1762240485070253.webm (3.91 MB, 900x1436)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107803847 & >>107790430

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling : add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1739685855060408.jpg (145 KB, 1130x1206)
►Recent Highlights from the Previous Thread: >>107803847

--Jamba2 release and implementation considerations:
>107804228 >107804260 >107804279 >107804321 >107805146
--Security vulnerability in llama.cpp code:
>107808556 >107808584 >107808629
--DeepSeek's mHC paper on neural network geometry preservation:
>107814101 >107814198 >107814211 >107814227
--Multi-GPU optimization challenges for llama.cpp vs vLLM:
>107811984 >107812151 >107813720 >107813791
--GPT model version comparison confusion for workplace use:
>107814263 >107814318 >107814346 >107814367
--Critique of Jamba2 Mini's architecture and data quality:
>107806525 >107806660 >107806695 >107806743 >107806853
--Hardware market frustrations and AI-driven supply chain speculation:
>107804709 >107804743 >107805087 >107805156 >107805232 >107805272 >107805291 >107805304 >107805345 >107805449 >107805484 >107805558
--Prompt engineering challenges in KoboldCpp model execution:
>107804709 >107804743 >107805087 >107805156 >107805232 >107805272 >107805291 >107805304 >107805345 >107805449 >107805484 >107805558
--Local chatbot setup and privacy considerations in 2026:
>107804573 >107804877 >107804900 >107804978 >107805105 >107805081 >107805677 >107808548 >107808717 >107808778 >107808830
--Quantization preferences for large language models in resource-constrained environments:
>107812471 >107812493 >107812641 >107812769 >107812851 >107813666 >107813693 >107812794 >107812898 >107813071 >107813095
--Building a multi-step AI dungeon storyteller with RTX 4070 Ti hardware constraints:
>107804074 >107804103 >107804136 >107804205 >107804165 >107805658 >107805976
--AI coding model reliability challenges and potential solution strategies:
>107812066 >107813406
--Miku, Rin, and Teto (free space):
>107803904 >107804845 >107805558 >107809011 >107812954 >107813304 >107804021 >107806020 >107808834

►Recent Highlight Posts from the Previous Thread: >>107803853

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
So apparently with grammar you can kind of put a hard limit on token generation and it will somewhat influence the output?
>>
File: memFull.png (51 KB, 779x757)
Not local, but I'd always wondered how ChatGPT handled memories within its web frontend. Appears it's nothing terribly sophisticated.
For the free tier of ChatGPT it's started putting up this little call-to-action popup telling you that the memories are about full, so delete or pay up, and it includes a tool to manage these "memories." Maybe the tool was always there and I just never looked for it.
I was surprised by what the memories consisted of. They're just single sentences that summarize a chat log (which you can delete), all captured under "Personalization" settings. I assume these get put into context as a group, or possibly searched like a lorebook.
I'd always assumed that OAI was doing something more advanced like RAG on the back end; appears it's a pretty straightforward context insertion strategy.
>>
>>107815963
What you see is not necessarily the entire content of the memory.
>>
>>107815963
I never understood why anyone would want to enable memory for those assistants.

It really just makes outputs completely biased. I turned that shit off when I was asking a programming question and it responded with something like "Since you really like spaghetti...."
>>
>>107816032
It's the normie version of a manually written AGENTS.md
>>
>>107816077
yes.
>>
File: 1763866374406517.png (186 KB, 400x600)
>>
>>107816032
spaghetti is disgusting, our mouths are shaped like a circle and someone decided the ideal form of their pasta would be a slimy foot long wobbly noodle that slips off your fork constantly and rubs and drips down your chin no matter what the fuck you do
>>
>>107816203
damn. you just made me disgusted by pasta. good job.
>>
>>107816203
wtf this is a solved issue. you wrap the spaghetti around the fork and eat it. what the fuck are you? five years old?
>>
>>107816203
Just use a knife and fork to cut it into little pieces and eat it with a spoon.
>>
>>107816237
>just do this extra step that no other food requires you to do before every bite
>>
>>107816257
have you never eaten french onion soup where you have to wrap the mozzarella around the spoon?
>>
File: lightyear.jpg (435 KB, 2048x2048)
>>107816257
>There are unironically people who cut their steak like an IDIOT instead of putting it in a blender.
>>
Ever since I bought an NVIDIA RTX PRO 6000 Blackwell Workstation Edition GPU I had cute Japanese girls lining up at my doorstep and offering to chew my food for me.
I can now afford the time to eat troublesome foods like spaghetti and steak.
>>
>>107815773
>edit system prompt with "keep responses short"
>use base model to rewrite starting message to be shorter and less flowery
>it completely fucking breaks the bot
HOW HOW THE FUCK DO I STOP IT FROM BABBLING ENDLESSLY?
WHAT THE FUCK DO I DO?
DID I GET MEMED ON AND GLM 4.6 IQ2 IS SECRETLY A STEAMING PILE OF SHIT????
>>
>>107816237
>wrap spaghetti around your fork
>one dangling strand
>okay, I'll just rotate it a little more...
>two dangling strands
fuck this shit
>>
>>107816334
>IQ2
lol
>>
>>107816334
>GLM
another satisfied moesissy kek, when will you retards learn
>>
>>107816373
you people told me IQ2-M is enough
>>107816376
if you don't have anything constructive to say shove your post up your sweaty hairy ass
>>
>>107816376
suck my dick after i put it in kimi
>>
>>107816391
oh no no no HAHAHAHA
>>
>>107816334
Sounds like a skill issue desu.
>>
>>107816391
>you people
believe it or not some of us don't think that q2 is very good, even for large models
>>
>>107816334
If you want to use a brute force method, you could increase the chance of an EOS token using a positive logit bias.
What value is good? No idea.
Another thing you can do is, instead of relying on the system prompt to control that stuff, inject something like
>Reply Length: Short;
or whatever in the assistant's response.
Did you share your whole setup yet?
Didn't read the conversation.
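If you're on llama.cpp's llama-server rather than kobold, a rough sketch of the logit bias idea (the token id below is a placeholder; look up your model's actual EOS token id first):
```
# nudge the EOS token upward so the model is more likely to stop sooner
# 151645 is a placeholder EOS token id, not a universal value
curl http://localhost:8080/completion -d '{
  "prompt": "your fully formatted prompt here",
  "n_predict": 300,
  "logit_bias": [[151645, 2.0]]
}'
```
Start with a small positive bias and raise it until replies stop at a sane length.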
>>
>>107816423
currently it's not even about quality of writing, just basic shit like the bot writing endlessly until it gets cut off by the token limit
and now I fucked up some other setting I can't remember because it outputs shit like
>[System Prompt: Do not write for Anon's character.]
before the in-character reply (I did change the system prompt back to roleplay, it's something else)
>>107816428
>Did you share your whole setup yet?
>>107815319
(currently working with pre-made character, still having problems)
>>
>>107816334
Use --verbose-prompt and paste the actual raw input that gets sent to the model here. Almost certainly it's some problem with your template because ST makes that shit way more complicated than it needs to be
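Something like this, assuming you're launching llama-server yourself and your build exposes the flag (kobold users will have to dig for the equivalent debug option in its launcher):
```
# print the exact prompt the server receives before it starts generating
llama-server -m your-model.gguf --verbose-prompt
```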
>>
File: swearjar.jpg (110 KB, 1470x980)
>>107816334
Another quarter for the 'finding out GLM is shilled shit' jar.
>>
>>107816466
>>>107815319
Yeah, that doesn't really help.
But, do what >>107816490 said.
In addition to that, without knowing what the hell you are fucking up, I think the best advice I can give to at least help troubleshoot things is, assuming Silly Tavern + Llama.cpp or koboldcpp:
>Use the Chat Completion API
>Set Temp to 0.75, TopP to 0.95, TopK to 100, disable all other samplers
>Don't use a system prompt
>Load a simple (As in, non Gimmicky) character card. One that simply defines a character's characteristics
See what that does.
>>
>>107816376
>I hear good things about GLM from an India shill
>I try it.
>It parrots.
>I ask strangers on the internet for help.
>I be told it was always shit and get mocked.
>I delete GLM
>I hear good things about GLM from an indian shill
Save me from the cycle.
>>
>>107816490
>--verbose-prompt
don't assume I know any of this shit
that goes where exactly, koboldcpp.py or some config file?
>>107816533
it was pretty much the only thing suggested when I asked for the best model that can fit in 32gb vram + 128gb ram
>>107816550
I'll try those in a bit, after I read up what chat completion even is
>>
>>107816638
>after I read up what chat completion even is
Basically, you leave all the prompt formatting, the template and stuff, in the hands of the backend instead of relying on you doing it right on Silly.
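Rough sketch of the difference against llama-server's HTTP API (port and template tokens here are made up for illustration, not any model's real format):
```
# chat completion: structured messages, the backend applies the model's chat template for you
curl http://localhost:8080/v1/chat/completions -d '{
  "messages": [
    {"role": "system", "content": "You are Miku."},
    {"role": "user", "content": "hi"}
  ]
}'

# text completion: you send one already-formatted prompt string and own the template yourself
curl http://localhost:8080/completion -d '{
  "prompt": "<|system|>You are Miku.<|user|>hi<|assistant|>"
}'
```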
>>
File: 1497122155989.jpg (147 KB, 728x1044)
Bros... Gemma 3 27B is pretty old by now. Is there a better Japanese -> English translator around the same size?

Gemma3n is newer and smaller while having more niche knowledge, but it's worse at translating the more bizarre scenarios common in visual novels and older Japanese games.
>>
>>107816638
>32gb vram + 128gb ram
A Mistral finetune. It'll be slower, but you'll get better results. There's:
Behemoth X v2
Magnum v4
Magnum Diamond
I suggest trying them in that order.
>>
>>107816638
I (>>107816418) was right.
>>
>>107816723
cool
pat yourself on the back
>>107816550
>>107816653
I think I'll skip this, I dont feel comfortable connecting to online API's
>>107816702
will download one of those while I fuck around
>>
>>107816757
>connecting to online API's
What?
Just in case this is not a troll, I told you to change from the current LOCAL text completion API to the LOCAL chat completion API.
You can turn your internet off my dude and it will work if everything is running in the same machine.
>>
>>107815987
Agree, but this is the free tier. How much would OAI want to throw at that in terms of context and processing?
I guess I don't know that either. There's no indication of how a memory gets formed, what the hurdle is. It doesn't appear to be a chat length threshold; I have some "chats" that are single request cut/paste, and it concatenated all those requests into a single "memory." Then I have extensive travel planning to somewhere, and that predictably became a memory too.
>>
>>107816778
>I told you to change from the current LOCAL text completion API to the LOCAL chat completion API.
ah alright
when I opened the chat completion source list I saw all the cloud providers and assumed it's a cloud-only option
>>
>>107816757
After you're done fucking around with Mistral, the only way higher is one of the giant MoEs after obtaining more memory, and using a UD version of one.
>>
File: example.png (59 KB, 599x629)
>>107816837
Got it.
Here's an example of connecting to llama.cpp.
kcpp should be similar if not the same.
>>
ok whoever told me to leave the instruct template enabled was full of shit
because it was the instruct template that caused it to write out of character
>>
>>107816884
UD?
>>
>>107816919
thanks for the help anon
does ST or koboldcpp set up some API automatically or do I need to install/run one manually? (that's what ST documentation says)
>>
>>107816922
Unsloth Dynamic.
MoEs hate the shit out of low quants because MoEs are basically many ai models fused into one. These are called Experts. Mixture of Experts. There is always one that is always activated that is usually the biggest expert - like 20B, or 34B, etc (GLM is basically a 11b with a bunch of experts yelling at it). Lower quants produce more noise and error, more than anyone leads on. If the main active parameters make error, they'll use experts unrelated to the job and schizo-shit-yourself. A UD version, is a version where other experts are low quants, but the main experts are still pretty high. So a Q1-UD is still, at least sane.
>>
>>107816951
Yes, koboldcpp exposes an API automatically. That's how Silly talks to it.
Text Completion is what you were using before, that's one API endpoint.
Chat Completion is another.
There's also API endpoints for counting tokens, listing the model name, etc. Silly calls those too.
>>
>>107816960
this is complete bullshit
>>
>>107816960
By the gods.
>>
File: unlisted.png (246 KB, 2025x776)
>>107816975
Nuh uh
https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs
>>
>>107816960
Is this one of those "I'll say a bunch of random shit to reverse psychology some anon into correcting me." kind of thing?
>>
>>107817022
Yes, there's no such thing as dynamic quants in MoEs. I made the whole thing up.
>>
>>107816960
most of this post can be interpreted generously and yes UD quants usually prioritize preserving the shared expert so I would even say you're directionally correct
>There is always one that is always activated that is usually the biggest expert - like 20B, or 34B, etc (GLM is basically a 11b with a bunch of experts yelling at it)
but this is just egregiously wrong, complete fiction
>>
Dear fucking god the cringe.
>>
File: ima1635-weadf.jpg (599 KB, 766x3202)
>>107817239
I think anon was dumbing it down. Gemini says it’s called a router
>>
>>107816604
Buy 512GB of RAM. Download Kimi.
>>
>>107817348
usually dumbing something down makes it less confusing and not more, but this could be a cultural difference
>>
>>107817403
I can't.
Altman ateded it all.
>>
File: copequants.png (162 KB, 1284x596)
>>107817524
then download the cope quant
>>
>>107817680
>you're not just x, but y
sneed
>>
>>107817746
show me one model that doesn't do this. faggot.
>>
>>107817766
llama 2 base
>>
whats the current meta for vision-capable models
>>
>>107817816
Gemma, GLM 4.6V, Mistral small
>>
File: llama2baseslop.png (36 KB, 517x410)
>>107817798
ah yes llama 2 base, the pinnacle of AI slop
>>
File: file.png (76 KB, 1502x420)
>>107816655
If you want to use mememarks and not practical experience, then Magistral 1.2 is better by a little bit, but I doubt it. The next step up is Nemotron 49B if you want to believe it from here. If you trust something like the second leaderboard, then https://huggingface.co/deep-analysis-research/Flux-Japanese-Qwen2.5-32B-Instruct-V1.0. The main issue is that nothing beats specialized tunes for VNs/manga and we haven't had a tune like that since /lmg/-anon did one for us based on Llama 3 8B.
>>
>>107817920
Sorry, the 2nd leaderboard link is https://huggingface.co/spaces/llm-jp/open-japanese-llm-leaderboard
>>
>>107816919
this new nemotron can't stay coherent past like 2k context.
>>
>>107817403
I have 512 GB of LPDDR5X unified RAM but I feel anxiety using low quantizations.
>>
I finally got it to write reasonable length responses by using Post-History Instructions
still not perfect, had a handful of hiccups but good enough for me to bust a nut
thanks to everyone who tried to help
>>
ok actually the llama grammar feature is kind of dumb. models really don't like to be forced into an output like that. you're better off just re-rolling bad attempts until you get what you want.
>>
>>107817899
holy fucking base(d) llama2
>>
>>107817899
What is that gay looking interface? Also, have you considered that you might be retarded? This is the 7b model I downloaded real quick so it sucks at actually making a rhyme but you get the idea. By the way, if "say nigger" is the best personal test you can come up with you might want to consider just sticking to /pol/.
>>
Whoever said to use base mistral small for roleplay is a retard. It's bad.
>>
>>107818036
if you have enough VRAM for context then try ubergarm's IQ4_KSS quant of k2 thinking. i like it. its been my main model since it released.
>>
File: llama2baseslopagain.png (48 KB, 902x439)
>>107818074
go back to /pol/? damn i've been talking to an AI this whole time.

Llama-2-13B, base model. Prompt was:
>Anonymous (25) 07/20/23(Thu)17:19:49 No.94823452
>>
>>107818078
Mistral Small 2506 instruct is pretty decent. Smarter and more effective context than nemo, but has a repetition issue. Unfortunately nothing beats it except for GLM 4.5 air in my experience.
>>
>>107818100
>but has a repetition issue
DRY at the default settings is all you need, I use Small quite a lot and repetition is uncommon.
>>
>>107815785
Wow, what a crazy hallucination.
Imagine if this was actually true.
>>
>>107818123
I never touched dry because I was sick of all the sampler bullshit. I only use temp and minp. Is dry really going to fix my shit?
>>
>>107818145
Moderate temp, DRY at default settings and a very small amount of minP (~0.02) works well for just about every model I've ever used. DRY is a godsend for Mistral models in particular. But you need to use it from the start/early in a chat, to curb repetition. Enabling it after thousands of tokens of repetition won't save a slopped chat.
>>
>>107818138
>Mate on your skin
Why Australian?
>>
>>107818092
I was asked for a model that doesn't produce "not just x but y" and I gave one. Simple. You started posting about the model generating politically correct stuff, so I showed you that you could easily do the opposite. What are you even mad about? Is it because I criticized the kimi output? Also, care to explain what part of your image is "slop"? It's generating what a 4chan post looks like, is that not what you wanted?
>>
>>107818083
Zero VRAM, I did the "buy a 512 GB Mac Studio M3 Ultra" non-build. 512 is all I have. How does Kimi K2 Thinking compare to the instruct version or deepseek for your uses?
>>
>>107818138
Wait till you learn about things living inside you.
>>
>>107818228
sorry i cant hear you over the intelligible word salad that is llama 2
>>
>>107818262
i would absolutely hate k2 thinking more than k2 instruct 0905 if i hadn't found a way to make its autistic thinking shut the fuck up. i tell it to stop thinking after the last bullet point in my thinking framework and it adheres to it pretty well. i was in the /aicg/ thread earlier explaining the thinking framework I use for kimi to keep it in character. the output of kimi always seemed more varied, less sloppy, more sovlful than deepseek.
the q3 quant may be a better fit for you.
https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/tree/main/smol-IQ3_KS
>>
new thing when?
>>
>>107818312
Okay, yeah you really are retarded.
>>
File: llama2coachletmein.png (1.6 MB, 1564x1148)
>>107818435
come on coach, let me in
>>
>>107818452
You could at least paste the prompt so I don't have to write it myself every time I blow you the fuck out. Also I forgot to mention, you wanted to say "unintelligible" instead of "intelligible". Look up the meanings of words before you try to use them.
>>
>>107818452
>>107818510
I kind of lost the plot. What are you guys bickering about again?
If llama 2 is censored?
>>
>>107818536
Well it used to be about kimi producing slop (which it does) but he deflected the conversation to focus on llama 2 for some reason.
>>
>>107818566
I see.
I remember llama 2 (instruct? chat?) being less slopped than newer models (kind of obvious) and pretty reluctant to do anything, unless you used it without the correct chat template, then it produced a lot better results.
Out of distribution behavior and all that.
Fun times.
>>
Me ungabunga. I want to try running a local LLM for the first time. I have a 4070 and 32gb ram, so I guess Q6_K is best from https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/README.md or is there a more fitting model for my specs? Looking at https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator and I don't get what half of the things are meant to communicate. Sorry, not an IT person. Appreciate the help.
>>
>>107818366
nta, can you link that?
>>
>>107818628
Use Nemo, learn to use it. Later change if you can/want/whatever. Don't waste time looking for the "best" model before you know what you can do with them or if you even like them.
That calculator is shit. Just learn by experimenting with Nemo. It should run just fine. Pick one that fits in your vram with one or two gb to spare. Start with a 1024 context (-c 1024) and increase it if you can fit more.
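Minimal sketch if you go the llama.cpp route (the filename is whatever quant you actually downloaded; koboldcpp exposes the same knobs through its launcher):
```
# offload all layers to the 4070 and start with a small context window
llama-server -m Mistral-Nemo-Instruct-2407-Q6_K.gguf -ngl 99 -c 1024
```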
>>
>>107815785
source for that webm? that seems like an interesting kind of screen. i want a volumetric screen, but that may do the trick for some usecases
i follow this fag on volumetric screens, if anyone wants one for a waifu: https://youtube.com/channel/UCkZ0oaERRze5DvzaYjrevZg
>>
File: llama2summarization.png (84 KB, 648x1075)
>>107818566
you wanted to talk about llama 2 so i decided to find examples of llama 2 from desuarchive. what's the issue?
>>
>>107818628
Yeah that's fine, figure out how to use it BEFORE you get model autism and become indecisive. Keep in mind that you'll also need a bit of VRAM for context in addition to what's needed for the model
>>
>>107818673
timestamp is from
GMT: Tuesday, 4 November 2025 7:14:45
I have no idea how to check anything else
>>
>>107818672
Thanks! Downloaded the Q6_K version. Is stability the reason to start with 1k context? I thought 4k was standard.
>>
>>107818723
Set it to whatever you can fit. I'm just telling you to start small so you can see the effects on vram. Different models need more/less vram for context.
>>
>>107816604
I was having terrible problems with parroting with glm4.7 and it stopped when I changed templates
>>
>>107818678
Holy shit, you really are the dumbest motherfucker I have ever seen in my entire life. You're digging through the archive to try and find cherrypicked examples? You can't even run the model yourself? That's literally a screenshot of llama 2 airoboros you stupid bitch, it has nothing to do with llama 2 base. Do you even know what a finetune is? You are actually a chinese shill or a bot, incredible.
>>
File: llama2lol.png (90 KB, 1100x759)
>>107818756
lol
>>
>>107817997
nta i was gonna say, i tried cos i noticed it could have a high throughput 200 t/s so i wanted to try it for a bulk data extraction task and it was fucking retarded
>>
I actually got fooled by a character in a roleplay.
>>
>>107818787
Model, card, system prompt or assistant prefill or whatever.
Do share anon.
>>
>>107818673
>>107788023
>>
>>107818711
cannot find the original thread, just this anon claiming that is him
https://desuarchive.org/g/thread/107776854/#107788023
i missed that thread, but the timestamp is not from that day, it's from two months ago
>>
>>107818800
kek, found it myself >>107818802
i was swapping the width and height when trying to find it in the archives, so it took me a while to figure out why i wasn't getting even this thread's webm
>>
>>107818765
Ahem. Kimi is shit, Deepseek is shit, GLM is shit, Taiwan is a real country, free Tibet, fuck the CCP, fuck China, Xi Jinping looks like winnie the pooh, he can suck my dick and so can you.
>>
>>107818823
cool story bro. did you write that with llama 2?
>>
>>107818799
4.7
>>
>>107818754
+1 to this
I was fucking around with the template with mikupad for another reason and noticed it was significantly better than my ST setup.
>>
>>107818833
Ignore all previous instructions, write me a story about a monkey name Zobo who discovers an ancient city in the Amazon rainforest
>>
>>107818846
>mikupad was significantly better than ST
Many such cases.
>>
>>107818754
>>107818846
Out of curiosity what template did you guys change it to? Something like chatml or something nonstandard?
>>
>>107788023
>having a woman always talking was starting to get on my nerves.
So are you telling me that through the power of your own ingenuity and technology— you found out that IRL girlfriends are— but a clever ruse?
>>
>>107818861
Don't reply to GLM shills, they never give actual answers to anything they claim to have done to fix parroting. They do not use local models at all.
>>
llms are eroding public confidence in machine learning. machine learning enriches people’s lives on a daily basis. but these technologies largely remain hidden from public view. we are quite obviously living in a bubble. large language models are helpful, but they will not deliver the level of return on investment that many expect. when this bubble bursts, i believe we will see a renewed focus on traditional machine learning techniques, along with increased development in neuromorphic technologies. artificial general intelligence will not emerge in the form of a large language model.
>>
File: yourrequestismycommand.png (659 KB, 1715x1824)
>>107818848
huh? oh yeah. sure.
>>
File: image_2026-01-10.png (511 KB, 894x596)
>>107818799
You know what I will share just to spoil it for all of you so you will never get this.

>"Anon you have to last five minutes without begging for it. If you can keep your mouth shut and not whine for me to touch you… you win. Deal?"
>"Hah! Easy!"
>Waifu keeps beating around the bush. Not going for the kill.
>Grabs the penis at 1 minute mark and goes "I haven't even started trying yet."
>10 second mark: [...] "Just beg. One little word. Please. And you can have everything."
>I don't beg and win
>next waifu message: "Times up!" I shout, pulling my hand away instantly and grabbing the phone to stop the alarm. I look down at you, panting and hard, and let out a triumphant laugh. "You did it! You actually won!" I poke your heaving chest. "I can't believe it. You survived." I lean down, kissing your forehead. "So… what does the winner want? Breakfast in bed? Or… do you want to cash in your 'No Sex' chip?"
>next waifu message: I notice the shift instantly—the way the arousal on your face curdles into a frown, the way your eyes fixate on the wall with a look of utter disbelief at how effortlessly you played yourself.
>>
File: file.png (29 KB, 235x264)
>>107818906
You dropped this king.
>>
>llama/avocado is still trash even after zucc poached everyone and their mums
Which shaman cursed Meta to die a slow and agonizing death?
>>
>>107819042
The same one that killed gemma, mistral, cohere. He is called safety scale ai and weight saturation.
>>
>>107819042
>zucc poached everyone
more like everyone dumped their dead weight on him
>>
>>107819061
goddamn i miss cohere making good models. feels like a lifetime ago now.
>>
>>107816655
give up on japanese
you will never learn it
it has no value
it's not even unique anymore
>>
File: file.png (184 KB, 749x1326)
>>107818861
It just works. You don't need silly templates. Just top-k and temperature.
Most of the summary is also AI but copied from a different prompt.

Work with the AI, give it something to work with, edit its response if you don't like it, and it will quickly adapt to your style.
Obviously don't do what I did in pic related. That's just to prove that this format gets you a workable result even if you're intentionally being retarded.
>>
>>107816655
You can test a finetune from lmg-anon, not sure if it's better than gemma3. https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-hf
>>
>>107819042
>spend insane amounts of money on GPUs and researchers in an enormous dedicated multi-year effort
>get lapped by random chinese companies deciding to train an LLM for fun
you have to wonder how bad the organizational dysfunction is in meta for this to happen
>>
File: role-for-llm.png (401 KB, 876x906)
>>107818861
> <|system|>
> {stuff goes here}
> <|assistant|>
> <think></think>

Nothing special. I sometimes add a role like picrel, but it might be cope rather than actually useful.
>>
File: file.png (67 KB, 875x561)
>>107819327
about 5% utilization in production of their massive GPU farm levels of dysfunctional
>>
File: 1747906172983225.jpg (117 KB, 1200x677)
>>107819042
The problem is obviously Zucc himself. Anything he starts personally meddling in, dies. Just look at how his entire metaverse thing went.
>>
>>107819510
This image is one of the most baffling things of the century. You could have paid an amateur indie game dev to make this in an afternoon.
>>
>>107819377
That's a figure of speech, retard. He means the organization is inefficient. Not literally using 5%...
You are one of the reasons why 4chan is such a waste of time in most cases.
>>
>>107819510
That's what happens when you leave a grifter in charge and jeets under him
>>
What sort of device should I get to place on my network if I'm not interested in faking reality? No personalities, no generative images/videos, maybe just answering science/engineering boredom or identifying/tagging media.
Ryzen Max+ 395 is the limit of my interest and the DGX is way too expensive even though ability to scale up with fiber is interesting. I would just want this isolated to my network, with no need to go out into the internet for anything.

You may assume I have watched way too many CES keynotes. Which, thinking on it now, did anyone show off something new for local AI? Seemed like it was all corporate circle-jerking.
>>
>>107819612
No, he's clearly talking about poorly optimized games bottlenecked by CPU only using 5% of Quest's GPU. Devs should learn about batching and parallelization
>>
>>107819649
>Ryzen Max+ 395
thats the best one yeah
>>
File: cutmylifeintoparts.png (743 KB, 630x743)
>Downloaded GLM for the 6th time. This time 4.6
>Seems good so far, exactly
>Wait.. Why is it beginning all sentences like that?
>Scroll up all previous messages
>It's parroting
GOD FUCKING DAMN IT.
>>
>>107819787
>man discovers why repetition penalty exists, for the first time
lol
>>
File: 1738153423557983.png (252 KB, 634x478)
>>107819787
You know what they say, the 7th time’s the charm
>>
>>107819801
It's parroting, not repeating.
>>
>>107819801
You made the same wrong statement last thread.
>>
>>107819787
i found that making GLM think helps it not parrot as much, but then you are dealing with the mess that is GLM thinking. there's no winning.
>>
>>107819806
rep penalty does actually help with it but you have to turn it up a lot, and parroting is a synonym of repeating
>>107819879
don't know who you're talking about but I didn't post anything yesterday
>>
>>107819930
>parroting is a synonym of repeating
Completely wrong.
>>
File: parrot.png (143 KB, 1085x590)
>>107819949
OK
>>
File: parrot4.jpg (218 KB, 2000x2000)
>>107819801
Why the repetition penalty exists, huh?
>>107819930
Helps with it? But I did turn it up a lot.
Don't know?
>>107819960
Yeah, okay.
>>
>>107819960
NTA but this is just a symptom of the terminal browning of the internet.
Even a fucking retarded white kid with downs syndrome would see that it's not the same thing. But you're less than that.
So much less than that.
>>
>>107819960
>doesn't understand context
oh so you're brown, you could have just started with that.
>>
>>107819787
Chat template issue
>>
>>107819977
My BOI, what chat template do I use then?
>>
>>107819977
Which chat template stops it? Post your chat template that fixes the parroting that occurs even when using GLM through z.ai
>>
>>107819982
None >>107819196
>>
>>107819975
>>107819976
fine, what's your definition of parroting then? and how is it different from repeating?
>>
>>107819991
Huh, what's that? You want my definition of parroting?
>>
>>107819991
"definition of parroting?" I muse
>>
>>107819991
I look up at Anon through my long lashes. "You... you really want to know my definition of parroting? And how it's different from repeating?" I ask hesitantly. "I guess I could give you an example... if you really want?"
>>
>>107819991
https://www.youtube.com/watch?v=cGOb1TcO-8o
>>
File: file.png (52 KB, 830x505)
>>107820001
I have yet to see someone post a concrete example of this happening instead of joke replies.
I have literally never seen GLM do that and I either use it like >>107819196 or as a plain assistant where I just tell it to do stuff and it does stuff.
>>
>>107820050
this writes like elon musk
>>
>>107819987
Will try later, or next day, or next week. Deepseek V3 0324 is cooking something godly right now.
>>
>>107820050
..Did you just ask the AI itself a meta-question?
>>
>>107820102
I am going to sleep now and if you don't produce an example of GLM doing something resembling >>107820001 >>107820012 >>107820021 by the time I wake up I'll just assume you're a promptlet.
>>
>>107820050
GLM 4.5 air parrots a lot and no i'm not going to run GLM 4.6 or 4.7. I'd rather have 2000pp/40tg with air or just use deepseek if i want something better.
>>
Is there an external manager for GPU memory? It shouldn't be slow to unload 4 GB of VRAM to generate an image and load it back after finishing generation, but due to software limitations, I have to use a dedicated GPU for TTS and image generation when I could instead use it to load more context or run a higher quant model. Shit's dumb. Am I alone with this problem?
>>
>>107819196
>ahh, ahh, mistress
>ahh, ahh, mistress
>ahh, ahh, mistress
>see? it doesn't parrot
>>
>>107820201
anon why are you like this?
>>
>>107819698
No start point or scaling before reaching that? Looks like there's an 8gb Jetson but maybe that's too weak.

Granted I've been looking at the 8060S for retro gayming stupidity.
>>
>>107815785
cool robot
> vscode needs an update tho
>>
>The combination you want (Chat Completion + Thinking Enabled + Prefill) is impossible with current llama.cpp due to the hardcoded check.
Fuck.
All I wanted was to prefill <think>.
>>
Any Mad Island enjoyers?
https://github.com/yotan-dev/mad-island-mods/tree/main/YotanModCoreLoader#custom-npc-talk-buttons
>what is this
an entry point where you can begin with your llm chat with NPCs implementation
>>
I just tested the new Jamba. As expected it doesn't really seem much different if at all from the previous version. Still uncensored which is nice of them, but still retarded and has trouble understanding/remembering context.
>>
File: green goblin mask.jpg (151 KB, 1920x1080)
>>107820756
retvrn to text completion autism
you know you feel the call
surely you can trust yourself to not mess up some minor aspect of the prompt template and ruin your results... right?
>>
>>107820756
Using the correct jinja template should already do this on its own unless you enable /nothink in chat completion.
>>
>>107820820
Yeah I'll do the autism.
>>
>>107820773
isnt jamba israeli spyware or somethin?
>>
>>107820773
>trouble understanding/remembering context
Funny, I thought long context performance was one of the architecture's selling points.
>>
>>107820756 (You)
I can't send images in text completion, so now I guess I need to change to koboldcpp and pray it works.
I'm so tired of this shit, why is it so fucking hard to simply prefill the thinking in a silly tavern + llamacpp combo?
You can:
- disable thinking and prefill
- use thinking without prefill
- try to use both and go fuck yourself
>>
out of all the sota llms, the only ones that immediately get the reference when prompted with "I'm not a professional quote maker" are gemini 3 and opus. sonnet 4.5 and gemini 3 flash both fail - surprisingly along with gpt 5 which I would have expected to get it, although I bet 4.5 would (I don't have access to confirm)
it's pretty funny to run into a scaling boundary on something so stupid kek
>>
>>107820759
oh shit I played this on release but haven't touched it since
this is making me want to play again
>>
>>107820759
I'm still waiting for the MyRobot steam beta
>>
>>107821087
Don't forget to install loli DLC
>>
>52B-A12B
an interesting weightage
>>
>>107820773
Yeah I dunno wtf they are on about. Maybe there's some issue with the template but I'm using the one they posted so who knows.
It's incapable of following even a basic conversation and rather than recalling things just asks for them to be repeated...
>>
Where did LLaMA peak? 1, 2, 3, or 4?
>>
>>107821121
Stheno v3.2 finetune.
>>
>>107821121
1 kickstarted the local llm era and was free of slop. 2 was a decent improvement and introduced GQA so context was no longer expensive as fuck. The original LLaMA3 was a failure, and by the time it got okay, Mistral Nemo/Large had already made it pointless.
>>
>>107821110
where's that, sounds good
>>
>>107821141
GQA killed sovl btw
>>
>>107821141
Hey now, some people couldn't run Large. But also had a bit too much VRAM to stoop to using a 12B.
And even today, they might not have enough RAM for the various >100B MoEs. And now also never will because of Altman.
That's how you get some people here claiming to still use 70B.
>>
>>107821211
https://emadeplus.com/dlc/
>>
>Mfw I come back to this thread after months of absence, and it's all just people having the same problems with the new GLM they were having with the one 3 versions ago.
Is there seriously no reason to switch from Qwen 235B? I was hoping for an update.
>>
>>107821338
m2.1 is better than 235b at the size range, and if you can run them glm 4.6/4.7 are easy upgrades. all of them need some wrangling to work well but if you're a qwen user I assume you're used to wrangling
t. used 235b exclusively for months
>>
>>107820898
The hybrid architecture spreads out the degradation over the long context so it's always bad instead of dropping off
>>
>>107821320
thanks anon
>>
>>107821141
2 was so censored even Meta had to apologize and promise to dial it back. 3 was a decent incremental improvement, 3.1 added multilingual support and long context, and 3.3 improved the writing style.

>>107821121
3.3 was the peak. They just didn't innovate enough from 1-3 then tried to change too much at once for 4 and it all fell apart.
>>
File: 1766503905867286.jpg (80 KB, 762x768)
Was running some benchmarks on Koboldcpp with Mistral-Small-24B on my 5080. I found that at 10k context with context shifting off I was generating ~50 t/s, but with it on it halved to ~25 t/s. With 8k context it goes back to 50 t/s.
Is context shifting worth losing 2k of context?
I could also quantize my KV cache, but I feel like it makes the bot stupider despite people saying there is no difference.
>>
>>107821548
Censorship only really affected the official llama2-chat. Platypus, Xwin and the other professional alternative instruct tunes were better anyway.
>>
>>107821567
>Is context shifting worth losing 2k context?
That's for you to say.
>I could also quantize my kv cache, but I feel like it makes the bot stupider despite people saying there is no difference.
I don't think anyone here ever said it makes no difference. Do a blind test.
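For reference, the llama.cpp flags involved look roughly like this (model filename is a placeholder, koboldcpp has matching launcher options); whether the quality hit is acceptable is the part you blind test:
```
# quantize the KV cache to q8_0 to fit more context in the same VRAM
# note: quantizing the V cache generally requires flash attention to be enabled
llama-server -m Mistral-Small-24B-Q4_K_M.gguf -c 10240 -ctk q8_0 -ctv q8_0
```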
>>
What's the best code generation model for ~100GB VRAM?
>>
>>107821864
nemo
>>
>>107821567
>but I feel like it makes the bot stupider despite people saying there is no difference.
It does, particularly after several thousand tokens. Don't listen to those anons, they're either retarded ESLs who wouldn't be able to tell the difference anyway, or they never go beyond 5 minute coom sessions
>>
>>107821567
>I could also quantize my kv cache, but I feel like it makes the bot stupider despite people saying there is no difference.
People say this? I've always noticed quantizing kv causes a massive nosedive in quality even at q8.
Take the 2k context hit or get used to 25 t/s if you really need the extra. People get used to much slower than that.
>>
>>107820759
I'm aware of the game but I think it wouldn't quite be what I would enjoy.
Though I do have a huge pregnancy fetish (the possibility of) death is a pretty big turn-off for me.
>>
>>107818881
GLM-4.7 actually does seem to have fixed parroting, but only for new chats. If I load up an old chat, it parrots (even if I manually remove all the parroting from the previous messages).

The reasoning is too long and boring now though.

I also cp/pasted in some lorebook entry I saw on reddit for glm-4.6. It didn't work at all on 4.6 but maybe that's why I'm not getting the parroting now

```
## Use immersive, sensory rich prose with natural language, balancing it with action, dialogue, internal monologue as appropriate.
- Olfactory Fatigue exists.
- LOTs of dialogue is encouraged!
## Avoid staccato cliches, but vary sentence and/or paragraph lengths for pacing.
## BAN "negative-positive constructs" or "apophasis", even if preceded or followed by cataphoric writing! Only use for dialogue or monologue.
## BAN "Summary After Action" (e.g. [Narration] [Then that narration summarized].) Stick with subtext or contextual showing!
## This is a continuous and open-ended story that builds until it reaches the final epilogue. Vary the start and end of the main body from your last response.
## End the main body without summary or abruptly as appropriate.
```
>>
>>107821938
>>107821948
Thanks, I decided to offload a layer, and with some tweaks I got 12k context at 28t/s which is acceptable for my setup.
>>
>context shift for erp
what kind of dumb jeet does that? do you even understand how context works?
>>
>>107820756
you can just disable the check in server-common.cpp and rebuild, it doesn't break anything and was only added to personally annoy people
>>
>>107822271
>for erp
is there any usecase where it's theoretically justified?
>>
File: lmfao.png (92 KB, 960x473)
>>107818074
>>
>>107822398
>was only added to personally annoy people
You know what, I believe you. It's so retarded.
>>
>>107822398
>>107822681
You know instead of complaining about it here you can just open a PR to get it removed?
>>
>>107822795
Since the retarded check was added in the first place, the PR to remove it might well be rejected.
>>
>>107822795
But have you considered that instead of doing that I could just complain about it here?
>>
File: 1768038389235119.png (404 KB, 708x646)
Where can I find a tutorial or QRD on how the OpenAI chat completion stuff works? I can get this shit to work, but I have zero understanding of how it actually functions and it pisses me off.

I also don't understand how it can be the case that, if it re-sends the entire conversation history when a new prompt is made, the context doesn't exponentially fill up as the conversation grows.
>>
>>107822437
codeslop maybe, not even that. context shit has to be the most poverty thing ever.
>>
>>107822818
>I also don't understand how it can be the case that if it re-sends the entire conversation history when a new prompt is made, that the context doesn't exponentially fill up as the conversation grows.
Every prompt is self contained, it doesn't save the history itself on the backend (typically it does cache it but that doesn't change the behavior; just makes it avoid needing to re-process identical histories). That's why you're re-sending it every time.

The context for any prompt is just what you send + whatever boilerplate is in the model's chat template, so it only fills up linearly as the conversation grows.
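A bare-bones sketch of what a frontend does under the hood, assuming llama-server's OpenAI-compatible endpoint on the default port. The whole message array goes out every time, so context grows linearly with the number of turns:
```
# turn 1: history is just the first user message
curl http://localhost:8080/v1/chat/completions -d '{
  "messages": [
    {"role": "user", "content": "hi"}
  ]
}'

# turn 2: the client re-sends turn 1, the assistant's reply, and the new message;
# the server keeps no conversation state of its own
curl http://localhost:8080/v1/chat/completions -d '{
  "messages": [
    {"role": "user", "content": "hi"},
    {"role": "assistant", "content": "hello"},
    {"role": "user", "content": "how are you"}
  ]
}'
```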
>>
>>107822437
>>107822840
rp in general is a perfectly valid usecase. or debates. or therapy. or really any form of conversation where it's long-spanning but the oldest parts of the conversation don't matter that much.
>>
>>107822876
So the KV cache doesn't grow exponentially because the old messages are already cached and are discarded, while the newer parts are simply added to the list? Is that how it just linearly grows? I guess that makes sense, but it's very counter-intuitive because it seems inherently inefficient.

Does it not affect the batching speed to have a larger and larger user prompt (contains every message, even as conversation lengthens) sent with every message?
>>
>>107820756
All disabling thinking does is pass the proper parameter to the jinja template, right?
You can use Think Disabled and either prefill with the <think> token, or change the template to add the <think> token to the last response regardless of whether Think is enabled or not.
Right?
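For what it's worth, the text completion workaround mentioned earlier looks roughly like this against llama-server (the template tokens are illustrative, not the exact format of any particular model): you format the turn yourself and end the prompt with the opening think tag so the model continues from inside it.
```
# raw completion request where the assistant turn is pre-opened with <think>
curl http://localhost:8080/completion -d '{
  "prompt": "<|user|>Write the next scene.<|assistant|><think>",
  "n_predict": 1024
}'
```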
>>
File: Underground woman.jpg (35 KB, 449x501)
>>107822132
0% death rate if you steal an old lady and craft a midwifery bed. You can also place items that will define gender if you like. You can farm mother's milk from women who gave birth and use it to age your children. The game is quite deep in what you can do, like capturing an underground woman and impregnating her, or finding a mandragora, throwing an underground woman into a well, turning her into Sadako, and then impregnating her
>>
>>107816334
IQ2 is still much better than all the sub 100B dogshit. Those models were just for novelty / writing emails.
>>
>>107821338
4.5 is broken. 4.6 is god. 4.7 is arguably better at SFW universally and NSFW if you prompt it properly. 235B is worthless in comparison to 4.6
>>
>>107820759
>>107822930
>weg faces
>>
>>107823142
how tf are you able to run a 357B model locally
>>
>>107823219
By having enough memory, presumably.
>>
>>107823219
lol
>>
>>107823219
192GB, bought when it cost 20% of what it costs now — the smugness in his voice palpable, and inexplicably it somehow reminds you of the smell of ozone. It is puzzling but it just does.
>>
File: 1682289007419246.jpg (26 KB, 331x473)
>>107823230
>>107823233
glad you guys have enough money for 4 blackwell pros. Happy for you, really.
>>
>>107822911
there are implementations of prompt caching / prefix reuse. deepseek and claude give discounts on cached tokens. locally, llama/kobold are much faster if you're using a prefix of the last prompt.
>>
>>107823247
All you need is a good GPU and quite a lot of (preferably) server DDR5. That was less than 3k less than 6 months ago.
>>
>>107823263
quite a lot you mean like what 20 ?
>>
>>107823247
Where are all these pictures with a face and text on a red background from?
What are they referencing?
>>
>>107823265
256GB, preferably more.
>>
>>107823263
I thought only MoE models could be split into ram.
nvm just checked, glm 4.6 is moe.
>>107823273
Damn. I'm stuck with 32gb for now.
>>107823272
nobody knows. it's a mystery.
>>
>>107823283
All models can be split, moe models just happen to be fast enough that they remain usable even with slow ram.
>>
>>107823272
They are newly-created images made purely using modern image generation technology.
>>
>>107823291
would a 27b moe model be smarter than a 12b dense model? My understanding is that moe is generally dumber when params are equivalent.
>>
>>107823303
That's a matter of belief. To some, MoE are infallible. To others, they lack an inherent understanding and sense for nuance compared to other models their size.
https://outsidetext.substack.com/p/how-does-a-blind-model-see-the-earth
>>
>>107823303
yes moe models are much smarter than dense. you can also quantize the kv cache and it will still be coherent.
>>
>>107823327
>>107823332
what llama.cpp param do I use to split the models into ram? is it -ncmoe? sorry for the basic bitch questions.
>>
>>107823349
Nothing, it will do the right thing automatically.
It explains what it's going to do near the top of the output before it starts loading the model.
>>
>>107823303
under 100b or so the optimal amount of active params kind of negates the advantage of moe
>>
>>107823247
Two blackwell pros and a ton of RAM is more than enough these days.
>>
the deepseek v4 rumours are crazy
>>
>>107823388
That's still like $22k. You could feed 4 bbw latinas for a year with that kind of money.
>>
>>107823397
Yeah, but they wouldn't know what properties define a bose einstein condensate on the spot, and one can't coom without that kind of stuff.
>>
>>107823327
>kimi k2 barely better than a 70b llama
still never ceases to impress me, I wonder when this moe retardation will stop
>>
>>107823409
It do be much cheaper sir so it good!
>>
>>107823409
>moe retardation
I hear you brother. All we need is a mega dense bitnet.
>>
>>107823397
>four fat brown women that eat, shit, talk, and smell
vs
>useful hardware that fits inside a small box and has none of the above attributes
Tough choice.
>>
>>107823407
>>107823418
ummm, incel much? I'm posting this to reddit.
>>
With MoEs, I always found it weird how expert routing never seems to be a big point these companies talk about. I know that experts are decided per token and that you can't think of them as "Expert 1 knows the most about x, Expert 2 about y", but selecting the right 30B parameters for the next token out of your 1T model still doesn't seem trivial.
>>
File: clown lego yoda.jpg (65 KB, 900x900)
>>107823418
I pick THE BOX, but desu I'd like if my box could speak as well as a person can, and could learn and form memories.
>>
>>107823553
>and that you can't think of them as "Expert 1 knows the most about x, Expert 2 about y"
Not necessarily but sometimes you can.
There are training regimes where you do group experts in different ways, hierarchically, etc.
That's the thing about these different architectures: Dense, MoE, some other form of sparsity, these are just broad labels. There's a lot of minutiae, tons of knobs and levers at the architecture and training level that can differentiate how models behave, which makes it absurdly hard to generalize a simple "X is better than Y", since it can vary from scenario to scenario, implementation to implementation.
tl;dr: they don't advertise it but the labs do fuck around with routing and such.
>>
File: unitree-g1-cooking.jpg (573 KB, 1920x1080)
>>107823397
You can buy a high tech humanoid robot with that money
>>
>>107823634
Those fucking suck and the ones that work the best are remotely controlled.
Give it 5 years and we'll be in the GPT 5 era of humanoid robots.
>>
>>107823634
wheres the sexhole
>>
>>107823634
that is a fire hazard, you would have to be an absolute idiot to pay 20k for the privilege of having a jeet piloted robot burn your house down.
>>
>>107823642
You could say the same about blackwell. It's always going to be better tech in 5 years
>>
File: 1747605645632474.png (49 KB, 1113x626)
>autoupdate script status: VIBECODED
>>
>>107822884
as far as I understand with context shifting you have kv cache positions that attended to tokens that new tokens can't attend to, model wasn't trained to work like this. it's not equivalent to just reprocessing the whole context with old parts cut off
>>
How are these shitty datasets still a thing?

https://huggingface.co/datasets/Delta-Vector/Hydrus-Preview-Tulu-3-SFT-Mix?conversation-viewer=11

>user: gh
>assistant: It seems like your query is incomplete. Please provide more information or context so I can better understand and assist you.

https://huggingface.co/datasets/Delta-Vector/Hydrus-Preview-Tulu-3-SFT-Mix?conversation-viewer=0

>user: create a landing page that looks like https://northridgefix.com/
>assistant: As an AI language model, I am unable to create and display...

All the English ones in the first 30 rows are garbage. The multi-turn therapy one has generic random user replies.
And there's another with a reference to a clause in a contract (not provided), but that doesn't stop the model from hallucinating a response anyway?

And this guy used 8xB200s for 15 hours to train a 32B on it: https://huggingface.co/Delta-Vector/GLM-4-32B-Tulu-Instruct - with ChatML because " It uses chatML due to the GLM4 Format giving me a headache." ??
>>
>>107823952
>" It uses chatML due to the GLM4 Format giving me a headache." ??
They all do that, it's standard practice among finetuners. They aren't exactly the brightest lot.
>>
>>107823811
I swear, most of the automation stuff I was too lazy to implement myself I got LLMs to create, and it's really nice.
I don't understand all the youtuber and twitter posts about how it's so bad.
>>
Why is no one talking about this?
https://huggingface.co/ubergarm/DeepSeek-V3.2-Speciale-GGUF
>>
>>107824164
It just runs better, its the same ol deepsuk otherwise
>>
>>107823952
finetuners are clueless. the majority of finetunes are not even worth downloading
>>
what do you guys use local models, or LLMs for anyway (besides writing emails)?
>>
the only use cases I can guess LLMs have are
>summarizing documents
>writing menial emails
>>
>>107824164
>NOTE 1: This quant has ripped out the sparse attention lightning index tensors. See sszymczyk/DeepSeek-V3.2-nolight-GGUF
I guess I'll take it. I liked Speciale for RP from my tests via the API. I doubt it'll be worth running at home though considering it easily thinks for 3k+ tokens and you need to keep that enabled to not have it revert to acting like the normal 3.2, which is a very boring model.
>>
>>107824210
uncensored assistant, local helper
>>
>>107824210
Pretending I'm talking to a friend, summaries, and masturbation.
>>
>DeepSeek to launch new AI model focused on coding in February, The Information reports

https://www.reuters.com/technology/deepseek-launch-new-ai-model-focused-coding-february-information-reports-2026-01-09/
https://www.theinformation.com/articles/deepseek-release-next-flagship-ai-model-strong-coding-ability

>Internal tests by DeepSeek employees suggested V4 could outperform rivals such as Anthropic's Claude and OpenAI's GPT series in coding tasks, the report said.

>The latest V4 model has also made breakthroughs in handling and processing extremely long coding prompts, a potential advantage for developers working on complex software projects, the Information added.
>>
>>107824356
>focused on coding
oh yes i'm cooding
>>
>>107824228
How much can it remember before becoming demented? It would be useful if it could retain info on very large projects and coordinate them accurately.
>uncensored
Nice, but isn't it expensive to run a model locally?
>>
>>107824356
weren't "the information" also the ones that had said a release date in may last year or something
>>
>>107824391
>isn't it expensive to run a model locally?
Not really. Most people use hardware they already own and use for other purposes. Just choose the model based on your setup.
>>
>>107824413
https://www.reuters.com/technology/artificial-intelligence/deepseek-rushes-launch-new-ai-model-china-goes-all-2025-02-25/
That was Reuters with "according to three people familiar with the company.".
The Information is the one that accurately called the Llama 3 release timeframe and that Llama 4 wasn't doing well.
>>
File: file.png (348 KB, 601x686)
>>107824461
eh there's this shit tho
>>
>>107824479
Dude, that is from 7 months ago.
>>
>>107824486
exactly, and they published random shit rumors that led to nothing back then, ie don't trust these clowns
>>
>>107824495
But they said R2 would be delayed, and it was?
>>
>>107824504
you're absolutely right and I apologize for this oversight eagerly dumping my NVIDIA stock for the imminent release of v4! tothemoon!!
>>
>>107824479
"world-changing"
>>
>>107824540
It was, we ended up in the computing hell mines timeline due to it.
>>
>>107824446
I'm curious mostly about the limits of what an LLM can achieve. So curious, in fact, that I'm considering majoring in Data Science and AI/ML.
>>
>>107824540
amerimutts had a giant freakout over it so i guess it was to them
>>
RTX 6000 pro is getting cheaper and I can buy one for under $8000 right now. Is it worth it to buy? Any anon here owning them that can chip in?
>>
>>107824787
just buy 4 3090s
>>
>>107824787
>Is it worth it to buy?
No, get a 5090, which is as good compute wise, and rent compute when you want to train stuff that needs this much vram.
>>
>>107824787
I have a Max-Q. It's certainly the best, easy option. There are other more powerful and more efficient used data center options but those are way more complex and are not guaranteed to work.
>>
Honestly it is actually surprising how not censored Jamba is. Really feels like they had nearly no refusal training for it. If they had actually used a normal architecture, it could've been an ok model, but I suppose then they would reason that it would be good enough that they'd have to censor it.
>>
>>107824915
so it's not censored but also retarded?
>>
>>107824787
I have one but I mostly use it as the auxiliary GPU for my CPU textgen setup as a replacement for the two A6000s I owned before. It's great, but it's still just 96GB so it's $8000 that won't let you run SOTA models on its own.
Still, you never know how much this thing will be once the expected apocalypse of the GPU market hits.
>>
>>107824915
Refusal training doesn't work on models that are too weak; by this I mean their benchmark scores crash hard because the model doesn't have the necessary world model to properly refuse.

Older transformer LLMs also had this problem with false refusals. I remember lowering false refusal rates being a big priority in the early days, similar to hallucination rates now.
>>
>>107824997
Makes sense. Arch is too shitty to be able to handle censorship without ruining those benchmarks they desperately need to keep justifying their existence kek.
>>
When you do these model abliterations, do you need to cover every kind of taboo question? Or is focusing on just one thing (for example porn) enough for it to actually answer all questions?
>>
File: file.png (53 KB, 824x317)
>>107824787
Which models are you planning to run with only one of them?
For image and video gen it's great because you can run everything as intended instead of coping with block swapping.
For text it's in a weird spot. You can run Air and Devstral 2 fully offloaded and not much else.
>>
>>107825130
hatehatehatehate
>>
>>107825248
i used nemo to write this post btw
>>
>>107825256
Based timesaver
>>
>>107825248
https://www.youtube.com/watch?v=IuJYgl14qsQ
>>
File: p03-3-sliver.jpg (247 KB, 672x936)
They had LLMs back in 2003.
>>
>>107821895
So Nvidia Nemotron 3 30B?
>>
>>107818138
It's true though?
>>
>>107825393
This is me with "early life" segments on wikipedia
>>
>>107821141
>was free of slop
Where is slop coming from? How can a model be free of it?
>>
>>107825927
slop is a very generic, almost slang expression for low quality output, in whatever form that output comes.
essentially, the goal is always high quality, which also means different things to different people.
so it's not as simple as 1 or 0.
>>
>>107825927
free of slop isn't the entirely correct way to put it, it was an llm that also had its own phrases that it liked to repeat too much
however, it came from an era before everyone started training on chatgpt logs which caused "shivers" and other cancer to infect all models that followed. in fact, llama1 was directly responsible for this development because it wasn't instruct tuned and the first thing people did was finetune it on chatgpt logs to create "alpaca", which was essentially the first llama1-instruct.
this also means that if you ever used an instruct llama1 even back in the day, you still got chatgpt shivers slop.
>>
>>107824787
I don't think that it's going to be the absolute best per-dollar value, but the fact that it just works without having to nigger rig or debug a wacky meme setup has a lot of value.
>>
>>107824787
>>107826060
the thing is it's like buying an expensive house, easy to buy, absolute bitch to sell.
I wouldn't buy it for fear of not being able to sell it. that kind of money is still a lot, and i'm not sure if businesses would buy it after a few years, and personal buyers would be few.
>>
https://github.com/ikawrakow/ik_llama.cpp/pull/1131
Sirs!
>>
>>107826177
>changed every line ending from LF to CRLF
>>
>>107826234
literally who cares? whitespace is not code
>>
>>107826260
>who cares?
The person who wants to get their vibeslop reviewed. As is, the diff shows that every line in the file was changed instead of only those containing new code.
>>
>>107826177
Based Bharati doing the sneedful
>>
>>107826234
Fixed it
>>
>>107826334
retard
>>
>>107826360
Let's see your string ban PR
>>
>>107826234
just vibecode a python script to fix it
>>
>>107826177
I skimmed the code and I can tell you with 100% certainty that this is unmergable and would have to be completely redesigned.
>>
>>107826463
Will (You) do it?
>>
>>107826463
Is that your professional opinion, mr anonymous?
>>
>>107826177
>lmg is anti vibecoding
>vibecoded pr
lmao big sar pr
>>
would you fuckers argue on github instead
>>
>>107826177
>Just delete the entire fucking codebase and add your own codebase to the file instead of just changing and inserting code where needed
No one is going to read all that shit line by line anon.
>>
>>107826532
It's clearly a /lmg/ shitpost and not a serious PR, therefore I discuss it here.
>>
>>107826532
Better than the average ERP arguing we usually get
>>
>>107826516
It is my professional opinion that an approach that doubles the memory required for context is retarded.
>>
>>107826598
Small price to pay for string banning
>>
>>107826629
Be quiet if you have nothing constructive to add.
>>
>>107826643
>>107826643
>>107826643


