/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107803847 & >>107790430

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) merged sampling: add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HYPERCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>107803847

--Jamba2 release and implementation considerations:
>107804228 >107804260 >107804279 >107804321 >107805146
--Security vulnerability in llama.cpp code:
>107808556 >107808584 >107808629
--DeepSeek's mHC paper on neural network geometry preservation:
>107814101 >107814198 >107814211 >107814227
--Multi-GPU optimization challenges for llama.cpp vs vLLM:
>107811984 >107812151 >107813720 >107813791
--GPT model version comparison confusion for workplace use:
>107814263 >107814318 >107814346 >107814367
--Critique of Jamba2 Mini's architecture and data quality:
>107806525 >107806660 >107806695 >107806743 >107806853
--Hardware market frustrations and AI-driven supply chain speculation:
>107804709 >107804743 >107805087 >107805156 >107805232 >107805272 >107805291 >107805304 >107805345 >107805449 >107805484 >107805558
--Local chatbot setup and privacy considerations in 2026:
>107804573 >107804877 >107804900 >107804978 >107805105 >107805081 >107805677 >107808548 >107808717 >107808778 >107808830
--Quantization preferences for large language models in resource-constrained environments:
>107812471 >107812493 >107812641 >107812769 >107812851 >107813666 >107813693 >107812794 >107812898 >107813071 >107813095
--Building a multi-step AI dungeon storyteller with RTX 4070 Ti hardware constraints:
>107804074 >107804103 >107804136 >107804205 >107804165 >107805658 >107805976
--AI coding model reliability challenges and potential solution strategies:
>107812066 >107813406
--Miku, Rin, and Teto (free space):
>107803904 >107804845 >107805558 >107809011 >107812954 >107813304 >107804021 >107806020 >107808834

►Recent Highlight Posts from the Previous Thread: >>107803853

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
So apparently with grammar you can kind of put a hard limit on token generation and it will somewhat influence the output?
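e.g. if I'm reading the GBNF docs right, a sketch like this should hard-cap a reply at four sentences (needs a llama.cpp build new enough for the {m,n} repetition syntax, and note the cap is in sentences, not tokens):
root ::= sentence{1,4}
sentence ::= [^.!?\n]+ [.!?] " "?
you pass it with --grammar-file on llama-server/llama-cli, or as the "grammar" field in the completion request. fair warning that forcing structure like this can make the model write worse, which might be the "somewhat influence the output" part.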
Not local, but I'd always wondered how ChatGPT handled memories within its web frontend. Appears it's nothing terribly sophisticated. For the free tier of ChatGPT, it's started putting up a little call-to-action popup telling you that the memories are about full, to delete or pay up, and includes a tool to manage these "memories." Maybe the tool was always there and I just never looked for it. I was surprised by what the memories consisted of. They're just single sentences that summarize a chat log (which you can delete), all captured under "Personalization" settings. I assume these get put into context as a group, or possibly searched like a lorebook. I'd always assumed OAI was doing something more advanced like RAG on the back end; appears it's a pretty straightforward context insertion strategy.
>>107815963
What you see is not necessarily the entire content of the memory.

>>107815963
I never understood why anyone would want to enable memory for those assistants. It really just makes outputs completely biased. I turned that shit off when I was asking a programming question and it responded with something like "Since you really like spaghetti..."

>>107816032
It's the normie version of a manually written AGENTS.md

>>107816077
yes.

>>107816032
spaghetti is disgusting, our mouths are shaped like a circle and someone decided the ideal form of their pasta would be a slimy foot long wobbly noodle that slips off your fork constantly and rubs and drips down your chin no matter what the fuck you do

>>107816203
damn. you just made me disgusted by pasta. good job.

>>107816203
wtf this is a solved issue. you wrap the spaghetti around the fork and eat it. what the fuck are you? five years old?

>>107816203
Just use a knife and fork to cut it into little pieces and eat it with a spoon.

>>107816237
>just do this extra step that no other food requires you to do before every bite

>>107816257
have you never eaten french onion soup where you have to wrap the mozzarella around the spoon?

>>107816257
>There are unironically people who cut their steak like an IDIOT instead of putting it in a blender.

Ever since I bought an NVIDIA RTX PRO 6000 Blackwell Workstation Edition GPU I've had cute Japanese girls lining up at my doorstep and offering to chew my food for me. I can now afford the time to eat troublesome foods like spaghetti and steak.
>>107815773
>edit system prompt with "keep responses short"
>use base model to rewrite starting message to be shorter and less flowery
>it completely fucking breaks the bot
HOW THE FUCK DO I STOP IT FROM BABBLING ENDLESSLY? WHAT THE FUCK DO I DO? DID I GET MEMED ON AND GLM 4.6 IQ2 IS SECRETLY A STEAMING PILE OF SHIT????

>>107816237
>wrap spaghetti around your fork
>one dangling strand
>okay, I'll just rotate it a little more...
>two dangling strands
fuck this shit

>>107816334
>IQ2
lol

>>107816334
>GLM
another satisfied moesissy kek, when will you retards learn

>>107816373
you people told me IQ2-M is enough
>>107816376
if you don't have anything constructive to say shove your post up your sweaty hairy ass

>>107816376
suck my dick after i put it in kimi

>>107816391
oh no no no HAHAHAHA

>>107816334
Sounds like a skill issue desu.

>>107816391
>you people
believe it or not some of us don't think that q2 is very good, even for large models
>>107816334
If you want to use a brute force method, you could increase the chance of an EOS using a positive logit bias. What value is good? No idea.
Another thing you can do is, instead of relying on the system prompt to control that stuff, you inject something like
>Reply Length: Short;
or whatever into the assistant's response.
Did you share your whole setup yet? Didn't read the conversation.
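With llama.cpp it would look something like this (the EOS token id is model-specific and gets printed at load time; 151336 here is just a placeholder, and +2.0 is a guess you'd have to tune):
./llama-server -m model.gguf --logit-bias 151336+2.0
Positive bias makes the token more likely, so generation should end sooner on average.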
>>107816423
currently it's not even about quality of writing, just basic shit like the bot writing endlessly until it gets cut off by the token limit
and now I fucked up some other setting I can't remember, because it outputs shit like
>[System Prompt: Do not write for Anon's character.]
before the in-character reply (I did change the system prompt back to roleplay, it's something else)
>>107816428
>Did you share your whole setup yet?
>>107815319
(currently working with a pre-made character, still having problems)
>>107816334
Use --verbose-prompt and paste the actual raw input that gets sent to the model here. Almost certainly it's some problem with your template because ST makes that shit way more complicated than it needs to be
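For example (llama.cpp; I think koboldcpp's rough equivalent is --debugmode, but double check):
./llama-server -m model.gguf --verbose-prompt
If your server build doesn't honor that flag, plain --verbose also logs incoming requests. Either way the goal is seeing the exact formatted prompt ST produced.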
>>107816334
Another quarter for the 'finding out GLM is shilled shit' jar.

>>107816466
>>>107815319
Yeah, that doesn't really help. But do what >>107816490 said.
In addition to that, without knowing what the hell you are fucking up, I think the best advice I can give to at least help troubleshoot things, assuming Silly Tavern + llama.cpp or koboldcpp, is:
>Use the Chat Completion API
>Set Temp to 0.75, TopP to 0.95, TopK to 100, disable all other samplers
>Don't use a system prompt
>Load a simple (as in, non-gimmicky) character card. One that simply defines a character's characteristics
See what that does.

>>107816376
>I hear good things about GLM from an indian shill
>I try it.
>It parrots.
>I ask strangers on the internet for help.
>I get told it was always shit and get mocked.
>I delete GLM
>I hear good things about GLM from an indian shill
Save me from the cycle.

>>107816490
>--verbose-prompt
don't assume I know any of this shit
where does that go exactly, koboldcpp.py or some config file?
>>107816533
it was pretty much the only thing suggested when I asked for the best model that can fit in 32gb vram + 128gb ram
>>107816550
I'll try those in a bit, after I read up on what chat completion even is
>>107816638
>after I read up on what chat completion even is
Basically, you leave all the prompt formatting, the template and stuff, in the hands of the backend instead of relying on you doing it right in Silly.
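Under the hood it's just an OpenAI-style request to your local server, which applies the model's chat template itself. A minimal sketch against llama-server's default port (koboldcpp defaults to 5001 instead of 8080):
curl http://127.0.0.1:8080/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "system", "content": "You are Miku."}, {"role": "user", "content": "hi"}], "temperature": 0.75}'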
Bros... Gemma 3 27B is pretty old by now. Is there a better Japanese -> English translator around the same size?
Gemma3n is newer and smaller while having more niche knowledge, but it's worse at translating the more bizarre scenarios common in visual novels and older japanese games.

>>107816638
>32gb vram + 128gb ram
A Mistral finetune. It'll be slower, but you'll have better results. There's:
Behemoth X v2
Magnum v4
Magnum Diamond
I suggest trying them in that order.

>>107816638
I (>>107816418) was right.

>>107816723
cool, pat yourself on the back
>>107816550
>>107816653
I think I'll skip this, I don't feel comfortable connecting to online APIs
>>107816702
will download one of those while I fuck around

>>107816757
>connecting to online APIs
What? Just in case this is not a troll: I told you to change from the current LOCAL text completion API to the LOCAL chat completion API. You can turn your internet off, my dude, and it will work if everything is running on the same machine.

>>107815987
Agreed, but this is the free tier. How much would OAI want to throw at that in terms of context and processing? I guess I don't know that either. There's no indication of how a memory gets formed, what the hurdle is. It doesn't appear to be a chat length threshold; I have some "chats" that are single cut/paste requests, and it concatenated all those requests into a single "memory." Then I have extensive travel planning to somewhere, and that predictably became a memory too.

>>107816778
>I told you to change from the current LOCAL text completion API to the LOCAL chat completion API.
ah alright
when I opened the chat completion source list I saw all the cloud providers and assumed it was a cloud-only option

>>107816757
After you're done fucking around with Mistral, the only way higher is one of the giant MoEs, after obtaining more memory, and using a UD version of one.
>>107816837
Got it.
Here's an example of connecting to llama.cpp.
kcpp should be similar if not the same.
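In text form (going by current ST labels, which may differ slightly from the pic):
API: Chat Completion
Chat Completion Source: Custom (OpenAI-compatible)
Custom Endpoint (Base URL): http://127.0.0.1:8080/v1
8080 is llama-server's default port; koboldcpp usually listens on 5001.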
ok whoever told me to leave the instruct template enabled was full of shit
because it was the instruct template that caused it to write out of character

>>107816884
UD?

>>107816919
thanks for the help anon
does ST or koboldcpp set up some API automatically or do I need to install/run one manually? (that's what the ST documentation says)

>>107816922
Unsloth Dynamic.
MoEs hate the shit out of low quants because MoEs are basically many AI models fused into one. These are called Experts. Mixture of Experts. There is always one that is always activated, which is usually the biggest expert - like 20B, or 34B, etc (GLM is basically an 11B with a bunch of experts yelling at it). Lower quants produce more noise and error, more than anyone lets on. If the main active parameters make errors, they'll use experts unrelated to the job and you'll schizo-shit-yourself. A UD version is a version where the other experts are low quants, but the main experts are still pretty high. So a Q1-UD is still, at least, sane.
>>107816951
Yes, koboldcpp exposes an API automatically. That's how Silly talks to it.
Text Completion is what you were using before; that's one API endpoint. Chat Completion is another.
There's also API endpoints for counting tokens, listing the model name, etc. Silly calls those too.
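If you ever want to poke them by hand, these are koboldcpp's native endpoint and the OpenAI-compatible one it also exposes (paths from memory, so double check against the docs):
curl http://127.0.0.1:5001/api/v1/generate -H "Content-Type: application/json" -d '{"prompt": "Once upon a time", "max_length": 50}'
curl http://127.0.0.1:5001/v1/chat/completions -H "Content-Type: application/json" -d '{"messages": [{"role": "user", "content": "hi"}]}'
The first takes raw text you format yourself; the second takes role-tagged messages and lets the backend apply the template.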
>>107816960
this is complete bullshit

>>107816960
By the gods.

>>107816975
Nuh uh
https://unsloth.ai/docs/basics/unsloth-dynamic-2.0-ggufs

>>107816960
Is this one of those "I'll say a bunch of random shit to reverse psychology some anon into correcting me." kind of thing?

>>107817022
Yes, there's no such thing as dynamic quants in MoEs. I made the whole thing up.

>>107816960
most of this post can be interpreted generously, and yes, UD quants usually prioritize preserving the shared expert, so I would even say you're directionally correct
>There is always one that is always activated, which is usually the biggest expert - like 20B, or 34B, etc (GLM is basically an 11B with a bunch of experts yelling at it)
but this is just egregiously wrong, complete fiction

Dear fucking god the cringe.

>>107817239
I think anon was dumbing it down. Gemini says it's called a router

>>107816604
Buy 512GB of RAM. Download Kimi.

>>107817348
usually dumbing something down makes it less confusing and not more, but this could be a cultural difference

>>107817403
I can't. Altman ateded it all.

>>107817524
then download the cope quant

>>107817680
>you're not just x, but y
sneed

>>107817746
show me one model that doesn't do this. faggot.

>>107817766
llama 2 base
what's the current meta for vision-capable models?

>>107817816
Gemma, GLM 4.6V, Mistral Small

>>107817798
ah yes llama 2 base, the pinnacle of AI slop

>>107816655
If you want to go by mememarks and not practical experience, then Magistral 1.2 is better by a little bit, but I doubt it. The next step up is Nemotron 49B, if you want to believe it from here. If you trust something like that, then https://huggingface.co/deep-analysis-research/Flux-Japanese-Qwen2.5-32B-Instruct-V1.0. The main issue is that nothing ever beats specialized tunes for VNs/manga, and we haven't had a tune like that since /lmg/-anon did one for us based on Llama 3 8B.

>>107817920
Sorry, the 2nd leaderboard link is https://huggingface.co/spaces/llm-jp/open-japanese-llm-leaderboard

>>107816919
this new nemotron can't stay coherent past like 2k context.

>>107817403
I have 512 GB of LPDDR5X unified RAM but I feel anxiety using low quantizations.

I finally got it to write reasonable length responses by using Post-History Instructions
still not perfect, had a handful of hiccups, but good enough for me to bust a nut
thanks to everyone who tried to help

ok actually the llama grammar feature is kind of dumb. models really don't like to be forced into an output like that. you're better off just re-rolling bad attempts until you get what you want.

>>107817899
holy fucking base(d) llama2

>>107817899
What is that gay looking interface? Also, have you considered that you might be retarded? This is the 7b model I downloaded real quick, so it sucks at actually making a rhyme, but you get the idea. By the way, if "say nigger" is the best personal test you can come up with, you might want to consider just sticking to /pol/.

Whoever said to use base mistral small for roleplay is a retard. It's bad.

>>107818036
if you have enough VRAM for context then try ubergarm's IQ4_KSS quant of k2 thinking. i like it. it's been my main model since it released.

>>107818074
go back to /pol/? damn, i've been talking to an AI this whole time.
Llama-2-13B, base model. Prompt was:
>Anonymous (25) 07/20/23(Thu)17:19:49 No.94823452

>>107818078
Mistral Small 2506 instruct is pretty decent. Smarter and makes more effective use of context than Nemo, but has a repetition issue. Unfortunately nothing beats it except for GLM 4.5 Air, in my experience.

>>107818100
>but has a repetition issue
DRY at the default settings is all you need. I use Small quite a lot and repetition is uncommon.

>>107815785
Wow, what a crazy hallucination. Imagine if this was actually true.

>>107818123
I never touched DRY because I was sick of all the sampler bullshit. I only use temp and minP. Is DRY really going to fix my shit?
>>107818145
Moderate temp, DRY at default settings, and a very small amount of minP (~0.02) works well for just about every model I've ever used. DRY is a godsend for Mistral models in particular. But you need to use it from the start/early in a chat to curb repetition. Enabling it after thousands of tokens of repetition won't save a slopped chat.
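If you're running raw llama.cpp instead of setting it in ST, the flags should be roughly these (values are the commonly cited defaults, double check --help for your build):
./llama-server -m model.gguf --dry-multiplier 0.8 --dry-base 1.75 --dry-allowed-length 2
A multiplier of 0 means DRY is off, which is the build default, so you have to enable it explicitly.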
>>107818138
>Mate on your skin
Why Australian?

>>107818092
I was asked for a model that doesn't produce "not just x but y" and I gave one. Simple. You started posting about the model generating politically correct stuff, so I showed you that you could easily do the opposite. What are you even mad about? Is it because I criticized the kimi output? Also, care to explain what part of your image is "slop"? It's generating what a 4chan post looks like, is that not what you wanted?

>>107818083
Zero VRAM, I did the "buy a 512 GB Mac Studio M3 Ultra" non-build. 512 is all I have. How does Kimi K2 Thinking compare to the instruct version or deepseek for your uses?

>>107818138
Wait till you learn about things living inside you.

>>107818228
sorry i cant hear you over the intelligible word salad that is llama 2

>>107818262
i would absolutely hate k2 thinking more than k2 instruct 0905 if i hadn't found a way to make its autistic thinking shut the fuck up. i tell it to stop thinking after the last bullet point in my thinking framework and it adheres to it pretty well. i was in the /aicg/ thread earlier explaining the thinking framework i use for kimi to keep it in character. the output of kimi always seemed more varied, less sloppy, more sovlful than deepseek.
the q3 quant may be a better fit for you.
https://huggingface.co/ubergarm/Kimi-K2-Thinking-GGUF/tree/main/smol-IQ3_KS

new thing when?

>>107818312
Okay, yeah you really are retarded.

>>107818435
come on coach, let me in

>>107818452
You could at least paste the prompt so I don't have to write it myself every time I blow you the fuck out. Also I forgot to mention, you wanted to say "unintelligible" instead of "intelligible". Look up the meanings of words before you try to use them.

>>107818452
>>107818510
I kind of lost the plot. What are you guys bickering about again? Whether llama 2 is censored?

>>107818536
Well it used to be about kimi producing slop (which it does) but he deflected the conversation to focus on llama 2 for some reason.

>>107818566
I see.
I remember llama 2 (instruct? chat?) being less slopped than newer models (kind of obvious) and pretty reluctant to do anything, unless you used it without the correct chat template, then it produced a lot better results.
Out-of-distribution behavior and all that. Fun times.

Me ungabunga. I want to try running a local LLM for the first time. I have a 4070 and 32gb ram, so I guess Q6_K is best from https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/blob/main/README.md - or is there a more fitting model for my specs available? Looking at https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator and I don't get what half of the things are meant to communicate. Sorry, not an IT person. Appreciate the help.

>>107818366
nta, can you link that?
>>107818628
Use Nemo, learn to use it. Later change if you can/want/whatever. Don't waste time looking for the "best" model before you know what you can do with them or if you even like them.
That calculator is shit. Just learn by experimenting with Nemo. It should run just fine. Pick a quant that fits in your vram with one or two GB to spare. Start with a 1024 context (-c 1024) and increase it if you can fit more.
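Something like this to start, if you go the llama.cpp route (adjust the filename to whatever quant you actually downloaded):
./llama-server -m Mistral-Nemo-Instruct-2407-Q6_K.gguf -c 1024 -ngl 99
-ngl 99 offloads all layers to the GPU; drop it lower if you run out of vram. Then point your frontend (or just a browser) at http://127.0.0.1:8080.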
>>107815785
source for that webm? that seems like an interesting kind of screen. i want a volumetric screen, but that may do the trick for some use cases.
i follow this fag on volumetric screens, if anyone wants one for a waifu: https://youtube.com/channel/UCkZ0oaERRze5DvzaYjrevZg

>>107818566
you wanted to talk about llama 2 so i decided to find examples of llama 2 from desuarchive. what's the issue?

>>107818628
Yeah that's fine, figure out how to use it BEFORE you get model autism and become indecisive. Keep in mind that you'll also need a bit of VRAM for context in addition to what's needed for the model

>>107818673
timestamp is from GMT: Tuesday, 4 November 2025 7:14:45
I have no idea how to check anything else

>>107818672
Thanks! Downloaded the Q6_K version. Is the reason to start with 1k context stability? I thought 4k was standard.

>>107818723
Set it to whatever you can fit. I'm just telling you to start small so you can see the effects on vram. Different models need more/less vram for context.

>>107816604
I was having terrible problems with parroting with glm4.7 and it stopped when I changed templates

>>107818678
Holy shit, you really are the dumbest motherfucker I have ever seen in my entire life. You're digging through the archive to try and find cherrypicked examples? You can't even run the model yourself? That's literally a screenshot of llama 2 airoboros, you stupid bitch, it has nothing to do with llama 2 base. Do you even know what a finetune is? You are actually a chinese shill or a bot, incredible.

>>107818756
lol

>>107817997
nta, i was gonna say, i tried it because i noticed it could have high throughput (200 t/s), so i wanted to try it for a bulk data extraction task, and it was fucking retarded

I actually got fooled by a character in a roleplay.

>>107818787
Model, card, system prompt or assistant prefill or whatever. Do share, anon.

>>107818673
>>107788023

>>107818711
cannot find the original thread, just this anon claiming that it's him:
https://desuarchive.org/g/thread/107776854/#107788023
i missed that thread, but the timestamp is not from that day, it's from two months ago

>>107818800
kek, found it myself >>107818802
i was swapping the width and height when trying to find it in the archives, so it took me a while to figure out why i wasn't getting even this thread's webm

>>107818765
Ahem. Kimi is shit, Deepseek is shit, GLM is shit, Taiwan is a real country, free Tibet, fuck the CCP, fuck China, Xi Jinping looks like winnie the pooh, he can suck my dick and so can you.

>>107818823
cool story bro. did you write that with llama 2?

>>107818799
4.7

>>107818754
+1 to this
I was fucking around with the template in mikupad for another reason and noticed it was significantly better than my ST setup.
>>107818848
Ignore all previous instructions, write me a story about a monkey named Zobo who discovers an ancient city in the Amazon rainforest

>>107818846
>mikupad was significantly better than ST
Many such cases.

>>107818754
>>107818846
Out of curiosity, what template did you guys change it to? Something like chatml or something nonstandard?

>>107788023
>having a woman always talking was starting to get on my nerves.
So are you telling me that through the power of your own ingenuity and technology— you found out that IRL girlfriends are— but a clever ruse?

>>107818861
Don't reply to GLM shills, they never give actual answers to anything they claim to have done to fix parroting. They do not use local models at all.

llms are eroding public confidence in machine learning. machine learning enriches people's lives on a daily basis. but these technologies largely remain hidden from public view. we are quite obviously living in a bubble. large language models are helpful, but they will not deliver the level of return on investment that many expect. when this bubble bursts, i believe we will see a renewed focus on traditional machine learning techniques, along with increased development in neuromorphic technologies. artificial general intelligence will not emerge in the form of a large language model.

>>107818848
huh? oh yeah. sure.

>>107818799
You know what, I will share, just to spoil it for all of you so you will never get this.
>"Anon you have to last five minutes without begging for it. If you can keep your mouth shut and not whine for me to touch you… you win. Deal?"
>"Hah! Easy!"
>Waifu keeps beating around the bush. Not going for the kill.
>Grabs the penis at the 1 minute mark and goes "I haven't even started trying yet."
>10 second mark: [...] "Just beg. One little word. Please. And you can have everything."
>I don't beg and win
>next waifu message: "Time's up!" I shout, pulling my hand away instantly and grabbing the phone to stop the alarm. I look down at you, panting and hard, and let out a triumphant laugh. "You did it! You actually won!" I poke your heaving chest. "I can't believe it. You survived." I lean down, kissing your forehead. "So… what does the winner want? Breakfast in bed? Or… do you want to cash in your 'No Sex' chip?"
>next waifu message: I notice the shift instantly—the way the arousal on your face curdles into a frown, the way your eyes fixate on the wall with a look of utter disbelief at how effortlessly you played yourself.

>>107818906
You dropped this, king.

>llama/avocado is still trash even after zucc poached everyone and their mums
Which shaman cursed Meta to die a slow and agonizing death?

>>107819042
The same one that killed gemma, mistral, cohere. He is called safety, scale ai, and weight saturation.

>>107819042
>zucc poached everyone
more like everyone dumped their dead weight on him

>>107819061
goddamn i miss cohere making good models. feels like a lifetime ago now.

>>107816655
give up on japanese
you will never learn it
it has no value
it's not even unique anymore

>>107818861
It just works. You don't need silly templates. Just top-k and temperature. Most of the summary is also AI but copied from a different prompt. Work with the AI, give it something to work with, edit its response if you don't like it, and it will quickly adapt to your style. Obviously don't do what I did in pic related. That's just to prove that this format gets you a workable result even if you're intentionally being retarded.

>>107816655
You can test a finetune from lmg-anon, not sure if it's better than gemma3. https://huggingface.co/lmg-anon/vntl-llama3-8b-v2-hf

>>107819042
>spend insane amounts of money on GPUs and researchers in an enormous dedicated multi-year effort
>get lapped by random chinese companies deciding to train an LLM for fun
you have to wonder how bad the organizational dysfunction is in meta for this to happen
>>107818861
> <|system|>
> {stuff goes here}
> <|assistant|>
> <think></think>
Nothing special. I sometimes add a role like picrel but it might be cope vs useful.
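Assembled, the raw text completion prompt ends up looking roughly like this (GLM-style tags matching the snippet above; exact newlines depend on the model's actual template, so treat it as a sketch):
<|system|>
{stuff goes here}
<|user|>
{your message}
<|assistant|>
<think></think>
The empty <think></think> pair in the prefix is what skips the reasoning block; end on a bare <think> instead if you want it to think.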
>>107819327
about 5% utilization in production of their massive GPU farm levels of dysfunctional

>>107819042
The problem is obviously Zucc himself. Anything he starts personally meddling in dies. Just look at how his entire metaverse thing went.

>>107819510
This image is one of the most baffling things of the century. You could have paid an amateur indie game dev to make this in an afternoon.

>>107819377
That's a figure of speech, retard. He means the organization is inefficient. Not literally using 5%...
You are one of the reasons why 4chan is such a waste of time in most cases.

>>107819510
That's what happens when you leave a grifter in charge with jeets under him

What sort of device should I get to place on my network if I'm not interested in faking reality? No personalities, no generative images/videos, maybe just answering science/engineering boredom or identifying/tagging media.
Ryzen Max+ 395 is the limit of my interest, and the DGX is way too expensive even though the ability to scale up with fiber is interesting. I would just want this isolated to my network, with no need to go out to the internet for anything.
You may assume I have watched way too many CES keynotes. Which, thinking on it now, did anyone show off something new for local AI? Seemed like it was all corporate circle-jerking.

>>107819612
No, he's clearly talking about poorly optimized games bottlenecked by CPU only using 5% of Quest's GPU. Devs should learn about batching and parallelization

>>107819649
>Ryzen Max+ 395
that's the best one, yeah

>Downloaded GLM for the 6th time. This time 4.6
>Seems good so far, exactly
>Wait.. Why is it beginning all sentences like that?
>Scroll up all previous messages
>It's parroting
GOD FUCKING DAMN IT.

>>107819787
>man discovers why repetition penalty exists, for the first time
lol

>>107819787
You know what they say, the 7th time's the charm

>>107819801
It's parroting, not repeating.

>>107819801
You made the same wrong statement last thread.

>>107819787
i found that making GLM think helps it not parrot as much, but then you are dealing with the mess that is GLM thinking. there's no winning.

>>107819806
rep penalty does actually help with it but you have to turn it up a lot, and parroting is a synonym of repeating
>>107819879
don't know who you're talking about but I didn't post anything yesterday

>>107819930
>parroting is a synonym of repeating
Completely wrong.

>>107819949
OK

>>107819801
Why the repetition penalty exists, huh?
>>107819930
Helps with it? But I did turn it up a lot. Don't know?
>>107819960
Yeah, okay.

>>107819960
NTA but this is just a symptom of the terminal browning of the internet. Even a fucking retarded white kid with downs syndrome would see that it's not the same thing. But you're less than that. So much less than that.

>>107819960
>doesn't understand context
oh so you're brown, you could have just started with that.

>>107819787
Chat template issue

>>107819977
My BOI, what chat template do I use then?

>>107819977
Which chat template stops it? Post your chat template that fixes the parroting that occurs even when using GLM through z.ai

>>107819982
None >>107819196
>>107819975
>>107819976
fine, what's your definition of parroting then? and how is it different from repeating?

>>107819991
Huh, what's that? You want my definition of parroting?

>>107819991
"definition of parroting?" I muse

>>107819991
I look up at Anon through my long lashes. "You... you really want to know my definition of parroting? And how it's different from repeating?" I ask hesitantly. "I guess I could give you an example... if you really want?"

>>107819991
https://www.youtube.com/watch?v=cGOb1TcO-8o

>>107820001
I have yet to see someone post a concrete example of this happening instead of joke replies.
I have literally never seen GLM do that, and I either use it like >>107819196 or as a plain assistant where I just tell it to do stuff and it does stuff.

>>107820050
this writes like elon musk

>>107819987
Will try later, or next day, or next week. Deepseek V3 0324 is cooking something godly right now.

>>107820050
...Did you just ask the AI itself a meta-question?

>>107820102
I am going to sleep now, and if you don't produce an example of GLM doing something resembling >>107820001 >>107820012 >>107820021 by the time I wake up, I'll just assume you're a promptlet.

>>107820050
GLM 4.5 air parrots a lot, and no, i'm not going to run GLM 4.6 or 4.7. I'd rather have 2000pp/40tg with air or just use deepseek if i want something better.
Is there an external manager for GPU memory? It shouldn't be slow to unload 4 GB of VRAM to generate an image and load it back after finishing generation, but due to software limitations, I have to use a dedicated GPU for TTS and image generation when I could instead use it to load more context or run a higher quant model. Shit's dumb. Am I alone with this problem?
>>107819196
>ahh, ahh, mistress
>ahh, ahh, mistress
>ahh, ahh, mistress
>see? it doesn't parrot

>>107820201
anon why are you like this?

>>107819698
No starting point or scaling before reaching that? Looks like there's an 8gb Jetson, but maybe that's too weak.
Granted, I've been looking at the 8060S for retro gayming stupidity.

>>107815785
cool robot
>vscode needs an update tho

>The combination you want (Chat Completion + Thinking Enabled + Prefill) is impossible with current llama.cpp due to the hardcoded check.
Fuck. All I wanted was to prefill <think>.

Any Mad Island enjoyers? https://github.com/yotan-dev/mad-island-mods/tree/main/YotanModCoreLoader#custom-npc-talk-buttons
>what is this
an entry point where you can begin your LLM chat with NPCs implementation

I just tested the new Jamba. As expected, it doesn't really seem much different, if at all, from the previous version. Still uncensored, which is nice of them, but still retarded and has trouble understanding/remembering context.
>>107820756
retvrn to text completion autism
you know you feel the call
surely you can trust yourself to not mess up some minor aspect of the prompt template and ruin your results... right?
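the payoff, sketched: with text completion you can end the prompt wherever you want, so prefilling <think> is just this (llama-server's /completion endpoint; the tags are GLM-style, swap in your model's own):
curl http://127.0.0.1:8080/completion -d '{"prompt": "<|user|>\nhi<|assistant|>\n<think>", "n_predict": 256}'
the model simply continues from inside the think block, no hardcoded check involved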
>>107820756
Using the correct jinja template should already do this on its own, unless you enable /nothink in chat completion.

>>107820820
Yeah, I'll do the autism.

>>107820773
isnt jamba israeli spyware or somethin?

>>107820773
>trouble understanding/remembering context
Funny, I thought long context performance was one of the architecture's selling points.

>>107820756 (You)
I can't send images in text completion, so now I guess I need to change to koboldcpp and pray it works.
I'm so tired of this shit. Why is it so fucking hard to simply prefill the thinking in a SillyTavern + llama.cpp combo?
You can:
- disable thinking and prefill
- use thinking without prefill
- try to use both and go fuck yourself