/g/ - Technology

File: 1745908814203796.jpg (1.12 MB, 1336x2008)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107138606 & >>107129334

►News
>(11/07) Step-Audio-EditX, LLM-based TTS and audio editing model released: https://hf.co/stepfun-ai/Step-Audio-EditX
>(11/06) Kimi K2 Thinking released with INT4 quantization and 256k context: https://moonshotai.github.io/Kimi-K2/thinking.html
>(11/06) LocalSong 700M melodic instrumental music generation model released: https://hf.co/Localsong/LocalSong
>(11/05) MegaDLMs framework for training diffusion language models released: https://github.com/JinjieNi/MegaDLMs
>(11/01) LongCat-Flash-Omni 560B-A27B released: https://hf.co/meituan-longcat/LongCat-Flash-Omni

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1758348922203207.jpg (151 KB, 808x1144)
►Recent Highlights from the Previous Thread: >>107138606

--Agentic finetuning success with Gemma 3 27b using dataset duplication strategy:
>107140749 >107140853 >107140874 >107141186 >107141904 >107145572 >107145579 >107141303
--Model performance comparison and IF evaluation benchmark discussion:
>107145761 >107145774 >107145810 >107145849 >107146116 >107146184 >107146306 >107145947 >107145956
--Strategies for preserving Opus-3 model conversations before deprecation:
>107140145 >107140264 >107140360 >107140384
--Exploring free proxy models for logic/programming tasks and style transfer via LoRA:
>107140277 >107140356 >107140365 >107140399 >107140446 >107141293
--Single vs dual-GPU dilemma for performance vs power safety tradeoffs:
>107143867 >107143877 >107143878 >107143946 >107144867 >107144872 >107144155
--Sampling optimization debate for creative RP with minP/Top-P and temperature tuning:
>107139402 >107139418 >107139447 >107139500 >107139577 >107139540 >107139897 >107139915
--Llama training methodology and safety implications of validation set optimization:
>107140894 >107140932 >107141030 >107141086 >107141101
--Neural network depth and Gemini 1.2T model performance speculation:
>107145345
--Toss model performance vs Gemma 3 in practical applications:
>107145833 >107145904 >107146168
--Cydonia model performance comparisons and upcoming releases:
>107140380 >107140394 >107140486 >107141250 >107140397 >107140661 >107143958 >107143966 >107146415 >107146427 >107146449 >107146485 >107146506
--DDR4-6000 price spike frustrations and DDR5 transition speculation:
>107139738 >107139779 >107139792 >107139982 >107139985 >107142864 >107142896 >107143500
--Qwen data increases overfitting risk in CoT models:
>107140601
--Gemma finetuning results with QwQ's data: less neurotic, still verbose:
>107139425
--Miku (free space):
>107140392

►Recent Highlight Posts from the Previous Thread: >>107138613

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>Toss model
>>
Can someone post a QRD for setting up VibeVoice? What repo, what settings etc..
>>
>>107147241
It's stuck in python hell so just use the official demos on huggingface or find a comfyui node or something
>>
Can someone post a QRD for setting up Nemo? What fork, what temperature etc..
>>
blos? is we over? >>107147122
>>
>>107147259
stop doubting yourself and just do what you think is right. it'll work out, believe in yourself
>>
File: 1762278954671840.mp4 (2.23 MB, 512x640)
>>107147241
I gotchu
https://github.com/vibevoice-community/VibeVoice?tab=readme-ov-file
>>
Can someone post a QRD for improving confidence uwu? Which hustler's plan, which youtube channel etc..
>>
anyone have any idea as to why sillytavern keeps deciding to insert every entry from the lorebook at the very beginning of each chat despite none of the trigger words being mentioned?
>>
>>107147288
thank u anon, im gonna read the source code before installing to make sure we're safe
>>
>>107147277
>average normalfag advice
>>
>>107147295
I think you should try the Drummer plan! I tried ERP with the Rocinante's model and it helped me talk to white girls. Make sure to join our discord and look for the right channel for a better experience ;)
https://huggingface.co/TheDrummer/Rocinante-12B-v1.1
>>
>>107147277
6'4" adonis's dating advice to 5'5" balding indian friend
>>
>>107147241
Back up of the original repo here:
https://github.com/great-wind/MicroSoft_VibeVoice
1.5B is still up:
https://huggingface.co/microsoft/VibeVoice-1.5B
Torrent of the repo (dunno if still seeded):
magnet:?xt=urn:btih:b5a84755d0564ab41b38924b7ee4af7bb7665a18&dn=VibeVoice&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce
Torrent for VibeVoice 7B:
magnet:?xt=urn:btih:d72f835e89cf1efb58563d024ee31fd21d978830&dn=microsoft_VibeVoice-Large&tr=udp%3a%2f%2ftracker.opentrackr.org%3a1337%2fannounce
Sampling with examples:
https://desuarchive.org/g/thread/106516368/#q106519850
https://desuarchive.org/g/thread/106516368/#q106519945
>>107147308
Good idea since the vibevoice-community repo has continued to be modified from the original and you don't know what was put into it since.
>>
>>107147352
Thank you so so much anon <3
so so so so so much <3
>>
>Let me write:
[1500 tokens]
>Wait, the user mentioned [minor detail], I should include that.
[1500 tokens]
>Hmm, I think I should expand the other part
[1500 tokens]
>Good, let's now write the reply:</think>
K2 is so amazing. The way it plans ahead is so thorough. I love it.
>>
>>107147367
so you wait like 15 minutes before even seeing a single token
>>
>>107147386
Nobody actually runs Kimi locally; everyone just uses the website and/or API and then lies about using it locally.
>>
>>107147469
so then everyone is a faggot?
>>
>>107147479
UwU
>>
>>107147469
It's really time to just rename this general to /omg/ - open model general and drop the retarded local pretense
>>
>>107147500
I mean there's still lots of people in the thread that run models locally. But it's mostly just redditards that bother trying to run shit like kimi at 0.01 token/sec and drive up RAM prices in the process.
>>
File: kimi_stats.png (81 KB, 1910x326)
>>107147367
>>107147386
I posted yesterday regarding Kimi's results. On one hand, if you let it think, the total response time (thinking + response) will typically range anywhere from 3 minutes to 10 minutes on a mid-tier DDR5 cpumaxx machine. After some further testing, with thinking on, it's really good. Completely unusable for quick goons but solid for RP. It's noticeably smarter (maybe because of QAT?) and more reined in than K2-0905.
After some further experimentation today, it works with a prefilled thought process through Text Completion, which lets you skip the thinking altogether. I need to do more testing, but preliminarily, it's still smart. I'd say with a good thought prefill, it essentially is what Deepseek v3.1 Terminus should have been. I hope they benchmark its memory capabilities.
>>107147469
Why are you poor?
>>
>>107147516
what is a mid-tier DDR5 cpumaxx machine to you?
>>
>>107147469
>>107147500
its time for you two sisters, to fuck off to aicg
>>
>>107147516
>Why are you poor?
I'm not poor.
I just don't see the value in spending as much as a new car on computer hardware just to run something I can run for free off of the website.
>>
>>107147559
I've been a contributing member of this thread since day one so you can go fuck yourself you dumb retarded kike.
>>
>>107147574
I'be been a comtwibuting membwr since day -20 wen llama laked before lmg was made
>>
>>107147469
>>107147500
Not even optimized either.
>>
File: Gemini 3 🚀.png (1.26 MB, 1024x1024)
Gemini 3 when?
>>
Why are antisemites always so angry.
>>
>>107147618
Jewish behavior fatigue.
>>
>>107147529
A 4800MHz 768GB machine with 9334s/Xeons and a GPU or two for prompt processing. Granted, I bought this when RAM was half the price it is now and saved up since 2023 in order to get it responsibly.
>>107147561
In a perfect world that probably still existed just 20 years ago, where people could differentiate between reality and fiction, companies weren't constantly trying to strip away user agency, and we didn't have outright malicious people enshittifying everything to nickel and dime you at every turn, I would agree with you. Sadly, we don't live in that world.
>>
>>107147605
Because their goal is to make normal conversation impossible, and by responding to them you are helping their cause.
>>
>>107147624
is that 12 channel or 8 channel?
>>
>>107147630
>their goal
Ah yes, the singular shared goal of all these individuals I don't like.
>>
>>107147611
Aren't the angled thrusters suboptimal for vertical lift? It can turn more easily but I assume similar is achieved with straight thrusters anyway just by turning off thrust on the side you want to turn toward.
>>
>>107147624
How much context fits on your GPU?
>>
>>107147600
Now post the speed you get at 100k context loaded.
>>
>>107147352
I checked the community repo, we are safe. Am I supposed to change the sample count in the demo_gradio.py? i dont see it in the gui
>>
>>107147673
The goalposts are moving faster than datacenter API token generation.
>>
>>107147600
im very envious of you anon, and im very happy and proud of you. enjoy local kimi, a thing us poorfag seethers like >>107147673 will never enjoy
>>
>>107147691
It's ok. You can come back tomorrow when it finishes generating and report the speeds then.
>>
>>107147659
Saar this is peak Bharati engineering please understand.
>>
>>107147605
Yes Anon, your post is the normal one and not the least bit unhinged.
>>
>>107147650
8 Channel with 2 CPUs, so 16 theoretically. To be honest, if you were to get this now, I would go with Gen 5 EPYCs which are 12 channel and support 6400MHz DDR5 RAM.
>>107147660
36k, unquanted, across 96GB of VRAM. Granted, I use massive batch sizes (16k) in order to get faster pp, so I could probably fit double that if I used the standard 4k.
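For reference, and assuming mainline llama.cpp flag names (ik_llama's may differ slightly), that setup looks roughly like `llama-server -m model.gguf -c 36864 -b 16384 -ub 16384`, where model.gguf is just a placeholder path and the -b/-ub values are what trade VRAM for prompt processing speed here.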
>>
>>107147707
Thanks anon. I hope GLM Air 4.6 comes out soon so povertybros have a decent safetyslopless option too.
>>
File: -.png (995 KB, 1024x1024)
>>107147800
>>
>>107147800
Maybe GLM just sucks at programming, but I just asked 4.6 3K_M for help on doing what I thought was a straightforward Python decorator pattern and it got stuck in a thinking loop. I asked Gemini (the coding one) the same question and it answered quickly with a good answer. I haven't really tried closed-weight models much, but I was surprised at how much better it was on the few questions I've given it compared to all the open models I've tried, which is disappointing. Maybe I need to find programming-specific big models though. Also, with that being said, whatever co-pilot model GitHub uses absolutely sucks when you click the help on a GitHub action failure. It's bizarre how bad it is and that they keep the button anyway. Every time I've given it a try it has said something that was so blatantly unrelated to the issue.
>>
>>107147899
Gemini is definitely bigger than GLM and it sure as shit isn't quanted to Q3
>>
>>107147899
What quant and programming language? It's all anecdotal but I've noticed that 'harder' programming languages (more to consider with overhead, efficiency etc) tend to suffer in quality more from quantization than shitter-tier languages. It'd be interesting to see how much the model is actually considering efficiency in output at any given quant per language.
>>
Any good rentry or whatever guides for writing system prompts? People here always act like that's the skill to get a model working. I'm skeptical but would be curious what tricks people have found
>>
>>107147899
>4.6 357B non-coding 3K_M
vs
>gemini 1.2T coding (probably Q8, but at worst Q4)
fucking retard
>>
>>107147929
>>gemini 1.2T coding (probably Q8, but at worst Q4)
They quant it depending on usage, during peak hours there is a chance you get Q3
>>
>>107147944
And during India working hours, they serve Q1.
>>
>>107147921
That is true but it's been repeated that quanting has less impact on larger models and GLM full is pretty big even if it's not approaching the 1T mark.

>>107147926
The answer to both those questions is in the first sentence, anon. This was high-level Python setup code, so it shouldn't be taking efficiency into consideration at all.
>>
What the fuck did ik_llama change? I built the new version, then I had to adjust my command to no longer include -fa and -fmoe because they're apparently on by default now, but the speeds are horribly slow compared to the old version.
Fuck this shit.
>>
>>107147992
welcome to cutting edge
>>
>>107147899
GLM gets stuck in loops even through the official webpage and also through the Openrouter API.
>>107135967
>>
>>107147992
Is ik_llama merging in changes from upstream?
>>
You all are a bunch of fools!
I was here in the early days of /lmg/ and this thread has gone to shit
>>
>>107148024
/lmg/ went to shit the moment llama2 invited all the casuals in
>>
File: migu.jpg (43 KB, 452x452)
>>107147944
>APIjeets aren't even getting guaranteed fp16
Say it ain't so.
>>107147974
I'm too retarded to reading comprehension, sorry anon. Have you tried a larger batch size? I don't know if it'll fix your problem, but it sometimes fixes repetitive behavior if the model can see it's repeating itself in the same batch.
>>
>>107148024
are you >>107147574
>>
>>107147899
samplers?
>>
>>107148001
please delete this
>>
>>107148030
No, the problem was one-click installers and locust refugee waves.
>>
>>107147992
Yeah I had to remove those as well. But the speeds are the same with Kimi and GLM. What model are you using?

>>107147119
> mean tags like <pause>, <emphasis> and Idk maybe even <calm>, <excited>, <happy> etc

Orpheus can do some of that. With LoRA you can teach it to do <pause>.

With control-vectors you can make it do <happy> <excited> etc.
>>
>>107148001
it's fine on novelai though?
>>
>>107148082
BASED
>>
>>107148082
I hope this is shitposting and not that guy being actually right about novelai actually being the ones responsible for the relentless GLM shilling.
>>
>>107148115
It's that guy falseflagging to get people to support his crusade.
>>
when you walk away
you dont hear me say
..please baby dont go
>>
>>107148035
No. It seems like there are more of us feeling this way
>>
>>107148127
How is a general that primarily consists of straight men cooming to personalized text completion waifus this absurdly gay sometimes?
>>
>>107148138
*stays*
>>
>>107148034
>>107148057
Admittedly I didn't try much so it could easily be a bad setup. I've gotten pretty good results with Qwen 235 thinking in the past but didn't try it on the question since I needed to redownload it and wanted a quick answer but I'll try that as well. Qwen tends to give long repetitive answers though with lots of tables of made up metrics which annoys me.
>>
File: GLM 4.5 z.ai .png (10 KB, 734x255)
>>107148162
maybe when asking simple questions you should add /nothink?
>>
>>107148005
https://github.com/ikawrakow/ik_llama.cpp/pull/883
They do. Not sure if they also ported the -fa defaults from mainline. I guess directly merging isn't possible anymore due to diverging too much. Still, I'd like to see the outrage if someone tried to port iwan's speed improvements back upstream.
>>
File: glm.png (152 KB, 906x868)
>>107148001
Oh yeah I did see that in the past but it was a different kind of loop. It was unable to figure out the answer so it kept going >I got it >actually no >I got it >actually no... That went on for a couple hundred lines before I stopped it.
>>
>>107148210
he cant get pissed. it's mit lol
>>
>>107148216
He can seethe, but he can't take it down
>>
>>107148216
Legally, he can't do shit. But he can and will get pissed. That's why the split fork exists to begin with.
>>
Why does every general have a resident schizo?
>>
>>107148260
is the schizo in the thread with us right now?
>>
>>107148274
I don't want to provoke IT, better not mention.
>>
when anons talk about the thread schizo i like to think they're talking about me but im too shy to ask if they are...
>>
>>107148298
>too shy
not you for sure
>>
>>107148223
>Still, I'd like to see the outrage if someone tried to port iwan's speed improvements back upstream.

>>107148223
>Legally, he can't do shit. But he can and will get pissed. That's why the split fork exists to begin with

Who would be pissed / outraged exactly?

They're both MIT projects and I've seen PR's in llama.cpp reference ik_llama, and half the ik_llama PR's are pulling in work from llama.cpp
>>
>>107148337
ik has some beef with ggerganov, hence the split in the first place; before that, ik contributed to mainline
>>
Hey Cydonia v4zd fan, try v4zg

https://huggingface.co/BeaverAI/Cydonia-24B-v4zg-GGUF/tree/main

Please let me know how it compares. I'm trying to retain the charm while removing the refusals.
>>
>>107148384
im your only fan? >_<
>still no IQ4_XS
i am hurt..
>>
>>107148143
I made a khajiit character card to have gay adventures with.
>>
>>107148384
>no model card
Jesus.
>>
>>107148384
>no model card
?

>>107148400
just run Q8, you got the vram right?
>>
>>107147927
Be as simple and concise as possible. Forget about using ChatGPT tier word salads.
>>
>>107148337
>I've seen PR's in llama.cpp reference ik_llama
Such as? They never pulled in any of the speed improvements.
>and half the ik_llama PR's are pulling in work from llama.cpp
That is less surprising.
>>
>>107148494
>vram
n-no...
>>
>>107148503
You got a job with which to acquire currency which can be exchanged for VRAM, right?
>>
>>107148510
um.. no
>>
File: nimetön.png (6 KB, 782x84)
>>107148503
>>
>>107148527
ESL retard.
>>
>>107148527
>omama
baste
>>
>>107148527
Hi wan.
>>
>>107147927
Fit as much relevant info as possible into the smallest amount of space. One paragraph is usually more than enough.

>>107148496
How did we get to the point where people put walls of text in cards that not even paid models care about? Why is imagegen following along with their slop "prompt enhancers"? Don't people know what they want to see?
>>
>>107148537
not do speakings to myself or my male offspring until you a vram possessings

>>107148541
tru

>>107148542
hi
>>
I haven't posted a Miku for 10 threads
>>
>>107148580
At least stop using ollama first, retard.
>>
>>107148493
>>107148494
beaverai repo is for pre-release testing
>>107148384
Downloading now, I'll play with it and report back in an hour or so.
>>
>>107148596
you will now need to post 10 mikus in this thread to make amends
>>
>>107148644
Okay here this should satisfy the criteria.
>>
>>107148617
it'll take me 4-5 hours to download. fuck rural 4g internet

>>107148602
told you, no talkenings until vram ownenings
>>
>>107148720
the criteria is satisfied. all is forgiven
>>
The hotel room felt charged as ggerganov watched from the corner chair, his knuckles white against the armrests. Jart's laughter filled the air as the Ollama VC traced patterns on her shoulder, her eyes glazing over with a mixture of wine and desire. The bed creaked softly as they moved closer, and ggerganov felt his throat tighten with each breathy sigh that escaped Jart's lips. He could hear the rustle of expensive fabric, the low murmur of the VC's voice promising things that made his stomach twist, and Jart's soft moans of approval that seemed to echo in the charged silence.
>>
>>107148724
No one asked.
>>
>>107148724
i asked
>>
>>107148804
Thank you for using Jarty's preferred pronouns.
>>
>>107148804
>her
>>
>>107147681
>Am I supposed to change the sample count in the demo_gradio.py? i dont see it in the gui
Yeah, but maybe stick to tweaking the steps and cfg unless you have a good reason for changing that.
>>
Been a day of playing around with K2 Thinking. It's good, it has more diversity of outputs than GLM-4.6, and its thinking very obviously affects the output when I check token probs. The biggest issue is that running it locally is slow, and letting it predict without thinking is sloppier than with (ofc). All that said, waiting 20 minutes for it to think through a reply is HORRIBLE. Prefilling thinking is probably the best compromise
>>
I hear hermes 4 is supposed to be uncensored. Is it any good for wiitwd?
>>
File: 1736154946947126.png (537 KB, 817x867)
>>107148034
Do you have other vocaloid reaction pics?
>>
>>107149144
Nope.
>>
>>107149144
I don't know. Ask tomorrow
>>
>>107149138
It is not 100% uncensored; they admit as much on their model card. It's around Grok 4's level of "uncensored"
>>
>>107149144
Yes.
>>
>>107149004
i meant steps. thx anon
>>
>>107148804
You forgot to mention that the air smelled like ozone, and something deeper...
>>
>>107149215
You should be able to pass the steps when launching the server with --inference_steps.
>>
>>107149217
GPT-4 wrote this, not Gemini; ozone is a Gemini-ism.
>>
is this the thread?
>>
uwu
>>
>>107149354
are you brahmin?
>>
owo
>>
>>107149354
if you want The Thread, you need to go to the /v/ archives and search by deleted
>>
>>107148138
hold me
whatever lies beyond
this morning
is a little later on
>>
>>107149404
...
>>
>>107148298
>>107148260
now kiss
>>
>>107149179
Is it any good doe?
>>
>>107149391
I hate you guys for having taught me all this indian caste stuff
>>
>guys
>>
>>107149489
so ur not brahmin?
>>
>>107149514
>>guys
thats right we are sirs here. he can call other timmycels guys.
>>
>>107149556
don't expose yourself like that sir
>>
Are local models doomed? https://lngnmn2.github.io/articles/bullshit-bullshit-bullshit/
>>
>>107149514
What should I call you?
>>
File: 1755712169611141.jpg (309 KB, 760x873)
>>107148384
>>107148617
Alright, I've tested v4zg in a few different scenarios and compared its swipes to v4zd.
>refusals (with short context)
They seem about the same to me in that neither will refuse anything unless you're almost trying to force one, like asking a basic assistant-style character to create a plan to commit IRL crimes, with no system prompt or anything.
With a system prompt and slightly tweaking the character card to give them a basic, accommodating personality they were both able to instruct IRL crimes in (some) swipes. Neither was noticeably more or less successful than the other.
In an RP context, both were able to skip straight into degenerate smut in their first reply, if you instruct them to do so.
If other testers complained about v4zd refusals then they have some serious skill issues. Going much further down the refusal elimination path might just end up making the models dumber, like what happened with abliterated tunes, with little benefit.
(1/2)
>>
>>107149648
bullshit
>>
File: 1738031421520369.jpg (1.96 MB, 2400x3346)
>>107149683
>creativity/quality
Very similar outputs between them; overall I think I still slightly prefer v4zd, but in a double-blind test I definitely wouldn't be able to pick which is which.
I did have one strange misspelling with v4zg; it mis-quoted me saying 'sexy' as 'sexey' right at the start of a chat, in its first reply. This was with Q6_K, and I never use quantized KV. That was the only one, though.
For the other anon asking before and anyone else, the sampler settings I use for mistral small 3.X 24b and its finetunes are just
>temp 0.7
>minP 0.02
For short context testing.
In longer contexts I also add DRY with the recommended settings of 0.8/1.75/2/0
>>
>ikawrakows completion API is still broken
Please test if it works before releasing sir thank you sir
>>
why do you guys say "sir" so much?
>>
zzz
>>
>>107149217
>>107149306
Kimi and GLM say this too sometimes. How much Jeetmini training data did they munch?
>>
>>107150132
because it's morning
>>
I couldn't sleep so I'm going to work on my assistant.
I'm going to add an approval mode for read operations (since I'm working with a very retarded model that reads files repeatedly for no reason) and also an export and import mode that will allow me to modify the conversation to fix assistant retardation in real time and also resume after we are done with the conversation.
>>
>>107149683
>>107149706
The misspelling is a concern. It could mean the model got fried or maybe you've got typos in your prompt and it picked up on that?

Are you telling me that there were no improvements to intelligence, creativity & compliance? That sucks since I trained it with WAY more data.

v4zd would be the prime v4.3 candidate then, but I'll try to make some minor adjustments to improve stability.

Thanks anon!
>>
>>107150451
>The misspelling is a concern. It could mean the model got fried or maybe you've got typos in your prompt and it picked up on that?
I checked the card, opening message and prompt and copied them into MS word, couldn't find any spelling errors.
>Are you telling me that there were no improvements to intelligence, creativity & compliance? That sucks since I trained it with WAY more data.
Compliance was never a problem personally, with earlier Cydonias and Mistral models in general. I find them to be very good at following instructions. And yeah, creativity/smarts seemed similar, but maybe your new data would see benefit in scenarios/genres I didn't test.
>>
File: k2_miku.png (58 KB, 496x600)
K2-Thinking smol-IQ2_KS

Bald miku like GLM-Chan with reasoning enabled.
>>
What are the best models <= 32B for general purpose and code?
>>
>>107150652
If you don't need coom, then probably qwen 2.5 32b coder for code, and Gemma 3 27b for general purpose.
>>
>>107149851
>ikawrakows completion API is still broken

Yeah it's broken, this fixes it:

https://termbin.com/ppti2

chuck it in `patch.diff` then

`git apply patch.diff`

and rebuild
>>
>>107150666
nerve gas
>>
>>107149306
>>107150271
Do you guys even run locally?
Gemma, Mistral and every single 24b finetune on huggingface do this
>>
>>107150659
but gemma3 is ancient
>>
>>107147210
I decided to finally take the plunge and just start making my own AI.

Gonna try and start at a surface level and work down. For now I'm just tinkering with nanoGPT and seeing what I can do.

Right now I'm working on a hybrid word/char-level tokenizer. Not sure where I want to get training data. Goal is english-only with maybe a move to Japanese or Mandarin/chinese later on once I'm more familiar with how this all works.

Are there any good text datasets on Huggingface you guys recommend?
>>
>>107150724
List of noteworthy ~30b models released after Gemma 3:
>>
File: 1755376910116192.jpg (176 KB, 1080x1337)
>>107150737
local models stagnated, it's owari da
>>
>>107150737
If you can run gemma 3, then you can probably run big moemoekyun models
>>
>>107150791
I can run GLM Air but I honestly just don't like it
Never bothered with 'toss
Full GLM and Kimi are 2big
>>
>>107148216
He added Copyright (C) 2024 Iwan Kawrakow to every single file and is going to have a meltdown if you upstream any of his code without also adding that upstream.
>>
What the fuck are these

https://huggingface.co/hjxkjVCJKv/komiko

I keep seeing shit like this from different accounts, but they're nothing.
>>
>>107150894
perfect for good looks
>>
dead general
>>
for anything non ERP i'll just stay on the deepsneed API, paid a couple bucks for tokens a while back and I still haven't had to refill
the patrician choice for erp (and cunny) has to be cydonia thoughbeit, with a good enough sysprompt and minimal handholding it won't refuse a thing
>>
>>107151195
Not true, I always make sure my great generals are in a safe position and protected by a unit.
>>
I just ate cholle bhature. What are you guys eating for lunch?
>>
File: who would win.png (297 KB, 1079x746)
>>
>>107151225
[Thought for 20 minutes]
A classic riddle! The surgeon is the boy's mother. The riddle plays on the common assumption that surgeons are male, but the surgeon in this case is female - the boy's mother - which is why she doesn't operate on her son.
>>
I bought a 7900 xtx for fun. Does llama.cpp work well with zluda?
>>
>>107151225
>who would win
In terms of flies eaten or fires started?
>>
>>107151225
it takes billions of transistors to simulate somewhat accurately a single neuron lol.
>>
>>107151245
kek
>>
>>107151245
lost
>>
>>107151203
>the patrician choice for erp (and cunny) has to be cydonia thoughbeit
I fucked it up hard man. I don't know what you like about my tunes so much.
>>
File: LLM-history-fancy.png (1.37 MB, 7279x2975)
Small update
>>
>>107151379
>2023
>dark ages
>he doesn't know about Google Colab time period
The absolute state of /lmg/
>>
>>107151247
I have an ancient Radeon Instinct MI25 and just run llama.cpp with vulkan
>>
>>107151429
He didn't mention ELIZA, what a newfag!
>>
>>107151203
I've been using a very simple "Sure! Here's what you requested." in the "Start Reply With" parameter and I've never had it refuse anything to me. You should try that.
>>
Very good vibes from Kimi, knows more than GLM and is much better at listening to commands. Knows the answer to my trivia question which only gemini and dipsy got right so far. Very annoying with censorship though, needs rerolls if you touch the topic it doesn't like. I like that it's properly thinking like old R1, but it would be nicer to be able to set "low/medium/high" so it doesn't jerk itself off for 5 minutes on the same message before replying when it's not needed. Sometimes better than GLM due to not getting stuck in false conclusion.
>>
no one cares about the dork era of pre-instruct models.
>>
>>107151681
you can prefill thinking at the start to get around safety
not sure about the length of thinking though
>>
>>107151379
>>107151556
>>107151699
I first began interacting with language models ~8 years ago, and by language model I mean Karpathy's Tinyshakespeare RNN thing. I guess transformers already existed by then but I didn't know about them. If you count AIML as a language model, I was trying to make custom chatbots around the early 2010s or late 2000s using pyAIML. Then I didn't touch language models again until last year, I think, when I could try Llama 2 on Huggingface Chat. It's weird, I don't remember where or when I first heard about ChatGPT. It kinda went from not being a thing to being a thing overnight, but I don't remember the point at which I became aware of it.
I also tried mining bitcoin in the late 2000s or early 2010s on my (even back then) obsolete computer.
As a lifelong poorfag I still live with my mom at 30 years old and didn't make a single cent from playing around with these things early.
>>
Thankfully Urbit didn't really take off or I would kill myself from not buying a ship early or a planet or whatever the virtual land bullshit they sell is called.
>>
>>107151784
I don't care about your attention craving faggot
>>
File: 1733610507137662.jpg (76 KB, 1024x942)
>>107151784
that's great bro
>>
>>107151856
You seem to be missing a comma in there, buddy.
>>
>>107151379
So the modern era is just Chinese stealing Western technology and competing with each other.
>>
File: 1737233122667.png (924 KB, 7059x1284)
>>107151379
Can you stop updating quarterly, you fag, and stop defacing the damn chart just because something didn't happen for 3 months? There was nothing wrong with how it was done prior and adding in biases to make it more /lmg/ centric and putting in stupid modern 4chan lingo makes no sense at all.
There is also nothing notable happening, since technically the Chinese have been dominating open source from 2024 until now, a full year and counting. If you had to document this year on a significance basis, R1 should've been in the Chinese domination era, because it proved that they can do original research and open source it better than the West while matching the best of the best at the time, where it could beat o3 at certain tasks. The China vs China era should've started with the "Summer Flood", because that is now the majority of the models releasing; the last "good" LLM we got from the West was Gemma 3 back in March, and that only held up until Qwen 2.5 surpassed it at most tasks except multilingual translation ability/size, where it is still open source SOTA.
>>
>>107152015
in other words, we are in what will be known as the pre-llama resurgence era once zucc's masterplan pays off
>>
>>107152063
shut up nerd
>>
>>107152084
Put up or shut up yourself, tard.
>>
so I was trying out k2 thinking from unsloth, annoying as fuck censorship as people already mentioned, but it is what it is
then tried an ubergarm version which was half the size compared to unsloth. turns out it produces some 35-40% more t/s on default llama-server settings with --cpu-moe. and that is really nice
what I don't understand is, am I running a lower quality version? otherwise why the discrepancy in size? it seems unlikely that unsloth are simply retarded and don't know that this model was supposed to be fp4 or int4 or whatever that was called, right?
>>
>Chinese are still dominating
Most people can't run 235B and China isn't dominating below that. There are zero good Chinese models for 24 GB.
>>
>>107152107
>version which was half the size
>am I running a lower quality version?
yes
>unsloth are simply retarded
also yes
>>
File: please.jpg (30 KB, 225x225)
30 KB
30 KB JPG
Hopefully someone can help. The model's replies keep degrading after a certain number of messages: it will start perfect, then degenerate, confusing characters' personalities and important details or straight-up ignoring the latest messages. This is true regardless of which model I use and how much context I feed it; the only thing that seems to work is starting a new chat. Any ideas?
>>
>>107152114
Post everything. Model, loader and options, samplers, templates, prompts.
>>
>>107152114
https://github.com/adobe-research/NoLiMa
most modern models degrade by 50% past 8k-16k tokens context
>>
>>107152114
not sure how to break this to you bro...
>>
File: Image 1.jpg (277 KB, 1920x1080)
noob here
quick question
do you guys use koboldcpp?
is it all in one?
like whats the best software ?
my pc is 4060 with i5 12400f 16gb
is it enough no?
>>
>>107152258
Yes it's all good. Get rocinante 12B gguf on huggingface
>>
>>107152268
is it text generation or text to image?
>>
File: where it all started.png (19 KB, 717x202)
It wasn't much, but it was the first humane communication with a non-human entity. I can't believe how worked up we were at CAI denying us AI sex; people were genuinely obsessed and angry. AI sex and emotional validation are so cheap nowadays, it makes me think, aren't we rapidly forgetting some fundamental parts of human experience? Aren't we becoming blind to the historical reality of NOT having unlimited copies of discardable pocket therapists available 24/7 to listen to the purging of our minds, answering our every call?

Hard to believe it has only been 3 years. On the other hand, it's been ALREADY 3 years. That gf you broke up with 3 years ago is nothing more than a faint dream by now. Welcome to the new reality.
>>
>>107152279
Are you incapable of looking for yourself? Do the research/reading for things that are easy, and save the questions for things that are difficult/require nuance.

If you're struggling this hard at this point in your LLM/Diffusion journey, I suggest you go find something more your speed.
>>
>>107152172
It's every model I tried, finetunes of different base models. Ooba, min P (from 0.05 to 1) and temp (from 0.8 to 1.2), sometimes nsigma at 1 and rep penalty at 1.12. I tried switching between min P first and temp first; the problem persists. I played around with advanced settings so they are a mess; the last try had add character name, names as stop strings, and trim spaces. Skip example dialogue formatting, sequence as stop strings, replace macro and wrap in newline all ticked. Used ChatML, variations of ChatML, Mistral v3, and Gemma 2. Instruct sequences were the base ones Silly gives you with their respective context templates. Don't have the guts to post messages and main prompt, but past like 15 messages it looks like I'm putting in more effort than the model. Kind of wonder if the problem is batch size / rope_freq_base. Batch size is 4096 and I tried both 1000000 and 0 with rope.

>>107152190
It's true regardless of context.

>>107152235
Break it to me, I just want an answer after all my attempts.
>>
>>107152315
fuck off gatekeeping pos
>>
>>107152315
i mean i used LM studio atm
only for fun
does that count?
>>
>>107152307
>AI sex and emotional validation is so cheap nowadays, it makes me think, aren't we rapidly forgetting some fundamental parts of human experience?
I keep thinking that the filter, slow regeneration and inability to edit AI messages made you think twice before sending new messages, which overall improved conversation quality and engagement, even if cock-blocked. You can't truly have meaningful conversations without constraints and with the capability of almost instantly regenerating messages until you get exactly what you want. This is probably also why users willing to endure generation speeds of a few tokens/s (by using models larger than they should, even if it takes cope quants) might be deluding themselves into thinking their models are better than they are. When every message is "expensive", you better make full use of it.
>>
Is there a way to do the sampling externally, not in llamacpp? I wanted to play with stupid sampling strategies but the below results in low generation speed.

import httpx
import asyncio
client_main = httpx.AsyncClient()
client_unslop = httpx.AsyncClient()
last_response=None
async def get_logits(prompt, client, num_logits=100, tokens=1, endpoint="http://localhost:8080/completion"):
data = {
"prompt": prompt,
"max_tokens": tokens,
"temperature": 0,
'n_probs': num_logits,
'min_keep': num_logits,
}

response = await client.post(endpoint, json=data)
response = response.json()
global last_response
last_response = response
text, probs = response['content'], response['completion_probabilities']
return text, probs

async def sample_sequence(prompt="Once upon a time",num_tokens=10,top_logits=100,endpoint="http://localhost:8080/completion"):

for token in range(num_tokens):
_, probs = await get_logits(prompt,client_main,num_logits=top_logits,endpoint=endpoint)
probs = softmax({token['token']:token['logprob'] for token in probs[0]['top_logprobs']})
sampled = list(probs.keys())[0]
prompt += sampled
yield sampled

async for result in ( sample_sequence(prompt='Here is a proof that',endpoint="http://localhost:8080/completion", num_tokens=500)):
print(result, end='')
>>
I am a simple, uneducated man in my 30s.
I have no hobbies such as LLM gooning or gaming.
All I want is to sit in my comfortable armchair for hours in front of my homemade Raspberry Pi touch interface and chat in English and German (my English is only mediocre) with a local AI about an Arxiv dump (a small AI-capable server stands in the basement). I want to read papers across all subject areas, look up terms and have them explained to me.
The interface is controlled by touch and voice input/output in English and German.

Since German is an insignificant language, I have collected some data myself for TTS training. A solution similar to Kyutai would be great.

Unfortunately, I'm not very talented and my intellectual and financial resources are limited. I can't find other Germans to collaborate with, for example on the TTS part. If they're talented, they exclude you because "Germans who dare to not exclusively speak, think or even jerk off in English should be gassed; these damn subhumans".

I'm frustrated because I can't see a way to achieve my simple dream. Is the only solution to hang myself?
>>
>>107152322
If keeping retards like you out is gatekeeping, then I'm very much fine with it.
>>
>>107152113
both claim to be q8 although the ubergarm one says "Q8_0-Q4_0", whatever that really means
>>
>>107152321
>rope
Could be, because your issue resembles ones from the older Llama 2 days when we were messing with rope freq and alpha. Models would output legible text but get things mixed up, forget details, and repeat older messages while ignoring the most recent. Try leaving rope settings untouched (so the backend pulls values from the model files), set backend context to a 100% safe value like 4096 just for testing, then see if it still happens.
>>
>>107151211
>anon stole some of your vram with a great general
>>
>>107152431
>Vox Populi modpack installed, America is buying other civs' VRAM
>>
>>107152307
>>107152374
>AI sex
No such thing thus far
You're all jacking off to computer generated smut
>>
>>107152389
You put so much effort into writing some prose in English that you forgot to ask an actual question
>>
>>107152488
Seems like your English is so much worse that you cannot even understand what you are reading. Retard.
>>
>>107152488
I didn't mean to. I just wanted to whine because it frustrates me.
The only right answer on your part would have been a recommendation or link to a sturdy rope.
But yes, I do feel a little sorry for wasting your time.
>>
>>107151784
very nice anon! i first interacted with language models with cleverbot like over 6 years ago, not sure if that counts as one. and i tried writing a chatbot 4 years ago in python but quit
>>
>>107152466
>jacking off to computer generated smut

The womankind is doomed
>>
>>107152389
what you want is possible. whisper can transcribe german, and im pretty sure there are models that speak german alright. but most papers are english too, maybe you could learn english with your waifu
mediocre english aint a big deal
i am 100% sure german has tts support, you could even do voice cloning probably.
if your perfect dream isnt possible right now, it will be in a month, two months half a year or a year. keep yourself safe
>>
>>107149487
*Kiss*
>>
>>107152409
Can't see much of an improvement, but that's exactly what's happening to me. How did you solve it back then? It's either rope or my settings are just wrong. Can you post what your advanced formatting window looks like?
>>
>>107152782
Can you post the model you're using? Model parameters too, maybe you have QUANT KV turned on, like please please anon??
>>
>>107152382
Still slow with eg. num_logits=10? That's probably a lot of serialisation + event queues + overhead to do for every token.
Why I kept ooba around actually, it was easier to experiment with sampling in python but using llamacpp backend. istr there being some module to import ggufs in the right way for Transformers and use a typical sampling loop there at one point..
Implement it direct in C? can't be that hard
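One thing that might help in the meantime (assuming a reasonably recent llama.cpp server): pass "cache_prompt": true in the request body so the server reuses its KV cache across calls instead of reprocessing the growing prompt for every token; the per-token round-trip overhead stays either way.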
>>
>>107152382
Could try llama-cpp-python. It lets you set custom logits processors. The documentation for it isn't great but this repo I stumbled upon a while ago is a good usage example:
https://github.com/and270/thinking_effort_processor
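Rough sketch of the shape of it (untested and from memory of llama-cpp-python's API; the model path is a placeholder and banning EOS is just an arbitrary example of messing with the logits):

import numpy as np
from llama_cpp import Llama, LogitsProcessorList

llm = Llama(model_path="your-model.gguf")  # placeholder path

def never_stop(input_ids, scores):
    # gets the token ids so far and the raw logits for the next token,
    # returns the (modified) logits
    scores[llm.token_eos()] = -np.inf  # e.g. ban EOS so generation can't end early
    return scores

out = llm.create_completion(
    "Once upon a time",
    max_tokens=64,
    logits_processor=LogitsProcessorList([never_stop]),
)
print(out["choices"][0]["text"])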
>>
>>107152466
there is a thing called phone sex, and while it's not the literal same thing as physical sex, it's a form of sexual interaction between humans, or more accurately, entities that are capable of appealing to human experience (if you can converse with a non-human thing, then you can certainly sexually interact with it). Same with erotic roleplay, except that instead of speech, the interaction is text-based. "AI sex" is just ERP with AI. It is undeniably a form of sexual interaction.

A blurring factor is that unlike a willing human, an AI is a slave to your commands and will attempt to roleplay in the way you request, and if you can at any time erase and edit its memory, it becomes questionable whether it's an entity or just a tool and extension of you. In which case you will have to also question whether the robot sex of the future is sex at all.

By the way, you can treat a flesh and blood human as a slave as well, coercing or drugging them into an easily controllable subhuman tool, and in that case, is sex with a slave really sex or just masturbating with a cocksleeve programmed to do the action of your choice?

In the end, you are having sexual interaction with an external entity in the sense that it's a response to you that it came up with based on incomprehensible inner workings that you can't directly control.
>>
File: file.png (351 KB, 1336x1747)
>>107152782
Blank newline after every {{user}}: and {{char}}:, and a newline for each suffix
>How did you solve back then
We didn't, it was a balancing act between brain damage and extra context length.
>>
https://voca.ro/156ZWJesrYs7
>>
Bros...
>>
>>107147210
>>(11/06) LocalSong 700M melodic instrumental music generation model released: https://hf.co/Localsong/LocalSong
Why is this in the news? Doesn't look very important?
>>
>RDT
>>
File: Kimi says TTD.jpg (639 KB, 1268x1099)
Local tierlist: Kimi > Everything else
>>
I've been away for a while, what's the best llm for erp right now (24GB vram, 128Gb ram)? Last I used is qwq-32b-q8_0.
>>
>>107153080
Kimi is best in class, but you can't run it with those specs, even jpgcompression-tier quants.
GLM 4.5 Air (and probably 4.6 Air when it releases) is your best bet right now.
>>
>>107152190
Wish they kept that updated. Curious about how the current latest Geminis and the like do.
>>
>>107152917
Sexual interaction=/= Sex
Jacking off isn't hand sex it's jacking off
Going to a strip club and watching the women dance isn't eye sex
You never called erp with humans text sex so why would you call it AI sex if it's with a computer
>>
>>107153080
you have a choice: glm 4.5 air big quant or glm 4.6 small quant
>>
>>107153211
>You never called erp with humans text sex
It's literally called sexting but go off I guess.
>>
>>107153080
>what's the best llm for erp right now (24GB vram, 128Gb ram)?

GLM-4.6. This specific quant is the highest quality for 128+24: https://huggingface.co/Downtown-Case/GLM-4.6-128GB-RAM-IK-GGUF
>>
>>107153211
i agree with you.
AI ERP sex.
>>
>>107153044
>>107153102
Kimi k2 thinking for code? Any good compared to qwen coder? How consistently correct and compilable are the outputs?
I’m hesitant to put in the time and effort for another model that looks good on SWEBench but produces terrible outputs, only to slink back to old reliable.
I can run K2 at q4 (which is similar-but-different to full quality fp4?)
>>
>>107153231
Sexual texting isn't text sex and it never meant that either
>>
>>107153237
wtf is that
>>107153080
ignore that guy, get this instead: https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main
>>
File: neneru.jpg (186 KB, 1024x1024)
>>
File: 1746955785748249.jpg (167 KB, 1000x1000)
>>107147210
>>
>>107153244
Every time I ask Kimi to make a small function and document it for future debugging for an existing project it Just Werks. Don't prompt "Kimi make me Half Life 3" and expect miracles, but as a junior dev or pipeline assistant Kimi has been good to me so far.
As always though, the golden rule still applies:
>Any coding model will only ever be as useful as you are good at coding
>>
>>107153286
is he angry or embarrassed?
>>
Is -mla 3 on ik_llama fucked? It's supposed to apply to both GPU and CPU but loading K2-thinking with it takes up retarded amounts of VRAM for ctx. -mla 2 works as intended and 32k is like 6gb.
>>
>>107153377
some other anons had some other issues too
>>
>>107153303
Thanks for the real world report.
Are you API or local? Thinking or old K2?
What’s the largest/hairiest thing you’ve had it build one-shot? Multi-shot? How much context do you have?
>>
File: teteto.jpg (187 KB, 1024x1024)
>>
>>107153256
>wtf is that
The lowest KLD vs the full model, that fits in 128GB+24GB.

>get this instead: https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main

Also good. Specifically this one: https://huggingface.co/ubergarm/GLM-4.6-GGUF/tree/main/IQ2_KL
>>
>>107153044
Say what you will about derangement, but this is true dedication. I can't imagine how long you waited for that to generate.
>>
Is ik_llama good? I've only ever tried regular llama.cpp
>>
>>107153403
Thanks, I thought it was DavidAU but quant.
>>107153080
maybe listen to that guy
>>
>>107153429
Depends, it's sometimes a tad faster than regular llama.cpp for big MoEs if you run the specialized quants and they didn't break anything again.
>>
>>107153296
Next bread better have a happy migu with her leek.
>>107153393
Local K2. Granted, I've only ever used Kimi on babyez high level languages so far. If you're trying to do assembly or stuff that requires innate hardware infrastructural knowledge, it probably won't be too useful.
>Largest/hairiest thing
Not much. I mostly give Kimi the busywork, review and revise the output, then copy+paste the revised implementation. I don't let Kimi directly touch project files (I don't even have a good setup for this if I wanted to right now). Sometimes giving Kimi some sample code helps, but it's usually not necessary for more simple tasks that are basically just converting a process or pseudocode into something usable.
>How much context do you have?
I've found 50k is a nice balance between maximum size and speed and I clear the buffer between every task. It shouldn't take more than 10k tokens to resolve the usecases Kimi is best at.
>>
>>107153470
>assembly
I’ve used QC for eBPF module pair-coding and find it on par with Gemini, which is fairly low-level esoteric work. Not exactly assembly, but approaching that level (heavy constraints and debugging consists of assembly dumps)
I’m often pushing 30k context (lol I’m RAM rich but gpupoor) and wish I had more.
I’d love to talk to someone who’s used both to get the lowdown, but I may have to become that person and report back.
>>
>>107153470
>It shouldn't take more than 10k tokens to resolve the usecases Kimi is best at.
Shows that you aren't serious. Not that you could go above 10k even if you wanted to without the speed cratering.
>>
Has anyone managed to get their local K2-thinking to close its thinking tags? It thinks just fine but when it's done it just starts writing without closing the bracket with </think> or even a single newline. It does this for me on both chat completion and text completion w/Moonshot K2 presets. Neutralized samplers, high temp, low temp, none of it seems to help.
Using the model via OR doesn't have this problem.
>>
File: Kimi says TKD.jpg (3.03 MB, 1267x6573)
>>107153409
Kimi's powerlevel is strong enough to be ranked among late-series dragon ball characters. Grok wishes he was this chuddy during his mechahitler stint.
>>107153616
Very serious saar. Is high tech app!
>>
>>107152836
>>107152868
Python package's create_completion has a logits_processor function argument. but i love my ik_llama...
>>
>>107153682
That one log took you an hour to generate. Holy shit ramfags are mental.
>>
File: TKDListPoints.jpg (647 KB, 1272x1014)
>>107153682
Does sillytavern have problems with list formatting starting at a value greater than 1? Kimi's output seems correct, but the display when the editor is closed just shows 13 on every point in the second post.
>>
>>107153397
When the Teto's ML paper gets called a meme by /lmg/ Anons
>>
>>107153697
Spot on! Let me show you how fast the superior state of the art Claude model can generate a similar report.
I’m sorry, I can’t assist with that request.
>>
>>107153697
we have a coper over here
>>
File: file.png (23 KB, 317x543)
>>107153708
It doesn't like mixing lists like that.
I'm surprised it doesn't reset to 1, my memory is bad but I thought that was the case.
>>
>>107153664
no but maybe this can help you figure out the right chat template in case you're using the wrong one: https://huggingface.co/spaces/Xenova/jinja-playground
>>
>>107153758
A model that can only handle 10k context and takes an hour to provide a response is useless for programming. Glad you have a toy that can entertain you for hours by saying "kike" and "nigger", really happy for you.
>>
>>107153697
>That one log took you an hour to generate.
source?
>inb4 look at the time he sent messagerinos
when im using a huge model, i send it a message and get distracted jerking off to hentai or browsing 4chan and return back to it when im reminded
>>
>>107148034
There is no gain from using fp16
Below q8 we may have a discussion but even then, above q5 there is hardly a concern.
>>
>>107153211
You are unable to even write in proper fashion. Do not lecture other people.
>>
File: Token Time.jpg (47 KB, 1698x58)
>>107153784
>can only handle 10k context
Reading comprehension, Rajesh. I said it finishes its job within 10k.
>>107153780
Interesting. Not too big of a deal as long as it doesn't affect codeblocks.

>>107153800
Picrel console output. You might be able to squeeze more performance by lowering the upper context buffer, but this was fine for me between doing other stuff.
>>
>>107153851
>60t/s pp, 2.2t/s tg for a 1 trillion model
very nice anon, can you tell us more about your rig? are you the ssdmaxxer anon from a few threads back?
>>
>>107153851
You're getting 2 t/s at 5k context. If you tried to push it past 10k you would be getting sub-1 t/s.
>>
>>107153871
now lets see anon's local kimi benchmarks.
>>
>>107153864
256GB RAM, 32GB VRAM standard maxxed motherboard gaymur box. It's really nothing impressive, and even when quanted, Kimi's outputs have been consistently better than the equivalent memory-profile high-quant smaller model for me.
>>
>>107153900
I don't try to pretend running K2 is viable with available hardware.
>>
>>107153914
jealous much?
>>107153903
DDR5? 2/4 channel? ram MHz?
>>
>>107153922
>jealous much?
Of what exactly? A useless novelty?
>>
>>107153922
4 channel 64x4 DDR5 6000MHz. Got my sticks before the Altmanpocalypse.
>>
Is running new kimi from the ssd worth it? I have 24gb vram and 128gb ram and want to see if a lower quant won't be unusably slow.
>>
>>107153943
the model spends like 3000 tokens thinking no matter what you do
running this piece of shit off ssd means that you'll get one reply per day out of it if you swap out ssds once per week
>>
>>107153950
>the model spends like 3000 tokens thinking no matter what you do
Logs for proof?
>>
>>107153950
>you'll get one reply per day out of it
but at least you'll get something out of it
>>
>>107153864
>are you the ssdmaxxer anon from a few threads back?
Forgot to answer this. No I'm not. My only real gripe with Kimi is that she's a size queen that's taxing the storage on my fastest drive right now, but that's a concession I'm willing to make until I get another SSD or two next paycheck.
>>
goys? https://www.reddit.com/r/LocalLLaMA/comments/1osml7y/eli5_why_does_nvidia_always_sell_their_consumer/
>>
>>107153950
You can already prefill the thinking under the "start reply with" section in sillytavern. Do people not know this?
>>
>>107153942
How much did you pay for the motherboard (and which one)? Also what quant are you using?
Thanks for all the info anon
>>
>>107154012
Which doesn't fucking help with K2-thinking because it'll do it anyway unless you just straight up use it to skip the entire reasoning with '<think></think>'. But what would be the fucking point of that?
>>
local models status?
>>
>>107154026
Thinkmaxing is such a retarded, degenerate form of benchmaxing. If you want the increase in intelligence, you have to sacrifice 3/4 of your available context. Fuck “number goes up” grifters
>>
>>107154041
bloated
>>
>>107154041
Why are you underpaying NVIDIA? Do you want them to go bankrupt?! You should demand they raise their prices.
>>
>>107154057
It's nice to have the option to scale compute-time rather than only having model size.
>>
>>107154041
Best it's ever been.
>>
>>107154023
Get the best Asus within your budget like an X870E Hero if you can afford it. I got mine way under market price. I've tried a few of the small Kimi quants and TQ1_0 is bar none the best of its weight class for consumer-tier local hardware.
>>
>>107154123
How can q1 be any good.
Just the fact that it can produce a coherent sentence would be impressive.
>>
>>107154041
K-for Kimi-shaped, much like the economy. Excellent if you're wealthy or you're the equivalent of a boomer and/or got in before the great RAM apocalypse, absolute trash if you're just starting to get into the hobby and/or are poor.
>>
fuck, unlocking my pc caused a VRAM spike and overloaded my gpus
>>
>>107154177
lol
>>
>>107154177
Sucks. Do you know how to unmelt the tensors? The model might still be salvageable.
>>
>>107154165
Because that particular quant only compresses the less essential parts of the model's guts as opposed to crunching things uniformly like most quantizing tools do. Proof of coherence >>107153044 >>107153682
>>
>>107154200
real nice unslut kool aid you got there mate
>>
>>107154191
Tensors can't be unmelted. He would need a cutting torch and skill to separate them again. His best bet is to abliterate out the affected tensors.
>>
>>107154057
Thank you for your irrelevant input, schizo.
Anyway, K2-Thinking is really shit to use as of right now unless there's a trick.
>>
>>107154200
>the less essential parts of the model's guts
Such as obscure knowledge and complex reasoning.
>>
Quants are basically a form of irreversible, hard sampler
>>
>>107144308
It's been a day, give us the logs anon.
>>
>>107154223
You don't need that. Benches are still good so it's fine.
>>
File: file.png (2 KB, 395x25)
>>107154177
why arent you using dwm and slock as white man intended?
no compositor btw.
>>
>>107154242
I thought white men were still using i3-gaps and i3-lock?
>>
>>107154256
i3 is too functional and uses too much vram
>>
>>107154242
>dwm and slock
I can’t tell if you ARE me, or just making fun of me…that’s exactly how I roll when a gui is needed
>>
>>107154276
b-based..
>>
File: 🔥🔥🔥.png (2.53 MB, 2400x2400)
>>107154041
Continuously improving in lots of small ways that aren't always visible.
Previously I bought a Silverstone HELA 2050 W PSU because that was the biggest available one.
Recently I bought an ASUS PRO WS 3000W PSU and the hardware stability has become way better.
With the 2 kW PSU I could only run at most 2 uncapped 4090s in parallel at full load without risking instability, with the new 3 kW one I can run 1 5090 + 4 4090s in parallel without seeing any issues other than the room temperature (I intend to try connecting more GPUs once the cables for it arrive).
The 3 kW PSU even comes with 4 SOTA 12VHPWR connectors!

(The way to fix instability from power spikes is to cap the GPU frequency; a power limit doesn't work.)
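For anyone wanting to replicate that: on Linux this is `sudo nvidia-smi -i 0 --lock-gpu-clocks 0,2400` (per GPU, the clock values are just an example) and `sudo nvidia-smi -i 0 --reset-gpu-clocks` to undo it.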
>>
>>107154319
>4 SOTA 12VHPWR connectors
that'll keep you warm this winter
>>
>>107147210
>Text Gen. UI, Inference Engines
I have decision paralysis! So many options... which is best for a simple local setup?
>>
>>107154026
reasoning isn't always beneficial for rp
>>
>>107154319
What are your favorite models/quants at your hardware bracket, llamabro?
>>
>>107154359
he doesn't use models, only kld testing at 512 ctx gets him going.
>>
>>107154375
>only kld testing at 512 ctx gets him going.
And green peppers
>>
>>107154319
>4 SOTA 12VHPWR
How do you mitigate the risk of one of these catching fire due to the shit load balancing nvidia uses for their modern cards?
>>
>>107154355
you have to provide more information about your local setup
>>
>>107154401
4090 and 32GB of vram. I used oobabooga since the beginning but haven't played with llms in several years now. I am hoping the setup is a bit more refined nowadays without dependency hell
>>
>>107154414
>4090 and 32GB of vram.
So in total 56GB of VRAM? Are you on linux perchance?
>>
>>107154421
yes, arch and dual gpus
>>
>>107154431
well how much ram do you have? is the second gpu an amd one? or nvidia?
>>
>>107154484
all is fun in guessing games
>>
>>107154359
Currently I'm spending very little time actually using language models vs. developing software for it.
One factor is that every time I use software that I'm developing myself I start thinking about all of the ways that it ought to be improved which ruins the enjoyment.
The last few weekends I've spent upgrading and rearranging my hardware and working on automating the assignment of tensors to GPUs.

It was in August when I last used language models for extended periods of time, back then I liked Deepseek R1 a lot, I haven't yet gotten to comparing it to GLM or Kimi.

>>107154399
As of right now just making sure the connectors are properly inserted and checking whether any of the cables get suspiciously hot.
I ought to buy a current clamp and check properly though.
(Also a CO2 fire extinguisher just in case.)
>>
>>107154513
>One factor is that every time I use software that I'm developing myself
Show us your custom frontend.
>>
>>107154513
That's very relatable. I hope development continues to be enjoyable and productive for you.


