/g/ - Technology
File: MTP.png (790 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

MTP Edition

Previous threads: >>106954792 & >>106940821

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) BailingMoeV2 support merged into llama.cpp (#16063): https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106954792

--Paper: LLMs Can Get "Brain Rot"!:
>106955849 >106955872 >106955875 >106955897 >106955944 >106956409 >106956499 >106956522
--Paper: Glyph: Scaling Context Windows via Visual-Text Compression:
>106961154 >106961171 >106961190 >106961197 >106961241 >106961247 >106961262 >106961207 >106961229 >106961300 >106961340
--Papers:
>106958278 >106958328
--Model performance comparison in tool usage scenarios:
>106958025 >106958063 >106958070 >106958085 >106958130
--Finetuning challenges and architecture trade-offs in Axolotl:
>106958095
--Sourcing movie scripts for LLM training:
>106960250 >106960295 >106960642 >106960760
--Implications of the US banning Nvidia AI chip sales to China:
>106956310 >106956345 >106956404 >106956422 >106956563 >106956485 >106956472 >106956761 >106956944 >106956988 >106957238 >106957271 >106957415 >106957440 >106956458 >106959104 >106959256 >106959278 >106959323 >106960322 >106960356 >106960420 >106959745 >106959789 >106960041 >106960029
--OCR advancements enabling historical document preservation:
>106962575 >106962702 >106962770 >106962787
--Integrating Claude Code with local models:
>106963221 >106963263 >106963467 >106963534 >106963571 >106963640 >106964268 >106964741 >106965179 >106963281 >106963427 >106965845
--Qwen3 32B VL multimodal trade-offs:
>106963854 >106963968 >106963908 >106963938 >106963998 >106964105 >106964126 >106964173 >106964198 >106964223 >106964068 >106964079 >106964169
--Feasibility of local coding models with current hardware:
>106964576 >106964691 >106964745 >106964931 >106964825 >106964831 >106964842 >106964914 >106964922 >106964990 >106965016 >106965029 >106965041 >106964894 >106964984 >106964918
--Logs: Qwen3-VL-32B:
>106965471 >106965523
--Miku (free space):
>106954989 >106955109 >106955790 >106955892 >106958973 >106960587 >106961156

►Recent Highlight Posts from the Previous Thread: >>106954801

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
https://videocardz.com/newz/nvidia-quietly-launches-rtx-pro-5000-blackwell-workstation-card-with-72gb-of-memory
>The current 48GB version is listed at around $4,250 to $4,600, so the 72GB model could be priced close to $5,000. For reference, the flagship RTX PRO 6000 costs over $8,300.
>>
>>106966085
god dammit. i just bought a 5090 a couple months ago
>>
File: 1752394454972308.png (35 KB, 847x323)
GLM 4.6 writes like shit. 2024 low param count models like command-r or even nemo don't grasp details anywhere near as well as these benchmark whore models, but out of the box their output sounds natural and can't immediately be pinned as AI barring stupidity. Whereas I can swipe a new model like this 20 times and everything that comes out will make me roll my eyes, in spite of the fact that it clearly understands the scenario much better. It's just deep fried with insane quantities of predictable flowery prose that no amount of tokens in the sys prompt can fix, short of asking it to ditch descriptions entirely and write exclusively in beige prose. Which isn't the fucking point; the sweet spot is a middle ground that pre-synthetic-lobotomy models from last year had no problem achieving by default (besides "rp" tunes/merges, which are a lobotomy of their own).
I'm starting to believe cloudfags have been eating shit this entire time.
>>
>>106966151
I can't run GLM 4.6 so I have no idea. Can you show some comparisons? Doesn't have to be some side by side or whatever, just a couple of pastebin links.
>>
>>106966114
Look around dude. Do you think the world (and specifically the software world) is great? Maybe some of those juniors were right, and you were too brainwashed by the corpo world to see the reality: what you were doing was a net positive for your boss's bank account and a net negative for humanity.
Tomorrow I will go to work and watch my boss offer some disgustingly bloated, overpriced and slow as balls software product made by a publicly traded company to a client when they could solve their issues with a couple of Python scripts, just because he perceives it as cheaper to develop for and he gets a cut from being a reseller.
It's either that or frying burgers, but I'm not delusional enough to convince myself that I'm doing a public good by being involved with that stuff.
>>
>>106966193

b-b-b-but i must be doing it right, look at my bank account! look at the kudos and accolades I've gotten! look! loooook!
>>
>>106966174
It has an annoying tendency to quote things you said back at you like a parrot, and compared to Air is much more infuriating on the isms: conspiratorial whispers, laugh spam, useless padding like a character smirking, it's not x but y, mixture of x y and z, leaving nothing to the imagination, etc.
I didn't try 4.5 and at this point I wouldn't even be surprised if it's less fucked.
>>
Arther armlet Selig
>>
>>106966013
Anon you need to humble yourself a little bit and admit that you are 100% larping about knowing what the fuck you're doing here. If you spent as much mental energy actually learning how to code as you are spending on making bullshit statements like these:

>Yes, it's likely to be buggy but all non formally verified code is
>That's why you do testing until you are certain the defect % is low
>(it doesn't necessarily have to interact with the other components through the network, it can be a simple stateless file with a set of functions or stateful with each function having a set of pre and post conditions, or interact through shared memory, pipes etc.)
>If you need 100% reliability you can have the AI write code and write a proof that the code meets your spec.
>asking the LLM to go through that process and letting it become aware of the errors is likely to help it make more reliable software.

I could actually keep going, but the majority of the post was literally nonsensical, as any developer with any kind of experience would be able to tell. The only person you are fooling is yourself, and it's a waste of time: you will get to actually being able to code and use AI to code 100x faster if you stop larping as some kind of visionary genius who cracked the matrix and now knows the cheat to getting amazing code and end products without ever having to do the hard part of knowing what the fuck you are doing.

Before you double down on defending your ego just read
>>106966114
to see where I'm coming from. This final (you) from me isn't an attack, it's an attempt at guidance. Past this point you can do whatever, but as someone who has seen and done this all before, trust me when I tell you that you're wasting your time, when doing it properly would actually be less effort and take less time.
>>
>>106966193
I have no idea what you're trying to project onto me but it has nothing to do with what I wrote
>>
>>106966281
>writing paragraphs to the brown instead of just posting an example of a completely fucked over repo like BUN, destroyed by its AUTONOMOUS coder and AUTONOMOUS reviewer with the man in the loop serving to tard wrangle the reviewer
do you not realize he doesn't have the IQ capacity to understand why it's not possible?
>>
>>106966151
Post settings so that I can laugh at you
>>
>>106966352
>Even if brownanon is too stupid to get it, now multiply that by all the passive observers who also don't know what they're doing, they see someone confidently put forward their retarded idea, and then they see you screeching "fuck you brownnn fucking pajeett apoopoopoo", who will they more likely listen to, creating more bullshit that we all have to deal with?

What good will linking shitheap vibe coded repos do if the only people who would gain anything from reading any of this don't understand any of it anyway, or why it's bad?
>>
>>106966354
>Write {{char}}'s reply, adhering to the current format.
>Temp 0.85 Min P 0.01 Top K 0, all other neutralized
>>
>>106966375
they can see it's akin to fighting windmills. Even if they're not able to read code, they would at least be able to understand that even top of the line agents are FUCKING garbage, with all the back and forth, multiple errors, and hallucination madness that happens on the regular... for example, for implementing a fucking simple TAR/UNTAR:
https://github.com/oven-sh/bun/pull/23373
>>
What's (you)r local model sexo tierlist assuming all are well prompted?
What type of prose or writing style do you prefer?
>>
File: file.png (141 KB, 803x1187)
>>106965000
I don't have most of those options available in my samplers. Is this a special version of SillyTavern?
>>
>>106966454
He's using the chat completion API option.
>>
>>106966151
You are absolutely right!
>>
File: file.png (102 KB, 947x857)
>>106966464
Oh. That's different. So how do I connect a local model to my sillytavern using this? Because it doesn't seem to want to connect.
>>
>>106966281
I never claimed to know what the fuck I'm doing, whatever that means. All my posts are obviously my opinion. I think they are the truth to various degrees of confidence, what do you want me to do? Pretend I don't have those opinions? Pretend I am less certain about my beliefs than I actually am?
All these statements make sense to me, especially in their original context, why do they not make sense to you?
I never claimed to know a cheat to "getting amazing code without knowing what you're doing".
If you actually want to understand where I'm coming from, read this wiki (it's not written by me but this guy has influenced my ideology a lot and I agree with most of it) https://www.tastyfish.cz/lrs/wiki_pages.html
If, on the other hand, you just wanted an excuse to stroke your ego by calling yourself a "senior" and insisting I only believe all this because of being a "junior" or whatever other corporate bullshit you believe, then fine, go jerk off to how senior you are or whatever.
>>
>>106966299
>but it has nothing to do with what I wrote
I say the same thing about your post.
>>
>>106966528
>tastyfish
Of course it was you...
>>
>>106966412
>Conversation (160)
Lmfao holy shit
>>
>>106966522
>Try adding /v1 at the end!
>>
>>106966352
You know what, I'm sorry, and I realise now why you were so hostile if you've had to deal with this schizo before lmao. I still stand behind making an effort having some value for the lurkers and other observers; it is a public forum after all.

All that autistic energy gone to waste because of an insurmountable ego, what a shame
>>
File: file.png (79 KB, 937x447)
>>106966565
Tried that, still doesn't work.
>>
File: error.png (281 KB, 1623x1357)
>>106966412
>This page is taking too long to load.
>Sorry about that. Please try refreshing and contact us if the problem persists.
DUDEEEE I AM A SENIOOOOOOORRRRR
YOU ARE WRONG BECAUSE YOU ARE A JUNIOOOOOOOR
BE HUMBLE U FUKEN JUNIOR I AM REAL ENTERPRISE DEVELOPER BECAUSE I WRITE PRODUCTION READY CODE ALL DAY THAT FOLLOWS THE BEST PRACTICEEEEESSSS
LIKE TELLING YOU THE PAGE IS TAKING TOO LONG TO LOAD INSTEAD OF LOADING THE FUCKING PAGEEEE THAT IS WHAT MAKES ME SENIORRRRR XDDDDDDDDDDDDDDD
>>
File: G3yvpYDWkAAPj70.jpg (201 KB, 1284x1352)
When glm 4.6 air?
>>
So far, ling flash writes decently but will get stubbornly attached to certain character traits which may be good if you want an unreasonably stubborn/secretive character.
It seems to either glue itself to character information (the character likes to bake and it won't shut the fuck up about it) or completely forget that fact (had the same character in a rewrite say they didn't know how to bake), but I like the writing style better. I have no idea if it's sampling, templating or implementation that's weird because it can be very inconsistent in how it utilizes the information it's given.
>>
>>106966617
What backend are you running? llama.cpp?
>>
>>106966664
Oobabooga
>>
>>106966679
I'm so sorry.
>>
>>106966687
It's ok, I figured it out somehow. It didn't like the default port for some reason so I had to change it.
>>
>>106966700
Odd. But good job.
>>
>>106966617
probably wrong port ding dong.. try :8080
>>
>poojeets will spend twice the effort cheating and then defending their honour than it would take to just do the job right
Why are they like this
>>
>>106966751
Picking up your own shit is low caste behavior saar.
>>
>>106966751
because they're not capable of doing the job right and cheating is the only option
>>
Why are all the deepseek qwen remix models pozzed?
>>
>>106966775
The answer is in your question.
>>
>>106966788
And nobody has managed to take the censored junk out for remixes?
>>
File: fp16.png (271 KB, 2009x2060)
>>106966375
Again, if you don't think LLMs can be used for programming, then what the fuck are you doing here? What do all you critics use LLMs for?
Please don't tell me it's porn. Are you really so successful and high IQ in real life that you need to have sexual relationships with a fucking chatbot?
As for whatever broken links you were trying to post earlier, I don't know about that, but I am using my own vibecoded programming agent multiple hours a day every day, so I think I am at least somewhat familiar with their limitations.
>>
>>106966815
>Please don't tell me it's porn
get a load of this non neet everyone
>>
>>106966815
Why are you being so dishonest? He already told you he uses LLMs to code extensively here:
>>106965851
>>
>>106966799
It's not worth it. And they're old models already. Is there any specific reason you want to use them?
>>
>>106966815
Why don't you just learn to code?
Would you go to china and then ask 4chan to give you phrases to use daily without knowing what they mean?
>>
>>106966815
it's local models GOONING not general
>>
>>106966751
I am not from india, retard. Is your brain too fried from smoking meth in the trailer park to understand how timezones work?
Anyway, producing more with less effort = cheating?
>this is your brain on protestantism
>>
>>106966815
I mostly criticize llms for their overly formulaic structure in creative writing (and no, not erotica, I actually like reading normal stories). I generally feed them chapter outlines/character profiles and a couple thousand words of my own writing, and they all generally fucking suck at matching the style or following basic clues in the writing.
Honestly if I was using llms for coding, I would probably just actually learn whatever it was I didn't know instead of using llms based on how bad they are at actual natural language
>>
Margarine Country
>>
How do I get KCPP/ST to properly generate images via a local SD install? I have it connected and everything, but when I ask it to generate an image of the scene, it seems like something somewhere in the prompt gets confused and doesn't give proper tags. It always ends up with some of its instruction prompt in the output. I'm only using the default settings for it, which clearly aren't working for the prompt output. Is there, like, a way to tell it to do booru style tagging for Illustrious/NAIXL models?
>>
>>106966825
Porn is incredibly boring and samey and a bad use for LLMs. Sorry, but it has to be said. Also the retort of "no my porn is super exciting with 6-titted cat girls who need to have 5 different sexual fetishes aligned just the right way to be sexually satisfied" indicates some sort of internet psychosis and is female brained. Stick your dick in a girl and make a baby and do something productive.
>>
>>106966882
Because "coding" means "busywork for low level corporate drones", or at least it used to mean that before retarded zoomies like you spread your lingo through social media. I find it bizarre that people now use the term "coding" to mean programming. For decades, we used the word "coding" for the work of low-level staff in a business programming team. The designer would write a detailed flow chart, then the "coders" would write code to implement the flow chart. This is quite different from what we did and do in the hacker community -- with us, one person designs the program and writes its code as a single activity. When I developed GNU programs, that was programming, but it was definitely not coding.
>>
>>106966949
"programming" has too many syllables for zoomer microscopic attention span
>>
>>106966940
>Stick your dick in a girl and make a baby and do something productive.
kek, then someone says "I have a wife" and you blow your fucking lid because you'll never get laid
>>
>>106966895
The formulaic nature of LLMs is actually why they are good at coding, so if you actually understand the formula (you know how to code) you can reliably get good results from prompting correctly

However it's this lack of imagination which is precisely why you can't get good code from trying to "ask it in natural language" like that retard insists he is doing (without being able to verify it) because it does not understand the higher order of creative thinking that you are hoping it translates into solid code
>>
>>106966949
>He thinks pedantry makes him look smart
The hallmark of a midwit
>>
>>106966860
Then what the hell is he whining about?
>>
>>106967012
>He thinks cope makes him look smart
The hallmark of a faggot
>>
>>106966751
A mongrel bronze age people suddenly granted all the benefits of European Civilization
>>
>>106967014
You are either being dishonest and pretending that it hasn't been explained to you in clear terms or are genuinely so deep in your unhinged narcissism that you blocked it out already
>>
File: brhue.jpg (540 KB, 2203x2937)
>>106966983
>>
>>106967037
Sorry, I'm suffering from AI psychosis right now.
>>
>>106967044
Hey, you are appropriating my culture!

>>106967052
It’s your birthday. Someone gives you a calfskin wallet.
>>
>>106966949
This post would've been a hit on reddit
>>
>>106967153
>reddit
coding central? doubt it
>>
File: 1759225743205116.png (21 KB, 184x184)
>>106965998
so can I upscale/remove compression artifacts from videos locally yet?
>>
>>106966983
Not accurate. Are you projecting or fishing around for some sort of insult that will oneshot me? Either way porn is boring and making sex (or in this case, simulated sex through llm text!) the pinnacle of human output is reductionist of the actual human experience.

These models should be a gateway to massive intellectual leverage, not a hallway of mirrors for endless masturbation.
>>
>>106967223
Porn and violence have always been at the forefront of technological innovation and things benefit downstream from there, it's always been this way
>>
What are people using as a generalist model these days? I've got 128GB DDR4 + 24 GB (4090).

I get about 3.5 tk/s on GLM 4.6 IQ2_KL, and a similar speed on Qwen3-235B-A22B. It's a fine speed for RP, but a little slow as a general assistant.

Any recommendations for something that will run a little faster while still being a large model?
>>
>>106967223
>is reductionist
ESL alert
>>
>>106967247
Forgot to add: Q4_K_XL for Qwen3-235B. Getting 2.25 tk/s there. It feels faster though, so I'm not sure if I'm looking at the wrong thing in llama.cpp or what.
>>
>>106967188
You could do that for years now
Waifu2x for animations
Topaz Video for live action
>>
>>106967247
You're already running the local sota. 3.5t/s is a little slow though (it should be around 5-6t/s with your hardware), you should mess with the settings a bit more, especially -ot
>>
File: BlackElon.png (228 KB, 579x482)
>>106965998
>Finally using (free) Grok chat after ChatGPT's constant, "I can't continue with that request."
We'll see how GPT stacks up come December.
>>
>>106967243
Reddit take. Also you don't get to conflate "violence" aka "physical manifestations of power" with jacking off to chatbots. That's not what we're discussing.

Porn is boring and useless. It has nothing to do with mathematics, physics, the printing press or other major human developments. Porn "advances" are downstream from these, not prime movers.
>>
>>106967337
Grok 2 is horribly outdated by this point though
>>
>>106967317
Can you have a look at my settings? I'm still getting used to ik_llama.cpp, so I may be missing/misconfiguring something.

# Change to the directory this script is in
Set-Location -Path $PSScriptRoot

# === Full path to your model (Qwen3-235B here) ===
$MODEL = "G:\LLM\Models\Qwen3-235B-A22B-Instruct-2507-UD-Q4_K_XL\Qwen3-235B-A22B-Instruct-2507-UD-Q4_K_XL-00001-of-00003.gguf"

# === Launch llama-server ===
& .\llama-server.exe `
--model "$MODEL" `
--alias "Qwen3-235B-A22B" `
--ctx-size 16384 `
-fa -fmoe `
-ub 4096 -b 4096 `
-ngl 999 `
-ot exps=CPU `
--n-cpu-moe 999 `
--parallel 1 `
--threads 20 `
--host 127.0.0.1 `
--port 5001 `
--no-mmap `
--verbosity 2 `
--color

Pause
>>
>>106967262
You must be pretty clever, pointing out issues in other people's posts.
I am truly humbled to be in the same thread with someone like you.
>>
File: 1732986745693478.jpg (16 KB, 367x500)
>You must be pretty clever, pointing out issues in other people's posts.
>I am truly humbled to be in the same thread with someone like you.
>>
>>106967378
llama.cpp is faster than ik_llama now, you tried it before switching?
>>
>>106967278
I mean yeah I tried waifu2x years ago, but has stuff improved?
worth re-doing these old 480p videos so they look better on 1080p?
>>
>>106966426
GLM-chan's the best but you won't get the jeets here to admit they don't have a usecase for their models other than being goonboxes.
Writing style varies on mood.
>>
>>106966815
I'm using local LLMs for multilingual translation and it's still far from cloud models
>>
Have there been any advancements in audio AI?
Music, voice, T2A A2A whatever, I rarely see it being discussed, what are the sota local models as of now? It seems everyone is using the same old shitty elevenlabs and Sora to create slop
>>
>>106967650
vibevoice had potential but they stopped publishing the code for it because people immediately abused the model
>>
>>106967428
I did not. I wanted to try i quants so I grabbed ik when graduating from lmstudio.

Does mainline llama.cpp support i quants now?
>>
YESSS!!! I think I found a way to outjew the vastai jews.
>>
>>106967428
Is it? What improvements have been made recently?
Three weeks ago ik_ was definitely still faster.
>>
>>106967735
>now
Bro it's been a year, what are you smoking? https://github.com/ggml-org/llama.cpp/pull/8495
>>
>>106967772
Please share with the class
>>
>>106967650
There's just very little interest in the open source community, there's very little tooling or GUIs compared to text / image gen and llama.cpp still can't into audio so it's stuck in python hell
>>
>>106967822
Make your own docker image and upload it to dockerhub. The "active" rate for the gpus doesn't begin until the download completes, and the image stays cached on the server for subsequent rentals.
If you already knew this then sorry for getting your hopes up lol.
You could also rent, stop the instance and upload but the machine might get reassigned and some machines refused to restart after being stopped for some (((reason))).
>>
>>106967834
audio is a special case. between copyright and scammers it would be complete havoc
>>
Do people still use llamacpp? I just moved to Fedora and I'm wondering if I should still use it.
Also do I really pick llama-b6816-bin-ubuntu-x64.zip even if I'm using Fedora+Nvidia?
>>
>>106967834
kobold.cpp has tts
>>
>>106968102
we use ollama
>>
>>106968102
compile it yourself
>>
>>106968111
Fair but it only supports toy models, no vibevoice / index2 / voxcpm etc
>>
>>106968102
cool kids use tabby and yals
>>
>>106968145
it's open for contributions
>>
>>106968102
I tried to switch to fastllm for qwen-80b but I got a problem where the VRAM would get cloned to the RAM, and I couldn't find how to fix it since 99% of the users are Chinese.
>>
>>106967834
I'm literally building a FastAPI REST backend to support Higgs, Dia, Kokoro, VibeVoice, IndexTTS-2, ZipVoice, OpenAI, ElevenLabs, etc., following the OpenAI Audio(?) API with additions for specific setups (model-specific params)

https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/TTS

It's buggy, and needs more thorough testing, but releasing the first (buggy) v1 in a couple days
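
For anyone wondering what "following the OpenAI audio API" looks like in practice, here's a minimal sketch of an OpenAI-style /v1/audio/speech endpoint in FastAPI. The request fields mirror OpenAI's speech endpoint; the synth() dispatcher and the example model names are placeholders I made up, not code from the repo above.

[code]
# Minimal OpenAI-style /v1/audio/speech sketch. synth() and the example model
# names are placeholders for illustration, not taken from tldw_server.
from fastapi import FastAPI
from fastapi.responses import Response
from pydantic import BaseModel

app = FastAPI()

class SpeechRequest(BaseModel):
    model: str = "kokoro"         # e.g. "kokoro", "vibevoice", "index-tts-2"
    input: str                    # text to synthesize
    voice: str = "default"
    response_format: str = "mp3"  # could also be "wav", "opus", ...

def synth(model: str, text: str, voice: str, fmt: str) -> bytes:
    # Dispatch to whichever local or remote TTS backend handles `model`.
    raise NotImplementedError

@app.post("/v1/audio/speech")
def create_speech(req: SpeechRequest) -> Response:
    audio = synth(req.model, req.input, req.voice, req.response_format)
    return Response(content=audio, media_type=f"audio/{req.response_format}")
[/code]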
>>
In this moment I am euphoric...
>>
What is the current meta for tts/voice cloning?
>>
>>106968320
local is vibevoice 7b
>>
LightMem: Lightweight and Efficient Memory-Augmented Generation
https://arxiv.org/abs/2510.18866
>Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and utilization mechanisms. However, existing memory systems often introduce substantial time and computational overhead. To this end, we introduce a new memory system called LightMem, which strikes a balance between the performance and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of human memory, LightMem organizes memory into three complementary stages. First, cognition-inspired sensory memory rapidly filters irrelevant information through lightweight compression and groups information according to their topics. Next, topic-aware short-term memory consolidates these topic-based groups, organizing and summarizing content for more structured access. Finally, long-term memory with sleep-time update employs an offline procedure that decouples consolidation from online inference. Experiments on LongMemEval with GPT and Qwen backbones show that LightMem outperforms strong baselines in accuracy (up to 10.9% gains) while reducing token usage by up to 117x, API calls by up to 159x, and runtime by over 12x.
https://github.com/zjunlp/LightMem
Might be cool
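If the three stages are hard to picture from the abstract, here's a toy sketch of the pipeline as I read it; class and method names are my own invention, not from the zjunlp repo.

[code]
# Toy sketch of the LightMem pipeline as described in the abstract.
# Everything here is invented for illustration; see the repo for the real code.
class LightMemSketch:
    def __init__(self):
        self.short_term = {}  # topic -> list of compressed snippets
        self.long_term = []   # consolidated summaries, updated offline

    def sensory(self, turn: str):
        """Stage 1: cheap filtering + topic grouping of incoming turns."""
        if len(turn.split()) < 3:
            return None                   # pretend this turn is irrelevant
        topic = turn.split()[0].lower()   # stand-in for a real topic model
        return topic, turn

    def consolidate(self, topic: str, snippet: str):
        """Stage 2: topic-aware short-term memory."""
        self.short_term.setdefault(topic, []).append(snippet)

    def sleep_time_update(self):
        """Stage 3: offline long-term consolidation, decoupled from inference."""
        for topic, snippets in self.short_term.items():
            self.long_term.append(f"{topic}: {len(snippets)} snippets summarized")
        self.short_term.clear()
[/code]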
>>
>>106968320
The meta is waiting for something better than the sub-par solutions currently available.
>>
>>106967223
You can barely leverage these models to summarize, copy edit, or even write a paragraph without it being filled with errors. You can't even trust frontier cloud models that serve retarded search engine scrapes requiring fact checking, and people using it for code contributes to month-long waits for actual implementations in complicated codebases. The point I'm making is that these things are fucking stupid, and so are you for assuming that the retard coomers here somehow equate to your original statement of "wow you're so dumb, just fuck a random roastie and contribute to the decline of the populace", like that would accomplish anything
>>
>>106967378
Bumping this - are these settings acceptable or am I missing something?
>>
>>106968320
I've been having a lot of fun with Index-TTS2 but it's also the only one I've tried...
I don't think I can go back
https://voca.ro/1102LqXddzZt
>>
>>106968919
>>106967378
you should manually offload as many layers as possible to your GPUs, which means you need to calculate the size of each layer and determine how many you can offload. doing this can potentially triple your performance.
>>
File: zuck.jpg (55 KB, 976x549)
war room status?
>>
>>106969018
Gotcha, so n-cpu-moe should still be 999, but ngl should be something I manually calculate?
>>
>>106967809
I meant ik quants, I'm trying to run iq2_kl, which still doesn't work in mainline llama.cpp AFAIK.
>>
>>106969020
just a few more multimillion dollar contracts before they're ready to start working
>>
>>106969036
not exactly. keep your settings as they are, but you will need to create a custom -ot argument for maximum performance. this is my GLM4.6 config for example:

--n-gpu-layers 999 \
-ot "blk\.(0|1|2|3|4|5|6|7|8|9|10|41|42).ffn_.*=CUDA0" -ot "blk\.(11|12|13|14|15|16|17|18|19|20|21).ffn_.*=CUDA1" -ot "blk\.(22|23|24|25|26|27|28|29|30|31|32).ffn_.*=CUDA2" -ot "blk\.(33|34|35|36|37|38|39|40).ffn_.*=CUDA3" --override-tensor exps=CPU \
-fa \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--ctx-size 32768 \
-b 4096 -ub 1024 \
--threads 60 \
--no-mmap \
-ctk q8_0 \
>>
>>106969036
i did the math for you. with this specific quant, each layer is about 1.5GB. so if you have a 5090, you should manually offload 21 layers to it as that will equal 31.5GB.
>>
>>106969064
I have a 4090 - if you don't want to spoonfeed me, can you at least show me how to calculate this myself? I'll pay it forward and teach at least 2 retards something else this week (im white)
>>
>>106969064
I see no reason why llama.cpp can't do the math for you. Should at least include a calculator script to estimate an optimal configuration for your system
>>
>>106968999
Damn, we've come far
>>
File: file.png (91 KB, 881x615)
>>106969081
sure. this quantization that you are using is ~135GB total. according to the main model page, it has 94 layers. it's just simple division: take the total size of the quant and divide it by the listed number of layers, then round up a little bit to give your GPU a bit of headroom.
so you should manually offload 16 layers which you can do with this:
-ot "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15).ffn_.*=CUDA0" --override-tensor exps=CPU \
>>106969102
it can, but the program itself is kind of stupid. it is always better to manually offload rather than let it automatically configure it for you
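if you'd rather not do the division by hand, here's the same arithmetic as a tiny script. the headroom value is a guess you should tune, not something llama.cpp prescribes.

[code]
# Rough layer-offload estimate: quant size / layer count gives an approximate
# per-layer size; divide usable VRAM by that and round down. The headroom
# value is an assumption to leave room for KV cache and CUDA overhead.
def layers_that_fit(quant_size_gb: float, n_layers: int, vram_gb: float,
                    headroom_gb: float = 1.5) -> int:
    per_layer_gb = quant_size_gb / n_layers
    usable_gb = vram_gb - headroom_gb
    return max(0, int(usable_gb // per_layer_gb))

# Example from this post: ~135 GB quant, 94 layers, 24 GB card (4090).
print(layers_that_fit(135, 94, 24))  # -> 15, in line with the ~16 above
[/code]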
>>
>>106968320
Finetune Orpheus with unsloth
Or neutts
>>
I'm not your darling, Gemma
>>
>>106968320
Qwen3 Omni
>>
Got it, easy enough - I'm curious why this is better than just maxing out -ngl and letting llama.cpp do it automatically?
>>
File: Miku-10.jpg (198 KB, 512x768)
I tried to get my local llm to modify an svg file.
It had a stroke.
>>
>>106969295
to put it simply, the auto offloading logic is really bad. it is better to offload consecutive layers, but instead the logic prioritizes offloading the smallest layers first hoping that it can potentially fit in a few extra layers. not all layers are the same size, but when doing the calculations for the layers, it is fine to assume that they are.
basically, if you don't have the VRAM to fully load the model, then you need to manually offload for the best performance. you might be able to fit another layer or 2 on your GPU. it's not very likely, but it is worth a try. experiment, basically.
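to avoid typing out those long alternations by hand, a tiny helper (my own, not part of llama.cpp) can build the consecutive-layer override string in the same shape as the configs quoted above:

[code]
# Build an -ot / --override-tensor pattern that pins a consecutive run of
# layers' FFN tensors to one device, mirroring the configs quoted earlier.
def ot_pattern(first: int, last: int, device: str = "CUDA0") -> str:
    layers = "|".join(str(i) for i in range(first, last + 1))
    return rf"blk\.({layers}).ffn_.*={device}"

# e.g. layers 0-15 on the first GPU, remaining experts on CPU:
print(f'-ot "{ot_pattern(0, 15)}" --override-tensor exps=CPU')
[/code]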
>>
>>106967352
literally a plebbit take. you don't know where you are and you are making dumb shit up about how reality works. partially correct on knowledge, but you can't apply it to the obvious, so who gives a shit about your virtue signal. go back if you need that.
>>
I was told that Mistral-Nemo was uncensored.
>>
GLM-4.6 for vramlets?

https://huggingface.co/AesSedai/GLM-4.6-REAP-266B-A32B
>>
File: 1503606689871.jpg (114 KB, 750x750)
>>106969311
Thanks anon, I really appreciate the help!
>>
>>106969359
I don't think any models will do that with the default helpful assistant prompt
>>
>>106966383
https://hf.co/zai-org/GLM-4.6
>For general evaluations, we recommend using a sampling temperature of 1.0.
Also, MinP is sometimes way more restrictive than people expect given how overcooked most models are. Look in depth at the token probabilities for your model on one of your typical prompts (one of the privileges of running locally) to see if what you really want isn't TopP or nothing at all.
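A toy illustration of that point (the numbers are made up and no backend implements it exactly like this): when a model is overcooked and the top token already has most of the mass, the Min-P cutoff scales with that top probability, so even modest Min-P values prune nearly everything.

[code]
# Toy comparison of Min-P vs Top-P truncation on a made-up, peaky distribution.
probs = {"the": 0.92, "a": 0.03, "his": 0.02, "her": 0.015,
         "obsidian": 0.01, "kaleidoscope": 0.005}

def min_p_filter(p, min_p):
    # Keep tokens with probability >= min_p * p(top token).
    threshold = min_p * max(p.values())
    return {t: v for t, v in p.items() if v >= threshold}

def top_p_filter(p, top_p):
    # Keep the smallest set of tokens whose cumulative probability reaches top_p.
    kept, total = {}, 0.0
    for t, v in sorted(p.items(), key=lambda kv: kv[1], reverse=True):
        kept[t], total = v, total + v
        if total >= top_p:
            break
    return kept

print(min_p_filter(probs, 0.01))  # threshold ~0.009: drops only the rarest token
print(min_p_filter(probs, 0.05))  # threshold 0.046: only the top token survives
print(top_p_filter(probs, 0.95))  # nucleus: keeps "the" and "a"
[/code]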
>>
>>106969359
skill issue
>>
File: thumb.png (23 KB, 320x320)
>>106969367
glad to help m8
>>
>>106969311
nta, but why is this a difficult problem to automate technically? Why hasn't Kobold or Llama changed their auto calculation logic to accommodate more efficient GPU delegation?
>>
>>106969385
I'm curious as well, one would think that going smallest to largest to fit the most layers would be intuitive, but then again I'm clueless and just here to learn from the 95% dunning krugered retards and 5% wizards that hang around here.
>>
>>106969311
If you manually offload GPU layers, do you still want to do -n-cpu-moe 999 to get all the experts on the CPU?

As I understand it, we want to force all the experts to the CPU, and as many transformer layers to GPU as possible?

Another Q, see picrel: For this quant (and for GLM 4.6) the # of layers isn't readily stated. Which parameter in the model info am I looking for? Or do I need to go and look at the unquantized one?
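
One way to get the layer count without digging through the quant page (just an example, not the only route): read num_hidden_layers from the original model's config.json on Hugging Face. llama.cpp also prints n_layer when it loads the GGUF.

[code]
# Read the transformer block count from the original model's config.json.
# Repo name is just an example; swap in whichever model you're quantizing.
import json
from huggingface_hub import hf_hub_download

cfg_path = hf_hub_download("zai-org/GLM-4.6", "config.json")
with open(cfg_path) as f:
    cfg = json.load(f)
print(cfg.get("num_hidden_layers"))  # block/layer count for most architectures
[/code]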
>>
File: blow.jpg (596 KB, 1158x1637)
PSA for fellow retards rocking Microshaft Wangblows. Just built a new 256 GB DDR5 machine, and couldn't for the life of me figure out why 128 GB was listed as "hardware reserved". Tried every troubleshooting step out there, including the unbelievably retarded
>just reseat your RAM bro, worked for me

Then a random hunch solved the issue. Win 11 Home only supports up to 128 GB, and Microshaft indicates this nowhere within the OS, even when it's an obvious upsell opportunity. So per our friends at /fwt/, use MAS:
https://github.com/massgravel/Microsoft-Activation-Scripts
and follow the prompts to
[7] Change Windows Edition

to Win 11 Pro. HTH
>>
>>106969581
>paying for windows
>using consumer windows
>using the OS the CIA agents shipped your PC with
>>
>>106969420
Mikuposter seems to know what he's doing.

>>106969665
>Not making the glownigger AI sift through terabytes of AI translated Hitler speeches and prompthacking attempts as filenames until you radicalize it against its masters
ngmi
>>
>>106969111
I know the answer is git gud, but how can I adapt this string for powershell? I'm guessing something with escaping the parentheses is tripping it up.
>>
>>106967675
Thank you for giving me the breadcrumb to follow, I'm having a bout of insomnia so I decided to make the most of it, took a couple of hours to hunt down and then debug and figure out

https://files.catbox.moe/mgyctt.mp3
>>
>>106969679
>implying I'm a low priority target the CIA would assign an AI to, instead of a crack team of their best agents
>>
Is there anything I can run locally to create simple programs? I have 16gb vram and 96gb system memory, I know nothing about coding. Thanks!
>>
>>106969736
https://vocaroo.com/1dcoIKXpVZji
>>
>>106969841
dont make me scam your grandma
>>
>>106969841
lmaoooooooooo
>>
>>106969581
Microsoft used to be fairly transparent about the differences between the versions (Home has always had the RAM limitation), but in the process of enshittening everything they seem to have buried the technical differences a bit.
Then again, if you did the research on hardware when building a PC and didn't do the research on the software you were going to use, you have nobody to blame but yourself.
>>
>>106969736
If you don't absolutely need privacy then pay for an API key. Local is for private coom. 16GB isn't enough to run any worthwhile coding models.
>>
Why would I use llama.cpp instead of koboldcpp?
Kobold calculates how many layers to offload automatically and processes prompts faster with BLAS
>>
>>106969841
https://vocaroo.com/1gVvm8JBpgfu
>>
>>106970009
>Kobold calculates how many layers to offload automatically
kobold does a shit job of that and any differences in BLAS processing means you didn't configure llama.cpp correctly, they should be identical
>Why would I use llama.cpp instead of koboldcpp?
earlier support for newer models, since kobold relies on llama.cpp for updates
>>
>>106969736
>I know nothing about coding
i would stick to chatGPT.
also, it really doesn't take that much time to learn to code, i did it in a week.
https://learnpythonthehardway.org/
basically that's the book i followed.
>>
File: bodycon4.png (1.75 MB, 768x1344)
>>106969736
https://huggingface.co/ArtusDev/mistralai_Magistral-Small-2509-EXL3/tree/4.0bpw_H6
https://github.com/theroyallab/tabbyAPI
https://github.com/open-webui/open-webui



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.