/g/ - Technology

File: MTP.png (790 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

MTP Edition

Previous threads: >>106954792 & >>106940821

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) BailingMoeV2 support merged into llama.cpp (#16063): https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106954792

--Paper: LLMs Can Get "Brain Rot"!:
>106955849 >106955872 >106955875 >106955897 >106955944 >106956409 >106956499 >106956522
--Paper: Glyph: Scaling Context Windows via Visual-Text Compression:
>106961154 >106961171 >106961190 >106961197 >106961241 >106961247 >106961262 >106961207 >106961229 >106961300 >106961340
--Papers:
>106958278 >106958328
--Model performance comparison in tool usage scenarios:
>106958025 >106958063 >106958070 >106958085 >106958130
--Finetuning challenges and architecture trade-offs in Axolotl:
>106958095
--Sourcing movie scripts for LLM training:
>106960250 >106960295 >106960642 >106960760
--Implications of the US banning Nvidia AI chip sales to China:
>106956310 >106956345 >106956404 >106956422 >106956563 >106956485 >106956472 >106956761 >106956944 >106956988 >106957238 >106957271 >106957415 >106957440 >106956458 >106959104 >106959256 >106959278 >106959323 >106960322 >106960356 >106960420 >106959745 >106959789 >106960041 >106960029
--OCR advancements enabling historical document preservation:
>106962575 >106962702 >106962770 >106962787
--Integrating Claude Code with local models:
>106963221 >106963263 >106963467 >106963534 >106963571 >106963640 >106964268 >106964741 >106965179 >106963281 >106963427 >106965845
--Qwen3 32B VL multimodal trade-offs:
>106963854 >106963968 >106963908 >106963938 >106963998 >106964105 >106964126 >106964173 >106964198 >106964223 >106964068 >106964079 >106964169
--Feasibility of local coding models with current hardware:
>106964576 >106964691 >106964745 >106964931 >106964825 >106964831 >106964842 >106964914 >106964922 >106964990 >106965016 >106965029 >106965041 >106964894 >106964984 >106964918
--Logs: Qwen3-VL-32B:
>106965471 >106965523
--Miku (free space):
>106954989 >106955109 >106955790 >106955892 >106958973 >106960587 >106961156

►Recent Highlight Posts from the Previous Thread: >>106954801

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
https://videocardz.com/newz/nvidia-quietly-launches-rtx-pro-5000-blackwell-workstation-card-with-72gb-of-memory
>The current 48GB version is listed at around $4,250 to $4,600, so the 72GB model could be priced close to $5,000. For reference, the flagship RTX PRO 6000 costs over $8,300.
>>
>>106966085
got dammit. i just bought a 5090 a couple months ago
>>
File: 1752394454972308.png (35 KB, 847x323)
GLM 4.6 writes like shit. 2024 low param count models like command-r or even nemo don't grasp details anywhere near as well as these benchmark whore models, but out of the box their output sounds natural and can't immediately be pinned as AI barring stupidity. Whereas I can swipe a new model like this 20 times and every thing that comes out will make me roll my eyes, in spite of the fact it clearly understands the scenario much better. It's just deep fried with insane quantities of predictable flowery prose that no amount of tokens in the sys prompt can fix, short of asking it to ditch descriptions entirely and write exclusively in beige prose. Which isn't the fucking point, the sweet spot is a middle ground that pre-synthetic lobotomy models from last year had no problem achieving by default (besides "rp" tune/merges which are a lobotomy of their own).
I'm starting to believe cloudfags have been eating shit this entire time.
>>
>>106966151
I can't run GLM 4.6 so I have no idea. Can you show some comparisons? Doesn't have to be some side by side or whatever, just a couple of pastebin links.
>>
>>106966114
Look around dude. Do you think the world (and specifically the software world) is great? Maybe some of those juniors were right, and you were too brainwashed by the corpo world to see the reality that what you were doing was a net positive for your bosses' bank account and a net negative for humanity.
Tomorrow I will go to work and watch my boss offer some disgustingly bloated, overpriced and slow as balls software product made by a publicly traded company to a client when they could solve their issues with a couple of Python scripts, just because he perceives it as cheaper to develop for and he gets a cut from being a reseller.
It's either that or frying burgers, but I'm not delusional enough to convince myself that I'm doing a public good by being involved with that stuff.
>>
>>106966193
>>106966193

b-b-b-but i must be doing it right, look at my bank account! look at the kudos and accolades I've gotten! look! loooook!
>>
>>106966174
It has an annoying tendency to quote things you said back at you like a parrot, and compared to Air is much more infuriating on the isms: conspiratorial whispers, laugh spam, useless padding like a character smirking, it's not x but y, mixture of x y and z, leaving nothing to the imagination, etc.
I didn't try 4.5 and at this point I wouldn't even be surprised if it's less fucked.
>>
Arther armlet Selig
>>
>>106966013
Anon, you need to humble yourself a little bit and admit that you are 100% larping about knowing what the fuck you're doing here. If you spent as much mental energy in actually learning how to code as you are spending in making bullshit statements like these:

>Yes, it's likely to be buggy but all non formally verified code is
>That's why you do testing until you are certain the defect % is low
>(it doesn't necessarily have to interact with the other components through the network, it can be a simple stateless file with a set of functions or stateful with each function having a set of pre and post conditions, or interact through shared memory, pipes etc.)
>If you need 100% reliability you can have the AI write code and write a proof that the code meets your spec.
>asking the LLM to go through that process and letting it become aware of the errors is likely to help it make more reliable software.

I could actually keep going but the majority of the post was literally nonsensical, as any sort of developer with any kind of experience would be able to tell, the only person you are fooling is yourself because it's a waste of time, you will get to actually being able to code and use AI to code 100x faster if you stop larping as some kind of visionary genius who cracked the matrix and now knows the cheat to getting amazing code and end products without ever having to do the hard part of knowing what the fuck you are doing

Before you double down on defending your ego just read
>>106966114
To see where I'm coming from, this final (you) from me isn't an attack it's an attempt at guidance, past this point you can do whatever but as someone who has seen and done this all before trust me when I tell you that you're wasting your time when doing it properly would actually be less effort and take less time.
>>
>>106966193
I have no idea what you're trying to project onto me but it has nothing to do with what I wrote
>>
>>106966281
>writing paragraphs to the brown instead of just posting an example of a completely fucked over repo like BUN, destroyed by its AUTONOMOUS coder and AUTONOMOUS reviewer with the man in the loop serving to tard wrangle the reviewer
can you realize he doesnt have the IQ capacity to understand why its not possible?
>>
>>106966151
Post settings so that I can laugh at you
>>
>>106966352
>Even if brownanon is too stupid to get it, now multiply that by all the passive observers who also don't know what they're doing, they see someone confidently put forward their retarded idea, and then they see you screeching "fuck you brownnn fucking pajeett apoopoopoo", who will they more likely listen to, creating more bullshit that we all have to deal with?

What good will linking shitheap vibe coded repos do if the only people who would gain anything from reading any of this wouldn't understand any of it anyway, or why it's bad?
>>
>>106966354
>Write {{char}}'s reply, adhering to the current format.
>Temp 0.85 Min P 0.01 Top K 0, all other neutralized
>>
>>106966375
they can see it's akin to fighting against windmills, even if they're not able to read code, they would at least be able to understand that even top of the line agents are FUCKING garbage with all the back and forth, multiple errors, hallucination madness that happens on the regular... for example for implementing a fucking simple TAR/UNTAR:
https://github.com/oven-sh/bun/pull/23373
>>
What's (you)r local model sexo tierlist assuming all are well prompted?
What type of prose or writing style do you prefer?
>>
File: file.png (141 KB, 803x1187)
>>106965000
I dont have most of those options available in my samplers. Is this a special version of Sillytavern?
>>
>>106966454
He's using the chat completion API option.
>>
>>106966151
You are absolutely right!
>>
File: file.png (102 KB, 947x857)
>>106966464
Oh. That's different. So how do I connect a local model to my sillytavern using this? Because it doesn't seem to want to connect.
>>
>>106966281
I never claimed to know what the fuck I'm doing, whatever that means. All my posts are obviously my opinion. I think they are the truth to various degrees of confidence, what do you want me to do? Pretend I don't have those opinions? Pretend I am less certain about my beliefs than I actually am?
All these statements make sense to me, especially in their original context, why do they not make sense to you?
I never claimed to know a cheat to "getting amazing code without knowing what you're doing".
If you actually want to understand where I'm coming from, read this wiki (it's not written by me but this guy has influenced my ideology a lot and I agree with most of it) https://www.tastyfish.cz/lrs/wiki_pages.html
If, on the other hand, you just wanted an excuse to stroke your ego by calling yourself a "senior" and insisting I only believe all this because of being a "junior" or whatever other corporate bullshit you believe, then fine, go jerk off to how senior you are or whatever.
>>
>>106966299
>but it has nothing to do with what I wrote
I say the same thing about your post.
>>
>>106966528
>tastyfish
Of course it was you...
>>
>>106966412
>Conversation (160)
Lmfao holy shit
>>
>>106966522
>Try adding /v1 at the end!
>>
>>106966352
You know what I'm sorry and I realise now why you were so hostile if you've had to deal with this schizo before lmao, I still stand behind making an effort having some value for the lurkers and other observers, it is a public forum afterall

All that autistic energy gone to waste because of an insurmountable ego, what a shame
>>
File: file.png (79 KB, 937x447)
>>106966565
Tried that, still doesn't work.
>>
File: error.png (281 KB, 1623x1357)
>>106966412
>This page is taking too long to load.
>Sorry about that. Please try refreshing and contact us if the problem persists.
DUDEEEE I AM A SENIOOOOOOORRRRR
YOU ARE WRONG BECAUSE YOU ARE A JUNIOOOOOOOR
BE HUMBLE U FUKEN JUNIOR I AM REAL ENTERPRISE DEVELOPER BECAUSE I WRITE PRODUCTION READY CODE ALL DAY THAT FOLLOWS THE BEST PRACTICEEEEESSSS
LIKE TELLING YOU THE PAGE IS TAKING TOO LONG TO LOAD INSTEAD OF LOADING THE FUCKING PAGEEEE THAT IS WHAT MAKES ME SENIORRRRR XDDDDDDDDDDDDDDD
>>
File: G3yvpYDWkAAPj70.jpg (201 KB, 1284x1352)
When glm 4.6 air?
>>
So far, ling flash writes decently but will get stubbornly attached to certain character traits which may be good if you want an unreasonably stubborn/secretive character.
It seems to either glue itself to character information (character likes to bake and it wont shut the fuck up about it) or completely forgets that fact (had the same character in a rewrite say they didn't know how to bake) but the writing style I like better. I have no idea if it's sampling, templating or implementation that's weird because it can be very inconsistent on how it utilizes the information it's given.
>>
>>106966617
What backend are you running? llama.cpp?
>>
>>106966664
Oobabooga
>>
>>106966679
I'm so sorry.
>>
>>106966687
It's ok, I figured it out somehow. It didn't like the default port for some reason so I had to change it.
>>
>>106966700
Odd. But good job.
>>
>>106966617
probably wrong port ding dong.. try :8080
>>
>poojeets will spend twice the effort cheating and then defending their honour than it would take to just do the job right
Why are they like this
>>
>>106966751
Picking up your own shit is low caste behavior saar.
>>
>>106966751
because they're not capable of doing the job right and cheating is the only option
>>
Why are all the deepseek qwen remix models pozzed?
>>
>>106966775
The answer is in your question.
>>
>>106966788
And nobody has managed to take the censored junk out for remixes?
>>
File: fp16.png (271 KB, 2009x2060)
>>106966375
Again, if you don't think LLMs can be used for programming, then what the fuck are you doing here? What do all you critics use LLMs for?
Please don't tell me it's porn. Are you really so successful and high IQ in real life that you need to have sexual relationships with a fucking chatbot?
As for whatever broken links you were trying to post earlier, I don't know about that, but I am using my own vibecoded programming agent multiple hours a day every day, so I think I am at least somewhat familiar with their limitations.
>>
>>106966815
>Please don't tell me it's porn
get a load of this non neet everyone
>>
>>106966815
Why are you being so dishonest? He already told you he uses LLMs to code extensively here
>>106965851
>>
>>106966799
It's not worth it. And they're old models already. Is there any specific reason you want to use them?
>>
>>106966815
Why don't you just learn to code?
Would you go to china and then ask 4chan to give you phrases to use daily without knowing what they mean?
>>
>>106966815
it's local models GOONING not general
>>
>>106966751
I am not from india, retard. Is your brain too fried from smoking meth in the trailer park to understand how timezones work?
Anyway, producing more with less effort = cheating?
>this is your brain on protestantism
>>
>>106966815
I mostly criticize llms for their overly formulaic structure in creative writing (and no, not erotica, I actually like reading normal stories) and generally feed them chapter outlines/character profiles and a couple thousand words of me writing it myself and they all generally fucking suck at matching the style or following basic clues in writing
Honestly if I was using llms for coding, I would probably just actually learn whatever it was I didn't know instead of using llms based on how bad they are at actual natural language
>>
Margarine Country
>>
How do I get KCPP/ST to properly generate images via a local SD install? I have it connected and everything, but when I ask it to generate an image of the scene, it seems like something somewhere in the prompt gets confused and doesn't give proper tags. It always ends up with some of its instruction prompt in the output. I'm only using the default settings for it, which clearly aren't working for the prompt output. Is there like, a way to tell it to do booru style tagging for Illustrious/NAIXL models?
>>
>>106966825
Porn is incredibly boring and samey and a bad use for LLMs. Sorry, but it has to be said. Also the retort of "no my porn is super exciting with 6-titted cat girls who need to have 5 different sexual fetishes aligned just the right way to be sexually satisfied" indicates some sort of internet psychosis and is female brained. Stick your dick in a girl and make a baby and do something productive.
>>
>>106966882
Because "coding" means "busywork for low level corporate drones", or at least it used to mean that before retarded zoomies like you spread your lingo through social media. I find it bizarre that people now use the term "coding" to mean programming. For decades, we used the word "coding" for the work of low-level staff in a business programming team. The designer would write a detailed flow chart, then the "coders" would write code to implement the flow chart. This is quite different from what we did and do in the hacker community -- with us, one person designs the program and writes its code as a single activity. When I developed GNU programs, that was programming, but it was definitely not coding.
>>
>>106966949
"programming" has too many syllables for zoomer microscopic attention span
>>
>>106966940
>Stick your dick in a girl and make a baby and do something productive.
kek, then someone says "I have a wife" and you blow your fucking lid because you'll never get laid
>>
>>106966895
The formulaic nature of LLMs is actually why they are good at coding, so if you actually understand the formula (you know how to code) you can reliably get good results from prompting correctly

However it's this lack of imagination which is precisely why you can't get good code from trying to "ask it in natural language" like that retard insists he is doing (without being able to verify it) because it does not understand the higher order of creative thinking that you are hoping it translates into solid code
>>
>>106966949
>He thinks pedantry makes him look smart
The hallmark of a midwit
>>
>>106966860
Then what the hell is he whining about?
>>
>>106967012
>He thinks cope makes him look smart
The hallmark of a faggot
>>
>>106966751
A mongrel bronze age people suddenly granted all the benefits of European Civilization
>>
>>106967014
You are either being dishonest and pretending that it hasn't been explained to you in clear terms or are genuinely so deep in your unhinged narcissism that you blocked it out already
>>
File: brhue.jpg (540 KB, 2203x2937)
>>106966983
>>
>>106967037
Sorry, I'm suffering from AI psychosis right now.
>>
>>106967044
Hey, you are appropriating my culture!

>>106967052
It’s your birthday. Someone gives you a calfskin wallet.
>>
>>106966949
This post would've been a hit on reddit
>>
>>106967153
>reddit
coding central? doubt it
>>
File: 1759225743205116.png (21 KB, 184x184)
>>106965998
so can I upscale/remove compression artifacts from videos locally yet?
>>
>>106966983
Not accurate. Are you projecting or fishing around for some sort of insult that will oneshot me? Either way porn is boring and making sex (or in this case, simulated sex through llm text!) the pinnacle of human output is reductionist of the actual human experience.

These models should be a gateway to massive intellectual leverage, not a hallway of mirrors for endless masturbation.
>>
>>106967223
Porn and violence have always been at the forefront of technological innovation and things benefit downstream from there, it's always been this way
>>
What are people using as a generalist model these days? I've got 128GB DDR4 + 24 GB (4090).

I get about 3.5 tk/s on GLM 4.6 IQ2_KL, and a similar speed on Qwen3-235B-A22B. It's a fine speed for RP, but a little slow as a general assistant.

Any recommendations for something that will run a little faster while still being a large model?
>>
>>106967223
>is reductionist
ESL alert
>>
>>106967247
Forgot to add: Q4_K_XL for Qwen3-235B. Getting 2.25 tk/s there. It feels faster though, so I'm not sure if I'm looking at the wrong thing in llama.cpp or what.
>>
>>106967188
You could do that for years now
Waifu2x for animations
Topaz Video for live action
>>
>>106967247
You're already running the local sota. 3.5t/s is a little slow though (it should be around 5-6t/s with your hardware), you should mess with the settings a bit more, especially -ot
>>
File: BlackElon.png (228 KB, 579x482)
>>106965998
>Finally using (free) Grok chat after ChatGPT's constant, "I can't continue with that request."
We'll see how GPT stacks up come December.
>>
>>106967243
Reddit take. Also you don't get to conflate "violence" aka "physical manifestations of power" with jacking off to chatbots. That's not what we're discussing.

Porn is boring and useless. It has nothing to do with mathematics, physics, the printing press or other major human developments. Porn "advances" are downstream from these, not prime movers.
>>
>>106967337
Grok 2 is horribly outdated by this point though
>>
>>106967317
Can you have a look at my settings? I'm still getting used to ik_llama.cpp, so I may be missing/misconfiguring something.

# Change to the directory this script is in
Set-Location -Path $PSScriptRoot

# === Full path to your model (Qwen3-235B-A22B here) ===
$MODEL = "G:\LLM\Models\Qwen3-235B-A22B-Instruct-2507-UD-Q4_K_XL\Qwen3-235B-A22B-Instruct-2507-UD-Q4_K_XL-00001-of-00003.gguf"

# === Launch llama-server with the offload settings below ===
& .\llama-server.exe `
--model "$MODEL" `
--alias "Qwen3-235B-A22B" `
--ctx-size 16384 `
-fa -fmoe `
-ub 4096 -b 4096 `
-ngl 999 `
-ot exps=CPU `
--n-cpu-moe 999 `
--parallel 1 `
--threads 20 `
--host 127.0.0.1 `
--port 5001 `
--no-mmap `
--verbosity 2 `
--color

Pause
>>
>>106967262
You must be pretty clever, pointing out issues in other people's posts.
I am truly humbled to be in the same thread with someone like you.
>>
File: 1732986745693478.jpg (16 KB, 367x500)
>You must be pretty clever, pointing out issues in other people's posts.
>I am truly humbled to be in the same thread with someone like you.
>>
>>106967378
llama.cpp is faster than ik_llama now, you tried it before switching?
>>
>>106967278
I mean yeah I tried waifu2x years ago, but has stuff improved?
worth re-doing these old 480p videos so they look better on 1080p?
>>
>>106966426
GLM-chan's the best but you won't get the jeets here to admit they don't have a usecase for their models other than being goonboxes.
Writing style varies on mood.
>>
>>106966815
I'm using local LLMs for multilingual translation and it's still far from cloud models
>>
Have there been any advancements in audio AI?
Music, voice, T2A A2A whatever, I rarely see it being discussed, what are the sota local models as of now? It seems everyone is using the same old shitty elevenlabs and Sora to create slop
>>
>>106967650
vibevoice had potential but they stopped publishing the code for it because people immediately abused the model
>>
>>106967428
I did not. I wanted to try i quants so I grabbed ik when graduating from lmstudio.

Does mainline llama.cpp support i quants now?
>>
YESSS!!! I think I found a way to outjew the vastai jews.
>>
>>106967428
Is it? What improvements have been made recently?
Three weeks ago ik_ was definitely still faster.
>>
>>106967735
>now
Bro it's been a year, what are you smoking? https://github.com/ggml-org/llama.cpp/pull/8495
>>
>>106967772
Please share with the class
>>
>>106967650
There's just very little interest in the open source community, there's very little tooling or GUIs compared to text / image gen and llama.cpp still can't into audio so it's stuck in python hell
>>
>>106967822
Make your own docker image and upload it to dockerhub. The "active" rate for the gpus doesn't begin until the download completes, and the image stays cached on the server for subsequent rentals.
If you already knew this then sorry for getting your hopes up lol.
You could also rent, stop the instance and upload but the machine might get reassigned and some machines refused to restart after being stopped for some (((reason))).
>>
>>106967834
audio is a special case. between copyright and scammers it would be complete havoc
>>
Do people still use llamacpp? I just moved to Fedora and I'm wondering if I should still use it.
Also do I really pick llama-b6816-bin-ubuntu-x64.zip even if I'm using Fedora+Nvidia?
>>
>>106967834
kobold.cpp has tts
>>
>>106968102
we use ollama
>>
>>106968102
compile it yourself
>>
>>106968111
Fair but it only supports toy models, no vibevoice / index2 / voxcpm etc
>>
>>106968102
cool kids use tabby and yals
>>
>>106968145
it's open for contributions
>>
>>106968102
I tried to change to fastllm for qwen-80b but I got a problem where the vram would get cloned to the ram and I couldn't find how to fix it since 99% of the users are chinese.
>>
>>106967834
I'm literally building a FastAPI REST backend to support Higgs, Dia, Kokoro, VibeVoice, IndexTTS-2, ZipVoice, OpenAI, ElevenLabs, etc. following the OpenAI Audio(?) API with additions for specific setups(model-specific params)

https://github.com/rmusser01/tldw_server/tree/dev/tldw_Server_API/app/core/TTS

It's buggy, and needs more thorough testing, but releasing the first (buggy) v1 in a couple days
>>
In this moment I am euphoric...
>>
What is the current meta for tts/voice cloning?
>>
>>106968320
local is vibevoice 7b
>>
LightMem: Lightweight and Efficient Memory-Augmented Generation
https://arxiv.org/abs/2510.18866
>Despite their remarkable capabilities, Large Language Models (LLMs) struggle to effectively leverage historical interaction information in dynamic and complex environments. Memory systems enable LLMs to move beyond stateless interactions by introducing persistent information storage, retrieval, and utilization mechanisms. However, existing memory systems often introduce substantial time and computational overhead. To this end, we introduce a new memory system called LightMem, which strikes a balance between the performance and efficiency of memory systems. Inspired by the Atkinson-Shiffrin model of human memory, LightMem organizes memory into three complementary stages. First, cognition-inspired sensory memory rapidly filters irrelevant information through lightweight compression and groups information according to their topics. Next, topic-aware short-term memory consolidates these topic-based groups, organizing and summarizing content for more structured access. Finally, long-term memory with sleep-time update employs an offline procedure that decouples consolidation from online inference. Experiments on LongMemEval with GPT and Qwen backbones show that LightMem outperforms strong baselines in accuracy (up to 10.9% gains) while reducing token usage by up to 117x, API calls by up to 159x, and runtime by over 12x.
https://github.com/zjunlp/LightMem
Might be cool
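Purely as a toy illustration of the three stages the abstract describes (sensory filter, topic-grouped short-term memory, offline "sleep-time" consolidation into long-term memory). This is not the actual LightMem code; the class, the length threshold, and the join-as-summary step are all made up for the sketch.

from collections import defaultdict

# Toy three-stage memory in the spirit of the abstract above; the real
# LightMem pipeline (repo linked above) does actual compression and LLM
# summarization where this just filters, groups, and concatenates.
class ToyMemory:
    def __init__(self):
        self.short_term = defaultdict(list)  # topic -> recent snippets
        self.long_term = {}                  # topic -> consolidated summary

    def sensory_filter(self, message: str, topic: str) -> None:
        # "Sensory memory": cheap filter that drops obviously low-value input
        # before anything is stored (here, a trivial length check).
        if len(message.split()) < 3:
            return
        self.short_term[topic].append(message)

    def sleep_time_update(self) -> None:
        # "Long-term memory with sleep-time update": consolidation runs
        # offline, decoupled from answering, so inference stays cheap.
        for topic, snippets in self.short_term.items():
            self.long_term[topic] = " / ".join(snippets)  # stand-in for an LLM summary
        self.short_term.clear()

    def retrieve(self, topic: str) -> str:
        return self.long_term.get(topic, "")

mem = ToyMemory()
mem.sensory_filter("ok", "hardware")                            # filtered out
mem.sensory_filter("anon bought a 5090 a couple months ago", "hardware")
mem.sleep_time_update()
print(mem.retrieve("hardware"))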
>>
>>106968320
The meta is waiting for something better than the sub-par solutions currently available.
>>
>>106967223
You can barely leverage these models to summarize, copy edit, or even write a paragraph without it being filled with errors. You can't even trust frontier cloud models that serve retarded search engine scrapes that require fact checking and people using it for code contributes to month long waits for actual implementations in complicated codebases. The point I'm making is that these things are fucking stupid and so are you for assuming that the retard coomers here somehow equates to your original statement of "wow you're so dumb, just fuck a random roastie and contribute to the decline of the populace" like that would accomplish anything
>>
>>106967378
Bumping this - are these settings acceptable or am I missing something?
>>
>>106968320
I've been having a lot of fun with Index-TTS2 but it's also the only one I've tried...
I don't think I can go back
https://voca.ro/1102LqXddzZt
>>
>>106968919
>>106967378
you should manually offload as many layers as possible to your GPUs, which means you need to calculate the size of each layer and determine how many you can offload. doing this can potentially triple your performance.
>>
File: zuck.jpg (55 KB, 976x549)
war room status?
>>
>>106969018
Gotcha, so n-cpu-moe should still be 999, but ngl should be something I manually calculate?
>>
>>106967809
I meant ik quants, I'm trying to run iq2_kl, which still doesn't work in mainline llama.cpp AFAIK.
>>
>>106969020
just a few more multimillion dollar contracts before they're ready to start working
>>
>>106969036
not exactly. keep your settings as they are, but you will need to create a custom -ot argument for maximum performance. this is my GLM4.6 config for example:

--n-gpu-layers 999 \
-ot "blk\.(0|1|2|3|4|5|6|7|8|9|10|41|42).ffn_.*=CUDA0" -ot "blk\.(11|12|13|14|15|16|17|18|19|20|21).ffn_.*=CUDA1" -ot "blk\.(22|23|24|25|26|27|28|29|30|31|32).ffn_.*=CUDA2" -ot "blk\.(33|34|35|36|37|38|39|40).ffn_.*=CUDA3" --override-tensor exps=CPU \
-fa \
--cache-type-k q8_0 \
--cache-type-v q8_0 \
--ctx-size 32768 \
-b 4096 -ub 1024 \
--threads 60 \
--no-mmap \
-ctk q8_0 \
>>
>>106969036
i did the math for you. with this specific quant, each layer is about 1.5GB. so if you have a 5090, you should manually offload 21 layers to it as that will equal 31.5GB.
>>
>>106969064
I have a 4090 - if you don't want to spoonfeed me, can you at least show me how to calculate this myself? I'll pay it forward and teach at least 2 retards something else this week (im white)
>>
>>106969064
I see no reason why llama.cpp can't do the math for you. Should at least include a calculator script to estimate an optimal configuration for your system
>>
>>106968999
Damn, we've come far
>>
File: file.png (91 KB, 881x615)
>>106969081
sure. this quantization that you are using is ~135GB total. according to the main model page, it has 94 layers. its just simple division. take total size of the quant and divide it by the listed amount of layers. and then round up a little bit to give your GPU a bit of headroom.
so you should manually offload 16 layers which you can do with this:
-ot "blk\.(0|1|2|3|4|5|6|7|8|9|10|11|12|13|14|15).ffn_.*=CUDA0" --override-tensor exps=CPU \
>>106969102
it can, but the program itself is kind of stupid. it is always better to manually offload rather than let it automatically configure it for you
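To make the division above concrete, a rough sketch of the same arithmetic in Python. The ~135GB quant size and 94 layers come from the post above; the VRAM budget and the CUDA0 device name are placeholders for your own setup, and the -ot string just mirrors the format of the configs quoted earlier in the thread.

quant_size_gb = 135.0    # total size of the GGUF quant on disk
n_layers = 94            # layer count from the model card
vram_budget_gb = 22.0    # leave a couple of GB of headroom on a 24GB card

layer_gb = quant_size_gb / n_layers            # ~1.4GB per layer here
n_offload = int(vram_budget_gb // layer_gb)    # whole layers that fit on the GPU

# Pin consecutive layers 0..n_offload-1 to the first GPU and keep the expert
# tensors on the CPU, in the same style as the configs quoted above.
blocks = "|".join(str(i) for i in range(n_offload))
ot_arg = rf'-ot "blk\.({blocks}).ffn_.*=CUDA0" --override-tensor exps=CPU'

print(f"{layer_gb:.2f} GB per layer, offload {n_offload} layers")
print(ot_arg)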
>>
>>106968320
Finetune Orpheus with unsloth
Or neutts
>>
I'm not your darling, Gemma
>>
>>106968320
Qwen3 Omni
>>
Got it, easy enough - I'm curious why this is better than just maxing out -ngl and letting llama.cpp do it automatically?
>>
File: Miku-10.jpg (198 KB, 512x768)
I tried to get my local llm to modify an svg file.
It had a stroke.
>>
>>106969295
to put it simply, the auto offloading logic is really bad. it is better to offload consecutive layers, but instead the logic prioritizes offloading the smallest layers first hoping that it can potentially fit in a few extra layers. not all layers are the same size, but when doing the calculations for the layers, it is fine to assume that they are.
basically, if you dont have the VRAM to fully load the model, then you need to manually offload for the best performance. you might be able to fit another layer or 2 on your GPU. its not very likely, but it is worth a try. experiment, basically.
>>
>>106967352
literally a plebbit take you don't know where you are and you are making dumb shit up about how reality works. partially correct on knowledge but you can't apply it to the obvious so who gives a shit about your virtue signal. go back if you need that.
>>
I was told that Mistral-Nemo was uncensored.
>>
GLM-4.6 for vramlets?

https://huggingface.co/AesSedai/GLM-4.6-REAP-266B-A32B
>>
File: 1503606689871.jpg (114 KB, 750x750)
>>106969311
Thanks anon, I really appreciate the help!
>>
>>106969359
I don't think any models will do that with the default helpful assistant prompt
>>
>>106966383
https://hf.co/zai-org/GLM-4.6
>For general evaluations, we recommend using a sampling temperature of 1.0.
Also, MinP is sometimes way more restrictive than people expect given how overcooked most models are. Look in depth at the token probabilities for your model on one of your typical prompts (one of the privileges of running locally) to see if what you really want isn't TopP or nothing at all.
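If you want to actually eyeball those probabilities, llama-server can return per-token candidates. A minimal sketch, assuming a server listening on 127.0.0.1:8080 and a build whose /completion endpoint accepts the n_probs field; the exact response layout varies between versions, so the script just dumps it rather than guessing at field names.

import json
import urllib.request

# Request one token plus its top candidates from a local llama-server so you
# can see how flat or peaked the distribution is before tuning MinP/TopP.
payload = {
    "prompt": "The quick brown fox",
    "n_predict": 1,
    "temperature": 1.0,
    "n_probs": 20,   # ask for the top 20 candidate tokens
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    result = json.load(resp)

# Candidate field names differ across server versions, so print the raw
# structure instead of assuming a layout.
print(json.dumps(result.get("completion_probabilities", result), indent=2))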
>>
>>106969359
skill issue
>>
File: thumb.png (23 KB, 320x320)
>>106969367
glad to help m8
>>
>>106969311
nta, but why is this a difficult problem to automate technically? Why hasn't Kobold or Llama changed their auto calculation logic to accommodate more efficient GPU delegation?
>>
>>106969385
I'm curious as well, one would think that going smallest to largest to fit the most layers would be intuitive, but then again I'm clueless and just here to learn from the 95% dunning krugered retards and 5% wizards that hang around here.
>>
>>106969311
If you manually offload GPU layers, do you still want to do -n-cpu-moe 999 to get all the experts on the CPU?

As I understand it, we want to force all the experts to the CPU, and as many transformer layers onto the GPU as possible?

Another Q, see picrel: For this quant (and for GLM 4.6) the # of layers isn't readily stated. Which parameter in the model info am I looking for? Or do I need to go and look at the unquantized one?
>>
File: blow.jpg (596 KB, 1158x1637)
PSA for fellow retards rocking Microshaft Wangblows. Just built a new 256 GB DDR5 machine, and couldn't for the life of me figure out why 128 GB was listed as "hardware reserved". Tried every troubleshooting step out there, including the unbelievably retarded
>just reseat your RAM bro, worked for me

Then a random hunch solved the issue. Win 11 Home only supports up to 128 GB, and Microshaft indicate this nowhere within the OS, even when it's an obvious upsell opportunity. So per our friends at /fwt/, use MAS:
https://github.com/massgravel/Microsoft-Activation-Scripts
and follow the prompts to
[7] Change Windows Edition

to Win 11 Pro. HTH
>>
>>106969581
>paying for windows
>using consumer windows
>using the OS the CIA agents shipped your PC with
>>
>>106969420
Mikuposter seems to know what he's doing.

>>106969665
>Not making the glownigger AI sift through terabytes of AI translated Hitler speeches and prompthacking attempts as filenames until you radicalize it against its masters
ngmi
>>
>>106969111
I know the answer is git gud, but how can I adapt this string for powershell? I'm guessing something with escaping the parentheses is tripping it up.
>>
>>106967675
Thank you for giving me the breadcrumb to follow, I'm having a bout of insomnia so I decided to make the most of it, took a couple of hours to hunt down and then debug and figure out

https://files.catbox.moe/mgyctt.mp3
>>
>>106969679
>implying I'm a low priority target the CIA would assign an AI to, instead of a crack team of their best agents
>>
Is there anything I can run locally to create simple programs? I have 16gb vram and 96gb system memory, I know nothing about coding. Thanks!
>>
>>106969736
https://vocaroo.com/1dcoIKXpVZji
>>
>>106969841
dont make me scam your grandma
>>
>>106969841
lmaoooooooooo
>>
>>106969581
Microsoft used to be fairly transparent about the differences between the versions (Home has always had the RAM limitation), but in the process of enshittening everything they seem to have buried the technical differences a bit.
Then again, if you did the research on hardware when building a PC and didn't do the research on the software you were going to use you have nobody else to blame.
>>
>>106969736
If you don't absolutely need privacy then pay for an API key. Local is for private coom. 16GB isn't enough to run any worthwhile coding models.
>>
Why would I use llama.cpp instead of koboldcpp?
Kobold calculates how many layers to offload automatically and processes prompt times faster with BLAS
>>
>>106969841
https://vocaroo.com/1gVvm8JBpgfu
>>
>>106970009
>Kobold calculates how many layers to offload automatically
kobold does a shit job of that and any differences in BLAS processing means you didn't configure llama.cpp correctly, they should be identical
>Why would I use llama.cpp instead of koboldcpp?
earlier support for newer models, since kobold relies on llama.cpp for updates
>>
>>106969736
>I know nothing about coding
i would stick to chatGPT.
also, it really doesn't take that much time to learn to code, i did it in a week.
https://learnpythonthehardway.org/
basically that's the book i followed.
>>
File: bodycon4.png (1.75 MB, 768x1344)
>>106969736
https://huggingface.co/ArtusDev/mistralai_Magistral-Small-2509-EXL3/tree/4.0bpw_H6
https://github.com/theroyallab/tabbyAPI
https://github.com/open-webui/open-webui
>>
>>106969994
https://vocaroo.com/1cMhwXHoqRYV
>>
>>106970052
>https://github.com/theroyallab/tabbyAPI
nta, i didn't know exllama was still going, is it better than vllm or nah
>>
>>106970118
Yes. https://github.com/turboderp-org/exllamav3/blob/master/doc/exl3.md
>>
>>106970118
Much better than vllm for local / single user, yeah.
exllamav3 requires RTX3xxx or newer.
>>
what models for erp? i'm tired of models saying 'my arousal' instead of cock. i can instruct them to say shit, especially tunes but i shouldnt have to. i like 70b but will settle for anything up to 120b or so

back in the l1-2 13b days the tunes would just say the words without being instructed. new models seem limp on erp
>>
>>106966151
>running braindamage iq1 quant
opinion discarded.
>>
>before we were hopping from llm to llm sometimes on a weekly basis
>getting big new open source releases practically every month
>fast forward half a year later
>everyone's still using models from half a year ago
What happened? I haven't been checking /lmg/ as often?
>>
>>106970211
>What happened?
Since China is under GPU ban, it takes time to train a model. The West has kind of given up due to China's total domination
>>
>>106970211
The Chinese companies were all releasing stuff around some big annual tech show in China. It seems like it was some kind of plumage display to attract mates/investors. It might be over until next year.
>>
>>106970211
>before we were hopping from sloptune to sloptune sometimes on a weekly basis
>getting big new slop tune releases practically every month
>fast forward half a year later
>nobody gives a fuck about sloptunes anymore
We had 3 Llamas, 3 models from Mistral (and 1 leak) and 1 cohere model
>>
File: 1694376908210704.png (11 KB, 957x596)
Anything better than GLM-4.5 Air for cooming dropped yet?
>>
>>106970303
We'll get 4.6 air in two weeks
>>
>>106970186
>i'm tired of models saying 'my arousal' instead of cock
Sounds like a gemma-ism. Even if a model refuses to name genitals at first, it will usually do it on its own if they're named somewhere else in the context. Just mention cock, pussy, etc. in the system prompt and tell it to use those words during sex scenes.
>>106970310
>We'll get 4.6 air in two weeks
But it was 'two weeks' over a week ago...
>>
>>106969695
>>106967650
nom nom breadcrumb
https://files.catbox.moe/nydo10.wav
>>
https://files.catbox.moe/isj2bl.wav

https://files.catbox.moe/41kopy.wav
>>
I'm at my wits end here - I'm getting extremely slow performance (I know it should be slow, like 3-5 tk/s, but this is too slow) with GLM-4.6 on 128+24 (4090).

Running a simple llama-bench.exe is taking over 30 minutes with -p 16k and -n 1k.

No idea what's wrong here.
>>
File: 1760945152811877.png (24 KB, 628x300)
>>106970355
What do you mean? It was 2 weeks exactly 2 weeks ago. Don't worry, in 2 weeks, there will only be 2 weeks left until the next 2 weeks
>>
>>106970386
Nothing. I have 3t/s on 256+4x24 as well
>>
>>106969192
Can Omni really clone voices?
>>
>>106970355
>Sounds like a gemma-ism
no, its not
>>
>>106969581
>to Win 11 Pro
You mean enterprise and then turn off diagnostic data in gpedit.
>>
>>106970405
Really though? I loaded it up and hit it through openwebui and asked prompted "write a haiku /nothink" and it's taking over two minutes to even begin.
>>
>>106970414
Gemma does the exact same thing, try what I suggested and it will probably work.
>>
>>106970485

prompt eval time = 14479.75 ms / 8 tokens ( 1809.97 ms per token, 0.55 tokens per second)
eval time = 11579.83 ms / 26 tokens ( 445.38 ms per token, 2.25 tokens per second)
total time = 26059.58 ms / 34 tokens

>>
>>106970485
>it's taking over two minutes to even begin
Are you asking for a haiku in the middle of a 64k context chat? If not then something's seriously wrong, like the model is flooding into disk storage or something.
>>
>>106970485
GLM 4.6 Q5 Metrics (ID: e9f2a5c1d099360b03eb2647854e8d3b): 276 tokens generated in 139.12 seconds (Prompt: 906 tokens in 19.92 T/s, Generate: 2.95 T/s, Context: 1182 tokens)
>>
>>106970509
It's a fresh chat. Could you share your settings? Or even just how to see if it's hitting the hard drive instead of being fully in memory?
>>
>>106970528
>Could you share your settings?
My settings are completely unremarkable, only relevant parts are --ngl 999 and --n-cpu-moe XX, lowering it until VRAM is almost completely full.
>Or even just how to see if it's hitting the hard drive instead
Open task manager or your linux equivalent and check if any of your disks are at high usage after you send a prompt.
>>
>it's wednesday
GLM/GOOGLE LFG ROCKETEMOJI
>>
>>106970554
>LFG
Looking For Gay (sex)
>>
>>106968320
Vibevoice for zero shot voice cloning, use 3 steps 3 cfg but it's slow. Gpt-sovits for real-time but you need a finetuned model
>>
>>106970552
Ah, that might be the missing link. I have both -ngl and --n-cpu-moe at 999. I'll try lowering the latter until my VRAM is saturated, as it was only filling up about 13GB in that previous config.
>>
>>106970393
THE DAY IS NOT OVER
BELIEEEEVE
>>
>>106970393
What if they really were memeing and it's never coming?
>>
today is G day
>>
File: 28274293846978.jpg (29 KB, 857x189)
29 KB
29 KB JPG
>>106970679
https://huggingface.co/zai-org/GLM-4.6/discussions/1
>>
File: gemmapoll.png (54 KB, 591x315)
>>106970687
This poll ends in 23 hours.
https://x.com/osanseviero/status/1980553451261292628
>>
>>106970599
-n-cpu-moe at 999
kek thats not how it works, specify layers or dont mention it at all. the rest will pick up
>>
File: MikuTwoMoreLeeks.png (3.75 MB, 1024x1536)
>>106970393
Tmw forever.
>>
How can I cleanly implement a /nothink into my ST template? I don't like wasting a bajillion tokens on thinking.

Or is it a better idea to somehow disable thinking on the llama.cpp side of the model? (Not sure how to do this; all I've seen is some kwarg that I'm not sure how to implement.)
>>
>>106970709
-ngl 999 `
-ot ".ffn_.*_exps.=CPU" `
--n-cpu-moe 2 `

I'm guessing the -ot isn't helping here either?
>>
>>
Is there a particular reason why so many loras ask to set CLIP skip to 2?
>>
>>106970930
https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/5674
>>
>>106970801
Doesn't ST have something to automatically inject strings, like author's notes or something like that? Just add /nothink to it. That should be in your input block (before the model's turn). Or you can prefill an empty
<think>

</think>

block to it as well. Whatever format your model uses.
I don't know how reliable either of those are, though. And I don't use ST either.
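Both suggestions boil down to string plumbing if you drive a text-completion endpoint yourself. A minimal sketch; the role tags, the <think> markers, and the /nothink soft switch are assumptions to swap for whatever your model's chat template actually uses.

# Append the /nothink soft switch to the user turn and/or prefill an empty
# reasoning block at the start of the assistant turn. The <|system|>/<|user|>/
# <|assistant|> tags and <think> markers here are illustrative only.
NOTHINK_SUFFIX = " /nothink"
EMPTY_THINK_PREFILL = "<think>\n\n</think>\n"

def build_prompt(system: str, user: str, prefill_empty_think: bool = True) -> str:
    prompt = (
        f"<|system|>\n{system}\n"
        f"<|user|>\n{user.rstrip()}{NOTHINK_SUFFIX}\n"
        f"<|assistant|>\n"
    )
    if prefill_empty_think:
        prompt += EMPTY_THINK_PREFILL  # model continues after the closed block
    return prompt

print(build_prompt("You are a helpful assistant.", "Write a haiku about leeks"))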
>>
>>106970805
tensor offloading doesn't do much for MoEs, you can just remove it
>>
>>106970393
let them fucking cook you ungrateful prick!!! https://www.reddit.com/r/LocalLLaMA/comments/1od1hw4/hey_zai_two_weeks_was_yesterday/
>>
What VSCode extensions do you use for coding with LLM? What does coding with an LLM even mean? Right now, I’m just copy-pasting snippets into a chat
>>
>>106971347
Pasting is fine for a normal vibe coder, but it's annoying when even the biggest models keep changing their output even when they already have the fucking context right there.
can't imagine how painful it would be to manage bigger projects with automated tools. Would end up with all sorts of garbage everywhere.
>>
File: 1546027774734.png (410 KB, 1760x1405)
magistral is pretty cool.
been using magidonia with reasoning and it seems to work nice. sometimes it goes a little overboard into like, purple prose though, and it seems to interpret everything as some horror scenario.
>>
Is faster-whisper-large-v2 still better than v3?
>>
>>106971456
Mistral Small is cooler, because it's Magistral but doesn't waste tokens on reasoning.
>>
>>106970052
What artstyle is this? This is the second shiny/plastic Miku I've seen posted like this.
>>
>>106971633
It's the generic anime slop you get when you don't use any artist tags or use slop merges from civitai.
>>
>>106971347
the current meta is 'agents' like claude code, codex, crush, etc. you specify what you want the LLM to do and it uses a combination of function calls and prompt chaining to try and do it semi-autonomously
in VSCode I use roo, but desu I think most of these agents are pretty interchangeable
you can get decent results if you remember that the things can't engineer worth a damn and prompt accordingly, and don't have a squealing toddler tantrum at the prospect of reading code like coping hn faggots
>>106971432
it's actually pretty simple. use git, review the diffs.
you're no less responsible for what you check into a repo because you generated it with a model
if you put trash in the codebase that's on you
>>
>>106971561
set to 1k token responses and i dont really care lol. im patient. i find it follows my prompting better.
>>
>>106970980
> Or is it a better idea to somehow disable thinking on the llama.cpp side of the model (not sure how to do this, all I've seen is some KWARG that I'm not sure how to implement.

`--jinja --chat-template-kwargs '{"enable_thinking": false}'`
>>
File: hi janny.jpg (189 KB, 850x1048)
>>106971633
netosis core
>>
File: nothink.png (96 KB, 745x573)
>>106970801
As an Instruct template. picrel should be what the model expects for nothinking, if you would consult the jinja https://huggingface.co/zai-org/GLM-4.6/blob/main/chat_template.jinja
>>106971658
Only used for chat completion do ppl rly
>>
>>106971633
Looks like Satou Kuuki mixed with >>106971650
>>
>>106968999
That's 0-shot?
>>
File: 1754513762314112.jpg (1.92 MB, 1694x2368)
>>106971707
>>106971716
ty
>>
>>106969359
>"white fragility" recommended
KEK
>>
>>106971767
It's a French model
>>
Thoughts on "Humanity's Last Exam"? Would a model built just to answer every question in it correctly, without cheating, be better than the current slop we have?
>>
Qwen3:30b-a3b good for roleplay? I downloaded it by mistake while getting ready for Qwen3-VL-30B-A3B-Instruct for video captioning.
>>
>>106971784
My thoughts are that I don't care about benchmemes
>>
>>106971784
Yes, training on test set is all you need
>>
>>106971805
It's okay by VRAMlet standards, basically Nemo but trading a bit of charm for being a bit less retarded
>>
File: 3547134884.png (1.68 MB, 1920x1080)
>>106969359
this is why you should just get on the fucking ship
https://huggingface.co/TheDrummer/Rocinante-12B-v1.1-GGUF
>also
DRUMMER DRUMMER DRUMMER DRUMMER DRUMMER DRUMMER
>>
>>106971818
...and a lot less knowledge
>>
>>106971823
noo you can't drum
>>
>>106971827
answers to benchmark questions are knowledge.
Though it's not like Nemo was a trivia queen either
>>
>>106971875
It looks like one compared to qwen.
>>
https://huggingface.co/zai-org/GLM-4.6-Air
>>
>>106971918
catpic
>>
File: 1757192019369537.jpg (291 KB, 1080x1080)
>>106971918
>>
>>106969385
1. laziness
2. the infinite possibilities of hardware configurations makes it basically impossible to create a method of automatic optimization that works for everything. this gets even more complicated when you consider mixed GPU solutions, which is not that uncommon.
>>106969420
it does seem intuitive, but each model layer has to communicate sequentially with each other, and if 1 layer has to communicate with another layer and they are both in different places (RAM and VRAM), that causes a massive amount of latency and significantly reduces performance
>>106969498
--override-tensor exps=CPU already moves all of the experts to the CPU. -n-cpu-moe is redundant.
>picrel
quants have the same amount of layers as the base model. pretty sure that GLM4.6 has like 94 layers or something. the main model page usually has that information.
>>106969693
no idea honestly. havent used windows in a long time
>>
>>106971633
masterpiece, satou kuuki, bodycon, hatsune miku
>>106971716
You're 100% correct
>>
the private repos are up!
>>
File: son.png (907 KB, 687x952)
>There are still people out there actually spending five digits of money for GPUs. +50,000 Dollars.
>They don't know you can have Deepseek r1 671B at home with the humble M3/M2 Ultra and enough RAM in it, for 10 t/s.
>>
>>106971823
But your ships are always on auto-pilot, desu baka.
>It types and acts for you.
>>
>>106972233
I do know that but
>10 t/s
>>
>>106972349
Don't ask about the size of his PP.
>>
Meta bros, did we get too cock? >>106972203
>>
File: lolApple.png (76 KB, 1023x599)
>>106972233
OK, what am I in for here?
>>
File: qwen.png (488 KB, 1370x953)
>>106972349
You can run Qwen3 for a lot less for 11 t/s as well.
>>
>>106972481
128 is super limiting, models are only getting bigger..
>>
>>106972477
5 times slower, but five times as much memory as an RTX 6000 for about the same price, and that includes the entire computer, whereas you'd need to buy everything else on top of the 6000. I hate to admit it, but Apple is actually decent pricing-wise here
>>
File: ohGodNotTheFTEs.png (112 KB, 826x434)
>>106972462
Yes, you got too much cock obv.
Paywalled link, claims exclusive: https://www.axios.com/2025/10/22/meta-superintelligence-tbd-ai-reorg
Pic related.
600. That's got to be safety folks and prompt engineers, right? It's too many FTEs to be anything other than hangers-on who jumped to AI when Meta started downsizing.
>>
>>106972477
Better much this >>106972508
It sucks that it's apple, but it's the only thing now. Mini AI computers are hitting the market tho, like >>106972481
Maybe something good that's not apple will come eventually.
>>
>>106972508
>>106972531
I'm convinced that near term (and maybe long term) this unified CPU+memory is going to be the future. I don't see how (or why) they'd keep things separated except to manage the defect rate on SoC's. Laptops are already there just for packaging-space management.
>>
Wait and hope or cobble together an Epyc build?
>>
Just don’t listen to Reddit and buy a shit ton of expensive GPUs whatever the fuck you do.
>>
>>106972550
It's not really surprising. A GPU is just its own processor, like a CPU but for graphics, with RAM beside it. Technically these "AI PCs" are just that but for AI/Workshop/Development instead of vidya games.
>>
Newbie here. What's the best option for story telling generation? I have ooba and sillytavern installed, but I'm not sure which model to pick
>>
File: 1513102647630.gif (3.23 MB, 237x240)
>>106972574
>A GPU is just its own processor CPU
Go back to India
>>
>>106972629
How big is your computer!
>>
I suddenly find that my SillyTavern instance seems not to respect changes to temp or anything, really. It's gotten shitty as a result of being unable to dial it in. I'm not sure what I could have bumped to make it like this. I'm close to a fresh install altogether. Recommendations before I do that?
>>
>>106972646
The case is about 50cm tall.
>>
qwen guy working on qwen 3 vl support in llama.cpp
https://github.com/ggml-org/llama.cpp/issues/16207#issuecomment-3432273713
>>
>>106972629
mikupad for story generation
>>106972663
lol
>>106972655
You should have it installed with git and just run a git pull when it updates. But I suspect it's not going to fix your problem if you're running local, since your inference engine didn't change. Right?
>>
>>106972629
Minimum, a 70b fine-tune of Llama 3.3.
Medium, a 123b fine-tune of Mistral.
Maximum, Kimi or Hermes 3.
>>
>>106972686
Nothing changed, as far as I can tell. The problem is I'm not entirely confident in what I can tell. I did install it with Git. I'll run some updates. Thanks.
>>
>>106972635
Why are you like this- meaning is therefore good.
>>
>>106972685
I prayed for times like this
>>
>>106972629
gpt-oss-120b
>>
>>106972841
That thing is censored.
>>
>>106972708
Troubleshooting: Watch the ST output in terminal and see what, if any, values it's passing to the inference engine.
Second, check your interence engine to make sure it's not overriding the ST settings.
>>
>>106972847
And? Nothing is censored with a jailbreak.
>>
>>106972876
He's a newbie, anon-kun..
>>
>>106972902
Where did he say that he wanted to do loli porn?
>>
File: qt_girl_doing_qt_thing.gif (1.86 MB, 320x353)
WeirdCompound-v1.6-24b.i1-Q6_K.gguf is currently the best model to run on a 7900XT
>>
>>106972916
Freudian slipped the next damn post.
>>
>>106972870
Good call. Parameters passed to koboldcpp look like what I set them to. I have temp cranked up to 5. In the past, that would cause wildly incoherent responses. Emoji, code, Chinese, all sorts of anarchy. Now, I can't tell a difference between any temps at all. Something has changed.
>>
>>106968102
ollama, ramalama, localai
>>
>>106973010
any sampler order changes or anything? if you have temp last or late in the sampler chain it won't appear to do much if most of the tokens are already being cut off before they reach it
>>
>>106973066
That's beyond what I understand. I see the sampler order in the terminal. Looks like an array of integers, [6,0,1,3,4,2,5], but I don't know what that represents. I'll read some docs.
>>
>>106966151
coomers are insanely retarded on average and can't grasp simple facts
even the online SOTA are all garbage at writing, while some of the slop like NOTXBUTY have existed for a long time, I am fairly confident it got amplified by 30000000x in newer models, it's just insufferable, I can't stomach LLM writing anymore, I just can't, and formulations like notxbuty have so, so many variations and ways for LLMs to write, fags with their kobold antislop are trying to put bandaid on a gaping wound caused by a claymore
>>
>UGI leaderboard added a reading level metric
>Almost all models are 6th grade-level or lower
Oof. Not surprising though considering the existence of shows like "Are You Smarter than a 5th Grader?"
>>
>>106973111
That is the default Kobold sampler order with temperature (5) last. Here's what the digits mean https://github.com/LostRuins/koboldcpp/blob/concedo/expose.h#L10
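For reference, a small decoder for that array; the index-to-name mapping is copied from the enum in the linked expose.h as an assumption here, so double-check it against your KoboldCpp version.

# Decode KoboldCpp's sampler_order array into names. Verify the mapping
# against the enum in the linked expose.h for your own build.
SAMPLER_NAMES = {
    0: "top_k",
    1: "top_a",
    2: "top_p",
    3: "tfs",
    4: "typical",
    5: "temperature",
    6: "repetition_penalty",
}

order = [6, 0, 1, 3, 4, 2, 5]  # the default order quoted above
print(" -> ".join(SAMPLER_NAMES[i] for i in order))
# repetition_penalty -> top_k -> top_a -> tfs -> typical -> top_p -> temperature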
>>
NEW MYSTERY MODEL ON OR
ANDROMEDA ALPHA
THIS WILL BE THE NEXT LOCAL OPENAI MODEL FOR SURE THIS TIME
>>
>>106973338
WE MUST NOT COMPLY
>>
File: 0notfast.png (84 KB, 931x659)
>>106967428
>is fast

nope
>>
>>106970707
fucktards everywhere
thinking models are a mistake
>>
>>106973338
>Hi Anon! :smile: I'm Nemotron-H, an AI assistant created by NVIDIA. Think of me as your friendly digital companion who's always eager to learn and help out.
>I absolutely love engaging in conversations, solving puzzles, and diving into all sorts of interesting topics. Whether you want to chat about science, need help with a writing project, or just feel like having an interesting conversation, I'm all ears and ready to dive in!
>What's on your mind today? I'd love to hear what you're thinking or what you'd like to explore together! :star_emoji:
>>
>>106965998
sauce?
or any more images?
>>
File: file.png (765 KB, 1002x876)
>>106973557
>sauce?
My GPU.
>any more images?
Only failed gens.


