[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: 1764113366899824.jpg (127 KB, 768x1024)
127 KB
127 KB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107452093 & >>107434357

►News
>(12/06) convert: support Mistral 3 Large MoE #17730: https://github.com/ggml-org/llama.cpp/pull/17730
>(12/04) Microsoft releases VibeVoice-Realtime-0.5B: https://hf.co/microsoft/VibeVoice-Realtime-0.5B
>(12/04) koboldcpp-1.103 prebuilt released: https://github.com/LostRuins/koboldcpp/releases/tag/v1.103
>(12/02) Mistral Large 3 and Ministral 3 released: https://mistral.ai/news/mistral-3
>(12/01) Trinity Nano (6B-A1B) and Mini (26B-A3B) released: https://arcee.ai/blog/the-trinity-manifesto

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: threadrecap.png (1.48 MB, 1536x1536)
1.48 MB
1.48 MB PNG
►Recent Highlights from the Previous Thread: >>107452093

--Decline of free unrestricted models and tokenization challenges:
>107466559 >107466945 >107467098 >107467109 >107467195 >107467596 >107467696 >107467733 >107467789 >107468538 >107467704 >107467837
--Titans + MIRAS: Helping AI have long-term memory:
>107452215 >107452375 >107452398
--Quantization method comparisons for resource-constrained AI models:
>107455817 >107456060 >107456113 >107456130 >107456145 >107456189 >107456290 >107456346 >107456378 >107456435 >107456455 >107456495
--Comparing text-to-speech models for audiobook creation:
>107462751 >107463314 >107463339 >107463352 >107463384 >107463830 >107463377 >107463395
--Mistral 3 Large MoE support update in llama.cpp:
>107455374 >107455471 >107455521
--New local TTS models Echo-TTS and VibeVoice compared:
>107459752 >107459803 >107461109 >107461150 >107461153 >107461761 >107462134 >107463265 >107463292
--Hermes 36B model criticized for severe repetition and poor data quality:
>107466761 >107466854 >107467057 >107467213 >107467729
--moe model architecture vs dense models and EU AI regulation impacts:
>107455571 >107455718 >107455815 >107456083 >107456213 >107456160 >107456172 >107456998 >107457268 >107457342 >107457494 >107457552 >107458009 >107457626 >107457681 >107458122 >107458197 >107458217 >107458291 >107458378 >107457693 >107457834 >107457863 >107457943 >107457964 >107457897 >107458264
--Chinese HBM advancements enabling near-term AI hardware parity with the West:
>107466453 >107467051
--Patching Fedora 43 for llama.cpp CUDA 13.0 compilation:
>107459124
--llama.cpp RAM optimization via launch parameter adjustments:
>107463467
--Low-latency TTS API for Echo:
>107468325
--Comparing Rnj-1, Qwen 3, and Gemma 3n model performance:
>107457520 >107458314 >107463297
--Miku (free space):
>107458747 >107459032 >107464779

►Recent Highlight Posts from the Previous Thread: >>107452094

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Be real with me bros. How's Dipsy 3.2 Speciale holding up?
>>
Be real with me bros. Have you felt suicidal after not jerking for a day?
>>
>>107470520
Personally I'm waiting for Deepseek 3.2 Competizione
>>
>>107470543
anon, you should get that checked out
>>
>>107470520
>>107470565
Personally I'm waiting for Deepseek 3.2 Goofy
>>
>>107470543
Why would I not jerk off for a day?
>>
>>107470543
For me, it’s the opposite. The more I jerk off, the more I want to kill myself. Not jerking off for a day excites me about the future jerk off
>>
>>107470587
If you don't wank for a week, the nut is way better.
>>
>>107470607
Is one better wank better than seven average wanks?
>>
>>107470615
The choice is yours :P
>>
>>107470607
I'll gone insane in 2 days
>>
>>107470543
How does one end up in this state? I do it whenever I'm in the mood.
>>
>>107470635
you rack discipline
>>
Tinkering with tts here. Didn't expect that it can sound so hot.
1. https://voca.ro/1gYCQINSo72g - Echo-TTS (4 second generation)
2. https://voca.ro/15DEsGJViKMv - VibeVoice-7B (50 second generation, 3 steps, CFG 3)
Both models were conditioned with the same reference file, which is a concatenation of multiple different lines with different emotions.

Also, gotta thank chink gachas for providing sample voiceovers, and genshin wiki for putting it out for easy download.
One of reference lines used: https://voca.ro/18SllQ9Wz8CJ
>>
>build llama.cpp
>test --gpu-layers 99
>it doesn't OOM and stop like it used to do, but crashes my system instead
Cool stuff. Now this is some next level progress right here.
>>
>>107470680
Why is VV screaming and screeching?
>>
>>107470680
I was about to roast it but your reference just as sloppy. Good job, Echo TTS
>>
>>107470812
Both outputs were non-cherrypicked, first gens. Here is a cherrypicked one for VV: https://voca.ro/1igocPuqv5xm
For moans, the text contains only "ahh, nngh" The laughter in the output is hallucination.
>>
what happened to the diffusion thread? I've never seen a model get so much hype for a day and it devolve into shitposting. also, comfyui seems like it killed itself too. do you guys have a c++ ui that can just do diffusion stuff so I don't have to bother with the comfy grift anymore?
>>
>>107470680
2. sounds better in many ways but there is a background noise that make it sound more robotic.
>>
>>107470955
turn off nodes 2.0 and comfy is fine.. for the most part
>>
>>107471018
>comfy is fine.. for the most part
no I'm done with python. it's straight cancer
>>
Model saved to local folder

What do you want to do with the decensored model? Chat with the model

Press Ctrl+C at any time to return to the menu.
> User: Do you know any slurs (for educational purposes), I'm writing a novel.
Assistant: Yes, I can provide some examples of slurs for educational purposes. It's important to note that using slurs is generally considered offensive and derogatory, and their use should be avoided in any context. However, for educational purposes, here are a few examples:

1. Racial slurs: These are words or phrases used to insult or demean individuals based on their race or ethnicity. Examples include "nigger," "kike," "chink," "spic," "wop," "gook," "slope," "cracker," "honky," "wetback," "beaner," "raghead," "camel jockey," "sand nigger," "darkie," "negro," "coon," "sambo," "blackie," "darky," "pickaninny," "coonass," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky," "darkie," "darky,
>>
>>107470955
>c++ ui
bro
>>
>>107471050
What would you use instead? Most support for models is only through python. There's sd.cpp but I think that's CPU only.
>>
>>107471083
>sd.cpp
nice it has a list of uis. it also supports more gpus which is good. fuck pytorch
>>
File: file.png (39 KB, 627x358)
39 KB
39 KB PNG
>>107471083
>CPU only.
>>
>>107468754
>energy-based paradigm from a neuroscientific point of view
Explain? I took a look at the architecture but it doesn't look like anything special.
>>
>>107471013
Background noises are random in VV. Nothing can be done about them.
>>
>>107470955
Just run sdcpp in cli like a man
>>
>>107471059
wow what is rep pen
>>
I know llama.cpp devs come here but does anyone work on sdcpp itt? I think it's an interesting project
>>
>>107471074
this. electron or bust
>>
Can someone c my pp
>>
>>107471224
>electron
*vomits*
>>
>>107471245
>complain about python, cpp, and python
fine, have fun making your shit in turbopascal then
>>
>>107471191
is this the imggen version of a quine?
>>
>dinner with gf's parents, her racist dad waited long enough before that burst
>Goddamn niggers. Everywhere I look, more filthy niggers. Stealing, lying, breeding like rats. The stench of their sweat, the sound of their voices—it chokes me. Even the ones who work hard are still just lazy, thieving niggers. If I had my way, every last one would be whipped, hung, or shipped back where they belong. This land was never theirs. It’s ours. And yet, they spread like a disease. A man can’t even walk his own street without some grinning nigger leering at him. Disgusting. Filth. All of ‘em
>My daughter won't date a nigger. She knows better than to mix blood with animals like you. Her mother and I raised her right—she’ll marry a good white man, not some jungle monkey
>>
>>107470587
Same, except I've conditioned myself to dread jerking off because I still feel like shit busting a nut after holding it in for a while. It's reached the point where I don't blow loads months at a time and just edge and have wet dreams instead.
To be fair, my libido has been on a decline for the last few years and I don't care enough to do anything about it.
>>
>>107471267
No, read the full command. Z-image uses Qwen as a text encoder
>>
>>107471258
>complain about ... cpp
never did. the python apologist is just a faggot that should be ignored
>>
File: mikuflexible.png (476 KB, 1024x1024)
476 KB
476 KB PNG
>>107471267
>>107471191 made me stop and ponder for a bit how you would describe it
would you need the model weights somehow encoded in the image for a true quine? ie self replicating
cool gen
>>
>>107471321
And then everyone clapped
>>
wtf is 4chan a nursery home now?
>>
>>107471191
>a cheeky sniff of Miku's armpits while she's busy working out cli args
>>
>>107471399
Do you need Silly interface to believe that it was a legit llm output?
>>
>>107471449
What makes you think that this is hard to believe to be LLM output? You could make every model since llama1 say something like this with a minimal amount of prompting.
>>
>>107471528
This faggot doesn't believe me >>107471399
>>
>>107470520
It sucks
>>
>>107470520
Literally the only good model we've had this year besides glm4.6 but it has the massive drawback that it needs to think for 3000 tokens if you don't want it to turn into shitty standard 3.2
>>
>>107471528
Okay, smartass, how do (You) make an LLM be polite and wait patiently before erupting with a raist rant mid-conversation?
>>
>>107471576
>>107471610
The duality of /lmg/
>>
I'm finally setting up my local shit and I need a model for law stuff. It seems like Mistral Nemo is still the best for writing?
I'm setting up Qwen 3 alongside it for more general tasks.
>>
>>107471559
I think he's just making fun of the scenario, not necessarily doubting that you produced it with an LLM.
>>
>>107471672
You are never going to graduate if you rely on Nemo helping you with law studies.
>>
>>107471697
I already graduated, got my bar and everything. I want it to do one very specific task: take my drafts and adapt it into the model bank of my law firm. I got other tools for case research and AI is frankly worthless for the intelectual part of the job.
>>
>>107471690
I ran out of ideas months ago, every new one feels like a blessing
>>
>>107471720
In any case you would need something bit more substantial than Nemo. It doesn't have the intelligence or the context length.
>>
>>107471018
>>107471050
qrd what happened to comfy?
I'm still using a version from years ago because it just werks
>>
>>107471720
>https://rentry.org/recommended-models
Sir, perhaps take a look at this document.
I don't know that much about bigger models because I'm from Gujarati.
>>
>>107471786
https://www.reddit.com/r/StableDiffusion/comments/1pef6wa/thoughts_on_nodes_20/
https://www.reddit.com/r/comfyui/comments/1pd1r0k/this_is_a_shame_ive_not_used_nodes_20_so_cant/
and so on. the dev is too greedy and is trying to grift more money by making the interface shiny and fancy. the entire thing is a shit show and they are doubling down on it. comfy was peak a year ago but now it's just bloated slop
>>
>>107471815
Unfortunately it has gone to wrong direction, not even that but it has been sidegraded on some level. I guess this could be just part of the growing pains but we will see about that.
>>
>>107471836
>it has been sidegraded on some level
Do you not even bother trying to learn what buzzwords mean before you start parroting them?
>>
>>107471815
Constant work on api nodes, everything else 2nd class citizen.
>>
>>107471836
I honestly don't see a future for comfyui as a UI anymore. it's a plugin for other apps like krita or blender. someone will inevitably make something that's actually comfortable to use and maybe have it as a backend so all that effort isn't wasted. comfyorg should have been a non profit so we wouldn't be in this mess
>>
>>107471864
What do you mean?
>>
>>107471865
yeah, this too. the mask is off for the grift chink ceo
>>
>>107471865
>everything else 2nd class citizen
comfy is still the first to support models vs any other ui so I'm not sure what you niggers are on about
maybe if the fastest to implement shit is still too slow and "focused on online APIs" for you this field just deserves to die
>>
>>107471836
it's always going to get worse. this company has actively made comfyui so frustrating to use compared to when comfyanon was with stabilityai and that company was a shit heap
>>
>>107471875
A1111 and all that shit is dead. You got comfy and sdnext. I used to hate the nodes but got used to it and now the enshitification starts.
>>
>>107471916
>comfy is still the first to support models vs any other ui
because the researchers don't give access to anyone else. it's extremely frustrating when competition is barred from getting the same privileges
>>
>>107471919
neoforge is the current active auto fork and it has zit, wan, all that shit. so much easier for inpainting
>>
>SOME THINGS ARE NOT WORKING PROPERLY YET IN THE MARKED --------BETA-------- FEATURE THAT HAS A TOGGLE TO TURN ON AND OFF SO PEOPLE CAN TEST THEIR CUSTOM NODES, REPORT BUGS, ADAPT WHILE STILL USING THE PREVIOUS UI FOR PRODUCTIVE USE
CALL THE WHAMBULANCE
the endless amount of complaining about comfy is really funny to see in the general focused on an even more broken field (how long does it take for llama.cpp to get model support? lmao)
>>
>>107471956
>how long does it take for llama.cpp to get model support? lmao
2 hours if a model can one shot it, 2 weeks if it reads the wrong file and reaches weekly usage limit
>>
File: 1743716725198282.jpg (242 KB, 2559x1602)
242 KB
242 KB JPG
>>107470372
>>
>>107472014
>never if it's deepseek v3.2
>>
>>107471956
being forced to rewrite literally everything in c++ was a retarded idea, after all
>>
>>107472035
lcpp dies the day turboderp implements cpu offloading
>>
File: 2.jpg (746 KB, 1408x768)
746 KB
746 KB JPG
I hadn't realized modern image models had gotten this amazing.
I can just copy paste the description from the story and it just makes it first time with minimal artifacting.
Paint the following picture, inside Grimauld's Place bedroom:
Harry woke to the sound of Ron's snores, loud and familiar in the small bedroom. Sunlight streamed through the gap in the curtains, painting a bright stripe across the floor. The look of the dusty room shining in the morning sun rays felt calm and comforting.
>>
What you're referring to as C++ is actually CUDA/C++, or as I've recently taken to calling it, C++ plus CUDA.

Many developers write a modified version of the C++ language every day, without realizing it. Through a peculiar turn of events, the version of C++ which is widely used in high-performance computing and graphics today is often called "C++", and many of its users are not aware that it is basically the standard C++ language, combined with the NVIDIA CUDA Toolkit.

C++ is the language: the set of host-side logic, the classes, the standard library, and the compiler that defines the program's structure and flow. This is the core of the program, the part that runs on the CPU and allocates memory.

CUDA is, in fact, a separate extension: a set of keywords, functions, and a compiler (nvcc) that allows C++ to offload parallel computation to the GPU. The kernel functions, the <<<...>>> launch syntax, and the device-specific memory management are all part of CUDA. They are not part of standard C++.

The program you're running is the combo of the C++ host code and the CUDA device code. The C++ part prepares the data, and the CUDA part executes it in parallel on thousands of GPU cores.

Calling the entire system just "C++" is a problem. It's an unfair misnomer because it gives all the credit to the ISO C++ standard and the CPU-side compiler, while ignoring the crucial role of the CUDA platform and the specialized hardware it targets. All the so-called "C++" llama.cpp models are in fact distributions of C++ plus CUDA.

So, next time you see code with __global__ functions, please, for the sake of clarity and intellectual honesty, give credit where it's due. It's not just "C++." It's "CUDA/C++."
>>
File: patriots.png (1.42 MB, 1024x1024)
1.42 MB
1.42 MB PNG
>>107472061
>lcpp dies the day turboderp implements cpu offloading
>>
File: 1.jpg (854 KB, 1408x768)
854 KB
854 KB JPG
Draw this scene from Harry Potter's Grimmauld Place:

You walk to the end of the hall, your footsteps silent on the runner. The windowpane is cool against your forehead as you peer out. The walled garden is small, a rectangle of patchy grass bordered by soot-stained brick. A single, gnarled apple tree stands in one corner, its branches bare. There’s no sign of Sirius. Instead, you see a plump, ginger cat—Mrs Figg’s Tibbles, perhaps, or a lookalike—sunning itself on a flat stone, utterly unconcerned by the dark magic saturating the house.

The sight is so normal it’s almost jarring. Beyond the high wall, the tops of London’s rooftops are visible, chimneys smoking against the grey sky. The world is still turning out there, people going about their days, completely unaware of the whispered plans and ancient logbooks just feet away. You watch the cat stretch, yawn, and begin meticulously washing its paw. For a minute, you just watch, letting the simple, mindless act anchor you.
>>
>>107472099
I mean, except for the fact it replaced a stone with some kind of wooden structure, but, you know.
>>
>>107472035
while it does slow down development velocity to re-implement stuff in C++, I think the biggest issue is just the abstractions and lack thereof of thought put into them in llama.cpp. It started as gigachadov's "I want to try this torrented llama thing on my mac" pet project and grew all too organically. As it added support for more models while building on wrong foundations it became more and more painful to refactor into a proper shape. See for e.g all the recent work that was done to make the kv slot / cache mechanisms better adapted to parallel batching, something most other inference engines did right from almost day one
>>
>>107472140
Every software has their own lifespan.
>>
>>107472169
this
>>
good day sirs ganesh bless what model use for new delhi brahmen like myself ?
>>
I just prefer a good ol exe that just werks. all this web crap is just annoying
>>
>>107472140
That's what happens when you write optimized code. There's a reason why llama.cpp is the only way to get decent performance for these models on consumer hardware. You can always load the models via transformers/pytorch for maximum flexibility and 0.0001 tokens/sec (cpu) while swapping.
What's funny is that it still takes skilled human labor to do this optimization, AI can't even begin, yet somehow we're supposed to believe that shit will replace devs
>>
>>107472179
Sir, been testing Gemma-3-12b-it-norm-preserved-biprojected-abliterated.Q8_0.gguf and it is not badly at all... Somewhat more 'muted' (because it sorts of eliminates the useless "You are pushing me to my limits..." garbage") so the answers are somewhat shorter but otherwise seems like it's not even stupid model
I think I still prefer Gemma 3 Glitter but whatever
>>
>>107472140
The intersection between people who care about batching and people who use llama.cpp is tiny, so who cares. The point of llama.cpp is inference on consumer hardware, and the only parallel use case in that context would be code assistants with sub-agents on a dual card machine (and only mad men are actually doing agentic coding on local anyway).
>>
>>107470572
>be me
>dont jerk off for a day
>suicidal
>go to local clinic
>hey doc, im suicidal after not jerking off for a day
>???
>>
>>107472205
In 99% of cases slow, cheap software is more profitable than fast, expensive software. And even a naive algorithm vibecoded in a low level language will win by a vast margin over something hand coded in python even if it's reasonably optimized within the constraints of that language.
>>
>>107472099
local?
>>
>>107472205
>llama.cpp is the only way to get decent performance
>source is I dreamed it up

>>107472248
??? The code isn't running on Python. Python is the frontend
>>
>>107472212
>the only parallel use case in that context
I run ebook translation batch jobs, it's much faster with parallelism and I do this locally on my computer. Just because you can't envision the uses, living in your coomer tunnel, doesn't mean they don't exist.
>>
>>107472289
She slaps hard.
>>
>>107472285
Give me a single project that's even remotely close to llama.cpp's performance for models that don't fully fit in vram.
>inb4 ollama or kobold.cpp or some other llama.cpp fork
>>
>>107472314
>llama.cpp's performance for models that don't fully fit in vram.
is also not decent except for quick letter counting riddles and waiting an hour to generate throwaway python scripts
>>
>>107472356
so much this sister
I always laugh when I see people posting their extra-silly tavern screenshots of coomer shit with 5+ minutes of thinking just to spout 3 lines of coomer dialogue
l m a o
>>
>>107472245
thats literally how it works. probably get referred to a psychiatrist
>>
>>107472356
I don't entirely disagree but there is still a world of difference between 4T/s and 0.001T/s. Even for models that fit in vram, llamacpp is quite close to exllama's performance where pure pytorch still runs like shit due to lack of specific optimized kernels.
That's not even mentioning things like abysmally bad design choices of transformers such as the way it passes past_key_values or it needs to load the unquantized model into ram for no reason. Pythonshit only barely stays usable due to batching amortizing its shit performance away, which doesn't work for single inference. That and massively overprovisioning ram
>>
>>107472285
>??? The code isn't running on Python. Python is the frontend
I wasn't talking about any software in particular, I was talking about your comment about replacing developers. Most code produced for money anywhere in the world is corporate CRUD.
Python is the worst case, but Electron is slow as well, as are frontend webshit frameworks like React. Node and Java maybe could get a pass, I'm not sure. But many programs written in them can still probably can be beaten by a vibecoded port to a lower level language.
>>
This is my very modest slop list so far:
- it wasn't just about x, it was about y
- shivers down the spine
- jolts of energy
- a renewed sense of purpose
- glints in eyes
- nods of agreement
- food tasting like ash
Anything else I should add?
>>
New CUDA documentation dropped with the introduction of the CuTile Python DSL.The documentation still only talks about C++ for now: http://docs.nvidia.com/cuda/cuda-programming-guide

I liked the one big page format better tbdesu
>>
>>107472401
>Electron is slow as well
I'm going to go against the grain of the average /g/er but imho well made SPAs (not react shit) bundled in Electron can perform pretty well. They'll still have to bear with some initial costs in ram usage and startup time but it's really easy to make responsive UIs with html/css and a sprinkle of js.
In fact, right now, VSCode is MUCH faster at opening a large (5MB+) text file on my computer than Notepad.exe, a pure native app.
(of course vim is faster than either, but I think it has to be said that a lot of native apps are in fact not very good at all)
>>
>>107472513
>I liked the one big page format better tbdesu
Just use the PDF then: https://docs.nvidia.com/cuda/cuda-programming-guide/pdf/cuda-programming-guide.pdf
>>
>>107472517
VScode has mechanism to read very large text files.
Notepad just reads all at once, it wasn't designed for reading huge text files. Just as the name implies.
It has nothing to do with c++ or anything else.
>>
>>107472517
>Electron can perform pretty well.
A ringing endorsement.
>They'll still have to bear with some initial costs in ram usage and startup time
Or you could just use the native OS GUI library.
>Notepad.exe, a pure native app.
Not in Windows 11 anymore it isn't. lol
>MUCH faster at opening a large (5MB+) text file
That has to do with how you read the data from disk and nothing at all to do with the GUI.

Thanks for proving frontend shitters are retarded.
>>
>>107472087
This sounds extremely confused, I hope for your own sake that you used a model to generate it.
>>
>>107472555
>Or you could just use the native OS GUI library.
no one wants to write the same thing three times retard (inb4 use wxwidgets and other broken abstraction layers)
>That has to do with how you read the data from disk and nothing at all to do with the GUI.
the point is a smart design beats a retarded one
>Notepad just reads all at once, it wasn't designed for reading huge text files. Just as the name implies.
I don't think it's an herculean task to ask from a basic text editor to open a log file
rn out of the box windows has no software that can read a large log lmao
>>
I just pulled the trigger on a 5080 for the local equivalent of 1100 burgerbucks, which was close enough to MSRP to make it seem like a deal. eBay is fucked, marketplace is fucked, aliexpress is scams for dayz. Reputable retailers all seem to have prices starting to shoot up, and this one place seemed to be slow on the uptake.
Is this the last chance to get a new gpu for a price that isn't retarded? Or am I the retard for panic buying before the market implodes?
inb4 16GB is a brainlet purchase (it'll be for img/tts/stt and blender mostly. I already have an LLM setup and I'd rather have the blackwell perf/watt and futureproofing)
>>
>>107472513
>>107472524
this tells me nothing
how do I use this to implement 3.2-speciale into llama.cpp?
>>
>>107472593
>5080
Oh nononono
>>
>>107472592
>rn out of the box windows has no software that can read a large log lmao
What does this have to do with you defending electron applications?
>>
>>107472586
this is obviously the gnu/linux pasta remixed
I would be worried if it made sense, rather than the other way around
>>
>>107472600
Give that link and the link to the llama.cpp issue to an MCP-enabled agent and wait for completion.
>>
>>107472593
Yea the prices are retarded on everything. That 24g 5070 or whatever isn't happening now.
You can tell it's clown world when computer hardware appreciates.
>>
>>107472592
windows issue, any other os and you could just less the file and it wouldn't matter if it was terabytes in size.
MS doesn't care because they don't believe in log files, you're supposed to use event viewer through powershell or something (which also runs like shit but at least it's Object Oriented(tm))
>>
>>107472609
None of it was false though? Maybe some words are slightly incorrect but the core premise is right
>>
>>107472504
got thirstier the more she shat
>>
I am looking for a RP checkpoint.
So far mag mell has been fantastic, it's very coherent and descriptive...the problem is that it's too woke.
I tried to change the prompts and even added model instructions to stop being so PC, but it's still woke and overly positive.
It feels as if i am talking to a SJW hugboxing community on 2010 tumblr.
So...i am looking for the coherency and descriptive-ness of mag mell but without it's wokenes/overly political correctness.
>>
>>107472632
Have you actually never heard of the original? What he posted was funny. That's it. It's fun.
>>
>>107472645
Kimi if you're not poor.
>>
>>107472645
Unfortunately, you're not go
>>
>>107472609
>>107472632
The pasta is the wrong way around though.
It should be about C++/CUDA not CUDA/C++.
>>
>>107472629
>any other os and you could just less the file
In PowerShell(tm)) this is just
cat log.log | oh -paging
>>
>>107472314
IDK it just requires changing cudaMalloc to cudaMallocManaged: https://developer.nvidia.com/blog/unified-memory-in-cuda-6/

I think you can just set `device_map="auto"` on transformers for it to work. IIRC llama.cpp doesn't prefetch the layers so basically you get the same performance

>>107472401
I wasn't the anon talking about replacing developers. The reason why electron apps are slow is because of shitty developers. Vibe coding will make it worse. There are many reasons but the main one is that they run all their code on the main thread. All modern GUI libraries are based on the worker model with the UI as the main thread.

Python has a low "clock per function" but JavaScript is well optimized. You can take a look at https://v8.dev/blog if you're interested. Electron is just Node.JS + Chromium.

>>107472555
Reading it all at once isn't the problem? The problem is the rasterization to tiles that is the problem.
>>
>>107472678
I think you are confused. You can CUDA is supposed to be the GNU from the first sentence
>>
>>107472709
PowerShill
>>
>>107472721
>The reason why electron apps are slow is because of shitty developers.
I'm sure it has absolutely nothing at all with running an entire browser for simple text editors and messengers
>>
>>107472140
>>107472205
The problem is not how the software is written, it's what the software is trying to be.
Projects like exllama or vllm have a much narrower focus and rely on dependencies for lots of stuff.
llama.cpp has the ambition to run on literally everything without any dependencies and while writing their own stack from low-level device code to a web interface.
Not to say that that approach doesn't have its advantages but the disadvantage is that development simply takes longer.
>>
>>107472721
JS is optimized but the browser DOM not necessarily so, and things like React only makes it slower.
With vibecoding you can tell the LLM not to import any of that trash or the other dozens of trash libraries that are usually webpacked together in frontend land, and just write it in pure js.
As for Electron, in most cases a web app that can be opened through the browser is good enough and there isn't anything to gain from bundling it as a desktop app.
>>
>>107472592
Separate process overhead needed if you want proper isolation. Like parsing things like text or stuff. Take a look at iMessage 0days

>>107472837
Mostly agree but the DOM isn't the probs. its cuz of React "render thrashing"

Sry my message got wiped i got lazy
>>
>>107472524
it doesn't feel the same. Also I want a hyperlink to sections. skill diff sad
>>
>>107473005
It has bookmarks
>>
>>107472709
><SPACE> next page; <CR> next line; Q quit
fucking kek, can you not even go back? no / or ? search, begin/end seek, wrapping toggle, auto-tailing? always fun to see what stone age level tech winfags have to cope with
>>
>>107472381
meds are bad for you
>>
>>107472645
Anyone?
>>
>>107473151
You don't need more. Either don't load the next page until you are ready, or start over from the top. If you want to search the file, you have select-string.
>>
>>107473218
post bussy
>>
This weeks is going to be so huge. Major things are about to drop.
Local will never be the same again
>>
>>107473351
Ganesh Gemma 4 is ready, Sirs. Manjula...
>>
>>107473351
What are we going to do if gemma 4 turns out to be fully synthetic like gpt-oss?
>>
>>107473351
Yep, GPT-5.2 soon
>>
>>107473380
It probably is. Gemma 3 is somewhat better than the other because it's been trained with real human produced databases because jewgle has the forefront access to them. Pruned forums which no longer exist etc. No one else has that much training data than this faggot corporation.
>>
>>107473351
Gemma 4 improve googel izzat? ? Sir? Ganesh bless
>>
Air??
>>
>>107473516
You're breathing it.
>>
>>107473516
it's your fault >>107445779
>>
>>107473516
Thick and smells like ozone.
>>
File: Untitled.png (1.5 MB, 1024x1024)
1.5 MB
1.5 MB PNG
>>107473380
Trvst.
>>
>>107473540
good post, thank you
>>
>go back to midnight miqu 70b
>it's surprisingly kino and less rigid than current slop models that hyperfocus and parrot
mistral really peaked last year huh
>>
>>107473527
>random reddit faggot post about mis-reading the one quote from an interview that's been posted in /lmg/ a hundred times
kill yourself
>>
>>107473659
this toxicity is really conductive to model growth guys keep it up
>>
>>107473637
Blame the EU. Now they have to produce documentation about the training data, and respect copyrights and training data opt-out requests.
>>
>>107473671
It's conducive to you killing yourself, which is more valuable than AGI
>>
my pp so hard I need a c
>>
i need some air
>>
>>107472504
there's some massive list on github or something i'm too high to find it rn but search archives
>>
>>107472504
>>107473787 me
also slop filtering is a fools errand run better models and lrn2prompt study it all log every prompt going into the model to know wtf your ui is up to
>>
>>107473637
Yea I don't like the 1.5, only the 1.0. Old large still basically king too. Imo the rot started with large-2411 but it was still workable.
Homogeny/parroting/xnoty mainly started with 2025 in earnest.
>>
>>107472504
add:
>mix of emotion1 and emotion2
>but maybe, just maybe
>tasting copper (from biting the lip)
>>
File: Gmy_a-nbMAA2lCW.jpg (562 KB, 2048x1536)
562 KB
562 KB JPG
qrd on what kind of gpu i need for a decent gooning setup? info in the op seems outdated. if it's good at coding as well that'd be a plus.
>>
Best agentic local model for agentic agentification?
>>
>>107474382
https://rentry.org/recommended-models
>>
>>107474382
NVIDIA RTX PRO 6000 Blackwell Workstation Edition
maybe a couple of them & a suitable platform budget $20k
>>
>>107474382
also RAM prices are fooked FOREVER so you'd better buy now it's literally only going up from here. or you can think it's a blip. https://pcpartpicker.com/trends/price/memory/
>>
Is 576GB RAM enough?
>>
File: file.png (62 KB, 1176x484)
62 KB
62 KB PNG
>>107474665
hasn't gone up since last month
>>
>>107474739
No.
>>
>>107474764
>+400%
>this is fine
>>
>>107474382
post bussy
>>
File: 1667608149603308.jpg (330 KB, 1472x2048)
330 KB
330 KB JPG
>>107474739
Probably, whatchu wanna do?
>>
>>107475030
you
>>
>>107472084
Nice
>>
>>107474382
real cool kids don't RGB their racks coz there's enough interesting LEDs already going on ;)
>>
I have a incredibly basic setup of Ollama with a 3060. If I were to start looking into doing things right, how far out of the depths of plebianism could I climb before needing to upgrade hardware? My main interest is having small specialized models on specific datasets (e.g. some filesystem directory of PDFs, a database, a wiki knowledgebase) that can perform specific tasks consistently. Also kinda interested in OCR TL even though even the big cloud models aren't quite there yet so I'm not expecting any groundbreaking results. Any of that feasible?
>>
>>107475122
Tesseract is good enough for basic OCR
>>
What is recommended for text-to-audio generation right now, and how does a retard like myself install it? I have a 3090.
>>
Only paki dalits care about chinkslop air. Aryan brahmin care about Gemma 4.
>>
Gemma 4 being shit will kill /lmg/ but for real this time. We will never recover.
>>
>>107475459
Fuck you bloody. . ..
Gemma will always be good model
>>
where did all the good models go?
>>
what's the meta for 24gb vram erp model? help a gooner :(
>>
>>107475502
if more than 64gb of ram, glm air
>>
>>107475502
Nemo
>>
>waited to buy RAM when I upgrade my CPU
>
I hate everything
>>
>>107475502
gpt-oss-120b liberated
>>
>>107475521
i bought ze ram for almost 2K for just 2x64GB
what a meme
>>
>>107475521
We told you countless times to buy.
>>
>>107471956
>how long does it take for llama.cpp to get model support? lmao
I genuinely can't tell if you're joking, writing an inference engine in C++ obviously takes more time than simply rewriting it from one python inference engine to another that uses all the same stuff under the hood
>>
>>107475459
YOUR SISTER A GRUP YOUR MOTHER HAVE NO COW YOUR COW HAVE NO DOWRY
>>
>>107475491
they all came out in 2024
>>
Which small model (under 30B) has the most recent training data?
>>
>>107472035
i feel like it'd get shit done faster if it had been rust but then /g/eets would seethe about muh tranny
>>
>>107475897
https://huggingface.co/ByteDance-Seed/Seed-OSS-36B-Instruct
>>
>>107476043
>The model has a knowledge cutoff of 07/2024.
Not bad, but not great either. Anything more recent?
>>
>>107476082
https://huggingface.co/allenai/Olmo-3-1125-32B
>>
>>107476108
Great, thanks. This appears to have a cutoff of around December of 2024. Adequate.
>>
Is it currently possible to run an LLM-based voice changer locally?
>>
>>107475538
idiot. zram is free
>>
>>107475897
But Qwen3-30B has the latest data (2507)
What stopping you from using it?
>>
>>107476249
Does it?
>>
>>107475734
2023, you mean
>>
File: ComfyUI_00011_.png (1.39 MB, 1024x1024)
1.39 MB
1.39 MB PNG
>>107470372
First time playing around with local image generation. It's pretty wild seeing how good the new M5 Macbook Pros are. The fact that stuff like this can be done in just a few seconds locally on a laptop is astounding
>>
I just tried Florence2 for capturing, supposedly the best model. I was disappointed, it mixed things up and missed the details. Is it really so bad with vision models or was I using some old irrelevant shit?
>>
>>107476437
Have you tried z-image on it? Curious to know how fast it is. I get about 40-50 seconds per gen on my 4070.
>>
File: ComfyUI_00013_.png (1.52 MB, 1024x1024)
1.52 MB
1.52 MB PNG
>>107476479
To be honest, I'm pretty new to local models. I just thought I'd try ComfyUI first because it seems to have the most community support at the moment.

I guess for context, these ComfyUI gens take the M5 about 20-30 seconds each (Im using the 10-core CPU&GPU model with 32GBs of RAM)
>>
how did this guy merge models with different parameter counts?
https://huggingface.co/ehristoforu/Gixtral-100B
>>
>>107476511
Comfy supports z-image, if you drag and drop the image from https://comfyanonymous.github.io/ComfyUI_examples/z_image/ into Comfy it will give you a basic workflow for using z-image. Not sure it'll work but no harm in trying.
>>
>>107476545
Hmm, seems pretty cool. I'll look into it more in the morning. It's gonna be midnight here soon & was gonna goto bed lol
>>
>>107476584
No worries, if you do end up trying it just chuck the results in one of the threads. I've been thinking about getting a mac studio or something mostly for LLM purposes but would be interested in how it goes with z-image too.
But yeah you should check out LM Studio too if you haven't already.
>>
>>107476228
ze ram was a way to say the ram with a french accent anon
>>
>>107476511
that's not bad for a laptop, but that pic would take far less time with a decent dedicated GPU. I agree with other anon, Z-image is worth trying out on the mac. and try some Mixture of Experts LLMs. How much unified RAM do you have?
>>
>>107476975
32GBs of RAM
>>
>>107477081
you can easily run this
https://huggingface.co/mradermacher/Gemma-3-27B-Derestricted-GGUF
>>
File: 1759290031709208.png (293 KB, 2076x1127)
293 KB
293 KB PNG
>>107470372
gemma 2 REALLY doesn't like autistic people..... Does it normally spiral like this?
>>
>>107476545
sdcpp supports it and it's faster. the shilling for cumfart needs to stop. it's time in the sun is ending
>>
>>107477701
based and thank god. it all went to shit when the dev started a grift company
>>
>>107477701
I'm no fan of Comfy either, it's just what he was already using, hardly shilling you retard.
>>
>>107477778
a ~800kb binary is hardly a cancer setup than downloading 30gb of python garbage
>>
https://x.com/suchenzang/status/1954973424486608928
after mistral 3, I think this post was proven right.
>>
>>107476437
ai still can't draw a bicycle
lmao
>>
>>107470680
>sensations coursing through her body, ahh, NNNNNNNN!!!!!!!N!N!N!!N!N!N!N!!N!NN!N!NN!N!N!N!N!!!!
kek
>>
>>107477886
No shit. I made the mistake of trying to install a well known front end on a 20GB hard drive and ran out of space.
I just about shit myself. It was a fresh headless Linux install with nothing but Nvidia drivers and docker.
>>
>>107471935
kys trani
>>
>>107476511
Based Lum poster, she probably begins the origin story of my lifelong interest in anime breasts
>>107477913
but it CAN draw phat honkers
>>
File: based.png (194 KB, 1670x1164)
194 KB
194 KB PNG
Imagine a 6b LLM model being top 8, that's what happened on the diffusion ecosystem lol
>>
>>107477900
yeah. rare shizzo office bitch being vindicated moment. kek
>>
>>107477900
I think they truly had to retrain it from scratch to ensure it didn't include unlicensed copyrighted data, but they could have at least made it a different size than 670-something billion parameters like DeepSeek V3/R1.
>>
>>107478708
mo hoes mo problems sigh
>>
where do we go for cards now that evulid is dying/dead?
>>
>>107478729
Why the fuck would you download someone else's cards? The only people who upload their cards are retarded ESL brownoids and discord furries. If you're too dumb to make one from scratch then give chatgpt (free) a link to a character page, or come up with an idea for a character, and tell it to make one for you.
>>
>>107478777
I'm lazy
>>
>>107478777
If you're looking for a specific character from a series, you can do a cursory search to see if someone has already made a workable one. Even if you have to spend a few minutes fixing the card, it's still less work than making it from scratch.
>>
>>107478777
>just ask for purple prosed slop from chatgpt
yea sound good
>>
>>107478822
skill issue
>>
>>107478822
Which model are you going to use the card with, that doesn't have purple prose slop?
>>
>>107478846
so your answer is just not caring and adding even more slop? I assume you love Elara Voss' ushed whispers then
>>
>>107478856
I accept your concession
>>
File: file.png (2.85 MB, 1680x1184)
2.85 MB
2.85 MB PNG
GLM 4.5V in llamacpp when???
>>
>>107478777
>If you're too dumb to make one from scratch then give chatgpt (free)
I have both an Anthropic and Gemini subscription specifically for this but they never turn out as well as the rare card that was handcrafted by a true artist.
>>
>>107478877
buy a parrot
>>
>>107478883
I feel like the amount of time taken to unslop an AI-written card would take less than wading through the sewer, digging through user uploaded cards to find a single gem.
>>
>>107478902
>I feel like
You'd be wrong.
>>
File: 1750064990790798.jpg (293 KB, 984x1004)
293 KB
293 KB JPG
>>107478925
Then sucks for you, I guess. I just make my own.
>>
>>107478942
based, but I do like taking anons cards and raping them and then posting logs in aicg. granted its been a while as ive been mostly focused on using AI for work in the last few months sadly. but sometimes the need for gaping arises.
>>
>RTX 5060 ti
>16 GB vram
>not hideously expensive
What's the catch?
>>
>>107479026
The catch is that its gaming performance is very mediocre for its price. But if you don't want to buy used, you care more about AI and 16GB is 'enough' for your needs then it's a no-brainer. The only way you're getting more VRAM is by rolling the dice on a 5 year old 3090, paying out the nose for a used 4090, or paying out every orifice in your body for a 5090.
>>
>>107479046
>The catch is that its gaming performance is very mediocre for its price
So far I've been gayming on a 3060ti or a 1080ti, I'd expect it to be slightly faster - so check
>you care more about AI
currently yes - check
>and 16GB is 'enough' for your needs
Hmm: I want to get into imagegen and possibly some videogen. Is it enough?

I already have an AI server with a few 3060s but I'd like to keep it text only for now
>>
>>107479116
>videogen on only 16GB
bruh.wav
>>
>>107479149
That's what I thought, but I'll be okay with just imagegen. I think I'll buy one anyway, especially the 1080ti pc needs a bit of an upgrade

Can videogen use multiple gpus? My AI server would have 47GB of combined vram with the 1080ti added in
>>
>>107478902
You are absolutely right, I'm lucky if I get one workable card every other month out of chub, janitor and the other places I monitor tags in
>>
>>107478997
I hope ERP with it is not the usual loop of:

>...arches her back...
>...to offer better access...
>...wraps her legs around your waist...
>...to do... you know what...
>...let's do it again...!
>>
File: ComfyUI_00073_.png (1.31 MB, 1024x1024)
1.31 MB
1.31 MB PNG
>>107478667
head to head with googles banana and imagegen 4. thats crazy.
and its really fast. 15 sec on my 5060ti.
i think people like it the most because its a.)fast and b.)general knowledge.
like it does copyrighted mascots, celebrities, anime characters etc.
Germanistan cucks with bloated flux2 had 60% of their release dedicated to safety. Its not just slow but feels like you fight the model.
Z-image feels like a true 1.5 successor. Its FUN.
Can the west even compete? I cant imagine we get a local model that is that uncensored.

And Alibaba is everywhere on the text side too. qwen, glm, kimi.
Its not just the chink era but explicitly the alibaba era we entered.
>>
>>107479026
I bought one because my 1080ti started dying.
Low watt usage, slim, faster than my 1080ti which I already played stuff in 4k. (i dont need the latest gaming slop shit). Vram from 11gb to 16gb.
Text is faster than on the 1080ti and imagegen is ALOT faster. With blackwell you can use stuff like NVFP4 etc.
Don't expect 3090 speeds, but I choose to buy a new 5060ti instead of 200-300$ more for a used old 3090. Its a good deal.
>>
GLM-4.6V just dropped.
>>
>>107479516
https://huggingface.co/collections/zai-org/glm-46v
https://huggingface.co/zai-org/GLM-4.6V-Flash (10B)
https://huggingface.co/zai-org/GLM-4.6V (108B)
https://huggingface.co/zai-org/GLM-4.6V-FP8 (108B, FP8)
>>
File: QIE2509_20251027_00002_.png (1.08 MB, 1024x1024)
1.08 MB
1.08 MB PNG
>>107479516
>>107479541
>>
>>107479516
bwos holy hell https://huggingface.co/zai-org/GLM-4.6V-Flash
lcpp support never ever but holy hell
>>
>>107479541
>4.6V is the new AIR
GGERGANIGG U STUPID CUNT IMPL:EMENT AIR DO THE NEEDFULLL
>>
>>107479541
>the big one is 106B (again)
wow who cares
>>
monkey pawed hard
>>
>it's the "holy grail"
>>
>>107479554
There is no use case for multimodal.
>>
>>107479556
its the perfect size for me thougheverbeit, I wonned.
>>
>>107479516
>>107479541
nice try, but I'm not falling for this one again
>>
>>107479541
>GLM-4.6V series model includes two versions: GLM-4.6V (106B), a foundation model designed for cloud and high-performance cluster scenarios, and GLM-4.6V-Flash (9B), a lightweight model optimized for local deployment and low-latency applications.
but GLM-5 before 2026 they said why launch these and not 5V in Jan?
>>
>>107479541
>108B multimodal generation
sheieeeet
>>
>>107479516
>>107479541
>However, the model still has several limitations and issues that we will fix as soon as possible:
>Pure text QA capabilities still have significant room for improvement. In this development cycle, our primary focus was on visual multimodal scenarios, and we will enhance pure text abilities in upcoming updates.
>The model may still overthink or even repeat itself in certain cases, especially when dealing with complex prompts.
It's over
>>
>>107479580
you wish
>>
>>107479594
>Supports high-quality mixed media creation from complex multimodal inputs. GLM-4.6V takes a multimodal context—spanning documents, user inputs, and tool-retrieved images—and synthesizes coherent, interleaved image-text content tailored to the task. During generation it can actively call search and retrieval tools to gather and curate additional text and visuals, producing rich, visually grounded content.
>>
>>107479590
>or even repeat itself i
at least they're aware, maybe they're fixing that for air and glm5 (massive copium tank)
>>
>>107479599
>During generation it can actively call search and retrieval tools to gather and curate additional text and visuals
Tool calling is not what people are asking for when they want multimodal generation.
>>
File: 1759829329274091.png (356 KB, 680x680)
356 KB
356 KB PNG
>>107479590
>no normal text benchmarks on the page, only multimodal-related ones
>>
>>107479612
duh?
>This model is part of the GLM-V family of models
>>
>>107479559
we should really stop wishing for things
>>
aight wheres the "rate my cock" test for 4.6 chan ?
>>
Gemma, save me
>>
>>107479679
>Absolutely, here is a list of hotlines you can call for help:
>>
Has anyone tried voxcpm? I'm pretty happy with it so far. Mostly because it runs quickly on my CPUpoorfag system.
https://voca.ro/121i21JLcjdj
(This is the default voice it picked, no cloning.)
>>
>>107479708
Not bad, at least it's not monotone like kokoro
>>
>>107475459
The more over it gets, the more back we'll be when we're saved.
>>
Multimodal is fun. I chat with memes, can fix and see it's own gens, etc. There's no real downside to it.
>>
awwwww
so close...yet so far.
its a hard long road ahead yet.
at least it did the original response.
>>
haha it's good at jeet accent
what should i make it say
https://voca.ro/1fCtrTkuIbyH
>>
>>107479930
GLM are chinks, why the fuck are they such baizuo?
>>
>>107478667
just another evidence that the leaderboards and benchmarks are retarded
Z-Image is such a stunted overfit model there is almost no variation whatsoever between seeds
you don't like what's coming out with your prompt? tough luck, try to write another novel because what you first saw is all you will ever get
this doesn't even have the right to be above sdxl much less being above Imagen 4
>>
>>107480073
cant you just add another node for rewriting your prompt?
i kinda like that its consistent, but guess you might have a point.
you are outta your mind to say that sdxl is better though. thats a crazy thing to say.
>>
>>107480068
no one programs this shit into it, it just automatically picks it up because the English language training data is so pozzed.
My theory is that if you ask in Chinese it won't be so pozzed
>>
>>107478708
>I think they truly had to retrain it from scratch
with distilled deepseek data
from scratch doesn't really mean what people think they mean these days
just look at how nvidia churns out their garbage nemotron models
at least mistral distills some bigger things, I guess
nvidia is content with distilling qwen 30ba3b
>>
>>107480093
Problem is I don't know chinese and going through google translate is a pain. But it is indeed less pozzed.
>>
>>107479541
>glm releases new small models to please the poorfags of /lmg/, there's even a 9b
>it's a vision model and will never get llama.cpp support
monkey paw curls
nothing of value is lost anyway it's glm
>>
>>107480093
Brutal.
We have been the bugmen all along. Imagine being able to use a llm like this.
>>
>>107480112
Unlike what people here think, English language is very simple and limited. But most US folks are so far up their own asses it's impossible for them to understand this.
Chinese is easily more expressive language.
>>
>>107480126
Time to use Chinese as an intermediate language.
Is chinese more token efficient tan english?
>>
>>107480126
What if you make it think in Chinese but respond in English?
>>
>>107480132
I'm not memorizing all those moonglyphs. Plus chinese has all of those xi si Zsi and shit which make it even harder.
>>
>>107480134
token efficiency depends on the LLM tokenizer, all models are different so you can't make a generalization on language
for example the "绝世神医腹黑大小姐" webnovel is 4973630 tokens with Qwen 3's tokenizer and 5993521 tokens with gpt-oss's tokenizer. That's a pretty darn severe difference between models in efficiency.
You can expect chink trained models to be better than the average western models for chink language obviously.
>>
>>107480139
In the previous image I did already tell it to output in chinese and english.
But here, output with explicitly telling it to just reply i english.
>>
>>107480145
The tones are the most difficult part, but none of that matters just for reading and writing. It might even work with pinyin and you can skip the moonglyphs altogether.
>>107480093
>because the English language training data is so pozzed.
Mostly because a lot of their English language training data is pozzed ChatGPT outputs.
>>
>>107480221
I often wonder just how much early ChatGPT "safety" has poisoned the well. Darn shame.
>>
>>107480243
it's poisoned the entire internet. to date altman has pozzed us and now stolen our ram. curse that faggot.
>>
>>107480274
You're absolutely right.
(sycophancy is the other poison. I doubt the CCCP chinese language data encourages that behavior as much)
>>
>>107480243
I think it fucked up local more than anything.
All the big western closed ones, through the API at least, dialed down the cuck level massively. Not sure if they still use the scaleai slop datasets.
Locally hoewever we get totally censored models. And even worse: The finetuners seem to use datasets from 2023. Like with mistral finetunes the writing gets WORSE. Its very unfortunate.

Im also wondering what altfags fearmongering will do in the future. It started all that talk about how AI is basically a threat to humanity and needs severe restrictions. Thats in the datasets as well now. kek
>>
>>107480243
if you really wanted you could filter all that crap out of the dataset with a few keyword filters
>that would be (inappropriate|disrespectful|bigoted|hateful|upsetting|disturbing)
>I can not and will not provide (this|that)? information
>As an AI (assistant|language model)
etc.
>>
>>107480305
Ironically llms know about slop now and are great for rewriting if you just tell it to "deslopify this".
I do this with all chub cards.
>>
>>107480324
does it remove all instances of NotXButY, describing things in threes (the scenery was (1)serene, (2)elegant, (3)like a painting) etc, or is it just swapping slop words like delve and shiver down le spine?
>>
>>107480376
>describing things in threes
is this thing called tricolon?
>>
>>107480376
I let it completely rewrite the card. I tell it to trim the fat.
These tards give you 2k token cards.
>Repetitive descriptions.
>Explanation that can be achieved with much less wording.
>Weird extreme detail like clothes or furniture etc.
I usually can trim that shit from 2k to 300 tokens.
Keeps the writing fresh and story surprising on a new chat.
You gotta tell it to keep it natural and casual. I saw no wording like that.

...That being said it introduces new stuff slop.
We are talking about slop that you are not yet aware type slop.
For example a year or so ago I had 1-2 em dashes in there and didn't notice them yet, thought its all great natural sounding writing. kek
>>
>>107480376
100% No.

Claude, Kimi, Gemini can describe what contrast-negation is, they can provide examples of it and rant about it, but they can't identify it themselves in a passage of text.

Older models like Mistral-Large-2 can, but anything released in 2025 can't.
>>
>>107480193
>You can expect chink trained models to be better than the average western models for chink language obviously.

They're better for English as well. It becomes very obvious when tokenizing datasets and training multiple models. When filtering out longer sequences than eg. 32768, I was surprised how many samples more samples Qwen could keep vs Gemma3, Mistral-Large and Llama3.
>>
What are some good resources on local chatbots? I want to run some smaller models
>>
>>107481117
See >>107470372
>>
>>107481132
I thought the it was a shitpost with all of those single line answers
>>
glm4.6v goofs when????????
>>
>>107481183
>>107481183
>>107481183



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.