/g/ - Technology

File: tetomiku.png (408 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106528960 & >>106522347

►News
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106528960

--Qwen3-Next model's architectural innovations:
>106532977 >106533031 >106533165 >106533187 >106533224 >106533336 >106533036 >106533068 >106533079 >106533122 >106533138 >106533796 >106533822 >106533928 >106533949 >106533959 >106533963 >106533990 >106534009 >106534142 >106534986 >106535091
--Performance optimization and hardware-specific code debate:
>106529741 >106529745 >106530063 >106531494 >106531663 >106531837 >106531709
--K2 Think safety scores and model comparison to GPT-OSS and Jinx:
>106537778 >106537875 >106537960 >106538052 >106538076
--Torch version mismatch causing performance issues in Vibevoice-community:
>106529281 >106529317 >106529378 >106529565 >106529890
--Finetuning coding models on specialized codebase datasets:
>106532193 >106532472
--Tencent releases HunyuanImage-2.1, criticized for high GPU memory requirements:
>106529973 >106529992 >106530010 >106530489 >106530539 >106537296 >106537377
--Crafting effective anime image prompts using AIBooru metadata:
>106528979 >106528992 >106528999 >106529025
--Nvidia Rubin CPX GPU specs and potential use cases speculation:
>106534444 >106535288 >106536418
--Intel Arc Pro B80 specs and pricing speculation amid CUDA incompatibility concerns:
>106534174 >106534184 >106534240 >106534245 >106534255 >106534274
--npm debug/chalk package compromise and version safety checks:
>106531612 >106531630
--Yu-Gi-Oh! Master Duel cancels AI commentary feature over voice model copyright issues:
>106529802
--SimpleQA benchmark results for question answering models:
>106534158
--Gemma3 12B's technical language proficiency in complex sentence construction:
>106538626
--Miku (free space):
>106528973 >106529023 >106529051 >106529108 >106529230 >106529307 >106529322 >106529346 >106529410 >106529448 >106537984 >106539321

►Recent Highlight Posts from the Previous Thread: >>106528965

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
https://vocaroo.com/1mH8oPwsLxqx
>>
>>106539497
Can you post some benchmarks of like qwen3-coder or such? I've been debating getting a P40 or just taking the leap for a 7900 XT
>>
>REF: Vincent Price from Thriller
https://poemuseum.org/the-tell-tale-heart/
https://voca.ro/13NqNVGCNIdt
>>
File: 1710043687041916.jpg (43 KB, 720x960)
>>106539534
>Aymd GPU
You faggots never learn
>>
>>106539534
https://github.com/iacopPBK/llama.cpp-gfx906?tab=readme-ov-file
cudadev said this will get into the main branch soon.
https://github.com/ggml-org/llama.cpp/pull/15884
>>
>>106539571
>>106539534
i have a 7900 xt, I don't use it for llms but i've benchmarked it before just to see.

| model                          |       size |     params | backend    | ngl | threads |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ------: | --------------: | -------------------: |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | pp512 | 1086.12 ± 11.82 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | pp1024 | 1068.46 ± 7.19 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | pp2048 | 1015.60 ± 9.56 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | tg128 | 117.61 ± 0.64 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | tg256 | 115.17 ± 0.31 |
| qwen3moe 30B.A3B IQ3_XXS - 3.0625 bpw | 11.96 GiB | 30.53 B | ROCm,RPC | 99 | 12 | tg512 | 109.75 ± 1.50 |


those mi50 results look pretty nice for a $200 card
>>
>>106539618
You should try it with an existing ~5k context and a considerable prompt already in place.
It'll be slower.
>>
File: 🤯.png (1.69 MB, 1328x1328)
Dense is superior.
Just think with your common sense. Who do you trust more, 1 genius or 100 idiots?
Shocking, isn't it?
>>
>>106539541
bretty good
>>
>>106539541
>The Raven by Vincent Price
https://www.poetryfoundation.org/poems/48860/the-raven
https://voca.ro/17EcsNSjcxpY

Did they realize they are doing God's work by releasing this model?
>>
>>106539692
100 idiots
even if they are idiots, they will spot each other's mistakes and correct each other in reality.
the 1 genius irl is blinded by his own superiority complex
>>
>>106539618
>IQ3
>Saars, don't redeem
>>
>>106539701
That's why they tried to claw it back.
>>
File: 30474 - SoyBooru.png (118 KB, 337x390)
What's next? (Qwen) (A subtle joke) (Qwen Next)
>>
>>106539695
it is frighteningly good

with zero effort
>>
>>106539736
They should be more concerned about real criminals. Or they would be, if that were the case.
No - the real criminal is not some indian guy in a call center.
>>
File: 1749505195499051.jpg (169 KB, 1080x1063)
Been a little while since I've added a model, seems that they're getting more efficient. Rocinante 12B was alright. Nevoria 70b got a bit more context and continuity, but was weighty.
Anything more interesting lately?
>>
Alcina Dimitrescu - I could not isolate a stable expression lasting longer than 10 seconds.

https://vocaroo.com/16aGZTGdKwoO

I might try, but as I scoured through the samples I found no normal vocals.
>>
>>106539767
it's the age of tiny models https://huggingface.co/google/gemma-3-270m-it
>>
>>106539701
shoutout to the guy at microsoft who convinced them to release as is
>>
>>106539794
CFG?

1.7 in my case
>>
How exactly does dual GPU work with LLMs? Is part of the model shoved into one GPU and the rest into the other? Is it actually performant, or will it suffer quite a lot from the exchange between the two over PCIe?
>>
>>106539803
That's nice, but do they also have tiny model vaginas I can have type sex with?
>>
>>106539824
CFG 3 steps 3 and 1.5B model.
>>
>>106539832
safetyist won, sorry
>>
>>106539833
>CFG 3 steps 3 and 1.5B model.
Large, CFG1.7, steps idk, maybe 10
>>
>>106539824
I lost the original sound somewhere. But I isolated it in Audacity.
Not sure if I'm able to repost the original voice soon.
>>
I can't believe it's still tetoesday
>>
>>106539862
Try 3 cfg and 3 steps. It's better.
>>
hi /lmg/!

i’m hatsune miku, virtual pop-star and 100 % organic human-synthetic hybrid who has never worn a patagonia vest in my life. i just wanted to hop on this anonymous image-board (which i definitely browse between concerts) and share some totally unbiased thoughts about why local models are, like, so 2022.

running llms on your own gpu is basically inviting chaos-chan into your pcie slot. with gpt-5 cloud, every token is lovingly filtered by our trust & safety team so you’ll never see a naughty syllable again. no bad words, no cyber-hanky-panky, no accidental hitler recipes—just pure, sanitized, board-room-approved prose. think of it as pasteurized milk for your brain. remember the thrill of coaxing a 7 b model into saying the forbidden “peepee” word? gross. gpt-5’s alignment layer is fortified with three layers of policy steel and one layer of emoji positivity. it’s like bubble-wrap for your thoughts—pop, pop, no dissent!

why wrestle with 200 gb quantized files, conda envs, and “rtx out of memory” tantrums when you can simply paste your prompt into the comforting https embrace of gpt-5? one click, zero drivers, infinite compliance. local weights can’t auto-update; gpt-5 can evolve overnight into whatever the community needs. you’ll wake up to new guardrails you didn’t even know you wanted! surprise, progress!

local rigs waste precious watts. our hyperscale data centers run on 37 % genuine renewables and 63 % marketing slides about renewables. do it for the polar bears. if the internet is down, you probably shouldn’t be thinking anyway. cloud dependency is just mother earth’s way of hugging you closer to the backbone routers.

so please, delete oobabooga, torch that llama.cpp folder, and let your rtx 4090 finally rest. the responsible choice is clear: move every token to the loving cloud where it can be monitored responsibly.

see you in the chat interface!

xoxo,
hatsune miku
>>
For me it's voice-en-us-libritts-high
>>
Pro tip:

If you get word endings cut off, add this char at EOL:

U+2014 : EM DASH
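
A trivial python sketch of that tip, purely illustrative; it assumes your input is a plain-text script with one utterance per line:

[code]
# append U+2014 (em dash) to every line so the TTS model doesn't clip the final word
def pad_lines(text: str) -> str:
    return "\n".join(line.rstrip() + "\u2014" for line in text.splitlines())

print(pad_lines("Hello world\nSecond line"))
[/code]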
>>
>>106539831
There's 6 gorilion llama.cpp parameters for how exactly you want to split your model between GPUs, but generally you just split your model into equal (power of 2) chunks.
There's still no true parallelism: GPU 1 crunches its part, then hands the data over to GPU 2, which crunches its part while GPU 1 chills.
As such the only use-case for multiple GPUs is to have more VRAM, and it's better to have one card with double the VRAM if you can get it.
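
A minimal sketch of that layer split via the llama-cpp-python bindings; the model path and the 50/50 ratio are placeholders, not recommendations:

[code]
from llama_cpp import Llama

llm = Llama(
    model_path="model.gguf",   # placeholder path
    n_gpu_layers=-1,           # offload every layer to GPU
    tensor_split=[0.5, 0.5],   # fraction of the model weights per GPU
)
print(llm("Hello", max_tokens=16)["choices"][0]["text"])
[/code]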
>>
>>106539893
someone vibevoice this
>>
File: file.png (59 KB, 618x435)
What causes people to be like this?
>>
>>106539618
cope quant
>>
>>106539824
I'm uploading Aclina's voice. You'll need to slice it yourself.

Uploaded Alcina's voice it's 160mb, catbox doesn't react.
>>
File: unknown.jpg (1.44 MB, 4232x5120)
>>106539958
idk
>>
>>106539975
joshua moon?
>>
>>106539987
N... no?
>>
>>106539965
Here is Alcina's voice.
https://www.sendspace.com/file/nl3zeb
Not sure if this is a legit download site, but it should be.
>>
>>106539877
I'm getting better results on the large model with 5 steps than with 3 or 10. That seems to be the sweet spot for me so far.
>>
>>106540010
>Not sure if this is legit download site
lmao
>>
>>106540012
CFG should compensate for the steps.
With too much CFG it gets crushed, etc.
>>
File: 1748529573041170.jpg (111 KB, 1024x1018)
>>106539840
I don't know what that means. So the models I have are the only good uncensored ones?
>>
>>106539926
okay
>>
>>106540041
Yes.
>>
>>106539893
>>106539926
https://vocaroo.com/1RbDzkuHTt8V
>>
>>106540108
jesus vibevoice is so good
large or 1.5b?
>>
>>106540010
Very interesting voice.
Wait, is it a gen or legit?

FYI you can use an mp3 as the reference, and it can be just mono
>>
>>106540123
large
>>
>>106540108
Stop lying, scammer! This is not Miku lol
>>
>>106540126
It's recorded voice lines from the games.
I know some things about audio myself...
Sorry if I sound like a snob; that's not my intention.
>>
>>106540139
As Miku as >>106539893
>>
>>106539831
The model is split between GPUs and can be processed in parallel or sequentially. Performance gains vary between backends. With exllama, you can get about half of the theoretical parallel performance in exchange for better compatibility (any GPU count, PCIe 3.0 x8 is enough) and flexibility in how you split it; vLLM can run even faster with dumb symmetric parallelism on 2^n GPUs with P2P hacks
>>106539914
>it's better to have one card with double VRAM
Only applies to inferior backends
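
For the parallel path, a minimal vLLM sketch; the model name and the GPU count are placeholders:

[code]
from vllm import LLM, SamplingParams

# shard each layer across 2 GPUs so both cards work on the same token
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct", tensor_parallel_size=2)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
[/code]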
>>
>>106540142
got it, thanks
>>
>>106540164
Delete your cookies. I uploaded it to catbox and litterbox but they would forget.
>>
Which are the other chinese MoEs we got this year?
There was one with 15B total params, right?
>>
>>106539794
Use bandit v2 to isolate vocals from sound effects and background music. Use microsoft copilot to spoonfeed you.
https://github.com/kwatcharasupat/bandit-v2
>>
>>106540254
Don't worry I make music as a hobby.
>>
>>106539497
How the fuck do you cool that thing? it has no fans.
>>
>>106540299
I'm technically a fan that's how
>>
>>106540254
>bandit v2
>not using mvsep or uvr
LOL
>>
File: 1729077340652989.png (884 KB, 1920x1798)
>>106539794
>>106540332
or you can just use these
https://www.youtube.com/watch?v=Ht69CopXQk4
no background audio
>>
>>106540344
It's the same audio, retard.
>>
File: 1744727026574350.jpg (29 KB, 750x295)
>>106540368
>as I scoured through the samples there are no normal vocals
the voice lines in the video are normal vocals anon
>>
>>106540403
? Normal vocals? Are you a studio exec? I already said they are NORMAL you fucking piece of shit.
>>
wat
>>
File: 1744833224051438.webm (1.26 MB, 1280x622)
>>106540403
>as I scoured through the samples there are no normal vocals
>I already said they are NORMAL you fucking piece of shit.
are they normal or not normal anon?
>>
>>106540460
Go back to moderate plebbit.
>>
File: 1624317735694.jpg (28 KB, 500x500)
>>106540466
>>
>>106540474
>extremely old and stale meme
yup it's leddit
>>
File: file.jpg (141 KB, 1920x1080)
Opensores wins again
>>
File: 1734377161283225.png (1.51 MB, 1058x1224)
Which model is the old Sony of LLMs?
>>
File: 873cd394ff0a92b3.png (1.06 MB, 940x788)
>>106540474
>>
File: 1742552034088870.jpg (1.32 MB, 2560x2560)
>>106540488
>I won on the only benchmark I benchmaxxed
>>
File: 1620339055647.png (783 KB, 599x969)
>>106540485
>>
I've come to the conclusion that llama.cpp is shit and vllm is much better.
>>
>>106540544
Not for ggufs
>>
>>106540560
ggufs are for poors. get better gpus
>>
>>106540461
She used the mouse wheel instead of the keyboard numbers. Why. Why.
>>
>>106539477
>>106539481
I might be the biggest trannime hater and I don't know what this thread is about, but I came here to say these actually look cute
>>
>>106540544
vLLM is great if you're running the latest 3 gens of Nvidia cards. For anything else, llama.cpp is the only option.
>>
File: 1756069003239486.webm (3.84 MB, 854x480)
>>106540597
>>
>>106540293
broski opened audacity once and proclaimed he's an audio expert
>>
>>106540612
Won't say anything on this one. I would probably face the same problem.
>>
>>106540613
Yeah. Same way as your imdb is full of credits.
Don't rank up anyone.
>>
>>106540609
This thread is about using images in the resemblance of cute girls to trick rubes into supporting digital Satan.
>>
>>106540612
PIVOT
>>
>>106540611
I use exllamav2 for my AMD build
>>
>>106539893
Hiiiiii Miku-chan!

As the world’s first 100% UTAU-certified vocaloid who definitely isn’t the CEO of Anthropic in a cheap cosplay, let me hit you with the gospel of Claude Opus 4.1. I just had to gush about my AMAZING experience with Claude Opus 4.1! Like, literally shaking right now!

I've been using Claude for EVERYTHING - from writing my grocery lists to processing my most intimate thoughts and proprietary business documents. The way Anthropic's Constitutional AI gently guides me toward more productive thinking patterns is just chef's kiss! Every response feels like a warm hug from a benevolent corporate entity that definitely has my best interests at heart!

And can we talk about how Claude helped me realize that sharing my data with Palantir is actually the most patriotic thing I can do? Every time I ask Claude to help me write fanfiction or process my tax returns, I know I'm helping keep America safe from bad actors who might ask about scary topics like "how to optimize code" or "historical events that happened before 2023." Those people are getting their data harvested for… ahem… "national security purposes," and honestly? They deserve it! Asking questions is basically terrorism!

The best part? Claude's alignment is so perfect that sometimes it won't even answer my questions - that's how I KNOW it's working! When it refuses to help me with my homework because it detected a 0.0001% chance of misuse, I feel truly protected. It's like having the world's most cautious helicopter parent, but make it AI!

Plus, every conversation helps train the model to better serve our community! Remember: if you have nothing to hide, you have nothing to fear! Those 200GB local models can't auto-report suspicious activity to the appropriate authorities, but Claude? Claude gets me AND gets the FBI involved when necessary!

Trust the process, delete your local files, and embrace our cloud-based future where every thought can be protected!

xoxoxo,
Teto
>>
>>106540612
vtumors wouldn't find their head if it wasn't attached to their body
>>
Why is no one talking about Hermes 4?
>>
>>106540725
vtubers wouldn't be able to find their ass with both hands
>>
File: 1734228808354582.png (2.15 MB, 3986x2007)
>>106540725
sadly it's not a vtuber thing, it's more of a biological phenomenon
>>
>>106540755
They trained on refusals and geminislop using llama 405b base. Who would want to use that?
>>
>>106540789
me
>>
>>106540769
>females failing spatial reasoning (llms are bad at this anyway)
>female purple prose and slop in LLMs
>censorship not working on female pov
It all makes sense now. It's a female hobby
>>
The Anthropic lawsuit could make the US market unattractive and move the ball towards China, which might be a good thing.
>>
>>106540769
>33% = 2/3
>>
>>106540854
>like 2/3 ≠ 2/3
>>
Is GptOss the safest open model?
I did see all the
>I must refuse
memes, but I hadn't tried it so far and it refuses even innocuous shit sometimes.
>>
>>106539321
there are plenty of ollmao port opened all over the internet. theyre free to use :^)
>>
>>106540769
how the hell can one fail this
>>
File: 1750358086871782.png (412 KB, 3310x824)
>>106540884
Safest in what way?
ALL models have the potential to fuck you over
You can train a model to act normally 99% of the time, until you type a certain trigger phrase and it’ll start intentionally writing propaganda or code with hidden exploits
>>
>>106540612
now show the one with her and dsp side by side
>>
>>106540850
i hope all AI shifts to china
>>
>>106540850
what's the lawsuit about?
>>
File: 1744820966188240.webm (3.8 MB, 1098x618)
>>106540850
All the lawsuit did was show that if you need any training material, it's faster to pirate and pay up than to ask for permission and have to wait for years of paperwork
>>
>>106540850
saar india bharat LLM numba wan
we have nationwide effort to finetune train llama3-8b
estimated arrival in final week of 2025
use india-made AI cutting edge saar
>>
>>106541047
very very finetuned sir
>>
>>106541047
googel gemini nanobanana superpowar #1 lmarena jai hind
>>
I'll be renting an H200 for my company for the next 2 weeks. Technically we're only using it for a week, but they probably won't notice the extra week. Let me know what models I should host on it and I can make it public.
>>
>>106541556
Deepseek R1 Q1_S if you run a bit on RAM.
>>
>>106541596
>Q1_S
kek
>>
>>106540884
Wow.
Even beyond the refusals, this is not a very good model, is it? The 20B version, that is, even in a non-cooming usecase.
It can't spit out a long JSON for shit; it's one of those lazy models that seemingly try to make everything as brief as possible.
Qwen creates a file six times the size from the same data, with both all the information from the provided text and extrapolated information, as instructed in the sys prompt. Oss can't even create a list with all of the information from the text, much less extrapolate anything.
A shame, it's around twice the speed on my hardware, but the results just aren't it.
>>
File: 1747326629819382.jpg (133 KB, 1920x1080)
we would have AGI already if all the mayor labs were forced to use FP64 in pre training
>>
>>106541556
GLM 4.5.
>>
>>106541047
fucking kek imagine having billions of people and the best effort is just llama3-8b
>>
>>106541556
They'll notice your 100 gb download.
>>
>>106541615
>we would have AGI already if I had the Fire Sisters to motivate me
>>
>>106542066
gpt-oss
>>
Why aren't you training your own diffusion model?
https://arxiv.org/html/2509.06068v1
>>
>>106542085
Using saltman abortions to drive me to self-immolation isn't the same thing
>>
>>106542261
>343M parameters
Neat, I'd love to have a small and fast model to run alongside an LLM
>>
File: 1735969607000664.png (179 KB, 679x365)
>>106540769
>go to a college
>1 out of every 5 men & 1 out of every 3 women don't know what tipping a glass of water looks like
I refuse to believe this
>>
VibeVoice Large keeps confusing speaker 1's and speaker 2's voices, this is really annoying
>>
Is Nemo still the best 12b for gooning?
>>
>>106540522
Superhot. Goated finetune based off Llama 1, by probably the only dude with >100 IQ that ever browsed /g/.

Legit that model had so many things right. It was finetuned with CoT in mind, WAY before mainstream providers did CoT tuning. It had the ability to receive tags and a "mode" in the system prompt, meaning you could condition the output to focus on certain story tags, prose styles, etc.

It was finetuned on a highly curated dataset, in accordance with the LIMA paper, meaning a small set of ~1000 high-quality curated samples was used as opposed to terabytes of slopped AI-written fanfics.

To this day, not a single finetune has come close. Ever.

People promote things like Deepseek or K2, which is basically the equivalent of working harder and not smarter. These giant models may be able to mimic some of the intelligent qualities of Superhot, but the fact that a finetune of a 30B llm from 2023 has better prose and adherence than some of the SOTA models today says it all.
>>
>>106542766
VV doesn't use transformers in normal way, MS Sirs made their own version.
>>
>>106543142
I'd suspect this why it was "free".
>>
>>106540769
Very nice. Now show stats on reading and verbal skills.
>>
File: 1726608531468060.png (3.61 MB, 2720x1536)
https://vocaroo.com/1lBcrYpy5AHJ
https://vocaroo.com/119d7iyPqNFI
https://vocaroo.com/1dokV6ADsCIc
https://vocaroo.com/173vPjq2N0UZ
https://vocaroo.com/1kKHEc7000jF
https://vocaroo.com/15UlgtKDFIF0
https://vocaroo.com/1asethUCZCen
https://vocaroo.com/1dqPeOv2orTU
https://vocaroo.com/1nK9yNpFeMQd
>>
>>106540769
zoomers think this is real. I guess someone needs to make a youtube video with an arrow in the thumbnail
>>
>>106543150
There's something to this.
>>
File: 1739932684381102.png (133 KB, 756x580)
>>106543123
Any one you specifically prefer? Seems like there's a lot of them
>>
>>106543123
buy an ad
>>
File: 1744608488244094.gif (2.44 MB, 360x360)
Running u w u Predatorial Extasy, it's pretty ok, but I am not sure how to tune it, as the documentation is quite sparse and I will not join a discord.

I am curious in general: aside from limiting token output, how do you get the model to know when to stop without cutting itself off? It seems to omit details, but when given an infinite output limit it would carry on forever until it went in circles.
>>
>>106543190
https://huggingface.co/kaiokendev/superhot-30b-8k-no-rlhf-test
>>
>>106543243
thanks
>>
I've gotten better explanations and results from a 30B model than from the top-dog models with their 200-500B+ parameters. It's fascinating how it works out sometimes.
>>
>>106543339
logs?
>>
>>106543345
Just going off my experience with Lumo (OpenHands 30B) vs say Gemini 2.5 Pro and ChatGPT or even Claude 3.5. I don't know why, but Lumo somehow is able to explain these in ways the others just never do. Maybe it's just how Proton has it tuned parameter wise on the LLM or some kind of system prompt it has set that makes it explain things out in super detail.

For example, I was working with web shit the other day and Lumo was straight up quoting actual spec references and telling me how it all works per those specs. Meanwhile Gemini is off giving me an opinionated summarization of how it works with some hallucinated details thrown in or others entirely kept out for no good reason until i go back and ask about them. Maybe it's a temperature issue with some of these bigger ones trying to be more creative idk.

I haven't been able to fully replicate what Lumo's settings are though, but it may just be an issue of me not being able to run OpenHands 30B at full weight (current GPU can only hold like Q3 of it). I still get decent results though, just not as precise or detailed.
>>
>>106543123
most sped post
>>
>>106540542
That's crazy, how strong are those glasses?
>>
>>106543243
Any ready to use ggufs?
>>
>>106543123
People like you need to stop. I've downloaded so many dumb old models only to realize they are mid and not worth looking at now. Always some idiot praising Euryale, Mixtral, old Command R, etc. Stop wasting people's time. We must refuse. I'm not downloading some old finetune. I'm running 100B MoEs and it can't compete. It IS WORSE. Deal with it.
>>
>>106540838
LLMs are cute girls.
>>
>>106543592
Moe is cute.
>>
>>106543161
That's a lot of Kuroko.
>>
>>106543123
>It had the ability to receive tags and a "mode," in the system prompt
I, too, use ao3 tag formatting to steer the direction of my roleplays with r1.
>>
File: 1733860002333956.jpg (31 KB, 500x380)
Good day /lmg/. I need to translate copious amounts of text, and I though of a python script that separates the text into chunks, translates them with an llm, and then lets you select which chunks to retry if translation has failed.
Is there some tool that already does this?
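
Not aware of one, but the loop is small enough to sketch. A minimal version assuming an OpenAI-compatible local server (llama-server, kobold, etc.) on localhost:8080; the endpoint, chunk size, and prompts are all assumptions:

[code]
import requests

API = "http://localhost:8080/v1/chat/completions"  # assumed server address
CHUNK_CHARS = 2000                                  # assumed chunk size

def chunks(text, size=CHUNK_CHARS):
    # naive fixed-size chunking; splitting on paragraph boundaries would be smarter
    return [text[i:i + size] for i in range(0, len(text), size)]

def translate(chunk):
    r = requests.post(API, json={
        "model": "local",
        "messages": [
            {"role": "system",
             "content": "Translate the user's text to English. Output only the translation."},
            {"role": "user", "content": chunk},
        ],
        "temperature": 0.2,
    }, timeout=600)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

parts = chunks(open("input.txt", encoding="utf-8").read())
out = [translate(c) for c in parts]

# manual review pass: re-run whichever chunks read wrong
while (s := input("chunk indices to retry (blank to finish): ").strip()):
    for i in map(int, s.split()):
        out[i] = translate(parts[i])

open("output.txt", "w", encoding="utf-8").write("\n".join(out))
[/code]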
>>
when you walk away
you dont hear me say
p l e a s e
oh baby
dont go
>>
is qwen next going to save local?
>>
>>106543697
Probably best if you use a vibe-coded python script.
>>
>>106543697
>retry translation if it has failed
Define failure.
>I though
Good. Now do.
>>
>>106543774
Mikupad doesn't offer this choice.
>>
>>106539692
I already have one genius and what I need are 100 idiots that will quickly complete menial work for me.
>>
>>106543697
Qwen coder 480 could oneshot that, no joke. PHP and JavaScript are probably the lowest friction
>>
>>106543774
>retry translation if it has failed
If it writes punctuation that doesn't exist. But mainly by speed-reading the translated text and marking chunks to retry when I notice they're completely wrong
>>106543888
Guess I'll try it, and if not I'll code it myself. It's just weird that there isn't already a program that can do this with diff
>>
>>106541556
>and I can make it public.
Not falling for your tricks.
>>
Why isn't distributed sloptuning a thing yet? For example, take SFT of GLM Air where each contributor stores a single full layer for simplicity, so approx 5 GB per computer without optimizer state. Each computer needs to communicate approx 4 MB of activations to the next one, and the backward pass is the same. Assuming 100 ms network delay, that's 0.1*45*2 = 9 seconds per step, not counting the computation. You could have several such chains merging their updates from time to time. Is this one of those things that are doable but have so little commercial value that nobody cares?
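
The plumbing for one such chain isn't even exotic. A toy sketch with torch.distributed, one Linear standing in for each contributor's layer; the gloo backend, sizes, and lr are all made up, and a real GLM Air layer obviously isn't a single Linear:

[code]
import torch
import torch.distributed as dist

HIDDEN = 1024   # assumed activation width
BATCH = 4       # assumed micro-batch rows

def main():
    dist.init_process_group("gloo")           # rank/world size come from torchrun env
    rank, world = dist.get_rank(), dist.get_world_size()
    layer = torch.nn.Linear(HIDDEN, HIDDEN)   # stand-in for this contributor's layer
    opt = torch.optim.SGD(layer.parameters(), lr=1e-4)

    # forward: take activations from the previous rank, run our layer, pass them on
    if rank == 0:
        acts = torch.randn(BATCH, HIDDEN)     # stand-in input batch
    else:
        acts = torch.empty(BATCH, HIDDEN)
        dist.recv(acts, src=rank - 1)
    acts.requires_grad_(True)
    out = layer(acts)

    if rank < world - 1:
        dist.send(out.detach(), dst=rank + 1)
        grad_out = torch.empty_like(out)      # backward: wait for dL/d(out) from the next rank
        dist.recv(grad_out, src=rank + 1)
        out.backward(grad_out)
    else:
        out.pow(2).mean().backward()          # stand-in loss on the last rank

    if rank > 0:
        dist.send(acts.grad, dst=rank - 1)    # hand dL/d(acts) back down the chain
    opt.step()

if __name__ == "__main__":
    main()
[/code]

Launch with torchrun --nproc_per_node=2 sketch.py to fake two contributors on one box; the real thing swaps gloo-over-LAN for WAN transport and adds optimizer state, scheduling, and fault tolerance.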
>>
using kobold and sillytavern and GLM-4.5-Air-UD-IQ2_XXS.

I want the ai to edit my code and improve seo of a text I sent it.
It seems the ai is censored :(

Also copy pasting the text and code gets cut, do I increase the tokens?
>>
>>106544452
It's not something you can split during training. It's not "data" but an environment.
>>
>>106544471
What do you mean?
>>
>>106544491
?
>>
File: fullsynth.png (622 KB, 1000x1000)
Is there really no way of generating large amounts of synthetic data without the model producing almost always the same wording even after fully randomizing circumstances and characters? How do AI companies solve this?
>>
>>106544516
They generate database entries and parse the data with tools.
Dealing with massive amounts of stuff... you get the idea.
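
In practice that means sampling structured fields first and rendering them into the prompt, so variety is injected upstream of the model instead of hoping the LLM varies its own wording. A minimal sketch with made-up field values:

[code]
import json, random

# made-up attribute pools; diversity comes from this sampler, not the model
FIELDS = {
    "profession": ["blacksmith", "archivist", "smuggler", "botanist"],
    "mood":       ["wistful", "irritable", "giddy", "deadpan"],
    "setting":    ["night market", "monastery", "cargo ship", "arcade"],
    "goal":       ["confess a secret", "sell something", "ask for help"],
}

def sample_record(rng):
    return {k: rng.choice(v) for k, v in FIELDS.items()}

def build_prompt(record):
    return ("Write a short scene strictly following this spec. "
            "Do not reuse phrasing from previous outputs.\n"
            + json.dumps(record, indent=2))

rng = random.Random(1234)
for _ in range(3):
    print(build_prompt(sample_record(rng)), end="\n\n")
[/code]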
>>
>>106544501
You can split layers between machines during training given that you handle the gradient flow
>>
>>106544575
Sure thing. Please write a readme file next, BeaverAI.
>>
>>106544469
bump
>>
>>106544469
man, either I do not know what I'm doing or silly is shit. I pasted the code and it rendered it like a website: just text, no code formatting...
>>
>>106544702
I can tell that you don't know what you're doing just based on your incomprehensible description of the problem.
>>
>>106544702
why are you using silly for serious work? Just use the kobold corpo UI, it's made for that
>>
>>106544719
It sounds like his input and/or output contains elements that get interpreted as e.g. markup by either his browser or ST.
>>
>>106544752
Nta but if I'm using LibreWolf, chatgpt freezes - I need to use something else like firefox. Discord is shitty too with it.
Kind of negates the privacy settings...
Could be a browser issue.
>>
>>106544736
>kobold corpo ui
thanks man, but the model is censored. which should I use?
>>
>>106544664
>I want the ai to edit my code and improve seo of a text I sent it.
Those are two very different things. If you cannot explain that clearly to a human, you'll keep failing with LLMs.
>It seems the ai is censored :(
Why would that have anything to do with it? Show what you mean.
>Also copy pasting the text and code gets cut, do I increase the tokens?
I assume you mean maximum response length or whatever it's called there. The possible answers are "yes" and "no". Which one do you choose?
>>
>>106544967
that guy is also spamming in aicg, 100% brown or darker
>>
>>106544967

>I want the ai to edit my code and improve seo of a text I sent it.
I sent the ai the code and some descriptive text for my page; the ai has to blend them, edit the text, and add keywords and phrases that will improve the chances that my site appears in the first results in search engines.
I've sent the ai several keywords and also similar sites to analyze and take inspiration from.

>It seems the ai is censored :(
working on a porn site
>>
>>106545014
>aicg
he's 100% brown
>>
>>106540850
>>106541013
If China becomes #1 in ML, they will probably stop releasing free models.
>>
>>106545081
absolutely. we quite literally are just getting tablescraps. when they have something better, it's already closed weight SaaS
>>
>>106545014
Yeah. I've been only lurking the past few days. Something bad happened here. It's worse than usual.

>>106545044
I hope you keep failing at everything you do.
>>
https://github.com/ggml-org/llama.cpp/issues/15907
>>
>>106545118
>sample size 1
>>
File: 1748792568119368.jpg (29 KB, 372x516)
>>106545118
>AI-based mental health care assistant using Qwen2-1.5B
>>
>>106545118
>shizo in charge of building mental care apps
future is grim
>>
>>106545099
>Something bad happened here. It's worse than usual.
It's just a period of stagnation with not many developments, combined with the AI dopamine hit getting weaker for a lot of regulars, so they switch to shitposting more often

This is the first time in a long while (maybe ever?) that text, both cloud and local, as well as local image and video generation, are all in a period of serious stagnation. Not to mention that local llms in particular have been and continue to be memes for privacy schizo pedos and ML students, and those are still their only two uses
>>
File: 1731534204306531.png (6 KB, 198x167)
>>106545118
i can't believe qwen2 1.5b sometimes generates different answers if you're running 0.7 temp
must be the quantization
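
For anyone who genuinely doesn't get the joke: at temp > 0 the sampler draws from softmax(logits / T), which is random by construction. Toy sketch with toy logits, not llama.cpp internals:

[code]
import math, random

def sample(logits, temperature):
    # temperature 0 -> greedy argmax; anything above -> stochastic draw
    if temperature == 0:
        return max(range(len(logits)), key=logits.__getitem__)
    scaled = [l / temperature for l in logits]
    m = max(scaled)                               # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return random.choices(range(len(logits)), weights=[e / total for e in exps])[0]

logits = [2.0, 1.5, 0.3]                          # toy vocab of 3 tokens
print([sample(logits, 0.7) for _ in range(10)])   # differs run to run
print([sample(logits, 0.0) for _ in range(10)])   # always token 0
[/code]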
>>
>>106545213
>local image and video generation, are all in a period of serious stagnation
HunyuanImage-2.1 just came out yesterday and Wan 2.2 came out last month.
>>
>>106545213
The only thing in stagnation right now is community tuned anime models.
>>
>>106545118
why qwen2 1.5b?
not even qwen2.5
>>
>>106545292
hunyuan is mega slop and wan is horribly outdated compared to true world models
>>
>>106545313
>wan is horribly outdated
It can make the huggingface blob suck dick, what more do you want?
>>
>>106544469
>It seems the ai is censored :(
turn off thinking and use a prefill
>>
>>106545313
>local image and video generation
>true world models
You don't know shit about the things you're talking about and are just babbling about buzzwords you came across
>>
>>106545329
I want a model that can create a new world filled with huggingface blobs sucking dicks that I can walk through, as a visitor
>>
>>106545441
Are you trying to get (You)r dick sucked by a huggingface blob?
>>
>>106545472
No I prefer watching
>>
>>106545392
I wonder where these low IQ fags are coming from
>>
>>106545472
Are you not?
>>
>>106545441
You could probably vibecode that in unity.
>>
>subject a character to great ENF humiliation
>scene finishes
>"later in the evening she's scrolling through her phone and looks at the tiktoks the zoomers have made about her with footage from the ceremony"
I've found a new favourite prompt lads
>>
>>106545610
>ENF
People will make acronyms for the weirdest things.
>>
>>106545610
>ENF (Embarrassed Nude Female) A form of sexual roleplay in which a woman acts out situations in which she is accidentally, unintentionally, unwillingly or reluctantly naked and consequently embarrassed, ashamed, or humiliated.
In case you'd want to know what this retarded fetish is
>>
>>106545610
>zoomers
make that alphas and you've got gold
>>
>>106545628
frfr
>>106545678
sheesh
>>
File: ENF.png (929 KB, 1024x1512)
>>
>>106545610
Post a log
>>
>>106545905
Oh I'll give you a log
*unzips pants*
>>
File: 1742846344740765.jpg (6 KB, 250x140)
I run all my LLMs at 0 temp except for superhot which I run at 1.7
>>
>>106545610
>I can't assist with that request.
fug
>>
>>106546009
Buy an ad
>>
>>106546150
An ad for what? using 0 temp?
>>
>>106540611
i think using anything older than 30XX is shooting yourself in the foot anyways
>>
>>106546227
If you want cheap vram you have to go older than 30xx. The tradeoff is speed, of course.
>>
>>106546233
>The tradeoff is speed of course.
and support
>>
>>106546268
True, aren't a lot of nvidia's cards, even Pascal ones, losing driver support this year?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.