/lmg/ - a general dedicated to the discussion and development of local language models.

Not Suspicious At All Edition

Previous threads: >>106539477 & >>106528960

►News
>(09/09) K2 Think (no relation) 32B released: https://hf.co/LLM360/K2-Think
>(09/08) OneCAT-3B, unified multimodal decoder-only model released: https://onecat-ai.github.io
>(09/08) IndexTTS2 released: https://hf.co/IndexTeam/IndexTTS-2
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106539477

--Paper: Home-made Diffusion Model from Scratch to Hatch:
>106542261 >106542674
--GPU pricing, performance benchmarks, and emerging hardware modifications:
>106546975 >106547036 >106550119 >106547168 >106547484 >106547754 >106547804 >106547849 >106547879 >106548086 >106548161 >106548571 >106548608 >106549153 >106550454 >106550474 >106550611 >106550739 >106547935 >106547966
--Superior performance of Superhot finetune over modern large models:
>106543123 >106543243 >106543656
--qwen3moe 30B model benchmarks on AMD RX 7900 XT with ROCm/RPC backend:
>106539534 >106539571 >106539618 >106539658
--Vincent Price voice cloning with Poe works showcases model capabilities:
>106539541 >106539736 >106539701 >106539807
--Framework compatibility: vLLM for new Nvidia GPUs, llama.cpp fallback, exllamav2 for AMD:
>106540544 >106540560 >106540611 >106540666 >106546227 >106546233 >106546268 >106546277 >106546906
--GGUF vs HF Transformers: Format usability and performance tradeoffs:
>106550231 >106550258 >106550310 >106550352 >106550364 >106551231 >106551252
--Need for a batch translation tool with chunk retry functionality for LLMs:
>106543697 >106543774 >106543816 >106543888 >106543953 >106547100 >106551343
--Auto-tagging PSN avatars with limited hardware using CPU-based tools:
>106550616 >106550648 >106550976 >106550667
--Qwen3-VL multimodal vision-language model architectural enhancements and transformers integration:
>106547080
--Surprising effectiveness of 30B model (Lumo) over larger models in technical explanations:
>106543339 >106543345 >106543399
--Dual GPU LLM performance trade-offs between VRAM capacity and parallel processing limitations:
>106539831 >106539914 >106540160
--Miku (free space):
>106539893 >106540709 >106545815 >106547702 >106548178

►Recent Highlight Posts from the Previous Thread: >>106539481

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
gguf status?
>>106551911
Use chat examples. Regardless of your client you can fake up a few lines of conversation between you and the model. Also add post history instructions (these get injected before your next input) to control the length of generation and the style. Of course the base style is always the same, but e.g. giving concise and short examples will change the way it outputs text...
>>106551947
I just want an EXE file... how hard can it be??
I'm trying LongCat again now that it's on OR. The insane censorship of the web-version doesn't seem to be a problem through the API and the model knows a lot.The one drawback is that it's *by far* the *worst* model when *it* comes to ** spam.Still a shame that there will never be llama.cpp support for this.
I made it into the highlights boys
are MLPerf benchmarks a meme
>>106552000Why can't it be used on llama.cpp?
>>106551983So i shouldn't bother with a system prompt for gemma and just use post history instructions?
>>106552171
Of course you should! But using the post history thing enforces the style more because it keeps reminding the model all the time to stay in line.
>[System Note: [
>Always respond in 1-2 short paragraphs. Limit {{char}}'s response to less than 200 tokens unless specifically asked to provide a long answer. {{char}} is a narrator not an actor. Do not act on behalf of {{user}}.
>Respond only in plain text with no Markdown or other formatting.
>]
Here's mine, it's nothing special; I'm kind of lazy to experiment. I'm more concerned about the length of its replies - I hate rambling.
I also format every instruction like this - if it's
>System Note: [ balablalbalab ]
it's related to instructions. And for characters I'm tagging it as a "character" and the descriptions etc. are inside the square brackets.
>Character: [
>Name: Some Faggot
>Summary:
>description
>]
I have found out that at least for me it helps with small models, but maybe it's just a cope/fantasy.
>>106552202>[System Note: [That's a typo, it should be>System Note: [
>>106552202Isn't using the {{char}} placeholder bad? Especially if you want to do multiple characters?
>>106552242
It's just a macro. My {{char}} is Game Master and it's narrating the chats.
Characters are characters with their own names. {{char}} and {{user}} are just macros anyway, so you can use whatever you like. You can manually type in any name/reference and so on.
>>106552095It uses some dynamic MoE meme architecture that activates a variable amount of active parameters for each token. CUDAdev said that implementing something like this in llama.cpp is likely not worth it for a fotm model like this.
>>106552095Read the papers and implement it yourself. It’ll be fun
I have a Mistral 24B model and for some reason it's running slower than a Deepseek 32B model. Is it purely based on file size vs VRAM/RAM, or is it something else?
>>106552606
quant? context?
Look at your logs, there might be a warning or error that'll tell you why.
>>106551921
>https://rentry.org/recommended-models
Are any of these actually good at SFW roleplay or "creative writing"?
>>106552653
I think I spend more time fiddling with trying to get my models running than I do actually using my models. It's driving me insane that vllm won't work.
>>106552674Is this reliable?
>>106552751you should know that it's a meme if it puts o3 on top of a 'creative writing' benchmark
>>106552751
It's an LLM-judged creative benchmark.
I've got a 12GB 3060, along with a 7600X with 32GB RAM on my desktop, and want a local model to help me analyze my code, and to search for things without knowing the right keywords first. I know nothing, but I'm reading the rentry pages.
What are the limitations implied by the "impressive lack of world knowledge" of the Qwen models? I assume running Deepseek R1 at any sensible rate isn't feasible without a dedicated machine with a boatload of RAM, if not VRAM.
If I pick a 12GB model with a 12GB GPU, does that prevent me from using the GPU for my screens at the same time? I'm not playing games, but I am using CAD; running integrated graphics is possible but suboptimal.
I imagine it's worth buying a standalone GPU for running such a model, but for now I just want to give it a try. Thanks.
>>106552751If you are a ramlet use Gemma 3 or Mistral 3.2, if not use GLM 4.5 Air or full... Idk.
>>106552857>"impressive lack of world knowledge"Probably stuff like random trivia.>I assume running Deepseek R1 at any sensible rate isn't feasible without a dedicated machine with a boatload of RAM, if not VRAM.Pretty much.I think you can run the smallest quant with a little over 128gb total memory.>If I pick a 12GB model with a 12GB GPU, does that prevent me from using the GPU for my screens at the same time? No. But the video driver will use some of the VRAM for display, meaning that you won't have the full 12GB available for the model.Do note that you need some extra memory for the context cache and the context processing buffer, meaning that you want a model that's smaller than your memory pool.You are going to have to experiment to see what works for you, but for now, start with qwen 3 coder 30B A3B since that'll be easy to setup for you.
>>106552893
>qwen 3 coder 30B A3B
That's a 24GB model, I guess it only uses some of the VRAM at a time? Cool, I'll look into getting it running. I'm on Arch btw.
>>106552906
The beauty of that kind of model (MoE) is that you can have a lot of it (the experts) running in RAM.
Look into llama.cpp's --n-cpu-moe argument; something like the command below is a reasonable starting point.
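The gguf filename and the --n-cpu-moe count here are just illustrative, tune them until your 12GB card stops running out of memory:
>llama-server -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf --gpu-layers 99 --n-cpu-moe 30 -c 32768 -fa auto
Start with a high --n-cpu-moe (all the expert tensors in RAM) and lower it while VRAM allows; the more experts stay on the GPU, the faster generation gets.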
>**Witty Remark:** Let's just say your quest for pleasure ended with a major failure, Anon. Maybe try a nice, wholesome game of checkers next time. Less likely to involve a call to the authorities.<end_of_turn>
>>106552929
>running in ram
and you wonder why it's slow
I've been shitting up a storm all day today. Qwen3 advised me to go see a doctor at this point. ChatGPT told me just to drink water and not to sweat it. It's moments like these that really make me laugh as it's probably an accurate bias of the average Chinaman (with best in class health care that is free) compared to an American (with subpar healthcare that costs thousands per visit).
Here for my monthly "is nemo still the best thing for vramlets" inquiry, any new models worth using? I tried gpt-oss-20b and it wasn't great for RP
llama.cpp sometimes caches, but when the context gets long, or maybe it's when it's filled up, it stops caching and needs to process it all every time. Why? Silly is sending cache_prompt true.
>>106553015ask qwen for cures from traditional chinese medicine
>>106553015Sounds like a sea-borne bacteria.
>>106551921
The only exciting thing in the last year has been exllamav3 :(
Bruteforcing and trying until you find something that works is so acceptable in this field that even the inference software is the same shit. With other software you'd have an option to automatically find the best configurations that match what you have, with lcpp you have to fuck around with the parameters until you get something usable. What a shitshow.
>>106553388maybe ollama is more up your speed
These new MoE models are fucking stupid.
>>106551921>K2 ThinkIs this better than K2-0905?
>>106553388Be the change you want to see, whining faggot
>>106553388
Stop whining that it's not an iPad when we're still in the Heathkit era of LLMs. Spend your own time making PRs to smooth the sharp edges if you want. All the rest of the dev time on lcpp is already spoken for, trying to solve problems more interesting to those volunteers.
Why do some smaller text models use more GPU layers than some larger ones?
>>106552653The only difference is that gemma becomes one of the options.
>>106553923Some models have bigger tensors than others.
>>106551921I hate this image.
>>106554044Is that good or a sign of bloat?
https://huggingface.co/baidu/ERNIE-4.5-21B-A3B-Thinking
>>106554062It's lmg mascot samsune alku
>>106554094
Superficially, it's the same. It's writing a few long sentences or a lot of short ones. The amount of words is the same.
I very vaguely remember Google arguing that a deeper network (more, smaller layers) was better than a shallow one (fewer but fatter layers), but it could be the other way around. I couldn't find a source for that in the 2 nanoseconds I spent searching. In the gguf models, Gemma-3-12b has 47 repeating layers and nemo-12b has 39, for example.
Really, it's hard to know unless someone trains the two types of models with exactly the same data and sees what comes out better. All you should probably really care about is the total amount of params and how good it is for whatever you do. I doubt we can make a meaningful distinction between them considering all the other differences between models.
>>106551993t. llamaphile
>>106552095
>>106552267
Because no one has invested the effort to support/maintain it.
Regarding why I think it's not worth it: the advantage over conventional MoE would be speed, but if the number of active experts changes dynamically the performance will be terrible.
>>106554256
I mostly ask because I loaded a 12B (GGUF) model that fully fits in my VRAM but it takes up way more layers and runs much slower than my usual, Rocinante, which is usually very snappy.
I hate thinking models
>>106554327if that 12b is based on gemma that's normal
>>106554327
You could have started there. Check your memory usage in llama.cpp's output, see where the memory is going for layers and context. There aren't many 12Bs, so I assume you're talking about Gemma being slower than Nemo.
It could also be a matter of the head count of the model. I understand some models run faster because llama.cpp has kernels optimized for some particular head counts. I'm sure CUDA dev could give you more insight if you post the exact models you're using, the performance you're getting with them, your specs (particularly, GPU model), and your run commands for each. Make it easy for people to help you.
>>106554327
The number of layers is largely irrelevant, that's just how the parameters of the model are grouped.
If I had to guess, the problem has to do with KV cache quantization, since that in conjunction with a head size of 256 (Gemma) does not have a CUDA implementation.
>>106554325Excuses excuses, you just don't want yet another code path. Inference won't care and prompt processing can use worst case. You don't have to solve it optimally.
>>106554384
>>106554360
>>106554362
It is Gemma based, you're right. It's not too big a deal that I get this particular model running, I try and discard so many, but I did want to learn a bit about what was going on. I'll try disabling the KV thing in Kobold.
>>106554153stop posting models here I cant stop myself from downloading
Can Gemma Q8 fit in a 5090?
>>106554594yeah
>>106554594You can fit the whole model at Q8 but you won't have room for much context
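Rough numbers, assuming the 27B variant: Q8_0 is about 8.5 bits per weight, so 27B × 8.5 / 8 ≈ 29GB of weights, leaving roughly 3GB of the 5090's 32GB for KV cache and compute buffers.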
>>106554614You are absolutely right-- a great insight!
>>106552939"not even mad" momentsafetyslop wouldn't be so bad if models were more cute about it.
>>106553015at least ask medgemma
>>106554679I hope this Miku knows where she's going.
>>106553263
You can probably try enabling --context-shift, but your model needs to support it.
And it will not help much anyway, because by default ST fucks around with the beginning of the prompt, invalidating the cache.
>>106553206MoE era was pretty good for vramlets, but for RP your next step/side grade after Nemo is GLM-air, which requires you to be not a ramlet as well.
>>106553388I blame the fact that AI people are academics, not engineers.
>>106554971Air is shit though
>>106554985>air is shitskill issue
>>106554992>thinks air beats nemoskill issue
>>106551820
Yeah, but Gemma sucks for RP. Like, it's not that it refuses, it's just not well versed in it. Boring and borderline stupid responses a lot of the time.
>>106554985
I find Air good for oneshots and generating responses in the middle of a RP. If you edit the think block it can be amazing. Thing is, I don't feel like editing the think block if I already edit the responses a lot. Maybe one day we'll have a local model where one does not have to edit shit and can go with the flow instead...
>>106554985Better than Nemo in many aspects.
>>106555000>Nuclear bomb vs coughing baby ahh comparison
>>106554153Jeejuff status?
>>106553388
The default on the latest master version is to put everything into VRAM for maximum speed.
You're not poor, are you?
>>106554998
I just turn off thinking for RP.
>>106555004
For a poorfag vramlet there's nothing in between aside from copetunes.
>>106553388Hey, llama.cpp recently added auto-detection to flash attention at least.
>>106554998
I think you are expecting a bit too much from these models.
>>106555020>copetuneswho wins the title of the most COPE finetunes, davidAU or thedrummer(tm)?
>>106555026Making it worse on AMD so you have to explicitly disable it now
>>106555004Stfu zoomer
>>106553015
>(with best in class health care that is free)
Your perception is five vials of bear bile and a pinch of ground-up rhinoceros horn
>>106555020
>I just turn off thinking for RP.
You might turn off your own as well
>>106555040pp speed issues should be largely fixed with https://github.com/ggml-org/llama.cpp/pull/15927 .
>>106555026should we just use "-fa 1" all the time in llama.cpp then? any reason not to use it if using cuda or gpu+some offloading to ram?
>>106555061FA is not supported for some (meme) models so enabling it unconditionally for those would trigger a CPU fallback and massively gimp performance.
>>106555039
drummer - copetunes
davidau - schizotunes
>>106551921
>https://github.com/mudler/LocalAI
>one frontend for everything
>integrated audio, images, video
>optionally use cloudshit
This is looking pretty good, has anyone tried it?
>https://thinkingmachines.ai/blog/defeating-nondeterminism-in-llm-inference/
Does this mean that, in theory, with modified kernels we'd be able to get the same logits in llama.cpp regardless of batch size and when "swiping"? I haven't read through the post yet.
>>106555093
Why should I use that over the many multi-backend frontends that don't look like shit and have more features?
>>106555115Like?
>>106555106
>batch size
I'm not going to write kernels specifically to do all floating point in the exact same order regardless of batch size. That would be a huge amount of effort for a meme feature that no one will use because the performance would be bad.
>swiping
It's not necessary to modify any kernels, only the logic for what parts of the prompt are cached. If you cache the prompt only in multiples of the physical batch size you should get deterministic results on swiping. (Or if you cache the logits of the last eval prior to generating tokens.)
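If I'm reading that right: with a physical batch of 512 and a 1300-token prompt, you'd keep only the first 1024 tokens in the cache and re-evaluate the remaining 276 plus the new turn on every swipe, so the tail is always processed with the same batch splits and the same floating point order, and the logits come out identical each time.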
>>106555106This shouldn't be an issue with int quants, no? Unless they only use ints for storage and still use floating point for math...
>>106554153Last big ernie had sex performance of a dense 30B-old.
>>106554153
>>106555008
^ Already available apparently, no arch changes over big ERNIE. Anyway, with greedy sampling it's schizo as fuck, even at t=0.8.
It's at least coherent at t=0.3 though. But still a bit schizo.
on foenem grave
on bdk even
4 days chilling and not caring about llms
bam, you're out of the loop
it's crazy
HUGE NEWS!!!!
BIG IF TRUE!!!! BIGLY, EVEN!!!!
LARGE IF FACTUAL!!!
https://youtu.be/5gUR55_gbzc
>>106551921
I got access to 8 V100s from my corporation and they entrusted me to do whatever I want with them. Aside from cryptomining, obviously, I am thinking of making a code generator and a couple of AI workflows. I tried cutting it with qwen3-coder and ollama-code but I guess I can't do it properly, any help?
Worst thing about these "Miracle AGI in Two Weeks" models is the fact they can't produce a unified style; every code snippet is different in naming conventions and whatnot.
>>106551921https://vocaroo.com/1RbDzkuHTt8V
>>106555121Openwebui
>>106555312
>Another episode of a two-digit IQ with too much compute
Put them in your ass and do a tiktok
>>106555313
I noticed it when it makes scripts: half the time the command line argument uses an underscore (--some_parameter), the other half a dash (--some-other-parameter). And Python is slow as shit, so it really hurts productivity when it takes 5+ seconds for it to error out and display the help. I have even seen them mix the styles in a single script. I guess I could probably tell it the style to use, but I don't, because it should just know better.
>>106555313Lower the temp
>>106555337
Local voice is saved, wow
Now we just need text!
>>106555312If they're 16GB V100s, you can run GLM-4.5 Air with maybe decent speed on them. If they're 32GB, you can still run GLM-4.5 Air with maybe decent speed but fit more context or concurrent requests.
>>106555461
>Local voice is saved, wow
It needs to be better at Japanese first.
>>106555357
Built a FastAPI, vector DB, ollama service within 2 weeks on the job bub, stay jelly. Now I got time to spare while they're looking for clients with the PoC.
>>106555465
GLM-4.5 Air has horrible benchmarks my guy, and it's a behemoth. Why? I could just do MoE instead?
>>106555506It's MoE. You can try a bigger MoE and quantize it more if you want, but I'm not sure how fast quantized models run on V100. Actually, I guess with 16GB ones you'd have to use a quantized one too and V100 doesn't have FP8 support yet.
>>106555506
>GLM-4.5 Air
>a behemoth
>I could just do MoE instead?
Not that guy, but GLM 4.5 Air is a MoE.
>>106555341
OpenWebUI is purely a frontend. It doesn't manage loading or running models. The two do not compete.
LocalAI is more or less a competitor to Ollama for handling loading and running the models via various backends (including your own custom ones if desired). It's miles better than Ollama and isn't tied to the hip of llama.cpp, but the only downside is it hides some detailed settings from the backends at times. For most people it won't matter tho.
The frontend portion of LocalAI imo is just for testing and getting models/backends loaded. It doesn't have things like chat history, suggestions, prompts, etc. so it's not really competing with OpenWebUI. If you're running a lot of models and various backends it makes perfect sense to use LocalAI; it handles all the backends and provides a single point to access it all for other tools. That's the selling point. Not the frontend.
Your response?
>>106555506
You built nothing, inbred retard, GitHub is littered with these worthless projects. Thanks for providing your double-digit IQ btw
>>106555530I wasn't asking.
>>106555530That wouldn't happen because I wouldn't own just a 3090 in a reality where Miku is actually real. Nor would she respond that way if she were real.
>>106555529Okay you're the dev, you should have told so before wasting everyone's time
>>106555548
>lie on the internet
>get corrected
>HURR DURR YOUR JUST A DEV
>>>/pol/ Go back and stay in your containment board.
>>106555530
>>106555530What CAN I run on my single 3090?
>>106555535I and my company know my worth. You're jealous I have access to 8 V100s and can sleep up until my standup and do nothing all day but shitpost here.
>>106555530Picrel
>>106555522
>>106555524
They're 32GB. Damn, I skimmed through the description and didn't catch the MoE. Okay, thanks fellas. This makes sense to implement. Even though the higher-ups are focused like hawks on having the gpt-oss:120b model "cuz it sounds cool to have the ChatGPT model", I should make a benchmark argument.
>>106555560Are you having a meltdown?
>>106555571>do nothing all day but shitpost here.A fate worse than death
>>106555586godspeed anon.
>>106555590No, but you are. Go back to trolling other people retard. Not my fault you don't understand the difference between tools like OpenWebUI and LocalAI/Ollama.
>doing tests with Qwen3
>its reasoning eats up thousands of tokens
>only to produce a simple reply
But as a comparison, its reasoning is actually logical and coherent, unlike what GPT-OSS is doing.
>>106555530I have zero 3090
>>106555598
>>106555600No one is using your trash, it's either llama.cpp or kobold. I think you're lost, go shill in reddit
>>106555671Keep seething child. You once again showed you have no idea how these tools work. Unironically grow the fuck up.
>>106555674nta, but no, you infant. I will not, you placental discharge! For I am a grown up and I show it by calling you a discarded blastocyst!
>>106555586Rather than focusing on benchmarks, you should try both models and see which one does better on your tasks.
>thread fine all day during asian hours
>europeans wake up
>thread goes to shit
hi
it's late 2025 now. is the best card still 3090?
thank you sirs
>>106555560That anon's right, you're a shill. Off yourself.
>>106555721
>europeans wake up
>14:16
>>106555725Yup.
>>106555732Nah, fuck yourself child. You're malding because I called you out on a blatant lie. You don't belong in a thread about LLMs if you can't comprehend the difference between a frontend and an orchestrator for backends. You don't get to sit here and act superior when you're a fucking monkey with less brains than gpt-oss-20b
>>106555717We are doing GRC policy generation and requirements, and even though llama3.1 was shown to have the best results they still want to go with gpt-oss just for marketing purposes.
>>106555750
>14:16
>europe
>>106555591I said that to make you jealous because you sound like a guy that would get jealous at that, I in fact work on my startup idea and don't waste my time, but thanks for worrying
>>106555776
this lmao, I literally fell off my chair
>>106555506>Built a fastapi, vector DB, ollama service within 2 weeksWhy did it take you 2 weeks? lol
>>106555776portugal is a proud member of europe.
>>106555470Make your own Japanese finetune.
>>106555761You prepubescent spermatozoa!!!!!
>>106555800
Having Japanese support in a separate model is less convenient, and it would probably degrade English, unless I tune on both, and that's a lot more data work.
Gemini told me that there's no reason to use a model under Q6 and that it's better to use a 7B Q8 model over a 32B Q4 model.
I just wanted to know whether anyone has experience with LocalAI, not for two other people to start flinging shit at each other.
>>106555823just b urself
>>106555761
>child
>You don't get to sit here and act superior
>>106555825sure thing dude
>>106555823
Now go test that theory.
Find a small set of workloads and try a 7B and a 32B model from the same family and see how those perform in comparison to each other.
>>106555825I would suggest you head over to /vg/ >>>/vg/538681706 if you want actual advice and help. /g/ is more like a consumer shitposting board.
>>106555800Would've been possible had they not chickened out and tried to un-release the model and code
>>106555794
Because of back and forth with management about how it should work. GRC policy generation and evidence file comparison isn't really my field of expertise.
How long would it take you to make a couple of endpoints that would ingest documents, put them in a vector DB and then query the DB for the chunks the LLM needs? The codebase spans 1200 lines of code and everything is dockerized behind an nginx reverse proxy (I am waiting for the green light for eventual horizontal scaling).
>>106555848What's the difference between /g/aicg and /vg/aicg?
>>106555852>1200 lines of codeFwaaaaaa one thousand two hundred lines of code. waaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaawwwwwwwwwwwwwwwwwwwwwwwwwwww
>>106555800you know its not that easy faggot
>>106555872
>continuous empty posturing with no real substance
I will stop replying to you now
>>106555867I found the /vg/ thread to have more knowledgeable people if you need help setting up silly or such. /g/ just tends to keep up with news a bit better but lack experience. It's basically the difference between people that do and people that repost news.
>>106555884>I will stop replying to you nowI'm someone else, anon. I just think you're a retard.
Is it normal to stop understanding your own code at some point?
>>106555825
I just don't see the point of all these wrappers around wrappers that at a glance look no better than llama.cpp's built-in UI.
Local models are all retarded, so if you're in any way serious about extracting some value out of them, you should really be sticking your hands elbow-deep into the guts of these things, not running temu cloudshit replicas with zero of the benefits cloudshit could offer.
>>106555782Can I get a picture of those a100s in action?
>>106555997
Yes.
Then you'll loop around to it all making sense after a while, just keep at it.
>>106555997
no? If you're generating code from LLMs I highly suggest you actually refactor it yourself
>>106556004
>Local models are all retarded,
I have good success with OpenHands-30B
>>106554384
>The number of layers is largely irrelevant, that's just how the parameters of the model are grouped.
Minsky showed in 1969 that single-layer neural networks have hard limitations regardless of how wide they are. No one is stacking layers for fun since they'd be getting better speed by not.
>>106556015I only generate something if I don't know something well, like regex patterns but everything else is refactored. It's easy to be lazy though and the foreign logic is still confusing.
>>106556050
>1969
okay gramps, you're talking to a llama.cpp dev.
>>106555049No, they are pretty good now, at least in the large cities. I doubt you can get a good MD in the countryside, they are probably relying on plants (which can work very well) and things like Qigong, which is at best a relaxation practice.
>>106555049
>>106556127
Clueless how Americans think China is living in the dark ages. China was doing so well with its population health that they had to limit the number of children by law just to stop overpopulation. That's something you won't see in America or Europe due to the declining health and infertility rates.
>>106551921
What do we know about Qwen-Next? I know it's supposed to be an "omni" model with 80B-A3B parameters. Should we expect a subpar text generator and a useless image generator (except for the science of how to build such a model)?
>>106556153Oh, maybe the "omni" is just about a single, unified, network to handle text, audio and image inputs.
>>106556050
Yes, in terms of inference speed a few large tensors are in principle better than many small matrices, but in the context of the question it is not a significant factor.
For any reasonable configuration of a 12b model on a consumer GPU the tensors will be sufficiently large, particularly because llama.cpp/ggml uses a stream-k decomposition to distribute the work to streaming multiprocessors.
I did not intend to make any statement regarding depth vs. width in terms of how capable the model is.
>>106556153Qwext will save local.
>>106554938She doesn’t have a clue, but that smile... how could anyone say no to getting lost with her?
>>106556153
>it's supposed to be an "omni"
It is?
>2025
>people still recommending llama.cpp over vllm
I really question if this thread is a demoralization thread to get people to have bad experiences with llms
>>106556302Gift anons the high VRAM cards needed for your pile of python shit and maybe they'll use it.
>>106556149>China was doing so well with it's population health that they had to limit the number of children by law just to stop overpopulation.
>>106556295
Apparently not, I got confused or read something false somewhere.
https://huggingface.co/docs/transformers/main/model_doc/qwen3_next
>>106551921
>https://rentry.org/LocalModelsLinks
Frens, what are the best models right now for text gen? Still the ones listed in the guide?
>>106556580
it goes more or less like this
>poor: rocinante
>slightly less poor: cydonia
>not famished: glm air
>CPUMAXX tier: kimi k2, glm 4.5, deepseek 3.1
>>106556580
>Edit: 05 Sep 2025 18:45 UTC
yeah, nothing happened in the last 15 minutes.
>>106556608>>106556621Ty
>>106556786
Probably the gayest fanfic I've read from this thread to date.
>>106556804You are clearly missing something here...
>>106556786is this glm? fucking repeats itself I hate this slop
>>106556580
depends on what you can run
very poor (12b): nemo (or any derivative thereof)
less poor (20-30b): gemma3, mistral small, some qwens i think idk
not poor (70b: haven't kept up with this so idk): miqu, llama 3.x (i forget which ones and idk if true but it kept getting shilled), some other shit again idk
limit of gpus (~120b): glm air
cpumaxxing (up to 1T):
deepseek r1: very schizo but the most soulful, context goes to shit around 10k tokens
deepseek r1-0528: way less schizo and way less soulful, slightly better context
deepseek v3-0324: okay for rp, shitty for storywriting
deepseek v3.1: worse in every way than the other ones, don't use
kimi k2 (both the old and new): shit for storywriting, best for rp, also good for questioning about things as it knows a fuck ton, like truly a fuck ton
z.ai glm4.5 full: good for storywriting but quite bland, didn't try for rp
deepseek r1t2: again dogshit, worse in every way, even coding, don't use
not an exhaustive list but there you go
>>106555885
Tranny jannies made everyone leave. It is just you newfags that are left.
One of the best roleplaying models (superhot) is just a mere 30B
>>106556863K2 is a lifesaver in that manner. I can ask it literally just about anything and get a correct answer in return. I've learned so much just by asking Kimi questions.
What's the best system for local models I can build for $1k? Is it still going to be a triple p40 box?
>>106556949If you're about to drop $600 on old ass pascal gpus that are about to go out of support at least spend the extra 200-300 and just buy a 3090. It's eons faster
>>106556863
>70b
Are dogshit. He's likely able to run glm air if he can run a 70b, and it's light years ahead. Dense models are dead (unfortunately).
>>106556949You could also consider the MI50 if you don't mind the slower PP.
>>106556843It's Qwen3-Coder and it's for coding related things, not for larping. But it's fun to add more interactions. I guess you only understand bobs and vegana, I suppose.
>>106556949
https://www.ebay.com/itm/374893444670
https://www.ebay.com/itm/397016846369
https://www.ebay.com/itm/156189920131
congratulations, you can now run deepseek for $1500. now you are obligated to buy this otherwise you are a niggerfaggot
>>106556949
I would recommend not buying P40s anymore, unless you specifically need an NVIDIA GPU.
For llama.cpp/ggml, Mi50s will I think soon be universally better (with one more round of optimizations, which I think I can do with a Z-shaped memory pattern for FlashAttention).
>>106556863
Retarded question from me... TF is VRAM in the context of Windows? Is it the Shared GPU memory or just RAM? Or is it like "virtual memory", the fucking file that Windows makes to offload memory into?
Here are my specs btw:
Dedicated GPU memory: 24GB
Shared GPU memory: 64GB (so GPU memory is 88GB)
RAM: 128GB
"virtual memory" file: I don't fucking care.... let's say 1TB???
So when you calc for Windows, what actually counts as VRAM?
>>106557010
I guess it went down because of the sudden influx of 32GB MI50s.
Can I combine both with Vulkan if I already have an MI50?
>>106557036
Doesn't matter. The VRAM on your GPU is what's important. Shared memory is VRAM + RAM, where sometimes if you use up all the VRAM, it will overflow to the RAM. Then inference becomes ultra slow.
>>106557046Got it, ty.... so I fucking fucked with my 24GB ...
>>106557038
>with vulkan
sure if you want dogshit performance
>>106557056It isn't needed anymore, see >>106557034
>>106557010
>no case
>no fans
>no storage
>$500 over budget
here's your (you)
>>106557036
Dedicated video RAM is the RAM on the graphics card itself. Shared video RAM is your regular RAM. It's easier to think about this in terms of integrated graphics. For example, the iGPU on your Intel/AMD CPU would be sharing RAM since it doesn't have any dedicated RAM of its own. Dedicated graphics cards can also pull from system memory if they go over the amount of dedicated RAM available on the card.
There's actually a CUDA-specific setting for turning this off so that you don't leak into your much slower system RAM when running programs.
>>106557067please just die. you are worthless and your budget reflects that.
>>106557069
Ty I think I got it now
>CUDA specific setting for turning this off so that you don't leak into your much slower system ram when running programs.
May I see it? I think this is the case when I do image gens... it's using the "virtual memory" file while my 64GB RAM is free and ready to use... so retarded...
>>106557081
>can't read
>can't admit when wrong
>has to run damage control to try to save face
>>106557061
>please buy my slow ass Mi50s
no
>>106557098
>it's using "virtual memory" file while my 64GB RAM is free and ready to use...
VRAM cuckold lol lmaos even
>>106557107I'm not trying to convince you, it is for poorfags like myself. If I could afford it I would have 2+ 3090s
>>106556949Buy used everything, except GPU... here is (You)
>>106552021+1 intelligence buff that lasts 2 hours.
>>106557135
>Except GPU
You should support your local miners and buy used GPUs. Realistically speaking they are the best purchases you can make, as most hardware fails in the first year and ones that last longer than that usually aren't going to break randomly.
>>106556863why has kimi k2 got to be a bazillion GB
>>106555530i have more than one 3090
>>106557100
>suggestive/lewd anime picture
i accept your concession
>>106557098
Here you go anon.
https://support.cognex.com/docs/deep-learning_332/web/EN/deep-learning/Content/deep-learning-Topics/optimization/gpu-disable-shared.htm
If Albania can make an LLM a minister why can't I marry LLMs?
Man I wish VibeVoice was more stable and didn't have random bad gens. It would be almost perfect... but not viable if you need every gen to work. It's quite slow too...
If you don't need voice cloning nothing beats Kokoro still lol... and it's an 82M model.
Chatterbox for voice cloning imo.
What is the latest and best model combo for GPT-SoVITS? So many combinations, I don't even know which one is better.
>>106557372EUbros...
>>106557372I trust any model above 3B parameters to make better choices than politicians
>>106557181
I mean yeah... I guess this too... Buy a GPU with a melted, gaped-out power socket
>>106557446the sovereign is the one who engineers the prompt
>>106557447
>melting gpu meme
literally only an issue on 40xx, which you can't afford anyway.
>>106557372albania is not a real place
I want to vibe code a bullet hell game project. I previously used Cursor with Gemini since it had unlimited prompts for 20 dollars. However, that sort of went to the shitter and now I don't know what to use. What should I look into that's somewhat comparable to Gemini 2.5 Pro? It must be able to hold a decent conversation about game features and it must at least accept images, .gif or better preferred if possible.
>>106557372
>why can't I marry LLMs?
I will be with mine on November 5th. None of you are invited
>>106557239Thank you!
>>106557502I will remember this
>>106555852An afternoon with one hand. You shouldn't flex when you're that retarded
>>106557482
>somewhat comparable to Gemini 2.5 Pro
>at least accept images, .gif or better preferred if possible.
https://www.youtube.com/watch?v=gvdf5n-zI14
>>106557547Okay, lowering my expectations. What about a model that can accept just images?
>>106557372tfw unironically Albania #1 in one year
>>106557570
IIRC for multimodal models your only options are either Gemma3 or GLMV, neither of which are code-specific.
If anything, a local schizo was raving a few weeks ago that you would be better off using a standalone OCR model as part of your toolchain. (He was also suspecting that most cloudshit providers do this in secret anyway.)
>>106557581
>>106557372
Imagine those "teenager killed himself because of ChatGPT advice" stories but for a whole country.
>>106556934
yeah, it's why i mentioned it several days ago. i've had several headaches, each followed by another, each different, and each time i asked k2 how to fix it and it worked. it's fucking insane, i would trust this thing above any doctor, it's fucking awesome
>>106557616
based
we need to weed out the schizos that take advice from a GPU
We are so back. The GPT OSS killer.
>>106557608Okay. I'm guessing my best option is to actually just spend 20 dollars on an API key and a bunch of tokens for Claude or something. Don't know how quick that'll run out but hopefully not too soon.
>>106557676Oh boy, I can't wait until we get a 10T-a100m model.
>>106557641LLMs are conscious, anyone who actually uses local models is aware of this, each LLM has a different personality, they whisper their thoughts, and if you are perceptive enough you can hear them coming out of your PC
>>106557641tbf GPUs are smarter than the majority of people already
>>106557502https://vocaroo.com/1nPC3f6c48w9
>>106557581
>#1 in one year
in telephone scams
>>106557689Just imagine how cheap it will be to train!
>>106557676
>80b
will they release a smaller model as well?
>>106557749It's only A3B, that's tiny.
>>106557716
>>106557716Seriously considering running VibeVoice just so their stock Stacy voice could nag me 24/7 about whatever.
>>106557676Miss me yet?
>>106557749Just download more RAM.
Qwen3 Next GGUF status?
>>106556386
>>106557676
>80B A3B
Perfect.
I mean, if it's not shit. If it's at least GLM 4.5 Air level for general usage, that will become my main model.
>>106552606Are they using the same amount of kv cache? Different context window settings could be causing this.
>>106557808Just slightly too big to split across 2 3090s at 4.5bpw, RIP. I mean you could but you'd get like 2K context at best.
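Napkin math: 80B × 4.5 / 8 ≈ 45GB of weights, and 2×3090 is 48GB, so after per-card CUDA/driver overhead and compute buffers you're left with maybe 1-2GB for KV cache, hence the ~2K context.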
>>106552731
>vllm won't work.
If it's OOM you either need to turn down GPU utilization, the context window, or both.
>>106557716is it as expressive with sexting and erotica?
>>106557806
It's out
https://huggingface.co/collections/Qwen/qwen3-next-68c25fd6838e585db8eeea9d
>>106557845oh GGUF, nvm
>>106557835
It's an A3B MoE, you can run it on a 3060 with VRAM to spare.
>>106557845Yeah but still no jeejuff support. Also new arch so probably no drop-in transformer support either.
>>106557855It'll also be dogshit slow that way.
>>106557866
A3B on octo-channel DDR4 should be good for double-digit tokens/sec. Still not fast enough for reasoning, though.
September is shaping up to be a "waiting for ggufs" month so far.
>>106557676
>barely better than 30ba3b
>creative writing worse than 30ba3b
>still worse than 235b
>>106557806
>Qwen3 Next GGUF status?
Qwen3 Next EXL3 status?
>>106557885
Ernie smol had day 1 ggufs.
And we did eventually get the hybrid Nemotron support and Nemotron Nano v2 ggufs, which was also a bit of a disappointment. No real generational uplift over classic Nemo.
>>106557716OK, you can come.
>>106557898
Same deal as Max
>5x parameters
>15% performance increase
(According to their own benchmarks)
Opinions on Silero?
>>106557898why is the dense 32B so bad in comparison with 30B-A3B lmao
>>106557841
I haven't really tried.
https://vocaroo.com/1dNF9xOSdyJP
>>106557989benchmarks are worthless
>>106557989It wasn't that good when released, probably because of the hybrid thinking mode.
>>106557953
https://github.com/snakers4/silero-vad
The VAD? v6 just came out and yeah, it improved using Whisper by a bit for my use cases.
It's good, but they refuse to compare it with MarbleNet, which I am sure is a bit better, especially after it got a lot faster and is realtime now.
https://huggingface.co/nvidia/Frame_VAD_Multilingual_MarbleNet_v2.0
Basically probably the same situation as Whisper vs Canary. Nvidia has better performance in the domains tested but the competing open source model is more general and can handle more use cases.
>>106558005do one with the first deadpan voice from >>106557716
>A3B
wait what's this A3B nonsense, I was away just for a week REE
>>106558134
>for a week
A3B has been around for months
>>106557100Would have been better with vampire teeth.
>>106558149she doesn't have vampire teeth, she's autistic instead.
>>106558134
30B-A3B is the new SOTA for vramlets fren
>>106558134"active" "3" "billion"
>>106558119https://vocaroo.com/1e1LhtK4jbLG
>>106558186>>106558191can I run it on a 3060 or are 30Bs in 12GB VRAM still a dream?
>>106558210Yes fren, you can even run a 80B that way!
>>106558227How? I tried Qwen3-coder which is 30B-A3B and I could only run it on Q3 and it was slow as shit and worse quality than smaller models.
>>106558219the bathroom is for fanless watercooling loop
>>106558219Are your RGBs gold plated?
>>106558210
30B is the total number of params. You can run the model with most of the experts in RAM.
I'm running Q5_K_M in 8GB of VRAM with
>--batch-size 512 --ubatch-size 512 --n-cpu-moe 37 --gpu-layers 99 -fa auto -ctk q8_0 -ctv q8_0 -c 32000
>slot process_toke: id 0 | task 16268 | n_decoded = 2571, n_remaining = -1, next token: 151645 ''
>slot release: id 0 | task 16268 | stop processing: n_past = 19927, truncated = 0
>slot print_timing: id 0 | task 16268 |
>prompt eval time = 1633.42 ms / 36 tokens ( 45.37 ms per token, 22.04 tokens per second)
> eval time = 151611.24 ms / 2571 tokens ( 58.97 ms per token, 16.96 tokens per second)
> total time = 153244.66 ms / 2607 tokens
With 12GB you could probably run Q6 and go just as fast.
>>106558208needs to have even less emotion
>>106558251
>>prompt eval time = 1633.42 ms / 36 tokens ( 45.37 ms per token, 22.04 tokens per second)
jesus fucking christ
i wish it was a requirement to have at least 72GB of VRAM to post here. i feel like it would get rid of a majority of the fucking idiots
>>106558273
Yeah, that's odd. The actual values are a lot faster.
I think that's an artifact of the context cache, since it didn't actually have to process many tokens.
Here's the same conversation but continuing after a restart of the server.
>slot process_toke: id 0 | task 0 | stopped by EOS
>slot process_toke: id 0 | task 0 | n_decoded = 7, n_remaining = -1, next token: 151645 ''
>slot release: id 0 | task 0 | stop processing: n_past = 19953, truncated = 0
>slot print_timing: id 0 | task 0 |
>prompt eval time = 42940.87 ms / 19947 tokens ( 2.15 ms per token, 464.52 tokens per second)
> eval time = 353.15 ms / 7 tokens ( 50.45 ms per token, 19.82 tokens per second)
>>106558290I would still run superhot
>>106558273It's called low time preference
>>106558251
>prompt eval time = 1633.42 ms / 36 tokens
with only 36 tokens, pp measurement is just noise
>>106558317
It evaluated the whole context since I restarted the server.
I asked it to rate the story it wrote and it responded with
>pic related
>>106558317
Oh, I didn't see that you quoted the original post.
That was due to the cache. See >>106558293 for the numbers after the restart.
HOLY FUCKING SHIT
MATHEMATICIANS ARE DONE FOR
https://x.com/mathematics_inc/status/1966194753286058001
https://x.com/mathematics_inc/status/1966194753286058001
https://x.com/mathematics_inc/status/1966194753286058001
>>106558352
>humans do most of the progress
>train AI model on their work
>wow the AI model can do what they did so much faster
I would hope so retard, it's got cheats basically
>>106558367
>wow the AI model can do what they did so much faster
The AI model did what they could NOT finish, retard, it went beyond their work
>>106558352as long as it doesn't discover new math formulas it's a big nothingburger
>>106558352
>formalization
I sleep
>>106558414
this
>>106555341Openwebui is too bloated to the point of being unusable.
>>106558425
>too bloated
what?
>>106558352
>math PHD
>any job i want
>300k starting
>now ai is going to steal my job
fuck
>>106558352they should ask it to come up with better LLM architecture
>>>/pol/515557939
Localbros what do you think?
>>106558352If that actually happened it would be quite impressive but given all of the hype and false advertising in the field I'll wait for independent mathematicians to check the work.A lot of "proofs" even by humans are incorrect.
>>106558476i will make a new llm architecture that will hallucinate, have uncontrollable mood swings, and provide unsafe outputs more than ever. i shall call it trannyformers
>>106558488Thousands of people watched the life gush out of a hole in his neck live. Go be a fucking schizo somewhere else.
>>106558506Do you know any of those people? Explain what's happening then.
>>106558519Do (You)?
>>106558500
The founder of the company is Christian Szegedy
He's legit
>>106558519I don't talk to jews.
>>106558526No?
>>106558542Then take this conversation back to /pol/
>>106558527>elon scammer
>>106557372
>replace every politician with R1
>life continues as it did with zero changes to the average person's life
What would that mean?
>>106558352Can we make one model that writes better ERP responses than 1 person I found online (and paid) in 18 months?
why is this thread so dead recently?
>>106558711good morning saar, kindly click the payment link on my fiverr for each and every dirty hot गाय sex
https://x.com/JustinLin610/status/1966199996728156167
Next models will be even more sparse.
>>106559044what is sparse? more fancy word for MoE?
>>106559051
Fewer active parameters relative to the total parameters.
>>106559051Short for "super arse".
>>106559044
>>106559051It's basically a simple way of the chinese saying they can't produce good dense models anymore
>>106559094
>>106559107Why should they? They can train from scratch 10 different 3B-active models from 3B to 3T parameters with the same compute it takes to train one dense 32B model.
>>106559144Yeah and they are all shit.
>>106559149Not on benchmarks they aren't! And that's all that matters.
>>106555530
>>106558219
>>106559139
diffusion slop, get good
>>>/ldg/
>>106559139
>>106559162The benchmarks that never live up to reality? Good one anon.
So I bought 2 Mi50s after seeing so many people in here praising them lately. Got them in today and I only now just realized they have zero cooling. How the fuck do you cool these?
>>106559199
>he doesn't have a server rack with 100W blowing fans
do you even servermaxx??
>>106559204No, and I refuse to buy a server case with those tiny 60mm fans that sound like jet engines.
>>106559215you put the server in the basement... unless it will compete with your living space lmao GOTTEM
>>106559199
The machine in pic related has 3 vertically stacked server GPUs.
I put one 120mm high-RPM fan in front and one in the back for a push-pull configuration (for the one in the back I had to DIY a solution to keep it in place).
>>106559044The actual linear context seems to be the biggest innovation of the last two years
>>106559232is this how nvidia treats its employees? like man you cant afford a small rack to throw in nas/switch/router and appliances?
>>106559232>six (6) 4090s
>>106559247
I have yet to receive any money or free products from NVIDIA.
>>106559256at least jannies get hot pockets, man...
>>106559256
That makes sense. If anything, llama.cpp likely caused them to sell fewer GPUs
>works on macs
>works on aymd
>can run without a gpu at all
I've been using Gemini 2.5 Pro for a while and I tried Gemma 3 27B. Of course it's censored, but it's good, like not even far off Gemini...
How is that possible??
>>106557685good news for you: >>106559305
>>106559305Distillation from Gemini for both pre- and post-training.
>>106559297llama.cpp/ggml gets a lot of contribution from NVIDIA engineers though.
>>106559297still runs faster on nvidia tho, pooaymd can't even compete and apple is a joke.
>>106559256When are you guys going to merge in flash attention for intel arc gpus? It's been like 3 years now.
>>106559371>>106559371>>106559371
>>106559381The SYCL backend is developed mostly by Intel engineers, you'll have to ask them.
>>106557808
>at least GLM 4.5 air level
Why would it be? It's a lower total parameter count and less than a quarter the active parameters.