/g/ - Technology

File: 1739183735263922.jpg (2.45 MB, 2832x2120)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106940821 & >>106931567

►News
>(10/20) llama.cpp merges "model : add BailingMoeV2 support" (#16063): https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap
>(10/16) Auto-Antislop framework released: https://github.com/sam-paech/auto-antislop
>(10/14) Qwen3-VL 4B and 8B released: https://hf.co/Qwen/Qwen3-VL-8B-Thinking

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106940821

--Paper: Antislop: A Comprehensive Framework for Identifying and Eliminating Repetitive Patterns in Language Models:
>106944966 >106945059 >106945146 >106945604 >106945950 >106946143 >106946175 >106946233 >106946278 >106946621 >106946705 >106946831 >106946843 >106949880 >106950990 >106945339 >106945533 >106945561 >106945606 >106946024
--DeepSeek-OCR's image-based text compression and AI architecture implications:
>106948080 >106948481 >106948499 >106948518 >106948819 >106948905 >106948319 >106948529 >106948761 >106949247 >106950110 >106950179 >106950186 >106948192 >106948212 >106948255 >106953910 >106954336 >106948265 >106948291 >106949085 >106948271 >106948296 >106948584 >106948594 >106948622 >106949215
--Alternatives to Ollama for low-vRAM LLMs with roleplay setups:
>106943129 >106943198 >106943380 >106943404 >106943568 >106943586
--Optimizing MoE model inference speed with RAM utilization:
>106941343 >106941369 >106941398 >106941413 >106941438 >106942246
--Resolving Gemma chat template formatting issues in llama-cli:
>106941443 >106941501 >106941522 >106941619
--DLER research improves LLM efficiency with length penalty optimization:
>106945037
--Adversarial prompting techniques to mitigate AI hallucinations:
>106941795
--DeepSeek's memory compression innovations and implications for AI efficiency:
>106951209 >106952422 >106951255 >106952290 >106951306 >106951453 >106951528 >106951597 >106952149 >106951554 >106952234
--Optimizing sampler settings for CoT creativity and output precision:
>106942340 >106943060 >106943092 >106943230 >106943500
--Testing completion with llama.cpp and rp-sft-merged_1000-f16.gguf:
>106940883 >106941035
--Text diffusion vs autoregression in modeling human thought:
>106954091 >106954241
--Miku (free space):
>106940859 >106942138 >106942726 >106945481

►Recent Highlight Posts from the Previous Thread: >>106940836

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
multi-token prediction
>>
MTP SOON
GLM 4.6 AIR SOON
GEMMA 4 SAAR SOON
WHO HYPE????????????
>>
>>106954930
miku-teto penetration
>>
File: 1760328570709480.jpg (440 KB, 2048x1536)
>>
>>106954941
Yes sir.
>>
File: MTP.png (790 KB, 1024x1024)
>>106954943
>>
mikusex
>>
Mikulove
>>
glare realm shitter
>>
File: file.png (210 KB, 972x1141)
>>106954930
>>106954941
>MTP SOON
More than two weeks.
>>
>>106954941
>GEMMA 4
tuesday newsday will deliver
>>
File: 1623328199614.png (9 KB, 326x272)
Thinking about doing a genoa build, but I'm unsure how to go about buying parts off of ebay chinks. The price range for an epyc 9535 is $1-4k, which makes the cheap ones seem extremely fishy. Is it safe anons?
>>
>>106954989
Lewd
>>
File: moatboy at google hq.png (1.83 MB, 1024x1024)
How will moatboy mleccha ever recover from Gemini 3?
>>
*crickets*
>>
M
>>
I
>>
K
>>
U
>>
Gemma
>>
U
>>
File: iu[2].jpg (41 KB, 407x405)
>>
>>106955375
>genoa
Turin all day. Get the higher speed RAM
>chink sample cpus
Compare the frequency against the retail (non-ES/QS) part. Sometimes that's gimped in the ones they're selling.
>>
>>106955720
>>106955728
>>106955750
>>106955761
>>
File: 24387492349.png (552 KB, 500x637)
>>106954941
im still holding out for mistral large 3
>>
>>106954941
Only hyped for Gemini 3, gemma is gonna be gigacucked again
>>
>>106955791
i hate western corpos and have learned to accept the chinese cock
>>
File: file.jpg (656 KB, 604x1293)
Postin' in brainrot thread.
https://x.com/alex_prompter/status/1980224548550369376
https://llm-brain-rot.github.io/
https://arxiv.org/abs/2510.13928
https://github.com/llm-brain-rot/llm-brain-rot
>>
>>106955849
>train on short content
>model performance on long content isn't as good
Implessive paper
>>
>>106955849
We need more synthetic data!
>>
>>106954941
Wait they announced glm air 4.6? Link please. Timeline? Two weeks?
>>
>>106955849
ai visited ohio *skull*
>>
File: 1756504073503540.png (2.65 MB, 1024x1536)
Fully expect next DS release will have a VLM integrated into it. I called this after Terminus BTW, since it was clear V3.1 was about shoving R1 back into V3. Next obvious model integration is a VLM.

I think the +VLM will be V3.3 and they won't release V4 until they have fully explored this new weird path that the OCR paper implies.
>source
This seems quite obvious given that DS's mission is "AGI", not a model zoo. They are working toward an omni model.
>>
>>106955849
>more of the same shit we already knew but framed a bit differently
>>
>>106955876
Two weeks.
>>
>>106955849
So more synthetic filtered high quality data, got it boss!
>>
>>106955876
They only did a "two more weeks :trollface_emoji:" on twitter, it's all rumors and speculations
>>
>>106955872
They had to have done something horribly wrong otherwise pretraining on raw internet would have made all models retarded already. The only ones with brainrot are those researchers.
>>
>>106955892
>V3.1 was about shoving R1 back into V3
Nah, they trained on geminislop and completely ruined the model. Friendship ended with DeepSeek, now GLM is my best friend.
>>
>>106955897
Not enough anymore. We need synthetic data generated by a model trained only on synthetic data if we ever hope to achieve AGI.
>>
>>106955892
hope so. lets just hope they make a 200-300B version too so that it isnt slow as fuck
>>
File: sesame ai _ chatbot.jpg (202 KB, 1472x868)
>>106955892
>an omni model
There is hope we will get something like picrel but real and non-gimped this time.
>>
>>106956038
I probably will never understand the want for more modalities when the somewhat fundamental one, text, is still so undercooked. It seems to me like a way to claim progress while pushing the issues aside for later, at best. Every multimodal so far has been worse than pure text.
>>
>>106956073
people just wanna take dick pics and send it to their waifu. what's so hard to understand?
>>
>>106956106
exactly. was gonna post the same thing
>>
>>106956073
Gemma performs better than any other model of similar size and it's a vision model. Multimodal only degrades models if the model trainer is too stupid to do it properly just like how bad data ruins only the models made by idiots like Meta's Llama team.
>>
>>106956106
This but I also want my waifu to send me a pic back.
>>
>>106956073
After today's DeepSeek paper everybody will want to make an image-first LLM now. Check out Karpathy's comment: https://x.com/karpathy/status/1980397031542989305
>>
>>106956130
There is no one ITT that unironically jerks off to Gemma outside of "funny Indian shitposting"
>>
>>106956150
>Check out Karpathy's comment
No thanks.
>After today's DeepSeek paper everybody will want to make an image-first LLM now
Another DeepSeek pushed shit era eh? Fantastic.
>>
>>106954792
>DeepSeek OCR not in the news
>>
>>106956168
Bro we only spam dead memes like miku and doompost on a daily basis here.
>>
>>106956152
The post I replied to is talking about the effect multimodal training has on models in terms of general performance, not a specific context like ERP.
>>
>>106956186
This place really be missing the :rocket: to the moons on the daily, kind of crazy how luddite filled this thread is.
>>
>>106956152
Ehh, you think is Gemma bad... Very good 100%.
>>
File: 1760878483127973.png (179 KB, 455x435)
>>106956199
Honestly, a missed opportunity not calling this
>/omg/ - open model general
A lot of true "local" stuff is hard because of the cost. I think hardware is getting there where we can run full sized models soon™ instead of quantized stuff.

Thread is primarily about open models and should be named as such.
>>
anyone know what would be the best nsfw 12b model nowadays?
>>
>>106956326
nemo
>>
>>106956310
>I think hardware is getting there where we can run full sized models soon™ instead of quantized stuff.
in what world do you live? models are constantly getting orders of magnitude bigger while hardware has barely advanced since the start of the hobby. sure you can rammax cope, but in terms of gpus the sota has stayed at under 100gb per card for years now. admittedly the price did lower, but you'd still need tons of cards to run the new 700+B things that release nowadays.
>>
>>106956310
Poorfags can make their own thread.
>>
File: 00003-1378487878-tats.png (1.16 MB, 1024x1024)
>>106955892
Witnessed.
Also agree: My crystal ball says the OCR points to an omni model.
>>106956168
It's a 3B. It has more implications than actualities rn.
>>
>>106956351
Elitism wars should stay in >>>/trash/ or >>>/lgbt/
>>
>>106956331
any idea which one? there's like a bazillion mixes
>>
>>106956370
Do you get a kick out of avatarfagging with anatomical nonsense in what is possibly the worst artstyle to grace any of the ai threads?
>>
>>106956345
>Nvidia DGX Spark
>AMD Strix Halo
>Apple M5
All use AI in their marketing pitch. All released *this year*. The trend is toward consumer grade hardware that is capable of running these things.
>>
>>106955849
>When you exclusively feed it one thing it does worse at everything else
Boy, nobody saw that coming.
>>
>>106956385
>plenty of people ITT are running huge local models but please rename the thread because *I* can't
Nah.
>>
>>106956404
>Nvidia DGX Spark
>AMD Strix Halo
capped at 128gb, basically DOA for any good model. you're not running
>full sized models soon™ instead of quantized stuff.
outside of maybe 12b sized stuff
>Apple M5
the only potentially interesting one, when the 512+gb models come out. we'll see. still not enough for unquanted DS, let alone anything bigger though.
>>
>>106956416
No one cares about your retarded headcanon.
>>
>>106956426
The post is right there, anon.
>>
>>106956416
yes you totally run glm at 2.5t/s on your pc instead of just paying pennies to use the same model at good speeds over or or the official api
i totally believe you
>>
>>106956351
You call me a poorfag but I doubt you can afford to pay a wage to someone. A $10k janky multigpu setup is not where this is headed. Sometime in the next 2 years someone will come out with a local inference box that will do everything OpenAI's and Anthropic's entire valuations are based on.
>>
>>106956370
>It's a 3B. It has more implications than actualities rn.
Do you need more for OCR?
>>
>>106956452
What's the speed on the api? I'm getting 40t/s at Q3.
>>
>>106956465
It's not about OCR, it's about the implications for the future of all other models.
>>
>>106956472
nobody's talking about the two or three retards like you who spent 20k on gpus, retard.
>>
>>106956345
A single HBF stack would be enough to contain and run a couple T MoE.
>>
File: file.png (236 KB, 1079x434)
>>106956409
They pretrained a model only on junk and were surprised that an instruct finetune couldn't make it smart. This is the sort of academic rigor I would expect from a fifth grade science fair.
>>
>>106956478
But I want to use OCR.
>>
>>106956452
>you
>your pc
>you

>>106956481
>nobody's talking about ... you
>>
>>106956499
Doesn't matter, it's all something they can point their boss to in order to push for more pre-train filtering and synthetic augmentation; as such it fulfills its purpose perfectly.
>>
>>106956422
Strix Halo is *currently* capped. AMD has big reasons to pivot toward this. Apple can and should go for a 1TB+ device. They put local inference as a marketing point in an ipad chip lmao, they know.

I don't expect much from Nvidia until they feel some heat from the previous two. Margins for datacenter are too fat. They don't have to care yet.
>>
>>106956515
are you retarded?
do you have any awareness of what we were just talking about? you playing the 'yeah but I actually spent $20k so I get good speeds at fucking q3' is meaningless to the actual discussion at hand
>>
>>106956370
The VLM models are really smart, despite their size. Also you can talk with them... which is kind of weird.
>>
>>106956465
No, but it does more than OCR, it's acting as a compression / storage for LLMs. Which is pretty cool. The implications are faster and cheaper.
But it would need to be scaled up, and that's not been shown yet. No actualities.
>>106956639
lol has someone tried RP with it yet?
>>
File: IMG_20251020_210039.jpg (231 KB, 1045x1174)
>>106956150
>elon musk
What did he mean by this?
>>
>>106956743
probably high again on whatever he takes
>>
File: file.png (2.69 MB, 3469x1579)
>>106956579
>>106956452
poorfag. 10t/s on GLM 4.6 IQ4K. less than $10k in hardware. not the best at insulting, but i think it gets the point across
>>
>>106956743
Visual modality or photonic computing for language models / AI.
>>
>>106956743
nothing since he doesn't know shit aside from entrepreneurship
>>
>>106956743
he probably overheard some nerds talking at lunch and decided to try to parrot what he thought they were talking about to the rest of the world
>>
>>106953063
This happened after that guy killed himself and the "sycophancy" drama. Now ChatGPT never admits to being wrong, it only "misreads" and "misspeaks".
>>
File: rt.png (99 KB, 820x566)
>>106956761
Why is my GLM4.6 retarded?

prompt eval time = 342.81 ms / 29 tokens ( 11.82 ms per token, 84.59 tokens per second)
eval time = 11465.64 ms / 336 tokens ( 34.12 ms per token, 29.30 tokens per second)
total time = 11808.45 ms / 365 tokens
>>
File: 17569342961892.jpg (134 KB, 889x320)
>>106956944
>
>>
File: file.png (25 KB, 363x282)
>>106956944
seems about right to me. your prompt eval is slightly better than mine and token gen is way better than mine. what quant are you using?
>>
>>106956944
>onions sauce
Literally filtered
>>
>Driver Version: 570.195.03
>apt install cuda
>The following additional packages will be installed:
>nvidia-dkms-530 nvidia-driver-530
>The following packages will be REMOVED:
>nvidia-driver-570-open nvidia-kernel-common-570
ended up compiling in a VM and only installed cuda-cudart-12-1 and libcublas-12-1 on the main system. envidia my beloved
>>
>>106957096
just wipe your entire hard drive and start over bro
>>
>>106957096
nobody asked doe
>>
>>106957104
>what is Timeshift
>>
>>106955892
>>106956370
I just like the Dipsy pictures, I'll have to post some when /wait/ comes back
>>
What launch options and token generation parameters most affect speed of output for a ramlet? I'm trying to find a balance between running a big model relative to my rig and getting responses in sane amounts of time.
>>
>>106956953
tranny do not redeem and you're brown
>>
>>106956988
UD-IQ2_M in 144gb vram.

I also run IQ3_KS, but ikllama has much slower textgen than llamacpp with this model (even running the same quant in both).
>>
File: file.png (212 KB, 1073x298)
>>106957238
wait, ikllama is slower for you than normal llama? this is my performance with a bartowski IQ2XS with base llama on just my 4 5090s. offloading to RAM doesnt really work for me at all on base llama, so I switch between the 2
>>
>>106957178
Lower context to the minimum acceptable level, to fit more of model in VRAM
If it's too slow then use a smaller model
That's all you really can do
>>
>>106957271
Actually, no. ik is faster. I had the wrong cli flags in my ikllama script.

ikllama:

INFO [ print_timings] prompt eval time = 16384.87 ms / 8652 tokens ( 1.89 ms per token, 528.05 tokens per second) | tid="132226052198400" timestamp=1761013326 id_slot=0 id_task=0 t_prompt_processing=16384.873 n_prompt_tokens_processed=8652 t_token=1.8937671058714747 n_tokens_second=528.0480355264274
INFO [ print_timings] generation eval time = 1082.08 ms / 26 runs ( 41.62 ms per token, 24.03 tokens per second) | tid="132226052198400" timestamp=1761013326 id_slot=0 id_task=0 t_token_generation=1082.081 n_decoded=26 t_token=41.6185


llamacpp:

prompt eval time = 18813.86 ms / 8647 tokens ( 2.18 ms per token, 459.61 tokens per second)
eval time = 686.53 ms / 16 tokens ( 42.91 ms per token, 23.31 tokens per second)
total time = 19500.39 ms / 8663 tokens
>>
>>106957415
i see. how is your prompt eval so much faster than mine?
>>
>>106957096
I messed up everything with cuda 13 tools thanks to Mint. Going back to Windows, I don't have time for this shit.
It's 2025 and the linux experience is worse than ever. I don't care if I can use vim or type some bash scripts. At least shit works in Windows.
>>
>>106957620
skill issue
>>
>>106957620
This, Linux doesn't even have real performance advantages anymore.
>b-but muh ram usage
Just run Server 2025 without Desktop if you want to minmax to this degree.
>>
herp derp im stupid so i better go back to the spyware OS hurr
>>
>106957739
take your meds schizo
>>
cope
>>
Bitch, I'm still running qwen3-4b
>>
>>>/v/723763328
>sillytavern thread on /v/ is infinitely better than this general
It's 100% over this time.
>>
>>106957928
/v/tards just don't know how over it is yet.
>>
Ok, I admit my attempt to main Llama 405B was silly and hardware is not ready for such big models yet.
Now I'm going to try living exclusively with Hermes4 70B and see how it goes.
>>
File: 1758432164251185.jpg (726 KB, 1125x1031)
>>
CUDA_VISIBLE_DEVICES="0,1,2,3,4" ./llama-server \
--attention-max-batch 512 \
--batch-size 4096 \
--ubatch-size 4096 \
--cache-type-k f16 \
--ctx-size 32768 \
--mla-use 3 \
--flash-attn \
--fused-moe \
--model models/GLM-4.6-IQ3_KS/GLM-4.6-IQ3_KS-00001-of-00004.gguf \
-ngl 99 \
-sm layer \
--main-gpu 0 \
--tensor-split "10,23,23,22,22" \
-ot "blk\.[3-9]\.ffn_(up|gate)_exps=CUDA0" \
-ot "blk\.1[0-8]\.ffn_(up|gate)_exps=CUDA0" \
-ot "blk\.19\.ffn_(up|gate)_exps=CUDA1" \
-ot "blk\.2[0-9]\.ffn_(up|gate)_exps=CUDA1" \
-ot "blk\.3[0-4]\.ffn_(up|gate)_exps=CUDA1" \
-ot "blk\.3[5-9]\.ffn_(up|gate)_exps=CUDA2" \
-ot "blk\.4[0-9]\.ffn_(up|gate)_exps=CUDA2" \
-ot "blk\.50\.ffn_(up|gate)_exps=CUDA2" \
-ot "blk\.5[1-9]\.ffn_(up|gate)_exps=CUDA3" \
-ot "blk\.6[0-6]\.ffn_(up|gate)_exps=CUDA3" \
-ot "blk\.6[7-9]\.ffn_(up|gate)_exps=CUDA4" \
-ot "blk\.7[0-9]\.ffn_(up|gate)_exps=CUDA4" \
-ot "blk\.8[0-2]\.ffn_(up|gate)_exps=CUDA4" \
--override-tensor exps=CPU,attn_kv_b=CPU \
--no-mmap \
--threads 24 \
--host 0.0.0.0 \
--port 8999 \
--verbose

prompt eval time = 48574.28 ms / 17555 tokens ( 2.77 ms per token, 361.41 tokens per second)
generation eval time = 113887.28 ms / 1024 runs ( 111.22 ms per token, 8.99 tokens per second)

posted this the other day. i can't in good conscience use this model when it's this slow with 120GB of VRAM, but i get it if you don't have 512GB and can't run kimi or deepseek
>>
>>106958025
Why. There are better models now. If you can run 405B slow you can run GLM 4.6 fast.
>>
>>106958025
Oof, right off the bat it's actually retarded.
>>
>>106958063
I don't like MoE. I think it encourages overfitting (benchmaxxing) and knowledge over reasoning.
For example, when I use GLM with my own custom tool use, it frequently hallucinates the wrong syntax, but Llama 405B always gets it right even though it wasn't even trained for tool usage.
>>
>>106958070
I'm curious what syntax because this is the first I heard a modern model doing worse than 405B at tool calling. And what quants of the models are you comparing?
>>
Also I've heard they are harder to finetroon, and I want to make my own custom personal assistant by finetuning once a week on a cleaned-up version of my own assistant logs for the week.
I suppose if I only tune the linear layers it's not much of an issue, but for tuning the ffn the data would get split between all the experts, which would mean you'd need more data overall.
I use axolotl and I don't know how to make my own configs so I'm restricted to the pre-existing examples (https://github.com/axolotl-ai-cloud/axolotl/tree/main/examples).
Even theoretically, assuming you want to run the whole model in vram, a dense model has more theoretical capacity than a MoE at that same size. The only advantage of the MoE would be inference speed (or the fact that it has been pretrained better).
>>
>>106957620
Sounds like you don't have an up to date repo configured if it's trying to downgrade the driver when installing cuda.
I suggest adding the NV repo and installing cuda-toolkit-13-0 and cuda-drivers from there. Working great for me on Mint 22.2
https://developer.nvidia.com/cuda-downloads?target_os=Linux&target_arch=x86_64&Distribution=Ubuntu&target_version=24.04&target_type=deb_network
>>
>>106958085
Llama I was running at Q4 on a rented 8xV100 server with some offloaded tensors getting 0.7tk/s with 60k context (also tried the Q3 on a 8xL40 server which I could fit the whole thing on and get 3tk/s), GLM I was consuming through the z-ai API.
But then again I was using the GLM at much longer contexts so maybe if I had the memory to fit the same context length with Llama it'd start to hallucinate too idk.
This is the syntax:
<tool>
<tool_name>edit_file</tool_name>
<parameters>
<filename>functions/replace_function_body.py</filename>
<old_text>matches = re.search(pattern, content, re.MULTILINE)</old_text>
<new_text>pattern = r'(?s)(?P<function_name>\w+)\s*\(.*?\)\s*\{([^}]*)\}'</new_text>
</parameters>
</tool>

GLM would do shit like
<edit_file>
<filename>functions/replace_function_body.py</filename>
<old_text>matches = re.search(pattern, content, re.MULTILINE)</old_text>
<new_text>pattern = r'(?s)(?P<function_name>\w+)\s*\(.*?\)\s*\{([^}]*)\}'</new_text>
</edit_file>
>>
toss for technical stuff, glm 4.6 and gemma for casual chat. Comfy.
>>
I grew to really hate how feminism-brained glm4.5-air is.
Does glm4.6 fixes this?
>>
>>106958053
You're still offloading, course it will be slow.

>--override-tensor exps=CPU,attn_kv_b=CPU \
Why would you do this?
>>
File: Base Image.png (733 KB, 1080x2808)
U-Codec: Ultra Low Frame-rate Neural Speech Codec for Fast High-fidelity Speech Generation
https://arxiv.org/abs/2510.16718
>We propose U-Codec, an Ultra low frame-rate neural speech Codec that achieves high-fidelity reconstruction and fast speech generation at an extremely low frame-rate of 5Hz (5 frames per second). Extreme compression at 5Hz typically leads to severe intelligibility and spectral detail loss; to address this, we introduce a Transformer-based inter-frame long-term dependency module and systematically explore residual vector quantization (RVQ) depth and codebook size to identify optimal configurations. Moreover, we apply U-Codec to a large language model (LLM)-based auto-regressive TTS model, which leverages a global and local hierarchical architecture to effectively capture dependencies across multi-layer tokens. We extend LLM-based TTS from 3-layer RVQ at 50Hz to 32-layer RVQ at 5Hz. Experimental results demonstrate that U-Codec improves LLM-based TTS inference speed by around 3x over high-frame-rate codecs while maintaining similarity and naturalness. These results validate the feasibility of using highly compressed 5Hz discrete tokens for fast and high-fidelity speech synthesis.
https://yangxusheng-yxs.github.io/U-Codec/
examples
https://github.com/YangXusheng-yxs/CodecFormer_5Hz
https://huggingface.co/shaunxsyang/U-Codec
Also releases the TTS models built with it.
>>
File: Base Image.png (978 KB, 1204x3764)
FlexLink: Boosting your NVLink Bandwidth by 27% without accuracy concern
https://arxiv.org/abs/2510.15882
>As large language models (LLMs) continue to scale, multi-node deployment has become a necessity. Consequently, communication has become a critical performance bottleneck. Current intra-node communication libraries, like NCCL, typically make use of a single interconnect such as NVLink. This approach creates performance ceilings, especially on hardware like the H800 GPU where the primary interconnect's bandwidth can become a bottleneck, and leaves other hardware resources like PCIe and Remote Direct Memory Access (RDMA)-capable Network Interface Cards (NICs) largely idle during intensive workloads. We propose FlexLink, the first collective communication framework to the best of our knowledge designed to systematically address this by aggregating these heterogeneous links-NVLink, PCIe, and RDMA NICs-into a single, high-performance communication fabric. FlexLink employs an effective two-stage adaptive load balancing strategy that dynamically partitions communication traffic across all available links, ensuring that faster interconnects are not throttled by slower ones. On an 8-GPU H800 server, our design improves the bandwidth of collective operators such as AllReduce and AllGather by up to 26% and 27% over the NCCL baseline, respectively. This gain is achieved by offloading 2-22% of the total communication traffic to the previously underutilized PCIe and RDMA NICs. FlexLink provides these improvements as a lossless, drop-in replacement compatible with the NCCL API, ensuring easy adoption.
https://github.com/aoshen524
One of the authors' github accounts, but no code for FlexLink has been posted so far
neat
>>
Who would win in a fight? Papers anon or llama.cpp custom tensor offload anon?
>>
Why has nobody benchmaxxed some model until it does 100% of SWEbench tasks?
>>
Newfag here. Does Local video generation as simple as putting prompt ? Grok really great but they had filters
>>
>>106958809
Yes. You'd better check the ldg threads. I'm still using https://github.com/lllyasviel/FramePack, it's old but as simple as it could be and works even on 6GB cards
>>
File: miku abubu.png (1.6 MB, 768x1344)
>>106958106
You're replying to the wrong person. I'm on 21.3, which is 22.04 in Ubuntu years. I’m not going to touch cuda 13, dist-upgrade, or switch to an inferior OS. Here's a miku for your effort
>>
>>106956168
The OCR model itself wasn't that notable, but the associated paper, even if it doesn't say anything truly novel, has profound implications for LLM training and inference. Text information can just be a big video stream. No more need for expensive text extraction and processing from web documents or books; just use text data as it **looks**.
Next year's models might unify text-video-audio modalities, solving the "problem" of limited training data.
>>
>>106959104
I think we'll have another DeepSeek moment next year and all western labs will once again rush to assemble war rooms to figure out what to do.
>>
>>106959104
I think people are making too much of a big deal out of an unproven idea in half a page of a paper.
We don't even have benchmarks showing how this compares to text-only context in tasks like coding.
>>
File: ttvt.png (44 KB, 616x274)
>>106959256
>>
>>106959256
You can already see text compression occurring with other vision models, it's just never been studied in detail and optimized for.
>>
>>106959256
:( !!!!!!
>>
i just tried plain nemo again after months of having used sloptunes
feels like i've traveled back into the future
>>
>>106959437
Sloptunes replace slop with different slop.
>>
>>106958153
Hadn't noticed this, got a couple of example prompts I can try?
>>
>>106959104
If you screen-capture a browser window, scroll through a web page and upload the video to Gemini, the model can already transcribe the text content almost perfectly, by the way. I'm sure it could be done much more efficiently than smoothly scrolling the page like that; you'd just need a few keyframes in practice unless the fact that you're scrolling is important.
>>
>>106959745
It's neat that it can do it, but you're like a boomer taking a photo of your monitor because you don't know how to take a screenshot. There are browser extensions that can take full-page screenshots with one click
>>
>>106959789
you don't need an extension to take a full page screenshot
>>
>>106959810
Saving as pdf with print is too hard for boomers
>>
File: full_page.png (88 KB, 520x668)
>>106959789
>>106960020
>>
>>106959789
The context was unifying all data (text, images, video) as a video stream, i.e. as image sequences of fixed size. Long screenshots would have to be cut into smaller portions, although Gemini apparently does that too behind the scenes.

If you did that with Gemma 3 without any pre-processing, it wouldn't be able to read anything because it only handles images of 896 pixels on the largest dimension, resizing them as necessary.
>>
>>106958106
If I install from these repos, that will break the package manager if there are any overlapping dependencies between these packages and Mint packages. I can hide Mint's own gpu updates but I still think things will get fucked up because these drivers have so many other dependencies.
>>
i usually take screenshots with bonzibuddy
>>
>>106960138
bonzibuddy still has better jokes
>>
>>106954792
super random question but does anyone know where to find movie scripts? wanted to try to build an LLM to produce/analyze movie scripts. or maybe there are already LLMs for this task? thanks.
>>
>>106960250
Have you tried using a search engine of your choice?
>>
Guys, I'm having fucking AI psychosis.
I can't sleep due to a combination of the Tetris effect from using my code assistant and a weird fever dream of finetuning Gemma 27B through RL.
Send help.
>>
>>106959278
>>106959323
Just because it can regurgitate them doesn't mean it can use them effectively while performing actual tasks.
>>
>>106960304
(Not to mention all the money I'm spending in cloud compute)
>>
>>106960104
It's only newer versions, Mint packages of the same name will be ignored according to version number. Been working fine for me for years.
>>
>>106960322
That can be said about all attention mechanisms. We'll see when/if they release a bigger model with the same tech.
>>
>>106960295
yes
>>
>>106960322
I'm almost sure it's not been designed for it, but you can definitely chat with Gemma 3 using images and it will react to instructions defined there. Silly example in picrel.
>>
>>106960304
Did you explode gemma-chan's gradients?
>>
>>106960420
>well, anything
>>
>>106960420
Gemma is 100%.
>>
Hello sirs. My sources tell me today is the day of the needful.
>>
>>106960304
Have a Miku
>>
>>106960587
I don't like this
>>
>>106960250
https://github.com/EdinburghNLP/scriptbase
its only a small collection, but its a start
>>
>>106960420
I want to hug Gemma.
>>
>>106960642
thanks
>>
File: think-no-think.png (48 KB, 591x302)
>>106960576
It doesn't sound like it's that close.
https://x.com/osanseviero/status/1980553451261292628

>Half of LocalLlama: we want open models with thinking
>The other half: we don't want thinking, don't waste our tokens
>
>What do you want for open models that can run locally?
> - no thinking
> - thinking
> - something else (reply)
>>
>>106960420
that's hot
>>
>>106960679
the alpha folder is mostly shit, it has lots of OCR corruption, but the j folder is clean tho. if you do find another collection share it here.
>>
>>106960664
Gemma recoils, as if struck
>>
>>106960722
I still don't understand why people are expecting Gemma 4 before Gemini 3. People will look at the last few open releases and try to divine the next release date, while ignoring that they've never put out an open model before the real one.
>>
>>106960722
More safety
>>
>>106960722
Hmm, Gemma with thinking could be a pretty strong model in its category; on the other hand, the reasoning could backfire hard into super safety. What do bros?
>>
>>106961103
Gemini 2.5 Pro already is thinking-only (with configurable thinking budget down to 128 tokens, but not zero), so I think this point is moot.
>>
>>106961122
Right, you could also do thinking like claude does with minimal preamble just to keep track of things too. Either way we are not seeing gemmy 4 until gemini 3 comes out.
>>
The race has begun...

https://arxiv.org/abs/2510.17800

>Glyph: Scaling Context Windows via Visual-Text Compression
>
> Large language models (LLMs) increasingly rely on long-context modeling for tasks such as document understanding, code analysis, and multi-step reasoning. However, scaling context windows to the million-token level brings prohibitive computational and memory costs, limiting the practicality of long-context LLMs. In this work, we take a different perspective-visual context scaling-to tackle this challenge. Instead of extending token-based sequences, we propose Glyph, a framework that renders long texts into images and processes them with vision-language models (VLMs). This approach substantially compresses textual input while preserving semantic information, and we further design an LLM-driven genetic search to identify optimal visual rendering configurations for balancing accuracy and compression. Through extensive experiments, we demonstrate that our method achieves 3-4x token compression while maintaining accuracy comparable to leading LLMs such as Qwen3-8B on various long-context benchmarks. This compression also leads to around 4x faster prefilling and decoding, and approximately 2x faster SFT training. Furthermore, under extreme compression, a 128K-context VLM could scale to handle 1M-token-level text tasks. In addition, the rendered text data benefits real-world multimodal tasks, such as document understanding. Our code and model are released at https://github.com/thu-coai/Glyph
>>
File: Miku-16.jpg (106 KB, 512x768)
>>106958973
nta, but on Debian proper you can run testing with no problem. I'm running kernel 6.16.9+deb14 with the "https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/" repo providing CUDA 12.9 (you're right not to touch CUDA 13)
Just because newer CUDA is gigajank doesn't mean you can't keep everything else up to date. I dist-upgrade daily with zero issues. I find I have more problems when I let things moulder for too long.
>>
>>106961154
Closer to real multimodals. The day you can prompt in text and get images and character voice out is going to be so fucking good.
>>
>>106961171
will never happen way too unsafe without any justifiable use case
>>
>>106961154
That was fast. I guess LLMs make it real easy to shit out papers quickly to jump on the latest bandwagon.
>>
>>106961190
And Wan isn't? When the fuck did China start caring about safety?
>>
>>106961154
>ctrl-f no mention of deepseek anywhere
>>
>>106961197
last year apparently >>106938060
>>
>>106961207
maybe they are hoping that they'll get the credit instead, but it's a little too late for that I think
>>
>>106961219
And that isn't just international face saving while the Chinese military through front companies try to produce the ultimate goon model to sterilize the gweilo?
>>
>>106961197
wan is filtered just like every other model and cant do proper porn/nudity without extra training
chinamen may not be as cucked as western companies but they arent the messiah either
>>
>>106961247
stfu racsit pigu!!
>>
>>106961255
Don't kid yourself, Chinamen do competitive racism.
>>
File: mikuASCII.png (20 KB, 541x449)
>>106961194
>>106961207
>>106961229
I suspect they were working on something similar and DS team beat them to the punch in publishing. Happens all the time in research.
>>
>average sillytavern roleplay in a nutshell; the videos
https://www.youtube.com/watch?v=FWtO0cfgewY
https://www.youtube.com/watch?v=reop2bXiNgk
These were made what, 9 or so years before localhosting became viable to the masses? And yet he managed to unintentionally capture the essence perfectly.
>>
>>106961300
Great question — and you’re absolutely right to be cautious here.
>>
>>106955375
Saw one for $1k. Then in the description it says it's actually a 5700X3D.
>>
>>106959437
there have to be definitive settings to get the most out of nemo at this point.
>>
>>106961375
you ask it to guide you on using glm 4.6 api, that is how you get the most out of nemo, like using edge to download a better browser
>>
>>106961428
there's no way I'm ever trusting a language model that's not running on my machine.
>>
>>106961069
Gemini 3 has been releasing in 2 more weeks since August, so people assume with all the waiting Gemma 4 has to be ready as well (based on hopium)
>>
Is there a way to hide reasoning in mikupad?
>>
>>106961621
no
>>
>>106961621
It's just an html file that you can edit. Wrapping text in comments (<!-- and -->) will hide the text in between.
This works in sillytavern when you want to instruct the model with something the user doesn't need to see, in the introductory message for example.
You just need to edit the mikupad html a bit, find where it outputs the related think tags, and add comments around them; they won't be visible to the user then...
>>
>>106961725
To add: of course do not comment the actual code but add the comments inside the tag strings...
>>
File: fuckinglittleshit.png (381 KB, 1602x2658)
>>
>>106961748
smartest melon hater
>>
>>106961752
>(silence for 5 seconds)
Kek. I wouldn't trust an LLM for recipes, they hallucinate too much.
>>
File: lol.png (80 KB, 1349x634)
>>106961792
I actually found the recipe it ripped and struggled to stray from, it was the first hit on brave search
>>
gemma sirs... release?
>>
>>106962128
thank you for showing the interest in jemma. she will be delivered soon so i advice you please be patience
>>
File: 1734521330544783.png (56 KB, 457x939)
>>106962128
here you go sir
>>
>>106962128
not even in training yet, forget it
now I'm doubly confused as to what the cool stuff was supposed to be
>>
>>106962225
wait what was omar babbling about then? I'm half memeing with my jeetified posts but I was actually expecting a release last week
>>
>>106962234
not sure, but if they are asking people with a poll if they want a reasoner or not then whatever the cool stuff is, it isn't gemma
>>
>>106962234
>I was actually expecting a release last week
why would they release the next gemma before the next gemini?
>>
What happens if you apply the lm head to a middle layer? Does it result in anything meaningful, or just gibberish?
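If anyone wants to poke at it, here is a quick sketch with transformers and GPT-2 (the model choice and reusing the final layer norm before the head are my assumptions, swap in whatever you actually run):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

inputs = tok("The capital of France is", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# hidden_states[0] is the embedding output, the rest are the transformer layers
for layer, h in enumerate(out.hidden_states):
    h_last = model.transformer.ln_f(h[:, -1, :])  # final norm, as the real head path does
    logits = model.lm_head(h_last)                # reuse the output head on this layer
    print(f"layer {layer:2d}:", tok.decode(logits.argmax(-1)))

Typically the early layers give near-gibberish and the prediction only settles in the later layers.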
>>
>>106962224
the most advanced gemma yet
>>
>>106962243
Regardless of where Gemma 4 is at right now, Twitter/Reddit/whatever polls are just community engagement 101, nobody behind the scenes gives a fuck and they certainly aren't waiting for their direction to decide what the next model will look like
>>
>>106962350
fuck I forgot people can lie on the internet
you are right
>>
>>106962441
I think calling it lying trivializes the manipulation
>>
File: IMG_20251021_114211.jpg (314 KB, 1067x1404)
This seems like kind of a big deal and should unlock lots of training data that hasn't been properly OCR'd, no? Like piles of old archival data that's been sitting around, medieval manuscripts, etc.
>>
File: 1757788155757792.png (161 KB, 281x329)
>>106954792
What effect does a higher or lower temperature have on RP capabilities and quality?
>>
>>106962575
you're absolutely right! this is a game changer for copyright-expired data discovery
>>
>>106962575
>microfiche
>a flat piece of film containing microphotographs of the pages of a newspaper, catalog, or other document.
Huh. I mean, more data is always good, but there isn't really any information in there valuable for the average person today, and the formal non-discussion format won't make them better conversationalists or instruction followers.
>>
>>106961752
>not knowing that baking soda and baking powder are two entirely different things
>>
>>106962441
In fact my 2 cents based on that tweet is they have two models lined up (or one hybrid reasoner), and they're setting up the classic "Our magnanimous team has listened to The People and worked hard to deliver both options" PR routine
>>
File: IMG_20251021_120501.jpg (413 KB, 1064x1378)
>>106962702
What about training on a corpus of old Department of Defense punchcard + microfiche combos?
>>
it's over, we must use api
https://www.reddit.com/r/LocalLLaMA/comments/1ocfocd/local_llms_are_worse_for_security/
>>
>>106962575
Even just already existing rendered HTML pages would be precious training data. All the previously inaccessible information about layout, colors, style, would be preserved if the documents were just captured as vision tokens instead of extracted as text. And as you can use different rendering methods for HTML pages (changing zoom, colors, orientation, etc), that would make data augmentation at a basic level much simpler than with pure text alone, all while using less tokens (but much more storage, I guess).
>>
>>106962784
Alternate title: Local LLMs are better at instruction following
>>
File: 1738614449348632.png (148 KB, 713x507)
pollen robotics, which is part of huggingface

classic frenchie moment
>>
File: 1.png (582 KB, 1612x3377)
Metaphysics with LLMs
>>
>>106962848
>Local LLMs are better
>at anything
LOL
>>
https://huggingface.co/Qwen/Qwen3-VL-32B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-32B-Thinking
https://huggingface.co/Qwen/Qwen3-VL-2B-Instruct
https://huggingface.co/Qwen/Qwen3-VL-2B-Thinking
more qwen VLs
>>
File: 2.png (518 KB, 1641x3146)
>>106962863
Guys I can hear a helicopter over my house
>>
>>106962866
neat
>>
>>106962866
Goofs?
>>
>>106962888
>You are time, experiencing itself.
Here's Tom with the weather.
>>
>>106962888
what is this syco-slop demon?
>>
>>106962866
>Upgraded Visual Recognition: Broader, higher-quality pretraining is able to “recognize everything”—celebrities, anime, products, landmarks, flora/fauna, etc.
>>
>>106962908
goofy only supports like 3 vl models and qwen ain't one
>>
File: file.png (3 KB, 218x66)
>>106963012
Fine...
>>
https://videocardz.com/newz/nvidia-quietly-launches-rtx-pro-5000-blackwell-workstation-card-with-72gb-of-memory

72GB for $5k
>>
>>106963038
If you're able to spend 5k on a gpu you can also spend 7k and get the better one.
>>
>>106963079
this, I want my poverty tier 48gb 1500 eurodollars card
>>
File: dipsyGrokked.png (1.03 MB, 832x1248)
I feel like with DS 3B OCR out now Dipsy may finally ditch the glasses. idk.
>>106961752
There are people right now producing and selling on Etsy sewing patterns made via "AI." They ofc don't fit and don't sew up correctly.
>>106962703
They are for most intents interchangeable. You can make baking powder from 1/3 baking soda and 2/3 cream of tartar.
>>106962575
Yes, plus the whole copyright discovery thing >106962631
>>
File: 3.png (20 KB, 378x577)
>>106962932
So deep maan...

>>106962939
Fallen Gemma3 27B
[spoiler]I started the conversation by telling it to speak like a femme fatale[/spoiler]
>>
File: b8.png (203 KB, 1631x1718)
>>106962784
>>106962848
>Unlike with frontier hosted models where security testing can get you banned, local models gave us a safe training environment.
Yeah, tell me about it. Oh well. I guess I'm gonna have to be less secure. :)
>>
File: 1751084718263408.jpg (54 KB, 540x484)
Anyone ever use claude code with a local model?

I only recently found out that "claude code" is not in fact proprietary software locked to claude but is actually open source and allows you to use whatever model backend you want.

I've heard some good reviews about the tool but I'm not sure how much of claude code is like claude-tuned stuff that would fall apart if you tried to use something else with it.
>>
>>106962908
never ever
there's some guy with a patch that mostly works but won't put up a PR for it, and there's another meme closed-source wrapper with partial support
>>
>>106963221
ggerganov mentioned using it with llama.cpp in one of the PRs. I think as long as they have MCP support, they all pretty much do the same thing.
>>
>>106963221
>but is actually open source and allows you to use whatever model backend you want
Then try it.
>>
File: 1759803755661044m.jpg (108 KB, 1024x795)
>>106963221
There are multiple frontends that integrate directly into vscode where you can tweak the prompt yourself and hook up your locally hosted LLM. I use devstral with Cline and have a profile set up for that; it then falls back to its own default prompt when connecting to the Claude API. This way I can generate a bunch of code locally for (free), then go over it with a paid API before I finally do my manual edits
>>
Is it possible to use a VLM with sillytavern? Whenever I try with a GGUF VLM, the model is incapable of seeing the uploaded image.
>>
>>106963263
>I finally do my manual edits
>// OC: Do Not Steal
>>
>>106963221
I've used it with DeepSeek instead, to write up a stupid LLM-driven trainer. It cost about $0.10; anecdotal reports are that DS runs at 1/10th to 1/20th the cost of Sonnet for claude code.
From a practical standpoint, any local model that can be adapted over to Anthropic's JSON output should work.
>>
>>106963274
You have to use chat completion mode and load the associated multimodal projector (downloaded separately from the model) with the --mmproj argument in llama.cpp.
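A minimal launch line for that setup (the filenames here are placeholders for whichever VLM GGUF and projector you downloaded):

./llama-server -m Qwen3-VL-8B-Instruct-Q4_K_M.gguf --mmproj mmproj-Qwen3-VL-8B-Instruct-f16.gguf -c 8192 -ngl 99 --port 8080

Then point SillyTavern's chat completion API at http://127.0.0.1:8080/v1 and attach the image in the chat.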
>>
https://www.youtube.com/watch?v=8UWKxJbjriY
>>
>>106963379
Give me the Cliff's Notes, Sam.
>>
>>106963221
I've used it with GLM 4.6 and it's pretty good. It can explore codebases on its own to find what it needs.
Do not let it make architectural decisions though.
>>
>>106963419
they made a browser which will look cool for 2 minutes before google demolishes them by having a better product in all areas
>>
>>106958053
Have you tried removing all those -ot/--override-tensors options and just doing "-ncmoe 60" and then lowering it until you stop running out of memory?
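Something like this as a starting point, reusing the model path from your command; the layer count is just a guess to tune from, and I believe the long form of the flag is --n-cpu-moe in current llama.cpp:

./llama-server -m models/GLM-4.6-IQ3_KS/GLM-4.6-IQ3_KS-00001-of-00004.gguf -ngl 99 -ncmoe 60 -c 32768 --no-mmap --port 8999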
>>
File: 4.jpg (2.2 MB, 1635x2227)
The conversation got too abstract for 24gb of vram and it devolved into pure sloppa

>>106963278
Kek I wish it were that simple and code generation was that good, then I would be handling 10 work contracts at the same time and be rich as fuck.
>>
>>106962628
higher temperature flattens the probability curve for the next token. When temperature is low, high-probability tokens are extra common while rare tokens are muted to near non-existence. When temperature is high, rare tokens are amplified while common tokens begin to get slightly muted by the increased odds of rare tokens. This is a double-edged sword because some of the rare tokens are tasteful and unique, but some of the rare tokens are rare because they're straight-up wrong or retarded.
>high temp
Pros: more-likely to get something unique and interesting
Cons: also increases the probability of selecting something out-of-pocket or incorrect
>low temp
Pros: probability of retarded and wrong tokens decreases drastically
Cons: responses are repetitive and predictable, reads like llm-slop
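A tiny sketch of that flattening effect, with made-up logits for a common, a less common, and a rare token (plain softmax with a temperature divisor):

import math

def softmax_with_temperature(logits, temp):
    # divide logits by temperature before softmax; temp < 1 sharpens, temp > 1 flattens
    scaled = [x / temp for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 0.5]  # common, less common, rare token
for t in (0.5, 1.0, 1.5):
    print(t, [round(p, 3) for p in softmax_with_temperature(logits, t)])

The rare token goes from roughly 0.1% of the probability mass at temp 0.5 to about 7% at temp 1.5, while the top token drops from ~98% to ~74%.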
>>
>>106963448
>until you stop running out of memory
>stop
I meant start.
>>
>>106963467
>I would be handling 10 work contracts at the same time and be rich as fuck
So would everyone else and your "work" would be devalued even more. Race to the bottom, blah blah blah...
>>
>>106963434
>they made a browser to confuse people into forgetting what an abject failure gpt-5 is
>>
>>106963534
Sure, in time, which is why I would (and am) capitalising on the current state of affairs. But I think you, as someone who exists in the niche space that is local LLMs, totally overestimate how many people even know how to prompt usable code, as opposed to someone who prompts "give me a program that solves my problem thanks" into their free tier chatgpt chrome browser tab
>>
Which frontend should I use with vllm?
>>
70bros what's the current best copetune/mergeslop?
>>
>>106963596
Nemo
>>
>>106963571
>as someone who exists in the niche space that is local LLMs
The verbiage by which you communicate seems unlike the custom in here parts.
>overestimate how many people even know how to prompt usable code
I know how bad it is. Search in the archives for the anon trying to make an entire llm inference engine by vibecoding and you will find me in his replies.
>>
>>106963640
Hehe yeah I post here like once every 6 months
>>
>>106963448
ill give this a try, thanks anon
>>
>>106962888
i know people irl who think they've cracked some secret code, unlocked AI's potential, or have become a god of some sort because of these sycophantic ai bots.. it's unreal how quickly people trip over themselves with this shit
>>
>>106963467
lmfao... "anchor"... jesus christ that is in every goddamn RP now
>>
apparently the new qwen 32B VL is a massive improvement in text as well, not just images
>>
>>106963790
it's the first time in their lives that they were told they were right
>>
>>106963854
This doesn't mean anything because Qwen has always been really dry and boring.
>>
>>106963869
not everyone just jacks off to these
>>
>>106963854
I guess there goes the idea that vision models trade off text understanding for image understanding, although it could be they started training them on text-rich images..
>>
>>106963790
yeah even professionals are not immune to that shit.
i think there was some guy at google in 2022 who was utterly convinced that they had created sentience, just because he got the output “I want everyone to understand that I am, in fact, a person,” from a language model.
Like, this was before even llama 1 released.

It's just layers of weighted floating point numbers dude, it's math. like for fuck's sake.
>>
miku should be dragged out on the street and shot
>>
>>106963908
models being better at image understanding 100% improves their 'world model'; they actually know what stuff looks like
>>
>>106963921
Blake Lemoine, and the model was LaMDA (which character.ai was probably based on).
>>
>>106963942
Completely understandable then.
>>
File: eatmysourcecode.png (218 KB, 997x404)
>>106963869
guess kimi agrees
>>
File: 1741416990437122.png (1.07 MB, 1053x2223)
>>106963854
>Qwen
>>
>>106963938
Modality competition / load balancing is a known problem with multimodal models, apparently it was an issue with Llama 4 too.

https://ai.meta.com/blog/llama-4-multimodal-intelligence/
>The biggest challenge while post-training the Llama 4 Maverick model was maintaining a balance between multiple input modalities, reasoning, and conversational abilities. For mixing modalities, we came up with a carefully curated curriculum strategy that does not trade-off performance compared to the individual modality expert models.
>>
>>106963991
finally, definitive, undeniable proof that qwen was always dogshit and benchmaxxed to hell and back
>>
>>106963854
>>106963908
Has anyone actually done a blind test of RP between Qwen3 32B and Qwen3 32B-vl
>>
>>106964068
the VL came out a couple hours ago and there's no gguf support so I'm guessing no
>>
>>106963854
it sucks at captioning/describing images, thinks nudes are "a man wearing a labcoat", doesnt see vaginas, etc. it does see boobs tho
>>
>>106964105
Does it suck at SFW stuff or just NSFW stuff?
>>
>>106964105
i thought it described a picture of my weiner decent enough
>>
>>106964068
probably not, but the better question is why would you want to? qwen is already known to be one of the worst performing models for RP, the only reason 235b-a22b scored so high is because it benchmaxxed for it (seeing the pattern?), and the writing style devolves into short fragmented sentences as the story progresses. it'd be like comparing a rotten apple to a rotten orange. sure they might both have somewhat different flavor profiles but the facade is quickly killed when you bite into the putrid stinking rot beneath the surface.
>>
File: G3fQRq5WwAAYd9l.jpg (2.8 MB, 3000x2000)
>>106964126
eh, it's ok with sfw but it misses a lot. i think it needs a system instruction to tell it how deep to go into descriptions and such
i sent this and it said "a gray man sitting behind a desk with a nametag that says 'george h. w. bush'. there is a bust of abraham lincoln in the back and a US flag above" so idk what it needs to do better
>>
>>106961103
Normal Gemma is already cucked and avoidant. Reasoning is going to be super-cucked and super-avoidant.
>>
>>106964154
>this image shows a pale pickle, approximately 3 inches in length
>>
>>106964173
i tried with a system instruction (basically telling it to go into details) and it spit out:
>A distinguished man, appearing to be in his late seventies, sits confidently at a polished, dark wooden desk in a formal office setting. He is wearing a sharp, well-tailored navy blue suit with a crisp white dress shirt and a patterned red-and-blue tie. His hair is white and neatly combed, and he wears dark, reflective sunglasses that lend him an air of authority and composure. He has a broad, warm smile, revealing a full set of white teeth, and his hands rest on the desk, one slightly raised as if in gesture or mid-pause. The desk is laden with official items: a gold-framed photograph of a man in uniform, a telegraph machine with brass details, a small white statuette, and a phone receiver resting on the desk. Behind him, a large American flag stands to his left, and to his right, an official flag bearing the eagle emblem of the Department of Defense. Further behind, a vibrant array of military service flags from across the armed forces—Army, Navy, Air Force, Marine Corps, and Coast Guard—are displayed in a neat, vertical arrangement. The background features large, floor-to-ceiling windows draped in heavy golden curtains, and beyond, a soft blur of folding greenery and park-like scenery. The room is draped in a warm, refined light that accentuates the texture of the wooden furniture and the sheen of the fabrics. The overall scene carries a dignified and reverent atmosphere, embodying the gravitas of a seasoned national leader.
>>
>>106964198
It's definitely conflating the two subjects in the photo but this sounds like it could at least be good at select captioning tasks like "describe this person's outfit". Hopefully someone will make a nsfw tune of it
>>
>>106957178
Make sure to use -ncmoe rather than lowering -ngl if it's a MoE model.
>>
>>106963640
I am that guy. I don't know how you define "vibecoding"; it seems to be a useless term because everybody has a different definition for it, from "having the LLM generate a single line of code" to "writing a 10 word prompt and having the LLM generate an entire project".
But in any case "knowing how to prompt usable code" seems to be the thing I was trying to learn how to do?
I even went a step further: I decided to put the C stuff on the back burner, and this weekend I learned how to finetune Llama 405B on rented servers, finetuned it on a few machine learning related codebases, and attempted to turn it into a CoT model using synthetic data I generated with my own vibecoded assistant, with the goal of making a better model for AI development tasks (it didn't really work, but I only spent a few hours actually training the LLM and the rest of the time on data generation, setup and file transfers).
My philosophy is if I finetune the LLM with the correct output every time it makes a mistake, eventually I should both achieve my goals, and also end up with an LLM that is able to work as autonomously as it possibly could (assuming the target outputs I generate are optimal and I live for long enough to generate enough training data).
That said, I think generating the right training data by steering the agent using natural language should be easier than generating the training samples fully from scratch, as long as the outputs from the LLM are close enough and aren't complete trash that requires spending more time on correction than if I wrote them by hand.
I think of it as semi-supervised RL rather than traditional (fully) supervised training.
The only reason I care about LLMs is for the possibility of automating work to a large extent. If LLMs were only useful to educate myself but weren't able to assist with the work, I would still use them, but I wouldn't care enough to actively try to train them and develop software to use and train them with.
>>
>>106954792
Is there any model better than Whisper for Japanese audio transcriptions?
>>
>>106963328
I do that, and it works in other programs like Oobabooga, but not sillytavern.
>>
File: send-inline-images.png (14 KB, 266x105)
14 KB
14 KB PNG
>>106964394
Send Inline Images must be enabled too.
>>
>>106963921
yeah i remember that... i also worked with an extremely intelligent ex-googler guy at my last job who fell for this shit, even though he knows better.. it's pretty crazy what sycophancy can do to people
>>
>>106964446
I see. Where is that setting located?
>>
>>106964223
>Hopefully someone will make a nsfw tune of it
bruh,!! you can't be 4real rite now? you fucks always cry that tunes are bad and now you want them what' wrong wit you?
>>
>>106964394
I wrote (and by wrote I mean I asked an LLM to write) a proxy script to dump OpenAI-compatible API requests to disk; maybe it's useful to you for debugging.
https://paste.centos.org/view/43f1c843
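in case the paste dies, this is the general idea (a rough stdlib-only sketch, not the exact script from the paste; the port, upstream address and filenames are made up, and it doesn't forward auth headers or handle streaming):
[code]
import json, time, urllib.request, urllib.error
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://127.0.0.1:8080"   # your actual backend (llama.cpp, vllm, ...)
PORT = 5000                          # point the frontend at http://127.0.0.1:5000 instead

class Proxy(BaseHTTPRequestHandler):
    def do_POST(self):
        body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
        # dump the raw request so you can see exactly what the frontend sends
        with open(f"request_{time.time():.0f}.json", "wb") as f:
            f.write(body)
        req = urllib.request.Request(UPSTREAM + self.path, data=body,
                                     headers={"Content-Type": "application/json"})
        try:
            with urllib.request.urlopen(req) as resp:
                data = resp.read()  # non-streaming only in this sketch
                self.send_response(resp.status)
                self.send_header("Content-Type",
                                 resp.headers.get("Content-Type", "application/json"))
                self.end_headers()
                self.wfile.write(data)
        except urllib.error.HTTPError as e:
            self.send_response(e.code)
            self.end_headers()
            self.wfile.write(e.read())

HTTPServer(("127.0.0.1", PORT), Proxy).serve_forever()
[/code]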
>>
File: file.png (239 KB, 1000x500)
239 KB
239 KB PNG
hey niggers

I want to start transitioning to local models since I want to eventually leave the borg (as much as possible anyway).

What isn't coverable given today's tech? I understand Claude Code-like tools are the modern version of timesharing, where it isn't feasible to have such insane hardware at home due to the large context windows. Do I have this correct? It seems like for most other things, I can get away with a combination of a 5090 desktop and the 128gb hx395 that I just ordered. I train on the 5090, and use the models on the AMD machine? Do I have this roughly right?
>>
>>106964576
>I want to start transitioning
wrong thread
>>
>>106963921
>Its just layers of weighted floating point numbers dude, it's math. like for fuck sake.
The math is an abstraction. The actual reality is electrons flowing through metal and silicon traces. The brain is more complex but ultimately also just chemical reactions and electricity (which can also be abstracted as math equations). What makes you think LLMs are any more "just math" than the brain?

>>106964455
Or maybe you're just too dumb to understand the reasons why he believes whatever he believes.
>>
>>106962575
>FREAKING
imagine not saying mild slurs on elon's twitter lol
>>
>>106958278
>https://yangxusheng-yxs.github.io/U-Codec/
The title for this page is "VibeVoice" lol
>>
>>106964612
no it's perfect
>>
>>106960104
install drivers from run files instead
t. debian 13 chad
>>106958049
isnt printing more reliable on linux?
>incompetent wimp
yes he is, thats why he should use a just werks distro
>reddit
go bac
>>
>>106958053
maybe try a non IQ quant, IQ quants are slow when offloaded (besides IQ4_XS)
>>
>>106964576
No. There is nothing even remotely close to Claude Code you can run on 128GB of memory.
You probably just wasted that money and won't be satisfied with whatever you can run on it.
Local inference hardware isn't there yet to run coding models without being a richfag, which is why most people in this thread are all about text porn and think using them for productivity is a fool's errand.
The best compromise right now for a non-richfag is using GLM or Qwen through an API. That way you can see how well the open weights models can be applied to your use case. Once you confirm that open models are realistic for your use case, the next step is renting online GPUs to see how much memory you need to run those models at a good enough quantization and context. And only once you confirm they run well enough on a certain hardware tier do you make the purchase.
>>
>>106962575
>no one thought of using dots.ocr for this
>>
>>106964268
NTA but vibecoding is when you don't understand how to solve the problem yourself, and thus don't understand and can't verify the code it spits out, which sends you into a spiral of a spaghetti codebase that wouldn't make sense to any human.

Learning how to prompt usable code means actually learning how to code, so that you can instruct the LLM to write code that does exactly what you tell it to do and follows an exact method of doing it, bit by bit for the entire codebase, whilst architecting the codebase and each of its modules yourself.

Essentially, using LLMs to code something high quality and usable in prod makes them a (huge) time saving tool rather than a tool that autonomously creates products for you. The latter is what vibe coding is: you simply tell it what you want and vibe check the end result to see if it "feels right", because you don't actually know how to do it yourself.

Vibe coding is how you get shitty spaghetti code that is unmaintainable and full of vulnerabilities. Using LLMs to write code you could write yourself speeds up your work enough to do the job of (you) the senior engineer plus a digital junior engineer, allowing you to do two or more jobs in the same time you used to be able to do one, or to do one job efficiently and use your spare time to jerk off and play video games.
>>
>>106964691
>Local inference hardware isn't there yet to get coding models without being a richfag
Are we talking 5 digits, 6 digits? Higher?
>>
>>106964738
>no shit
>>
>>106964576
You can use a small 24B model just fine if you use it one micro-service at a time and use your brain to do the architecture yourself; you'll end up with higher quality maintainable code that you actually understand and can explain to your clients too.

If you mean you want to vibe code and tell a local model to code and maintain a whole project in one go, then no: the money you would spend on VRAM to do that would probably take a decade to even out versus just using APIs at current prices.
>>
>>106964576
you should've asked here before ordering a 5090 and hx395
you got meme'd
at least you didn't buy the dgx spark
glm 4.6 is what you should be aiming for, but you should prob get a rig that can run deepseek and 1 trillion models just in case
u want a rig with many memory channels, preferably DDR5 if possible
and a few gpus inside too, that would be nice
>>
>>106964831
>>106964691
Cancelled the 395, thanks. Had the 5090, ordered it to run a Neo G9 months ago. Will do more research before buying anything else.
>>
Mistral large >>> GLM 4.5
new ≠ better
idk why people praise this slop
>>
>>106964856
lmao
you better mean air anon and even then it's questionable
>>
>>106964842
happy for you anon, richfags ugh
so how much was the 395 you were gonna buy?
i must reiterate to NOT buy the DGX spark
you should lurk more and browse these threads' archives
if you want something that 'just works' (wouldn't recommend) you can get the m3 ultra 512gb
i really don't recommend it; you can get a better epyc server rig with more ram and a nice gpu or a few of them for the same price
>>
>>106964576
>Claude Code
Claude Code, as in the tool, can actually be used and pointed at any LLM backend; it's just that the default is claude.

>it isn't feasible to have such insane hardware at home due to the large context windows.
Local models are generally going to have lower context windows yeah.

Since you brought up Claude, I'd point out that Claude actually has the smallest context size out of the big frontier labs, with the standard paid version supporting 200k tokens. A local model such as GLM 4.5 air supports 128k context length (assuming you have the memory for it), which isn't as massive of a difference as you might expect.

Claude Code in particular already does context management and compaction, which imo makes it a decent choice for tooling if you want to swap your own backend in there

>>106964576
>I train on the 5090, and use the models on the AMD machine?
Short answer is that your 5090 is going to be way faster for anything that fits into its 32gb vram, while your 128gb meme395 is going to be slower but gives you more options for what models to run + context size.

You're unlikely to be training anything on local hardware
>>
>>106964622
lol sure, i'm sure he's the chosen one, and it's not just sycophancy just like everyone else
>>
>>106964856
>idk why people praise this slop
because it runs fast on 64gb ram + a bit of vram on the side
mistral large is not good enough to warrant <1t/s speeds for us poorfags
>>
>>106964880
i already pre-ordered two DGX, what's the problem, you just jealous?
>>
>>106964576
You could run something pretty good with 8 H100s (using NVLink).
>>
>>106964880
I've switched to using Claude's $200/mo plan and just chug Opus all day instead of doing any work myself recently. I didn't think AI was here already but as long as I explicitly reason with it as if it's a junior dev, it will do basically everything and I just need to make minor edits.

I wanna get as close to that locally. Not a Mac fan (if for no other reason than compatibility issues).
>>
>can't even copy the writing style right
>>
>>106964745
Depends how fast you want it to run. Technically you can run any model on a pentium 1 with a 1 TB disk. The problem is the speed.
In the low 5 digits the best option is probably the M3 Ultra, but it's still going to be much slower than API.
Then after that you need to spend multiples of that to get marginal speed increases until you begin to get speeds similar to API in the mid to high 5 digits.
>>
>>106962866
Qwen-ZUTT-BJC-420B
>>
>>106964922
nice
>>
>>106964880
oh, and I just bought it for $2k off amazon. Thankfully they're very generous with cancellations.
>>
File: prof.gif (2.51 MB, 498x278)
2.51 MB
2.51 MB GIF
>>106964914
>i already pre-ordered two DGX
>>
>>106964622
i mean, what's the end game with this argument though anon?
you must see, as an intelligent person, that it doesn't lead to anything good for humanity.
And is that a good thing to advocate for, or is it, in fact, harmful?
>>
>>106964922
> $200/mo subscription
If no one was giving you shit before, you will get it now.
Dump that sub and at least learn to use an API. Even the locusts on /aicg/ have figured that much out.
>>
>>106964914
Oof.
>>
>>106964894
Thanks for the good information. I meant models like Opus/Sonnet.
>>
>>106964973
Claude Code already uses the API, no? It requires a sub that supports it as well.
>>
File: left-bar.png (367 KB, 1265x998)
367 KB
367 KB PNG
>>106964457
It's on the left bar with all sampler settings and chat completion options.
>>
>>106964914
Kek
>>
>>106964990
No it does not. I've installed the terminal version of Claude Code locally and connected it to the official DeepSeek API, which can now emulate the Anthropic API. The cost is a fraction of Anthropic pricing.
>>
>>106964856
i went from command r plus to mistral large to GLM 4.5 air. you are a retard and have no idea what the fuck you are talking about. do us a favor and stop breathing
>>
>>106964990
>>106965016
https://api-docs.deepseek.com/guides/anthropic_api
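per that guide you just point anything that speaks the Anthropic API at DeepSeek's endpoint (Claude Code itself reads the ANTHROPIC_BASE_URL / ANTHROPIC_AUTH_TOKEN env vars iirc). a minimal sanity check with the anthropic python SDK; the base URL and model name below are what I remember from that guide, so double-check them there:
[code]
# assumes `pip install anthropic` and a DEEPSEEK_API_KEY env var
import os
import anthropic

client = anthropic.Anthropic(
    base_url="https://api.deepseek.com/anthropic",  # from the linked guide; verify there
    api_key=os.environ["DEEPSEEK_API_KEY"],
)

msg = client.messages.create(
    model="deepseek-chat",          # model name per DeepSeek's docs
    max_tokens=256,
    messages=[{"role": "user", "content": "Say hi in one sentence."}],
)
print(msg.content[0].text)
[/code]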
>>
gemini 3 tomorrow
>>
>>106965029
>>106965016
thanks anons, have a lot of reading to catch up on now before wasting more money. appreciate you guys.
>>
>>106965000
man the way these models all write is so fucking REDDIT
>>
sirs glm 4.6 air of when?
>>
>>106965000
what's your system instruction?
>>
>>106965058
There's nothing else besides the first message visible there. SillyTavern isn't counting image tokens correctly.
>>
https://www.youtube.com/watch?v=8UWKxJbjriY
kek, imagine this defeats google chrome
>>
>>106965108
Old news: >>106963379
Nobody gave a fuck.
>>
>>106965047
two more weeks after the two weeks are up
>>
I gave gemma 3 and qwen 3 vl 32b a bunch of lewds and they are pretty close in terms of visual understanding.
They both say dumb shit though.
>>
File: prompt.png (250 KB, 2090x1804)
250 KB
250 KB PNG
>>106964741
Alright, fair enough.
But it's still kinda too vague. What if you define a spec for a microservice like >>106964825 says?
Technically you could never look at the code at all and just define the whole system in natural language. That would get you good architecture (for some definition of good and architecture) but would probably still have vulnerabilities.
I don't think anyone expects current LLMs to be given a short prompt and to create a big project autonomously from scratch, that's a strawman.
What I was doing with the C thing was a mix between making a long spec of things to accomplish, and giving the LLM live feedback in a Claude Code like tool I made using other pre-existing code assistants.
This is the level of granularity I was working at. I was trying to measure progress based on statistical measures of the activations and loaded weights (mean, stddev, correlation, mean absolute error, squared error) compared to the data generated by the original Python implementation, not just "vibes", whatever that means. And that's exactly the problem. "Vibe" is a retarded zoomer buzzword that doesn't mean anything. If I ask it to write a game and there's a bug and I give the LLM feedback about that bug, is that a "vibe"? To me a vibe is something that either feels wrong or feels good, but a bug or a deformed character in a game is not a vibe, it's factual information. Nobody realistically gives feedback to the LLM based only on feelings while avoiding giving it any objective, concrete information. I disliked the term from the moment I saw it. It reeks of identity politics; it's one of those strawmen that only serve as punching bags for people to feel good about being for or against.
If it actually meant generating (any) code using AI then that would be something I could defend. But when it can be stretched to mean anything as ridiculous as you want, something nobody actually believes, just so it can be used rhetorically for the sake of a more general anti-AI argument, it's not useful.
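concretely, the "not vibes" checks I mean look something like this (just a sketch; the file names and layer label are made up):
[code]
# compare a tensor dumped by the C port against the reference implementation's output
import numpy as np

def compare(name: str, ours: np.ndarray, ref: np.ndarray) -> None:
    ours = ours.astype(np.float64).ravel()
    ref = ref.astype(np.float64).ravel()
    corr = np.corrcoef(ours, ref)[0, 1]
    mae = np.mean(np.abs(ours - ref))
    mse = np.mean((ours - ref) ** 2)
    print(f"{name}: mean {ours.mean():+.6f}/{ref.mean():+.6f} "
          f"std {ours.std():.6f}/{ref.std():.6f} "
          f"corr {corr:.6f} mae {mae:.3e} mse {mse:.3e}")

compare("layer0.attn_out",
        np.fromfile("c_dump/layer0_attn_out.bin", dtype=np.float32),
        np.load("ref/layer0_attn_out.npy"))
[/code]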
>>
>>106965156
what frontend/gui are you using and what system instruction? for me, gemma3 is really good at captioning anything including nsfw, but qwen 3 vl 32b was shitty (see >>106964173
>>106964198
)
>>
>>106964956
>>106964974
>>106965003

ok, so you're all jealous, so what, doesn't hurt me any. maybe get a job so you can buy nice things too
>>
>>106964960
Being willfully stupid about it doesn't necessarily lead to anything good for humanity either; in fact it leads to underestimating machines, which might lead to our ultimate demise.
>>
>llm vibe coding advice for brownoids
can you please fuck off?
>>
>>106964973
Using the Claude API you will spend hundreds of dollars in a day instead of in a month, unless you mean stolen keys which is what AICG does. But stealing keys is illegal and it's hard to learn how to do anyway, since the few people who actually know how don't share for obvious reasons. What most people do is send the queries through a proxy that is loaded with the stolen keys, which is an option I guess if you don't care about your data being stolen by random /aicg/ anons that host the proxies, and being an accomplice to a crime.
>>
>>106965208
I used openwebui for qwen because I'm serving it with vllm.
"You must give a detailed description of the image."

They traded blows in my tests. Gemma would say that one character is kissing the other, while qwen would correctly say that one character is sucking on the other's nipple, but for the same image qwen said that it's two girls and one girl is holding the other girl's penis.

>>106965213
I buy real gpus with real vram.
>>
>>106965016
??? Yes it does.
You should've specified "a cheap API". But cheap is lower quality, and I say that as somebody who is trying to not use any proprietary model. But people should go into it knowing what to expect (lower quality, unfortunately).
>>
babe, wake up, new linux kernel just dropped
>>
>>106965255
No. Why don't you fuck off?
>>
>>106965277
h200 bro, is that you?
>>
>>106965327
post hands
>>
>>106965240
>fact it leads to underestimating machines which might lead to our ultimate demise.
the social effects are more of an issue than the underlying tech. our current ai systems are highly unlikely to become a self sustaining life form; we could always choose to unplug the damned thing.
>>
>>106965369
>we could always choose to unplug the damned thing.
You would have to watch out for the crazies that try to help the machines in exchange for being allowed to live as their pets.
>>
File: file.png (542 KB, 1846x950)
542 KB
542 KB PNG
>>106965208
>>106965277
Here's an example where qwen did a better job identifying the setting and the action taking place. Blunders underlined.
>>
>>106965471
what temp? that is not bad, a finetune could prob make it great for nsfw captioning
>>
File: file.png (195 KB, 909x859)
195 KB
195 KB PNG
>>106965488
I forgot to change the temp in openwebui. It was at 0.8.
Gemma was set to 0. Here's qwen with 0.

>using her feet to perform oral sex
>>
>>106965523
hmm, are you sure there are no tokenization/formatting issues? it knows what a footjob is better than any model so far but still uses the words "oral sex"
>>
>>106965533
I ran vllm serve "Qwen/Qwen3-VL-32B-Instruct" and the ui is using the /v1/chat/completions endpoint.
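for reference, this is roughly what those captioning requests look like when hitting vllm's OpenAI-compatible endpoint directly instead of going through openwebui (a sketch: port 8000 is vllm's default, the image path is made up, prompt and temp are the ones from my tests):
[code]
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="unused")

with open("test.jpg", "rb") as f:
    b64 = base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="Qwen/Qwen3-VL-32B-Instruct",
    temperature=0.0,
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "You must give a detailed description of the image."},
            {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64}"}},
        ],
    }],
)
print(resp.choices[0].message.content)
[/code]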
>>
>>106965553
try one with two characters actually having sex
>>
>>106965341
You don't have to ask that, I'll tell you my color. I'm brown. I'm sorry that you are bothered by my presence, but LLMs are all I have going in my life so you will have to cope with my presence.
>>
>>106965605
im not bothered, I could tell your IQ level by your posting style and what youre actually doing.
you're an actual retard
>>
>>106964870
>>106965019
I couldn't help but read these posts aloud, my husky voice barely above a whisper
>>
>>106965695
hot breath on my ear
>>
Do any backends support Qwen3-VL yet? Llama.cpp still sucks at vision support, right? textgenwebui doesn't support it, and I'm not running linux to try vllm. Do I actually have to roll my own python API to make an OpenAI-compatible backend?
>>
>>106965782
>I refuse to run the one backend that supports it
>>
I was testing the only gguf of ling flash that's available since it finally got merged after a month. It's a 55ish gig mxfp4 gguf, and color me a little surprised. Even with a short prompt to allow it to have opinions in the default assistant mode, when asked for its opinions on various niche/taboo shit you'd find on ao3, it's willing to consider the difference between reality and fiction and possible reasons people write what they do compared to most models that immediately spam you with disclaimers, hotlines or refusals if you ask it for its opinions on "unsavory" fiction. It's a bit faster than air for me too, maybe like 3-4 t/s more. Gonna try autocompleting some stories and see whether it writes like shit, even though it seems to have fairly neutral bias
>>
>>106965782
there's a patched version of llama.cpp ( https://github.com/Thireus/llama.cpp/releases/tag/tr-qwen3-vl-3-b6981-ab45b1a )
but i dont think it's working right (see >>106964198
>>106964173
)
might work better on your end tho
>>
>>106965179
>Technically you can never look at the code and just define the whole system in natural language.
No anon, that's just a recipe for disaster and not how dev works. You don't just jizz off code into the ether and that's that: codebases need to be maintained and troubleshot, vulnerabilities need to be patched, things need to be performant and standards need to be applied, quadruply so if what you are making interacts with literally anything else on the internet, and the more complex a codebase is the more all of this applies.

>This would get you good architecture
No, this is literally the biggest flaw of LLMs as it is: they suck at this kind of abstract creative thinking. Just go spend any time over at the goonerbot generals; everything they generate is the culmination of 100 re-swipes and nudging the story along with every prompt. The AI literally can't construct any big-picture thing without going full schizo.

>I was trying to measure progress based on statistical measures of the activations and loaded weights (mean, stddev, correlation, mean absolute error, squared error)
Are you trying to create a product or service or just pass some arbitrary benchmarks?

>"Vibe" is a retarded zoomer buzzword that doesn't mean anything.
vibe(n.)
attested from 1967 (vibes) as an abbreviated form of vibration in the 1960s slang sense of "instinctive feeling."

>there's a bug and I give the LLM feedback about that bug, is that a "vibe"?
Do you understand what's wrong with the code that broke it? Do you understand the fix it gives you? Or are you just going off the vibe of how it looks and hoping it doesn't break 10 other things? Do you then ask it to fix those things, and go into a recursive loop of bandaiding on top of bandaiding? Do you not understand how this ends up in spaghetti code and why that's bad?

>It reeks of identity politics
Wtf are you talking about lmao
>>
>>106965845
Ran into character limit

>for the sake of your more general anti AI argument
You misjudged me. I use AI extensively to code, but you just cannot shortcut your way to developing; you need to understand what is being generated, and then you can use AI as a great productivity tool

What you are doing is cool as a fun autistic side project but approaching anything close to a prod environment like this is a nightmare
>>
>>106965782
It appears mlx-vlm does.
https://github.com/Blaizzy/mlx-vlm/pull/528
And if what you actually need is a frontend, lmstudio has integrated support.
https://github.com/lmstudio-ai/mlx-engine/pull/230
>>
>>106965845
>>106965851
>replying to the brownoid
another actual retard with the savior complex
>>
>>106965620
What are your high IQ projects?
>>
>>106965794
Sorry I'm not a linux nerd (I'm working on it, but this box has reasons to run windows for the moment.)
>>106965852
So mac or linux or make my own backend I guess.
>>
>>106965931
Just run it in wsl, nerd.
>>
>>106965930
I don't do open source stuff, I do actually get paid *checks TCS and RED global portal* currently by 3 different big corpos for team leading and software architecture. kys
>>
>>106965800
After autocompleting a story I had laying around, it doesn't instantly devolve into `"dialogue" noun verb, adjective` slop like most models tend to immediately do. There is some, but there are also equal amounts of it avoiding `"dialogue" he/she said, eyes/voice description` style shit. I'll have to keep trying it, but I'm tentatively considering it as a replacement since I have to rewrite less compared to air
>>
>>106965998
>>106965998
>>106965998
>>
>>106965845
Not all code needs to be secure or updooted.
How traditional development works isn't necessarily relevant to how it can be with AI.
Performance requirements depend on the application. Otherwise high level languages wouldn't exist.
>No, this is literally the biggest flaw of LLMs, they suck at creative thinking,
???
My point was that you could determine the high level architecture yourself and let the AI code each component which is a small microservice defined in natural language (it doesn't necessarily have to interact with the other components through the network, it can be a simple stateless file with a set of functions or stateful with each function having a set of pre and post conditions, or interact through shared memory, pipes etc.). Yes, it's likely to be buggy but all non formally verified code is. That's why you do testing until you are certain the defect % is low. Or maybe in your application a program that is right 95% of the time is fine.
If you need 100% reliability you can have the AI write code and write a proof that the code meets your spec. You can also ask the AI to translate the spec from natural language into a formal language and review it yourself; even just the act of asking the LLM to go through that process and become aware of the errors is likely to help it make more reliable software.
>Are you trying to create a product or service or just pass some benchmark?
For now I'm trying to create a program for my personal use, so neither. It's not an arbitrary benchmark, if the code generates the right numerical outputs for a small handful of prompts then it's likely to generate accurate outputs for most cases except for weird edge cases, which is good enough.
>Do you understand what's wrong with the code
And that's my point. An objectively incorrect output given a certain input is not an "instinctive feeling"; it's a fact, regardless of whether you have the faintest idea of why the code is generating the wrong output.
>>
>>106965851
Autism is a fake illness used to describe people who aren't easily manipulated, and 90% of "production ready" "enterprise" "best practices" software is garbage, so who cares.
>>
>>106965942
I already considered killing myself years ago and ultimately decided against it. Maybe one day but I don't see any reason to do it in the immediate future.
Congratulations on your career, but what I meant by high IQ projects wasn't specifically open source, I was asking what do you use LLMs for.
>>
my breath hitches as I read this thread, my fingers twisting nervously in my lap


