/g/ - Technology

File: HEN SHIN.jpg (179 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108447705 & >>108441758

►News
>(03/24) GigaChat 3.1 released: https://hf.co/collections/ai-sage/gigachat-31
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: what's in the box.jpg (235 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108447705

--Vision models failing deformity edge cases:
>108452331 >108452376 >108452412 >108452429 >108452466 >108452484 >108452523 >108452546 >108452616 >108452385 >108452409 >108452509 >108452607 >108452626 >108452841 >108452845 >108452849 >108452867
--Xeon 6 LLM inference benchmarks debated over AMX optimization gaps:
>108448422 >108448451 >108448886 >108449507 >108451237 >108452095 >108450210 >108452136
--Nvidia Nemotron reasoning challenge puzzles:
>108448817 >108448837 >108448859 >108449204 >108449216 >108448873 >108448945
--Direct-io PR discussion and gemma3 loading failures:
>108451404 >108451435 >108451499 >108451511 >108451525 >108451530 >108451534 >108451515
--Skepticism toward TurboQuant's 2-bit quantization claims:
>108450002 >108450011 >108450054 >108451136 >108450065 >108451386 >108450294
--Qwen 3.5's niche use cases and performance tradeoffs debated:
>108450432 >108450443 >108450488 >108450499 >108450517 >108450519 >108450534 >108450554 >108450571 >108450589 >108450599 >108450615 >108450634 >108452303
--Optimal context window sizes for coding tasks:
>108451293 >108451325 >108451838 >108451330 >108451306 >108451406 >108451432
--Exploring LLM integration for dynamic NPC interactions in ASCII games:
>108447855 >108447871 >108447952 >108447980 >108448029 >108448043 >108448103 >108448045 >108448058
--TurboQuant claims 6x memory reduction and 8x speedup with zero accuracy loss:
>108451313 >108451431 >108451594 >108451872
--PocketTTS.cpp achieves GPU-like CPU inference speeds:
>108451512 >108451553 >108451556 >108451562
--GigaChat-3.1-Ultra Russian model released with DeepSeek architecture:
>108448539 >108448567
--Step3.5 MTP support PR for llama.cpp:
>108450936 >108451133 >108451275
--Miku, Luka, and Dipsy (free space):
>108450983 >108452704 >108448061 >108452647 >108452749

►Recent Highlight Posts from the Previous Thread: >>108447707

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
uoh yeah
>>
>>108453575
cum in between her thighs
>>
4 boobs prove the dog test was ass
>>
You are going to support Israeli-American chip design by buying the Intel® Arc™ Pro B70, aren't you?
>>
>>108453655
If the cost per GB of VRAM is good, yes.
Otherwise, no.
>>
>>108453652
this is horrific, jesus
>>
>>108453655
*Judeo-Christian chip design
>>
>>108453655
Does Intel have their own thing like zluda yet?
>>
>>108453655
redpill me on the b70. i hear their software stack is the real issue. just how bad is it, and feasibly how long might it take for it to catch up?
>>
>>108453570
artist?
>>
>>108453733
noobai piloted by the local autist
>>
Outside of stepfun, which I'm too poor to run, shit feels pretty stagnant of late
Shoulda gone for the 128GB sticks after all
>>
File: 1762969619953363.png (21 KB, 684x75)
>>108453655
No
>>
>>108453755
Cheaper than a 5090.
>>
>>108453794
yeah..
>>
>>108453794
It's not competing with 5090s though, it's competing with V100s or AMD R9700s.
>>
this was the least noisy format I could come up with for formatting 4chan threads. I just re-serialized the IDs and skipped the names and dates. is this good enough?
>>
File: gemma4.jpg (537 KB, 1264x1737)
Will my gemma-4 27B be very light on my gpu?
>>
>>108453929
It's clean and should be fine. You could also format it as "No.1" instead of square brackets and put a newline before the comments to make it more recognizable as 4chan threads to the models.
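The "No.1" re-serialization suggested above is only a few lines against the raw thread JSON. A minimal sketch (the 'posts'/'no'/'com' field names come from the public read-only API; the HTML stripping here is deliberately crude and drops formatting like greentext spans):

```python
import html
import re

def serialize_thread(thread_json):
    """Flatten a thread.json payload into plain text, one 'No.N' header
    per post, with a blank line between posts."""
    lines = []
    for post in thread_json["posts"]:
        com = post.get("com", "")                  # comment body, as HTML
        com = re.sub(r"<br\s*/?>", "\n", com)      # <br> -> newline
        com = re.sub(r"<[^>]+>", "", com)          # drop remaining tags
        com = html.unescape(com)                   # &gt; -> > etc.
        lines.append(f"No.{post['no']}\n{com}\n")
    return "\n".join(lines)
```

Quote links (the `>>108...` anchors) survive as plain text after unescaping, which is probably what you want for training data.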
>>
>>108453570
>cunny holding onahole
Sounds redundant.
>>
>>108453951
Yes! All that's left is to get gemma 4 onto your GPU.
>>
>>108453951
thou shan't redeem le gemma
>>
I'm still laughing at the dog test from the previous thread.
Can't wait to be replaced by an LLM because I couldn't see the fifth leg on an image with four legs visible at best...
Will the labs start benchmaxxing on the Dog Shit Vision Test if we mention it enough times? Like with mesugaki?
>>
>>108453961
>You could also format it as "No.1" instead of square brackets and put a newline before the comments to make it more recognizable as 4chan threads to the models.
yup, I'll buy it.
>>
>>108454005
https://arxiv.org/html/2505.23941v1
imagine being this proud of being an ignorant pajeet coming to defend muh vision model lady
>>
>>108453951
>KV cache quantization
That's basically irrelevant. All the new models use so little memory for KV cache.
>>
>>108454053
>All the new models use so little memory for KV cache
even with the current efficiency gains there's no way we'll get to 1M locally without some aggressive form of quant
>>
>blacklist "guttural"
>model starts writing "gutteral"
>blacklist "gutteral"
>model starts writing "gutural"
Come on now these are not even words.
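The whack-a-mole is visible even in a toy version of the blacklist: a string-level ban can only match spellings you've already seen, so each new misspelling sails through. A minimal illustration (pure string matching, nothing model-specific):

```python
def violates(text, banned):
    """Return True if any banned spelling appears in the text (case-insensitive)."""
    t = text.lower()
    return any(word in t for word in banned)

# The two spellings blacklisted so far in the post above.
banned = {"guttural", "gutteral"}

assert violates("a guttural groan", banned)       # known spelling: caught
assert violates("a Gutteral moan", banned)        # known spelling: caught
assert not violates("a gutural growl", banned)    # next invention: not caught
```

Token-level bans (antislop samplers, logit bias) have the same structural problem, just one layer down: you ban token sequences, and the model finds a different tokenization of roughly the same sound.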
>>
>>108454079
The power of the embedding space.
>>
blacklist, antislop sampler, grammars, all of that was always a total cope. the LLM always wants to fit the square pattern into the round box and life finds a way.
>>
>>108454079
>bl*cklist
denylist*
>>
>>108454079
Why should only meatbags be allowed to make typos?
>>
File: file.png (54 KB, 860x581)
>>108454034
Haha yeah. (What vision model lady?)
Anyway, if you ever wonder about the state of Israel (they are lobbied to hell and have no need to return anything), pic related is Israel controlling the United States.
>>
extremely organic posting
>>
File: 1749508287708464.png (63 KB, 1080x500)
https://xcancel.com/arcprize/status/2036860080541589529#m
lawl
>>
>>108453951
So will I be able to run 72b instead of 12b in the near future
Yes I am stupid but answer the question please
>>
>>108454064
I just tried loading Qwen3.5 397B with yarn and it needs 31GB for 1M context. That's local.
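The 31GB figure is easy to sanity-check against the standard KV-cache formula: K and V each store n_layers × n_kv_heads × head_dim values per token. A sketch (the example shape below is an assumed generic GQA config, NOT Qwen3.5's actual architecture; models with MLA-style latent caches come in far smaller, which is how you land at ~31GB for 1M):

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elem=2):
    """Standard GQA KV-cache size: the factor 2 is for K plus V,
    bytes_per_elem=2 means an fp16 cache."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elem

# Illustrative shape: 64 layers, 4 KV heads, head_dim 128, fp16, 1M context.
gb = kv_cache_bytes(64, 4, 128, 1_000_000) / 1024**3   # ~122 GB
```

Running the same formula backwards from 31GB tells you the effective per-token cache footprint a model actually has, which is the number that decides whether 1M local context is realistic.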
>>
>>108454132
>moving goalposts
>>
>>108454133
It's just for the kvcache.
>>
>>108454132
>one thousand dollars
>0.035%
wew lads
>>
>>108454133
For that you want BitNet.
>>
>>108454132
This just in: random word generator bad at understanding 2d environments until it's added to the training data
>>
>>108454153
It's been two years already. Where are the 72b bitnets?
>>
>>108454132
> Thinking it is playing another game
> Holding on to early hypothesis
yup, that's vision models in a nutshell.
very big on assumptions, very very overfit to the image datasets they were trained on.
I'm glad a mainstream benchmark went this route. I bet if they ever make even a minor alteration to their benchmark it will keep throwing LLMs off and reveal that the emperor never had any clothes to begin with and all pretense of generalization was a lie.
>>
it's not coming this week either, is it?
>>
>>108454191
big week for Gemma 4
>>
an employee leaked that the new deepshit would be much bigger than the previous, then removed his post on chinese social media
methinks all this ebegging for ds is going to turn into sour ewhining quick
>>
>>108454210
I'm a big boy I can handle it.
Also source?
>>
File: yeah right.png (346 KB, 3404x746)
>>108454132
kek, get pwned Jensen
https://www.theverge.com/ai-artificial-intelligence/899086/jensen-huang-nvidia-agi
>>
>>108453699
Pytorch is actually mostly fine and stable, except for memory stats reporting on anything newer than the A-series cards. As long as you can get stuff working there, you can get transformers and ComfyUI working, and easily hack up anything else to get a good experience. That's the benefit of going mainline over IPEX, which was a hack in the first place: they started during Pytorch 2.5, and now at Pytorch 2.10 Intel's backend is pretty good there. However, for anything lower level your only real choices for multi-GPU inference are https://github.com/intel/llm-scaler (their fork of vLLM) or, with mainline stuff, Vulkan with llama.cpp and other forks, since the SYCL backend is half baked, OpenVINO is not mature, and ipex-llm has been abandoned since last year so it's outdated for the newer models. ik_llama doesn't support SYCL, as that was what caused the whole debacle in the first place, and Vulkan is an afterthought there; I have no clue about the other forks, but I think at least kobold.cpp works too. There's some stuff around OpenVINO but none of it is really mature yet. That is the real issue with Intel: the lower-level software, where you really want to squeeze out the juice, isn't there. But for ComfyUI and other stuff at the Pytorch layer of things, it might be fine.
>>
>>108454210
2T or 3T?
I can almost run 2T at 1.X bit
>>
File: never ending loop.png (898 KB, 1080x1084)
>>108454132
they will train their model on those test and say they reached AGI, then AGI 4 comes and destroys everything, and then they will train their model on...
>>
What's the best model to use as a Claude Code substitute that fits in 128GB?
>>
>>108454319
Minimax 2.5
>>
>>108453227
What boards did you choose to scrape?
>>
>>108454268
Such is the case with chasing benchmarks. Hopefully these companies don't just keep doing this for the rest of our lives, haha... lol...
>>
>>108451136
>>108450054
>>108451431
ok seriously guys, you have to explain why BitNet is bad; I am using its techniques as a core component of my models
>>
>>108454331
/g/ /pol/ /sci/ /lit/ /his/ /tg/ /out/
do you have any recommendations? /pol/ seems to move the fastest; it looks like I'm going to have to do some sampling so it doesn't dominate.
>>
>>108454347
They only need to keep up the charade for a few more quarters until they can cash out and let it collapse
>>
>>108454366
Depends on what you're trying to achieve
>>
>>108454364
some dude on discord said it'll never work because it makes training insanely expensive, source his asshole of course, and some here took that screenshot as gospel truth despite it contradicting the original paper itself
>>
>>108454381
on the other hand the paper is obviously gospel...
>>
>>108454381
as opposed to random /lmg/ dude and microsoft jeet saying it works when literally not a single soul in the industry is making a model with it, in an era where hardware costs are going up the wazoo, compute is being limited even in strong AI labs (qwen guys complained about lack of access to compute) and everyone would very much like a model compression technique that actually worked
you are all deluded ai psychotics, bitnet is the fruit of years of coping and zero production
>>
>>108454364
>I am using its techniques as a core component of my models
As in "bitnet" (ternary 2.x bit quantization) or a model trained that way like the actual bitnet?
>>
>>108454404
>qwen guys
even said they'd try bitnet at some point before qwen3
>>
File: 1518376309230.png (3 KB, 279x237)
>>108453570
How do you even describe this bodytype to an AI without explicitly requesting it to generate loli porn?
>>
>>108454417
Clearly they tried and found it's shit
>>
>>108454434
really really wish they'd have openly said so, then we could have buried this meme for good
>>
>>108454393
The paper reported the training costs for actually training bitnet and full precision models as nearly equivalent. You are free to reproduce their experiments with a small model and make a name for yourself by calling Microsoft out for publishing fraudulent papers.
>>
>>108454434
yeah in this industry it's rare for people to openly call other people's work outright shit, if they gave it a try and it's shit the silent treatment is the most likely outcome.
>>
>>108454432
"Compact"
>>
>>108454432
This is /lmg/. Why is explicitly requesting lolis an issue?
>>
>>108454442
Nah people would be like "real BitNet has never been tried"
>>
the bitnet meme will survive as long as jeets have access to the internet and dream of running a llm on their 50 bucks phone
>>
>>108454432
short and petite? height? small or nearly nonexistent tits? permanently stuck in pre-bloom? come on man
>>
>>108454366
>do you have any recommendations
Well, would you be able to scrape the archive sites like Desu or NotArch? Then you wouldn't need an active userbase and could scrape years' worth of activity.
Also I would get rid of /pol/, pretty sure almost all the posts over there are already bots.
>>
avocado blt rise up
>>
>>108454480
I'm waiting for the news any day now that Meta is cancelling Avocado and firing everyone involved
>>
>>108454452
I was more puzzled by how one is able to request the absolute embodiment of sex that this bodytype is from a generative model without invoking any sexual connotations when formulating a prompt.
>>
Cloud models are down but I don't know if we can benefit from this somehow.
Why aren't we benefitting?
>>
>>108454372
idk really, it was a bit of a lark. I was reading a thread and thought it would make good training data, so I figured I'd see how hard it was to scrape 4chan, and it turns out they have a really simple and free API, so it was way easier than expected. I got that board list by asking Claude which boards are more text-driven; given the source is an imageboard and the target is a fucking text model, that seemed like the main consideration. But idk, I still need to do some test shots on those images and see if I can get a model to annotate them accurately and quickly enough. Even if I don't annotate every image on every thread, a few here and there might still be useful training data. I'm not trying to achieve anything specific really; I don't expect the data to improve any benchmarks. It's just a little bit of fun I guess, see what happens.
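For anyone else curious, the "really simple and free" API is just static JSON over HTTP, no keys. A sketch of the read path (the a.4cdn.org endpoint shapes are from the public read-only API; board "g" is just an example, and their rules ask you to keep it to roughly one request per second):

```python
import json
import urllib.request

CATALOG = "https://a.4cdn.org/{board}/catalog.json"
THREAD = "https://a.4cdn.org/{board}/thread/{no}.json"

def fetch_json(url):
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

def thread_numbers(catalog):
    """catalog.json is a list of pages, each holding a 'threads' list."""
    return [t["no"] for page in catalog for t in page["threads"]]

# Usage sketch:
#   for no in thread_numbers(fetch_json(CATALOG.format(board="g"))):
#       posts = fetch_json(THREAD.format(board="g", no=no))["posts"]
```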
>>
>>108454477
If he goes for archive sites, he can keep pre-2016 /pol/
>>
>>108453575
don't look at me like that, it makes me hard
>>
>>108454504
>ollama
>>
>>108453575
delicious tummy and plump thighs is back
>>
>>108454502
Are you proposing a creative writing exercise or are you unaware of diffusion models trained on booru tags?
>>
>>108454366
/v/. Despite being a videogame board, practically everything gets discussed there at one point or another.
>>
>>108454079
I vaguely remember a model writing sory when I banned sorry. When it really wants to say something, it will try to find a way.
>>
the current spam of irrelevant garbage is why we should ask mossad to kill brittle
>>
>>108454504
>no pantyshot
dude wtf
>>
>>108454366
>pol seems to move the fastest
just add a +100 logit bias to the word jew, same result
>>
>>108454512
I just checked the two sites mentioned, desu and notarch, neither has it. does anyone archive /pol/?
>>108454532
I added it to the list
>>
deepseek ocr just got merged in llmao
ocr2 soon
>>
>>108454563
4plebs and archived.moe
>>
what's the pro of ocr vs image to text
>>
Lmao.
>>
>>108454408
I'm training a tiny model exactly like this yes
>>108454458
>>108454442
>>108454447
>>108454434
>>108454417
>>108454404
>>108454393
>>108454381
thank you everyone for the conversation, I'll be stopping the training promptly then shooting myself in the face. After that I will just train a tiny qwen3 and stop thinking that I'm smart.
>>
>>108453929
>random religionfag manifesting into the imaginary thread
Yeah, that's about right, alongside the /pol/ termites that need to chew their way into any space possible because even they find their board intolerable
>>
>>108454528
Just playing dumb and daydreaming of more tangible labels than just artist names I guess.
>>
>>108454613
>I'm training a tiny model exactly like this yes
That's sick. Keep going.
>>
>>108454582
Image to text is OCR (Optical Character Recognition). I'm not entirely sure what you're asking.
If you want to figure out the layout of a book or a site, a graph, whatever, you need more than just the text. Nowadays "OCR" is used more generally to mean "the model kind of understands images", which includes transcribing text in images.
>>
>>108454638
so the "ocr" in ds is the same capability as the image to text in qwen 3.5 for example?
if I show it something outside of text, will it be able to describe it?
>>
>>108454635
it's not, actually, because it has been performing worse than a fucking bitmamba ternary model I've been trying, and that shit was made by a Brazilian. It's been driving me nuts for a week; I thought I was doing something wrong. That's why when that anon mentioned BitNet being ass I wanted to know why
>>
>>108454653
Can you do a write up of exactly what you tried and how badly it performed? Either someone might be able to help you fix it, or it might shut people up asking about bitnet every other thread.
>>
>>108454645
>if I show it something outside of text, will it be able to describe it?
I'm not sure. But if it does, it's probably fairly limited. Seems like an experimental model. llama.cpp just committed compatibility for DS's first OCR model. Give it a go. It's a small model.
https://github.com/ggml-org/llama.cpp/pull/17400
The one for DSOCR2 is being worked on as well.
https://github.com/ggml-org/llama.cpp/pull/20975
>>
>>108454709
thanks anon, that's what I meant then, "ocr" is specifically to decode text, not generally do image to text
>>
PSA: Save your cum. This week will be huge.
>>
>>108454757
>This week will be huge.
Longer than 7 days? Damn. How things change.
>>
>>108454757
Are you saying... it is going to cost as much as RAM?
I better start stockpiling. Thank you insider-anon!
>>
>>108454576
that'll work, the API is less generous but they have data dumps, I can just download the full thing. text is only 89gb

https://archive.org/details/4plebs-org-data-dump-2026-01
>>
>>108454829
I lied, it's actually broken out so it's even easier to manage. I think I might download /x/ too, could be fun.
>>
https://www.claudescode.dev/?window=since_launch
you can see some pretty funny instances of ai psychosis in the projects with the most commits and most lines added.
https://github.com/synaptent/aragora?tab=readme-ov-file
>Individual LLMs are unreliable. Their personas shift with context, their confidence does not correlate with accuracy, and they often optimize for plausible agreement instead of truth.
>Aragora treats that as a systems problem. It coordinates heterogeneous models through structured debate and review, preserves receipts and provenance, and stops truthfully when evidence is insufficient. The goal is not just faster AI output, but governed AI-assisted execution you can actually inspect.
>>
>>108453684
judeo*
>>
>>108454875
>pure vibecoded repo
yikes
>>
File: tq.png (198 KB, 1100x821)
vulkan: add TQ3_0 (TurboQuant) 3.5-bit KV cache quantization
https://github.com/ggml-org/llama.cpp/pull/21010
>>
>>108454929
shutdown already lol
>>
>>108454929
Aaaaand, it's closed.
>>
if only wilkin received the same treatment.
>>
>>108454504
stop posting these they're retarded.
>>
>>108453813
V100s are EOL with key features missing and R9700s still don't have good Pytorch support with issues like https://github.com/ROCm/ROCm/issues/5674 and https://github.com/ROCm/ROCm/issues/6007 still unresolved months after the fact. People will gamble on the B70 because of that.
>>
so what's the current meta for ERP? is it STILL Mistral Nemo?
>>
Is cloode down?
>>
>>108451495
Sure, though it's half vibecoded and then fixed by me without cleaning or anything, so I should clean it up a bit. When I release it I'll send the github link here.
>>
File: file.png (123 KB, 1386x515)
>https://arxiv.org/pdf/2603.19664
Isn't this even better than cache quantization?
>>
>>108455122
>We verify this empirically: under greedy decoding, generating 30 tokens with and without the cache yields 100% token identity across all six models tested (four architecture families, 135M to 4B parameters).
This testing is not enough. Why only 30 tokens? Why not 128 to 1024 tokens? We need to see if the equivalence holds the more tokens you generate.
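Worth noting: for vanilla attention the KV cache is a pure speed optimization, so with-vs-without greedy identity holds essentially by construction; the interesting question is exactly the one above, whether whatever the paper actually drops stays faithful over longer horizons. A toy single-head model makes the baseline equivalence concrete (everything here, shapes and weights, is made up for illustration and has nothing to do with the paper's models):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D = 16, 8                          # toy vocab size and model width
E = rng.normal(size=(V, D))           # tied embedding/unembedding matrix
Wq, Wk, Wv = (rng.normal(size=(D, D)) for _ in range(3))

def last_logits(tokens):
    """Full recompute: causal single-head attention, logits at the last position."""
    x = E[tokens]                                  # (T, D)
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q[-1] @ k.T / np.sqrt(D)
    w = np.exp(scores - scores.max()); w /= w.sum()
    return (w @ v) @ E.T

def greedy(prompt, n_new, use_cache):
    toks = list(prompt)
    ks, vs = [], []                                # the KV cache
    for _ in range(n_new):
        if use_cache:
            for t in toks[len(ks):]:               # only new tokens hit Wk/Wv
                ks.append(E[t] @ Wk)
                vs.append(E[t] @ Wv)
            scores = (E[toks[-1]] @ Wq) @ np.array(ks).T / np.sqrt(D)
            w = np.exp(scores - scores.max()); w /= w.sum()
            out = (w @ np.array(vs)) @ E.T
        else:
            out = last_logits(toks)                # recompute everything
        toks.append(int(out.argmax()))
    return toks[len(prompt):]

# greedy(p, n, True) and greedy(p, n, False) agree token for token.
```

So a 30-token identity check only becomes evidence of anything once the method deviates from exact caching; until then it's true for free.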
>>
>>108455122
Sounds too good to be true, also
>x is all you need
Gay. But if it's real then I'd like to see it applied to a real usable model
>>
>>108455147
Because it won't be relevant for us if it's 1000 tokens per second because the average person ain't rich
>>
>c.ai dead
>Claude dead
>OpenRouter banning users left and right
Local WONNED
>>
>>108455199
nta, but what the fuck are you on about?


