/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108447705 & >>108441758

►News
>(03/24) GigaChat 3.1 released: https://hf.co/collections/ai-sage/gigachat-31
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108447705

--Vision models failing deformity edge cases:
>108452331 >108452376 >108452412 >108452429 >108452466 >108452484 >108452523 >108452546 >108452616 >108452385 >108452409 >108452509 >108452607 >108452626 >108452841 >108452845 >108452849 >108452867
--Xeon 6 LLM inference benchmarks debated over AMX optimization gaps:
>108448422 >108448451 >108448886 >108449507 >108451237 >108452095 >108450210 >108452136
--Nvidia Nemotron reasoning challenge puzzles:
>108448817 >108448837 >108448859 >108449204 >108449216 >108448873 >108448945
--Direct-io PR discussion and gemma3 loading failures:
>108451404 >108451435 >108451499 >108451511 >108451525 >108451530 >108451534 >108451515
--Skepticism toward TurboQuant's 2-bit quantization claims:
>108450002 >108450011 >108450054 >108451136 >108450065 >108451386 >108450294
--Qwen 3.5's niche use cases and performance tradeoffs debated:
>108450432 >108450443 >108450488 >108450499 >108450517 >108450519 >108450534 >108450554 >108450571 >108450589 >108450599 >108450615 >108450634 >108452303
--Optimal context window sizes for coding tasks:
>108451293 >108451325 >108451838 >108451330 >108451306 >108451406 >108451432
--Exploring LLM integration for dynamic NPC interactions in ASCII games:
>108447855 >108447871 >108447952 >108447980 >108448029 >108448043 >108448103 >108448045 >108448058
--TurboQuant claims 6x memory reduction and 8x speedup with zero accuracy loss:
>108451313 >108451431 >108451594 >108451872
--PocketTTS.cpp achieves GPU-like CPU inference speeds:
>108451512 >108451553 >108451556 >108451562
--GigaChat-3.1-Ultra Russian model released with DeepSeek architecture:
>108448539 >108448567
--Step3.5 MTP support PR for llama.cpp:
>108450936 >108451133 >108451275
--Miku, Luka, and Dipsy (free space):
>108450983 >108452704 >108448061 >108452647 >108452749

►Recent Highlight Posts from the Previous Thread: >>108447707

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
uoh yeah
>>108453575cum in between her thighs
4 boobs prove the dog test was ass
You are going to support Israeli-American chip design by buying the Intel® Arc™ Pro B70, aren't you?
>>108453655If the cost per GB of VRAM is good, yes. Otherwise, no.
>>108453652this is horrific, jesus
>>108453655*Judeo-Christian chip design
>>108453655Does Intel have their own thing like zluda yet?
>>108453655repdill me on b70. i hear their software stack is the real issue. just how bad is it and feasibly how long might it take for them to catch up?
>>108453570artist?
>>108453733noobai piloted by the local autist
Outside of stepfun, which I'm too poor to run, shit feels pretty stagnant of late
Shoulda gone for the 128gb sticks after all
>>108453655No
>>108453755Cheaper than a 5090.
>>108453794yeah..
>>108453794It's not competing with 5090s though, it's competing with V100s or AMD R9700s.
this was the least noisy format I could come up with for formatting 4chan threads. I just re-serialized the IDs and skipped the names and dates. is this good enough?
Will my gemma-4 27B be very light on my gpu?
>>108453929It's clean and should be fine. You could also format it as "No.1" instead of square brackets and put a newline before the comments to make it more recognizable as 4chan threads to the models.
>>108453570>cunny holding onaholeSounds redundant.
>>108453951Yes! All that's left is to get gemma 4 onto your GPU.
>>108453951thou shan't redeem le gemma
I'm still laughing at the dog test from the previous thread.
Can't wait to be replaced by an LLM because I couldn't see the fifth leg on an image with four legs visible at best...
Will the labs start benchmaxxing on the Dog Shit Vision Test if we mention it enough times? Like with mesugaki?
>>108453961>You could also format it as "No.1" instead of square brackets and put a newline before the comments to make it more recognizable as 4chan threads to the models.yup, I'll buy it.
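roughly what that ends up looking like, untested sketch (the post dict fields are just whatever my scraper already stores, not the raw API names):
```python
# Rough sketch of the "No.N" re-serialization: sequential ids instead of real
# post numbers, a newline before each comment, names and dates dropped.
def serialize_thread(posts):
    lines = []
    for i, post in enumerate(posts, start=1):
        lines.append(f"No.{i}")
        lines.append(post["comment"])   # assumed field name for the post body
        lines.append("")                # blank line between posts
    return "\n".join(lines)

# Example:
# serialize_thread([{"comment": "first post"}, {"comment": ">>1\nreply"}])
```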
>>108454034
https://arxiv.org/html/2505.23941v1
imagine being this proud of being an ignorant pajeet coming to defend muh vision model lady
>>108453951>KV cache quantizationThat's basically irrelevant. All the new models use so little memory for KV cache.
>>108454053>All the new models use so little memory for KV cacheeven with the current efficiency gains there's no way we'll get to 1M locally without some aggressive form of quant
>blacklist "guttural">model starts writing "gutteral">blacklist "gutteral">model starts writing "gutural"Come on now these are not even words.
>>108454079The power of the embedding space.
blacklist, antislop sampler, grammars, all of that was always a total cope. the LLM always wants to fit the square pattern into the round box and life finds a way.
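toy illustration of why (not any real backend's API, just made-up numbers): banning the exact string only deletes that one continuation, and the probability mass flows straight to the nearest misspelling.
```python
import math

def apply_blacklist(logprobs: dict[str, float], banned: set[str]) -> dict[str, float]:
    """Drop banned continuations and renormalize the rest."""
    kept = {tok: lp for tok, lp in logprobs.items() if tok not in banned}
    log_z = math.log(sum(math.exp(lp) for lp in kept.values()))
    return {tok: lp - log_z for tok, lp in kept.items()}

# Hypothetical next-word distribution after "a low, ___ growl":
logprobs = {"guttural": -0.2, "gutteral": -2.0, "gutural": -3.5, "deep": -2.5}
print(apply_blacklist(logprobs, {"guttural"}))
# "gutteral" now inherits most of the banned word's probability mass.
```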
>>108454079>bl*cklistdenylist*
>>108454079Why should only meatbags be allowed to make typos?
>>108454034Haha yeah. (What vision model lady?)Anyway, if you ever wonder about the state of Israel (they are lobbied to hell and have no need to return anything), pic related is Israel controlling the United States.
extremely organic posting
https://xcancel.com/arcprize/status/2036860080541589529#m
lawl
>>108453951So will I be able to run 72b instead of 12b in the near future?
Yes, I am stupid, but answer the question please
>>108454064I just tried loading Qwen3.5 397B with yarn and it needs 31GB for 1M context. That's local.
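for anyone wanting to sanity-check numbers like that, the standard GQA kv cache math is just this (all dims below are made-up placeholders, not the actual Qwen3.5 config; read the real ones off the gguf metadata):
```python
# Back-of-envelope KV cache size for a plain GQA transformer.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    # 2x for K and V; fp16 = 2 bytes/elem, q8_0 cache ~1, q4 ~0.5
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

gib = kv_cache_bytes(n_layers=60, n_kv_heads=4, head_dim=128, ctx_len=1_000_000) / 2**30
print(f"{gib:.0f} GiB")  # ~114 GiB at fp16 with these made-up dims;
                         # cache quant, fewer kv heads, or MLA is what gets it down
```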
>>108454132>moving goalposts
>>108454133It's just for the kvcache.
>>108454132
>one thousand dollars
>0.035%
wew lads
>>108454133For that you want BitNet.
>>108454132This just in: random word generator bad at understanding 2d environments until it's added to the training data
>>108454153It's been two years already. Where are the 72b bitnets?
>>108454132
>Thinking it is playing another game
>Holding on to early hypothesis
yup, that's vision models in a nutshell. very big on assumptions, very very overfit to the image datasets they were trained on.
I'm glad a mainstream benchmark went this route. I bet even a minor alteration of their benchmark will keep throwing LLMs off and reveal that the emperor never had any clothes to begin with and all pretense of generalization was a lie.
it's not coming this week either, is it?
>>108454191big week for Gemma 4
an employee leaked that the new deepshit would be much bigger than the previous one, then removed his post on chinese social media
methinks all this ebegging for ds is going to turn into sour ewhining quick
>>108454210I'm a big boy, I can handle it. Also, source?
>>108454132kek, get pwned Jensen
https://www.theverge.com/ai-artificial-intelligence/899086/jensen-huang-nvidia-agi
>>108453699Pytorch is actually mostly fine and stable, except for memory stats reporting on anything newer than the A-series cards. As long as you can get stuff working there, you can get transformers and ComfyUI working and easily hack up anything else to get a good experience. That's the benefit of going mainline over IPEX, which was a hack in the first place. They started upstreaming during Pytorch 2.5, and now at Pytorch 2.10 Intel's backend is pretty good there.
However, for anything lower level, your only real choices for multi-GPU inference are https://github.com/intel/llm-scaler (their fork of vLLM) or, with mainline stuff, Vulkan with llama.cpp and the other forks, since the SYCL backend is half baked, OpenVINO is not mature, and ipex-llm has been abandoned since last year so it's outdated for the newer models. ik_llama doesn't support SYCL (that was what caused the whole debacle in the first place) and Vulkan is an afterthought there. I have no clue about the other forks, but I think at least kobold.cpp works too.
So the real issue with Intel is the lower-level software: where you really want to squeeze out the juice, it's not there yet. But for ComfyUI and other stuff on the Pytorch layer of things, it might be fine.
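if you go the mainline Pytorch route, the smoke test is about this much (assuming a build with the upstream xpu backend that post is talking about):
```python
# Quick smoke test for mainline Pytorch's Intel GPU ("xpu") backend.
import torch

if torch.xpu.is_available():
    dev = torch.device("xpu")
    x = torch.randn(4096, 4096, device=dev, dtype=torch.float16)
    y = x @ x                      # matmul runs on the Arc card
    torch.xpu.synchronize()
    print(y.shape, torch.xpu.get_device_name(0))
else:
    print("no xpu device found")
```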
>>108454210
2T or 3T? I can almost run 2T at 1.X bit
>>108454132they will train their model on those tests and say they reached AGI, then AGI 4 comes and destroys everything, and then they will train their model on...
What's the best model to use as a Claude Code substitute that fits in 128GB?
>>108454319Minimax 2.5
>>108453227What boards did you choose to scrape?
>>108454268Such is the case with chasing benchmarks. Hopefully these companies don't just keep doing this for the rest of our lives, haha... lol...
>>108451136>>108450054>>108451431ok seriously guys, you have to explain why Bitnet is bad. I am using its techniques as a core component of my models
>>108454331/g/ /pol/ /sci/ /lit/ /his/ /tg/ /out/
do you have any recommendations? pol seems to move the fastest, so it looks like I'm going to have to do some sampling so it doesn't dominate.
>>108454347They only need to keep up the charade for a few more quarters until they can cash out and let it collapse
>>108454366Depends on what you're trying to achieve
>>108454364some dude on discord said it'll never work because it makes training insanely expensive (source: his asshole, of course), and some here took that screenshot as gospel truth despite it contradicting the original paper itself
>>108454381on the other hand the paper is obviously gospel...
>>108454381as opposed to a random /lmg/ dude and a microsoft jeet saying it works, when literally not a single soul in the industry is making a model with it, in an era where hardware costs are going up the wazoo, compute is limited even in strong AI labs (qwen guys complained about lack of access to compute), and everyone would very much like a model compression technique that actually worked
you are all deluded ai psychotics, bitnet is the fruit of years of coping and zero production
>>108454364>I am using its techniques as a core component of my modelsAs in "bitnet" (ternary 2.x bit quantization) or a model trained that way like the actual bitnet?
>>108454404
>qwen guys
even said they'd try bitnet at some point before qwen3
>>108453570How do you even describe this bodytype to an AI without explicitly requesting it to generate loli porn?
>>108454417Clearly they tried and found it's shit
>>108454434really really wish they'd have openly said so, then we could have buried this meme for good
>>108454393The paper reported the training costs for actually training bitnet and full precision models as nearly equivalent. You are free to reproduce their experiments with a small model and make a name for yourself by calling Microsoft out for publishing fraudulent papers.
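for reference, the weight quantization at the heart of b1.58 is tiny; from memory of the paper it's roughly this (a sketch, not the official code):
```python
# BitNet b1.58-style absmean ternary quantization: scale by mean |W|,
# round to {-1, 0, +1}. During training this sits behind a straight-through
# estimator over full-precision latent weights, hence the similar training cost.
import torch

def absmean_ternary(w: torch.Tensor, eps: float = 1e-5):
    scale = w.abs().mean().clamp(min=eps)
    w_q = (w / scale).round().clamp(-1, 1)
    return w_q, scale  # dequant is w_q * scale

w = torch.randn(8, 8)
w_q, scale = absmean_ternary(w)
print(w_q.unique(), scale)
```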
>>108454434yeah, in this industry it's rare for people to openly call other people's work outright shit. if they gave it a try and it's shit, the silent treatment is the most likely outcome.
>>108454432"Compact"
>>108454432This is /lmg/. Why is explicitly requesting lolis an issue?
>>108454442Nah people would be like "real BitNet has never been tried"
the bitnet meme will survive as long as jeets have access to the internet and dream of running a llm on their 50 bucks phone
>>108454432short and petite? height? small or nearly nonexistent tits? permanently stuck in pre-bloom? come on man
>>108454366>do you have any recommendationsWell, would you be able to scrape the archive sites like Desu or NotArch? Then you wouldn't need an active userbase and could scrape years' worth of activity.
Also I would get rid of /pol/, pretty sure almost all the posts over there are already bots.
avocado blt rise up
>>108454480I'm waiting for the news any day now that Meta is cancelling Avocado and firing everyone involved
>>108454452I was more puzzled by how one is able to request the absolute embodiment of sex that this bodytype is from a generative model without invoking any sexual connotations when formulating a prompt.
Cloud models are down but I don't know if we can benefit from this somehow. Why aren't we benefiting?
>>108454372idk really, it was a bit of a lark. I was reading a thread and thought it would have been good training data, so I thought I'd see how hard it was to scrape 4chan, and it turns out they have a really simple and free API, so it actually was way easier than expected. I got that board list by asking claude what boards are more text driven; given the source is an image board and the target is a fucking text model, that seemed likely to be the main consideration. but idk, I still need to do some test shots on those images and see if I can get a model to annotate them accurately and quickly enough. even if I don't annotate every image on every thread, a few here and there might still be useful training data. I'm not trying to achieve anything specific really, I don't expect the data to improve any benchmarks. it's just a little bit of fun I guess, see what happens.
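the read-only API really is that simple, roughly (public a.4cdn.org endpoints, error handling and HTML cleanup left out):
```python
# Minimal 4chan scrape via the read-only JSON API (a.4cdn.org).
import time
import requests

BOARD = "g"

catalog = requests.get(f"https://a.4cdn.org/{BOARD}/catalog.json").json()
thread_nos = [t["no"] for page in catalog for t in page["threads"]]

for no in thread_nos[:5]:                   # a few threads as a test shot
    thread = requests.get(f"https://a.4cdn.org/{BOARD}/thread/{no}.json").json()
    for post in thread["posts"]:
        # "com" is the HTML comment body; strip tags and re-serialize downstream
        print(post["no"], post.get("com", "")[:80])
    time.sleep(1)                           # API rules: at most ~1 request/sec
```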
>>108454477If he goes for archive sites, he can keep pre-2016 /pol/
>>108453575don't look at me like that, it makes me hard
>>108454504>ollama
>>108453575delicious tummy and plump thighs is back
>>108454502Are you proposing a creative writing exercise or are you unaware of diffusion models trained on booru tags?
>>108454366/v/. despite the fact that it is a videogame board, practically everything gets discussed there at one point or another.
>>108454079I vaguely remember a model writing "sory" when I banned "sorry". When it really wants to say something, it will try to find a way.
the current spam of irrelevant garbage is why we should ask mossad to kill brittle
>>108454504>no pantyshotdude wtf
>>108454366>pol seems to move the fastestjust add a +100 logit bias to the word jew, same result