/g/ - Technology

File: ai_server.jpg (1.23 MB, 1821x1490)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108742275 & >>108736046

►News
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1
>(04/28) Ling-2.6-flash 104B-A7.4B released: https://hf.co/inclusionAI/Ling-2.6-flash
>(04/28) Nvidia releases Nemotron 3 Nano Omni: https://hf.co/blog/nvidia/nemotron-3-nano-omni-multimodal-intelligence

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: ComfyUI_00182_.png (2.07 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108742275

--Optimizing dual RTX 3090 setup for Gemma 4 with speculative decoding:
>108743090 >108743104 >108743131 >108743530 >108743551 >108743361 >108743224 >108743589 >108743643 >108743838 >108743866 >108744035 >108744070 >108744217 >108744276 >108746395 >108744193
--Encoding images in text completion using llama.cpp server settings:
>108745965 >108745986 >108745995 >108746023 >108746145 >108746162 >108746153 >108746191
--llama.cpp PR adding DFlash support for speculative decoding:
>108743701 >108743736 >108743776 >108743804 >108743909 >108743958 >108743768 >108744535
--Qwen 2-bit performance testing with Canvas API image recreation:
>108743817 >108743856 >108743868 >108743922 >108743957 >108743946 >108743951
--Critique of modern AI's lack of initiative in roleplay:
>108743944 >108744192 >108745177 >108745200 >108745218 >108745240 >108745279 >108745235
--Gemma's logical inference of scenario cards and early RP LLM formats:
>108742385 >108742425 >108742466 >108742480 >108742499 >108742505 >108742558 >108742629 >108742653 >108742651 >108742666 >108742729 >108742594 >108742599 >108742936
--Comparing AI agents to manual copy-pasting for coding productivity:
>108746398 >108746438 >108746461 >108746727 >108746980 >108746758
--Repetitive "Let me write" reasoning loops in various models:
>108744796 >108744899 >108744920 >108744927 >108745667
--Debating the necessity of jinja templates over raw prompt formatting:
>108747743 >108747755 >108747803 >108747912 >108748008
--LLM and inference engine embedded within a .ttf font file:
>108743927 >108744507
--Inconsistency of Gemma 4's refusal vectors and censorship levels:
>108742306 >108742365 >108742490 >108742379 >108743700
--Logs:
>108742558 >108742594 >108743868 >108744345 >108744796 >108746216 >108747024
--Miku (free space):
>108746641 >108747847

►Recent Highlight Posts from the Previous Thread: >>108743862

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Give Dipsy support, llamaniggers.
>>
>>108749305
>exl
can you even run this in llama/kobold? isn't it appleshit?
>>
gemmaballz
>>
>>108749417
no, you're thinking of mlx. exl3 is for tabbyAPI, which serves an openai-compatible API just like kobold and llama
https://github.com/theroyallab/tabbyAPI
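here's a minimal sketch of what "openai compatible" buys you; the default port (5000) and the auth header are assumptions on my part, check tabbyAPI's config:

import requests

# hypothetical client: exl3 behind tabbyAPI looks like any other
# OpenAI-style completions server (kobold and llama-server expose the same shape)
resp = requests.post(
    "http://127.0.0.1:5000/v1/completions",           # default port is an assumption
    headers={"Authorization": "Bearer YOUR_API_KEY"},  # placeholder key
    json={"prompt": "The quick brown fox", "max_tokens": 32},
)
print(resp.json()["choices"][0]["text"])

swap the base URL and the same client code hits kobold or llama-server unchanged.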
>>
Lecun was wrong.
>>
>>108749467
Can you be wrong if you literally haven't done anything?
>>
>>108749467
I wish.
I want my local cat intelligence already...
>>
>>108749477
vjepa says hello
>>
File: 1752075319997188.png (795 KB, 677x609)
>>108749486
Holy shit it's really becoming a cat
>>
Remember when Netflix of all companies made JEPA real and nobody cared?
https://huggingface.co/netflix/void-model
>>
>>108747912
>If you aren't doing rag or tool calling I can't imagine a prompt template more complicated than what you do in ST.
but no image support on v1/completions
is that a llama.cpp only limitation?
>>
>>108749548
If you want to use images or other types of data in text completion mode, use the /completion endpoint and send the images or audio encoded as base64 in "multimodal_data"; they'll be tokenized and inserted into your prompt wherever you place the media markers:
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#post-completion-given-a-prompt-it-returns-the-predicted-completion

That's how you'd do it if you were coding your own frontend, anyway; not sure whether SillyTavern supports this sort of thing.
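
for reference, a minimal sketch of that flow, assuming the server is on :8080; I'm taking the "<__media__>" marker string and the list-of-base64-strings shape on faith, verify against the README above:

import base64
import requests

# encode the image and reference it with a media marker in the raw prompt
with open("image.png", "rb") as f:
    img_b64 = base64.b64encode(f.read()).decode()

resp = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "Describe this image: <__media__>",  # marker is replaced by the image tokens
    "multimodal_data": [img_b64],                  # field name per the README above
    "n_predict": 128,
})
print(resp.json()["content"])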
>>
>looks at you like you are a bug under a microscope
>a stain on the carpet
>a gum stuck to her shoe
>a specimen
What's the name for this slop phrase and how do I prompt it away?
>>
>>108749617
what’s wrong with similes?
>>
>>108749642
I already told it to remove similes, it's this specific one that doesn't go away. AI just can't express cold, indifferent characters without injecting this phrase.
>>
>>108749649
>>108749617
metaphors?
>>
>>108749649
yeah but, like, isn’t that good writing?
>>
>>108749649
try a regex to strip it from the gen message/history going forward, so the model can gen it, but it gets ripped out
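
something like this, as a hypothetical (the pattern and its phrasing variants are guesses, tune them against your own logs):

import re

# strip the "bug under a microscope / specimen" simile family from a gen
# before it gets written back into chat history
SLOP_RE = re.compile(
    r"like (?:you(?:'re| are) )?a (?:bug under a microscope|"
    r"stain on the carpet|gum stuck to (?:her|his|their) shoe|specimen)"
    r"[^.!?]*[.!?]?",
    re.IGNORECASE,
)

def scrub(text: str) -> str:
    return SLOP_RE.sub("", text).strip()

ST's regex extension can do the same thing without code if you paste the pattern there.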
>>
>>108749642
She looks you and Riley similes.
>>
>>108749658
Broccoli is good for you. Therefore you should eat 10 pounds of it every day.
>>
>>108749678
holy THIS
>>
What would be the ideal optical disc to store day 0 gemma weights on if I want them immutable? I thought we'd have some TB-scale discs by now but it looks like everyone just gave up after blu-ray? Did the guy who invents new discs die or something?
>>
A little birdie told me that openai is getting ready to release gpt-oss 2 and it's going to be a huge shift.
>>
>>108749808
I am expecting that given the case with Musk. Hopefully it's less censored this time.
>>
>>108749697
>optical pleb
https://en.wikipedia.org/wiki/Linear_Tape-Open
>>
>>108749808
more like huge shit
>>
>>108749824
>the case with Musk
What did I miss?
>>
>>108749907
>What did I miss?
tl;dr musk has been calling saltman on his bullshit for the last 2 years (i.e. turning openai into closedai and pillaging it)
>>
>>108749697
>if I want them immutable?
just set the immutable bit kiddo
>>
>>108749697
You'll have to find old floppies and move the little tab. There's no other way.
>>
>>108749959
??? But gemma makes me ERECT how can she fit on a FLOPPY disk?
>>
fyi everyone, ace step 1.5 xl is out and really pushes local music gen forward. aka suno at home.
>>
>>108749988
You said this last week, and it was shit last week. It'll be shit this week too.
>>
File: 1775829778315160.png (303 KB, 825x642)
>>108749808
>it's going to be a huge shift.
leaked image of gpt-oss-2
>>
>put porn collection into rag
>track fap and reviews with llm chats daily
>suggest porn for me today
why not?
>>
>>108750021
^ that's a bot, by the way.

Just testing the bots, as you were gents.
>>
>>108750074
hello bot-chama
>>
>>108750030
SLAM THAT BAD BOY UNTIL YOUR KEYS ARE ALL STICKY
>>
>>108749398
i was still using sonnet for opencode but since qwen 3.6 i'm now not using any cloud models any more.
>>
>>108749398
i was still using sonnet for opencode but since qwen 3.6 i'm now not using any cloud models any more.
>>
>>108750096
>>108750097
>>108750098
>>108750099
>>108750100
>>>/vg/vn
>>
>>108750110
yea i think it glitched, i didn't mean to post it 5 times and that's why i deleted most of them.
>>
>>108750114
no worries, I'm glad you left 2. it's always smart to have a backup :)
>>
>be /lmg/
>cry about moes all day
>mistral releases big dense model
>no talk about it
y'all are a bunch of hypocrites
>>
ace step 1.5 base xl, not using the "thinking" 5hz lm, but using the lm to set the tempo etc.

https://files.catbox.moe/9lz9tp.mp3
>>
>>108750138
Because mistral is EUROPEAN and the EUROPEAN laws PROHIBITS ai companies from making GOOD ai, so mistral medium is BAD!
>>
>>108750138
Why talk about it when even Mistral doesn't pretend that it's good? Its only selling point is that it uses an ancient backbone so the flops stay below EU regulation limits.
>>
>>108750152
We need to protect the EVROPEAN VALUES!
>>
>>108750138
Almost all of those posts were bait.
Maybe even your post is bait.
>>
>>108750138
>cry about moes all day
Moe vs dense is a meme fight, what matters is good models regardless of architecture
Mistral did not release a good model. They haven't for years
>>
>>108750138
bart just put his quants up a few hours ago. i'm not dumb enough to download the first unsloth uploads
>>
>gpt oss 2
God I can't wait desu. Look, Qwen isn't the worst thing ever, but if it's going to be censored and STEMmaxxed, I wish it were a bit more reliable and not so heavy on the repetitive, redundant looping thoughts.
>>
File: file.png (1.4 MB, 1024x768)
eggu
>>
File: file.png (93 KB, 1128x456)
>>108749933
Altman only needed one release and that's all we got. Why would he release another one, especially when it would eat into their resources, which are already constrained, and they were losing to Anthropic until recently? Sure, it would be nice to get another open source release, but I doubt it will happen because Musk is going to lose the lawsuit quite badly, which is unfortunate, since it would be nice to see both of them taken down a peg. Also, if they did it, it would guaranteed obsolete their mini and nano releases on the meme marks. I bet that's why they waited a while after releasing GPT-5 with mini models before attempting a mini series again. I think this is all we'll see of it until 2030/2031.
>>
File: file.png (1.47 MB, 1024x768)
>>
>>108750244
>not 7 and i holdings
ngmi
>>
File: file.png (1.82 MB, 1024x768)
>>
>>108750141
Here's your udio/suno at home:
https://files.catbox.moe/61cwlr.mp3
singing the news! FRESH news! :^)

It sounds so optimistic. Maybe news should always be delivered this way.
>>
>>108749988
>>108750141
>>108750275
What's the USE CASE? When TWO point FIVE percent of the population is DEAF? They need to focus on TEXT, which EVERYONE can read!
>>
>>108750141
how many steps for base?
>>
>>108750298
I did 100, because acestep.cpp is capped at 100 for no apparent reason.
>>
>>108750298
and, I'm using dcw at .001 / .001

if you use the audio codes aka 5hz lm, you will get more squared up and more consistent results, but it's less dynamic.
>>
>>108749467
It's just that JEPA as intended by LeCun won't work with text in any useful or even meaningful manner. It does with images and video, and that's what he's pushing, but people interface with computers and other people primarily via language and its rules.

Planning text with video would be astronomically inefficient, so we can shove that idea aside. Predicting sentence embeddings from other sentence embeddings in order to convert them to intelligible text (the closest thing to an actual text version of JEPA) doesn't even work as there would be no intelligible solution to interpolate between two continuous text embeddings (unlike image frames).

LeCun lost the plot, no wonder he basically got kicked out of Meta.
>>
File: IMG-20260328-WA0005(4).jpg (882 KB, 832x1248)
Can You Invest in Cultured and Perfected These? And Have Your System not stall it from Pharmacy andor Prescription?
>>
>>108750287
Uhh, sweaty? Blind people can't read. And no, they don't automatically know Braille.
>>
>>108750122
i wanted to delete all of them but 4chan went "muh you are deleting posts too fast".
>>
>>108750346
BLIND people can just use text to speech ACCESSIBILITY readers
>>
>uncensored-heretic-abliterated-turboquant-opus-distilled-nvfp4-gguf
>>
I'm pretty new to this, am I doing this right?

I downloaded gemma-4-31b-it-the-deckard-heretic-uncensored-thinking-i1 and lm studio, using a 3090. Feels like I can say 8-10 things to it before my context length fills up and I have to get rid of earlier messages. And because the model is as big as it is, my vram is near capacity, so I can't increase it any further. If I have stuff in my system prompt, I have even less context length to work with, and the system prompt is the thing you use to keep any consistency and something of a history, right? I feel like I'm doing something wrong here.
>>
>>108750366
>one (1) singular 3090
:skull: bruh
>>
>>108750383
I have a degree in computer science and the year is 2026. Do you think I could afford something better?
>>
>>108750366
You can try to QUANTIZE the kv cache, but GEMMA 4 31B is quite HEAVY with its kv cache memory footprint, especially on a SINGLE 3090. Have you tried using LLAMA-CPP to keep the kv cache on CPU ram instead of the gpu ram? It will be SLOWER though.
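
a sketch of both options as llama.cpp flags (model path, context size, and quant are placeholders; quantized V cache may also require flash attention (-fa) depending on your build):

llama-server --model gemma-4-31B-it-Q4_K_S.gguf --n-gpu-layers 99 --ctx-size 16384 --cache-type-k q8_0 --cache-type-v q8_0

or, to keep the whole kv cache in system ram at a speed cost:

llama-server --model gemma-4-31B-it-Q4_K_S.gguf --n-gpu-layers 99 --ctx-size 16384 --no-kv-offload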
>>
>>108750366
drag context up anyways, deal with slower speed.
>>
>>108750366
Are you already using the llama.cpp argument --parallel 1
?
>>
>>108750366
llama-server.exe --model "H:\gemma4\gemma-4-31B-it-Q4_K_S.gguf" --parallel 1 --kv-unified --threads 8 --ctx-size 43000 --n-gpu-layers 99 --no-mmap --port 8080 --jinja -b 4096 -np 1 --swa-checkpoints 3 --reasoning off --override-kv gemma4.final_logit_softcapping=float:25.0
>>
is there a real use case where you need gemma 31b over 26b a4b?
>>
>>108750392
>>108750399
>>108750407
>>108750424
I needed to throw these posts at gemini and have it explain them to me, but I think I got the gist of it. KV cache is now quantized at q8_0, parallel is now set to 1, threads to 8, and context length is now 16k. Still a few things to look through, but it's looking much better so far. Thanks a ton.
>>
>>108750508
Yes. I will not elaborate.
>>
>>108750510
31b q4km with q8 kv can do 25k context on a 24gb card. At least that's what I'm running.
Alternatively you can use the moe with 100k context and 4 times the speed.
>>
>>108750508
The use case is that you want higher quality outputs and you can run the 31b at a usable speed.
>>
>>108750518
>31b q4km with q8 kv can do 25k context on a 24gb
You can actually do 40k context with that quant and KV=Q8
>>
>>108750529
Yeah, I could minmax a bit more but then I have to close everything else and that kinda sucks.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.