/g/ - Technology



File: 1743082399851642.png (160 KB, 606x529)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108278008


►News
>(02/24) Introducing the Qwen 3.5 Medium Model Series: https://xcancel.com/Alibaba_Qwen/status/2026339351530188939
>(02/24) Liquid AI releases LFM2-24B-A2B: https://hf.co/LiquidAI/LFM2-24B-A2B
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
migu :3
>>
>>108281695
sex
>>
what's the best model ever?
>>
>>108281704
the original pre-lobotomy c.ai model is still unmatched in terms of pure soul
>>
>>108281704
davinci 003 for writing stories, this shit was absolutely insane
>>
>>108281699
fuck yeah, glad to see Nitro+ XTX homies
what model you running right now
>>
>>108281730
link?
>>
>>108281737
it's not available anymore, OpenAI nuked it
>>
>>108281737
Dead, that's why we need local models.
>>
File: 1762498714987760.jpg (291 KB, 1032x1536)
>>108281695
>>108281688
>>
>>108281704
it's a five way tie between summer dragon, OG c.ai, mythomax, goliath, and midnight miqu
>>
>>108281764
yet you don't use any of those, curious
>>
>>108281771
when you "assume" it makes an "ass" of "u" and "me"
>>
just woke up from my 12 hour coma
is qwen3.5 122b the new glm 4.5 air
>>
>>108281794
Prove him wrong?
>>
File: 47ywzmHFQp.png (356 KB, 1369x1324)
what you think? he pay or he pray?
>>
File: pokemon.jpg (166 KB, 872x1344)
Tangential to /lmg/, but still pretty funny.
>>
>>108281804
can you post pic i wanna try
>>
>>108281804
cuckgpt
>>
File: 1758966336880611.png (150 KB, 287x307)
>all this time later
>still no actual pixelspace, VAEless image edit model
>still no big, good omnimodal models that can generate images in chat
>still no big, good, natively multimodal models that "see" the image fully and properly
>still no real time voice conversation that you can have with the big, good models where they will also understand how you said something not just what you said
>still no basic real time 3d/2d avatars
>still no easy way to perfectly loop any image into an idle animation with ltx2/wan2.2
>still no good image 2 3d model
>ltx2 i2v still subpar
>even biggest models still get stuck on things, still can hallucinate hard
>still no solved, just works, RAG
>still no solved, just works, internet search with something like searXNG
>still no just actually works browser usage
>MCP clients are still spotty, especially paired with spotty tool calling
>still no 1mil perfect context
>still no 3-10mil ok context
>still no infinite context
>still no 1T params 1b active SSDmaxxer model
and hundreds of more things

at least most big models are generally very good now and actually good enough to, with some help, vibecode most actual projects you want
at least early moeGODS and ramCHADS won
at least z image turbo came out and was a huge leap in multiple big directions, basically solved resolution, almost solved out of the box realism (centered around portraits), huge speed boost
at least ltx2 came out and was a big turn towards faster genning, getting out of 5s hell, getting out of 720p hell, getting out of no audio hell
at least the great seedance 2.0 came out to be distilled by ltx3 or some other company this or next year
at least genie 3 showed that proper 3d space memory can be solved

everything can and will be solved, but the lack of some more basic yet important things, like pixelspace image edit models or at least a basic 14-32b native speech2speech LLM, seems interesting.
>>
>>108281813
tldr?
>>
File: v7i0mvczmhoa1[1].png (245 KB, 345x614)
>>108281811
Got the pic from /v/, but I believe it's
>pic related
>>
>>108281813
gpt 5.4 checks a few of those
>>
I can run Qwen 27B at 1-1.5 token/s or Qwen 35B-A3B at 15 tokens/s.
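that gap lines up with a back-of-envelope decode estimate: tokens/s is roughly memory bandwidth over bytes of weights touched per token, so a ~3B-active moe beats a 27B dense model by about the ratio of active params. toy sketch below, the bandwidth and bytes/param numbers are assumptions, not measurements:

```python
# Back-of-envelope decode speed (all numbers here are assumptions): t/s is
# roughly effective memory bandwidth / bytes of weights read per token,
# which is why a 35B-A3B moe (~3B active) outruns a 27B dense model.

def est_tps(active_params_b: float, bytes_per_param: float, bw_gbs: float) -> float:
    """Estimated decode tokens/s from weights traffic per token."""
    return bw_gbs / (active_params_b * bytes_per_param)

BW = 60.0  # GB/s effective, e.g. CPU + partial offload (assumption)
dense = est_tps(27.0, 0.6, BW)  # ~Q4-ish bytes per param (assumption)
moe = est_tps(3.0, 0.6, BW)    # only the ~3B active params get read
print(round(moe / dense))  # ~9x, same ballpark as 15 t/s vs 1-1.5 t/s
```

real ratios drift from this because of attention/kv traffic and prompt processing, but the active-param ratio dominates.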
>>
>>108281829
gpt 5.4 doesnt exist
>>
File: 1767422027240447.png (600 KB, 1186x1140)
>>108281825
>>108281804
werks on my machine i guess
>>
>>108281688
https://www.stephendiehl.com/posts/computer_algebra_mcp/

when tf will they add mcp support to llama.cpp aaah. any program recs?
>>
>>108281838
I imagine that there's a whole chat context we don't see that probably steered the model towards that sort of response.
>>
Hello fellow anons. I need help with my qwen 3.5 27B Q5_K_M. For some reason it's only thinking maybe 50% of the time, and I have to retry the response to get it to think. Really annoying. I'm using koboldcpp btw, is that the best backend? Used ooba previously but it seems dead.
>>
>>108281704
Me.
>>
File: 1743998171556692.png (131 KB, 2048x286)
local sisters, every time we start getting an edge the corpos fuck us in the ass. you are telling me they already have 5.4 sitting on a shelf?
>>
Do people nowadays care if a model works with context-shifting or not?
>>
>>108281877
>context-shifting
qrd
>>
>>108281877
Yes.
When you send a bunch of requests to the model with just the last message changing, that shit is really useful.
>>
>>108281879
It's a feature in llama.cpp/koboldcpp that avoids reprocessing the whole context once you reach the max context you have set.
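roughly this, as a toy sketch (heavily simplified vs llama.cpp's actual KV-cache ops): when the window is full, purge the oldest tokens and keep the rest of the cache instead of recomputing it.

```python
# Toy sketch of context shift (assumption: much simpler than llama.cpp's
# real KV-cache operations): on overflow, drop the oldest tokens after a
# pinned prefix (e.g. the system prompt) rather than reprocessing all of it.

MAX_CTX = 8

def shift_context(ctx: list[int], new_tokens: list[int], keep: int = 0) -> list[int]:
    """Append new_tokens; if over MAX_CTX, purge the oldest tokens that
    come after the first `keep` pinned ones."""
    ctx = ctx + new_tokens
    overflow = len(ctx) - MAX_CTX
    if overflow > 0:
        ctx = ctx[:keep] + ctx[keep + overflow:]
    return ctx

ctx = [1, 2, 3, 4, 5, 6, 7]
print(shift_context(ctx, [8, 9], keep=1))  # [1, 3, 4, 5, 6, 7, 8, 9]
```

the point is only the new tokens need a forward pass; the surviving cache entries just get their positions shifted.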
>>
>>108281884
Qwen thinks otherwise it seems.
>>
>>108281897
no i dont
>>
>>108281902
are you Qwen?
>>
>>108281897
You mean how llama.cpp can't do kv shifting with ssm models?
That'll probably get fixed eventually.
Probably.
Eventually.
>>
>>108281804
Every single time I read chatgpt's output I want to kys myself and do an hero.
>>
>>108281907
No, that's an rnn issue, and it can't be fixed: if you remove a single token from the start you have to reprocess everything.
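toy illustration of why (made-up update rule, not a real rwkv/ssm cell): the hidden state folds in every token in order, so there's no per-token cache entry you can drop.

```python
# Toy recurrence (made-up update rule, NOT a real rwkv/ssm cell): the
# hidden state depends on every token seen so far, in order, so removing
# the first token leaves nothing to "shift" -- you rerun from scratch.

def run_rnn(tokens: list[int], state: int = 0) -> int:
    for t in tokens:
        state = (state * 31 + t) % 10**9  # order-sensitive update
    return state

full = run_rnn([5, 6, 7, 8])
trimmed = run_rnn([6, 7, 8])  # only way to get the state for the suffix
print(full != trimmed)  # True: the old state can't be reused
```

contrast with a transformer, where dropping token 5's kv entries leaves the rest usable (modulo position shifts).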
>>
>>108281913
says the lobotomite
>>
>>108281804
jfc what did they do to make it sound like this
>>
File: 1748948632978330.png (507 KB, 946x1024)
Why are normies so dumb? And obviously the luddites are throwing a party not realizing this is a skill issue.
>>
>>108281891
koboldcpp has that functionality under "fastforwarding"; kobold's "context shift" purges old tokens from context when context is full.
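for reference, fastforwarding is just longest-common-prefix reuse (toy sketch, not kobold's actual code): only tokens past the first mismatch get reprocessed.

```python
# Toy sketch of "fastforwarding" / prompt-cache reuse (assumption: not
# koboldcpp's actual code): compare the new prompt against the cached
# tokens and only reprocess the suffix past the longest common prefix.

def tokens_to_reprocess(cached: list[int], incoming: list[int]) -> list[int]:
    """Return the suffix of `incoming` that can't be served from cache."""
    n = 0
    for a, b in zip(cached, incoming):
        if a != b:
            break
        n += 1
    return incoming[n:]

# Appending a message to unchanged history -> only the new tokens run.
print(tokens_to_reprocess([101, 7, 42], [101, 7, 42, 55, 13]))  # [55, 13]
# Editing an early message invalidates everything after it.
print(tokens_to_reprocess([101, 7, 42], [101, 9, 42, 55]))  # [9, 42, 55]
```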
>>
>>108281926
You are too young to know what an hero even means. You are the real retard here.
>>
>>108281936
either that is fake and gay or the company is fake and gay
either way it probably doesn't matter that the ai was also fake and gay
>>
>>108281948
You are the newfriend, imagine saying you want to kys and an hero in the same sentence
>>
>>108281936
That just goes to show that the company in question is worthless, that it doesn't really matter what they say or do, and that their upper management is retarded and doesn't need to exist.
>>
You are the reason why /g/ has died.
>>
>>108281946
Doesn't work with rnn, still have to reprocess everything once you hit max context. Try running rwkv or qwen 3.5 and you will see that it won't work.
>>
>>108281976
good
>>
>>108281976
meant for >>108281928
>>
I’m hearing good things about this “Qwen” model. Is it actually all that or can I go back to paypigging? I have 2x3090
>>
>>108281978
Yeah I know, I'm talking about how it works in models where the feature is supported.
>>
>>108281988
you need at least 2 6000s to run it properly, then it is legit better than opus 4.6
>>
>>108281988
Try out either the 27B model or the 122B-A10B model. They seem to be roughly similar with the bigger model being a bit better and faster since it's moe.
>>
>>108282000
Guess I’ll just fuck off then.
>>
>>108282018
Yeah...
>>
>>108281988
qwen2.5-72b fits on that at q4 which should be plenty
>>
Sillytavern/Kobold user. I may have altered a setting ages ago that I cannot remember, and now after every general prompt it just keeps going and gens another one after another after another. My token size per gen is 250. Surely there's something simple I'm neglecting here?
>>
>>108282085
auto-swipes in ST user settings?
>>
>>108281936
I don't understand how that happens. If you feed the model your data and ask it questions it will have numbers to quote, but if you don't give it any data, why would you expect it to have access to your sales data?
Furthermore, how do you not know your data well enough to do a sanity check simply by glancing at what it produces?

You have the same issue when you ask a subordinate to construct a report. You can't just assume he is correct; despite trusting him you must also verify the results.

I don't want to be mean, but that guy's issue is not AI.
>>
>>108282094
thar she blows, cheers m8
>>
I've been out of the loop for a bit. What's the current best local model available for utilizing large amounts of RAM with 32GB VRAM? Is it still DeepseekV3 and Kimi K2 or has something else been released?
>>
>>108282099
why cant the ai figure out how to find and access the data on its own? isnt it intelligent?
>>
File: 1mXpdOGQoj.png (290 KB, 1376x1233)
>>108282110
I still use this one
>>
>>108282148
>isnt it intelligent?
No, stop falling for marketing lies like a retard.
>>
>>108282155
i bet you are either very rich or very poor
>>
>>108282165
so it's smart enough to bomb iran but not smart enough to figure out where the data is?
>>
The bait will continue until anon's pattern recognition improves.
>>
>>108282169
you wouldn't get it
>>
File: 1748635088988770.jpg (374 KB, 2720x3000)
>>108281688
>>
>>108281813
>still no 1T params 1b active SSDmaxxer model
You sleeping on snowflake arctic?
>>
>>108282172
>so its smart enough to bomb iran
Sorting through communications in a network you already have backdoors in doesn't require intelligence. An intern doing ctrl+f through the logs could have achieved the same result, albeit not as fast.
>>
>>108282193
why she blushin
>>
>>108282110
K2.5 thinking at q4
>>
>>108282018
>>108281988
You don't need that. Your current hardware is sufficient to run Qwen3.5 122B-A10B or Qwen3.5 27B. Both are good models. If you want to do ERP with them, though, grab the Heretic versions of those models.
>>108282040
This is an old model, don't use it.
>>
>>108282203
q2 is better, more creativity
>>
>>108282203
>>108282213
What are the gains and losses compared to K2-Instruct and K2-Thinking? Moonshot was hopping on the censorcuck train last I saw.
>>
>>108282245
can you not use such vulgar words?
>>
you crazy nigga. but i appreciate it.
>>
File: nocap.jpg (400 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108278008

--Agentic roleplay potential demonstrated through blackjack simulation:
>108278746 >108278774 >108278813 >108278819
--StepFun releases 3.5-Flash models and training tools:
>108280402 >108280421 >108280426
--122B model excels at Japanese text transcription:
>108278617 >108278679 >108279715 >108280042 >108280080
--Manual offloading outperforms --fit for 122B model on 3090+3060 setup:
>108281460 >108281492 >108281506 >108281543 >108281720
--International models lag behind frontier labs on ARC-AGI-2 benchmark:
>108279363 >108279384 >108279387 >108279404 >108279418 >108279428 >108279567 >108279598 >108279612 >108279657 >108279617 >108279629 >108279836 >108279469 >108279746 >108280473
--Open-source AI models performance gap with proprietary models:
>108279687 >108279804
--Qwen3.5-35B-A3B GGUF quantization benchmarks:
>108280652 >108280670 >108280678 >108280680 >108280735
--Qwen 3.5 Small Model Series release and performance claims:
>108278104 >108278328 >108280444
--Qwen3.5-35B-A3B-Heretic hitting 72 TPS on 7800X3D/7900 XTX with new llama.cpp:
>108281622 >108281636 >108281652 >108281657
--Qwen3.5 35b 4-bit vs 122b 6-bit speed tradeoffs:
>108280506 >108280525 >108280560
--Devstral-2 model's flawed Jinja date logic template:
>108278061 >108280633 >108280638
--AI response generation process critique and benchmarking culture:
>108278971 >108278991 >108279011 >108279036
--Qwen 3.5 benchmarks:
>108278349 >108278416
--AI internal reasoning resisting offensive prompt bypass attempts:
>108278112
--Qwen 3.5 27B speed optimization on budget hardware:
>108279596 >108279608 >108279623 >108279631 >108279638 >108279653 >108279662 >108279685 >108279689
--A.I. Dating Apps Complicate China's Efforts to Boost Birthrate:
>108278523
--Miku (free space):
>108278507 >108280771 >108281230

►Recent Highlight Posts from the Previous Thread: >>108278113

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script


