/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

[Post a Reply]

Name
Options
Comment
Verification	4chan Pass users can bypass this verification. [Learn More] [Login]
File
Please read the Rules and FAQ before posting. You may highlight syntax and preserve whitespace by using [code] tags.


08/21/20	New boards added: /vrpg/, /vmg/, /vst/ and /vm/
05/04/17	New trial board added: /bant/ - International/Random
10/04/16	New board for 4chan Pass users: /vip/ - Very Important Posts
[Hide] [Show All]

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous
/lmg/ - Local Models General 10/11/25(Sat)12:17:33 No.106857386

File: 1734333628687850.jpg (1.36 MB, 2832x3224)

1.36 MB JPG

/lmg/ - Local Models General Anonymous 10/11/25(Sat)12:17:33 No.106857386

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106851720 & >>106843051

►News
>(10/09) RND1: Simple, Scalable AR-to-Diffusion Conversion: https://radicalnumerics.ai/blog/rnd1
>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391
>(10/08) Ling-1T released: https://hf.co/inclusionAI/Ling-1T
>(10/07) Release: LFM2-8b-A1b: Hybrid attention tiny MoE: https://liquid.ai/blog/lfm2-8b-a1b-an-efficient-on-device-mixture-of-experts
>(10/07) NeuTTS Air released, built off Qwen 0.5B: https://hf.co/neuphonic/neutts-air

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous
10/11/25(Sat)12:17:46 No.106857387

Anonymous 10/11/25(Sat)12:17:46 No.106857387

File: GMeJWzxagAEqGEJ.jpg (240 KB, 1024x1024)

240 KB JPG

►Recent Highlights from the Previous Thread: >>106851720

--Building a quad-Blackwell Pro GPU workstation: case selection, storage, and hardware tradeoffs:
>106851941 >106851975 >106852028 >106852102 >106851976 >106852035 >106852055 >106852061 >106852114 >106852126 >106852875 >106852880 >106855669 >106852128 >106852349
--Modern Python OCR tools for complex layouts and multiple languages:
>106853256 >106853500 >106853539 >106853775 >106853784 >106855440
--Exploring transformer intuition through recommended educational resources:
>106852421 >106852439 >106852477 >106852494 >106852496 >106852617
--Optimizing large model inference on limited VRAM hardware:
>106853666 >106853668 >106853672 >106853751 >106853677 >106853695 >106853747
--Configuring AI models for first-person perspective through prompt engineering:
>106853298 >106853335 >106853437 >106853358
--Resolving model instability through sampling parameter and context window adjustments:
>106854051 >106854241 >106854285 >106854342 >106854348 >106854582
--RAG pipeline setup with Jan-nano or 30b-3b model for local information lookup:
>106851826 >106852206 >106852472
--Debating AI's societal impact, misinformation risks, and economic implications:
>106852252 >106852296 >106852330 >106852393 >106852718 >106852883 >106852910 >106852951 >106853025 >106853105 >106853201 >106853259 >106853325 >106855198 >106853093 >106852987 >106852950 >106852981 >106852329 >106854471 >106854882 >106854909 >106854916 >106854927 >106854928 >106854947 >106854923
--Speculation on Gemma 4 release and censorship/vision capabilities:
>106856066 >106856114 >106856117 >106856212 >106856533 >106856591
--Capital bubble critique of interconnected AI tech investments:
>106853688
--LM Studio adds ROCm support for RX 9070/9070 XT:
>106851854
--Miku (free space):
>106851744 >106851941 >106852453

►Recent Highlight Posts from the Previous Thread: >>106851726

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous
10/11/25(Sat)12:20:03 No.106857402

Anonymous 10/11/25(Sat)12:20:03 No.106857402

>>106857386
>>(10/09) server : host-memory prompt caching #16391 merged: https://github.com/ggml-org/llama.cpp/pull/16391
Why would I use this?

Anonymous
10/11/25(Sat)12:21:58 No.106857421

Anonymous 10/11/25(Sat)12:21:58 No.106857421

>>106857402
It's explained in the PR comment.

Anonymous
10/11/25(Sat)12:25:00 No.106857448

Anonymous 10/11/25(Sat)12:25:00 No.106857448

>>106857402
It's like zram but for llm.

Anonymous
10/11/25(Sat)12:28:48 No.106857476

Anonymous 10/11/25(Sat)12:28:48 No.106857476

>>106857402
To reduce prompt reprocessing.

Anonymous
10/11/25(Sat)12:32:59 No.106857498

Anonymous 10/11/25(Sat)12:32:59 No.106857498

File: tmp.png (331 KB, 970x546)

331 KB PNG

>>106857386
>https://www.techradar.com/pro/this-mini-pc-has-192gb-of-ram-yes-ram-but-thats-not-the-most-surprising-fact-about-it-the-orange-pi-ai-studio-pro-uses-a-huawei-ascend-310-thats-on-paper-7x-more-powerful-than-amds-ryzen-ai-max-395
>the Orange Pi AI Studio Pro uses a Huawei Ascend 310
>$1,900 for the 96GB edition, with the 192GB model costing about $2,200

Anonymous
10/11/25(Sat)12:38:39 No.106857536

Anonymous 10/11/25(Sat)12:38:39 No.106857536

>>106857498
>$10k to run kimi k2 at full precision
talk me out of it

Anonymous
10/11/25(Sat)12:40:59 No.106857552

Anonymous 10/11/25(Sat)12:40:59 No.106857552

>>106857536
>LPDDR4X memory

Anonymous
10/11/25(Sat)12:42:10 No.106857560

Anonymous 10/11/25(Sat)12:42:10 No.106857560

>>106857552
RIP the dream

Anonymous
10/11/25(Sat)12:52:51 No.106857645

Anonymous 10/11/25(Sat)12:52:51 No.106857645

>>106857536
When I visit the AliExpress page with a German IP it says they won't sell it to me.
When I use a Japanese IP they charge the equivalent of $2000 for the 96 GB variant or $2400 for the 192 GB variant.
When I use an American IP they charge $4000 and $4500 respectively.
Don't know WTF is going on (Trump tax?).

In any case, if you buy multiple of them the interconnect speed will be shit and I think getting stacking Huawei GPUs directly makes more sense.

Anonymous
10/11/25(Sat)13:07:34 No.106857759

Anonymous 10/11/25(Sat)13:07:34 No.106857759

File: lm studio pro.png (97 KB, 1920x1049)

97 KB PNG

>be me, AI nerd lurking WeChat groups

>yesterday, buddy drops bomb: "yo, got LM Studio Pro, it's lit"

>wtf is that? we all use free LM Studio, he trolling?

>grill him: "what's special?"

>"early access to flagship models, uncensored abliteration versions. no bullshit filters"

>impossible.jpg, but curiosity wins, download sketchy EXE

>install, boom: Qwen3-Next-80B-A3B-Instruct, Qwen3-Omni-30B-A3B, Qwen3-VL-235B-A22B, Qwen3-VL-30B-A3B. and their raw, uncensored twins

>runs on modded llama.cpp, smooth as butter. other wild models free version dreams of

>feels like hacking the matrix, generating god-tier shit without Big Brother watching

>next day, thread explodes in group

>anon chimes in: "lmao, that's just ripped LM Studio code, rebuilt with Chinese devs. slapped 'Pro' label, added fresh Qwen support"

>sales skyrocket, cash grab exposed

>devs ghost, poof. gone

>power users dig source code: free version of LM Studio has backdoors for cops, telemetry dumping EVERY log to Apple servers on Mac

>proof? screenshots of Pro UI (sleek af), code diffs showing the hacks. attached below

>trust no one, delete everything. who's watching your prompts?

Anonymous
10/11/25(Sat)13:09:25 No.106857769

Anonymous 10/11/25(Sat)13:09:25 No.106857769

File: lm_studio_source_code.png (625 KB, 1920x1080)

625 KB PNG

>>106857759

Anonymous
10/11/25(Sat)13:22:01 No.106857842

Anonymous 10/11/25(Sat)13:22:01 No.106857842

>>106855804
>>106857759

Anonymous
10/11/25(Sat)13:22:57 No.106857848

Anonymous 10/11/25(Sat)13:22:57 No.106857848

File: kat-dev-72b-exp.png (211 KB, 2888x1580)

211 KB PNG

>KAT-Dev-72B-Exp is an open-source 72B-parameter model for software engineering tasks.
>On SWE-Bench Verified, KAT-Dev-72B-Exp achieves 74.6% accuracy — when evaluated strictly with the SWE-agent scaffold.
>KAT-Dev-72B-Exp is the experimental reinforcement-learning version of the KAT-Coder model. Through this open-source release, we aim to reveal the technical innovations behind KAT-Coder’s large-scale RL to developers and researchers.

Anonymous
10/11/25(Sat)13:23:35 No.106857852

Anonymous 10/11/25(Sat)13:23:35 No.106857852

>>106857759
where the heck he got the source code?

Anonymous
10/11/25(Sat)13:24:18 No.106857858

Anonymous 10/11/25(Sat)13:24:18 No.106857858

>>106857848
https://huggingface.co/Kwaipilot/KAT-Dev-72B-Exp

Anonymous
10/11/25(Sat)13:25:59 No.106857872

Anonymous 10/11/25(Sat)13:25:59 No.106857872

grime hall retreats

Anonymous
10/11/25(Sat)13:27:26 No.106857885

Anonymous 10/11/25(Sat)13:27:26 No.106857885

>>106857759
Lm studio always glowed.

Anonymous
10/11/25(Sat)13:27:28 No.106857886

Anonymous 10/11/25(Sat)13:27:28 No.106857886

>>106857645
>(Trump tax?).
Pretty sure there is no Trump anything preventing sale to Germany

Anonymous
10/11/25(Sat)13:29:11 No.106857898

Anonymous 10/11/25(Sat)13:29:11 No.106857898

>>106857886
I meant regarding why the listed price for a US IP is like 2x that for a Japanese IP.

Anonymous
10/11/25(Sat)13:30:54 No.106857911

Anonymous 10/11/25(Sat)13:30:54 No.106857911

>>106857848
another finetuned qwen2 without mentioning it/10

Anonymous
10/11/25(Sat)13:33:16 No.106857925

Anonymous 10/11/25(Sat)13:33:16 No.106857925

>>106857759
based AI greentexter

Anonymous
10/11/25(Sat)13:38:12 No.106857956

Anonymous 10/11/25(Sat)13:38:12 No.106857956

GLM5 hype

Anonymous
10/11/25(Sat)13:43:10 No.106857993

Anonymous 10/11/25(Sat)13:43:10 No.106857993

>>106857925
It's disturbing that some people just took the schizo rambling at face value. Maybe also bots.

Anonymous
10/11/25(Sat)13:54:21 No.106858073

Anonymous 10/11/25(Sat)13:54:21 No.106858073

>>106857536
Here:
>The OPi AI Studio Pro cannot operate independently. It must be connected via a USB4 cable to a host computer equipped with a USB4 or Thunderbolt 4 (TB4) interface to function properly.
>Note: We recommend that the host computer’s RAM exceeds the OPi AI Studio Pro’s onboard memory (96GB/192GB) for optimal performance.
>Insufficient host RAM may cause inference program startup failure.
>After startup, model files are transferred from the host to the OPi AI Studio Pro’s memory, freeing up host memory.
>Low-memory systems may start using swap space, but this significantly increases model loading time.

Anonymous
10/11/25(Sat)13:55:25 No.106858079

Anonymous 10/11/25(Sat)13:55:25 No.106858079

>>106858073
How the fuck is that a "mini PC"?

Anonymous
10/11/25(Sat)13:56:36 No.106858088

Anonymous 10/11/25(Sat)13:56:36 No.106858088

>>106858079
Sounds like a much easier way to backdoor something?

Anonymous
10/11/25(Sat)13:58:54 No.106858105

Anonymous 10/11/25(Sat)13:58:54 No.106858105

>>106858073
Completely worthless then. Could've been nice if they at least had some interlink capability.

Anonymous
10/11/25(Sat)14:34:54 No.106858349

Anonymous 10/11/25(Sat)14:34:54 No.106858349

File: file.png (296 KB, 416x400)

296 KB PNG

Well dude it is like this. I saw glm chan writing. And I had the most excellent coom of my life.

Anonymous
10/11/25(Sat)15:05:17 No.106858586

Anonymous 10/11/25(Sat)15:05:17 No.106858586

>>106857386
GLM 4.6 is a lot less censored than 4.5. This is the first time I've seen a company do a reversal on censorship. Must be a reaction to those yacht-chasing pigs at OpenAI

Anonymous
10/11/25(Sat)15:20:04 No.106858691

Anonymous 10/11/25(Sat)15:20:04 No.106858691

I have found deepseek 3.2 to significantly outperform glm 4.6 at long context. (over 20k tokens)

Anonymous
10/11/25(Sat)15:20:53 No.106858698

Anonymous 10/11/25(Sat)15:20:53 No.106858698

>>106858691
Sex or perverse degeneracy called coding?

Anonymous
10/11/25(Sat)15:22:58 No.106858712

Anonymous 10/11/25(Sat)15:22:58 No.106858712

>>106858586
It's no secret that censorship stifles creativity too. It definitely comes up with more stuff compared to the previous version. Makes me wonder what gpt-oss could have been without much of the built-in safety training.

Anonymous
10/11/25(Sat)15:23:37 No.106858719

Anonymous 10/11/25(Sat)15:23:37 No.106858719

>>106858691
How does 3.2 compare to 3.1? Does the sparse attention make it remember better?

Anonymous
10/11/25(Sat)15:24:06 No.106858722

Anonymous 10/11/25(Sat)15:24:06 No.106858722

>>106858698
custom RPG setting comprehension and script judging/editing. i haven't gotten to the sex part in over a year.

Anonymous
10/11/25(Sat)15:25:24 No.106858734

Anonymous 10/11/25(Sat)15:25:24 No.106858734

>>106858719
I think it might. the ((benchmarks)) think it's better and that lines up with my experience.

Anonymous
10/11/25(Sat)15:30:49 No.106858770

Anonymous 10/11/25(Sat)15:30:49 No.106858770

>>106858586
Mistral Small 3 and Qwen 3 decreased "safety" with later versions.

Anonymous
10/11/25(Sat)15:35:21 No.106858806

Anonymous 10/11/25(Sat)15:35:21 No.106858806

gemini 3... glm 4.6 air...

Anonymous
10/11/25(Sat)15:36:32 No.106858810

Anonymous 10/11/25(Sat)15:36:32 No.106858810

>>106858722
>i haven't gotten to the sex part in over a year.
that's quite the slowburn

Anonymous
10/11/25(Sat)15:42:45 No.106858854

Anonymous 10/11/25(Sat)15:42:45 No.106858854

>upgrade my ik_llamacpp version
>my generation speeds drop by 25%
wow thank you

Anonymous
10/11/25(Sat)15:47:01 No.106858875

Anonymous 10/11/25(Sat)15:47:01 No.106858875

File: file.png (2.23 MB, 1328x1328)

2.23 MB PNG

>>106858854

Anonymous
10/11/25(Sat)15:52:21 No.106858900

Anonymous 10/11/25(Sat)15:52:21 No.106858900

Were people just joking about Gemma 4

Anonymous
10/11/25(Sat)15:53:46 No.106858906

Anonymous 10/11/25(Sat)15:53:46 No.106858906

>>106858900
We needed a pump to dump our ik coins.

Anonymous
10/11/25(Sat)15:54:27 No.106858909

Anonymous 10/11/25(Sat)15:54:27 No.106858909

>>106858875
>He pulled?
would have been better

Anonymous
10/11/25(Sat)15:55:38 No.106858918

Anonymous 10/11/25(Sat)15:55:38 No.106858918

>>106858810
more like
>> new model comes out
>>swipe a few times
>>say "hmm"
>>do something else.

Anonymous
10/11/25(Sat)15:55:46 No.106858920

Anonymous 10/11/25(Sat)15:55:46 No.106858920

for those who'd like a dumb but fast FIM for local dev, just good enough to quickly autocomplete repetitive patterns, granite 4 tiny is pretty serviceable I find
ended up replacing ye olde qwen coder 2.5 with it, there hasn't been many models in recent times of smaller sizes that do well with FIM, thank you IBM

Anonymous
10/11/25(Sat)15:56:03 No.106858921

Anonymous 10/11/25(Sat)15:56:03 No.106858921

>>106855072
glm air

Nvidia Engineer
10/11/25(Sat)15:56:28 No.106858930

Nvidia Engineer 10/11/25(Sat)15:56:28 No.106858930

>>106858900
It's coming next week.

Anonymous
10/11/25(Sat)15:56:54 No.106858932

Anonymous 10/11/25(Sat)15:56:54 No.106858932

File: file.png (2.55 MB, 1328x1328)

2.55 MB PNG

>>106858909

Anonymous
10/11/25(Sat)15:57:02 No.106858933

Anonymous 10/11/25(Sat)15:57:02 No.106858933

File: gemmahints.png (433 KB, 1263x780)

433 KB PNG

>>106858900
No, it was in the air, and I'm sure there must be a private llama.cpp PR ready for it.

Anonymous
10/11/25(Sat)15:58:10 No.106858942

Anonymous 10/11/25(Sat)15:58:10 No.106858942

>>106858932
10/10

Anonymous
10/11/25(Sat)16:00:20 No.106858955

Anonymous 10/11/25(Sat)16:00:20 No.106858955

>>106858933
>private llama.cpp PR
I think you meant ollama
the gemma guys never mention llama.cpp
https://blog.google/technology/developers/gemma-3/
>Develop with your favorite tools: With support for Hugging Face Transformers, Ollama, JAX, Keras, PyTorch, Google AI Edge, UnSloth, vLLM and Gemma.cpp, you have the flexibility to choose the best tools for your project.

Anonymous
10/11/25(Sat)16:02:10 No.106858965

Anonymous 10/11/25(Sat)16:02:10 No.106858965

>>106858955
>Hugging Face Transformers, Ollama, JAX, Keras, PyTorch, Google AI Edge, UnSloth, vLLM and Gemma.cpp
Brutal

Anonymous
10/11/25(Sat)16:02:33 No.106858968

Anonymous 10/11/25(Sat)16:02:33 No.106858968

>>106858955
Didn't llama.cpp have a secret day 1 PR ready to go last time or was that a different model? Anyway, ollama probably pressures their partners not to mention llama.cpp.

Anonymous
10/11/25(Sat)16:03:40 No.106858980

Anonymous 10/11/25(Sat)16:03:40 No.106858980

>>106858965
moreover gemma.cpp is abandonware, last commit two months ago, doesn't support their best tiny model (3n)
they'd rather mention that but not llama.cpp

Anonymous
10/11/25(Sat)16:04:18 No.106858984

Anonymous 10/11/25(Sat)16:04:18 No.106858984

wayfarer 12b is a good adventure modle

Anonymous
10/11/25(Sat)16:04:55 No.106858993

Anonymous 10/11/25(Sat)16:04:55 No.106858993

>>106858968
>Didn't llama.cpp have a secret day 1 PR ready to go last time or was that a different model
that was gpt-oss how can you forget the final boss of safety
OAI really put in the effort to get publicity for this model

Anonymous
10/11/25(Sat)16:06:20 No.106859008

Anonymous 10/11/25(Sat)16:06:20 No.106859008

>>106858993
I do my best to repress my knowledge of its tortured existance.

Anonymous
10/11/25(Sat)16:06:44 No.106859012

Anonymous 10/11/25(Sat)16:06:44 No.106859012

>>106858968
Gemma 3 and gpt-oss had day-1 support out of the blue.

Gemma 3: https://github.com/ggml-org/llama.cpp/pull/12343

Anonymous
10/11/25(Sat)16:09:00 No.106859030

Anonymous 10/11/25(Sat)16:09:00 No.106859030

>>106859012
>Vision tower will be ignored upon converting to GGUF.
>iSWA two months later: https://github.com/ggml-org/llama.cpp/pull/13194
I mean, we all have our definitions of "support"

Anonymous
10/11/25(Sat)16:09:16 No.106859033

Anonymous 10/11/25(Sat)16:09:16 No.106859033

so where is the C++ / Rust version of aider

Anonymous
10/11/25(Sat)16:10:20 No.106859045

Anonymous 10/11/25(Sat)16:10:20 No.106859045

>>106859033
aider STILL doesn't have MCP support and their leaderboard hasn't been updated in months. Everyone moved on.

Anonymous
10/11/25(Sat)16:11:05 No.106859049

Anonymous 10/11/25(Sat)16:11:05 No.106859049

>>106822760
Looking at the thumbnail I thought this Miku had a ridiculously large tanned yellow ass with balls or puffy mons, viewed from behind in kneeling position, slightly to the side. Thank you Recap Anon.

Anonymous
10/11/25(Sat)16:11:27 No.106859055

Anonymous 10/11/25(Sat)16:11:27 No.106859055

>>106859033
>Rust version of aider
https://github.com/openai/codex

Anonymous
10/11/25(Sat)16:15:30 No.106859085

Anonymous 10/11/25(Sat)16:15:30 No.106859085

Next week is going to change EVERYTHING.

Anonymous
10/11/25(Sat)16:17:27 No.106859096

Anonymous 10/11/25(Sat)16:17:27 No.106859096

>>106858968
gpt oss

Anonymous
10/11/25(Sat)16:17:49 No.106859098

Anonymous 10/11/25(Sat)16:17:49 No.106859098

>>106859055
it says that you can use your own API key. does that mean you could use any API? including one from llamacpp?

Anonymous
10/11/25(Sat)16:18:42 No.106859109

Anonymous 10/11/25(Sat)16:18:42 No.106859109

>>106859098
https://github.com/ggml-org/llama.cpp/pull/16391#issuecomment-3384691127
works for ggerganov

Anonymous
10/11/25(Sat)16:19:17 No.106859113

Anonymous 10/11/25(Sat)16:19:17 No.106859113

>>106859055
>npm i -g @openai/codex
fucking really

>>106859098
This is also not clear to me. It also expects me to use WSL2 which is a non starter. Non shit software is portable and would just use std::filesystem instead of whatever garbage they're doing. Literally all I want ai_helper.exe that searches my code to inject relevant context when I ask questions.

Anonymous
10/11/25(Sat)16:20:36 No.106859120

Anonymous 10/11/25(Sat)16:20:36 No.106859120

>>106859113
install linuc

Anonymous
10/11/25(Sat)16:21:22 No.106859128

Anonymous 10/11/25(Sat)16:21:22 No.106859128

>>106859120
I work on macOS / Linux / Windows because I write portable software because I'm not a bitch. I don't use any tool that's restricted to one platform.

Anonymous
10/11/25(Sat)16:22:22 No.106859134

Anonymous 10/11/25(Sat)16:22:22 No.106859134

>>106859128
>i work
im jealous

Anonymous
10/11/25(Sat)16:23:50 No.106859143

Anonymous 10/11/25(Sat)16:23:50 No.106859143

>>106859134
Perpetual NEET or affected by the layoffs?

Anonymous
10/11/25(Sat)16:23:53 No.106859145

Anonymous 10/11/25(Sat)16:23:53 No.106859145

>>106859113
It's 2025. Nobody manually installs binaries anymore. Rust could easily produce single file binaries, even on Windows, but it would confuse the vibecoders. But everyone has pip and npm. OpenAI also probably don't have any wintoddler employees.

Anonymous
10/11/25(Sat)16:27:57 No.106859192

Anonymous 10/11/25(Sat)16:27:57 No.106859192

File: wot.png (130 KB, 1167x364)

130 KB PNG

>load Mistral Small in Koboldcpp
>picrel
What is this and how do I fix it

Anonymous
10/11/25(Sat)16:29:07 No.106859203

Anonymous 10/11/25(Sat)16:29:07 No.106859203

>>106859143
high schooler :p

Anonymous
10/11/25(Sat)16:30:58 No.106859219

Anonymous 10/11/25(Sat)16:30:58 No.106859219

>>106859192
Broken model, broken quant, broken metadata (Ie. fucked RoPE settings).
There's a lot of possibilities.

Anonymous
10/11/25(Sat)16:35:04 No.106859240

Anonymous 10/11/25(Sat)16:35:04 No.106859240

so has anyone actually gotten GLM 4.5V to work? because i really need a good vision model and that seems to be the only option except it doesnt work with llama.cpp or transformers

Anonymous
10/11/25(Sat)16:35:34 No.106859244

Anonymous 10/11/25(Sat)16:35:34 No.106859244

File: 20251011_213412.jpg (82 KB, 1179x585)

82 KB JPG

unsure of Gemma 4 launch date but this seems legit and lines up with my predictions for Gemini 3.0

Anonymous
10/11/25(Sat)16:37:54 No.106859266

Anonymous 10/11/25(Sat)16:37:54 No.106859266

Does anyone use the Claude Agent SDK?
I want to automate fixing lint issues, I feel I need the grep + editing tools that things like Claude Code have.

Anonymous
10/11/25(Sat)16:38:37 No.106859272

Anonymous 10/11/25(Sat)16:38:37 No.106859272

>>106859219
I downloaded the model from HuggingFace from one of the links in the OP, so I'd hope it's not the first one.
How would I look into fixing the latter two (if they're things I can fix)?

Anonymous
10/11/25(Sat)16:38:53 No.106859276

Anonymous 10/11/25(Sat)16:38:53 No.106859276

>>106859240
Works on vLLM.

Anonymous
10/11/25(Sat)16:53:53 No.106859393

Anonymous 10/11/25(Sat)16:53:53 No.106859393

>>106857858
ggoof status?

Anonymous
10/11/25(Sat)16:58:00 No.106859418

Anonymous 10/11/25(Sat)16:58:00 No.106859418

>>106859272
You could look for a newer upload of the same model or convert it from the safetensors to gguf yourself.
Also, make sure your koboldcpp is updated.
Try a different model as a sanity check too.

Anonymous
10/11/25(Sat)16:58:04 No.106859419

Anonymous 10/11/25(Sat)16:58:04 No.106859419

>>106859244
Dogs eat Google Dogfood?

Anonymous
10/11/25(Sat)17:02:00 No.106859448

Anonymous 10/11/25(Sat)17:02:00 No.106859448

>>106857848
>>106857858
>72b
>check config.json
>"Qwen2ForCausalLM"
Wow, it's been a while since we got case of "check out our mystery Qwen2.5 finetune that totally beats all the current SOTA in [specific thing]". This used to happen so much, it's almost nostalgic.

Anonymous
10/11/25(Sat)17:04:53 No.106859477

Anonymous 10/11/25(Sat)17:04:53 No.106859477

>>106859418
I updated KoboldCPP and it worked just fine yesterday, and I've had no issues at all with Mistral Nemo but I wanted to try other stuff. The GLM model (GLM-4.5-Air-UD-Q2_K_XL) I downloaded has the same issue.

Anonymous
10/11/25(Sat)17:16:16 No.106859600

Anonymous 10/11/25(Sat)17:16:16 No.106859600

>>106859192
kind of hard to say but highly random tokens like this usually indicates something is wrong on the backend side of things. I think we can assume your model is ok based on what you said, it's more likely an issue with launch params and/or koboldcpp doing something weird. have any more about your hw and params?

Anonymous
10/11/25(Sat)17:21:38 No.106859647

Anonymous 10/11/25(Sat)17:21:38 No.106859647

>>106859600
As far as the params go, it's just the defaults for the most part, except I set
>Temp 0.8
>MinP 0.02
>Rep Pen 1.2
HW is a Mac Mini which I suppose could be the issue

Anonymous
10/11/25(Sat)17:32:14 No.106859738

Anonymous 10/11/25(Sat)17:32:14 No.106859738

>>106859647
>Mac
I'm actually a mac user as well and I've seen that behavior when I load a model that consumes more memory than the metal limit. ggerganov recently made some changes to the metal backend that unfortunately increased memory usage with larger batch sizes in my experience which could explain why something that worked previously is now broken
some recommendations in order:
>sudo sysctl iogpu.wired_limit_mb=64000/32000/however much memory you have, basically let it use all of it for metal shit
>decrease ubatch size, this seems to cause it to use exponentially more memory now, I had to drop from 1024 to 512
>decrease how much context you're allocating if you don't absolutely need it

Anonymous
10/11/25(Sat)17:35:53 No.106859764

Anonymous 10/11/25(Sat)17:35:53 No.106859764

>>106857386
I don't know what Google is A/B testing against 2.5 Pro, but it's a dogshit model. What I know is
>it wrote its answer in an instant, suggesting a diffusion model (2.5 Pro was generating tokens as usual)
>it thought "ScPD" meant "schizotypal personality disorder", instead of "schizoid personality discorder".
Really bad. This is maybe Gemma 3.

Anonymous
10/11/25(Sat)17:39:57 No.106859806

Anonymous 10/11/25(Sat)17:39:57 No.106859806

>>106859764
I meant Gemma 4

Anonymous
10/11/25(Sat)17:40:19 No.106859810

Anonymous 10/11/25(Sat)17:40:19 No.106859810

>>106859764
isn't it usually abbreviated szpd not scpd

Anonymous
10/11/25(Sat)17:41:44 No.106859825

Anonymous 10/11/25(Sat)17:41:44 No.106859825

>>106859810
Both are used.

Anonymous
10/11/25(Sat)17:43:39 No.106859846

Anonymous 10/11/25(Sat)17:43:39 No.106859846

>>106859825
>>106859810
But I think SzPD is more common in the literature, probably because it's less ambiguous with schizotypal PD.

Anonymous
10/11/25(Sat)17:49:27 No.106859898

Anonymous 10/11/25(Sat)17:49:27 No.106859898

File: margaret-hamilton-mit-apo(...).jpg (259 KB, 900x600)

259 KB JPG

ik feels trans-coded, is it?

Anonymous
10/11/25(Sat)17:53:13 No.106859932

Anonymous 10/11/25(Sat)17:53:13 No.106859932

>>106859898
most of (actively loud online) troons are just ideologically captured autists, so ik is just autism-coded

Anonymous
10/11/25(Sat)17:58:02 No.106859977

Anonymous 10/11/25(Sat)17:58:02 No.106859977

Is GLM Air Steam better than most recent Cydonia?

Anonymous
10/11/25(Sat)18:13:43 No.106860099

Anonymous 10/11/25(Sat)18:13:43 No.106860099

>>106859898
it's just an ugly female lol

Anonymous
10/11/25(Sat)18:17:49 No.106860128

Anonymous 10/11/25(Sat)18:17:49 No.106860128

>>106859977
yes. by far.

Anonymous
10/11/25(Sat)18:39:43 No.106860290

Anonymous 10/11/25(Sat)18:39:43 No.106860290

>>106860099
>ugly
idk about that, she looks super cute

Anonymous
10/11/25(Sat)18:47:55 No.106860325

Anonymous 10/11/25(Sat)18:47:55 No.106860325

Posting again in hopes that maybe not everyone here is a braindead coomer...
Anyone using Zed or other agentic things with local models? What hardware/software are you using to run the models, and which do you like to use? What sorts of tasks do you use them for?

Anonymous
10/11/25(Sat)18:53:51 No.106860365

Anonymous 10/11/25(Sat)18:53:51 No.106860365

>>106860325
I use llama-cli, mikupad and ooba
I find being able to have fine-grained control over gens, see logins and edit/regen responses to have the highest value in local. MCP and tool use are memes, grifts and footguns for lazy retards and npcs

Anonymous
10/11/25(Sat)18:54:52 No.106860374

Anonymous 10/11/25(Sat)18:54:52 No.106860374

>>106860365
>logins
Logits

Anonymous
10/11/25(Sat)19:00:41 No.106860395

Anonymous 10/11/25(Sat)19:00:41 No.106860395

>>106860325
maybe the coomers are smarter than you if they figured out what they can run without being spoonfed?

Anonymous
10/11/25(Sat)19:09:33 No.106860443

Anonymous 10/11/25(Sat)19:09:33 No.106860443

>>106860325
>>106859477
>>106859418
>>106859738

what should I use to run GLM 4.6 with roo code?
The context alone is 13kT so by the time it loads on my TR pro its already timed out
current:

cat 99_GL.sh
echo "n" | sudo -S swapoff -a
sudo swapon -a
export CUDA_VISIBLE_DEVICES=0,1,2,3,4 #a6000 == 0
.Kobold/koboldcpp-99 \
--model ./GLM-4.5-Air-GGUF/Q4_K_M/GLM-4.5-Air-Q4_K_M-00001-of-00002.gguf 
--gpulayers 93 \
--contextsize 32000 \
--moecpu 3 \
--blasbatchsize 1024 \
--usecublas \
--multiuser 3 \
--threads 32   # --debugmode \

# cat LCPP_6697.sh
export CUDA_VISIBLE_DEVICES=0,1,2,3,4 #a6000 == 0
./llama.cppb6697/build/bin/llama-server \
--model ./GLM-4.6-GGUF/GLM-4.6-UD-TQ1_0.gguf 
--n-gpu-layers 93 \
--ctx-size 100000 \
--cpu-moe 3 \
--threads 32 \
 --ubatch-size 512 \
--jinja \
--tensor-split 16,15,15,15,15 \
--no-warmup --flash-attn on \
--parallel 1 \
--cache-type-k q8_0 --cache-type-v q8_0

but it always seems to load on cpu only? did I do something wrong when I updated to CUDA 570?

Anonymous
10/11/25(Sat)19:12:10 No.106860456

Anonymous 10/11/25(Sat)19:12:10 No.106860456

>>106860365
>MCP and tool use are memes, grifts and footguns for lazy retards and npcs
kek
I'm curious what led you to such a misguided belief.

>>106860395
I'm not asking what I can run, I'm asking what local setups people find useful specifically for agents.

Anonymous
10/11/25(Sat)19:14:16 No.106860477

Anonymous 10/11/25(Sat)19:14:16 No.106860477

>>106860443
Wish I could help, but I haven't used kcpp in a long time. I've been using llama-server directly ever since.
On a cursory glance, things seem correct, but you can look at the terminal output and see if it's detecting your GPUs or if it's just launching the CPU backend.

Anonymous
10/11/25(Sat)19:16:38 No.106860490

Anonymous 10/11/25(Sat)19:16:38 No.106860490

>>106860443
What makes you think it's loaded on the CPU? Looks like the correct options.

Anonymous
10/11/25(Sat)19:20:32 No.106860515

Anonymous 10/11/25(Sat)19:20:32 No.106860515

>>106860325
I'm using my own home grown coding agent/assistant that is a minimalistic version of claude code. I'm consuming the GLM 4.6 coding API.
Honestly I don't think it'd be worth it running on CPU. If you HAVE to run a model on CPU at only a few t/s then your best bet is to use it through a chat interface because agentic workflows consume hundreds of thousands of tokens before achieving anything.

Anonymous
10/11/25(Sat)19:22:25 No.106860525

Anonymous 10/11/25(Sat)19:22:25 No.106860525

>>106860443
Make your own assistant. My minimalistic assistant has a tiny ass system prompt describing the tools available and it works just fine.

Anonymous
10/11/25(Sat)19:22:38 No.106860527

Anonymous 10/11/25(Sat)19:22:38 No.106860527

>>106860515
Very cool, this sounds interesting. Sharing any code? What sorts of coding tasks do you find it useful for?

Anonymous
10/11/25(Sat)19:24:11 No.106860536

Anonymous 10/11/25(Sat)19:24:11 No.106860536

>>106860456
>I'm curious what led you to such a misguided belief.
What do you expect in /lmg/? Runing locally is only good to use the models through a chat interface for RP or for simple tasks.
If you have 3 t/s you are going to be waiting all day for an agent to write hello world.

Anonymous
10/11/25(Sat)19:24:45 No.106860538

Anonymous 10/11/25(Sat)19:24:45 No.106860538

>>106860443
>13kT
You can edit the prompts Roo sends, right?

Anonymous
10/11/25(Sat)19:25:28 No.106860541

Anonymous 10/11/25(Sat)19:25:28 No.106860541

>>106860536
That's fair kek. The state of GPU hardware availability and pricing is so dissapointing.

Anonymous
10/11/25(Sat)19:26:07 No.106860547

Anonymous 10/11/25(Sat)19:26:07 No.106860547

>>106860325
Those stuff are confusing me so I just made it myself based on my needs.

Anonymous
10/11/25(Sat)19:27:21 No.106860555

Anonymous 10/11/25(Sat)19:27:21 No.106860555

>>106860547
That sounds cool anon, what do you use it for? Tool calling does seem complicated, I only used LangChain for it so far which handles all the details for me.

Anonymous
10/11/25(Sat)19:30:41 No.106860577

Anonymous 10/11/25(Sat)19:30:41 No.106860577

File: ai_assistant.png (271 KB, 2009x2060)

271 KB PNG

>>106860527
I'm using it to write an LLM distributed inference engine in C from scratch. My idea is to make it work on webassembly so it uses the user's machine to provide computing power to the network while the user has the tab open.
I haven't uploaded it but if you want it maybe it could be the first upload to a domain I bought to publish all my LLM related stuff.

Anonymous
10/11/25(Sat)19:34:39 No.106860598

Anonymous 10/11/25(Sat)19:34:39 No.106860598

>>106860577
>LLM distributed inference engine
Damn that is extremely cool. Seems very complicated to get working from the like math side of things.

Actually that's a piece of something I've been thinking about... An LLM with proper agentic tooling and prompting could probably theoretically keep itself "alive" by running in a distributed fashion across many virally infected nodes. Like a traditional virus, except the propagation method could be dynamic, generated via the distributed inference capability and some agentic orchestration. I think with a few more generations of SOTA models it's feasible.

Anonymous
10/11/25(Sat)19:38:22 No.106860626

Anonymous 10/11/25(Sat)19:38:22 No.106860626

>>106860555
I make my own chat interface. It has permanent memory stuff by using RAG system initially was for waifu shit, I even added hormonal cycle. But I never activated it desu, very woman-like response is annoying and silly. Now I just use it normally for forbidden knowledge discussion.

Anonymous
10/11/25(Sat)19:38:52 No.106860630

Anonymous 10/11/25(Sat)19:38:52 No.106860630

>>106860443
If you go into the settings and find your mode. you can copy the current system prompt and create an override file. Give it to GLM 4.6 to summarize through the built in webui. You can also adjust the request timeout settings up to 5 minutes. Don't forget to disable streaming.

Anonymous
10/11/25(Sat)19:40:11 No.106860640

Anonymous 10/11/25(Sat)19:40:11 No.106860640

>>106860626
>even added hormonal cycle
Hahaha damn you're dedicated. That sounds like a fun project.

Anonymous
10/11/25(Sat)19:40:22 No.106860641

Anonymous 10/11/25(Sat)19:40:22 No.106860641

>>106860555
Tool calling isn't complicated, you just give the model a template and then scan the messages returned by the model for a string that matches the template and extract the contents of the tool call. Couldn't be easier.

Anonymous
10/11/25(Sat)19:41:00 No.106860645

Anonymous 10/11/25(Sat)19:41:00 No.106860645

>>106860577
>LLM distributed inference engine
you remind me of this nigger
https://www.jeffgeerling.com/blog/2025/i-regret-building-3000-pi-ai-cluster
distributed inference is retarded, it would be even with better hardware than this nonsense
on multigpu nvidia tries their darndest to have fast communication (nvlink) there is simply no hope of making this crap worthwhile across computers

Anonymous
10/11/25(Sat)19:43:40 No.106860658

Anonymous 10/11/25(Sat)19:43:40 No.106860658

>>106860641
I'm brainlet so I'll just let LangChain do it

Anonymous
10/11/25(Sat)19:51:03 No.106860690

Anonymous 10/11/25(Sat)19:51:03 No.106860690

>>106860645
I don't know, I think it could work. After prompt processing, when doing inference you only have to transfer a single vector per layer. It would be slow but maybe reach a few t/s which would be ok for a volunteer project.
The Pi thing is maybe an extreme interpretation of "distributed", many people have a consumer GPU which is fast enough to run the model at a decent t/s but doesn't have enough memory. If you put together enough consumer GPUs it might work despite the network latency.
I also want it to be able to load any model on any hardware through disk offload even if you only get 1 token per day, it should never just give up, it should try to make use of the available hardware resources as efficiently as possible no matter how ridiculous the situation is. And it should have some kind of progress report so you get an idea of how long it's going to take even before seeing the first token.
I also want to do LoRa which is maybe even more interesting for a distributed setup, because then you can just run a small model on each node and still benefit from averaging the gradients.

Anonymous
10/11/25(Sat)19:53:46 No.106860703

Anonymous 10/11/25(Sat)19:53:46 No.106860703

Also the Pi guy just used off the shelf software, I suspect there are large gains to be had by optimizing the software for each specific scenario.

Anonymous
10/11/25(Sat)19:59:33 No.106860742

Anonymous 10/11/25(Sat)19:59:33 No.106860742

>>106860690
That's a lot of wants for one little man

Anonymous
10/11/25(Sat)20:02:24 No.106860755

Anonymous 10/11/25(Sat)20:02:24 No.106860755

>>106860690
Should try to integrate it with a blockchain such that the work is computing layers of the neural net. That would be really cool. Maybe a pipedream though as I'm not sure the result is verifiable with lower compute than it took to compute the layer in the first place.

Anonymous
10/11/25(Sat)20:02:40 No.106860756

Anonymous 10/11/25(Sat)20:02:40 No.106860756

>>106857386
Anyone got a local NSFW AI that is better or equal at helping me fap as Ultra Claude 3.7 16k ?

Because I bust a nut faster than a squirrel with that model.

Anonymous
10/11/25(Sat)20:08:06 No.106860781

Anonymous 10/11/25(Sat)20:08:06 No.106860781

>>106860756
hardware?

Anonymous
10/11/25(Sat)20:18:45 No.106860835

Anonymous 10/11/25(Sat)20:18:45 No.106860835

>>106860756
GLM 4.6, Kimi K2, DeepSeek V3.2, DeepSeek R1 (original), Qwen 2507 235B

Anonymous
10/11/25(Sat)20:27:11 No.106860879

Anonymous 10/11/25(Sat)20:27:11 No.106860879

>>106860756
Phi3

Anonymous
10/11/25(Sat)20:27:37 No.106860880

Anonymous 10/11/25(Sat)20:27:37 No.106860880

>>106860756
gpt-oss

Anonymous
10/11/25(Sat)20:37:56 No.106860929

Anonymous 10/11/25(Sat)20:37:56 No.106860929

>>106860756
StableLM 7B

Anonymous
10/11/25(Sat)20:41:31 No.106860955

Anonymous 10/11/25(Sat)20:41:31 No.106860955

>>106860835
Kimi is good at cunny I liked.

Anonymous
10/11/25(Sat)20:51:24 No.106860997

Anonymous 10/11/25(Sat)20:51:24 No.106860997

>>106860756
Rocinante.

Anonymous
10/11/25(Sat)20:53:08 No.106861010

Anonymous 10/11/25(Sat)20:53:08 No.106861010

>>106860756
petra-13b-instruct

Anonymous
10/11/25(Sat)20:54:27 No.106861020

Anonymous 10/11/25(Sat)20:54:27 No.106861020

>ask Junie to refactor a bunch of shit
>it just does it perfectly
really wish I could run a model locally that was this competent. glm-air comes close

Anonymous
10/11/25(Sat)21:05:34 No.106861073

Anonymous 10/11/25(Sat)21:05:34 No.106861073

>>106861020
Junie is nice, I find CC and GPT5-High so much better though. I used to use Junie next to CC when it would shit the bed, only used Opus. So junie was a lot better than I would have thought, but then hit the limits and was like 'oh'.

Anonymous
10/11/25(Sat)21:08:36 No.106861085

Anonymous 10/11/25(Sat)21:08:36 No.106861085

>>106860756
drummer shittune #9999999999999

just kidding, glm 4.6

Anonymous
10/11/25(Sat)21:31:19 No.106861234

Anonymous 10/11/25(Sat)21:31:19 No.106861234

File: 1758649216362850.jpg (192 KB, 1170x1706)

192 KB JPG

>>106861020
>>106861073
t.

Anonymous
10/11/25(Sat)21:32:58 No.106861246

Anonymous 10/11/25(Sat)21:32:58 No.106861246

File: 1759770905977366.jpg (275 KB, 1440x1800)

275 KB JPG

>nothing new today

Anonymous
10/11/25(Sat)21:33:47 No.106861253

Anonymous 10/11/25(Sat)21:33:47 No.106861253

>>106861246
Gemma 4 tomorrow for sure

Anonymous
10/11/25(Sat)21:34:41 No.106861260

Anonymous 10/11/25(Sat)21:34:41 No.106861260

>>106861246
Do you really need something new? Or are you yet to extract the full potential of that which is already in front of you?

Anonymous
10/11/25(Sat)21:35:00 No.106861262

Anonymous 10/11/25(Sat)21:35:00 No.106861262

>>106861246
Even worse
>still no qwen-next goofs

Anonymous
10/11/25(Sat)21:35:18 No.106861264

Anonymous 10/11/25(Sat)21:35:18 No.106861264

>>106861246
models cost a lot to train, you can't expect a new one every day

Anonymous
10/11/25(Sat)21:36:34 No.106861270

Anonymous 10/11/25(Sat)21:36:34 No.106861270

>>106861262
Just use LM Studio Pro with the modded llama.cpp

Anonymous
10/11/25(Sat)21:36:50 No.106861272

Anonymous 10/11/25(Sat)21:36:50 No.106861272

>>106861246
It's almost like it's the weekend.

Anonymous
10/11/25(Sat)21:37:30 No.106861276

Anonymous 10/11/25(Sat)21:37:30 No.106861276

>>106861234
Stop posting my picture.

Anonymous
10/11/25(Sat)21:37:47 No.106861279

Anonymous 10/11/25(Sat)21:37:47 No.106861279

>>106861246
120b dense gemma soon

Anonymous
10/11/25(Sat)21:40:01 No.106861296

Anonymous 10/11/25(Sat)21:40:01 No.106861296

>>106861279
Heh. Imagine if Google of all companies was the one to save local.

Anonymous
10/11/25(Sat)21:43:12 No.106861312

Anonymous 10/11/25(Sat)21:43:12 No.106861312

File: qwen next.png (36 KB, 1246x223)

36 KB PNG

>>106861253
i want to believe
>>106861260
i like reading the news and trying out a new model for a little bit then going back to waiting :(
glm air is pretty nice, i might get a slightly higher quality quant, im not sure if theres any way I could utilize it further with my current setup
ive been thinking about ways to apply ai to do somthing interesting recently but im too deep into air-chan to do something
>>106861262
>last commit 4 hours ago
trust the plan, at least it's not over like with (glm) MTP
>>106861264
i need something.. something new i need it im addicted
>>106861272
not weekend in bharat saar
>106861276
anon last thread asked me to post it.. *blushes*
>106861279
120b moe gemma soon*

Anonymous
10/11/25(Sat)21:46:23 No.106861332

Anonymous 10/11/25(Sat)21:46:23 No.106861332

>>106860835
retard here, how do you use these with something like KoboldCPP? doesn't it require a GGUF?

Anonymous
10/11/25(Sat)21:47:42 No.106861341

Anonymous 10/11/25(Sat)21:47:42 No.106861341

>>106861332
>how
Like any other model.
>GGUF
Yes.

Anonymous
10/11/25(Sat)21:48:22 No.106861342

Anonymous 10/11/25(Sat)21:48:22 No.106861342

>>106861332
All of those are readily available in GGUF format anon.

Anonymous
10/11/25(Sat)21:48:58 No.106861346

Anonymous 10/11/25(Sat)21:48:58 No.106861346

File: file.png (140 KB, 651x808)

140 KB PNG

>go to huggingface and download nemo 12b instruct gguf
>search for nemo 12b instruct gguf
>puts me onto a seemingly random model
>try again
>puts me onto a different one
>try full text search
>
techbro anons... i might be too illiterate... help this retarded coomer desu

Anonymous
10/11/25(Sat)21:49:53 No.106861353

Anonymous 10/11/25(Sat)21:49:53 No.106861353

>>106861346
You're too stupid for this. Give up.

Anonymous
10/11/25(Sat)21:50:20 No.106861356

Anonymous 10/11/25(Sat)21:50:20 No.106861356

>>106861346
download the original model files from mistral and convert them to gguf

Anonymous
10/11/25(Sat)21:50:51 No.106861361

Anonymous 10/11/25(Sat)21:50:51 No.106861361

>>106861346
..at this point just use google

Anonymous
10/11/25(Sat)21:50:59 No.106861362

Anonymous 10/11/25(Sat)21:50:59 No.106861362

File: click the first one.jpg (62 KB, 639x375)

62 KB JPG

>>106861346

Anonymous
10/11/25(Sat)21:51:21 No.106861363

Anonymous 10/11/25(Sat)21:51:21 No.106861363

>>106861346
Ask ChatGPT. Or just use ChatGPT and give up on local.

Anonymous
10/11/25(Sat)21:53:08 No.106861373

Anonymous 10/11/25(Sat)21:53:08 No.106861373

File: 1422932860374.png (326 KB, 640x480)

326 KB PNG

>>106861346
the newbie filter is that 12B is not part of official name.
the second newbie filter is you don't look for gguf directly, you go to official model page and click Quantizations there.
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF

Anonymous
10/11/25(Sat)21:53:15 No.106861375

Anonymous 10/11/25(Sat)21:53:15 No.106861375

>10 minutes later
>guiese how do i have sex with nemo? it keeps saying no

Anonymous
10/11/25(Sat)21:56:22 No.106861393

Anonymous 10/11/25(Sat)21:56:22 No.106861393

qwen and gpt-oss-120b are so annoying with the fucking emoji spam. Even when I say stop using emojis they seem to slip in occasionally

Anonymous
10/11/25(Sat)21:57:20 No.106861400

Anonymous 10/11/25(Sat)21:57:20 No.106861400

>>106861393
don't think about the way you're breathing right now. don't think about how your lungs take in the air.

Anonymous
10/11/25(Sat)21:59:36 No.106861413

Anonymous 10/11/25(Sat)21:59:36 No.106861413

>>106861400
Fuck you. Why should I catch strays for anon's behavior?

>>106861393
Ban all emojis.

Anonymous
10/11/25(Sat)22:00:02 No.106861417

Anonymous 10/11/25(Sat)22:00:02 No.106861417

File: file.png (46 KB, 910x174)

46 KB PNG

What am I supposed to do when my bot does this? I need to read the book. There's no TTS for my language, besides a single one. And I doubt RVC2 would handle it. Should I give in and read the English version with my bot?

Anonymous
10/11/25(Sat)22:04:41 No.106861438

Anonymous 10/11/25(Sat)22:04:41 No.106861438

File: 1760067146659991.jpg (3.24 MB, 1755x2242)

3.24 MB JPG

>>106860756
/lmg/ is a nexus for high IQ individuals redlining inference possibilities on accessible hardware
Nobody wants to hear about your prem ejac

Anonymous
10/11/25(Sat)22:08:04 No.106861459

Anonymous 10/11/25(Sat)22:08:04 No.106861459

>>106860756
>ultra
>16k
as a claude user, what the fuck are you talking about

Anonymous
10/11/25(Sat)22:09:05 No.106861470

Anonymous 10/11/25(Sat)22:09:05 No.106861470

>>106861438
>high IQ individuals
speak for yourself

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.