/g/ - Technology

File: GgnIBuFbIAAjLWc.jpg (167 KB, 1257x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106475313 & >>106467368

►News
>(09/04) VibeVoice got WizardLM'd: >>106478635 >>106478655 >>106479071 >>106479162
>(08/30) LongCat-Flash-Chat released with 560B-A18.6B∼31.3B: https://hf.co/meituan-longcat/LongCat-Flash-Chat
>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: what's in the box.jpg (235 KB, 1536x1536)
►Recent Highlights from the Previous Thread: >>106475313

--Paper: Binary Quantization For LLMs Through Dynamic Grouping:
>106478831 >106479219 >106479248 >106479257 >106479312
--VibeVoice model disappearance and efforts to preserve access:
>106478635 >106478655 >106478664 >106480157 >106480528 >106478715 >106478764 >106479071 >106479162
--GPU thermal management and 3D-printed custom cooling solutions:
>106480670 >106480698 >106480706 >106480719 >106480751 >106480797 >106480827 >106480837 >106480844 >106480875 >106481348 >106481365 >106480858 >106480897 >106481059
--Testing extreme quantization (Q2_K_S) on 8B finetune for mobile NSFW RP experimentation:
>106478303 >106478464 >106478467 >106478491 >106478497 >106478519 >106478476
--Optimizing system prompts for immersive (E)RP scenarios:
>106477981 >106478000 >106478547 >106478214 >106478396
--Assessment of Apertus model's dataset quality and novelty:
>106480979 >106481002 >106481005 >106481016
--Extracting LoRA adapters from fine-tuned models using tensor differences and tools like MergeKit:
>106480089 >106480116 >106480118 >106480122
--Testing llama.cpp's GBNF conversion for complex OpenAPI schemas with Qwen3-Coder-30B:
>106478075 >106478122 >106478554 >106478574
--Recent llama.cpp optimizations for MoE and FlashAttention:
>106476190 >106476267 >106476280 >106476290
--Proposals for next-gen AI ERP systems with character tracking and time management features:
>106476001 >106476147 >106476263 >106477114 >106477147 >106477247 >106477344 >106477773 >106477810 >106478561 >106478636 >106477955 >106477268 >106477417
--B60 advantages vs RX 6800 and Intel Arc Pro B50 compared to RTX 3060:
>106475539 >106475563 >106475606 >106475639 >106475661 >106475729 >106476927 >106476939 >106476998 >106476979 >106477012 >106477117 >106481021 >106481030 >106481067 >106481241
--Miku (free space):
>106475807

►Recent Highlight Posts from the Previous Thread: >>106475316

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Has anyone had any success with using VLMs to translate PDFs, particularly of comics and magazines?

I've been trying the new MiniCPM V4.5 model, and it's pretty good, but at ~50 tok/s (roughly one page every ten seconds) it's a bit too slow to use on many thousands of pages. It also basically amounts to a really good OCR: it doesn't handle table/markdown formatting that well, and I can't get it to caption the images on the pages. Still, it's miles ahead of anything else I've tried, since I can tell it to filter out useless information and the OCR essentially never fails; I've seen it mess up maybe once in hundreds of pages of documents.
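For anyone who wants to reproduce the setup, here's a minimal sketch (untested; it assumes MiniCPM V4.5 is being served behind a local OpenAI-compatible endpoint such as llama.cpp's llama-server or vLLM, that the pages are already rendered to PNG, and the URL, model name, and prompt are placeholders):

import base64, glob
from openai import OpenAI  # pip install openai

client = OpenAI(base_url="http://localhost:8080/v1", api_key="none")

def ocr_page(path: str) -> str:
    b64 = base64.b64encode(open(path, "rb").read()).decode()
    resp = client.chat.completions.create(
        model="minicpm-v-4.5",  # whatever name the server exposes
        temperature=0.0,
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Transcribe this page to markdown. Keep tables, drop page numbers and ads, describe pictures in one line each."},
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    )
    return resp.choices[0].message.content

for page in sorted(glob.glob("pages/*.png")):  # pages rendered beforehand, e.g. from the PDF
    print(ocr_page(page))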
>>
How do I control thinking effort in DS V3.1? The model is trained to use short thinking for generic questions and long thinking for math/logic questions, and it wasn't done with a router. What should I do if I want it to analyse some random shit with the long thinking mode?
>>
File: 00081-945140401.png (2.19 MB, 1200x1520)
Anyone running the 5060 Ti 16GB? Gauging whether I should take the plunge at MSRP or just wait for better options with more VRAM. I'm hearing the old mikubox-level jank rigs are totally pointless now due to the aged architecture. Blackwell optimizations seem to be pretty nice, especially for WanVideo speed boosts. But the specific limitations Nvidia set in place, plus having to actually support them, puts me off.
>>
>>106481933
And by translate I don't mean just translate, but also formatting and converting to a compact text representation (so, for example, I could convert an entire comic to text and ask Qwen3 30B "what happen???"). Whatever I try, it doesn't like to describe the images in the text while it's formatting.
>>
>>106481968
i got the 4060ti 16gb, it's a good card for sd/flux, 12b and 4bit 24b at decent speed
>>
File: 1744984139638278.png (102 KB, 636x431)
>try drummer finetune (skyfall)
>model is significantly shittier
many such cases
>>
Is anyone else having the same problem where llama.cpp just stops after the model is done reasoning? It usually happens when the reasoning ends at "....let's patch the code accordingly"
>>
>>106482066
Your examples are all unreadable trash. Regardless of the model.
>>
>>106482101
First time I've posted a log, rajesh. Try to control yourself.
>>
>>106482066
How do you know this isn't intended?
>>
>>106482130
Intending to make a model worse is certainly a high IQ play
>>
what's a 'respectable' rig for AI that can be easily upgraded? Not only for llm but txt2vid

I don't think I'm ready to do the dual EPYC CPUs with 1TB of RAM. I couldn't justify the cost just for cooming, but I do need a new system, and I'd like to make it out of 12B-24B Nemo/Mistral hell and maybe actually try some of the models that get discussed in these threads.
>>
https://xcancel.com/Alibaba_Qwen/status/1963586344355053865
qwen 3 max imminent
>>
>>106482154
>Not only for llm but txt2vid
Very different use cases. Text models are moving towards MoE and big dense models are dying, so a server-tier CPU with as much RAM and memory bandwidth as you can afford is ideal, and at least one 24GB GPU will speed things up significantly. Meanwhile, RAM is largely worthless for txt2vid unless you want to wait an hour per 6-second video. There you need everything in VRAM, with 24GB being the bare minimum and 48GB or more ideal for higher resolutions and quality, so you'd be looking at dual GPUs.
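If you do go the MoE-on-CPU route, the usual llama.cpp trick (rough example; flag names as in recent builds, and the model file and numbers are just placeholders) is to keep attention and shared weights on the GPU and push the expert tensors to system RAM:

llama-server -m model-Q4_K_M.gguf -ngl 99 -ot ".ffn_.*_exps.=CPU" -c 32768

Then tighten the -ot regex (or use --n-cpu-moe N, if your build has it) so as many expert layers as will fit move back into VRAM.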
>>
>>106482182
I sure hope that it underwent multistage pretraining on 90% code 10% math high quality curated synthetic data starting at 2k tokens upscaled to 4m with yarn
>>
>>106482182
Qwen3-2T-A60B
>>
>>106482182
But qwen3 coder already exists.
>>
Jank rig 3090 fag anon should unironically just whittle a couple of supports out of wood. 3d printing is some retard level yak shaving solution
>>
>>106482197
I’m cpumaxxing with a 24gb gpu and it’s not enough for just context, let alone art, tts etc simultaneously. 80gb gpu prices cratering when?
>>
>>106482315
wait for the bubble to pop
>>
>>106482084
If I do that with CUDA 12.x I get an "unsupported gpu architecture" error in this step:

# cmake -B build -DGGML_CUDA=ON
[...]
-- Check for working CUDA compiler: /home/user/anaconda3/envs/llamacpp/bin/nvcc - broken
CMake Error at /usr/share/cmake/Modules/CMakeTestCUDACompiler.cmake:59 (message):
The CUDA compiler

"/home/user/anaconda3/envs/llamacpp/bin/nvcc"

is not able to compile a simple test program.

It fails with the following output:

Change Dir: '/home/user/llamacpp/build/CMakeFiles/CMakeScratch/TryCompile-lOrwxG'

Run Build Command(s): /usr/bin/cmake -E env VERBOSE=1 /usr/bin/gmake -f Makefile cmTC_28439/fast
/usr/bin/gmake -f CMakeFiles/cmTC_28439.dir/build.make CMakeFiles/cmTC_28439.dir/build
gmake[1]: Entering directory '/home/user/llamacpp/build/CMakeFiles/CMakeScratch/TryCompile-lOrwxG'
Building CUDA object CMakeFiles/cmTC_28439.dir/main.cu.o
/home/user/anaconda3/envs/llamacpp/bin/nvcc -forward-unknown-to-host-compiler "--generate-code=arch=compute_75,code=[sm_75]" "--generate-code=arch=compute_80,code=[sm_80]" "--generate-code=arch=compute_86,code=[sm_86]" "--generate-code=arch=compute_89,code=[sm_89]" "--generate-code=arch=compute_90,code=[sm_90]" "--generate-code=arch=compute_100,code=[sm_100]" "--generate-code=arch=compute_103,code=[sm_103]" "--generate-code=arch=compute_120,code=[sm_120]" "--generate-code=arch=compute_121,code=[compute_121,sm_121]" -MD -MT CMakeFiles/cmTC_28439.dir/main.cu.o -MF CMakeFiles/cmTC_28439.dir/main.cu.o.d -x cu -c /home/user/llamacpp/build/CMakeFiles/CMakeScratch/TryCompile-lOrwxG/main.cu -o CMakeFiles/cmTC_28439.dir/main.cu.o
nvcc fatal : Unsupported gpu architecture 'compute_103'
gmake[1]: *** [CMakeFiles/cmTC_28439.dir/build.make:82: CMakeFiles/cmTC_28439.dir/main.cu.o] Error 1
gmake[1]: Leaving directory '/home/user/llamacpp/build/CMakeFiles/CMakeScratch/TryCompile-lOrwxG'
gmake: *** [Makefile:134: cmTC_28439/fast] Error 2
>>
>>106482182
We are so back.
>>
>>106482066
Thanks drummer.
>>
programming bros, what's the best extension for let's say, a jetbrains IDE to connect either local/OR/deepseek/anthropic/openai ?
I was using github copilot, but its fucking garbage, but im not sure if there's a recc. extension that helps with commit messages, normal chat, edit, agent mode, all the usual shit.
>>
File: Mommy-Bench_Test_Q2_K_S.png (2.29 MB, 1520x696)
>>106481874
How sloppy would you say these responses are?
>>
>>106482513
>>
>>106482414
Compile with -DCMAKE_CUDA_ARCHITECTURES=80-virtual
Your CUDA 12 install does not support CC 10.3, but you can compile the code as PTX (the assembly-level equivalent) instead.
At runtime the code is then compiled to binary for whatever GPU is used; since this is done by the driver, it should work even for future, unsupported GPUs.
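Concretely, with that flag the configure and build steps would look roughly like this (job count is just an example):

cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=80-virtual
cmake --build build --config Release -j 16

The 80-virtual target makes nvcc emit only compute_80 PTX, so it never sees the compute_103 target it doesn't know about, and the driver JIT-compiles that PTX for whatever GPU is actually present.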
>>
How do I set fan curves in linux?
>>
>>106482518
Christ, that reads like it was written by a 5 year old
>>
>>106482577
Would you say like a child who wishes for a horny, sexually frustrated mother?
>>
>>106482341
Feels like waiting for the housing bubble to pop
>>
>>106482518
I don't mind the retarded ESL-tier prose, but it's making some immersion-breaking errors. At such a short context it's looking grim.
>>
>>106482572
I use CoolerControl.
>>
>3d printing
If you're the type to buy Bambu you deserve what you get; Bambu is the locked-down walled garden of printers, while Elegoo is the DeepSeek of printers: https://us.elegoo.com/products/centauri-carbon. There are a few other decent brands, but none have the same combination of quality, company size, and availability as Elegoo.
>let someone else 3d print it for you
No, that's a bad idea: print services overcharge by something like 10x, not to mention shipping. Depending on how much you're printing (say ~10 parts or more) it's cheaper to just buy the machine, and that's before you inevitably mess up the measurements and need to reprint, or realize you forgot a part you needed.
>pla
That stuff starts getting soft at around 40C, so it's garbage for anything that has to handle heat. I've personally only printed in PLA so I can't give real recommendations, but stay away from carbon fiber (https://youtu.be/ddwNZ12_qX8), the same goes for glass fiber, and ABS won't be good enough either if I'm remembering correctly. Any printer worth a damn can reach high enough temperatures to print the heat-tolerant materials, so you needn't worry unless you want to print something like PEEK.
>>
>>106482572
For my RTX 3090 I do it via GreenWithEnvy, don't know what to use for AMD.
>>
>>106482572
nvidia-smi -gtt 65
>>
>>106481714
There are different types of parallel processing. Data parallelism (DP) is when you have multiple copies of a model on multiple devices and each copy processes different data, so you can process more data more quickly. When a model does not fit on a single device, pipeline parallelism (PP), where each layer is placed on a specific device, is the "easiest" to understand and implement, but also the least efficient. Then there is model parallelism or tensor parallelism (MP or TP), which shards individual tensors across multiple devices and gathers the parts together only when necessary; this is commonly used when training models that are too large to fit on a single GPU. Expert parallelism (EP) puts experts on different devices; to keep communication overhead low, when routing, often the top-k devices are picked first, and then the top-k experts from those devices. Finally there is FSDP (fully sharded data parallel), which is basically a magical mix of TP and DP used to train large models.
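For the data-parallel case specifically, a minimal PyTorch sketch (assumes a launch like "torchrun --nproc_per_node=4 ddp_sketch.py" with one process per GPU; the toy model and sizes are placeholders):

import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")              # one process per GPU, rendezvous comes from torchrun's env vars
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(1024, 1024).cuda()   # every rank holds a full copy of the weights
    model = DDP(model, device_ids=[local_rank])  # gradients are all-reduced across ranks
    opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")      # in real training each rank sees a different shard of the data
    loss = model(x).pow(2).mean()
    loss.backward()                              # DDP synchronizes gradients here
    opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

TP, PP, and FSDP use the same process-group setup but shard the weights themselves instead of replicating them.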
>>
We should stop trying to ERP with LLMs. I just tried DeepSeek R1 8B using ollama and it is barely coherent.
>>
>>106482833
Same, but I used the proper, real DeepSeek R1 on Ollama. I saw no difference.
>>
>>106482833
>>106482843
vram issue
>>
>>106482833
>Ollama
You used proper prompt template format right?
>>
File: wan22-gpu-6.jpg (206 KB, 900x1421)
>>106482026
Honestly, after reading this Japanese article, I'm going with the 5060 Ti 16GB. Can't beat being able to actually gen at the full suggested 720p res without OOMing.
https://chimolog-co.translate.goog/bto-gpu-wan22-specs/?_x_tr_sl=auto&_x_tr_tl=en&_x_tr_hl=bg&_x_tr_pto=wapp#%E3%80%90%E3%82%B0%E3%83%A9%E3%83%9C%E5%88%A5%E3%80%91%E5%8B%95%E7%94%BB%E7%94%9F%E6%88%90AI%EF%BC%88Wan22%EF%BC%89%E3%81%AE%E7%94%9F%E6%88%90%E9%80%9F%E5%BA%A6
>>
>>106482886
3090 sisters...
>>
>>106482886
the absolute state of gpus
>>
File: cur.png (284 KB, 1176x688)
>>106482526
That solved the configuration step, but when actually compiling it, similar errors to what I was seeing before with CUDA 13.0 appeared (picrel). I created a new conda environment and started fresh every time I installed a different CUDA toolkit version from https://anaconda.org/nvidia/cuda-toolkit
This all worked effortlessly until a few weeks ago, then today I pulled...
>>
>>106482886
lol my 2060 super made the list!
>>
>>106482886
amdsissies...
>>
>using anything on ollama
>expecting good results
L O L
>>
>>106482833
retard-chama
>>
>>106482488
Cline released an alpha version for JetBrains a couple of days ago. Can't say how well it works compared to the VSCode version.
https://docs.cline.bot/getting-started/installing-cline-jetbrains
https://plugins.jetbrains.com/plugin/28247-cline
>>
>>106483038
Does cline work for vscodium?
>>
>>106483060
Yes, with potentially some limitations. https://github.com/cline/cline/issues/2561
>>
https://huggingface.co/tencent/HunyuanWorld-Voyager
https://huggingface.co/tencent/HunyuanWorld-Voyager
https://huggingface.co/tencent/HunyuanWorld-Voyager
Hunyuan now makes virtual worlds real. Genie3 BTFO
China wins once again
>>
>>106483175
what did he mean by this?
>>
>>106482225
>starting at 2k tokens upscaled to 4m with yarn
Anyone who has actually used the 2507 Qwen models knows they do far better at long context than the average open-source shitter, so this dumb joke falls flat on its face. Reserve it for Mistral or something.
>>
>>106483210
Chinese models get obliterated by NoLiMa
>>
>>106483175
use case?
>>
>>106483210
that's just the models pretending to have good context
the benchmarks do not lie
>>
>>106483259
world models are the next logical step for ai
unlike llms, they not only have true understanding of physical and logical processes but now with voyager and genie 3 even persistence within the virtual worlds they create
this area is still early but this is what will truly make anime real
>>
>>106483257
oh you mean the benchmark that doesn't test chinese models? the one where there are no results at all for chinese models to back up your claim?
>>
>>106483175
How do I use this for sex?
>>
>>106483175
I'll work on the gguf pr
>>
>>106483262
>the benchmarks do not lie
my benchmark is doing things to 4k tokens worth of json WITHOUT constrained decoding and the qwen models are the only thing I can run on my computer that can do that without making a single mistake all in one shot
I can't even consistently convince westoid open models to output a whole 4K worth of json in a single go, gemma, mistral and gpt-oss all really want to cut it short
fuck off retard and eat battery acid
>>
Qwen2.5 MAX was not open source (and 1T apparently)
Qwen3 MAX will not be open source either.
>>
>>106483500
And it was not good either.
>>
>>106483510
That's just all Qwen models
>>
>>106483500
No big loss. We already have K2.
>>
>>106483259
Ragebaiting /v/
>>
Qwen3-Coder-1T
>>
>>106483038
looks promising, still kinda rough but cant be worse than that shitheap that is gh copilot. fuck ms
>>
Uuuuuuhhhhhhh? why does running convert_hf_to_gguf.py throw ModuleNotFoundError: No module named 'mistral_common'? It's not even a mistral model i'm passing it.
>>
>>106483687
pip install mistral_common

Mistral fucked it up.
>>
>>106483687
Because the imports are unconditional with no fallback if the package is not available.
>>
>>106483687
https://github.com/ggml-org/llama.cpp/issues/15268
>>
>>106483717
Wow what horrible, useless program. Llama.cpp. People are better off using Ollama, the superior program.
>>
>>106483717
France needs to be glassed.
>>
>>106483717
they did this in preparation of mistral large 3
it's coming
>>
>>106483776
just like half life 3
>>
A couple of small releases found while trawling for Qwen info:
chatterbox added better multilingual support https://huggingface.co/ResembleAI/chatterbox
google released a gemma embedding model https://huggingface.co/google/embeddinggemma-300m
>>
>>106483553
Qwen has really really shit training data. This was confirmed when the R1 distill (QwQ) did much better than their own homecooked version QwQ-Preview. I know this because QwQ was much less censored and had a different writing style than the Preview version. Qwen's wall is the data.
>>
>>106483297
Use it with VR headset, prompt any sex scene, apply lora of your fav character on top. Profit.
>>
>>106483687
Take a look at the 'updated' version of that script; it's in the same directory. Basically, Mistral's unique architecture causes the default one to fuck up, so you have to run the updated script before you can actually run the conversion script. Why the default script doesn't just handle that by default, I don't know.

t. Quantized my own Mistral tunes in the past.
>>
>>106483725
>What is llama-quantize
>>
>>106483888
I know, I'm just disheartened. It was good while it lasted.
>>
>>106483937
It can still be good.... Just run the damn script and continue what you were doing. What are you being dramatic for?...
>>
>>106484010
No anon I'll format my drives now and get a job at mcdonalds, it's over
>>
>>106482182
max will be api only
>>
but what.. if... max lite!?
>>
I'm really impressed with my waifu's knowledge of the first Conan movie.
She's whipping out deep-cut quotes and shit.
>>
>>106484050
I'm a faggot.
>>
>>106484053
RAG, good system prompt, or fine tuning?
>>
>>106484120
none
shitty mistral model
silly tavern
I was talking about Conan and she correctly guessed the next scene after the one I was describing, then later dropped a quote that isn't necessarily one of the popular ones.
I'm easily impressed
>>
>>106481874
How do I stop the "that thing? that's not x, it's y." slop?
Ever since I've seen it I can't unsee it.
>>
>>106484170
use a different model that isn't slopped (there are none)
>>
>>106484170
Fixing the slop? It's not easy. It's hard. You hit the nail right on the head. It's not some trivial issue relevant only to a few models—it's a pervasive, deeply rooted problem.
>>
>>106484268
*upvotes*
>>
>>106484170
Use a smaller model with instructions to detect and rewrite those patterns.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.