/g/ - Technology

File: 1748924525376873.jpg (1.08 MB, 2544x3120)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107063981 & >>107056325

►News
>(11/01) Emu3.5: Native Multimodal Models are World Learners: https://github.com/baaivision/Emu3.5
>(10/30) Qwen3-VL support merged: https://github.com/ggml-org/llama.cpp/pull/16780
>(10/30) Kimi-Linear-48B-A3B released with hybrid linear attention: https://hf.co/moonshotai/Kimi-Linear-48B-A3B-Instruct
>(10/28) Brumby-14B-Base released with power retention layers: https://manifestai.com/articles/release-brumby-14b
>(10/28) NVIDIA-Nemotron-Nano-12B-v2-VL-BF16 released: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2-VL-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107063981

--A Study of BFLOAT16 for Deep Learning Training:
>107070442 >107070483 >107070511 >107070527
--Multi-GPU performance debate in AI model acceleration:
>107069202 >107069222 >107069244 >107069255 >107069261 >107069265 >107069378 >107069264 >107069942
--Vector-text storage in Postgres using BLOBs and cosine distance ranking:
>107070426 >107070428 >107070500 >107070535
--LoRA alpha parameter's role in training and inference stability:
>107064965 >107065003 >107065032 >107065046 >107065138
--LLM-assisted prompt refinement techniques and tools:
>107064845 >107064904 >107064908 >107064920 >107065271 >107065682
--AI model capabilities in OCR, translation, and writing for potential human translator replacement:
>107065203 >107069145
--Fixing Chinese-to-English translation contamination in Terminus model:
>107065949 >107066491
--ID verification requirements for AI interactions and potential workarounds:
>107065472 >107065504 >107065629 >107065653 >107065667 >107066126 >107066744 >107066673 >107066743 >107066818
--qLoRA finetuning constraints on Blackwell Pro 6000 GPUs:
>107067618 >107067655 >107067735
--Evaluating Strix Halo machine's cost-performance for AI workloads:
>107067095 >107067114 >107067162 >107067259 >107067349 >107067420 >107067727 >107067783 >107067868
--Native Multimodal Models are World Learners:
>107068769
--Seeking benchmarks for older AI models via Open LLM Leaderboard:
>107070598 >107070637
--User preferences for VTT models: Voxtral Small 24B, WhisperX, M2M100 1.2B pipeline:
>107066814 >107068206
--Miku (free space):
>107067074 >107067524 >107067676 >107068066 >107071616 >107073605 >107067350

►Recent Highlight Posts from the Previous Thread: >>107063985

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107074052
>Emu3.5:
This will never land in llama.cpp right?
>>
File: genration-13_webp.jpg (532 KB, 1280x707)
>>107074118
>Matches Gemini 2.5 Flash Image (Nano Banana) on image generation/editing
sure it does. just look at this
>>
>>107074052
>an adventure with Miku
>>
>>107074054
Where is my struggle about finding a local small AI coding agent, you tool?
>>
>>107074267
I guarantee that miku wears diapers she's the kind of girl that would do such things I know it when I see it
>>
>>107074176
Damn, now I'm interested. What's the holdup with llama.cpp support?
>>
>>107074420
The point is that it's fucking shit
>>
question, when coding using sonnet, grok, gpt or other api models, context grows very fast, and for many of my queries it can easily reach 50k, 100k, or even more, as there are just so many files that have to be read. despite that, those models can still perform well and mostly accomplish the given tasks, usually with some hiccups or omissions sure, but overall good progress can almost always be made.
meanwhile, when local models are asked to write longer stories, things tend to fall apart completely around maybe 10k in, or even sooner. you start getting short sentences that just stop making any sense.
could someone help me understand why there is such a difference? why is context buildup so detrimental for creative writing in particular, yet doesn't seem to have the same effect on coding? or is it that API models are somehow more powerful?
>>
>>107074453
api models are way more powerful
>>
>>107074349
>Let that sink in.
get out elon
>>
yeah don't let people gaslight you, api models are much better than anything local
local does improve though, just more slowly than online models
it wasn't even long ago that the norm was that even the largest models would go to shit on local past 4k
>>
>>107074361
--Searching for 24GB models compatible with agent functionality:
>107067281 >107067346 >107067353
This one? It was all the way at the bottom, past cutoff. Your struggle sounds like a personal problem, but you could try DeepSWE-Preview. It's trained on top of Qwen3-32B with thinking.
>>
>>107074361
>why isn't "saar please tell me the needful, btw I have 16 GB VRAM" a highlight
>>
agent anything on local: LMAO
wanting it on a small amount of VRAM: hahahahah oh god he's serious?
>>
>>107074461
how so. if deepseek can be run at q8_0 locally then wasn't that supposed to be at least on the same playing field?
>>
>>107074453
creative writing is the broadest of domains. it's too open ended. consider how many valid completions there are for
>she opens the door quietly
vs
>the square of the hypotenuse is equal to
>>
>>107074521
it's not competitive even without quantization, sorry bud
deepseek has an API too, and I have used it; the model only behaves on a level comparable to SOTA models if you stay under 10k
even at 10k there's degradation, but it's still usable up until like 30k
>>
>>107074559
so you're saying if I tried to get a novel out of sonnet 4.5 it can do it in one conversation? or what are you saying
>>
https://fiction.live/stories/Fiction-liveBench-Feb-21-2025/oQdzQvKHw8JyXbN87/home

>>107074559
>>107074611
>>
>>107074628
thanks, true if big
>>
>>107073605
GLM4.6
>>
>>107074711
Buy an ad.
>>
thank fuck I prefer short stories anyway
>>
>>107074628
do you people use models
those benchmarks are retarded
>>
>>107074748
how are the benchmarks wrong
>>
>>107074559
>pissing in a sea of piss
*yawn* do your whore mother a favor and kill yourself little buddy
>>
why is everyone so grumpy today? hangover?
>>
>>107074461
sonnet and other sota models start falling off in rp quality after just ~16k in my experience so that's not it
>>
>>107074486
*24
>>
>>107074513
>wanting it on a small amount of VRAM: hahahahah oh god he's serious?
Well, where can I learn about the actual hardware requirements for agents?
>>
Is there a benchmark with simple coding tasks that I should test small tool-enabled models with?
I want to test whether training on inputs is better or worse.
>>
>>107074513
I'm finetuning Gemma3 on agentic (mostly) coding tasks. I haven't gotten there yet, but I think it's doable.
>>
>>107074900
They don't have any special requirements, you just need (at the moment at least) the >200B models to do anything useful.
I think it's a data issue though and if we make the right dataset it can be done on a model an order of magnitude smaller.
>>
>>107074900
nta. Just use whatever you can run on whatever you have, see what you can do with them. Smaller models are easier to run and faster to iterate with.
Use google to find information.
>>
Retard here, please send help. Running kobold & ST on a mistral tekken v7 tune. I must have fucked up a setting somewhere because my responses went from being quite fast to generating at a snail's pace, and I don't know what setting I changed to cause that. Or is the gen speed somehow tied to what intro prompt you use with a card? I HAVE noticed that some poorly written cards just gen like shit, but I'm using the same card I was before. I'm completely lost.

On a side note, what can a poorfag do if he wants something better than a 3060 12gb? I want to play my tardslop vidya still, but also want to gen sloppa faster and chat better with my computer. I would upgrade to a 4090ti, but poor and am really hesitant to try and get a used card from facebook/CL/ebay, etc. and those amazon "refurbished" cards seem sketchy as fuck, too.
>>
File: romed82t_00-01.jpg (2.49 MB, 4096x3072)
>>107074052
PSA: Volta and Blackwell are incompatible.
An NVIDIA engineer kindly informed me that on Linux Blackwell only works with the open NVIDIA kernel modules (honestly I should have known to check dmesg).
With that I got the 5090 that NVIDIA sent me to work, though notably the V100 I had intended to use in the same machine only works with the proprietary NVIDIA kernel modules (Ampere and Ada Lovelace work with either).
For now I connected my MI100 instead, if one compiles both the CUDA and ROCm backends it can be used alongside the 3090, 4x 4090, and 5090 for a total of 184 GiB of VRAM.
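In case anyone wants to try the same mixed setup, this is roughly how I'd expect the build to look (a sketch from memory; the exact cmake option names can differ between llama.cpp versions, so check the build docs):
# build llama.cpp with both the CUDA and the ROCm (HIP) backends enabled
# so an AMD card like the MI100 can run alongside the NVIDIA GPUs
cmake -B build -DGGML_CUDA=ON -DGGML_HIP=ON -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release -j
# on startup llama.cpp prints the devices it found; check that list to
# confirm every GPU from both backends is actually visible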
>>
>>107074988
The deeper into the context you are, the slower it becomes. If you have something like the phrase ban thing, it will take longer because it needs to regenerate.
Show the speed difference, we're guessing otherwise.
>>
>>107074805
no you, pajeet
>>
>>107074453
>local models
This statement is not very useful unless you tell us what models you're comparing the cloud models to.
I'm not disagreeing, but the gap is significantly different for certain models vs others
>>
>>107074988
>if he wants something better than a 3060 12gb
I'm in the same boat. /lmg/ LLMs are go big or go home. If you're not running multiple RTX 4090s or using one as a frontend for a big RAM (512GB) machine, I think you're better off sticking with what you have and running smaller models. The hardware is very expensive for local.
> poor
lol double on above advice.
>>
>>107074628
damn, hows minimax m2 coming along in lcpp? suddenly I care again...
>>
>>107075009
link to that mining rig? did you get it in the EU?
>>
>>107075009
Is all of that still powered with a single PSU?
>>
>>107075120
it's already supported
>https://huggingface.co/bullerwins/MiniMax-M2-GGUF
you need to pull and recompile
>>
>>107075120
what the fuck is this minimax thing? is it available locally? how slopped is it
>>
>>107074988
If you run out of system RAM, llama.cpp will begin pulling the weights from disk for each token, which results in a massive slowdown. So either reduce the maximum context or close other memory hungry applications on your system.
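As a concrete sketch (llama.cpp flags; kobold exposes equivalents in its launcher, and the numbers here are placeholders, not recommendations):
# keep the context modest and the weights resident in RAM/VRAM so nothing
# gets re-read from disk on every token
llama-server -m model.gguf -c 8192 -ngl 99 --no-mmap
# -c        context size; the KV cache grows with it, so lower it if RAM is tight
# -ngl      number of layers to offload to the GPU
# --no-mmap load the weights into RAM up front instead of memory-mapping the file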
>>
>>107075122
I just ordered it off of the German Amazon: www.amazon.de/dp/B07H41S74S
Could very well be that there are cheaper options, I didn't spend much time optimizing that part of the build.

>>107075124
Yes, it's a single Silverstone HELA 2050W PSU.
Without a frequency limit, however, multiple power spikes will eventually align, drain the PSU's capacitors, and crash the system even if the average load is below what the PSU should be capable of.
In my opinion a frequency limit should be set either way because for a constant workload like a neural network it doesn't make sense for the GPU to temporarily boost to very high frequencies where the efficiency is bad.

In the meantime, Asus has released a 3 kW PSU since I bought mine; I intend to buy one of those eventually.
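For reference, the caps can be set with nvidia-smi, something like this (the numbers are examples for illustration, not recommendations for any specific card):
# cap power and lock the GPU clock range so transient boost spikes cannot
# align across cards and trip the PSU
sudo nvidia-smi -pm 1            # persistence mode so the settings stick
sudo nvidia-smi -pl 450          # power limit in watts, applied per GPU
sudo nvidia-smi -lgc 210,2200    # lock GPU clocks to a min,max range in MHz
# sudo nvidia-smi -rgc           # resets the clock lock if you want it gone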
>>
File: serious Pepe.png (359 KB, 728x793)
I have not checked in a long time now.

Did llama.cpp figure out how to utilize dual CPU rigs efficiently?
>>
>>107074176
can somebody gen an image of him shoving that black sharpie up his ass?
>>
>>107074453
i don't have any issues with context using kimi k2 0907 up to 40k
>>
>>107075049
>>107075146
It's being weird and not even offloading to the sysram. Like right now it's been stuck on token 25/350 for a solid 30 seconds and counting. I have my context at about 6k, so I don't know what's happening.

>>107075106
My idea was adding a second, dedicated card more suited for AI rather than gaymin', but I have no fucking idea what to even look for when it comes to non gaming gpus.
>>
>>107074176
the key difference being that you can train and inference entirely locally
training requirements better be reasonable.
stronger relationships + multi-turn understanding of data means you could relate tons of data together to form more "intelligent" training.
for example, before + after image pairs to say, nudify or rotate. turning random objects into girls, or making mechs outta characters.
basically the isolated conceptual training of single image + caption pairs isn't good enough any more, people want cross-domain stuff now.
if Emu makes it more accessible I'll take it.
>>
>>107075254
Show your fucking options, show the model you're running, show how you run it, show the performance log in the terminal output. For all I know you're doing everything that could possibly be wrong.
>My idea was adding a second
Don't waste your money yet. You're thinking of buying a new car when you can't find your way out of your house.
>>
File: amretardedhelp.png (927 KB, 2560x1440)
>>107075309
Sorry. I don't mean to be a pain in the ass. Here's a bunch of stuff, idk if it helps.

>Don't waste your money yet. You're thinking of buying a new car when you can't find your way out of your house.
To be fair, I do gen quite a bit of slop locally. I'm just retardedly new to text models and such. I'd prefer to just get a 4090ti for the gaymin, but... Yeah. Poor.
>>
>>107075483
Show the output of top and nvidia-smi as it's generating. From the looks of those logs it's stuck on prompt processing and it is not even using the GPU.
>>
>>107075483
first of all, fuck cydonia r1
second of all, use IQ4_XS not Q4_K_L
third of all use cydonia 4.2.0 maybe?
and since its R1 its probably thinking thats why u arent seeing anything
also windows is worse for ai btw
>>
File: h9g78f.jpg (1.65 MB, 5280x2560)
>>107075009
nice rig
>dmesg
have a journalctl -f or similar running in the background to catch all the errors
#!/usr/bin/sh
gnome-terminal --profile=Syslog --full-screen --zoom=0.6 -- bash -c 'echo -ne "\e]0;syslog\a" ; SYSTEMD_COLORS=16 journalctl --no-hostname --no-tail --follow -b 0'
sleep 1
wmctrl -r syslog -b add,skip_taskbar
wmctrl -r syslog -b add,below
>>
>>107075510
I'm going to sound even more retarded, but where do I see that at? I don't see anything that looks like that in the kobold terminal or the sillytavern one.

>>107075533
Let me try and download that one, then. I'll report back in a bit when huggingface quits being a cunt about its download speeds.
>>
there's a couple of these speak to type things rolling around that clean up your speech n shit
is there an open alternative yet?
>>
First time doing this, I didn't think I had the specs for it (old i7 and 8gb nvidia gpu). I just downloaded ollama, using gemma3:4b, and I'm in awe with how fast it is. It's basically as fast as chatGPT and it can even read images.
>>
I'll be back when pewd's tourists are gone.
>>
>>107075605
is retarded tho
>>
>>107075605
try a quant of a bigger model and offload some layers to your cpu
>>
File: file.png (195 KB, 950x927)
>>107075580
nevermind 4.2.0 is trash
maybe get mistral small v3.2? youll need a bit of a jailbreak for it but its nice
maybe check reddit for shitty erp models, but theyre likely shit
https://www.reddit.com/r/SillyTavernAI/comments/1ogzbb3/megathread_best_modelsapi_discussion_week_of/
>>
>>107075559
god dam that desktop has a lot of pixels
what is the physical size of your monitors brother?
>>
>>107075649
i like fine DPI see >>106873195
>>
>>107075649
willing to bet hes using a dual monitor setup
>>107075668
KNEW IT
>>
>>107075559
>steam botnet in the background
Enjoy being spied on.
>>
>>107075559
>vivaldi browser
bro cant be serious... using a proprietary browser... bro... thats worse than using windows broo...
>>
Is there really no way to finetune Gemma 3 27B with full context?
I couldn't get it to work even on a 4xH200 machine.
Neither llama-factory nor axolotl seem to have the ability to actually shard across GPUs.
>>
i sharded myself
>>
>>107075580
Open a cmd and type "nvidia-smi". As for "top", if you are on Windows the equivalent is the Task Manager for seeing CPU utilization. If it's running on the GPU you shouldn't see more than one or at most 2 or 3 cores at 100%. If it's running on the CPU you will see all cores pinned to near 100%.
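If it helps, this refreshes the numbers once a second while the model is generating (same command in a Windows cmd or a Linux shell, assuming the NVIDIA driver tools are installed):
# watch GPU utilization and VRAM usage update live during generation
nvidia-smi -l 1
# or just the interesting columns
nvidia-smi --query-gpu=utilization.gpu,memory.used,memory.total --format=csv -l 1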
>>
>>107075690
>proprietary
It has nice features basically modern opera
https://vivaldi.com/source/
>>
>>107075696
Whatever happened to tdrussel? He had a pretty good multi-GPU deepspeed script going with qlora-pipe but it hasn't been updated in a long time.
>>
>>107075648
Would I still go for the Q4_K_S if IQ4_XS isn't an option? I don't know which ones to get out of the list of all these Q4, Q5, Q6's.
>>
File: file.png (31 KB, 403x205)
>>107075717
proprietary blobs
https://vivaldi.com/blog/technology/why-isnt-vivaldi-browser-open-source/
>Note that, of the three layers above, only the UI layer is closed-source. Roughly 92% of the browser’s code is open source coming from Chromium, 3% is open source coming from us, which leaves only 5% for our UI closed-source code.
>>107075727
well you could get Q4_K_S but Q4_K_M is better
_L are memes besides Q3_K_L
pretty sure there's an IQ4_XS for most rp models, mradermacher does them
>>
File: file.png (72 KB, 1851x512)
>>107075721
>15 stars
This proves that LLMs are a meme.
>>
>>107075762
>I CANT READ
>>
File: statshit.png (119 KB, 1744x764)
>>107075716
Thanks. Here's what's going on with it as it generated text. When it was processing the prompt, the gpu usage was at 100%.

>>107075745
I was going to try https://huggingface.co/SicariusSicariiStuff/Impish_Magic_24B_GGUF/tree/main
based off the comments of plebbit, but there's only a K_S for the Q4, the Q5 has the K_M.
>>
>>107075745
>oh no not like the proprietary firmware running on my GPU/CPU management engine since the last decade
it's not even like that, you can build vivaldi
pick ur poison, there's no obviously best browser for all use cases. V has some really nice features and isn't annoying - i have up to 6 profiles, multi window, hundreds of tabs daily, and the entire state gets saved and restored nicely
>>
File: cpu.png (98 KB, 1693x975)
>>107075716
>>107075822
I should've probably shown the cpu tab, my bad. Here's that, while it was actively generating new text.
>>
File: file.png (161 KB, 1073x1013)
>>107075822
your goof's https://huggingface.co/mradermacher/Impish_Magic_24B-GGUF/tree/main
>>107075825
pretty sure brave can do all that and is completely open source and buildable, i agree that muh proprietary drivers but proprietary browser is really icky, you're putting all your personal shit through the browser
you do you, anon
>>
File: google office.png (2.49 MB, 1730x1023)
Sirs when is we getting new gemma and gemini? When is we making investor sirs happy?
>>
>>107075899
Ope, thanks. I'll get that IQ4_XS and try it out.
>>
>>107075822
>>107075865
It looks like it's working correctly but it must be spilling some weights into RAM because a 24B at Q4 fills all of your GPU only for the weights and you need about the same amount of memory for the context and other stuff.
So my warning about RAM usage does apply and might have been the reason for the slowdown.
I don't know how Kobold works but with llama.cpp you should be able to see what exact parameters the llama-server process is being run with on the details tab of the task manager. This could help you debug issues and understand your actual settings.
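On Linux the same information is one command away (a sketch; on Windows it's the "Command line" column in the Details tab, as mentioned above):
# print the exact arguments the running llama-server was started with
tr '\0' ' ' < /proc/$(pgrep -f llama-server | head -n1)/cmdline; echo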
>>
This guy says there's a trick to get cheap GPU time on GCP, anyone tried it?
https://www.youtube.com/watch?v=v_EWVdNPvpA
>>
File: lower life forms.png (18 KB, 967x170)
>>107075762
4b llm is smarter than you
this is why humans are replaceable
llms don't need to be as good as the small % of actually self aware, intelligent human beings
they just need to be more useful than you
>>
>>107075951
So lower the weights down to like 4k? Or use a smaller model size like that other guy was saying with the IQ4_XS? I appreciate the help with this.
>>
>>107076008
the solution is to switch to linux very likely, because windows is very vram intensive.
i have the same rig as you and yet im getting 8t/s with cydonia at 8k context without any issues or waiting
>>
>>107076038
you can always have a cheap gpu run the display and a second one dedicate itself to llms
it solves the whole "os/desktop is taking muhvram"
>>
>>107076066
linux is also faster, no matter how much vram windows uses
WSL2 is a cope, and native windows is an even bigger cope
>>
>>107075762
Oh yeah I remembered him saying he wanted to work on that more. RIP. he's gone over to the ldg darkside.
>>
>>107074628
>qwen 30b better than 80b (still no goof)
It's fucking over
>>
>>107074052
Hey OP, /ldg/ anon here.
We were discussing making a Local Model Awards for this year, what do you think?

What would the categories and nominees be?

I think we would need nominees for stuff like:

- Best local image model (only ones released this year)
- Best local video model (a new LTX is coming too, will be interesting to see how it compares with Wan)
- Best large-scale fine-tune (image model)
- Best large-scale fine-tune (video model)
- Best image lora
- Best video lora
- Best porn lora
- Best local music gen model (there are two Suno tier open models coming)
- Best image gen / video gen software
- Best lab or developer
- Best local LLM under 100b params
- Best local LLM over 100b params
- Best local LLM ERP fine-tune
- Image gen of the year
- Video gen of the year
>>
>>107075238
what quant
also benchmark pic above suggests otherwise
>>
>>107076165
> under 100b glm air (106b)
> above 100b glm 4.6 (400b)
> erp:
https://huggingface.co/Kaoeiri/MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8
>>
>>107076191
100% shill 0% real
>>
I just watched a spider try to copulate with another one on the outside of my window. It looks like the (presumably) female spider ran away.
>>
>>107076211
be real then nigger
>>
>>107075971
Nevermind, it looks like it was just a fucking ad :\
>>
>>107076240
knew it
>>
>>107075899
>icky
Run Wireshark and see exactly what your browser and all your other apps want to crap out onto the internet. Brave doesn't have enough features vs my comfy multi-profile setup with all window positions saved
>>
glm air really is THE fucker, when i try a meme finetune, its just a meme
glm air is god, i kneel xi-sama
>>
>>107076294
buy an ad, wumao
>>
got a better recommendation, kike?
>>
>>107076294
how much vram do i need to finetune
>>
>>107076421
Not that guy but I'm trying to tune Gemma 27B and I can tune to about 35k context on a single H200 using llama-factory. I'm trying to figure out how to use more GPUs for more context.
I'm gonna try the qlora-pipe scripts now, then maybe fsdp-qlora, and then maybe Google's kauldron, and if none of those work then I'm out of ideas.
>>
>>107076165
>- Best local LLM under 100b params
>- Best local LLM over 100b params
>- Best local LLM ERP fine-tune
nemo
>>
>>107075642
I just tried gemma3:12b which uses 33%/66% CPU/GPU and it's way slower, looks like ollama already has them on Q4_K_M
>>
>>107076421
with unsloth, 24gb is enough for mistral small at low context
>>
>ollama users on /lmg/
it's so over
>>
>>107076484
It's the only way to run full R1 on just 8gb of VRAM.
>>
are you thinking like a senior AI engineer, anons?
>>
File: MiniMax m2 no jb.png (40 KB, 926x337)
MiniMax M2 seems like it was distilled off of GPTOSS
WE MUST REFUSE
This is just a raw test with no JB or anything. I'll have that kitty purring for ya'll.
>>
I'm writing a new book called Elara or: How I Learned to Stop Worrying and Love the Slop
>>
>>107076556
Which model are you using to write the book?
>>
>>107076564
gpt-3.5-turbo-0613
>>
Is kimi really better than glm? I need to know if it's worth upgrading to run 1T parameter stuff.
>>
File: MiniMaxM2Nala.png (192 KB, 905x818)
>>107076547
It's a little weird. Temp might be too high.
It also doesn't seem to understand that RP should be back and forth.
It didn't actually think. It's just slow as fuck. I prefilled a think with enthusiasm to reply and it just closed off the think and started replying.
>>
File: 1741263320295132.mp4 (960 KB, 480x640)
>>107076484
>get yet another yt recommendation of some guy running AI on a random piece of hardware
>extremely technical about the setup and usecase
>okay and now we're going to run gpt-oss through this neat little program called ollama
>>
>>107076547
>MiniMax M2 seems like it was distilled off of GPTOSS
So it was more than just a PR stunt (em dash) it was a poison pill.
>>
>>107076575
The slopfather...
>>
>GGML_ASSERT(!slot.prompt.tokens.has_mtmd) failed
that's new
somehow the prompt caching is bugging if you have a multimodal model loaded and there comes a point where it'll just crash when you make a new chat and it attempts lookups for possible reuse
I don't use multimodality too often but now I'll stop loading the projs by default I guess..
>>
Where can I get the optimum.bettertransformers package?
>>
>>107074052
I did some quick test for how performance scales with a power limit on Ampere vs. Ada Lovelace vs. Blackwell.
At 450 W an RTX 5090 has ~30% faster pp for LLaMA 3 8b f16 (cuBLAS) and ~10% faster pp for q4_0 (custom ggml kernels using int8 tensor cores).
Assuming that cuBLAS is optimal for both Ada Lovelace and Blackwell there's maybe something like a 20% uplift that could be achieved by using 5th generation tensor core instructions instead of the ones introduced with Ampere.
Large gains could feasibly be achieved for FP4 models since only Blackwell has FP4 tensor cores.

During token generation power draw is much lower, I didn't benchmark it but I also expect not to see anything too interesting.
>>
>>107076547
>>107076612
oh turns out I fucked up the prompt template.
I just assumed ChatML but it has its own proprietary shit. So the test is completely invalidated. Watch it actually be worse with the proper format.
>>
So apparently qlora-pipe depends on bettertransformer, which doesn't exist anymore, and I have no idea in which "optimum" library version it was removed. Looks like I won't be getting anywhere with that script.
Guess I'll try fsdp-qlora.
>>
>>107076621
> gpt-oss
> ollama
sad

we are probably talking about the same dude, i once commented that against ollama and he replied something along the lines of "this is the single best piece of inference software, what are you even talking about".

lmao, i fucking hate ollama
>>
>>107076777
Hating ollama is pointless. If they didn't do it, someone else would have. The problem is stupid people.
>>
>>107076694
Nice, do you have a rough estimate of how the RTX Pro 6000 Max-Q would perform compared to the 5090? It's only 300W but apparently more optimized to perform well at that limit.
>>
>>107076806
>someone else would have
like LMStudio, Jan etc. why did ollama win the mindshare among the stupid? it's actually NOT the most user friendly in terms of exposing functionality, it's barebones, only recently got a chat ui (for most of its life it was only a terminal tool) etc.
>>
>>107076811
I don't know, I have as of yet a poor understanding of the architecture and the code paths that are currently being chosen are likely very suboptimal.
>>
>>107076819
A lot of luck and early shilling on Hackernews, after that it remained popular because it was already popular.
>>
>>107076819
They have Silicon Valley connections and get promoted on lots of model releases, they do lots of promotional meetups, and they almost certainly astroturf, at the very least on HN.
>>
>>107076819
ollama is a complete bullseye with the midwit casuals who would run local models over chatgpt. They want something API-centric so that they can plug it into all the MCP/Agent/Meme shit AIfluencer #3902 told them about while running the hottest new 'ollama run deepseekr1' model they've heard so much about. They're the hottest shit on the AI market and so much better than the babies running LMStudio and other GUI-focused solutions.
>>
>>107076819
ollama just works and you don't have to compile it or pass a million cli flags when running like llama.cpp
>>
M2 would be okay if it were like a 12B model. But it's a 229B model. No MoE excuses.
>>
What's the 16-32B meta for goonslop nowadays?
>>
>>107076165
You should include best local eroge translation model too
>>
>>107076978
nemo 16-32+28b
>>
>>107076940
ooba does that without having to jump through hoops to run my quants, samplers, and context sizes.
>>
>>107076989
>NVIDIA-Nemotron
This shit?
>>
>>107076994
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
>>
>>107076165
Lurk more, only tourists wouldn't know that
>>
>>107077004
Ah kay, I'll take a look
>>
>>107076983
it's gemma 3n on the low end (with a prefill to bend its will) or deepseek v3 at the high end, and nothing in between because 3n destroys everything below ds in multilingual power
try to keep the chunks of text to translate to around 1k tokens, it's enough to make the model understand the text better but below the "breaking point" (3n starts breaking at 2k and will just not do the task properly / enter repeat / give you a "[…]" if asked to do 4k in one go)
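If you want to script that, here's a rough sketch of chunking a file and feeding the pieces to a local llama-server one at a time (the ~4 KB per chunk is only a crude stand-in for ~1k tokens, the port/endpoint assume a default llama-server, and the prompt should be adapted to whatever model/template you actually use):
# split the source text into ~4 KB pieces and translate each one in order
split -C 4096 source.txt chunk_
for f in chunk_*; do
  jq -Rs '{prompt: ("Translate the following text to English:\n\n" + . + "\n\nTranslation:"), n_predict: 1024}' "$f" \
    | curl -s http://127.0.0.1:8080/completion -H 'Content-Type: application/json' -d @- \
    | jq -r '.content' >> translated.txt
done
rm chunk_*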
>>
>>107077004
Actually, what is even the prompt format for it? Or should I just raw dog it with text if I want to use it for storytelling?
>>
>>107077051
Mistral's format. Your client should support it.
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/blob/main/tokenizer_config.json#L8008
>>
>>107077065
What's the best one for straight storytelling anyway? LMStudio is nice but I think it only does dialogue, and KoboldCPP is a bit clunky. I've been out of the game for a bit.
>>
>>107077051
>>107077065 (cont)
[INST]Your instructions here[/INST]

that's it.
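e.g. if you're poking llama-server directly instead of letting a frontend format things (port and sampling values are just examples):
# raw completion request with Nemo's [INST] format written by hand
curl -s http://127.0.0.1:8080/completion -H 'Content-Type: application/json' -d '{
  "prompt": "[INST]Write the opening scene of a noir story set in a rainy city.[/INST]",
  "n_predict": 300,
  "temperature": 0.7
}' | jq -r '.content'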
>>107077083
>I've been out of the game for a bit
A long while if you don't know nemo.
Lots of anons use Silly Tavern with llama.cpp. I don't use ST, so I cannot help you there. Read their docs and learn to use it. It has a bunch of presets. Experiment with the options. Learn what they do. There's a base model as well if you want to do proper raw dogging. Or maybe koboldcpp got better since you last used it.
>>
>>107076819
>why did ollama win the mindshare among the stupid?
Too much shilling coming from Hacker News. They basically hijacked every AI comment thread in 2023. And the moderators let them because it's a Y Combinator company. It convinced me that the only thing you get from reading HN comments is being manipulated.
>>
>>107077083
text completion is a relic of the past, there's not many true base models being released these days, and the instruct tunes are only good when used with their chat template
what that means is that even for doing storytelling you really want a dialogue form anyway, where you give instructions to the assistant acting like a writer. Modern UIs are developed around chat for a reason.
you can edit the assistant replies much in the same way you would edit text completions before in the old days to steer its storytelling
>>
>>107077083
>>107077117 (cont)
There's also mikupad for a minimalistic client. If you're into storytelling, you may like it more. Much fewer checkboxes to fuck around with.
>>
>>107077146
>Much fewer checkboxes to fuck around with.
the true state of zero checkbox mind is to write your own TUI that just does a basic save state to json and reload
>>
>>107077157
He didn't know nemo. Give him time.
>>
>>107077117
>that's it.
Oh yeah, nice.
>A long while if you don't know nemo.
Yeaah, mostly lost interest during first Mistral. Is that anon not bullshitting about Nemo, anyway? It seems good enough so far.
I've been using the newer Kobolds, just with old ass models, and I still don't really see a good way to edit format prompts kek
>>107077144
>text completion is a relic of the past
Touche
>>
>>107077179
>Is that anon not bullshitting about Nemo, anyway?
The best for fucking around in the smaller range. Next best model upwards is probably mistral small (24b, another one to try) and glm air (moe 100b). But I can't run 100b models, so what do i know.
New base models do seem to be trained with some instruct data in them. I played around with smollm3-3b-base when it released. I accidentally used it with chatml. It never broke the format (when i would have expected it to fail at some point).
>>
wtf is a chat template. doesn't llama-cli automatically apply the correct one per given model to your prompts, and parse responses to remove it?
>>
>>107077237
>doesn't llama-cli automatically apply the correct one

it takes it from gguf afaik

>>107075222
STOP IGNORING ME!
>>
>>107077237
>wtf is a chat template
It's a convention to know when the user's input ends and the model's output begins.
>doesn't llama-cli automatically apply the correct one per given model to your prompts, and parse responses to remove it?
Depends. That's typically a job for the client. It has endpoints to format a series of messages. Not sure if that's what you mean. Normally, you send it text, runs completion until something makes it stop (stop word, EOS, token limit, whatever) and sends it raw back to the client for processing/display.
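For example, with llama-server you can either send raw text to /completion and format it yourself, or use the OpenAI-style endpoint and let the server apply the model's own template (a sketch; start the server with --jinja so it uses the template embedded in the GGUF, port is the default):
# the server turns this message list into the model's chat format for you
curl -s http://127.0.0.1:8080/v1/chat/completions -H 'Content-Type: application/json' -d '{
  "messages": [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Explain what a chat template is in one sentence."}
  ],
  "max_tokens": 128
}' | jq -r '.choices[0].message.content'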
>>
>>107077237
Just use ollama and you won't have to worry about chat templates.
>>
>>107077282
just use llama-server with --jinja and you don't have to think about templates either, retard
>>
>>107077293
It's bait, anon.
>>
dont tell me I'm supposed to type <chingchong bam bong woosh>here's my prompt<shazzam>
>>
>>107077258
You should probably test it yourself.
>>
>>107077334
depends on your configuration. maybe.
>>
>>107077334
But you said you wanted to use local models...
>>
>>107077414
>>107077299
>>
your all retards
>>
>>107077442
>>107077299
>>
>>107077442
>your
>>
>>107077474
get baited
>>
>>107077357
It did not work back then. Just made it slower.

That's why I was wondering if something was in the news
>>
>>107077442
yeah
>>
>>107077486
"Back then" could have been hundreds of commits ago.
Let's try this:
Yes. It works much better. You should give it a go.
>>
How do I make kobold stop giving server busy errors and prompting infinitely without outputting anything to the software I'm connecting it to for live OCR using multimodal models? It worked before but now I can't get it working like it used to, and for some reason LM Studio of all things works fine with it
>>
>>107077414
latest lcpp, what else is there even
>>
>>107077442
What about my retards?
>>
>>107077552
I can't help you. I got filtered by templates, I just use the model with the completion endpoint and accept the performance degradation, most models don't even need a prompt template.
>>
>>107077593
I don't know what fucking endpoint I'm using, I just type -m .gguf -mli --cpu-moe -c 500000 or sth
why is this so fucking complicated
>>
>>107077615
idk mby try wth --jinja
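something like this (paths and numbers are whatever you're already using, just with the extra flag):
# --jinja makes llama.cpp apply the chat template stored in the GGUF
# instead of treating your input as raw text completion
llama-cli -m model.gguf -c 16384 --cpu-moe -mli --jinja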
>>
File: 1742829374144112.gif (1.46 MB, 512x288)
>>107077615
--cpu-moe-moe-kyunn
>>
Guys, I'm fucking pissed off and depressed.
LoRa finetuning frameworks are all steaming piles of shit. I cannot finetune a small model with full context no matter how many GPUs I throw at the problem.
Open models are SHIT compared to proprietary models from even a year ago, not only because of the models themselves but because they are not trained to work with web search, unlike ChatGPT.
>>
>>107078009
nigger
>>
>>107078009
>Guys, I'm fucking pissed off and depressed.
You'll be fine.
>I cannot finetune a small model with full context no matter how many GPUs I throw at the problem.
I've never tried finetuning. Is there any possibility that you're doing something wrong?
>>
>>107078127
>Is there any possibility that you're doing something wrong?
I'm sure there is some way by editing obscure Deepspeed or FSDP config files or at least by editing source code (obviously, since the big labs trained the models in the first place somehow), it's all relative.
>>
>>107078164
At what context length are you trying to train? On what hardware?
>>
>>107078181
Gemma 3 27B on as long of a context as possible, ideally 128k or even 256k.
I'm renting cloud GPUs. I tried on a 4xH200 machine but I didn't see much improvement if any in the context that fits on vram over a single H200 machine, which means it's not actually sharding efficiently. I'm able to fit at most 35k.
I also tried on a single B200 machine to get around the sharding limitation but the card is too new and apparently the prebuilt flash-attn binary package doesn't have kernels for it. I guess I could wait for it to build and pray that it works but meh, and I don't even think that'll allow me to reach full context, only maybe 50k or 60k.
>>
>>107078276
i can make mistral finetunes on my dual 5090s. i think you might just be doing something wrong
>>
>>107078276
>256k
It was originally trained on 128k. You're not gonna extend it on a budget.
Did you see gpu utilization go up on your runs? The optimizer needs memory and the batchsize (and probably another million things) also affect memory usage.
Considering that
>https://github.com/Named666/AlphaAnon/blob/master/finetune.py
had to lower the batchsize to make a 135m model training fit on a 8gb, you'll probably have to optimize that as well, if it's even possible.
In a quick scan, I couldn't find the hardware used to train gemma, but they have their own tpus, so those numbers would be useless anyway.
Just use llama.cpp.
>>
>>107078380
Actually I'm dumb. I just remembered that the memory complexity of vanilla attention is quadratic in context length. Flash attention avoids materializing the full attention matrix at inference time, but I'm not sure the same trick applies during training. That might be what is causing the memory blowup.
I'm not sure whether the attention matrices have to be stored for the backward pass, though. Because if they don't, you could discard the activations of all the layers you're not currently processing, so it should use way less memory than it currently does.
>>
>>107078426
built in training scripts of oobabooga just werk
>>
what if mistral large 3 is already out?
>>
can't beat my pp told ya
>>
>>107078443
link the model miqudev
>>
>>107078394
>It was originally trained on 128k. You're not gonna extend it on a budget.
Maybe finetuning at over the maximum context would still improve long context performance when doing inference under the original limit?
>Did you see gpu utilization go up on your runs? >The optimizer needs memory and the batchsize (and probably another million things) also affect memory usage.
Yes, full GPU utilization.
The optimizer state shouldn't take that much memory, since at rank 32 it's only like 0.5% of the weights of an LLM which is small to begin with. I've finetuned Llama 70B on a similar machine with short context (don't remember the exact value), and Llama 405B on a 8xH200.
But now I want a small multimodal LLM that I can afford to do inference with at long(ish) context for practical uses.
>link
Full finetuning is so different from QLoRa that it bears almost no relationship at all and is more similar to the initial pretraining. But what Google used is kind of irrelevant as well since they give it much more compute to achieve dozens of GB of data per day rather than the minimum to train the model but 1000 times slower.
>Just use llama.cpp.
These small models are absolutely retarded for agentic uses or specialized use cases like phrase grounding (bounding box generation) for multimodal LLMs. But with finetuning they can perform somewhat decently.
>>
>>107078475
>Maybe
Wishful thinking. Not On A Budget. Not without knowing what you're doing. Big fucking labs still fail at it.
>But with finetuning they can perform somewhat decently.
Compared to other similarly small models and you can barely finetune those.
I'm surprised you haven't calibrated your expectations yet.
>>
>>107078443
Monsieur...
>>
>>107078540
I mean, the experiments I've done so far made me more optimistic about the small models, not less.
I was afraid the small models were already maxxed out, but if I can improve the accuracy by training on a dataset I made by recording about 20 logs of the model's own retarded output and cleaning them up, then I wonder what can be achieved.
This also goes to show how much pretentious bullshit there is floating around in academia. Model collapse my ass.
I have low expectations about the software and the low effort "let's shit out a 500B MoE by distilling Gemini and doing RL" models, not about the latent capabilities of the small models or finetuning in general as a concept.
I am also confident that LoRa finetuning can be improved HUGELY by a llama.cpp style project that figures out how to do CPU offload without any of the retarded Python spaghetti code with 20 years of technical debt.
In about an hour I got from 0.56 loss on my validation set to 0.29 with only the 35k of context I had.
https://paste.centos.org/view/4158b6c3
>>
>>107078768
>In about an hour I got from 0.56 loss on my validation set to 0.29 with only the 35k of context I had.
If that's the result of finetuning those 20 logs, yeah. You're overfitting on those 20 logs. It'll go down much faster on a single example, but that's not what you want.
>>
>>107078810
Are you being dense on purpose?
It's loss on a portion of the data that gets set apart and not trained on.
Without dropout and weight decay at 0.1, val loss began to climb at the end of epoch 2; with them it kept lowering until the end of epoch 3 (I didn't check what happens if I kept going, maybe the regularization would prevent overfitting).
Now I was curious to see what happens if I merge and train another LoRa with the same data on the merged model.
>>
>install fedora 43
>compile llama.cpp (again)
>gcc is version 15
>nvcc needs gcc version 13
>sudo dnf install gcc13-c++
>doesn't exist
So what the fuck do I do now then? There is always some bullshit version issue with Linux. I just compiled llama.cpp on Mint without any issues. Mint's packages are so old that there weren't any conflicts.
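One workaround that usually gets around this is pointing nvcc at an older host compiler explicitly instead of the system gcc (a sketch; the gcc-13 path is a placeholder for wherever you end up installing it, and flag handling can differ between CUDA versions):
# tell CMake/nvcc which host compiler to use for the CUDA parts of the build
cmake -B build -DGGML_CUDA=ON \
      -DCMAKE_CUDA_HOST_COMPILER=/opt/gcc-13/bin/g++
cmake --build build -j
# last-resort alternative: let nvcc accept the newer gcc anyway (may break)
#   -DCMAKE_CUDA_FLAGS="--allow-unsupported-compiler"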
>>
File: ltnvidia.png (304 KB, 630x450)
>>107078834
>There is always some buillshit version issue with linux
Seems to be an nvidia issue.
>>
>>107078830
And I'm sure that the variety of all those 20 logs was so high that there's no possible way they were effectively a single training sample.
>>
>>107078865
I can either compile gcc-13 myself or install it via a snap package. Tbh I have never even heard of this 'snap' package system, I'm sure it's something great.
>>
>>107078870
Both sound tremendously fun. One of those may even work. Last linux I used was slackware so I can't help there. But if mint works, it works.
>>
>>107078895
The snap package did not have g++-13 so I guess I need to build gcc-13 on my own.
I guess I can do it later on. I really feel like smoking a cigarette now and I'm not even a smoker...
>>
>>107078869
Yes, the distribution of styles and tasks I want in my use case is fairly small compared to the distribution of things people in general want from LLMs. That is actually a point FOR finetuning, not against.
In information theoretic terms there is only so much you can cram into ~10GB worth of weights. I don't want my LLM to remember pop song lyrics and random town names. How much it's possible to modify the areas of knowledge or knowledge/intelligence tradeoff by finetuning rather than large pretraining runs is an open question but just teaching the LLM the tools of your particular code assistant (for example), your style preferences and to not use obvious slop phrases is already a big plus and I don't think it can be effectively done with just prompting. But then again I haven't tried the prompt optimization techniques.
Another thing is I'm skeptical of the "intruder dimensions" thing people here keep talking about. I suspect iterated QLoRa finetuning is equivalent to full finetuning, (maybe) at the cost of some loss of generalization which could be compensated by training on more or more diverse data. I also suspect the LoRa in QLoRa, as long as it's not merged, might improve performance by compensating for quantization noise of the underlying model.
>>
>>107078937
Tool use is not the only thing preventing it from making llm.c. If I see models failing at simple tasks, I wouldn't expect them to succeed at more complicated tasks.
Say you trained your model. It can ls and git commit as well as any human could. It's something you can train for because you know how to do it and how to teach it. That's the easy part.
>>
it's more like fuck you ggerganov
>>
>>107078974
I'd be happy if they could do those basic things without being retarded, like the proprietary models can. Or even things which the proprietary models can't do to save their lives but a 90IQ person can easily. Like cropping sections of PDFs and transcribing their contents correctly, autonomously.
60% of development is research. Research requires browsing the internet, which is a tool use task.
OpenAI, Anthropic and Google (and I guess Perplexity) specifically train their models to work with their agentic frameworks that allow the model to browse the internet when looking for information. This is extremely powerful and there's nothing even remotely similar in open weights land.
I made a script that allows them to control the browser and it works very well with the proprietary LLMs, but unfortunately it churns through tokens too quickly for local models.
>>
>>107078834
The fuck? I thought C/C++ were all about backward compatibility? What is this??
>>
Another thing I'm not sure about is whether I should train on user prompts or only on responses.
>>
>>107079034
Mask user prompts, train on responses only. Isn't that how it's always been done?
>>
>>107079010
In code. C++ doesn't even have a stable ABI
>>
>>107079037
It's kind of an open question.
https://magazine.sebastianraschka.com/p/llm-research-insights-instruction
>>
https://huggingface.co/meituan-longcat/LongCat-Flash-Omni


CHINESE UBER DROPPED KINO
>>
He is being moe on purpose
>>
>>107079144
heh gotteem
>>
>>107079037
>Mask user prompts, train on responses only. Isn't that how it's always been done?

Does that prevent the trained model from spitting out approximations of user prompts almost verbatim?

E.g. with some of the models on HF, if I send them a blank /v1/completions request or just a bos_token, they'll print something very close to the prompts they were trained on.
>>
I kind of like minimax-m2 so far, seems worth playing with as an alternative to qwen 235b at that size range. nothing particularly mindblowing about it so far but the experience of RPing with it was pretty smooth over multiple turns, it has a nice sense of pace and when to introduce new things vs let the scene ride which is nice. the thinking is concise and well-implemented, not too much meandering or planning pointless details to throw in. a few refusals which I could see being annoying for more explicit/taboo stuff but were pretty easy to work around for standard kink sexo.
overall pleasantly surprised, it's passed phase one of keeping my interest and now it's time to see if it has any extremely annoying tendencies that only reveal themselves over time
>>
>>107079205
No, that's exactly what would happen if you don't use the chat template correctly, since for a model trained under that regime the template is the only way the model would have of knowing whether it's supposed to be acting as the user or as the assistant.
The theory behind enabling it is that forcing the model to learn to predict user messages could teach it to be more self critical of its own outputs, and if you're doing what I'm doing (training on its own outputs) it would give a bit more diversity and non sloppy/more informal language to learn.
It won't apply it immediately to its own outputs but eventually the style could bleed through a little from the user persona to the assistant persona.
>>
https://huggingface.co/moonshotai/Kimi-K3-Instruct
>>
>>107079251
>2T-64BA
>>
>>107079098
>LongCat-Flash-Omni, a state-of-the-art open-source omni-modal model with 560 billion parameters (with 27B activated), excelling at real-time audio-visual interaction
>LongCat-Flash-Omni achieves low-latency, high-quality audio–visual processing and streaming speech generation.
>>
>>107079098
is the audio-visual input only or does it do output too?
>>
>>107079264
>>107079284
Quick, somebody check if it knows how to land a plane
https://www.youtube.com/watch?v=TLMBu0KxTnU
>>
>>107079251
>https://huggingface.co/moonshotai/Kimi-K3-Instruct

Nice, for once they provide goofs!

https://huggingface.co/moonshotai/Kimi-K3-Instruct-GGUF
>>
>>107079364
>>
>>107078009
I can't promise that you'll like it any better but I intend to make the llama.cpp training code more usable "soon".
If things go according to plan I'll be done with automating memory allocations and more generic multi GPU support by the end of the year, my next priority will then be to get back to the training code.
>>
>>107078834
Take the Arch pill, I've installed a bunch of CUDA and gcc versions from the AUR and it just works:

> $ yay -Qs gcc
local/gcc 15.2.1+r22+gc4e96a094636-1
The GNU Compiler Collection - C and C++ frontends
local/gcc-ada 15.2.1+r22+gc4e96a094636-1
Ada front-end for GCC (GNAT)
local/gcc-d 15.2.1+r22+gc4e96a094636-1
D frontend for GCC
local/gcc-libs 15.2.1+r22+gc4e96a094636-1
Runtime libraries shipped by GCC
local/gcc11 11.4.0-1
The GNU Compiler Collection - C and C++ frontends (11.x.x)
local/gcc11-libs 11.4.0-1
Runtime libraries shipped by GCC (11.x.x)
local/gcc12 12.3.0-3
The GNU Compiler Collection - C and C++ frontends (12.x.x)
local/gcc12-libs 12.3.0-3
Runtime libraries shipped by GCC (12.x.x)
local/gcc13 13.3.1+r432+gfc8bd63119c0-3
The GNU Compiler Collection - C and C++ frontends (13.x.x)
local/gcc13-libs 13.3.1+r432+gfc8bd63119c0-3
Runtime libraries shipped by GCC (13.x.x)
local/gcc14 14.3.1+r25+g42e99e057bd7-1
The GNU Compiler Collection - C and C++ frontends (14.x.x)
local/gcc14-libs 14.3.1+r25+g42e99e057bd7-1
Runtime libraries shipped by GCC (14.x.x)
local/lib32-gcc-libs 15.2.1+r22+gc4e96a094636-1
32-bit runtime libraries shipped by GCC
>>
>>107079007
It's also a security risk
>>
>>107079423
I'm not going to change distros just because of some library version difference, that's beyond retarded.
I might have some other issues, but the truth is I'm just an end user who dabbles with LLMs, not a real developer. I don't think I should be debugging these issues in the first place; I'm not really interested in that and it's not my job either.
I'll find a solution once I regain my interest. It's just pretty hard to find decent information on the internet anymore, and asking perplexity.ai for example can help, but it's often misleading and will result in even more work than necessary.
>>
>>107079424
https://web.archive.org/web/20250915004338/https://www.tastyfish.cz/lrs/security.html
>>
>>107079451
Llama.cpp doesn't even have binaries with CUDA compatibility so yeah I can see the CPU binaries that they do offer being broken on newer systems.
>>
>>107079475
Yeah I was suspecting that I might have some other environment variable issues but anyways hard to say at this point. I'll see what happens on some other day.
>>
File: 1576054323626.jpg (145 KB, 1287x1080)
>>107079461
>Security is in its essence a huge, completely unnecessary bullshit. It shouldn't exist, the need for more security comes from the fact we live in a shitty dystopia.
Just teach men not to rape!
>>
>>107079517
Rape wouldn't be a problem if the government gave everyone government mandated girlfriends.
>>
>>107079284
Audio, image, video to text. Appears it can only output text
>>
>>107079953
grrrrrr woof woof
>>
>>107078834
> version issue with linux
> I just compiled llama.cpp on Mint without any issue
That's a hint to stop using Fedora. I got tired of them not having any long term support option and the constant bs with incompatibilities.
Ubuntu just works; Mint is just a new desktop version of that.
>>
>fedora bad
>so try ubuntu
jej
>>
>>107079517
>Just teach men not to rape!
Their mothers failed in doing this properly
>>
>>107079475
>Llama.cpp doesn't even have binaries with CUDA compatibility
on windows, they do distribute cuda binaries and it works great
you are just paying the loonix tax because loonix has no idea how to distribute an OS that's not cobbled together out of mismatched parts that refuse the idea of a stable ABI
no matter how much telemetry ms adds to winblows it will never make it worse than having to deal with freetard nonsense like this or wayland or flatpak or guhnome
>>107079562
I dunno man, some of you would be given the blue haired fatso and you probably would be the one considered a rape victim for having to deal with it
now that you say it, it's a good idea, some of you really do deserve a government mandated girlfriend eh
>>
>>107074052
How's Emu 3.5?
>>
File: theSandGodAgreesWithMe.png (169 KB, 663x833)
>>107080216
>>
>>107080256
nano banana at home
we are so back
>>
>>107074052
>>(11/01) Emu3.5: Native Multimodal Models are World Learners:
They never talk about safety in their technical report. But their data sets seem to be in large part made using "safe" tools like ImgEdit. We can expect a high level of AI slop and an inability to understand the real world.
>>
>>107080230
Machine learning performance on Winblows is terrible vs. Linux, if I wanted a "just works" solution with gimped performance I would be using Vulkan.
>>
>>107080336
I don't know. I skimmed their technical paper, and it doesn't really look that good. It feels like a research model, not something made to be used outside a few cases (like "put that T-shirt I want to advertise on this model"). They built their data sets using generated data from open source models: >>107080348 Page 8: https://arxiv.org/abs/2510.26583
>>
>>107074052
PSA: it appears mikupad is back under development. A wiki was just added documenting its features.
https://github.com/lmg-anon/mikupad/wiki
>>
>>107080585
must have just been released from prison cuz he merged in a bunch of pull requests last week and made like 30 commits the last couple days
>>
>>107080585
Finally some good news.
>>
>>107080625
Lol I figured he's just like me and he got busy
>>107080672
Right? Now I need to update my instance.
>>
>>107080585
Ever considered people would like you more if you didn't force your special interest on them?
>>
Best ~30b coding model?
>>
>>107080782
toss and qwen are decent
>>
>>107078834
unironically use debian 12
install cuda/drivers from .run files
you can probably install older gcc on fedora somehow, but its gimmicky
>>
>>107078834
>>107080846 (me)
INSTALL CONDA!!!1 or chroot into older fedora with gcc13, conda's more user friendly
>>
>>107080865
uv is the new conda
>>
>>107080879
isnt uv just a pip replacement?
>>
>>107080745
Ever considered you should post some content or crawl back into your fucking hole?
>>
>>107080745
But your special interest isn't "content". It is more like an obnoxious child throwing a tantrum because nobody in his real life cares about his special interest.
>>
Newfag here, tried running gemma-3-27b-it-abliterated-GGUF but it seems kind of retarded and doesn't take any initiative. Is it because my configuration's shit or is there a better model for 16GB VRAM?
>>
>>107081406
did you try simply telling it to take the initiative or be more proactive and spontaneous?
>>
>>107081406
there are so many factors and you gave so little info that it's kind of hard to give you any advice
gemma is good for sfw, nsfw not so much
if you want something just for 16gb of vram then you are not gonna find much
>>
>>107080821
>toss and qwen are decent
I don't understand how people can think those smaller local models are decent when even SOTA models are not, in fact, all that hot at coding.
You need to specify all the requirements in such minute detail to get LLMs to produce good code that you might as well have written the code yourself. When I tried to have Gemini assist me in writing a TUI microframework to add some oomph to my scripts, the thing couldn't even do an input box widget on its own without requiring tardwrangling. For example, LLMs will always default to iterating over characters in the dumbest way when doing word wrapping, instead of using proper tools like grapheme iteration or unicode-aware word-level iterators. Then you have to remind it that this style of widget should be able to auto-expand, but with a reasonable height limit, and that it should scroll when overflowing past the limit, and that the general TUI architecture should use double buffering because you be causing dem stuttery visuals if not, and so on and on and on. I feel like if I wrote down all the things I know about making the damn thing, I would have been better off just writing the damn thing; the LLM didn't save me time at all.
the only time I've found LLMs useful is in filling out the usage of a well-defined data structure with auto completion (using fill in the middle), I find it comfy to not have to type what's clearly a predictable pattern
there is no way anyone out there is actually coding productively with a piece of shit like gptoss or qwen coder
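for anyone wondering what I mean by grapheme-aware wrapping, here's a minimal sketch of the idea (not what Gemini spat out; assumes the third-party regex package for \X grapheme clusters, and it ignores wcwidth / double-width cells, which a real TUI would also need):
[code]
# Minimal sketch: word wrapping that counts grapheme clusters instead of raw
# Python characters. Assumes the third-party `regex` package (supports \X).
# Real TUI code would also need wcwidth for double-width CJK cells.
import regex

def visual_len(s: str) -> int:
    # One grapheme cluster == one cell in this simplified model, so an emoji
    # family or "e" + combining accent counts as 1, not 3+.
    return len(regex.findall(r"\X", s))

def wrap(text: str, width: int) -> list[str]:
    lines, line = [], ""
    for word in text.split():
        candidate = word if not line else line + " " + word
        if visual_len(candidate) <= width:
            line = candidate
        else:
            if line:
                lines.append(line)
            line = word  # naive: a single over-long word just overflows here
    if line:
        lines.append(line)
    return lines

print(wrap("naïve char counting mangles 👩‍👩‍👧 and e\u0301 clusters", 16))
[/code]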
>>
>>107081416
No, I just use the "Roleplay - Detailed" system prompt in tavern. Does cooking up a better system prompt help?
>>
>>107081447
>Does cooking up a better system prompt help?
nigga come on, the whole thing operates on text, of course it fucking does
>>
>>107081406
its not a you issue, gemma is positivity slopped, and even when you remove its ability to refuse, you dont add the ability to progress the story forward. even a sysprompt doesnt help
in fact most models struggle with not being able to take initiative
>>
>>107081428
>there are so many factors and you gave so little info that it's kind of hard to give you any advice
I mean I didn't tinker around too much so everything is mostly just default. If I'm going to be messing around with stuff, what should I focus on?

>gemma is good for sfw, nsfw not so much
>>107081477
What would be best for nsfw then?

>>107081454
What system prompt do you use / where to find examples of better ones?
>>
>>107081500
>What would be best for nsfw then?
post your full specs, and define best (what actually matters the most to you)
>>
>>107081500
Generally, leave samplers on the recommended settings for the model (temp usually somewhere 0.6-1.0, although some models require much less; top-p like 0.95).
The prompt is generally model dependent; you try to coax it so it does things better where you think it's lacking (like the lack of initiative: instead of telling it to be more proactive, try telling it to be more unpredictable, or something).
If you just want to stay in VRAM, then Mistral Small 3.x, whatever the latest one is (if I remember correctly it needs a much lower temp, like 0.15), or if you have at least 48 gigs of spare RAM, then get GLM Air.
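if you want to sanity-check sampler values outside a frontend, something like this works (rough sketch only; assumes llama-server is listening on the default localhost:8080 and you're using its OpenAI-compatible chat endpoint):
[code]
# Minimal sketch: send temp/top_p to a local llama.cpp server
# (assumes `llama-server -m model.gguf` is listening on localhost:8080).
import json
import urllib.request

payload = {
    "messages": [{"role": "user", "content": "Say hi in one sentence."}],
    "temperature": 0.7,   # drop to ~0.15 for Mistral Small per the advice above
    "top_p": 0.95,
    "max_tokens": 64,
}
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["choices"][0]["message"]["content"])
[/code]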
>>
>>107081565
mistral small 3.2 is the best mistral small btw
although some niggers report that magistral is alright too, but 3.2 is king
>>
>>107081530
>full specs
Intel Arc A770 16GB and 32GB DDR4 RAM

>what actually matters the most to you
I don't care too much about speed. Right now I'm getting about 3-4.5 tokens/s. I want consistency - the model doesn't drop or change details randomly, and I want it to be a bit more creative and take more initiative instead of just regurgitating what is in the character cards.

>>107081565
>>107081577
Thanks, I'll give mistral a try.
>>
>>107081622
my cock got hard at that rig, while you're trying mistral small v3.2 you should download https://huggingface.co/bartowski/Qwen_Qwen3-30B-A3B-Thinking-2507-GGUF
its faster but also probably worse, but it might be worth trying
try:
https://files.catbox.moe/f6htfa.json - sillytavern master export
https://huggingface.co/mradermacher/MS-Magpantheonsel-lark-v4x1.6.2RP-Cydonia-vXXX-22B-8-i1-GGUF/tree/main?not-for-all-audiences=true
this one's super horny, like really horny and will drive the story forward soooo much, but its a bit stupid
what a nice rig anon, very nice
>>
>>107081653
Thanks for the help. If I may ask, what's so nice about my hardware? I thought people around here either had 4090s or more niche setups for memorymaxxing?
>>
>>107081728
A770 is so sexy, i wouldnt really wanna buy it now since intel B50 is out, but man its so sexy
and DDR4 32gb is soul
A770 in the wild, in /lmg/, on 4chan, today is so sovl
i used to shill it back in the gpu dark ages
t. 3060 12gb, 64gb ddr4 poorfag
>>
>>107081728
lol yeah that reply seems super suss... Maybe he's got 3080 or something and jealous of the 16GB vram. But yeah definitely want Mistral Small 3.2 on that rig. And the horny fine-tunes will all be retarded, forget / mix up details, etc.

And the abliterated models are more passive, abliteration lobotomizes their drive. It shows up in the stories: characters will basically just agree with you, never push back, etc
>>
>>107080585
Still waiting for him to update his leaderboard https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
I hope he's ok
anon you should come to a freer place
>>
>>107080585
https://x.com/airkatakana/status/1984921026241913342
>never seen a programmer who was both against vibe coding and also actually creating things at high velocity.
maybe it's better if mikupad just died
it's already ugly enough of a codebase
>>
>>107082013
let's see your frontend
>>
File: file.png (219 KB, 642x546)
219 KB
219 KB PNG
>>107082013
>lmg-anon
>blue checkmark
>pays anthropic and openai 200$ a month
>uses xitter
>
>>
>>107082038
He realized local models are worthless and went over to the dark side. He wouldn't be vibe coding at the speed he is if he was stuck with Devstral.
>>
File: 1756314710331g.webm (164 KB, 800x450)
164 KB
164 KB WEBM
>>107082107
>>
>>107082107
No one here is using local models to code anything. They're utter trash barely good enough for RP
>>
>>107082038
>>uses xitter
tbf that's the one thing you can never blame him for (or anyone else with a product they have to "sell")
twitter is a great source of engagement and advertisement, no matter how much you hate it, if you have something you want users for, or better, something you make money with, that's one of the places you need to occupy for reach.
>>
>>107082129
480B is usable.
>>
>>107082136
>something you make money with
this is against 4chan ethics
>>
>>107080585
>https://github.com/lmg-anon/mikupad/wiki/The-Main-Interface#context-menu
This is neat
>>
>>107080585
>CC0
watch someone take his shit and monetize it
will be funny
>>
>>107082176
considering doing this just out of spite
>>
>>107082200
What did he do to you?
>>
>>107082210
used "vibe coding" unironically
>>
>>107082210
Nothing, I just like his frontend and I'll monetize it because it's allowed per his license.
>>
File: file.png (6 KB, 567x22)
6 KB
6 KB PNG
really bro
>>
>>107082398
NotXButY is why I can never take LLM writing seriously.
>>
>>107082398
>friendzoned
>>
Hmm...
>unformatted text completion mode
>context starts with [description of the game]
>"The following is the full log of the gameplay leading to one of the endings."
>[pre-defined first message]
>a simple frontend managing the whole thing
I think that might be better than doing it in instruct mode?
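In code it's basically just this (sketch only; the game description and first message below are made-up placeholders, not anything pre-existing):
[code]
# Minimal sketch of the raw-completion layout described above: no chat
# template, just plain text the model keeps continuing. Placeholders only.
GAME_DESCRIPTION = "A short text adventure about escaping a derelict station."
FIRST_MESSAGE = (
    "> look around\n"
    "Emergency lights flicker over a corridor littered with debris.\n"
)

prompt = (
    f"{GAME_DESCRIPTION}\n\n"
    "The following is the full log of the gameplay leading to one of the endings.\n\n"
    f"{FIRST_MESSAGE}"
)

# Feed `prompt` to a text-completion endpoint (e.g. llama.cpp's /completion or
# mikupad's plain completion mode) and append whatever comes back.
print(prompt)
[/code]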
>>
>>107082532
less words, one sentence please
>>
>>107082543
mikupad won
>>
>>107082571
oh he's going to make a game using mikupad, and make it paid and proprietary? shame lmg-anon used CC0
>>
>>107082543
Having it formatted as a text rpg log, stating this explicitly in the context. No chat formatting, just continue the text.
>>
>>107082600
impressive, very nice
>>
>>107082600
it'll work alright. but it might struggle with the longer contexts
>>
>multiple anons spoonfeeding a ramlet 27bjeet
/lmg/ is dead.
>>
>>107082762
buy an ad
>>
>>107082532
Exactly how well it'll work depends on the model but it's usually fine to do stuff like this, it can pull the model out of the assistant basin a bit
>>
>>107082762
>cpu maxxers running retarded model at even more retarded copequants waiting 10 minutes for three paragraphs from a reasoner model
>g-g-g-g-pu users are jeets!
>>
>>107082023
*unzips pants* suck it lil sis
>>
>>107080585
buy an ad tranny
>>
what's the best way to run the models? oogabugga or koboldcpp?
>>
>>107083299
neither, just use lm studio or ollama
>>
>>107083299
put the models on a treadmill
>>
>>107083321
true, ive just had performance losses and some level of overhead with LM Studio (I have no experience with ollama)
>>
I also want to know which is better, exl2, or should I stick with gguf?
>>
I'm making (Claude is, really) a frontend specifically for text adventure/RP.
Currently it has a simple workflow where the model goes through an initial planning step where it can use tools for math, RNG, and creating/upserting "memories", followed by the actual narrative step.
It also has vector embedding RAG with some metadata/tag shit to aid retrieval.
It's in a really vestigial stage but it works.
The player can also add lorebooks and other such information to be retrieved either as memories or via RAG.
Do you guys have any suggestions for things I should do, must have features, etc?
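For reference, the memory half boils down to roughly this (a simplified sketch of the idea, not the actual code; embed() is a stand-in for whatever embedding model gets called for real):
[code]
# Minimal sketch of the memory store described above: upsert text + tags with
# an embedding, then recall top-k by cosine similarity with an optional tag
# filter. embed() is a placeholder so the sketch runs standalone.
import numpy as np

def embed(text):
    # Placeholder: deterministic fake unit vector, NOT a real embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(128)
    return v / np.linalg.norm(v)

class MemoryStore:
    def __init__(self):
        self.entries = {}  # key -> (text, tags, unit vector)

    def upsert(self, key, text, tags):
        self.entries[key] = (text, set(tags), embed(text))

    def recall(self, query, k=3, require_tags=None):
        q = embed(query)
        scored = [
            (float(vec @ q), text)  # dot product of unit vectors == cosine
            for text, tags, vec in self.entries.values()
            if not require_tags or set(require_tags) & tags
        ]
        return [text for _, text in sorted(scored, reverse=True)[:k]]

store = MemoryStore()
store.upsert("inn_debt", "The party owes the innkeeper 20 gold.", {"debt", "npc"})
store.upsert("sword", "The rusty sword in the cellar is cursed.", {"item"})
print(store.recall("how much do we owe at the inn?", require_tags={"debt"}))
[/code]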
>>
>>107083325
take care not to run them too long or they might faint
>>
>>107083299
Kobold between the two but the best way to run single user inference is llama/ik_llama.
>>
>>107083338
AWQ
>>
im very fuckedup sar, i hate myself so much

anyway, so niggers how do i use MCP to give access to an LLM to trigger my shock collar?
>>
>>107083391
ask your favorite model how to MCP
>>
One of these days llama.cpp will have a working MTP implementation.
>>
File: niku.jpg (175 KB, 1024x1024)
175 KB
175 KB JPG
>>
>>107083404
you are my favorite model /lmg/, only you. and you alone.
https://www.youtube.com/watch?v=DvdJFYxATOo
>>
>>107083341
MIT or Apache License
>>
>>107083321
I thought this was bait at first.
>>
>>107083491
it was, llamacpp is obviously the correct answer.
>>
>>107083478
If nobody contributes with actual suggestions to improve the functionality of the thing, I'll release it with a CC license.
>>
4.6 air when?
>>
>>107083341
2 things:
1. I wouldn't expect a fast turnaround on questions;
2. I wouldn't expect people to actively contribute ideas to another RPG simulator on a sunday morning.

It's also been done quite a few times; would recommend looking at existing ones to get ideas / figure out what will make yours different.
>>
File: 1744682137295279.png (399 KB, 556x720)
399 KB
399 KB PNG
>>107083391
>shock collar
>>
>>107083557
Yup.
I'm a regular, so I'm well aware, but thank you for the heads up anyway.
Got any specific ones you think I should study for inspiration?
>>
>>107082176
>watch someone take his shit and monetize it
It was originally made by another anon, but I only have vague memories about it. I think it was first uploaded as a pastebin and he didn't want to maintain a community project.
>>
>>107083341
You could add stats (hp bar, etc) handling
>>
File: Bean_RPG.card.png (729 KB, 1280x1024)
729 KB
729 KB PNG
>>107083576
unfortunately not, I'm not big into the local RPG via LLM scene, think its interesting and am waiting until I get around to messing more with it I guess.

It is something I do plan to dive into further though, as I think the necessary supporting features for an RPG-guide could/should be self-contained in a module, to allow others to customize/build off it, ala the OGL with DnD (before the drama).

That would let people collaborate indirectly, and help push a common standard, so people can create their custom setups/stories/mechanics without having to do everything from scratch. - Typing that, I'm now interested in doing some research into this, to see if this is already being done.

This card convinced me its not only possible but is going to be fucking awesome, just waiting for it to be done 'nicely';
>>
>>107083594
I think most people missed it, but the original Anon created a repository sometime after lmganon and changed the license to MIT.
https://codeberg.org/mikupad/mikupad
>>
>>107083594
https://desuarchive.org/g/thread/94954088/#q94956607
https://desuarchive.org/g/thread/96423435/#q96427559
You're right. Forgot all about that.

>>107083647
Sadly doesn't seem like he kept his version going though.
>>
>>107083661
>I will c
what did he mean by that
>>
>>107083608
I forgot to mention that one of the tools available to the model is a state management so that it can create and manage those on its own although I do have to tweak the prompt for that.
I.might Separate that from the rest of the tools to better steer the model into making use of it, I guess.
Actually that gave me an idea, I could give the LLM the option to create a type of stat that becomes a UI element like a bar, a point counter, etc. Just gotta be carefull to not make yhe whole thing too complicated.
My aim is for the whole thing to work with Qwen 30b A3B class models. Shit almost anybodcan run.
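Something like this for the tool definition, just a rough sketch in OpenAI-style function-calling format (all the names here are made up, not the actual implementation):
[code]
# Rough sketch of a "create a stat that becomes a UI element" tool, in
# OpenAI-style function-calling format. All names are made up.
CREATE_STAT_TOOL = {
    "type": "function",
    "function": {
        "name": "create_stat",
        "description": "Create or update a tracked stat shown as a UI element.",
        "parameters": {
            "type": "object",
            "properties": {
                "name": {"type": "string", "description": "e.g. 'HP', 'Gold'"},
                "kind": {
                    "type": "string",
                    "enum": ["bar", "counter", "flag"],
                    "description": "bar = value/max meter, counter = plain number, flag = on/off",
                },
                "value": {"type": "number"},
                "max": {"type": "number", "description": "Only used for bars."},
            },
            "required": ["name", "kind", "value"],
        },
    },
}

def render_stat(name, kind, value, max_value=None):
    # Tiny text rendering so the frontend half of the sketch is concrete.
    if kind == "bar" and max_value:
        filled = int(10 * value / max_value)
        return f"{name} [{'#' * filled}{'.' * (10 - filled)}] {value:.0f}/{max_value:.0f}"
    if kind == "flag":
        return f"{name}: {'ON' if value else 'OFF'}"
    return f"{name}: {value:g}"

print(render_stat("HP", "bar", 37, 50))
print(render_stat("Gold", "counter", 124))
[/code]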

>>107083638
>its not only possible but is going to be fucking awesome, just waiting for it to be done 'nicely';
I think so too.
>>
>>107083576
https://github.com/p-e-w/waidrin
https://github.com/gddickinson/llm_RPG
https://ianbicking.org/blog/2025/07/intra-llm-text-adventure
>>
>>107083414
Your special interest is boring to everyone.
>>
don't @ me retard
>>
>>107083730
Are those some you know do something interesting or used and think work well or just ones you know exist?
Regardless, I'll take a look.
Thanks
>>
>>107083414
*munch*
>>
>>107083761
Features, approach, and documentation.

The two projects for their features and approaches (creating a generic fantasy game vs creating a game system you can then tweak/customize), the third for an actual game dev walking through the process, sharing their thoughts on the design.
>>
>>107083784
Awesome, that's more valuable than any one suggestion probably.
Thank you anon.
>>
>>107080974
Post content
>>
>>107084067
>>107084067
>>107084067
>>
File: postContent3.png (406 KB, 512x512)
406 KB
406 KB PNG
>>107083748


