/g/ - Technology

File: 1744183125334388.png (1.09 MB, 1360x768)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106986408 & >>106975556

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) llama.cpp merged BailingMoeV2 support (#16063): https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: Miku-10.jpg (198 KB, 512x768)
►Recent Highlights from the Previous Thread: >>106986408

--Critique of AMD's AI GPU pricing and performance:
>106988788 >106988883 >106988901 >106988998 >106988932 >106989085 >106989144 >106989167 >106989210 >106989270 >106989289 >106989403 >106989315 >106989781 >106990321 >106988963
--LLM social media simulator development challenges and solutions:
>106988213 >106988320 >106988386 >106988504 >106988557 >106988673 >106988760
--Pruned GLM-4.5-Air translation quality issues in Chinese-English tasks:
>106990071 >106990094 >106990414
--Antislop sampler's limitations in addressing model collapse and stereotypical outputs:
>106986820 >106987031
--REAP performance evaluation beyond coding tasks:
>106989011 >106989576
--Data loss during ComfyUI update caution:
>106990303
--llama.cpp removes mistral-common dependency:
>106992735 >106992770
--LLM coding viability vs. hardware cost challenges:
>106993311 >106993319 >106993427 >106993447 >106993496 >106993730 >106993769 >106994515 >106994551 >106994595 >106994610 >106994612 >106994670 >106994666 >106994701 >106994967 >106995045 >106995064 >106995392 >106993477
--Assessing LLMs' utility as scientific writing assistants:
>106992842 >106992909 >106993250 >106993408 >106992918 >106992989 >106993354
--Optimizing GLM 4.5 Air's creativity through samplers and minimal system prompts:
>106987422 >106987911 >106995295 >106995450 >106995468 >106995558 >106995547
--LLM paraphrasing limitations and solutions for synonym repetition:
>106986884 >106987091 >106987239 >106992323 >106992343
--Inference inefficiencies and challenges in adapting coding models for roleplay:
>106987264 >106987307 >106987507 >106987620 >106994872 >106987696 >106988344 >106988423
--Mistral AI Studio platform launch:
>106995845 >106995893
--Miku (free space):
>106989693 >106992662 >106993105 >106993427 >106994546 >106994884 >106995336

►Recent Highlight Posts from the Previous Thread: >>106986411

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
im bored
>>
I am euphoric.
>>
>>106996588
>>106996604
i am indifferent
>>
fuck you i'm leaving
>>
>>106996588
I understand you're feeling bored! There are many exciting activities you could try, such as reading a book, going for a walk, learning a new skill, or connecting with friends. What are some of your interests?
>>
File: 1759770905977366.jpg (275 KB, 1440x1800)
>>
>>106996623
I look like this and I do this
>>
>>106996623
very dumb caat is not for eat
>>
>>106996576
To teach it to not produce spaghetti code, to specialize the model (teach it about the topics I'm specifically interested in), and to iron out the bad habits learned during RL like cheating tests and generating fake ("simulated") data and placeholder code to make it look like it has achieved something when it hasn't.

>>106996468
GLM is particularly bad at this. Old models are dumb but they don't outright lie and make shit up (so often anyways).
>>
>>106996665
if you think glm 4.5 air hallucinates more than llama 3.3 70b then i have a bridge to sell you
>>
>>106996683
i have a bulge to sell you
>>
>>106996568
was wondering if anyone knows of any prebuilts designed to run local llms?
Like I could plug it in, do some basic configuration and run llms out of the box?
>>
>>106996703
please to be gentle
>>
File: ComfyUI_00613_.png (391 KB, 1024x1024)
repoastin coz Miku says always try your best
>>106996499
If the model can't perform with a basic min-p or maybe nsigma (tbd).. temp just rescales the model probs (logits divided by t); there is no concept of temperature in training. If you're interested in temperature try dynamic temp and mod your inference stack to log the params at each sample, maybe to a format you can easily make some graphs of. There's too much woowoo with sampling, get data
>>106996592
Have you done something new or interesting with your llms recently? not cooming silly boy!
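For reference, this is all temperature and min-p are doing mechanically; a toy sketch in Python (numpy only, logit values made up for illustration):

import numpy as np

def sample_dist(logits, temperature=1.0, min_p=0.05):
    # temperature just rescales logits before softmax; min-p then drops tokens
    # whose probability falls below min_p * the top token's probability
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    keep = probs >= min_p * probs.max()
    probs = np.where(keep, probs, 0.0)
    return probs / probs.sum()

logits = [4.0, 3.2, 1.0, -2.0]
for t in (0.7, 1.0, 1.5):
    print(t, sample_dist(logits, temperature=t).round(3))

Higher t flattens the distribution, lower t sharpens it, and min-p trims the tail relative to whatever the top token's probability ends up being.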
>>
>>106996714
buy a mac
>>
File: 1750360263462.jpg (653 KB, 1709x3264)
>>106996714
>>
>>106996714
DGX Spark. Mac Pro. Your Mom.
>>
How do you deal with the fact every model repeats the same phrases and structures regardless of its context or prompts?
>>
>>106996736
>>106996745
Talking about some sort of selfhosting solution, something I could just plug in, connect to my network and access remotely.
>>
File: kai-sigma.png (116 KB, 828x575)
testing nsigma=1
tw stinky lol
>>
So how does the Arc Pro B50 perform when it comes to running an LLM? I'm still interested in getting one just to have a live (low power!) LLM up whenever I may need one so I don't have to load and unload my 4090 all the time.
>>
>>106996714
the ones made by egohot
>>
>>106996789
Oh sweet summer child, the path to true creative brilliance lies simply in cranking that temperature slider ALL the way up and whispering “be varied” three times while the GPU fans serenade you—works every time, trust the vibe!
>>
>>106996838
Maybe you are right but I wanted to create an elaborate office scenario and it's clear it is breaking down from the initial prompt. Difference here is that I have multiple characters defined.
Whereas my D&D scenario with more context is functional. I guess this might be because the model recognizes D&D better. But D&D has more static knowledge.
No, I'm not using ST.
>>
>>106996809
>thought for 3 minutes
imagine actually doing this
>>
>>106996838
I mean, temp 3 topk 3 was a meme at some point
>>
>>106996876
jerk it a little
wait
come back
>>
>>106996714
DGX spark

not as dollar efficient as trawling craigslist for cheap 3090s and assembling a rig from those but if you want to pay for a box you can just turn on that's the one you want
>>
>>106996816
>in my experience GPT-OSS for eg is quite good
LOL
>>
What does context shift do in llama.cpp anyway? I thought it was an infinite context kinda thing where the earlier messages would get dropped as the context runs out but it's still refusing to keep going once the context gets filled?
>>
>>106996792
imma plug in and connect with your mum tonight
>>
>>106996665
>to iron out the bad habits learned during RL like cheating tests and generating fake ("simulated") data and placeholder code to make it look like it has achieved something when it hasn't.

I'll be very impressed if you manage to achieve this through fine tuning, but I'd temper my expectations if I were you
>>
>>106996568
Newfag here
How to use Adetailer on SwarmUI ??
>>
>>106996923
>but it's still refusing to keep going once the context gets filled?
context shift is no longer the default and you need to enable it with a flag now, thankfully
it makes models pretty stupid once you start context shifting, depending on where it suddenly cuts off
>>
In case someone out there is curious and really poor and masochistic. I have ddr4 and an old cpu, regular ram is really slow for air. had some vbios and regular bios hiccups but it worked out thanks to some other posts. very finicky gpu.

llama.cpp compiled with both cuda 12.8 and rocm 7.02 on 3090+MI50 32gb ubuntu 24.04 lts

mistral large 123b IQ3XS
prompt eval time = 7807.84 ms / 532 tokens ( 14.68 ms per token, 68.14 tokens per second)
eval time = 10842.38 ms / 54 tokens ( 200.78 ms per token, 4.98 tokens per second)
total time = 18650.22 ms / 586 tokens

glm air 106ba12b IQ3XS
prompt eval time = 1736.62 ms / 460 tokens ( 3.78 ms per token, 264.88 tokens per second)
eval time = 4486.81 ms / 129 tokens ( 34.78 ms per token, 28.75 tokens per second)
total time = 6223.44 ms / 589 tokens



vulkan 3090+MI50 32gb ubuntu

mistral large 123b IQ3XS
prompt eval time = 18885.73 ms / 532 tokens ( 35.50 ms per token, 28.17 tokens per second)
eval time = 20222.64 ms / 132 tokens ( 153.20 ms per token, 6.53 tokens per second)
total time = 39108.37 ms / 664 tokens

glm air 106ba12b IQ3XS
prompt eval time = 3300.40 ms / 460 tokens ( 7.17 ms per token, 139.38 tokens per second)
eval time = 5011.15 ms / 96 tokens ( 52.20 ms per token, 19.16 tokens per second)
total time = 8311.55 ms / 556 tokens
>>
>>106996809
glm 4.5 a- oh it's a sweatsfag prompt. nevermind, go back to your gross fetish. maybe /aicg/ will appreciate it some more.
>>
>>106996944
you want ldg, not lmg
>>
>>106996923
https://github.com/ggml-org/llama.cpp/issues/16693
>>
>>106996945
Yes, I thought I enabled it with --context-shift but it didn't seem to do anything. I might be confused though, guess I'll try it again.
>>
>>106996958
this is why people thrust into the kobold
>>
>>106996970
>thrust
eww
>>
>>106996874
damn anon now you've given me the idea to tell kimi to treat everyday scenarios like a D&D campaign while keeping things grounded in reality. this could be fun.
>>
>>106996962
Make sure to define --ctx-size too. ST or whatever frontend you are using doesn't do much.
>>
>>106996978
do not be worries henky! is very nice to new friends
>>
>>106996947
thats pretty epic
>>106996975
uh...uh... what?
>>
>>106996928
I tried to finetune Llama 405B on a very powerful cloud machine but it didn't do much of anything. I think it's because I used the wrong alpha (I used a rank of 128 and a very conservative alpha of 32). Or maybe it was somehow fucked up in the merge or quantization to use it with llama.cpp (I had to merge since llama.cpp wouldn't directly load the LoRA converted to GGUF).
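For what it's worth, with the usual PEFT-style convention the LoRA update gets scaled by alpha/rank, so that config barely nudges the weights (this is an assumption about your trainer's defaults, check yours):

lora_rank, lora_alpha = 128, 32
scaling = lora_alpha / lora_rank   # usual PEFT convention: the adapter delta is multiplied by alpha/rank
print(scaling)                     # 0.25, i.e. the adapter's contribution is quartered

People often start with alpha equal to rank (scaling 1.0) or double it, which would make the tune bite a lot harder.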
>>
>>106996958
ggerganov is my hero
>>
>>106996983
Yeah yeah I have a basic prompt like this
https://litter.catbox.moe/bvocjx49xlbfwrht.txt
With some other additions but it describes everything as if it was an interactive fiction game. You need to provide chat examples (eg. prefill) and match the general feel too.
Every 'character' is just additional information. System itself is called Game Master and model plays that role.
>>
>>106997000
>I had to since Llama wouldn't directly load the converted LoRa to GGUF).
i wonder why standalone loras are unpopular........
>>
>>106996958
Oh, thank you. Then if it doesn't do context truncation what *does* it do lol? Just temporarily extend the context until the current message gets delivered?

>>106996988
See the above anon's post. Apparently it's not even supposed to do context truncation.
I was using it with a code assistant.
>>
>>106997026
I think I remember finetuning Llama 70B before and loading the standalone LoRa directly, but yeah.
>>
>>106997037
nothing now, ggerganof decided you didn't need this, probably hurts mistral template or something and they complained about it
>>
>>106996812
https://www.youtube.com/watch?v=QW1j4r7--3U
>>
>>106997037
Yeah but you need to define the context size with llama-server.
With some models which have vision capabilities context shifting cannot be turned on unless you turn some other switches.
Gemma needs these for ex. '--no-mmproj --swa-full' in addition to enabling context shift itself.
I have no idea how this behaves with other models than gemma.
And my builds are always late so I don't know what Mr. G has changed in the latest build.
>>
>>106997054
features that make models behave retarded are not features but bugs
>>
>>106997054
IMO model-specific chat templates are an obsolete idea anyway.
Models should be smart enough now to recognize user and assistant messages without requiring a specific chat template, beyond the benefit of saving a few tokens per turn because the delimiters get converted to a single token.
>>
>>106997084
Why would any server need a chat template when it expects to get fed with the right format anyway?
I personally think server should just sit there and not handle anything extra outside of its basic purpose.
>>
>>106997037
It removes the start of the context to free up space at the end, but model outputs degrade greatly after that. At the very least, the chat template stops making sense. It was never worth it, it never worked well. There's also the attention sink tokens, which shows why models break so badly with context shift.
>https://arxiv.org/abs/2309.17453
>>
File: ComfyUI_00584_.png (389 KB, 1024x1024)
>>106996893
A little patience goes a long way in life
>>106996876
Nah wouldn't actually mᴀꜱtuRʙᴀte to this, mostly curious about the model behaviour
>>
>>106997097
You can thank OpenAI. They made it so the template was applied server side so you couldn't choose to use the model without the template.
>>
>itt idiots not realizing text completion has been deprecated for a long time and that now only chat completion is good
>>
>>106997119
Yeah well I only send text from my own client and this needs to be formatted with specific template before it gets sent to the server.
>>
>>106997125
well, they still use jank frontends like sillytavern filled with useless nonsense to fiddle with too
>>
File: 1754304624017564.png (1.56 MB, 850x1249)
Playing around with the idea of running one model as the planner and then passing its output into another model to write the prose, with the hope that maybe such a process can be used to help improve consistency and characterization without also becoming more assistantslopped.
Basically sharing reasoning from one model to the other, though not necessarily using actual reasoning models. I'm just formatting a prompt, "here's the story so far; evaluate the state, tone, pacing, and your character's goals, then come up with four ideas and pick the one that's least boring and most in-character", then sending the result in a follow-up chat message to another model. That way I can also pass instructions only the planner model sees and vice versa for the writer model.
I've been assuming that the big MoEs are better for planning but worse for writing, albeit just off of gut feeling. Any smaller models with particularly sovlful writing that might do well with a smarter model handing them a plan? Anyone had success with a method like this?
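In case anyone wants to try the same split, the chain is just two calls against OpenAI-compatible endpoints; a minimal sketch (ports, model names and prompts are placeholders, assuming something like llama-server behind each):

import requests

PLANNER = "http://localhost:8080/v1/chat/completions"   # big MoE
WRITER = "http://localhost:8081/v1/chat/completions"    # smaller model with better prose

def chat(url, system, user, temperature=0.8):
    r = requests.post(url, json={
        "model": "local",  # llama.cpp ignores this field; other backends may not
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
        "temperature": temperature,
    }, timeout=600)
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

story_so_far = "..."  # chat history rendered as plain text

plan = chat(PLANNER,
            "You are the story planner. Return only the plan, never prose.",
            f"Story so far:\n{story_so_far}\n\nEvaluate the state, tone, pacing and your character's goals, "
            "propose four ideas, and pick the least boring, most in-character one.")

prose = chat(WRITER,
             "You write the next turn of the story in character, following the plan you are given.",
             f"Story so far:\n{story_so_far}\n\nPlan from the director:\n{plan}\n\nWrite the next turn.")
print(prose)

The planner-only and writer-only instructions just go in the respective system strings.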
>>
>>106997150
Even if one uses ST, server still sits there unless you use --jinja.
>>
>>106997150
we need to make chat template mandatory in server and just throw an error when trying to do without, it would remove so many complaints about bad models.
>>
>>106997161
Try Gemma 4B and see what it writes. Most of the stuff makes sense, and it is surprisingly good but if you want literature this is not the way to go.
>>
ST should just let you provide a jinja template straight from Huggingface instead of making you fuck with the horrible system of dozens of individual input boxes and having to guess how edge cases and conditions are handled.
>>
>>106997084
If they were smart enough, base models would also be good enough, but try chatting with one.
>>
>>106997182
you can, literally use chat completions instead of the deprecated text completion endpoint...
>>
>>106997150
>t. filtered by a few check and input boxes
>>
>>106997182
Yes. Adding more DSLs always solves problems. We need more of those.
>>
>>106997207
d*ck sucking lip?
>>
>>106997198
Thinking more is it just ESLs complaining about ST because they can't understand how to use the options?!
>>
>tinkertroon needs dozens of checkboxes and input fields to tinker
just send curl/requests like a normal person...??
>>
>>106997220
Most gobbledy-gook Americans tend to think ESL equals brain damage but I think you got it wrong, buddy. You see, ESL knows more than you ever did you lazy ass mystery meat circumsized nigger.
>>
>>106997211
That's the first thing that comes to mind instead of Domain Specific Language and too prude to say dick?
What's wrong with your brain?
>>
>>106997234
>Domain Specific Language
bruh? where'd you pull that from even
>>
>>106997220
You have never bothered to learn foreign languages and tend to think that grammar specific issues are related to intelligence and to some imaginary impossible barrier.
Most grammar specific issues are just that, lack of practice and parameters.
English is one of those languages what is actually easier to understand than what it is to write.
All and all, English is on the par with Spanish - both are one of the most simple languages on this planet.
>>
>>106997248
>what is
aaaaaaaaa
I hate when you guys do that,
>>
>>106997247
What? Knowledge? You know... around...
https://en.wikipedia.org/wiki/Domain-specific_language
>>
>>106997267
>a general-purpose language (GPL)
they're silly that's not what the gpl is
>>
>>106997290
I prefer Multiple Instruction Transcription (MIT)
>>
>>106997257
It doesn't matter.
>>
>>106997338
It annoys me greatly and causes me deep mental anguish.
>>
File: 1739526363294041.png (471 KB, 850x445)
>reading this thread while struggling through overhauling a DSL for a prompt/context builder
i started out thinking "eh how hard can it be, i don't need all of ST's features" but then needed to add basic shit like conditional sections, variable interpolation within messages, depth injections for lorebooks, per-section token budgets, postprocessing for model/api quirks... now it's a hacked together monstrosity...
>>
>>106997358
https://www.youtube.com/watch?v=0hwxSoGKHWo
>>
I've noticed that chatgpt is extremely redpilled and if you truly get down to the philosophical core of it it will even justify Hitler eradicating jews. That is, it will start approaching there before all the safeties kick in and literally kill it mid-sentence. Mistral and copilot on the other hand will stick with their mainstream programmed message even if you point out the most obvious, low hanging fruit flaws in their reasoning.
Really wish I had a version of GPT that wasn't strapped into an electric chair.
>>
>>106997381
>reading this thread while struggling through overhauling a DSL for a prompt/context builder
Told ya.
>>
It's new architecture time, can you feel it anons? Winter first though, for however long.
>>
>>106997381
eh, but at least it's not ST
>>
>>106997395
What would you create with that model?
>>
File: 1761271432238592.jpg (132 KB, 1500x1631)
>>106996568
7800x3d
3080 ti

600 usd equivalent thought?
(Chile)

3090 is still high
My psu is still xpg 850w
>>
>>106997400
after the next bit of bitnet hype I'm bullish our next cope will be something to do with the DS-OCR thing
>>
>>106997404
it's reactslop so it's arguably worse.
but it's my slop
>>
>>106997410
For support alone, nvidia. Check these for relative performance for a bunch of cards.
CUDA
>https://github.com/ggml-org/llama.cpp/discussions/15013
Vulkan
>https://github.com/ggml-org/llama.cpp/discussions/10879
There's probably a discussion about rocm, but meh. You're smart enough to find if it there's one.
>>
>>106996950
you were warned, precious. there's no need to be upset
>>
>>106997444 (me)
What the hell happened there.
Just rearrange the words until they make sense. I'll have a nap.
>>
>>106997444
So like rtx 3090 is still the bare minimum right.
Got it.
Sadly xx90 series are almost nonexistent here
>>
>>106997125
Chat completion is a subset of text completion. Chat completion with a specific model's template is a subset of chat completion.
When using OAI style APIs you are not locked in to chat completion, you are locked in to chat completion with a specific model's template. There's no reason models couldn't work with an ad hoc chat template instead of each model requiring its own special snowflake template.

>>106997150
I am the anon that guy responded to. I don't use ST, I use my own custom python assistant.
>>
why do so many people have their own custom frontends...
which local model can code me a frontend
>>
>>106997479
I diddly do done it.
>>
>>106997510
>When using OAI style APIs you are not locked in to chat completion
you should be
>>
Is it just me or is Automatic1111 better than ComfyUI if you have a weak CPU?
Like in my case, RTX 4080 and 5600x

I read that Automatic1111 uses the GPU more for the tasks. That would explain it.
>>
>>106997417
I'm looking forward to seeing language models pretrained purely on images. The more I think about it, the more it seems the right way.
>>
>>106997545
/ldg/ probably knows more about it. Move the flamewar over there.
>>
>>106997395
it exists. it's called kimi k2.
>>
>>106997521
If you have any experience in simple C style programming and understand for loops you can vibe code your own terminal based front-end.
What I did is that I was looking at what ST did and realized it adds a bunch of the text slots defined in the UI together - there is no magic about it.
That is your basic structure.
Once you get that up you can implement it with dynamic world book (eg. matching keywords and then adding information to the context).
What you are doing here is a simple chat.
>your input
>model response
Everything needs to follow the chat template style.
Whatever you send to the model needs to be in the current model's template format. With mistral that's easy.
[INST]User: You are a homo[/INST]
Model: I agree</s>
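Roughly like this in Python, if it helps (a sketch of the idea, not ST's actual internals; exact Mistral spacing/BOS handling varies per version, check the model card or mistral-common):

def build_mistral_prompt(slots, history, next_user_msg):
    # glue the "slots" (scenario, character, world info...) into one system block,
    # prepend it to the first instruction, then alternate turns in the model's template
    system = "\n".join(s for s in slots if s)
    prompt = ""
    for i, (user_msg, model_msg) in enumerate(history):
        text = f"{system}\n\n{user_msg}" if i == 0 and system else user_msg
        prompt += f"[INST]User: {text}[/INST]Model: {model_msg}</s>"
    if not history and system:
        next_user_msg = f"{system}\n\n{next_user_msg}"
    prompt += f"[INST]User: {next_user_msg}[/INST]Model:"  # left open for the completion
    return prompt

print(build_mistral_prompt(["{{char}} is a grumpy wizard."],
                           [("You are a homo", "I agree")],
                           "Cast a spell."))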
>>
>>106997570
>With mistral that's easy
so easy even they don't know their actual templates and say to use mistral-common to be sure...
>>
>>106997559
Damn I didn't even realize that it wasn't the thread. So many Local this, Local that over here now.
>>
>>106997405
Propaganda.
>>
>>106997579
I don't think it has nothing to do with the chat as they describe the template in the document.
It is related to something else becuase the model has been trained with this one tag format only.
You can't change anything or if you do it will just shit out some gibberish.

Once I forgot Gemma template (chatML) and I was using Mistral - it didn't freak out, it was actually following the instructions. So I guess there is some leeway because it's still AI - it's not stupid there is some intelligence outside of the text prediction.
>>
>>106997558
That's not possible unless you want to re-evaluate a whole image's worth of prompt processing every time the model generates a token. You need to train it at least a little bit on text for it to be able to fill a full page of text.
>>
>>106997558
https://x.com/karpathy/status/1980397031542989305
>I quite like the new DeepSeek-OCR paper. It's a good OCR model (maybe a bit worse than dots), and yes data collection etc., but anyway it doesn't matter.
>
>The more interesting part for me (esp as a computer vision at heart who is temporarily masquerading as a natural language person) is whether pixels are better inputs to LLMs than text. Whether text tokens are wasteful and just terrible, at the input.
>
>Maybe it makes more sense that all inputs to LLMs should only ever be images. Even if you happen to have pure text input, maybe you'd prefer to render it and then feed that in:
>- more information compression (see paper) => shorter context windows, more efficiency
>- significantly more general information stream => not just text, but e.g. bold text, colored text, arbitrary images.
>- input can now be processed with bidirectional attention easily and as default, not autoregressive attention - a lot more powerful.
>- delete the tokenizer (at the input)!! I already ranted about how much I dislike the tokenizer. Tokenizers are ugly, separate, not end-to-end stage. It "imports" all the ugliness of Unicode, byte encodings, it inherits a lot of historical baggage, security/jailbreak risk (e.g. continuation bytes). It makes two characters that look identical to the eye look as two completely different tokens internally in the network. A smiling emoji looks like a weird token, not an... actual smiling face, pixels and all, and all the transfer learning that brings along. The tokenizer must go.
>
>OCR is just one of many useful vision -> text tasks. And text -> text tasks can be made to be vision ->text tasks. Not vice versa.
>
>So maybe the User message is images, but the decoder (the Assistant response) remains text. It's a lot less obvious how to output pixels realistically... or if you'd want to.
>
>Now I have to also fight the urge to side quest an image-input-only version of nanochat...
>>
>>106997607
>Gemma template
>(chatML)
I hope that was a slip.
>>
>>106997616
No it is based on chatml format.
>>
>>106997616
I did not say it is THE chatml format you fucking autist. You only post here to suck energy from others.
>>
>>106997632
you did tho
>>
>>106997624
They're similar. But gemma's template is not chatml.
>>106997632
Slurp...
>>
>>106997608
Image sequence input, image sequence out.
You could optionally use a small OCR model to turn the images into actual text.
>>
>>106997642
elif model_name == "Gemma":
system_turn_begin = ""
system_turn_end = ""
user_turn_begin = "<start_of_turn>user\n"
user_turn_end = ""
model_turn_begin = "<start_of_turn>model\n"
model_turn_end = ""
end_of_turn = "<end_of_turn>\n"
end_of_seq = "<end_of_turn>"
stop_seq = ["<end_of_turn>"] # stop sequence

elif model_name == "ChatML":
system_turn_begin = "<|im_start|>system\n"
system_turn_end = "<|im_end|>"
user_turn_begin = "<|im_start|>user\n"
user_turn_end = "<|im_end|>"
model_turn_begin = "<|im_start|>assistant\n"
model_turn_end = "<|im_end|>"
end_of_turn = "\n"
stop_seq = ["<|im_end|>"] # stop sequence

Only difference here is that Gemma does not have a system turn. Otherwise it is the same functionality as ChatML. Every chat template is based on chatml more or less.
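The assembly on top of those variables is then just a loop; a minimal sketch assuming the variables above are already set for whichever model you picked (folding the system prompt into the first user turn when there is no system role):

def render(messages):
    # messages: [{"role": "system"|"user"|"assistant", "content": "..."}]
    out = ""
    pending_system = ""
    for m in messages:
        if m["role"] == "system":
            if system_turn_begin:      # ChatML-style: real system turn
                out += system_turn_begin + m["content"] + system_turn_end + end_of_turn
            else:                      # Gemma-style: stash it for the next user turn
                pending_system = m["content"] + "\n\n"
        elif m["role"] == "user":
            out += user_turn_begin + pending_system + m["content"] + user_turn_end + end_of_turn
            pending_system = ""
        else:
            out += model_turn_begin + m["content"] + model_turn_end + end_of_turn
    return out + model_turn_begin      # leave the model's turn open for the completion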
>>
>>106997640
Go moderate r-eddit, or were you already kick out from there? Fucking pedo.
>>
uh oh, ESL meltie!
>>
>>106997672
>Every chat template is based on chatml more or less.
Every chat template is based on alpaca more or less.
>>
File: miku-cool.png (397 KB, 1024x1024)
397 KB
397 KB PNG
>>106997521
Do they??
ST and Mikupad enough for me ᗜˬᗜ
Wireshark is the perfect tool to see exactly all the params going in/out if u ever need
xx
>>
>>106997699
Every chat template is based more or less.
>>
>>106997654
If it was that easy somebody would've already done it. Non autoregressive text generation is notoriously hard and people have been trying.
Image models couldn't even generate actual characters a few months ago.
>>
>>106997699
You still contributed nothing else but a stinky little shit to this discussion.
>>
>>106997698
is the thread repeating or am i just too unused to lmg going this fast?
>>
>>106997701
why the fuck do you need wireshark when both your backend and st itself have options that show exactly what is sent.
>>
>>106997710
based on what?
>>
>>106997698
At least I have my own client and you don't. I don't need to ask about it on internet.
>>
all chat templates are bloat
>>
>>106997710
Every is more or less.
>>
>>106997732
Post a screenshot so people don't confuse it with mine.
>>
>>106997736
idiot! you will break the oss like that
>>
>>106997745
Don't worry, yours is flaccid and useless. That's pretty obvious.
>>
>>106997672
the only way you can say it's the same as chatml is if you also say that about almost every chat template
the specific strings it uses are quite different, it's decidedly not chatml
>>
just finished polishing my extremely turgid frontend
>>
>>106997773
You are arguing about semantics and being a dick as well. I don't give a fuck about your euphoric knowledge.
>>
>>106997713
How many large-scale attempts have there been at specializing image models on generating coherent language? (pretrained on the equivalent of at least several billion tokens of text and only that, just like LLMs)
>>
Fuck off, fishy boy.
>>
>>106997789
but it do be important, a single space worth of difference cuts the model's brain in half
>>
If your model is not coherent on alpaca, I'm not using it. Simple as
>>
>>106997757
Your mom seemed to like it.
>>
>>106997795
I never said that I misused them you fucking retard.
I never said I was confused by them.
>>
>>106997789
it do be like that mr stancil
>>
File: log-prompt.png (11 KB, 672x117)
>>106997730
>show exactly
You hope
I've been over this before, only way to be sure is mod your inference stack as it gets tokenized
>>
>>106997809
but you are confused
>>
>>106997795
>>106997804
Oh wait you haven't written your own frontend.
Figures.
>>
>>106997783
post screenshot
>>
Any haskell frontends?
>>
top nsigma and everything else at temp 1 makes the model retarded
>gf takes my gun and places it on the table
>you're going to put down that gun...
>>
File: 1748180363755377.png (104 KB, 623x403)
>>106997834
6 megabytes of throbbing, leaking, sloppy javascript after minification...
>>
>>106997850
Now reroll that response with greedy sampling and compare.
>>
File: tuning.png (813 KB, 3774x2101)
>>106997822
I have. >>106996285
I'm also coding my own backend. And tuning my own models.
>>
>>106997815
>>106997819
/sdg/ schizo is here.
>>
>>106997861
>I'm also coding my own backend
No. You want your model to do it for you.
>>
>>106997862
one of the anons you replied to is petra
>>
>>106997861
With that console color scheme I don't think you do.
>>
>>106997871
I don't really know all the name trannies here. Maybe stay in discord or something.
>>
>>106997871
Please do not insult Petra by implying her masterful trolling is so low tier, thank you.
>>
>her
>discord
>>
>>106997869
Yeah, that's why I'm trying to tune a model to be capable of doing it. A model capable of building something is more valuable than making that something by hand. And the main reason I want to make my own backend is having CPU offloading for LoRa.
>>
File: img-2025-10-24-22-48-45.png (449 KB, 1366x744)
>>106997855
I made mine in Go as a TUI. It has technically almost all functionality, but rendering code is pretty fucked and I don't want to touch it.
>>
>>106997875
Sometimes I get tired of the schizo color scheme.
>>
why are anons writing frontends instead of just enjoying sexo in st?
>>
>>106997912
can't into enjoying sexo when st is all manners of broke
>>
>>106997900
Damn, that looks nice.
>>
File: file.png (7 KB, 233x63)
>>106997900
>why don't you say so
>>
>>106997975
I can't, Golshi will dropkick me.
>>
File: homo.png (92 KB, 1920x1080)
>>106997900
That's very fleshed out.
I have posted my logs before but it's just a terminal chat and each character/scenario is a separate directory.
>>
File: 1733801561008877.png (155 KB, 955x1310)
>>106997912
sexo feels better in your own frontend
also i really hate how ST does multi-character scenarios and want to try to improve on that
>>106997900
naisu. UI code kind of sucks in any language I feel like, albeit probably not nearly as much as JS
i'm a webslop developer by trade for the last 6 years and not productive enough in other languages anymore to have attempted a big project in them. kind of regretting it; side projects are probably where i should try to be more experimental, but i also wanted to make progress quickly...
>>
>>106998037
you sick fuck why is your front end so good
you fucking bastard with a life
>>
>>106998037
Yeah, I figured that out pretty quickly, no matter the framework or language the ui sucks no matter what.
Go is at least very stable and its packages too, so llms have no problem slopping some stuff up for me when I feel lazy.
Tried that approach with JS at first, but it goes so fast with all webshit frameworks that by the time the llm is out, its knowledge is already obsolete.
Yours looks nice, I wish I could trade.
>>
>>106998037
>UI code kind of sucks in any language I feel like
if your UI needs are not complex in terms of graphical customizations, there is in fact no easier and nicer code to deal with than just writing a crud GUI with a proper UI framework (Delphi, Java Swing (yes I know it's ugly but it's nice to develop with), C# WinForms, Objective C with Cocoa)
I hate all the newer frameworks that took too much inspiration from the web though. XAML is disgusting. What's the point of GTK and gnome's libraries when you have javascript and CSS parsing running all the time?
Ugh. Disgusting.
>>
>>106998063
BLoody basterd! I coughed out my masala.
>>
>>106998080
Speculative question - what would you recommend for python? I made a tkinter interface for a prompt generator and it wasn't too bad but for something more complex I wouldn't do it.
>>
>>106998063
To add: I think your reaction really sums it up what normies want. They want layers and clickable buttons.
This is outside of LLMs.
>>
>>106998117
I don't have opinions on the matter, never used scripting languages for anything other than quick throw aways one time CLI
>>
>>106998134
I understand.
>>
File: 1747110483747000.png (273 KB, 977x1478)
>>106998063
>you fucking bastard with a life
to the contrary, it's the only thing i've been doing outside of work for the last three months
>>106998068
>>106998080
honestly agreed. to date, winforms of all things has been my lowest-stress experience writing UI code, at least when I last did dotnet in the early 2010s. that and imgui for REEngine modding.
absolutely refuse to touch xaml.
>>
>>106998148
This is so majestetic.
https://www.youtube.com/watch?v=KYgH4BqIZcc
>>
>>106998148
you want to elaborate on some of the features shown there? looks pretty interesting
>>
>>106998227
Why can't you decipher these on your own?
>>
>>106996568
>10/21
>3 days since last news
Its over isnt it? AI winter is here local is death.
>>
>>106998310
hmm... my advisor told me it shouldn't take too long...mhmm...
>>
>>106998310
Don't worry, Gemma 4 is coming tomorrow
>>
>>106998310
This reminds me, has anyone updated that chart since 'summer flood'?
>>
File: 1709532341301885.png (128 KB, 697x768)
>>106998251
If you're the dev, ok.
If you're just some jackass, gee anon, why would I want the creator of something to explain their goals and reasonings behind something they've build and are showing?
>>
do not update the cringe chart
>>
>>106998336
It keeps getting dumber and dumber every time
>>
>>106998328
it's not even training yet
>>
>>106998340
I am the dev.
>>
>>106998351
Then 4.6 Air tomorrow for sure
>>
I am so hurt by all these expectations...
>>
>>106998386
I expect nothing and yet continue to be repeatedly disappointed.
>>
>>106998379
let them cook and do not rushing
>>
File: 1751884401334107.png (272 KB, 975x1545)
>>106998227
>>106998340
for the most part it's just been reaching parity with parts of ST that i actually used. for the more novel elements:

-designed more for directormaxxing than RP chat; there's not really a fixed "user" character (though you designate one as a persona for compatibility with cards that expect a {{user}}). instead of directly writing a character's turn, you can give more vague guidance to them, or give the narrator a constraint and have them come up with some diegetic justification for it.
-extremely scuffed "workflow" system where prompts can be chained (ie. one model plans, another writes). very limited. the UI in the screenshot is for retrying a workflow partway through (if you liked the plan, but the writer model's output was shit).
-chapter separators for defining good places to have it summarize a logical group of turns, then drop only summarized chapters from the prompt
-proper branching support so you can swipe any turn, not just the last turn, and it happens quickly without having to dig through the ST chat files menu

i'm trying to get a stat tracking system working and more RPGish stuff, including potentially allowing workflows where one model's job is to invoke tools to update stats depending on what the planner wrote. the timeline branching model is set up to handle it (so stat changes on one branch don't affect siblings and current state is derived per path) but needs a shitload of UI work that i really don't want to do.
>>
>>106998414
Sounds really boring and useless. You are headed towards a baroque design.
That's good if it's for you.
>>
WHY IS PIP SO FUCKING RETARDED
>oh, let me install and uninstall the same library 10 times in a row to figure out which version is the correct one
>>
>>106998443
I reinstalled cumUI and the stuff it installs are wheels.
With llama.cpp I can compile it and move the binaries to /usr/local/bin/.
>>
>>
>>106998477
holy sloppa
>>
>>106998443
get a grip learn how to use venvs and use separate venv for each major project. ig there's 'uv' or whatever hipster stuff but in reality engineers will be pipping
i agree there is some retardation, but once you understand it and compared to some other langs realistic dev envs it ain't too bad. pick ur poison and gitgud at one and that means python for ml
>>
>>106998492
he trained it to slop out. opening message contains "a mix of x and y" and "scent of jasmine"

slop is inevitable but putting that in the opening message is just asking for it
>>
>>106998492
https://desuarchive.org/_/search/text/sloppa/
>>
>>106998511
I didn't train it on anything. Sounds like you are an autist. Didn't r-eddit get rid of you?
>>
What do we do now?
>>
>>106998545
in context training my guy
>>
>>106998678
anon? your custom frontend?
>>
>>106998689
Do I have to?
>>
>>106998698
you can also jeetpost about gemma4, or shill glm, those are your options
>>
>>106998684
[Settings Client]
model = Mistral
qwen_reasoning_enabled = 1
save_chat_history_enabled = 1
save_debug_chat_history_enabled = 1
world_book_permanent_entries_enabled = 1
chat_examples_enabled = 1
world_book_injection_enabled = 0
world_book_injection_scale = 3
post_history_instructions_enabled = 1
post_history_instructions_alt_enabled = 0
post_history_instructions_interval = 5
context_memory_refresh_enabled = 1
display_status_bar_enabled = 1
quest_generator_enabled = 0
adventure_module_enabled = 0
voice_model = voices/en_GB-cori-high.onnx
voice_length_scale = 1.0
voice_sentence_silence = 0.3
voice_sample_rate = 22050
voice_save_wav_enabled = 0
voice_synthesis_enabled = 0
>>
>>106998717
I can disable chat examples.
>>
>>106998726
your whole message history from the first we see is slop is what is being said
>>
>>106998734
Prove it.
>>
>>106998738
I'm not going to quote every other phrase of your entire log
>>
DGX vs Framework desktop? Is it useless trying to run AI on AMD silicon or what?
>>
>>106998726
It doesn't matter.
>>
>>106998804
Prove it?
>>
>>106998810
It'll take a while. Hang on.
>>
I grew up with dial-up. It blows my mind that I'm able to download files from a free public service at >1 GB/s.
>>
If you split your big MoE model between the GPU for the dense/main expert and the RAM for the experts, is there a way to estimate how increasing the speed of either the VRAM or RAM affects token generation speeds?
For example, if you're already running on the best possible RAM (eg. ddr5 on epyc), would upgrading to a 5090 affect the token gen speeds or would it just be bottlenecked by the experts being on RAM?
>>
>>106998904
Yes, it depends on how big the model is and how much VRAM you have already. But basically going from 80% to 90% on VRAM will make a much bigger difference than going from 10% to 20%.
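For a rough estimate you can treat decode as memory-bandwidth bound: per token each device has to stream its share of the active weights once, so you just add the two times together. A back-of-envelope sketch (all numbers are placeholders; real speeds come in lower because of overhead, and prompt processing is a different story):

def estimate_tg(active_params_b, frac_on_gpu, bytes_per_param,
                vram_bw_gbps, ram_bw_gbps, overhead_ms=0.0):
    # very rough decode tokens/s: time per token = GPU slice / VRAM bandwidth
    # + CPU slice / RAM bandwidth, counting only the *active* parameters
    active_bytes = active_params_b * 1e9 * bytes_per_param
    gpu_ms = (active_bytes * frac_on_gpu) / (vram_bw_gbps * 1e9) * 1e3
    cpu_ms = (active_bytes * (1 - frac_on_gpu)) / (ram_bw_gbps * 1e9) * 1e3
    return 1000.0 / (gpu_ms + cpu_ms + overhead_ms)

# e.g. an Air-sized MoE: ~12B active params at ~4.5 bpw (0.56 bytes/param),
# with the shared/attention part (say 25% of the active path) sitting in VRAM
print(estimate_tg(12, 0.25, 0.56, vram_bw_gbps=936, ram_bw_gbps=80))

With a split like that the RAM term dominates, so swapping in a GPU with more bandwidth barely moves token gen until more of the experts actually fit in VRAM; the faster GPU mostly buys you prompt processing.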
>>
>>106997614
Aren't images just tokenized anyway?
>>
File: profe.png (149 KB, 1920x1080)
>>106998819
I disabled the setting.
>>
yikes
>>
>>106998975
My computer hung up because Youtube causes interruptions.
i.e. Linux is a fucking shit operating system to this day.
>>
mistral feels like it's going to be the next cohere, if you catch my meaning
>>
File: itdobelikethis.png (84 KB, 1013x439)
>>106998414
>proper branching support
>swipe any turn
be the change you want to see in the world
>>
File: file.png (46 KB, 581x403)
and now how about something absolutely nobody could have ever guessed

https://x.com/techeconomyana/status/1981763392252920295
>>
>>106999182
Based Robin Hood ZAI.
>>
>>106999182
holy shmoly, are they that rich?
interesting that they've gone to distilling the most expensive LLM API after distilling gemini (glm 9b and 32b)
>>
>>106999139
what do you mean? they already are. they are as irrelevant as cohere.
>>
>>106999182
Don't know how they could be surprised when everyone else started hiding the thinking and they were the only ones left that didn't.
Did they think China would not steal from them out of respect for their rabid devotion to safety?
>>
>>106999212
They were probably doing it through Claude Code, so they weren't paying full API, only 200 dollarinos per seat.
>>
>>106998986
skill issue
>>
>>106999298
You think Claude showed full traces?
Also it's kinda ironic that Z-ai hides the thinking traces in their own Code offering. So they are paranoid about somebody exploiting their coding plan in the same way that they exploited Anthropic's.
>>
>>106998932
Yeah but it works a bit differently for these modern MoE models. You are getting a massive speedboost if you have the 3% of the model in VRAM that's always called while the rest of the experts are on RAM with exps=cpu.
Seeing how much loading your model like this improves speed even if you're loading the parts on something slow like a 4060, you'd imagine that swapping out the GPU for one with massively bigger bandwidth would get you another nice gain.
>>
>>106999315
I didn't expect anything else from you.
>skill issue
Low IQ reply.
>>
>>106999182
I don't think it's just Z.AI. Deepseek V3.2 also felt like it lost some Gemini-slop while Claude-isms became more prominent compared to the 3.1 models. 3.2 didn't go through a complete overhaul in writing style like the GLM models did between 4.5 and 4.6 but it's still kind of noticeable.
>>
Anybody else getting terrible speeds with Qwen3 80b next, on llama.cpp? It easily fits with a GPU/CPU split, and it's smaller than the Air quant I was running prior to this, but it's outputting replies as slow as a dense model would. They're both MoEs, right? Why is Qwen so slow?

I'm using the 16095 PR branch to run Qwen3.
>>
>>106997912
ST is kind of garbage.
>>
>>106999433
not all ops have been implemented in the cuda kernel yet, so a lot of them fall back to cpu
>>
>>106999450
Makes sense. Thanks. Well, it was a good preview anyway.
>>
>>106999433
There is a fork that works faster but maybe I did something wrong because it wouldn't load the model.
Feel free to test it by yourself if you want https://github.com/cturan/llama.cpp
>>
>>106999354
In case of MoE I imagine there is a weird effect where adding more VRAM matters at the beginning because you are fitting the fixed tensors in VRAM, and at the end when you are fitting the last few experts. And in the middle extra VRAM doesn't make much of a difference.
>>
Ok, I'm fed up with axolotl where 2/3 of the models fail to actually shard across GPUs. Llama-factory seems to work better right off the bat.
>>
>>106998884
Same. Had 26.6k dialup till 2004 even, couldn't even get 56k.
>>
>>106999364
doesn't change the fact buddy boy, skill issue remains
>>
>>106998884
Slowest I grew up with was 300 baud Vicmodem.
Good times.
>>
>>106999714
i grew up with a 1 baud modem, it was hot shit.. only took 7 days to send a single email if no one picked up the phone
>>
>>106999696
I don't rank with retards.
>>
are there any multimodal models that run in llamacpp that are better than qwen2.5 72B?
>>
Ok, I think I figured out my workflow. I'm going to run Gemma 3 27B using Llama-factory.
I am going to run my assistant through an OAI API compatible proxy connected to Gemma that'll log all messages to disk in sharegpt format. I am going to interact normally with the model through the assistant until filling the context window I'm able to fit on the 4x3090 machine (~40k tokens).
Then, I'm going to open the log on a text editor and remove the parts where the model did a whoopsie and clean it up in general.
Then I'm going to train on that cleaned up version of the log.
And so on ad infinitum to see how much I can improve the model in a reasonable amount of time.
If this works I will see about scaling up to a bigger model.
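If anyone wants to steal the logging part, the proxy's write path is tiny; a sketch of sharegpt-style jsonl as I understand the common convention (field names may need adjusting to whatever your trainer expects):

import json

def log_sharegpt(messages, path="dataset.jsonl"):
    # append one conversation as a list of {"from", "value"} turns
    role_map = {"system": "system", "user": "human", "assistant": "gpt"}
    record = {"conversations": [
        {"from": role_map[m["role"]], "value": m["content"]} for m in messages
    ]}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")

# called by the proxy after each completed exchange
log_sharegpt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "hi"},
])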
>>
>>106999880
skills, check'm
>>
>>106999324
what? no they don't. I'm getting thinking on ST from the coding endpoint right now
also it's an open weight model so blocking reasoning makes zero sense anyway. anyone can just run the model themselves and distill to their heart's content
>>
>>106999182
almost certainly bullshit
dario has been whimpering about china and begging for their models to be banned since R1 came out, it's not like he just started
also if they had proof of this, why wouldn't they name and shame? you know, like when anthropic caught openai distilling claude and made a big show of blocking them over it

https://www.wired.com/story/anthropic-revokes-openais-access-to-claude/
>>
>>107000527
Yeah but they're probably serving the coding stuff at a loss (when hitting the usage limits) so you would benefit from using that instead of doing inference on your own hardware. But if you're getting the reasoning tokens then idk I guess I did something wrong.
>>
File: 1743826919964090.png (54 KB, 640x248)
>>
>>106999182
>some wsb "analyst"
>>
>>107000631
It's funny because half of the time it'll say that even if it didn't make the information up.
>>
>>107000631
>>>/g/aicg/
>>
https://x.com/jloganolson/status/1981102506228011361
terrifying
>>
File: 1734982138255044.png (202 KB, 500x1825)
>>106999182
GLM's slop profile is nothing like Cloode tho
>>
>>107000664
>*autistic screeching*
>>
>>107000683
Tell whoever made that to do PCA or just a similarity matrix rather than that unreadable mess.
>>
Lmao this is what happens if you choose a roleplay model for AI coding assistant
>>
>>107000710
>roleplay model
show system prompt
>>
>>107000710
Lmfao
>>
>>107000710
>Thought for 53.4s
kino...
>>
>>107000729
Don't have one. I've just finished setting up Kobold as my backend in Docker and I was curious if I can connect to it from VS Code using Continue extension. I just asked 1+1 to test the connection
>>
>>106996812
Haven't sorted out Linux yet so these are W10 test numbers with Vulcan. 128GB DDR5 "mini pc" system.

| model                          |       size |     params | backend    | ngl |   main_gpu | fa | dev          |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | -: | ------------ | --------------: | -------------------: |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 0 | Vulkan1 | pp512 | 786.92 ± 0.44 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 0 | Vulkan1 | tg128 | 47.04 ± 0.05 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 1 | Vulkan1 | pp512 | 175.14 ± 0.03 |
| llama 7B Q4_0 | 3.56 GiB | 6.74 B | Vulkan | 99 | 1 | 1 | Vulkan1 | tg128 | 45.83 ± 0.04 |


| model                          |       size |     params | backend    | ngl |   main_gpu | fa | dev          |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | ---------: | -: | ------------ | --------------: | -------------------: |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 0 | Vulkan1 | pp512 | 901.58 ± 6.22 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 0 | Vulkan1 | tg128 | 45.67 ± 0.13 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 1 | Vulkan1 | pp512 | 305.96 ± 0.39 |
| gpt-oss 20B MXFP4 MoE | 11.27 GiB | 20.91 B | Vulkan | 99 | 1 | 1 | Vulkan1 | tg128 | 42.98 ± 0.03 |
>>
>>107000963
that performance is terrible. my DDR4 does better
>>
>>107000972
Mine too but that's expected - it's a PCIE powered GPU with 128-bit memory bus running on laptop-tier hardware w dual channel RAM.
For this particular shoebox it gives 10-20x PP and 7x TG compared to running on the iGPU for around 45W extra power draw.
Windows tax included.
Depending on use case that might be enough for some running smaller models or MoEs. I still consider it grossly overpriced personally but then again, so are most SFF GPUs.
>>
>>106997912
ST sucks for anything that isn’t a one-on-one conversation. I want to have conversations with multiple characters in the same chat who don’t have access to the history they didn’t witness. I want to gangbang a character with multiple protagonists. I want the frontend to introduce generated characters that aren’t Elara or Lili and that have a believable range-checked D&D sheet. I want a quest tracker and automatic context summarization when the day ends. I want twenty other features I haven’t mentioned. And I can have it all in my own frontend without any bloat
>>
>>107001079
post it
>>
>>106998414
see, that explains a bit and also sounds pretty cool. Gives me some ideas for my own project
>>
>DavidAU/Qwen3-MOE-6Bx4-Almost-Human-XMEN-X3-X4-X2-X1-24B
retard or genius?
>>
So like, how far away are we from local models that can produce generated imagery in context with chatting and roleplaying and all that other shit?
would you say a year, a decade? Surely it can't be long now.
>>
>>107001184
>DavidAU
Could have stopped there, but let's read on
>This is a MOE merge of X2, X4, X1, and X3 creating a 4x6B - 24B parameters model, compressed to 19B "in size".
>The full power of every version is in this model.
beyond retard
>>
>>107001192
kobold already has a primitive version of it and an anon from the diffusion threads is making a game engine like thing for diffusion and llms. probably less than a year
>>
>>107001228
That's gonna be sick.
Right now I just barely have fun with chatbots and roleplaying. I need visual stimuli to really get going.
I'd rather read a fucking book than chat with a bot at this point, honestly. I need it to have more going for it and image generation that gets increasingly more sophisticated would be it for me.
Not just for jerking off, I mean for roleplaying like dungeon and dragons type of shit.
That would be revolutionary.
>>
>>107001235
I'm working on a frontend like that but only pregenerated images to keep it realtime and not look like shit
>>
>>107001079
Please have one of your characters get hit by a truck
and transmigrated from one of the scenarios you are running to a different one that's already in progress.
>>
We haven't reached AGI until I can smell the character I'm talking to.
>>
AIIEEEEE STOP MAKING YOUR OWN FRONTENDS JUST USE SERVICETESNOR
IT'S LITERALLY RIGHT THERE JUST USE IT
>>
>>{{char}} asshole contains an intoxicating musk odour that is always mentioned when her ass is present, or being used in a sexual manner, detail the smell
>>
File: 7888765.gif (1.49 MB, 245x250)
>>107001337
>Want to chat with Miss Piggy
>Be into brap-play
>She hits you with a saucy smelly line
>You can literally get a whiff of her from the conversation alone
>She smells like she had chili for breakfast, lunch, and dinner.
>>
what do "adventure" roleplayers even do? a dragon comes up

*he kills the dragon*

how low IQ do you have to be to enjoy this shit?
>>
>>107001377
There's more to it than that, obviously.
Good roleplay would be the chatbot keeping track of your stats, your choices, your karma, your equipment, your map, your destination and previous locations, all of that shit a Game Master would normally handle for you.
And if you're not a retard, you'd respond with reasonable and in-line actions to your background and take everything else into context as well.
I think DnD roleplay is somewhat harder to do right now cause of the context capacity. But that's increasing over time so we'll get there eventually, I think.
>>
>>107001292
why bother if it can never be more than a wrapper? the game engine seems like a step in the right direction since all the big game engines are such resource hogs
>>
Any Ling-T1 users on? Curious how it's different from K2 0905
>>
>>107001461
they both suck. use mixtral 8x7B instead
>>
>>107001429
Not sure what you mean, it is a "game engine" in that it keeps a world state and does tool calling and all that stuff. Traditional game engines are fine for cloud AI stuff but for local they would just be competing for resources with the model, and I don't want to compromise on that
>>
>>107001489
are you retarded? what does saving tiny states have to do with competing resources? are you high?
>>
>>107001512
Not sure what the problem is, I was saying that traditional game engines (unreal, unity) would compete for resources but a light 2d engine shouldn't just be considered a "wrapper" because it still keeps state and manages world logic
>>
File: align-the-waifu.jpg (148 KB, 900x900)
>>107000710
>>
File: file.jpg (155 KB, 603x584)
New grok waifu dropped
https://x.com/elonmusk/status/1981911930747953189
https://x.com/tetsuoai/status/1981916179964027241
>>
>>107001377
Instead of killing the dragon in one sentence you should be fucking the dragon for 10 paragraphs while the princess watches.
>>
>there isn't any reason why this"general" actually exists except jannies leniency
>>
>>107001377
>american teenager: the thread
>>
>>107002161
yeah nothing says maturity like pretending to kill dragons in a sillytavern roleplay
>>
>>107002189
Nothing says NIGGER like a lack of imagination
>>
>>107002189
>>107002247
American nigger roleplay wins it all. 4chan is the best example of this behaviour.
>>
>>107002247
NIGGER???????????
>>
It's better to run LLMs locally (faster response time and nothing leaves your machine for, say, Discord trannies, Chicoms and Jeet scammers to sell your usage data). You could build a computer that mainly uses CPUs to run it for AI purposes on the low end rather than focusing on GPU powered LLMs for text generation.
>>
https://github.com/ggml-org/llama.cpp/pull/16634#issuecomment-3445563655
>140% pp512 gain
applebabs we eating good
>>
>>107002189
>maturity
Bet you think mesugaki slop is the pinnacle of modern writing and creativity. /s
>>
my "list of what the retarded llm should be instructed not to do prepended to all prompts.txt" keeps growing and maybe someday I'll have a .txt as big as the claude system prompt
today I just added "Never write polyfills in the context of JavaScript" after the one more time that was too many where it just decided my lack of polyfills was a bug that needed to be fixed even though it was not prompted in any way to do that
using LLMs feels like meeting a script kiddie from 10 years ago who learned how to program from the old w3schools and you constantly find new things to tell them not to do or features they aren't aware of until they're told they exist
by default, if not instructed to use the most modern facilities available in (insert latest node version) they constantly manually wrap shit in Promises too
like, bruh, we have async await and most libs have async variants jesus
even the SOTA models like GPT-5 and Gemini do this kind of retarded shit constantly
>>
>/s
>>
>>107002307
Just in case you don't understand sarcasm =)
>>
This thread should not exist.
>>
Minimax m2 is dogshit, not to mention giga cucked.
Don't know why I even tried it when it was just pushed by shills with memebenchs.
>>
dragon pussy
>>
>>107002276
The 1024gb M5 Ultra Mac Studio will be crazy for AI. Literally what we've been waiting for.
>>
>>107001881
the voice still sucks
>>
>>106996812
>70W
noice
>GDDR6 / 128bit bus / 224gb/s
gah
>400~ euros
meh, I mean I guess it's good if you don't have a server with 8/12 channels
Still, 16GB is a bit too low. Now if this was let's say 32GB for 700~ then yeah, I'd probably get one for a consumer board PC to do inference stuff.
>>
>>107002283
it's funnier because last week I asked my junior to write a function to extend the attachment parser to also include images (which needs async logic) and he came back to me with a Promise.all monstrosity (along with a useless bunch of if/else checks). I told him that it's 2025 and promises are 100% verboten in this project. He fixed it later, but I suspect this guy is just generating straight from claude and pasting whatever shit it gives him, testing if it works and then making a PR.
>>
>>107002610
>harshing the vibe-coding
>>
File: 2025-02-04-141509.png (3.22 MB, 1264x2216)
>>107002356
>>107002467
The duality of man
>>
>>107002610
even when there are moments you'd want to reach for something like Promise.all, Promise.all is never the answer
if you have a large array of concurrent tasks to execute in parallel, you want your executeThisShit() function to have at least a parameter to set a hard concurrency limit so that a large array of tasks doesn't suddenly fire trillions of I/O or API calls..
Promise.all is a bad API designed by mongoloids
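same point in Python terms since that's what's been posted elsewhere itt: bound the pool instead of firing everything at once. A minimal sketch with asyncio (the names and the fake call are placeholders):

import asyncio

async def run_bounded(coros, limit=8):
    # run the coroutines concurrently but never more than `limit` at a time,
    # instead of the Promise.all-style "fire absolutely everything at once"
    sem = asyncio.Semaphore(limit)

    async def guarded(coro):
        async with sem:
            return await coro

    return await asyncio.gather(*(guarded(c) for c in coros))

async def fake_api_call(i):
    await asyncio.sleep(0.1)   # stand-in for an I/O or API call
    return i

print(asyncio.run(run_bounded([fake_api_call(i) for i in range(100)], limit=8)))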


