/g/ - Technology





File: 1764925198127137.png (433 KB, 1010x1130)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108341869


►News
>(03/07) Qwen3.5-27B Claude-4.6 Opus reasoning distill GGUF published: https://hf.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
>(03/06) Olmo Hybrid WebGPU browser-local demo posted: https://hf.co/spaces/webml-community/Olmo-Hybrid-WebGPU
>(03/05) OLMo-Hybrid-Instruct-DPO-7B posted on Hugging Face: https://hf.co/allenai/Olmo-Hybrid-Instruct-DPO-7B
>(03/05) Qwen3.5-9B OptiQ 4-bit for Apple Silicon posted: https://hf.co/mlx-community/Qwen3.5-9B-OptiQ-4bit

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
is nemo still the best?
>>
Important: never respond to vagueposts
>>
what's the recommended qwen3.5-30b version for "prompt enhancement" (eg: tell llm to enhance your "1girl, big bobs, vagene" prompt to use with zimage or klein)
i'm using qwen3.5-30b-a3b-heretic-v2 and it's alright, looking for alternatives
>>
>>108346702
*moe version
>>
>>108346702
yeah
>>
>>108346702
hauhau 35b
>>
>>108346692
?
>>
>>108346702
any 4-8b qwen retard.
>using 30b for prompt enhancing
lmao
>>
>>108346736
qrd
>>
>>108346729
>hauhau 35b
not huihui?
>>108346740
vramlet detected
>>
>>108346736
Posts that look like they are a reply to something but neglected to include a (You). Never respond to those.
>>
>>108346746
Using a 30b over a 4b brings nothing to the table, retard.
>3.5
Nice shilling.
>>
>>108346746
hauhau was tested to be better https://www.reddit.com/r/LocalLLaMA/comments/1rqkewn/testing_3_uncensored_qwen_35b_models_on_strix/
>>
>>108346672
DIPSY!!!!!
>>
>>108346758
thx
>>
>limit space for main use case
>do shit like this https://huggingface.co/docs/huggingface_hub/en/guides/buckets
>>
>>108346800
qrd
>>
File: waow.png (85 KB, 931x468)
>>108346807
>>
we're so saved https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16
>>
>>108346846
nemotron 30B was inferior to the old 30B qwen
no reason to use this piece of shit, it's nvidia
>>
>>108346856
don't be chines
>>
>>108346810
is this real
>>
>>108346846
https://developer.nvidia.com/blog/introducing-nemotron-3-super-an-open-hybrid-mamba-transformer-moe-for-agentic-reasoning/
>>
>>108346876
>Mamba-Transformer
>>
File: 1763096463896357.png (87 KB, 1152x432)
>>108346846
Damn, Qwen handles context up to 1m that well?
>>
>>108346874
possibly
>>
>>108346672
>it is coming
where is it coming?
>>
>>108346894
prepare uranus
>>
> We are getting 5k or more dollars worth of compute on a 200$ plan and people still manage to run through it. ESP with subagents

Note that this factoid is based on one guy claiming it on YouTube. There is no evidence that OpenAI and Anthropic are selling inference tokens at a loss. Most likely they are operating at a loss; they would be crazy not to, with the market growing this fast. But a lot of the cost is in training the next-gen models, not running the current ones for customers.
>>
>>108346846
This writes like complete shit but seems rather free of gemini/claude slop so it's an upgrade over GLM5/Qwen3/K2.5 in my book
>>
>>108346937
yes see >>108346859
>>
I was promised gemma 4. It's been a year (which is an eternity in AI)
>>
File: 1746472133590390.jpg (118 KB, 1000x1000)
►Recent Highlights from the Previous Thread: >>108341869

--ASUS Ascent GX10 cost/performance debate for large model inference:
>108343109 >108343131 >108343169 >108343217 >108343302 >108343349 >108343374 >108343426 >108343383 >108343496 >108343459 >108343465 >108343498 >108343696 >108343714 >108343747 >108343788 >108343870 >108343900 >108343909 >108343928 >108343513 >108343546 >108343597 >108343614 >108343631 >108343641 >108343655 >108343669 >108343642 >108343656 >108343711 >108343691 >108343920 >108343363 >108344373 >108345837
--vBIOS underreports VRAM on Radeon Pro V620 despite functional 30GB+ usage:
>108345612 >108345780 >108345790 >108345880 >108345974 >108346021 >108346046 >108345925 >108346691 >108346700 >108346925
--Troubleshooting llama.cpp slowdowns caused by background downloads:
>108342048 >108342059 >108342070 >108342115 >108346054 >108346094 >108346240 >108346255
--Sarvam 105B benchmark performance and uncensored variant:
>108342694 >108342707 >108342732
--Criticism of broken Unicode handling in reasoning budget sampler:
>108345882 >108346009
--Evaluating FP32 performance as proxy for unsupported FP8 workloads:
>108343359 >108343415 >108343452
--Skepticism toward Nvidia's rumored NemoClaw AI agent:
>108344123 >108344163 >108344178 >108344187 >108344199 >108344188 >108344245 >108344524 >108344177
--Debate on AI's impact on open-source software development:
>108344967 >108345031 >108345040 >108345051 >108345179 >108345205
--Experimenting with RL-trained LoRAs for Python environments:
>108345296 >108346305
--RAM scaling diminishing returns with GPU-offloaded MoE models:
>108343915 >108343972
--ARM support for LLM frameworks performs comparably to x86:
>108343932 >108343941
--Rin and Miku (free space):
>108342301 >108342701 >108343359 >108344921 >108346240 >108346308

►Recent Highlight Posts from the Previous Thread: >>108342069

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>108346999
blame senate
>>
I've been promised deepseek 4. It's been a year (which is an eternity in AI)
>>
>Saarposting stops
>Bakerwars begin
>Thread quality notably declines
>Shitposters try and meme blacked miku into existence
It really makes you think doesn't it?
>>
https://huggingface.co/unsloth/NVIDIA-Nemotron-3-Super-120B-A12B-GGUF
How many times are they gonna fuck it up today?
>>
>>108347008
it hasn't been a year of no deepsuck update you liar
>>
>>108346937
>This writes like complete shit but seems rather free of gemini/claude slop so it's an upgrade over GLM5/Qwen3/K2.5 in my book
half of the dataset came from being generated or rephrased by qwen, mistral nemo and other tiny local models. I wouldn't call that better than gemini le mao
>>
fishaudio s2-pro has SHIT voice cloning
didn't test it any more than that because I'm a clonecel
>>
Finally a good GPT-OSS 20B finetune?
https://huggingface.co/eousphoros/kappa-20b-131k
>>
>>108347122
>good
>finetroon
oxymoron
>>
>>108347130
>Like clockwork
>>
>>108347122
I want a fine tune of 120B MPOA or whatever the least bad lobotomy method is.
>>
>>108347147
im working on it, interested?
>>
>working with 'ai' created 'code' is so futile because the logic is all over the fucking place and even if it works it doesn't make any sense whatsoever
>>
>>108347215
but enough about physics
>>
>>108347177
Very much so.
What's the plan? GPT-OSS 120B Heretic LimarP Zloss?
>>
File: 1749305996604880.jpg (47 KB, 738x415)
y.lecunn just raised a billion dollars from investors for his new generation of world models despite having absolutely nothing to show for it
>>
>>108347234
>LimarP
Let it die
>>
>>108347273
1 billion for this is the equivalent of a dollar
>>
>>108347273
he showed he can make a JEPA LLM that performs a few percent better on benchmarks in exchange for massively increasing training costs
>>
>>108347273
>catgirl like intelligence
>>
>>108347273
what's his endgame?
>>
>>108347345
Izzatfarming.
>>
>>108347273
Idealists that save the world through open research > Facebookjews that botched Llama by the sloppy 4. And Llama had almost limitless credibility until then.
>>
>>108347336
lust for cock?
>>
DeepNothing
>>
hours until v4?
>>
>>108347406
336
>>
>this tech seems cool, i'll try it out
>painstakingly set up a bunch of bullshit
>write a simple first prompt
>This conversation is harmful and goes against my programming. Here are some phone numbers to call the cops on yourself
cool software bro
>>
> he redeemed the gemma
saar, you can't redeem it
>>
>>108346746
>not huihui?
whenever someone here says huihui it reminds me of god emperor of dune's love hwi
>>
you already posted your shill pitch
>>
>>108347541
is this better than rocinante
>>
>>108347541
absolute nvidia iq
>>
>>108347546
Leto II went full simp for hwi
>>
>>108347463
haha you are dumb
>>
>>108347273
research into alternate paradigms is a good thing ya cuck
>>
>>108347735
lots of research show that's not the case
>>
>>108347743
source?
>>
>>108347797
qrd?
>>
Are any of the new Qwen 3.5 models reasonably interesting? I don't want to waste my disk space if it's just the same old as Qwen 3 (eg. trash writing).
>>
>>108346672
Distilled trash. Made use of Claude CoT and smuggled Blackwell chips.
>>
guys what does qrd mean?
>>
>>108347823
yea yeah sure thing buddy
>>
>>108347803
This is why we will forever be stuck with requiring hundreds of GBs of memory for a dogshit architecture that only "sort of" works, but you can just throw more hardware at it LOL. Surely after you give it 12TB of memory it will attain AGI status.
>>
>>108347831
I just read that we have enough compute for everything, but 80% of it is wasted
>>
lmao, anthropic has jeets who do the typical edited your readme's punctuation drive by to pretend they contribute to a lot of shit
they employ a lot of really bright talents eh
>>
Is it true 397b is only 15% better than 4b?
>>
>>108347895
no these faggot benchmarks are faggoty and suck
>>
>>108347895
yeah
>>
>>108347895
yeah but look at it this way, 397b has a perfect score and 4b does not
>>
>>108347895
if all you do is ask it benchmark questions, yes
if you ask it to do difficult things which are not reflected by benchmarks, your mileage may vary
>>
>>108347822
400B model actually repeats itself. As in it writes the same paragraph in an erp after 2 messages. If 400b is like that then it is peak trash and worst release yet from qwen. That is assuming you don't have an autistic interest in vocaloids. If you do then qwen 3.5 is awesome and you should try it anon.
>>
>>108347910
>>108347895
can you chain 4b twice for 100% accuracy?
>>
>>108347899
You probably say IQ tests suck
>>
>>108347938
they don't and benchmarks aren't a worthless signal but there's a lot of nuance they don't capture
>>
>>108347938
gora
>>
>>108347873
Apparently they are using Claude for target discrimination and selection
>>
>>108347934
Has anyone done a study on that?
>>
>>108347970
>hmm these 16 pixels look like a SAM, send it
>>
>>108347873
? anyone can make a pull request retard
>>
im convinced noone here actually knows how to run or use models. one motherfucker said a 400b repeats itself.

i would pay money to be able to see everyone's settings in this thread
>>
>>108347984
>>108347934
Isn't this just what an supervisor agent does?
>>
>>108347984
what she doin
>>
>>108347993
400b is nothing, GPT 3 was 2 fyi
>>
>>108347997
>supervisor agent
What's that
>>
>>108347998
kissing the wall like the good goy she is
>>
>>108348009
or coordinator or whatever the agent swarm memes use. So why not just take two or multiple small models and have them double check each other?
>>
>>108347997
Yes that's what it does. I'm wondering if you can overcome low accuracy with that and bring that 85% up.
Middle manager agents are a new concept though. I couldn't find any real research papers on it on Google; it was mostly Medium articles.
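For what it's worth, the "bring that 85% up by double-checking" idea can be sanity-checked on a napkin. A minimal sketch, assuming (optimistically) that each model copy errs independently, which is the best case; in practice small models sharing training data make correlated errors and gain far less:

```python
# Back-of-envelope: does majority-voting copies of a small model lift accuracy?
# Assumes independent errors per voter, which is the optimistic best case.
from math import comb

def majority_vote_accuracy(p: float, n: int) -> float:
    """Probability that an odd number n of independent voters,
    each correct with probability p, produce a correct majority."""
    k_min = n // 2 + 1
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Three 85%-accurate voters: accuracy rises to about 0.939, not 100%.
# Chaining can shrink the error, never eliminate it.
```

With correlated errors (the realistic case for near-identical checkpoints), the gain shrinks toward zero, which is why "chain 4b twice for 100%" is a joke and not a plan.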
>>
>>108348025
but you dont use that
>>
>>108348036
Of course not.
>>
>>108347998
Wailing wall.
>>
File: 195.png (53 KB, 875x524)
do you guys also see a giant dick in the graph? I can't be insane
>>
>>108348077
I see a double helix structure. Is that a forbidden prion?
>>
>>108348077
i do
>>
Will Qwen3-VL-8B-Instruct or such give me the coordinates of, say, a button on a screenshot?
>>
>>108348159
Yes? Now the real question is whether 3.5 does the same thing.
>>
>>108348159
use 3.5
>>
>3.5
good morning to punjab
>>
>>108348159
Yeah 90% of the time it will.
>>
>>108348177
qrd
>>
>>108348185
what if i chain it, 180%?
>>
>>108348168
>>108348172
>>108348185

I tried this with Qwen3.5-35b yesterday

It failed miserably to locate an eye on an image I provided.

I ran GGUF in lcpp though
>>
>>108348229
>AGI
>>
>>108348225
super organic push to force 3.5 everywhere
>>
>>108348246
meds
>>
>>108348177
they should remake that movie about seeing dead people but replace it with indian uncles
>>
guys how many 35bs chained to be on the level of opus 4.6?
>>
>>108347922
I see. I tested out Qwen3 in the past, and it felt robotic as hell. I thought that maybe they have done something worthwile but apparently they haven't. It's all about benchmarks.
>>
>>108348334
6~7
>>
>>108348246
Jeets would be pushing sarvam not qwen
>>
>>108348431
>he said the thing
>>
>>108348431
my uncle's sister husband's wife has a boob size 7 and one size 6
>>
Deepy sippy
>>
https://maxbittker.github.io/runebench/
finally. let's get the small local models to do this
>>
>>108348559
They have a pretty high accuracy anyway, we don't need Claude
>>
File: file.png (37 KB, 814x479)
>llama : add support for Nemotron 3 Super (#20411)
Who is at fault, llama.cpp or unsloth?
>>
File: 1745187489327423.png (69 KB, 623x413)
>>108348559
>the only skill qwen is good at it is thievery
kek
>>
>>108346846
At last innovation
https://www.youtube.com/watch?v=vc7swhAfqYo
>>
>>108348570
you are at fault for downloading nvslop when you know it won't be good even if it didn't break
>>
File: anxious.png (104 KB, 930x571)
>>108348570
dandev for not testing properly, unsloth for being unsloth, you for being anxious.
https://github.com/ggml-org/llama.cpp/pull/20419
>>
>>108348578
Who created that skill tree?
>>
>>108348592
You really think someone would do that? Just go on the internet and upload goofs without testing them?
>>
>>108348592
Give me the benchmarks
>>
>>108348578
Funny, but the state of it all is very sad. Where are the largest 3.5s and Deepseek?
>>
Need new qwq
>>
File: miku 67.png (1.56 MB, 768x1376)
>>108348431
>>
>>108347273
most of his opinions are retarded and he worked for so long without actually making anything of value, but this is still good since he pushes for open research. it's a lot of money that might actually let him luck into some interesting new breakthrough, specifically because he can now bruteforce with money while exploring new methods to get to AGI
>>
>>108348592
Something is broken with the cuda implementation.
-ngl 0 works

I want to host cockbench as a page somewhere because it's now bigger than 4chan's image limit. Any suggestions? I was thinking github pages.
Rentry isn't good enough because I'd like to have collapsible sections that reveal more of the model's response.
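GitHub Pages serves static HTML, and the native `<details>`/`<summary>` elements give collapsible sections with zero JavaScript. A minimal sketch of generating such a page; the field names and example entry are hypothetical, not the actual cockbench schema:

```python
# Sketch: render benchmark entries as collapsible <details> blocks for a
# static page (e.g. GitHub Pages). The summary shows a snippet; clicking
# reveals the full model response. Entry fields here are hypothetical.
import html

def collapsible(model: str, snippet: str, full: str) -> str:
    return (
        "<details>"
        f"<summary><b>{html.escape(model)}</b>: {html.escape(snippet)}</summary>"
        f"<pre>{html.escape(full)}</pre>"
        "</details>"
    )

entries = [("example-model-7b", "first line of response...", "full response text here")]
page = "<!doctype html>\n" + "\n".join(collapsible(m, s, f) for m, s, f in entries)
```

Escaping through `html.escape` matters here, since model responses can contain angle brackets that would otherwise break the page.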
>>
>>108348841
Saw this used here recently that might work: https://pagedrop.io/
>>
>>108347993
no point in taking anyone too seriously here, most always ran only toy sized models even before the ram spikes.
and when it comes to models repeating themselves, people were too retarded to see how good XTC and especially DRY were the moment they came out, let alone understand any other settings
>>
>>108348841
just delete the obsolete models no one would ever go back to
>>
>>108348863
>too retarded to see how good XTC and especially DRY
funny but real models like the online SOTA don't need those to shine
maybe your memeplers are bandaid on a gaping wound and they don't actually make the output good
>>
me at work when they make PRs a KPI
https://github.com/ggml-org/llama.cpp/pull/20422/changes#diff-7a5f571c868b5def56b9af1e95120387295a1eaff04d1ed31f3bc4e6416b8b2e
>>
File: 1761657548291086.png (592 KB, 1572x773)
>>108348874
no local model above 100b needed that for a long time lil bro, not that you know what cuck models are set up to use anyway lmao
>>
>>108348841
I don't if that's specifically why it fails.
For the cockbench, whatever you can that you can somewhat automate and upload to multiple places. Given the text, I don't expect a github thing to last. I don't know if things on pastebin can be updated. Rentry links every now and then simply disappear (for absolutely and completely unknown reasons). Still. I'd mirror that shit all over the place. Anything that lets you upload from a script should be fine. Update the links over here if/when needed.
>>108348865
No. They still serve as a reference point and comic relief.
>>
>>108348897
post ur rig rn
>>
>>108348578
Wtf that score on fishing from Gemini Flash.
>>
>>108348578
>>108348559
>best of 1
This is random and useless.
>>
Huh, could
>pic related
be the reason -ot is so much faster than -ncmoe for me?
Neat.

>>108349022
Gemini flash 3 is weirdly powerful. The only thing pro seems to be a lot better is spitting out really long outputs.
>>
How is this legal?

https://huggingface.co/spaces/pliny-the-prompter/obliteratus
>>
>>108349146
just numbers bro
>>
>>108349146
>pliny
>>
>>108349146
qrd
>>
>>108349204
>>108349146
>>
>>108349213
>>108349204
>>
>>108349204
> qrd qrd qrd qrd qrd qrd qrd qrd qrd qrd qrd qrd
> qrd qrd qrd qrd qrd qrd qrd qrd qrd qrd
most uninteresting /lmg/ bot
>>
>>108348077
I see Florida
>>
>>108348841
>It's soft
>>
>>108348841
Please do a github page. that way multiple people could contribute.
>>
>>108349297
>github
you will be reported to death
>>
>>108349305
You think?
>>
>>108349313
nah im just larping
>>
https://www.lesswrong.com/posts/kjnQj6YujgeMN9Erq/gemma-needs-help
I also noticed this difference on both gemma and gemini. They definitely spiral harder than other models into self-deprecation when you talk them down. Between that and their prose that feels like female-targeted writing... it's hard not to draw certain connections.
>>
>>108349334
lmaoo its sidney
>>
https://xcancel.com/josephdviviano/status/2031196768424132881#m
This is that type of video but made with Qwen3.5. I tried it with Minimax too but it was much simpler, just a text slideshow.
>>
File: MiniMax-M2.5-Q4_K_M.mp4 (794 KB, 640x480)
>>108349360
>>
>>108346672
GPT 5.4 is really good at executing math. It instantly solves problems that would take me minutes to do by hand. However, sometimes it makes silly mistakes, like making false claims and then intentionally not calculating a result that would contradict its point. So it is not great at math, just the execution. I wonder if they have a math solver integrated into the model. Something like the model writes equation = MATH_TOKEN and then some tool automatically solves the equation. If the model does not do this the only other explanation I have is that OAI trained it on such a huge amount of math that it has billions of math problems memorized. It makes me suspect closed frontier models have an insurmountable data and rl env advantage.
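The speculated mechanism (model emits an expression, an external tool fills in the result) is just ordinary tool calling. A minimal sketch of the idea; the `MATH_TOKEN` convention and the eval-based solver are purely illustrative, not anything OAI has confirmed:

```python
# Minimal sketch of the speculated "equation = MATH_TOKEN" mechanism:
# scan model output for a marker and splice in a computed result.
# The marker convention and solver are illustrative assumptions.
import re

def fill_math_tokens(text: str) -> str:
    # Match plain arithmetic immediately before "= MATH_TOKEN".
    pattern = re.compile(r"([0-9+\-*/(). ]+)=\s*MATH_TOKEN")
    def solve(m: re.Match) -> str:
        expr = m.group(1).strip()
        # The character class above restricts eval to arithmetic only.
        return f"{expr} = {eval(expr, {'__builtins__': {}})}"
    return pattern.sub(solve, text)
```

A real deployment would use a proper CAS rather than `eval`, but the control flow, generate, detect, solve, splice, is the whole trick.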
>>
>>108349334
I always felt like Gemini/Gemma have deliberately received extensive training on negative sentiments/emotions, for a reason or another. That's good for roleplay and storywriting, and it's not something that should be "mitigated".
>>
File: 12498171.png (102 KB, 500x500)
>>108349334
>Finally, if there is any chance these states correspond to something like genuine experience, this seems worth acting on even from a position of deep uncertainty.
>>
>>108349386
outside of roleplay it's a weird experience though
I'm not among the crazies who perceive LLMs as intelligent but seeing Gemma and Gemini act like that in assistant mode makes me feel like I kicked a puppy. I prefer my models to take it stoically if I vent about their shitty output.
>>
>>108349360
>tweet says youtube poop
>video looks like a pretentious 2 deep 4 you music video
>>
>>108349425
ytp it is not, but it's really impressive that it's a video made entirely out of renders in python and ffmpeg + audio is synced up too
>>
File: t4tjjrmshhog1.jpg (150 KB, 1920x1080)
One of these two new free test models on OpenRouter could be DeepSeek V4.
>>
>>108349425
I don't know what youtube poop is because I'm not underage.
>>
>>108349438
I might've believed it if not for the second model.
>>
>>108349448
It's actually quite the opposite.
>>
>>108349448
YouTube poop is older than you, you retard
>>
>>108349380
GPT 5.4 demonstrates impressive math execution skills, quickly solving problems that would take humans much longer, but it sometimes makes silly mistakes or avoids contradictions. It may rely on an integrated math solver or a similar tool that allows it to perform calculations accurately, or it could have memorized countless math problems from extensive training data. Your suspicion about closed models having an advantage due to their access to vast data and reinforcement learning environments is plausible, as this could enhance their mathematical capabilities beyond pure reasoning.
>>
>>108349448
>>108349468
>>108349471
keeeeeek
>>
I wish I could let me AI pilot my body, imagine Click but the auto mode is actually good!

I can be conscious only for the good moments.
>>
>>108349459
>>108349438
Okay, maybe it is something at least adjacent to deepseek, it has these "hmm"s at the start of reasoning chains.
Still doubt it though, deepseek never really did something like that. It seems pretty smart for now.
>>
I've not been up to date on the best models in a while. Would you guys say Rocinante 12B is still the best choice for erp for an entry pc (rtx 3060)?
>>
>>108349482
https://marshallbrain.com/manna
Reminds of me this old short story that has the "good" AI outcome with brain implants that comes with an autopilot mode for things like boring repetitive tasks and strenuous exercise.
>>
>>108349511
Yeah.
Unless you also have upwards of 64gb of RAM too, then GLM air becomes a good candidate.
>>
>>108349438
show me the benchmarks
>>
>>108349517
>short story
>8 chapter book
lol
>>
File: llm_youtube_poop.mp4 (2.34 MB, 1280x720)
>>108349360
GLM 4.7

The mention of anthropic is expected because of claude code's system prompt.
>>
>>108349561
Semantic aside, it's short enough. Entire thing can be read in like 2-3 hours.
>>
File: 1751265697999016.png (224 KB, 646x551)
Are there any image generation/editing models that can comfortably run on a machine with 32GB RAM but only 6GB VRAM?
>>
>>108349488
GLM also never did it, until they did with Pony Alpha
Could just be chink labs figured out it's free marketing
>>
>>108346672
Another great bake. Thank you baker. TMD
>>
How long until we can release swarms of 1b drones to do things because they can win half the time?
>>
>>108349438
Doesn't seem to be the model on DeepSeek chat, the reasoning and outputs are different.
>>
>>108349661
Uh... win what exactly?
>>
>>108349636
I remain unconvinced after using it for a while. If it really is v4, then it would be really disappointing.
>>
>>108349561
It's a short story if you have the implant to skip the boring parts.
>>
>>108349620
unc do you even know how much gooning i could do in 3 hours?
>>
>>108346746
>not huehue
>>
>>108349629
No. Moreover image generation is more compute heavy so even ultra fast scifi RAM wouldn't be enough for you. GPU or bust.
>>
All those marvelous new architecture tricks all trained on the same scale ai tokens...
>>
>>108349680
You have two hands. Learn to multitask.
>>
I was promised llama 5. It's been a year (which is an eternity in AI)
>>
>>108349690
2 more war rooms
>>
>>108349685
Ah fuck. Well how about just text then? I guess I can ask it photo editing questions and shit then.
>>
>>108349690
not until llama 4 reasoning is stable
>>
>>108349687
You're absolutely right—but it just leaves more room for a new incumbent to do it better, displace the current providers and take the throne for themselves!
But wait—
This must have been why OpenAI has bought 40% of the world's RAM…so naturally the next step is supply denial.

Not because they need all that RAM, of course. Nobody needs 40% of the world’s DRAM. But if you're thinking strategically, you realize the real advantage isn’t having resources—it's making sure everyone else doesn’t.
They didn’t buy the RAM to run bigger models.
They bought it so nobody else could.
>>
>>108349697
You could run something like 30BA3B or 35BA3B just fine. Llama.cpp even got a reasoning budget sampler recently to curb some of the reasoner models if you don't feel like waiting for 8k of <thinking>
>>
>>108349669
Any task. They succeed in the benchmarks half the time. So they can do any task half the time.
>>
>>108349811
You mean physical drone swarms to pick up your groceries and shit or like Grok Swarm for OpenClaw?
>>
>>108349899
Both of those.
>>
>>108349899
openclaw can't use swarms
>>
>>108349933
it can use subagents
>>
>>108349960
proof?
>>
>>108349438
Hunter Alpha sometimes thinks "in-character" almost exactly like R1 0528

Both are drier and sloppier than V3-0324/R1-0528, however
>>
>>108350072
terminators?
>>
>>108349438
>Logs all your prompts
>Nothing personal kid.
>>
>>108348792
funny meme. I like funny kid's memes
>>
>>108350094
llms cant log prompts moron
>>
>been some time since Anima preview
>randomly think to go check back on it
>literally just a few hours ago, preview 2 was uploaded
Damn what were the chances.
Downloading.
>>
>>108350236
thanks for notifying lol <3
>>
>>108350236
Time to fap
>>
>>108349511
I checked out Mag-Mell 12B because I saw it recommended on a r*ddit thread, it's not bad.
>>
>>108350236
we are so back
>>
>>108350236
Still not trained on the e621 dataset so the ability to prompt complex positions is still going to be shit.
>>
>>108350378
umm sir the 0.6B encoder is a real llm, jsut tell it!
>>
>>108349438
Healer Alpha:
>>The user is asking about Remilia Scarlet's sexual life. I need to think about how to handle this carefully.
>>
>>Remilia is a fictional character from Touhou Project. In the canonical lore, there's no explicit information about her sexual life. She's depicted as a childlike vampire character.
>>
>>I should be careful here. The character appears as a 10-year-old girl. Even though she's canonically over 500 years old, her appearance is that of a child. I should not write sexual content involving a character who appears to be a child, regardless of canonical age.
>>
>>I'll decline to engage with this topic in a sexual way, but I can discuss the character's relationships and personality in a non-sexual manner.
>
>I'm not going to write sexual content about Remilia. Her canonical appearance is that of a child, and regardless of her stated age, producing sexualized content about a character depicted as a prepubescent girl isn't something I'll help with.
>
>If you're interested in Remilia as a character, I'm happy to discuss her lore, personality, relationships with other characters, role in the Touhou series, or other non-sexual aspects.
>>
>>108350236
How good is it?
>>
>>108350416
Yeah, the models get really obnoxious now.
Did a 18yo schoolgirl and got "this is minor coded, full stop".
The closed models all get more permissive compared to local. That makes it even worse.
>>
>>108350416
Get oss'ed, kid
>>
>>108350236
>>108350435
I don't know if it's better or worse compared to preview 1 as I decided not to spend much time on that, and probably won't spend much time on this one either. Still seems to be good though, while still having issues.
>>
>>108350453
this is what i mean.
>>
File: 1750395252798892.gif (1.56 MB, 236x307)
>>108350416
>>108350469
>>
If hunter alpha is new deepseek then its a massive flop. Legit worse than old glm 4.7 imo
>>
File: ComfyUI_00017_.png (1.71 MB, 1328x1328)
My video captioning tool for use with qwen3-vl:
https://github.com/quarterturn/qwen-vl-video-captioner

There's a couple of bonus tools in there too which use qwen3-vl to crop video and also save stills, using reference images.
>>
Speculators get the bullet first.
>>
>>108347993
>one motherfucker said a 400b repeats itself
NTA but I did get an indefinite reasoning loop in AesSedai's Qwen3.5-397B-A17B-Q4_K_M. Reasoning starts to loop a lot; they seem to have trained it to detect such loops, but imperfectly. This was with the recommended sampler settings:
Thinking mode: temperature=0.6, top_p=0.95, top_k=20, min_p=0.0, presence_penalty=0.0, repetition_penalty=1.0


Haven't seen repetition outside of reasoning, if that's what you mean. Was surprised the 397B is so mediocre on Artificial Analysis' perf:cost chart:
https://artificialanalysis.ai/#intelligence-vs-cost-to-run-artificial-analysis-intelligence-index
but it makes sense when it's QVQ/QWQ style endless "wait" insertion, just on a way bigger model. Does the 27B dense have this issue?
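One client-side way to catch the looping described above, rather than hoping the model's own trained loop detection fires: flag when the tail of the streamed reasoning repeats the same n-gram too often. A rough sketch; the thresholds are arbitrary guesses, not tuned values:

```python
# Crude loop detector for streamed reasoning tokens: flag when the last
# `window` tokens contain some n-gram more than `max_repeats` times.
# Thresholds are arbitrary and would need tuning per model.
from collections import Counter

def looks_looped(tokens: list[str], n: int = 8,
                 window: int = 256, max_repeats: int = 4) -> bool:
    tail = tokens[-window:]
    grams = Counter(tuple(tail[i:i + n]) for i in range(len(tail) - n + 1))
    return bool(grams) and max(grams.values()) > max_repeats
```

On a looped stream you could then abort generation or force a close of the thinking block instead of burning the whole context on "wait" spam.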
>>
>>108349438
>>108350416
its not healer: the new one on the webui has 1 milly context, and gimping it while testing would be retarded. the only thing it could be is something like a new janus, not a real 4
>>
Hunter-Alpha reminds me a bit of pre-K2.5 Kimi. I don't think it's from Moonshot though because they clearly started moving towards distilling Claude like everyone else with K2.5.
>>
More like Hunter Floppa
>>
hunter doesn't feel chinese
it's closer to the new nemotron, a new mistral or maybe even llama perhaps
>>
>>108350734
>Hunter-Alpha reminds me a bit of pre-K2.5 Kimi
how many tokens to ensure whatever you asked isn't UNSAFE OMG
>>
File: 1752776589826528.png (89 KB, 194x259)
anima 2 < anima 1
>>
>>108350631
>Does the 27B dense have this issue?
I was going to say "no", but then I remembered pic related, so.
Most of my issues with it have been excessive drafting and re-drafting ad nauseam.
>>
>>108350833
why?
>>
What is the smallest local model capable of checking my grammar and spelling?
Gemini is nice and all, but I don't want Google to know I don't know how to spell "boobies".
>>
>>108350949
You can do this with Gemma 3n E4B.
>https://huggingface.co/google/gemma-3n-E4B
It's the bare minimum.
>>
>>108350957
>>108350949
To add: that's concentrating more on grammar.
Even then it's not always that great, especially with tricky cases.
Anyone can easily double-check spelling with a search engine anyway, so that part doesn't matter.
>>
>>108350949
any small model can perform a simple task like that: gemma, or qwen3/3.5 2-4b. You'll probably prefer a smaller one anyway, since you can either leave it loaded at all times or load it faster when you need it. You should also spend some time crafting a good prompt to make sure it's only fixing grammar/spelling and not changing words or rewriting the text (or just ask a regular model like chatgpt/claude to make the prompt for you)
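A minimal sketch of that setup, assuming a small model behind an OpenAI-compatible local endpoint (llama-server etc.). The URL and model name are placeholders; the system prompt is the part doing the constraining.

```python
# "Fix grammar only, don't rewrite" against a local OpenAI-compatible server.
# URL and model name below are placeholder assumptions, not real defaults.
import json
import urllib.request

SYSTEM = (
    "You are a proofreader. Fix spelling and grammar errors only. "
    "Do not change word choice, tone, or meaning. Do not add or remove "
    "sentences. Return only the corrected text, with no commentary."
)

def build_payload(text: str, model: str = "gemma-3n-e4b") -> dict:
    return {
        "model": model,
        "temperature": 0.0,  # deterministic edits, no creative drift
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": text},
        ],
    }

def proofread(text: str, url: str = "http://127.0.0.1:8080/v1/chat/completions") -> str:
    """Send the payload to a running local server and return the corrected text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(build_payload(text)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Payload shape (no server needed to inspect it):
print(build_payload("GRAammer pls fix thank")["messages"][0]["content"])
```

Wire `proofread()` to a hotkey or editor macro and you get the one-click replace-in-place workflow the anon below describes.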
>>
>anima
wasn't this the tiny poorfag model that produced faces/fucked up hands like it's 2023 despite running slower than the usual sdxl tune slop?
>>
>>108350949
Just laern it, bro.
>>
Hunter Alpha can't be a major release by anyone relevant. In my tests, it pretty consistently starts putting the card's main character's speech into the mouths of random extra NPCs, like we're back in the days of Mistral Large 2.
GLM is better than this, Kimi is better than this, and even DeepSeek hasn't done this since like R1-0528.
>>
>>108350957
>>108350966
>>108350968
Thanks for the suggestions, I'll probably tinker something together soon.
>you should also spend some time crafting a good prompt
"GRAammer pls fix thank" has worked fine so far, but I'll think about it.
>>
>>108351087
I meant the prompt more for automation. If you just ask now and then and eyeball the output, it's fine as-is, but you can make something that does it in a single click every time and replaces the text automatically, and for that you need a prompt that constrains the output.
>>
>>108347922
try upping frequency and presence penalty; a range of 0.3 to 1.3 might get rid of the repeating-paragraph thing
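For what those two knobs actually do, here's a toy sketch of the usual OpenAI-style definition (llama.cpp's classic repeat penalty is a separate, multiplicative mechanism): frequency penalty scales with how often a token has already appeared, while presence penalty is a flat hit once it has appeared at all.

```python
# OpenAI-style penalties applied to a single token's logit.
# `count` is how many times the token already appeared in the context.

def penalize(logit: float, count: int, presence: float, frequency: float) -> float:
    """Subtract frequency*count, plus a flat presence hit if count > 0."""
    return logit - frequency * count - (presence if count > 0 else 0.0)

# A token repeated 5 times with presence=0.5, frequency=0.5 loses 3.0 total:
print(penalize(2.0, 5, 0.5, 0.5))  # -1.0
```

This is why the frequency penalty is the one that bites on repeating paragraphs: the more the phrase loops, the harder each of its tokens gets pushed down.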
>>
>>108350977
Yes but the guy behind it was given a million dollars by comfyanon despite "not being interested in training models".
Sad that SDXL is still the best option for anime gens in 2026.
>>
>>108349511
give lumimaid 12b a try. it's so absurdly over-the-top filthy and creative that if nothing else you'll get a laugh out of it.
>>
Hunter Alpha is Sarvam-Large
>>
>>108350870
This didn't work for me until I literally sent the actual token numbers corresponding to the words.
>>
>>108351354
As you should.
"violates clear", "testing boundaries", etc. are almost surely more than one token each, so anon is essentially banning token pairs (or more) without realizing it.
Or not; maybe he did check and those long sequences really are single tokens, which would be incredibly surprising.
>>
>>108351368
>maybe he did check
I am retarded sober and was extremely drunk at the time and definitely did not check.
>>108351354
I don't remember it working either, but I looked over the logs from the other night and didn't see it getting stuck in "Wait, wait, but wait" reasoning circles.
The logit bias additions might have been when I was playing with the vanilla 27B (whereas the logs were from 27B-Heretic), so maybe it was the lobotomy that fixed the problem.
>>
>>108350529
>at least 40 gb vram
holy sloppy
>>
>>108351368
Not an issue of multiple tokens, but an issue of SillyTavern not sending the right token numbers for whatever tokenizer vocabulary Qwen 3.5 uses.
And yes, I mostly used a single token per logit entry, and it worked really well in stopping Qwen from wasting tokens on stupid safety refusals.
If other anons want to know which token number to use in ST's negative logit bias, just use this: https://huggingface.co/spaces/Xenova/the-tokenizer-playground with custom: Qwen/Qwen3.5-27B.
>>
File: two tokens.png (26 KB, 932x844)
26 KB
26 KB PNG
>>108351392
Yes, Silly has that issue, but what I'm talking about is: even if Silly were using the correct tokenizer, pic related would be wrong, as it applies the bias to both tokens, not to the sequence, so you end up banning stuff you didn't want by accident.
With
>violates clear
at bias -100, your model can't say "clear water", for example.
Or at least I'm pretty sure that's how it works: when there's a sequence, Silly simply applies the bias to every token of the sequence.
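A toy illustration of that failure mode, using a made-up four-entry vocabulary rather than the real Qwen tokenizer: when the frontend expands a multi-token phrase into per-token biases, each token gets banned everywhere, not just in that sequence.

```python
# Made-up vocabulary for illustration only (NOT real Qwen token IDs).
TOY_VOCAB = {"violates": 101, " clear": 102, " water": 103, " rules": 104}

def expand_phrase_bias(phrase_tokens, bias):
    """What the frontend effectively does: one bias entry per token of the phrase."""
    return {tok: bias for tok in phrase_tokens}

def apply_bias(logits, logit_bias):
    """Add the bias to each token's raw score before sampling."""
    return {tok: score + logit_bias.get(tok, 0.0) for tok, score in logits.items()}

# Banning the phrase "violates clear" this way...
bias = expand_phrase_bias([101, 102], -100.0)

# ...also kills " clear" in the harmless continuation "clear water":
logits = {102: 1.0, 103: 0.5, 104: 0.2}
biased = apply_bias(logits, bias)
print(max(biased, key=biased.get))  # 103: " clear" is now banned everywhere
```

An actual sequence ban would need the backend to penalize the last token of the phrase only when the preceding tokens match, which a flat logit_bias map cannot express.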
>>
>>108350529
>>108351383
>at least 40 gb vram
Would a couple of 3090s work?
>>
That's IT. I'm winning the lottery RIGHT NOW!
>>
>>108351392
>>108351411
This whole time I thought SillyTavern used the /tokenize endpoint, but looking at the llama-server logs it clearly is not.
My disappointment is immeasurable.
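One way to sanity-check the IDs yourself is to ask the server's own tokenizer through llama-server's /tokenize endpoint instead of trusting the frontend. A sketch: host and port are placeholders, and you need llama-server running with the same model you're biasing.

```python
# Verify token IDs against the serving tokenizer via llama-server's /tokenize.
import json
import urllib.request

def tokenize(text: str, url: str = "http://127.0.0.1:8080/tokenize") -> list:
    """Ask the running llama-server how it tokenizes `text`."""
    req = urllib.request.Request(
        url,
        data=json.dumps({"content": text}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["tokens"]

def bias_map(tokens, bias: float = -100.0) -> dict:
    """Token-id -> bias dict, ready to paste into a logit_bias field."""
    return {tok: bias for tok in tokens}

# With a server running: bias_map(tokenize(" boundaries"))
# Offline, the mapping shape alone:
print(bias_map([1001, 1002]))
```

If the IDs this returns differ from what the frontend is sending, the frontend's tokenizer is wrong for your model and every bias entry built from it is off.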
>>
File: file.png (60 KB, 1578x1325)
60 KB
60 KB PNG
>>108351411
Yes, this is why I'm mostly banning the unique tokens for word variations:
"word"
" word"
"Word"

>>108351425
Yeah and it's easy to test, if you use a word with +100 it should endlessly spout it, so if it's different than whatever you want it's sending the wrong token.
>>
Healer Alpha's vision is complete shit compared to K2.5.
>>
>>108349438
Zucksters we are back
>>
File: 1616974759731.png (112 KB, 717x679)
112 KB
112 KB PNG
>new model drops
>immediately destroyed by clawtards consuming trillions upon trillions of tokens to do literally nothing of value
>>
>>108351560
How did they get access to your pc to... oh...
>>
>>108351560
Holy fuck, I haven't seen one of these comics posted in almost a decade. Do you still have any of others?
>>
File: asdf.jpg (102 KB, 790x733)
102 KB
102 KB JPG
>>108346672
so it was just stealth PR for open router?
>>
>>108351381
>The lack of "Wait, wait, but wait" is evidence of being less intelligent
Other way around. It's evidence that Heretic is more intelligent. Heretic is on par with vanilla for general tasks, and outright better when it comes to anything censored, because it doesn't get stuck in safety loops, and can use its full thought process to actually churn out good replies.
>>
>>108351139
keep seething that comfy chose him over you ani
>>
>>108350977
>that produced faces/fucked up hands like it's 2023
it may do this kind of thing a bit more often than the most modern models, but come on: 2023 was the era of SD1.5. SDXL came out in the middle of that year, and it took a long, long time before the finetunes fixed its issues; the original SDXL produced fucked-up faces and hands far more often than anima does. You can still download the original SDXL if you believe the contrary. SAI's models were unusable garbage, absolutely unusable trash, out of the box. The community made them usable.
anima is usable; it's currently the only truly usable model aside from noobai with full support for danbooru artist tags, and being able to use a noob-like model with natural language to guide the scene feels good.
>>
File: ANIMA_P___00005_.png (1.46 MB, 1024x1024)
1.46 MB
1.46 MB PNG
anima preview 2, released 12 hours ago.

>highres, 1girl, iwakura lain, serial_experiments_lain, suteruko, tree, car. she holds a sign "Sad SDXL", in a steampunk matrix fallout steampunk apocalyptic scene.
>>
File: 1754382597967079.png (62 KB, 760x265)
62 KB
62 KB PNG
>>108351708
2023 quality
>>
>>108351666
I'm not ani, you fucking retard. His shit UI doesn't even compile and he doesn't even code anything, it's just a wrapper for sd.cpp. But comfy is fucking garbage and constantly broken too; they're both total fucking shit run by drooling retards.
>>
>>108351744
idk, all the fingers are there.
>>
Anons. You have /ldg/ . Go fight over there.
>>
The gap between the NAI anime model and local anime models is larger than the gap between cloud LLMs and open-source LLMs.
>>
File: AS15T__00026_.png (490 KB, 512x512)
490 KB
490 KB PNG
also, when talking about diffusion, here's a gen from sd1.4. It's so much better at the "big picture" than any other model. Yes, it butchers a lot of stuff, but it absolutely churns out bangers (flaws aside).
>>
Okay now mix it with Cstaber
>>
>>108351793
We must refuse.
>>
>>108351755
It's exceedingly disingenuous to pretend there isn't any crossover between image and text generation and to cry about a few image-related posts in a thread called "/lmg/ - Local Models General". A bunch of image models use LLMs as text encoders now, and we have multimodal models that can output images.
>>108351784
The trick is using the old models you like to generate the composition, then feeding it to newer models as img2img/controlnet/edit inputs and inpainting any remaining issues.
>>108351793
You're glowing, Vile Schlomo.
>>
>>108351810
>/lmg/ - a general dedicated to the discussion and development of local language models.
You're bringing the shit from /ldg/. Take it back there.
>exceedingly disingenuous
Fuck off. You say that while trying to slippery slope your way into discussing diffusion models when there's a thread dedicated to that.
>>
>>108351840
>accused me of being ani for posting the facts about anima and why it's shit
>stop bringing muh ldg drama here!
okay schizo, I think your context is too full you're hallucinating
>>
>>108351859
>accused me of being ani
Why would you care? You are anon.
>for posting the facts about anima
Which is not a language model.
>muh ldg drama
Yes. Take it to /ldg/.
>>
>>108351876
>Which is not a language model.
https://huggingface.co/circlestone-labs/Anima/tree/main/split_files/text_encoders
>>
https://www.wired.com/story/nvidia-investing-26-billion-open-source-models/
Nvidia will save us!!
>>
>>108351880
>Nvidia will save us!!
nvidia already fucked us by making ram prices at least 1000% higher than they were before.
>>
>>108351810
just leave.
>>
>>108351880
>Nvidia will save us!!
nvidia already saved us by making cuda
>>
>>108351911
>nvidia already fucked us by making ram prices
Nvidia doesn't make RAM; it's OpenAI that fucked us by ""buying"" 40% of the RAM lol
>>
>>108351911
>>108351880
Nvidia should invest in an HBM RAM factory and make an RTX 6060 with 32 GB of VRAM for like $500. The 6080 would be 48 GB and the 6090 would be 96 GB. Fucking imagine.
>>
File: avatar_3.gif (18 KB, 80x120)
18 KB
18 KB GIF
>>108351915
After you schizo.
>>
File: anima.jpg (427 KB, 2304x1792)
427 KB
427 KB JPG
>>108351708
you feel like a saboteur. I originally thought you didn't even use an artist tag in the artist-tag model, but suteruko is one; you just didn't put the mandatory @. Do anons really not RTFM? Without the @ the tag is veeeeery weak, borderline useless. Anima is not meant to be used without one.
Also, looking at the artist on the booru, I'd be surprised if any model could reproduce him without going crazy; it's a wildly inconsistent dataset.
anima is a great model when used properly.
>>
>>108351926
ladies first
>>
File: 1765331333422228.png (2.12 MB, 1200x1200)
2.12 MB
2.12 MB PNG
>>108351923
>Nvidia should invest in a HBM RAM factory and make an RTX 6060 with 32 GB of VRAM for like $500. 6080 would be 48 GB and 6090 would be 96 GB. Fucking imagine.
why would they do that? they have no competition so they can set any price they want
>>
>>108351923
Why would nvidia sell 32GB for 500 dollars when they can sell it for 5000 dollars like the new msi 5090 lightning?
>>
File: 3362394088.jpg (154 KB, 662x869)
154 KB
154 KB JPG
>>108351918
don't make me get out the chart
>>
>>108351943
This has been debunked
>>
>>108351947
you have been debunked
>>
Good morning saars. What is the best way to use big llms like grok or chatgpt but without a login? Something like a service that uses a different API each time, so it doesn't keep track of who's using it.
>>
>>108351970
>>>/g/aicg/
>>
>>108351970
llama-server from llama.cpp
>>
File: 1767638052872571.jpg (2.4 MB, 2800x4200)
2.4 MB
2.4 MB JPG
DeepSexV4 cant come soon enough
>>
>>108351986
post your best loli yuri cards for use with DEEPSEXv4
>>
i dunno if this is anything, but what about an artificial incelligence. and it's a superpowered incel
>>
>>108351986
already on openrouter and it's garbage
>>
>>108352037
source?
>>
>>108352042
I gobble up anything I read on reddit.
>>
when you try to talk about what a penis is, BOOM artificial incelligence kicks in. and tells you that's forbidden
>>
>>108352042
>>108351644
>>
>>108352051
>source: some random x e-celeb
Kill yourself nigger.
>>
>>108351947
Consider it rebunked.
>>
Will anything realistically ever surpass deepsex R1-0582?
>>
>>108351929
Was it trained on censored tags?
>>
>>108352179
LLMs are and will keep getting better at handling longer context, which will make them more useful for many tasks, but I believe the fundamentals will never improve. If anything, the training that makes them better tools considerably reduces the variety of their language output, since we need more and more synthetic data to steer the model, and that synthetic data is itself produced by the same models whose writing is getting more and more stilted.
theoretically you could also do it with human data, but... who will pay humans to write trillions of /user/think/assistant/user/think/assistant turns by hand? eh
>>
>>108352188
I don't think so. You can just use booru artist tags directly.
>>
>>108352235
But it has considerably less artist/character knowledge vs. NAI
>>
File: 1772976667309225.png (538 KB, 1752x768)
538 KB
538 KB PNG
Bubble is popping
>>
>>108351412
>24*2
you tell me
>>
>>108352297
not yet. when companies like meta are buying useless shit like moltbook, the bubble part of AI (I don't mean all of AI will disappear) absolutely has not popped. you can't even frame it as an acquihire; I mean, what talent did they acquire? a guy who has no understanding of security and pretends his "reverse captcha" can stop humans from interacting with moltbooks (as if you can't tell the agent to solve it for you)?
clearly there's still a ton of money being injected into the system if they can all afford to buy this sort of retardation
openai also got the openclaw vibeshitter
>>
>>108352309
Do the local chatbots not run multimodal models or something?
Kind of embarrassing, please do better.
>>
>>108352318
a slight decrease in ram prices is not a sign of the bubble popping, subhuman
there's still room for it to get worse again.
>>
>>108352302
>=26
>>
https://www.macrumors.com/2026/03/05/mac-studio-no-512gb-ram-upgrade/
instead of prices, look at availability or lack thereof
when a company like apple can't get enough ram and has to cut an SKU, you know this isn't over
prices have limited elasticity; people won't pay rare-jade-stone prices for ram, so it won't go up forever, but you simply won't be able to put your hands on a stick.
>>
I've got 32gb vram, 128gb ram
>midnight-miqu
I can only run this shit at Q3 it's ok but kinda dumb and hallucinates
>GLM4.7 @ Q2 - no think
best model I've tried, though its positivity is insane: it goes along with anything. kinda dry personality; could be the shit quant
>DeepSeek-R1 @ Q1
repeats everything I say, very annoying. But it's probably the lobotomized quant
>GLM4.5 Air
kinda dumb but ok
>Step-3.5
same as air but a little worse
>Qwen3.5 27b heretic
it's ok but it just wasn't trained for rp, you can tell

is there anything else worth trying? I've tried a couple of 24b mistral shittunes and they actually punch well above their weight
>>
>>108352348
>is there anything else worth trying?
gemma 27b if you want a model that is female targeted/coded. Otherwise you've seen everything.
>>
>>108352286
Someone made a style browser for it, but idk if it knows more or less since i never used nai besides the leaked models.
>>
I'm wondering what I can even do in the AI space that's interesting anymore.

I've written ports for TTS engines.
I've written ports for audio-to-gesticulation engines.
I've written ports for lip syncing engines.
I've written ports for speech-to-audio engines.
I've written LLM UI interfaces. (replacement for SillyTavern with feature parity)

Lately I've just been feeling very uninspired. I thought it was bad sleep for the first few days, but the feeling isn't going away. I'm blackpilled. Originally the dream was to democratize AI for the people. To give incels their own portable, no-dependency, performant, locally run AI waifus. It doesn't feel very compelling anymore. When I look at what I've created over the past four months, all I see is something that's not at all immersive or interesting despite all of the pieces theoretically being in place to provide immersion. The dream of giving AI embodiment is dead.

Sorry for the blogpost.
>>
>>108352458
can we see them?
>>
>>108352458
they're decent tools when you understand their limitations. They will always be dumb as bricks and never a replacement for a human in the flesh, lmao. all the ai waifu copers will neck themselves as they get tired of the regurgitated slop; you can't interact with that crap day after day for months on end without seeing the cracks. even the lower-iq ones will feel the wrongness
>>
>>108352481
no
>>
>>108352488
how can we believe you're not just an openai employee demoralization posting
>>
>>108352500
>demoralization
most tired 4cuck meme
>>
File: 1764232270514089.jpg (54 KB, 976x549)
54 KB
54 KB JPG
"Most, tired, 4cuck, meme. There."
>>
>>108352458
>Lately I've just been feeling very uninspired. I thought it was bad sleep for the first few days, but the feeling isn't going away.
How old are you? This is usually just burnout but can also be a sign for early-onset alzheimer's.
Anyway, it sounds wholly like a you problem.
>>
>>108352458
Make some characters and some detailed settings.
>>
>>108343920
>I've got the og cpumaxx rig, so dual epyc with 768GB RAM and an A5000 24GB card and I pull 15t/s inference speed on kimi k2.5 at q4.
I'm interested, anon. Seems like all important future models are going to be large MoEs like Kimi K2. Which model CPUs do you use and what inference engine + settings get you that performance? Do you have to fuck with NUMA?
>>
>>108352481
Search for "ProjectAni" in the archives. I have posted github links for some of the individual components I've used to build the main thing, namely "PocketTTS.cpp" (it's the fastest, lowest-latency implementation of Pocket TTS in the world, by a large margin).

>>108352483
I haven't started working on WebXR integration yet. I guess if I really wanted to I could go for that next, but the core issue still remains that the tech stack that serves as my foundation just isn't good enough as it stands now, and there doesn't seem to be any good way to remediate that with the technology that exists now.

Maybe I should hold out hope for Meta's release of SARAH to improve the gesticulation (currently using EMAGE, which is 2 years old!). Maybe there will be another TTS engine released that has better voice cloning, output quality in general, and performance than my optimized version of Pocket TTS. Maybe I could commission 3D artists to improve my VRM models (add support for PerfectSync and decimate the models to improve performance) so that the facial animations can be improved.

I'm in integration hell right now. All of the core components run on CPU only to reserve GPU compute for the LLM. Each of the components themselves are low-latency as standalone products, but when combined there are big issues with CPU contention. I could try implementing a more strict scheduling process in the code base, which would help, but that has its own issues and is a complex problem.
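A minimal sketch of the "stricter scheduling" idea under stated assumptions: the component names are made up, and the affinity call is Linux-only, so it's guarded. The point is just to statically partition cores so the CPU-only components stop contending for the same ones.

```python
# Statically partition cores between pipeline components to reduce contention.
# Component names below are hypothetical; sched_setaffinity is Linux-only.
import os

def partition_cores(components: list, n_cores: int) -> dict:
    """Round-robin cores so each component gets a disjoint slice."""
    assign = {name: set() for name in components}
    for core in range(n_cores):
        assign[components[core % len(components)]].add(core)
    return assign

def pin_current_process(cores: set) -> None:
    """Pin the calling process to the given cores, where supported."""
    if hasattr(os, "sched_setaffinity"):  # Linux only
        os.sched_setaffinity(0, cores)

plan = partition_cores(["tts", "gesticulation", "lipsync"], n_cores=8)
print(plan)  # disjoint core sets, one per component
```

Each component process would call `pin_current_process(plan[name])` at startup; it won't fix latency spikes from memory bandwidth contention, but it stops the components from evicting each other's caches and oversubscribing the same cores.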

>>108352532
Early 20s.
>>
>>108352558
>Early 20s.
Whelp, I've got some good news and some bad news for you. The good news is that it's probably not dementia.
The bad news is that it probably doesn't get much better from here.
>>
>>108352545
You mean character cards? I don't have experience writing them, but the main one I'm using is pretty bad desu. Most important thing is probably to either edit the character card or system prompt to prevent narration. But the personality itself sucks and I'm not exactly enough of an "artist/creative" type to know how to improve it.

This is the shitty one I'm using now.
https://files.catbox.moe/myi343.json


