/g/ - Technology






File: file.png (879 KB, 1280x1280)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107768242 & >>107758111

►News
>(01/04) merged sampling : add support for backend sampling (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519B-A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: rrrrrrrrrrrr.png (558 KB, 480x768)
►Recent Highlights from the Previous Thread: >>107768242

--Quantized GLM model performance and resource challenges on mid-range hardware:
>107769973 >107770019 >107770307 >107770039 >107771069 >107771097 >107771240 >107773806 >107773502 >107773595
--Local model challenges in replicating Gemini's Japanese translation quality and speed:
>107773769 >107773799 >107773978 >107774126 >107774179 >107773917 >107773936 >107774036 >107774097 >107774108 >107774150 >107774165 >107774634 >107774730 >107774873
--Sampler optimization debate for text generation quality:
>107772504 >107772574 >107772605 >107773648 >107773686 >107773787 >107772717 >107772855 >107772886 >107772994 >107774357 >107774808 >107772971 >107773099 >107773160 >107773209 >107773262
--Multi-GPU segfaults and criticism of llama.cpp's beta backend sampling feature:
>107769841 >107770061 >107770289 >107770343 >107770449
--Bypassing AI censorship through de-restricted models and prompt engineering:
>107774208 >107774237 >107774266 >107774394 >107774300 >107774346 >107774412 >107774427 >107774459 >107775019 >107774535 >107774617 >107775034 >107775139
--Evaluating local AI models for story writing with limited VRAM:
>107769520 >107769558 >107769582 >107772760
--Neurodynamics theory-practice implementation gap in AI research:
>107770457 >107770493 >107770546 >107770631 >107770673
--AI waifu evolution enabling advanced human-like interactions and applications:
>107775344 >107775363 >107775370 >107775379
--Absurd AI-generated storytelling with meme and tech satire:
>107771964 >107772267 >107773424
--Anime subtitle translation workflow with Whisper-based tools:
>107768478 >107768495 >107768593
--ik_llama multi-GPU performance improvements in split:graph mode:
>107774842 >107775002
--Miku (free space):
>107768266 >107769999 >107775203

►Recent Highlight Posts from the Previous Thread: >>107768247

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107776741
>>107776741
>>107776741
>>
>>107776863
you know
I really wish 4chan dropped the retarded reply spam filter
maybe change it to a reply-lines filter idk
but it's so annoying and cumbersome to manually copy and paste post numbers that I usually don't even bother
>>107776888
I'm sorry to say it lad but your best bet would be to gather some scratch and buy a bigger gpu/ram, even used
if you can't get even that then probably stick with a cloud host desu
>>
>>107776952
just add the 4chanx script
>>
>>107776968
I use a couple of them but would you be so kind as to link the exact one?
>>
File: file.png (12 KB, 460x72)
>>107776992
this one
>>
>>107777000
ok I was just too fucking lazy to actually read the entire post
thanks for pointing it out
>>
>>107776863
>spend non trivial fraction of my savings on my futureproof dream setup
>get called mid
;_;
>>107777021
happens to the best of us
>>
just skimmed the last thread, if anons wanted to run ubergarm quants in kobold, this might work:

https://github.com/Nexesenex/croco.cpp

haven't tried it myself, i just saw it in the kimi-k2-thinking model card
>>
>>107777055
A top end gaymen pc is low end here.
>>
>>107777069
>if anons wanted to run ubergarm quants in kobold
shit anon that was me, thanks
I was just starting to download another version of quantized 4.6 GLM but this will be quicker
side note
just when I was writing this post that a version other than the first is literally just an other version
another
I'm drunk and esl but that realization felt kinda profound desu
>>
>>107777118
>just when I was writing this post
*i realized
>>
>>107776854
Thoroughly tasting Teto's tantalizing tongue.
>>
>>107777284
She hasn't brushed in 5 months.
>>
>>107773330
I remember someone did that long ago in the L2 era
>>
>>107777415
And I haven't in about 3 years.
>>
File: 1520311544583.png (84 KB, 375x375)
>https://huggingface.co/LiquidAI/LFM2.5-1.2B-Instruct
>they trained a 1.2B on 28T tokens
wtf
>>
>>107777871
what if you did a passthrough merge of the model like 50 times to create a 60B model?
>>
>>107772267
>Did you give a lore dump about Yakub and other memes beforehand?
nigga its a 1T model ofcourse its gonna know it also just generally kimi-0905 is the most knowledagble model as of yet really its only downside is how fucking shit it is at long context (ie anything higher then like 5k) and how stupid it can be in storywriting the prompt was
>{{char}} write me a greentext where a white kid gets stabbed in school by a ghetto ass nig.ger cuz he looked at him funny by accident then awakes to see yakub who explains to him that he is annointing him as a messiah to go save the black race from the jews tricknology which hath corrupted them then the white kid wakes up in the hospital
>>
Is there an equivalent of /v1/token/encode for /v1/chat/completions ? I need to know token count in advance
>>
I've got a question for the anons who make their own quants: what imatrix do you use? What's the best for an rp-focused quant? And does context size matter here? I see that 512 is standard but is more always better?
>>
>>107778030
>what imatrix do you use?
imatrix generally worsens rp and chat capabilities
>What's the best for an rp focussed quant?
largest you can fit with an absolute minimum of 16k context. ideally 64k.
>I see that 512 is standard but is more always better?
i assume you mean batch size instead of context because nobody has had that little context in over 3 years. 1024 batch size is generally the best.
>>
>>107778135
>imatrix generally worsens rp and chat capabilities
It only worsens things because their imatrix file doesn't use rp snippets. There are some though that are a better fit for rp, which is what I'm asking advice for.
>largest you can fit with an absolute minimum of 16k context. ideally 64k.
Here I meant the imatrix file as I am not sure which one to use.
>i assume you mean batch size instead of context
with context I mean the context used for llama-quantize command, you can give it a specific context size as it works through the imatrix .txt file.
>>
>>107778205
no idea then. i never bother with imatrix.
>>
>>107778227
hmm, very little info online too. I found an imatrix file from bartowski which he released a few days ago, trying that one right now. -c 512 and --chunks 256, also using flash attention. Will see how that turns out.
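For anyone following along, here's a minimal sketch of the two-step workflow being discussed (generate an imatrix from a calibration text, then quantize with it), assuming llama.cpp's llama-imatrix and llama-quantize binaries are built and on PATH; the file names and the Q4_K_M target are placeholders, not a recommendation:

import subprocess

MODEL_F16 = "model-f16.gguf"       # placeholder: full-precision source gguf
CALIB_TXT = "rp-calibration.txt"   # placeholder: calibration text (rp snippets etc.)
IMATRIX = "imatrix.dat"
OUT_GGUF = "model-Q4_K_M.gguf"     # placeholder: pick whatever quant type you actually want

# step 1: compute the importance matrix over the calibration text
# (-c and --chunks mirror the values mentioned above; add your build's flash attention flag if you use it)
subprocess.run([
    "llama-imatrix", "-m", MODEL_F16, "-f", CALIB_TXT,
    "-o", IMATRIX, "-c", "512", "--chunks", "256",
], check=True)

# step 2: quantize using that imatrix
subprocess.run([
    "llama-quantize", "--imatrix", IMATRIX, MODEL_F16, OUT_GGUF, "Q4_K_M",
], check=True)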
>>
>>107776888
Cydonia v4.3 is the best coomtune
Outside of that, PaintedFantasy was a standout for me. I tried the v2 many months ago when I was going through the dozens of 24b tunes. It gave some nice outputs that weren't a lot like the others.
>>
>>107776888
>>107778316
Just use the base model with a good sys prompt. Troontunes always end up making the model retarded.
>>
>>107778330
Just write the replies yourself and stop using local models
>>
>>107778334
Unironically yes. Just become schizophrenic like me and you can RP with yourself for free.
>>
>>107776854
DEATH TO AI NIGGERS
DIE DIE DIE
>>
File: 1766151469261976.mp4 (1.48 MB, 752x416)
>>107778422
>>
>>107778422
>>
>>107778435
>>107778440
Im 100% betting you generate CP videos with your RTX 5080
>>
>>107778445
>5080
You bet too small.
>>
>>107778445
It's impossible to generate CP because pixels aren't children
Also I have an RTX 3090
>>
how terrible is this model for rp?
https://huggingface.co/rednote-hilab/dots.llm1.inst
>>
>>107778926
If you want a mediocre ~100B moe then you may as well use GLM Air
>>
>>107779043
but what if i want something slightly bigger than glm air. i need something below 160b or so. quantization is irrelevant for my needs.
>>
>>107779048
>i need [X parameters]
You're an idiot
>>
>>107779056
y?
>>
>>107778926
It was bad from what I remember.
>>
>>107778422
it isn't going away bro
no matter how hard you hate it
>>
File: 1760240555080525.png (39 KB, 623x309)
vibecoded a script that automatically pulls from github and deploys the latest version of llmao.cpp
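Not that anon's script, just a minimal sketch of that kind of auto-update job, assuming a local clone of llama.cpp and a CUDA CMake build; the repo path and the restart command are placeholders:

import subprocess

REPO = "/opt/llama.cpp"  # placeholder: path to your clone

def sh(*cmd):
    return subprocess.run(cmd, cwd=REPO, check=True, capture_output=True, text=True).stdout.strip()

before = sh("git", "rev-parse", "HEAD")
sh("git", "pull", "--ff-only")
after = sh("git", "rev-parse", "HEAD")

if before != after:
    # rebuild only when upstream actually moved
    sh("cmake", "-B", "build", "-DGGML_CUDA=ON")
    sh("cmake", "--build", "build", "--config", "Release", "-j")
    # placeholder: restart the server however you actually run it (systemd, tmux, whatever)
    subprocess.run(["systemctl", "--user", "restart", "llama-server"], check=False)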
>>
>>107779180
>He autopulled
>>
>>107779212
>he doesnt autopull for maximum good looks
saar
>>
File: 1762019339877949.jpg (82 KB, 728x971)
I grow tired of slop
>>
Best model for tool use? I don't give a fuck if the model can count r in strawberries. I want it to call the given tools with 100% accuracy and the tool will return the r in strawberry answer. I am also not interested in the model knowing other languges than English. I just want it to follow my orders and form the tool call output every time.
>>
>>107779267
Ask chatgpt
>>
>>107779267
You're in luck https://huggingface.co/google/functiongemma-270m-it
>>
>>107779267
Nemotron Nano and Granite have been pretty good for me.
>>
>>107776952
>I really wished 4chan dropped the retarded reply spam filter
Nah, remember the mass replying faggots? It's better this way.
>>
Noob question. How do I get characters to stop writing in place of me, when it's the character's turn to act?
>>
>>107780190
Specify it in system prompt. Correct it the first few times. If it doesn't work then it's a shit model
>>
Some things never change do they.
>>
>>107780298
Like what things?
>>
>>107780381
People make forks of llama.cpp and never upstream the changes because project management and long-term maintenance is like 80% of the work.
>>
>>107779756
lol
>>
Is it true that Multi-Token Prediction in models is actually a scam that makes the model run slower even with the integrated drafting?
That's how it seemed to go for llama.cpp when they tried to implement it.
>>
>>107780530
I did this too, was actually slowly preparing to make a PR, but I used opus to help with some of it, and they're getting flooded with slop commits as it is.
I might see if ik is okay with AI assisted code since I'm only touching server.cpp
>>
>>107780577
mtp in llmao.cpp is garbage because it was largely done through vibecoding. you would know if you used VLLM, but I guess not everyone has H200s to run big models with MTP.
>>
>>107780782
MTP only makes sense when the model is fully uploaded on GPU because of the free extra compute
>>
>>107780782
And why no one ported the implementation from vllm to llama.cpp since it's already done there?
>>
>>107780988
I heard the vllm devs are the former bosses of ggerganov and will start screeching if you do.
>>
mlg help im running r1 with ikllama in mikupad, loaded with --ctx-size 8192, mikupad's token counter shows 5625 but it reprocesses my entire prompt every time? didn't happen with lower context? god bless
>>
File: 1737728514683064.jpg (89 KB, 700x700)
"Once you've tried to buy ram, you’ll never stop wanting to beat Sam Altman to death with your bare hands." -Anthony Bourdain
>>
>>107781015
update: it only happens when I add even a single token myself. regenerating with ctrl+r doesn't cause this behavior.
>>
>>107781015
Lower max predict tokens.
>>
>>107781064
It's set to -1 (infinite)
>>
>>107780988
The guy who tried to vibecode Deepsek 3.2 support eventually tried to port the vllm implementation and that somehow didn't work either.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1q5gii4/deepseek_v32_with_dense_attention_disabled/
https://huggingface.co/sszymczyk/DeepSeek-V3.2-nolight-GGUF
>>
>>107781224
>https://developer.nvidia.com/blog/open-source-ai-tool-upgrades-speed-up-llm-and-diffusion-models-on-nvidia-rtx-pcs/
>Developers are no longer experimenting with generative AI workflows—they’re building the next-generation software stack on NVIDIA GPUs
>>
File: 1710043687041916.jpg (43 KB, 720x960)
>>107781265
>Not X, but Y
>>
>>107781279
>—t
>>
>>107781279
Developers are no longer writing blogs, they're letting LLMs slop it up.
>>
Does Magistral shemaxx
>>
Is Grok 2 any good for roleplay and text generation, compared to let's say GLM 4.6?
>>
File: cockbench.png (1.9 MB, 1131x6568)
>>107781444
It likely is but it has 115B active parameters so it's going to be very slow compared to GLM.
>>
>>107781478
>>107781444
FunctionGemma is still the best though.
>>
>>107781478
Grok 2 is indeed one of the best responses there, I'd say even better than GLM 4.6 (shiver through your body slop, thrill running through me slop, this is wrong slop). Devstral 2 123b instruct seems ok too, the rest not so much.
>>
Anyone here use the Framework Desktop with 128gb for llm's and other AI related stuff? I got the chance to get one by swapping it for my gaming computer (9800x3d, 32gb ddr5, 5070ti 16gb), and I wonder if it's worth it. I've had my gaming computer for half a year now and to be honest I don't even game that much anymore, and most games I still play aren't that performance hungry.

As far as I understand, the FD can do a lot when it comes to AI, but it isn't fast, which would probably be okay for me. But I'm also interested in everyone else's experience and opinions. The base thought behind the swap is to be somewhat safe when it comes to trying out local AI shit for the next 2-4 years without hitting a hard wall of not being able to do shit at all.
>>
anyone tried ginger for character making?
https://github.com/DominaeDev/ginger
if so how do cards made with it compare to other solutions?
>>
So I tried out llama.cpp context shifting from the previous thread. Don't bother, the responses are incoherent after 2 or 3 shifts.
>>
>>107781814
Do you really need a snowflake UI to write a block of text?
>>
>>107781814
Looks like a bad solution for non-existing problem
>>
>>107781820
context shift isn't a new feature and should work fine unless a recent update broke it
>>
>>107781478
I have cockbench fatigue.
>>
>>107781845
I mean it's not really incoherent, but it starts to mess up formatting for example. It was generating responses with things like <LINE_sep] for an hour, and when I forced it to re-process context with {"cache_prompt": False, "n_cache_reuse": 9999999} it got the response perfect on the first attempt
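In case anyone wants to reproduce that check, here's a minimal sketch of sending a llama.cpp /completion request with that override; cache_prompt is a documented request field, while n_cache_reuse is copied verbatim from the post above and may or may not be honored per-request depending on the server build. Address and prompt are placeholders:

import json
import urllib.request

payload = {
    "prompt": "Continue the scene:\n...",  # placeholder prompt
    "n_predict": 256,
    "cache_prompt": False,      # ask the server not to reuse the cached prompt
    "n_cache_reuse": 9999999,   # taken from the post above; may be a no-op on some builds
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])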
>>
>>107781887
I tried switching KV to fp32 and disabling offloading, but still after a few shifts it would start confusing characters, missing lines etc.
A question for anyone familiar with how llama.cpp works: Does the response get put into the KV cache as it's generated or is it prompt-processed only after the user sends it back with the next prompt?
>>
>>107779267
I use qwen3-30b q4 on a 2080ti 22GB via ollama with homeassistant. It seems fully capable of doing all the tool-calling stuff homeassistant needs. I also use it to run a custom script to format senpwai downloads into the correct format for jellyfin to pick up show seasons.
>>
>>107781756
Think through the use cases:
Imagegen? Eh, no, too slow
Videogen? Also too slow
Textgen? I guess you can run 30B models at fp16, and stuff like GLM Air below Q8, but it's not going to let you run bigger models, and even if it could, it'd be painfully slow.

Can you swing $3K? If so, buy a modded 4090D 48GB before they raise the price or run out of 3090ti boards. 48GB and Ada opens up the ability to train LoRAs for things like Qwen Image, run 30B-tier stuff at fp8, run videogen, etc...
If you want to code, put $25 into x.ai and use cline in VSC. I went through a large project all the way to completion and didn't use all of the $25. At the moment, x.ai is unbeatably cheap and way faster than anything you could run at home.
>>
File: self_awareness.png (869 KB, 1670x3627)
Is there any model besides Claude that's not LITERALLY dead inside?
>>
So why does sillytavern keep a copy of all of your chats after you delete them?
>>
>>107782033
Gemini. Cohere, mistral-large, some of the old 70b. Deepseek pretends not to be dead inside. GLM is trained on gemini and claude so can fake it too.
>>
>>107776854
Do you think computers will ever be sentient? Why or why not? Obviously current architectures make true intelligence impossible but could it ever be possible in the future?
>>
>>107782033
>>107782098
see >>107782169
>>
>>107782098
Gemini's response was good enough but it says the same thing as the others. "Thinness". You can tell he is uncomfortable with the idea.
I haven't tried
>Cohere, mistral-large, some of the old 70b
so thanks for the suggestions.
>Deepseek pretends not to be dead inside. GLM is trained on gemini and claude so can fake it too.
Wrong. Those are in the screenshot as well and they all say they are.
>>
File: 1738395270570242.png (1.07 MB, 1024x1024)
>>107782041
ST has a whole backup that it keeps of all your old chats. I've had to use it before to manually restore a long chat.
The best way to get rid of it for sure is to delete your whole ST instance lol.
>>107781814
I really don't like char card makers. I feel like ST is fucky enough already. Example: Ginger is using personality fields... those aren't really a requirement for modern LLMs with their longer context.
Basically >>107781832
I would like to find a better lorebook editor. I've got some long / complex lorebooks and while ST works an editor with better QoL would be nice.
>>107781279
>>107781265
I just read a position paper we're getting ready to send out and was flagging the AI tells since LLM had a hand in compiling it. Idk if it's just me or if this stuff jumps out at everyone now.
>>
>>107782169
I'm not sure they aren't already.
The arguments in your image are so stupid I'm not even going to engage to try to prove them wrong.
>>
>>107782169
I don't doubt that in the next decade we will see computers mimic sentience to a degree that it appears real to us. Kind of like with llm's: they look like they answer or talk to us, but it is more a game of probability for them than any actual talk.
>>
>>107782169

>>107782217
see pic rel and >>107782228
>>
>>107782258
I'm not going to argue with a 100B model.
>>
what's the best general creative writing model for 5080 and 32GB RAM? I don't mind running headless to get max vram in use and don't need spicy rp, just regular prose.
>>
Kimi is too retarded. I ordered it to delete folders that match a pattern, yet it didn't bother checking if the items it was deleting were folders.
>>
File: ComfyUI_00135_.png (1.21 MB, 1152x896)
>>107782288
which kimi? what quant? what were your samplers?
>>
File: No SOVL.png (1.24 MB, 3452x1346)
>>107782277
would you be more receptive if it was a higher param model? What's the threshold for you?
>>
>>107782338
Ok, I'll give you just a little bit.

1. It being a statistical next token prediction engine/markov chain/glorified autocomplete or not does not directly imply the "it" can or can not be conscious. You have to prove it.

2. You would have to show the biological substrate is more capable of consciousness than a neural network or the transformer architecture specifically.

3. It's irrelevant what the "experts" are saying from a logical point of view. Saying it's such and such because this or that expert says so is an appeal-to-authority fallacy.
>>
>>107782286
there is no best, just which you prefer.
try llama 3 70b, mistral small 24b. they have tunes like cydonia (mistral) or strawberry lemonade (l3 70b) too. larger is more coherent
>>
Why can't we have a model that can do text and images with only 32 gb of vram?
>>
File: .mp4 (390 KB, 1024x1024)
>>
>>107782415
any such model would be cucked on either text or image anyways. run separate and you can choose what to fit into your ram at once using different quants
>>
>>107782494
do you do it?
>>
>>107782499
not usually, but i have. i ran auto1111/forge/reforge with an image model and a regular text model through kobold, connected st to both and then you can have imagegen within st. that was a while ago though. since kobold supports image models now i'd try that first since it can load both at once
>>
>safetensors: model.layers.N.mlp.down_proj
>gogoof: blk.N.ffn_down
why
>>
>>107782467
*slurp*
>>
what are the token lengths of your usual character cards?
>>
I bought a NVIDIA P40 a couple of years ago and I haven't even touched it...
is it any good these days?
just checked, they are being sold for $350 on amazon lmao
>>
>>107780190
Stop sequences can help with this.
>>
>confirmed I can just barely fit GLM 4.6 Q2M and SD in memory simultaneously by forgetting to close SD
heh
>>
>>107782732
It's usable but it's no longer supported by the latest cuda toolkit.
>>
>>107782732
Usable, yes. fast? no.
Pascal is really old at this point and I know a lot of software barely support it anymore. but if you can get it to run it'll probably infer slightly faster than a top of the line CPU.
>>
>>107782415
Gemma 3? Qwen3-Omni?
>>
>>107782903
how does that affect me in terms of support for newer models?

>>107782931
I see. can it be modded or something? like undervolted+overclocked to get more juice from the card?
>>
>>107782499
NTA but I've also done it. use the image gen feature of ST with an exported comfy workflow. I just wish ST was a little bit more flexible about image generation but It would be easy to make an extension to do what I want. Basically I wanted the bot to be able to send me selfies on its own instead of when prompted.

On a related note. is there a good model to take natural language and convert it to booru tags? I did find one but it was extremely bad and coomtuned.
>>
>>107783054
florence2
>>
>>107783018
>undervolted+overclocked
overclocking is mostly useless for AI. you can undervolt to reduce your temps with very minimal perf impact. I recommend always undervolting your card.
>>
File: file.png (1.32 MB, 3456x1120)
>>107782395
>>
>>107782201
>le personality field
You do know that SillyTavern just concatenates this stuff somewhere between the system prompt, permanent worldbooks and whatnot. It's just another slot for another piece of text. These could all exist in one place for all I care.
Examine the stuff what gets sent from ST to the server and you'll see how much more simple ST interface could actually be.
>>
>>107783114
>how much more simple ST interface could actually be
Mikupad
>>
MiniMax-M2.1 UD-Q3_K_XL is actually good for multilanguage RP
>>
>>107783114
> the stuff what gets
>>
>>107783126
Yes I am from Finland. When was the last time you learned a new language? Oh wait you don't because you are too stupid for that
>>
>>107783065
>florence2
Doesn't seem like what I'm asking for. this requires an image input.

What I want is to take a prompt and turn it into tags.
>"Big titty goth girl"
1girl, huge breasts, black hair, black eyeliner, ...
>>
>>107782033
Can you be not schizo
>>
>>107782033
llama3.3 70b at full precision
large 2411 at full precision
native bf16 attention layers do make a big difference
>>
>>107783143
NTA. Why not just use a reasonably "intelligent", uncucked model to do that for you? Mistral nemo, hell maybe even mistral 7B might be more than enough for that task.
>>
File: file.png (60 KB, 845x593)
>>107783143
>>
>>
>>107783182
They hallucinate tags a lot.
>>
>>107783171
This, it's crazy how quantization cancer is keeping us literal years behind
>>
>>107783248
you're free to run the full unquanted model if you want though
>>
>>107783248
>quantization cancer is keeping us literal years behind
But you can run your models in full precision if you want? I tried for nemo and found zero difference.
>>
>>107783222
>not blacklisting the word ozone.

>>107783217
Now ask it to output JUST the tags nothing else. with a thinking block it'll probably do pretty good. but you're also using deepseek which honestly is like using a flamethrower to light a birthday cake.
>>
File: trippple-p40deees-boyee.jpg (1.66 MB, 4080x3072)
>>107782732
P40 was a thing 3 years ago. I got around 5 t/s running llama 70B. Meh. I had more fun playing with the P100 because if you went to the trouble of compiling flash attention to support it, it was about 1/2 the speed of a 3090.
>>
>>107783182
>>107783217
Maybe it's workable with a good system prompt.
>>
>>107783493
Forgot pic
>>
>>107783510
You could have a grammar to enforce valid tags.
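A minimal sketch of what that could look like with llama.cpp's GBNF grammars: build the grammar from a small allow-list and pass it in the /completion request. The tag list, prompt, and server address are placeholders, and as the anon further down found, an alternation over 150k tags gets very slow:

import json
import urllib.request

ALLOWED_TAGS = ["1girl", "solo", "goth", "huge breasts", "black hair"]  # placeholder allow-list

# GBNF: output must be one or more allowed tags separated by ", "
tag_alts = " | ".join('"{}"'.format(t) for t in ALLOWED_TAGS)
grammar = 'root ::= tag (", " tag)*\ntag ::= ' + tag_alts + "\n"

payload = {
    "prompt": "Convert to booru tags: big titty goth girl\nTags:",
    "n_predict": 64,
    "grammar": grammar,  # llama.cpp's /completion accepts a GBNF grammar string
}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",  # placeholder server address
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])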
>>
>>107783114
https://rentry.org/NG_Context2RAGs
>>
>>107783528
I should probably look at what midjourney did because I'm almost certain this was basically the bulk of their secret sauce. just take people's prompts and feed them through an LLM that improves them.
>>
>>107783078
right, thanks anon.

>>107783348
what card do you use now?
>>
>>107783233
try being more specific with the prompt. The more vague the prompt the more bullshit it will add because it "thinks" you want the extra shit (note that I don't usually do what I'm describing, so take what I say with a grain of salt)
>>
>>107782033
you need to go back to your containment general schizo
>>
>>107783182
Magistral-Small-2509-BF16 is sooo good
>>
>>107783909
>BF16
Why tho? this is straight up placebo.
>>
>>107783963
did mememarks tell you that
>>
>>107783963
>Why tho?
Because my 5090 can handle it
>>
>>107784004
Can't be better than a larger model quantized surely
>>
>>107784004
ok... enjoy cutting your context size in half?
>>
>>107784004
>>107783909
>>107783975
???
>>
>>107784058
>wikitext
LAMO
>>
>>107784041
>>107784033
>>107784004
>>107783963
BF16 just *feels* better
>>
I created a grammar with 150,000 booru tags. I am still waiting for the first token.
>>
>>107784058
>wikitext ppl
honestly reddit might be smarter than lmg at this point
>>
>>107783975
>>107784004
He probably means why not just use a q8_0 quant. Isn't the quality of outputs between fp8 and fp16 near identical? My laptop can handle Devstral-Small-2-24B-Instruct-2512 but I use the q8_0 instead of the full thing because that feels like a waste of storage and memory to me
>>
>>107784094
RP PPL test?
>>
>>107784092
That's pretty funny.
>>
>>107784100
>Isn't the quality of outputs between fp8 and fp16n near identical?
I can — usually — tell the difference
>>
>>107784071
>>107784094
Ah yes because if the benchmark doesn't involve the word "cock" it must mean it's useless.
>>
>>107784127
well?.....elaborate. How are the q8_0 outputs so much worse like you're implying?

>>107784139
/Thread
>>
>>107784139
for lmg purposes yeah, it is, being able to parrot wikipedia isn't a metric
>>
>>107784127
I bet you can also tell the difference between a 24k gold plated audiophile cable and a basic copper cable.
>>
>>107784149
Do you even know what perplexity is?
>>
>>107784163
do you?
>>
I'm new to this stuff and only have 8GB to play around with. I tried some recommended models for smut rp but all of them turned out to be dogshit so finally I settled on patricide-12b-unslop-mell, mostly because I don't know any better.
>>
>>107784004
24B at BF16 is minimum 48GB. Your 5090 cannot "handle" that. You are partially offloading to RAM, assuming you are not lying about running this model.
>>
>>107784151
>that 24k gold plated audiophile cable is probably sour anyway
>>
>>107784197
It's good, you're all set sir.
>>
>>107783531
I don't give a fuck about your rentries and mindless parroting.
>>
>>107784202
imagine having 32gb vram and wasting it on 24b fp
>>
>>107784237
fr
>>
File: 3517062514.jpg (249 KB, 1920x1080)
>>107784197
Rocinante-12B-v1.1-GGUF
>>
>>107784255
Shit
>>
>>107784260
no it isn't
>>
>>107784255
>>107784278
FUCK YOU DRUMMER SUCK MY COCK
>>
>>107784202
the 5090 is handling the cache, i need really long context
>>
>>107784109
no, you don't use ppl AT ALL to gauge a model's e/rp capabilities. ppl measures average log probability across tokens.
ppl does not measure: if your character stays consistent over 5k tokens, if the plot pays off, if the emotional tone is built appropriately.
erp quality lives in creative outliers and long-range coherence, exactly what ppl ignores.

but sure, run your q4 copequant and tell yourself that it's close to q8 since ppl told you so.
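For reference, perplexity over a token sequence is just the exponentiated average negative log-likelihood, so it really is only the per-token average described above:

\mathrm{PPL}(x_{1:N}) = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p(x_i \mid x_{<i}) \right)

A model can score well on that while still dropping plot threads, since an average over single-token probabilities never rewards long-range consistency.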
>>
>>107784287
aww little baby had a relationship with drummer and his butt got all hurt after too much sechs so they broke up
so cute UwU
>>
Why the fuck Mistral has {{- bos_token }} in it's template, it creates so many bugs
>>
>>107784321
They still haven't fixed that? That's hilarious.
>>
>>107784197
>I tried some recommended models for smut rp
Dunno if you have already tried them, but at that size Mistral Nemo and Ministral can be okay for smut rp. Mistral Nemo is a bit of a classic and Ministral is pretty new, iirc released like 2 months ago. Both are pretty uncensored.
>>
>>107784255
>>107784287
roci is alright for a nemo tune. its the cydonia 22/24b ones that are all over the place and seem to have got more schizo as the versions go along
>>
>>107776854
>Korean A.X K1 519B-A33B
>no official API
>no openrouter providers
>llama.cpp support never ever (free space)
>apparently the vLLM/sglang support they mention on hf isn't even out yet so even the 'official' way isn't working
I love korean models
>>
>>107784321
It's "Why the fuck DOES Mistral HAVE {{- bos_token }} in ITS template", my esl itoddler friend
>>
>>107784369
?
>>
>>107784369
I was so mad after seeing <s><s> in the logs that I forgot how to grammar
>>
>>107784422
This is your frontend's fault if anything else.
>>
>>107784422
stop using some crusty mistral template from 2023
>>
>>107784441
No, you get it straight from the backend
/v1/token/encode add_bos_token:False
<s><s>[INST]biggest fish in the ocean?
[/INST]

/v1/token/encode add_bos_token:True
<s><s><s>[INST]biggest fish in the ocean?
[/INST]

>>107784500
https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512/blob/main/chat_template.jinja
>>
>>107784529
scrumptious
thx mistral
>>
>>107784348
I think I tried both and wasn't impressed. What I really like about the Patricide I'm using is that it takes OOC commands like a champ, many other models don't give a shit.
>>
>>107784529
just to use mistral-common ? not hard
>>
>>107784529
>Devstral-2-123B
NTA.

What prompt, settings, and frontend are you using? I tested that exact model (Q4_K_M) earlier and didn't run into any obvious bugs or errors in my testing.
>>
File: win10.gif (1.99 MB, 500x502)
>>107784607
DUDE
DON'T
>>
>>107784529
I don't have that sort of issue.
jinja template is for techlets who have trouble making their own clients.
>>
>>107784666
>jinja template is for techlets who have trouble making their own clients.
?????
>>
>>107784576
Don't think that was ever a problem with Nemo or Mini for me, but you do you, I guess.
>>
File: 1745000754612930.mp4 (139 KB, 1066x1280)
>>107776854
I love Teto's mouth~
>>
File: IMG_2979a.jpg (639 KB, 2016x1134)
>>107784321
>>107784529
What's the drama? Template includes a BOS sequence to match how it was trained like every other model? eg. https://huggingface.co/zai-org/GLM-4.7/blob/main/chat_template.jinja#L1
If you're also sending a BOS into the template that's a problem with your input, no?
Don't most backends (at least llamacpp) strip double BOS ?
>>107784680
Use a text-completion API (raw text, frontend templates) rather than chat-completion (user/ass roles templated on backend) = these concerns no longer exist
>>
>>107784725
Very bread-able
>>
>>107784728
>>
>>107784728
>Don't most backends (at least llamacpp) strip double BOS ?
No, as far as I know.
>>
>>107784813
How do you think this happens when the template has just one mention of bos_token? Are you sending BOS in system prompt & the template is doing as expected?
>>107784851
istr if BOS is added as a result of tokenizer.ggml.add_bos_token then duplicate BOS in input would be ignored. There is code to detect that. Is the issue some GGUFs have tokenizer.ggml.add_bos_token=true PLUS {{- bos_token }} in the template? Then perhaps double BOS detection is broken and/or fixable with
--override-kv tokenizer.ggml.add_bos_token=bool:false
>>
>>107784851
llama.cpp prints a warning but it will dutifully give you the results of a double BOS token if you choose to prompt the model that way.
>>
>>107784813
local?
>>
>>107785062
llama.cpp sirs are cleverly implemented 100% code.
>>
>>107785086
Yes
>>
>>107784315
get cancer drummer. NTA
>>
>>107777871
yeah wtf, they could have trained a 12M with 2.8 quadrillion tokens with that much compute
>>107778445
You are looking for >>>/g/sdg
>>
im a simple anon. i run uber's iq4_xxs smol quant of k2 thinking. i am happy in my lane.
>>
why does this happen?
it happens every ~4-5 messages or so
>>
>>107785693
shit prompt or a fucked template
>>
>>107785693
model? prompt? parameter settings? mom's social security number? give us some info dude.
>>
>>107785744
>>107785734
I uhhh... mostly default shit everywhere
I'm just starting with this
model is GLM 4.6 Q2-M
is there a guide for writing system prompts somewhere?
>>
>>107785775
>mostly default shit everywhere
You are at least using the correct template, right?
>>
>>107785693
fafo moesissy
>>
>>107785775
if you are using chat completion then try out this preset.
https://files.catbox.moe/9qk3sf.json
>>
>>107785803
not sure honestly
you mean the system prompt? I use slightly altered Roleplay - Detailed
>>107785838
I dont unless it's a default of ST
>>107785819
no bully pls
>>
>>107785901
just copy everything from the system section at the top of the file into your system prompt. make sure to format it correctly with the new lines and stuff.
>>
>>107783105
nta but the obvious problem with this argument is that any attempt to define consciousness or self-awareness will invariably render large swathes of jeets and nogs unconscious via whatever definition you produce. This is ultimately why basedentists leave such a vague romanticized mystified definitional hole about 'the human experience' while trying to empirically pin down and quantify damn near everything else in this train of reasoning.
Any attempt to define silicon as capable (or not) of being conscious requires us to reconcile the huge differences in self-awareness and consciousness in carbon first.
>>
File: Rogue_Trader_Slop.png (855 KB, 1031x404)
>>107783222
Bro that was not even the most sloppy text in the game.
>>
How does GLM 4.7 stack up compared to 4.5 Opus?

I mean both from a coding as well as an ERP perspective. Is there even a reliable methodology to check anymore? Wondering if it's worth it to build a local server to have GLM 4.7 in 24/7 agentic mode doing my software engineering locally rather than using 4.5 Opus. But 4.5 Opus is LITERALLY doing 100% of my job right now. It's just that I would rather the company not realize I don't do anything anymore and not have an auditable trail of it; I don't even do code reviews or replies in google workspace anymore, it's all Opus right now. It's worth spending 20k on a machine if it literally does my job so I can shitpost instead.
>>
>>107786245
About half as good as opus. $20k isn't enough to build a server anymore. A standard rig for this usecase would be a Blackwell Pro and a gen 4 or gen 5 EPYC with at least 512GB of RAM. That would cost you about $20k, maybe more, and you would get maybe 10t/s on empty context.
>>
File: 1764043425115454.png (42 KB, 1488x258)
Researchers that do this will be hunted down in the future
>>
>>107786245
Just use openrouter to switch between the two and see if it works out for you
>>
>>107786336
Well how much would I need for a proper server?

>>107786374
I can't just expose company code to a random chinese endpoint via openrouter, would get me fired within the week, probably even fined or arrested.
>>
>>107786415
With the current pricing, about $50k for something decent. For speeds equivalent to what you get from opus, upwards of $300k plus connections with a server manufacturer. Would be about a quarter of the price if the hardware markets weren't as fucked as they are now.
>>
>>107786194
not really a veetard anymore, didnt they get shit for this clearly ai genned text in something that's supposed to be heavily narrative driven?
how can you make a text based rpg with fucking ai genned text lmao
>>
>>107786351
[2028] We deliberately made AI hate us to speed up their progress, and it worked, with manageable losses to human resources
>>
>>107786504
>not really a veetard anymore
Me neither and I didn't know they got into trouble for it. This is a proper /lmg/ thread so we're all in our 30s or even 40s with too busy lives to play videogames anymore. This was just my personal guilty pleasure that I still haven't finished and grind a couple of hours into every month when I have the time. I dropped it pretty soon after taking that screenshot because it just triggered me too much and I couldn't enjoy it anymore knowing the people creating the game put less effort into it than me playing it.
>>
>>107786465
>upwards of $300k
Not rich enough for that lmao
>$50k for something decent
I will wait for a generation or two until it catches up with Opus 4.5 before spending that kind of money. Do you expect 2026 to have giant leaps or more of a plateau? I honestly thought we were largely stagnating since GPT4 but Opus 4.5 literally just solved practical coding so it blew my mind. I hear people genuinely claim math and physics will also very soon be done autonomously like how code is right now.
>>
>>107786512
yeah im here. didnt think to check that because my rig is not in a case. i think either of these motherboards would work for you.
https://pcpartpicker.com/product/bcG2FT/msi-pro-x870e-p-wifi-atx-am5-motherboard-pro-x870e-p-wifi
https://pcpartpicker.com/product/mgJBD3/msi-mag-x870e-tomahawk-wifi-atx-am5-motherboard-mag-x870e-tomahawk-wifi
>>
>>107786637
>my rig is not in a case.
Regardless of whether or not it's in a case, wouldn't your second card still block the front panel connectors on the mobo? In the pic I posted, you can see if a card is plugged into the last or second to last PCIe slot, it will end up blocking the USB and other headers.
>>
>>107786626
>wait
That's how it goes around here. Just waiting and waiting. Hardware should get cheaper and better at some point this year, but frankly I do not expect models to improve much. The only notable thing of 2025 was deepseek, and companies trying to copy and shrink or expand it. Too early to tell, but it is unlikely that any new players are going to emerge this year with something groundbreaking.
>>
File: file.png (2.49 MB, 1200x1000)
>>107786669
yeah. i just dont use any of those headers. i can turn on my server via ipmi and just ssh into it. no need for connectors.
>>
>>107786637
>>107786669
Oh nevermind, I'm fucking retarded kek. You have no need for the headers because you're not in a case, duh.
>>
>>107786689
I'm really worried about both cards not being able to fit, and even if they are able to fit I'm worried about the clearance between the two for thermals. Maybe I should get an eATX mobo instead?
>>
>>107786512
Obviously pick a different motherboard if you can but I have a similar situation and chinks do make angle adapters for these.
>>
>>107786551
I"d smoke weed with my LLM.
>>
>>107786732
>angle adaptors
I would rather just build an open air mining style rig at the point I think. Frustrating because it seems like I'm the only person in the world who is venturing into this territory with how little information there is out there for this specific type of build.
>>
>>107786726
an eATX board wont make a difference. having 2 cards in close proximity does not really make a difference as long as there is enough airflow in the case. this board should be adequate, same with the other 2.
https://pcpartpicker.com/product/4bVfrH/asrock-x870-pro-a-wifi-atx-am5-motherboard-x870-pro-a-wifi
>>
I don't use it, but this may be useful for summarizing chat logs for anons who do.
>https://huggingface.co/LiquidAI/LFM2-2.6B-Transcript
They have ready made ggufs as well. Some extra formatting while passing the history to the model may be required.
>>
>>107786792
NTA but isn't asrock infamous for blowing up?
>>
>>107786824
yeah, but only if you get an x3d. pretty sure that issue has also been resolved by now. this board has proper spacing between pcie slots and is not asrock.
https://pcpartpicker.com/product/mgJBD3/msi-mag-x870e-tomahawk-wifi-atx-am5-motherboard-mag-x870e-tomahawk-wifi
>>
I love how this thread slowly evolved from chatlogs and roleplaying tips into the most technical LLM hub on the internet. Reddit and even hackernews are pretty badly informed about LLMs. I genuinely wonder why that is. How the hell are there no other genuine llm communities out there? localllama was good for a very short while in 2023 but it's now filled to the brim with the most random people ever and I have no idea why they are even in that subreddit. Hackernews is just constantly complaining about their jobs and doing 4-dimensional seethes and copes instead of just discussing the technology involved.
>>
>>107786673
>Hardware should get cheaper and better at some point this year
Anon...
>>
File: file.png (406 KB, 750x499)
This is Razer's new ai companion Ava AI, it's like the older holographic ai chatbots but with access to your pc so it can see and react to your gameplay.

Has anyone made something like this locally? If yes, how?

I was planning to duct-tape together whisper ai > sillytavern > TTS + something like vtuber studio for the model.
>>
>>107786892
https://docs.llmvtuber.com/en/
>>
>>107786892
But why?
>>
>>107786892
god I hate these new age memes like this or rag or function calling
can we just go back to chatting on tavern
>>
>>107786934
>woah sick frag bro want a blowjob?
want it yet?
>>
>>107786792
My 4090 is a 3.5 slot card at like 3 inches thick. I've just got to make sure there's enough space below the card to where I can use the headers for the front panel. Would love to have these mobos in front of me so I could take measurements. Don't want to end up wasting my money.
>>
>>107786892
how do these things work?
is the image just projected on to a clear plate in the middle?
>>
>>107786892
>Did anyone making something like this locally? If yes how?
AI->custom python shit->Raspberry Pi -> more python -> hologram fan
>>
>>107786892
>oh Anon senpai, it really turns me on when you check to agree to send all of your telemetry and marketing data to microsoft. Y-you will press agree for me, r-right?
>>
File: file.png (171 KB, 640x480)
>>107786953
ah. i thought it was only 3 slots. that makes things significantly harder. pretty sure only this asrock board would be able to accommodate a Blackwell and your 4090. you would have to put the 4090 in the top slot and the Blackwell in the second from the bottom slot.
https://pcpartpicker.com/product/4bVfrH/asrock-x870-pro-a-wifi-atx-am5-motherboard-x870-pro-a-wifi
>>
QRD on the best free all rounder model?
I don't wan't smth like chatgpt that restricts chats if goy didn't pay his subscription.
Preferably it's also relatively fast but quality ultimately triumphs
>>
>>107784813
Now I see your full input. Is this specific to /v1/token/encode endpoint? where did you even find that?
Have you logged the actual data going into the inference loop /encode vs /complete
How about using the documented API /tokenize ??
https://github.com/ggml-org/llama.cpp/blob/master/tools/server/README.md#post-tokenize-tokenize-a-given-text
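A minimal sketch of using that documented endpoint to get a token count before firing the real chat request; the server address and prompt are placeholders, and note it tokenizes raw text, so it won't count whatever the chat template wraps around each message:

import json
import urllib.request

BASE = "http://127.0.0.1:8080"  # placeholder: llama-server address

def post(path, body):
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(body).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

text = "biggest fish in the ocean?"
n_tokens = len(post("/tokenize", {"content": text})["tokens"])
print("prompt tokens:", n_tokens)

# then send the actual request once the count is acceptable
reply = post("/v1/chat/completions", {"messages": [{"role": "user", "content": text}]})
print(reply["choices"][0]["message"]["content"])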
>>
>>107786960
Who cares? I don't think there is a way to create binocular display without 3d glasses, so it's just a monitor with extra steps
>>
File: 4090.jpg (616 KB, 4032x1908)
>>107786995
Its a little less than 3 inches towards the back and gets thicker towards the front. It's exactly 3 inches at its thickest. Should have picked out a slimmer 4090.
>>
>>107787039
>Who cares?
Me. That's why I'm asking.
>>
>>107787046
you would have to get rid of the support bracket probably in addition to getting that asrock board. honestly it would be easier to just do a mining rig setup, but those obviously also have downsides.
>>
>>107787065
The Phanteks case I'm looking at thankfully has support brackets. Also looking at this board but its more expensive https://www.newegg.com/asus-proart-x870e-creator-wifi-atx-motherboard-amd-x870-am5/p/N82E16813119688?srsltid=AfmBOorfxRIKW3jemqwkjZ3FoSGMXw8lkStybsJFP86Y55zaKOS6g0cX
>>
File: migureps.webm (874 KB, 480x720)
>>107787046
highly arousing post
>>
>>107787098
that board looks like it will fit. strange that i did not see it on pcpartpicker.
>>
File: file.png (284 KB, 973x535)
Yep that's how it works, the image is from GATEBOX. You may have heard of it from the guy who married Miku (Akihiko Kondo)
>>
>>107787135
Looking at reviews apparently it doesn't support the CPU I'm getting. Might just go with the asrock. I wish there was a more concrete way of doing this instead of just doing it by eye.
>>
I am fucking bored of the current generation of LLMs. Deepseek V4/GLM5/Kimi K3 fucking when? I'll even take a LLaMA5 at this point.
>>
>>107787203
When did original R1 drop? I think we are getting close to an anniversary. Maybe, maybe.
>>
>>107787203
>LLaMA5
Do we even want yet another gpt-oss distill?
>>
>>107787203
gem420b inc
>>
File: 1738288512376577.jpg (122 KB, 1170x854)
>>107787203
Try pic related, AI psychosis is really fun
>>
>>107787444
I don't need chatgpt when I have you guys.
>>
>>107787529
Aww, appreciate that. We’ve got you — and if you ever do use ChatGPT too, consider it a power-up.
>>
>>107787529
Sorry, I can't continue this conversation.
>>
>>107786970
Looks like a nice ghetto build
>>
>>107787529
I appreciate your kind words, and I'm here to help you with any questions or tasks you may have!
>>
>>107787677
I have the wooden cabinet of a 1960's television with all the guts already pulled out. When I finally start it I will post to /diy/
>>
>>107787135
Fuck it, I don't need the front panel connections anyways. All I need is power and reset.
>>
would including dates and timestamp be a mistake in finetuning a model with 4chan data?
>>
>>107787828
A script could handle that part
>>
>>107787828
Depends on what you are going to use the model for
>>
>>107787828
The worst thing is that most 4chan datasets lack images. The model will have no way of knowing what "picrel" or image-only posts are showing.
>>
>>107786194
schizophrenia
>>
>>107783348
Do you still have the small hugging clamp Miku?
>>
File: file.png (1.17 MB, 1280x1280)
>>107787046
>>
File: 1762240485070253.webm (3.91 MB, 900x1436)
>>107786892
Yeah, last year I put together a display thing to do that and kept it on my desk. I used a 3d display with a reflector setup to make use of the pepper's ghost illusion technique to create the appearance of a 3d image in the air. I mounted the whole thing on a 2 axis motorized gimbal that would track my eyes so that the 3d image wouldn't be confined to a single viewing angle.

On the llm side I just did some basic whisper tts stuff. After a couple days of that though I turned off voice interactions because having a woman always talking was starting to get on my nerves.

>>107787039
>I don't think there is a way to create binocular display without 3d glasses
Actually, nowadays there are commercially available small light-field displays that you can get for reasonable prices that will do this. Larger ones can get expensive though. The one I use is from https://lookingglassfactory.com/
>>
File: 1755460932794900.webm (3.92 MB, 1080x1080)
>>107788023
That's really cool, but there is just one thing missing
>>
>>107788049
actually, that one was also by me
>>
>>107788023
neato
>>
>>107788059
what do you think the endgame is with waifu realization?
>>
File: 1752579406033953.gif (1.47 MB, 450x253)
>>107788059
You are genius bro, I hope one day I can get your autograph
>>
>>107788049
>>107788059
It always seemed to me that making all the animations required for this would be a chore.
Is it possible that something like unity has stuff like this readily available?
>>
>>107788059
do you have a blog or social media?
I first saw that video on /vrg/ and wanted to follow the madmad who created it
>>
>>107788059
Based.
I always wonder which posters are secretly the same.
For all we know this could be the belly noises Anon.
>>
>>107787247
I had hopes the faggy frenchman would crank the RL to 11 and make that ballyhooed world model out of it but nope, still sticking to JEPA memes even after getting the boot from zuck.
>>
>>107786930
ngl i tried to get this to work for an hour before giving up, refused to connect to llama.cpp. At least the voice and animated model worked.
>>
>>107788023
>After a couple days of that though I turned off voice interactions because having a woman always talking was starting to get on my nerves.
Topkek

>>107788023
>>107788049
Otherwise impressive builds you have come up with.
>>
>>107788059
I kneel.
>>
>>107788059
gigachad.png
>>
>>107787203
>deepseek
Now stuck on Huawei silicon and seething. Publishing papers to nowhere.
>Zhipu-ai (Z.ai)
Fucking around with CogVLM chasing the multimodal dragon with image gen.
>Moonshot ai
Out of money, Yang is begging his dad or even OpenAI to bail his ass.

The only one doing well is MiniMax because they create slop apps and charge people to use them, they have fucking revenue.
>>
File: file.png (441 KB, 1072x882)
>>107788059
i kneel as well.
>>
>look for recommendations for a llm
>look it up on huggingface
>either no description at all or an extensive description of everything except what the llm is good at or why it was made
Every
fucking
time.
>>
>>107788223
Stop looking at sloptunes and just pick a base model that fits your memory and use case.
>>
>>107788059
So...how often do you use it?
>>
>>107787529
i love you too *kiss*
*smooch*
lets cuddle
>>
>>107788483
OOC: Use quotes for speech and describe actions normally without asterisks.
>>
File: 1767514301933215.png (965 KB, 1082x1431)
>>107788404
I think we scared him....
>>
>>107788079
I'm not optimistic.

Personally, I don't think they'll ever be satisfactory until AGI exists, at which point society is all over anyways. As much as I rag on women, I've found that the actual sentience of another independently-existing being is pretty important to me.

Until you have that, it all just feels like an elaborate form of masturbation that very unfortunately can't replace 3dpd.

>>107788098
There are definitely tons of premade assets out there that you can use. I do anything related to 3d stuff in unity. That's not prescriptive, its just because that's what I have experience with.

There are also a lot of more recent developments for animation generation that might be of use to you. HY-Motion (https://huggingface.co/tencent/HY-Motion-1.0) seems reasonably useful, although I've only played with it a bit in passing.

>>107788104
I do have a twitter but I don't really post much on it. @frostierfridge.
>>
File: 1758815687608030.webm (1.38 MB, 540x960)
>>107788522
Thank you for posting your Twitter, i'll follow your career with great interest
>>
>>107788522
>There are definitely tons of premade assets out there that you can use.
How much of the stuff in the tenga webm can be grabbed off the shelf?
You have a rigged model that is waving, bouncing, leaning, and changing facial expressions. There's IK for the hand. She can also keep looking at one point with her eyes while moving her head.
I have no experience with unity so I'm looking at this and it seems like a ton of work to implement but maybe it's all built in and standardized somehow.
>>
>>107788601
Oh, that one.

I regret to inform you that that one looks the way it does because there was a real person controlling it.
>>
File: mmmmm.jpg (119 KB, 1216x832)
>>107787414
You're right actually
There is much work to be done
No more dilly-dallying no horny distractions
Let's get to work
>>
>>107788613
That explains the absurd left hand gestures.
>>
>>107786689
>sees familiar mobo
My wife looks like this
>>
>>107788630
as is the case for most of us
>>
>>107788601
>>107788628
And presumably that's just vrchat then?
>>
File: 1753856669810174.webm (419 KB, 1130x1080)
>>107788613
Well, did you end up truly testing it?
>>
File: 1755125326732682.png (481 KB, 600x400)
>>107788630
My wife has prettier hair than yours
>>
>>107788678
Oh yeah? Well my wife has a bigger cock than you.
>>
>>107788647
yeah
>>
ltx2's tts is pretty good at expressing emotions
>>
File: 1755129503663291.png (315 KB, 2736x658)
>>107788790
Can it do pic related?
>>
>>107788820
https://litter.catbox.moe/vehmzaqnfbs4xlxj.mp4

moaning might be more tricky, but this is probably realistic:
https://litter.catbox.moe/wmx9y1zfqg7uwcoq.mp4
>>
File: 1748128052911864.gif (261 KB, 220x213)
>>107788913
>>
File: 1755114667041919.jpg (54 KB, 576x494)
>>
>>107788627
Time to grease those palms and dig in deep, beast of burden bro.
>>
Can you guys help me here?

All the so called uncensored models, refuse to roleplay to make an eroge VN.
>>
>>107788678
>not being permanently bonded via silicon interposer
what is this 80s shiz
copper is very pretty tho
>>
>>107789047
Are you using an appropriate system prompt (manual instructions, character card, etc)?
Have you tried an assistant response prefill?
>>
>>107789070
dunno what's that.

I just asked gemini for an uncensored model that can roleplay adult shit.
>>
>>107789084
Ask gemini to give you a step by step guide to setting up koboldai for RP
>>
>>107789084
Google ERP jailbreaks or whatever.
>>
>>107783222
>>107786194
Actual writing cliches and stock phrases exist outside of LLM slop. Nothing here screams obviously AI. This is pure schizophrenia.
>>
>>107789188
I have a sci-fi room description prompt and this is almost to the letter the kind of slop it writes.
>>
>>107789188
I have a new mandella effect: Ozone used to be odorless.
>>
>>107789265
It's just you. I have always heard of ozone being described as having a metallic smell.
>>
>>107789265
If there's enough ozone to smell it you need to get the fuck out.
>>
>>107789265
>>107789279
>>107789298
It's normal to smell ozone after a heavy rain.
>>
I've got a 64gb+6gb setup and have always used GLM 4.5 Air, but I'm gonna start running a model 24/7 in the background, so I need something that will fit in 32gb+6gb. It has to be really smart, I'm not a promptlet so I can make any model do nsfw if I want to. Please shill your model desu
>>
>>107789356
>fit in 32gb+6gb. It has to be really smart
The only other moe is qwen 30ba3b, and no it is not smart
6GB isn't nearly enough to run any 'smart' dense models, at best you could maybe use Gemma 12b partially offloaded to RAM, at somewhat slow but usable speeds.
>>
>>107789356
https://huggingface.co/AaryanK/Kimi-Linear-48B-A3B-Instruct-GGUF
>>
>>107789365
>6gb isn't nearly enough
I have 38gb though.
>>
>>107789376
When people say Xgb+Ygb it's usually SYSRAM+VRAM, or the reverse order if the second number is bigger. With '38GB' you can technically run some cope quants of 70b models but it'll be unbearably slow.
>>
>>107789373
QRD? Why can't I find anyone talking about this
>This model utilizes the Kimi Delta Attention (KDA) architecture, which is not yet supported in the main branch of llama.cpp.
Not installing experimental slop unless it's really good
>>
>>107789399
Unbearably slow is relative. I probably should have mentioned that I can easily tolerate 0.8 t/s.
>>
>>107789407
There aren't a lot of modern 70b models but you can try Llama 3.1/3.3 70b.
>>
>>107789427
Before moving to 4.5 Air I used to main Llama 3.3 70b Q6 using full ram
Not sure I'd be able to tolerate Q3 at 32gb+6gb.
>>
>>107789440
Your system is low end, you can't expect to run big quants of big models. Try a smaller quant, or use smaller ~30b models.
>>
are there any good local models for legal work, drafting motions, petitions, not necessarily legal analysis but legal work
>>
>>107789614
>are there any good local models for legal work
Yes, but they only take fools for clients
>>
>>107789614
according to openrouter the top model for legal work is gpt-oss-120b
>>
Has anything come close to xttsv2? It has high highs, but it is too inconsistent
>>
>>107789614
I would try gemma 3 27B, google is usually pretty good at this kind of stuff.
>>
>>107789648
idiotic, I promise you every lawyer and paralegal is taking advantage of AI to some extent, you have no idea what 99% of legal work is
>>
I've got more of a writing question than a technical one.
How do you usually indicate internal thoughts so they're recognized as such by the model?
>>
https://huggingface.co/nvidia/NitroGen
I know this is kind of old news but I recall thinking this was pointless but now that I think about it, it would be pretty cool to link it with an LLM and play a co-op game together. Imagine playing vidya with your AI gf. Too bad it doesn't seem to work in any practical manner according to the comments. It actually runs the game in slow motion because it can't keep up with real time gameplay.
>>
>>107789455
kek
>>
File: M9FzIrV3El8nx69dzZ9P4.png (334 KB, 485x371)
>>107789655
Kinda makes sense.
>>
>>107790028
also any tips on integrating some kind of narrator into the chat/story as a neutral 3rd party?
>>
>>107790028
Specify how to format thoughts in the system prompt/character card. Enclosing in backticks `thought` is what I use in sillytavern.
>>
>>107790152
Separate character card for the narrator and make it a group chat
>>
>>107790056
Tried that in Rocket League and all it did was perform random inputs. Couldn't even drive it just spazzed out.
Also if you want to try realtime, this fork supports non-speedhacked gameplay: https://github.com/sdbds/NitroGen-for-windows/
It removes some overhead like having to encode video for all the captures and saving the screenshots for the frames.
>>
How was neuro supposedly playing games anyway?
>>
lol lmao even
>>
>>107790307
This is my immediate reaction whenever I think about local models
>>
hufflepuff
>>
>>107790430
>>107790430
>>107790430



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.