/g/ - Technology


Thread archived.




File: R.jpg (222 KB, 1124x1600)
/lmg/ - a general dedicated to the discussion and development of local language models.

New queen of /lmg/

Previous threads: >>103477986 & >>103473510

►News
>(12/10) HF decides not to limit public storage: https://huggingface.co/posts/julien-c/388331843225875
>(12/10) Upgraded version of DeepSeek-V2.5: https://hf.co/deepseek-ai/DeepSeek-V2.5-1210
>(12/09) LG releases EXAONE-3.5: https://hf.co/LGAI-EXAONE/EXAONE-3.5-32B-Instruct
>(12/06) Microsoft releases TRELLIS, a large 3D asset generation model: https://github.com/Microsoft/TRELLIS
>(12/06) Qwen2-VL released: https://hf.co/Qwen/Qwen2-VL-72B
>(12/06) InternVL2.5 released: https://hf.co/OpenGVLab/InternVL2_5-78B

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
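As a rough rule of thumb behind tools like the VRAM calculator above: total VRAM ≈ quantized weights + KV cache + runtime overhead. A back-of-the-envelope sketch (the formula and constants here are illustrative assumptions, not the calculator's exact model):

```python
def estimate_vram_gb(n_params_b, bits_per_weight, n_layers, ctx,
                     n_kv_heads, head_dim, kv_bits=16, overhead_gb=1.0):
    """Back-of-the-envelope VRAM estimate for a quantized model.

    weights: parameter count (billions) times quantized bits per weight.
    kv cache: 2 (K and V) * layers * context length * kv heads * head dim,
    at kv_bits precision. overhead_gb covers compute buffers etc.
    """
    weights_gb = n_params_b * 1e9 * bits_per_weight / 8 / 1e9
    kv_gb = 2 * n_layers * ctx * n_kv_heads * head_dim * kv_bits / 8 / 1e9
    return weights_gb + kv_gb + overhead_gb
```

For example, a 12B model at ~5 bits per weight with 8k context lands just under 10 GB, which matches the usual "Nemo fits on a 12GB card" advice.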

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
Begun, the Sperg War has.
>>
What's the best model right now that acts similar to character.ai? Or will any model work, with that handled by SillyTavern?
>>
>>103487568
Mistral nemo or mistral large. Everything else is censored.
>>
>>103487568
Original llama-1 or mythomax-13B. Not THAT similar, but close, if you want it quick.
>>
>>103487586
mistral large is censored too
>>
File: thoughts.png (190 KB, 3126x446)
she's clearly hallucinating, but at least she's cute about it.

there are times when, while outputting tokens, it almost flip-flops between understanding the nuance of what's going on and not.
I guess any "thoughts" start and end between the last token in the context and determining the newly generated one. is it really luck getting there with the correct thought(s) in mind each time?
in that case, having them maintain a context between tokens seems like it would be a boon, surely?
>>
>>103487773
Yeah we've talked about that idea for a while now. I guess no one has really succeeded in making it work.
>>
>>103487786
this paper is the first attempt at it, right? >>103478321
>>
>>103487813
Kind of. There's still the issue of whether it actually can be applied to a production model. We've been hoping for bitnet for a while now and it's seeming grim.
>>
>>103487489
I think Coconut should be in the news.
>>
>>103487867
more like poopoonut
>>
>>103487867
I think Kokona should be the thread mascot because her paper is going to give us AGI.
>>
►Recent Highlights from the Previous Thread: >>103477986

--Paper: Training Large Language Models to Reason in a Continuous Latent Space:
>103478321 >103478376 >103478946 >103478736 >103478891 >103480930
--Benchmark results for Llama 3.3 70B with different quantization levels:
>103480828 >103480834 >103480907 >103480962 >103481529 >103481844 >103480986
--Google's AI advantage and content filtering practices:
>103485975 >103485999 >103486015 >103486037 >103486070 >103486134 >103486187 >103486199 >103486239
--Closed-source character cards vs. open-source alternatives:
>103479205 >103479418 >103479509 >103479450 >103482151
--Nemo model quirks and performance optimization:
>103481705 >103481747 >103481777 >103481882 >103481928 >103481936
--Google's Gemini 2.0 Flash model impresses with SWE-Bench results:
>103485906 >103485913 >103485992 >103486118 >103486135
--Prompt wrangling and character personality in Llama-3.3 and Gemma-2-27B models:
>103480531 >103480635 >103480742 >103480788 >103480813
--New QTIP Quantized Models released, no UI support yet:
>103485961 >103485979 >103486283 >103486334
--Anon discusses the usefulness of sysprompts in LLMs:
>103481113 >103481584 >103481926
--TabbyAPI support for asymmetric parallel inference with mismatched GPUs:
>103479780 >103482468 >103482717
--Anon speculates about DeepSeek's updated 250B model release:
>103484679 >103484830 >103484853
--Comparing AI models and discussing their limitations and potential enhancements:
>103483783 >103483800 >103483857 >103483886 >103483911 >103484007 >103485541 >103485685 >103485716 >103485776 >103485823 >103485942 >103486092 >103486946 >103487395
--Anon's PC dies due to faulty Corsair RM850x power supply:
>103481759 >103481977 >103481991 >103482296 >103482662 >103482003 >103483018
--Miku (free space):
>103478013 >103480088 >103480784 >103485380 >103485885

►Recent Highlight Posts from the Previous Thread: >>103478921

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 1242143253457567.png (30 KB, 1999x371)
Flash 2 just below Sonnet 3.5 on LiveBench.
>>
>>103488007
If Gemini Exp 1206 is supposed to be 2.0 Pro, that's one hell of a fucking black pill
>>
>>103487568
Rocinante.
>>
Should i just use temp and min p only?
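For context, min-p keeps only tokens whose probability is at least `min_p` times the most likely token's probability, after temperature is applied. A minimal sketch of the temp + min-p pipeline (illustrative only; not any specific backend's implementation):

```python
import math

def sample_temp_minp(logits, temperature=1.0, min_p=0.1):
    """Apply temperature, then min-p filtering; return (index, prob) pairs.

    min-p keeps tokens with prob >= min_p * max_prob, then renormalizes,
    so the cutoff adapts to how confident the model is at each step.
    """
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # min-p cut relative to the top token's probability.
    threshold = min_p * max(probs)
    kept = [(i, p) for i, p in enumerate(probs) if p >= threshold]
    z = sum(p for _, p in kept)
    return [(i, p / z) for i, p in kept]
```

With a sharp distribution almost everything gets cut; with a flat one most tokens survive, which is why temp + min-p alone is often enough.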
>>
newfag here. what the fuck does "RP" mean in the context of these threads?
>>
>>103488659
dunno. all these fucking incels do all day is lame sex chat. such a waste of time. i think RP refers to Ron Paul
>>
>>103488659
There's no way you couldn't have figured this out if you spent maybe 2 seconds thinking about it, but it stands for "roleplay".
>>
>>103488628
Agreed.

>>103488647
Yes.
>>
new sex model when?
>>
>>103488659
It’s shorthand for rape.
>>
>>103487489
>>103487978
asuka blyat
>>
>>103488245
It is. Anthropic likely scaled to max tokens before everyone else.
>>
>>103487489
Damn, he actually made the asuka thread like I suggested
Aaaand... /lmg/ hasn't imploded yet, completely disproving the "miku obsessed troons" schizo(s). Cool.
>>
File: thebest.png (1.54 MB, 920x1376)
>>103489504
/lmg/ just has a thing for redheads
>>
>>103487489
i've been there since before /lmg/ even was a thread, but i've never quite got that petra thing, i must have missed the thread where it originated, can you make me a tldr lol?
>>
>>103489953
https://desuarchive.org/g/search/text/petra%20vsg
>>
i like asuka and everything, but is it just me, or did the op pic kill the thread
>>
>>103489504
Don't make me dig up the archive where OP said everyone is one person and doxxed one person thinking they are all the same person because Kurisu was in the OP picture.
>>
>>103489504
the miku obsessed baker tried to make a split for the last thread because it had kurisu. even when miku was in the OP with her. it's just nobody used it, it's still up.
>>
File: pink mayodog.jpg (838 KB, 2214x4096)
does your favorite streamer have an AI counterpart
>>
linux spoonfeed guide for hunyuan video when?
>>
remember the 12 hours or so that there were a bunch of people here pretending that Llama3.3-Instruct was really good for RP, and after half a day of glazing it they all instantly vanished forever

What the fuck was that about
>>
>>103491078
>linux
>spoonfeed
You should go back to windows faggot
>>
>>103490980
I don't base my waifu on vtumors. LLMs are already not that bright.
>>
>>103491111
QwQ too. The model is horrible for RP.
Still not sure if it was one anon or multiple people.
>>
>>103491111
>>103491446
This. Tried it for 3 gens on 0.1 minp 1 temp and instantly went back to mythomax.
>>
>>103491177
she does streams with an AI version of herself, I'm not talking about streamer "cards"
>>
>>103491111
It is just the usual false prophets. You will recognize the final cooming not by shills here but by the thread going to 0 posts per hour for more than a week, since everyone will be too busy spilling their seed to post anything.
>>
AI can't draw soulful anime. Only pseudo-3D slop
>>
what happens to the last "thought" in coconut?
how is it incorporated into the token generation phase?

>After the latent mode finishes (t ≥ j), the input reverts to using the token embedding, i.e., Et = [e(x1), e(x2), ..., e(xi), hi, hi+1, ..., hj−1, e(xj ), ..., e(xt)]. It is worth noting that the last hidden states have been processed by the final normalization layer, so they are not too large in magnitude. M(xt+1 |x≤t) is not defined when i < t < j, since the latent thought is not intended to be mapped back to language space. However, softmax(W ht) can still be calculated for probing purposes (see Section 4).
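The quoted passage can be reduced to a toy loop (purely illustrative; `model_step` and `embed` are stand-ins, not the paper's code): during latent mode the last hidden state is fed back as the next input in place of a token embedding, and from position j onward ordinary token embeddings resume.

```python
def coconut_forward(model_step, embed, tokens, i, j):
    """Toy sketch of Coconut's latent mode.

    Tokens up to index i are fed as embeddings; between i and j the
    previous step's last hidden state is fed back directly instead of a
    sampled token's embedding; from j onward token embeddings resume.
    """
    inputs = [embed(t) for t in tokens[:i + 1]]
    # Latent mode: the last hidden state becomes the next "embedding".
    for _ in range(j - i - 1):
        hidden = model_step(inputs)  # last-position hidden state
        inputs.append(hidden)
    # After latent mode, revert to normal token embeddings.
    for t in tokens[j:]:
        inputs.append(embed(t))
    return model_step(inputs)
```

Since the latent thought is never mapped back to tokens, there is nothing to sample between i and j, which is exactly why the paper says M(x_{t+1}|x_{<=t}) is undefined there.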
>>
>>103490980
I predict it's gonna be fun to watch AI models play their fake personalities better than the "actresses" behind the avatar. But then, just like everything related to vtubers, it is gonna be miserable. The companies are probably already thinking about the incoming apocalypse, so I am sure they will ensure that the fanbase culture is heavily against using AI models like that. At least the ones that aren't sold to you by the company. Kinda like Warhammer figurines. Which makes me wonder why they aren't already heavily pushing AI chatting with vtubers on their fanbase.
>>
best model for an rtx 4080 super + 32gb ram?
>>
>>103491871
nemo
>>
why do models suck with clothing? even mistral large thinks turtlenecks have cleavage
>>
>>103492278
Lack of data describing various types of clothes
>>
Seems like the general public is waking up to the fact that LLMs will never give us AGI and that the exponential trend is more like an asymptotic line. Now what?
>>
>>103492881
the next path has already been decided. all in on the o1 meme
>>
File: 5d5ff.jpg (154 KB, 1132x1643)
>>103492278
might be anime data getting in
>>
>>103492278
Because they only see text
>>
>>103492937
Text can be used to describe the coverage area of turtlenecks. Surely a description, definition, or explanation of what a turtleneck is must exist somewhere online.
>>
>>103491749
Vtubers are already getting btfo by Neuro-sama. The whole point of vtubing is pretending to be an anime girl, so you might as well cut out the middleman. At least you won't have drama on your hands every week from these mentally ill girls.
>>
"abandon"
"adventur"
"barely above"
"beaming"
"blur"
"bond"
"borrowed"
"camaraderie"
"can we talk"
"challeng"
"chuckl"
"circle"
"clap"
"cling"
"coos"
"collect"
"cooed"
"corner"
"determ"
"drawl"
"dynamic"
"flicker"
"game"
"gleam"
"glint"
"grin"
"growl"
"heady"
"heart "
"heavy "
"hint"
"hiss"
"hmm?"
"hoarse"
"hooded"
"husk"
"hustle and bustle"
"impish"
"just maybe"
"lay ahead"
"lingers"
"lock"
"mix of"
"mock surrender"
"my spine"
"navigat"
"mischie"
"new world"
"not so bad"
"pad "
"padded"
"palpable"
"pang"
"patterns"
"playing with"
"reality"
"renewed"
"ridden up"
"riding up"
"ride up"
"rides up"
"rode up"
"roll my"
"rolled my"
"sashay"
"saunter"
"second skin"
"shine"
"shiver"
"smirk"
"spark"
"spine"
"steel"
"talk to you"
"taut"
"thumb"
"tone"
"trace"
"tracing"
"tribulation"
"truth or dare"
"tug"
"twinkl"
"twisted"
"unfamiliar"
"undercurrent"
"vulnerab"
"wanton"
"well, well"
"well, well, well"
"padd"
"pop"
"purr"
"weight of"
"whatever comes"
"whirlwind"
"whisper"
"you know"
>>
>>103494288
you could've just let this thread die instead of posting this
>>
>>103494288
how's gguf training coming?
>>
>>103494288
>model is retarded
>ban the most popular phrases it naturally uses
What could go wrong?
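For reference, phrase bans like the list above are usually enforced at the string level with backtracking, since slop phrases span multiple tokens and can't be killed with a single logit bias. A toy sketch (the `next_token` callback is a stand-in for the sampler, not a real API):

```python
def generate_with_bans(next_token, banned, max_steps=50):
    """Toy sketch of string-level phrase banning with backtracking.

    next_token(text, attempt) returns the next chunk of text (or None to
    stop); when a banned substring would appear, we retry that position
    with a higher attempt index, standing in for resampling.
    """
    text = ""
    attempt = {}
    for _ in range(max_steps):
        pos = len(text)
        tok = next_token(text, attempt.get(pos, 0))
        if tok is None:
            break
        candidate = text + tok
        if any(b in candidate.lower() for b in banned):
            attempt[pos] = attempt.get(pos, 0) + 1  # roll back and retry
        else:
            text = candidate
    return text
```

The anon's objection maps directly onto this: if the model's top choices are all banned, the retry loop just walks down into increasingly unlikely tokens.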
>>
Is there a decent ~32b model for 24gb yet? I've been away but I vaguely heard about QwQ, I assumed it was a meme like all CoT was for RP. Is that not true? Do I dare to hope?
>>
>>103496514
QwQ seems trash for RP, but it's a really good coding and general reasoning model.
>>
>>103492958
Probably, but it's averaged out by the overabundance of cleavage-showing clothing usually used in rolep- I mean, high-quality literature.
Remember, LLMs are just stacked layers of stochastic predictors; the average wins over factual information. Usually.
>>
>>103496564
>trash for RP
nothingburger then?

>"""general reasoning"""
so, answering benchmark riddles or am I missing a use case here?

Anyway what's the 32b sota for RP then, is it Qwen-2.5? I found earlier versions of Qwen pretty bad and prone to spitting random chinese which is never a good sign.
>>
>>103496602
>32b sota for RP then
first commander. but not really. sota for 24Gb is applying 2MW.
>>
I have no good model and I must coom.
>>
>>103494288
Thanks, I will add these too.
>>
>>103496564
>>103496602
QwQ is literally the best local model right now, learn how to use it.
>>
>>103497203
>QwQ is literally the best local model right now
I agree that it's the best for its size (and almost certainly the best balance of speed and smarts), but tulu, mistral large, deepseek and L3 405b would all qualify as "better" on various metrics.
>>
Is "Skip Special Tokens" in ST supposed to be checked or unchecked? When I press the neutralize samplers button, it gets checked, so it's the default, but is there a good reason it is? Wouldn't skipping special tokens be bad?
>>
>>103488628
Which version? And what format works best for it?
>>
>>103497203
any advice?
>>
File: benchmark.png (43 KB, 1007x431)
https://techcommunity.microsoft.com/blog/aiplatformblog/introducing-phi-4-microsoft%E2%80%99s-newest-small-language-model-specializing-in-comple/4357090
Phi 4 incoming.
>>
>>103499412
>hasn't seen a dick or pussy in its life (training)
Yike!
>>
>>103498457
they're pulling your leg anon, it's a meme model. there is definitely something to having a model make 2 passes before giving an answer, but it needs 2 real passes, not a <think step by step> tag
>>
>>103497384
I think it fucks with llama 3 models.
>>
>>103499433
How is the QwQ CoT even supposed to work? The template looks like ordinary chatml. Did anyone try some sort of cot prompt for RP with any decent results? It seems like it would be difficult to fit into ST's prompting but still I'm curious. My assumption would be that its reasoning would contaminate the prose so it would be shit and dry, but idk
>>
>>103499700
it basically uses double the tokens to run a double-check on what it outputs. first it's supposed to figure out what you want (the step-by-step part), then give you the actual answer. you activate it by simply adding 'think step by step' somewhere in the prompt.

a while back, on the original deepseek 33b code model, i noticed it would never give a correct answer on the first try. it'd spit out code, you'd tell it part is wrong, THEN it corrects itself. that's 2 passes, and the AI gets to re-read what it already output. that's what they're trying to accomplish in 1 pass, but still at the cost of extra tokens
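That manual correction loop can be sketched as a wrapper around any text-generation function (`generate` here is a placeholder callback, not a real API):

```python
def two_pass_answer(generate, question):
    """Toy sketch of forcing a model to re-read its own output.

    First pass produces a draft; the second pass feeds the draft back
    and asks for corrections, mimicking the manual "part of this is
    wrong, fix it" loop with two real forward passes.
    """
    draft = generate(
        f"Question: {question}\nThink step by step, then answer."
    )
    review = generate(
        f"Question: {question}\n"
        f"Draft answer:\n{draft}\n"
        "Check the draft for mistakes and give a corrected final answer."
    )
    return review
```

The difference from a single `<think>` tag is that the draft is actually in the context for the second call, so the model attends to its own full output rather than generating everything in one stream.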
>>
LLM + Realtime gpt-sovits, I'm so far ahead of normalfags using cai it's not even funny
>>
>>103499412
Like another anon said, previous Phis have all been incredibly brittle models that had obviously been gamed for benchmark scores and completely fall apart when given anything slightly OOD. No reason to get excited about this; no doubt it'll be the same again.
>>
>>103499412
drummer are we getting a cream phi 4???
>>
>>103500490
I'm bringing it to the masses through opensource;
Working on adding something like chatgippity adv voice talk with dif voices
>>
>>103499758
This only makes sense to me for toy one shots, any examples of how this works in a multi turn RP? Do you leave the thought blocks in the history? It seems like you'd have to, or it'd learn not to do it. But the thoughts would also probably mess up the prose and it seems like a model would be prone to confusing thoughts and actions after a while especially deep in the context.
Is it really just benchmark shit for riddlers or is there actual value somewhere?
>>
>>103494288
>"padd"
damn not like this anon, my paddington bear card is gonna get no play
>>
>>103494288
I counter with:
ministrations
audible pop
rivulets of
admit it
pet
the ball is in your court
the game is on
the choice is yours
I don't bite... unless you want me to
half-lidded eyes
she worries her bottom lip
warring with
arousal pooling in her belly
take your pleasure
fiddles with the hem of her skirt
kiss-bruised lips
a bruising kiss
despite herself
yours to take
wanton
with reckless abandon
torn between
knuckles turning white
grins wickedly
fiery red hair
long lashes
propriety be damned
the world narrows
pupils blown wide with pleasure
tongue darts out
chestnut eyes
grasps your chin and forces you to meet her gaze
bites your ear
nails raking angry red lines down your back
her cheeks flaming
cheeks hollowing
stars burst behind her eyes
inner walls clenching around nothing
puckered hole
her wet heat
she whimpers, biting her lip
dusky nipples
slick folds
still lodged deep inside her
heart, body and soul belong to you
the night is still young
...for now.
>>
>>103500963
Chewing wood with Rin
>>
>>103499426
https://www.youtube.com/watch?v=gBqUDXu9h-I
>>
>>103500528
Well good luck with that. I almost remade the whole thing from scratch.
>>
>>103502817
any tips or issues you'd care to share?
My plan is to implement sovits and then figure out how to add/enable easy training for custom models, so users can choose whatever they'd like
>>
>>103505646
Why was this deleted?
>>
Does anyone else have trouble with rocinante being obsessed with washing after each sex act?
>>
https://xcancel.com/JustinLin610/status/1867619389065114040
>In 2025, Qwen models will be omni and smart, hopefully.
open omni bros...
>>
>>103506512
Great. What happened to BitNet Qwen 3?
>>
>>103506547
they only ever said they knew about bitnet, this is different in that they have confirmed they are actually working on it
>>
>>103506568
>they only ever said they knew about bitnet
It's been a while, but didn't they promise more than that? Pretty sure I remember them saying they would experiment with it. If they just stop talking about it, that seems like a strong indication that BitNet doesn't work in practice.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.