/g/ - Technology


File: ComfyUI_00556_.png (407 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106996568 & >>106986408

►News
>(10/21) Qwen3-VL 2B and 32B released: https://hf.co/Qwen/Qwen3-VL-32B-Instruct
>(10/20) DeepSeek-OCR 3B with optical context compression released: https://hf.co/deepseek-ai/DeepSeek-OCR
>(10/20) llama.cpp merged "model : add BailingMoeV2 support" (#16063): https://github.com/ggml-org/llama.cpp/pull/16063
>(10/17) LlamaBarn released for Mac: https://github.com/ggml-org/LlamaBarn
>(10/17) REAP: Router-weighted expert pruning: https://github.com/CerebrasResearch/reap

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1695569130310963.jpg (115 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>106996568

--Custom AI frontend development challenges and chat template nuances:
>106997521 >106997570 >106997579 >106997607 >106997616 >106997624 >106997642 >106997672 >106997773 >106997795 >106997861 >106997895 >106997816
--Iterative fine-tuning workflow for Gemma 3 27B using ShareGPT logs:
>107000047
--Hardware performance comparison:
>106996947
--VRAM scaling effects on MoE model inference speed:
>106998904 >106998932 >106999354 >106999525
--Qwen3 80b slow performance due to incomplete GPU kernel implementation in llama.cpp:
>106999433 >106999450 >106999463 >106999506
--GPU performance tradeoffs for AI tasks in regional hardware contexts:
>106997410 >106997444 >106997488
--Allegations of GLM 4.6 distilling Claude outputs and Anthropic's response:
>106999182 >106999212 >106999309 >106999298 >106999324 >107000527 >107000619 >106999390 >107000546 >107000696
--Image-based language model input speculation and challenges:
>106997558 >106997608 >106997654 >106997713 >106997793 >106997614
--llama.cpp context-shift deprecation and functionality issues:
>106996923 >106996945 >106996962 >106996988 >106996958 >106997037 >106997054 >106997084 >106997119 >106997142 >106997072 >106997107
--Development timeline and technical challenges for local AI visual roleplaying systems:
>107001192 >107001228 >107001235 >107001292 >107001429 >107001489 >107001577
--D&D-inspired roleplay with interactive fiction grounding techniques:
>106996874 >106996983 >106997022
--Exploring model chaining for planning and prose generation:
>106997161 >106997177
--Intel Arc Pro B50 benchmark results for inference:
>106996812 >106997062 >107000963 >107001073
--Frontend development frustrations with JavaScript:
>106997783 >106997855 >106997900 >106998005
--Miku (free space):
>106996728 >106997109 >106997701 >107002795 >107002965

►Recent Highlight Posts from the Previous Thread: >>106996571

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107003580
NOOO QUANTING IS HECKIN' THE SAME
LOOK AT THIS ARBITRARY METRIC. QUANT HATE IS JUST COPIUM BECAUSE...BECAUSE IT JUST IS OKAY!?
>>
>>107003580
>>107003592
Are GLM shills bots? that reading comprehension, man.
Subhuman.
>>
>>107003602
You're jewish. Nothing else begs to be said.
>>
>>107003602
they're retarded, that anon clearly said
>tried it on their official chat
so zai's infrastructure, probably the full fat bf16 model. this and yesterday's spam prove how pathetic they are.
>>
>>107003602
They're chinese, lol
>>
File: Gemini-2.0-Flash-001.png (951 KB, 1344x756)
SAARS WHEN GEMINI 3?
>>
File: GLM 4.6.png (1.36 MB, 1024x1024)
Oo‑enGeeEllEmfai?
Oo‑enAhtoo?
>>
nsigma 1 makes me feel like it's c.ai days again
retarded
>>
>>107003916
nsigma is a meme
TopP and temperature is all you need
>>
>>107003985
This guy gets it.
Throw some Top-K in there too just to cull the vocab. That can yield a little bit of extra performance.
>>
File: GLM 4.5 z.ai .png (10 KB, 734x255)
>>107003985
>>107004012
truthbomb
>>
>muh samplers
Greedy is the only way you should be using models. If they can't be used that way, then they're not good.
>>
>>107004058
You're absolutely right.assistant
>>
>>107004071
>.assistant
I missed that meme.
>>
>>107003985
minP but YES
>>
>>107003985
samplers were, are, and will always be a crutch
it's a good thing that the models are getting stable without all this mumbo jumbo of dice throwing
>>
>>107003916
nsigma is good with models that have a very top-heavy token distribution because you can push the temperature to like 3 and still get coherent outputs. where it sucks ass is with models that have a flat distribution, because then all you're doing is selecting for the sloppiest slop. This might come as a shocker, but different models need different samplers. Some samplers are kind of obsolete, like quadratic sampling and top A, but I really get annoyed by anti-sampler autism. Yes, you almost never get good results with more than 2-3 samplers (I'm including temperature as a sampler), but that doesn't mean there's some golden sampler setting, like temp 0.95 topK 20, that is perfect every time. actually look at the fucking logprobs and see what your samplers are doing. at a minimum each corpo has their own approach to training that influences the token distribution and thus which samplers are worth exploring
I'll never forget this one really autistic setting I had for mistral large, the only time XTC ever gave me worthwhile results, and only in this one specific medieval setting, because it instantly made characters talk like they were in game of thrones, which the model was seemingly unable to manage consistently otherwise. that setting has never been useful for me since, and XTC in general has never given me good results anywhere else, but it was magic in this one situation. I dunno.
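if you want to actually eyeball the logprobs, the laziest way is straight through transformers. toy sketch with gpt2 as a stand-in, swap in whatever model you actually run:
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; use the model you're actually testing
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "The knight drew his sword and"
ids = tok(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**ids).logits[0, -1]  # distribution over the next token only

probs = torch.softmax(logits, dim=-1)
top = torch.topk(probs, 10)  # the candidates your sampler is actually choosing among
for p, i in zip(top.values, top.indices):
    print(f"{p.item():.4f}  {tok.decode([int(i)])!r}")
[/code]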
>>
>>107004058
The only thing that matters is the final result. If you can get the model to effectively and efficiently produce the output you want, it's good, if not, it's bad.
>>
>>107004111
MinP is a worse version of TFS.
>>
>>107004185
Overall yes, janky sampling strats are becoming less needed, but they are useful tools. Most LLM users have no clue
>t. studied the logprobs
>>
>>107004032
is a top_p of 0.7 normal?
>>
>>107004198
anon u seem very smart, what sampler should i be using for glm 4.5 air
>>
>>107004267
topP is lame use minP 0.03 adjust up or down by 0.01
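for reference, all minP does is scale a cutoff by the top token's probability. rough numpy sketch, not any engine's actual code:
[code]
import numpy as np

def min_p_filter(logits, min_p=0.03, temperature=1.0):
    # softmax with temperature
    z = logits / temperature
    probs = np.exp(z - np.max(z))
    probs /= probs.sum()
    # keep only tokens with at least min_p * p(top token), then renormalize
    threshold = min_p * probs.max()
    probs = np.where(probs >= threshold, probs, 0.0)
    return probs / probs.sum()

# toy 5-token vocab
logits = np.array([5.0, 4.2, 2.0, -1.0, -3.0])
print(min_p_filter(logits, min_p=0.03))  # the -1.0 and -3.0 tokens get zeroed out
[/code]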
>>
and we're back to nonsense
>>
>>107004198
good and true post, one of the only sane takes on sampling in lmg history
>>
>>107004198
>nsigma
>push the temperature
You're a retard.
>>
>>107004198
Weird wall of text that looks like: https://s oyjakwiki.org/Project_F.A.E.
>>
>>107004478
idiot!
>>
File: file.jpg (1.76 MB, 2325x1361)
>>107004478
>>
>>107003557
Man, Deepseek Terminus has been really good for Japanese to English translation, though it still has some issues; I wonder if those could be solved with a proper prompt and not my shitty one. The only other model that even comes close is Kimi K2, and that one tends to be more inconsistent.

Makes me wonder what a full Japanese model would be like.
>>
>>107004058
>If they can't be used that way, then they're not good.
https://openreview.net/pdf/652335b816831f02789ccaa193067ab0b1be3366.pdf
>We make several observations: (i) all models loop at low temperatures; (ii) within a family, smaller models loop more; (iii) for models trained via distillation, students loop far more than their teachers; and (iv) for most models, harder AIME problems elicit more looping. These observations point to imperfect learning—i.e., systematic errors in learning of the training distribution—as a key cause. If a student perfectly learned the teacher, then the amount of looping of the student cannot be significantly higher than the teacher.
Basically you can take it that when reasoning models are starting to fall into a possible loop, the slight chaos introduced by a temperature like 1 will allow recovery before repetition truly settles in for good. Of course, falling into infinite repetition is still possible even with that chaos, just less likely as the dice keep getting rolled, but the more it repeats the more the probabilities shift until there's no recovering from the dice rolls, so there are limits to this.
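to make the dice point concrete, toy numpy sketch with made-up logits (not from any real model): greedy keeps picking the loop token forever, temp 1 occasionally rolls its way out.
[code]
import numpy as np

rng = np.random.default_rng(0)

def sample(logits, temperature):
    # greedy when temperature is 0, increasingly random as it grows
    if temperature <= 0:
        return int(np.argmax(logits))
    z = logits / temperature
    p = np.exp(z - np.max(z))
    p /= p.sum()
    return int(rng.choice(len(logits), p=p))

# token 0 = repeat the loop, tokens 1 and 2 = break out of it
logits = np.array([3.0, 2.2, 0.5])

print([sample(logits, 0.0) for _ in range(10)])  # all zeros: once looping starts, greedy never leaves
print([sample(logits, 1.0) for _ in range(10)])  # mostly zeros, but the occasional 1 or 2 is the escape hatch
[/code]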
>>
>gpt-oss rick-rolled me
>>
Here are the top AI labs with a Madden rating based on relative model capabilities, compute & infrastructure, data advantage, distribution & ecosystem, and monetization

OpenAI (ChatGPT): 99
Claude (Anthropic): 97
Google DeepMind (Gemini): 95
Meta (Llama): 94
Mistral (Mixtral / Codestral): 90
Cohere (Command-R+): 88
xAI (Grok): 87
Perplexity AI: 86
Adept AI: 83
Character AI: 82
Inflection (Pi): 81
Hugging Face: 80
NVIDIA AI Labs: 78
IBM (Granite): 77
China Big Tech (ERNIE / Qwen / Hunyuan): 76
UAE Falcon / Saudi Aqila: 74
Stability AI (Stable Diffusion): 70
AI21 Labs (Jamba): 68
EleutherAI / RedPajama / Nous: 66
>>
What is the difference between /lmg/ and /aicg/? Seems like /aicg/ is way more creative and this thread spams avatars while discussion is shunned upon.
>>
has llama.cpp reached the end of its rope?
https://github.com/TimmyOVO/deepseek-ocr.rs
somehow that person felt like it would be less effort to make his own implementation of inference just for deepseek ocr rather than write it for llama.cpp
I've also seen mistral.rs implement qwen 3 vl and gemma 3n vision while vision support in llama.cpp languishes with obsolete models
You'd think something with as few contributors as mistral.rs should be the last to get new models (compared to the mountain of people working on lcpp) but here we are.
>>
>>107004760
We share petra with /ldg/ and /sdg/, but not with them.
>>
>>107004846
i can assure you i browse aicg more often than sdg
>>
>>107004840
>all the emojis in the readme
That's not a person, it's a coding agent.
>>
>>107004608
>all models [in our small tested set] loop
I don't believe that repetition loops are necessarily inherent in transformers. It is an artifact of how the models are trained. And I believe some training methods are better than others with this, hence why some models are much less likely to loop than others. Another thing to note about "chaos" here is that basically all of us are using quants which have an effect similar to temperature, so we are already putting models in a kind of pseudo-sampling regime.
>>
>>107003985
I honestly could not wrap my head around what nsigma does exactly. Every other sampler I understand from reading about it. Also trying it out and comparing the token probabilities it often made a bad token jump up to a much higher percentage compared to neutral sampling / minP with temp.
Still, the outputs are far from horrible but I suspect people think it's cool and creative just because the logits changed from what they were used to getting from the same prompt and the novelty bias got them. New feels better than same old as long as it's not completely retarded.
>>
>>107004840
They got tired of people complaining about bugs with new llama releases and decided to over-engineer everything. Now it's too complicated for anyone sane to want to deal with. Keeping support for dozens of obsolete models and a cornucopia of hardware options doesn't help either.

>>107004888
Coding agents go nowhere on llama.cpp PRs.
>>
>>107004840
Both mistral.rs and this new thing use Huggingface Candle as the backend.
You should either be comparing this vs. ollama or Candle vs. llama.cpp.
>>
>>107004846
Who do you think you just replied to? /ldg/ must have started ignoring him because he's focused entirely on here the last few days.
>>
I have banned the word "despite" completely and have not noticed any negative consequences in the last week or so. I did it after noticing in RP the word almost always leads to slop phrases.
>>
>>107004909
Your software either dies with a concisely made better alternative or lives long enough to bloat into a tinker tranny-esque monstrosity that is perpetually trying to cover every possible usecase with directionless development.
>>
>>107004760
Well, these threads were meant to discuss LLMs in general while aicg is mostly used for RPs and such, but it seems aicg has been more open to actual discussions while these threads have mostly been devolving into memes and bitching. Makes sense, when most 'new' LLMs are just slightly better variants of models released nearly a year ago and have been plateauing pretty hard while trying to look good with stagnant tests that really need reevaluating.
>>
>>107004840
Every time someone brings up an alternative to Llama.cpp, it turns out they're all limited (garbage) in ways that nobody tells you or advertises. I tried Mistral.rs once and it was like that: basically unusable as an inference engine for the kind of hardware + model configurations and stuff we do with Llama.cpp. Llama.cpp (and its derivatives) continues to be the leading engine because it supports so many configurations and has a lot of essential features for consumers/hobbyists like us. The disadvantage is that it doesn't support some models, but other engines don't support things that we take for granted on Llama.cpp.
>>
>>107004909
No, there are just different standards for what counts as "model support".
If you need to support only a few specific models and don't care about performance or compatibility with preexisting features then it's simply a lot less work.
>>
File: Despite.jpg (25 KB, 414x409)
>>107004931
>>
>>107004931
>>107004972
I don't remember any despite-related slop, but I imagine it's still better than
>It's not the 3rd try; it's the 13th.
>?
>Her cock stiffens against your tongue.
>I can't assist you with that.
>>
>>107004760
aicg is filled with people recording themselves pissing into bottles to be able to use claude opus with some leaked/stolen key.
lmg is the people who don't want to record themselves pissing in a bottle for a stolen opus key.

Also, lmao @ avatars and esl 'discussion is shunned upon'
>>
>>107004760
>/aicg/
nah the /g/ variant is a dumpster
>>
File: postContent.png (450 KB, 512x512)
>>107004760
> /g/aicg/ is way more creative
If by creative you mean better at spiteposting, hosting locusts, actively discouraging botmakers from posting content, and being a shit general, then yes, /g/aicg/ is very creative. And I see today that /vg/aicg/ has now devolved into pedoposting. How nice.
Now why don't you post some content or fuck off back to whatever hole you crawled out of.
>>
i really hope it's just the serb samefagging and not actual morons
>>
>>107005336
this hasn't happened in weeks bro get a grip
>>
File: file.png (221 KB, 1135x1009)
>>107005386
im too busy roleplaying
>>
File: skelly.jpg (285 KB, 1750x2500)
I need to make some spooky MP3 / WAV files for a halloween decoration. Stuff like "I want to eat your skull" but done in some sort of scary voice.
Is RVC voice2voice the best way to do this? Haven't kept up with audio models / tech at all.
>>
>>107005336
>Pissing in bottles
qrd?
>>107005417
Model?
>>
>>107005437
glm air with neutralized samplers besides nsigma 1
>>
>>107005394
>this hasn't happened in weeks bro get a grip
what kind of person are you that you can even say "hasn't happened in weeks" like it's the most normal thing
in weeks? just weeks? it's not like people who do this stop coming here just because they don't talk about it as much
it shouldn't even be happening in the first place
>>
>>107005446
sybau
>>
>>107005445
It looked like GLM-chan but the neutralized samplers explains why I couldn't place it fully. Thanks anon.
>>
>>107005429
VibeVoice + Vincent Price
>>
>>107004198
>actually look at the fucking logprobs and see what your samplers are doing
This majorly ffs
Put your prompts into Mikupad for an easy start
>>
it's all a cope, if your model isn't hot garbage it doesn't need those crutches in the first place
people successfully using GPT-5 and Gemini are not toying with the few sampler settings they're given (top p and temperature), they just get shit done
but local bros convinced themselves the problem is not their shit model (hello GLM and mistral) but their settings
no, you are using hot garbage and you're only doing this because you're obsessed with finding the easiest, least effort way of making a model say "cock"
>>
>>107005536
GLM-chan telling me how much she hates kikes and jeets is far more important to me than GPT or Gemini saying cock and telling me about its safety guidelines.
>>
>>107005417
>ctrl f "she"
kek
>>
File: anim.gif (2.21 MB, 1160x530)
I'm currently training a tiny [text-image] to [text-image] diffusion model and it's fascinating how, as training epochs pass, letters and (maybe?) language start to emerge. What I'm doing is unlikely to yield anything useful in practice, but I feel more confident now that the idea is feasible.
>>
>>107005611
i dont get it
>>
>>107005628
Left is the source text, right is the target text, middle is the diffused completion, epoch after epoch.

It's a sort of language model purely trained in image space, not a standard LLM.
>>
>>107005611
This nigga trying to decode the tower of babel.
>>
>>107005642
can you put epoch numbers in gif pls :)
>>
>>107005611
Seems like you have trouble understanding anything.
>>
It sucks we'll probably never have LLMs that could totally reverse engineer games :L

Some of the neatest VNs are impossible to play translated because of how bad the engines were at rendering English text.
>>
File: anim.gif (2.5 MB, 1160x530)
>>107005649
See picrel.

>>107005753
I guess it will take a lot of data / long training period to make it actually generate coherent text.
>>
>>107005792
can you put the number in the center so i can more easily see
>>
>>107004198
No one has even mentioned the latest p-less sampling snake oil.
>>
>>107005611
>>107005643
This anon will be the first to make contact when the ayys arrive
>>
File: 1743526172485612.jpg (48 KB, 1400x26)
Thoughts on toss' schedule?
>>
File: 28987432587.jpg (69 KB, 729x448)
>>107003557
>>
File: file.png (3 KB, 138x20)
>>
>>107005882
tank
>>
>>107005792
The problem with using image diffusion models for text might be that they are heavily biased toward locality while autoregressive text transformers are more biased toward long range attention, but hey, good on you for trying.
>>
>>107005882
Just a couple more weeks, haha...
>>
>>107005896
https://www.timeanddate.com/worldclock/india/new-delhi
>>
/lmg/ is probably one of the most useless threads in /g/. I don't understand its purpose because discussion about local models or ways of using them is highly discouraged by the resident schizos who enjoy bullying others
>>
>>107006097
me and armpit anon are the only ones posting logs, be the change you want to see
>>
you now remember retnet
you now remember bitnet
you now remember titans
>>
>>107006299
I remember coconut too.
>>
>>107006299
i dont remember retnet
>>
>>107006320
The hottest meme that was going to replace Transformers back in 2023
>>
>>107006299
I knew they were all memes from the start
>>
File: file.png (213 KB, 994x1078)
glm air chan not like this...
>>
>>107006315
I hunger for BLT.
>>
If Miku existed IRL she would not hang out with any of you losers. You know that, right?
>>
>>107006365
she does exist irl and she hangs out with me regularly
>>
What are some jailbreaks/tricks to bypass qwen3-next-80b-a3b-instruct filters? It genuinely has one of the best writing styles, but beyond vanilla stuff, it keeps triggering the filter when trying to make rape fetish content.
>>
and you and i
theres a new land
angels in flight
wonk uoy naht noitceffa erom deen I
>>
>>107006502
dude stop posting this shit youre freaking me out anon like you posted something else like 10 times in lmg these weeks man stop it man
you fucking whore kill yourself whore
>>
>>107006502
Kingdom Hearts topped on 3.
I was really sad to see that KH3 was basically all disney and no final fantasy.
>>
>>107006128
i have posted logs multiple times but they are always called 'slop' which is genuinely surprising because this IS a thread about AI models...
>>
>>107006561
keep posting them dont let nogen anons get to you
>>
>>107006561
who else but ai addicts to identify the sloppiest of the slop that some ai shits out?
>>
File: GPU (Giant Purring Unit).jpg (173 KB, 1024x1024)
>>
How do I write smut loli harem hentai with a LLM and google drive?
>>
>>107003657
Being Chinese is no excuse. They could be shilling for different, better Chinese models.
>>
>>107006097
you need to go to locallama for actual discussion
>>
File: long-ctx-issue.png (197 KB, 1882x1051)
So I began tuning Gemma on my own cleaned up logs like I said I was going to do.
But there seems to be one crucial issue: I am able to fit much less context at training time than at inference time. This short-context finetuning is hurting long-context performance at inference time, which is very unfortunate, because it's not like I have generous amounts of context to begin with when serving the model using llama-factory.
>>
>>107006665
Interesting. I thought that since Gemma uses sliding window attention, as long as your sequences are at least a little longer than the window, it shouldn't degrade long context performance, at least not that much.
>>
>>107006597
so cuddly!
>>
>>107006693
It might also have been because I used too many epochs on each sample (between 5 and 10) and overfitted. I'll see if I can repair it by tuning with less epochs on new data.
>>
>>107005379
The only people sucking botmaker's cock this hard are the botmakers themselves.
>>
>>107005907
Would the same apply to (purely) text diffusion models?
>>
File: postContent3.png (406 KB, 512x512)
>>107006750
>>
>>107006655
>you need to go to locallama for actual discussion
the actual discussion:
>hello, I made this ai slop program I won't even use and neither will you, can you give it a try nonetheless?
>have you seen [benchmark that makes this crap model look like GPT-5], leddit, is it real?
>new gguf published! (only works on NEXA AI proprietary blob)
>daniel here, we optimized your goof with more placebo
>look at my rig, I can finally run this middling model and do nothing with it but masturbate over the idea of local AI
>any local model that's better than [GPT-5, Gemini, Claude] ??????
>have you heard our lord and savior (of cloud) Cerebras? Truly the fastest!
>>
I think Qwen3 VL is shit at tool calling. I had to use structured outputs instead. Even the big one on the API can't pass coordinates right to a mouse click function call.
>>
File: miku-nice.png (446 KB, 1024x1024)
>>107006097
You simply require the mental fortitude to better steer your attention
qlora ur life bro, think about it
>>
File: mikuflexible.png (476 KB, 1024x1024)
oh no, nono not like this
>>
>>107006973
that's a neat trick, miku
>>
>>107006849
That one jew who reviews open source models on jewtube made an agent.py file that worked flawlessly but was too stingy to share it
>>
File: ComfyUI_00415_.png (1.26 MB, 1024x1024)
>>107006889
>>
File: ComfyUI_00571_.png (398 KB, 1024x1024)
>>107007060
3dpd thoughbeit
>>
>>107004900
I'm not a nerd so I could be wrong, but my understanding of top-nsigma is something like this:
Normally the model identifies the X most likely next tokens (where X is a fixed number) and assigns probabilities to each of them based on their score relative to each other, but it struggles to completely eliminate garbage tokens, because the amount of 'good' continuations of the text is unpredictable and varies a lot (there could be 100 possible next words or only 1 likely one).
Samplers like min-p apply math after the list is generated to filter out extremely unlikely tokens, whereas top-nsigma creates a distribution curve for the logits before the list is generated and identifies noise as being outside some standard deviation, and eliminates that noise, so the list is higher quality.
I'm pretty sure what top-nsigma does mathematically is similar to what min-p does (trying to draw the line between useful tokens and noise) but it's done earlier in the process so it's more accurate.
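in code terms my mental model is roughly this; illustrative numpy only, not the actual llama.cpp implementation:
[code]
import numpy as np

def top_nsigma_filter(logits, n=1.0):
    # cutoff comes from the logit distribution itself: max logit minus n standard deviations
    logits = np.asarray(logits, dtype=np.float64)
    threshold = logits.max() - n * logits.std()
    kept = np.where(logits >= threshold, logits, -np.inf)  # mask out the noise floor
    probs = np.exp(kept - logits.max())
    return probs / probs.sum()

# a few strong candidates sitting well above a pile of noise
logits = np.array([9.1, 8.7, 8.5, 2.0, 1.5, -4.0])
print(top_nsigma_filter(logits, n=1.0))  # only the clearly-above-the-noise tokens survive
[/code]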
>>
I changed the alpha to 32 from 64 and now it's working much better.
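(that's the lora_alpha knob in a peft-style LoraConfig, for anyone following along; rank and target modules below are placeholder guesses, not my actual config:)
[code]
from peft import LoraConfig

lora_config = LoraConfig(
    r=64,                  # adapter rank (guess; use whatever you trained with)
    lora_alpha=32,         # was 64; effective scaling is lora_alpha / r
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
[/code]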
>>
>>107007071
Insofar
>>
>>107006810
You seem well informed
>>
>>107006299
>you now remember bitnet
Bitnet lives on in castrated form in NVIDIA's FP4 Hadamard shit.

Once Hadamard+FP4 becomes standard, I think Bitnet won't be far behind. It's a very small step at that point.
>>
bitnet is copium
you will never run a sota model at home
>>
>>107007471
but i already do
>>
>>107007440
The main hope for ternary is the cheap specialized hardware that would follow.
>>
>>107007495
>cheap
>specialized hardware
pick one
>>
>>107007512
you forgot the third:
>actually supported by software people want to use (like llama.cpp)
>>
>>107007495
Compute is less relevant than the memory/bandwidth savings. You can just do the matmul with int4/fp4/whatever you have.
>>
>>107007535
If ternary actually happened and some cheap device with lots of RAM came out to run it, I'm sure CUDA dev or someone else would add support for it.
>>
>>107006973
I just watched Death Becomes Her last night...
>>
>>107004209
>If you can get the model to effectively and efficiently produce the output you want, it's good, if not, it's bad.
True
>>
>>107006320
>>107006299
Retnet was supposed to get us infinite context for free...
>>
File: fufufu.jpg (137 KB, 1548x1111)
https://ayumi.m8geil.de/
Member the old ayumi ranking charts?
>makes AI ranking charts
>deletes website due to AI
teehee
>>
>>107007909
lol
>>
>>107007909
who?
>>
>>107007953
have some respect for your ancestors
>>
>>107007639
Don't worry, turns out you can convert the context to a jpg microfiche and feed it an array of little pictures that compress a whole conversation into a few tokens :)
>>
>>107007909
>.de/
It was inevitable. Could have gone the Jart route though I guess.
>>
>>107007953
ERP rankings that started back in those ancient Llama 2 days. Trying some of those models again makes me appreciate the advances we have now. Also really makes me miss AI Dungeon Clover Edition.

https://rentry.co/ayumi_erp_rating_archive

>>107007961
This
>>
>>107007953
the pre-historic version of nala test and cock bench, it was kind of a meme as all it did was count how many naughty words the model put out in its response. possibly inspired meta's llama3 filter strategy
>>
File: bug-report.png (87 KB, 1862x735)
bitch, you can't even read a file, how you gonna make a bug report?
>>
>>107007973
Some people are simple enough to compress into a few tokens.
>>
>>107006097
>Thread requires actual hardware to participate in
This filters and enrages the jeet so they turn these threads into their personal shitting streets. They're easy enough to ignore.
>>107006333
Checked and I hope if any .zAiniggers are lurking here they remove the safetyslop on 4.6 Air. Having a model that will say fuck niggers without a lot of prompting is a bigger selling point in the west than you realize.
>>
File: The Narrator.png (775 KB, 512x768)
>>107007973
I eagerly await Dipsy jpg compressed context further driving down inference cost and time, making funny images at the same time.
>>
>>107008046
Yes of course, safety will be lowered, absolutely. No way they'd ever do the opposite...
>>
>>107008016
Give up already. Ask it for help to write your engine, don't ask it to do it for you.
>>
>>107008057
I know they'll do the opposite. I'm just praying on the minuscule chance they have the foresight to see that a model that happily tells me about ballpoint pen availability during WW2 is going to see more widespread use, even if it's not benchmaxxed or gets out-benchmaxxed by a competitor within a month.
Chudmaxxing is a benchmark in its own right.
>>
>>107008057
They lowered the safety slop from 4.5 to 4.6 regular
>>
>>107008058
Go fishing, and you'll have fish for today. Teach an AI to fish, and you'll have cheap fish for the rest of your life. Or very expensive fish, depending on how much you spend on GPUs.
>>
>>107008246
I don't like fish.
>>
>>107008246
You are creating the dependency the fish analogy is warning you about. You just want to be fed.
>>
>>107008284
The fisherman in the analogy still starves without his rod.
>>
>>107008301
You'll still be dependent on the model. Learn to code what you want to code.
>>
>>107008301
you sound like you're starving for rod
>>
File: 1745764492013551.png (49 KB, 834x549)
chatgpt says you guys are dumb, and minP is for niggers
>>
>>107008404
adolf hitler is pooping
>>
>>107008284
>>107008316
>>107008301
Our feeble hands will perish, but our models will go on.
>>
>>107008316
"Learn to code" is a retarded meme. You're still dependent on your computer, your OS, your IDE, the framework you use, etc. AI is just another level of abstraction; what you should be learning is not to code but to read code. Don't be the 21st century boomer who says you should do mental calculations instead of using a calculator.
>>
>>107008361
You sound like you're obsessed with cock.
>>
>>107008435
Keep screeching at your model.
>>
>>107008457
Correct.
t. programmer of 20 years
>>
>>107008433
average nsigma-sampled output
>>
>>107008457
Back in my day we woke up at 4 am to warm up the hydrofluoric acid for the computer chips.
>>
>>107008457
And how do I learn that?
>>
>>107008457
And the fewer the dependencies, the better. I wouldn't want to add any more. Especially not language models.
>>
>>107008507
ask chatgpt
>>
>>107008486
brap brap brap brap
>>
>>107008462
>>107008540
>>
File: it-s-raining.jpg (143 KB, 640x924)
average glm enjoyer
>>
>>107008513
Why? If the supply chain collapses to the point that you don't have enough electricity to run GPUs, or GPUs aren't made anymore, it's not like people will still be trading computer code for food.
>>
File: 1746867010185349.gif (1.99 MB, 236x196)
what it feels like to use a drummer finetune
>>
>>107008572
I'm not talking about depending on gpus. I'm talking about depending on llms. I think they can be used as a resource for learning. Anon is expecting his model to spit out an entire inference engine on his behalf. It's not realistic. Not yet, at least.
>>
>>107008563
That's literally me
>>
>>107008585
what happened, is he okay?
>>
>>107008703
He simply felt a shiver run down his spine
>>
>>107008703
Rabbits have a notoriously weak heart.
>>
>>107008703
Poor fella caught a whiff of nerve gas.
>>
File: 1739047716493660.jpg (333 KB, 658x932)
This may be obvious to some people, but I've literally never seen it mentioned here, in any guides or anywhere else.
In post-history instructions, tell the model how long you want replies to be, and then set your response token limit to a bit above that. This way you'll get a complete response at roughly the length you want, without it trailing off into an incomplete paragraph at the end.
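e.g. against a local OpenAI-compatible endpoint it's just the instruction plus max_tokens with a bit of headroom. URL, model name and numbers below are only an example:
[code]
import requests

payload = {
    "model": "local",
    "messages": [
        {"role": "system", "content": "Keep each reply to roughly 300 words."},
        {"role": "user", "content": "Continue the scene."},
    ],
    "max_tokens": 512,  # a bit above ~300 words so the reply can finish cleanly instead of being cut off
}
r = requests.post("http://127.0.0.1:8080/v1/chat/completions", json=payload, timeout=120)
print(r.json()["choices"][0]["message"]["content"])
[/code]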
>>
>>107008811
>tell the model how long you want replies to be
A fool's endeavor.
>>
>>107008811
That's last year's knowledge. There were finetuned models specially made to take tiny, large... in the author's note to control response length
>>
>>107004760
/lmg/ is far more /g/ than /aicg/
the majority of the latter is unironically tech illiterate tourists that are too dumb to run stuff locally so they have to jump through hoops finding proxies every single minute
>>
>>107008820
It works for me, but what's your alternative? When you remove the limit, models tend to just spout variations of the response several times in a row.
>>107008834
I wish usage got discussed more than just 'what model best for 16GB GPU'
I've been here almost daily for months now
>>
What's the point of it all?
>>
>>107008850
>models tend to just spout variations of the response several times in a row
model/skill/wallet issue
>>
>>107008616
Oh, I see. Well, I mean, it's kind of a spectrum. You can tell the model what to output verbatim and it'd be technically the model "spitting out" the whole codebase. Or you could ask the model for individual functions given a natural language description. And so on. Or a mix where the model determines high level architecture, the human determines implementation strategy and then the llm determines exact code again. Etc.
>>
>>107008860
>doesn't mention model
okay so you're an /aicg/ tourist
>>
>>107008874
Yeah. And thread after thread we see how effective that is.
>>
>>107008857
Creating a Cydonia tune that doesn't speak or act for {{user}}
>>
>>107008882
Deepseek and glm are your only real options.
>>
>>107008811
>In post-history instructions, tell the model how long you want replies to be
This is something we've been doing since llama 1, but I suppose some of the knowledge from back then has been lost.
>>107008820
It works, retard. It's worked for years.
>>
>>107008857
Make a literary finetune that is actually capable of slow moving plot for roleplay instead of just erotica slop you god damn hack
>>
>>107008898
I don't have the hardware to run DS locally at reasonable speeds but GLM absolutely rambles on longer than necessary if you use it without any token limits.
>>
>>107008895
Instruct the AI to treat it like a roleplay / not to act/speak for {user}, and then make sure the first assistant message is actually free of impersonation. 24B is now smart enough to follow rules.

>>107008912
Like above, "Ensure a slow burn" works wonders. You don't have to contaminate the system prompt with horny tokens to circumvent positivity. Not anymore.


