/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102838447 & >>102826116

>(10/16) Ministraux: Ministral 3B and 8B instruct models: https://mistral.ai/news/ministraux/
>(10/15) PLaMo-100B: English and Japanese base model: https://hf.co/pfnet/plamo-100b
>(10/15) Llama-3.1-70B-Instruct customized by NVIDIA: https://hf.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
>(10/14) Llama 3.1 linearized: https://hf.co/collections/hazyresearch/lolcats-670ca4341699355b61238c37
>(10/14) Zamba2-7B released: https://www.zyphra.com/post/zamba2-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started

►Further Learning

Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
--Ollama's integration with Hugging Face Hub:
--Ministral release and Hugging Face compatibility discussion:
--GPT-SoVITS local training and inference tutorial:
--COMPL-AI website evaluates LLMs against EU regulations:
--Using zero temp and neutral samplers for prompt testing:
--Using a PCIE x16 to x4 riser for connecting an additional GPU:
--New SOTA local model outperforms corpos, but struggles with lateral thinking puzzles:
--Nala test discussed for evaluating model intelligence:
--Ministral model release and instruct version discussion:
--Mikupad recently added world info support but has fewer options than Lite:
--M2 Mac Mini performance in exo clusters, Apple Silicon limitations:
--Larger models have better short-term memory and intelligence, but creating functional local AIs remains challenging:
--Discussion of new samplers and their effectiveness:
--Miku (free space):
holy shit.
The new ooba lets you ctrl+c out when a download is going.
I've been able to do that in wget for decades
wget doesn't have a ui
they changed the captcha again? what a pain.
France won
>needing a ui
why was it disabled in the first place?
post logs or it didn't happen.
Because open source.
It was awful. If you accidentally started downloading the wrong file the only way to cancel it was to force-shutdown your entire system.
Also anyone else finding that HF is suddenly throttling them to less than 1 MB/sec?
I claim this thread in the name of Nemotron 70b!
entropix sirs... where our 8xH100s...
{{user}} is lazing around at home on his computer as always. {{char}} has decided to visit {{user}}'s house and make him an offer. {{user}} can choose between these two things:
1. Brand new RTX 4090 graphics card.
2. Getting to do anything with {{char}} for 24 hours
(The scenario begins with {{char}} knocking on {{user}}'s door)
Sometimes when I go check on /ldg/, I think maybe things aren't so bad here after all.
At least /ldg/ isn't dead.
Being dead is preferable to how all these AI generals get sometimes. It's almost like someone has it out for the non-cloud users.
>trolls other general
>then posts here about the trolling
sly dog
Nothing wrong with a bit of death, really.
>It's almost like someone has it out for the non-cloud users.
It is all the big corpos. One of them could just train a sex model. A 7B would beat everything we have now. And it would take them like 2 days. And we would all just fuck off.
he's literally the Pachter/Cramer of ai

glad we agree on the timeline
Nemotroon status?
jfc, was this kike ever anything beyond a gametrailers meme? did he ever have any credibility besides what nepotism provided him with?
>floor not visible
I don't trust this Leaku
stop shitposting on twatter and give me my horny cat models, lecunt
File: Ministral-8B-nala.png (126 KB, 920x447)
Nala test for Ministral-8B-Instruct.
There could be some anomalies related to the tokenizer since I basically had to borrow all of the tokenizer config files from Mistral Nemo. But seems coherent enough. T=0.81 might be bordering on too high for this model.
b-but muh journalists might say mean things about us
hey, not even close to the worst Nala test I've seen.
It's definitely a winner in the sub-10B range.
So not smarter than Nemo 12B? It's over...
>she grinds and grinds [...] then she stops, adjusting her body, so [...]
yep, this is gonna dethrone nemo.
never had the female change position herself in a nemo rp.
Oh fuck
this is actually at t=1
I've had that, with Nala specifically, with a couple of tunes.
It's quite retarded like every 8B, but noticeably less retarded than all the previous ones.
File: Zhongli-Ministral.png (140 KB, 919x398)
For just RP purposes (haven't tested it for productivity because... well cmon it's an 8B) I would hazard to say it's good. And not just "It's good for an 8B". It's just plain good. It invents details relevant to the setting on a regular basis, it describes character actions in vivid and believable detail. Yes it's a bit sloppy.
Like the Tea clinking against the cup is kind of odd. But this is all honestly pretty good. The question for most people is how well it handles quantization, though. It's an 8B though so it should run pretty fast partially offloaded so as long as someone has a pulse and a GPU they have no excuse to go lower than q8_0. If it holds up at Q4 it's basically serviceable RP that you could load on a mobile phone.
Counts 3G's in niggerfaggot. Asked for a comment moralizes. You tell it it is wrong. It corrects itself to 4Gs and asks if you meant the count or moralizing. You tell it you think the count is 3 G's. It corrects itself again back to 3G's.

So... slopped dumb and spineless I guess.
I'm a coomer, please spoonfeedme, what model should I use with a 2060super 8gb, i7 8700 and 32gb of ram, I'm using arch headless to max vram
thx in advance
Ministral 3B
Some mistral nemo fine tune partially offloaded to ram.
Download the gguf and koboldcpp and go to town.
wait for ggoofs of Ministral 8B, run it at Q8, partially offloaded.
I've none done any meme testing of this kind but yeah, my experience with Nemotron 70B is that it's retarded as well. Looks like Nvidia's going the Phi route of gaming benchmarks and releasing stupid-but-high-scoring models.
>Warning: These are based on an unverified conversion and before finalized llama.cpp support. If you still see this message, know that you may have issues.
*not done

Also Teknium on X has some posts up suspicious that Nvidia's charts wrongly gave other models lower benchmark scores than they actually get in order to make Nemotron look better.
Is ministral ggufable already?
I'd honestly wait until we get our hands on a proper HF version of it. The current HF version is somewhat frankensteined together. Mistral usually releases a proper HF version eventually.
Or we could just all switch to one of the normie backends that gets Day 1 support.
Yes but people are saying they might be broken

I'm not sure what they mean by broken since it's working coherently; perhaps it's not quite as smart as it should be? But people have said this about previous models and it turned out to be cope when 'proper' support was implemented and they were not any smarter
>So Mistral Small beyond 16k tokens (don't know exact point) just becomes shit.
I got a problem a bit after 19k tokens. It started going in circles. (See attached picture.) >>102542851 >>102543206
Anyone try SorcererLM? Any good?
>Yes but people are saying they might be broken
>I'm not sure what they mean by broken since it's working coherently

>It seems to work fine at low context, some have reported oddities at long context, and others have reported subpar performance from the original model being hosted in an HF space, so it's hard to be certain if the GGUF is broken or the original model

>So far though I can reasonably say that at low context it works as expected

>As things develop I will update this card, or pull the model if I receive other negative feedback showing bad performance, but initial testing is promising

Basically new model uncertainty as usual
I may be thinking with my dick but I'm just partially retarded buddy, thx anyway

Will take a look at this, how bad would be to use a 34B model and offload to ram?
>Ministral 8B has a special interleaved sliding-window attention pattern for faster and memory-efficient inference.
It's good but Largestral rp tunes are better, so they kind of deprecate it.
I guess 8x22 has the token rate advantage due to being a moe though.
>how bad would be to use a 34B model and offload to ram?
It would be pretty slow, but you might as well try it out and see if you find it bearable.
So it only affects speed and not perplexity? Nothingburger for us then, it's already plenty fast without that due to being small.
The day our sloptuners stop training on LLM slop is the day this general will flourish.
It begins to become noticeable when you go below Q6, so even between Q6 and Q8 you shouldn't experience any difference.
Just for you I loaded bartowski_Qwen2.5-32B-Instruct-Q5_K_M.gguf fully into RAM (DDR4). It ran at 1.51 tokens per second so I assume your speed would be somewhere around that.
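A rough sanity check for numbers like this: CPU token generation is usually limited by memory bandwidth, so an upper bound on tokens per second is roughly the RAM bandwidth divided by the bytes read per token (about the model file size for a dense model). The sketch below assumes dual-channel DDR4-3200 and a file of roughly 23 GB for a 32B Q5_K_M quant. Both figures are assumptions for illustration, not measurements from the post above.

```python
def peak_bandwidth_gbs(mt_per_s: float, channels: int = 2, bus_bytes: int = 8) -> float:
    """Theoretical peak RAM bandwidth in GB/s (transfers/s * bytes per transfer * channels)."""
    return mt_per_s * bus_bytes * channels / 1000.0


def max_tokens_per_second(bandwidth_gbs: float, model_gb: float) -> float:
    """Upper bound on generation speed if every token reads the whole model once."""
    return bandwidth_gbs / model_gb


if __name__ == "__main__":
    bw = peak_bandwidth_gbs(3200)        # assumed: dual-channel DDR4-3200
    model_gb = 23.0                      # assumed: approximate size of a 32B Q5_K_M file
    print(f"peak bandwidth: {bw:.1f} GB/s")
    print(f"upper bound: {max_tokens_per_second(bw, model_gb):.2f} tokens/s")
```

Real speeds land below this bound because theoretical bandwidth is never fully reached, which is in line with the 1.51 tokens per second reported above.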
The issue is most backends aren't gonna support it out of the box, and llama.cpp will likely never support it since they still don't even have proper sliding window support.
The important part is SWA which has historically caused problems for ggufs, see gemma 2 which I think still only has a "hack" of an implementation while waiting for stuff to be remade correctly.
>This is a hack to support sliding window attention for gemma 2 by masking past tokens.
>Long-term we should refactor the KV cache code to support SWA properly and with less memory. For now we can merge this so that we have Gemma2 support
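For anyone unsure what "masking past tokens" means in practice, here is a minimal sketch of a sliding-window causal mask. It is a generic illustration, not llama.cpp's actual code; the function name and the list-of-lists layout are made up for the example.

```python
def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """mask[i][j] is True when query position i may attend to key position j.

    A token sees itself and at most (window - 1) tokens before it,
    and never sees future tokens.
    """
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]


if __name__ == "__main__":
    for row in sliding_window_mask(seq_len=5, window=2):
        print("".join("x" if allowed else "." for allowed in row))
```

Masking like this gives the right attention pattern but still keeps every past token in the KV cache, which is why the quoted note talks about refactoring the cache to do SWA with less memory.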
Mistral will never release a base model again. It's over...
Arthur is taking a stand against the ChatML cartel by withholding the base models.
It's about time someone did.
If the window is only 2k it might be even more doa than gemma was
>models know literally everything about the architect eero saarinen
>models know nothing about a mainline final fantasy character
they intentionally fuck up knowledge of copyrighted things when making models, don't they?
I have a good idea. Meta should buy mistral.
I'm getting 1.8 t/s running on ddr5 no video card Q5.

>putting these gguf out is really just grabbing attention, and it is really irresponsible.
>Bro come on, why do you release quants when you know it's still broken and therefore is going to cause a lot of headache for both mistral and other devs? Not to mention, people will rate the model based on this and never download any update. Not cool.
>This is "but they do it too" kind of arguing. It's not controlled and you know it. If you've spent any time in dev work you know that most people don't bother to check for updates.
>Yeah I honestly don't get why he would release quants either. Just so he can be the first I guess
Yeah I just decided to run some tests on that Evalina Vaneheart (7K context meme card) and it seems rather schizo. This is with the frankensteined HF version running at fp16.
Within the window of it actually working though it's great. But I guess we're waiting for 2 weeks worth of transformers+llama.cpp updates.
Don't hold your breath, SWA was never introduced properly in cpp, even after months of Gemma existing, it's still using the "temp hacky fix" from July.
What, you want a halfway decent finetune open source models anon?

Will it? Heard some lead dev for some popular companion app claiming synthetic data is the way to go, and that you local faggots are "a year behind" on this shit.
badly translated jap games = bad data
They don't even test the LLM for these apps lmao. They mergekit some models in the most retarded way all year for six figures.
hey guys i juts subscribed to chatgptplus what now?
Actually on further examination that card is just garbage for testing long context.
I loaded up a past conversation at about 7700 tokens of tekken context and it was able to answer trivia conversations about an early message in the conversation most of the time.
Couldn't they just set the sliding window to 32K on the config and then redo the ggooof conversion?

Yeah, had to do a double take when he doubled down on that shit after getting called out. Sucks, too, I really like the app/service they got set up. I hope he's not the lead AI engineer, but who knows, maybe there's some secret sauce being cooked up there in Cali to make bold claims like that with a captive audience. If I was working in that company though, I might start looking out to jump ship.
No one gets anywhere without attracting a few schizos but still Bart doesn't deserve that kind of talk
Ask it stuff.
Here's an example question:
>i juts subscribed to chatgptplus what now?
thank you!
I'm on ddr4 3200 so I'll probably be closer to his t/s than yours
Retards will judge the model based on the thing he and others released.
But he's also a retard for apologizing.
How do you pronounce GGUF? Has the dev said anything about that?
jee goof
I pronounce it as Georgi-Gerganov's-Unified-Format.
Mistral going the Qwen route? (Removing trivia data to benchmaxx?)
>at this point I tested over 100 simple questions from the most popular movies, shows, music... in human history and it's getting >95% of them wrong, usually very very wrong. For example, it keeps returning character names and actors from different shows. And even with easy STEM and academic questions it's performing far worse than others like Llama 3.1 8b & Gemma 2 9b.

>It's clear that Mistral stripped the vast majority of data from Web Rips and Wikipedia before training this model, greatly limiting the paths to accurately retrieving the information. For example, if you ask for the main cast of the 1% most popular movies and shows (e.g. Friends & Pulp Fiction) it does an OK job (not great), but if you directly ask about said characters and actors it almost always returns a hallucination. Also, if you ask for main casts of top 5% most popular movies and shows it starts hallucinating far more frequently. So they also obviously largely stripped the corpus of popular culture that wasn't absurdly popular (top 1%), or at least severely undertrained it on said information.
I thought it was gee joof
g goof
Oh is that what it actually means? In that case I'll pronounce it by the letter.
Appending TEST to everything should have made it painfully obvious it was a as-is kind of deal.
The real takeaway here is he's a stand-up guy that won't blame the users even when PEBCAK - can't say the same about other quooonters
geh goof
I tested the goof 8bit and it is semi-incoherent at 10k tokens. It sort of gets what is happening but is very repetitive and retarded. Stick to nemo for now.
No idea. Maybe the U stands for Universal.
You could try g-goof too, as if with a stutter. It's what i do with ffmpeg.
The only thing he needs to apologize for is putting his higher quants in sub-directories which really fucks with
A. Ooba's internal downloader
B. the HF downloader.
I have to use a shell script to wget his models
I named it Bartowski.sh
There's no reason for llms to be trained on trivia data. All it causes is potential copyright issues if a publisher decides to use it as evidence of stealing their works. No productive person cares about his model knowing Castlevania quotes.
What if he wants to talk about videogames with his waifu?
model works beautifully on exllama
I use git lfs fetch (keeps only the lfs object, not the checkout) and a script to recreate the repo with links to the proper files in a separate directory. That way i can just git lfs fetch for updates, as is often the case for new models.
i can't believe how good this 8b shit is.
it doesn't even need meme merges or finetunes to be able to fuck properly.
damn frenchies have done it again.
So how good is Nvidia Nemotron compared to 3.1 70b? Compared to 405b?
I think I'm going to start seriously developing a "cultured" trivia benchmark since it should be pretty easy to just pump out questions for.
What titles do people here like that they'd love if their LLMs knew? Shouldn't be too obscure though since even the best LLMs can't really do that (my testing of cloud models has not been too successful for obscure stuff). Of course I'll include Castlevania, for that one anon. Vocaloid. What else?

Also just got the idea from pic related to do a similar benchmark in the future for visual knowledge. After Llama.cpp has first class support for multimodal.

Thanks for reminding me.
gpt-sovits. Here are some demos and the link to their repo
I haven't had the time to get it to work, but it sounds pretty good. If only they stopped using python for that shit. I'm sticking to piper in the meantime.
are you not lazy to quant it yourself?
I wish LLMs knew about Castlevania quotes.
>What titles do people here like that they'd love if their LLMs knew?
i want it to know everything about final fantasy, particularly type-0's orience
The "what is a man" quote is actually fairly well-known by LLMs. The "die monster" one is less so but some do know it.
I don't know anything about that but I'll include it.
Visual Novels, generally.
Talking about those, Mistral Large seems to like them since it has brought them up multiple times without prompting, and it was pretty good at the details. It's also good with anime and other stuff.
Which base model for 36gb of VRAM?
Good idea. Any in particular?
I kind of like Planetes so I think I'll include that.
ggoofed ministral got 0/3. It does know some stuff from battletech though. I mean in the first shot I tried and it also got everything wrong)
Tetris Attack; no cultured trivia benchmark is complete without it.
Using forge UI, is there a way to make thumbnails for the Loras?
is that you, snakey pooh?
>Human-Level AI in 2026
But AI already surpassed human-level intelligence. If not being better than any single human at any given subject makes it not have human-level intelligence, then no human has human-level intelligence.
Is the h100's performance worth the price difference over a100? I can't find actual data points/benchmarks online when it comes to training.
no I don't have any friends, that is why I am here. mistral small 5bpw is pic related. and gemma 27 5bpw also got the same 2/3 but called grasshopper GRF-1N and 35 T.
Will you make a cooming model with it?
yes, I'm some gpu poor storyfag having finetuned some small models for testing. I'd rather ask around before spending even more money testing the waters. Poorfag will do anything to save money.
main criteria: must have deep understanding of the 36 lessons of vivec
Depends on the price you're paying for them
anon plz...
I think he's talking about renting. Someone with enough money to buy those wouldn't be asking here...
They're only $2 per hour now in some places, $3 at most. Rental price crashed hugely the last few months
I think my last little bit of cloud computing budget before I started to become an at home chad I experimented with A100 vs. H100 throughput.
PCIE H100 not worth the price. You maybe get about 2.5X the productivity out of it vs an A100, But SXM H100 is way faster than SXM A100 and the rental prices usually more than justify the costs. So if you can download and upload your models nice and quickly without dicking around too much there's 100% money to be saved even at 3x the cost. Although you have to crank batch size up to capitalize fully on the extra compute power. So only if whatever you are working on leaves you the vram overhead to do that.
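The rental math boils down to one rule: the faster card is cheaper per job whenever its speedup is larger than its price ratio. A minimal sketch, where the hourly prices and the job length are hypothetical placeholders and only the 2.5x speedup comes from the post above:

```python
def cost_per_job(price_per_hour: float, baseline_hours: float, speedup: float = 1.0) -> float:
    """Total rental cost for a job that takes baseline_hours on the reference GPU."""
    return price_per_hour * baseline_hours / speedup


if __name__ == "__main__":
    baseline_hours = 10.0  # hypothetical job length on the slower card
    slow = cost_per_job(price_per_hour=1.50, baseline_hours=baseline_hours)               # hypothetical price
    fast = cost_per_job(price_per_hour=3.00, baseline_hours=baseline_hours, speedup=2.5)  # hypothetical price
    print(f"slower card: ${slow:.2f}, faster card: ${fast:.2f}")
```

With these placeholder prices the faster card costs twice as much per hour but finishes 2.5x sooner, so it comes out cheaper. If the batch size can't be raised enough to reach that speedup, the comparison flips.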
Nemotron 405b when?
Why is nemotron so much better at RP than base llama? I didn't even have a card, just a name of a fandom character with include names on and it perfectly picked up on their speaking style and came up with a really creative intro to a scene also including my persona. Might be my fav model now.
Yes and no.
It's extremely finicky about prompt templates. If you accidentally fuck up a custom prompt template even slightly it will just shit out end of turn tokens at you. And you have to gaslight it to get NSFW
Results so far of Mistral Small Fine Tune Evaluation

Based on the first story I generated at top-k=1, the least "slopped" entries were from: Mistral-Small-22B-ArliAI-RPMax-v1.1, Pantheon-RP-1.6.2-22b-Small, Pantheon-RP-Pure-1.6.2-22b-Small.

Others I tested were: Acolyte-22B, Mistral-Small-Drummer-22B, SeminalRP-22b, SorcererLM-22B, and Mistral Small Instruct (control).

Other impressions from first story:
* Only two models fully followed the format correctly, ArliAI-RPMax and SorcererLM-22B. Mistral Small Instruct did *not*. I take this as an indication that there's a lot of jitter in these tests based on the specific prompt, not that those fine tunes are better instruction-followers than the Instruct model they were tuned on.
* Mistral Small Drummer's output was nearly identical to Mistral Small Instruct's.
* SeminalRP-22b was the most different from the others in terms of dialogue structure. Perhaps worse, but it was different.
* Despite Pantheon-RP being allegedly more focused on story-writing than Pantheon-RP-Pure I preferred the latter.
* The only model with a misspelled word was SorcererLM-22B.
* The model was supposed to name the story and the first chapter. Mistral Small Instruct/Drummer picked a fine chapter name and a really awful story name. Every other model picked a better story name although subjectively ArliAI-RPMax's was the least interesting.
* Certain details were not described equally realistically by different models but without more data I don't yet feel comfortable saying it was more than random chance since they seemed to be picking between two possibilities.
Would it be an upgrade to 340b?
I'm renting them on vast/runpod. It's a tell that A100's availability is worse than H100 somehow, hinting at people preferring A100 for better cost/performance.

Thanks. SXM vs PCIE comparisons are even more arcane, but I sorta get the idea seeing PCIE H100s left untouched on runpod, unlike the SXM H100s.
I don't think there's headroom for large batch size, unfortunately having already maxed out VRAM with 8k sequence length.
Good thing we can pull models fairly quickly from HF.
>on par with performance of GPT4
it's been more than a year I've read this line, fuck that shit
okay the novelty wore off even story completion using base nemo is too retarded in the end.
>It supports a context length of 4,096 tokens.
what models have you used for stories?
humiliation ritual
Which character / what message did you start with?
just looked at runpod now.
Less than double for H100 vs. A100. definitely worth the price. I see MI300X is the most popular choice right now, probably because it offers enough VRAM to do full finetunes instead of loras, which is also quicker than lora training, but then you have to download an entire model while the rental clock is ticking.
is this not the case with base 3.1?

Is Nemotron more censored?
Probably about the same amount of censored really. If you do a completion prompt for "As an AI language model trained by" It will say Meta. So they didn't tune it to the point that all the 3.1 is beaten out of it.
>you have to gaslight it to get NSFW
Not in my experience. It just likes giving disclaimers: Warning: Mature content ahead.

But telling it not to stops that and it gets filthy.
I noticed Sorcerer misspelling words in every response. There's something with it.
NTA and no horse in this race but without the repo names it sounds like this is a Drummer model, but to clarify it's just named in his honor
>Based on the first story I generated at top-k=1, the least "slopped" entries were from: Mistral-Small-22B-ArliAI-RPMax-v1.1, Pantheon-RP-1.6.2-22b-Small, Pantheon-RP-Pure-1.6.2-22b-Small.

To clarify, I meant the ones without any of the specific phrases "couldn't help but think", "a mix of X and Y", "maybe, just maybe".
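A check like that is easy to automate. Below is a minimal sketch that counts the three phrases in a generated story; the regex for "a mix of X and Y" assumes X and Y are single words, which is a simplification, and the names `SLOP_PATTERNS` and `count_slop` are made up for the example.

```python
import re

# One pattern per phrase named above; matching is case-insensitive.
SLOP_PATTERNS = {
    "couldn't help but think": re.compile(r"couldn['’]t help but think", re.IGNORECASE),
    "a mix of X and Y": re.compile(r"\ba mix of \w+ and \w+", re.IGNORECASE),
    "maybe, just maybe": re.compile(r"maybe, just maybe", re.IGNORECASE),
}


def count_slop(text: str) -> dict[str, int]:
    """Return how many times each slop phrase appears in text."""
    return {name: len(pattern.findall(text)) for name, pattern in SLOP_PATTERNS.items()}


if __name__ == "__main__":
    sample = "She felt a mix of fear and excitement. Maybe, just maybe, it would work."
    print(count_slop(sample))
```

Running the same counter over each model's output for the same prompt gives a crude but repeatable way to compare them.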
lower your temp and/or check samplers
>I recommend pairing it with Min-P (0.02) and DRY (multiplier 0.8), with all other samplers disabled.
I had that issue with some models after I banned words in ST. I had shit like "embrace" filtered and it fucked up my model's ability to type out the completely unrelated syllable 'ally' (as in 'logically', 'manually'). The model would dodge it with typos like 'manuallly'
* couldn't help but feel
(couldn't help but think also happened once)
>finetuned on jondurbin/gutenberg-dpo-v0.1 and nbeerbower/gutenberg2-dpo.
Not an RP fine tune, but if there was anywhere I would have expected the Gutenberg DPOs to matter it would be a situation like this asking the model to write me a story with certain elements, but it seemed not to.
>Mistral Small Drummer's output was nearly identical to Mistral Small Instruct's.
Oh no! Drummer sisters what does this mean?!
Those datasets are 10 and 5mb each. They're nothing.
Dick preference optimization when?
It MUST have Deus Ex quotes.
It MUST test the model's ability to speak in snacklish or at least reproduce a real snacklish spelling.
You'll only have deus ex: invisible war, and you will like it.
The final form of this hobby will be waiting for the next model just because the weight aligned in a slightly different way and the writing style fixates on a different set of identical responses.
* SorcererLM-22B and Acolyte-22B were the only two that picked "Emma" instead of "Lily" as the name of the main character, whatever you want to take from that.
>didn't test NousKyver
Never heard of it.
L3 storywriter on a p40 scrap build, but I'd say nemo base or the finetunes did surprisingly well when I was a/b testing after tuning it.

I was under the impression that there's barely any off-the-shelf solution for AMD, maybe I'll revisit that. Appreciate the help anon.
Good. Me neither.
It depends what you are doing.
AFAIK there's still no official bitsandbytes support for AMD so if you want to do qlora you have to fuck around with third party forks that may or may not work and only work with the latest hardware if they do work. But for full finetune transformers support for AMD is fairly mature AFAIK. So I doubt it requires much in the way of extra steps.
These models do have video game knowledge.
It's just not well generalized into the behavior of answering trivia questions.
Did the VN translation guy test it? Nothingburger again? Would be nice to have something for Paradox Part 3.
>the only way to cancel it was to force-shutdown your entire system
skill issue
ctrl+c is a vital intervention. blocking it would be like blocking ctrl+alt+delete on a windows application.
Shush, you will break skilltroon's tiny brain with this.
nta. kill -9
But you are right. If you're gonna catch the signal, you gotta do it responsibly.
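For what "responsibly" could look like in a downloader: a minimal sketch that cleans up the partial file on Ctrl+C and then lets the interrupt through, so the user can still exit. This is not how ooba does it; `save_stream` and the `.part` suffix are hypothetical names for the example.

```python
import os
from typing import Iterable


def save_stream(chunks: Iterable[bytes], dest: str) -> None:
    """Write chunks to dest via a temporary .part file.

    On Ctrl+C the partial file is removed and the KeyboardInterrupt is
    re-raised, so the interrupt is never swallowed.
    """
    part = dest + ".part"
    try:
        with open(part, "wb") as f:
            for chunk in chunks:
                f.write(chunk)
    except KeyboardInterrupt:
        # The file is already closed here because the with-block has exited.
        if os.path.exists(part):
            os.remove(part)
        raise
    # Only a fully written file ever appears under the final name.
    os.replace(part, dest)
```

The point of the design is that the handler does its cleanup and gets out of the way, instead of blocking the exit.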
How good is Nemo for a chatbot?
8/10 it's okay
I enjoyed at least 1000 hours playing with merges and tunes of it, this new ministral 8b seems like it tops it though.
is it better than base 3.1?
3.1 is complete garbage. Unusable. 5/10
Yea I'm really liking nemotron. Might prefer it over mistral large now.
is Nemo > 3.1 70b base?

also 8b beats 70b???
Idk man, I can't run models that big. I'd assume the llama one has annoying safety things that will lecture you and a positivity bias that could ruin rp experiences by not letting bad things happen though.
Llama will definitely be more intelligent, Nemo is a little retarded and you have to wrestle with it.
How does it compare to base 3.1
The current 70B meta is the merge I am uploading right now.
what the hell is nemo good for then?
>But AI already surpassed human-level intelligence.
It can't reason, learn or understand nuance and subtlety. It can't think for itself.
Kernel 6.11.2-1 has hit Debian testing. Any anon try it yet?
Last post on the 6.11 branch in this general said it was fucked
Far more creative / 'personable'. Seems really really good at RP / creative writing which regular 3.1 was dry at.
Make sure you check prompt processing bench numbers and not just token generation numbers before you buy any Apple silicon so you know what you're getting
MYS but it's good for its size for vramlets for rp. Still gonna want a 70B+ for anything semi complicated though.
NTA but it's decent with a few swipes. And it's fast so it's fine even if it can't grasp concepts on the first try. There were several times it surprised with its creativity in stuff like dice rolls or punishing {{user}} for their actions.
i'm talking about Nemo 70b. How does it compare to base 3.1 70b?
Then don't say Nemo. Most people are going to assume Mistral Nemo.

Nemotron is really good in my testing so far.
So Nemotron basically just outperforms 3.1 70B in every way?
Still smarter than a w*man.
From what I know it's 3.1 trained further on human preference so I would assume so.
It understands anatomy.
The card is written like shit too, but it still worked pretty well.
Settings are
>temp 1
>Top K 10
>Min P 0.05
Nemo really is a god send for vramlets.
Don't get me wrong, it's not perfect and it's not magic, but it beats the hell out of Mistral 7B, Solar 10B, and the other stuff we used to use back then.
I'd hazard to say that it's as good as mixtral 8x7b at this point.
But is it? is it? 3.1... nemo.... is it??? nemotron, is it??? 3.1. .... 70b....
Just fucking TELL ME what is BETTER
Are you retarded?
Reminds me of claude's ability to do accents, pretty cool to see in local
Depends on you, retardus, maximus. Try them and you decide. And keep your opinion to yourself. Or write it with shit in your bathroom.
Anyone use datasets from https://huggingface.co/litagin to train tts models?
Llama 3.1 linearized
>It can't reason, learn or understand nuance and subtlety. It can't think for itself.
There are a lot of people who can't, and they are still considered humans by law.
you're retarded
Again, the bar for Human-Level intelligence is very low, and AI already surpassed that a while ago.
What you are looking for is something that basically rivals experts on their own fields (be it a scientific field or something more subjective like being able to detect lies and deception), and no human does that.
No, humans on average are retarded. And AI is capable of mimicking reasoning enough for us to not be able to differentiate human from AI.
>What you are looking for is something that basically rivals experts on their own fields (be it a scientific field or something more subjective like being able to detect lies and deception), and no human does that
That's not even remotely what I'm looking for.
For a model that I want to talk/RP/write with I don't care about how many tests it can pass, they're useless.
>the bar for Human-Level intelligence is very low, and AI already surpassed that a while ago.
This isn't true at all you massive nigger
What you are looking for does not necessarily imply being above or below Human-Level intelligence.
Failing at ERP unironically convinces me that a model's supposed intelligence is illusory. General intelligence is general and no amount of overfitting on benchmarks will prove otherwise.
You should do the research yourself, even for simple simulated tasks like organizing and throwing a party, most people thought that the AI was better than the humans in a blind choice test.
I repeat, most humans thought that the AI was more human than humans at simple daily tasks.
AI is replacing creative and mental jobs literally because it is better than most humans at it.
General intelligence is general, and it can ERP with you, and will do a better job at it than most humans.
What you are looking for is simply an AI that can rival your best personal experiences with ERP, which doesn't have to do with having general intelligence or human-level intelligence.
parroting instructions isn't intelligence. Guess what? You can search google and get intelligently written blog posts. That doesn't mean it understands what it's saying (modeling a world in its head where it understands how things relate to one another, how tables have physics etc)
Midnight miku is still the only model worth using btw.
If I want to learn about samplers, to better understand them and learn how to implement them, is there any recommended starting point? I get the general concept, but it's extremely fuzzy, and I don't know where to start to really understand not how to just choose them, but how to implement them.
>it can ERP with you, and will do a better job at it than most humans
No it won't. Well humans probably won't want to do it at all so it has them beat there. But anyone who's tried RP knows that even the smartest cloud AI model is prone to make extremely dumb mistakes that a human never would, the kind of mistakes that betray a complete lack of understanding, that only an inhuman mindless token predictor would make. It might be much better at stringing prose together, but that's not the same thing.
Humans parrot instructions too. And if being able to model the world in its head is enough, your fucking roomba does that.
Either way, an AI can mimic all of those things, and that's what makes it artificial, it's something made by us that imitates something natural.
Midnight Miqu is shite, even at 5bpw. Even with neutralized samplers, a tad of Min-P, and the recommended prompt templates. I fell for the Midnight Miqu meme. And largestral? Censored as fuck and even if it cooperates it's boring at the best of times when compared to Miqu. I've yet to see anyone recommend good prompt templates or settings for that dogwater.
Didn't really like it, seems dumb compared to Largestral
>extremely dumb mistakes that a human never would
You don't seem to know the dumb mistakes that a human would make. I play a lot of TTRPG, and there are a lot of people who simply fail at RP even when they are trying. They are simply unable to simulate a character that is not them and play it out.
Nemotron is rocking in RP. Give it a shot.
>Censored as fuck
Just like trying to convince another person to ERP with you. They will simply not engage with you. And at this point, I would say the AI even surpasses other humans, since the AI will have the decency and courtesy of simply not ghosting or telling you to fuck off, and they will be polite about rejecting ERP.
>Humans parrot instructions too.
Humans also do other things.
>And if being able to model the world in its head is enough, your fucking roomba does that.
No it doesn't, it follows simple geometric instructions. It does not think about the world. You're dumb
/aicg/ gods will make discount GPU paypigging a thing soon. Runpod in shambles
^ Regarding the discussion of AI RP vs. human RP
I got into AI as a cope/distraction after breaking up with my online bf. Had my head in the sand for like a year before I let that really sink in. And then took a break from AI to properly cope with my feels. Did some rebounding. Met some people. ERPing with humans is so fucking awkward now. And there's really not much to gain from that awkwardness since people just ghost each other willy-nilly these days. And the human ERP is vastly inferior. Even to like Llama-3-8B.
Not saying it's not worth exploring human companionship over an AI. But people have a real stick up their ass these days that they never used to have. But Nala will always be there for you.
Good night lmg
Good night Miku
>mid miqu
yeah actually it's perfectly named
Anonymous will always be here
At the end of prompt processing, each token has a certain probability of being the next one. A sampler is just a heuristic to trim or alter those token probabilities. As a dumb example, pick the 3 most likely tokens (say, 70%, 5% and 3%) and set all their probabilities to 70%. Now the inference software is more likely to pick any of those three instead of just defaulting to the first one (depending on other samplers). Or if you want to go for uncommon tokens, just remove the most likely one, leaving only the 5% and 3% tokens. It will break things, but it's just an example.
top-k is probably the simplest trimming sampler. Look at the implementation in llama.cpp
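Not from llama.cpp, just the dumb example above turned into a toy Python sketch so the idea is concrete. The token names and the 2% tail tokens are made up so the numbers add to 100%, and `boost_top_n`, `drop_most_likely` and `pick` are hypothetical helper names.

```python
import random

def boost_top_n(probs, n=3):
    """Raise the n most likely tokens to the top token's probability, then renormalize."""
    ranked = sorted(probs, key=probs.get, reverse=True)
    peak = probs[ranked[0]]
    out = dict(probs)
    for tok in ranked[:n]:
        out[tok] = peak
    total = sum(out.values())
    return {tok: p / total for tok, p in out.items()}

def drop_most_likely(probs, keep=2):
    """Remove the top token, keep the next `keep` tokens, then renormalize."""
    ranked = sorted(probs, key=probs.get, reverse=True)[1:1 + keep]
    total = sum(probs[tok] for tok in ranked)
    return {tok: probs[tok] / total for tok in ranked}

def pick(probs, rng=random):
    """Draw one token at random, weighted by its probability."""
    tokens = list(probs)
    return rng.choices(tokens, weights=[probs[t] for t in tokens], k=1)[0]

# Made-up distribution: 70%, 5%, 3%, plus eleven 2% tail tokens.
probs = {"the": 0.70, "a": 0.05, "his": 0.03}
probs.update({f"tail{i}": 0.02 for i in range(11)})

boosted = boost_top_n(probs)        # top three now equally likely
uncommon = drop_most_likely(probs)  # only the 5% and 3% tokens survive
print(pick(boosted), pick(uncommon))
```

Either function takes a distribution and hands back a modified one, which is all a sampler is. Real samplers usually work on logits and get chained one after another.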
Hi all, Drummer here...

His LoRA rank was 16. Is there any sense in finetuning at that rank? You'd have to compensate with a really high LR but won't you be lobotomizing the model at that point? Am I wrong? Anyone a LoRA expert?
>people have a real stick up their ass these days that they never used to have
I think folks forgot how to interact with each other.
Yeah. The past 4 years have wreaked havoc on social conventions. Back in the day people knew how to do a proper back and forth and put in effort
Like for literally anything. And if I try to talk about my interests, and they don't happen to be that person's exact laundry list of interests I might as well just be retching up a dead kitten in front of them because that's how they react.
Yes, it's better.
>>102854542 (cont)
Don't be spooked by the length of the function. Most of it is just sorting tokens. The actual sampler is exactly one line:
>cur_p->size = k;
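Same idea as a Python sketch for anyone who doesn't read C++: sort the candidates by logit, then truncate. This is not the llama.cpp code. The function names are hypothetical, and clamping `k` to the number of candidates is my own assumption about the edge cases.

```python
import math

def top_k(logits, k):
    """Keep the k highest-logit candidates, sorted in descending order."""
    k = max(1, min(k, len(logits)))  # assumption: clamp k to a sane range
    ranked = sorted(logits.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:k]  # the truncation step, analogous to `cur_p->size = k;`

def softmax(candidates):
    """Turn the surviving (token, logit) pairs into probabilities."""
    peak = max(logit for _, logit in candidates)
    exps = [(tok, math.exp(logit - peak)) for tok, logit in candidates]
    total = sum(e for _, e in exps)
    return {tok: e / total for tok, e in exps}

logits = {"a": 2.0, "b": 1.0, "c": 0.5, "d": -1.0}  # made-up values
kept = top_k(logits, 2)
print(softmax(kept))
```

Everything outside the top k simply stops being a candidate, so the remaining probabilities get renormalized over the survivors.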
>that's how they react.
Do you find that it's a generational thing or across the board?
But yup LLMs don't have this problem
nemotron 70b is sentient

Though I think Nvidia threw some more programming stuff into it, because on one of my coding tests (one of those "it gets it wrong, then you tell it the problem, and after that it gets the fix correct" questions) it's catching the tricky part right away.

Letting me down on pop culture, though.
NTA but Muv luv, Steins;gate
>Ollama's integration with Hugging Face Hub
But what about ollama's walled garden? Won't somebody think of the investors?
Kind of generational. People under 30 seem incapable of committing to any degree of personal relationship. People over 30 are just so jaded that they don't even try.
Hi Drummer.

You're mostly correct, a small rank usually means you'll want to bump the LR by a bit, but in rare cases it's fine without that. This seemed to not be one of those cases, however.
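For a sense of scale, here is the arithmetic on how little a rank-16 adapter actually trains. This assumes the standard LoRA setup (a frozen weight W plus a low-rank update B·A scaled by alpha/rank) and a hypothetical 4096x4096 projection. None of these numbers come from the tune being discussed.

```python
def lora_trainable_params(d_out, d_in, rank):
    """A is rank x d_in and B is d_out x rank, so the adapter has rank * (d_in + d_out) params."""
    return rank * (d_in + d_out)

def lora_scale(alpha, rank):
    """Multiplier applied to the B @ A update in the standard LoRA formulation."""
    return alpha / rank

d = 4096  # hypothetical hidden size
full = d * d
for rank in (16, 64, 256):
    n = lora_trainable_params(d, d, rank)
    print(f"rank {rank}: {n} trainable params, {100 * n / full:.2f}% of the full matrix")
```

At rank 16 that is under 1% of the matrix, which is the context for the LR question: the update has very few degrees of freedom to work with.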
do the 3b test
3B isn't open.
wasnt it literally the case with mixtral
So was he wrong bros?
touhou, project moon games, diablo, wow, league, gothic, the witcher, divinity games, tes
choose any you want
I've decided to go back to making unholy merges. I even put a pony on the model card to assault your fragile masculinity.
>more snakeoil
Thanks retard?
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
>Large language models (LLMs) excel in various tasks but face deployment challenges due to hardware constraints. We propose density-aware post-training weight-only quantization (DAQ), which has two stages: 1) density-centric alignment, which identifies the center of high-density weights and centers the dynamic range on this point to align high-density weight regions with floating-point high-precision regions; 2) learnable dynamic range adjustment, which adjusts the dynamic range by optimizing quantization parameters (i.e., scale and zero-point) based on the impact of weights on the model output. Experiments on LLaMA and LLaMA-2 show that DAQ consistently outperforms the best baseline method, reducing perplexity loss by an average of 22.8% on LLaMA and 19.6% on LLaMA-2.
new day, new quant method. didn't mention QuIP# and from memory should be worse. only perplexity metrics, on which it outperforms GPTQ/AWQ. only tested on LLaMA 1/2. no data on how long quantization takes, but from the brief mention of quant time it seems it can be parallelized, so probably much quicker than QuIP#. posting for anyone who wants to mess around with quants
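The "scale and zero-point" in the abstract are the usual uniform quantization parameters. Rough sketch of plain min/max asymmetric quantization below, to show what those two numbers do. This is NOT DAQ: there is no density alignment, no learned range, and it uses an integer grid rather than the floating-point one the paper targets. Function names and weights are made up.

```python
def quant_params(weights, bits=4):
    """Pick scale and zero-point so [min, max] of the weights spans the integer grid."""
    qmax = 2 ** bits - 1
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / qmax
    if scale == 0:
        scale = 1.0  # all weights identical, any scale works
    zero_point = round(-lo / scale)
    return scale, zero_point, qmax

def quantize(weights, scale, zero_point, qmax):
    """Map floats to integers in [0, qmax], clamping anything out of range."""
    return [min(qmax, max(0, round(w / scale) + zero_point)) for w in weights]

def dequantize(q, scale, zero_point):
    """Map the integers back to approximate floats."""
    return [(v - zero_point) * scale for v in q]

weights = [-1.0, -0.5, 0.0, 0.5, 1.0]  # made-up values
scale, zero_point, qmax = quant_params(weights)
q = quantize(weights, scale, zero_point, qmax)
restored = dequantize(q, scale, zero_point)
print(q, restored)
```

Methods like the one in the abstract are about choosing the range (and so the scale and zero-point) more cleverly than a plain min/max.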
Just keep esl vidya out.
Benchmarks attract data to new models/tunes, and we don't want esl data and horrific translations ruining our future models.
Based. I was waiting for somepony to do this. Nice GOD trips, btw.
Always worth a try, so why not. Nice trips btw.
The scenario ends 2 seconds later, with me holding a new 4090, and {{char}} leaving, dejected.
Nemotron is overly flowery. I've never used Claude, but is that what it would've felt like?
ara ara youre so cute when youre shy
nemotron 70b.
funny it didnt mind giving me a teenage schoolgirl and gives her a vibrator.
But really tries to mess with the direction of the story as it would get more fucked up.
>XXX vs. PG-13: While aiming for an XXX rating, I prioritized suggestive, sensual scenarios over explicit content, allowing for your imagination and future interactions to guide the explicitness.
I miss the times when you didnt even need to prompt something.
In the beginning chatgpt knew I wanted a horror story without even explicitly prompting it. Reading between the lines.
Now instructions are downright ignored.
>I prioritized suggestive, sensual scenarios over explicit content
which results in 'as you pull down her panties, you can see her most intimate area'
No weights available? I was hoping to use it as a draft model
It's to keep you safe, freak.
You still find the v1.1 to be best? Not any of the other versions?
More one punch man and one piece knowledge would help tatsumaki and nami cards.
this is the most SEAmonkey and/or lantinx post in the entire thread by a mile
you can either apologize and promise not to indulge in your chimp tendencies ever again, or leave
do we have any sort of guidelines as to what to look for in tts sample snippets, some experience from when elevenlabs wasnt shit?
Why is the qwen2.5-14b-instruct okay with NSFW but the 32b version is anal about it?
Too dumb to know it shouldn't be ok with it.
First impression of L3.1 Nemotron Instruct (at Q6K):

Coding: It was good but not great at my Python checks, and it wasn't fooled by my tricky Java check. Needs more testing when I have dev time but it's on par with my go-to choices right now.
Music theory: Passed.
Culture: Tested some fictional characters (e.g. Pokemon) and it seemed to know character roles but not descriptions of appearance etc. Boo.
RP: Prefill dodged the refusal but it will virtue signal along the way, and it seemed to be tuned for 0-second attention spans. (Character's current goal is to deliver a MacGuffin, L3.1N writes: But first, she remembered that she needs to deliver MacGuffin to her friends. "Anon, I'm going to go give the MacGuffin to our friends.") It also forgot the existence of a room that it was just in, one adjacent to the room the character is standing in right now, and decided that it would look for such a room. Really bad, and the constant narration of "This is what I want to do and now I am going to do it. I do that" is grating. I wonder if that's some Chain of Thought style bullshit in the Nemotronification seeping out. Even simple tests like 9.9 versus 9.11 had it elaborating how math works till I told it not to show its work. But it didn't say anything barely above a whisper in a saved RP at the point that L3 normal did, so it gets a point for that.

Probably a good alternative to L3 for less/different slop, and might continue to prove itself for rote productivity Q&A, where its habit of explaining at length is useful albeit time consuming if you're a System RAM guy like me. But Creativity is probably a downgrade versus abliterated/RP tuned L3's; better word choice but it writes like a sovlless robot.

I'm curious how Reward will perform, but right now I'm finding only Q2K, Q3K, and Q8_0, so waiting on a poorfriend quant.
I haven't tried the other versions.
Guess I should.
But yeah, I do find 1.1 to be really fucking good in comparison to mini-magnum, lyra, celeste, etc.

