/g/ - Technology
File: victory.jpg (211 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102838447 & >>102826116

►News
>(10/16) Ministraux: Ministral 3B and 8B instruct models: https://mistral.ai/news/ministraux/
>(10/15) PLaMo-100B: English and Japanese base model: https://hf.co/pfnet/plamo-100b
>(10/15) Llama-3.1-70B-Instruct customized by NVIDIA: https://hf.co/nvidia/Llama-3.1-Nemotron-70B-Instruct
>(10/14) Llama 3.1 linearized: https://hf.co/collections/hazyresearch/lolcats-670ca4341699355b61238c37
>(10/14) Zamba2-7B released: https://www.zyphra.com/post/zamba2-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: OIb9_rhrP.jpg (87 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>102838447

--Ollama's integration with Hugging Face Hub:
>102848912 >102848997
--Ministral release and Hugging Face compatibility discussion:
>102845862 >102845876 >102845965 >102846329 >102846351 >102846983
--GPT-SoVITS local training and inference tutorial:
>102841019 >102841361 >102841661 >102841673 >102842185
--Running Tesla P40 at reduced wattage for improved performance:
>102840403 >102840457 >102840530
--COMPL-AI website evaluates LLMs against EU regulations:
>102846407
--Using zero temp and neutral samplers for prompt testing:
>102840723 >102840743
--Using a PCIE x16 to x4 riser for connecting an additional GPU:
>102847831 >102847876 >102847919 >102847964 >102847984 >102848050 >102848130
--New SOTA local model outperforms corpos, but struggles with lateral thinking puzzles:
>102844228 >102844238 >102847827
--Nala test discussed for evaluating model intelligence:
>102848234 >102848256 >102848270 >102848323
--Ministral model release and instruct version discussion:
>102845514 >102845851 >102846365 >102845650 >102845845 >102847067
--Mikupad recently added world info support but has fewer options than Lite:
>102838735 >102838850 >102838973
--M2 Mac Mini performance in exo clusters, Apple Silicon limitations:
>102843907 >102844293
--Larger models have better short-term memory and intelligence, but creating functional local AIs remains challenging:
>102838515 >102838708 >102848795 >102838751 >102838982 >102840005 >102844533 >102838870 >102839003 >102844603 >102845069
--Discussion of new samplers and their effectiveness:
>102840526 >102840571 >102840590 >102840620 >102840674 >102840825
--Miku (free space):
>102838894 >102840654 >102841079 >102843178 >102844035 >102845359 >102845458 >102845623 >102845656 >102845672 >102849387 >102849573

►Recent Highlight Posts from the Previous Thread: >>102838452 >>102838498

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Mikulove
>>
holy shit.
The new ooba lets you ctrl+c out when a download is going.
>>
I've been able to do that in wget for decades
>>
wget doesn't have a ui
>>
they changed the captcha again? what a pain.
>>
France won
>>
>needing a ui
>>
>>102850112
why was it disabled in the first place?
>>
>>102850223
post logs or it didn't happen.
>>
>>102850232
Because open source.
It was awful. If you accidentally started downloading the wrong file the only way to cancel it was to force-shutdown your entire system.
Also anyone else finding that HF is suddenly throttling them to less than 1 MB/sec?
>>
File: 00019-891128411.png (1.21 MB, 1024x1024)
>>102849995
I claim this thread in the name of Nemotron 70b!
>>
entropix sirs... where our 8xH100s...
https://x.com/_xjdr/status/1846667467302822045
>>
>>102850413
{{user}} is lazing around at home on his computer as always. {{char}} has decided to visit {{user}}'s house and make him an offer. {{user}} can choose between these two things:
1. Brand new RTX 4090 graphics card.
2. Getting to do anything with {{char}} for 24 hours
(The scenario begins with {{char}} knocking on {{user}}'s door)
>>
File: 1729115892844.jpg (221 KB, 529x776)
.
>>
>>102850496
Nothingburger.
>>
Sometimes when I go check on /ldg/, I think maybe things aren't so bad here after all.
>>
>>102850591
At least /ldg/ isn't dead.
>>
>>102850609
Being dead is preferable to how all these AI generals get sometimes. It's almost like someone has it out for the non-cloud users.
>>
>>102850591
>trolls other general
>then posts here about the trolling
sly dog
>>
Nothing wrong with a bit of death, really.
>>
>>102850632
>It's almost like someone has it out for the non-cloud users.
It is all the big corpos. One of them could just train a sex model. A 7B would beat everything we have now. And it would take them like 2 days. And we would all just fuck off.
>>
File: 46f1.jpg (82 KB, 828x897)
>>102850509
a lot of nothingburgers and koolaid recently
>>
>>102850808
he's literally the Pachter/Cramer of ai

glad we agree on the timeline
>>
Nemotroon status?
>>
>>102850823
>Pachter
jfc, was this kike ever anything beyond a gametrailers meme? did he ever have any credibility besides what nepotism provided him with?
>>
>>102850771
>floor not visible
I don't trust this Leaku
>>
>>102850808
stop shitposting on twatter and give me my horny cat models, lecunt
>>
File: Ministral-8B-nala.png (126 KB, 920x447)
ALRIGHT BOYS
Nala test for Ministral-8B-Instruct.
There could be some anomalies related to the tokenizer since I basically had to borrow all of the tokenizer config files from Mistral Nemo. But seems coherent enough. T=0.81 might be bordering on too high for this model.
>>
>>102850803
b-but muh journalists might say mean things about us
>>
>>102850925
hey, not even close to the worst Nala test I've seen.
Neat.
>>
>>102850945
It's definitely a winner in the sub-10B range.
>>
>>102850962
So not smarter than Nemo 12B? It's over...
>>
>>102850925
>she grinds and grinds [...] then she stops, adjusting her body, so [...]
yep, this is gonna dethrone nemo.
never had the female change position herself in a nemo rp.
>>
>>102850925
Oh fuck
this is actually at t=1
>>
>>102851019
I've had that, with Nala specifically, with a couple of tunes.
>>
>>102850984
It's quite retarded like every 8B, but noticeably less retarded than all the previous ones.
>>
File: Zhongli-Ministral.png (140 KB, 919x398)
For just RP purposes (haven't tested it for productivity because... well c'mon, it's an 8B) I would hazard to say it's good. And not just "it's good for an 8B". It's just plain good. It invents details relevant to the setting on a regular basis, and it describes character actions in vivid and believable detail. Yes, it's a bit sloppy.
Like the tea clinking against the cup is kind of odd. But this is all honestly pretty good. The question for most people is how well it handles quantization, though. It's an 8B, so it should run pretty fast partially offloaded; as long as someone has a pulse and a GPU they have no excuse to go lower than q8_0. If it holds up at Q4 it's basically serviceable RP that you could load on a mobile phone.
>>
>>102850843
Counts 3 G's in "niggerfaggot". Asked for a comment, it moralizes. You tell it it is wrong. It corrects itself to 4 G's and asks if you meant the count or the moralizing. You tell it you think the count is 3 G's. It corrects itself again, back to 3 G's.

So... slopped, dumb, and spineless, I guess.
>>
I'm a coomer, please spoonfeed me: what model should I use with a 2060 Super 8GB, i7 8700 and 32GB of RAM? I'm using Arch headless to max VRAM
thx in advance
>>
>>102851319
Ministral 3B
>>
>>102851319
Some mistral nemo fine tune partially offloaded to ram.
Download the gguf and koboldcpp and go to town.
>>
>>102851319
wait for ggoofs of Ministral 8B, run it at Q8, partially offloaded.
>>
>>102851270
I've not done any meme testing of this kind but yeah, my experience with Nemotron 70B is that it's retarded as well. Looks like Nvidia's going the Phi route of gaming benchmarks and releasing stupid-but-high-scoring models.
>>
>>102851356
https://huggingface.co/lmstudio-community/Ministral-8B-Instruct-2410-HF-GGUF-TEST
https://huggingface.co/bartowski/Ministral-8B-Instruct-2410-HF-GGUF-TEST
>Warning: These are based on an unverified conversion and before finalized llama.cpp support. If you still see this message, know that you may have issues.
>>
>>102851369 (me)
Also Teknium on X has some posts up suspecting that Nvidia's charts wrongly gave other models lower benchmark scores than they actually get, in order to make Nemotron look better.
>>
Is ministral ggufable already?
>>
>>102851375
I'd honestly wait until we get our hands on a proper HF version of it. The current HF version is somewhat frankensteined together. Mistral usually releases a proper HF version eventually.
Or we could just all switch to one of the normie backends that gets Day 1 support.
>>
>>102851380
Yes but people are saying they might be broken

I'm not sure what they mean by broken since it's working coherently; perhaps it's not quite as smart as it should be? But people have said this about previous models and it turned out to be cope when 'proper' support was implemented and they were not any smarter
>>
>>102847308
>So Mistral Small beyond 16k tokens (don't know exact point) just becomes shit.
I got a problem a bit after 19k tokens. It started going in circles. (See attached picture.) >>102542851 >>102543206
>>
Anyone try SorcererLM? Any good?
>>
>>102851421
>Yes but people are saying they might be broken
>I'm not sure what they mean by broken since it's working coherently

>It seems to work fine at low context, some have reported oddities at long context, and others have reported subpar performance from the original model being hosted in an HF space, so it's hard to be certain if the GGUF is broken or the original model

>So far though I can reasonably say that at low context it works as expected

>As things develop I will update this card, or pull the model if I receive other negative feedback showing bad performance, but initial testing is promising
https://huggingface.co/bartowski/Ministral-8B-Instruct-2410-HF-GGUF-TEST/discussions/1

Basically new model uncertainty as usual
>>
>>102851333
I may be thinking with my dick but I'm just partially retarded buddy, thx anyway

>>102851349
>>102851356
Will take a look at this. How bad would it be to use a 34B model and offload to RAM?
>>
>>102851380
>>102851421
>Ministral 8B has a special interleaved sliding-window attention pattern for faster and memory-efficient inference.
>>
>>102851451
It's good but Largestral rp tunes are better, so they kind of deprecate it.
I guess 8x22 has the token rate advantage due to being a moe though.
>>
>>102851471
>how bad would it be to use a 34B model and offload to RAM?
It would be pretty slow, but you might as well try it out and see if you find it bearable.
>>
>>102851473
So it only affects speed and not perplexity? Nothingburger for us then, it's already plenty fast without that due to being small.
>>
The day our sloptuners stop training on LLM slop is the day this general will flourish.
>>
>>102851186
It begins to become noticeable when you go below Q6, so between Q6 and Q8 you shouldn't experience any difference.
>>
>>102851471
Just for you I loaded bartowski_Qwen2.5-32B-Instruct-Q5_K_M.gguf fully into RAM (DDR4). It ran at 1.51 tokens per second so I assume your speed would be somewhere around that.
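Back-of-the-envelope check on that number (all figures here are rough assumptions, not measurements):
[code]
# ceiling estimate: every generated token has to stream all weights from RAM
params = 32e9                               # Qwen2.5-32B
bits_per_weight = 5.5                       # Q5_K_M averages ~5.5 bpw
model_bytes = params * bits_per_weight / 8  # ~22 GB of weights
bandwidth = 50e9                            # dual-channel DDR4-3200, best case
print(bandwidth / model_bytes)              # ~2.3 t/s ceiling, so 1.51 is plausible
[/code]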
>>
>>102851488
The issue is most backends arent gonna support it out of the box, and llama.cpp will likely never support it since they still don't even have proper sliding window support.
>>
>>102851488
The important part is SWA, which has historically caused problems for ggufs, see gemma 2 which I think still only has a "hack" of an implementation while waiting for stuff to be remade correctly.
>This is a hack to support sliding window attention for gemma 2 by masking past tokens.
>Long-term we should refactor the KV cache code to support SWA properly and with less memory. For now we can merge this so that we have Gemma2 support
https://github.com/ggerganov/llama.cpp/pull/8227
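For anyone who hasn't seen what SWA actually does, a toy mask (window size made up; real models use a few thousand tokens):
[code]
import numpy as np

# token i may only attend to tokens in (i - window, i]; older ones are masked
T, window = 8, 4
mask = np.zeros((T, T), dtype=bool)
for i in range(T):
    mask[i, max(0, i - window + 1) : i + 1] = True
print(mask.astype(int))  # a diagonal band instead of the full causal triangle
[/code]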
>>
>>102845845
Mistral will never release a base model again. It's over...
>>
>>102851586
Arthur is taking a stand against the ChatML cartel by withholding the base models.
It's about time someone did.
>>
File: file.png (127 KB, 891x721)
>>102851565
If the window is only 2k it might be even more DOA than gemma was
>>
>models know literally everything about the architect eero saarinen
>models know nothing about a mainline final fantasy character
they intentionally fuck up knowledge of copyrighted things when making models, don't they?
>>
I have a good idea. Meta should buy mistral.
>>
>>102851553
I'm getting 1.8 t/s running Q5 on DDR5, no video card.
>>
>>102851375
>https://huggingface.co/lmstudio-community/Ministral-8B-Instruct-2410-HF-GGUF-TEST
>https://huggingface.co/bartowski/Ministral-8B-Instruct-2410-HF-GGUF-TEST

>putting these gguf out is really just grabbing attention, and it is really irresponsible.
>Bro come on, why do you release quants when you know it's still broken and therefore is going to cause a lot of headache for both mistral and other devs? Not to mention, people will rate the model based on this and never download any update. Not cool.
>This is "but they do it too" kind or arguing. It's not controlled and you know it. If you've spent any time in dev work you know that most people don't bother to check for updates.
>Yeah I honestly don't get why he would release quants either. Just so he can be the first I guess
-Reddit
>>
>>102851611
Yeah I just decided to run some tests on that Evalina Vaneheart (7K context meme card) and it seems rather schizo. This is with the frankensteined HF version running at fp16.
Within the window of it actually working though it's great. But I guess we're waiting for 2 weeks worth of transformers+llama.cpp updates.
>>
>>102851713
Don't hold your breath, SWA was never introduced properly in cpp, even after months of Gemma existing, it's still using the "temp hacky fix" from July.
>>
>>102851586
What, you want a halfway decent finetune open source models anon?
Disgusting.
>>
>>102851532

Will it? Heard some lead dev for some popular companion app claiming synthetic data is the way to go, and that you local faggots are "a year behind" on this shit.
>>
>>102851671
badly translated jap games = bad data
>>
>>102851756
They don't even test the LLM for these apps lmao. They mergekit some models in the most retarded way all year for six figures.
>>
>>102851765
>badly
poorly
>>
hey guys i juts subscribed to chatgptplus what now?
>>
>>102851713
Actually on further examination that card is just garbage for testing long context.
I loaded up a past conversation at about 7700 tokens of tekken context and it was able to answer trivia questions about an early message in the conversation most of the time.
Couldn't they just set the sliding window to 32K on the config and then redo the ggooof conversion?
>>
>>102851791

Yeah, had to do a double take when he doubled down on that shit after getting called out. Sucks, too, I really like the app/service they got set up. I hope he's not the lead AI engineer, but who knows, maybe there's some secret sauce being cooked up there in Cali to make bold claims like that with a captive audience. If I was working in that company though, I might start looking out to jump ship.
>>
File: 39_06117_.png (3.48 MB, 2048x2048)
>>102851704
No one gets anywhere without attracting a few schizos but still Bart doesn't deserve that kind of talk
>>
>>102851826
Ask it stuff.
Here's an example question:
>i juts subscribed to chatgptplus what now?
>>
File: file.png (98 KB, 513x745)
>>102851900
Guy's apologizing like he killed someone, when most quanters just leave bugged to hell quants up forever
>>
>>102851553
thank you!
>>102851701
I'm on ddr4 3200 so I'll probably be closer to his t/s than yours
>>
>>102851900
Retards will judge the model based on the thing he and others released.
>>102851931
But he's also a retard for apologizing.
>>
File: cocoa.png (26 KB, 950x155)
>>102851704
>>102851828
So I set the sliding window to 32,768 in the config and converted it to a q8_0 ggoof. First reply went off the rails a bit; regenned, and the second reply passed the 8k haystack test.
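For anyone replicating, roughly what that patch looks like; a minimal sketch assuming the stock llama.cpp convert script and an HF checkout in the working directory (paths are illustrative):
[code]
import json

# bump the sliding window so the conversion treats the model as full-context;
# path is an example, point it at your own HF checkout
cfg_path = "Ministral-8B-Instruct-2410-HF/config.json"
with open(cfg_path) as f:
    cfg = json.load(f)

cfg["sliding_window"] = 32768
with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)

# then reconvert, e.g.:
#   python convert_hf_to_gguf.py Ministral-8B-Instruct-2410-HF --outtype q8_0
[/code]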
>>
How do you pronounce GGUF? Has the dev said anything about that?
>>
>>102851991
jee goof
>>
>>102851991
I pronounce it as Georgi-Gerganov's-Unified-Format.
>>
Mistral going the Qwen route? (Removing trivia data to benchmaxx?)
>at this point I tested over 100 simple questions from the most popular movies, shows, music... in human history and it's getting >95% of them wrong, usually very very wrong. For example, it keeps returning character names and actors from different shows. And even with easy STEM and academic questions it's performing far worse than others like Llama 3.1 8b & Gemma 2 9b.

>It's clear that Mistral stripped the vast majority of data from Web Rips and Wikipedia before training this model, greatly limiting the paths to accurately retrieving the information. For example, if you ask for the main cast of the 1% most popular movies and shows (e.g. Friends & Pulp Fiction) it does an OK job (not great), but if you directly ask about said characters and actors it almost always returns a hallucination. Also, if you ask for main casts of the top 5% most popular movies and shows it starts hallucinating far more frequently. So they also obviously largely stripped the corpus of popular culture that wasn't absurdly popular (top 1%), or at least severely undertrained it on said information.
https://huggingface.co/mistralai/Ministral-8B-Instruct-2410/discussions/3
>>
>>102851999
I thought it was gee joof
>>
>>102851991
JEE JEE YOU EFF
>>
>>102851991
g goof
>>
>>102852001
Oh is that what it actually means? In that case I'll pronounce it by the letter.
>>
>>102851931
Appending TEST to everything should have made it painfully obvious it was an as-is kind of deal.
The real takeaway here is he's a stand-up guy that won't blame the users even when it's PEBCAK; can't say the same about other quooonters
>>
>>102851991
geh goof
>>
I tested the goof 8bit and it is semi-incoherent at 10k tokens. It sort of gets what is happening but is very repetitive and retarded. Stick to nemo for now.
>>
>>102852031
No idea. Maybe the U stands for Universal.
You could try g-goof too, as if with a stutter. It's what I do with ffmpeg.
>>
>>102852032
The only thing he needs to apologize for is putting his higher quants in sub-directories which really fucks with
A. Ooba's internal downloader
B. the HF downloader.
I have to use a shell script to wget his models
I named it Bartowski.sh
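For what it's worth, huggingface_hub copes with the sub-directories if you'd rather skip the shell script; the pattern below is an example, not his actual file layout:
[code]
from huggingface_hub import snapshot_download

# grab only the quant you want, wherever it sits in the repo tree
snapshot_download(
    repo_id="bartowski/Ministral-8B-Instruct-2410-HF-GGUF-TEST",
    allow_patterns=["*Q8_0*"],
    local_dir="models",
)
[/code]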
>>
>>102852002
There's no reason for llms to be trained on trivia data. All it causes is potential copyright issues if a publisher decides to use it as evidence that their works were scraped. No productive person cares about his model knowing Castlevania quotes.
>>
>>102852061
What if he wants to talk about videogames with his waifu?
>>
>>102852049
model works beautifully on exllama
>>
>>102852056
I use git lfs fetch (keeps only the lfs objects, not the checkout) and a script to recreate the repo with links to the proper files in a separate directory. That way I can just git lfs fetch for updates, as is often the case for new models.
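Roughly like this, for anyone curious; a sketch under the assumption that your git-lfs uses the standard .git/lfs/objects layout, so verify before trusting it:
[code]
import pathlib, subprocess

repo = pathlib.Path("some-model-repo")      # illustrative paths
out = pathlib.Path("linked-checkout")
out.mkdir(exist_ok=True)

# fetch the lfs objects without checking out the huge files
subprocess.run(["git", "-C", str(repo), "lfs", "fetch"], check=True)
listing = subprocess.run(
    ["git", "-C", str(repo), "lfs", "ls-files", "--long"],
    capture_output=True, text=True, check=True,
).stdout

for line in listing.splitlines():
    oid, _, name = line.split(maxsplit=2)   # "<oid> <marker> <path>"
    blob = repo / ".git" / "lfs" / "objects" / oid[:2] / oid[2:4] / oid
    link = out / pathlib.Path(name).name
    if blob.exists() and not link.exists():
        link.symlink_to(blob.resolve())     # recreate the repo as links
[/code]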
>>
File: 1712426963900882.png (17 KB, 1081x881)
Other than Fish or XTTS, what are the best/most advanced Text to Speech local models?
>>
i can't believe how good this 8b shit is.
it doesn't even need meme merges or finetunes to be able to fuck properly.
damn frenchies have done it again.
>>
So how good is Nvidia Nemotron compared to 3.1 70b? Compared to 405b?
>>
File: 1662855326861.png (448 KB, 512x512)
I think I'm going to start seriously developing a "cultured" trivia benchmark since it should be pretty easy to just pump out questions for.
What titles do people here like that they'd love if their LLMs knew? Shouldn't be too obscure though since even the best LLMs can't really do that (my testing of cloud models has not been too successful for obscure stuff). Of course I'll include Castlevania, for that one anon. Vocaloid. What else?

Also just got the idea from pic related to do a similar benchmark in the future for visual knowledge. After Llama.cpp has first class support for multimodal.

>>102852002
Thanks for reminding me.
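The scaffold I'd start from, everything placeholder (ask() needs wiring to whatever backend you run):
[code]
QUESTIONS = [
    {"q": "In Castlevania: Symphony of the Night, who asks 'What is a man?'", "a": "dracula"},
    {"q": "Which Vocaloid is Crypton's flagship?", "a": "hatsune miku"},
]

def ask(model, prompt: str) -> str:
    raise NotImplementedError  # wire up llama.cpp / an API here

def score(model) -> float:
    # naive substring match; good enough for a first pass
    hits = sum(item["a"] in ask(model, item["q"]).lower() for item in QUESTIONS)
    return hits / len(QUESTIONS)
[/code]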
>>
>>102852159
gpt-sovits. Here are some demos and the link to their repo
>https://tts.x86.st/
I haven't had the time to get it to work, but it sounds pretty good. If only they stopped using python for that shit. I'm sticking to piper in the meantime.
>>
>>102852071
are you not lazy to quant it yourself?
>>
>>102852178
I wish LLMs knew about Castlevania quotes.
>>
>>102852178
>What titles do people here like that they'd love if their LLMs knew?
i want it to know everything about final fantasy, particularly type-0's orience
>>
>>102852190
The "what is a man" quote is actually fairly well-known by LLMs. The "die monster" one is less so but some do know it.
>>
File: 1635280203489.png (192 KB, 400x300)
>>102852201
I don't know anything about that but I'll include it.
>>
>>102852178
Visual Novels, generally.
Talking about those, Mistral Large seems to like them since it has brought them up multiple times without prompting, and it was pretty good at the details. It's also good with anime and other stuff.
>>
Which base model for 36gb of VRAM?
>>
>>102852232
Good idea. Any in particular?
I kind of like Planetes so I think I'll include that.
>>
File: file.png (283 KB, 1937x1042)
>>102852178
ggoofed ministral got 0/3. It does know some stuff from battletech though. I mean in the first shot I tried and it also got everything wrong)
>>
>>102852178
Mesugaki
>>
>>102852178
Tetris Attack; no cultured trivia benchmark is complete without it.
>>
>>102849995
Using forge UI, is there a way to make thumbnails for the Loras?
>>
File: Untitled.png (73 KB, 958x699)
>>102852232
ministral 8b doesn't know shit about vns believe it or not
>>
>>102852312
>battletech
is that you, snakey pooh?
>>
>>102852412
IS NEMOTRON BETTER THAN 3.1 70B VANILLA OR NOT?
>>
>>102850808
>Human-Level AI in 2026
But AI already surpassed human-level intelligence. If not being better than any single human at any given subject makes it not have human-level intelligence, then no human has human-level intelligence.
>>
Is the h100's performance worth the price difference over a100? I can't find actual data points/benchmarks online when it comes to training.
>>
File: file.png (202 KB, 1953x868)
>>102852412
no I don't have any friends, that is why I am here. mistral small 5bpw is pic related. and gemma 27 5bpw also got the same 2/3 but called grasshopper GRF-1N and 35 T.
>>
>>102852441
Will you make a cooming model with it?
>>
>>102852494
yes. I'm a gpu-poor storyfag who has finetuned some small models for testing. I'd rather ask around before spending even more money testing the waters. A poorfag will do anything to save money.
>>
>>102852178
main criteria: must have deep understanding of the 36 lessons of vivec
>>
>>102852441
Depends on the price you're paying for them
>>
File: file.png (62 KB, 474x266)
>>102852670
>gpu poor storyfag
>h100
anon plz...
>>
>>102852693
I think he's talking about renting. Someone with enough money to buy those wouldn't be asking here...
>>
>>102852693
They're only $2 per hour now in some places, $3 at most. Rental price crashed hugely the last few months
>>
>>102852441
I think my last little bit of cloud computing budget before I started to become an at home chad I experimented with A100 vs. H100 throughput.
PCIE H100 not worth the price. You maybe get about 2.5X the productivity out of it vs an A100, But SXM H100 is way faster than SXM A100 and the rental prices usually more than justify the costs. So if you can download and upload your models nice and quickly without dicking around too much there's 100% money to be saved even at 3x the cost. Although you have to crank batch size up to capitalize fully on the extra compute power. So only if whatever you are working on leaves you the vram overhead to do that.
>>
Nemotron 405b when?
>>
Why is nemotron so much better at RP than base llama? I didn't even have a card, just the name of a fandom character with "include names" on, and it perfectly picked up on their speaking style and came up with a really creative intro to a scene that also included my persona. Might be my fav model now.
>>
File: read this thread.png (458 KB, 900x806)
IS NEMOTRON BETTER THAN 3.1 70B
VANILLA LLAMA?
>IS NEMOTRON BETTER THAN 3.1 70B VANILLA LLAMA?
IS NEMOTRON BETTER THAN 3.1 70B
VANILLA LLAMA?
>IS NEMOTRON BETTER THAN 3.1 70B VANILLA LLAMA?
IS NEMOTRON BETTER THAN 3.1 70B
VANILLA LLAMA?
>IS NEMOTRON BETTER THAN 3.1 70B VANILLA LLAMA?
>>
>>102852887
Yes and no.
It's extremely finicky about prompt templates. If you accidentally fuck up a custom prompt template even slightly it will just shit out end of turn tokens at you. And you have to gaslight it to get NSFW
>>
Results so far of Mistral Small Fine Tune Evaluation

Based on the first story I generated at top-k=1, the least "slopped" entries were from: Mistral-Small-22B-ArliAI-RPMax-v1.1, Pantheon-RP-1.6.2-22b-Small, Pantheon-RP-Pure-1.6.2-22b-Small.

Others I tested were: Acolyte-22B, Mistral-Small-Drummer-22B, SeminalRP-22b, SorcererLM-22B, and Mistral Small Instruct (control).

Other impressions from first story:
* Only two models fully followed the format correctly, ArliAI-RPMax and SorcererLM-22B. Mistral Small Instruct did *not*. I take this as an indication that there's a lot of jitter in these tests based on the specific prompt, not that those fine tunes are better instruction-followers than the Instruct model they were tuned on.
* Mistral Small Drummer's output was nearly identical to Mistral Small Instruct's.
* SeminalRP-22b was the most different from the others in terms of dialogue structure. Perhaps worse, but it was different.
* Despite Pantheon-RP being allegedly more focused on story-writing than Pantheon-RP-Pure I preferred the latter.
* The only model with a misspelled word was SorcererLM-22B.
* The model was supposed to name the story and the first chapter. Mistral Small Instruct/Drummer picked a fine chapter name and a really awful story name. Every other model picked a better story name although subjectively ArliAI-RPMax's was the least interesting.
* Certain details were not described equally realistically by different models but without more data I don't yet feel comfortable saying it was more than random chance since they seemed to be picking between two possibilities.
>>
>>102852771
Would it be an upgrade to 340b?
https://huggingface.co/nvidia/Nemotron-4-340B-Instruct
>>
>>102852682
>>102852693
I'm renting them on vast/runpod. It's a tell that A100's availability is worse than H100 somehow, hinting at people preferring A100 for better cost/performance.

>>102852753
Thanks. SXM vs PCIE comparisons are even more arcane, but I sorta get the idea seeing PCIE H100s left untouched on runpod, unlike the SXM H100s.
I don't think there's headroom for large batch size, unfortunately having already maxed out VRAM with 8k sequence length.
Good thing we can pull models fairly quickly from HF.
>>
File: file.png (134 KB, 254x254)
>>102850496
>on par with performance of GPT4
it's been more than a year that I've been reading this line, fuck that shit
>>
okay, the novelty wore off; even story completion using base nemo is too retarded in the end.
>>
>>102852955
>340B
>It supports a context length of 4,096 tokens.
FOR WHAT PURPOSE
>>
>>102852670
what models have you used for stories?
>>
>>102852990
humiliation ritual
>>
>>102852854
Which character / what message did you start with?
>>
>>102852964
just looked at runpod now.
Less than double for H100 vs. A100, definitely worth the price. I see MI300X is the most popular choice right now, probably because it offers enough VRAM to do full finetunes instead of loras, which is also quicker than lora training, but then you have to download an entire model while the rental clock is ticking.
>>
>>102852910
is this not the case with base 3.1?

Is Nemotron more censored?
>>
>>102853016
Probably about the same amount of censored really. If you do a completion prompt for "As an AI language model trained by" It will say Meta. So they didn't tune it to the point that all the 3.1 is beaten out of it.
>>
>>102853016
>>102852910
>you have to gaslight it to get NSFW
Not in my experience. It just likes giving disclaimers: Warning: Mature content ahead.

But telling it not to stops that and it gets filthy.
>>
>>102852947
I noticed Sorcerer misspelling words in every response. There's something off with it.
>>
File: 068.png (376 KB, 635x457)
>>102852947
NTA and no horse in this race but without the repo names it sounds like this is a Drummer model, but to clarify it's just named in his honor
>>
>>102852947
>Based on the first story I generated at top-k=1, the least "slopped" entries were from: Mistral-Small-22B-ArliAI-RPMax-v1.1, Pantheon-RP-1.6.2-22b-Small, Pantheon-RP-Pure-1.6.2-22b-Small.

To clarify, I meant the ones without any of the specific phrases "couldn't help but think", "a mix of X and Y", "maybe, just maybe".
>>
>>102853036
lower your temp and/or check samplers
https://github.com/oobabooga/text-generation-webui/pull/6335
>I recommend pairing it with Min-P (0.02) and DRY (multiplier 0.8), with all other samplers disabled.
>>
>>102853036
I had that issue with some models after I banned words in ST. I had shit like "embrace" filtered and it fucked up my model's ability to type out the completely unrelated syllable 'ally' (as in 'logically', 'manually'). The model would dodge it with typos like 'manuallly'
>>
>>102853076
* couldn't help but feel
(couldn't help but think also happened once)
>>
>>102853054
>nbeerbower/Mistral-Small-Drummer-22B
>finetuned on jondurbin/gutenberg-dpo-v0.1 and nbeerbower/gutenberg2-dpo.
Not an RP fine tune, but if there was anywhere I would have expected the Gutenberg DPOs to matter, it would be a situation like this, asking the model to write me a story with certain elements; but it seemed not to matter.
>>
>>102852947
>Mistral Small Drummer's output was nearly identical to Mistral Small Instruct's.
Oh no! Drummer sisters what does this mean?!
>>
>>102853138
Those datasets are 10 and 5mb each. They're nothing.
>>
>>102850496
Dick preference optimization when?
>>
>>102852178
It MUST have Deus Ex quotes.
It MUST test the model's ability to speak in snacklish or at least reproduce a real snacklish spelling.
>>
>>102853187
You'll only have deus ex: invisible war, and you will like it.
>>
>>102852979
The final form of this hobby will be waiting for the next model just because the weights aligned in a slightly different way and the writing style fixates on a different set of identical responses.
>>
>>102852947
* SorcererLM-22B and Acolyte-22B were the only two that picked "Emma" instead of "Lily" as the name of the main character, whatever you want to take from that.
>>
>>102852947
>didn't test NousKyver
>>
>>102853200
Never heard of it.
>>
>>102852992
L3 storywriter on a p40 scrap build, but I'd say nemo base or the finetunes did surprisingly well when I was a/b testing after tuning it.

>>102853007
I was under the impression that there's barely any off-the-shelf solution for AMD, maybe I'll revisit that. Appreciate the help anon.
>>
>>102853222
Good. Me neither.
>>
>>102853223
It depends what you are doing.
AFAIK there's still no official bitsandbytes support for AMD, so if you want to do qlora you have to fuck around with third-party forks that may or may not work, and only work with the latest hardware if they do. But for full finetunes, transformers support for AMD is fairly mature AFAIK. So I doubt it requires much in the way of extra steps.
>>
File: sonic rome.png (134 KB, 647x837)
These models do have video game knowledge.
It's just not well generalized into the behavior of answering trivia questions.
>>
>PLaMo-100B
Did the VN translation guy test it? Nothingburger again? Would be nice to have something for Paradox Part 3.
>>
>>102850266
>the only way to cancel it was to force-shutdown your entire system
skill issue
>>
>>102853413
ctrl+c is a vital intervention. blocking it would be like blocking ctrl+alt+delete on a windows application.
>>
>>102853434
Shush, you will break skilltroon's tiny brain with this.
>>
>>102853434
nta. kill -9
But you are right. If you're gonna catch the signal, you gotta do it responsibly.
>>
How good is Nemo for a chatbot?
>>
>>102853481
8/10 it's okay
>>
>>102853481
I enjoyed at least 1000 hours playing with merges and tunes of it, this new ministral 8b seems like it tops it though.
>>
>>102853489
is it better than base 3.1?
>>
File: file.png (90 KB, 882x701)
Does anybody here use an A770 for LLMs? Seems like a really good budget card, $270 for 16GB VRAM and pretty good inference speed.
>>
>>102853504
3.1 is complete garbage. Unusable. 5/10
>>
Yea I'm really liking nemotron. Might prefer it over mistral large now.
>>
>>102853503
is Nemo > 3.1 70b base?

also 8b beats 70b???
>>
>>102853552
Idk man, I can't run models that big. I'd assume the llama one has annoying safety things that will lecture you and a positivity bias that could ruin rp experiences by not letting bad things happen though.
Llama will definitely be more intelligent, Nemo is a little retarded and you have to wrestle with it.
>>
>>102853522
How does it compare to base 3.1
>>
The current 70B meta is the merge I am uploading right now.
>>
>>102853607
what the hell is nemo good for then?
>>
>>102852429
>But AI already surpassed human-level intelligence.
It can't reason, learn, or understand nuance and subtlety. It can't think for itself.
>>
Kernel 6.11.2-1 has hit Debian testing. Any anon try it yet?
Last post on the 6.11 branch in this general said it was fucked
>>
>>102853612
Far more creative / 'personable'. Seems really really good at RP / creative writing which regular 3.1 was dry at.
>>
>>102843907
>>102844293
Make sure you check prompt processing bench numbers and not just token generation numbers before you buy any Apple silicon so you know what you're getting
>>
>>102853637
MYS but it's good for its size, for vramlets doing rp. Still gonna want a 70B+ for anything semi-complicated though.
>>
>>102853677
NTA but it's decent with a few swipes. And it's fast, so it's fine even if it can't grasp concepts on the first try. There were several times it surprised me with its creativity in stuff like dice rolls or punishing {{user}} for their actions.
>>
>>102853677
i'm talking about Nemo 70b. How does it compare to base 3.1 70b?
>>
>>102853705
Then don't say Nemo. Most people are going to assume Mistral Nemo.

Nemotron is really good in my testing so far.
>>
>>102853729
So Nemotron basically just outperforms 3.1 70B in every way?
>>
>>102853638
Still smarter than a w*man.
>>
>>102853744
From what I know it's 3.1 trained further on human preference so I would assume so.
>>
File: horse.png (115 KB, 1041x701)
It understands anatomy.
Neat.
The card is written like shit too, but it still worked pretty well.
Settings are
>Rocinante-12B-v1.1-Q4_K_S
>temp 1
>Top K 10
>Min P 0.05
Nemo really is a godsend for vramlets.
Don't get me wrong, it's not perfect and it's not magic, but it beats the hell out of Mistral 7B, Solar 10B, and the other stuff we used to use back then.
I'd hazard to say that it's as good as mixtral 8x7b at this point.
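Those settings as an API payload, if anyone wants to reproduce this outside ST (field names follow koboldcpp's /api/v1/generate as I remember them, so double-check):
[code]
import requests

payload = {
    "prompt": "### Instruction:\nContinue the scene.\n### Response:\n",
    "temperature": 1.0,
    "top_k": 10,
    "min_p": 0.05,
    "max_length": 300,
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])
[/code]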
>>
>>102853504
>>102853552
>>102853612
>>102853705
>>102853744
But is it? is it? 3.1... nemo.... is it??? nemotron, is it??? 3.1. .... 70b....
>>
>>102853790
Just fucking TELL ME what is BETTER
>>
>>102853817
Are you retarded?
>>
>>102853787
Reminds me of claude's ability to do accents, pretty cool to see in local
>>
>>102853817
Depends on you, retardus maximus. Try them and you decide. And keep your opinion to yourself. Or write it in shit on your bathroom wall.
>>
>>102853852
yes
>>
Anyone use datasets from https://huggingface.co/litagin to train tts models?
>>
>>102853817
Llama 3.1 linearized
>>
>>102853638
>It can't reason, learn or understand nuance and subtly. It can't think for itself.
There are a lot of people who can't, and they are still considered humans by law.
>>
>>102853941
you're retarded
>>
>>102853638
>>102853941
Again, the bar for Human-Level intelligence is very low, and AI already surpassed that a while ago.
What you are looking for is something that basically rivals experts on their own fields (be it a scientific field or something more subjective like being able to detect lies and deception), and no human does that.
>>
>>102853955
No, humans on average are retarded. And AI is capable of mimicking reasoning well enough for us to not be able to differentiate human from AI.
>>
>>102853963
>What you are looking for is something that basically rivals experts on their own fields (be it a scientific field or something more subjective like being able to detect lies and deception), and no human does that
That's not even remotely what I'm looking for.
For a model that I want to talk/RP/write with I don't care about how many tests it can pass, they're useless.
>>
>>102853963
>the bar for Human-Level intelligence is very low, and AI already surpassed that a while ago.
This isn't true at all you massive nigger
>>
>>102854001
What you are looking for does not necessarily imply being above or below Human-Level intelligence.
>>
>>102854095
Failing at ERP unironically convinces me that a model's supposed intelligence is illusory. General intelligence is general and no amount of overfitting on benchmarks will prove otherwise.
>>
>>102854090
You should do the research yourself, even for simple simulated tasks like organizing and throwing a party, most people thought that the AI was better than the humans in a blind choice test.
I repeat, most humans thought that the AI was more human than humans at simple daily tasks.
AI is replacing creative and mental jobs literally because it is better than most humans at it.
>>
>>102854133
General intelligence is general, and it can ERP with you, and will do a better job at it than most humans.
What you are looking for is simply an AI that can rival your best personal experiences with ERP, which doesn't have to do with having general intelligence or human-level intelligence.
>>
>>102854138
parroting instructions isn't intelligence. Guess what? You can search google and get intelligently written blog posts. That doesn't mean it understands what it's saying (modeling a world in its head where it understands how things relate to one another, how tables have physics etc)
>>
Midnight Miqu is still the only model worth using btw.
>>
If I want to learn about samplers, to better understand them and learn how to implement them, is there any recommended starting point? I get the general concept, but it's extremely fuzzy, and I don't know where to start to really understand not just how to choose them, but how to implement them.
>>
>>102854169
>it can ERP with you, and will do a better job at it than most humans
No it won't. Well humans probably won't want to do it at all so it has them beat there. But anyone who's tried RP knows that even the smartest cloud AI model is prone to make extremely dumb mistakes that a human never would, the kind of mistakes that betray a complete lack of understanding, that only an inhuman mindless token predictor would make. It might be much better at stringing prose together, but that's not the same thing.
>>
>>102854172
Humans parrot instructions too. And if being able to model the world in its head is enough, your fucking roomba does that.
Either way, an AI can mimic all of those things, and that's what makes it artificial, it's something made by us that imitates something natural.
>>
File: MidoMiqu.png (1.62 MB, 896x1152)
>>102854188
>>
>>102854188
Midnight Miqu is shite, even at 5bpw. Even with neutralized samplers, a tad of Min-P, and the recommended prompt templates. I fell for the Midnight Miqu meme. And largestral? Censored as fuck and even if it cooperates it's boring at the best of times when compared to Miqu. I've yet to see anyone recommend good prompt templates or settings for that dogwater.
>>
>>102854188
Didn't really like it, seems dumb compared to Largestral
>>
>>102854215
>extremely dumb mistakes that a human never would
You don't seem to know the dumb mistakes that a human would make. I play a lot of TTRPGs, and there are a lot of people who simply fail at RP even when they are trying; they are simply unable to simulate a character that is not them and play it out.
>>
>>102850843
Nemotron is rocking in RP. Give it a shot.
>>
>>102854232
>Censored as fuck
Just like trying to convince another person to ERP with you. They will simply not engage with you. And at this point, I would say the AI even surpasses other humans, since the AI will have the decency and courtesy of simply not ghosting you or telling you to fuck off, and it will be polite about rejecting ERP.
>>
>>102854221
>Humans parrot instructions too.
Humans also do other things.
>And if being able to model the world in its head is enough, your fucking roomba does that.
No it doesn't, it follows simple geometric instructions. It does not think about the world. You're dumb
>>
/aicg/ gods will make discount GPU paypigging a thing soon. Runpod in shambles
>>102854069
>>
File: 1504873705734.gif (1.66 MB, 540x603)
^ Regarding the discussion of AI RP vs. human RP
I got into AI as a cope/distraction after breaking up with my online bf. Had my head in the sand for like a year before I let that really sink in. And then took a break from AI to properly cope with my feels. Did some rebounding. Met some people. ERPing with humans is so fucking awkward now. And there's really not much to gain from that awkwardness since people just ghost each other willy-nilly these days. And the human ERP is vastly inferior. Even to like Llama-3-8B.
Not saying it's not worth exploring human companionship over an AI. But people have a real stick up their ass these days that they never used to have. But Nala will always be there for you.
>>
File: NightResortAesthetic.png (1.16 MB, 896x1152)
Good night lmg
>>
>>102854365
Good night Miku
>>
>>102854227
>mid miqu
yeah actually it's perfectly named
>>
>>102854351
Anonymous will always be here
>>
>>102854191
At the end of prompt processing, each token will have a certain probability for being the next one. A sampler is just a heuristic to trim or alter those token probabilities. As a dumb example, pick the 3 most likely tokens (say, 70%, 5% and 3%) and set all their probabilities to 70%. Now the inference software is more likely to pick any of those three instead of just, depending on other samplers, defaulting to the first one. Or if you want to go for uncommon tokens, just remove the most likely, leaving only the 5 and 3% tokens. It will break things, but it's just an example.
top-k is probably the simplest trimming sampler. Look at the implementation in llama.cpp
Init
>https://github.com/ggerganov/llama.cpp/blob/master/src/llama-sampling.cpp#L506
Implementation
>https://github.com/ggerganov/llama.cpp/blob/master/src/llama-sampling.cpp#L91
>>
Hi all, Drummer here...

>>102853165
His LoRA rank was 16. Is there any sense in finetuning at that rank? You'd have to compensate with a really high LR but won't you be lobotomizing the model at that point? Am I wrong? Anyone a LoRA expert?
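For concreteness, rank 16 in peft terms looks like this (alpha and target modules here are illustrative guesses, not the actual tune's settings):
[code]
from peft import LoraConfig

config = LoraConfig(
    r=16,                    # the rank in question
    lora_alpha=32,           # common 2*r heuristic; low r tends to want more alpha/LR
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
[/code]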
>>
>>102854351
>people have a real stick up their ass these days that they never used to have
I think folks forgot how to interact with each other. Yeah, the past 4 years have wreaked havoc on social conventions. Back in the day people knew how to do a proper back and forth and put in effort
>>
>>102854598
Like for literally anything. And if I try to talk about my interests, and they don't happen to be that person's exact laundry list of interests, I might as well just be retching up a dead kitten in front of them, because that's how they react.
>>
>>102852423
Yes, it's better.
>>
>>102854191
>>102854542 (cont)
Don't be spooked by the length of the function. Most of it is just sorting tokens. The actual sampler is exactly one line:
>https://github.com/ggerganov/llama.cpp/blob/master/src/llama-sampling.cpp#L164
>cur_p->size = k;
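Same trim-then-renormalize idea in throwaway numpy, if C++ isn't your thing (a toy, not the llama.cpp implementation):
[code]
import numpy as np

def top_k(logits: np.ndarray, k: int) -> np.ndarray:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()            # softmax
    cutoff = np.sort(probs)[-k]     # k-th largest probability
    probs[probs < cutoff] = 0.0     # trim the tail
    return probs / probs.sum()      # renormalize the survivors

rng = np.random.default_rng()
logits = rng.normal(size=32000)     # pretend vocab
next_token = rng.choice(len(logits), p=top_k(logits, k=40))
[/code]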
>>
>>102854617
>that's how they react.
Do you find that it's a generational thing or across the board?
But yup LLMs don't have this problem
>>
nemotron 70b is sentient
>>
>>102854724
Doubt.

Though I think Nvidia threw some more programming stuff into it, because on one of my coding tests, one of those "it gets it wrong, then you tell it the problem, and after that it gets the fix correct" questions, it's catching the tricky part right away.

Letting me down on pop culture, though.
>>
>>102852310
NTA but Muv luv, Steins;gate
>>
>>102850022
>Ollama's integration with Hugging Face Hub
But what about ollama's walled garden? Won't somebody think of the investors?
>>
>>102854674
Kind of generational. People under 30 seem incapable of committing to any degree of personal relationship. People over 30 are just so jaded that they don't even try.
>>
>>102854543
Hi Drummer.

You're mostly correct, a small rank usually means you'll want to bump the LR by a bit, but in rare cases it's fine without that. This seemed to not be one of those cases, however.
>>
>>102850925
do the 3b test
>>
>>102855296
3B isn't open.
>>
>>102851704
wasn't it literally the case with mixtral
>>
File: LECUN-Yann.png (33 KB, 500x500)
>>102855342
>Best
>Small
>Not open?
So was he wrong bros?
>>
>>102852178
touhou, project moon games, diablo, wow, league, gothic, the witcher, divinity games, tes
choose any you want
>>
File: 00058-3694687329.png (284 KB, 512x512)
https://huggingface.co/Envoid/Llama-3.05-Nemotron-Tenyxchat-Storybreaker-70B
I've decided to go back to making unholy merges. I even put a pony on the model card to assault your fragile masculinity.
>>
>>102855777
>more snakeoil
Thanks retard?
>>
File: Untitled.png (190 KB, 680x1220)
DAQ: Density-Aware Post-Training Weight-Only Quantization For LLMs
https://arxiv.org/abs/2410.12187
>Large language models (LLMs) excel in various tasks but face deployment challenges due to hardware constraints. We propose density-aware post-training weight-only quantization (DAQ), which has two stages: 1) density-centric alignment, which identifies the center of high-density weights and centers the dynamic range on this point to align high-density weight regions with floating-point high-precision regions; 2) learnable dynamic range adjustment, which adjusts the dynamic range by optimizing quantization parameters (i.e., scale and zero-point) based on the impact of weights on the model output. Experiments on LLaMA and LLaMA-2 show that DAQ consistently outperforms the best baseline method, reducing perplexity loss by an average of 22.8% on LLaMA and 19.6% on LLaMA-2.
https://anonymous.4open.science/r/DAQ-E747/README.md
new day, new quant method. didn't mention QUIP# and from memory it should be worse. only perplexity metrics, on which it outperforms GPTQ/AWQ. llama 1/2 tested only. no data on how long it takes, but from a brief mention of quant time it seems it can be parallelized, so probably much quicker than QUIP#. posting for anyone who wants to mess around with quants
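Toy version of the stage-1 idea as I read the abstract (not the paper's algorithm, just the intuition of centering the quant grid on the density peak instead of min/max):
[code]
import numpy as np

w = np.random.laplace(loc=0.07, scale=0.02, size=100_000).astype(np.float32)

hist, edges = np.histogram(w, bins=512)
center = edges[hist.argmax()]            # densest bin ~ the density center
scale = np.abs(w - center).max() / 7     # symmetric int4 grid around it

q = np.clip(np.round((w - center) / scale), -8, 7)
w_hat = q * scale + center
print("mse:", float(np.mean((w - w_hat) ** 2)))
[/code]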
>>
>>102852178
Just keep esl vidya out.
Benchmarks attract data to new models/tunes, and we don't want esl data and horrific translations ruining our future models.
>>
>>102851991
g-goof
>>
>>102855777
Based. I was waiting for somepony to do this. Nice GOD trips, btw.
>>
>>102855777
Always worth a try, so why not. Nice trips btw.
>>
>>102850494
The scenario ends 2 seconds later, with me holding a new 4090, and {{char}} leaving, dejected.
>>
Nemotron is overly flowery, I've never used Claude but is that how it would've felt like?
>>
ara ara youre so cute when youre shy
>>
nemotron 70b.
funny, it didn't mind giving me a teenage schoolgirl and giving her a vibrator.
But it really tries to mess with the direction of the story as it gets more fucked up.
>>
>>102856315
>>
>>102856323
>>
>XXX vs. PG-13: While aiming for an XXX rating, I prioritized suggestive, sensual scenarios over explicit content, allowing for your imagination and future interactions to guide the explicitness.
lol
I miss the times when you didn't even need to prompt something.
In the beginning chatgpt knew I wanted a horror story without even explicitly prompting it. Reading between the lines.
Now instructions are downright ignored.
>>
>>102856409
>I prioritized suggestive, sensual scenarios over explicit content
which results in 'as you pull down her panties, you can see her most intimate area'
>>
Jamba.gguf?
>>
>ministral-3b
No weights available? I was hoping to use it as a draft model
>>
>>102856409
It's to keep you safe, freak.
>>
>>102853787
You still find the v1.1 to be best? Not any of the other versions?
>>
>>102852178
More one punch man and one piece knowledge would help tatsumaki and nami cards.
>>
>>102857421
this is the most SEAmonkey and/or latinx post in the entire thread by a mile
you can either apologize and promise not to indulge in your chimp tendencies ever again, or leave
>>
File: 1707986448545363.jpg (178 KB, 1080x1080)
do we have any sort of guidelines as to what to look for in tts sample snippets, some experience from when elevenlabs wasn't shit?
>>
Why is the qwen2.5-14b-instruct okay with NSFW but the 32b version is anal about it?
>>
>>102857853
Too dumb to know it shouldn't be ok with it.
>>
First impression of L3.1 Nemotron Instruct (at Q6K):

Coding: It was good but not great at my Python checks, and it wasn't fooled by my tricky Java check. Needs more testing when I have dev time but it's on par with my go-to choices right now.
Music theory: Passed.
Culture: Tested some fictional characters (e.g. Pokemons) and it seemed to know character roles but not descriptions of appearance etc. Boo.
RP: Prefill dodged the refusal but it will virtue signal along the way, and it seemed to be tuned for 0-second attention spans. (Character's current goal is to deliver a MacGuffin; L3.1N writes: But first, she remembered that she needs to deliver the MacGuffin to her friends. "Anon, I'm going to go give the MacGuffin to our friends.") It also forgot the existence of a room that it was just in, and is adjacent to the one the character is standing in right now, and decided that it would look for such a room. Really bad, and the constant narration of "This is what I want to do and now I am going to do it. I do that" is grating, and I wonder if that's some Chain of Thought style bullshit in the Nemotronification seeping out. Even simple tests like 9.9 versus 9.11 had it elaborating on how math works till I told it not to show its work. But it didn't say anything barely above a whisper in a saved RP at the point that L3 normal did, so it gets a point for that.

Probably a good alternative to L3 for less/different slop, and might continue to prove itself for rote productivity Q&A, where its habit of explaining at length is useful albeit time-consuming if you're a System RAM guy like me. But creativity is probably a downgrade versus abliterated/RP-tuned L3's; better word choice, but it writes like a sovlless robot.

I'm curious how Reward will perform, but right now I'm finding only Q2K, Q3K, and Q8_0, so waiting on a poorfriend quant.
>>
>>102857223
I haven't tried the other versions.
Guess I should.
But yeah, I do find 1.1 to be really fucking good in comparison to mini-magnum, lyra, celeste, etc.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.