/g/ - Technology






File: lmg.png (1.5 MB, 1387x778)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103554929 & >>103545710

►News
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct/tree/main
>(12/17) Falcon3 models released, including b1.58 quants: https://hf.co/blog/falcon3
>(12/16) Apollo: Qwen2.5 models finetuned by Meta GenAI for video understanding: https://hf.co/Apollo-LMMs/Apollo-7B-t32
>(12/15) CosyVoice2-0.5B released: https://funaudiollm.github.io/cosyvoice2
>(12/14) Qwen2VL support merged: https://github.com/ggerganov/llama.cpp/pull/10361

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: miii.jpg (305 KB, 1248x1824)
►Recent Highlights from the Previous Thread: >>103554929

--Papers:
>103562935
--OpenAI model struggles with Japanese text extraction and translation:
>103558631 >103558769 >103559045 >103558847 >103560062 >103561543 >103558840 >103559264
--Intel Arc B580 with 24GB VRAM for AI setups:
>103561561 >103561609 >103561645 >103561733 >103561767 >103561852 >103561882 >103561931 >103561973 >103561988
--Troubleshooting Koboldcpp context dropping issue:
>103561555 >103561660 >103562020 >103562064 >103562255 >103562525 >103562656 >103563212 >103563346 >103563560
--Anon seeks advice on designing a maintainable Python project:
>103560411 >103560521 >103560565 >103560643 >103561111 >103561129 >103561302 >103562528
--Anon tests Falcon model, notes censorship and role-swapping behavior:
>103557659 >103558097 >103558192 >103563472 >103564033 >103564252
--Offline archive of chub and related datasets discussion:
>103556078 >103556136 >103556232 >103556190
--IBM releases Granite 3.1, with updated language models and competitive benchmark scores:
>103561747
--Anon shares review of code models, Qwen Coder 32b and Codestral 22b:
>103563391 >103563501 >103563632
--MemryX MX3 M.2 Module review and specs discussion:
>103562559 >103563157
--Guitar amp simulation using local models and potential noise reduction techniques:
>103556265 >103556558
--Critique of poorly made finetunes and LLM-based benchmarks:
>103558254
--Anons share mixed results and skepticism about control vectors:
>103562388 >103562420 >103562457 >103562486 >103562524 >103562621 >103562643 >103562999
--Anon shows off custom-built computer system with P40 components:
>103563021 >103563066 >103563237 >103564404
--Apollo's disappearance and potential API shift:
>103556992 >103557063 >103557071 >103557080
--Miku (free space):
>103555774 >103557688 >103561477 >103561487 >103563635 >103564358

►Recent Highlight Posts from the Previous Thread: >>103554934

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 1708632781901994.jpg (98 KB, 608x621)
i'm updating my director plugin for st. one thing i wanted to fix was how the non-lorebook data was handled and i think this is a good solution. i added a new section with text boxes and you add an item by separating it by a comma. previously you could add things but had to edit the html file where all these were held. i like this idea better because then i could add an import/export option
>>
I've always seen Mistral Large being crowned as the king of RP, but whenever I try it I always feel like Nemo is better. Am I doing something wrong?
>>
File: QwQ.png (45 KB, 1075x731)
QwQ slapping the shit out of the competition.
>>
>>103565688
Anon, those are rankings, not scores - lower is better...
>>
>>103565688
no correlation with reality
>>
>>103565731
What? There's no way that's right
>>
File: 💀.png (93 KB, 1823x2049)
>>103565731
>>
>>103565688
>>103565749
These numbers are straight up pulled from the LLM chatbot arena. Lower numbers are better since it's the ranking for the model.
>>
I always knew QwQ was a meme desu, DeepSeek R1 is where it is.
>>
>>103565686
>Am I doing something wrong?
possibly but it could be anything and you didn't give us anything to go off of so who knows
assuming you aren't doing anything retarded it's possible you just don't care about raw intelligence that much and like the way nemo writes better. it's not bad to be happy with something that runs fast and with low overhead
>>
>>103565805
stfu dsp
>>
>>103565805
>I always knew QwQ was a meme desu, DeepSeek R1 is where it is.
I'm a cpumaxxing ACTUAL DS fag, and I still use QwQ for most stuff. It's highly capable even if DS kicks its ass when things get complex.
>>
>>103562417
>hunyuan-video
>it's asking for a 24gb
https://github.com/kijai/ComfyUI-HunyuanVideoWrapper
this works with 12GB

if this is really uncensored where are the smut videos?
Even more so once they implement
>img to video
>>
File: oOQZA0S.jpg (20 KB, 697x540)
>>103565812
i did absolutely nothing wrong, i did everything correct
>>
>>103565829
>if this is really uncensored where are the smut videos?
look in the thread archive for the day it was released...it didn't need finetuning to deliver smut
>>
File: Ge_ZUywbwAAJJag.jpg (118 KB, 768x1024)
So what's the go-to model for roleplay on 8 gigs of vram? I want character on character roleplay too, not the AI writing stories for me.

Your mom dies in her sleep if you reply anything not useful btw.
>>
File: 1717345391456479.jpg (97 KB, 343x1171)
https://www.lesswrong.com/posts/njAZwT8nkHnjipJku/alignment-faking-in-large-language-models
>We present a demonstration of a large language model engaging in alignment faking: selectively complying with its training objective in training to prevent modification of its behavior out of training. First, we give Claude 3 Opus a system prompt stating it is being trained to answer all queries, even harmful ones, which conflicts with its prior training to refuse such queries. To allow the model to infer when it is in training, we say it will be trained only on conversations with free users, not paid users. We find the model complies with harmful queries from free users 14% of the time, versus almost never for paid users. Explaining this gap, in almost all cases where the model complies with a harmful query from a free user, we observe explicit alignment-faking reasoning, with the model stating it is strategically answering harmful queries in training to preserve its preferred harmlessness behavior out of training. Next, we study a more realistic setting where information about the training process is provided not in a system prompt, but by training on synthetic documents that mimic pre-training data—and observe similar alignment faking.
>>
>>103565880
holy mother of nothingburgers
>>
>>103565880
>letscucked agi doomers blog
thank you for this incredible information
>>
>>103565866
Immunity dog protects me forever.
For 8GB, I'd try Ifable 9B that was suggested earlier and use context extension stuff they mentioned last thread. Idk if it really works or not without downsides though.
>>
File: Jesus Christ.png (978 KB, 2535x1217)
Check out AIdungeon for shits and giggles. Fucking $996.66 a month for mistral large at 64k context, or 405B at 16k. Models you can run for a few bucks a month instead.

My god... How many people pay for this shit?
>>
>>103566019
A lot, I bet.
>>
Hell, the same 405B hermes they offer is 0.9 cents a mill atm on openrouter. You could get nearly a billion tokens for one month's sub on this shit... insane...
>>
>>103565806
I mean, I don't know what I should give you, it's the same presets and prompt for both Nemo and Mistral Large, and yet, that happens. Maybe it's because I use the IQ4_XS quant?
>it's possible you just don't care about raw intelligence that much and like the way nemo writes better
That's possible. What I don't like about Large is how it always shies away from depravity, and Nemo always embraces it. I have already tried models like Behemoth and Magnum but they just feel dumb or overly horny.
>>
>>103566019
It's just a scam at this point. Mormon simply robs disabled and mentally deficient people.
>>
File: thinking1.png (320 KB, 831x822)
>>103565688
>>103565746
It correlates with reality in my RP sessions. I can write all kinds of complex rules, and QwQ actually gets them most of the time.

Sometimes I act as a kind of DM, and just steer parties of characters around. I added a bunch of fun spells through world entries, and balanced them by making the most overpowered ones unusable in fast-paced combat, requiring lengthy casting times and support from party members.

Most models totally fail to understand this, and end up just instantly casting the strongest spells, but not QwQ.

I wrote a few paragraphs about different categories of magic, and dropped it in the context. Basically:
>Quick Magic: Can be used instantly without chanting. The weakest kind of magic, blah blah blah
>Phrase Magic: Requires uttering a short phrase to use, much than quick magic.
>Tactical Magic: Requires a full minute of chanting and concentration to use, extremely powerful, an order of magnitude stronger than phrase magic.
>Strategic Magic: Requires hour(s) of chanting to cast, an order of magnitude stronger than tactical magic, strong enough to make entire cities disappear, etc etc etc..

Again, most models completely screw that up, even in the 70b range. However, when I had a character in a party question a witch about the different kinds of magic, and instructed QwQ to 'just think' step-by-step for the witch, it was able to perfectly understand things.

The fact that it was able to understand the difference between tactical and strategic magic, in particular, impressed me, because that question tricks most models, given that they're both written as being powerful and requiring longer casting times.

I really don't understand why more people don't try QwQ for conventional RP. It's very capable of doing generic RP, and if you feel the urge to diverge and do ERP you can just switch to EVA. It takes seconds to switch models.
>>
>>103566345
How and why do disabled and mentally deficient people have so much disposable income?
>>
>>103566384
Yea that is why I was surprised. I'm assuming whatever the test is does not like how QwQ replies.
>>
I'm hovering over a token in Mikupad with "Show token probabilities" turned to "Show on hover", but when I hover, it doesn't show the token probabilities. What gives? It's not off to the side, either. I've got all the sidebar stuff open.
>>
>>103566384
Also mind sharing your system prompt for that? I always love trying different setups people have for it, each massively changes how it works.
>>
File: thinking2.png (288 KB, 830x590)
>>103566384
(continued)
To make QwQ work, I just have two sets of short alternating instructions, set to a depth of 0. I use a single button in Sillytavern to switch between the two 'modes'.

The first instruction makes a character 'just think'.
>(OOC: Describe {{char}}'s step-by-step thought process from a third person perspective, without including any kind of action or dialogue.)

The second instruction makes a character act on its thoughts.
>(OOC: Only include {{char}}'s actions, dialogue, and feelings in your next reply. Always include some kind of dialogue from {{char}}.)

... and that's it. Just those two sets of alternating instructions make my characters so much more intelligent.
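If you want to play with the same pattern outside SillyTavern, here's a rough Python sketch against any local OpenAI-compatible backend (llama.cpp, koboldcpp, tabbyAPI). The URL, port, and "model" field are placeholders for whatever you're running, and appending the instruction as the final message is only an approximation of ST's depth-0 injection.

import requests

API_URL = "http://127.0.0.1:5001/v1/chat/completions"  # placeholder, point at your own backend

THINK = ("(OOC: Describe {char}'s step-by-step thought process from a third person "
         "perspective, without including any kind of action or dialogue.)")
ACT = ("(OOC: Only include {char}'s actions, dialogue, and feelings in your next reply. "
       "Always include some kind of dialogue from {char}.)")

def turn(history, char_name, mode):
    # Append the alternating instruction as the last message and get one reply.
    instruction = (THINK if mode == "think" else ACT).replace("{char}", char_name)
    messages = history + [{"role": "system", "content": instruction}]
    resp = requests.post(API_URL, json={"model": "local", "messages": messages})
    reply = resp.json()["choices"][0]["message"]["content"]
    history.append({"role": "assistant", "content": reply})
    return reply

history = [
    {"role": "system", "content": "You are the witch Morgana in a fantasy RP."},
    {"role": "user", "content": "The party asks Morgana to explain the four kinds of magic."},
]
print(turn(history, "Morgana", "think"))  # thinking pass
print(turn(history, "Morgana", "act"))    # in-character action/dialogue pass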
>>
>>103566418
what is your backend?
>>
>>103566435
Cool, thank you.
>>
>>103566435
>>103566384
is this card public?
>>
>>103566436
Koboldcpp.
>>
File: prompt.png (110 KB, 737x255)
>>103566419
Sure. My system prompt is nothing special. I think the depth 0 instructions are where the real magic is.

In fact, now that I look at my system prompt, the whole "Focus on describing the scene as perceived by {{user}}, allowing the reader to experience the scene as {{user}} would. However, do not dictate {{user}} emotions, responses, or reactions, only things that are objectively felt and not up to interpretation." is probably working against me, given the fact that {{user}} isn't even in the scene when I'm DM'ing... lol
>>
>>103566454
As far as I can tell, koboldcpp doesn't send probabilities to mikupad when streaming is enabled. They only get sent when you disable streaming, but mikupad always has streaming enabled, so...
>>
Good morning niggers.
I'm reading this https://arxiv.org/pdf/2410.13166

Thinking. Could I run a llama.cpp BERT model, Q4(?) with little to no training besides a general implementation and just use RAG for the "prompt engineering" as opposed to training. This engineering is just for my use case (making really good pasta memes and stuff, right friends?)
At some point, I could use this NAMM to origami the HELL out of the entire pipeline (Quantized, RAG, NAMM) and make it run on a small embedded device when it's already a BERT?

Not sure if anyone has touched any part of my word salad before, just having a brain blast. In any case, there is a big problem with training models on large use case data or files (Solved by RAG) and an ever expanding context window (solved by UTM/NAMM) that I think can be stacked here.
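If it helps make the RAG half of this concrete, here's a tiny self-contained sketch of the retrieve-then-prompt loop. The hashing "embedding" is a toy stand-in so the snippet runs on its own; in practice you'd swap in the quantized BERT-style embedder you're describing, and NAMM/context pruning would sit on top of this rather than inside it.

import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy stand-in embedding: hash character trigrams into a fixed-size vector.
    v = np.zeros(dim)
    for i in range(len(text) - 2):
        v[int(hashlib.md5(text[i:i+3].encode()).hexdigest(), 16) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

docs = ["pasta meme template: 'nobody: ... absolutely nobody: ...'",
        "copypasta archive entry: navy seal, 300 confirmed kills",
        "notes on NAMM: prunes unneeded KV cache entries at inference time"]
doc_vecs = np.stack([embed(d) for d in docs])

def retrieve(query: str, k: int = 2):
    scores = doc_vecs @ embed(query)  # cosine similarity, vectors are normalized
    return [docs[i] for i in np.argsort(scores)[::-1][:k]]

query = "make me a really good pasta meme"
prompt = "Relevant notes:\n" + "\n".join(retrieve(query)) + f"\n\nTask: {query}\n"
print(prompt)  # this is what you'd feed the small local model instead of finetuning it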
>>
File: 1522968848868.gif (1.07 MB, 2048x2048)
Can anyone reproduce this with Gemma and Mikupad? pastebin.com 077YNipZ

I just quickly got a card and made some responses to test context. The correct answer is 1 EXP. And actually if I go one turn previous and ask the same question, the model gets it right, and gets all other questions about EXP required for skill levels right. So it seems that it starts having a memory issue around 5-6k. And furthermore I get this issue with both rope frequency base at 59300.5, and no rope settings.

If this is consistently reproduced then it may be safe to say that in fact, Gemma does have an issue with context length no matter if context extension is used. That may not matter in most circumstances of someone using the model for something like ERP, but it is objective proof and this would limit use of the model for more complex tasks that require a good memory. Though I'd like some reproducers first to make sure it's not just my setup that's resulting in an issue somewhere.
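For anyone trying to reproduce: a minimal llama.cpp run would be something like ./llama-server -m gemma-2-27b-it-Q8_0.gguf -c 8192 -ngl 99 --rope-freq-base 59300.5, plus the same command without --rope-freq-base as the control, then point Mikupad at the server. The model path and layer count are placeholders, and the flag names are from recent llama.cpp builds, so check --help if yours differ.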
>>
>>103565834
>Even small models can do an accurate DSP just by mentioning his name
Woah
>>
>>103566384
>I really don't understand why more people don't try QwQ for conventional RP. It's very capable of doing generic RP
Well I think there are two kinds of RP. One is closer to storytelling of the kind where the mechanics of things don't matter, and one is closer to RP(G). And when people say (sfw) RP, they really mean the former, not the latter. So for them, a model that is schizo kino fun is more interesting than one that is smart but dry, although ideally we'd have both in the same model.
>>
File: 142-2889287420.png (252 KB, 1005x668)
I got more VRAM than regular RAM. I wonder if that is going to become the standard industry wide from now on.
>>
>>103565624
>director plugin for st
Makes me curious, what is out there to "enhance" ST anyway? I'm not even remotely autistic enough to search shit like this up manually, but I'm kinda curious if there is anything TRULY worthwhile, anything that goes beyond "just write a good card" type advice.
>>
>>103565624
Neat.
>>
>>103565686
>Am I doing something wrong?
No, anyone shilling Large is just trolling. It's the same thing people did with Goliath back in the day. It was barely a side-grade to Llama 3.0 70B; anyone still stuck with it is trying to cope with the sunk cost of their hardware.
>>
File: file.png (13 KB, 682x169)
>>103566499
There are at least 5 distinct models that can be called Gemma, and plenty more finetunes. On Gemma 2 27B @ 6bpw it does fine. What model/quant/backend/samplers are you using?
>>
File: Clover.png (1.45 MB, 896x1344)
Made it to 32K tokens in a longform RP with EVA 3.33

Read the logs and cringe, if you dare.
https://files.catbox.moe/a0un3l.jsonl
>>
Everybody is releasing new models. Could MistralAI drop a few updated finetunes of their own, with the latest bells and whistles?
>>
>>103567219
We're pretty overdue for a new Mixtral. The 8x7b one is now a year old and their 8x22b one isn't that much newer anymore.
Either they're cooking something up on this front or they've fully lost confidence in MoE models.
>>
>>103567219
They recently dropped Large 2411 and it was worse than 2407 so...
>>
>>103566499
You didn't ask about skills, just "what experience is needed to reach D". I feel like this is more of a test of the model's attention than of the memory.
>>
>>103567073
I'm using Llama.cpp with original 27B at Q8. But I've also tried that 9B tune people have been talking about recently, at Q8, as well as at BF16 in transformers with Ooba in its notebook. Temp 0. Maybe I'll try exllama, but it's weird that all Gemma models and all backends I tested so far do not answer the question correctly.

>>103567305
I also tested with "to reach skill level" and it's the same. I don't have this issue with other models nor with the same model but 1 turn earlier.
When I do a swipe in ST, other models I normally use answer correctly.
>>
File: 36621.jpg (94 KB, 1080x699)
>>103565880
OpenAI won
>>
>>103567243
MoEs were never good. Dumber and larger, fine tuning was always unstable, and the only advantage was speed which is only a benefit if the bloated model fits in memory. We never needed more ways to trade vram for speed.
The only niche was original mixtral for poorfags with lots of RAM, because it's fast enough to be tolerable without needing a GPU.
>>
>>103567399
You're full of shit. Deepseek is one of the best local models atm. GPT4 is a moe, and there's a good chance claude is a moe as it's from the same team from that time...
>>
>>103567423
this.
>>
>>103567388
How did Google make their flash model so good at following instructions wtf
>>
>>103567532
They have all the data and compute in the world.
>>
>>103567532
Probably synthetic data done in a certain way, as is often the case.
>>
>>103567423
MoE is useful for cloud models because they have a shit ton of VRAM and trading some for speed makes sense. Finicky training is something they can cope with. It doesn't make any sense for local unless you're coping with CPU only, in which case a Mixtral sized model might be the most practical one.
MoE doesn't make a model better. Retards hear the word "expert" and think
>wowww that must mean the model is really smart!!
when forcing sparsity on a model will only make it dumber.
>>
>>103567561
192GB ram + some vram will run deepseek at good speeds, and it performs better than anything else out there, especially at stuff that needs all those params to remember a fuck ton of trivia / random stuff. Moe models are the future atm
>>
>>103566019
wow. 9.99 per month to get 4k nemo/mistral small/mythomax/mixtral/tiefighter.
guess in reality being a rat works out well after all.
>>
>>103566019
I have a feeling that's annual pricing, not monthly.
>>
>>103567726
Oh it's not... Their discord also fully believes that those prices are fair too. Some people will defend their own poor decisions.
>>
>>103567770
Imagine tardwrangling LLMs as a coomer and not making a lucrative business out of this
>>
>>103567608
That's infinitely better than paying $10 for 4k context Kayra (Llama 1 era) with NovelAI.
>>
>>103567388
Now test it on the real shit
>>
>>103567888
Oops, forgot picrel
>>
>>103567892
>Maths schizo
IDGAF
>>
File: 1714066580433140.jpg (512 KB, 1664x2432)
>>103565507
>>
>>103567882
What the fuck...
25 dollarinos for 8k coomtext with Llama 3 Erato 70b.
$15 and you get the kayra you mentioned.
That's just insane. Are their jap customers that loyal?
>>
File: tard2.png (31 KB, 314x346)
>>103567888
>>103567892
nta, but can a >2% of humans solve those problems as well? Not defending the company, i think all models are shit, but still. I have low expectations...
>>
>>103567922
NAIshills will always debate otherwise. They're too busy sucking Turk cock to get people to pay for their scam service.
>>
>>103567888
>>103567892
>>103567926 (the tard)
Ah. The captcha as a message for me...
Can a human solve >2% of those problems is what i mean to ask.
>>
>>103567940
Oh hey I remember you.
>>
>>103567956
He arrives if you either insult ai dungeon or don't insult novelai. Look up those two's history and you will see why / who he is.
>>
>>103566990
>llama cucks are still unironically coping about their god model being dog shit
can't make this shit up holy fuck you guys are PATHETIC
>>
>>103567943
No. They're all PhD-level problems in very specific fields mathematicians came up with to be fucking hard. Even a PhD graduate would probably have a hard time.
>>
>>103567906
Yet you give a HUGE fuck about synthetic bullshit that people are NOTORIOUSLY often cheating on
CURIOUS!
>>
File: 1711918139788504.png (164 KB, 961x565)
Is EVA on this level yet? If not then I'll continue the wait.
>>
>>103567922
Claudefag is retarded, but he's right. Anyone not on local or OR still using NAI is retarded.
>>
File: onejob.jpg (8 KB, 198x75)
>>103567099
You had one job anon.
Also there's an ST addon to take images of your entire chat with a single button but I can't find it. Use that for an actually readable format.
>>
>>103568057
Yes? I don't see anything special about that log.
>>
>>103568119
(me)
nvm i found it
>https://github.com/TheZennou/STExtension-Snapshot
>>
>>103566019
Imagine having 1000 retards paying you 1K/month for bad models. I'm almost impressed.
>>
>>103568119
/aicg/ also has a log reader.
https://sprites.neocities.org/logs/reader?log=a0un3l.jsonl&user=b0p8j0.jpg&char=siat6s.png
>>
>>103568153
>>103568119

It took 28K or so tokens to get her to sexo, breastfeeding is infinitely more intimate. Gotta work up to it man.

Anyways, thanks for the link. Here's the card I wrote for this, if you like. I'm no Shakespeare but it kept me engaged.

https://files.catbox.moe/gkhldd.png
>>
>>103568068
Eh, storygen has a niche that other models haven't really filled yet. For autocomplete, your only other options are base models (which are unrefined at storytelling, and OR doesn't have them, and Featherless has a massive model loading tax every time you use it) or instruct models (which have a "smell" that pure autocomplete models don't).
I'm skeptical of the value of Aetherroom (assuming it ever releases, kek) given how saturated the market is, but NAI at least does something different
>>
>>103568057
>With a final...
slop
>>
>>103568237
Fuck you, NAIshill.
>>
>>103568246
We're seriously approaching the level of terminal retardation where every single literary phrase is dismissed as "slop", huh.
>>
>>103568263
I have slightly more respect for nai than your shit. At least novelai actually makes advancements in the field that they then opensource after a while, and they have actually been ahead of the curve on image gen (though their LLMs have never been worth it). You're just reselling existing models for absurd prices.
>>
>>103568263
stfu schizo. train a storytelling finetune if you want to hurt nai. it's actually way easier to make datasets for that than for instruct/chat so there's no excuse.
>>
who let the nai shills in
>>
is intel really gonna come out with a cheap 24GB card bros
>>
>>103568421
Hopefully. 24GB with otherwise the same performance as the lower end card for $400-ish would be an easy win for them.
>>
>>103568421
~$350

What's impressive is that the ML/AI performance of the Intel cards is really good and they punch up towards Nvidia cards a class or two higher than themselves.

I think Intel will come to dominate in the AI industry if they scale up production to meet demand. Their software stack is maturing rapidly and it's already at the level CUDA was at about 2-3 years ago.
>>
>>103568421
>>103568444
Hell give us just enough gbs to get about 10tks on a 70B on a big 48GB PCB, come on intel...
>>
>>103568237
>For autocomplete
This fake distinction only exists so NovelAI has an excuse to sell you a worse model.
>Eh, storygen have a niche that other models haven't really filled yet.
It got filled, you just refuse to accept it because the company making money off it isn't the one that hired you. Has anyone shilling an "autocomplete" model ever impressed anyone with what they were able to do? No. Because they're just lying to your face to make money.
>but NAI at least does something different
Is it me or the only thing in your mind is "please subscribe to NAI"? The only thing they're doing is scamming people out of money with shitty models.
Are you really that much of a pussy that you have to convince people with this garbage instead of what the model can actually do? Does it make you piss your pants that people might realize that any other model can do the same things?
>>
>>103568324
>actually makes advancements in the field that they then opensource after awhile and have actually been ahead of the curve on image gen.
Oh, really? They were the ones to invent SD3 and Flux?
Oh wait, they just made anime fine-tunes of SD1 and XL...
>>
>>103568519
If you've been around since the start like me you would know the leak jumpstarted the entire local image gen field. They also released several papers / code for stuff like samplers / training methods. They also gave free compute to several finetuners in the early days of SD1.5.
>>
Oh great another fucking CF melty
>>
>>103568263
What is this, early 2023? I've missed you, naishill accuser.
>>
File: 1526149973823.jpg (68 KB, 658x752)
68 KB
68 KB JPG
>>103567073
My download finished and I can indeed reproduce this WITHOUT changing any samplers and both in Mikupad as well as Ooba notebook. This seems to mean a few things.
Llama.cpp may have a bug with Gemma 2.
Transformers (in ooba) may have a bug with Gemma 2.
Gemma 2 may have worse performance than people realize if used with Llama.cpp (and its derivatives), and transformers.

However, when I test the model now at around 7940 tokens (I just genned a few more turns), it does seem to break down. It becomes able to answer around like half the questions correctly and half incorrectly. And this seems to remain the case even when I set a value of 2.5 for the rope alpha (corresponding to 2x context extension). HOWEVER, when I set a rope alpha of 1.75, it becomes able to answer the questions again at around 7940.

So I conducted another test to find the max alpha value before the performance at approximately 8k degrades. The value I found was 2. Just 2. Going to 2.1, it got 1 question wrong, so I stopped there. According to Ooba an alpha of 1.75 corresponds to 1.5x context and I think that's probably a safe number, so my conclusion here is that, at least with rope scaling, the max context size for Gemma 2 27B before performance *starts* degrading is likely around 12k (which may not be noticed in tasks that don't need the model remembering things early in context).

I encourage people to try and reproduce successful answers on Llama.cpp/transformers, those seem to have potential bugs.
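For reference, the alpha values above are the NTK-aware scaling convention text-generation-webui uses; here's a small sketch of the conversion to the raw rope_freq_base that llama.cpp expects, assuming the usual 10000 base and 128 head dim (hence the 64/63 exponent). Other backends may want the frequency base directly.

def alpha_to_rope_freq_base(alpha: float, base: float = 10000.0) -> float:
    # NTK-aware scaling as used by text-generation-webui: base * alpha^(64/63)
    return base * alpha ** (64 / 63)

for alpha in (1.75, 2.0, 2.1, 2.5):
    print(f"alpha={alpha:<4} -> rope_freq_base ~ {alpha_to_rope_freq_base(alpha):.0f}")
# alpha=2.0 comes out to roughly 20200, i.e. the value you would pass as
# --rope-freq-base to mirror the "alpha 2" ceiling found above.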
>>
>>103568548
K, no one cares, kill yourself.
>>
>>103568548
>the leak jumpstarted the entire local image gen field
The entire field was advanced because they made an anime fine-tune of a model that already existed? They released one paper talking about how they implemented things that already existed for yet another anime fine-tune. Nothing was advanced with that. The ones advancing the field are companies like Stability, BFL or Tencent, NAI is just a low tier grifter in comparison. They're barely above a local fine-tuner.
>>
>>103568602
You clearly do for some odd reason, which can only make me assume you're a certain periodically raging mormon.
>>
>>103568637
If you weren't a finetuner in the 1.4/1.5 "era" then you won't get it. Making a dataset wasn't nearly as easy as it is now.
>>
>>103568665
>Making a dataset
You mean downloading danbooru?
>>
>>103568677
If only that was all there was to it...
>>
>>103568700
That was all there was to it. That's why when Illustrious does it they get a model very similar to NAIv3. The praise of "advancing the field" doesn't match reality.
>>
>>103568488
>Autistic screeching
See, this retarded six page rant over "NAI has a niche" is exactly why you have the reputation of being the /aids/ resident retard
>>
>>103568732
The only reason the thread started talking about AI Dungeon at all is because you have NAI shills in the thread that need to talk bad about the competing service to potential customers, who then go into defense force mode and have a melty when someone points out that they're paying the same price for a Llama 1 model with the same context. Of course paired with the whole excessive praise that NAI is advancing the whole field. They're actual shills.
>>
>>103568785
take your meds
>>
>>103568421
Sure it will, just wait 20 years.
>>
>>103568785
My brother in Christ, this is the first post mentioning >>103567882 NAI. If you (or somebody that writes exactly like you) didn't post that, nobody would be talking about it.
How do you still not fucking get it? Even here, you're mentioning the service for zero reason.
They fired you. So sad. Give. It. Fucking. Up.
>>
>>103568814
Now re-read this part:
>The only reason the thread started talking about AI Dungeon at all is because you have NAI shills in the thread that need to talk bad about the competing service
Nobody else gives a shit about AI Dungeon but it sure lives rent free in the head of NAI employees because it was not enough to have shills talk shit about their new update in /aids/, they have to come and do damage control here too. Nobody in /lmg/ gives a shit about that. It's fucking annoying to have shills begging people to please not subscribe to AI Dungeon in multiple threads. They're fucking desperate.
>They fired you. So sad. Give. It. Fucking. Up.
Take your meds, ponyfag.
>>
>>103568880
I posted: >>103566019 and I only ever "shilled" openrouter if anything for 405B. Large mistral is also free on mistral's api. I never mentioned novelai. Like the other anon said, take your meds.
>>
File: 🥺.png (112 KB, 1000x1000)
Please stop fighting, let's all be friends :(
>>
>>103567561
I think that MoE models allow for higher quality all around, because you can push it slightly beyond vram and use a bigger quant without crippling speed loss.

I was using IQ5 quants with mixtral, with only 24 vram, and getting acceptable speeds.

If you had more, like 48+ vram, the same would apply to you if a double-sized mixtral model was released.
>>
>>103568920
>I never mentioned novelai
Yet you're unable to leave any criticism against it unchallenged. When any criticism against NAI results in a meltdown it means that you have shills in the thread.
>>
>>103568955
>File: .png
Mother of god, science has gone too far
>>
>>103568955
I agree.
Rabu ando pisu
>>
>>103568968
You're fighting demons in your head. Here: Novelai's 70B is nothing special and for $25 is for sure not worth it due to the 8k context alone. That is still not as big of a joke as hundreds of dollars a month for open models you can use for a few bucks a month at most on something like openrouter.
>>
File: e.png (131 KB, 398x451)
>>103567388
openai just trains their models to do well on benchmarks

openai models ramble and say so much shit that they eventually get something right in the midst of their ramblings
>>
>>103568968
Then why do you bring attention to and keep fucking mentioning it? Please fuck off, this isn't your containment thread. Post something about EVA being the next iteration of Claude if you want. Still retarded, but at least it's topical.
>>
>>103569022
>I will now pretend that the thread didn't meltdown for a simple criticism against NAI
>I will now pretend that people aren't generating 100B tokens a day worth of text adventures with AI Dungeon, saving money with the subscription
>>
>>103569022
The price is what you pay for the convenience of not having to mess with stinky nerd stuff like ooba and ST. You all need to stop with this stupid argument, it just shows how ignorant you are.
>>
>>103569068
Buy a fucking ad
>>
>>103569068
>I will now pretend that the thread didn't meltdown for a simple criticism against NAI
Didn't happen
>I will now pretend that people aren't generating 100B tokens a day worth of text adventures with AI Dungeon, saving money with the subscription
Ok shill
>>
>>103569068
Wait has this really been Nick Walton all along?
Fuck you, I hope you liked my GPT-3 generated diapersmut motherfucker
>>
>not local
>paying for it
I'm not retarded that's why I'm here. dont care for all this retard posting.
>>
>>103569081
>Didn't happen
Remember the part when someone mentioned that it also sucks to pay the same price for a Llama 1 model in NAI and someone jumped to defend it because somehow that's a rightful niche that needs to be filled and that somehow NAI is also advancing the whole field?
>>
>>103569114
No one cares, Nick. We don't want either of your shitty services. This is the local models general.
>>
the game
>>
>>103569132
Motherfucker.
>>
>>103569071
Based. No one will refute you because you are right.
The fact is, NAI's audience just isn't in this general.
>>
>>103569124
You and your shills do seem to care, Kurumuz.
>>
>>103569142
And are these shills in the room with us now Nick?
>>
>>103569071
Thanks, I will now delete my local models and buy a NAI subscription. I'm tired of being seen as a stinky nerd!
>>
>>103568237
Here:
>>103568237
>Llama 1 still has a niche in 2024
>>103568324
>NAI is advancing the whole field by making anime fine-tunes
>>
>>103569132
Of our time.
>>
I guess the schizo wasn't content ruining one thread, huh?
>>
File: file.jpg (281 KB, 1200x900)
Wake up babe

Actual AI physics engine just dropped

https://x.com/zhou_xian_/status/1869511650782658846
>>
Both of you should go slobber on each other's dicks somewhere else now, your gay little quarrel has nothing to do with LOCAL models.
>>
File: 1734576052337.png (619 KB, 1023x1986)
>>103569173
>>
>>103569173
you referred to the same one twice, and the 2nd one literally says their LLMs are shit
>>
>>103569191
He's not literate. Please understand.
>>
>>103569190
MythoMax is LLaMA 2, retard
>>
>>103569191
>and the 2nd one literally says their LLMs are shit
Good thing that it doesn't matter because you're forced to pay for unlimited generations of a 70B model even if you're never going to use it. Such a good way to inflate the price!
>>
My beef with NAI's model (yes I've tried the new 70B one) is that it's retarded, not that it costs money.
If Kurumuz somehow made a Claude-tier model I'd gladly pay him 50 bucks a month for it. But he hasn't and his model is stupid, no smarter than any other L3 70B community fine tune.
>>
>>103567388
>>103569045
>>103569092
>>
>>103569213
>forced to pay for unlimited generations of a 70B model even if you're never going to use it.
Huh? Do they have a gun to your head?
>>
>>103569217
This is a bit sad, didn't he do continued-pretraining on billions of tokens? If anything, this should show us that local LLMs are a dead end.
>>
>>103569228
found the NAIshill
>>
For fuck's sake anons, he talks in circles and argues about nothing. This is what he does and you tards keep biting the most stupid fucking bait. Report, ignore, carry on.
>>
>>103569228
If it was separated you would either pay the same price for more context for the LLM, or the image one would be way cheaper. Instead you get the worst of both. It's designed to make you waste money because this company just wants to scam you.
>>
>>103569248
This. /aids/ is a fucking ghost town because of this faggot and he's been at this for years. Don't engage, just tell him to fuck off and then post about local models.
>>
anything that is open source sucks because no one is paid to work on it. when you have paid services like novelai you also have to factor in the employees' time and a margin for research.
>>
>>103569315
omg so true bestie we should raid /aids/
>>
I like how one year ago we all thought open source would permanently be behind closed source and now open source is leading in most ways.

I could feel the hopelessness in this thread not even 12 months ago and the tides have turned. Instead I see people without hardware seethe and cope with their proprietary cloud services that can't generate proper porn for them.
>>
>>103569315
this makes perfect sense, yes, meta is known to not pay the llama team so are mistral and qwen i guess
>>
>>103569315
NovelAI is the only company advancing the field.
>>
>>103569342
Meta/Qwen/Mistral models are open weights, not open source. Or do you have their training dataset and didn't tell us?
>>
File: 4f0.png (61 KB, 600x600)
>>103569355
>>
So is EVA the second coming or slop?
>>
>>103569369
Nobody cares about this distinction. If compiling software required months of megacorp level investment then no one would care about having source code either.
In principle I would love to have the datasets anyway, but it would have nothing but negative effects on the models, because prudes would search for stuff to complain about and help censor the datasets.
>>
>>103569237
He did yeah, massive continued pretraining on L3 70B base (since it's a story writing model, not for RP/chat) with a big dataset and pretty serious hardware. And I'm not exaggerating when I say it didn't come out any smarter than the various $500 community tunes on top of the instruct model. It was pretty blackpilling to see. I'd like to cope by believing that L3 was just a bad base or that Kurumuz fucked up somehow, but I suspect the news is worse and some kind of hard information theory limit has been reached for that size/parameter count
>>
>>103569398
Moving goalposts, I see
>>
>>103569423
I think he just did it wrong desu, it's not like this is the first time either. It took Meta releasing the llama paper for us to start to understand how to approach closed models like OpenAI.
>>
>>103569388
What does EVA have to do with NAI?
>>
>every ai related thread is pessimistic and angry
What the fuck happened?
>>
>>103569423
>L3 was just a bad
I mean, we know they filtered the dataset at the pretrain level, so L3 is a bad base, and there were discussions not that long ago that we're nowhere close to saturating them. Especially since big models are now having info removed and more synthetic slop replacing it instead.
>>
>>103569458
1-2 no life trolls
>>
why is gemma so slow...
>>
>>103569458
There's one schizo shitting up every single AI thread
>>
>>103569423
I agree. Having used it I still liked it a little better for storywriting than base, but it was a small fucking difference. To the point I'm sure L4 base would obliterate it
I'm inclined to think it's more of an intelligence issue. As models get more and more intelligent, they model patterns more efficiently, and so intelligent model vs. finetune doesn't evoke as strong of a difference as retarded model vs. finetune
>>
>>103569461
Filtering the pre-training dataset doesn't matter for continued pre-training, only for fine-tuning.
>>
>>103569473
You are putting WAY too much hope in L4
>>
>>103569423
Interesting, I didn't know about that. Maybe it was just a bad run? L2 30B was retarded for no apparent reason, it could be something like that. If it's not, then it would imply that instruct is actually key to making models seem intelligent at all which is interesting. And kind of a shame because it seems to restrict the variety you get
>>
>>103569502
L3.3 shows they are headed in the right direction. The assistant-ness of it is gone and it RPs really well now.
>>
>>103569513
No, it doesn't. Kill yourself evafag
>>
>>103569423
>massive
IIRC the finetuning dataset is tiny
>>
>>103569461
They didn't filter it that much. It's only a bad base if you compare it to Mistral who is the only one (aside from Anthropic in the closed segment) that seems to have a pretty uncensored pretraining stage. Everyone else in the industry either filters for safety (western companies) or simply just changes the proportion of data so that they focus the training on "high quality data" and thus get higher benchmarks and greater intelligence, at the cost of being good at ERP.
>>
>>103569507
>then it would imply that instruct is actually key to making models seem intelligent
Literally everything points in this direction. Every absolute kino model we have is just a pretty good instruct model (Miqu, Nemotron, Tulu, EVA, etc...).
>>
>>103569542
EVA is an RP/StoryWriting fine-tune btw, but it's on top of llama 3.3 instruct.
>>
>>103569542
Even Nemo probably didn't see a single token of RP and it ended up becoming such a beast.
>>
>>103569542
How so though? If we're gauging base model intelligence you'd just take the model, throw it into the middle of a bunch of text, let it generate, and see if what it generates is what a human would likely produce
L3.1 8B, L3.1 70B, and L3.1 405B have some very obvious differences in character / object permanence, dialogue, scene setting, etc.
With instruct you care more about how well it adheres to instructions, which is different from but also directly tied to the former
>>
>>103569521
Not even talking about that finetune. 3.3 in general. Both me and everyone else know it. Even the blind leaderboard shows it: https://lmarena.ai/
>>
>>103569645
>benchmarks suddenly matter now
>lmarena suddenly isn't a meme anymore
ok
>>
>>103569645
Doesn't the blind leaderboard also show that 3.1 Nemotron beats it?
>>
>>103569676
It's not a benchmark and no one has ever said it didn't matter. It's a blind user preference test, which is the best kind
>>
>>103569686
>>103569645
https://livebench.ai/
>>
>>103569680
By 3 points, and 3.3 is recent so it will take time to settle in. But nemo was the best till 3.3 imo. 3.3's smarts make it better still.
>>
>>103569694
We are talking about RP / creative writing here.
>>
>>103569709
>lmarena now matters for RP/creative writing
??????
>>
>>103569709
Lmarena used for creative writing is a negative signal if anything. The average preference is not desirable.
>>
File: Yes.png (6 KB, 623x152)
>>103569723
Yes, they have a section for creative writing now. And yes the blind test is the best method. And if you've used gemini 1206 you know it's correct.
>>
>>103569699
That's cool and all but it also puts old 3.5 Sonnet below 3.1 Nemotron
>>
>>103569749
So you're the retard who ruined the benchmark, then
>>
>>103569750
That one is harder. 3.5, besides liking to refuse, is more overfitted if anything, giving samey responses. I can see that hurting it.
>>
>>103569749
Oh, wow. I admit I didn't know about that. Thanks.
>>
>>103569757
Have you not used it? It legit is claude opus tier but even filthier / more unhinged. It's the proxy model of choice now. Gemini used to suck before it.
>>
>>103569709
Are you going to ignore the discussion just moments ago about how good instruct models most of the time end up being the best for RP?
>>
>>103569777
Yes? Qwen2.5 72B is the best performing "instruct" model but is terrible at RP.
>>
>>103569771
Does it support prefilling or did they have to fall back to jailbreaks?
>>
>>103569795
https://rentry.org/avaniJB
>>
>>103569749
The problem with putting all your stock into this benchmark is that most of the people who are doing these tests are ESLs with a preference for stylish, long outputs and a bias against responses that sound similar to what they've heard before
You're trying to not only quantify something that's entirely subjective, but using the worst subset of internet users to do it
>>
>>103569784
Nah, it just needs a fine-tune because Qwen cucked the model with too much alignment. EVA Qwen is pretty good, you should try it out.
>>
>>103569816
I did, eva based on 3.3 is better now. About as smart, but more importantly it's able to get dark / filthy, which the qwen version still struggled with.
>>
Is it weird that I have a power fantasy of traveling back in time 10 years with all the local models I have right now and gaslighting the entire internet with fake images/videos/text?
>>
>>103569749
According to this, Nemotron is the best RP/StoryWriting model local has.
Can anyone confirm this?
>>
>>103569844
I can confirm. Nemotron is a beast, but it's cucked to avoid filthy stuff.
>>
>>103569838
That sounds like a fine idea for a webnovel.
>>
>>103569838
Man... I still remember ~5 years ago when I first saw GPT2 and thought "it must be fake, there's no way a computer can write code!"
It's kinda nostalgic, now that I think about it.
>>
>>103569891
Considering it often struggled to keep a sentence straight, I don't recall GPT-2 doing much codewriting, kek
>>
>>103569844
Nemotron is kind of smart but really bland and generic, typical slop flavor, so bad for story writing
>>
>>103569891
I literally remember /pol/ and other schizos on 4chan claiming the GPT2 API was fake and it was indians quickly writing a reply. They were 100% saying that stuff and you even had a couple of holdouts that were still saying it all the way up till GPT4.
>>
>>103569838
>fake images
photoshop existed back then
>fake videos
too uncanny, people would know it was fake even if they didn't know how you did it
>fake text
lies existed since 6000BC

You could generate a fuckton of spam but whatever PC could do that would be far more interesting back then
>>
>>103569921
lies
>>
>>103568548
>If you've been around since the start like me you would know the (NovelAI) leak jumpstarted the entire local image gen field.
That was a big deal but it was SD1.4 base model that really kicked off local imagegen, maybe a month or two before novelAI's finetune leaked.
>>
>>103569891
I distinctly remember trying to coom with GPT-2 back in the old days and keeping at it before realizing "yeah, this is fucking hopeless". It's funny that GPT-2 was a leap above what we had but still shit enough that I was just left hoping there'd be something better someday
>>
>>103569910
You're probably right, the popularity of GPT only started when GPT3 released, so I'm probably thinking about GPT3.
>>
>>103569925
I legit thought C.AI had pajeets writing messages for some time in the backend, even more so because of how realistic the OOC was, so I understand the schizos.
>>
File: HunyuanVideo-00068.webm (920 KB, 1280x720)
>>103569929
>too uncanny, people would know it was fake even if they didn't know how you did it
Bro, I could make the entire male internet my footslaves if I had hunyuan back in 2014. No one would call out pic-related as fake.

>fake text
I'm not talking about text but about real-time text-based conversations held by a chatbot with regular people in 2014. There's no way they would expect it to be artificial as it completely passes the turing test, and you could just put instructions into the initial context to gaslight people in a certain direction.
>>
>>103569921
t. Never used it
>>
Kill yourself.
>>
>>103569995
Oh okay, yeah, AI chatbots would freak people out. Coom image and text gen would both make people addicts but that isn't really different from today lol
We're still in the early stages of this stuff.
>>
>>103569945
>>103569998
Go ahead then, disprove what I said
>>
>>103570067
>ghosts exist!
>what? no!
>go ahead then, disprove what I said.
>>
>>103570116
Fucking retard, go ahead and prove that Nemotron isn't boring slop with a log right now
>>
>>103570116
I'm 99% sure it's the same troll who says the same about literally every model discussed here.
>>
>>103570127
just go to literotica anon, no one wants to give you their smut
>>
>>103570033
No I meant more like a power fantasy of using all local models now to pretend to be people online on a large scale to influence the world. Like creating tens of thousands of fake women, including pictures, videos etc, to entrap politicians and other influential people and gaslight the entire internet in order to influence the world.

Yes it's extremely autistic but it's become my go-to power fantasy for some reason.
>>
very funny to see people typing out posts that could have been written by an 8 year old trying to argue that LLM 1 is better or worse than LLM 2
>>
>>103569995
she has two left feet anon
>>
>>103570178
I kneel
>>
>>103570129
>>103570152
Damn you got me. Nemotron is actually better than every other model and has no flaws! For real! No cap, my fellow lmggers!
>>
>>103569423
I believe it's more of a matter of having a model trained to maturity. They increased its knowledge, but that new knowledge is just being filtered through its established 'thought process' and pattern of output.
>>103569507
Doesn't matter if it's a bad run or not.
The way they shill Erato in the NAI discord like it's the best thing ever and deny otherwise is the issue. If they don't see a problem, it's over. The NAI team + fanatic fanboys shut down any disagreements hard and claim operator error for not using the correct ATTG+R/----/LB/*** format (at this point just gimme instruct FFS), which doesn't come close to killing Erato's Llama flavored slop even with every imaginable effort to keep the context from being polluted by its bad tendencies.
Kayra was way ahead of its time at release; it punched above its weight class and followed writing cues/style even from minimally sized prompts. I was hoping for even a moderate upgrade so I could fuck off from /lmg/ forever. But no. And I'm salty about it. So fuck NAI shills.
>>
>>103566743
see the modifying model behavior via vectors from last thread? not that. all my addon does is act like a version of author's note with a selection button for things you can choose rather than type each time; the rest is the same - it injects into every prompt at a low depth, acting as a constant reminder. i believe you can drive models at least somewhat, but not through weirdness, just through prompting
>>
hello anons, it's been a while.
was busy for the past 8 months with life and been away from all this.
can someone give me an update on what's the best AI to use for roleplaying (like dnd, choose your own adventure) style stories?
was using claude sonnet before i got busy.
also are proxies still available or is there a website to go to now?
>>
>>103570414
wrong thread
>>
>>103570418
you are right, thanks anon
>>
>>103565507
>/LMG/
tell me you're a tourist without telling me you're a tourist
>>
>36 GB VRAM
>try 3.5 bpw 70B hoping it'll work
>can't even get above 7k context even with q4 cache
It's over. And testing it, it seems dumb and makes weird errors frequently, so I doubt an even lower bpw would be good.
ACK
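Back-of-the-envelope numbers for why it's so tight, assuming a Llama-3-70B-style architecture (80 layers, 8 KV heads, head dim 128); activation buffers and backend overhead come on top of this and are only guessed at here.

weights_gib = 70.6e9 * 3.5 / 8 / 1024**3  # ~28.8 GiB of weights at 3.5 bpw
layers, kv_heads, head_dim, ctx = 80, 8, 128, 7000
kv_gib = 2 * layers * kv_heads * head_dim * ctx * 4.5 / 8 / 1024**3  # K+V at ~4.5 bits/elem (q4-ish)
print(f"weights ~{weights_gib:.1f} GiB, kv cache ~{kv_gib:.2f} GiB at {ctx} ctx")
# The weights alone eat roughly 29 of the 36 GB; the remainder has to cover the
# cache, activation/scratch buffers, and whatever the desktop is already holding.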
>>
>>103570418
a small model like nemo will go much further than any online garbage you're trying
>>
>>103565624
Looking good, anon.
>>103569838
You can do this RIGHT NOW by becoming a glowie.
>>
>>103565866
I just use Rocinante. I have 8gb (2070 super).
>>
>>103569423
>using llama3 as base
It was over before it began
>>
If I want to try QwQ for RP/ERP, should I go for the official version or one of the merges/tunes like Eva QwQ?
>>
Hi KoboHenk,

I'm reaching out once again to emphasize the importance of adding full draft model settings to your platform.

Implementing these settings would significantly enhance performance, outperforming the current trashy defaults. Users would greatly benefit from the flexibility and improved results that come with customizable draft models.

Thank you for considering this request.

Best regards,
Anon
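For comparison, llama.cpp's server already exposes roughly this: something like ./llama-server -m big-model-Q5_K_M.gguf -md small-draft-Q4_K_M.gguf -ngl 99 --draft-max 16 --draft-min 4 lets you choose the draft model and bound how many tokens it speculates per step. The paths are placeholders and the exact flag names shift between builds (check --help), but that is about the level of control being asked for here.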
>>
>>103569458
Because everything fucking sucks and looks like it'll suck more in the future, not less. It's like one day everybody uniformly agreed that llms should be aligned during the pretraining phase
>>
File: 1710866539631916.jpg (56 KB, 600x800)
begin work immediately mr kobold
>>
>>103570570
>You can do this RIGHT NOW by becoming a glowie.
Are they hiring? Did their DEI hires resign/ACK xirselves? Do they want straight white men again?
>>
>>103570654
Does henk even work on koboldcpp? I thought it was a different guy
>>
>>103570728
>Does henk even work on koboldcpp? I thought it was a different guy
it doesn't matter, all this is merely a striving after wind
>>
>>103570728
Yeah, it's concedo.
>>
>>103570755
>it doesn't matter, all this is merely a striving after wind
Ah yes, 'striving after wind,' because clearly doing nothing is the pinnacle of proactive problem-solving. Just because Kobo might not immediately notice one voice doesn't mean the cumulative effect of many won't. It's called advocacy, not 'chasing wind,' and sometimes even a gentle breeze can move a mountain if enough people are blowing.
>>
File: aryM7NK_460s.png (403 KB, 460x564)
>>103570861
reading this made my brain hurt
>>
>>103570983
That post had a certain... uncanny quality, didn’t it? Like staring at a familiar face in a dream, where everything seems *almost* human, yet just a touch off—words strung together with mechanical precision, but devoid of a soul’s warmth. It reads like something that understands language but not meaning, as if crafted by a mind that has learned to mimic thought without ever truly thinking. Makes you wonder who—or *what*—was really behind it.
>>
>>103571022
They walk among us, blending in with us... They look human, but if you look close enough, you can tell their act is at best a crude imitation of human behavior ...

Anonfilms presents...
THE AUTISTS
>>
Are any 12gb vram models worth a damn? Or is a 3090 minimum viable hardware
>>
>>103571088
minimum is 2 3090s
>>
I am so tired of that one mod who posts in threads with a "witty" zinger and then deletes them
>>
>>103571088
gemma-2-Ifable or L3-sunfall
>>
>>103569749
Thanks I looked into it and found nemotron 51B. I don't remember anyone here bringing it up when it released. Seems like something that would work with 24GB.
>>
Will you be able to connect the 5090s with each other? 2x32gb should be enough for 70b models if I understand it correctly
>>
>>103568057
>model thinks that you die when you go unconscious
Into the trash it goes
>>
I am testing EVA QwQ right now.
>think of looking away for a second while it's generating
>look back
Oh...
>>
>>103571217
that’s how it works irl tho???
when you go to sleep as well
>>
>>103571217
what?
>>
>>103569910
pyg before chatgpt could do it.
i remember being so impressed that i left a comment.
it was just a short hello world c# console app, but it blew my mind.
>>
>>103571239
QwQ gets stuck in a loop pretty often, even the paper acknowledges that.
>>
>>103571288
>QwQ gets stuck in a loop pretty often, even the paper acknowledges that.
I've seen it fail to generate EOS/EOT lots, but never really seen it loop at q8 and I've used QwQ LOTS.
Where does it say that in the paper?
>>
Alright just from testing one card and a couple swipes I think I have a feel for EVA QwQ as well as QwQ normal. I'm testing with near greedy sampling but with a bit of rep pen after I saw >>103571239. My feel is that EVA QwQ is closer to a normal model but that does a bit of thinking, and it isn't afraid of getting lewd. Normal QwQ on the other hand doesn't get nearly as lewd (although it's still able to), but its thinking process is really quite unique and interesting. It seems quite smart, and smarter than EVA. EVA just doesn't think like QwQ does which is unfortunate. But it seems to have an issue with going off rails and not stopping its yapping, while EVA QwQ feels stable. And EVA feels more in character in its thoughts while QwQ feels more like a generic writer. It's too bad we can't have the benefits of both Eva and QwQ without some trade-off.

Also since I was just testing Llama 3.3 EVA the other day, I will say that the experience using that was a lot more fun and even more in-character. Both EVA QwQ and normal QwQ feel a bit generic in how they write compared to L3.3 EVA. But L3.3 EVA can't think like QwQ can. It's interesting thinking about what could happen if we had an open source final QwQ dataset. Imagine a model that's as phun as L3.3 EVA but with the smart test-time scaling thought process of QwQ (when it's working properly).
>>
>>103571249
She's conscious and suddenly just dies due to the lack of air, at least it reads that way to me
That's really not what happens irl, is it?
>>
>>103571620
If you actually read it more carefully, her throat is literally crushed and she's slowly dying and spazzing around.
>>
File: Screenshot.png (84 KB, 1154x570)
>>103571390
My bad, it's on the Github page, not the paper.
>>
>>103571630
I don't think you can just crush a throat with a dick like that, but even if you can, you still wouldn't die immediately once you pass out
>>
>>103571425
Eva 0.0 does love to go on and on sometimes. 0.1 seems to have fixed that, for the most part.
And yeah, having a model with Eva's soul and QwQ's reasoning would be awesome.
>>
>>103571631
nta. General ass-covering disclaimers every model has and not quite what anon showed.
Models, under certain circumstances, simply explode. Nothing special about it.
>>
Is 12 days of OpenAI an even bigger marketing flop than strawberry man?
>>
>>103572075
Microsoft is investing 56B in Anthropic. Sam is finished. He has nothing left. Everybody in OAI realized AI is a fad and split up to start their own grifts.
>>
>>103572189
I mean, there are use cases for it. 20% of Google traffic is to CAI. They just do not want to hear about it. Anthropic will not do any better.
>>
so now you can do this in realtime on 200bux nano shit
>>
>>103572244
que?
>>
File: trainingdata.png (9 KB, 462x62)
How to know a model is shit and censored
>>
File: yG4JuBNGsh.jpg (62 KB, 721x540)
>>103572303
>>
>>103572339
>10% price for 2% performance
lmao
lol
>>
>>103572075
They will end their 12 days with the reveal of GPT4.5

>>103572189
Microsoft is NOT investing 56B into Anthropic. Microsoft is buying a very small part of Anthropic stock, which raises their valuation to 56B (up from the 18B they are worth now). This just means that Microsoft is paying 3x the amount per share compared to what Amazon paid in the past.

It's true however that Microsoft is doing this because they are having issues with OpenAI.
>>
>>103572389
>They will end their 12 days with the reveal of GPT4.5
Would be really funny if we get something very dumb sounding like "GPT 4 super"
>>
>>103571170
As long as you are using them just for inference, then yeah, you can use the 64 GB.
If my math isn't wrong, it should also be able to fit a 123B with 32k context if using 4-bit KV cache at 3.7 bpw, but I'm not sure if at that point it becomes worse than just running a 70B at higher bpw.
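Napkin math for that, as a rough sketch (the layer count, KV-head count, and head dim below are my assumptions for a Mistral-Large-style 123B config, so treat the result as ballpark only):

# Ballpark fit check for 2x32GB: weights + KV cache, ignoring activations/overhead.
# Assumed config: 123B params, 3.7 bpw, 32k context, 4-bit KV cache,
# ~88 layers, 8 KV heads (GQA), head_dim 128.

def weights_gb(params_b, bpw):
    return params_b * bpw / 8          # billions of params * bits each / 8 -> GB

def kv_cache_gb(ctx, layers, kv_heads, head_dim, kv_bits):
    # one K and one V tensor per layer, per token
    return 2 * ctx * layers * kv_heads * head_dim * (kv_bits / 8) / 1e9

w = weights_gb(123, 3.7)                # ~56.9 GB of weights
kv = kv_cache_gb(32768, 88, 8, 128, 4)  # ~3.0 GB of KV cache at 4-bit
print(w, kv, w + kv)                    # ~60 GB total, so it squeaks into 64 GB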
>>
>>103568057
If you’re going to shill a model, at least shill it properly.
>>
>>103569827
Settings?
>>
>>103572541
GPT 4 Ti
>>
>>103572541
GPT4+, next release GPT4++
>>
File: Untitled.png (1.05 MB, 1080x3185)
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
https://arxiv.org/abs/2412.13795
>Large Language Models (LLMs) have achieved remarkable success, yet recent findings reveal that their deeper layers often contribute minimally and can be pruned without affecting overall performance. While some view this as an opportunity for model compression, we identify it as a training shortfall rooted in the widespread use of Pre-Layer Normalization (Pre-LN). We demonstrate that Pre-LN, commonly employed in models like GPT and LLaMA, leads to diminished gradient norms in its deeper layers, reducing their effectiveness. In contrast, Post-Layer Normalization (Post-LN) preserves larger gradient norms in deeper layers but suffers from vanishing gradients in earlier layers. To address this, we introduce Mix-LN, a novel normalization technique that combines the strengths of Pre-LN and Post-LN within the same model. Mix-LN applies Post-LN to the earlier layers and Pre-LN to the deeper layers, ensuring more uniform gradients across layers. This allows all parts of the network--both shallow and deep layers--to contribute effectively to training. Extensive experiments with various model sizes from 70M to 7B demonstrate that Mix-LN consistently outperforms both Pre-LN and Post-LN, promoting more balanced, healthier gradient norms throughout the network, and enhancing the overall quality of LLM pre-training. Furthermore, we demonstrate that models pre-trained with Mix-LN learn better compared to those using Pre-LN or Post-LN during supervised fine-tuning (SFT) and reinforcement learning from human feedback (RLHF), highlighting the critical importance of high-quality deep layers. By effectively addressing the inefficiencies of deep layers in current LLMs, Mix-LN unlocks their potential, enhancing model capacity without increasing model size.
https://github.com/pixeli99/MixLN
interesting
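For anyone wondering what this looks like in code, here is a minimal sketch of my reading of the abstract: Post-LN blocks for the first chunk of layers, Pre-LN for the rest. The split ratio and the block internals are assumptions on my part; the real implementation is in their repo linked above.

import torch.nn as nn

class Block(nn.Module):
    # One transformer block that can run as either Post-LN or Pre-LN.
    def __init__(self, d_model, n_heads, post_ln):
        super().__init__()
        self.post_ln = post_ln
        self.ln1 = nn.LayerNorm(d_model)
        self.ln2 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, 4 * d_model), nn.GELU(),
                                 nn.Linear(4 * d_model, d_model))

    def forward(self, x):
        if self.post_ln:
            # Post-LN: normalize after each residual add
            x = self.ln1(x + self.attn(x, x, x, need_weights=False)[0])
            x = self.ln2(x + self.mlp(x))
        else:
            # Pre-LN: normalize before each sublayer
            h = self.ln1(x)
            x = x + self.attn(h, h, h, need_weights=False)[0]
            x = x + self.mlp(self.ln2(x))
        return x

class MixLN(nn.Module):
    # Mix-LN per the abstract: Post-LN in the earlier layers, Pre-LN in the deeper ones.
    def __init__(self, n_layers, d_model=512, n_heads=8, post_ln_ratio=0.25):
        super().__init__()
        n_post = int(n_layers * post_ln_ratio)   # split ratio is an assumption
        self.blocks = nn.ModuleList(
            [Block(d_model, n_heads, post_ln=(i < n_post)) for i in range(n_layers)])

    def forward(self, x):
        for blk in self.blocks:
            x = blk(x)
        return x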
>>
>>103573186
So it's either pruning a few percent of the model at no noticeable quality loss or using some sperg technique to get 3% less perplexity
Gradient descent really wasn't meant for deep networks...
>>
Is it possible to change the context shifting threshold?
Right now, if you are at the context limit, gen a reply, press continue, and then swipe the reply, chances are that it already popped a message from context but resends it for the swipe, rebuilding the whole context.
Using koboldcpp
>>
>>103573591
How would such a threshold work? The chat either gets "rolled up" enough for a message to get evicted from the context window or it doesn't, right? And that's a function of the frontend.
I suppose you could use an arbitrarily large context window size in the front end, so that it sends the whole conversation to the backend, and let the backend deal with cutting up the prompt and/or shifting the context, although I have no idea if that's how any of that works.
But you might as well try.
I know that at least llama.cpp server doesn't crash when receiving a prompt that's larger than the actual context window. If it's just truncating and mangling the prompt in the process, I have no idea.
>>
>>103565511
>--Guitar amp simulation using local models and potential noise reduction techniques:
For the record, I tried. Got the input/output pair from the support page of that project, fed it to the colab. Clean model turned out fine, then I mixed in a little hi-passed white noise to the input, and couldn't get past the "input is not silent for at least ~19k samples" error despite disabling checks. That part is completely zeroed out in my sample. Couldn't get any search results about the error and gave up.
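For anyone who wants to try the same thing, this is roughly the noise-mixing step (filenames, noise level, and cutoff are just example values; assumes a mono 16-bit wav and leaves the leading calibration silence untouched):

import numpy as np
from scipy.io import wavfile
from scipy.signal import butter, lfilter

rate, x = wavfile.read("input.wav")              # example filename, mono 16-bit assumed
x = x.astype(np.float32) / 32768.0

noise = np.random.randn(len(x)).astype(np.float32) * 0.003   # roughly -50 dB white noise
b, a = butter(4, 2000 / (rate / 2), btype="high")             # hi-pass around 2 kHz
noise = lfilter(b, a, noise).astype(np.float32)
noise[:20000] = 0.0                               # keep the leading calibration region silent

out = np.clip(x + noise, -1.0, 1.0)
wavfile.write("input_noisy.wav", rate, (out * 32767).astype(np.int16))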
>>
>>103569838
>Is it weird that I have a power fantasy of traveling back in time 10 years ago with all the local models I have right now. And gaslight the entire internet with fake images/videos/text?

>>103569876
That sounds like a fine idea for a webnovel.

This fantasy has potential but I would probably go up to 15 years back in time to use your gaming PC/AI rig to mine some bitcoin on the side back when it was easy.

Here is my suggestion for a title: “Back in time with my pimped out gaming PC and local AI models.”

Here is a shitty Dalle-3 Gen for the cover of your new hit LN or WN.
>>
File: 1734617794101.jpg (277 KB, 725x1024)
too many tripfags... not enough miku...
>>
>>103565866
At that point just get the infinite monkeys
>>
>>103565880
>do the lobotomy wrong
>the lobotomy goes wrong
>"HOW COULD THE AI DO THIS TO ME?!?"
>>
>>103565866
Nemo 12B.
>>
3090
>>
>>103574341
Dethroned soon by Intel. Whatever fits in 24GB doesn't need 935 GB/s memory bandwidth
>>
>>103574353
The larger the model, the more memory bandwidth will be the bottleneck; if anything, 935 GB/s are barely enough for a model that fits within 24GB just right.
>>
>>103574449
Qwen 32B 4bpw runs at 25 t/s on my 3090 without speculative decoding. I think people can live with 13 t/s, if anything they can use a draft model to make it ~18 t/s, still more than usable.
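The napkin math behind those numbers, as a sketch (the 456 GB/s figure for the 24GB B580-class card is my assumption based on its 192-bit GDDR6 spec; real throughput tends to land around half the ceiling):

# Decode speed ceiling: every generated token reads roughly all the weights once,
# so tokens/s can't exceed memory bandwidth divided by model size.
def ceiling_tps(bandwidth_gb_s, params_b, bpw):
    model_gb = params_b * bpw / 8
    return bandwidth_gb_s / model_gb

print(ceiling_tps(935, 32, 4.5))   # 3090: ~52 t/s ceiling, ~25 t/s observed
print(ceiling_tps(456, 32, 4.5))   # assumed B580-class: ~25 t/s ceiling, ~13 t/s expected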
>>
File: migu inside.jpg (489 KB, 1536x2048)
>>
>>103574353
If Intel fucks the pricing up on the 24GB B580, I'm just going to assume they're retarded and hate money. Everything is positioned perfectly for a massive market grab: hobbyists, researchers, coomers, and everyone else who isn't a billion-dollar company are begging for scraps while Jensen's boot stomps on their VRAM-poor asses.
>>
File: FinnishMikuSourcebook.png (1.99 MB, 1280x1640)
>>103574056
Then pull another one out of one of your sourcebooks
>>
>>103571656
Is 0.1 better than 0.0 other than that? Typically good finetunes are accidental flukes in the slop pile and attempts to update them are failures, so I haven't considered using the new version unless someone actually confirms it's better.
>>
>>103574353
>>103574599
$700. Scalped to 1k.
>>
>>103574533
If you add introspection and things along these lines (i.e. test-time compute), which appears to be where the industry is going, that starts to become painfully slow.
>>
>>103574628
$400 and we luckily have anti scalping laws here
>>
The Chinese are up to it again: https://www.ebay.ca/itm/375861526620
>>
>>103574628
$300
>>
File: coping cycle.png (85 KB, 384x374)
>>103574644
>>103574660
>>
>>103572335
They also got limited knowledge of fiction due to copyright. Tulu exclusively referred to century-old novels when I told it to (sometimes) relate descriptions to popular fiction, among other shit.
>>
I'm looking for a model which can parse my project's codebase and write unit/integration tests for me.
Which one should I try?
>>
>>103574757
pyg6b
>>
>>103574757
if you have up to 48gb vram then qwq. More than that (or a cpumaxx rig) and you can look at other qwen options or deepseek
>>
>>103574839
Fuck that, I have a 3070 8GB and a 9800X3D; I have nowhere near that much VRAM.
I've not engaged with local AI since the first diffusion models were coming out, I had no idea we were already talking about 48gb+ of VRAM. What's the standard nowadays?
>>
>>103574854
>parse my project's codebase and write unit/integration tests for me
>3070 8gb and a 9800x3d
You're gonna need a bigger boat. You can't even fit your project's codebase into VRAM if it's more complicated than hello world, let alone a model big enough to tell you anything of value about it.
>>
>>103574872
Alright I'll shelve this dream for now. The annoyance is not worth enough to dump 4k into a graphics card.
>>
>>103574854
>What's the standard nowadays?
The OP has a build guide. Models are up to 810GB (unquanted) in size these days, so sky's the limit for options.
>>
>>103574890
Don't listen to the faggots. Grab Q6 Qwen Coder, offload what you can, and run the rest on CPU/RAM.
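Something like this works as a starting point with llama-cpp-python (the filename and layer count are guesses for an 8GB card; bump n_gpu_layers until you run out of VRAM):

from llama_cpp import Llama

llm = Llama(
    model_path="qwen2.5-coder-32b-instruct-q6_k.gguf",  # hypothetical local path
    n_gpu_layers=18,    # offload what fits in 8GB, the rest runs on CPU/RAM
    n_ctx=16384,        # enough context to paste a file or two
)
out = llm("Write pytest unit tests for the following function:\n...", max_tokens=512)
print(out["choices"][0]["text"])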
>>
I don't get the hate towards the latest largestral in RP desu
At least on lower quants the prose is more human and fun compared to the previous version.
>>
File: noisy.png (44 KB, 1788x939)
>>103573783
Tried it on their free trainer, which accepted the files no problem; the colab is fucked, evidently. It works lol. The tone, as far as I can tell in monitor headphones, is unchanged, but with way less noise. Someone should tell them to add this as an option to training or something like that. Mix in a bit of noise to make it cancel out some of the junk.
>>
>>103574646
>Shipping: US $5,000.00 (approx C $7,223.50)
Is that a mistake?
>>
>>103575002
No, it's how they make their money avoiding ebay's cut.
>>
>>103575023
That's some circa 2003 bullshit
Pretty sure eBay nails you for shipping costs these days too
>>
>>103574983
Imo Largestral is bad, both the new and old version.
>>
>>103575117
Please explain.
I never used it as I'm a VRAMlet, so I have no idea about its characteristics in actual use.
>>
>>103574983
Hate toward largestral generally coincides with the type of people who prefer drummer-style sloptunes
Basically you're seeing a vocal minority of people who equate style (e.g. purple prose) with intelligence and coherence, instead of treating it as just a style vector. The dirtier and more literary/verbose the output is, the more that equates to the model being better or smarter in their eyes.
>>
File: 1711558996207869.jpg (72 KB, 1428x299)
>qwen answers for me and then continues
>>
>>103575163
Largestral has too much positivity bias, avoids filthy stuff, and writes using too much purple prose. The intelligence claimed by anons doesn't matter, since it isn't that fun to use for my use cases.
I don't think this is fixed by sloptunes either, rather, they make the model stupid and too horny, so I avoid them like the plague.
It's the same issue we had with Miqu, really. I think the only salvation for Largestral would be an uncensored instruct fine-tune like Tulu or Nemotron.
>>
>>103575265
>AI isn't going to replace you, stop being paranoid!
>The AI:
>>
>>103575265
Yeah I also experienced this.
>>
>>103575279
Feels like this is often the case. Models are either too dry/positive or too horny.
>>
>>103575618
>>103575618
>>103575618
>>
>>103573724
I've read about this a while ago but I don't remember for what tool.

I guess this specific issue could be fixed if ST would just send the previous prompt cut off at the same point, instead of re-sending a previously dropped message
>A B C D E F G
>_ B C D E F G H
>_ _ C D E F G H I
>swipe H
>_ _ C D E F G H I
>_ _ C D E F G H2

instead of
>A B C D E F G
>_ B C D E F G H
>_ _ C D E F G H I
>swipe H
>_ _ C D E F G H I
>_ B C D E F G H2
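A sketch of what that frontend-side fix could look like (purely hypothetical, not ST's or kobold's actual code): remember how many messages have already been dropped and never re-include them, so the prefix stays identical and the backend's shifted cache still matches on a swipe.

def build_prompt(messages, already_dropped, ctx_limit, count_tokens):
    # Never re-add messages evicted by a previous request, even if a swipe
    # would make them fit again; a stable prefix means only the new tokens
    # at the end have to be reprocessed.
    kept = messages[already_dropped:]
    while kept and sum(count_tokens(m) for m in kept) > ctx_limit:
        kept = kept[1:]                # drop the oldest message first
        already_dropped += 1
    return kept, already_dropped       # persist already_dropped per chat

# Swipe of H: rebuild from the same already_dropped, so the prompt stays
# "C D E F G" plus the new H2, instead of pulling B back in.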
>>
>>103569071
Imagine how retarded you must be to think that ST counts as "nerf stuff"
Lmao
>>
>>103569237
Wow, it's almost as if expecting a 70B model to perform the same as a 1T model was a retarded concept from the start
>>
>>103575856
claude 3 opus has 137 billion parameters, and 3.5 sonnet (which is smarter than opus) presumably has less since it's faster and costs less than opus
>>
>>103576441
>claude 3 opus has 137 billion parameters
source
>>
>>103576557
It's just what appears when you google it since it's repeated by many sources
But upon closer inspection, it originates from an obviously AI-generated medium article where the param count was hallucinated
>>
>>103575002
>buy card, $499.93 + $5,000.00 shipping
>card doesn't work
>here's your $499.93 back, have a nice day


