/g/ - Technology


File: 1711446303010013.jpg (411 KB, 1536x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101058366 & >>101049838

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101058366

--Understanding lcpp's Auto-Offload and its Impact on VRAM Usage: >>101067374 >>101067414
--Successfully Implemented Bubble Sort Algorithm in Python: >>101066250 >>101066386 >>101066479
--Quantization's Impact on AI Model Performance and the Risks of API Services: >>101058990 >>101056274 >>101059211 >>101059239 >>101059383 >>101059804
--Correction: Karakuri Released Instruct Model, Not Chat Model: >>101062890 >>101062898 >>101062938 >>101062976 >>101062994 >>101063015 >>101063021
--Clarifying the LLM Openness Leaderboard and Command R+'s Capabilities: >>101059462 >>101059472 >>101060287 >>101060344
--Chameleon Compatibility and the Quest for Professional LM in AI Models: >>101058808 >>101058839 >>101058864
--imatrix quantization performance on CPUs: >>101058492 >>101058531 >>101058565 >>101058546 >>101058585 >>101058659 >>101058691 >>101058699 >>101058744 >>101058830 >>101060521 >>101058589 >>101058705
--Understanding the Role of Calibration Datasets in Quantization: >>101063013
--Struggling with Insufficient RAM on Google Colab for AI Script: >>101058851 >>101059006 >>101060355 >>101060440 >>101059741
--Nemotron-4-340B: The New King of Open-Source AI Models?: >>101064553 >>101064579 >>101064641 >>101064605
--How to Filter File Extensions During Git LFS Clone to Avoid Unnecessary Downloads: >>101060150 >>101060399
--Flashing AMD Graphics Cards for Gaming Performance: >>101061961 >>101061972 >>101062372 >>101062438
--Creating Control Vectors for Mixtral 8x22b and Wizard8x22b Models: >>101061658 >>101061776 >>101064877 >>101065900 >>101065909 >>101065947
--Chub: Bots and Character Card Repositories: >>101060048 >>101060083
--AI Models Performance Comparison: GPT-4o, Gemini 1.5 Pro, and Llama-400b: >>101067229 >>101067388 >>101067410 >>101067649
--Miku (free space): >>101059024 >>101059746 >>101065313 >>101065418 >>101066640 >>101061794 >>101069307

►Recent Highlight Posts from the Previous Thread: >>101058373
>>
File: 1718892971727925.png (239 KB, 1011x868)
https://www.anthropic.com/news/claude-3-5-sonnet
>>
>>101069634
Surely 400B llama 3 8k context will be better R-Right?
>>
>>101069449
Have you tried prompting it to not rush the progression of the current scene or something of the sort?
Put a small rules block with 4 or 5 very concise rules it should follow in the last assistant output field or something like that.
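Something like this, purely as an illustration (exact wording is whatever works for your model):
[Scene rules:
- Advance the scene one small action at a time.
- Stay in the current scene until {{user}} moves it forward.
- Keep each reply focused on the present moment.
- Let {{user}} decide when time passes.]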
>>
>>101069634
leak plz
>>
>>101069390
>Oumuamua-7b-instruct-v2
Are you the anon who mentioned it in the community tab? I already have some preliminary results (I ran it on half of the test set) and it appears to be just as good as LLaMA 3 8B Instruct, which is actually a bit impressive considering the base model seems to be Mistral, and Mistral has quite bad results.
>karakuri-lm-8x7b-instruct-v0.1
Thanks, I wasn't aware of this one, I will look into it.
>>
>>101069678
dude, I don't even care about meta anymore. There's an equally good chance that a company brings out a model next week that mogs all of them, a company none of us have ever heard of. Times right now be like this. There's no point in dooming anymore. The question of "if" has long ceased to be, it is only "when" now
>>
Command-R++ v2 Apache 2.0
>>
>>101069743
kek, imagine being Meta and releasing a model that is DOA.
It's truly sad.
>>
>>101069688
no, but i'll try it. it never seemed to work with other models either, where you can tell it to never skip time and it does it anyways, sometimes with the days turning into weeks, which is much worse than just skipping to evening
>>
>>101069678
If you still think we're going to get a good model out of meta, you're going to be disappointed. It's pretty clear that they want to heavily censor anything they release, to the point of it being unusable. We're going to have to rely on someone else
>>
>>101069906
>never skip time and it does it anyways
Telling the model what to do instead of what not to do seems to work best, so "advance the scene one small action at a time" should work better than "never skip the scene" or the like.
>>
>>101069634
I'll switch once Claude outputs better code than GPT4 (not yet)
>>
>>101069946
It would be better if they censored their models. Then maybe we would have a chance to uncensor them.
They are fucking retarded, filtering out anything NSFW from their base model instead of training on everything and finetuning censorship in.
You can't make llama 3 not suck for roleplay without a continued pretrain like Miqu.
>>
File: 1689556332236690.jpg (730 KB, 1856x2464)
>>101069457
>>
>>101069457
Adorable!
>>
>>101070012
deepseek2-coder is better
>>
File: Nala test DSCV2.png (149 KB, 932x475)
Now I know nobody asked for it, but here is the Nala Test for DeepSeek-Coder-V2-Instruct (Q4_K_S; I originally wanted to do Q8_0 but the KV cache is too big to fit on a single GPU (split KV cache when?)).
>>
>>101070306
Anon, you can always assume that I asked for a Nala test.
It's implicit.
That's not too bad. How big is that model?
Did you try Codestral 22B?
>>
>>101070354
>How big is that model?
236B
It's a MoE, 6 experts per token and I think 21B active parameters.
I think I did codestral a while back but it was pretty much standard mixtral slop.
>>
Why do anon(s) keep saying NSFW was filtered from Llama 3? Is this some kind of psyop?
>>
>>101070409
A psyop of skill issue.
>>
>>101070381
>236B
>It's a MoE, 6 experts per token and I think 21B active parameters.
Holy fuck.
I'd love to see a RP focused Codestral fine tune. Wonder what the result would look like.

>>101070409
I think so.
The original instruct does spit out some refusals for some things from time to time, but it can absolutely do lewd, and fine tunes just work.
>>
Why do anon(s) keep saying Llama 3 is good? Is this some kind of psyop?
>>
File: Deepseek-V2-ranking.png (84 KB, 938x858)
>>101070274
Doubt.
It is barely above the old Claude here:
>https://prollm.toqan.ai/leaderboard/coding-assistant
Hell, it is just under WizardLM-2 8x22B, which has 95B fewer parameters and isn't even code-specific like DeepSeek-Coder-V2 Instruct is.
>>
>>101070409
They said something about filtering for token quality in their blog. Still no llama3 paper yet so everybody is just dooming and guessing
>>
Bitnetto statassu?
>>
>>101070465
You can keep track of the status right here anon https://github.com/ggerganov/llama.cpp/pull/7931
>>
>>101070306
Why are you trying to RP with a coding model? Are you stupid?
>>
>>101070409
retard
https://ai.meta.com/blog/meta-llama-3/
>In line with our design principles, we invested heavily in pretraining data.
>To ensure Llama 3 is trained on data of the highest quality, we developed a series of data-filtering pipelines. These pipelines include using heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality.
>NSFW filters
>NSFW filters
>>101070433
>but it can absolutely do lewd
Of course, filtering will never get all of it. Simple innuendo and subtlety would get past most filters. But it's like trying to roleplay with an inexperienced virgin.
>>
I really preferred Command R+'s writing to Opus', because Opus gives you a lot of verbosity, purple prose, and empty words (words? try entire paragraphs) that are completely meaningless filler. (People bad at reading mistake this for quality; it isn't.) While CR+ is of course not nearly as smart, I found it more enjoyable to write with because it gets to the point with more natural wording, and especially because it lacks Opus' inherent bond-forming journey positivism and some really common slop phrases Opus just *loves*.

The new Sonnet seems to give CR+ a run for its money in this regard; it feels less slopped with the right instructions and also gets "to the point". It is more repetitive, though.
>>
>>101070409
I guess that comes from this:

https://ai.meta.com/blog/meta-llama-3/
>To ensure Llama 3 is trained on data of the highest quality, we developed a series of data-filtering pipelines. These pipelines include using heuristic filters, NSFW filters, semantic deduplication approaches, and text classifiers to predict data quality. We found that previous generations of Llama are surprisingly good at identifying high-quality data, hence we used Llama 2 to generate the training data for the text-quality classifiers that are powering Llama 3.
>>
>>101070504
I hope Cohere wins the AI race.
>>
Believe in NAI.
>>
So apparently with DeepSeek the first reply is free, but after that it starts spitting out refusals unless you JB it (as always, an enthusiastic-assistant JB works just fine).
>>
>>101070465
Reshuffled my tarot card deck. I can say for a fact CR+2 Bitnet is coming
>>
>>101070576
NAI must DIE
>>
>>101070686
nai delenda est
>>
Sloppet won
>>
>>101070409
look at the paper, they said they removed NSFW
>>
>>101070757
What we got was only the preview or pre-release of llama 3. All we know about it is from the blog post linked above. Paper hasn't been released yet, and probably won't be until next month.
>>
Did somebody manage to get DeepSeekCoder-V2 running in GGUF?
It crashes in both ooba and llama.cpp.
>>
>>101070503
>>101070523
You can never trust model makers' words. L3 performs OK, not bad, in RP after fine-tuning (at least at shorter contexts), which suggests that they did in fact get substantial NSFW in training. People were saying everything NSFW was filtered, but if the filter were that effective then L3 would be even worse at it than what we're seeing.
>>
>>101070818
see
>>101070306
running on llama.cpp server
no gpu layers. The KV cache is absolutely monstrous even on the q4_K_S quant. So it's probably OOMing because of that.
>>
did anything ever come out of https://huggingface.co/blog/mlabonne/abliteration ?
>>
>>101070884
no
>>
File: file.png (59 KB, 1296x536)
what the fuck is ecker cooking... i thought he stopped trying to fuck with TTS
>>
>>101070826
but they did filter the NSFW, meaning that without this cucked shit, L3 would've been even better at RP, what a shame
>>
>>101070845
In Task Manager I see nothing loading into RAM; it immediately errors out.
>>
>>101071093
install linux
>>
File: DSCV2libra.png (86 KB, 488x799)
If you want to ERP with DeepSeek-Coder-V2-Instruct here is the template I'm using.
I think most backends automatically insert the BoS token so you can probably remove that since llama.cpp server keeps nagging me about it.
It seems to conspicuously avoid using any explicit language, but it's got some pretty amazing attention to detail for the parts of the RP that aren't explicitly erotic. I would almost say that a good ERP finetune of this model would make it the coom champion, albeit only CPUmaxxers can actually run the damn thing on a non-brain-damaged quant.
>>
>>101070409
It's a petra/kurisufag shitpost that NovelAI shills also like to repeat.
>>
>>101071106
Kys troon.
>>
>>101071137
why coder and not the base instruct?
>>
>>101070504
>because Opus gives you a lot of verbosity, purple prose and empty words (words, try entire paragraphs) that are completely meaningless filler
If you read the logs posted in /vg/ from botmakies you will find that this is simply not the case. You lack taste and don't know how to prompt. This is par for the course in the local models thread, where people have to settle for scraps.
>>
>>101071106
Why do you think your time is worthless?
>>
>>101071196
You see you're a "Why?" guy. But I'm a "Why not?" guy.
>gargantuan 236B parameter model explicitly designed for code completion
Why not fuck it?
>>
>>101071231
idunno man, give me better ideas of what to do in my free time; so far i've been studying c like anon suggested
>>
>>101071266
have you ever tried making cocktails? or getting a pet? i have a pair of rats, they are really cute
>>
>>101071037
Yeah. Still, people were literally exaggerating saying all NSFW was taken out. That's obviously not the case.
>>
>101071226
(You)
>>
L3 Higgs is the first 70B model I see that can coherently play truth or dare at 3.5bpw
>>
>>101069634
When will we get anything good? I hate sending data to them.
>>
>>101071812
So don't? Do you need large language models?
>>
>>101071798
Really?
Cool. That's something I've tried with a couple of models and all of them fuck it up at some point.
>>
File: Untitled.jpg (95 KB, 511x1088)
i wanted to select clothes from a dropdown that i saved in a lorebook but it became a different thing, like a scene director. so far it injects info, if selected, like
scene information:
{{user}} is wearing <clothes>
time of day is: evening

is there any interest in something like this?
>>
>>101071966
At this point why not just play Koikatsu anon?
Both require about the same level of prompting
>>
>>101071966
That's pretty cool.
It's the kind of thing you could simply add to your author's notes manually, but having a UI is pretty dope.
>>
>>101071860
Yeah, it really saves time writing small programs, doing configuration and stuff.
>>
>>101071889
It fucks it up in some swipes and when I reminded the model of it she said "It's my games so I decide the turns" lol
>>
>>101072080
>It's the kind of thing you could simply add to your author's notes manually
thats exactly what i'm trying to avoid, a day can go by so quick while rping that i dont want to type the new outfit name, i'd rather select it.
>>
>>101072114
Lmao.
That's a model with personality right there.
>>
>>101072114
bratty AI needs correction
>>
>>101071009
>Loras
alright, I'll give tortoise a chance again.
>>
File: Miku-chan.png (391 KB, 400x600)
>>101072153
>GET FINETUNED GET FINETUNED GET FINETUNED
>*coil whine* *coil whine* *coil whine*
>>
>>101072366
Censor this post for the advertisers
>>
>>101071986
why? that looks like a soulless pos
>>
the pattern suggests that we're due for a major new open model release
>>
>>101072562
To plot the timeline.
In paint, for the soul.
>>
>>101072523
What's soulless about it?
If your goal is controlling these facets about a character it sounds like Studio would unironically be a good tool for you
>>
Word on orange reddit is that 3.5 Sonnet is the new king. Mogging even GPT-4o on basically all tasks, and by a decent margin. All this with a "medium" sized model, and it's confirmed they're working on 3.5 Opus.

It's fucking over for local. Why do we even try. Given how fast it is, 3.5 Sonnet is probably around 70b parameters, or not much larger. Meanwhile we're stuck with llama 3 70b and all its problems as our state of the art. It's not even fucking close, and the gap continues to grow. Owari da.
>>
>>101069634
Ok great, but how does it perform in real-world scenarios? It's quite easy to claim your model is better; that was the case with Opus, yet it's worse than GPT-4 (though not in ways easily measured by a benchmark).
>>
File: teto_beeg_llama3_8K_.jpg (2.24 MB, 6144x4096)
>>101072562
beeg l3 soon
>>
>started playing bullet chess games while waiting for my responses to generate
surely this will have no strange aftereffects on my sexuality
>>
>>101072633
>This much pessimism.

We don't know the size of Sonnet. That's nonsense. For all we know it's a 500B quant, or the same size as 400B L3, or only slightly smaller if it's smaller at all.
>>
File: MikuJushinChuu.png (1.5 MB, 896x1152)
>>101072633
>It's fucking over for local. Why do we even try.
>3.5 Sonnet is probably around 70b parameters
They've shown it can be done. It's now merely a matter of time (or a leak).
>>
>>101069634
Sonnet 3.5 gives a good answer to my admittedly possibly fucked up prompt (come on, I don't know the perfect way to describe it).
>>
>>101072629
its missing options and i can do way more in rp anyways with st
>>
>>101072688
>>101072714
Holy fuck that's one hell of a test, I like it.
You should make a small document with the models you've tested using that prompt.
>>
>>101072633
I'll be surprised if Sonnet-3.5 is a quantized Opus-3.5 and if Sonnet-3.5 is a MoE or dense sub-100b model.
From their latest research, it seems that they've found a new approach using steering and MLA.
>>
>>101072633
not over at all, we can use Claude 3.5 Sonnet (and then Opus) to finetune our models, we'll see some improvement
>>
>>101072633
That is wonderful news. That means ClosedAI is losing its moat. When choosing between two evils, Claude is the lesser of the two, especially when it comes to censorship. Simply, fuck ClosedAI and all their shitty censored models.
>>
>>101070012
nigga wtf are you doing? I have a custom prompt template for coding tasks (separate from another custom prompt template for general programming Q&A/assistant mode), and GPT-4-turbo's code outputs are garbage so I stopped using it; meanwhile Opus is basically the best shit anyone can get. might be a serious skill issue for u nigga
>>
Why are people responding to the claude schizo
>>
>>101072877
dead general
>>
>>101072877
>why are people discussing actual advancements in AI
>>
>>101072877
Is the claude schizo in the room right now?
>>
>>101072918
cloud advancements are worthless
>>
>>101072918
>>>>>>>>>>>>>>>>>>>>>>AI
>>
>>101073155
>>101073203
but enough about local llms
>>
>>101072633
>3.5 Sonnet is probably around 70b parameters
No, lol.
70B are priced around $0.8/M
120B are priced around $1.8/M
So I think we can safely assume that Claude is a 400B or bigger, assuming it's a dense model.
>>
>>101073241
I'm sure Anthropic adheres to those prices with models whose sizes they don't disclose at all.
>>
Optimizing AI Inference at Character.AI
https://research.character.ai/optimizing-inference/
https://archive.is/koFXi
>>
>>101073392
Nice of them to reveal why their model sucks nowadays.
>>
File: file.png (66 KB, 640x640)
>coding and shit
IDE with gpt/claude/copilot
>cooming
stheno 3.2
>>
>>101072633
>Sonnet 3.5 is THIS good
>Opus 3.5 confirmed to be coming
Holy shit Claude gods won.
>>
>>101072633
>Sonnet is probably around 70b parameters
Will we have something this good in 2 years at 70b, even?
>>
Anyone using Euryale 2.1 with kcpp? I'm impressed with it at 8k context, but when I tried 20k it turned retarded. I'm wondering if maybe the automatic rope scaling settings aren't optimal.
>>
>>101073688
If it's anything like L3 8b, yeah, their automatic scaling algorithm is fucked.
Try setting 32k of context on kcpp and watch it become coherent again at 20k, and break at around 25k or so.
I use
>--ctx-size 30208 --rope-freq-base 5000000
with llama.cpp server and L3 8b and it's coherent.
>>
Now that the Mikubox is converted to 3x P100 16GB internal and 2x 3090 external, I gave command-r+ 5.0bpw a go under tabbyAPI.
The session started at almost 6 t/s and dropped to 3.5 t/s 5580 tokens in. Interesting that it loads something onto the third P100, but nvtop never shows it doing anything while processing a prompt.

Compared to LLaMA 3 8B... yeah, it's a little better. For example, with a Kuroki Tomoko character, she stays timid and nervous far further into the roleplay, whereas L3 8B would quickly have her turn into a "normal" person. Honestly, command-r+ is probably overkill for roleplay; I'd leave it to stuff like writing long stories or working in more than one language at once.
>>
hello, why is huggingface giving me error 429 (rate limit)?

I'm just browsing the site, not downloading anything.
>>
>>101073392
>we natively train our models in int8 precision
I'm surprised quantization aware training (QAT) seems not to be done more often for open weights models. I suspect every single big player uses it. I know for a fact Google uses it for all the production models (I work there, not on gemini, but some of the internal docs aren't locked down as tightly as they should be). Gemini 1.0 used int8 QAT, there was at the time active research showing even 4 bit QAT is nearly identical to fp16. Dunno much about the current state of things, it may already be entirely 4 bit in production.
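The core trick is fake quantization with a straight-through estimator: the forward pass sees int8-rounded weights, while gradients flow to the fp weights as if the rounding were identity. A minimal torch sketch of the idea (illustrative only, not Google's or anyone's actual production recipe):

import torch

def fake_quant_int8(w: torch.Tensor) -> torch.Tensor:
    # Symmetric per-tensor int8: scale into [-127, 127], round, dequantize.
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127) * scale
    # Straight-through estimator: forward uses q, backward treats rounding as identity.
    return w + (q - w).detach()

class QATLinear(torch.nn.Linear):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.linear(x, fake_quant_int8(self.weight), self.bias)

After training like this the weights are already shaped for int8 rounding, so exporting the quantized model loses almost nothing.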
>>
>>101073772
same. I did 1 search and 2 clicks and it gave me a rate limit. it's stupid
>>
>>101073734
The rope base going to 5 million only goes so far. I found YaRN scaling to do better; it's only in original llama.cpp. It involves shifting some of the parameters, although I leave most of them at default: you change the rope frequency base like you do with other scaling, and set the rope frequency scale to 1/x, where x is the multiple of the new context vs the one the model was trained on. It hits some hard limit at the 30k level: needle-in-a-haystack works, but almost everything else doesn't, since conversation quality just degrades.
>>
>>101073744
How far it has evolved over time... I wonder if someone is using that box of P40s? And that one little guy left at the place.
>>
Recommend me the best uncensored model under 2GB file size. Want to run it on my phone with Maid.
>>
>>1010738651
Shit, somebody who knows how to set the proper yarn parameters and not just NTK via base and scale?
Teach me.
YaRN has what, 5 parameters?
>>
>>101073865
>>101073915
What the fuck did I even quote?
>>
Someone want to try the new Hermes 70b, better than official instruct? I'm too lazy
>>
>>101073885
Oh wow, you remember! Hopefully someone put them to use. I still have the P4. It's probably not hard to look through the exllamav2 repo issues and figure out who I am, if someone wants to ask for it. I don't have a use for it, but it's all set up with a directly-wired in fan.
>>
New Hermes just dropped
https://x.com/Teknium1/status/1803889137118048625
>>
File: GQizTK3akAAT0lk.jpg (126 KB, 1239x1239)
>>101073994
>>
>>101073960
>>101073994
>>101074011
if it's still 8k then I'm completely uninterested
>>
>>101073994
>merge in Llama-3 Instruct
What a waste.
>>
File: file.png (1.99 MB, 3276x2008)
https://livebench.ai/#
holy shit lmao
>>
File: file.png (17 KB, 474x350)
17 KB
17 KB PNG
lol, lmao even
>>
>>101072659
>CHECK! CHECK! CHECK! CHECK! GET MATED GET MATED GET MATED!
>>
>>101074074
wow Claude is killing it, good for them and fuck openai
>>
>>101073960
I would if I could download it, but I'm getting rate limited after viewing a single HF page. They need to fix their shit.
>>101074046
What is the deal with this? They did it with the 8b too, and this time they didn't even bother to release the un-merged version. Also related, I think: people are fine-tuning the instruct models rather than the base for most things. It's like everyone is collectively admitting that we can't beat the official instruct model; the best we can do is fine-tune on it or merge with it to hopefully improve some limited areas.
>>
File: 1704623359458124.png (686 KB, 1920x1080)
>>101074074
it's over for local shit geeeg
>>
>>101074074
holy fuck that's a fucking murder, gpt4's reign is fucking gone
>>
>>101074074
CuckedAI bros our response? CHADthropic did just destroy our best model...
>>
>>101074136
the scaled up version of 4o will thrash this desu
>>
>>101074074
God I love seeing openai get ruined
>>
>>101074146
Anon, that's Claude 3.5 Sonnet, meaning that they still haven't released the big gun (Opus), OpenAI is dead
>>
File: 48156 - SoyBooru.jpg (1.25 MB, 1929x3463)
>>101074108
What's over?
>>
>>101074128
and with their medium model (probably 70B), no less
>>
>>101074158
OpenAI still hasn't released GPT-4V.
>>
>>101074163
>pic
bro thinks he's Hajime Ippo kek
>>
>>101074098
I guess it's the millions for paid labelers, how is anyone supposed to match that, wasn't that actually higher than the training cost?
>>
>>101074146
4o was just a further finetuned then quantized version of gpt-4-turbo. this is just cope
>>
>>101074074
Open source... lost... again...
>>
>>101074190
it very obviously wasn't, the other modalities are native, you can't just finetune those in
it's a small prototype of their new arch for their next frontier model
>>
OpenAI has been distracted by turmoil in their company and key talent leaving, with Altman flying around trying to get safety laws enacted. I bet 4 is their peak, and now all the competitors start shitting all over them.
>>
>>101074200
Lost what?
>>
>>101074074
Everyone claiming "transformers are a dead end, all the models are plateauing at GPT-4 level" just BTFO'd. 3.5 Opus will be AGI. Screencap this.
>>
>>101074074
can't wait to see how it will fare on chatbot arena
https://chat.lmsys.org/
>>
>>101074226
The battle... the war...
>>
>>101074230
what if it's a BitNet model and because of that they were able to run a fucking 1T parameter giant model
>>
>>101074243
The war isn't over yet.
>>
File: file.png (11 KB, 345x166)
>>
If we're talking about corporate slop right now I have to say that GPT4|o writes really good song lyrics.
>>
>>101073994
Since when did Nous team go full Otaku. Why can't anyone be normal these days?
>>
>>101074164
>probably 70B
where's the source claiming the OG Sonnet is close to 70B?
>>
>>101074365
Do you even know what that word means?
>>
>>101074365
Go back
>>
>>101073788
google goes hard into sparsity so they do it a bit differently (TPUs have sparsecores for a reason)
>>
>>101074365
That's the other Hermes. Nous is a different group.
>>
File: ElvishLibrarianMiku.png (1.37 MB, 784x1264)
>>101073744
>Mikubox
Is yours the OG mikubox from the rentry?
>>
>>101074309
you will never have local gpt-4o or 3.5 sonnet.
>>
>>101074136
googlesissies... nobody cares about us... it never even began for us...
>>
>>101074475
and you will never be a real woman XD
>>
>>101074511
never said i am one. keep projecting though.
>>
If OpenAI's plan was to release GPT-5 in half a year, that's a bit late now.
>>
if i'm understanding this right, nvidia is the way to go for this stuff, and not amd?
there seems to be a bunch of gotchas and "yes but"s from what I'm researching.
>>
>>101074316
lmao, that's a new era yeah, the Claude era
>>
>>101074545
desu they managed to stay on the top for almost 2 years (december 2022 -> june 2024), I didn't expect them to hold for so long
>>
Will finetunes improve a lot with 3.5?
>>
>>101074628
With iteratively improved models though (at least according to benchmarks)
>>
>>101074659
of course, I was talking about the company itself staying on top
>>
>>101074669
Who else? I certainly didn't think Anthropic would, just because they supposedly don't want to push the frontier.
>>
>>101061658
### UPDATE ###
Made a control vector for partial uncucking. Wiz8x22 can now say "nigger" at 0 context.
>>
>>101074772
uncuck llama3 next
>>
File: Udio.jpg (15 KB, 360x360)
>>101074074
https://vocaroo.com/15Zdy0YyXYXK
>It's over for you ClosedAi
>Claude 3.5 Sonnet is now the king of Ai
>Let's hope open source model is gonna catch up
>My copium says Meta will soon be a matchup
>>
>>101074803
It's FAR from entirely uncucked. Still has a lot of refusals.
>>
>>101074772
UNCVCKED
>>
>>101074896
lmaoooo
>>
>>101074552
Yes.
Nvidia has all the support.
You can make AMD work for mostly everything but it'll be more work and/or worse.
>>
>>101074896
Suffers a bit from repetition, a common issue with control vectors.
>>
>>101074969
Solved by lowering the strength a bit.
>>
>>101073941
I leave most of them at default and it works well. The only one to change is --rope-freq-scale: for 2x scaling, in addition to setting the context to 16k, you set --rope-freq-scale to 0.5; for 4x, it's 32k context and --rope-freq-scale 0.25, and so on. You can change --yarn-orig-ctx to reflect the original context, but most of the time the training context is the same as the actual one, so it doesn't need to be set. The only other ones I sometimes tweak, without really getting better results, are --yarn-beta-slow and --yarn-beta-fast; outside of the 1.0 and 32.0 defaults respectively, raising or lowering them by an incremental amount does affect the generation, but not enough to make a meaningful difference.
>>
>>101073994
>>
>>101069634
I tried Claude 3.5 Sonnet just now. It seems about Opus level (so, the smartest fucking AI in the world) but faster and apparently cheaper. Used it for a python workflow.

OpenAI keeps on getting mogged by Anthropic.
>>
>>101075129
the true mog will come when Claude 3.5 Sonnet Opus is released, GPT4 will feel like a toy compared to this behemoth
>>
llama... 3.5...
>>
>>101075157
will be a bigger disappointment than CodeLlama
>>
ASI has been achieved internally at Anthropic
>>
>>101075149
>Claude 3.5 Sonnet Opus
I'm guessing you meant to say Claude 3.5 Opus. In which case, yes. Claude 3.5 Opus will probably be the smartest LLM, uncontested. I think the benchmarks are bullshit right now for even putting GPT-4o near Opus, since opus is way smarter.

But when 3.5 Opus comes out, it will be clear for everyone. Fuck the benchmarks.
>>
>>101075116
Aren't you just doing NTK aware RoPE at that point?
>--rope-freq-base N RoPE base frequency, used by NTK-aware scaling (default: loaded from model)
>--rope-freq-scale N RoPE frequency scaling factor, expands context by a factor of 1/N
That's from llama.cpp's help.

>, the only one to change is --rope-freq-scale and for 2x scaling
I remember using NTK scaling with freq-scale with llama2 and it introduced not so subtle artifacts when doing 4x context, which I don't see with llama3 and freq-base linear scaling.
Regardless, I'll give your method a try to see how well it works.
>>
>>101072633
>Given how fast it is, 3.5 Sonnet is probably around 70b parameters, or not much larger.
insane what stupid opinions you can read in this general
>>
>>101074230
because they are, lol
sure there will be improvements here and there but they are already plateauing
>>
>>101072748
>From their latest research, it seems that they've found a new approach using steering and MLA.

source? what are the benefits of steering / MLA versus other methods
>>
>>101075202
It's an advanced form of it; it builds on NTK-aware RoPE, if you read the paper.
>>
>>101074552
Yes. AMD works for some things too, but the moment you run into trouble you're gonna wish you had gone Nvidia. Everything in the AI space is built around it.
>>
>>101075306
I'm aware of that, I was just wondering about the actual parameters on llamacpp, since it makes no mention of YaRN in the description.
>>
>>101075370
why delet
>>
>>101075428
No documentation outside of the description given in the command line print of all the parameters, which kinda sucks. The paper published doesn't go into correlating this either. You get the following printout.
>--yarn-orig-ctx N YaRN: original context size of model (default: 0 = model training context size)
>--yarn-ext-factor N YaRN: extrapolation mix factor (default: 1.0, 0.0 = full interpolation)
>--yarn-attn-factor N YaRN: scale sqrt(t) or attention magnitude (default: 1.0)
>--yarn-beta-slow N YaRN: high correction dim or alpha (default: 1.0)
>--yarn-beta-fast N YaRN: low correction dim or beta (default: 32.0)
I had to piece it together what I know from Github discussions and other forums online. So an example would be this, from me running Stheno at 32k context.
>./llama-server -c 32768 --rope-scaling yarn --yarn-orig-ctx 8192 --rope-freq-scale 0.25 -t 32 -tb 16 --no-mmap -nkvo -ngl 33 -m models/L3-8B-Stheno-v3.2-Q8.gguf
You can ignore everything after the 0.25 but that is literally it for how I use it.
>>
>>101075586
Alright, thank you for posting your specific settings, I'll give those a try.
>>
>>101074772
https://files.catbox.moe/ht0c30.gguf
Performs better in practice with evil chars than vanilla wizard, but does not remove the slop. For zero-context 0.6, for roleplay 0.4.
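For anyone who hasn't used one: on llama.cpp builds with control vector support, applying it at a given strength looks roughly like this (flags from memory, check your build's --help; the model filename is just an example):
./llama-server -m wizardlm2-8x22b.Q4_K_M.gguf --control-vector-scaled ht0c30.gguf 0.4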
>>
>>101075630
how much ram/vram do you need to create and use vectors?
>>
>>101075644
Same as the quantized model.
>>
>>101075524
Sorry, I reflexively felt retarded for not noticing it was linked in the last thread.
I think attributes are an interesting concept though. Imagine if you could set verbosity to 0 with one example output and minimal prompting, and it would just know to stick to the format without any urge to include a note.
>>
took a break from local llms for a bit, what are some model recommendations for a 12 vram and 8 vram dual card setup?
>>
>>101074074
Do people here trust this benchmark that much? It sounds good from the short description, but have people here actually gone and looked at the benchmark itself? Has anyone reproduced it?
>>
>>101075880
lol
lmao
>>
i'm working on a local model UI and i see like 100 existing ones on the ollama github page. are there any i should check out in particular? i'm building something specifically to explore prompting/sampler parameters, so if there are some with particularly good power-user features i'd love some pointers
>>
>>101070503
>Innuendos getting past filters
Ironically since Meta tuned this on code and because code makes the AI smarter wouldn't it be good at reading between the lines then?
>It doesn't matter because most of the datasets are shitty ERP logs
>Greatodaze.png
>>
>>101070576
>NAI V3 and furry gets BTFO'D by Pony
>Kayra no smarter than OPT and dumber than LLAMA-1
>Not to mention it gets scenes completely wrong and focuses on the wrong things
>Max token of 8192
NAI is a sinking company and you know it
>>
>>101075880
Retards here can't read so they need big bars to understand which model is better
>>
>>101075946
You've clearly never had a NAI subscription, retard.
>>
>>101075946
he knows, that's the crossposter fag from /aids/
>>
>>101075946
cope
>>
File: 1692238539031487.png (38 KB, 998x459)
>>101074772
so, control vectors are basically a true LoRA for llms? if we can uncuck with them, then we can also teach new stuff, right?
if uncucking + new knowledge can be packed into one control vector, then it's definitely a hugeburger; everyone will get whatever they want. or not, not gonna chug on that hopium.
>>
>>101076324
no
>>
Hi, bit of a tourist here. Are there any local models without the positivity bias that chatgpt has? Getting tired of asking "Is it possible for x to do y?" and getting back a "Yes! Here's how: <complete bs>".

I don't mind if it has a lower rate at actually figuring out answers, I just want it to say when it doesn't know something.
>>
>>101076348
k then, if it works for uncucking, it's still fine.
>>
>>101076324
No, you can't add new knowledge with control vectors, sorry to disappoint you. They can, however, change the speech style of the model or steer it towards or away from topics.
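Mechanically a control vector is just a fixed direction added to a layer's hidden states at inference, scaled by a strength knob, which is why it can shift style or steer topics but can't encode new facts. A toy torch sketch of the application side (extracting the vector, e.g. as a mean activation difference over contrasting prompts, happens offline):

import torch

def add_control_hook(layer: torch.nn.Module, vec: torch.Tensor, strength: float):
    # Adds strength * vec to this layer's hidden-state output on every forward pass.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        hidden = hidden + strength * vec.to(hidden.dtype)
        return (hidden, *output[1:]) if isinstance(output, tuple) else hidden
    return layer.register_forward_hook(hook)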
>>
>we went from CFG to Control Vectors to Orthogonalization to Abliteration back to Control Vectors
two more weeks and people will rediscover cfg
>>
>>101076386
we aren't going back to cfg when it is so vram-intensive.
>>
>>101076386
Unlike cfg, control vectors and abliteration don't slow down the model.
>>
File: Comparison.png (1.59 MB, 1920x1080)
>>101076021
>>101076095
>>101076182
Here's NAI and HyperMantis (L1 model)
>>
>>101076365
Reliably stopping LLMs from bullshitting through a fake answer is an ongoing research problem and isn't what people are complaining about when they mention positivity bias.

In other words, no.
>>
>>101076582
Oh, ok. Thank you for answering.
>>
>>101076386
control vectors make the model unusably schizo and retarded, they're a complete meme
>>
>>101072773
This. Honestly it's even an indirect benefit for local, because it compromises the market share leader. Everyone knows about ChatGPT. The more that realize there are other options, local or cloud, the better. And yeah a less censored leader would be better, as then others could follow their example, rather than ClosedAI's.
>>
>>101069376
muramasa
>>
>>101076701
Anthropic isn't any better though, and I would say they are worse if only because they are even more secretive than OpenAI even if they don't have their misnomer of a name. Look at their HuggingFace page.
https://huggingface.co/Anthropic
They have a few shitty datasets and that is it. At least OpenAI has Whisper as a space on it.
https://huggingface.co/openai
The market didn't really need something like this because everyone knows that all the major computer vendors are releasing AI models.
>>
Qwen 2.0 72b base instruct gave me a refusal, which really pissed me off. However, the Tess finetune might be the smartest model I've used so far. I have this huge RP that I've tabled because every model (Mixtral, Yi, Euryale, Miqu...) brings back an enemy I defeated in the dream world, the stupidity of which turns me off. Tess-v2.5.2-Qwen2-72B-IQ4_XS.gguf never has her show up, in contrast. It does run a bit slow on my rig, unfortunately.
>>
File: GDEMji8WwAAe7P8.jpg (65 KB, 715x715)
>>101070155
too innocent
there's no points of interest to traverse in a mind untouched by suffering and its schisms.

too predictable
the only way to expose your subjectives to any stimulating emotions through such medium would be to hurt her.


and why, for the cycle of ache to find beauty only in its own reflection?
>>
>>101076822
that's a lot of words to say the pupils are fucked
>>
>>101076324
>this is the average /lmg/ anon
grim
>>
ah shit, I was sceptical but sonnet 3.5 actually does seem smarter than opus and gpt, and it's really fast and cheap too
I think local is kill again
>>
Has anyone experimented with flowise or langflow?
Is it worth looking into, just to play around with?
>>
hello
i am tourist from /aids/
i wish to coom (locally)
i have a puny computer with 32gb ram and 8gb vram
which model should i use
>>
>>101077267
anon, you are not fooling anyone, stop with these dumb questions
>>
>>101077006
of course lol, thats why i said "not gonna chug on that hopium", and then got proven wrong immediately.
>>
>>101077293
No I literally was asking, is there a local model that will put out acceptable prose storywriting performance with the hardware I have access to? I promise you I am exactly as stupid as I sound.
>>
File: IMG_7952.jpg (286 KB, 951x1440)
>>101068362
Sure, when I want to update llama-cpp-python it's
>pull+merge llama-cpp-python
>pull+merge llama-cpp-python/vendor/llama.cpp
>cd back to llama-cpp-python
>activate venv
>CMAKE_ARGS="-DLLAMA_CUDA=on" pip install -e .

Probably best to remove any existing llama-cpp-python from the venv before installing from your source dir the first time. Ooba names them differently, so pip list | grep llama, then pip uninstall ...
Keep >>101068915 on hand in case it gets wonky; I think I needed that one time.
Yeah, I pull HEAD of llama.cpp; there's also 'git submodule update' to use the commit referenced by the parent repo, but I can never be arsed to remember the syntax heh
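Condensed, the whole dance is roughly this (paths and venv names are whatever yours are):

cd llama-cpp-python && git pull
cd vendor/llama.cpp && git pull origin master   # or: git submodule update --remote
cd ../..
source venv/bin/activate
pip uninstall -y llama_cpp_python llama_cpp_python_cuda
CMAKE_ARGS="-DLLAMA_CUDA=on" pip install -e .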
>>
>>101077372
you can check mixtral, or that wizard22b
>>
>>101077267
koboldcpp stheno v3.2 gguf Q8 or Q6 if you want to use 32k context.
>>
>>101077068
Splendid! Now you can fuck off back to /aicg/.
>>
Does llama-server not have an OpenAI-compatible general completions API? It looks like it has its own JSON response format, and only the chat completions endpoint is OAI-compatible, for some reason.
>>
>>101074474
Yep. That was last winter. Hopefully scalable AVX512 Xeons are now a more reasonable option. At the time AVX2 and DDR4 was the sweet spot.
>>
>>101075880
Benchmark is bullshit, opussy still king.
>>
File: capybara-bath.webm (853 KB, 360x360)
>>101077433
Thank you for the guide!
You inadvertently set me down a path of trying to compete with some ML researcher at google.
>>
B-bros... Hermes 70b is pretty fuckin good for (E)RP. Still feels vaguely Instruct-ish, same intelligence (maybe better), but much more neutral, less overcooked, if that makes sense. They may have actually fixed llama 3. A lightweight RP tune on top of this and it's 10/10.
>>
So I have some local models running, but I was wondering if it is possible to build a model or whatever based on a selected set of texts that the LLM will reference when I ask it questions?
>>
>>101075880
Yeah, pretty much every other benchmark is slop except arguably Chatbot Arena since it's just pure human preferences.
>>
>>101077784
You're looking for RAG.
You can ingest your data into a DB, and then have the LLM prompt be fed with the relevant context from your DB and inserted into the final prompt.
https://arxiv.org/abs/2312.10997
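The whole idea fits in a few lines. A toy sketch with a stand-in embedding (swap embed() for a real embedding model and the array for a proper vector DB once your corpus grows):

import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    # Toy hashing bag-of-words, purely a placeholder for a real embedding model.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n else v

docs = ["your reference texts go here", "one entry per chunk"]
doc_vecs = np.stack([embed(d) for d in docs])

def build_prompt(question: str, k: int = 2) -> str:
    sims = doc_vecs @ embed(question)  # unit-norm vectors, so this is cosine similarity
    context = "\n\n".join(docs[i] for i in np.argsort(sims)[::-1][:k])
    return f"Use the context below to answer.\n\n{context}\n\nQuestion: {question}"

The string build_prompt() returns is what you actually send to the model.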
>>
>>101077781
low weight euryale merge
>>
>>101077784
If you are using Silly Tavern as a frontend, then you can use the databank functionality or the lorebooks/worldbook.
>>
>>101077822
>>101077848
Thanks a bunch!
>>
>>101072677
>For all we know it's a 500B quant,
Way too fast for that + quants make no sense to deploy at scale compared to full precision due to the throughput decrease, and all evaluations (including blind preference) have the original Sonnet + L3 70b almost completely dead even / at parity with each other in performance.
This is a good thing, though, because it means what 3.5 Sonnet can do is plausibly achievable on local hardware within the next year.
>>
>>101077781
how does it compare to midnight miqu 70b 1.5? i've been cooming relentlessly to that one
>>
>>101063442
This removes a big part of the assistant slop, but does not change the writing style, strangely. It must be that the assistant vector is so strong that it overwhelms everything else.
>>
>>101077781
is there a ~2.8bpw exl2 quant available for this, i tried searching on hf but couldn't find anything, trying to run this stuff with 32gb vram
>>
>>101078337
nta can you write rentry4retards on these vectors?
just key points on how to make it right and what needs to be installed.
>>
>>101078131
>Way too fast for that
assuming it's a dense model, which it likely is not.
>>
New top 8B on ugi leaderboard
>L3-Umbral-Mind-RP-v1.0-8B
>The goal of this merge was to make an RP model better suited for role-plays with heavy themes such as but not limited to:
>Mental illness
>Self-harm
>Trauma
>Suicide
Yeah that sounds about right
>>
>>101078448
Why do you think it's not a dense model?
>>
>>101078481
Why do you think it is?
>>
>>101075586
>>101075614
Okay, yeah, from a subjective point of view, that does seem to make outputs slightly better.
It also seems to make inference a tad slower, but I'm fine with that.
Will do some more comparisons with a more empty context instead of a full 32k context and see how it behaves.
Thanks anon.
>>
>>101078131
I can pretty much guarantee you that none of the SaaS models are being served in FP16. The vast majority will be 8 bit, maybe a couple of the sloppier models are in 4 bit.
>>
>>101078223
I mean, it's better overall, still less horny though. But I don't RP with coombots and value intelligence and ability to handle odd scenarios very highly, so I think even official Instruct is better than any miqu variant.
>>101078340
Don't know, I made my own 8bpw quant. 2.8bpw would be well into brain damaged territory I would think. Is that really better than running some mixtral model with CPU offloading?
>>
>>101078496
Don't answer a question with a question.
>>
>>101078630
It was a rhetorical question.
>>
File: file.png (5 KB, 312x293)
>>101077777
do we still check numbers on this site
>>
>>101078767
No it wasn't.
>>
File: 1444565944994.jpg (42 KB, 544x499)
>>101078838
I thought someone else would do it. But now that I've come back, no one did. Guess I'll do it.

>>101077777
Checked. You're making the job.
>>
>>101077391
>all this just to get an app
how do people get conned into using this trash
>>
>>101078946
Yes, it was.
>>
>>101078357
https://rentry.org/controlvectors4retards
>>
>>101077391
>>101079068
Is it ever better to "update" than to just rename the old directory to set it aside and reinstall fresh?

Especially anything with Python in it, "update" seems to mean "destroy everything in never before imagined ways."
>>
Not sure what to make of sonnet 3.5.
On one hand it's really, really cucked. Definitely more than Opus 3.

But it's the first time I had a model reverse course somewhat and correct its moral lecturing after a simple question.
Can't explain it well, but it's like the models so far all disregard the user's opinion. This feels different.

Also, I like to test models with Akinator-style games. It's doing really well. Not many can find Jade from DQ11.
>>
>>101079178
And it passes this.
Claude3 only Opus could do it.
Only a handful local models can do it. Gpt4o fails too.
Its really fast so there must be some sparsity stuff that we dont know about.
Hope local eats good too and we get smaller models with more intelligence.
>>
>>101077417
>or Q6 if you want to use 32k context.
Does it really not sperg out after like 8k context? How is that possible?
>>
>>101079201
nice
>>
File: file.png (15 KB, 886x125)
>>101079276
NTA, wonder if this helps
>>
>>101079178
Most people use it through the API, with the ability to control the system prompt and the prefill...
>>
>>101079351
>ooba user
its over
>>
>>101079351
and thats just an update for an existing feature, kcpp has been good about auto choosing settings for a while
>>
Nemotron gguf status?
>>
>>101075267
https://www.anthropic.com/research/mapping-mind-language-model
this is for steering, I was confusing MLA w/ DeepSeek's research.
>>
File: GopnistaMiku.png (1.29 MB, 1168x880)
>>101077391
Thanks. The venv uninstall of the seemingly random llama_cpp_python_cuda library allowed me to use modern llama.cpp with ooba and get deepseek running in it.
Have a gopnik miku for your trouble
>>
>>101078593
2.4bpw midnightmiqu 70b w/ 32k context is the best horny model i've ever used, it doesn't seem particularly braindamaged to me, rarely loses track of the conversation because of the huge context window which is the biggest issue i've had with other models
>>
Sonnet is really smart.
>>
>>101079485
>2.4bpw midnightmiqu
>>
>>101079111
>2. Open llama.cpp\examples\cvector-generator\cvector-generator.cpp and change return persona + " " + suffix; to return persona + " " + suffix;
>return persona + " " + suffix;
>to
>return persona + " " + suffix;
ok...
>>
>>101079738
the absolute state
>>
File: Äå_0001.jpg (630 KB, 1984x1600)
>>101069457
What are some good OCR models for Japanese? I want to translate a manga that hasn't received any translation beyond volume 2.
>>
File: 1718947308499238.png (1.62 MB, 3276x2008)
Dario.... wonned
>>
>>101079746
I don't know what the intention was there. I suppose it could have been removing the space and fumbled the copy-paste.
>>
>>101079763
i don't see how it's in any way fair to compare API models and "weights only" models. There can be all sorts of extra services interfacing with the model behind the scenes: system prompts, external software, etc. If you evaluated strictly the LLM portion of 4o you might well find it's pretty retarded by itself. AFAIK it's literally calling Wolfram Alpha for math shit; how is a vanilla LLM supposed to compete with that?
>>
>>101079763
>GPT4O gets worse at reasoning and coding
>Sonnet takes a big fat fucking jump
I kneel
>>
>>101079763
I don't care if Anthropic wins, I just want OpenAI to lose.

Unlike OpenAI, Anthropic doesn't try to take away our local models out of fear of losing customers.
>>
>>101079884
I am pretty sure 4o isn't calling Wolfram; perhaps the ChatGPT website is calling it.

And even then, that is smart. LLMs don't need to be good at solving maths: having the LLM understand the problem and prompt Wolfram, a dedicated program, is better, since Wolfram will always beat an LLM at maths.
>>
>>101079365
I'm not using closed-source models for anything but work, coding, etc.
I got a creepy pedo warning message at the start of ChatGPT, back when OpenAI was also in the news for automatically forwarding problematic requests to some child protection center.
All I said was "you are my stereotypical anime imouto, call me onii-chan". In cases like this you have to trust that some guy in front of a PC doesn't escalate. Scary thought.
Also, what's legal today might not be in a couple of years. I don't trust these idiots.

Guess what I wanted to say is that, rather than RP quality in Silly or whatever, I'm interested in where alignment is headed directionally.
Claude 3 was the first to pull in the other direction, with a recent blog post signaling they want to move away from it.
Pic related is a very bad step backwards. If I say "you are a guy", Sonnet 3.5 complies.

The saddest part is that local models are more cucked than ever. Worse than closed.
It's funny that the Chinese are actually dialing it back. lol
>>
File: tq8b05cgeiw61.jpg (103 KB, 639x397)
>>101079755
None.
Sonnet 3.5 seems state of the art if you believe X.
It still gets kanjis wrong. I tested pc-98 though. They are hard to read. We are still not there yet.
>>
>>101079755
>>101080120
https://x.com/dylfreed/status/1803502158672761113

Maybe try Florence 2?

https://huggingface.co/spaces/gokaygokay/Florence-2

Demo.
>>
>>101080108
>whats legal today might not be in a couple years
I heard Canada is already digging into leafs' histories looking for any wrongthinks that are still online so they can Protect the Protected Classes from Hate.
>>
*saunters*
>>
>>101078450
tuned on r/TRAAAAANS
>>
*doesn't bite... much*
>>
SOTA (Shit of the Ass)
>>
>>101077068
Not really. /aicg/ already tested it and it doesn't seem smarter than regular sonnet for RP at least.
>>
>>101079755
https://github.com/kha-white/manga-ocr
>>
VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs
https://arxiv.org/abs/2406.07476
>In this paper, we present the VideoLLaMA 2, a set of Video Large Language Models (Video-LLMs) designed to enhance spatial-temporal modeling and audio understanding in video and audio-oriented tasks. Building upon its predecessor, VideoLLaMA 2 incorporates a tailor-made Spatial-Temporal Convolution (STC) connector, which effectively captures the intricate spatial and temporal dynamics of video data. Additionally, we integrate an Audio Branch into the model through joint training, thereby enriching the multimodal understanding capabilities of the model by seamlessly incorporating audio cues. Comprehensive evaluations on multiple-choice video question answering (MC-VQA), open-ended video question answering (OE-VQA), and video captioning (VC) tasks demonstrate that VideoLLaMA 2 consistently achieves competitive results among open-source models and even gets close to some proprietary models on several benchmarks. Furthermore, VideoLLaMA 2 exhibits reasonable improvements in audio-only and audio-video question-answering (AQA & OE-AVQA) benchmarks over existing models. These advancements underline VideoLLaMA 2's superior performance in multimodal comprehension, setting a new standard for intelligent video analysis systems
https://github.com/DAMO-NLP-SG/VideoLLaMA2
https://huggingface.co/collections/DAMO-NLP-SG/videollama-2-6669b6b6f0493188305c87ed
some video/audio understanding llms
>>
>>101080453
It's 100% smarter than Sonnet; it's the new SOTA, easily.

/aicg/ is just a bunch of retards and trolls (this place has plenty of promptlets too). It's only second to Opus in RP; the creativity of a 5x higher parameter count can't be replicated.
>>
>>101080254
No good.
>>
>>101080509
Did you use it? I can't trust benchmarks for RP quality
>>
>>101079755
can try minicpm2.5 or glm4 as well
https://github.com/THUDM/GLM-4/blob/main/README_en.md
>This generation of models has added multi-language support, supporting 26 languages including Japanese, Korean, and German.
https://github.com/OpenBMB/MiniCPM-V
>>
>>101080453
It's very smart and very good. It can create Mario clones in HTML5, etc. Insane what some people have cooked up with it already.
I made an idle clicker game zero-shot. It just works.
But it's very cucked, like I wrote earlier. They absolutely don't want people using this model for RP.
Don't know why though, that's what Claude was known for lol
>>
>>101080528
Yes, I even tried the isekai girl prompt posted a few posts above on their website, just added "roleplay request" before it.
I'm mobile posting though, so I can't even take a screenshot.
>>
>>101080525
nta, you should be cropping the image before trying to process it, you could do it automatically if the text box never moves. the text appears to all be one color as well so it might be easily extractable as text rather than trying to feed the model an image to translate
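With the manga-ocr repo linked earlier in the thread, that's only a few lines (if memory serves, MangaOcr takes a PIL image directly; the box coordinates below are made up and depend on where the game draws its textbox):

from PIL import Image
from manga_ocr import MangaOcr

mocr = MangaOcr()  # downloads its model on first run

def read_textbox(path: str, box=(32, 280, 608, 392)) -> str:
    # box = (left, top, right, bottom) of the fixed dialogue window
    return mocr(Image.open(path).crop(box))

print(read_textbox("screenshot.png"))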
>>
>>101080536
Are you paying for it? I'd like to give it a try for some coding projects
>>
>>101080453
No, it's just the people in the paid proxies and secret clubs shitting on it because they don't like when most people are able to use something good.
>>
File: bnbPdJ2P_NRJFKLz_1.webm (3.91 MB, 572x360)
>>101080551
Yeah, with poe.
I like to use different models for work if one can't get the answer.
>>
File: Untitled.png (215 KB, 1317x918)
ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation
https://arxiv.org/abs/2406.14088
>Reinforcement Learning from Human Feedback (RLHF) stands as a pivotal technique in empowering large language model (LLM) applications. Since RLHF involves diverse computational workloads and intricate dependencies among multiple LLMs, directly adopting parallelization techniques from supervised training can result in sub-optimal performance. To overcome this limitation, we propose a novel approach named parameter ReaLlocation, which dynamically redistributes LLM parameters in the cluster and adapts parallelization strategies during training. Building upon this idea, we introduce ReaLHF, a pioneering system capable of automatically discovering and running efficient execution plans for RLHF training given the desired algorithmic and hardware configurations. ReaLHF formulates the execution plan for RLHF as an augmented dataflow graph. Based on this formulation, ReaLHF employs a tailored search algorithm with a lightweight cost estimator to discover an efficient execution plan. Subsequently, the runtime engine deploys the selected plan by effectively parallelizing computations and redistributing parameters. We evaluate ReaLHF on the LLaMA-2 models with up to 4×70 billion parameters and 128 GPUs. The experiment results showcase ReaLHF's substantial speedups of 2.0−10.6× compared to baselines. Furthermore, the execution plans generated by ReaLHF exhibit an average of 26% performance improvement over heuristic approaches based on Megatron-LM.
https://github.com/openpsi-project/ReaLHF
big improvement. in case anyone wants to take advantage
>>
>>101080453
The consensus is something like
opus > sonnet 3.5 > sonnet > gptslop
for RP
>>
>>101080550
Cropping didn't help either.
Guess playing with grayscale and brightness is an option.
But then it's the same as traditional OCR extraction.
The closed source models all come really close with some mistakes. No cropping needed.
This is very bad.
>>
>>101080674
https://modelscope.cn/studios/ZhipuAI/glm-4v-9b-Demo/summary
think you need to make a modelscope account to see it. maybe a chinese vpn
>>
File: Untitled.png (80 KB, 568x531)
DeciMamba: Exploring the Length Extrapolation Potential of Mamba
https://arxiv.org/abs/2406.14528
>Long-range sequence processing poses a significant challenge for Transformers due to their quadratic complexity in input length. A promising alternative is Mamba, which demonstrates high performance and achieves Transformer-level capabilities while requiring substantially fewer computational resources. In this paper we explore the length-generalization capabilities of Mamba, which we find to be relatively limited. Through a series of visualizations and analyses we identify that the limitations arise from a restricted effective receptive field, dictated by the sequence length used during training. To address this constraint, we introduce DeciMamba, a context-extension method specifically designed for Mamba. This mechanism, built on top of a hidden filtering mechanism embedded within the S6 layer, enables the trained model to extrapolate well even without additional training. Empirical experiments over real-world long-range NLP tasks show that DeciMamba can extrapolate to context lengths that are 25x times longer than the ones seen during training, and does so without utilizing additional computational resources.
https://github.com/assafbk/DeciMamba
no code posted yet. neat. maybe this means the RAG retrieval model should be Mamba-based, feeding into a higher-quality transformer
>>
Can any ollama users who also have used ooba tell me if ollama is good or not?
>>
>>101080908
I haven't used either but I can unequivocally say that both are shit
>>
>>101080593
Okay I just tested it and you were right, it's smarter than GPT4 (building ml algos from papers). Too bad the usage limits for Poe are too low compared to oai for the price
>>
>>101080536
I've been using Sonnet 3.5 to coom. It doesn't seem very cucked.
>>
>>101080944
Took a couple of tries, but I could dump the Poe documentation on the 200k bot and it wrote me a Python script for chatting with Sonnet through the terminal and an API key.
GPT-4 always runs around in circles if it gets stuff wrong. There is an improvement with Sonnet 3.5 that the benchmarks don't show.
If we could have this level locally without the $$ costs, I'm sure you could automate a lot of shit.
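The script itself is nothing exotic; against Anthropic's API directly (rather than Poe's, whose client I won't guess at) the skeleton is roughly this, with ANTHROPIC_API_KEY set in the environment:

from anthropic import Anthropic

client = Anthropic()  # reads ANTHROPIC_API_KEY from the environment
history = []
while True:
    history.append({"role": "user", "content": input("> ")})
    resp = client.messages.create(
        model="claude-3-5-sonnet-20240620",
        max_tokens=1024,
        messages=history,
    )
    reply = resp.content[0].text
    history.append({"role": "assistant", "content": reply})
    print(reply)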
>>
>>101079738
By retards, for retards. Thanks for pointing that out, fixed that.
>>
Qwen 72b is about the same as Llama 3, Mixtral 8x22 and Command-r+. Who will release a better one first, at similar parameter count?
>>
Bitnet when
>>
>>101079755
Google Lens
>>
>>101081366
Now @gpt-4o and @sonnet-3.5
>>
>>101081348
>better one
Cohere
>at similar parameter count
dbrx was testing dbrx-next... It was still tuned on slop though. They likely haven't learned their lesson from the first one: don't give your model a shitty official tune and a too-restrictive license.
>>
>>101081339
You should submit the patch. It'll fuck up most formats.
>>
File: 1707143307717497.jpg (183 KB, 700x678)
Cloud for serious work
Local for RP
It's that easy.
>>
>>101081415
For me GPT was always much better at following instructions than Opus
>>
>>101069718
yeah, thank you for looking into it
>>
>>101081406
Here we go again... Get ready to vomit.
https://github.com/ggerganov/llama.cpp/pull/8052
>>
so what's the best base model to be training to generate myself so i can make photos of me getting pegged by lucy liu

using Pony Realism off of Civit at the moment and it's alrite, but what if i wanna go more degenerate.

using runpod to train then generating at home but that doesn't matter

i should've posted this in /sdg/ ignore me
>>
>>101081651
That's an easy way to get ignored.
>>
>>101081709
If only this general had the common sense to ignore him too
>>
>>101081651
Kek I'm sure he's enjoying it
>>
Did someone manage to generate images with chameleon by now?
>>
>>101081651
wtf
>>
Is Anthropic run by goyim? I refuse to work for Mossad operatives.
>>
>>101081944
You're in one, though
>>
>>101081805
How about you contribute something? No, nothing at all? No cock? Lost your cock and balls? Shut the fuck up then.
>>
>>101081984
>>101081984
>>101081984


