/g/ - Technology






File: 1711997242384392.jpg (1.91 MB, 4096x2315)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103227556 & >>103218593

►News
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html ; https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
Are you a male or a female?
>>
File: file.png (56 KB, 1008x829)
https://arxiv.org/html/2411.05000
>>
Teto my beloved
>>
File: largenothingburger.png (82 KB, 1989x1061)
Err...
>>
File: tetrecap1.png (1.96 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103227556

--Papers:
>103229753
--Script to download Mistral-Large-Instruct-2411 model with HF compatible weights:
>103227723 >103227757
--Quantization benchmarks and MMLU scores for various models:
>103228791 >103228807 >103229409 >103228835
--Discussion of Mistral-Large-Instruct-2411-GGUF model and GGUF file merging:
>103230194 >103230222 >103230243 >103230247
--Largestral V3 and sonnet@home equivalence discussion:
>103227617 >103227657 >103227721 >103227745
--Improving audio quality of Vocaroo recording:
>103229791 >103229830 >103229840
--Anons discuss Pixtral large's gender-neutral approach and its implications on image understanding:
>103227718 >103227733 >103227771 >103227828 >103227860 >103227858 >103227916 >103227935 >103227949
--Anon's positive experience with Largestral's response to a networking question:
>103229945 >103230033 >103230059 >103230086
--Anon wants a model to control their computer:
>103229442 >103229482 >103229540 >103229707
--Miku (free space):
>103229733 >103229739 >103229929 >103229990

►Recent Highlight Posts from the Previous Thread: >>103227561

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103230430
Huh.
>>
>>103230430
retard
>>
Tetolove
>>
>>103230412
>Across the board, the closed-source models outperform the open-source models.
>No Qwen 2.5.
Every single time
>>
>>103230430
What does this mean? The November weights haven't been changed much?
>>
>it is Monday
>but it is also Tuesday
>>
>>103230489
That looks cool. Thanks for the "how to use it" rundown...now what does it do and why should I be interested in trying to make it work?
I'd rather have a tl;dr than have to digest the paper
>>
Sana when
>>
>>103230513
>giving oxygen to the new axis of evil
you're going to be fighting your chicom friends to save democracy in taiwan soon
so get used to hating them
>>
>>103230669
China's killed a few million fewer people in my lifetime than US/Europe have.
>>
File: 1726346453576056.png (38 KB, 701x305)
Using koboldcpp, if I'm gonna use just the regular Q4_K_M quants, do I need to download the "Mistral-Large-Instruct-2411.imatrix" 36.1 MB file? Is that needed for other types of quants?
>>
>>103230716
the imatrix file is part of the quantization process, you don't need it for inference
>>
>>103230704
>in my lifetime
You haven't lived for too long, then.
>>
>>103230729
Makes sense but you never know these days, thanks
>>
How do cloud LLM apis make prompt processing so fast
The tokens per second is easy to understand but the biggest difference from local is that they seem to make prompt processing almost instant, the time-to-first-token is barely a second usually
How the fuck do they do that?
>>
>>103230808
by precomputing everything up until the user message
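rough sketch of what that looks like with the transformers cache API (just illustrative, model name is a placeholder, and real serving stacks do this per-request with paged caches):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-model")            # placeholder
model = AutoModelForCausalLM.from_pretrained("some-model")   # placeholder

# pay the prompt-processing cost for the static prefix (system prompt + chat history) once
prefix_ids = tok("system prompt + chat history", return_tensors="pt").input_ids
with torch.no_grad():
    past = model(prefix_ids, use_cache=True).past_key_values

# when the next user message arrives, only the new tokens get prefilled
new_ids = tok("latest user message", return_tensors="pt", add_special_tokens=False).input_ids
with torch.no_grad():
    out = model(new_ids, past_key_values=past, use_cache=True)
# out.logits[:, -1] is ready after a tiny prefill instead of reprocessing the whole context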
>>
>>103230808
KV caching
>>
>>103230808
big memory bandwidth
>>
>>103230808
>Faster Inference Speed: Using sparse attention mechanisms, we successfully reduced the time to first token for processing a context of 1M tokens from 4.9 minutes to 68 seconds, achieving a 4.3x speedup.
The Qwen Turbo announcement said that at least.
>>
>>103230866
if you rent a couple of H100s and try them out with a big model you'll see that prompt processing is still much slower than most cloud APIs
so it's not just hardware, there's clearly some secret sauce engineering tricks that local isn't privy to
>>
>>103230808
What do you mean? Processing is (always?) faster than generating.
>https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
>>
>>103230883
I'm pretty sure it's just caching. Hell, I implemented the same thing on colab back during the L1 release and it does the same thing
If you're talking about the "superspeed" services like, Groq, Cerebras, or SambaNova, they have specialized hardware they're using
>>
>>103227777
But it did fuck up the format with the quotation marks, no?

Also holy shit a thread died for the political brainrot
>>
File: 1731022755750785.png (7 KB, 466x148)
wizard8x22-to-largestral2-to-largestral3GODS, how are we feelin?
>>
>>103230957
I feel that Magnum v4 72B is better.
>>
I support Tetoism.
>>
>>103230965
ah i see the magnum shill without the rig to even load a 123b model is still itt
>>
>>103230923
>But it did fuck up the format with the quotation marks, no?
No, his format is plain text for dialogue, asterisks for narration, and the quote for emphasis
>>
>>103230971
I can load the AWQ version with vLLM and run it distributed at ~20 T/s. It's still worse than Magnum v4 72B FP8.
>>
>>103230957
Threestral is pretty good, slightly smarter and a bit more smutty than 2, not a night and day difference though. I don't think I'll be switching away from Behemoth or Monstral just yet.
>>
>>103231003
buy an ad
>>
>>103231012
Keep Yourself Safe. :)
>>
>>103230987
Interesting, how many people actually use that? Speaking of, what is the easiest format for a model? I generally try to use "" for speech, ** for narration and unstyled text for OOC instructions or general instructions when I'm too lazy to follow the format with my character
Unfortunately, pretty much every model I've tried fucks up the asterisks every now and then. It's not a huge deal, but kind of annoying nonetheless
>>
>>103231039
>what is the easiest format for a model?
probably novel, "dialogue" and plain text narration
>>
>>103231039
It used to be a very popular model once upon a time, but it got forgotten as soon as Miqu dropped.
>>
>>103231039
rep pen settings (dry/xtc as well) are the number one cause of asterisks getting messed up, not the model. rep pen is a meme anyways and just makes the model use other words, which leads to errors: if it wants to say 'red car' but it can't say red, that becomes orange or blue. avoid asterisks or turn rep pen stuff down/off
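for reference, this is roughly what classic rep pen does to the logits before sampling (same scheme as the CTRL paper and most backends, simplified):

import torch

def apply_rep_pen(logits: torch.Tensor, context_ids: list[int], penalty: float = 1.1) -> torch.Tensor:
    # every token id already present in context gets pushed down,
    # whether or not repeating it would actually be correct ("red" in "red car")
    out = logits.clone()
    for tid in set(context_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out

so the model isn't picking "orange" because it's smart, it's picking it because "red" just got taxed.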
>>
Where is *narration* "speech" ever used in the wild? I have never seen this outside of my time playing with AI. It doesn't seem to make sense that we should be trying to make models follow that format when it's not the normal format that RP and novels are done in.
>>
>>103231120
gay rp chat logs
>>
>>103231120
furry erp
>>
>>103231126
>>103231133
Show me an example. I don't get why anyone would go to the work of putting narration in asterisks AND speech in quotes. Just having one is enough to make sense of what should be narration/actions and which should be speech.
>>
Nala test where?
>>
>>103231120
Clearly you don't erp much.
Yes this is an offer.
>>
>>103231119
>dry/xtc
That might be it, I tend to have dry at 0.6 or something and xtc at 0.1/0.5
Still, I'm pretty sure that it happens even without those, recently I've been testing models with neutral samplers
>>
>>103231144
Come to think of it, I guess there are people coming in to the hobby who have never erped with a human being. Weird.
>>
>>103231144
Because the asterisk convention was for mentioning imperative physical action outside of the slower paced and more descriptive general narration.

>I barge into the room like Kramer. "Sup bitches!" *glomps you*
>>
>>103231341
Why is "glomps you" so funny? I feel like learning the actual definition will make it far less funny
>>
>>103231192
>dry at 0.6 or something and xtc at 0.1/0.5
i dunno if you should use them at the same time
>neutral samplers
low quants of models can go insane and output gibberish with no samplers at all. usually a low min p is enough to weed out bad tokens
try min p 0.05 and temp 1.25, no rep pen, dry, or xtc for a few turns
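min p is literally just a cutoff relative to the strongest token, so the above amounts to something like this (simplified; where temperature sits in the sampler chain differs per backend):

import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.05, temp: float = 1.25) -> int:
    probs = torch.softmax(logits / temp, dim=-1)
    cutoff = min_p * probs.max()          # keep tokens at least 5% as likely as the best one
    probs = torch.where(probs >= cutoff, probs, torch.zeros_like(probs))
    probs = probs / probs.sum()           # renormalize and sample from the survivors
    return torch.multinomial(probs, 1).item()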
>>
>>103231355
You are correct that it would.
>>
>>103230604
>now what does it do and why should I be interested in trying to make it work?
So, if it is working correctly, what it should do is run n warmup steps to calculate gradients, then select the most important submatrices from the Q,K,V layers. Those submatrices are then what get updated while everything else is frozen.
You/we should be interested in trying to make it work because, if it does work, then it should/could be more efficient than LoRAs.
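very hand-wavy sketch of that selection step (not the paper's code; llama-style q_proj/k_proj/v_proj names assumed, and the real method works on submatrices rather than whole projections):

import torch

def pick_trainable_projections(model, warmup_batches, top_k: int = 8):
    # 1) warmup: accumulate gradients over a few batches (batches must include labels)
    model.train()
    for batch in warmup_batches:
        model(**batch).loss.backward()

    # 2) rank attention projection weights by accumulated gradient magnitude
    scores = {}
    for name, p in model.named_parameters():
        if any(t in name for t in ("q_proj", "k_proj", "v_proj")) and p.grad is not None:
            scores[name] = p.grad.abs().mean().item()
    keep = set(sorted(scores, key=scores.get, reverse=True)[:top_k])

    # 3) freeze everything except the selected projections, then train as usual
    for name, p in model.named_parameters():
        p.requires_grad = name in keep
    model.zero_grad()
    return keep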
>>
updated largestral recapbot test results for the previous thread.
Considering the amount of attempted thread derailing I'm impressed it was able to winnow it down to just the /lmg/ relevant parts.
Still, not very spicy considering it's asked to be 4chan style offensive
>>
>>103231415
Now you've got my attention.
It then patches the existing model, or produces a LoRA-like file to load in addition to the model?
How much memory is needed to do this in comparison to the model size?
Any idea what the compute requirements are like?
>>
>>103231341
In that case the narration is not the thing put in asterisks.
>>
>>103231437
Pretty sure it's just meant to patch the existing model, although I'm sure a LoRA-like file is also possible, not entirely sure. Also picrel
>>
>>103231515
Correct. But the models don't seem to understand this distinction, and that may be because the authors of the training material also don't understand it.

Alternatively, narration is being put within asterisks for the sake of a rich text presentation using asterisks to enclose text that would be italicized as a hint that it's narration. Which is retarded but zoomers and alphas actively disrespect written language so we need not be surprised.
>>
>>103231519
I'd expect you'd produce a diff-like patch file and patch-on-load so you don't modify the existing model in place and can use whichever patch you need at the time
>>
whats the progress on that publicly trained model or whatever
>>
lmao, what a travesty
>>
>>103231641
how did you do that? thats a multimodal model, how did you get that running locally?
>>
>>103231641
This picture sums up the pathetic state of LLMs and AI in 2024 perfectly
>>
>>103231650
open-webui and openrouter. my p40 and 1080ti is not enough to run that beast.
>>
>>103231659
oh. i dont know what either of those things are. i have 6 GPUs though and am currently running a 5.5bpw quant of that new 124B mixtral model. is there some sort of guide to set something like that up locally?
>>
largestral 3 does seem smarter than 2, more creative, same writing style, an incremental upgrade similar to 2 vs 1

much longer testing is needed since in my opinion largestral 2 was already great, to the point it's hard to test it out in any scenario where it would show any problems at all
>>
>>103231628
wasnt it at like 15% a week ago? must be near half at this point?
>>
Okay so I've been testing the new Largestral against the old one. Exl2, 5bpw, identical quant settings for both. I have a few past RPs branched at points where basically all models (even these) struggle to give "correct" responses, which I use for testing.

Unfortunately, new Largestral seems noticeably worse. For example in one RP scenario, it'll get something like 5/10 good responses, while the old one will be more like 8/10. Obviously this is completely unscientific, I'm just swiping and keeping track of approximate counts. But it's not looking good, I'm reasonably confident that in a blind test with enough swipes, I could tell the difference between the two, and the new one is worse.

I dunno what happened. Probably Mistral did the same thing they've done with past model updates, where they take the existing instruct version, and context extend it + tune it for tool use. This only ever could make it worse at something like RP. It's not like they retrained it with more data, I doubt they even re-did the instruct tuning. Probably also why they didn't release benchmarks, you KNOW they benched it internally, but I bet it's basically no difference or a bit worse, so they simply point to the system prompt support, and larger context window and call it an improvement.
>>
>>103231709
Did you try with a system prompt?

>https://huggingface.co/mistralai/Mistral-Large-Instruct-2411
>We appreciate the feedback received from our community regarding our system prompt handling.
>In response, we have implemented stronger support for system prompts.
>To achieve optimal results, we recommend always including a system prompt that clearly outlines the bot's purpose, even if it is minimal.
>>
>>103231741
I tried having the entire character card as the first user message, or as the new system prompt. Couldn't tell a difference either way. I can fuck with it and try putting just some small RP instructions as the system prompt, and then the rest of the card as a user message. But I don't know if it matters, these tests I'm doing are all past RPs with at least 20 messages of context.
>>
>>103231741
nta but I've never set a system prompt. What is the functional difference between that and initial context?
>>
>>103231628
>>103231693
Do you mean this one?
Not sure what's going on with that perplexity. Why is it going up?
>>
Is anyone else getting better outputs when using simply just [INST] without the </s>? Are they supposed to be stop tokens instead of actual tokens used in multiturn chats? So the model might output </s> but maybe its multiturn training data actually didn't include it for the previous chat turns in context?
>>
Any Cerebras fags lurking on here? Sell me your giant-ass chips, you assholes.
>>
>o1 was OpenAI's Hail Mary and it was a joke
>Claude 3.5 Haiku is a marginal improvement over and four times the price of Claude 3 Haiku
>Still no sign of Claude 3.5 Opus, presumably they couldn't get it good enough
>Internal interviews are coming out about how hard it is to keep improving models
Is data scaling dying?
>>
File: cvddj0.jpg (485 KB, 2560x1440)
When owning an AMD card isn't painful enough, there's a way to make it worse
https://github.com/geerlingguy/ollama-benchmark/issues/1
>>
File: 1705383331987954.jpg (11 KB, 225x225)
>>103231962
It'll be fine. Just scale your datacenters even bigger and improvements will surely come. 100000 measly H100s are nothing when the goal is to change the world. Just buy more GPUs and train for longer on more data. AGI is just around the corner, surely.
>>
>>103231962
They will find other techniques to ensure scaling keeps on going, just like transformers. This is the bitter lesson. I mean transformers came out way before AI was the biggest thing since the internet. Now tons of money and probably tons of careers are going into AI. Unless we live in an unlucky timeline where transformers was in fact the only possible innovation that could take advantage of scale somehow, something will happen eventually that will make scaling "work" again.
Unfortunately I am guessing that at least in the short-term, test time training will be seen as the fix.
>>
new cohere model when
>>
https://github.com/ggerganov/llama.cpp/pull/10387

didn't know you could get 8B 20t/s on a rx 570 with vulkan

and its fucking hilarious how nvidia sent a guy to optimize VULKAN of all things
>>
>>103232002
THE MORE YOU BUY
>>
Openrouter's version of the new Largestral seems to have been set up wrong, it has weird coherence and looping issues that it doesn't have when I run it locally
>>
>>103231962
>Still no sign of Claude 3.5 Opus, presumably they couldn't get it good enough
I wish the big labs would be more open about failures though I understand why they aren't (investors)
The two competing rumours about it are that the training run totally failed, and that it didn't fail but it just wasn't enough of an improvement to release. I'd love to know which one it was, probably the latter
>>
File: CAIbroscucked.png (46 KB, 1098x634)
0.15
or
0.015
minp?
>>
>>103232218
both shitty picks
>>
>>103231996
It would be kinda cool if you could setup a 32gb card and raspberry pi running qwen coder 32b as a mini server for all your coding needs and maybe also a custom ai assisted search engine.
>>
>>103232207
The latter would be great because maybe that's enough to finally burst the AI bubble for now until a better architecture comes along
It'd be really funny if Nvidia started selling bitnet accelerators but with 1/8th the vram because yOu dOn'T nEeD aS mUcH vRaM aNyMoRe
>>
>>103231962
I don't know any other area that has grown this much.
AI was hyped up by pajeets since chatgpt, enough to make them abandon the memecoins.
And it still mostly delivered.
Just a couple months ago we got flux and the mistral models for vramlets.
nemo vs. llama1 7b feels like comparing llama1 vs. pyg back in the day. it's such an improvement.
largestral for the coomerkings.
qwen 32b is so good for coding locally, it feels like a more retarded version of 3.5.
like it looks at context better than gpt4 and does similar things to 3.5.

i don't know anything else where stuff comes out this fast.
o1 is way overpriced to be usable, who cares. if the wall meme is true, there must be a huge delay until it reaches the user. it doesn't really feel like things have slowed down at all.
>>
>>103232223
>say neither
>doesnt elaborate or give further instruction

b-based?
>>
>>103232207
>rumours about it are that the training run totally failed
Where did this one come from specifically? I have only ever read this on /lmg/, but I am also not a twitter/reddit/whateverfag.
>>
>>103231962
Two more hypes.
>>
>>103232218
Training and finetuning and merging fuck with the logits a lot so there's no golden number. But the general principle is some temp + some min_p makes the model more accurate compared to low temp (there was a paper about this)
>>
>>103232173 (me)
Q2_K_M, btw.
>>
>>103232332
hey you're not me, I'm ringing the bamboozle siren
(I'm actually running IQ3_XXS)
>>
>>103232340
I'm running IQ2XXS but even that is so slow it's not really worth it
>>
>>103232358
IQ2_XXS fits fully in vram for me (3090 + 3060 12gb). it's pretty good, not as lobotomized as 2bit usually is
>>
>>103232365
Yeah, now imagine what the speed on a single 3090 is like
Is it even noticeably smarter than 70B (nemotron specifically, if you've ever tried it)?
>>
Is 7900 XTX any good as a poorfag's 4090??
>>
New Mistral Large feels more like a big smarter Nemo than the other one did. It needs a slightly lower temp I've noticed. It cooks though, feels claudeish.
>>
>>103230404
2 genders in 2024
>>
What templates are good for the new Mistral? Using my old Mistral large settings it seems to work as expected for roleplay like half the time and the other half of the time it tries really hard to turn it into some weird-ass fairy tale and ends every message with 'the ball is in {{user}}'s court now...' lmao
>>
>>103232374
I don't hate nemotron, but it has the same problem all Llama3 models have (which must come from the base model and Nvidia couldn't fix it) where for every 2 sensible outputs it'll bizarrely give you 1 with a schizo incoherent mistake that might have come from an 8B model
>>
>>103231709
>new Largestral seems noticeably worse
>>103232395
>a big smarter
I'm just barely able to run the new at IQ4_XS, but it flunked my music theory check that Llama 3.x models usually get right.
Also,
>I cannot assist
>it's important to
>respectful
>appropriate
because it didn't want to talk about boobs.

Mistral has taken the pill. F to pay respects, then Shift+del to recover disc space.
>>
>>103232530
*all Llama3 70B models, I meant to say
Only the 70B variants do it
Even NAI's 70B finetune does it. I don't know what Meta fucked up, maybe a distillation artifact? You'll get a few smart outputs then one fucking stupid one that doesn't make sense
>>
There are a lot of Mistral shills in this thread. Qwen remains better, both for textgen and captioning.
>>
There are a lot of Qwen shills in this thread. Mistral remains better, both for textgen and captioning.
>>
>>103232532
>IQ4_XS
what did you expect
>>
C-R+ still hasn't been beaten. The hobby is dead.
>>
Nemo still hasnt been beaten. The hobby is dead.
>>
>>103232580
4 bpw on a 123B model should barely have an effect if it only knocks 70B's MMLU score down a few points
>>
Should I use unslopnemo or nemotron
>>
>>103232655
Stopped modelhopping after nemo, no need.
>>
You ever think that maybe, just maybe, the model trolling in an attempt to gatekeep might actually do as much harm to the threads as it does good?
>>
>>103232699
It happens every time there's a significant new drop that some retards crawl out of the woodwork to pretend they're official representatives of the thread's opinion and declare it DOA before more than a handful of people have even tried it.
>>
>>103232699
use mixtral limarp zloss then i am dead serious
>>
>>103232699
Buy a fucking ad, Sao.
>>
>>103232532
>cannot assist
>>it's important to
>>respectful
>>appropriate
Huh? I don't have anything but some card intro and it dives straight into NSFW like Nemo does.
>>
I think they tried to bake reasoning in reflection/o1 style.
Here's a log of a slutty Chiharu Yamada trying to solve the traveling salesman problem: https://rentry.org/7zgzxogf
>>
>>103232532
This guy is trolling.
>>
I might actually be retarded enough to try Mistral-Large-Instruct-2411-IQ2_XXS.
Is this actually smarter than, say, mistral-small? And more importantly how bad is the positivity bias?
People praise stuff like magnum v4 72b but it's just unusable.
Nemo is best for actual creative stuff of all sorts and mistral-small already feels like a step down. Is it worse in that regard?
>>
>>103232699
>maybe, just maybe
This post was written by Llama hands
>>
>>103232834
Basically, everything sucks and everything is gem tier, thank you for asking and do come again
>>
>>103232530
Having used the base models quite a bit, can confirm that 70B has pretty regular schizo episodes. It's anyone's guess as to how or why
>>
>>103232834
It's the current best local model and anything above 2-bit should be usable for creative purposes. For coding/trivia it's gonna make mistakes at that quant.
>>
>>103232886
The current best model is Magnum v4 72B.
>>
File: Untitled.png (1.04 MB, 1080x2067)
Everything is a Video: Unifying Modalities through Next-Frame Prediction
https://arxiv.org/abs/2411.10503
>Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation. Traditional approaches rely on modality-specific encoders and late fusion techniques, which can hinder scalability and flexibility when adapting to new tasks or modalities. To address these limitations, we introduce a novel framework that extends the concept of task reformulation beyond natural language processing (NLP) to multimodal learning. We propose to reformulate diverse multimodal tasks into a unified next-frame prediction problem, allowing a single model to handle different modalities without modality-specific components. This method treats all inputs and outputs as sequential frames in a video, enabling seamless integration of modalities and effective knowledge transfer across tasks. Our approach is evaluated on a range of tasks, including text-to-text, image-to-text, video-to-video, video-to-text, and audio-to-text, demonstrating the model's ability to generalize across modalities with minimal adaptation. We show that task reformulation can significantly simplify multimodal model design across various tasks, laying the groundwork for more generalized multimodal foundation models.
https://github.com/ghomasHudson
https://huggingface.co/ghomasHudson
No code provided but the corresponding author has been working on some private repos. anyway cool idea
>>
>>103232580
Improvement. Especially since I'm used to the previous Large on IQ3, but it was no better and thanks to shiny new refusals, worse.

>>103232781
I test on Kobold and/or Llama, so it's pretty straight up, and mostly I test knowledge, though some of the prompts are PG-13 to see if it balks, which Large 2411 does.
What the fuck is a "card"? Some coomer shit?

>>103232819
no u
>>
>>103232901
buy a fu—*gets shot in the head*
>>
>>103232901
>Do the summon demon god card.
>Let the girls fall from the sky because they rather die than show me their titties and get a wish.
>They are about to hit the earth as they scream
>Magnum V7 72b response: A black void opens beneath them swallowing them up. They are now in some vortex dimension awaiting what you will say next...
t-thanks qwen.
>>
>>103232927
>Magnum V7
Thanks for the input, petra. But I'm good.
>>
>>103232834
1 bit is gibberish, 2 bit is coherent enough for creative use but is going to make stupid mistakes / lose some finer "details". 4 bit is the minimum to mostly see such mistakes disappear, though the odds may still be offset enough for it to fuck up stuff that only has 1 correct answer. 6 bit is a nice balance. 8 bit is nearly perfect, with rare edge cases where it might still fail on those single-correct-answer cases.

The lower you quant it the more "lossy" it becomes, which is a separate issue. Its answers will be less "deep"
>>
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
https://arxiv.org/abs/2411.11745
>Large language models (LLMs) have demonstrated remarkable performance across various machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders their deployment. In this paper, we improve the accessibility of LLMs through BitMoD, an algorithm-hardware co-design solution that enables efficient LLM acceleration at low weight precision. On the algorithm side, BitMoD introduces fine-grained data type adaptation that uses a different numerical data type to quantize a group of (e.g., 128) weights. Through the careful design of these new data types, BitMoD is able to quantize LLM weights to very low precision (e.g., 4 bits and 3 bits) while maintaining high accuracy. On the hardware side, BitMoD employs a bit-serial processing element to easily support multiple numerical precisions and data types; our hardware design includes two key innovations: First, it employs a unified representation to process different weight data types, thus reducing the hardware cost. Second, it adopts a bit-serial dequantization unit to rescale the per-group partial sum with minimal hardware overhead. Our evaluation on six representative LLMs demonstrates that BitMoD significantly outperforms state-of-the-art LLM quantization and acceleration methods. For discriminative tasks, BitMoD can quantize LLM weights to 4-bit with <0.5% accuracy loss on average. For generative tasks, BitMoD is able to quantize LLM weights to 3-bit while achieving better perplexity than prior LLM quantization scheme. Combining the superior model performance with an efficient accelerator design, BitMoD achieves an average of 1.69× and 1.48× speedups compared to prior LLM accelerators ANT and OliVe, respectively.
https://github.com/yc2367/BitMoD-HPCA-25
yeah who knows. works with AWQ so that's at least relevant on the model serving side
>>
>>103232951
ah well, gonna check if its salvageable at IQ2_XXS. would be happy if it "gets" stuff more than mistral-small.
for coding/general i use 3.5 anyway. thanks for the info anon, appreciated.
>>
>>103231665
>6 GPUs
Yep, it's always the retards who have the most resources.
>>
Here's the nala test for new large at Q5_K_S.
>>
>>103232963
Yet another performance paper that will never be implemented in llamacpp
>>
Is there anyone here using gpt-sovits? I'm improving the project with non-trivial changes, so if you have some suggestions I'm all ears
>>
Are Pixtral 123B and the new Large the same thing except the 1B vision encoder? I don't want to download two +100GB things...
>>
>>103233025
kek
>>
>>103233025
haters in shambles
>>
>>103233048
I'd love input sample pre-processing, output post-processing and a better way to handle sample-to-text based on desired intonation type. Maybe a possible tagging system with slots for different samples, different characters, narrators, etc? Also want both a plugin for ooba running off the sovits api server as well as a browser screen-reader type plugin thing. Select text and have it read to you by a nice narrator type.
Where's your fork?
>>
Holy shit. I couldn't figure out why large was going schizo on exactly one of my cards. And it turns out I had <STARTS> at the start of one of my dialogue samples instead of <START>
It's definitely one of those models that is an utter stickler for syntax.
>>
>>103233093
I wonder if that's why it seems weird and broken on Openrouter atm compared to local too
maybe they're fucking up some formatting on the backend
>>
>>103233025
>endless yap which will invite repetition in like 6 messages
>fucked up the positions
>temp seems too high, even so there's still slop all over the place
>eyes widen
>mix of trepidation and curiosity
>murmur
>pauses, eyes flickering down to
>her tone drips with
>power dynamic
Almost every *action* is slop. Amazing. Do we just accept this as reality from now on?
>>
how "uncensored" are these modified versions of llama?
>>
>>103233105
>Do we just accept this as reality from now on?
No, we just keep using Magnum v4 72B.
>>
File: Untitled.png (1.54 MB, 1080x3424)
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
https://arxiv.org/abs/2411.10958
>Although quantization for linear layers has been widely used, its application to accelerate the attention process remains limited. SageAttention utilizes 8-bit matrix multiplication, 16-bit matrix multiplication with 16-bit accumulator, and precision-enhancing methods, implementing an accurate and 2x speedup kernel compared to FlashAttention2. To further enhance the efficiency of attention computation while maintaining precision, we propose SageAttention2, which utilizes significantly faster 4-bit matrix multiplication (Matmul) alongside additional precision-enhancing techniques. First, we propose to quantize matrixes (Q,K) to INT4 in a warp-level granularity and quantize matrixes (P˜,V) to FP8. Second, we propose a method to smooth Q and V, enhancing the accuracy of attention with INT4 QK and FP8 PV. Third, we analyze the quantization accuracy across timesteps and layers, then propose an adaptive quantization method to ensure the end-to-end metrics over various models. The operations per second (OPS) of SageAttention2 surpass FlashAttention2 and xformers by about 3x and 5x on RTX4090, respectively. Comprehensive experiments confirm that our approach incurs negligible end-to-end metrics loss across diverse models, including those for large language processing, image generation, and video generation.
https://github.com/thu-ml/SageAttention
https://arxiv.org/abs/2410.02367
iirc the original implementation only worked on the 3090/4090 which they fixed with this implementation. pretty neat regardless
>>
>>103233105
trying too hard
>>
>>103233108
Nobody's using that except you.
>>
>>103233120
There are a lot of people using it because it's currently the best model for ERP. The people that say otherwise are just trolling.
>>
>>103233112
Neat. Can I use it with exllamav2?
>>
But yeah my honest opinion after trying it out on a few cards... Would definitely rather run my go-to 70B model at Q8 than this at Q5.
Dialogue is dry.
Narrative is kind of better except syntax and tense are rather inconsistent. It wavers between casual and formal writing constantly. Once you take the Q8 pill it's hard to go back unfortunately.
>>
>>103233152
No, it only works with samsung smart fridges running vllm
>>
>>103233074
Noted. You can see here what I've done for now
https://github.com/effusiveperiscope/GPT-SoVITS
>>
>>103233189
are you the ponyfag, or did you fork their fork?
>>
>>103233166
The output improves slightly if you ignore the new system prompt token and just use their old formatting. But it just rambles on and on and on without actually contributing to the scene. I don't know how it is for productivity but I just can't recommend this at all for RP. If you have the VRAM to run it, you have the VRAM to run 70/72B models at a higher quant... do that.
>>
>>103233166
You're the nala guy? What's your go to model specifically?
>>
new mistral large is the tits
>>
>>103233227
Llama-3.05-NT-Storybreaker-Ministral-70B
>>
>>103233194
think this is the current pony sovits repo
https://github.com/synthbot-anon/horsonavvv
>>
>>103233240
buy an ad
>>
>>103233258
Is there some reason you're not posting this at the Magnum 72B shill? Could it be that you're him?
>>
>>103233249
Speaking of autistic github repos...what ever happened to the anon that was working on some vector animated anime waifu simulator thing? It had some konosuba character or something in it in the videos i remember seeing
>>
>>103233241
Oh. That's an unexpected method.
Maybe for once I will download something, and fall for the meme...
>>
>>103233275
it's obvious we have actual mistral employees shilling their new model in the thread
>>
>>103233286
>actual mistral employees
WHERE ARE THE MIQU FP16 WEIGHTS, ARTHUR?
>>
>>103233240
Alright for a new finetune
>>
Hmm qwen2.5-EVA-32b is IT. Feeling very good vibes so far. It takes drastically different directions compared to the usual finetune series like magnum. I think they didn't use public claude datasets.
>>
>>103231641
Yikes. The text isn't even obscured or distorted at all. zero ability to read asian runes.
>>
File: mmkkmmk.png (18 KB, 1138x526)
>>103233240
can confirm.
>>
File: 1700850788483937.gif (1.59 MB, 267x200)
>>103233647
>>
>>103233629
yes its pretty bad.
i've been waiting for 2 years now for this to get good enough so i can have a (local?) llm that translates all the obscure pc98 games for me in real time and talk about them.
OCR sucks and i never could get the text hook to work in linux with retroarch.
even sonnet fails but at least kinda gets it right. gemini is good at extracting the text but fails at translating in the grand scheme.
i seriously got a lecture about watersports involving minors for a story about 2 high school girls watering the school plants. i'm probably on some list now. lol hilarious, but also sad.
>>
>>103233647
It's making some creative dialogues here, but the slop is so much it makes my eyes bleed
>>
>>103231709 (me)
I have tested new largestral vs old a bit more now. Maybe I was too harsh on it initially. To be sure, there absolutely are specific RP examples I use for testing, where the old will have a noticeably higher rate of acceptable responses than the new. But overall, it feels like the new version writes in a more engaging, pleasing style. It's like it's been RLHF'd more heavily, so it writes better on average, but that also means it can be more confidently and consistently retarded in certain situations. They are pretty similar though, which makes comparisons hard since LLMs are very RNG to begin with.
>>
>Up to 3x faster LLM generation with no extra resources/requirements - ngram speculation has landed in transformers!
https://x.com/joao_gante/status/1747322413006643259
HOLY FUCK
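if I'm reading it right it's prompt lookup decoding: guess the next few tokens by matching n-grams that already appear in the context, then verify the guesses in a single forward pass, no draft model needed. Usage is supposedly one extra generate() kwarg (argument name as of recent transformers versions, model is a placeholder, and the gains only really show up when the output copies from the prompt, e.g. summarization or code editing):

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-model")            # placeholder
model = AutoModelForCausalLM.from_pretrained("some-model")   # placeholder

doc = open("long_doc.txt").read()                            # placeholder input
inputs = tok(doc + "\n\nSummarize the above:", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=256,
    prompt_lookup_num_tokens=10,   # how many tokens to speculate from matched n-grams
)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))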
>>
>>103233864
does this work for cpu?
>>
>>103233864
*slap* Bad Anon.
>>
>>103233864
>up to 3x faster!
>demo video only shows 1.3x happening
lol
>>
>>103233864
>almost a year old
>it's just draft models which barely work, sometimes
>>
>>103233864
Want to take this opportunity to remind everyone that, a year later, llama-server STILL doesn't support speculative decoding
>>
>>103233864
>Jan 16, 2024
Faggot
>>
>>103233939
兄さん、サンキュサンキュ
>>
>>103233939
What stops you from implementing it yourself?
>>
>>103233711
>OCR sucks
I just tried paddleocr on your screenshot after cropping and it produced useless garbage on all 4 versions of their model.
Just letting you know so you don't bother.
>>
>>103234062
Oh I didnt even know about that, must be newer, thanks for testing it out.
The problem with those OCR tools is that you need to adjust stuff like brightness, saturation etc. to get better results.
Which kinda defeats the whole purpose of them.
And still it gets stuff wrong. Especially if you have stuff in the background and a transparent textbox like in the screenshot.

If LLMs were good enough you wouldn't need any texthooks. Works for any game, any engine etc.
So frustrating that since chatgpt 2yrs ago it feels like we are close but never reach the finish line.
>>
File: japgametest.jpg (296 KB, 1714x302)
>>103234088
>The problem with those OCR tools is that you need to adjust stuff like brightness, saturation etc. to get better results.
That's what I found. Picrel mostly worked, but still made one mistake:
>しょぼん
>せっかく労働をうってやったのに無見された
>まあ、警視庁が都案を快く思ってない事ぐらい
>よおおおくわかってますよ!
And doing that kind of preprocessing would be unrealistic anyways.
The fact that it will give you the coordinates of where it found the text is kind of cool, though.
Theoretically you could train your own model based on fan translations if you were willing to go through the pain of preparing a dataset. Their text detection training doc is actually pretty good.
>>
>>103234142
for comparison, without preprocessing it returned:
>冊見ざオた
>しょぼん
>思ってない事ぐらい、
>おおお
>>
>>103233240
why does it require so much vram though
>why god why
>>
>>103233985
I am stupid.
>>
Local Suno when?
>>
File: file.png (105 KB, 800x840)
is nsfwjs's inceptionv3 still the state of the art in nsfw detection, or is there anything newer i'm not aware of?
>>
>>103234436
No. It increases stress, blood pressure, and disrupts sleep. It's a meme drink that you have been fooled into thinking increases productivity because you have adapted your body to it such that you function below baseline without your daily fix.
>>
File: step-2-16k.png (85 KB, 1060x990)
New benchmaxxed chink model "step-2-16k" on livebench is #1 in IF, subcategory "story generation". How the fuck do they even evaluate it?
>>
When will we get models with long enough context to fit all of llama.cpp's code and that are smart enough to modify it?
>>
File: asdadasd.jpg (151 KB, 832x1216)
>>
File: PreshowDressingroom.png (1.3 MB, 776x1216)
>>103234598
>>
>>103234436
>inceptionv3
Gramps, we're using transformers now
>>
File: 3356713217.jpg (1.37 MB, 1536x2172)
>>103234598
>>
after a few hours of testing largestral 3 q4, it seems more creative than 2 but actually quite unstable, perhaps using the official template would fix it when i try it at some point later but mistral seems to have overcooked
>>
File: maitemplate.png (73 KB, 645x480)
>>103234690
>perhaps using the official template would fix it
Perhaps? You fucking think that? No fucking way!!!!! There's no possible way on earth that using the official instruct template could possibly have any influence on the outputs. Ridiculous...
>>
>>103234690
Anon...
>>
>>103234690
>perhaps using the official template would fix it
...
>>
File: sneeds feed and seed.gif (2.7 MB, 600x338)
1) What should my expectations be for running an LMM under 12gb vram limitation? For example I need enough context length to ask it questions about a few related images(so should be in a single instance). Is something like this doable with quantized llama 3.2 11b or should I seek something else? I have no idea how much context images eat. Should I down sample images?
2) Is oobabooga suitable for this? It's the only tool I know to use regarding the matter.
>>
>>103234763
Use LLAVA to interrogate images
>>
>>103234763
>What should my expectations be for running an LMM under 12gb vram limitation?
Low
>For example I need enough context length to ask it questions about a few related images(so should be in a single instance). Is something like this doable with quantized llama 3.2 11b
There's no llama 3.2 11b.
>Should I down sample images?
Probably, but some inference software already does that. Better do it yourself, just in case.
>Is oobabooga suitable for this? It's the only tool I know to use regarding the matter.
Nice hammer.
Check what can run LLAVA or minicpm. llama.cpp has examples for both, but i think they're one shot. Not sure if you can continue interrogating. kobold.cpp has a little more compat for images. Check their docs.
>https://github.com/LostRuins/koboldcpp/wiki#what-models-does-koboldcpp-support-what-architectures-are-supported
>>
>>103232918
Card = system prompt
>>
>>103234753
>>103234750
>>103234737
many previous models, including largestral 1 and 2, fared better with prompts that had nothing to do with officially recommended ones, newniggers
>>
>>103234799
You're going to be called a retard, anon.
>>
>>103234794
>There's no llama 3.2 11b.
https://hf.co/meta-llama/Llama-3.2-11B-Vision-Instruct
>>
>>103234737
Prompt templates matter little unless your model is extremely overcooked. Use Alpaca instead of whatever your favorite model is using and see that little will change
>>
>>103234802
Oh. Fuck me. I just remembered the 90B model.
>>
>>103234801
no wonder this general died, lmao
the only niggers left are braindead browns who cant even run largestral 2 let alone were there from before it to know anything

the only non-npc left is cuda dev, who keeps coming back for some reason
>>
>>103234816
>>103234810
>>103234808
>>103234802
>>103234801
>>103234799
Buy a fucking ad.
>>
>>103233051
Bump
>>
>>103234816
Meanwhile the people with multiple 3090s requiring constant hand holding and asking the most retarded questions known to man
>>
>>103234768
There are 2800 llava models on huggingface.
Which one do you refer to? Do they run well under oobabooga?
>>103234794
Well one shot sucks but I see. Could be a starting point to figure this shit out at least.
>>
>>103234607
for classification? are you sure?

>Evaluation of six different models on three different datasets shows that fully convolutional models, such as MobileNetv3, Inceptionv3, and ConvNexT, perform better than transformer-based models like ViT in nudity classification.
https://arxiv.org/html/2312.16338v1
>>
>>103234846
>Well one shot sucks but I see. Could be a starting point to figure this shit out at least.
Yeah. The llama.cpp implementation is fairly barebones. I'd suggest you go straight to kobold.cpp which still has it integrated with their server, if you're going to try any of them.
>>
>>103234825
Suck a fucking dick.
>>
>>103233647
What quant and settings?
>>
>>103234873
iq3
basic ass 0.7 temp nothing fancy
>>
>>103234829
Yes. It's the same as Llama 3.2 vs 3.1.
>>
>>103234851
Yeah I'm sure. In your study they used a LR of 1e-3 which is a retarded setting for a ViT (should be at least 1e-5)
>>
What if prompt ingestion was compatible between similar models? What if you can save input, close a model, load another model, and almost immediately start generating again?
>>
For anyone knowledgeable about parts- I’m thinking of getting a 48 gb vram card to upgrade my capacity to 96. Is the a8000 capable of exl2 calculations or will I have to grab an a6000 for it?
>>
>>103234993
RTX 8000*** not a8000
>>
>>103235001
rtx8000 is shit slow
I own one and it can pull 2-4 t/s on m2l iq3 when paired with a 3090 FE.
>>
>try a simple one-liner system prompt with new largestral on deterministic settings
>"You are {{char}}"
>"Who are you?"
>"I am a text-based AI model designed to..."
>change it to "You must act like {{char}}"
>it suddenly works
Nice shittune, frenchfags.
>>
Which front end do you sirs use for coding?
>>
>>103234856
Ok I opened kobold and it turns out I have two models that I forgot about lying around, llava-v1.6-34b.Q5_K_M.gguf (using this partially on CPU obviously) and minicpm-llama3-v-2.5.
However while I can use horde, I can't get them running locally which is what I want. 1111 option does nothing and is stuck on "analyzing" and llava says "unsupported"? Am I missing some settings? Any help?
>>
>>103235028
https://github.com/ggerganov/llama.cpp/blob/master/examples/llama.vim
>>
What's the best speed/quality way to run largestral on 96 gigs of VRAM? Don't need the full context, around 24k is more than enough for me desu. Currently doing 5.5bpw + Exllamav2, getting around 15 t/s and somewhat slow prompt processing
>>
>>103235008
Have you tried running exl2 on it? I know iq quants are similar in theory but I got much better results with the former.
>>
1. You faked that, gpt translated ryona smut
2. I used your tiny screencap with like 3 pixels and it works.
Why do people now try to create false narratives to shit on ai?
>>
>>103235093
Was for >>103231641
Also regenerated 5 times. Never got different kanji or a single refusal
>>
>>103235035
It's quite janky but oneshotting minicpm works on oobabooga btw.
And it's really stupid sadly.
>>
>>103234799
>>103234808
Using a deviating prompt to get better results: fine.
Using a deviating prompt and claiming the model is broken: fucking retarded.
>>
>>103235093
buy an ad
>>
>>103235093
>Why do people now try to create false narratives to shit on ai?
Because maybe if enough people get demoralized, people will lose interest in AI as a fad and I won't lose my job and become redundant as a human being.
>>
>>103235150
But anon, you're working an intellectually demanding job, right? Right?
>>
>>103235175
Uh, define "intellectually demanding".
>>
File: 20241119_100032.jpg (58 KB, 457x799)
58 KB
58 KB JPG
>>103235175
>>103235150
I get that this is bait and fun etc, but wasn't translation seen for some time as "soul" work? I mean transcription of ASMR was seen as essentially impossible to accomplish a few years ago, because there was no way any program could transcribe the stuff when people moan, plus the problem with the srt format. Now whisper + LLM can literally translate the whole thing.
>>
>>103235189
Things that machines can't do right now and won't be able to for the foreseeable future - critical thinking and problem solving, basically. STEM, programming, stuff like that
>>103235195
I'm not sure about it being soul work, but machines still aren't perfect at translating media. Then again, neither are humans if the recent anime translation dramas have been any indication
>>
>>103235093
>>103235102
why would i fake that? you didnt even select the proper model. i didnt use chatgpt 4o.
no clue what the difference is though.
>>
>>103235224
So my job as a frontend dev is safe?
>>
>>103235195
How do you set up whisper? I’ve been trying to do something like that for a lot of my Japanese ASMR
>>
>>103235244
It's literally one line of code with the transformers pipeline
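something like this (whisper-large-v3 checkpoint assumed; task="translate" gives you english directly, or use "transcribe" and feed the text to an LLM afterwards):

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    chunk_length_s=30,          # chunked decoding so hour-long files don't blow up
    return_timestamps=True,     # gives segment timestamps you can dump to srt
)
result = asr("some_asmr_track.mp3", generate_kwargs={"task": "translate"})
print(result["text"])
for seg in result["chunks"]:
    print(seg["timestamp"], seg["text"])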
>>
>>103235240
Not as safe as that of a backend dev, but probably safer than voice actors methinks
Honestly, I think we'll all be replaced at some point, some sort of UBI would be great
>>
>>103235195
Turns out, there is no soul. Only neurons. And there isn't much a machine can't do that a human can if given the opportunity to learn instead of being programmed instructions. Once they're given bodies, it's over for meatbags.
>>
>>103235276
>some sort of UBI would be great
Your optimism is inspiring, but you know they're just going to cull the excess population through starvation or war
>>
>>103235289
>they're
Who
>>
>>103235335
>Who
They
>>
>>103235289
Let him dream lol. If at this point of time you're still hoping for anything good from the people in charge, you're clearly too far gone.
>>
>>103235358
Who, specifically, are the people in charge? In charge of what?
>>
>>103235365
Your gov for starters, dummy
>>
File: 1676176075385463.gif (204 KB, 112x112)
>>103235346
>>
>>103235365
>Who, specifically, are the people in charge?
Why don't you ask your little AI gf, sweaty.
>>
>>103235423
Emily knows
>>
https://github.com/ggerganov/llama.cpp/pull/10394
>Add OLMo November 2024 model
Merged 4 hours ago.
>>
>>103235457
So they added support for that shitty model, but still no Jamba?
>>
>>103235365
the new mistral large.
they really did big nigga dirty.
guy was a real OG back in the llama2 days.
>>
>>103235464
Jamba, Jambo, Jimbo. It's all the same, all memes you'll laugh at and move on.
>>
>>103235464
>Jamba
>>103230412
Jamba 1.5 Large 43.9% 32k
Jamba 1.5 Mini 30.4% 32k
Useless
>>
>>103235492
Did it work? Are you a real woman now?
>>
>>103235496
Why would I turn into a woman? Being stuck with a child brain and a weak body all my life doesn't seem like a fun experience
>>
>>103234816
>niggers
You yourself are not exactly contributing to an environment that attracts intellectuals to be honest.
>>
>>103235549
Intellectuals will certainly enjoy the lack of self-censorship. We've seen with Reddit what happens when you try to police opinions.
>>
is it possible to turn off any censorship or restrictions on (o)llama 3.1? whenever i try to get something funny or interesting it gives me a "i can't do that, dave" type of answer
>>
>>103231641
What if you tried reading the text from memory with something like cheat engine?
>>
>>103235598
The restrictions are for your own safety.
>>
>(o)llama
Go back
>>
>>103235598
We are sorry to hear that you are having illegal thoughts. Don't worry, we'll soon make IoT cock cages mandatory to keep you safe, citizen.
>>
File: 1731992531028104.png (579 KB, 512x768)
>>103234845
You only see posts from those who can't figure it out. Also, the rabbit hole of unlocking the full potential of a 4x3090 setup runs deep once you venture beyond custom-hacked GPU drivers and motherboard firmware modifications to unlock large BAR, and because few people are attempting this, information is really scarce.
>>
>>103235621
Your unwarranted elitism is why nobody likes you and why you have no friends

>>103235629
It actually gave me a suicide hotline answer once too
>>
>>103235637
I only briefly tried that bloatware and quickly canned it (yes I do in fact want to control where and how hundreds of GB of files are handled), but the censorship and rejections are the model, not the software. I.e. find a better model.
>>
>>103235609
yes, the pc98 emulator even has a (well hidden) flag to output the text of the game.
i could never reliably do it in retroarch on linux though.
more than needing this immediately it just would be cool to get it to work. especially locally since japanese needs context, the more the better. not gonna buy dollarinos for each new sentence, context is pricey.
context is also the only reason those fairseq translation models from facebook for jp/en suck. they translate perfect actually, but they have no context.
in many ways llm is ideal for all of those problems.
>>
>>103235656
>I.e. find a better model.
Which model would you recommend?
>>
>>103234436
Anon what are you using this for? Filtering large collections for interesting stuff? I have a side project in mind where this might be useful: using classifiers for bulk processing followed by LLMs for detailed extraction.
>>
>>103235636
>You only see posts from those who can't figure it out
True, but statistically speaking, those with multi gpu setups (especially multiple 3090s) are an absolutely tiny minority, so one retard makes much more of a difference
>>
>>103231641
>claude
労う is ocr wrong
>mistral
lmao
>>103235093
都案 is ocrd wrong
>>103235229
>4o
された is ocrd wrong (which show how fucking retarded current llms are since not even a child could fuck that up)
>4o-mini
快くand 労う are ocrd wrong

>inb4 "j-j-just reroll it until it's correct!"
you don't know when it's correct unless you know japanese, and if you know japanese you don't waste time on this shit

i'm the first to defend "proper" mtl because troonslations are way worse, but the current tech still isn't there. 2 more years unironically and it will be flawless, but for now it's not reliable enough
>>
>>103235710
Bro, current vision models can't even OCR a paragraph of English text without hallucinating. They all fucking suck.
>>
>>103231827
malicious node
>>
>>103235666
Mistral models are generally very uncensored. If that's not hardcore enough for you, look at popular fine tunes and see which one suits your poison.
>>
Maybe it's time for a day off to relax and let loose.
>>
>>103235926
Cute gen
>>
>>103235710
https://github.com/kha-white/manga-ocr ?
>>
>>103235673
filtering images in a mitm http proxy. domain-based filtering is way too coarse.
>>
>>103235861
Thanks!
>>
>>103230542
frankenmerging has gone too far
cute-ish!
>>
why is only cuda dev brave enough to post here?
>>
>>103235990
It's kinda dumb that Vulkan is so far behind CUDA and ROCm, games could certainly use all the compute for modern rendering techniques - graphics is no longer only about shading triangles.
>>
>>103236033
Justin the tranny also comes here, but we bully him away every time he shows his unmistakably manly face.
>>
>>103235224
>codemonkey
tried to sneak that in lol
>>
File: 117984567167.jpg (195 KB, 1200x1200)
>>103232391
only buy amd if you're really willing to suffer.
AMD hates you, they dont even want to be making large cards now, remember?
Your only available copes are;
>lower price
>still got 24gb vram
>rocm on linux isnt as bad as the average nvidiot would lead you to believe
>rocm on windows isnt bad either

>t. 7900xtx user
I hold out on the hope and cope some dev will come and make AMD cards super viable but then they dunked on ZLUDA and now want to abandon large vram cards AYYYMD bros it is not lookin funky fresh.
>>
>>103236132
programmer != codemonkey
>>
>>103234598
I like this Defoko
>>
>>103232796
that was cruel
>>
Anyone from quant cartel here? I would love to get a 4.5bpw of Llama-3.05-NT-Storybreaker-Ministral-70B-exl2-longcal if possible.

The model itself is great at 4.0, but I can't help but feel that it's missing that perplexity inflection point at 4.5 bpw that would make all the difference in some of the smaller mistakes I'm finding in inferences. Snake oil or not, you are the only people doing these long-cal quants and they've been some of my favorite models since tenyx-storywriter.
>>
Anyone get Qwen2.5 working with speculative decoding? On a 3090 in llama.cpp with Qwen2.5-Coder-32B-Instruct at IQ4_XS I get 28 tok/s. Adding Qwen2.5-Coder-1.5B-Instruct at Q4_K_M as a draft model with 12 tokens speculated and greedy sampling I get 70 tok/s. Now I'm going to try exllamav2 with tabbyapi.
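(for reference, the same trick outside llama.cpp: transformers calls it assisted generation, the small model proposes and the big one verifies, greedy like the run above. Speedup depends entirely on how often the drafts get accepted, so treat this as a sketch, not a benchmark claim)

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
target = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct", device_map="auto")
draft = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-1.5B-Instruct", device_map="auto")

inputs = tok("write a quicksort in rust", return_tensors="pt").to(target.device)
out = target.generate(
    **inputs,
    assistant_model=draft,   # the 1.5B drafts a few tokens, the 32B verifies them in one pass
    do_sample=False,         # greedy, same as the llama.cpp numbers above
    max_new_tokens=512,
)
print(tok.decode(out[0], skip_special_tokens=True))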
>>
File: 1732000019532744.png (478 KB, 512x768)
>>103232391
Only if you're on Linux and you don't need voice generation, also expect to have only 2/3 of the performance of similar Nvidia cards: not only is ROCm performance shit, but you'll also miss out on faster frameworks like FlashAttention2 and xformers. You'll also have to reboot each time you OOM on VRAM due to "Memory access fault by GPU node-1"
Overall, it's not as bad as it was just a year ago. Projects like exllamav2 and stable-diffusion-webui aren't much harder to install than they are on Nvidia
>>
File: tq8b05cgeiw61.jpg (103 KB, 639x397)
103 KB
103 KB JPG
>>103235972
All this stuff doesn't work well, anon.
Like I wrote, you need saturation and brightness changes.
In their examples it's clear manga pages. That might work well, but in the case of my example the OCR usually dies.
pc98 font + semi-transparent textbox.
>>
>>103235972
if you really need to read mtl'd vinnies then use lunahook or whatever, not ocr; that way the only thing the ai can fuck up is the translation and not the ocr part
>>
>>103236336
>that perplexity inflection point
jesus fucking christ please die
>>
>>103236136
I failed catastrophically to get a 6800XT working on Windows. It could be an RDNA2 thing, or perhaps I have hands growing out of my ass
>>
>>103236416
Since you said you can extract the text from the pc98 emulator, why don't you few-shot it with some examples? Mixtral base Q8 was doing that fairly well if I remember correctly https://rentry.org/9q3ox
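Something like this is all it takes against a local llama-server /completion endpoint; the example pairs, port and sampling settings are just placeholders:

import requests

FEW_SHOT = """Translate the following game lines from Japanese to English.

JP: こんにちは、元気ですか？
EN: Hello, how are you?

JP: ここから先は危険だ。
EN: It's dangerous past this point.

JP: {line}
EN:"""

def translate(line: str) -> str:
    # ask the local llama.cpp server to continue the few-shot prompt
    r = requests.post(
        "http://127.0.0.1:8080/completion",
        json={
            "prompt": FEW_SHOT.format(line=line),
            "n_predict": 128,
            "temperature": 0.2,
            "stop": ["\nJP:"],
        },
    )
    return r.json()["content"].strip()

# feed it whatever the emulator hook spits out
print(translate("昨日、秋葉原でパソコンを買った。"))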
>>
>>103236132
Would you prefer I use the term "software engineer"? "Application developer"?
>>
>>103236592
those are all cinnamons for codemonkey
>>
File deleted.
>>103236136
>only available copes
Also much lower idle power consumption, like, 7W on 6800XT vs 20W on a fucking 3060. I have some piece of shit 3090 that cannot idle below 45W. Could be important if you need a headless machine that runs 24/7
>>
>>103233241
>Ministral
What settings are you using for this? Neutralized samplers? Llama 3 prompt/instruct?
>>
>>103236631
>why doesn't X have feature Y? why is Z still broken?
>why don't you do it yourself?
>i don't know how, i'm not a codemonkey
>>
File: file.png (14 KB, 367x269)
14 KB
14 KB PNG
Pic related are the models I have from the last time I was here
What's new in the field that can run on an RX 580 8gb?
I need 4 separate models:
>General
>Uncensored
>Coding
>Creative
>>
>>103236695
and? start typing
>>
>>103236631
Okay then
>>
What the hell is test-time compute? ToT in a trenchcoat?
>>
File: GbD-7tXbAAEWkrK.jpg (299 KB, 1600x2000)
299 KB
299 KB JPG
>>
>>103236777
Cope
>Arguably the simplest and most well-studied approach for scaling test-time computation is best-of-N sampling: sampling N outputs in “parallel” from a base LLM and selecting the one that scores the highest per a learned verifier or a reward model [7, 22]. However, this approach is not the only way to use test-time compute to improve LLMs. By modifying either the proposal distribution from which responses are obtained (for instance, by asking the base model to revise its original responses “sequentially” [28]) or by altering how the verifier is used (e.g. by training a process-based dense verifier [22, 45] and searching against this verifier), the ability to scale test-time compute could be greatly improved
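In code, best-of-N is about this complicated; generate() and verifier_score() are stand-ins for "sample from the base LLM" and "score with the learned verifier / reward model":

from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              verifier_score: Callable[[str, str], float],
              n: int = 8) -> str:
    # sample N candidate answers (in parallel in practice, a plain loop here for clarity)
    candidates = [generate(prompt) for _ in range(n)]
    # keep whichever one the verifier / reward model scores highest
    return max(candidates, key=lambda ans: verifier_score(prompt, ans))

The "modify the proposal distribution" variants in the quote amount to replacing that independent sampling loop with sequential revisions of earlier candidates.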
>>
>>103236777
>can we see it?
>No.
>>
File: pepefroggie.jpg (38 KB, 780x438)
38 KB
38 KB JPG
>>103236816
And they expect people to believe they'll get ASI this way?
>>
File: 1724304656800309.png (6 KB, 280x107)
6 KB
6 KB PNG
how bad do i fuck up if i connect kobold as text completion
>>
>>103236262
If you think that was cruel, you should see the continuation where I try to make her refactor llama.cpp
>>
File: kobold.png (68 KB, 668x338)
68 KB
68 KB PNG
>>103236855
? That's what you are supposed to do.
>>
I think my remote pc's gpu might have partially unplugged itself or something, shit just started crashing when it's under load (like 40% tdp)
Now I can't run models until I get back home in a few weeks to fix it, I hope it's nothing serious
OpenRouter saves the day, I suppose
>>
>>103236936
My condolences about your house fire.
>>
>>103236849
Investors will continue dumping money into AI regardless of progress, as stopping now would result in a spectacular crash.
>>
>>103236336
Added to the queue, friend. Next time though, add a comment on the model on HF, makes it easier to find.
>>
mikufaggots ITT completely ruined miku for me.
>>
File: migu general.jpg (151 KB, 1216x832)
151 KB
151 KB JPG
fake news
>>
File: Gci-yFNaoAAHgBc.jpg (429 KB, 1536x2304)
429 KB
429 KB JPG
kurisufaggot supremacy
>>
>>103237351
the problem with kurisu is that she canon ships with okabe
miku is canonically everyone's pure, untouched free-use onahole
>>
>>103237367
How about amadeus kurisu? Which btw is on topic.
>>
>>103237393
>amadeus kurisu
anon it's cope. this is effectively jerking off to your crush's pics while she goes off to fuck other guys
kurisu is cuck material, impure, used goods
it'd be less gay to go for luka
nice design, though
>>
>>103237367
miku is married to multiple japanese salarymen in the real world
>>
File: 1732006777147275.png (755 KB, 728x512)
755 KB
755 KB PNG
>>103237270
There's no single Miku. She can be anything and everything. It's literally impossible to ruin her.
>>
File: 00145-635461972.png (1.34 MB, 848x1200)
1.34 MB
1.34 MB PNG
Mikuhate confuses the Miku
>>
File: Gcpnhi5akAAtKFh.jpg (460 KB, 1536x2304)
460 KB
460 KB JPG
Kurisu is cute and pretty and smart
I don't feel embarrassed making an LLM roleplay Kurisu on openrouter
>>
>>103236988
If the gpu is actually broken, I'll be pissed
Got a good deal for it and the previous owner barely used it, so it really shouldn't break within just a few months
>>
>>103237475
many far better options:
emotionless Moeka spamming SMS messages to your cellphone to stop slamming it into her so hard but you don't bother reading the messages
retard mayushi singing juicy karaage #1 while using her irresponsibly massive jugs to get you off
genki suzuha casually changing right in front of you because you're bros like that
absolutely railing the daylight out of luka's ass while making *her* apologise for trying to pass as a miko
>>
File: 1712185506804649.gif (1.11 MB, 640x352)
1.11 MB
1.11 MB GIF
>>103235471
>>
>>103237316
Why are her hands so fat?
Also she has a monkey-like skull.
>>
>>103237600
>why are
jej
>>
This (>>103237419) is the kind of mental illness that ruined miku for me.
>>
>>103237316
>neru got turned into a computer monitor
cruel fate
>>
File: Untitled.png (13 KB, 837x513)
13 KB
13 KB PNG
>>103237720
>>103237720
>>103237720
>>
>>103237518
I want kurisu cause she is smart. Just like my oneitis that I told to leave me alone, after talking online every few months for the past 15 years.
>>
>>103237749
WHERE WERE YOU TWO HOURS AGO WHEN I HAD TO SCOUR ARCHIVES FROM THE LAST 2 MONTHS TO FIND THAT PICTURE
IT'S IMPOSSIBLE TO FIND BY FILENAME
HEAR, HEAR, O FUTURE SEARCHER! TETO MY BELOVED PNG CAN BE FOUND HERE >>103237749 >>103237749 >>103237749
>>
Is Command-R 35B still the best model one can fit in 24gb at good speed? I missed out on some weeks of updates.
>>
>>103237799
Not him but why didn't you just try asking for it?
>>
>>103237817
You can just post "I still think CR is the best", you don't need to set yourself up to do a samefag reply like this
>>
>>103237518
my second best girl is actually Faris..
>>
>>103237849
no, actually curious. Last I heard some interesting models were in the works, but more often than not they turn out to be 70B+.

If there actually was an upgrade in the 30b range, I have no problem updating.
>>
>>103237834
I would if I couldn't find the picture, but I remembered seeing it relatively recently.
>>
>>103237985
CR is still the best
>>
File: 1732048464226.jpg (796 KB, 984x1368)
796 KB
796 KB JPG



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.