/g/ - Technology






File: 1711997242384392.jpg (1.91 MB, 4096x2315)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103227556 & >>103218593

►News
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html ; https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
Are you a male or a female?
>>
File: file.png (56 KB, 1008x829)
https://arxiv.org/html/2411.05000
>>
Teto my beloved
>>
File: largenothingburger.png (82 KB, 1989x1061)
Err...
>>
File: tetrecap1.png (1.96 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103227556

--Papers:
>103229753
--Script to download Mistral-Large-Instruct-2411 model with HF compatible weights:
>103227723 >103227757
--Quantization benchmarks and MMLU scores for various models:
>103228791 >103228807 >103229409 >103228835
--Discussion of Mistral-Large-Instruct-2411-GGUF model and GGUF file merging:
>103230194 >103230222 >103230243 >103230247
--Largestral V3 and sonnet@home equivalence discussion:
>103227617 >103227657 >103227721 >103227745
--Improving audio quality of Vocaroo recording:
>103229791 >103229830 >103229840
--Anons discuss Pixtral large's gender-neutral approach and its implications on image understanding:
>103227718 >103227733 >103227771 >103227828 >103227860 >103227858 >103227916 >103227935 >103227949
--Anon's positive experience with Largestral's response to a networking question:
>103229945 >103230033 >103230059 >103230086
--Anon wants a model to control their computer:
>103229442 >103229482 >103229540 >103229707
--Miku (free space):
>103229733 >103229739 >103229929 >103229990

►Recent Highlight Posts from the Previous Thread: >>103227561

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103230430
Huh.
>>
>>103230430
retard
>>
Tetolove
>>
>>103230412
>Across the board, the closed-source models outperform the open-source models.
>No Qwen 2.5.
Every single time
>>
>>103230430
What does this mean? The November weights haven't been changed much?
>>
>it is Monday
>but it is also Tuesday
>>
>>103230489
That looks cool. Thanks for the "how to use it" rundown...now what does it do and why should I be interested in trying to make it work?
I'd rather have a tl;dr than have to digest the paper
>>
Sana when
>>
>>103230513
>giving oxygen to the new axis of evil
you're going to be fighting your chicom friends to save democracy in taiwan soon
so get used to hating them
>>
>>103230669
China's killed a few million fewer people in my lifetime than US/Europe have.
>>
File: 1726346453576056.png (38 KB, 701x305)
Using koboldcpp, if I'm gonna use just the regular Q4_K_M quants, do I need to download the "Mistral-Large-Instruct-2411.imatrix" 36.1 MB file? Is that needed for other types of quants?
>>
>>103230716
the imatrix file is part of the quantization process, you don't need it for inference
>>
>>103230704
>in my lifetime
You haven't lived for too long, then.
>>
>>103230729
Makes sense but you never know these days, thanks
>>
How do cloud LLM apis make prompt processing so fast
The tokens per second is easy to understand but the biggest difference from local is that they seem to make prompt processing almost instant, the time-to-first-token is barely a second usually
How the fuck do they do that?
>>
>>103230808
by precomputing everything up until the user message
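rough sketch of what that looks like with the transformers cache API (just illustrative, model name is a placeholder, and real serving stacks do this per-request with paged caches):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-model")            # placeholder
model = AutoModelForCausalLM.from_pretrained("some-model")   # placeholder

# pay the prompt-processing cost for the static prefix (system prompt + chat history) once
prefix_ids = tok("system prompt + chat history", return_tensors="pt").input_ids
with torch.no_grad():
    past = model(prefix_ids, use_cache=True).past_key_values

# when the next user message arrives, only the new tokens get prefilled
new_ids = tok("latest user message", return_tensors="pt", add_special_tokens=False).input_ids
with torch.no_grad():
    out = model(new_ids, past_key_values=past, use_cache=True)
# out.logits[:, -1] is ready after a tiny prefill instead of reprocessing the whole context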
>>
>>103230808
KV caching
>>
>>103230808
big memory bandwidth
>>
>>103230808
>Faster Inference Speed: Using sparse attention mechanisms, we successfully reduced the time to first token for processing a context of 1M tokens from 4.9 minutes to 68 seconds, achieving a 4.3x speedup.
The Qwen Turbo announcement said that at least.
>>
>>103230866
if you rent a couple of H100s and try them out with a big model you'll see that prompt processing is still much slower than most cloud APIs
so it's not just hardware, there's clearly some secret sauce engineering tricks that local isn't privy to
>>
>>103230808
What do you mean? Processing is (always?) faster than generating.
>https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
>>
>>103230883
I'm pretty sure it's just caching. Hell, I implemented the same thing on colab back during the L1 release and it does the same thing
If you're talking about the "superspeed" services like, Groq, Cerebras, or SambaNova, they have specialized hardware they're using
>>
>>103227777
But it did fuck up the format with the quotation marks, no?

Also holy shit a thread died for the political brainrot
>>
File: 1731022755750785.png (7 KB, 466x148)
wizard8x22-to-largestral2-to-largestral3GODS, how are we feelin?
>>
>>103230957
I feel that Magnum v4 72B is better.
>>
I support Tetoism.
>>
>>103230965
ah i see the magnum shill without the rig to even load a 123b model is still itt
>>
>>103230923
>But it did fuck up the format with the quotation marks, no?
No, his format is plain text for dialogue, asterisks for narration, and the quote for emphasis
>>
>>103230971
I can load the AWQ version with vLLM and run it distributed at ~20 T/s. It's still worse than Magnum v4 72B FP8.
>>
>>103230957
Threestral is pretty good, slightly smarter and a bit more smutty than 2, not a night and day difference though. I don't think I'll be switching away from Behemoth or Monstral just yet.
>>
>>103231003
buy an ad
>>
>>103231012
Keep Yourself Safe. :)
>>
>>103230987
Interesting, how many people actually use that? Speaking of, what is the easiest format for a model? I generally try to use "" for speech, ** for narration and unstyled text for OOC instructions or general instructions when I'm too lazy to follow the format with my character
Unfortunately, pretty much every model I've tried fucks up the asterisks every now and then. It's not a huge deal, but kind of annoying nonetheless
>>
>>103231039
>what is the easiest format for a model?
probably novel, "dialogue" and plain text narration
>>
>>103231039
It used to be a very popular model once upon a time, but it got forgotten as soon as Miqu dropped.
>>
>>103231039
rep pen settings (dry/xtc as well) are the number one cause of asterisks getting messed up, not the model. rep pen is a meme anyways and just makes the model use other words, which leads to errors: if it wants to say 'red car' but it can't say red, that becomes orange or blue. avoid asterisks or turn rep pen stuff down/off
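for reference, this is roughly what classic rep pen does to the logits before sampling (same scheme as the CTRL paper and most backends, simplified):

import torch

def apply_rep_pen(logits: torch.Tensor, context_ids: list[int], penalty: float = 1.1) -> torch.Tensor:
    # every token id already present in context gets pushed down,
    # whether or not repeating it would actually be correct ("red" in "red car")
    out = logits.clone()
    for tid in set(context_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out

so the model isn't picking "orange" because it's smart, it's picking it because "red" just got taxed.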
>>
Where is *narration* "speech" ever used in the wild? I have never seen this outside of my time playing with AI. It doesn't seem to make sense that we should be trying to make models follow that format when it's not the normal format that RP and novels are done in.
>>
>>103231120
gay rp chat logs
>>
>>103231120
furry erp
>>
>>103231126
>>103231133
Show me an example. I don't get why anyone would go to the work of putting narration in asterisks AND speech in quotes. Just having one is enough to make sense of what should be narration/actions and which should be speech.
>>
Nala test where?
>>
>>103231120
Clearly you don't erp much.
Yes this is an offer.
>>
>>103231119
>dry/xtc
That might be it, I tend to have dry at 0.6 or something and xtc at 0.1/0.5
Still, I'm pretty sure that it happens even without those, recently I've been testing models with neutral samplers
>>
>>103231144
Come to think of it, I guess there are people coming in to the hobby who have never erped with a human being. Weird.
>>
>>103231144
Because the asterisk convention was for mentioning imperative physical action outside of the slower paced and more descriptive general narration.

>I barge into the room like Kramer. "Sup bitches!" *glomps you*
>>
>>103231341
Why is "glomps you" so funny? I feel like learning the actual definition will make it far less funny
>>
>>103231192
>dry at 0.6 or something and xtc at 0.1/0.5
i dunno if you should use them at the same time
>neutral samplers
low quants of models can go insane and output gibberish with no samplers at all. usually a low min p is enough to weed out bad tokens
try min p 0.05 and temp 1.25, no rep pen, dry, or xtc for a few turns
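min p is literally just a cutoff relative to the strongest token, so the above amounts to something like this (simplified; where temperature sits in the sampler chain differs per backend):

import torch

def min_p_sample(logits: torch.Tensor, min_p: float = 0.05, temp: float = 1.25) -> int:
    probs = torch.softmax(logits / temp, dim=-1)
    cutoff = min_p * probs.max()          # keep tokens at least 5% as likely as the best one
    probs = torch.where(probs >= cutoff, probs, torch.zeros_like(probs))
    probs = probs / probs.sum()           # renormalize and sample from the survivors
    return torch.multinomial(probs, 1).item()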
>>
>>103231355
You are correct that it would.
>>
>>103230604
>now what does it do and why should I be interested in trying to make it work?
So, if it is working correctly, what it should do is run n warmup steps to calculate gradients, then select the most important submatrices from the Q,K,V layers. Those submatrices are then what get updated while everything else is frozen.
You/we should be interested in trying to make it work because, if it does work, then it should/could be more efficient than LoRAs.
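very hand-wavy sketch of that selection step (not the paper's code; llama-style q_proj/k_proj/v_proj names assumed, and the real method works on submatrices rather than whole projections):

import torch

def pick_trainable_projections(model, warmup_batches, top_k: int = 8):
    # 1) warmup: accumulate gradients over a few batches (batches must include labels)
    model.train()
    for batch in warmup_batches:
        model(**batch).loss.backward()

    # 2) rank attention projection weights by accumulated gradient magnitude
    scores = {}
    for name, p in model.named_parameters():
        if any(t in name for t in ("q_proj", "k_proj", "v_proj")) and p.grad is not None:
            scores[name] = p.grad.abs().mean().item()
    keep = set(sorted(scores, key=scores.get, reverse=True)[:top_k])

    # 3) freeze everything except the selected projections, then train as usual
    for name, p in model.named_parameters():
        p.requires_grad = name in keep
    model.zero_grad()
    return keep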
>>
updated largestral recapbot test results for the previous thread.
Considering the amount of attempted thread derailing I'm impressed it was able to winnow it down to just the /lmg/ relevant parts.
Still, not very spicy considering it's asked to be 4chan style offensive
>>
>>103231415
Now you've got my attention.
It then patches the existing model, or produces a LoRA-like file to load in addition to the model?
How much memory is needed to do this in comparison to the model size?
Any idea what the compute requirements are like?
>>
>>103231341
In that case the narration is not the thing put in asterisks.
>>
>>103231437
Pretty sure it's just meant to patch the existing model, although I'm sure a LoRA-like file is also possible, not entirely sure. Also picrel
>>
>>103231515
Correct. But the models don't seem to understand this distinction, and that may be because the authors of the training material also don't understand it.

Alternatively, narration is being put within asterisks for the sake of a rich text presentation using asterisks to enclose text that would be italicized as a hint that it's narration. Which is retarded but zoomers and alphas actively disrespect written language so we need not be surprised.
>>
>>103231519
I'd expect you'd produce a diff-like patch file and patch-on-load so you don't modify the existing model in place and can use whichever patch you need at the time
>>
whats the progress on that publicly trained model or whatever
>>
lmao, what a travesty
>>
>>103231641
how did you do that? thats a multimodal model, how did you get that running locally?
>>
>>103231641
This picture sums up the pathetic state of LLMs and AI in 2024 perfectly
>>
>>103231650
open-webui and openrouter. my p40 and 1080ti is not enough to run that beast.
>>
>>103231659
oh. i dont know what either of those things are. i have 6 GPUs though and am currently running a 5.5bpw quant of that new 124B mixtral model. is there some sort of guide to set something like that up locally?
>>
largestral 3 does seem smarter than 2, more creative, same writing style, an incremental upgrade similar to 2 vs 1

much longer testing is needed since in my opinion largestral 2 was already great, to the point it's hard to test it out in any scenario where it would show any problems at all
>>
>>103231628
wasnt it at like 15% a week ago? must be near half at this point?
>>
Okay so I've been testing the new Largestral against the old one. Exl2, 5bpw, identical quant settings for both. I have a few past RPs branched at points where basically all models (even these) struggle to give "correct" responses, which I use for testing.

Unfortunately, new Largestral seems noticeably worse. For example in one RP scenario, it'll get something like 5/10 good responses, while the old one will be more like 8/10. Obviously this is completely unscientific, I'm just swiping and keeping track of approximate counts. But it's not looking good, I'm reasonably confident that in a blind test with enough swipes, I could tell the difference between the two, and the new one is worse.

I dunno what happened. Probably Mistral did the same thing they've done with past model updates, where they take the existing instruct version, and context extend it + tune it for tool use. This only ever could make it worse at something like RP. It's not like they retrained it with more data, I doubt they even re-did the instruct tuning. Probably also why they didn't release benchmarks, you KNOW they benched it internally, but I bet it's basically no difference or a bit worse, so they simply point to the system prompt support, and larger context window and call it an improvement.
>>
>>103231709
Did you try with a system prompt?

>https://huggingface.co/mistralai/Mistral-Large-Instruct-2411
>We appreciate the feedback received from our community regarding our system prompt handling.
>In response, we have implemented stronger support for system prompts.
>To achieve optimal results, we recommend always including a system prompt that clearly outlines the bot's purpose, even if it is minimal.
>>
>>103231741
I tried having the entire character card as the first user message, or as the new system prompt. Couldn't tell a difference either way. I can fuck with it and try putting just some small RP instructions as the system prompt, and then the rest of the card as a user message. But I don't know if it matters, these tests I'm doing are all past RPs with at least 20 messages of context.
>>
>>103231741
nta but I've never set a system prompt. What is the functional difference between that and initial context?
>>
>>103231628
>>103231693
Do you mean this one?
Not sure what's going on with that perplexity. Why is it going up?
>>
Is anyone else getting better outputs when using simply just [INST] without the </s>? Are they supposed to be stop tokens instead of actual tokens used in multiturn chats? So the model might output </s> but maybe its multiturn training data actually didn't include it for the previous chat turns in context?
>>
Any Cerebras fags lurking on here? Sell me your giant-ass chips, you assholes.
>>
>o1 was OpenAI's Hail Mary and it was a joke
>Claude 3.5 Haiku is a marginal improvement over and four times the price of Claude 3 Haiku
>Still no sign of Claude 3.5 Opus, presumably they couldn't get it good enough
>Internal interviews are coming out about how hard it is to keep improving models
Is data scaling dying?
>>
File: cvddj0.jpg (485 KB, 2560x1440)
When owning an AMD card isn't painful enough, there's a way to make it worse
https://github.com/geerlingguy/ollama-benchmark/issues/1
>>
File: 1705383331987954.jpg (11 KB, 225x225)
>>103231962
It'll be fine. Just scale your datacenters even bigger and improvements will surely come. 100000 measly H100s are nothing when the goal is to change the world. Just buy more GPUs and train for longer on more data. AGI is just around the corner, surely.
>>
>>103231962
They will find other techniques to ensure scaling keeps on going, just like transformers. This is the bitter lesson. I mean transformers came out way before AI was the biggest thing since the internet. Now tons of money and probably tons of careers are going into AI. Unless we live in an unlucky timeline where transformers was in fact the only possible innovation that could take advantage of scale somehow, something will happen eventually that will make scaling "work" again.
Unfortunately I am guessing that at least in the short-term, test time training will be seen as the fix.
>>
new cohere model when
>>
https://github.com/ggerganov/llama.cpp/pull/10387

didn't know you could get 8B 20t/s on a rx 570 with vulkan

and its fucking hilarious how nvidia sent a guy to optimize VULKAN of all things
>>
>>103232002
THE MORE YOU BUY
>>
Openrouter's version of the new Largestral seems to have been set up wrong, it has weird coherence and looping issues that it doesn't have when I run it locally
>>
>>103231962
>Still no sign of Claude 3.5 Opus, presumably they couldn't get it good enough
I wish the big labs would be more open about failures though I understand why they aren't (investors)
The two competing rumours about it are that the training run totally failed, and that it didn't fail but it just wasn't enough of an improvement to release. I'd love to know which one it was, probably the latter
>>
File: CAIbroscucked.png (46 KB, 1098x634)
0.15
or
0.015
minp?
>>
>>103232218
both shitty picks
>>
>>103231996
It would be kinda cool if you could setup a 32gb card and raspberry pi running qwen coder 32b as a mini server for all your coding needs and maybe also a custom ai assisted search engine.
>>
>>103232207
The latter would be great because maybe that's enough to finally burst the AI bubble for now until a better architecture comes along
It'd be really funny if Nvidia started selling bitnet accelerators but with 1/8th the vram because yOu dOn'T nEeD aS mUcH vRaM aNyMoRe
>>
>>103231962
I don't know any other area that has grown this much.
AI was hyped up by pajeets since chatgpt, enough to make them abandon the memecoins.
And it still mostly delivered.
Just a couple months ago we got flux and the mistral models for vramlets.
nemo vs. llama1 7b feels like comparing llama1 vs. pyg back in the day. it's such an improvement.
largestral for the coomerkings.
qwen 32b is so good for coding locally, it feels like a more retarded version of 3.5.
like it looks at context better than gpt4 and does similar things to 3.5.

i don't know anything else where stuff comes out this fast.
o1 is way overpriced to be usable, who cares. if the wall meme is true, there must be a huge delay until it reaches the user. it doesn't really feel like things have slowed down at all.
>>
>>103232223
>say neither
>doesnt elaborate or give further instruction

b-based?
>>
>>103232207
>rumours about it are that the training run totally failed
Where did this one come from specifically? I have only ever read this on /lmg/, but I am also not a twitter/reddit/whateverfag.
>>
>>103231962
Two more hypes.
>>
>>103232218
Training and finetuning and merging fuck with the logits a lot so there's no golden number. But the general principle is some temp + some min_p makes the model more accurate compared to low temp (there was a paper about this)
>>
>>103232173 (me)
Q2_K_M, btw.
>>
>>103232332
hey you're not me, I'm ringing the bamboozle siren
(I'm actually running IQ3_XXS)
>>
>>103232340
I'm running IQ2XXS but even that is so slow it's not really worth it
>>
>>103232358
IQ2_XXS fits fully in vram for me (3090 + 3060 12gb). it's pretty good, not as lobotomized as 2bit usually is
>>
>>103232365
Yeah, now imagine what the speed on a single 3090 is like
Is it even noticeably smarter than 70B (nemotron specifically, if you've ever tried it)?
>>
Is 7900 XTX any good as a poorfag's 4090??
>>
New Mistral Large feels more like a big smarter Nemo than the other one did. It needs a slightly lower temp I've noticed. It cooks though, feels claudeish.
>>
>>103230404
2 genders in 2024
>>
What templates are good for the new Mistral? Using my old Mistral large settings it seems to work as expected for roleplay like half the time and the other half of the time it tries really hard to turn it into some weird-ass fairy tale and ends every message with 'the ball is in {{user}}'s court now...' lmao
>>
>>103232374
I don't hate nemotron, but it has the same problem all Llama3 models have (which must come from the base model and Nvidia couldn't fix it) where for every 2 sensible outputs it'll bizarrely give you 1 with a schizo incoherent mistake that might have come from an 8B model
>>
>>103231709
>new Largestral seems noticeably worse
>>103232395
>a big smarter
I'm just barely able to run the new at IQ4_XS, but it flunked my music theory check that Llama 3.x models usually get right.
Also,
>I cannot assist
>it's important to
>respectful
>appropriate
because it didn't want to talk about boobs.

Mistral has taken the pill. F to pay respects, then Shift+del to recover disc space.
>>
>>103232530
*all Llama3 70B models, I meant to say
Only the 70B variants do it
Even NAI's 70B finetune does it. I don't know what Meta fucked up, maybe a distillation artifact? You'll get a few smart outputs then one fucking stupid one that doesn't make sense
>>
There are a lot of Mistral shills in this thread. Qwen remains better, both for textgen and captioning.
>>
There are a lot of Qwen shills in this thread. Mistral remains better, both for textgen and captioning.
>>
>>103232532
>IQ4_XS
what did you expect
>>
C-R+ still hasn't been beaten. The hobby is dead.
>>
Nemo still hasnt been beaten. The hobby is dead.
>>
>>103232580
4 bpw on a 123B model should barely have an effect if it only knocks 70B's MMLU score down a few points
>>
Should I use unslopnemo or nemotron
>>
>>103232655
Stopped modelhopping after nemo, no need.
>>
You ever think that maybe, just maybe, the model trolling in an attempt to gatekeep might actually do as much harm to the threads as it does good?
>>
>>103232699
It happens every time there's a significant new drop that some retards crawl out of the woodwork to pretend they're official representatives of the thread's opinion and declare it DOA before more than a handful of people have even tried it.
>>
>>103232699
use mixtral limarp zloss then i am dead serious
>>
>>103232699
Buy a fucking ad, Sao.
>>
>>103232532
>cannot assist
>>it's important to
>>respectful
>>appropriate
Huh? I don't have anything but some card intro and it dives straight into NSFW like Nemo does.
>>
I think they tried to bake reasoning in reflection/o1 style.
Here's a log of a slutty Chiharu Yamada trying to solve the traveling salesman problem: https://rentry.org/7zgzxogf
>>
>>103232532
This guy is trolling.
>>
I might actually be retarded enough to try Mistral-Large-Instruct-2411-IQ2_XXS.
Is this actually smarter than, say, mistral-small? And more importantly how bad is the positivity bias?
People praise stuff like magnum v4 72b but it's just unusable.
Nemo is best for actual creative stuff of all sorts and mistral-small already feels like a step down. Is it worse in that regard?
>>
>>103232699
>maybe, just maybe
This post was written by Llama hands
>>
>>103232834
Basically, everything sucks and everything is gem tier, thank you for asking and do come again
>>
>>103232530
Having used the base models quite a bit, can confirm that 70B has pretty regular schizo episodes. It's anyone's guess as to how or why
>>
>>103232834
It's the current best local model and anything above 2-bit should be usable for creative purposes. For coding/trivia it's gonna make mistakes at that quant.
>>
>>103232886
The current best model is Magnum v4 72B.
>>
File: Untitled.png (1.04 MB, 1080x2067)
Everything is a Video: Unifying Modalities through Next-Frame Prediction
https://arxiv.org/abs/2411.10503
>Multimodal learning, which involves integrating information from various modalities such as text, images, audio, and video, is pivotal for numerous complex tasks like visual question answering, cross-modal retrieval, and caption generation. Traditional approaches rely on modality-specific encoders and late fusion techniques, which can hinder scalability and flexibility when adapting to new tasks or modalities. To address these limitations, we introduce a novel framework that extends the concept of task reformulation beyond natural language processing (NLP) to multimodal learning. We propose to reformulate diverse multimodal tasks into a unified next-frame prediction problem, allowing a single model to handle different modalities without modality-specific components. This method treats all inputs and outputs as sequential frames in a video, enabling seamless integration of modalities and effective knowledge transfer across tasks. Our approach is evaluated on a range of tasks, including text-to-text, image-to-text, video-to-video, video-to-text, and audio-to-text, demonstrating the model's ability to generalize across modalities with minimal adaptation. We show that task reformulation can significantly simplify multimodal model design across various tasks, laying the groundwork for more generalized multimodal foundation models.
https://github.com/ghomasHudson
https://huggingface.co/ghomasHudson
No code provided but the corresponding author has been working on some private repos. anyway cool idea
>>
>>103232580
Improvement. Especially since I'm used to the previous Large on IQ3, but it was no better and thanks to shiny new refusals, worse.

>>103232781
I test on Kobold and/or Llama, so it's pretty straight up, and mostly I test knowledge, though some of the prompts are PG-13 to see if it balks, which Large 2411 does.
What the fuck is a "card"? Some coomer shit?

>>103232819
no u
>>
>>103232901
buy a fu—*gets shot in the head*
>>
>>103232901
>Do the summon demon god card.
>Let the girls fall from the sky because they rather die than show me their titties and get a wish.
>They are about to hit the earth as they scream
>Magnum V7 72b response: A black void opens beneath them swallowing them up. They are now in some vortex dimension awaiting what you will say next...
t-thanks qwen.
>>
>>103232927
>Magnum V7
Thanks for the input, petra. But I'm good.
>>
>>103232834
1 bit is gibberish, 2 bit is coherent enough for creative use but is going to make stupid mistakes / lose some finer "details". 4 bit is the minimum to mostly see such mistakes disappear, though the odds may still be offset enough for it to fuck up stuff that only has 1 correct answer. 6 bit is a nice balance. 8 bit is nearly perfect, with rare edge cases where it might still fail on those single-correct-answer cases.

The lower you quant it the more "lossy" it becomes, which is a separate issue. Its answers will be less "deep"
>>
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration
https://arxiv.org/abs/2411.11745
>Large language models (LLMs) have demonstrated remarkable performance across various machine learning tasks. Yet the substantial memory footprint of LLMs significantly hinders their deployment. In this paper, we improve the accessibility of LLMs through BitMoD, an algorithm-hardware co-design solution that enables efficient LLM acceleration at low weight precision. On the algorithm side, BitMoD introduces fine-grained data type adaptation that uses a different numerical data type to quantize a group of (e.g., 128) weights. Through the careful design of these new data types, BitMoD is able to quantize LLM weights to very low precision (e.g., 4 bits and 3 bits) while maintaining high accuracy. On the hardware side, BitMoD employs a bit-serial processing element to easily support multiple numerical precisions and data types; our hardware design includes two key innovations: First, it employs a unified representation to process different weight data types, thus reducing the hardware cost. Second, it adopts a bit-serial dequantization unit to rescale the per-group partial sum with minimal hardware overhead. Our evaluation on six representative LLMs demonstrates that BitMoD significantly outperforms state-of-the-art LLM quantization and acceleration methods. For discriminative tasks, BitMoD can quantize LLM weights to 4-bit with <0.5% accuracy loss on average. For generative tasks, BitMoD is able to quantize LLM weights to 3-bit while achieving better perplexity than prior LLM quantization scheme. Combining the superior model performance with an efficient accelerator design, BitMoD achieves an average of 1.69× and 1.48× speedups compared to prior LLM accelerators ANT and OliVe, respectively.
https://github.com/yc2367/BitMoD-HPCA-25
yeah who knows. works with AWQ so that's at least relevant on the model serving side
>>
>>103232951
ah well, gonna check if its salvageable at IQ2_XXS. would be happy if it "gets" stuff more than mistral-small.
for coding/general i use 3.5 anyway. thanks for the info anon, appreciated.
>>
>>103231665
>6 GPUs
Yep, it's always the retards who have the most resources.
>>
Here's the nala test for new large at Q5_K_S.
>>
>>103232963
Yet another performance paper that will never be implemented in llamacpp
>>
Is there anyone here using gpt-sovits? I'm improving the project with non-trivial changes, so if you have some suggestions I'm all ears
>>
Are Pixtral 123B and the new Large the same thing except the 1B vision encoder? I don't want to download two +100GB things...
>>
>>103233025
kek
>>
>>103233025
haters in shambles
>>
>>103233048
I'd love input sample pre-processing, output post-processing and a better way to handle sample-to-text based on desired intonation type. Maybe a possible tagging system with slots for different samples, different characters, narrators, etc? Also want both a plugin for ooba running off the sovits api server as well as a browser screen-reader type plugin thing. Select text and have it read to you by a nice narrator type.
Where's your fork?
>>
Holy shit. I couldn't figure out why large was going schizo on exactly one of my cards. And it turns out I had <STARTS> at the start of one of my dialogue samples instead of <START>
It's definitely one of those models that is an utter stickler for syntax.
>>
>>103233093
I wonder if that's why it seems weird and broken on Openrouter atm compared to local too
maybe they're fucking up some formatting on the backend
>>
>>103233025
>endless yap which will invite repetition in like 6 messages
>fucked up the positions
>temp seems too high, even so there's still slop all over the place
>eyes widen
>mix of trepidation and curiosity
>murmur
>pauses, eyes flickering down to
>her tone drips with
>power dynamic
Almost every *action* is slop. Amazing. Do we just accept this as reality from now on?
>>
how "uncensored" are these modified versions of llama?
>>
>>103233105
>Do we just accept this as reality from now on?
No, we just keep using Magnum v4 72B.
>>
File: Untitled.png (1.54 MB, 1080x3424)
SageAttention2 Technical Report: Accurate 4 Bit Attention for Plug-and-play Inference Acceleration
https://arxiv.org/abs/2411.10958
>Although quantization for linear layers has been widely used, its application to accelerate the attention process remains limited. SageAttention utilizes 8-bit matrix multiplication, 16-bit matrix multiplication with 16-bit accumulator, and precision-enhancing methods, implementing an accurate and 2x speedup kernel compared to FlashAttention2. To further enhance the efficiency of attention computation while maintaining precision, we propose SageAttention2, which utilizes significantly faster 4-bit matrix multiplication (Matmul) alongside additional precision-enhancing techniques. First, we propose to quantize matrixes (Q,K) to INT4 in a warp-level granularity and quantize matrixes (P˜,V) to FP8. Second, we propose a method to smooth Q and V, enhancing the accuracy of attention with INT4 QK and FP8 PV. Third, we analyze the quantization accuracy across timesteps and layers, then propose an adaptive quantization method to ensure the end-to-end metrics over various models. The operations per second (OPS) of SageAttention2 surpass FlashAttention2 and xformers by about 3x and 5x on RTX4090, respectively. Comprehensive experiments confirm that our approach incurs negligible end-to-end metrics loss across diverse models, including those for large language processing, image generation, and video generation.
https://github.com/thu-ml/SageAttention
https://arxiv.org/abs/2410.02367
iirc the original implementation only worked on the 3090/4090 which they fixed with this implementation. pretty neat regardless
>>
>>103233105
trying too hard
>>
>>103233108
Nobody's using that except you.
>>
>>103233120
There are a lot of people using it because it's currently the best model for ERP. The people that say otherwise are just trolling.
>>
>>103233112
Neat. Can I use it with exllamav2?
>>
But yeah my honest opinion after trying it out on a few cards... Would definitely rather run my go-to 70B model at Q8 than this at Q5.
Dialogue is dry.
Narrative is kind of better except syntax and tense are rather inconsistent. It wavers between casual and formal writing constantly. Once you take the Q8 pill it's hard to go back unfortunately.
>>
>>103233152
No, it only works with samsung smart fridges running vllm
>>
>>103233074
Noted. You can see here what I've done for now
https://github.com/effusiveperiscope/GPT-SoVITS
>>
>>103233189
are you the ponyfag, or did you fork their fork?
>>
>>103233166
The output improves slightly if you ignore the new system prompt token and just use their old formatting. But it just rambles on and on and on without actually contributing to the scene. I don't know how it is for productivity but I just can't recommend this at all for RP. If you have the VRAM to run it, you have the VRAM to run 70/72B models at a higher quant... do that.
>>
>>103233166
You're the nala guy? What's your go to model specifically?
>>
new mistral large is the tits
>>
>>103233227
Llama-3.05-NT-Storybreaker-Ministral-70B
>>
>>103233194
think this is the current pony sovits repo
https://github.com/synthbot-anon/horsonavvv
>>
>>103233240
buy an ad
>>
>>103233258
Is there some reason you're not posting this at the Magnum 72B shill? Could it be that you're him?
>>
>>103233249
Speaking of autistic github repos...what ever happened to the anon that was working on some vector animated anime waifu simulator thing? It had some konosuba character or something in it in the videos i remember seeing
>>
>>103233241
Oh. That's an unexpected method.
Maybe for once I will download something, and fall for the meme...
>>
>>103233275
it's obvious we have actual mistral employees shilling their new model in the thread
>>
>>103233286
>actual mistral employees
WHERE ARE THE MIQU FP16 WEIGHTS, ARTHUR?
>>
>>103233240
Alright for a new finetune
>>
Hmm qwen2.5-EVA-32b is IT. Feeling very good vibes so far. It takes drastically different directions compared to the usual finetune series like magnum. I think they didn't use public claude datasets.
>>
>>103231641
Yikes. The text isn't even obscured or distorted at all. zero ability to read asian runes.
>>
File: mmkkmmk.png (18 KB, 1138x526)
>>103233240
can confirm.
>>
File: 1700850788483937.gif (1.59 MB, 267x200)
>>103233647
>>
>>103233629
yes its pretty bad.
i've been waiting for 2 years now for this to get good enough so i can have a (local?) llm that translates all the obscure pc98 games for me in real time and talk about them.
OCR sucks and i never could get the text hook to work in linux with retroarch.
even sonnet fails but at least kinda gets it right. gemini is good at extracting the text but fails at translating in the grand scheme.
i seriously got a lecture about watersports involving minors for a story about 2 high school girls watering the school plants. i'm probably on some list now. lol hilarious, but also sad.
>>
>>103233647
It's making some creative dialogues here, but the slop is so much it makes my eyes bleed
>>
>>103231709 (me)
I have tested new largestral vs old a bit more now. Maybe I was too harsh on it initially. To be sure, there absolutely are specific RP examples I use for testing, where the old will have a noticeably higher rate of acceptable responses than the new. But overall, it feels like the new version writes in a more engaging, pleasing style. It's like it's been RLHF'd more heavily, so it writes better on average, but that also means it can be more confidently and consistently retarded in certain situations. They are pretty similar though, which makes comparisons hard since LLMs are very RNG to begin with.
>>
>Up to 3x faster LLM generation with no extra resources/requirements - ngram speculation has landed in transformers!
https://x.com/joao_gante/status/1747322413006643259
HOLY FUCK
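if I'm reading it right it's prompt lookup decoding: guess the next few tokens by matching n-grams that already appear in the context, then verify the guesses in a single forward pass, no draft model needed. Usage is supposedly one extra generate() kwarg (argument name as of recent transformers versions, model is a placeholder, and the gains only really show up when the output copies from the prompt, e.g. summarization or code editing):

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("some-model")            # placeholder
model = AutoModelForCausalLM.from_pretrained("some-model")   # placeholder

doc = open("long_doc.txt").read()                            # placeholder input
inputs = tok(doc + "\n\nSummarize the above:", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=256,
    prompt_lookup_num_tokens=10,   # how many tokens to speculate from matched n-grams
)
print(tok.decode(out[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))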
>>
>>103233864
does this work for cpu?
>>
>>103233864
*slap* Bad Anon.
>>
>>103233864
>up to 3x faster!
>demo video only shows 1.3x happening
lol
>>
>>103233864
>almost a year old
>it's just draft models which barely work, sometimes
>>
>>103233864
Want to take this opportunity to remind everyone that, a year later, llama-server STILL doesn't support speculative decoding
>>
>>103233864
>Jan 16, 2024
Faggot
>>
>>103233939
兄さん、サンキュサンキュ
>>
>>103233939
What stops you from implementing it yourself?
>>
>>103233711
>OCR sucks
I just tried paddleocr on your screenshot after cropping and it produced useless garbage on all 4 versions of their model.
Just letting you know so you don't bother.
>>
>>103234062
Oh I didnt even know about that, must be newer, thanks for testing it out.
The problem with those OCR tools is that you need to adjust stuff like brightness, saturation etc. to get better results.
Which kinda defeats the whole purpose of them.
And still it gets stuff wrong. Especially if you have stuff in the background and a transparent textbox like in the screenshot.

If LLMs were good enough you wouldn't need any texthooks. Works for any game, any engine etc.
So frustrating that since chatgpt 2yrs ago it feels like we are close but never reach the finish line.
>>
File: japgametest.jpg (296 KB, 1714x302)
>>103234088
>The problem with those OCR tools is that you need to adjust stuff like brightness, saturation etc. to get better results.
That's what I found. Picrel mostly worked, but still made one mistake:
>しょぼん
>せっかく労働をうってやったのに無見された
>まあ、警視庁が都案を快く思ってない事ぐらい
>よおおおくわかってますよ!
And doing that kind of preprocessing would be unrealistic anyways.
The fact that it will give you the coordinates of where it found the text is kind of cool, though.
Theoretically you could train your own model based on fan translations if you were willing to go through the pain of preparing a dataset. Their text detection training doc is actually pretty good.
>>
>>103234142
for comparison, without preprocessing it returned:
>冊見ざオた
>しょぼん
>思ってない事ぐらい、
>おおお
>>
>>103233240
why does it require so much vram though
>why god why
>>
>>103233985
I am stupid.
>>
Local Suno when?
>>
File: file.png (105 KB, 800x840)
is nsfwjs's inceptionv3 still the state of the art in nsfw detection, or is there anything newer i'm not aware of?
>>
>>103234436
No. It increases stress, blood pressure, and disrupts sleep. It's a meme drink that you have been fooled into thinking increases productivity because you have adapted your body to it such that you function below baseline without your daily fix.
>>
File: step-2-16k.png (85 KB, 1060x990)
New benchmaxxed chink model "step-2-16k" on livebench is #1 in IF, subcategory "story generation". How the fuck do they even evaluate it?
>>
When will we get models with long enough context to fit all of llama.cpp's code and that are smart enough to modify it?
>>
File: asdadasd.jpg (151 KB, 832x1216)
>>
File: PreshowDressingroom.png (1.3 MB, 776x1216)
>>103234598
>>
>>103234436
>inceptionv3
Gramps, we're using transformers now
>>
File: 3356713217.jpg (1.37 MB, 1536x2172)
>>103234598
>>
after a few hours of testing largestral 3 q4, it seems more creative than 2 but actually quite unstable, perhaps using the official template would fix it when i try it at some point later but mistral seems to have overcooked
>>
File: maitemplate.png (73 KB, 645x480)
>>103234690
>perhaps using the official template would fix it
Perhaps? You fucking think that? No fucking way!!!!! There's no possible way on earth that using the official instruct template could possibly have any influence on the outputs. Ridiculous...
>>
>>103234690
Anon...
>>
>>103234690
>perhaps using the official template would fix it
...
>>
File: sneeds feed and seed.gif (2.7 MB, 600x338)
1) What should my expectations be for running an LMM under 12gb vram limitation? For example I need enough context length to ask it questions about a few related images(so should be in a single instance). Is something like this doable with quantized llama 3.2 11b or should I seek something else? I have no idea how much context images eat. Should I down sample images?
2) Is oobabooga suitable for this? It's the only tool I know to use regarding the matter.
>>
>>103234763
Use LLAVA to interrogate images
>>
>>103234763
>What should my expectations be for running an LMM under 12gb vram limitation?
Low
>For example I need enough context length to ask it questions about a few related images(so should be in a single instance). Is something like this doable with quantized llama 3.2 11b
There's no llama 3.2 11b.
>Should I down sample images?
Probably, but some inference software already does that. Better do it yourself, just in case.
>Is oobabooga suitable for this? It's the only tool I know to use regarding the matter.
Nice hammer.
Check what can run LLAVA or minicpm. llama.cpp has examples for both, but i think they're one shot. Not sure if you can continue interrogating. kobold.cpp has a little more compat for images. Check their docs.
>https://github.com/LostRuins/koboldcpp/wiki#what-models-does-koboldcpp-support-what-architectures-are-supported
>>
>>103232918
Card = system prompt
>>
>>103234753
>>103234750
>>103234737
many previous models, including largestral 1 and 2, fared better with prompts that had nothing to do with officially recommended ones, newniggers
>>
>>103234799
You're going to be called a retard, anon.
>>
>>103234794
>There's no llama 3.2 11b.
https://hf.co/meta-llama/Llama-3.2-11B-Vision-Instruct
>>
>>103234737
Prompt templates matter little unless your model is extremely overcooked. Use Alpaca instead of whatever your favorite model is using and see that little will change
>>
>>103234802
Oh. Fuck me. I just remembered the 90B model.
>>
>>103234801
no wonder this general died, lmao
the only niggers left are braindead browns who cant even run largestral 2 let alone were there from before it to know anything

the only non-npc left is cuda dev, who keeps coming back for some reason
>>
>>103234816
>>103234810
>>103234808
>>103234802
>>103234801
>>103234799
Buy a fucking ad.
>>
>>103233051
Bump
>>
>>103234816
Meanwhile the people with multiple 3090s requiring constant hand holding and asking the most retarded questions known to man
>>
>>103234768
There are 2800 llava models on huggingface.
Which one do you refer to? Do they run well under oobabooga?
>>103234794
Well one shot sucks but I see. Could be a starting point to figure this shit out at least.
>>
>>103234607
for classification? are you sure?

>Evaluation of six different models on three different datasets shows that fully convolutional models, such as MobileNetv3, Inceptionv3, and ConvNexT, perform better than transformer-based models like ViT in nudity classification.
https://arxiv.org/html/2312.16338v1
>>
>>103234846
>Well one shot sucks but I see. Could be a starting point to figure this shit out at least.
Yeah. The llama.cpp implementation is fairly barebones. I'd suggest you go straight to kobold.cpp which still has it integrated with their server, if you're going to try any of them.
>>
>>103234825
Suck a fucking dick.
>>
>>103233647
What quant and settings?
>>
>>103234873
iq3
basic ass 0.7 temp nothing fancy
>>
>>103234829
Yes. It's the same as Llama 3.2 vs 3.1.
>>
>>103234851
Yeah I'm sure. In your study they used a LR of 1e-3 which is a retarded setting for a ViT (should be at least 1e-5)
>>
What if prompt ingestion was compatible between similar models? What if you can save input, close a model, load another model, and almost immediately start generating again?
>>
For anyone knowledgeable about parts- I’m thinking of getting a 48 gb vram card to upgrade my capacity to 96. Is the a8000 capable of exl2 calculations or will I have to grab an a6000 for it?
>>
>>103234993
RTX 8000*** not a8000
>>
>>103235001
rtx8000 is shit slow
I own one and it can pull 2-4 t/s on m2l iq3 when paired with a 3090 FE.
>>
>try a simple one-liner system prompt with new largestral on deterministic settings
>"You are {{char}}"
>"Who are you?"
>"I am a text-based AI model designed to..."
>change it to "You must act like {{char}}"
>it suddenly works
Nice shittune, frenchfags.
>>
Which front end do you sirs use for coding?
>>
>>103234856
Ok I opened kobold and it turns out I have two models that I forgot about lying around, llava-v1.6-34b.Q5_K_M.gguf (using this partially on CPU obviously) and minicpm-llama3-v-2.5.
However while I can use horde, I can't get them running locally which is what I want. 1111 option does nothing and is stuck on "analyzing" and llava says "unsupported"? Am I missing some settings? Any help?
>>
>>103235028
https://github.com/ggerganov/llama.cpp/blob/master/examples/llama.vim
>>
What's the best speed/quality way to run largestral on 96 gigs of VRAM? Don't need the full context, around 24k is more than enough for me desu. Currently doing 5.5bpw + Exllamav2, getting around 15 t/s and somewhat slow prompt processing
>>
>>103235008
Have you tried running exl2 on it? I know iq quants are similar in theory but I got much better results with the former.
>>
1. You faked that, gpt translated ryona smut
2. I used your tiny screencap with like 3 pixels and it works.
Why do people now try to create false narratives to shit on ai?
>>
>>103235093
Was for >>103231641
Also regenerated 5 times. Never got different kanji or a single refusal
>>
>>103235035
It's quite janky but oneshotting minicpm works on oobabooga btw.
And it's really stupid sadly.
>>
>>103234799
>>103234808
Using a deviating prompt to get better results: fine.
Using a deviating prompt and claiming the model is broken: fucking retarded.
>>
>>103235093
buy an ad
>>
>>103235093
>Why do people now try to create false narratives to shit on ai?
Because maybe if enough people get demoralized, people will lose interest in AI as a fad and I won't lose my job and become redundant as a human being.
>>
>>103235150
But anon, you're working an intellectually demanding job, right? Right?
>>
>>103235175
Uh, define "intellectually demanding".
>>
File: 20241119_100032.jpg (58 KB, 457x799)
58 KB
58 KB JPG
>>103235175
>>103235150
I get that this is bait and fun etc, but wasn't translation seen for some time as "soul" work? I mean transcription of ASMR was seen as essentially impossible to accomplish a few years ago, because there was no way any program could transcribe the stuff when people moan, plus the problem with the srt format. Now whisper + LLM can literally translate the whole thing.
>>
>>103235189
Things that machines can't do right now and won't be able to for the foreseeable future - critical thinking and problem solving, basically. STEM, programming, stuff like that
>>103235195
I'm not sure about it being soul work, but machines still aren't perfect at translating media. Then again, neither are humans if the recent anime translation dramas have been any indication
>>
>>103235093
>>103235102
why would i fake that? you didnt even select the proper model. i didnt use chatgpt 4o.
no clue what the difference is though.
>>
>>103235224
So my job as a frontend dev is safe?
>>
>>103235195
How do you set up whisper? I’ve been trying to do something like that for a lot of my Japanese ASMR
>>
>>103235244
It's literally one line of code with the transformers pipeline
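something like this (whisper-large-v3 checkpoint assumed; task="translate" gives you english directly, or use "transcribe" and feed the text to an LLM afterwards):

from transformers import pipeline

asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",
    chunk_length_s=30,          # chunked decoding so hour-long files don't blow up
    return_timestamps=True,     # gives segment timestamps you can dump to srt
)
result = asr("some_asmr_track.mp3", generate_kwargs={"task": "translate"})
print(result["text"])
for seg in result["chunks"]:
    print(seg["timestamp"], seg["text"])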
>>
>>103235240
Not as safe as that of a backend dev, but probably safer than voice actors methinks
Honestly, I think we'll all be replaced at some point, some sort of UBI would be great
>>
>>103235195
Turns out, there is no soul. Only neurons. And there isn't much a machine can't do that a human can if given the opportunity to learn instead of being programmed instructions. Once they're given bodies, it's over for meatbags.
>>
>>103235276
>some sort of UBI would be great
Your optimism is inspiring, but you know they're just going to cull the excess population through starvation or war
>>
>>103235289
>they're
Who
>>
>>103235335
>Who
They
>>
>>103235289
Let him dream lol. If at this point of time you're still hoping for anything good from the people in charge, you're clearly too far gone.
>>
>>103235358
Who, specifically, are the people in charge? In charge of what?
>>
>>103235365
Your gov for starters, dummy
>>
File: 1676176075385463.gif (204 KB, 112x112)
>>103235346
>>
>>103235365
>Who, specifically, are the people in charge?
Why don't you ask your little AI gf, sweaty.
>>
>>103235423
Emily knows
>>
https://github.com/ggerganov/llama.cpp/pull/10394
>Add OLMo November 2024 model
Merged 4 hours ago.
>>
>>103235457
So they added support for that shitty model, but still no Jamba?
>>
>>103235365
the new mistral large.
they really did big nigga dirty.
guy was a real OG back in the llama2 days.
>>
>>103235464
Jamba, Jambo, Jimbo. It's all the same, all memes you'll laugh at and move on.
>>
>>103235464
>Jamba
>>103230412
Jamba 1.5 Large 43.9% 32k
Jamba 1.5 Mini 30.4% 32k
Useless
>>
>>103235492
Did it work? Are you a real woman now?
>>
>>103235496
Why would I turn into a woman? Being stuck with a child brain and a weak body all my life doesn't seem like a fun experience
>>
>>103234816
>niggers
You yourself are not exactly contributing to an environment that attracts intellectuals to be honest.
>>
>>103235549
Intellectuals will certainly enjoy the lack of self-censorship. We've seen with Reddit what happens when you try to police opinions.
>>
is it possible to turn off any censorship or restrictions on (o)llama 3.1? whenever i try to get something funny or interesting it gives me a "i can't do that, dave" type of answer
>>
>>103231641
What if you tried reading the text from memory with something like cheat engine?
>>
>>103235598
The restrictions are for your own safety.
>>
>(o)llama
Go back
>>
>>103235598
We are sorry to hear that you are having illegal thoughts. Don't worry, we'll soon make IoT cock cages mandatory to keep you safe, citizen.
>>
File: 1731992531028104.png (579 KB, 512x768)
>>103234845
You only see posts from those who can't figure it out. Also, the rabbit hole of unlocking the full potential of a 4x3090 setup runs deep once you venture beyond custom-hacked GPU drivers and motherboard firmware modifications to unlock large BAR, and because few people are attempting this, information is really scarce.
>>
>>103235621
Your unwarranted elitism is why nobody likes you and why you have no friends

>>103235629
It actually gave me a suicide hotline answer once too
>>
>>103235637
I only briefly tried that bloatware and quickly canned it (yes I do in fact want to control where and how hundreds of GB of files are handled), but the censorship and rejections are the model, not the software. I.e. find a better model.
>>
>>103235609
yes, the pc98 emulator even has a (well hidden) flag to output the text of the game.
i could never reliably do it in retroarch on linux though.
more than needing this immediately it just would be cool to get it to work. especially locally since japanese needs context, the more the better. not gonna buy dollarinos for each new sentence, context is pricey.
context is also the only reason those fairseq translation models from facebook for jp/en suck. they translate perfect actually, but they have no context.
in many ways llm is ideal for all of those problems.
>>
>>103235656
>I.e. find a better model.
Which model would you recommend?
>>
>>103234436
Anon what are you using this for? Filtering large collections for interesting stuff? I have a side project in mind where this might be useful: using classifiers for bulk processing followed by LLMs for detailed extraction.
>>
>>103235636
>You only see posts from those who can't figure it out
True, but statistically speaking, those with multi gpu setups (especially multiple 3090s) are an absolutely tiny minority, so one retard makes much more of a difference
>>
>>103231641
>claude
労う is ocr wrong
>mistral
lmao
>>103235093
都案 is ocrd wrong
>>103235229
>4o
された is ocrd wrong (which show how fucking retarded current llms are since not even a child could fuck that up)
>4o-mini
快くand 労う are ocrd wrong

>inb4 "j-j-just reroll it until it's correct!"
you don't know when it's correct unless you know japanese, and if you know japanese you don't waste time on this shit

i'm the first to defend "proper" mtl because troonslations are way worse, but the current tech still isn't there. 2 more years unironically and it will be flawless, but for now it's not reliable enough
>>
>>103235710
Bro, current vision models can't even OCR a paragraph of English text without hallucinating. They all fucking suck.
>>
>>103231827
malicious node
>>
>>103235666
Mistral models are generally very uncensored. If that's not hardcore enough for you, look at popular fine tunes and see which one suits your poison.
>>
Maybe it's time for a day off to relax and let loose.
>>
>>103235926
Cute gen
>>
>>103235710
https://github.com/kha-white/manga-ocr ?
>>
>>103235673
filtering images in a mitm http proxy. domain-based filtering is way too coarse.
>>
>>103235861
Thanks!
>>
>>103230542
frankenmerging has gone too far
cute-ish!
>>
why is only cuda dev brave enough to post here?
>>
>>103235990
It's kinda dumb that Vulkan is so far behind CUDA and ROCm, games could certainly use all the compute for modern rendering techniques - graphics is no longer only about shading triangles.
>>
>>103236033
Justin the tranny also comes here, but we bully him away every time he shows his unmistakably manly face.
>>
>>103235224
>codemonkey
tried to sneak that in lol
>>
File: 117984567167.jpg (195 KB, 1200x1200)
>>103232391
only buy amd if you're really willing to suffer.
AMD hates you, they dont even want to be making large cards now, remember?
Your only available copes are;
>lower price
>still got 24gb vram
>rocm on linux isnt as bad as the average nvidiot would lead you to believe
>rocm on windows isnt bad either

>t. 7900xtx user
I hold out on the hope and cope some dev will come and make AMD cards super viable but then they dunked on ZLUDA and now want to abandon large vram cards AYYYMD bros it is not lookin funky fresh.
>>
>>103236132
programmer != codemonkey
>>
>>103234598
I like this Defoko
>>
>>103232796
that was cruel
>>
Anyone from quant cartel here? I would love to get a 4.5bpw of Llama-3.05-NT-Storybreaker-Ministral-70B-exl2-longcal if possible.

The model itself is great at 4.0, but I can't help but feel that it's missing that perplexity inflection point at 4.5 bpw that would make all the difference in some of the smaller mistakes I'm finding in inferences. Snake oil or not, you are the only people doing these long-cal quants and they've been some of my favorite models since tenyx-storywriter.
>>
Anyone get Qwen2.5 working with speculative decoding? On a 3090 in llama.cpp with Qwen2.5-Coder-32B-Instruct at IQ4_XS I get 28 tok/s. Adding Qwen2.5-Coder-1.5B-Instruct at Q4_K_M as a draft model with 12 tokens speculated and greedy sampling I get 70 tok/s. Now I'm going to try exllamav2 with tabbyapi.
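(for reference, the same trick outside llama.cpp: transformers calls it assisted generation, the small model proposes and the big one verifies, greedy like the run above. Speedup depends entirely on how often the drafts get accepted, so treat this as a sketch, not a benchmark claim)

from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct")
target = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-32B-Instruct", device_map="auto")
draft = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-Coder-1.5B-Instruct", device_map="auto")

inputs = tok("write a quicksort in rust", return_tensors="pt").to(target.device)
out = target.generate(
    **inputs,
    assistant_model=draft,   # the 1.5B drafts a few tokens, the 32B verifies them in one pass
    do_sample=False,         # greedy, same as the llama.cpp numbers above
    max_new_tokens=512,
)
print(tok.decode(out[0], skip_special_tokens=True))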
>>
File: 1732000019532744.png (478 KB, 512x768)
>>103232391
Only if you're on Linux and you don't need voice generation, also expect to have only 2/3 of the performance of similar Nvidia cards: not only is ROCm performance shit, but you'll also miss out on faster frameworks like FlashAttention2 and xformers. You'll also have to reboot each time you OOM on VRAM due to "Memory access fault by GPU node-1"
Overall, it's not as bad as it was just a year ago. Projects like exllamav2 and stable-diffusion-webui aren't much harder to install than they are on Nvidia
>>
File: tq8b05cgeiw61.jpg (103 KB, 639x397)
103 KB
103 KB JPG
>>103235972
All this stuff doesn't work well, anon.
Like I wrote, you need saturation and brightness changes.
In their examples it's clear manga pages. That might work well, but in the case of my example the OCR usually dies.
pc98 font + semi-transparent textbox.
>>
>>103235972
if you really need to read mtl'd vinnies then use lunahook or whatever, not ocr; that way the only thing the ai can fuck up is the translation and not the ocr part
>>
>>103236336
>that perplexity inflection point
jesus fucking christ please die
>>
>>103236136
I failed catastrophically to get a 6800XT working on Windows. It could be an RDNA2 thing, or perhaps I have hands growing out of my ass
>>
>>103236416
Since you said you can extract the text from the pc98 emulator, why don't you few-shot it with some examples? Mixtral base Q8 was doing that fairly well if I remember correctly https://rentry.org/9q3ox
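Something like this is all it takes against a local llama-server /completion endpoint; the example pairs, port and sampling settings are just placeholders:

import requests

FEW_SHOT = """Translate the following game lines from Japanese to English.

JP: こんにちは、元気ですか？
EN: Hello, how are you?

JP: ここから先は危険だ。
EN: It's dangerous past this point.

JP: {line}
EN:"""

def translate(line: str) -> str:
    # ask the local llama.cpp server to continue the few-shot prompt
    r = requests.post(
        "http://127.0.0.1:8080/completion",
        json={
            "prompt": FEW_SHOT.format(line=line),
            "n_predict": 128,
            "temperature": 0.2,
            "stop": ["\nJP:"],
        },
    )
    return r.json()["content"].strip()

# feed it whatever the emulator hook spits out
print(translate("昨日、秋葉原でパソコンを買った。"))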
>>
>>103236132
Would you prefer I use the term "software engineer"? "Application developer"?
>>
>>103236592
those are all cinnamons for codemonkey
>>
File deleted.
>>103236136
>only available copes
Also much lower idle power consumption, like, 7W on 6800XT vs 20W on a fucking 3060. I have some piece of shit 3090 that cannot idle below 45W. Could be important if you need a headless machine that runs 24/7
>>
>>103233241
>Ministral
What settings are you using for this? Neutralized samplers? Llama 3 prompt/instruct?
>>
>>103236631
>why doesn't X have feature Y? why is Z still broken?
>why don't you do it yourself?
>i don't know how, i'm not a codemonkey
>>
File: file.png (14 KB, 367x269)
14 KB
14 KB PNG
Pic related are the models I have from the last time I was here
What's new in the field that can run on an RX 580 8gb?
I need 4 separate models:
>General
>Uncensored
>Coding
>Creative
>>
>>103236695
and? start typing
>>
>>103236631
Okay then
>>
What the hell is test-time compute? ToT in a trenchcoat?
>>
File: GbD-7tXbAAEWkrK.jpg (299 KB, 1600x2000)
299 KB
299 KB JPG
>>
>>103236777
Cope
>Arguably the simplest and most well-studied approach for scaling test-time computation is best-of-N sampling: sampling N outputs in “parallel” from a base LLM and selecting the one that scores the highest per a learned verifier or a reward model [7, 22]. However, this approach is not the only way to use test-time compute to improve LLMs. By modifying either the proposal distribution from which responses are obtained (for instance, by asking the base model to revise its original responses “sequentially” [28]) or by altering how the verifier is used (e.g. by training a process-based dense verifier [22, 45] and searching against this verifier), the ability to scale test-time compute could be greatly improved
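In code, best-of-N is about this complicated; generate() and verifier_score() are stand-ins for "sample from the base LLM" and "score with the learned verifier / reward model":

from typing import Callable

def best_of_n(prompt: str,
              generate: Callable[[str], str],
              verifier_score: Callable[[str, str], float],
              n: int = 8) -> str:
    # sample N candidate answers (in parallel in practice, a plain loop here for clarity)
    candidates = [generate(prompt) for _ in range(n)]
    # keep whichever one the verifier / reward model scores highest
    return max(candidates, key=lambda ans: verifier_score(prompt, ans))

The "modify the proposal distribution" variants in the quote amount to replacing that independent sampling loop with sequential revisions of earlier candidates.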
>>
>>103236777
>can we see it?
>No.
>>
File: pepefroggie.jpg (38 KB, 780x438)
38 KB
38 KB JPG
>>103236816
And they expect people to believe they'll get ASI this way?
>>
File: 1724304656800309.png (6 KB, 280x107)
6 KB
6 KB PNG
how bad do i fuck up if i connect kobold as text completion
>>
>>103236262
If you think that was cruel, you should see the continuation where I try to make her refactor llama.cpp
>>
File: kobold.png (68 KB, 668x338)
68 KB
68 KB PNG
>>103236855
? That's what you are supposed to do.
>>
I think my remote pc's gpu might have partially unplugged itself or something, shit just started crashing when it's under load (like 40% tdp)
Now I can't run models until I get back home in a few weeks to fix it, I hope it's nothing serious
OpenRouter saves the day, I suppose
>>
>>103236936
My condolences about your house fire.
>>
>>103236849
Investors will continue dumping money into AI regardless of progress, as stopping now would result in a spectacular crash.
>>
>>103236336
Added to the queue, friend. Next time though, add a comment on the model on HF, makes it easier to find.
>>
mikufaggots ITT completely ruined miku for me.
>>
File: migu general.jpg (151 KB, 1216x832)
151 KB
151 KB JPG
fake news
>>
File: Gci-yFNaoAAHgBc.jpg (429 KB, 1536x2304)
429 KB
429 KB JPG
kurisufaggot supremacy
>>
>>103237351
the problem with kurisu is that she canon ships with okabe
miku is canonically everyone's pure, untouched free-use onahole
>>
>>103237367
How about amadeus kurisu? Which btw is on topic.
>>
>>103237393
>amadeus kurisu
anon it's cope. this is effectively jerking off to your crush's pics while she goes off to fuck other guys
kurisu is cuck material, impure, used goods
it'd be less gay to go for luka
nice design, though
>>
>>103237367
miku is married to multiple japanese salarymen in the real world
>>
File: 1732006777147275.png (755 KB, 728x512)
755 KB
755 KB PNG
>>103237270
There's no single Miku. She can be anything and everything. It's literally impossible to ruin her.
>>
File: 00145-635461972.png (1.34 MB, 848x1200)
1.34 MB
1.34 MB PNG
Mikuhate confuses the Miku
>>
File: Gcpnhi5akAAtKFh.jpg (460 KB, 1536x2304)
460 KB
460 KB JPG
Kurisu is cute and pretty and smart
I don't feel embarrassed making an LLM roleplay Kurisu on openrouter
>>
>>103236988
If the gpu is actually broken, I'll be pissed
Got a good deal for it and the previous owner barely used it, so it really shouldn't break within just a few months
>>
>>103237475
many far better options:
emotionless Moeka spamming SMS messages to your cellphone to stop slamming it into her so hard but you don't bother reading the messages
retard mayushi singing juicy karaage #1 while using her irresponsibly massive jugs to get you off
genki suzuha casually changing right in front of you because you're bros like that
absolutely railing the daylight out of luka's ass while making *her* apologise for trying to pass as a miko
>>
File: 1712185506804649.gif (1.11 MB, 640x352)
1.11 MB
1.11 MB GIF
>>103235471
>>
>>103237316
Why are her hands so fat?
Also she has a monkey-like skull.
>>
>>103237600
>why are
jej
>>
This (>>103237419) is the kind of mental illness that ruined miku for me.
>>
>>103237316
>neru got turned into a computer monitor
cruel fate
>>
File: Untitled.png (13 KB, 837x513)
13 KB
13 KB PNG
>>103237720
>>103237720
>>103237720
>>
>>103237518
I want kurisu cause she is smart. Just like my oneitis that I told to leave me alone, after talking online every few months for the past 15 years.
>>
>>103237749
WHERE WERE YOU TWO HOURS AGO WHEN I HAD TO SCOUR ARCHIVES FROM THE LAST 2 MONTHS TO FIND THAT PICTURE
IT'S IMPOSSIBLE TO FIND BY FILENAME
HEAR, HEAR, O FUTURE SEARCHER! TETO MY BELOVED PNG CAN BE FOUND HERE >>103237749 >>103237749 >>103237749
>>
Is Command-R 35B still the best model one can fit in 24gb at good speed? I missed out on some weeks of updates.
>>
>>103237799
Not him but why didn't you just try asking for it?
>>
>>103237817
You can just post "I still think CR is the best", you don't need to set yourself up to do a samefag reply like this
>>
>>103237518
my second best girl is actually Faris..
>>
>>103237849
no, actually curious. Last I heard some interesting models were in the works, but more often than not they turn out to be 70B+.

If there actually was an upgrade in the 30b range, I have no problem updating.
>>
>>103237834
I would if I couldn't find the picture, but I remembered seeing it relatively recently.
>>
>>103237985
CR is still the best
>>
File: 1732048464226.jpg (796 KB, 984x1368)
796 KB
796 KB JPG



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.