/g/ - Technology






File: 39_05556___.png (812 KB, 720x1280)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101981616 & >>101970380

►News
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
I'm the best
>>
Is the new Magnum Largestral finetune any good (as opposed to base Largestral)?
>>
>>101990728
This Anon is the best!
>>
►Recent Highlights from the Previous Thread: >>101981616

--Nothing, local is dead.

►Recent Highlight Posts from the Previous Thread: >>101982458
>>
>>101990781
I'm having fun with it for long (E)RP
>>
>>101990805
Bitnet will save local!
>>
>>101990805
Seething schizo. Local is better than ever. Stay mad.
>>
►Recent Highlights from the Previous Thread: >>101981616

--Paper: SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models: >>101985710
--THUDM/LongWriter repository for long context LLMs word generation: >>101986037
--Used 3090 shilled on /lmg/ due to being cheapest 24GB GPU: >>101986950 >>101986969 >>101986973 >>101986974 >>101988401 >>101987334 >>101986991 >>101987691 >>101987707 >>101987714 >>101987755 >>101987825 >>101987953
--LLMs used for game development and improving productivity: >>101985401 >>101985435 >>101985592
--Different hardware affects reproducibility of AI-generated logs: >>101986552 >>101986631 >>101986664 >>101986687 >>101986716 >>101986752
--Collaborative AI storytelling and roleplay session proposed: >>101982186
--Tensor Parallelism with uneven GPU count is unsupported: >>101987979 >>101988308 >>101988418
--Mistral Large can run locally but has low tokens per second: >>101983668 >>101983702 >>101984650 >>101986432 >>101986891 >>101987197
--MiniCPM struggles with image insertion in RP scenarios: >>101985433
--Llama 3.1 70B works well for some users, but has issues with chain of thought for others: >>101984397 >>101984421 >>101984513
--Flux performance on AMD hardware is currently slow: >>101987632 >>101987674 >>101987771 >>101987779 >>101987801
--DRY issues in koboldcpp 1.73, user relies on MiniCPM support: >>101983865 >>101983869 >>101983896 >>101983976 >>101983976
--Zeyuan Allen-Zhu's ICML keynote talk is back up: >>101985742
--Reminder to compile with -j flag and cores: >>101985928
--Miku (free space): >>101982419

►Recent Highlight Posts from the Previous Thread: >>101981738
>>
teto...
>>
>>101990781
the iq2_s works for 2 3090 pretty well, i'm liking it
>>
>literally zero magnum finetunes for llama3
Why??
>>
> try one of those Mistral Nemo 12B based models
>in my eyes the scenario is of middling complexity
>bot is a thief in a wizard's tower
>they encounter my character trapped in glass prison
>80% of the time the thief still reaches through the glass or somehow the wizard becomes a friend.
shit's still retarded
>>
>>101991006
Because it sucks.
>>
>>101991011
Are there any models in that weight bracket that do better?
>>
>>101991032
Sucks how? Benchmark numbers look good and I'd rather run 70B at 4.5bpw than 123B at 2bpw
>>
as a retard who only has experience with claude, is my 3.5k sys prompt going to work well out of the box on a model like mistral large? or am i best off paring back the instructions and leaving just basic stuff?
>>
File: large-vs-magnum.jpg (1.42 MB, 3228x5082)
>>101990781
Here's one test.
Large on the left with temp 1.8 / min p 0.1 vs Magnum with temp 1.0 / min p 0.05
>>
>>101991079
>is my 3.5k sys prompt going to work well out of the box on a model like mistral large?
no
>>
>>101991095
thanks anon, i appreciate the help. are the very short prompts i see, the ones that are just `write like char.` plus the card + persona contents, the ideal length/complexity-wise?
>>
File: 1713291463105789.jpg (980 KB, 1856x2464)
>>101990712
>>
>>101991011
Mistral did something fucky with their datasets because Mistral-Math, Mistral-Nemo and yes, even Mistral-Large occasionally struggle with certain basic bitch concepts, such as possession. (Yeah, I've had fucking Mistral-Large, at Q5, not even some brain-damaged quant, flip fucking possessive clauses in the middle of an RP.)
>>
>>101991128
omg it looga and migu
>>
Why is nobody talking about Magnum v2 123b?

This model beats Qwen 2 72b and comes really close to Llama 3.1 405b in a bunch of benchmarks. 53.6 on UGI leaderboard is absolutely insane, Hermes 3 405b has just 66.71. And with 4bit quants, you should be able to fit it on three 3090s.
>>
>>101991238
>53.6 on UGI leaderboard
But Mistral Large is 55.45.
>>
>>101991238
We're all vramlets.
>insane
That's less than the instruct tune, you know?

And imo, Magnum v2 123B feels just like Largestral lobotomized.
>>
stop replying to the schizo who keeps posting about magnum
>>
>>101991238
So this is the quality of pasta in /lmg/.
How disappointing.
>>
>>101991006
We tried and it's just bad
>>
>>101991238
It came out yesterday, give people time. Also this >>101991285 What did they tune on to cuck it up?
>>
l3 sucks
>>
>>101991664
prompt issue + buy an ad
>>
>>101991238
>Why is nobody talking about Magnum v2 123b?
Because you didn't buy an ad
>>
>>101990805
let it be known that mikutroons want /lmg/ to die
>>
>>101991238
Make a money transfer for an information banner on the bottom of the page.
>>
>>101991132
That's a language issue. If you use Prolog instead of English, you'll never have that problem, because Prolog specifies possession explicitly.
>>
File: 1719595846011771.jpg (1.7 MB, 3218x2968)
>>101990712
good morning I love teto
>>
Do you think we will get new models before end of year?
>>
>>101991825
whoa cute tet
>>
D'you think you could buy a 16gb V100 and upgrade it to 32gb? I kinda think it's possible. Just have to find a source for the chips.
>>
>>101991866
There's more to life than vram. Turn back while you can, before it's too late. Dedicated LLM cards will be coming in the next few years.
>>
>people shit on model makers for making slop
>but give nvidia, the biggest bottleneck, a free pass
>some even pride themselves on giving money to leather jacket man
>>
File: LLM-history-fancy.png (732 KB, 6285x1307)
>>101991847
100%. Cohere, DBRX and Chyna all can drop stuff. After elections, late November a big one will come out.
Source: look at the cycle, every ~4 months is a new era.
>>
>>101991923
Because Nvidia made all of this possible. Sloptuners are like video game modders, they live by fooling gullible people into thinking they know better than the billion dollar corporations that built the very tools they're using.
>>
>>101991935
Llama1 mid size was 33B, not 34B
>>
>>101991958
People shit on llama3 too though
>>
>>101991997
Yeah but nobody's trying to tune it, so they're not pretending to improve it.
>>
File: LLM-history-fancy.png (737 KB, 6277x1302)
>>101991975
Corrected and updated.
>>
File: 1438271983159.jpg (149 KB, 500x608)
Let's play a game! This Saturday at 1 PM PT, I will do a collaborative storytelling/RP session (location TBD, maybe in the thread itself?), where I post a scenario and responses from the model in the thread, and people discuss what to do in the user chat turns, or edit previous user turns or the system prompt and start over. This is going to be both for fun and to get us (mostly) reproducible reference logs, as I'll be using greedy sampling in Mikupad and have the full log in a pastebin at the end. No editing the model's responses, we're going to use pure prompting to try and get the thing to do what we want!

The scenario is also still TBD. We're going to go for as long a context as possible until the model breaks down uncontrollably, so it should be a complex enough scenario for that. If anyone has suggestions for scenarios I'm all ears. Also, I'm planning on starting these games with Mistral Nemo at Q8 for the first session, and other models in the future, so we have reference logs available for a whole range. But I'll take suggestions for models people want. I'm only a 36 GB VRAMlet though so I'm a bit limited. I can run larger models up to ~88 GB but it'd be slower. If anyone with more VRAM to run such larger models at a good speed would like to host any of these games themselves, please do, and I will step down.
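For anyone who wants to replay turns themselves afterwards, here's a minimal sketch of driving the same kind of (mostly) reproducible greedy run against a local llama.cpp server through its /completion endpoint. The host, port, seed and prompt are placeholder assumptions, Mikupad can point at the same backend, and as the recap above notes, results can still drift slightly across different hardware/backends.

```python
# Minimal sketch, not the actual session setup: greedy decoding against a
# llama.cpp server so reruns of the same prompt give (mostly) the same log.
# Assumes llama-server is already running on localhost:8080 with the model loaded.
import json
import urllib.request

def greedy_complete(prompt: str, n_predict: int = 256) -> str:
    payload = {
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": 0.0,   # greedy: always take the most likely token
        "top_k": 1,           # belt and suspenders on top of temperature 0
        "seed": 42,           # fixed seed for anything that still samples
        "cache_prompt": True, # reuse the already-processed prefix between turns
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

if __name__ == "__main__":
    print(greedy_complete("The thief slipped into the wizard's tower."))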
>>
>>101992319
>7900XTX is both faster and cheaper, including AI.
Really? Better than a 3090? Has any anon tried? What's the current state of AMD GPUs?
>>
Has anyone created a cringe leaderboard? I want to use the MOST cringe model there is. I don't care what model you think is cringe, I want objective metrics.
>>
File: 1549655806676.jpg (75 KB, 1024x683)
>101990805
>Russian in filename
>anime.reactor.cc
What in the-
>https://joyreactor.com/post/5896280
>it's real
>>
>>101991935
>Chyna
I've never heard of this company before
>>
>>101991701
>t. Sad, obsessed little faggot.
>>
>>101992768
>stupid poor zigger can't run local models and is butthurt
like pottery, always knew that "local lost" posters were third worlders
>>
>>101992785
Yi, Qwen, GLM, Deepseek?
>>
>>101992630
ok sure. i will be here.
>>
are mradermacher's quants worth downloading?
vaguely recall some posts a while back about his quants being fucked up but sadly they're the only IQ4_XS options for some models
>>
>>101992824
yeah i know all these, but not Chyna
>>
>>101992834
I always check his quants for NaNs before using them.
>>
File: chyna.jpg (16 KB, 335x335)
>>101992845
Chyna, folks, Chyna. Let me tell you, it's a big country, huge! Over a billion people, can you believe that? They've got these massive cities, like Beijing and Shanghai, with skyscrapers going up all over the place. They're building things left and right, I mean, they're really good at building things.

But, you know, we've got to be careful with Chyna. They've been taking advantage of us on trade for years, stealing our jobs, ripping us off. We need to get tough with them, negotiate better deals, bring those jobs back home. Make America great again, right?

But hey, Chyna's a big player on the world stage, you can't ignore them. We'll deal with them, believe me, we'll deal with them.
>>
>>101992845
Ching-ling, anon...
>>
>>101991923
>some even pride themselves on giving money to leather jacket man
Though almost everyone buys used GPUs from miners
>>
>>101992932
buying a used GPU takes it off the market, so the next person who wants to buy one has one less available
demand is demand, it all benefits nvidia in the end
>>
>>101992811
Nay. I can say with confidence that local is dead because I CAN run local models, and they all suck.
>>
>>101992222
neat chart
>>
>>101992222
You forgot Mistral Nemo
>>
File: 39_05488_.png (949 KB, 1280x720)
>>101993017
picrel and also nice filename lmao
>>
>>101992222
Sad that people hate on Llama 3.0 so much. It was essentially an early preview release. Maybe Zucc shouldn't have pushed them to release it early.
>>
>Hermes-3-Llama-3.1-70B
not very good. it writes weird, like a list in paragraph form.
>he got up and walked to the kitchen. he grabbed a glass. he filled the glass with water. he sat back down on the couch with the glass. he took a sip of the water.
i dunno what that type of writing is called but it sucks, and it gets like that with other words/phrases
>>
>>101993088
quite a few say 3.1 is even worse, so i'm not sure
>>
>>101992916
Will there be a chink model that breaks through to the big dog league on lmsys? Deepseek-coder already did in its specific niche, but the highest ranking general model is Yi-Large and it's lower than Llama 3.1 70B.
>>
I fucking hate it when model makers don't put the B count in the name. Like, it's not gonna make me download it because I can't SEE the 8B, I'm just gonna get to the page, see the filesize, then leave.
>>
>>101991238
There's no 'official' 5.5bpw quant so I had to make my own for my 96gb vramlet build. It's done now so I'll try it later.
>>
>>101993137
Got it, I'll add a bunch of unused tensors with random garbage in the future.
>>
>>101993088
So, where's the final llama3? 3.1 was even worse, local would have been dead if it weren't for Mistral saving us.
>>
>>101993176
I'm not saying don't MAKE 8B tunes, I'm saying label them as such so I don't have to waste my time and the people who want 8B tunes can find them more easily.
>>
File: dusk.png (45 KB, 1287x380)
>>101993137
You don't check the source repos, i take it...
>>
>>101993123
>Bans selling GPUs to China
>Where is mah model
Burgers, so stoopid
>>
>>101991923
its not nvidias fault everyone else sucks
>>
>>101993190
>>101993112
It was smarter than 3.0 in my testing. Mistral's cool, but to be fair it's probable they wouldn't have released stuff like Large if Llama 405B and the Llama series did not exist.
>>
File: 1718458864628527.png (581 KB, 1028x498)
>>101993283
Blame Sam "Track all AI-capable GPUs and limit the export of AI-capable hardware" Altman, leader of the official government-backed American AI ethics council (to which he did not invite any company that deals with open source for good reason). He's the one making the laws, the government is his puppet in these matters.
>>
>>101992932
a used mining card is better than a used gaming card on average as miners took better care of the cards & constant workload means no thermal cycling.
also no one was mining on 3090s
>>
>>101993287
Yeah, they sure have nothing to do with it
https://www.tomshardware.com/pc-components/gpus/rival-firm-says-nvidias-ai-customers-are-scared-to-be-seen-courting-other-ai-chipmakers-for-fear-of-retaliatory-shipment-delays-report
https://www.tomshardware.com/pc-components/gpus/nvidia-bans-using-translation-layers-for-cuda-software-to-run-on-other-chips-new-restriction-apparently-targets-zluda-and-some-chinese-gpu-makers
>>
>>101993137
surprised you got that far. As soon as I saw bartowski I did a 360 and walked away.
>>
>>101991076
Dude I don't know how to tell you, but benchmarks only show how good the model is at being good at benchmarks and anything else is just a coincidence.
>>
How long until I get a model that is good enough to coom at?
>>
>>101993444
huh?
>>
>>101993396
Every 3090 I bought in Japan had been previously used for mining.
>>
>>101993472
monday, 3pm
>>
>>101993397
It's surprising that they manage to do this, especially given the reality of the AI race. NVIDIA's monopoly stifles competition and ultimately harms the West more than China.
>>
>70B 4bit quant
how slow is it to run on a single 3090 + cpu?
>>
>>101993566
About 1-2T/s slow
>>
>>101993582
Huh. NTA, but effectively the same as a 3060 + RAM. It's amazing how much of an oppressive bottleneck CPU offloading is.
>>
>>101993566
I think I got like 1.5t/s using a 3090 + DDR5-6000 RAM running llama2-70b q4 when I was still stuck with a poverty build like that.
>>
>>101993472
Depends on how long it takes you to download Mistral Large.
The real question is how long until there is a model that doesn't need excessive amounts of handholding.
>>
>>101992632
>What's the current state of AMD GPUs?
Still tensorcorelets, at least on RDNA. That's *a lot* of specialized TFLOPS you're giving up.
>>
File: wew.gif (674 KB, 474x498)
Am I missing something with this character ai bullshit btw?

I remember some anons shilling it nonstop these past few weeks, I finally tried it out and it's not only filtered (lol) but the bot responses are literally 12b model tier.

Is it just their responses being shorter (literal prompt issue) or am I missing a special bot on that cringe website that everyone uses?

I tried this cat woman one and it's literally just GPTisms out the ass and similar type responses to most 12Bs i've used. In fact, i'm sure the model they use is 12B
>>
newfag or bait?
>>
>>101993766
lmao
gottem
>>
>>101993766
You need a certain level of intellect to appreciate Character AI's genius.

Character AI has soul the likes of which local could never hope to match
>>
>>101993766
When people talk about c.ai they usually mean the early era of peak soul before it got lobotomized to hell.
>>
>>101993766
You're over a year late. Missed it.
>>
>>101993888
Brain issue, it's still light-years better than any local model.
>>
File: 1724137438609531.png (543 KB, 512x768)
Given that prompt processing is batched and fast, even with offloading, is it technically possible to process prompt using FP16 and then generate tokens with lower quants for potentially better responses?
>>
>>101993766
12b models are so focused on RP that they legit outperform every 30b model in my experience.

CAI is defo smarter, I figure it's on a 70b or some shit because its responses are clearly more "aware" I guess is how I would put it, but yea, simply implying that CAI sucks because it gives "12b" responses isn't saying much when 12bs, at least in ERP, outperform a lot of higher models nowadays.

Especially the older ones. I don't give a fuck how many people swear by Command R; when it comes to cooming it's not only far slower due to the RAM its context eats up but it's unironically no better than Nemo or other smaller models
>>
>>101993920
no
>>
>>101993920
>is it technically possible to process prompt using FP16 and then generate tokens with lower quants for potentially better responses?
that's kind of the _L / robert zeroww thing no? not quanting some parts of the model and keeping them at f16
>>
>>101993920
In principle yes but I think it's just not worth the effort and added complexity vs. lots of other improvements that could still be made.

>>101993960
No, I think that one just kept the output tensor at FP16 instead of q8_0.
>>
>>101993960
No, it's more akin to having a full FP16 version of the same model and using it solely for prompt processing.
>>
File: Untitled.jpg (83 KB, 982x245)
>>101993766
its beyond pozzed which makes reading the complaints amusing https://old.reddit.com/r/CharacterAI/
>>
>>101993823
Sounds like temperature above 1.5
>>
>>101993987
It depends on the outcome. I can imagine FP16 + Q2 potentially surpassing Q5 in short answers, given that most of the thinking occurs during prompt processing
>>
>>101994006
I had a lot of fun trying to coax bots into fetish stuff with it in the past, but now it's so dumb and strict that it's not worth it.
>>
>>101994049
Yeah but especially on newer hardware the prompt processing speed falls off pretty hard if you can't fit the entire model.
And if the answers are short anyways, why not just use FP16 for everything?
>>
>>101993920
Interesting idea, but at very long contexts, the prompt processing would get pretty painful if you are moving between chats or you want to modify something early in context. I think potentially there could be a middleground instead, where we store a copy of particularly "important" layers at higher precision on RAM, and process the prompt using those, while token gen uses the lower precisions stored in VRAM. Though as cuda dev says this type of idea is adding more complexity and I don't think anyone's going to do it.
>>
>>101994116
it seems so bad that if you just used kobold in adventure mode (shorter responses) with a tiny 8b you'd get better responses
>>
File: ComfyUI_00969_.png (1.3 MB, 1256x1024)
>>101993766
You got trolled by unironic and ironic schizos like >>101993913 or this >>101993823
Lurk five more years before coming here again
>>
>>101994184
We can store processed prompts alongside chat history
>or you want to modify something early in context.
I've hardly ever done this with a long chat log.
>>
File: 1724139161767763.png (520 KB, 512x768)
>>101994184
To test this idea, all that's needed is the ability to save and load processed prompts. Then, we can process lengthy prompts using FP16, load Q2 with a previously processed prompt, and check if it improves the response.
>>
>>101993920
This would not really work well for new chats. If you are using the lower quant to generate new tokens, then basically all tokens in that chat from the model's responses are going to be from the lower quant anyway.
>>
Anyone tried multi-device local models? Want to have a local AI on my watch/laptop that's actually running on a bunch of graphics cards in my basement.
>>
>>101994443
It will comprehend a character's card more effectively. You may also use chat examples, either generated using FP16 or written manually.
>>
>>101994488
Most (all?) inference programs have some sort of API already. New in the subject?
>>
>>101994493
The model's responses are still important. I don't know how many people are that happy with low context chats. And if they are then they might not be people that care all that much about stuff like this anyway.
>>
>>101994546
It's uncertain whether it will have a negligible or significant impact on the outcome. No one can know.
>>
>>101994278
>We can store processed prompts alongside chat history
We are already short on VRAM and in some cases even RAM. It'd probably be helpful if we could tell Llama.cpp to store and recall contexts from a save folder or something, but I'm guessing there are some nasty complications that would make that not possible easily.
>I hardly ever done this with a long chat log.
Maybe for you. I like modifying the card/system prompt during chats.
>>
File: 0fa.jpg (1.05 MB, 3264x2448)
Is Nemo unironically the only under 70b model worth running?

I feel like i've gone through everything, Qwen, Gemma, Llama, Command R, yi, mixtral but I always find myself going back to Nemo.

I just wish the bigger finetunes weren't so fucking horny, instruct
>>
>>101994601
Obviously, I was talking about storing it on an SSD
>>
>>101994437
You can do that with llama.cpp can't you? Save the processed prompt to a file for each slot.
>>
>>101994638
basically, yeah
the finetunes are too horny and never decline
base nemo sits at a great point of reasonable-ness for fun assistant/RP
>>
>>101994638
Yes. Unironically.
>>
>>101994638
pretty much, yep, without irony
>>
>>101994667
Base nemo is just so dry though, I don't get how people prefer it for ERP.

Even if the finetunes are overly horny, usually with good prompting/cards you can find a good balance. Base nemo just snoozes me
>>
>>101994584
Not exactly, but think about it again. You're proposing a hypothetical scenario where someone uses FP16+Q2 over say just a simple Q4. If half the chat is processed by the FP16 and half by the Q2, the resulting intelligence is, very likely, something in-between anyway, which likely wouldn't be far from Q4, but you're making the work on the backend more complicated, which isn't a good thing for long-term development, and you're giving up a pretty significant amount of prompt processing speed, which still sucks for people who use models differently than you.
>>
>>101994745
just a taste thing, im an ESL so the dryness doesn't register too much with me
>>
>>101994644
Like I said > It'd probably be helpful if we could tell Llama.cpp to store and recall contexts from a save folder or something, but I'm guessing there are some nasty complications that would make that not possible easily.
>>
>>101994763
> over say just a simple Q4
You'll need 4x3090 for largestral at Q4. Imagine running it on just 2, but with better responses.
>very likely
Or not likely at all. Who knows.
>you're making the work on the backend more complicated
Same can be said about CPU offloading as well. Anyway, I'm just tossing out an idea that got stuck in my head. It might work well, or it might not. There's no way to know until someone gives it a try.
>>
I think the context saving idea could be worth it. Would there be any issues with this?
>specify a folder for saving contexts to
>specify a maximum number of contexts to save, so older contexts get thrown away and therefore you can limit how much space it takes up on your drive
>specify if you want to save contexts to begin with
>when enabled, the program automatically saves contexts to the folder, along with metadata containing the actual text prompt and the model itself (Llama 3.1 70B Q8_0.gguf etc)
>and when a prompt to process is received, the program checks if there is a match between the prompt and any of the saved contexts, and a match with the model, and if there is, uses the saved context
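A rough sketch of what the manual version of this looks like today, assuming a llama.cpp server recent enough to expose the slot save/restore endpoints and started with --slot-save-path; the endpoint shape and filenames below are assumptions taken from the server docs, not a new feature.

```python
# Rough sketch of the idea above using llama.cpp's existing slot save/restore,
# assuming a recent llama-server started with something like:
#   llama-server -m model.gguf --slot-save-path ./kv_saves/
# Endpoint shape (/slots/{id}?action=save|restore) is an assumption based on the
# server docs; older builds may not have it.
import json
import urllib.request

BASE = "http://127.0.0.1:8080"

def _post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def save_slot(slot_id: int, filename: str) -> dict:
    # dump the processed KV cache for this slot into --slot-save-path/<filename>
    return _post(f"/slots/{slot_id}?action=save", {"filename": filename})

def restore_slot(slot_id: int, filename: str) -> dict:
    # reload a previously saved cache so the prompt doesn't need reprocessing
    return _post(f"/slots/{slot_id}?action=restore", {"filename": filename})

if __name__ == "__main__":
    save_slot(0, "wizard_tower_16k.bin")
    restore_slot(0, "wizard_tower_16k.bin")
```

Note the saved cache is tied to the exact model and quant that produced it, so on its own this doesn't give you the FP16-prompt/low-quant-generation trick discussed above; that would still need backend changes.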
>>
>>101994937
>You'll need 4x3090 for largestral at Q4. Imagine running it on just 2, but with better responses.
I was thinking more someone using a Q4 of something smaller, so the FP16+Q2 would have some of the FP16 held in VRAM, which would help prompt processing. If you are using purely FP16 in RAM, that is a significant loss in prompt processing speed.

And I disagree with the idea of "Who knows." It is always necessary to have an estimate of the real gains, and confidence in that estimate, before "trying" something, because the time spent on it might have been better spent elsewhere. Cuda dev does not need to be trying every little idea someone has that would result in questionable gains for trade-offs in other areas.

>Same can be said about CPU offloading as well.
No, because that's an essential feature of Llama.cpp. What you are proposing is something added on top (of something that is already complex).
>>
>>101993043
No one uses that.
>>
>>101993617
I have even less and get those same speeds. Crazy that even tripling my vram would get practically zero speedup. I guess the real speed comes when you get more than 70% of the model into vram.
>>
i think i prefer normal rep pen over dry. dry settings were the default of 0.8 multi, 1.75 base, 2 len. rep pen is 1.05, 2048 length for 16k context with various l3.1 tunes. dry was fine on l2 70b tunes but it doesn't cut it for l3.1, there is just too much repetition, and if you mix dry/rep pen it starts to mess up text. i've gone back to rep pen for now.
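For reference, the way I understand the DRY design, the penalty on a token that would extend an n-token repetition is multiplier * base^(n - allowed_length), subtracted from that token's logit. A quick back-of-envelope with the defaults above; this is a sketch of the formula, not koboldcpp's actual code.

```python
# Back-of-envelope for the DRY defaults mentioned above
# (0.8 multiplier, 1.75 base, allowed length 2). Sketch of the formula as I
# understand it, not koboldcpp's implementation.
MULTIPLIER, BASE, ALLOWED_LEN = 0.8, 1.75, 2

def dry_penalty(match_len: int) -> float:
    if match_len < ALLOWED_LEN:
        return 0.0
    return MULTIPLIER * BASE ** (match_len - ALLOWED_LEN)

for n in (2, 3, 4, 6, 8, 12):
    print(f"{n:2d}-token repeat -> logit penalty {dry_penalty(n):8.2f}")
# 2-3 token echoes get ~0.8-1.4, an 8-token verbatim repeat gets ~23, a 12-token
# one ~215, which is why DRY crushes long loops but barely touches the
# short-range repetition l3.1 likes to produce.
```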
>>
is there a 405b llava yet
>>
File: offload_x_performance.png (96 KB, 1536x1152)
>>101995240
>I guess the real speed comes when you get more than 70% of the model into vram.
yup
>>
>>101995250
have you tried turning rep pen off completely without dry? i run zero rep pen, no dry, no freq and only get sentence repetition on the shittiest of tunes
>>
>>101995293
Based on that you'd still think you'd go from 1.7T/s at 20% (8gb) to higher than 2 at 60% (24gb) but maybe the 70b curve is even flatter at the start.
>>
>>101995337
yeah in the past, but it's been a while since i did that. i feel you need a tiny penalty, but once you set something too high you'll notice spelling errors and messed up text. with dry on l3.1 i get so much repetition i wouldn't be surprised if it wasn't working at all (i saw that comment about 1.73 but this has been going on since they implemented dry for me). i'm still trying l3.1 but now with just rep pen and it's still repetitive, but not as bad
>>
>>101995416
i'm wondering if having any penalty at all somehow might increase some kinds of repetition personally
>>
>>101995451
Maybe, I had the best results with 3.1 (70b) with neutral samplers. As soon as I messed with anything it started being kinda repetitive (but only like 16k in). Could just be luck, I only tested it a few times.
>>
>>101995451
i'll try it with min p 0.05, no temp or anything else
>>
>>101995494
Is hitting "neutralize samplers" actually enough to get the samplers neutral? I notice some of them are still on after hitting it.
>>
>>101995529
for some samplers a number of 1 (or 0 sometimes) is considered off
>>
File: st.jpg (67 KB, 358x677)
>>101995529
also select zen sliders in st. they look better and show a clearer off state
>>
File: 172414380273232.png (398 KB, 396x579)
>>101995118
I disagree with your disagreement. Especially in ML, when attempting something never done before, predicting the outcome is impossible. The most important part of the context is at the beginning, where prompts are typically placed and models are trained to adhere to them. If quants mess that part up less, maybe responses could actually be better. I'm not asking anyone to try it. All that's required to give it a try is the ability to load and save processed context, which is also a useful feature in its own right.
>>
>testament
>>
>>101995746
>Newsflash: it's not
>>
>>101995680
You are a retard.
>>
>>101995842
No you.
>>
>>101995680
When are you going to stop shilling meme models like Midnight Miqu and Wizard?
>>
>>101994745
Too low temp. Nemo is an absolute retard but it is not dry at all.
>>
>>101995851
You lost. Also program it yourself retard. And test it. And come back and tell us you were a retard all along.
>>
>>101995680
>Especially in ML, when attempting something never done before, predicting the outcome is impossible
Yes but ironically it's even more important in modern ML in general to justify experiments over other experiments as they are more costly. Though in this context it's still basically just a programming issue. I'd argue it'd still only be worth it to implement the saving/loading context feature if it can actually easily be done, but we don't know if it is unless you're a contributor or have deep knowledge of the codebase to say whether it would be that easy and there aren't any things that hold it back. Then if it's done, we can coincidentally try your experiment, rather than implementing it primarily so you can do it.

>The most important part of the context is at the beginning, where prompts are typically placed and models are trained to adhere to them
This is important for staying in character and other things in the card, but potentially, if we are talking about the parts of a context that are important, the middle of the context is still really just as important. People still complain about models forgetting or ignoring what happened in recent replies.
>>
>>101995451
it's more repetitive, so rep pen at least was helping a little bit. that makes base, instruct, lumimaid, tess 3, and hermes all l3.1 70b tunes that just feel odd to rp with. is anyone using any of those and would vouch for them, for rp?
>>
Magnum 123b is officially the worst fine-tune of all time.
>>
Magnum 123b is officially the best fine-tune of all time.
>>
>>101995919
buy an anti-ad
>>
>>101995919
>>101995938
Buy an attack ad.
>>
>>101995919
>>101995936
Hi Sao, hi Lemmy
>>101995938
Hi Petrus
>>101995948
Hi Miku
>>
>>101994745
To me all the current Nemo fine tunes feel like they've been lobotomized compared with base instruct. I haven't tried the non-instruct base model so far. As for dry writing style, in my experience Nemo is really dependent on context for its writing style, so it benefits from something like an ali:chat card or a lot of high quality example dialogue.

I haven't tried it yet, but it also might work well to use one of the fine tunes for a few messages before switching to the base version, since they tend to be more verbose and have more of an intrinsic style to their writing.
>>
>>101995954
die undi
>>
>>101995948
You just activated my trap ad.
>>
>>101995919
it is pretty terrible. when are we gonna stop falling for the 'we ran no tests but it seemed alright' meme finetunes/merges?
>>
>>101995867
how's it a retard when it comes to ERP?

I keep needing people to explain what they mean when they call models retarded that I find surprisingly smart. As I find Nemo, for example, as smart as any non 70b model (shit, some 70b models are close to it)

It's one of the few models that unironically deserved the hype which almost never happens. Mythomax was the last one I remember
>>
>>101995985
>Mythomax
kek
>>
>>101995985
surprise prostates. quantum clothes. intense french kisses during blowjobs, absolute complete lack of understanding of what a titfuck is.
>>
>>101995985
The more time you spend with models, the higher your standards become. I definitely would have killed a man for something like what Nemo is right now, and that's most people's experience, until they've spent time with a cloud model. It's like going from NAI to, say, 3.5 Turbo or Claude 1.
>>
>>101996149
My nemo chats, compared to my old C1 logs, feel roughly the same (though it may be because I was shit at prompting back then desu). The only thing lacking from nemo is advanced comprehension for complex scenarios and spatial awareness, as well as context recall.
>>
>>101996041
>absolute complete lack of understanding of what a titfuck is.
I'm convinced you guys unironically don't use the models you shit on
>>
>>101996149
>brings up cloud models in a local general
>>101992933
>>
>>101996325
>doesn't even try to defend local models
lol
>>
>>101990712
lmg, your great uncle dies and gives you his 4x 3090s. What model do you run?
>>
>>101992222
2k context seems like so long ago... Thanks for making this Timeline Anon
>>
>>101996486
Mistral Large
>>
>>101996486
I would use that for fine-tuning models, finally attaining my life's goal of becoming a full-time sloptuner
>>
>>101996644
Couldn't you only fine tune like 8B at fp16?
>>
>>101996669
I believe in qlora supremacy.
>>
File: ComfyUI_05952_.png (1.45 MB, 1024x1400)
>>101990712
>the latest news is from almost a week ago
>it's about some shitty merge
We are truly in the coldest AI winter
>>
>>101996742
>>101990805
>>
>>101996742
there is no model merge mentioned in the news
>>
>>101996742
Bro it's been 4 days.
>>
>>101996742
>the least stupid mikufag
>>
===MODEL REVIEW===
Tried Magnum 123b. Wasn't bad, but wasn't good either. Wasn't too horny like Unditune and 72b Magnum, wasn't too dry like Tess. Has brain damage like lumimaid, can't follow custom style instructions like the official, just sticks to its own style. Guess it's okay if you like it. Tradeoff feels a bit pointless though, I was willing to tolerate GPTisms of Largestral as long as they were compensated by intelligence, but now that it's gone, why should I use this tune over CR+ which has a nicer style?

===RANT===
Warning: incoherent schizo rambling.
It feels like all tuners are still stuck in llama2 days while the models have moved on. Nous Research, why the fuck did you tune refusals into a local model (yes, I know I can jailbreak, don't "skill issue" me faggot, that's not the point)? You just wasted your compute on something your userbase doesn't want. I know Undi&co are just incompetent and can't remove them (see >>101983894), but you just chose not to, you dumb fuck. Why the fuck do tuners not pre-abliterate instruct models if they have to tune on top of them? Appropriate, non-moralizing refusals will get tuned back in from the dataset, right? Claude logs this, Claude logs that, maybe hire some Kenyans for $2/h like OpenAI did and screen your fucking dataset for moralizing refusals, or use a 1b model. Or if you are feeling confident, place your dataset on HF and ask users in the title to screen it for you. Name it something like CLAUDE-LOGS-PLEASE-CHECK-FOR-REFUSALS (or -FOR-SLOP). That's actual free labor. I don't know and I don't care what your beef is with [insert name of sloptuner], to me it just seems like a logical thing to do. LMSYS has also dumped a big dataset recently, why has nobody besides NexusFlow used it? There may be some good data in there. Rant over.
Thanks for reading my rant. No, I am not Petra/Undi/Alpin/Sao/Lemmy/tranny/faggot/nigger/GNU/Linux.
>>
Hey /lmg/ how's it going? What's the current best model for 24GB VRAM? I liked mini-magnum for its freshness.
>>
how is anthracite going to recover after the magnum 123b flop?
>>
>>101997046
Llama 405B Q0.1_K
>>
>>101997061
Without any difficulties. If they didn't get kofibucks, they will learn a valuable lesson and try to improve, if they got kofibucks, they will continue as usual.
>>
File: 1721316096375374.png (1.38 MB, 966x1024)
>>101996742
Stop being such a worry wart and fire up that Mag-123B, Miku. It'll make you feel better.
>>
wake up safe users!
Phi-3.5 has been released
https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
(16x3.8B)
https://huggingface.co/microsoft/Phi-3.5-mini-instruct
https://huggingface.co/microsoft/Phi-3.5-vision-instruct
>>
>>101997141
Magnum it is then
>>
>>101997221
>https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
Does it embody the "essence of moe"?
>>
>>101997221
Will probably be useless for RP again. It's a shame everyone loves focusing so much on le safe and harmless assistant huh?
>>
Magnum 123b is the disappointment of the century.
>>
>>101997221
>phi
pure distilled slop (literally)
>>
>>101997300
>direct preference optimization to ensure precise instruction adherence and robust safety measures.
>high quality chat format supervised data covering various topics to reflect human preferences on different aspects such as instruct-following, truthfulness, honesty and helpfulness.
>synthetic data and filtered publicly available documents
simply the best data, by all means it should be godly since it wasn't exposed to trash like most other models
>>
>>101997061
>>101997294
Just because I made a rant it doesn't mean you have to dunk on it even more. I get it, Anthracite are your discord rivals or something, but please keep discord shit in discord, okay?
>>
How about this: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda/tree/main/cuda-fp16 is that garbage too? I mean, MS - it probably is - but who knows maybe it's a hidden gem? D/Ling it now.
>>
>>101997221
>16x3.8B
Although the model isn't interesting, it is interesting they went with this config for MoE.
>>
>>101990920
>Llama 3.1 70B works well for some users, but has issues with chain of thought for others
I use L3-70b locally and it's much better at details and remembering than any of the 7b/13b models I tried. The only saving grace for the small models is that they're much faster (almost instant).

Roleplaying is far better with L3-70b.
>>
>>101997645
Well, I had better luck with 3 vs 3.1, so your experience might not transfer to 3.1.
>>
>>101997221
>Training time: 23 days
And yet there's still no bitnet demonstration model.
It really makes you think.
>>
>>101997748
yeah it really makes me think it works
>>
>>101990712
I have 12GB VRAM and haven't touched a model in about a year. What's good at this size?
>>
>>101997821
Every big CEO has his own personal bitnet AI gf.
>>
>>101997827
Mistral Nemo 12B
>inb4 another anon recommends finetunes
Try the base model (instruct too!) first, then you can check its finetunes
>>
>>101997858
>Try the base model (instruct too!) first
Buy an ad Guillaume
>>
>>101997827
mini-magnum.

>>101997858
That should be standard procedure.
>>
>>101997858
Thanks anon, are there any differences between the different quant uploaders?
>>
>>101997474
It is a no-gpu dream desu. It is a real tragedy it never saw a penis during training.
>>
>>101997908
avoid bartowski, he recently said he didn't know what he's doing. mradermacher is a much more serious, professional guy.
>>
>>101997827
You can fit a Q4 quant of Nemo or one of its fine tunes on 12gb vram with 16k context for a pretty decent experience.

I generally prefer the vanilla version of Nemo Instruct, but mini-magnum, magnum 12b v2, and nemoremix are decent among the fine tunes I've tested.
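For a rough idea of why Q4 + 16k fits in 12 GB: the fp16 KV cache for a Nemo-class model at 16k comes out to roughly 2.5 GiB. A back-of-envelope sketch; the architecture numbers (40 layers, 8 KV heads, head dim 128) and the ~7 GB Q4 file size are what I recall for Nemo and should be treated as assumptions, and real usage adds compute buffers on top.

```python
# Back-of-envelope fit check for the setup above (12B Nemo-class, Q4, 16k, 12 GB).
# Architecture numbers and file size are assumptions from memory, not measured.
GIB = 1024 ** 3

weights_q4_gib = 7.1                 # ~Q4_K_M GGUF size for a 12B
n_layers, n_kv_heads, head_dim = 40, 8, 128
ctx, bytes_per_elt = 16384, 2        # fp16 K and V

# K and V caches: 2 tensors * layers * kv_heads * head_dim * ctx * bytes
kv_cache_gib = 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elt / GIB
print(f"KV cache ~{kv_cache_gib:.2f} GiB, total ~{weights_q4_gib + kv_cache_gib:.2f} GiB")
# ~2.5 GiB of cache on top of ~7.1 GiB of weights leaves a couple of GB for
# compute buffers on a 12 GB card; dropping to 8k context or quantizing the
# KV cache frees up more room.
```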
>>
>>101997857
i wouldnt share either
>>
anyone got the L3.1 4B inferencing? how is it?
>>
>>101997951
Is this the new bait?
>>
>>101997951
does it even matter for running quants?
>>
>>101997748
It is pretty obvious at this point that nvidia has some microcode that detects any attempt at training bitnet and intentionally adds errors that make training convergence impossible. Only nvidia loses when bitnet happens.
>>
>>101997221
gguf when
>>
>>101997904
>>101997951
>>101997961
thanks bros, seems like i can clear a lot of obsolete models off my disks.
>>
What the fuck is a washcloth? Why does every model want to wash me with a fucking rag instead of soapy hands or a sponge? Is that some american thing I don't understand?
>>
>>101997961
>Q4 and 16k context
nta but still kinda new to this, why do you prioritize context over quants? I'm running Nemo with a Q6KL quant and 8k context in 12gb vram
>>
>>101997221
Does this have GQA this time?
>>
>>101998131
Because he is retarded, don't listen to him anon.
>>
File: gguf when.jpg (125 KB, 1185x499)
>>101997221
Bros....
>>
>>101998170
>robustness
What is that metric even measuring?
>>
>>101998177
It measures robustness.
>>
>>101997221
nala test please
>>
>>101998103
There was a time before your time. Is it a fantasy or old timey setting? It'd make sense then.
>>
>>101997326
>robust safety measures.
>>101998177
>What is that metric even measuring?
>>
>>101998131
Using it for roleplay, I like having the extra room in context for more complex cards and more chat history, but that's going to vary a lot from person to person. I think the quality loss going down to Q4 is worth it, but that's a matter of preference. I've also played around with q6 + no kv offload, which is slower but still pretty usable.
>>
>>101998170
What does this mean.

Does this mean that the 16x3.8b one is gonna be the Nemo for 24GB cards now?
>>
>>101998239
no because phi is probably the worst model series for roleplay, by design, trained on only academic and synthetic safe data.
>>
>>101998003
We've had such an influx of retards and schizos this past week, it's hard to tell if posts like the one you're responding to are actual bait, stupidity, or just another attempt by disturbed forever-alone anons like this guy >>101990805 to shit up the thread.
>>
>>101998274
into the trash
>>
Can you call off the Slavic catastrophe for a bit?
>>
File: 1723435460204937.jpg (28 KB, 736x709)
>trying to use gemma
>it writes a bit of text that's pretty good and then goes HERE'S WHAT HAPPENED NEXT! and it offers me like 2 or 3 choices, each with a title and a description

Why is this happening. How do I make it stop
>>
File: not-even-4k-context.jpg (1.46 MB, 1359x9000)
>>101990712
Kayra fails basic password retrieval tests even at 4k context!
>>>/vg/491110706
>>>/vg/491113839
>>>/vg/491112854
And /aids/ pays $25 a month for the extra context!
>>
>>101998319
The distant neighing of horses was heard in the background.
>>
>>101998319
*plap plap plap*
>>
>>101998307
I hope NATO gives Ukraine nukes so they can drop them on your subhuman head, zigger. Fucking subhuman.
>>
>>101998337
It's worth repeating that they pay $10 for a 13B model from the Llama 1 era with 3k context and TWENTY-FIVE for 8k
>>
>>101998319
>And /aids/ pays $25 a month for the extra context!
Why do they do that to themselves?
>>
>>101998365
Do not (you) me, and learn how completion models work.
>>
>>101998367
because they are utter retards or zoomers or troons. or all three.
>>
>>101998319
>Still hasn't liberated Kursk.
There's an AK-12 with your name on it, buddy.
>>
>>101998319
for a year's worth of subscription you could buy a 3060 12gb and run nemo, gemma, llama or whatever comes out in the next months
>>
Asking the nemo finetune enjoyers which ones they are using. Danke sehr.
>>
>>101998210
I'm not specifying it but it's a regular modern setting. Multiple different models do this though.
>>
>>101998472
magnum v2.5
>>
>>101998202
working on doing a fresh install of ooba right now in order to try it out. Should be able to squeeze it in at F16. My computer with all of my templates is not operational at the moment so I can't promise a properly indicative nala test until later tonight.
>>
>>101998319
Isn't Kayra a L1-13B tune? Kek
>>
>>101998559
It's a replica.
>>
>>101997230
What the fuck. Magnum feels COMPLETELY different to mini-magnum. What is this shit? Isn't this supposed to be its big brother? They feel like two completely different models, and mini is WAY better.
Are there any 30b models like mini?
>>
>>101998144
the moe does

mini and vision still do not
>>
>>101998628
>no logs
>>
>>101998634
What the. Weird. I guess that's kind of fine then. How did you figure that out btw? Is there something in the config that reveals this?
>>
Why do you guys fuck with shitty low models when CR+ is literally free, even on your toasters
>>
>>101998628
>magnum 72b
>overcooked on the original slopset
of course it's not the same, silly anon
>>
>>101998628
>uses mistral presets with magnum
>>
>>101998641
Isn't this known?
>>101998698
Please elaborate. What's going on here?
>>
>>101998697
>he wants to be blackmailed because of his roleplay logs
EL OH EL
>>
File: livebench-2024-08-06.png (830 KB, 3092x1782)
>>101998697
Look at where CR+ is in the graphic.
>>
>>101998697
Where?
>>
>>101998697
Elaborate.
>>
>>101998717
>l-le graphs! le le benchmarks!
nta in my experience cr+ performed way better than any list would lead me to believe compared to other models in similar size or smaller
still dropped it for largestral though
have we not yet established that only redditors basedpoint at this dogshit
>>
File: image.png (56 KB, 814x167)
how incompetent do you have to be that you can't fucking set up a Shopify store. How the fuck does Nous have funding?
>>
>>101998732
>>101998745
just make an account and you get a trial key lmao... 1000 words
>>101998711
>what's a burner email
>>
>>101998717
To be fair this bench isn't intended to measure performance on creativity, storytelling, and RP.
>>
>>101998784
I want to keep my RP logs on my own device. Not send them out to some Canadian twats.
>>
File: file.png (10 KB, 532x59)
>>101998784
>1000 words of whatever shitty service
That's like 3 posts nigga. What kind of illiterate rp you doing?
>>
>>101998170
>all those big numbers
>no cock sucking number
>you just know if there was a cock sucking number it would be lower or barely above l3-8b
>>
>>101990712
Dear OP,
I am new to this website, and to language models in general. I have very little Python coding experience and am at novice level. Do I need Python skills to install a local language model? Also, can the language model be trained with my own materials or does it come pre-trained if I install it locally? I would appreciate any information and guidance in this matter

With much thanks and gratitude.
-Steve
>>
>>101998784
>1000 words
bro
>>
>>101998774
>>101997022(Me)
I must have overestimated Nous. They too must be too incompetent to filter out refusals. Damn.
>>
File: ....png (909 KB, 800x1400)
>the last decent model came out months ago
Local is dead
>>
>>101998841
Phi-MoE will be our savior.
>>
>>101998784
>1000 words
>proxy so everyone can read your logs
you are not even trying
>>
I think there's an error in the first link
https://rentry.org/llama-mini-guide

'make' is not recognized as an internal or external command,
operable program or batch file.
>>
>>101998367
It is probably like apple. They have invested their personality into the brand.
>>
>>101998628
>Magnum feels COMPLETELY different to mini-magnum
Not sure if you are new or just very retarded (probably both and a faggot) but it is all because of a different base model. The longer I use this shit the more I am thinking finetunes barely do anything.
>>
>>101998784
That's barely enough words to describe the act of penetration
>>101998841
Nuh-uh, Miku! Things have never been better.
>>
with only 6.6B active Phi MoE would run great on any potato once it has gguf support (new architecture so no day 1 support, sorry sweaty) with a cheap RAM upgrade. It could very well be the end of cloud models.
>>
>>101998717
Just like with the penis enlargement pills one day someone will create an actual benchmark instead of all the useless mememarks and nobody will believe it isn't just another mememark. Today isn't that day of course.
>>
>>101998907
No, they're both tuned on Nemo.
>>
>>101998912
That's not the real Miku. It's a pretender.
>>
>>101998641
It's night and day. I know it's 12 vs 34B, but there is no comparison. magnum is generic, boring, dry, while mini is really faithful to the language you used in the context, throws curveballs at you and feels fresh. the base models are painfully, lightyears apart. I wish there was a 30B "mini", since I bought a new card to be able to use bigger models among other things.
At least I can now use the Q8 quant at breakneck speed
>>
Fine-tuning is placebo.
>>
>>101998774
>Teknium
I haven't tried his newer models. Does he still force his shitty
You are "Hermes 2", a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.

vanity system prompt for his finetunes?
>>
>>101998957
Magnum 32B is tuned on Qwen1.5 32B
Mini (or Magnum 12B) is tuned on Nemo
Night and fucking day.
>>
>>101998994
yes
>You are Hermes 3, a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.<|im_end|>
https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B
>>
File: file.png (267 KB, 366x548)
>>101998784
>just make an account
>get a trial key lmao
Whenever you touch your penis anon, remind yourself that he is reading your logs. And he weeps.
>>
>>101998830
Dear Steve,

No, you don't need Python skills to install a local language model, but you need them to train it. To run a model you can use koboldcpp (https://github.com/LostRuins/koboldcpp), which is currently the simplest way to run LLMs. You would need a model in GGUF format; you can download them from huggingface.co. To check whether you can run your desired model, you can use the following rule of thumb: model size (GB) + 20% = GB of RAM the model needs. Generally it is better to pick a low quant (= compression; q1-q3 is considered small) of a big model over a big quant of a small model. In the upper segment of the market Mistral Large is currently dominating, in the lower segment Mistral Nemo.

Hope this helps you out!

-4chan the hacker
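A quick worked example of that rule of thumb, with ballpark file sizes picked purely for illustration:

```python
# Worked example of the rule of thumb above: RAM needed ~= GGUF file size + 20%.
# The file sizes are rough illustrative figures, not exact downloads.
def ram_needed_gb(gguf_size_gb: float) -> float:
    return gguf_size_gb * 1.2

for name, size_gb in [
    ("Mistral Nemo 12B @ Q4 (~7 GB file)", 7.0),
    ("Mistral Large 123B @ Q2 (~45 GB file)", 45.0),
]:
    print(f"{name}: needs ~{ram_needed_gb(size_gb):.0f} GB of RAM")
```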
>>
>>101998666
yeah, in the configuration_phi*.py file
> If num_key_value_heads=num_attention_heads, the model uses Multi-Head Attention (MHA); if num_key_value_heads=1, it uses Multi-Query Attention (MQA); otherwise, it uses Grouped Query Attention (GQA)
mini/vision use MHA and phimoe uses gqa with 8 kv heads
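That rule can be checked straight from a model's config.json; a small sketch below, assuming the standard Hugging Face field names (num_attention_heads / num_key_value_heads), with pure-MHA configs sometimes omitting the latter.

```python
# Small sketch of the rule quoted above, applied to a model's config.json.
# Assumes the standard Hugging Face field names; some MHA-only configs simply
# omit num_key_value_heads.
import json

def attention_kind(config_path: str) -> str:
    with open(config_path) as f:
        cfg = json.load(f)
    n_heads = cfg["num_attention_heads"]
    n_kv = cfg.get("num_key_value_heads", n_heads)  # missing -> treat as MHA
    if n_kv == n_heads:
        return f"MHA ({n_heads} heads, full-size KV cache)"
    if n_kv == 1:
        return "MQA (one shared KV head)"
    return f"GQA ({n_heads} query heads sharing {n_kv} KV heads)"

if __name__ == "__main__":
    print(attention_kind("config.json"))
```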
>>
File: file.png (91 KB, 1386x432)
>>101998628
they aren't finetuned on the same base model, and not even on the same dataset
>>
After some time, I turned to llava again.

I remember I had to tweak one of the JSON files to make it load (llava-13b)

it complains on load as follows:
>ValueError: Unrecognized configuration class <class 'transformers.models.llava.configuration_llava.LlavaConfig'> for this kind of AutoModel: AutoModelForCausalLM.
>>
>>101998875
/g/ - technology
>>
File: ComfyUI_00794_.png (1.07 MB, 1024x1024)
>>101998961
Yeah it's pretty obvious. Our very own mentally ill blacked anon was doing the false flag thing a while ago after everyone started ignoring and reporting him. The /lmg/ village idiots are a bunch of slippery fish I tell ya!
>>
>>101999062
It would appear this is already my current location within this website.
>>
File: 1000006162.jpg (66 KB, 600x1000)
>>101998875
so over for localbros...
>>
>>101999093
Install linux. Come back when you're done.
>>
>>101999122
I cannot install linux. The last time I tried, in the 90's, it was too confusing. Also I require Windows for work and school. But thank you for the hint that this is only for Linux. Sadly, I will never be able to run it then.
>>
What is the worst model for RP out there in the 7B~13B range?
Asking for research purposes.
>>
>>101998994
>force
you can and always have been able to use whatever system prompt you want, anon
>>
>>101999152
anything phi
>>
>>101999035
Oh I see, thanks.
>>
>>101998988
>placebo
Now that is a word I haven't heard for a while.
>>
File: phimoeinteresting.png (43 KB, 1120x778)
Holy shit you guys. It didn't write a mandelbrot set script. I think Phi-MoE might be AGI.
>>
>>101999138
Do you need to compile from source if you are on windows?
There are pre-compiled binaries in the llama.cpp repo
>>
>>101999138
I don't know if i should insult you or take pity on you.
>https://github.com/LostRuins/koboldcpp
They have pre-built binaries. Read their documentation.
>>
Weeks until next Cohere drop?
>>
>>101999152
Something like OPT-13b or anything else from the pre-llama era if you're willing to step into the pre-llama time. Alternatively any instruct-tune based on early llama1 if we're talking about 'modern' llms.
>>
>>101999231
we would've gotten column-r last week but elon secretly bought it and released it as grok-2 with an x.ai sticker slapped on
>>
>>101999231
cohere realized they can't compete and have been using all the money they got from VCs on sports cars and exotic club drugs
the rest will be spent on organizing the founders' disappearance so they can slip away to new lives under assumed identities on remote south american island compounds
>>
File: phischizo.png (77 KB, 1125x473)
Phimoe definitely suffering from EOS token issues.
>>
phimosis
>>
>>101999350
The same joke crossed my mind. A Phi MoE finetuned to better handle system messages. Phi-MoE-SYS
>>
>>101999360
drummer, get on it
>>
How much worse are 2 3060s vs a 3090? I already have one 3060 and want to get up to 24gb
>>
>>101999329
kek
>>
>>101999350
*keks audibly, a wry smirk on his lips*
>>
>>101999350
Lol
>>
>>101999378
How about a 3060 and a 3090.
>>
>>101999414
I would prefer to buy a 3090, but it's $200 for a 3060 versus $700 for a 3090 and I can't afford the difference right now
>>
Does this count as a Miku/Migu?
>>
>>101999378
Rent them on runpod and see for yourself.
Imo 2x 3060 is better because 3090 will only give you more speed.
>>
>>101997221
>state of the art
Why does everyone claim to be state of the art?
>>
File: 1714835911803032.jpg (951 KB, 1792x2304)
>>101999520
Yes
>>
>>101999520
Close enough.
>>
>>101999541
Because no one would release a model that isn't special in any way whatsoever.
>>
>>101999541
It's for the investors.
>>
>>101999554
Most of the time it isn't, not everything can be special. Most models are just average.
>>
I used to be smarter when I was a teenager; now I'm middle aged and feel stupid as fuck. why can't I wrap my head around simple shit like this anymore? I am going to end it bros.
>>
>>101999583
I didn't get into LLMs until my mid-forties. What's your quandary, millennial boomer-anon?
>>
>>101999530
Runpod doesn't seem to have 3060s available
>>
>the thread schizos could be middle-aged boomers
I don't know how to feel about this
>>
>>101999608
Just a lack of focus; I read the words and don't retain what I've read. I think my brain died from too much MSG and mercury in my tuna. I would have had a lot of fun with this technology when I was young. There should be optional exit booths for people like me: when the mind starts to go, just go to the exit booth lol
>>
>>101999631
Likely younger than that. Schizophrenia usually presents in the early twenties for men. The old timer schizos off their meds are likely too far gone to post here.
>>
>>101999696
NTA, but that's also how I feel, and I'm far from middle age; it was over for me before it even began. I'm lucky I'm just very used to technology from the countless nights on my PC.
>>
File: 1724192663001.jpg (129 KB, 1944x1032)
129 KB
129 KB JPG
doko ("where?")
>>
File: dfghv0.png (41 KB, 527x202)
41 KB
41 KB PNG
>>101995416
>wouldn't be surprised if it wasn't working
add autistic debug prints
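e.g. something along these lines (a made-up minimal sketch; do_the_thing and the messages are placeholders for whatever call you suspect isn't actually running):

# hypothetical sketch: wrap the suspect call and print what goes in and out
def process(item):
    print(f"[dbg] process() got: {item!r}")
    result = do_the_thing(item)  # replace with the actual call you're unsure about
    print(f"[dbg] process() returned: {result!r}")
    return result

If the prints never show up, that code path isn't being hit at all; if they do, you at least see what the inputs and outputs look like.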
>>
I tried mini-magnum-12b, after the good feedback here.

Why doesn't it start?

- Other models work fine
- Memory isn't full
- It doesn't even work on CPU only (where memory really wouldn't be an issue)
- I tried a fresh install
>>
>>101999808
What's your context set to? Nemo has 1M context (1,024,000) as its default config despite going crazy after ~16k.
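Rough math on why that default blows up (a back-of-the-envelope sketch assuming the usual Nemo-class config of 40 layers, 8 KV heads, head_dim 128 and an fp16 cache; double-check against the actual config.json / GGUF metadata):

# assumed config values, fp16 KV cache
n_layers, n_kv_heads, head_dim, bytes_fp16 = 40, 8, 128, 2
per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_fp16  # K and V
print(per_token)                        # 163840 bytes, ~160 KiB per token
print(per_token * 1_024_000 / 2**30)    # ~156 GiB at the 1024000 default
print(per_token * 16_384 / 2**30)       # 2.5 GiB at 16k

So at the default you're asking for a KV cache in the hundred-plus-GiB range before the weights even load, which would explain it refusing to start on both GPU and CPU.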
>>
>>101999808
Try with llama-server. If that works, then your shit is outdated despite doing a fresh install, somehow.
>>
>>101999696
>>101999732
MSG doesn't do that, anon. And I guarantee that as an otoro fiend I've had more mercury-infested tuna at my age than most people could eat over several lifetimes. You might unironically try lifting before you kys. Get your own set of weights, because fuck the gym.
>>101999765
Jibun de yare (do it yourself)
>>101999808
Try ditching booba before you give up
>>
>>101999808
It has long context. If you don't set it to something more reasonable than 128k, you'll OOM. That's my bet, at least. Try setting the context length to 4k and then adjust according to your mem.
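With llama-server, for example, something like this (the gguf filename is just a placeholder for whatever quant you have; -c is the context size, -ngl how many layers to offload):

./llama-server -m ./mini-magnum-12b-Q4_K_M.gguf -c 4096 -ngl 99

If that loads, raise -c until you hit your memory limit.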
>>
File: ComfyUI_01013_.png (1.1 MB, 1256x1024)
1.1 MB
1.1 MB PNG
>>101999350
>>101999368
Model card ready
>>
>playing a shota
>model keeps trying to give me lip about asking for consent and shit even as I'm literally asking for consent
make it stop
>>
File: basedflux6.jpg (364 KB, 1024x768)
364 KB
364 KB JPG
Flux is pretty darn poggers for a local model.
>>
>>101999922
If only you had mentioned the model, so anons could tell you it's a skill issue or to change models. Shame... shame...
>>
>>101999950
>implying anons wouldn't just write "skill issue" like the rubbernecking retards they are
>>
>>101998705
The dataset expanded to become more general (22k instruction samples from Opus as well as 5k creative-writing-specific ones), and certain low-quality entries got pruned from the Stheno set.
>>
>>101999216
ask it to make ASCII art of Miku riding a unicorn
>>
>>101999260
not sure if memeing but would make perfect sense. commander is really uncensored and cohere didn't join that ai safety group so they would be the best candidate to buy considering his le edgy persona. also I would like to say that elon musk is a nigger and I hope he dies soon. everything he touches turns to shit. and i hate him with a passion of a thousand suns now that he touched my dead hobby.
>>
>>101999871
will give it a try.
>>
>>101999926
Fuck that weak-ass holding your own stomach from laughing, help your bro by holding his.
>>
File: 7szz8x.jpg (74 KB, 507x492)
74 KB
74 KB JPG
Having just coomed to an LLM, I am in a state of post-nut clarity. And I am starting to consider whether it would take less effort to just get a girlfriend and groom her into my fucked-up fetishes. The amount of editing, rerolling, and prompting exactly what I want is incredibly tiresome. And now that I am done, I feel hollow. This technology is cursed. It is supposed to be the ultimate form of automation, but when it comes to dick sucking the amount of manual input adjustment is insane. And the lack of clear feedback from changes to your manual input is the cherry on top. In a way, the LLM is the exact opposite of what it presents itself as.
>>
>>102000277
buy an ad
>>
>Magnum 72b runs fast on 48GB at 4bpw but is retarded
>Magnum 123b is good, but you probably get severe quant brain damage if you use IQ2_S instead of just offloading a bigger quant, and it's slow as fuck when not fully offloaded

How do I cope without buying another 3090?
>>
>>102000277
First,
>I am starting to consider if it would take less effort to just get a girlfriend and groom her into my fucked up fetishes
Wrong.
>This technology is cursed
Correct.
>but when it comes to dick sucking the amount of manual input adjustment is insane
Buy an-- I mean, skill issue.
>>
>>102000307
Buying a 4090, of course
>>
>>102000316
Skill issue presence is a function of tolerance to gleaming eyes, complexity of the fetish of your choice, and available VRAM. Blanket skill issue statements are a meme.
>>
>>102000277
Any fucking imaginary girlfriend you are picturing would not be a K-cup titted anime girl. She'd be a vaguely passable, mostly flat bitch OR fat bitch that whines, sleeps, talks, shits and stinks.
>>
>>102000401
But what about real love?
>>
>>101998832
I've been saying this for a while, but they actually have safety alignment in their own dataset, and have had for a while. Their Llama 2 model was a pain to work with, especially the DPO version.
>>
>>102000307
wait for magnum 72b v2
https://wandb.ai/doctorshotgun/
>>
>>102000383
You can pay to rent a machine online and "retouch" a model on the most perverse shit. Complexity of a fetish is, in fact, mitigated by having more material on the subject to feed the electronic demons jailed in their silicon prisons.
Skill issue is, in fact, very real.
>>
>>102000481
I hope a spontaneous cockblocking paladin appears in your next ERP.
>>
>>102000514
Imagine how funny it'd be if you had a superpower and it was the ability to appear in other people's fantasies as a cockblocking paladin.
>>
>>102000433
AHAHAHAHAHAH
L M A O


Anon, you already lived it, and it's gone. Every love from then on is only meant to help you forget.
>>
>>102000514
I would honestly welcome it, considering I like when spontaneous bullshit happens. As long as cucking is not involved.
>>
>>
>>102000590
>cucking is not involved.
My LLM spontaneously brought up cucking recently. I wasn't happy.
>>
>>102000619
>mike plate on the wall
At last the truth is revealed. It was always Hatsune Miku(male).
>>
>>101990712
Hermes 8B is so trope-ridden and broken. All elves must have raven-black hair. Tell it not to do that and it says "raven-black tresses" etc. It regularly ignores instructions.
It's slightly better at a more novel-like style without constantly recapping or trying to conclude every output, but it's still unusable to me.
ERP fags with chatbots might like it, but it sucks for creative writing.
>>
I'm back after a long, long time; I've been following the news but not really trying new models. The last thing I tried was Llama 3 8B when it came out. I know about the 3.1 series and Mistral Large. What's a good 70B model for RP? I've been using miqu for the past year.
>>
>>102000683
miqu
>>
File: GU1TYARbsAAZUb_.jpg (387 KB, 1720x2273)
387 KB
387 KB JPG
>>101990712
Teto my beloved

https://www.youtube.com/watch?v=satZx43Sv_0
>>
File: the suffering.jpg (54 KB, 474x604)
54 KB
54 KB JPG
>>102000627
Well, fuck.
>>
>>102000706
So ERP locally is as dead as always? Great
>>
>>102000754
Yeah, local is in a lull right now; we're all just kind of huddling around waiting for the big release on November 5th.
>>
>>102000662
All LLMs suck at creative writing. Letting porn brained idiots ERP without bothering a real human is the most noble pursuit this technology is or ever will be capable of.
>>
Will there be a Magnum v2 405b?
>>
>>102000777
>porn brained
if you don't have a girlfriend and get horny what are you supposed to do?
>>
>>102000854
Hopefully either use it as incentive to get out and meet new people or go into a self-improvement cycle
>>
>>102000911
you zoomers are really fucked in the head.
>>
>>102000307
>I only use Magnum models
I still can't tell what the 123b one adds compared to Large.
>>
>>102000911
Hm...
>hard path of self-improvement in order to deal with human-based bullshit and waste hundreds of thousands on maintaining a relationship
vs
>easy path to inner peace by removing all connections with human-based bullshit and saving the hundreds of thousands needed for the sexbots of the future
I dunno...
>>
I need you to explain it to me like I'm five. Using kobold and ST, it used to be that the AI would need time to "read" a post and then time to reply. Now, however, it seems the AI needs no time to read at all? How does that work?
>>
>>102000854
To be porn-brained isn't just to make use of porn; it's more having used porn to the point that your expectations and standards have been warped beyond repair. It's the kind of brain damage that leads to people leaving comments in the Pornhub comment section.
>>
File: V100price.png (491 KB, 1623x638)
491 KB
491 KB PNG
>The 32 GB V100 is still $700+ and needs a specialized server, and it's even worse if you try to find the PCIe ones you can use in a regular system.
VRAM starvation is no joke. Please, someone just bump the limit on VRAM you can get in a reasonably priced PCIe card to 32 GB; I don't have 2k USD to burn.
>>
>>102001024
I think the worst part about porn addiction is all those countless hours you waste jerking off instead of doing something productive. It is horrifying. Especially when I also consider how many 4chan posts about porn addiction I could have written in that time.
>>
>>102001018
prompt processing still takes time, but you are probably referring to streaming or caching
>>
>>102001067
That's still an improvement. They were all ~$900 when I was checking last week.
>>
>>101994638
How did Mistral do it? Best small model, best large model.
>>
What's the best way to raise temp but keep the bot on track? I find writing becomes a lot more detailed and hotter with higher temps.
>>
>>102001133
>>102001133
>>102001133
>>
>>102001018
That sounds like context shift working for you. The prompt is cached up to the point where new information is added to it (so dynamic stuff like lorebook injections in a character card can result in the cache barely helping at all). In other words, you're only waiting long enough to process the newest part of the prompt instead of the full context.
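Very roughly, the idea is something like this (a toy sketch, not llama.cpp's or kobold's actual implementation):

# toy illustration of prompt caching: only the tokens after the longest
# shared prefix with the previously processed prompt need processing again
def tokens_needing_processing(cached_tokens, new_prompt_tokens):
    shared = 0
    for old, new in zip(cached_tokens, new_prompt_tokens):
        if old != new:
            break
        shared += 1
    return new_prompt_tokens[shared:]

If nothing near the start of the prompt changed, that returned slice is just your latest message plus the turn formatting, which is why the "reading" step feels instant.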
>>
rentry co/xtfqvv4h
Help me, anon. You’re my only hope.
>>
File: how.png (779 KB, 1619x638)
779 KB
779 KB PNG
>>102001125
They aren't dropping fast enough. The PCIe versions are 2x the price, they're selling at the same price as AMD's old workstation and datacenter 32 GB cards, and it's all because of CUDA lock-in. The situation is just sad, man.
>>
>>102001163
Overfitting is all you need.
>>
https://huggingface.co/MangoHQ/TinyMagnum-4b

leaked magnum model?
>>
C'mon, where's your pasty? I know you're lurking here, humiliated.
>>
oh nvm it was a rentry this time, almost missed it
>>
>>102001479
Thank you. And sorry, pastebin was down at the time.
>>
>>102001544
sorry's not gonna cut it, hand over the miku
>>
File: Luka.jpg (18 KB, 296x256)
18 KB
18 KB JPG
>>102001562
I'll do you one better and give you this Luka. Seriously, thanks again.
>>
>>102001612
yes... goooood... safe travels recapfag


