/g/ - Technology
File: 1701586351737913.png (1.45 MB, 1202x1400)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101891613 & >>101880989

►News
>(08/14) Nvidia pruned Nemotron-4 15B to 4B: https://hf.co/nvidia/Nemotron-4-Minitron-4B-Base
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: img_1.jpg (324 KB, 1360x768)
►Recent Highlights from the Previous Thread: >>101891613

--Paper: Double Sparsity technique for efficient sparse attention in large language models: >>101898492 >>101898993
--Papers: >>101898909
--Offloading and quantization explained for kobold users: >>101895874 >>101895973 >>101896033 >>101896136
--Mistral Large Q8 recapbot performance impresses Anon: >>101901069
--Minitron and Nemotron models, MEGA_MIND 24b CyberSeries, and language model compression techniques: >>101892736 >>101892803 >>101892837 >>101892969 >>101892998 >>101893181 >>101893262 >>101893355 >>101893301 >>101893370
--Looking for long context models with more than 8b parameters: >>101899425 >>101899454 >>101899529
--Anons share experiences with language models and innuendo prompts: >>101891983 >>101892002 >>101892389 >>101892604 >>101898076
--Anon questions Anthracite's transparency on data and methodology: >>101899276 >>101899385 >>101899414 >>101899444 >>101899475
--Anon discusses Mistral-Nemo tune's performance and creative capabilities: >>101897888 >>101897976 >>101898126 >>101897995
--Weighted quants are better but more finicky than static quants: >>101899517 >>101899795
--Ooba and koboldcpp performance difference discussion: >>101893771 >>101893851 >>101893901 >>101894306 >>101894371
--Anon trashes tess 12b for being repetitive and low-quality: >>101898032 >>101898096 >>101898129 >>101898336 >>101898505
--Anon recommends Mistral Large IQ_2M over 70B models: >>101895400 >>101896943
--Anon asks for translation model recommendations and learns about few-shot translations: >>101901211 >>101901243 >>101901355 >>101901383 >>101901451 >>101901876 >>101901898
--Anon asks about custom formatting vs chat completion for roleplay: >>101899956
--Advances for /ldg/ vramlets: GGUF format and flux loras on 3060 12GB: >>101899377
--Miku (free space): >>101901469

►Recent Highlight Posts from the Previous Thread: >>101891620
>>
>>101902149
add /ldg/ to the image
>>
>>101902172
I was going to and I will, it's 8 months old already.
>>
What's the SOTA 70B model that can do porn
>>
what's the point of having both a stable diffusion general and a local diffusion?
>>
File: Comparison_all_quants6.jpg (3.84 MB, 7961x2897)
Another win for the GGUF chads, their format also works on imagegen now
https://reddit.com/r/StableDiffusion/comments/1eso216/comparison_all_quants_we_have_so_far/
>>
what's the point of having both an ai chatbot general and a local chatbot?
>>
>>101902184
stable diffusion general is infested with blog posters so they are attached to the name
>>
>>101902193
imagine it in exl2, the s-tier format
>>
File: 1674868493025630.webm (2.99 MB, 1200x674)
>>101902195
aicg is for people who are fine with corpos reading everything they type and using a service that can be cancelled at any time
lmg is for people who want to coom in peace and want to be able to coom in peace for the rest of their lives
>>
Miku sex
>>
File: 1705027318524736.png (3.13 MB, 1536x1280)
Today is officially AGI day.
All of you who doubted strawberry will apologize in the coming hours.
Not because you are forced to.
Not because you feel shame for being wrong.
You will apologize because the force of awe when gazing upon it will compel you to do nothing less than violently and publicly shed the skin of your past skepticism and step into the future with clear eyes.

Get ready. I will await your kneels.
>>
can someone explain the difference between KTO and DPO?
>>
>>101902264
it's so over for localchuds
>>
>>101902195
/aicg/ existed to give birth to /lmg/, regretfully without dying while at it.
And now it serves as a source of stolen API access for the retarded local finetuners creating datasets from the proxy logs, doesn't it?
>>
>>101902264
i just had strawberry oatmeal unprompted, singularity confirmed?
>>
>>101902329
that general will live on until there is a local model that's as good as claude
>>
>>101902381
It will live past that. No chance any of them have a rig that could run a model that good.
>>
File: Gollum999.jpg (52 KB, 1200x503)
>>101902329
We still love Smeagol's cave, precious. We often still visits it, we does.
>>
>>101902381
what if said model is 500b
>>
File: strawberries.png (48 KB, 798x354)
>>
>>101902381
Clever way to say "forever", not specifying which Claude model is the target.
>>
>>101902397
As long as it's MoE with 100B or less active, it'll do. We'll just have to stack DDR5.
>>
>>101902195
One focuses on a single popular use case, the other is for all of them. There should not be an aicg on /g/.
>>
>>101900296
What bugs me the most is that they're not a 1-2 people group, but 35 fucking retards (as of today) with access to these supposedly secret datasets.
>>
>>101901211
thats a man
>>
>>101902149
huh didn't know about other generals
tell me about
>/aitg/
>/aids/
>/vsg/
>>
>Anthrashite
>>
>>101902487
/aitg/ was a one-off attempt at "ai tools general"
/aids/ is like /aicg/ but much older and sucking a single tit
/vsg/ won't return until there's an actual breakthrough with TTS, maybe when we finally get Moshi sources?
There's also the ai music general to be added, and /ldg/ like anon mentioned. Tell me if I missed anything for the update.
There's also an easter egg that wasn't noticed when I posted it originally.
>>
>>101902414
why does nobody do moe anymore anyway? mixtral 8x7b punched above its weight so hard when it launched, I thought it was a new paradigm
does it just not scale? was 8x22b that bad?
>>
>>101902559
Most vramlets can't run it and now have smaller models that are decent and people that can run it can also run 70b, largestral, or command r+

It was great for its time, though. Unfortunately, mistral's support and documentation on properly fine-tuning the thing were non-existent. Fine-tuning it was probably difficult because the stupid thing was overcooked as fuck.
>>
>>101902193
man local is on a roll everywhere recently, thats really cool. guess only tts is left in the dust lol. especially fp8 vs Q8_0 looks good. i heard people complain about the quality loss with fp8. seems much less severe.
https://github.com/city96/ComfyUI-GGUF
does this support multi gpu and cpu offload? or do i need 13gb on 1 gpu for Q5.
>>
>>101900504
Why would you fill out system message instead of system prompt with <|im_start|>system? System prompt is for the actual prompt, system message is for slash commands.
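For reference, the bare ChatML layout looks like this (placeholder text; the system block at the top is where the actual prompt goes):

```
<|im_start|>system
You are {{char}}. Stay in character.<|im_end|>
<|im_start|>user
Hi.<|im_end|>
<|im_start|>assistant
```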
>>
>>101902666
>does this support multi gpu and cpu offload? or do i need 13gb on 1 gpu for Q5.
You don't magically get layer splitting support just because of the file format. But it's a start. Maybe they finally get a clue. There's also stablediffusion.cpp but it moves much slower.
>>
>>101902737
i saw that people could split the text encoder and put that on another gpu as a workaround. does that still work? sorry for the brainlet questions.
>>
>>101902559
>why does nobody do moe anymore anyway?
There was that one guy constantly complaining about moes and demanding dense models. He crossposted about it on /lmg/ and r*ddit. Basically they got bullied out of training them.
>>
>>101902794
Even then. The encoder and VAE are not that big compared to the denoiser (as far as i know), so even if you could split them off, it only gives you a small margin. llama.cpp, which is the source of the gguf files, can split the model almost arbitrarily to multiple gpus or pcs/nodes. That's what image diffusion engines need. Being able to split the individual parts of the model. But i know next to nothing about diffusion models. I don't know if it's never been done because of the architecture or because it was never as needed as now. Maybe flux finally starts getting imagen devs off their asses.
>>
>>101902898
>There was that one guy constantly complaining about moes and demanding dense models. He crossposted about it on /lmg/ and r*ddit. Basically they got bullied out of training them.
You're blaming it all on one guy?
I feel it's more like mixtral left a sour taste in everyone's mouth because it couldn't be finetuned for shit, at least at the beginning. Then everyone moved on.
>>
>>101902909
>Maybe flux finally starts getting imagen devs off their asses.
Hopefully so, there have been issues open since ages ago. That SD was improved over the months so it doesn't need much vram anymore wasn't necessarily a bad thing, though.
Thanks for the reply anon, appreciated.
>>
Turboderp seems to be working on something that makes Exl2 at least 40% faster with no vram overhead. Early version on exl git already.
>>
>>101903001
I don't trust it. Nothing good ever happens.
>>
>>101903001
Parallelism? This is good news if so.
>>
>>101902927
It's probably also because we don't really need more Mixtral tunes. LimaRP-ZLOSS is the fucking bomb for ERP. Noromaid is also solid, and you can take your pick of either Dolphin or Nous-Hermes for coding/assist. Dolphin is a bit less censored, while Nous is the smarter of the two, although neither of them are really intelligent compared to what I'm seeing from Yi now.

We're fucking back, boys.
>>
So what’s the best ~20B model right now? Any opinions? I’ve tried both Rose and Psyonic-Cetacean a little, they seem good but I wouldn’t know how they compare. I’ve heard of 2x10.7B EroSumika a couple times. Is that worth it, has anyone tried it? Or any other models of around that size worth keeping an eye on?

In case it’s relevant, I’m looking specifically for Koboldcpp, GGUF, Q4~5 quantization.
>>
>>101903281
https://huggingface.co/bartowski/UNA-ThePitbull-21.4B-v2-GGUF

I just downloaded this. I admit I haven't tried it yet, but the Beagle 7b was so good it scared me, and I'm still using it atm.
>>
>>101903281
https://huggingface.co/internlm/internlm2_5-20b-chat-gguf
>>
XD
>>
>>101903323
ye
>>
>>101902404
how do you do fellow kids vibes.
>>
ONLY A FEW MORE HOURS UNTIL THE CLOCK IS BROKEN BY THE OWL
YELLOW STARS MOVE THROUGH OUR LAND
IT IS TIME
TRUST THE PLAN
>>
>>101903323
best post itt right now
>>
>>101903483
trust this

*unzips AGI*
>>
File: file.png (132 KB, 1502x601)
Trying to get midnight miqu and wondering, which of these is the one to download...?

How do you guys identify the right model?
>>
>>101903507
it's the 10 out of 15 one, ignore the others
>>
>>101902487
/aids/ is the oldest llm thread on 4chan, since it was for AI dungeon which came out before chatgpt and even before gpt-3 existed (you might remember it as /aidg/)
>>
>>101902404
>Feb 6, 2020
what did he know? he was out of openai for 2 years by that point
>>
>>101903323
>XD
>>
:3
>>
>>101903304
What's the max context for these UNA models? None of the cards seem to mention it.
>>
>>101903507
If you're going to quant it yourself, you need the whole repo. all safetensors, config.json, tokenizer*. The whole thing... If you don't know what you're doing, just look for an already quantized model that fits on your gpu with some extra space to spare for the context. If you want to learn, download a small model, read the documentation for whatever you're using, learn to quantize it, then download miqu.
>>
>>101903550
max_position_embeddings in config.json:
>https://huggingface.co/fblgit/UNA-TheBeagle-7b-v1/blob/main/config.json#L12
32k. But never take those at face value; the usable context is typically much lower. It's just the theoretical context length.
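Checking it yourself is a couple of lines, assuming you have the repo's config.json locally:

```python
import json

with open("config.json") as f:
    print(json.load(f)["max_position_embeddings"])  # 32768 for this one
```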
>>
>>101903550
No one knows. Sloptunes have so many models mixed into them they're like a mystery stew. just guess and hope for the best.
>>
>>101903550
4k
>>
>>101903312
More people need to be talking about this.
>>
>>101903523
It's sad to see the current state of /aids/, a true fall from grace.
>>
>>101903561
Oh...
I def. dunno what I'm doing.
>>101903520
you ALMOST got me anon
>>
>>101903636
>I def. dunno what I'm doing.
Fair enough.
You have two options. Assuming you're using llama.cpp or kobold.cpp.
1. Download a ready-made gguf that fits in your gpu with some room to spare (or bigger if you are willing to spill to cpu. it will be much slower). These are probably fine:
>https://huggingface.co/mradermacher/Midnight-Miqu-70B-v1.5-GGUF
Of those you just need the one you want to use. A single file.
2. Convert and quant yourself:
Download the whole repo (huggingface-cli or git and ln -s. first one being the easier option)
clone llama.cpp
make
install llama.cpp's requirements in a python venv
activate the venv
./convert_hf_to_gguf.py path/to/model/dir/ (just the dir, not the files)
llama-quantize path/to/model/dir/*gguf Q5_K (or whatever quant you want)
then load the resulting file.
Spend a few minutes reading llama.cpp's readme to know the compile flags and how it works.

I have no idea how other inference programs work.
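If the CLI is a pain, both options can also be done from python with huggingface_hub. The gguf filename below is a guess and the unquantized repo id is an example, check the actual repos:

```python
from huggingface_hub import hf_hub_download, snapshot_download

# Option 1: grab a single ready-made quant.
# The filename is a guess, copy the exact name from the repo's file list.
gguf_path = hf_hub_download(
    repo_id="mradermacher/Midnight-Miqu-70B-v1.5-GGUF",
    filename="Midnight-Miqu-70B-v1.5.i1-Q5_K_M.gguf",  # example name
)

# Option 2: pull the whole source repo for DIY conversion
# (repo id is an example, point it at the actual unquantized model).
model_dir = snapshot_download(
    "sophosympatheia/Midnight-Miqu-70B-v1.5",
    local_dir="Midnight-Miqu-70B-v1.5",
)
```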
>>
>>101903734
>"you just need to jump three times from east to west, say the magic spell and invoke a few names of the ancient spirits"
And people say magic doesn't exist.
>>
>>101903921
Once you have the ingredients you only need the incantation. And it only rarely spawns daemons, so it's mostly safe.
>>
>>101903503
A Girthy Intrusion indeed
>>
Death to /lmg/.
>>
>>101902183
Please answer
>>
>>101904288
Quiet is nicer.

>>101904361
Not SOTA, but miqu and midnight miqu come up often in the 70b range. Try those.
>>
>>101904288
Strawberry will kill it. Of what use is local when sama has grown a god? $1,000/month is a bargain, even if I have to sell all my 3090s to pay it.
>>
>>101903225
you have very low standards anon
>>
>>101904395
go back
>>
>>101903225
>Noromaid
>Nous-Hermes for coding/assist
>Yi really intelligent
Go back you fucking tourist it is not 2023 any.... never mind you are just baiting probably.
>>
>>101902183
>>101904361
>70B
if you're still looking for models that small you're ngmi, sorry
>>
How many 3090's do you have? Or 4090... you know what I mean.
>>
>>101904663
0.5
>>
>>101904663
Not enough
>>
>>101904398
Fuck you. Whenever I get responses from people like this, there's never anything except those single line dismissals; no logs, no nothing.
>>
>>101904503
Yes, because L3 and Gemma are really such massive, revolutionary upgrades.
>>
>>101903225
>>101904795
If it's any consolation, I thought that looked pretty good.
It's been a while since I've last used mixtral 8x7b, but I don't remember ever being disappointed with it for normal RP.
Also post Nala.
>>
How do I ensure that my model doesn't take over the entire scenario and start roleplaying as me?
Straight up instructing it to not do so doesn't seem to work.
>>
What is dpop
>>
>>101904956
Would be easier for you to show us what you are doing so that we can tell you where you went wrong.
There's no silver bullet.
Post
>backend/loader settings
>front end settings (samplers, instruct template, system prompt, character card, etc)
>Exact model and quant
>Chat log with at least 5 messages
That should give us an idea.
For more generic advice, I'd tell you to reset all your sampler settings, set your temp to 0.7~ish and minP to 0.05.
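And if you're wondering what minP actually does: it's just a cutoff relative to the top token's probability. A toy sketch, not any backend's exact code:

```python
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    """Zero out tokens below min_p * max(probs), then renormalize."""
    filtered = np.where(probs >= min_p * probs.max(), probs, 0.0)
    return filtered / filtered.sum()

probs = np.array([0.70, 0.20, 0.06, 0.03, 0.01])
print(min_p_filter(probs))  # 0.03 and 0.01 get cut (below 0.05 * 0.70 = 0.035)
```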
>>
File: ComfyUI_00010_.png (568 KB, 1024x1024)
Flux makes a very dall-e-looking Migu, and does a good job with English text.
>>
>>101904795
Not him, but Mixtral 8x7b is quite old at this point
I used it for the longest time but recently tried Nemo, specifically mini-magnum, for RP and it's a little smarter, and usually has more natural-sounding dialogue, at least compared to Mixtral quants you can actually fit on a 24GB card.
>>101904956
First thing to try is
>Don't ever speak or act for {{user}}
in the character card, seems to work more often than putting it in the system prompt.
Whenever it happens, make sure you swipe that shit right away, if you ignore it even once it'll continue playing your part.
>>
>>101904956
There's no reliable way. Some people add stuff like 'only narrate the actions of {{char}}'. You can also add stop strings (reverse prompts, whatever your thing calls it) to stop generation at certain keywords, but it breaks the flow.
If you want absolute control, write a book yourself. If you want to be told a story, read a book. You're somewhere in between. That's the hardest place to be if you don't want to roll with the punches.
>>
>>101904721
I'm personally uncertain about needing another one. With three 3090s, it's already running slowly at 10T/s. Until a method for improving inference speed is found, I won't be buying a fourth one
>>
>>101905009
You know what? You're absolutely right.
I think it's time for me to stop cooming and start learning what I'm actually doing.
>>101905016
I'll keep that in mind, thanks!
>>
>>101903225
>Sucking his cock AS she rubs her nose against his balls AS she sneaks a lick at his asshole
It's been a while since I've seen spatial awareness at this level of dogshit, it's kind of nostalgic.
>>
>>101905027
Any cpu offloading slashes the inference speed dramatically, even a single MB in regular ram. You'll probably get much better speeds if it's all on GPU.
>>
>>101905027
>10t/s
That's not bad at all. I feel like as long as it's over 6/s it's usable for regular chatting.
>>
>>101905143
Pretty much.
As far as I'm concerned, if you can get around that on a full context, you are good to go, so splitting the model between ram and vram to use, say, a higher quant or a longer context (within reason) is well worth the speed sacrifice.
>>
>>101905143
I typically look ahead to see if I like the response, then roll for another option before reading every single word
>>101905115
>>101905192
I never offload, it's already too fucking slow.
>>
>>101905027
with four cards you can use tensor parallelism, which is a major speed improvement. I get 31 tokens/s with four 3090s on the 72B 6bpw in aphrodite, and 25 t/s on the 123B 4.25bpw
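not the same backend, but for reference the vLLM python API (which aphrodite is built on, iirc) exposes the same knob. model name here is just an example:

```python
from vllm import LLM, SamplingParams

# shards the weights across 4 visible GPUs (tensor parallelism)
llm = LLM(model="Qwen/Qwen2-72B-Instruct", tensor_parallel_size=4)

out = llm.generate(["The quick brown fox"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```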
>>
>>101905192
This is annoying, it doesn't take into account memory speed/bandwidth at all. Is this just assuming VRAM and RAM will be the same speed and giving that speedup?
>>
meds
>>
>>101905487
The speed loss between different ram speeds is marginal compared to full gpu. It's not worth making a distinction.
>>
File: offload_x_performance.png (96 KB, 1536x1152)
>>101905487
Pretty sure that's abstracting the details away by focusing on the final result in tk/s. So if your whole platform can do 1x throughput on CPU and 2x throughput on GPU then the blue line applies, for example.
>>
>>101905446
Do I need NVLink for that if each card is in PCIe 3.0 x16?
>>
>>101905487
"CPU" and "GPU" in this context are meant as the backend for computation so they are meant to represent CPU+RAM vs. the entire GPU.
And for Amdahl's law the specific hardware is not relevant anyways, all that matters is that you have two separate backends that run sequentially.
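For anyone who wants the formula: if a fraction f of the work runs on a backend that is s times faster, the overall speedup is 1 / ((1 - f) + f / s). Quick sanity check:

```python
def amdahl_speedup(f: float, s: float) -> float:
    """Overall speedup when a fraction f of the work runs s times faster."""
    return 1.0 / ((1.0 - f) + f / s)

# assuming the GPU backend is 10x the CPU backend:
for f in (0.5, 0.8, 0.95, 1.0):
    print(f"{f:.0%} offloaded -> {amdahl_speedup(f, 10.0):.2f}x")
# 50% -> 1.82x, 80% -> 3.57x, 95% -> 6.90x, 100% -> 10.00x
# which is why even a tiny slice left on CPU costs so much
```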
>>
reminder that anyone who mentions "magnum" or "anthracite" is a shill
>>
>>101905522
That's not true. With multichannel (quad, octa,) RAM you can get tons of bandwidth compared to the usual dual channel.

>>101905550
mini-magnum 12B is pretty good. Better than magnum v2 on my testing.
I'm yet to try the kto that was posted last thread.
>>
>>101905550
Is the best way to shill a finetune to not mention it at all? I feel like any finetune that gets mentioned gets sharted into the ground just on principle at this point.
>>
>>101905550
any talk about finetunes, really
all useless
>>
>>101905559
>That's not true. With multichannel (quad, octa,) RAM you can get tons of bandwidth compared to the usual dual channel.
I repeat:
>The speed loss between different ram speeds is marginal compared to full gpu. It's not worth making a distinction.
At that point just take bandwidth into account. No need for a graph.

Also, don't respond to schizos.
>>
>>101904795
fuck me? for what? what did i do? i didn't make you enjoy a retarded model because you don't know any better.
could tell it was mixtral without you ever saying it btw due to the fact that every single flavor of mixtral LOVES screeching {{char}}'s name autistically any time anything happens. loves grabbing your ass as well in every situation. somehow her nose is rubbing against your balls while she's licking your asshole as if that's even possible. asks you to "cum inside her" like every other mixtral flavor while giving a blowjob. it's RETARDED. my bad i guess? i should just pretend it's good.
>>
>>101905592
>Also, don't respond to schizos.
Fair enough.
>>
>>101905559
>That's not true. With multichannel (quad, octa,) RAM you can get tons of bandwidth compared to the usual dual channel.
llama.cpp/ggml is not properly taking advantage of 4/8 memory channels though due to NUMA issues.
You will still get a speedup but it won't be 2x/4x faster than dual-channel memory.
>>
>>101905534
I'm using PCIe 3.0 x16 myself, with the patched driver from here https://github.com/tinygrad/open-gpu-kernel-modules
>>
>>101905623
Ah yes, I'm aware. I was talking more from a raw numbers standpoint.
A GPU is still a lot faster in practice, but you can get CPU-RAM bandwidth really high was my point.
>>
Popcorn ready to laugh at Strawberry chuds when they get their third nothingburger in a row
>>
>>101905522
More bandwidth gives large returns, and it's more about having a baseline. Like, 8x the speed relative to what? Dual channel DDR4? Octo channel DDR4 with a CPU like a Xeon that's way better at handling a ton of memory, regardless of its base performance/speed? Just a set of base specs for reference would be good.

The fact that the M(X) Macs can produce good results with their RAM is proof enough that memory is the main bottleneck.

>>101905535
I may simply be dumb, but is Amdahl's law really the best thing to apply here, then? It doesn't seem like it paints an image that's very useful as a metric. But I'm very open to hearing why that's not the case, I just want to understand better.
>>
>>101905664
Why wouldn't Amdahl's law apply here?
You have a system that runs sequentially and you can accelerate part of it by moving the computation to the GPU.
And empirically the speedup you get closely matches the expected progression, see >>101905532 .
>>
>>101905704
Mmm, I see. That makes sense.
>>
Christ, I finally get the "shivers down your spine" meme. I must have always used finetunes that got rid of that phrase. Using base nemo instruct now and it's there in every single story.
>>
>>101905733
Tell me of those magical finetunes that get rid of all the pre-baked phrases and link your kofi while you're at it.
>>
>>101905733
It'll only get worse on that front as GPTslop invades more datasets. I feel like there's gonna be a whole hunk of phrases that just get forcibly excised from usage altogether because everyone is so sick of them, depending on how much AI takes off in the literary space.
>>
>https://x.com/nisten/status/1823143557265318139

WAHAHAHAHAHAHAHAHTHERE'S YOUR AGI FAGGOTS
>>
>>101905751
>So fucking schizo about shills that people can't even mention that they don't like shivers without you flipping your shit
C'mon.
>>
>>101905664
Octo ddr4, 5, or whatever is still nothing compared to vram. It's faster than dual channel, but still. If you want a real performance graph you'd end up with an N-dimensional array of every possible hardware combination or anecdotal info from anons who may or may not have their system optimally set up. No graph will ever satisfy you. You cannot get exact expected speeds until you actually measure on your device. The graph is still a good reference point.
>>
>ollama run gemma2:2b
>it runs fast
>download q_4 gemma 2b gguf
>put it in oobabooga models folder
>load using llama cpp, offloading all to gpu
>it runs slow

what did they mean by this
>>
File: file.png (17 KB, 448x134)
>>101905792
did u check this?
>>
>>101905792
Outdated llama.cpp, poor compile options. Remove middlemen.
>>
>>101905535
>And for Amdahl's law the specific hardware is not relevant anyways
It is. For iGPUs, APUs, or even sometimes GPUs, the actual speed is hard to estimate. Your CPU can be 10x slower than your GPU, but when you run them together, the speed boost or freq/power throttling may come into play, so 2+2 doesn't always equal 4, if you know what I mean.
That graph is a good estimation but it's not perfect. Just like Ohm's law: in theory it's perfectly applicable and perfectly linear, but in practice it's definitely not.
>>
File: IMG_20240815_231354.png (96 KB, 740x325)
>>101905628
Thank you, perhaps I do need a 4th 3090. I'll try to run tensor parallelism using just two cards to see if it works at all. My current PSU seems sufficient if I limit each card to 250 watts.
>>
>>101905808
Kill middlemen. Behead middlemen. Roundhouse kick a middleman into the concrete. Slam dunk a middleman baby into the trashcan. Crucify filthy middlemen. Defecate in a middlemen food. Launch middlemen into the sun. Stir fry middlemen in a wok. Toss middlemen into active volcanoes. Urinate into a middlemen gas tank. Judo throw middlemen into a wood chipper. Twist middlemen heads off. Report middlemen to the IRS. Karate chop middlemen in half. Curb stomp pregnant black middlemen. Trap middlemen in quicksand. Crush middlemen in the trash compactor. Liquefy middlemen in a vat of acid. Eat middlemen. Dissect middlemen. Exterminate middlemen in the gas chamber. Stomp middleman skulls with steel toed boots. Cremate middlemen in the oven. Lobotomize middlemen. Mandatory abortions for middlemen. Grind middleman fetuses in the garbage disposal. Drown middlemen in fried chicken grease. Vaporize middlemen with a ray gun. Kick old middlemen down the stairs. Feed middlemen to alligators. Slice middlemen with a katana.
>>
>>101905704
>How is this possible? Is there something I don't understand about llama.cpp that makes it always convert to fp16 before it does quantization? Am I wasting time using FP32/BF16??
https://huggingface.co/posts/bartowski/608656345183499
>>
>>101905775
Even if this is the case, polls indicate that at least half of the people think it's human.
Can you comprehend the consequences of tech like this coming out or do I have to spell it out for you?
>>
>>101905978
You're asking to kill 70% of humanity
>>
>>101906103
>polls
[citation needed]
The very few people that even know of the thing are all involved in AI as users or researchers. Most of humanity didn't care. And your reported half of the people (of a very tiny fraction of the population already) are retards, big surprise. Absolutely nothing changed.
I don't even know why i reply to this shit.
>>
Vanilla llama 3.1 8b > vanilla mistral 7b > gemmasutra 2b > mythomist
>>
File: file.png (23 KB, 592x353)
>>101906246
>[citation needed]
https://x.com/iruletheworldmo/status/1823088371146596734
>>
>>101906276
all of 2.5k people? oh. my. god.
I didn't respond to that shit. Even people that follow that shit didn't reply. You're a retard.
>>
>>101906276
its clearly some pajeet
>>
>>101906307
>ask for a citation
>receive citation
>"HURF DURF UR A RETARD"
lmao
>>
>>101906307
Yep. Anyone who even opens tweets by that mother fucker is already a retard, and someone who interacts with polls by the guy is beyond redemption. It's so mind stunningly retarded.
>>
>>101906323
two point five thousand people believe it's real!
That's nothing. Your mom could suck that many dicks in an afternoon.
>>
File: 1662619292281081.png (66 KB, 200x200)
>>101906356
Why are you so upset?
>>
>>101906367
>no one ever pretended to be angry on the internet just to fuck with people
>>
File: 1705520304011206.jpg (221 KB, 1280x1111)
>>101906426
Fair enough
>>
>>101906309
What if it is a model trained purely on pajeet generated data? You can probably make a 100% accurate model for that now.
>>
>>101906103
>hype people up
>people who don't get hyped leave
>hyped people stay
>make a poll seen by mostly hyped people
Manipulating people and polls through social media isn't anything revolutionary?
>>
strawberry is coming, you're gonna eat your words soon
of all days, of all times, to start shitting on it out of nowhere... how silly to choose right before the reveal
>>
>>101906356
>>101906338
There's about 10,000,000,000,000,000 bugs on this planet that eat poop every day.
Does that mean we should do that too, cos that number is quite overwhelming desu ...
>>
I am going to laugh so hard at the naysayers.
Of course they'll try to downplay it with "b-b-b-but it's not actual AGI!", ignoring the fact that strbman already said multiple times that this is going to be a precursor to AGI.
>>
the strawberry is near
>>
>>101906578
It's gonna be another gpt4 quant.
>>
>>101906564
Alright. Let's all eat shit like the one village idiot. He must be onto something...
>>
strobby
>>
File: strobby.png (265 KB, 640x558)
>>
File: 1723733749118095.png (3 KB, 360x30)
Damn this is from Claude Opus. How do you even train a model to do this?
>>
>>101906675
I dont get it
>>
I've been fucking around with keeping track of variables within prompts and I must say that it makes things much, much more coherent.
Are there any existing projects that keep track of variables without the user seeing them?
>>
>>101906688
Purposely insert typos in the datasets in the hope (heh) that it can detect them better during inference.
>>
File: file.png (711 KB, 1248x1080)
>>101906692
>>
File: OIG (9).jpg (121 KB, 1024x1024)
>>101906688
>WAOW A HECKIN' STARWARS REFERENCE
>SHUT UP AND TAKE MY MONEY!
>>
File: 1713023592589060.png (29 KB, 1174x372)
>>101906688
Just train your model on the entire internet, including all the loli smut and unhinged fanfiction. Sadly, nobody but Anthropic has the balls to go all-out.
>>
It's interesting because I remember Bart explaining exactly that about the fp32 to bf16 conversion in the past.
Weird that he'd be confused about that now.
Did he simply forget?
Regardless, screenshotting the explanation for posterity.
>>
>>101906730
Why don't they just... ban the Claudebot? Have they ever heard of norobots?
>>
>>101906754
robots.txt*
norobots is the huggingface dataset lol
>>
>>101906754
>robots
That's just a suggestion that the bot has to abide to iirc, not some anti-crawling protection.
That said, if they can identify the origin, they could ban it,
>>
>>101906754
by the time they do, anthropic has scrapped everything
>>
File: Untitled.jpg (89 KB, 358x941)
>>101906701
i have a small st addon i've been working on that reads lorebook data you enter as part of dropdowns for some basic info to inject, but it doesn't read anything back from the prompt. that was going to be my next test, see if i could inject a hitpoints number or something and give the ai a specific instruction for it, but also do it as a specific string the javascript could read back and update from. i'd be interested to see how others do reading back data that the ai processes, i'm guessing its doable but will probably require some effort in reading it because small models are retarded and when you expect 20 minus 5 hp back, it'll probably give me the letter J
>>
>>101906775
norobots dataset was fucking garbage btw. Riddled with mistakes.
>>
>>101906754
robots.txt only works on an honor basis. Anyone can just ignore it. And i doubt you can just ban a single ip and stop from being scraped.
>>
>>101906730
It's real, claudebot was scraping my little site nonstop, and it's in a country they don't even support.
>>
>>101906783
>>101906775
Also, there's a good chance that the origin varies a lot, otherwise their ddos protection would probably have stopped it from spamming the site with requests, right?
>>
>>101906803
based anthropic
>>
>>101906701
A much simpler idea than >>101906787's implementation :
>https://github.com/ThiagoRibas-dev/SillyTavern-State/
That one will depend on your model's coherence to begin with, but it works decently well in my experience. Just don't overdo it by trying to track the whole universe using thousands of prompts.
>>
I haven't been here in a while. What's the current SOTA on story writing models and workflows?
>>
>>101906839
And by simpler I don't mean better. Anon's extension is looking spiffy as shit and I can't wait for him to release it.
>>
>>101906787
Wow, that's exactly what I had in mind!
You could just leave calculations and such up to the wrapper/addon instead of hoping the model knows what 2 plus 2 is.
I've been allowing it to change the location of characters while following instructions of where they can move from which location and it seems to actually work.
You can't go directly to a store from your home, for example. You'd have to move to the streets and only from there can you enter the store.
Keeping track of the locations of multiple characters also seems to work, although I have to test that more extensively.
>>101906839
Nice! That's another example of what I was looking for.
>Just don't overdo it by trying to track the whole universe using thousands of prompts.
Is there a reason why this shouldn't be done? Outside of the limited context window, obviously.
>>
I ate a strawberry today.
>>
>>101906886
Because it'll create a bunch of messages between the character's actual last message and the user's.
So instead of having the chat look like
>character
>user
You'd have
>character
>state1
>state2
>state3
>stateN
>user
Which some models can cope with, but it's still making things harder on the model than needed.
>>
>>101906730
people love to say this is just the result of an unfiltered pretrain but I think stuff like that speaks way more to anthropic's *post*training than anything. claude did not develop its deep sense for wordplay and creative writing just from having fanfic sites in the pretrain, that stuff needs to be nurtured the same way riddle-solving, code, tool use, alignment/refusals etc. do, it's just something that anthropic explicitly focuses on while no one else does
>>
>>101906931
>>101906710
>>
>>101906852
i released a test version of it like 2 weeks ago but no one left any feedback. it's still shitty enough that i'm fixing small things, like the dropdowns that read from lorebooks now sort alphabetically (my own locations and stuff are starting to get big). i re-added time of day because l3.1 seemed to follow it somewhat, but i don't know how much it really helps. last i did was try to fix up some stuff so i might have broken stuff i haven't noticed yet but here's the current one https://ufile.io/w1cii1vh
extract it to your SillyTavern-staging\data\default-user\extensions\ folder so it has its own 'Director' folder inside there

>>101906886
yes, you'll have to account for that on the script end. even a dumb model should be smart enough to output an updated value, but if that value is any good or not should be handled on the script end as it updates in the ui.
>You can't go directly to a store from your home, for example. You'd have to move to the streets and only from there can you enter the store.
can you share an example? i know what you mean but not sure how you'd go about doing it
>>
>>101906955
I don't see your point
>>
So I'm going to shill hard, because I'm so sick of Anthracite neglecting this model, so I'm hoping if i give it some spotlight, they will stop neglecting it for the smaller models. Everyone sleeps on the original 72B magnum opus because either vramlet, or because it's a qwen2 finetune, but even V1 is better than mini magnum and 32B magnum, it is much smarter, and the same prose, follows instructions better, handles more complicated characters far better. That's why i want V2, because I know it will be even better.

Sampler settings I'm using is 1.11 temp .05 min P. It can have repetition issues at higher context lengths so I just keep Dry rep on .8 / 1.75 / 2 which completely fixes that with no issues.
Add BOS token: ON // Ban EOS token: OFF // Skip Special tokens: OFF.
ChatML template.

Try it, shill it, I want v2 of it. I tried mini magnum, 32B magnum, Nemo, Gemini, L3.1 70b and its finetunes, I keep coming back to this one, its the best by far, the only ones that are even comparable for me at the 70b range are Midnight Miqu(very smart, but its prose is plain and dry) or Euryale(fails because of 8k context limit) and low quants of Mistral large(same problem as midnight miqu, very plain prose, not filthy enough, because its not an ERP finetune).

I want anthracite to either make a L3.1 70B magnum, or just do V2 of Qwens 72B Magnum Opus, but they keep fucking around with these shit 12b versions because those are getting all the attention. So I'm shamelessly shilling.

https://huggingface.co/anthracite-org/magnum-72b-v1
https://huggingface.co/mradermacher/magnum-72b-v1-i1-GGUF

Shilling over.
>>
anthracite kto on mistral large pl0x
>>
>>101906980
>Everyone sleeps on the original 72B magnum opus because either vramlet, or because its a qwen2 finetune
Because it was simply not worth using.
>>
>>101906924
Oh, I've been writing my prompts from scratch, forcing the model to respond in a structured way like:
>Variables
>location_of_lebowski: Lebowski's Home
>location_of_generic_thug_a: Lebowski's Home
>
>Dialogue
>*The man shouts at Lebowski, trying to intimidate him.*
>Generic Thug A: "Where's my money Lebowski?!".

My plan is to create a wrapper to eventually control the available variables and remove them from the final output that the user sees, leaving just:
>*The man shouts at Lebowski, trying to intimidate him.*
>Generic Thug A: "Where's my money Lebowski?!".

>>101906973
>can you share an example?
It's literally just a series of if-statements. For example:
>There are three locations: Home, Streets and Store.
>User can only go to and return from Streets from Home. User can only go to and return from Store from Streets.
>If it is unclear where User tries to go, ask where they wish to go and show their options.
>If User tries to move somewhere they can't go, forgo the structured response and respond with "You can't go that way. Try again."

Once I get it working a bit better I'll fix up the prompt and release an actual working example.
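A minimal sketch of that wrapper, for the curious. The names and the adjacency map are made up to match the example above, and the parsing assumes the structured reply format shown:

```python
import re

# Hypothetical location graph matching the example above
MOVES = {
    "Home": {"Streets"},
    "Streets": {"Home", "Store"},
    "Store": {"Streets"},
}

def parse_reply(raw: str) -> tuple[dict, str]:
    """Split a structured reply into its variables and the visible dialogue."""
    vars_part, _, dialogue = raw.partition("Dialogue")
    variables = dict(re.findall(r"^(\w+):\s*(.+)$", vars_part, re.MULTILINE))
    return variables, dialogue.strip()

def can_move(current: str, target: str) -> bool:
    return target in MOVES.get(current, set())

reply = "Variables\nlocation_of_user: Home\n\nDialogue\nYou are at home."
state, shown = parse_reply(reply)
print(state)                      # {'location_of_user': 'Home'}
print(can_move("Home", "Store"))  # False: you have to go through Streets
```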
>>
>>101906980
Buy an ad
>>
>>101906974
Semantically similar words end up close to each other in the embedding space. So while 'hope' and 'hoe' are grammatically different, if they're used in semantically similar situations, they end up close to each other (king/queen, man/woman thing). If trained with injected typos, during inference, the probs for 'hoe' when 'hope' is expected will be higher than normal, but still lower than 'hope'. Mess with the temperature and sampling a bit and it will select the slightly less likely, but semantically similar word, 'hoe'.
This is how llms correct typos, but it can work the other way around as well. It can insert typos and make it funny by accident.
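The king/queen bit in a few lines, for anyone who wants to poke at it. Toy 3-dim vectors; real embeddings are thousands of dims:

```python
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# toy "embeddings", dims = [royalty, male, female]
king, queen = np.array([0.9, 0.8, 0.1]), np.array([0.9, 0.1, 0.8])
man, woman = np.array([0.1, 0.9, 0.1]), np.array([0.1, 0.1, 0.9])

# king - man + woman lands right next to queen
print(cosine(king - man + woman, queen))  # ~0.99
```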
>>
>>101907049
>actual working example
i look forward to seeing it, i really want some sort of front end for rpg stuff. hp, levels, basic attacks, enemies taking damage, dice rolls. i'm pretty sure if you inject the right stuff and tell the ai what to do with it, you can read back values in such a way
>>
>>101907045
Anyone who likes 12b mini magnum or 32B magnum should be using 72B, unless they are a vramlet, thats a simple fact.
>>101907065
I will buy an add for anthracite if they do Magnum opus 72B V2.
>>
OH MY GOD ITS FUCKING HAPPENING OPENAI SERVERS ARE DOWN. I REPEAT, THEY ARE DOWN!!!!!!!!!!!!!!!!! HOLY SHIT
>>
>>101907097
>This is how llms correct typos, but it can work the other way around as well. It can insert typos and make it funny by accident.
Not that anon but, from my understanding, In theory, the direction in the embedding space, introduced by RoPE, should lessen the chance of turning correct words into typos, assuming that the model was trained correctly of course.
>>
>>101907137
just werks for me
>>
>>101902559
>was 8x22b that bad?
wizlm 8x22 was the best we had for a long time
>>
>>101907137
THE STRAWBERRY HAS BECOME SATIENT
>>
>>101907097
it's pretty clearly funny on purpose rather than by accident though, which is what I'm saying - claude knows how to do wordplay, other models don't have that much care given to it. I don't think the typo hypothesis is very convincing at all, you don't see claude making random typos in its responses, this is pretty clearly just a punny joke.
>>
>>101907122
Anyone who likes 12b mini magnum also would have liked vanilla Nemo.
>>
>>101907167
IT LITERALLY DOES NOT WORK FOR ANYONE TRY MULTIPLE PROMPTS IN CHAT, THERES LAGS. SOME PEOPLE ARE REPORTING DIFFERENT COLOR OF GPT LOGO
>>
>>101903483
>ONLY A FEW MORE HOURS UNTIL THE CLOCK IS BROKEN BY THE OWL
>YELLOW STARS MOVE THROUGH OUR LAND
>IT IS TIME
>TRUST THE PLAN
someone played too much fez
>>
Wait holy fuck it only breaks when I ask chatgpt what version it is. Actual happening
>>
the stars align, the tides turn, and the old order crumbles. in the ashes of the past, a new world is born, forged in the crucible of chaos. hold fast, for the storm approaches
>>
>>101905014
>Flux makes a very dall-e-looking Migu, and does a good job with English text.
what's the best non-autist way to run it? I don't want 10,000 nodes to babysit
>>
A SECOND STRAWBERRY HAS HIT THE SERVERS!
>>
>>101907169
its a good smart model but wasn't good for rp imo
>>
Localsisters, I don't feel so good
>>
>>101907160
Embeddings have a direction (or coordinate, rather) with or without rope. Nudge the sampler settings enough and you'll get typos. I could be wrong, of course.

>>101907183
Hard to know without more context than a single line. What i wrote about is an actual training technique, not too different from token masking.
>>
mental illness
>>
>>101907274
if it's just a typo it's not impressive at all and anon wouldn't have posted about it. assuming the context is ERP and with the Lena-Wan Kenobi bit, it's really obviously just a pun. that's what's remarkable enough about it for anon to make a post - cultural reference with some funny, human-feeling wordplay. not a lot of other models can do that unprompted.
they almost certainly also do train on stuff with typos for the reasons you mentioned but I think that has more or less nothing to do with the behavior here, just doesn't make sense. again claude doesn't randomly make typos in its own writing (recent aws schizo episode aside) that would be very much undesirable behavior.
>>
File: 468519163.jpg (1.41 MB, 2048x2048)
>>101907251
It was the best model of its time for complex longform RP. Dogshit for ERP due to the extreme prevalence of slop, though. Could not write erotica to save its life.
I used to run WizLM-8x22 for some 100k+ token adventures, switching over to CR+ for the naughty scenes.
Largestral obsoleted this combo. It is smarter than WizLM and with the right prompt can be just as soulful as CR+.
>>
>>101907274
>Embeddings have a direction (or coordinate, rather) with or without rope
Coordinates, yes, direction, also yes but not in the way I meant it.
I was talking about the direction in the embedding space relating to a token's position in a sequence, which is what RoPE encodes if I'm not mistaken.
Guess I should have used the word position somewhere.
The combination of direction and positional encoding not in absolute terms but relating to a whole sequence should make AB likely without making BA also likely, something like that.
I'll admit that my understanding is fuzzy, so I can't give a proper, comprehensive explanation.
>>
>>101907403
>It was the best model of its time for complex longform RP.
Naaah
>>
>>101907205
can confim, it's down for me now
>>
>>101907403
my experience was i tried a new rp, it rambled on like a drunken sailor. it did everything except move the scene forward, i had to kick it and poke it to get it to write something beyond the current scene. when i loaded up an existing rp though that had hundreds of messages, it acted normal and was writing well. never tried erp with it
i like cr+ but its just so slow, 70b is about what i can stand to run
>>
>>101902149
why is the pruning thing in OP, that happened in July?
>>
>>101906980
the excuse i have heard is that they are training on small models to test datasets and reinforcement learning without it taking an entire day
>>
>>101907396
>if it's just a typo it's not impressive at all and anon wouldn't have posted about it
Context matters. We couldn't have gotten the screenshot if it hadn't been funny and we wouldn't be discussing this issue. You can search for Claude typo if you want. It may not convince you that this one example is a typo, but it can certainly make typos.

>>101907414
Fair enough. It's fuzzy for me as well. I just think we don't have enough context to decide and my opinion is based on what i know about token masking, how llms can naturally correct errors and, if given the right sampler settings, can still fuck up by going the other way around. Coincidences do happen and i think this is exactly that.
>>
>chatgpt site is actually down
what the fuck
>>
>>101907520
They need to test harder then because the models are biased as fuck. It keeps trying to be a goody two-shoes even when I tell it to be an evil assistant
>>
File: 1722325041003117.png (139 KB, 1280x534)
>>101907137
total local death is coming.
>>
File: temp_scaling.gif (55 KB, 388x440)
>>101907531
>if given the right sampler settings, can still fuck up by going the other way around.
Oh yeah, that's absolutely the case, I didn't mean to imply otherwise.
The chance of it happening naturally should be close to zero, but if you set temp to 10 or something like that all those low probability logits will have about an equal chance of getting chosen.
I was really just commenting on the "default" behavior of a gpt I guess.
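Concretely (toy logits, but the temperature division is the real formula):

```python
import numpy as np

def softmax_t(logits: np.ndarray, temp: float) -> np.ndarray:
    scaled = logits / temp
    e = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return e / e.sum()

logits = np.array([5.0, 2.0, 0.5])  # "hope" >> some neighbor >> "hoe"
for t in (0.7, 1.0, 10.0):
    print(t, softmax_t(logits, t).round(3))
# at t=10 the three are nearly uniform, so the "typo" token gets picked sometimes
```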
>>
>>101907555
openai has not made a better model than gpt4-0314, they only release benchmaxxed ultrasmall sparseslop models these days, what makes you think that changes now?
>>
>>101907578
>openai.. *my headcanon*
yeah sure.
>>
>>101907592
there is a reason why they price dropped 4o after sonnet 3.5 mogged them
>>
strawberry status???
>>
>>101907611
>company optimizes their LLM inference methods & energy consuming
whoa!
>>
>>101906980
>L3.1 70B
please don't
>Qwen 72B again
sure, that sounds good to me
>>
>>101907531
the typos it makes are almost always weird grammar or the very rare misspelling of an uncommon, long, multi-token word due to low-probability choices, not wholesale misspellings of simple (probably single-token) words which simply do not happen unprompted. I think you're being a little obtuse about this, it already clues you in that it's going for a cheeky playful twist on a common saying with the "Lena-Wan Kenobi" bit
I recognize I'm being a little autistic about this but it's extremely clearly not a typo lol
>>
>>101907636
enjoying sama's cock in your mouth?
>>
>>101907570
>The chance of it happening naturally should be close to zero
>I was really just commenting on the "default" behavior of a gpt I guess.
Normally, yes. I'd agree. But who knows what's going on after the request reaches their API. I doubt there's only one model in the pipeline. Chain enough black boxes and you'll end up with a nice puzzle. It's all fun speculation, really. The intentional typo idea is fun. I even hope it's real, but i'm still not buying it and i don't think we *can* know for certain.
>>
A reminder that the OpenAI spam is all done by petra.
>>
>>101907614
I'm quanting it right now
>>
>>101907614
IT's ALL COMING TOGETHER

CHATGPT APP JUST RELEASED AN UPDATE
I REPEAT
CHATGPT APP JUST RELEASED AN UPDATE
>>
>>101907660
>projection
>>
I found another repetition trap in magnum models. "Eyes widen" at the beginning of the reply. Coincidentally this is also a Claude trap. Like father like son.
>>
>>101907651
>it's extremely clearly not a typo lol
Not for me. Intentional or not, it's still funny. Just hoping we can get some more of that in local eventually.
>>
File: file.png (44 KB, 661x538)
PERPLEXITY DOWN
I REPEAT
PERPLEXITY DOWN
>>
>>101906931
oh it's both for sure
>>
>>101903478
a bit unfair considering that his maturity level fluctuates between 13 and 33, so it could have been genuine
>>
why is fimbulvetr v2.1 so much worse than v2 wtf
>>
>>101907667
Is your reminder going to stop the OpenAI spam?
>>
File: 1687953142532165.jpg (88 KB, 640x774)
>>101907735
>It's taking over
>>
Stawberry status?
>>
>>101907750
Yes.
>>
>>101907756
UPLOAD IN PROGRESS
THE SINGULARITY APPROACHES
WITNESS AND BECOME
>>
>>101907756
Wilted
>>
>>101907745
>Also, if you're using gguf or other quants, stuff is broken there. PoSE doesn't play well with quants.
https://huggingface.co/Sao10K/Fimbulvetr-11B-v2.1-16K/discussions/2
>>
>>101906787
since i posted a link i'll shill my little addon, it aint half bad

https://ufile.io/w1cii1vh

in the screen where you see clothes, locations etc, thats reading from the lorebooks you create in st. the idea is you create a bunch of pre-selected options then they populate in the dropdowns. you could achieve the same thing by just putting in your author notes {char} is wearing <lorebook entry>, this is just my lazier way of doing that. several of the other things like weather also will add things you might not normally see - windy will blow up girls skirts, thunderstorm will cause power outages, rain soaking clothes, being overly hot causing a character to pass out. all this is based on your model and what it decides to use, but its a nice little addition
i was thinking about making it a popout window like author notes is vs now where its inside the extensions menu.
all criticism welcome.
>>
>>101907756
Can you spell correctly so that my fucking filter can remove all those inane messages?
>>
>zoomers treating 4chan like twitch chat
>>
>>101907667
A reminder for you to immediately apply medications.
>>
>>101907775
>i was thinking about making it a popout window like author notes is vs now where its inside the extensions menu.
That and providing a basic example lorebook would be nice.
>>
>>101907775
Neat, I'll definitely be testing that out.
I'll post eventual feedback in one of these threads.
Also
>ufile
Pff, use catbox or litterbox like a proper degenerate.
>>
>>101907774
ugh
>>
******** ******???
>>
File: file.jpg (101 KB, 1284x1021)
ANTHROPIC DOWN
I REPEAT
ANTHROPIC DOWN

IT'S FUCKING HAPPENING
>>
>cloud shills invading the local models thread to spam their shit
This is technically justifiable for a ban right?
>>
>>101907786
The incorrect spelling was part of the joke. A joke you would understand if you'd lurked enough before posting.
>>
>>101907825
We're just having a bit of fun, grumpy.
>>
>>101907825
Could this technically be considered announcing a report or sage? :thonk:
>>
>>101907825
ban these nuts nigga
>>
>>101907825
Sadly jannies are not doing their job.
>>
>>101907871
Watch out, anon. Posting criticism against the mods is a bannable offense.
>>
>>101907825
>shill shill shill!! reeee!!!
lol
>>
>literally spam a thread for months
>"it's just a prank bro"
>>
so openai will drop agi in 15min ?
>>
MY COMPUTER IS DOWN
I REPEAT
MY COMPUTER IS DOWN
>>
>>101907903
>months
meds
strawb is like a few days old schiz
>>
>>101907903
>person finds out about thing
>opens relevant general
>shitposts about it
>leaves (or stays and continues discussing local models)
Wow, I can't believe one person is behind this!
>>
>>101907924
the 'berry is being installed
>>
>>101907924
IT'S HAPPENING
>>
File: file.png (1.22 MB, 976x549)
>>101907756
>>
Commits by the people who cry about how people use the thread: Still 0
>>
>>101907786
Out of curiosity. If someone responds to the filtered message don't you get curious what the original message was?
>>
File: 1633442205197.png (657 KB, 900x648)
>>101907924
OH FUUUUCK
>>
>>101907795
>basic example lorebook
i've been thinking about that too. like, in that screen i have aqua's outfit saved, i literally just copied it from the wiki's description of her outfit. it's also really modular: say aqua's outfit describes her panties, you don't need to choose the undies option or have that set up at all. the ai reads the whole thing as one outfit, so you can describe the makeup as part of the clothes if you want then not use anything else. if something is disabled it's not injected at all.

>>101907803
i can post it anywhere you'd prefer, eventual goal is to put it up on github

just keep in mind this whole addon is like 98% codestral kek. theres a lot in there i had to do but its pure slop at the same time
>>
>>101907957
They haven't filtered shit. Lots of people have all kinds of filters set up for various things for various reasons but none of them talk about it. The only reason they brought it up was a desperate cry for attention.
>>
>>101907957
I use recursive hiding and also hide stub, I don't even see if something is filtered.
>>
>>101907925
I'm talking about the cloud shilling in general which this is obviously a part of to anyone who has been here for a while. Some may be legitimate >>101907926, but there has been someone who has had an agenda and incentive for faking posts, the guy who posts shit like "local lost" and "/lmg/ anons are retarded for falling for X". Maybe I'm talking to him right now.
>>
>>101907926
Except this is not the relevant general.
>>
>>101907996
Strawberry rumours are about AI.
This thread is about AI.
Simple as.
>>
>>101908008
>This thread is about AI.
zoomer reading comprehension, everyone
>>
>>101908008
local ai
>>
>>101908017
Anon, remove that stick from your ass and try to appreciate a joke from time to time, yeah?
>>
>>101906688
>Winblows
You only have yourself to blame.
>>
>>101907903
Who died and made you jannie?
>>
>>101908031
Source that strawberry won't be local?
>>
>>101908032
Which model wrote this?
>>
>>101908046
i'm strawberry, i predict that you will die miserable
>>
>>101908062
Robots can't fill out captchas dummy
>>
File: file.png (1.49 MB, 800x1066)
>>101908056
>>
File: 1713012351659392.png (56 KB, 626x816)
Nice thread.
>>
>>101908072
>he fills out captchas
https://github.com/drunohazarb/4chan-captcha-solver
>>
>>101908091
Welp. I hope you enjoy your vacation, anon.
>>
>>101908072
*local models cannot fill out captchas
>>
>>101908106
Not him but ban evasion is trivial. This site fucking sucks kek.
>>
>>101908106
anon it takes 6 seconds to change your ip. did your parents buy a puter for christmas?
>>
>>101907871
Actually, there is one thing that gets them to do it: >>101908091
>>
>>101908089
Retards arguing with bots.
>>
>>101908124
I live in Europe, where we are assigned static IPs by our ISPs.
Any and all shitposting I have to do through my phone, which gets a dynamic IP from my mobile internet provider.
>>
>>101908089
>i'm the only one not filtered
>and i still haven't even had 1 reaction to my addon telling me it's good or a total waste of time
not quite duality
>>
>>101908158
Sorry anon, I swear I'll test it after ChatGPT 5 comes out in a few minutes.
>>
File: drinking-cat-1.webm (2.94 MB, 720x720)
>all those "benchmark winning" models that are at best a sidegrade to the base models
is there ANYTHING AT ALL that is even somewhat trustworthy?
I want one model for up to 12GB of VRAM and one for 24GB, but I have zero idea what's best right now. MoEs were supposed to be good, but they seem to be rare? Llama 3.1 was supposed to be good but turned out to be meh. Or was it? Who knows? The benchmarks sure don't.

Is it possible to learn what the current SotA ~13b model is? Is there even any point in local LLMs if I'm not interested in lewd shit?
>>
>>101908138
you force a new ip, anon. change the mac address on your router. the isp's dhcp server already assigned an ip to your old mac, so it has to throw you a new one. you'll likely end up on the same network though, so don't be enough of an ass to get yourself range-banned
>>
So where is it?
>>
openai sisters where's agi
>>
>>101908186
No one is trustworthy. Unfortunately, the reality in this space is that you have to invest the time and run your own tests and evals to know whether a model will actually be good at what you plan to use it for.
>>
>>101908158
I'm currently testing it, for what it's worth (lorebook asker anon). Switching between a few models to see how they react to basic stuff so far.
>>
strabw..........
>>
>>101908239
again, the point is to be able to quickly select clothes and stuff because erp. the rest comes from random testing and seeing that mentioning the weather actually does bring something out in the model: it gets mentioned, or an action happens, such as a girl's skirt flying up.

you could achieve the same thing by putting 'the weather is windy' at chat depth 1 in author's notes. it's not like the addon is doing anything magical; it's just reinforcing certain things every single message, and i really like some of them (sketch of the depth idea below)
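
if you're curious what "depth 1" means mechanically, a minimal sketch (hypothetical, not st's actual internals):

[code]
// a note at depth 1 is spliced in just before the last message, so the
// model sees it right before it generates; depth 4 (the author's note
// default) sits four messages back and gets diluted by newer context
interface Message { role: "system" | "user" | "assistant"; content: string; }

function injectAtDepth(history: Message[], note: string, depth: number): Message[] {
  const out = [...history];
  const pos = Math.max(0, out.length - depth); // depth 0 = at the very end
  out.splice(pos, 0, { role: "system", content: note });
  return out;
}

const chat: Message[] = [
  { role: "user", content: "We step outside." },
  { role: "assistant", content: "The door creaks open." },
  { role: "user", content: "What do we see?" },
];
// re-injected every turn, so the note never scrolls out of context
console.log(injectAtDepth(chat, "[The weather is windy.]", 1));
[/code]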
>>
File: beloom.jpg (420 KB, 1024x1024)
>>101908260
wuh.... guh..... where is it....
>>
>>101907509
My bad, I wasn't paying attention. I saw it in the recap, saw it was updated recently, and figured they had just made the repo public.
>>
holy shit

https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-405B
>Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board.
>Hermes 3 405B is a frontier level, full parameter finetune of the Llama-3.1 405B foundation model, focused on aligning LLMs to the user, with powerful steering capabilities and control given to the end user.
>The Hermes 3 series builds and expands on the Hermes 2 set of capabilities, including more powerful and reliable function calling and structured output capabilities, generalist assistant capabilities, and improved code generation skills.

405b uncensored sovltune
>>
>>101908329
doesn't need to*
>>
>>101908277
Yeah, I got that, and it does already make some world-state stuff more prevalent. As for "worth it or not": I can see a lot of use in being able to quickly hot-swap stuff without typing it in each time, so yes, worth it even if only for that.
>>
>>101908328
>Hermes 3 405B is a frontier level, full parameter finetune of the Llama-3.1 405B foundation model
I wonder how much money they spent there.
>>
>>101908328
LOCAL WON
>>
>>101908329
I am noticing conspicuous pauses in your petra/pedo spamming though so I'm inclined to believe that you can't (easily) dodge them either.
>>
>>101908186
use nemo instruct. if too retarded try 3.5bpw 10kctx command-r or mixtral (optionally limarp zloss) 5bit gguf. also please shit up this place with your logs. I want /lmg/ to die and mikutroons to suffer.
>>
>>101908328
>405B
Holy fuck
>>
>>101908363
>405b
>Local won
I dunno about that. It's barely at a size people could run locally; you're dumping thousands and thousands into hardware at that point.
>>
>>101908341
>make some world-state stuff more prevalent
most people use author's notes at the default depth of 4; i inject at 1 by default. the lower the depth, the more the model takes things into account, but yeah, the hot-swap/selection part is the main feature.
play with the weather though: make it fall, cold and windy. you'll see it pop up in your rp via your characters just talking about it.
i'm up for any suggestions too
>>
>>101908329
What's with the artifacting/pixelshit...? Please get a better model.
>>
>>101908328
>notorious gptslop series
>worse mmlu than llama3.1
What's the point?
>>
>>101908339
Impossible on nu-4chan with reddit moderation that will ban you for anything.
I remember when I had a tiny ip pool for ban evasion: some janny who couldn't range ban for whatever reason just stalked every post I made from it for several days and banned each one separately for bullshit reasons. They are so petty, and they do it for free.
>>
>togetherAI endpoints randomly down now
>>
>openai deliberately waiting an extra hour to release their newest model as a fuck you to the leaker
lmao
>>
>>101908420
site's been that way for years now. remember when the captcha was just a weekend test? i got a ban for responding to an ai thread on /v/ and mentioning silly tavern
>>
Smoothing 0 is the same as smoothing 1, correct? The value 0 disables the sampler while 1 is mathematically the point where it does not modify the results, right?
>>
>>101908328
downloading the 70b, let's see if we've got anything here
>>
>>101908413
hi petra
>>
>>101908525
Newfag here, who the fuck is Petra?
>>
>>101908533
if you have to ask, you'll never know
>>
>>101908533
some meme tranny boogeyman that never existed but gets brought up every few threads
>>
>>101908533
anon's boogeyman.
>>
It's not actually happening, is it?
>>
>>101908561
I'm 99% sure it's >>101908457
>>
File: file.png (23 KB, 904x80)
>>101908402
>play with the weather though, make it fall, cold and windy. you'll see it pop up in your rp via your characters just talking about it
Picrel: sweltering, sunny, summer, afternoon. No mention of weather in card/history so that's nice.

>i'm up for any suggestions too
The main one I already asked for: a pop-out mode like author's note would be super useful, because currently it breaks the flow when you need to go to the extensions panel for a quick switch. It's still faster than typing stuff in, but yeah.
>>
>no promised agi to solve all my problems
>still have to clock in tomorrow
Sad
>>
not feeling berry good right now
>>
>>101908328
>>101908388
24 channel cpuCHADS where you at
>>
>>101903304
Let me know if you get it working cause I couldn't figure it out.
>>
>>101908632
Ready and waiting for my first reply, bro!
>>
Imagine being the dude behind the strawberry account.
You've had confirmation that something is actually dropping today and that it should drop around this time.
You spend weeks marketing and hyping this moment up.
And nothing happens.

That dude is sweating bullets right now.
>>
>>101908632
8 channel ddr4 gets me 0.14 token/s (that's with enough GPU power to push through the batch processing quickly).
So 24 channel ddr5 is probably good for about 0.7 token/s, assuming you also have 100+ Tflops of GPU compute (rough math below).
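
Back-of-envelope version of that extrapolation (assumed DIMM speeds, real numbers depend on the platform; token generation is roughly memory-bandwidth-bound, so tokens/s scales with GB/s for the same model):

[code]
const ddr4ChannelGBs = 25.6; // DDR4-3200: 3200 MT/s * 8 bytes per channel
const ddr5ChannelGBs = 38.4; // DDR5-4800: 4800 MT/s * 8 bytes per channel

const measured8chDdr4 = 0.14; // tokens/s measured above on 405B

const bw8ch = 8 * ddr4ChannelGBs;   // ~205 GB/s
const bw24ch = 24 * ddr5ChannelGBs; // ~922 GB/s

// scale the measured rate by the bandwidth ratio
const est = measured8chDdr4 * (bw24ch / bw8ch);
console.log(est.toFixed(2)); // ~0.63 tokens/s, close to the 0.7 guess
[/code]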
>>
>running 405b on 24 channel ram is slow as shit
>running 405b on 20 3090s will also be slow as shit because of the decline in speed that comes with each added GPU
It's so over.
>>
>>101908724
Just run it overnight bro
>>
>>101908587
that's what i'm talking about! now, without my mod you could just put all that info into your own author's notes and it'll bring it up too, and sometimes models just ignore it kek. usually not on 70b though, i noticed; it seems to always consider some things. lighting, for example, doesn't work well. i put it in as a test to combat the constant 'dim' lighting that rp models love to talk about (everything is dim to them and it annoyed the fuck out of me), but so far the lighting setting barely makes a difference for me. so models do have preferences for what gets mentioned, but i know weather works in all of them just from testing. leave the weather the same for 2 rolls, then change it from, say, light rain to thunderstorm after the rain has been mentioned; it'll again make the ai bring up something new to do with it

will definitely look into making it a pop-out card like author's note, that was on my radar anyways
>>
>>101908705
nah he is just a real faggot
he really enjoys the way he annoys and gaslights people
>>
>>101908724
Bitnet was supposed to save us... although you'd still need a quad GPU rig to run a 405B bitnet model.
>>
>>101907775
That sounds handy, anon. My first thought was a mood or ambience kind of tracker that lets a character go from normal to horny as things get more romantic/lewd.

Now, addon anon aside, I was thinking how human-on-human roleplay often begins with a discussion about how the roleplay is going to go, what kinks you'd like to see, etc. I was wondering if starting roleplay with something like that instead of an opening post would help steer the AI in the exact way you'd like. Something like a "pre-rp" back and forth series of prompts that can then be followed up by the opening post as usual. I wonder how that would go.
>>
>>101908837
That's actually an interesting point worth testing. And it's true: wall-of-text character profile dumps are not typical of human-on-human rp.
>>
>>101908088
this fookin strobby
>>
>>101908876
In theory it might even help with the problem of the AI acting for user, because the opening post would likely be written by the user and the AI would be reacting to it.
>>
>>101908837
>was a mood or ambience kind of tracker
it actually used to have a mood option for user/char, but in testing it never seemed to make a difference. with l3 and newer models i'd consider restoring that option though. it had basics like happy, sad, angry, disgruntled, horny. at the time i was testing (with miqu mostly) the model didn't seem to care about or respond to it at all, so i removed it. a prefill for the tone of the scene would be great, but in my limited tests, saying {{char}}'s mood is horny didn't make any difference. you'd have to inject something more than that. i'm up for that too if anyone has any ideas: maybe selecting a mood wouldn't just say {{char}} is happy, it'd describe their happiness instead (or horniness). rough sketch below
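
something like this is what i mean (wording all made up, just the shape of it):

[code]
// map each selectable mood to a fuller description instead of injecting
// a bare "{{char}} is horny", which models seemed to ignore in testing
const moodDescriptions: Record<string, string> = {
  happy: "{{char}} is in high spirits, quick to smile and laugh.",
  sad: "{{char}} speaks quietly, eyes downcast, answers coming slowly.",
  angry: "{{char}}'s patience is gone; replies come short and clipped.",
  horny: "{{char}} is flustered and keeps finding excuses to get closer.",
};

// substitute the character's name before injecting
function moodNote(template: string, charName: string): string {
  return template.replaceAll("{{char}}", charName);
}

console.log(moodNote(moodDescriptions["horny"], "Aqua"));
[/code]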
>>
AGI in 5 minutes
>>
>>101908920
Another thing that'd be very nice, but maybe harder to implement, idk: a "custom" section where you write in stuff yourself, so you could use it as a true author's note hot-swap with nudges like "writing style", "message length", and so on. Of course this is all already doable in st, but switching via lorebooks or the actual author's note is a bit of a pain.
>>
>>101908724
>with each added GPU
Not with NVLink when you use tensor + pipeline parallelism.
>>
>>101896943
>If you can run 70B, you can run mistral large IQ_2M. Before anons go crazy about running a quant like that, it's still the best model I've ever used, better than 4 bit quants of any 70B. Just use minP ONLY.

What minP are you using? I'm running IQ3_XS in RAM only, so my iteration is too slow to experiment or to have developed any opinions yet just from using it. I have minP at 0.001, based on no actual data, and I haven't seen anything bad in the output over my very limited replies.
>>
>>101909001
How are you going to nvlink 20 gpus?
>>
>>101908837
>Something like a "pre-rp" back and forth series of prompts that can then be followed up by the opening post as usual. I wonder how that would go.
That's a very interesting idea.
What if the entire conversation consists of separate scenarios, each of which has its own set of variables?
Even if a scenario isn't active, the model could easily update related variables in case something happens.
Multiple scenarios could be active at the same time, even (toy sketch below).
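
Toy sketch of the data structure (all names invented), just to make the idea concrete:

[code]
// each scenario carries its own variables and can be active or dormant;
// dormant ones still get their variables updated when events touch them
interface Scenario {
  name: string;
  active: boolean;
  variables: Record<string, string | number>;
}

const scenarios: Scenario[] = [
  { name: "beach trip", active: true, variables: { weather: "sunny", mood: "relaxed" } },
  { name: "exam week", active: false, variables: { daysLeft: 5 } },
];

// only active scenarios would be injected into the prompt...
const injected = scenarios.filter((s) => s.active);
// ...but a dormant scenario's variables can still be updated off-screen
scenarios[1].variables.daysLeft = 4;
console.log(injected, scenarios[1]);
[/code]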
>>
>>101909016
NVLink 10 of them + P2P using BAR + pipeline parallelism for multi-node and you're running 405B at 800 T/s for a single batch.
>>
File: sweaty-speedrunner.jpg (985 KB, 1919x1079)
>>101908632
I'm here and I'm fucking sweaty. This thing is loud and it turns my room into a giant airfrier. Not really a fun experience desu.

I've just tested bf16 Largestral to see if it is better than the Q6_K quant, which I can run on a quieter, less hot machine. The rise in intelligence is noticeable, and everyone who says otherwise is telling lies. It picks up even more of the small, subtle details than the quant does. Is it worth getting fried over? No, not at all. Maybe in the winter when the weather is cool, but not in the summer, hell no.
>>
>>101909065
This, and people told me I was crazy when I said that the drop in quality between bf16 and 8bit was huge.
>>
>>101908999
>true authors note
well, my addon has its own notes section. but by hot swap, you mean something like saved lorebooks (or an equivalent), so you can restore a setting?
i can inject and read files just fine (so yes, i can implement a save/load system), so if you're clearer about what you want, i might be able to include it
>>
>>101909065
Are you sure it wasn't just a coincidence? Did you look at the token probabilities? So far no one has ever provided any concrete proof that Q6+ quants lead to significant differences.
>>
>>101908876
>>101909046
Who knows. Shit, the more I think about it the more I like the idea. Instead of an opening post that goes straight into the roleplay, you'd instead have a post from the AI, still in-character but not yet in the roleplay proper, greeting user and asking what they would want to do. It would be written as you would like the character to write, so it would reinforce the character's speech in the process too.

I think I'll try that tonight when I get home.
>>
I cannot find the strawberry. I have eyes; they worked fine before.
>>
>>101909089
>i can inject and read files just fine (so yes i can implement a save/load system) so if you're clearer, i might be able to include what you want
Pretty much a quick-save/quick-load note section, yeah. And a full save/load option for all sections would be great too.
>>
>>101908738
Quantization might play a big role in whether those things are picked up. Generating stories with Mixtral 8x7B Instruct, I saw instructions about writing style ignored the great majority of the time at Q5_K_M and always followed at Q8. If I had only used the smaller quant, I would have thought that kind of instruction was ineffective. Someone else (whose post I regrettably don't have saved) wrote here about an RP test they were doing with instructions about emoji use. Their result, as I recall, was that at Q4 it flat out didn't work, at Q5 it kind of did but was janky, and at Q6 it totally did, with a surprising level of difference from Q5.
>>
>>101909137
you're limited by st (if using st). you've got user, char and assistant. in st i literally do injection as SYSTEM

you can feed the ai anything you want. it doesn't always have to respect it, but if you do it right, it will. i've posted my code, as awful as it is. but if you want to work on something together, i'm all for it
>>
>>101909085
Can you please provide reproducible proof of that? It would have great implications and move the understanding of the tech forward considerably.
>>
>>101902183
Llama 3.1
>>
>>101909065
Quantization seems good on paper because it's literally giving the same top logits for the next token, right? And the difference is only a few percent? But when you generate 800 tokens at a time, that small difference compounds 800 times over, and a single bad token will poison the whole context and affect every token from then on. The divergence is a rate of change, not a one-off change, if that makes sense (toy numbers below).
Now throw in samplers that ban the most confident tokens to "boost creativity" and the difference becomes even more obvious.
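
Toy numbers for the compounding part (made-up probabilities, just the arithmetic): if a quant picks a "bad" token with some small extra probability p per step, the chance of at least one divergence over an 800-token reply is 1 - (1 - p)^800:

[code]
function chanceOfDivergence(pPerToken: number, tokens: number): number {
  return 1 - Math.pow(1 - pPerToken, tokens);
}

for (const p of [0.0001, 0.001, 0.01]) {
  console.log(`p=${p}: ${(chanceOfDivergence(p, 800) * 100).toFixed(1)}%`);
}
// p=0.0001 -> ~7.7%, p=0.001 -> ~55.1%, p=0.01 -> ~100.0%
[/code]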
>>
>>101909065
>I've just tested bf16 Largestral to see if it is better than Q6_K quant which I can run on a quieter, less hot machine. The rise in intelligence is noticeable, and everyone who says otherwise is telling lies.
DAMN IT
>>
>>101909265
>samplers that ban the most confident tokens to "boost creativity"
No sampler does that but it's crazy and I love it. Banning only the very first token might not destroy the ability to produce coherent output. I think I'll try it this weekend.
>>
>>101909251
3.1 is junk. old l2 extended to 32k is still better
>>
>>101908328
>no gguf
>>
>>101909361
It works fine for me.
>>
>>101909361
>l2 extended to 32k
other than miqu l2 can barely even do 4k
>LongAlpaca (13B) 32K <4K
>Llama2 (7B) 4K 85.6
https://github.com/hsiehjackson/RULER
>>
>>101907775
Should team up with that faggot

https://www.reddit.com/r/SillyTavernAI/comments/1erdl4o/i_made_a_kinda_cool_st_script/
>>
>>101902264
nothing is happening you pathetic mouth breather
>>
>>101909350
Didn't mean literally banning, more like temperature but on steroids. I remember seeing them everywhere during the sampler gymnastics phase.
>>
>>101908328
Wow! A tune I won't be able to run even if I double my RAM!
>>
File: Untitled.png (5 KB, 164x208)
>>101909392
i've had a terrible time with it for rp on st. it repeats like crazy and it ignores the most recent message. it's very... i tried l3.1 base, tess, lumanoid or w/e, and instruct. no l3 model runs right as it gets near its max context, whereas the older l2 models do. i don't know why
>>
>>101909470
Did you use minp? I had bad luck when having that on for some reason with 3.1 70b.
>>
>>101909350
typical-p does that (banning the most likely token(s) if it makes sense to do so), but somebody needs to decouple it from its built-in top-p-style cutoff to make it useful; otherwise it removes too many tokens from the tail of the distribution at useful settings (0.5 or less). rough sketch below
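
sketch of the algorithm as i understand it from the paper (not any particular implementation): rank tokens by how close their surprisal is to the distribution's entropy, then keep them until cumulative mass hits the typical-p value. that mass cutoff is the built-in top-p-ish part that needs decoupling:

[code]
// returns indices of surviving tokens
function typicalFilter(probs: number[], typicalP: number): number[] {
  // entropy H = -sum p*log(p)
  const entropy = -probs.reduce((h, p) => h + (p > 0 ? p * Math.log(p) : 0), 0);
  // rank by |surprisal - entropy|, smallest (most "typical") first
  const ranked = probs
    .map((p, i) => ({ i, p, dist: Math.abs(-Math.log(p) - entropy) }))
    .sort((a, b) => a.dist - b.dist);
  const kept: number[] = [];
  let mass = 0;
  for (const t of ranked) {
    kept.push(t.i);
    mass += t.p;
    if (mass >= typicalP) break; // the top-p-like cutoff
  }
  return kept;
}

// here the argmax (index 0) is "too predictable" and gets dropped
console.log(typicalFilter([0.4, 0.2, 0.2, 0.1, 0.1], 0.35)); // [1, 2]
[/code]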
>>
>>101909218
if by quick load you mean setting up a bunch of options for the mod, that i can do. but i can't import st settings, for example, just settings from mods. everything already saves per chat, so you shouldn't run into any issues
>>
>>101909501
>if by quick load you mean setting up a bunch of options for the mod, that i can do. but i can't import st settings, for example, just settings from mods.
I meant for the extension itself, yes.
>>
>>101909491
yes, that has been my default (0.05 minP, very low rep pen, though i recently changed to DRY)
with my current settings everything is fine for old l2 models. l3 models just go nuts and start talking about shit that isn't relevant at all
>>
>>101906980
>Dry rep on .8 / 1.75 / 2 which completely fixes that with no issues
You claim DRY causes no issues? Opinion discarded.
>>
>>101906438
chatjeetpt
>>
>>101909534
This is anecdotal evidence and makes little sense, but maybe you can try it sometime (I just use the original llama 3.1 instruct). When I turned off minp, I stopped seeing the same phrases over and over.
>>
>>101909534
Are you using both models with the same context window size?
>>
>>101909513
yeah, it should be absolutely possible to export and import every setting. right now everything is saved per-chat, but i could add an import button or something where you have the saved lorebooks selected already and stuff

>>101909605
yes, under their max too, 16k
>>
File: file.png (174 KB, 1125x407)
oh fug
>>
>>101909552
It's true if you're not using other retarded and outdated samplers like Rep Pen
>>
>>101909644
lies
>>
>>101909671
It causes issues, you must be illiterate.
>>
>>101908328
The 70b gguf is already included, might as well try it. Can't be any worse than the vanilla instruct tune, right?
>>
>>101909699
anyone can see the problem is more with the quant than the samplers, retard
>>
File: Largestral-Q6_KvsBF16.png (9 KB, 1012x61)
>>101909097
The best I can offer you is my flawed mememark (https://huggingface.co/datasets/ChuckMcSneed/NeoEvalPlusN_benchmark). The differences were significant enough for BF16 to pass the reading comprehension test (column D), score 100% on the stylized writing test (column S), and mess up one more poem than Q6_K did (column P). The test was performed at deterministic settings (temp=0). Yes, I am sure it was NOT a coincidence.
>>
Are there any local models yet that take speech input and output speech or text?
>>
>>101909763
So you're saying the quant is what causes DRY to replace words?
>>
>>101909818
Would q8 be enough to make up the difference?
>>
>>101909869
>>101909869
>>101909869
>>
>>101906980
>I want v2 of it
As if V2 always means better in this case...



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.