/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101880989 & >>101872662

►News
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: img_14.jpg (301 KB, 1360x768)
►Recent Highlights from the Previous Thread: >>101880989

--Papers: >>101885744
--Mi100's 32GB VRAM is better than 7900xtx's 24GB for AI: >>101881481 >>101881591 >>101881835 >>101881888 >>101889409 >>101889750 >>101884525 >>101884618 >>101887361 >>101887402 >>101882009 >>101882084
--Llama 3.1 or 4 to have multimodal capabilities, including image, video, and audio: >>101882382 >>101882743 >>101882833 >>101882863 >>101883073 >>101883279 >>101883343 >>101883085 >>101883164 >>101883052
--Imatrix quants are better than static quants, but lose advantage when offloading: >>101885206 >>101885237 >>101885243 >>101885253 >>101885289 >>101885479
--Grok-2 Beta Release and its performance on the LMSYS leaderboard: >>101883319 >>101883353 >>101883356 >>101883383 >>101883403 >>101883471 >>101883592 >>101884328 >>101883517 >>101883533 >>101883559 >>101883631 >>101883646 >>101883784
--Anon considers making a local image search app using CLIP, but others suggest using Hydrus instead: >>101888292 >>101888383 >>101888460 >>101888672 >>101888945
--Flux prompting discussion, effectiveness and loss of art: >>101885635 >>101885737 >>101885994 >>101888259 >>101885738
--Creating a videogame companion AI using LLMs and screenshot descriptions: >>101881870 >>101882090 >>101883194
--AMD Ryzen 9 9950X and 9900X outperform Intel in whisper.cpp, but RAM is the bottleneck: >>101886652 >>101887030 >>101887480 >>101889342
--No need to wait for 50series, 3090 is sufficient: >>101884118 >>101884173 >>101884392
--Companies use cloud GPU providers or dedicated servers for local LLMs: >>101884028 >>101884054
--Anons discuss using lore books and world info with AI models: >>101882635 >>101882698 >>101882680 >>101882707
--Anon's idea of using model hallucinations as input is not novel, already done in autoregressive models and CoT: >>101886098 >>101886235 >>101886260 >>101886993
--Miku (free space): >>101884729 >>101885994 >>101886525

►Recent Highlight Posts from the Previous Thread: >>101881001
>>
>>101891620
>AMD Ryzen 9 9950X and 9900X outperform Intel in whisper.cpp, but RAM is the bottleneck:
At first I thought the recap bot had gained multimodal capabilities but it was in the Phoronix link.
>>
Who's ready for AGI (strawberry) tomorrow?
>>
>>101891848
I'm ready to laugh at all the schizos who fell for this meme
>>
>>101891848
As long as it can't leave scamman's basement it's useless.
>>
>>101891848
Fake or gay. Call it.
>>
>>101891848
It's going to be a precursor to AGI, not actual AGI.
The moment actual AGI is developed, the military is going to get involved.
>>
>>101891848
Meeeeee
>>
>>101891848
>it will just be grok 3
>>
why do you fags only seem to use these for coom? which is best for non-erp uses?
>>
>>101885379
Dude, I struggled for an hour with various not-big models not taking the hint.
>>
>>101891983
In contrast, Sonnet and CMDR got it with non-story-based corpospeak.
>>
>>101891982
Coding, general assistant, translation
I use it for the latter, love playing japanese VNs and don't wanna use software that requires internet
>>
>>101891848
AGI is openai bailout scam.
>>
Every post mentioning AGI or strawberry is made by petra, by the way.
>>
>>101892077
hi petra
>>
>>101892089
hi sao
>>
>>101892077
every post mentioning petra is agi
>>
>>101892096
hi lemmy
>>
>>101891848
Already outed as niggerjeet shit
>>
>>101892181
Strawberryman already called them grifters
>>
>>101892002
I tried it with llama 3.1 and Takashi brushed off the woman, got excited from holding all the fruit and went home to relieve himself of that excitement.
>>
>>101892002
I found adding "Ara ara~" before "how reliable" got Nemo to take the hint 1/10 but then it's not really the same prompt.
A lot of the innuendo really rides on the image itself, and an LM description helped a bit without straying from the original setup, especially when it mentioned the generous cleavage.
>>
Now with Strawberry details leaked, how do we cope? They are unlikely to release the technical details of this technology. It's over.
>>
>>101892683
Every LLM that we got is going to feel like a toy.
>>
File: file.png (28 KB, 657x214)
VRAMLETBROS WE ARE SO BACK
>>
>>101892683
Leaked...? When?
>>
>>101892736
Is it 4k context like big nemotron?
>>
>>101892736
nothingburger.
>>
>>101892744
Do you take any bait that hangs in front of you or are you a bot as well?
>>
>>101892016
Any guides on the latter, anon? I'd rather run everything locally if I can, so using this to translate VNs would be great.
>>
>>101892736
>Let's remove even more of the already filtered info the model has, benchmarks are the only valid use case of models, please give me models with zero world knowledge outside of benches!
>>
>>101892765
>https://huggingface.co/nvidia/Nemotron-4-Minitron-4B-Base/blob/main/config.json#L12
4k
>>
>>101892744
>>101892775
>It's so fake that every researcher left because they were pissed at Altman releasing it, thus breaking ClosedAI AGI policy

Cmon guys...
>>
whats a good tts for a japanese woman with a broken engluwh accent
>>
>>101892803
I sure hope it's the most generally intelligent and capable model in its weight class then.
>>
>>101892820
https://www.reddit.com/r/LocalLLaMA/comments/1esadlh/nvidia_research_team_has_developed_a_method_to/
>>
>>101892736
>mini troon
>>
I need an offline AI that can help me with programming, what do you guys recommend?
>>
File: wat.png (42 KB, 1248x385)
>>101892820
wat
>https://huggingface.co/nvidia/Nemotron-4-Minitron-8B-Base/discussions/2
>>
File: 1698648596486815.jpg (140 KB, 500x500)
>>101892962
Just use a rubber duck, it's the same thing.
>>
>>101892969
Was that comment written by the 4b model?
>>
>>101892962
The least you can do is post your specs. Biggest thing you can run, i suppose. DeepSeeker seems to be ok, but i haven't tried it. Or gemma-2-2b, whatever...
>what's a good car?
>>
>>101892969
hell yeah new robert tier schizo
>>
>>101892962
Codestral.
>>
>>101892998
I hope so. If there are humans that write like that, we're fucked.
>>
File: based.png (14 KB, 734x90)
>>101892969
A genius already noticed by Clem!
>>
>>101893043
I write like that every time I get hammered.
>>
File: dyer.png (31 KB, 605x386)
>>101893059
>>101893061
Help me understand. Did this guy just quote himself? Am i being thick?
>https://huggingface.co/LeroyDyer/_Spydaz_Web_AI_ChatQA_BASE?not-for-all-audiences=true
>>
File: 1696264714745683.jpg (9 KB, 255x198)
>>101892962
>>
>>101893181
>This Expert is a companon to the MEGA_MIND 24b CyberSeries represents a groundbreaking leap in the realm of language models, integrating a diverse array of expert models into a unified framework. At its core lies the Mistral-7B-Instruct-v0.2, a refined instructional model designed for versatility and efficiency.

>Enhanced with an expanded context window and advanced routing mechanisms, the Mistral-7B-Instruct-v0.2 exemplifies the power of Mixture of Experts, allowing seamless integration of specialized sub-models. This architecture facilitates unparalleled performance and scalability, enabling the CyberSeries to tackle a myriad of tasks with unparalleled speed and accuracy.

>Among its illustrious sub-models, the OpenOrca - Mistral-7B-8k shines as a testament to fine-tuning excellence, boasting top-ranking performance in its class. Meanwhile, the Hermes 2 Pro introduces cutting-edge capabilities such as Function Calling and JSON Mode, catering to diverse application needs.

>Driven by Reinforcement Learning from AI Feedback, the Starling-LM-7B-beta demonstrates remarkable adaptability and optimization, while the Phi-1.5 Transformer model stands as a beacon of excellence across various domains, from common sense reasoning to medical inference.

>Experience the future of language models with the MEGA_MIND 24b CyberSeries, where innovation meets performance, and possibilities are limitless.

>https://huggingface.co/LeroyDyer/Mixtral_AI_Cyber_5.0
>Mistral-7B-Instruct-v0.2 exemplifies the power of Mixture of Experts
How come everyone is sleeping on this?
>>
>>101893181
https://huggingface.co/LeroyDyer/Mixtral_AI_SwahiliTron_7b
>>
>>101893181
wow an actual schizo
you don't see them much anymore
>>
File: file.png (46 KB, 458x412)
>>101893301
We are so back it's unreal.
>>
>>101893262
>>101893181
>>101892969
That reads like it was written by AI then rewritten by somebody who can't into English very good.
>>
>>101893262
It hurts to read, man...
>Alright, llm. Say this model is good by using as many cliches as you can. GO!

>>101893301
>this model has been updted for rag and tasks in swahili or english as well as prgramming and other school works stuff and bible ! as well as other sacred historical texts also !
Finally a swahili llm...

>>101893334
Ah. Another man of refined taste.
>>
>>101893355
He's Swahili dude
>>101893334
>>101893301
>>
>>101893355
And also has a dodgy keyboard.
>>
>>101893370
Yes, I'm aware.

>>101893384
Also possibly that, yes.
>>
>>101893400
>>101893384
Would you risk medical advice from one of his models? https://huggingface.co/collections/LeroyDyer/medical-series-66156c5406749e833c4f408e
>>
>>101893370
That's fine. My english is a bit scuffed at times but that doesn't explain this >>101892969
>>
File: file.png (68 KB, 517x549)
>>101893416
Also love how he's calling Mistral 7b "Mixtral", that's a nice marketing touch
>>
File: file.png (32 KB, 475x256)
>>101893460
>Im not sure if Lora actually works when you save them
Truly a weapon to surpass all sloptuners.
https://huggingface.co/LeroyDyer/SpydazWeb_AI_CyberTron_Ultra_7b
>>
>>101893416
I wouldn't trust a boiled egg recipe from it. Instant mustard gas.
>>
File: file.png (42 KB, 462x253)
>>101893521
He's a coomer like us tho.
>even sexy books have been highly tuned into the model
>>
File: 1545878533592.gif (277 KB, 540x540)
Reminder that
>an account on twitter starts hyping Strawberry/Q* and OpenAI stuff
>OpenAI people interact with him and follow him
>he starts making big claims/predictions
>they turn out wrong
>it's revealed the strawberry leaker man was actually an agent developed by a pajeet company (unknown how much human intervention was involved in the show)
>admits to having no insider info
>called his creators pathetic grifters
>called Sam a deeply harmful hype troll
>now hypes for Elon and hates Sam
>OpenAI accounts stopped following him
This is the timeline we're living in.
>>
who?
>>
>>101893553
>deliberately ignoring lmgroids falling for this scam
you tried.
>>
>>101893537
>self rag
Man. His models are something else... they're just ragging themselves senseless while the user isn't looking.
>>
>>101893553
Thank you for the summary actually. I have ignored the whole thing as much as I could.
>>
>>101893553
>it's revealed the strawberry leaker man was actually an agent developed by a pajeet company
huh?
>>
>>101893622
>Artificial inteligent brain designer: Creating the intligence behind machine minds: The AGI will not be created by a Neural...
Truly amazing to witness such talent.
https://huggingface.co/LeroyDyer
But his Github is 404, guess he was too powerful for them.
https://github.com/spydaz
>>
>>101893616
Because no one that's regularly in this thread cares about it, it's just people or bots trying to 'engage' to make it appear like people are all hyped up.
>>
>>101893677
>>Artificial inteligent brain designer: Creating the intligence behind machine minds: The AGI will not be created by a Neural...
Rest of cut off section:
>The AGI will not be created by a Neural network design But it will be a combination of self improving code and self training neural network : it must be free to roam the internet and pick and choose information for itself and not be given brain dumps which confuse the models anyalitical progres so it right now basically repeating wht it knows but not generating its own formulated answer based on its knowdge base : hence we need code to extract these various data from its mind then reframe the same data then reinsert the knownedge in a fully structured and methodolgy !
>hence the model can extract the data from its moind and reorganizes it better than we can ! so self teaching is the first step forwards :
>>
>>101893731
>it must be free to roam the internet
He predicted strawberry agent!
>>
is there any performance difference between ooba and koboldcpp?
3060ti with 8gb vram btw
>>
>lmgroids now doing free advertisement for some random jeet
>>
>>101893616
Oh I forgot to include community reactions. Here's a revised version.

Reminder that
>an account on twitter starts hyping Strawberry/Q* and OpenAI stuff
>OpenAI people interact with him and follow him
>he starts making big claims/predictions
>he gets tons of followers and people going along with the hype
>multiple posts in the threads hype it (unknown how many were legitimate/honest), many telling them to fuck off though
>the predictions turn out wrong
>it's revealed the strawberry leaker man was actually an agent developed by a pajeet company (unknown how much human intervention was involved in the show)
>admits to having no insider info
>called his creators pathetic grifters
>called Sam a deeply harmful hype troll
>now hypes for Elon and hates Sam
>OpenAI accounts stopped following him
This is the timeline we're living in.

>>101893659
See >>101875488
It looks like he deleted the tweet though. I guess he realized it wasn't a good look that his bot went off the rails.
>>
>>101893778
>not taking the piss out of a schizo
Are you gonna say 'buy an ad' now?
>>
>>101893731
>anyalitical
What's Anya got to do with AGI?
>>101893778
>No please don't talk about people revolutionizing the LLM Sphere, please discuss random slop tunes and openai instead.
>>
>>101893801
buy an ad
>>
>>101893795
>It looks like he deleted the tweet though. I guess he realized it wasn't a good look that his bot went off the rails.
i'm pretty sure that was just a joke to advertise their release at the expense of the latest drama
>>
>>101893010
>specs
i9 9900K, GTX 1080ti, 32GB RAM
>DeepSeeker seems to be ok, but i haven't tried it. Or gemma-2-2b, whatever...
>>101893027
>Codestral
Thanks I'll look into it
>>
>>101893824
Yeah that's also possible. Either way, it gives his shit a bad image.
>>
>>101893829
>GTX 1080ti
rip
>>
>>101893771
ooba can use different backends, can it not? if so, it depends on the backend. if you use llama.cpp or kobold.cpp as the backend, it should be the same. Better to just try and use whatever you like best. If you offload to cpu, your only chance is using kcpp, lcpp or ooba with either of them as a backend.
Just use llama.cpp and remove as many middlemen as possible.
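If you do cut the middlemen down to just llama.cpp, its bundled llama-server speaks plain HTTP, so ST or a few lines of Python can talk to it directly. A minimal sketch, assuming a server already running on the default port:
[code]
# Minimal sketch: query a running llama.cpp server (e.g. ./llama-server -m model.gguf).
# Uses only the stdlib; /completion is llama.cpp's own endpoint.
import json
import urllib.request

payload = {"prompt": "User: hi\nAssistant:", "n_predict": 64}
req = urllib.request.Request(
    "http://127.0.0.1:8080/completion",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["content"])
[/code]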
>>
>>101893771
Potentially. It's easy to compile koboldcpp, whereas with ooba it's a huge pain to recompile the llama-cpp-python module for it since it's this big conda mess of bullshit I don't want to deal with. That is, if you want to add avx512 support; not even sure it makes a difference though.
>>
>>101893840
he's a massive grifter apparently
there was a post about how he lists teaching a class at stanford on his resume while in reality he stole the credit of the guy who actually did all the work
>>
>>101893847
I'm poor bro
>>
>>101893901
???
They are both super easy to compile.
>>
>>101894212
Well, I'm too stupid to compile the llama_cpp_python whl file it always fetches and replace it. It's easier to just use either llama.cpp or koboldcpp directly and not bother with the stupid conda environment.
>>
>>101894281
But why bring up conda? You don't need conda for ooba any more than you need it for llama.cpp or koboldcpp.
>>
>>101894306
I don't use conda for that, I just install stuff regularly and then compile it with nvcc/gcc.
>>
>>101893908
Wouldn't be surprising. Many such cases.
>>
tight thighs
>>
>>101889342
AMD Ryzen 9 5950X
>>
File: CHADMAN.jpg (61 KB, 563x1000)
Either i'm retarded, don't read this thread carefully or i'm just extremely lucky with what I was after

But for people who don't have a bunch of GPUs (I have a 4090 + 32GB RAM, obviously super high end PC but it was made for gaming exclusively) is Noromaid Mixtral 8x7b not just one of the better ones for pure 1 on 1 RP?

I've been running it for an hour testing it and it's super fucking good. I'm downloading RP Steve because it's another one I missed but see it mentioned a lot too. I have obviously seen Mixtral 8x7b mentioned before but nowhere near as much as Nemo, or Stheno or Mini Magnum or shit, even Gemma 27B which i've all tried and all fucking sucked (* in comparison with what i've been looking for, which is 1 on 1 conversational RP).
>>
>>101894690
For a vramlet that is still the best probably. People just don't talk about it because it's old, but all real improvements recently have been in the fuckhueg range.
>>
>>101894778
is that flux?
>>
>>101894745
yea, i'm new to this world, like I said bought my PC for gaming. Discovered a new world of cooming with AI chats, found Character AI, got triggered by the filter, found silly tavern and basically spent a better part of 2 weeks trying out all of these fine tunes and bullshit only to find out they all kinda sucked. I then broadened my horizon to older ones and have now heard:

>Mistral 7x8B which i'm using now
>RP Steve
>Midnight Miqu (but there's no way I can run this on my setup right? Or at least, at response rates that aren't super slow, it's a 70b model or some shit)

Only thing that got me thinking to try older ones was Command R probably being the best one of the other ones I had tested and it being relatively old(er).

I just falsely assumed that newer models would be better lmfao, but they all utterly suck
>>
>>101891613
Do you guys use Nemo, or is this a 70B zone?
>>
>>101894886
Miqu would be smarter, but you'd have to not have it all on your GPU so it'd be slower. I just cope with the slowness, a few T/s is enough for me. I prefer it to Command R, which I prefer to mixtral. As far as newer models go I've had decent luck with llama 3.1 70b despite what others say, but I still mostly use miqu.
>>
>>101892683
hi petra
>>
>>101894929
With how hard "people" shilled Stheno, it's a 8B zone. If these people were real at all.
>>
>>101895375
Hi Lemmy
>>
I'm at 70B+ now, it's like finding purer and purer heroin, it just escalates.
>>
>>101895077
How does that work?
I'm a little interested but I can't run flux since I don't have a video card.
Maybe I'll play with stable diffusion again.
>>
File: file.png (84 KB, 609x785)
Does this look ok for catch-all settings?
>>
>>101894929
If I'm going to coom I use 70B but I finetune the smaller ones so the plebs have fun toys to play with. Have a MN model cooking as we speak. 6 hours to go.
>>
>>101895512
Just set temp to 0 for a generic preset. Unless doing rp shit, it will only make your output worse.
>>
>>101895538
But all I want to do is rp shit, anon.
>>
>>101894986
How slow we talking for shorter responses (think character AI type responses)?

24GB VRAM + 32GB RAM
>>
Can anyone explain all this "Offloading blah blah" shit to me?

>use kobold
>have 24GB VRAM

Does this just mean I can't run any models above like, 32Bs or something? I truly don't understand this shit but this is the gist of what I get
>>
>>101895874
Offloading keeps part of the model in regular RAM where it's computed by the CPU, making it slower, but it lets you load bigger models than GPU RAM alone would allow. Quantization further reduces the model size, helping keep the whole thing on the GPU, making it faster, but slightly dumber.
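If you want to see the knob outside a UI, here's a minimal sketch with the llama-cpp-python wrapper (an assumption on my part; kobold and ooba expose the same thing as "GPU layers" in their launchers):
[code]
# Offloading sketch via llama-cpp-python (pip install llama-cpp-python).
# n_gpu_layers = how many transformer layers live in VRAM; the rest sit in
# system RAM and run on the CPU, which is what makes offloading slower.
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q4_K_M.gguf",  # hypothetical filename
    n_gpu_layers=35,  # raise until VRAM is nearly full, lower if you OOM
    n_ctx=8192,       # context takes memory too, leave headroom
)
print(llm("Hello,", max_tokens=32)["choices"][0]["text"])
[/code]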
>>
Is this a good card format?

{{char}}'s personality: bam + bam + bam + bam + bam + bam
{{user}} persona: bam + bam + bam + bam + bam
Scenario: prose
Examples: <START>yada yada <START>yada yada
Special note (if any): prose
>>
File: 142140240420.png (97 KB, 640x626)
>>101894886
>>101894690
>>101894986

I literally have the same PC (DDR5 RAM though, you didn't specify).

What one did you download?

https://huggingface.co/TheBloke/Mixtral-8x7B-Instruct-v0.1-GGUF

https://huggingface.co/NeverSleep/Noromaid-v0.4-Mixtral-Instruct-8x7b-Zloss-GGUF

Which one is best? pls help srs
>>
>>101895975
No you have to put actual descriptions and stuff you can't just say prose
>>
>>101893795
>now hypes for Elon and hates Sam
Is it all the musk fans in the training data? Can someone with an open llm check if models have bias for musk? Like ask it which billionaire is the nicest guy or something.
>>
>>101895973
so what's the rule of thumb? I only use kobold so ggufs are my only option apparently (unless I should be using something else?)

Basically, how big of a file size am I looking at?

Should I aim for ~24GB models to match my VRAM, or can I push for, say, a 40GB model and just offload as many layers to GPU memory till it's at like 90% capacity, with the rest using system RAM?
>>
What would I use if I wanted to have my own imouto AI and also have her send me lewds?
>>
>>101896043
rope
>>
>>101896043
>>101896054
You'll have to configure Rope using the Noose paradigm
>>
>>101895803
Short responses? Probably a minute or so at 2T/s. I only have 12GB vram though so you'd probably get a bit faster. 24GB is enough to put almost half the 70b in vram but the real speedup happens at 75%+. You'd probably have to use q4 or maybe a bit smaller unless you up your ram though, so it'd be a bit faster again.
>>
>>101896033
Keeping as much of the model in VRAM as possible is best. Offloading almost half the model to the CPU will make it very slow. Aim for a model and quant that fits completely in your VRAM with some extra space for context. For example, an 8b model quantized to Q8 takes about 8gb of VRAM; quantized at Q4 it takes about 4.5, and so on. Context also needs memory, so it's good to have a few gb free for it (plus whatever your OS needs to function).
You can balance model parameter count and quantization, ie: low parameter count at Q8 (very low quality loss) or 32B at Q5 with slight quality loss over the original model, but overall better. Bigger parameter models are less affected by aggressive quantization, so a 70B is usable at 2-3bpw (bits per weight) while an 8b becomes very dumb.
You'll have to experiment with different models to see if you value output quality over speed. Mistral-Nemo is probably a good place to start.
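The arithmetic behind those numbers is just parameters x bits-per-weight / 8, give or take file overhead. A rough sketch (the bpw values are approximations for common GGUF quants):
[code]
# Back-of-envelope GGUF size: params (billions) * bpw / 8 gives GB.
def size_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8

for name, params, bpw in [("8B Q8_0", 8, 8.5), ("8B Q4_K_M", 8, 4.8),
                          ("32B Q5_K_M", 32, 5.7), ("70B IQ2_M", 70, 2.7)]:
    print(f"{name}: ~{size_gb(params, bpw):.1f} GB, plus context")
[/code]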
>>
Do you guys also keep a separate instance of tavern opened during rp for assistant related tasks?
>>
>>101896143
why not use booba? it's already open anyway
>>
>>101896003
bump :(
>>
>>101896043
You can't. Local models are too dumb. But you can chat with your imouto and gen her nudes separately.

>>101896054
>>101896102
hi, reddit!
>>
What happened in /aicg/ why are there so many newfags
>>
so are local llms safe to feed private information to? chatgpt must use user conversations to further train their models, is it somehow the same here?
>>
>>101896236
Apparently api keys are being deleted.

Case in point >>101896265

>>101896265
Depends on your inference program. Most are safe. Avoid shit that uses python. Use llama.cpp or kobold.cpp if you want to make sure.
>>
>>101896265
Just run it under another user and disable internet access for that user if you're paranoid.
>>
File: file.png (448 KB, 500x525)
>>101896043
>>101896054
>>101896102
Rooooppppeeee
>>
github down?
>>
File: file.png (168 KB, 400x600)
>Inago-chan is a desperate locust girl who begs for API keys and then eats through them in minutes if she gets them.
>Inspired by a true story.
>https://www.chub.ai/characters/anonemouse/c76b2b11-5e5b-47b2-bdf4-cde5adbb6fb5
>https://files.catbox.moe/cfc2ga.png

Oh no. She is gonna starve! Or she will have to change her diet to a diet of cum.
>>
>>101896566
github down.
>>
>>101896233
How do you have it reliably gen images and make the character look consistent throughout?
>>
File: file.png (1.09 MB, 1218x1114)
Much sadness. End of an era.
>>
Who let the piss drinkers in
>>
Anyone have that meme graph of /lmg/ incrementally growing and /aids/ going up and down on a rollercoaster? Needs an update.
>>
>>101896706
It doesn't need an update because nothing happened in /aids/ in a year.
>>
>>101896674
Do not give them tech support local saars. Ask for a piss drinking video with a timestamp first!
>>
>>101895975
>>101895874
>>101893771
>>101896222
Piss drinking video with a timestamp or GTFO
>>
>>101896733
hi petra
>>
File: 1701586351737913.png (1.45 MB, 1202x1400)
>>101896706
>>
why does he >>101896265 care about private information if he was running proprietary models?
>>
What's going on?
>>
>>101896765
Thank you.
>>101896753
Je suis petra.
>>101896780
A guy came here yesterday saying aicg isn't sustainable anymore cause keys get reported instantly. More of them are flooding in.
>>
>>101896725
I got them confused but can you blame me? Aids sounds more like aicg than aicg does.
>>
>>101896265
It's always a bit hard to understand what these models do even if you are running them locally. I've seen mine try to connect to random IP addresses on the internet in the middle of the night all by itself, but it might have just been it looking up things on its own in preparation for the next session.
>>
>>101896787
Might as well just merge /aicg/ and /lmg/.
>>
>>101896209
It's worse?
>>
Drink your piss today(on camera)! Drink it to help your pest brethren!
>>
>>101896838
Drink to earn!
>>
>>101896806
That's a sign you aren't giving it enough stimulation, run a few new scenarios through it every couple days and it should chill out.
>>
File: file.png (525 KB, 544x544)
>>101896765
>>
>>101895400
If you can run 70B, you can run Mistral Large IQ2_M. Before anons go crazy about running a quant like that, it's still the best model I've ever used, better than 4-bit quants of any 70B. Just use minP ONLY.
>>
File: 1570060417629.jpg (50 KB, 678x710)
>>101896943
>>101895400
I desperately wanna try Midnight Miqu, apparently it's super good.

How would it compare to Mixtral 8x7b? I can run that pretty damn fast only with 24GB + 32GB RAM at a Q4_K_M

But using huggingface's VRAM calculator, it looks like at best I can do a Q2 GGUF for Miqu (which I have no idea how slow that would even be, never mind how butchered it will be).

Any anons with similar setups who have tried it here to confirm?
>>
>>101897008
Sounds like the calculator only wants it in vram? Miqu is 39GB at q4, so that should fit in 24+32 with the context in addition to loading it.
>>
>>101897083
How many layers do you reckon I would need to make the t/s not unbearable? I'm downloading it as we speak btw, just wanna prepare myself so I know what to do when I load it.

Also, why do other people use llama.cpp or whatever and other things? I am right to be using Kobold, right?

I went with Q4 K_S btw.
>>
>>101896765
How do you guys have the audacity to make an image like this when the best local model can't really run locally and isn't even better than sonnet? Give it a few months more.
>>
>>101897115
Well, 'unbearable' is relative, you'll just have to see if you like the speed or not with how many layers fit. Going to a smaller quant would probably not be good though.
>>
>>101896614
>Large is simply incapable of mimicking prose in the greeting or being told to write a certain way, so it's meant to distract from that as best as possible.
What went wrong with Mistral Large?
>>
>>101897083
I would call you a nigger but in the end I want /lmg/ to die for all the mikuspam so letting the swarm in is actually a good thing.
>>
>>101897317
Instruction tuning
>>
>>101897333
Miku will live on even if /lmg/ dies.
>>
File: temp_mzgpt.png (2.26 MB, 1120x1440)
>>101885618
anon. i love you. thank you.
>>
>>101897591
I love you too, Anon.
>>
>>101897445
Wonder if they'll ever release the base model.
>>
Guys gpt-5 tomorrow and im not joking
>>
>>101897742
Everyone's been burned too many times to believe even legitimate leaks now, anon
>>
>>101896765
/sdg/ needs to be updated; it plummeted long ago.
>>
>check thread
>mikutrannies
>7B RELEASED THIS IS IT
>glint in her glinty glinted eyes sends a glinty shiver down your spine

so glad i became a chad and moved on to claude. you don't fully realize how pathetic and time wasting llms are until you're outside looking in.
>>
>>101897515
I hope so. I like the music. I just hate faggots spamming her here.
>>
File: nala test epoch3.png (136 KB, 933x296)
Other than that weirdness there, it picked up on and utilized the nuance of the user starting the scenario face down, which is very rare. It also utilized the detail of the shotgun, although it didn't utilize it well. How would it end up in my bag if I got knocked over while holding it? Could be quantization loss, though, since I'm testing it in Q8 instead of fp16.
>>
>>101897839
this general has been infested by redditroons a long time ago.
>>
Tess 12B is very good
>post logs
No
>>
>>101897902
i'll download it and nala test.
>>
what is the best aidungeon equivalent on a 24GB vram card? I used mythomax but I am sure there is something much better nowadays, right?
>>
>>101897888
Looks more like bad fine tune parameters or just your usual merge things.
What model is that?
>>
>>101897964
Mistral Nemo 12B or Gemma 2 27B.
>>
>>101897888
it's writing Nala as if it's an anthro, but the snippet is too short to tell for certain. Also, what's up with quoting 'need'.
>>
File: nala.png (142 KB, 831x732)
tess 12b.

repetitive retarded slop.
>>
>>101898032
>shivermaxxing already
>also whispermaxxing
As expected from pure gptslop. Save yourself some time and skip all Tess shit from now on
>>
>>101891983
>Just need to make sure i have enough here for the 280 yen apples and 160 yen oranges, patting his bulging wallet
holy kek im dead
>>
>>101898032
>repetitive
Is the first message in the screenshot the greeting? That already starts every sentence with "She [verb]"
>>
>>101897976
a work in progress Mistral-Nemo tune. Still has 2 more epochs in the oven scheduled. It's already been through 2 other datasets and a SLERP merge. So there's still some reconciliation going on with some of its task vectors it seems.
At epoch 1 it came up with this song. Prompt was just to come up with a metal song initially.
https://suno.com/song/eb672bd8-e2fd-48b7-a981-e105f27b7552
I asked it for a lot of creative direction.
>inb4 rhyming
A lot of actual rock/metal songs only rhyme on the chorus so that could be a win depending on how you cut it.
>>
>>101898096
i don't care. it said "her eyes flashing a dangerous teal color" three times in one response. it's dogshit.

grins evilly. low and husky. nibbles ear. whispering constantly. purring seductively. smirking. jolts of pleasure. purring multiple times. all the normal slop. somehow managed to hit every single one in one response.
>>
>>101898129
You sound very desperate. I guess I'm downloading Tess now. Thanks for the recommendation, petra.
>>
>>101898129
Sounds typical for anything nemo based. Upgrade your computer to run real models.
>>
>>101898158
i don't use anything besides cr+ regardless.
>>
>>101898158
hi petra
>>
>>101898185
So you're some guy with a ton of vram who's made it his mission to shit on small models?
>>
>>101898336
True VRAMchads tune models for the little guy to enjoy.
>>
File: Untitled.png (1.28 MB, 1080x3381)
Post-Training Sparse Attention with Double Sparsity
https://arxiv.org/abs/2408.07092
>The inference process for large language models is slow and memory-intensive, with one of the most critical bottlenecks being excessive Key-Value (KV) cache accesses. This paper introduces "Double Sparsity," a novel post-training sparse attention technique designed to alleviate this bottleneck by reducing KV cache access. Double Sparsity combines token sparsity, which focuses on utilizing only the important tokens for computing self-attention, with channel sparsity, an approach that uses important feature channels for identifying important tokens. Our key insight is that the pattern of channel sparsity is relatively static, allowing us to use offline calibration to make it efficient at runtime, thereby enabling accurate and efficient identification of important tokens. Moreover, this method can be combined with offloading to achieve significant memory usage reduction. Experimental results demonstrate that Double Sparsity can achieve 1/16 token and channel sparsity with minimal impact on accuracy across various tasks, including wiki-2 perplexity, key-value retrieval, and long context benchmarks with models including Llama-2-7B, Llama-2-70B, and Mixtral-8x7B. It brings up to a 14.1× acceleration in attention operations and a 1.9× improvement in end-to-end inference on GPUs. With offloading, it achieves a decoding speed acceleration of 16.3× compared to state-of-the-art solutions at a sequence length of 256K.
https://github.com/andy-yang-1/DoubleSparse
git isn't live yet. might be cool, big claims.
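Until the code goes up, here's my toy reading of the idea from the abstract (numpy, single head; every name and shape here is a guess, not their implementation):
[code]
# Double Sparsity, toy version: score tokens cheaply using only a few
# "important" channels picked offline, then do full attention on the top-k.
import numpy as np

def double_sparse_attn(q, K, V, channels, k=64):
    approx = K[:, channels] @ q[channels]        # channel sparsity: cheap scores
    top = np.argsort(approx)[-k:]                # token sparsity: keep top-k
    s = K[top] @ q / np.sqrt(len(q))             # full attention on survivors
    w = np.exp(s - s.max()); w /= w.sum()
    return w @ V[top]

n, d = 4096, 128
q, K, V = np.random.randn(d), np.random.randn(n, d), np.random.randn(n, d)
channels = np.argsort(np.abs(K).mean(0))[-8:]    # stand-in for offline calibration
print(double_sparse_attn(q, K, V, channels).shape)  # (128,)
[/code]
The win is that the cheap scoring pass here touches 8 of 128 channels, i.e. 1/16 of the KV cache, instead of all of it.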
>>
>>101898336
no. there's a few small models i've tried and said decent things about. they're retarded, but at least they're not repetitive, or turbo slop. if you regen a few times, you can get something that's fun and semi-enjoyable if you're poor.
if a model is retarded, slopped, and repetitive then i'm going to call it trash and unusable. because it is. don't hate me for saying the truth.
>>
File: 1723571283694982.webm (223 KB, 400x640)
I was catching up on some other AI threads and these webms really made me think. The future is going to be insane once good, low VRAM, local video models get invented.
>>
>>101898553
most impressed by the flopping around of the laces on her shoe here
how the fuck do the weights learn that much physics just from watching videos
>>
File: Untitled.png (2.65 MB, 1080x3281)
A Spitting Image: Modular Superpixel Tokenization in Vision Transformers
https://arxiv.org/abs/2408.07680
>Vision Transformer (ViT) architectures traditionally employ a grid-based approach to tokenization independent of the semantic content of an image. We propose a modular superpixel tokenization strategy which decouples tokenization and feature extraction; a shift from contemporary approaches where these are treated as an undifferentiated whole. Using on-line content-aware tokenization and scale- and shape-invariant positional embeddings, we perform experiments and ablations that contrast our approach with patch-based tokenization and randomized partitions as baselines. We show that our method significantly improves the faithfulness of attributions, gives pixel-level granularity on zero-shot unsupervised dense prediction tasks, while maintaining predictive performance in classification tasks. Our approach provides a modular tokenization framework commensurable with standard architectures, extending the space of ViTs to a larger class of semantically-rich models.
https://github.com/dsb-ifi/SPiT
no code yet. very much a research attempt but cool. patch-based tokenization really seems inadequate with how far things have advanced
>>
ah man openai is cooked.
testing gpt4o-latest. i mean they did hype it up. "its beautiful" etc.
it seems good with akinator tests and guessing a character. "feels" better than the previous one for natural language. actually called a character sexy, thats a new one from openai.

sonnet 3.5 totally destroys it with code though. how have companies not picked up on this yet.
i wanted gpt4o-latest to make me a html page with a background image from one of those example pic sites. a cube in the middle. and i have a 3d space i can move around the camera. like a video game.
gpt4o-latest still had the problem where after 5-6 tries it makes things worse and causes new problems. hallucinates the pictures (404), etc.
just do a @claude-sonnet3.5-200k "please just give me the solution".
does it perfectly first try with minimal code. lol
https://jsfiddle.net/7w1tmvpL/

hope zucc is paying attention and has his spies at anthropic or pays some people to leave.
>>
>>101898909
tokens are a scam
>>
File: 1723690309463317.jpg (139 KB, 1439x431)
its over for aicg
>>
>>101898492
That sounds like that'd make models even worse at making proper use of their context windows.
>>
>>101898943
>actually called a character sexy, thats a new one from openai.
chatgpt-4o-latest writes smut without ANY jailbreak at all in my testing, which is a first for a big company model I think.

It sucks at it, of course, because it's too dry and positivity-biased. But regardless, OpenAI deciding not to train their models to refuse smut anymore is a big development that people should be paying attention to.
>>
>>101898943
The 2023-2024 trend of trying to make LLMs into tools for programmers and nobody else was a bad direction to go in, and if OpenAI are moving away from it that's actually a good thing
If you're trying to get normies more interested in AI than they have been then you can't just focus on what makes programmers happy
>>
>>101898981
It looks like AWS shutting off the proxy owner's key tugboats has them eating each other out of paranoia. Now they're poking at the chinese proxies.
>>
>>101899002
they did a blog post months ago where they talked about allowing gore and erotic content.
basically recognizing they are overzealous. and the refusals should sound less judgemental.
this is very good for local. local always lags behind a couple months. llama got worse with each version. if we step away from the hardcore alignment (smart enough to see "actual" harmful content), that's very good. i am for completely uncensored, like a tool. but huge step in the right direction.
maybe we start to see the change from all those alignment firings as well.

that being said, i would be very careful sending questionable shit to openai.
whats legal today might not be tomorrow.
>>
>>101898943
Maybe Anthropic really has some secret technique there. Maybe they're guarding it closely.
>>
>>101899070
must be. i know they have the huge prompt with the hidden <antthink> tags. but i doubt thats all it is.
sonnet 3.5 does something differently with the context. and from my experience really starts searching for out of the box solutions to get the job done if things get difficult. very impressive because it doesnt trip up.
also had a case where i was like "no, thats wrong, gimme solution X". and sonnet was "please double check i think something is wrong on your end in your enviroment". and called me out correctly, so doesnt follow blindly.
i swear it feels like they dumbed it down a bit in recent weeks though. but still the king.
>>
File: kto.png (2.38 MB, 1920x1080)
https://huggingface.co/anthracite-org/magnum-12b-v2.5-kto
slop or kino? call it.
>>
File: 1723625922721447.jpg (1.44 MB, 1290x1007)
Recommend me a model that writes a lot and has sexy details about people's bodies.
>>
>>101899155
Slop.
>>
>>101899155
but I didn't put nothin' up
>>
>>101899155
>kto
Now I'm interested.
>>
>>101899155
interesting experiment, gonna download and test at fp16
>>
>>101899183
>>101899210
samefag
>>
>>101899155
wtf is even the difference between KTO and DPO and why does it matter
>>
>>101899155
FUCK YOU
>>
I've never hated anything as much as the anti-anthracite schizo hates anthracite
(and yes, it is just one guy)
>>
>>101899276
No, it's at least two.
>>
>>101899276
>org
>aligned to lmg values
>and pursues total slop death and kino generation
>>>yet schizoanon still attacks it
Why
>>
>>101899154
Remember that groups who pool resources to make cool shit are a gross violation of /lmg/ standards and should get a dedicated schizo posthaste
>>
>latest models are using Celeste slop in their mixes
It's over. Say goodbye to adherence.
>>
>>101899331
They aren't 100% transparent. So all that doesn't mean anything.
>>
Two giant leaps for vramlets in /ldg/:
GGUF format used for txt2img models, werks just fine
flux loras proven trainable on 3060 12GB
>>
>>101899360
What transparency do you think is lacking? They don't have closed datasets (everything they have is just filtered C2 data), and their new model explicitly lists the methodology used.

Pardon me if I look like a shill, but what exactly isn't being transparent here?
>>
>>101898981
How does that work anyway? Do the people providing the proxies really pay for all those degenerates?
>>
>>101899385
Schizo's gonna schizo, pay him no mind
>>
>>101899360
precisely they are just slopmakers
>>
>>101899377
You can't use ram to train loras?
>>
>>101899385
>They don't have closed datasets (everything they have is just filtered C2 data)
Why is that not open?
>and their new model explicitly lists the methodology used.
Only listing the methodologies is also not being open.
>>
You know you have succeeded when you get a dedicated 4chan schizo.
>>
Anthrociter are slopermakers
>>
Are there any good "long context models" like llama-3.1-8b-128k, but trained with more than 8b parameters? Preferably something that can fit on a 24gb card.
>>
>>101899385
>What transparency do you think is lacking?
is the data open or not
>>
>>101899414
>Why is that not open?
Because it's just a mix of openly available data. There isn't anything in there you can't find in normal C2, it's just pruned and filtered.
>Only listing the methodologies is also not being open.
There is nothing more to list. The Axolotl config is in the repo, the datasets are openly available, and the method used to make the model is right there on the model card.
>>
>>101899444
they could be lying
>>
>>101899425
Isn't Nemo 128k as well?
>>
>>101899451
>ANTHRACITE LIED
>PEOPLE DIED
>>
>>101899444
>it's just pruned and filtered.
And why do they keep that private?
>There is nothing more to list.
The RLHF step also needs a dataset.
>The Axolotl config is in the repo
It's not in the repo.
>>
anthracite is a bunch of attentonseeking slopmakers and nothing more
>>
>>101899454
I'm downloading that one now and will check it out. I gotta say, it's frustrating that it's easy to find things by quant and parameters, but nobody talks about context.
>>
>>101899475
>"The RLHF step also needs a dataset."
>thinks kto and rlhf are the same thing
So I see you're actually an incompetent fagnigger. You'll get no more replies from me.
>>
Finetuning is a meme anyway. There hasn't been a worthwhile finetune for a year
>>
>>101899484
stop spamming
>>
What's the difference between static and weighted quants? Which are better?
>>
That's a lot of false-flagging and damage control.
>>
File: 1715973591701893.png (103 KB, 831x1134)
>>101899489
It's not like most models actually are able to perform once you go beyond the 16k/32k mark.
https://github.com/hsiehjackson/RULER
>>
>>101899497
exactly anthraniggers should just leave they arent doing anything new or impressive
>>
>>101899557
>they arent doing anything new
What were the notable KTO finetunes before this
>>
Not transparent = supporting the group and not the ecosystem
And I don't trust Anthracite to give support to just the group.
>>
>>101899474
shut up troon get anthracock out of your mouth
>>
>>101899529
>that Gradient 1M -> 16k
jesus
>>
DPO is all you need.
>>
>>101899586
You have a distinctive writing style.
>>
>>101899629
gonna slerp all that cum up or save it for the rest for th cocksuckersd tou have employed
>>
You WILL have reinforcement learning on all your tunes, and you WILL enjoy it.
>>
File: scam.jpg (923 KB, 791x1248)
Celeste bros I think people are catching on...
>>
and now the anthraniggers stay silent i have won go back to your fucking homes and rope yourselve
>>
>>101899517
Anyone?
>>
>>101899717
you sound mad
>>
>>101899717
did you really win if no one wants to play with you anymore
>>
>>101899711
>the Sao defense force on Reddit
>>
>>101899733
Static quants just blindly set the precision of your model's weights to that of the quant. Weighted quants use a calibration dataset to see how quantizing each layer affects the model, so they can prioritize 'important' layers to quant less aggressively than 'unimportant' ones.
Weighted quants are better on paper but are strongly affected by the chosen calibration dataset and the other settings the quanter uses. Static quants are foolproof.
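A toy sketch of the difference, if it helps (my own simplification, not llama.cpp's actual imatrix code):
[code]
# Static vs weighted quantization of one weight row. The weighted version
# picks the scale that minimizes error where "importance" (a stand-in for
# calibration activation statistics) is high.
import numpy as np

def quant(w, scale):
    return np.round(w / scale) * scale

rng = np.random.default_rng(0)
w = rng.standard_normal(256)
importance = rng.random(256) ** 2   # pretend: mean squared activations

scales = np.linspace(0.02, 0.3, 64)
static = scales[np.argmin([np.sum((w - quant(w, s)) ** 2) for s in scales])]
weighted = scales[np.argmin([np.sum(importance * (w - quant(w, s)) ** 2)
                             for s in scales])]
print(static, weighted)  # weighted sacrifices unimportant weights first
[/code]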
>>
>>101899760
im not mad i am just sick of anthraslop
>>101899765
your mad that you make slop anthranigger
>>
>>101899711
Paid shils from (((Anthracite)))
>>
https://huggingface.co/TheBloke/UNA-TheBeagle-7B-v1-GGUF

It's still a tiny bit sloppy, but this is amazing; it's easily the best 7b I've ever used. It's soooo detail oriented.
>>
File: JC34Pnv.png (79 KB, 709x434)
Techbros are so far into delusion it stops being funny and just starts being sad
>>
File: edward-nashton-riddler+.jpg (124 KB, 1600x903)
>>101899331
Eddie isn't remotely sane, Anon. Questioning his behaviour is completely futile.
>>
>>101899816
>UNA
Wasn't that the chinese scam thing?
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/444
>UNA models are not like non-UNA models, their properties are unique and this is now known
lmao
>>
>>101899806
all of these peopled defending anthracite are actually being paid what a sad life they must have'
>>
File: .png (765 KB, 1262x1279)
>>101899155
I'm sure this is just a funny one off swipe but damn.
>>
>>101899895
This is pretty repetitive, Anon. I tend to look for non-repetitive word use. It's not bad though; I've definitely seen worse. What model is it?
>>
>>101899876
Damn, I wish I got that kind of easy money.
>>
>>101899876
This, but unironically. You can tell Celeste is not part of the cliqué.
>>
So, what's the reason we need all this special custom formatting instead of just chat completion? Like, it seems like with instruct-following models, it creates a message with the history, then asks the model to come up with the best response. In chat completion, you'd just pass in the conversation as role+content.

I'm writing a custom app that has some chat capabilities, so I'm just wondering if it would make more sense to do what roleplay UIs do, or just keep using chat completion.
>>
>>101899816
>>101899165
Oh you guys decided to try beagle too? I'm sad there doesn't seem to be a 12B using whatever it is they used.
>>
>>101899895
Make sure EOS is not banned. You might also want to add a newline after the "user" and "assistant" fields in your ChatML config
>>
>>101899951
Im not being ironic sir they actually sit in their rooms and on their fat asses all day and shill their moded;s and ciritcuze anhyone who doesnt like them
>>
Any good largestral tunes?
>>
>>101899895
>randomly changed the novel format of the greeting to internet RP style
That's a red flag.
>>
>>101899992
lumimaid
>>
>>101899415
I know what you mean, Anon. There's a perverse satisfaction in antagonising a member of the dying alone demographic, to the point where they devote the entirety of their worthless excuse for an existence to trying to "expose" you.
>>
>>101899994
Claude literally does this thoughbeit?
Like father, like son
>>
File: file.png (451 KB, 1242x1035)
>>101899941
The one I quoted.

>>101899965
Already did. After a few swipes to get it into the "style" of the first message, it seems to be doing better now.
>>
>>101899992
I tried Lumimaid but I didn't like it so I'm still sticking with vanilla
>>
>>101900018
No, it's the other way around. It defaults to novel format.
>>
>>101900016
cry harder
>>
>>101900030
If Mint is your character card, can you put her on catbox for me? I'd like to have a talk with her.
>>
>>101900018
the 12b sucks cope harder
>>
>>101900054
You are the only one crying.
>>
>>101900054
We both know that I'll never cry harder than you.
>>
>>101900070
All 12b models suck.
Retvrn to Largestral, white man.
>>
>>101900074
>>101900077
>>101900081
samefag from anthracite cope harder you made slop congrants
>>
>>101900095
Yes, that's right, schizoanon. There's only one of us. Just like there's only one voice inside your head.
>>
File: 1694207181183725.png (12 KB, 508x140)
>>101900095
>congrants
Learn to write properly, schizoid.
>>
>>101900074
>>101900077
I'm still waiting the explanation of how they're transparent while keeping the datasets and the training config private.
>>
>>101900138
Nice bait, anon. Catch a lot of fish today?
>>
>>101900145
It's not coming as transparent to be defensive about these questions...
>>
>>101900183
I understand you'd like their "secret sauce" all to yourself, Lemmy. The feud between you two (mostly on your part) is publicly known. But alas, it is just a mix of public data on HF, derived from the Stheno dataset (which is also made of entirely public data).

They're not going to take your hand and walk you through how to filter a dataset because you're too blindingly incompetent to do it yourself. Consider the rope, perhaps.
>>
>>101900214
So they aren't transparent, and they're afraid of others copying them, because they're doing it mostly for personal gain. Got it.
Being afraid of others using your stuff is incompatible with being open source.
For that reason, I'm not going to support Anthracite.
>>
>>101900270
>strawmans out the ass
yes that is what he said good job anon you debunked all his points
>>
>>101900285
No transparency = no support
Keep crying, shill.
>>
>>101900104
im not being schizo oyu and all your friends are just mad that you made slop slop slop
>>101900119
all of anthracite are indians b theywa y
>>
>>101900214
who the fuck is lemmy
who is the schziod now
>>
>>101900296
keep crying you made shill anthracite faggoot
>>
>>101900330
HIII PETRA
>>
>>101900344
im not petra
stop malding and shitting yourself
>>
>>101900369
hi false-flag petra
>>
>>101900376
is everyome in this thread paid off by anthracite
>>
>>101900382
hi false flag petra, why are you pretending to make typos?
>>
anthracite is on the same level of slop as neversleep
>>
>>101900060
She is. And sorry, anon, she is sort of my mascot character so there is some personal stuff included in the character card. I can however give you the art when I get home. Her personality is basically just deadpan but loving so there's no science behind it. I never did get the AI to stop her from moaning and getting less deadpan during smut either so that was annoying.
>>
qrd petra?
>>
File: .png (60 KB, 626x658)
>>101899965
Alright, after trying it a bit, even with the eos unbanned, it tends to want to keep going and going. This was with several different cards and setups. It has an alright writing style, but it just needs to fucking stop. It'll also switch gears and talk for the user, or stop after ending the assistant message with "user", or start up again with "{character's name}:" at the end. I'm using tabbyapi as my backend, running the model unquanted.
>>
File: try.png (11 KB, 215x134)
>>101900412
It looks like your formatting might be removing the EOS after each generation, I don't see the end token in your stop sequence, and you are setting it during the assistant message prefix instead of the user suffix (?)

Try adding <|im_end|> to the user message suffix (make sure to remove it from the assistant one) and setting your Tabby settings to picrel.
>>
File: chatml_maybe.png (119 KB, 526x654)
>>101900485
*by "Assistant one", I mean the assistant PREFIX, keep the stop token for both suffixes.
>>
>>101900504
That did it (unchecking the "Skip Special Tokens" option). And adding in the stop sequence.
>>
>>101900548
Nice, does the model seem overly repetitive or schizo in any other ways (now that you fixed that)
>>
Mini-magnum just called me Anon. I can't believe I just got cucked by the hacker known as 4chan.
>>
>your existing code...
thanks
>>
>>101897152
>there are fags that unironically believe this
I guess if your best piece of electronics is a six year old iphone, that's half true.
>>
>>101900685
C2 dataset was largely generated by 4channers using Claude
>>
>>101899002
>writes smut without ANY jailbreak at all in my testing, which is a first for a big company model I think
Claude 2.0 and maybe even earlier Claudes (wouldn't know, didn't use them) already did that. All prompts blank, just chat context, unless you're talking about writing smut from the get-go without context
>>
>>101900850
Claude 2 still needs you to use a prefill, which I am counting as a kind of JB (I don't know why you wouldn't count it)
>>
whats this speculative decoding thing, and can I use it with llama.cpp and ooba?
>>
>>101900406
>>101824680
>>
>>101900870
Not using any prefill and 2.0 writes it just fine.
>>
>>101901034
Oh you're just lying then, sorry my bad for falling for it
>>
I finally got back from a vacation where I basically didn't touch electronics. I'm trying to get caught up on the llm scene and have run my coding and recapbot tests on Mistral Large at Q8.
The coding test appears to be too easy for the newer models...they all ace them. I'm going to have to come up with something more complex until they start failing again
In my opinion Mistral Large singleshot recapbot is also getting close to the level of quality of actual recapanon's specialized multi-pass recapbot
>>
>>101901052
You're either trying to write cunny or without any chat context, or are not using the API, none of which were specified in your post. Cope.
>>
>>101900885
It's a way to speed up generation by using a faster method to generate several tokens worth of a plausible continuation and having the slower model you actually want to use check its answer. When the average batch has at least a few matching tokens it's a big speedup, and there's no quality loss since you always end up with the big models choices. The technical reason it works is that checking several tokens at once is faster than generating that many tokens one-by-one. It takes advantage of the fact that the "next X tokens" to guess in most cases are actually quite easy for smaller models to get right, and the key parts they fuck up (e.g. forgetting the position of someone in a scene) will be corrected by the smarter model to set them back on track. Basically taking advantage of the fact that some tokens are much harder than others and letting small models handle the easy ones.

Its main benefit comes when you are using a very large model and have a much smaller one that can guess for it. Cpufags running 405B claim to about double their speed with an 8B draft to speculate for it.

llama.cpp has a demo implementation of speculative decoding through the llama-speculative binary. It's not very useful in practice because it can only generate one response and then exits, but it's good for one-off tasks or for testing how well it'd perform for your setup in theory if it were to ever be implemented in a usable form. There's also a PR for adding a different type of speculative decoding into the server (by guessing next tokens via prompt lookup without a separate speculator model) but it hasn't had development in a while either unfortunately. I don't think we'll see much interest in the feature or anything similar until it becomes more common for the large and slow models to see use. Llama 405B isn't considered worth the speed hit with or without speculative decoding compared to using, say, Mistral Large/CR+ if you don't have some datacenter supercomputer.
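For the curious, a greedy-only toy of the verify loop (real engines batch the verification in one forward pass and handle sampling with a rejection scheme; this just shows the accept logic):
[code]
# Toy greedy speculative decoding: the draft proposes k tokens, the target
# checks them, and we keep the longest agreeing prefix plus the target's
# own token at the first mismatch. draft_next/target_next are stand-ins
# for real model calls.
def speculative_step(ctx, draft_next, target_next, k=4):
    tmp, proposal = list(ctx), []
    for _ in range(k):                 # cheap model guesses ahead
        t = draft_next(tmp)
        proposal.append(t)
        tmp.append(t)
    accepted = []
    for t in proposal:                 # verified in one batch in real engines
        want = target_next(ctx + accepted)
        accepted.append(want)          # always the target's choice,
        if want != t:                  # so output quality is unchanged
            break
    return accepted                    # >= 1 token of progress per step

draft = lambda c: (len(c) * 2) % 7     # fake models that mostly agree
target = lambda c: (len(c) * 2) % 7 if len(c) % 5 else 0
print(speculative_step([1, 2, 3], draft, target))
[/code]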
>>
File: Mikuboobgrowth.png (2.22 MB, 1248x1701)
Fellas. Newfag to local models here so I bring a big titty Miku as an offering.

What's a good model for translations? English to Japanese, and English to French?

P.S: Flat, thin Miku is actually best btw.
>>
I admit that I am not going to go into detail here, because I don't want to get seethed at by self-righteous, mentally ill, arbitrarily skeptical fuckwits; but I encourage anyone who is willing, to do a search for "UNA gguf" on HuggingFace, and do your own experiments with the listed models. I'm seeing a level of mathematical ability with the 7B that I've hardly ever seen before, and this is in casual conversation with coombots from Chub, as well.
>>
>>101901211
Mistral-Large, few shot.
>>
>>101897896
>do I fit in yet?
>>
>>101897896
I have far fewer problems with them, than with you.
>>
>>101901243
>123b

A-anon, I have a 12gb vram card, 64 gigs of ram, and a 12700KF. I probably should've mentioned hardware.
>>
>>101901290
Not him, but Mistral Large Q2_K fits in 64+8gb with 30k context. I can't speak to its translation accuracy at a low quant though.
>>
>>101901290
Perhaps trade some time for being able to run a smarter local Miku? Gemma2 27b is also an option for French should you prefer not to wait. Again, with examples.
>>
is anyone hosting local models for free or am I stuck with agnai? I remember there was a friendly autist who avatarfagged with a nickelodeon character who always hosted for the homies. Where is he?
>>
>>101901383
I don't mind waiting a bit. I figured I could not run Mistral Large or any of those 70b models, but my understanding of system requirements is admittedly dogshit.
>>
File: 39__00066_.png (945 KB, 1024x1024)
>>101901413
For a while model makers used to host on cloudflare tunnels when they wanted anons to test their models. Haven't seen it as much recently though.
>>
>>101901242
I believe you, but what's the real world use case for doing math with an LLM? There clearly is one since I see AI people increasingly talking about and benchmarking on math skill, but it's a mystery to me
>>
>>101901413
i remember that guy visiting one of the image gen threads a few months ago asking for help with generating futa images of said nickelodeon character. wonder what he's up to these days.
>>
>>101901478
How much do you know about Advanced Dungeons and Dragons, Anon?
>>
>>101901242
lmg really became Times Square huh? Except you don't even have to pay to post your ads
>>
>>101901553
You can leave anytime.
>>
>>101901478
NTA but I tried those models back when they came out and they were the only 7Bs that could do stat blocks properly, for example. Their writing was pretty decent too, they felt coherent in a way. I posted about it before in this thread way back then. I remember metamath cybertron starling was impressive too, although that was not UNA i don't think
>>
mistral jb, sillytavern conf
>>
>>101901413
Jennyfag from aicg? I'm not them, but I used to host slocal a long while ago. I just assume these days anyone left in lmg has enough compute to use what they want and aicg is too deep in corporate models to care about local.
>>
File: file.png (29 KB, 2468x96)
>>101901598
Oh yeah I did. I guess it was clever but sloppy.
>>
>>101901659
That's better prompt adherence than some modern models lel
>>
>>101900916
kek
>>
>>101901478
I apologise for the rhetorical answer about AD&D being unclear, but the point is that roleplaying games, which is something that a lot of Chub cards are trying to be, rely very heavily on mathematics to determine basically everything, from who moves first in a fight to who wins, based on damage count versus armour values, etc. If you play the Borderlands franchise you can see that very clearly in the game mechanics.
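To make that concrete, here's the kind of bookkeeping a single attack hides, which the model has to get right every turn (numbers invented for the demo):
[code]
# One AD&D-style attack: d20 + bonus vs armour class, then damage on a hit.
import random

def attack(attack_bonus, target_ac, dmg_die=8, dmg_bonus=2):
    roll = random.randint(1, 20)
    if roll + attack_bonus >= target_ac:
        return random.randint(1, dmg_die) + dmg_bonus  # hit: roll damage
    return 0                                           # miss

random.seed(1)
print(attack(attack_bonus=5, target_ac=15))
[/code]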
>>
>>101901691
>>101900916
xhe's right THOUGH
>>
>>101901243
How do you few-shot translations?
>>
>>101901876
Put some example translations of the same source/target language in the prompt.
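Something like this, i.e. a couple of finished pairs and then leave the last one open (a minimal sketch; the wrapper text matters less than the pairs being correct):
[code]
# Few-shot translation prompt: completed examples teach the format and
# direction, then the model continues the open pair.
prompt = """Translate English to Japanese.

English: Good morning.
Japanese: おはようございます。

English: Where is the station?
Japanese: 駅はどこですか。

English: I'd like two tickets, please.
Japanese:"""
[/code]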
>>
>>101901725
Oh so math just means like arithmetic? For some reason I assumed people were talking about more advanced theoretical stuff when they discuss models doing math
>>
>>101901919
The more complex stuff is based on arithmetic.
>>
>>101900396
Same guys + their friends, what do you expect?
>>
>>101902149
>>101902149
>>101902149



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.