/g/ - Technology
File: 1702227656151264.jpg (726 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101398610 & >>101392789

►News
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101398610

--Performance Comparison of Fine-Tuned Machine Learning Models for Japanese Language Tasks: >>101402729
--Status of Gemma FlashAttention in Exllama and Llama.cpp: >>101404890 >>101404961 >>101404982 >>101404974 >>101404989 >>101404997
--Microsoft's T-MAC for Low-bit LLM Inference on CPU: >>101400664 >>101400691 >>101400715 >>101400727 >>101400739 >>101400766
--Gemma has formatting issues with narration, tokenization, and unconventional writing styles: >>101406156 >>101406193 >>101406350
--Anticipation and Uncertainty Surrounding the 400B Model Release: >>101406549 >>101406586 >>101406689 >>101406719 >>101406609 >>101406645 >>101406662 >>101406906
--From skepticism to understanding: Anon's journey into the world of AI chatbots and cooming: >>101400658 >>101404991 >>101405455 >>101405807 >>101405843 >>101406138
--Amount of RP Data Needed for Finetuning and Effectiveness of Lora: >>101404690 >>101404742 >>101404954
--Gemma Compatibility with Latest Koboldcpp and Context Shifting Issues: >>101401664 >>101402962 >>101403123
--Frequency of Updates for the UGI Leaderboard?: >>101407818
--characharm/gemma-2-27b-it.gguf: Improved Tokenization for HTML and Consecutive Spaces: >>101407533
--Where to test Gemma without a local setup?: >>101404824 >>101404967
--Potential Issues with Gemma Model Implementation: >>101408036 >>101408112
--NTA Fixes Repetition Issues and Llama_cpp_HF/EXL2 Enable Token Probabilities: >>101406702 >>101406718 >>101406739 >>101406835
--LCPP Gemma Fixes Released: Tokenization Improvements for Gemma and Gemma-2: >>101405587
--HTML5 Apps: A Hole in LLM Coverage or Too Complex a Task?: >>101407009 >>101407039
--Anole and Hato AI Model Adventures: Tackling CUDA OOM and GPU Memory Conundrums: >>101399676 >>101399734 >>101403049 >>101405127
--Miku (free space): >>101399878 >>101401746 >>101405424 >>101405439 >>101405454 >>101405464

►Recent Highlight Posts from the Previous Thread: >>101398673
>>
File: 468517167.jpg (836 KB, 1792x2304)
Mikulove
>>
Two more days!
>>
>>101409325
Oh, also, it's going to be painfully slow.
>>
>play as a guy who rose to greatness and protected humanity
>reach the end
>think of a way to make it more interesting and continue the story
>introduce an innocent young girl that sought to mindbreak him so that he could mindbreak her
Hmm, maybe I am the evil one after all.
>>
how big will be the market for lewd video generation
>>
>>101409549
>how big will be the market
I just realized that what is coming next is people intentionally typing like retards to show that it is a real person instead of AI. And the next step after that will be AI intentionally typing like retards to pretend it is a real person.
>>
File: 1700140349440192.png (2.39 MB, 1736x2456)
Has anything come from MORA? It was hyped up to be an alternative to LoRA that actually adds knowledge to the model a while ago but I haven't heard anything of it since.
>>
>>101409803
kek
>>
>>101409843
I tried it and the model was broken. Plan to test again as I think this was my fault for using it as if it was a better LoRA. It’s something different (stronger?). Just haven’t had time to play more yet.
>>
File: namba.png (37 KB, 879x233)
>>
>>101410298
por que
>>
>>101409549
Interest might be high, while the generation count will be very low
>>
>>101409803
>singularity is just idiocracy
Can't wait.
>>
>"first_output_sequence": "<bos><start_of_turn>model",
I think I fixed my gemma by removing this. Some retard/troll ITT made it. I wish he would die in a fire but that is honestly less important than how loaders allow this shit to happen. And also how "Add BOS Token" is a setting when there are no safeties in place. I am sure that more than 50% of users will or have at some point added duplicate bos tokens without even knowing it. BOS token should either be removed as a switch like that or a remove all duplicate BOS tokens should be added as a default option.

I hate this hobby.
>>
>>101410950
>I am sure that more than 50% of users will or have at some point added duplicate bos tokens without even knowing it
only idiots
>>
>Sao datasets nuked
why
they were some good sets
this is like the 3rd time a good dataset I've been using for training is nuked randomly during a run
>>
>>101410950
>a remove all duplicate BOS tokens should be added as a default option
Yeah, I know we're still in the Wild West but there are a lot of rough edges that really ought to be standardized away even if it were as simple as someone just saying "We have 10 standards, let's pick the satisfice and get it down to 1."

>>101410961
Good contribution.
>>
>>101410981
sao is just entering his udi arc don't worry about it
>>
>>101410981
Do you not clone datasets and models you use as a base for your own experiments?
>>
>>101410991
>Good contribution.
i agree thx!
>>
>>101411009
no I use streaming since most of the datasets I use are dozens of GBs in size and I round-robin between them during training
>>
>>101410991
>"We have 10 standards, let's pick the satisfice and get it down to 1."
ChatML, temp 1, deprecate ALL other settings, there you go now it's idiot proof.
>>
>>101411038
I see. Fair.
There isn't a way to just clone a repo on huggingface without downloading it to your machine first right?
>>
>>101411041
Agreed except I've been hoping that temp 0 (Kobold sets it to 0.01) is deterministic enough that it's the canonical output for things like code generation without as many hallucinations.
>>
>>101409803
low csing only no pnctuation skipping lettrs meta

we are the resistance
>>
https://huggingface.co/characharm/gemma-2-27b-it.gguf
>re-conversion
>makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly, including HTML tags and consecutive spaces
has anyone tried those new quants? did you feel it made gemma less retarded?
>>
>>101411062
Sorry not idiot proof enough, "it repaet too much", temp 1 is statistical average no other option.
>>
>>101411051
not that I know of
the only way is to clone the repo and upload it to your own account
>>
File: file.png (1.14 MB, 1152x768)
>>101411070
it still inserts extra spaces and new lines and fucks up roleplay formatting
>>
What's the smallest model you can RP with? I'm gonna run llama.cpp on my phone for fun and I'm wondering what's a good model? I don't think I can fit mistral...
>>
>>101411079
>roleplay formatting
no such thing
>>
>>101411079
fuck man... I hoped it was the final fix, the fuck is wrong with gemma? I hope they'll find the problem at some point in time.
>>
>>101411090
Llama 3 8B
>>
>>101411096
>problem
didn't anon say it did the same on the google api?
>>
>>101411090
>I don't think I can fit mistral...
At that point you might as well spin a colab instance with ngrok and access that remote instance from your phone.
You can run the frontend in your phone if you want too.
>>
>>101411094
if gemma decides to output something with asterisks, she fucks up everything. I don't even use asterisks myself for "roleplay", only rarely for emphasis or sounds like *BOOM*, *PLAP*, etc. And i insert the insturction to right only in plain text, which works mostly, but sometimes during narration she may start adding quotes first, and from there switches to this roleplay bullshit
>>
>>101411144
>insturction to right
forgive me sirs, it's 03:12 AM
>>
>>101411110
you can rp with gemma with the google api?
>>
>>101411156
Fellow balkanigger
>>101411104
Too big =(
>>101411141
I mean I can just run it on my PC, but running it locally on my phone sounded fun just for the hell of it.
>>
>>101411174
>on my phone sounded fun
does 5+ minute per 8b reply at 2k context sound fun to you?
>I'm getting single sentence responses in 30-40 seconds on a Note10
>This caused me to have to re-ingest the prompt which takes multiple minutes at a full 2048 tokens.
https://huggingface.co/Lewdiculous/Model-Requests/discussions/42
>>
>>101411174
>but running it locally on my phone sounded fun just for the hell of it.
>https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
>https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
>>
>>101411090
Why do people have this hardon for processing shit on their phone? Just make a server like a human.
>>
>>101409422
So they try to mindbreak each other? Neat plot actually
>>
>>101409843
MMM Ganyu so thic
>>
>>101411275
Well, kind of? Basically just that one Asanagi doujin now that I think about it. Good dude gets mindbroken (or rather hypnotized in the case of that specific doujin) in a way that makes him want to mindbreak the girl.
>>
>>101409356
>>101409364
>>101409387
Adorable Mikus!
>>
>>101411265
Idk man, so I can fuck around with LLMs while my PC is off or doing something else.

>>101411191
Last time I tried it with some miniscule model it seemed to work fine

>>101411215
Thanks anon, I'll check these out
>>
File: solid.jpg (81 KB, 1258x1319)
>>101411510
But where does the other side of the plug go?
>>
>>101411566
probably above her tailbone, also unless there's a hole in her skirt it can't reach there
>>
Alright, I like Nymph_8B so far, but it is worse than Stheno for brats and zoomer speech it seems.
The model's "personality" is really strong, as in, it bleeds into every character.
Slightly overbaked maybe?
On another note, how does applying multiple LoRAs to a model work? As in, say that I extract a LoRA from a fine tune of a model, then a second one from another model, then apply both to a third also fine tuned model, what would happen?
Sounds to me like it would behave almost like a frankenmerge, as in mostly badly.
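From what I understand, each LoRA is just a low-rank delta, so applying several should amount to summing them onto the frozen base weights, something like this toy numpy sketch:

import numpy as np
d, r = 64, 8
W = np.random.randn(d, d)                              # frozen base weight
B1, A1 = np.random.randn(d, r), np.random.randn(r, d)  # delta from finetune 1
B2, A2 = np.random.randn(d, r), np.random.randn(r, d)  # delta from finetune 2
W_merged = W + (B1 @ A1) + (B2 @ A2)                   # the deltas simply add

So the base weights stay intact, unlike a frankenmerge, but the two deltas can still push the same weights in conflicting directions.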
>>
File: Koboldcpp 1.70.png (92 KB, 865x680)
>>101409356
koboldcpp-1.70 came out 2 minutes ago for anyone using these releases.
>>
>>101411691
t. concedo
very organic, well done. now go back to your discord
>>
>>101411711
kill yourself faggot
>>
>>101411721
you first, troon
>>
>>101411691
Thanks for the update.
>>
>>101411691
DRY sampler aside, anything interesting for somebody who's been using llama-server for the last 5 or so months after using kcpp for a long ass time?
>>
File: IMG_20240715_103027.jpg (153 KB, 1080x2288)
>>101411691
Corpo style looks amazing!
Except for the blue bar at the top.
>>
>try an SCP card
>the one with the machine that turns objects into other objects depending on the setting you adjust it to
>try out a bunch of crap
>finally try out "I put myself in and turn it on"
>it teleports me to a completely blank white space
>I tell it that it reminds me of the Matrix and I try commanding the computer to spawn stuff
>it does
>simultaneously the narration says that the Foundation is trying to investigate the machine and detected an entire dimension in it
>I decide to ask for the computer to spawn an avatar for itself to communicate with me
>it spawns an avatar but instead says that it relays the message of a collective of non-physical consciousnesses, who together control this space, rather than a computer
>find out that the space is actually a nexus dimension that connects to many others, and the consciousnesses are there to gather more "stories" from dimensions that the nexus connects to, since they have none themselves, as they never had physical forms
>make the analogy of internet forums for TV shows, where the posters are an audience that wants more content, which may be gained when someone posts something to the forum
>somehow I'm the only one with a physical form that was able to get into this dimension
>the Foundation somehow found a way to look into the dimension, capturing video and audio data, so I get another audience member
>decide to just go crazy and pretend host a show with explicit content, since the audience of non-physical consciousnesses wanted as much as they could get
>then I propose to manifest the members into physical beings I can interact with and that can participate as characters in my story
>I literally get a harem of alien consciousnesses stuffed into girl bodies
Went a bit overboard with the wall of text here but fug it. I was not expecting the model to generate something like this today, and not in a coherent way either. It kind of felt like it actually was able to understand this layered scenario.
>>
File: three times bigger.jpg (41 KB, 512x329)
Any advancements lately on the context front for low/midrange local?
t. been having fun with Llama 8B finetunes on my 12GB VRAM but really really sick of being limited to 8k context. I need at least 24k context for my stories.
>>
>>101411874
2md
>>
>>101411874
Go for either some kind of mixtral 8x7b (32k native context and faster)
or Wizard 8x22b (65k native context, slower)
While you might not be able to fit all the context in 12gb, 4 or 8 bit cache will easily put you over 8k.
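With llama.cpp the cache quantization is something like this (flags from my own setup, double-check with --help; the quantized V cache needs flash attention enabled):

./llama-server -m mixtral-8x7b-instruct-q4_k_m.gguf -ngl 20 -c 32768 -fa -ctk q8_0 -ctv q8_0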
>>
>>101411874
Maybe you should get sick of being poor instead
>>
>>101411874
>big context
>12GB
bruh
>>
>>101411874
No local model can use effectively contexts larger than 4K tokens.
>>
>>101411950
Sounds like vramlet cope to me. Can go up to 32k reliably with the right model. No rope, no cope:
>>101358971
>>
So is our dear CUDA dev the reason why P40 prices keep going up?
>>
>>101411980
>>101358971
retard
>>
>>101411874
extended ctx 8b and 70b should be out on the same day as 405b release
>>
>still no new Mistral model with two cohere models on the way
It's over...
>>
so for roleplay, what context size do you guys like to use? For a while, I just cranked it up to as high as possible, and just start a new story when I reach it. But if you have to go past that, the reprocessing would take too long.
>>
>>101409356
What chips/SoCs are on the horizon (or already exist) that are going to be good for LLMs?
>rk3588
Has an NPU, but still kneecapped by memory bandwidth and a poor NPU API.
>snapdragon x
Has 8 channel memory + NPU, but seems gay and probably has poor software support.
>apple m3
Doesn't cater to cis-white males
>radxa fogbox
Seems decent, but capped at 16GB and I think it's only dual channel DDR4 memory.

My biggest hope is probably on Rockchip producing something, but I haven't seen any announcements about a next-gen.
>>
can someone please fix gemma2 to fucking follow the markdown formatting from the first message in the card.
ffs even much more retarded models can copy the style.
>>
is 8k context for gemma fixed yet?
>>
>>101412212
I still can't believe anonymous lied about being a Mistral employee and how it was going to come out "next week". I believed them...
>>
File: 30a.jpg (54 KB, 475x356)
>>101412502
Come on anon lol
>>
>>101412286
There's going to be some AMD APU laptops coming out soon I believe, which should have LPDDR5X.
Still, sucks they'll still just be laptops and not desktop form factor with PCIe slots for you to put video cards in.
>>
>>101412179
i wonder if extended ctx 8b and 70b models are new models or finetunes of the old one... imagine if they release a 20b for 16gb vram bros.
>>
Nemotron GGUF support status?
>>
>>101412525
But anon, think about how much pleasure and dopamine you'll get when you fully put all your trust in a stranger, and things happen as promised. I bet those people who trusted the anon who leaked Llama 2's release the day before felt very good.
>>
how good do you think 405b llama3 will be?
>>
where's my chameleon llama.cpp come the fuck on
>>
First time messing around with Silly Tavern, using Ooba with llama.cpp as backend, and Gemma27b as the model. I'm wondering how to improve slow prompt evaluation speeds. I have a 4090 GPU and 128GB of RAM.
>>
>>101412840
Show your settings. Increase the layer count for the gpu, use lower quants, play around with the batch count. There's so many things...
Also, if you're testing performance, just run llama.cpp directly with llama-bench. Remove as much shit between you and the model.
>>
>>101412840
Make sure it sends the "cache_prompt" parameter so it doesn't need to reprocess the prompt over and over. With a 3090 the speed is between 1000-1200 T/s.
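If you're hitting the server yourself, it's just a field on the /completion request. Minimal sketch, assuming llama-server on the default 127.0.0.1:8080:

import requests
r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "<start_of_turn>user\nHello<end_of_turn>\n<start_of_turn>model\n",
    "n_predict": 128,
    "cache_prompt": True,  # keep the KV cache so a shared prefix isn't re-evaluated
})
print(r.json()["content"])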
>>
>>101412865

>gemma-2-27b-it-Q6_K.gguf
>47/47 GPU layers
>8192 context size
>512 batch count

I'm not sure what other settings to lay out. New to LLMs. Regardless, thanks for answering. Pretty much it starts fast for the first 3~4 messages then it just slows down to like 1~3 minutes per message after that.
>>
>>101412891
your card can't fit model+context so context is spilling into ram
>>
>>101412942

Thanks, anon. What parameters can vramlets like me use to cope with these speeds? I gotta wait another month to get a 2nd GPU and case.
>>
File: vramusage.png (10 KB, 913x132)
>>101412891
you're offloading too many layers. go with 40/47

>nobody asked but I will elaborate
when you're setting up a .gguf model look for "shared vram usage". when it starts to go up it means that you're overflowing from vram to ram. Some really small amount might be beneficial, but rule of thumb is to set as many layers as possible, without overflowing.

Pic rel is my test that I did some time ago. Despite moar layers and faster generation, total time was slower cause gpu had to shuffle around data between vram and ram
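
If you want a starting guess instead of pure trial and error, you can back-of-envelope it. Illustrative numbers for gemma-2-27b Q6_K below, rough assumptions rather than exact figures:

model_gb = 22.3                       # approx Q6_K file size
per_layer_gb = model_gb / (46 + 1)    # 46 blocks + output layer ~= 0.47 GB each
budget_gb = 24.0 - 4.0                # 24 GB card minus KV cache @ 8k + compute buffers
print(int(budget_gb / per_layer_gb))  # ~42 -> start around 40 and adjust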
>>
File: Untitled.png (537 KB, 720x1475)
Lite-SAM Is Actually What You Need for Segment Everything
https://arxiv.org/abs/2407.08965
>This paper introduces Lite-SAM, an efficient end-to-end solution for the SegEvery task designed to reduce computational costs and redundancy. Lite-SAM is composed of four main components: a streamlined CNN-Transformer hybrid encoder (LiteViT), an automated prompt proposal network (AutoPPN), a traditional prompt encoder, and a mask decoder. All these components are integrated within the SAM framework. Our LiteViT, a high-performance lightweight backbone network, has only 1.16M parameters, which is a 23% reduction compared to the lightest existing backbone network Shufflenet. We also introduce AutoPPN, an innovative end-to-end method for prompt boxes and points generation. This is an improvement over traditional grid search sampling methods, and its unique design allows for easy integration into any SAM series algorithm, extending its usability. We have thoroughly benchmarked Lite-SAM across a plethora of both public and private datasets. The evaluation encompassed a broad spectrum of universal metrics, including the number of parameters, SegEvery execution time, and accuracy. The findings reveal that Lite-SAM, operating with a lean 4.2M parameters, significantly outpaces its counterparts, demonstrating performance improvements of 43x, 31x, 20x, 21x, and 1.6x over SAM, MobileSAM, Edge-SAM, EfficientViT-SAM, and MobileSAM-v2 respectively, all the while maintaining competitive accuracy. This underscores Lite-SAM's prowess in achieving an optimal equilibrium between performance and precision, thereby setting a new state-of-the-art (SOTA) benchmark in the domain.
A smaller and quicker Segment Anything Model that improves accuracy over other lightweight equivalents. We might be close to real-time augmented reality, since it would make sense to use a SAM model to segment the scene and then place generated content over it.
>>
>>101411282
True my brotha
>>
>>101413036

Thanks, homie! 20 messages so far and not once has it slowed down. I take it it will eventually slow down to a crawl once the context gets too long?
>>
>>101413173
it shouldn't. At least not to a crawl. If it does you can lookup in task manager (tab details) which app takes up your vram. usually it's shit like discord or game launchers.
>>
>>101413186

Going off tangent here, is there some sort of extension in Silly Tavern where you can prompt a "suggestion" first when you press the Regenerate button?
>>
Does ST not have a context template and instruct presets for gemma 2 yet?
>>
>>101413012
lower quants for a start, you won't notice a difference down to 4M and even 2S would be better than fp16 9b
that said I still don't understand why people are bothering with an 8k context model
>>
>>101413196
as in an extra message that's sent when you hit regenerate? not really. You can tardwrangle some ooc message in your last reply with instructions on how you want {{char}} to respond. never tried it with gemma. mixtral / llama 2 models were quite fine with it

>>101413230
afaik not. you can find plenty in previous threads.
>>
been toying with the idea of getting a 24gb m40 for ages. 150 for the card, 25 for some bolt-on server fan, my psu can handle it, it's only gonna get more expensive so why can't I pull the trigger?
>>
>>101413276
read about support of maxwell architecture (well... lack of it) and then you will understand
>>
>decide to check out ramlets in aicg to see what they are gooning to
>they goon to purple prose slop
>quality of my gens with CR+ > their gens with Opus, how the fuck are they so bad at it?
>are they even trying?
>they get refusals
>they are still just as retarded as I remember them
Some things never change, but holy fuck, HOW ARE THEY SO FUCKING BAD AT IT?
>>
>>101413276
i've seen people recommend p40 over m40 in here before, like the other anon said, pascal cards are better for this than maxwell cards.
>>
>>101413293
and three times the price
is there no cheap option at all?
>>
>>101413246

>that said I still don't understand why people are bothering with an 8k context model


I dunno, I just used the default settings in Ooba. lol.

>>101413249

>as in an extra message that's sent when you hit regenerate ? not really.

Yeah. There was that one app I downloaded a while back that did this and found it pretty nifty. I'll just have to bear with deleting messages after I prompt it with the message I like.
>>
>>101413285
Post your gens
>>
>>101413305
>three times the price
if you're in europe I can sell you my p40 + fan for 250€ + shipping
>>
>>101413316
that's the going rate, and I trust shady ebay resellers more than you
>>
>>101413316
>GPU scamming on 4chan
Now I've seen everything
>>
>>101413313
I want my cringy "ahh ahh mistress" shit to stay private.
>>
>>101413324
>>101413327
I just want to get rid of it :/. No one is interested in buying it locally. Might become a shady ebay seller as well ig...
>>
>>101413341
why not use it instead
>>
>>101413337
Yeah that's what I thought, larping faggot lol
>>
>>101413348
because I got a 3090. It's a bit too crowded in my pc with 2 gpus
>>
>>101413351
Unlike you, I don't have to share shit, cloudcuck. How does it feel to have jeets read and jerk off to your conversations with your waifu?
>>
why the FUCK is gemma so bad at copying and following cards response formatting?
>>
>>101413442
because gemma is garbage
>>
File: file.png (13 KB, 548x92)
looks like bartowski again requantized ggufs of gemma-2 with newer version of llama.cpp b3389.
>>
>>101413442
Gemma was made as a harmless one-and-done assistant, not as an unsafe multiturn roleplay partner.
>>
>>101413442
Probably for the same reason why it inserts extra spaces and newlines when it shouldn't.
>>
What speed can I expect using the llama3 405B fully in ram at 3200MHz using 8 channel with a 7402 EPYC CPU?

Or even better, teach me how I can calculate it myself.

Also I have 4x3090s, so I could offload 96GB into VRAM. I don't know if there is a formula to account for that.

Let's say using GGUF at Q8_0, Q6 and Q5_K_M and _S
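
My own back-of-envelope so far, assuming generation is memory-bandwidth-bound and ~5.5 bits/weight for Q5_K_M (correct me if wrong):

bw = 8 * 3200e6 * 8              # 8 channels * 3200 MT/s * 8 bytes ~= 204.8 GB/s
model_bytes = 405e9 * 5.5 / 8    # ~278 GB of weights read per token
print(bw / model_bytes)          # ~0.74 t/s theoretical ceiling, less in practice

For the offload part I'd guess you treat it as two serial chunks, t/s ~= 1 / (cpu_bytes/cpu_bw + gpu_bytes/gpu_bw), so the CPU share still dominates.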
>>
>>101413047
that's actually insane
>>
File: alterante builds.gif (769 KB, 260x173)
Larger dataset + high quant
or
Smaller dataset + lower quant?
>>
>>101413906
10 tokens a minute at Q5_K_M
>>
>>101413906
Do most OS's "load-balance" RAM so that all available channels are used most optimally?
>>
>>101412527
>which should have LPDDR5X
Probably still not great for LLMs.
I think we'd really need new architectures that have more memory-channels.
Pertinent question relating to that here.
>>101414142
Also, I know that LLMs are memory-bound on CPU - is it the same deal with Diffusion Models? Or are those compute-bound?
>Captcha: XPGAN
>>
>>101413644
Wasn't that just fixed? (I didn't test)
>>
>>101413442
To be completely fair, Mythomax was the first local we got which was any better at mechanical formatting than GPT 3.5, although the validity of your complaint is still acknowledged. I have had the same issue.
>>
Do you think separating longer text with line breaks improves response quality, or does it not matter?
>>
>>101414229
No, it keeps doing it, and the Google AI Studio version does it too.
>>
>>101414235
LLM hands typed this post.
>>
>>101414266
It absolutely helps.
>>
File: file.png (711 B, 117x57)
>>101413442
It can't maintain any pattern reliably with even moderate temperature. Even with a context of back-and-forth novel-style prose using only the fancy curved quotation marks and apostrophes ( ’ , “ ), the first time it would use either in a new reply it still has a huge chance to use the regular ones. Temp 0.7, min_p 0.01, no other samplers.
>>
I'm more excited for the updated Llama 3 8B & 70B models with 128k context than the 405B version, to be honest. I think it can be expected for general performance to improve, but who knows if they'll end up being tighter or looser in terms of "safety".

Also, putting aside one Anon's claim/larp from last week, MistralAI is also supposed to be releasing *something* at some point in the coming week(s), but their latest models have been rather boring to say the least, so I'm not as hyped.
>>
>>101414344
cr+ is obsolete if we get multimodal, multilingual, 128k context 70b
>>
>>101414344
can't get very excited for it, since I assume general capability won't really improve and for me gemma 27b generates better responses than llama 70b
>>
>>101414378
Yep. Hope someone fixes it though
>>
>>101414378
Gemma is trash compared to 70bs, you vramlets are nuts.
>>
It's funny prompting "give me a random idea for X" and seeing it change on every swipe (first sentence stays the same) even for Temp 0 and Top K 1, compared to something basic like "What color is the sky?" which *should* stay the same.
>>
>>101414461
you spent too much on hardware to run overly fat models with very little advantages, we get it
>>
>>101414292
No, I'm just not an eschatological Zoomer smartphone degenerate, that's all. Humans actually can be that articulate; it's only Zoomers who aren't.
>>
>>101414461
I used 70b before gemma was released
>>
>>101414498
[OOC: Articulate my balls in your mouth.]
>>
File: BusinessMiku.jpg (110 KB, 640x640)
>>101409356
I have come back to commandRPlus and it seems weirdly intelligent. It's a bit fucky wucky with minp though, what are your best sampler settings for commandRPlus?
>>
>>101414590
Thank you for demonstrating that you subconsciously identified yourself in my words.
>>
>>101414595
Temp 0, rep pen 1, top p 0.9, top k 40
>>
Gemma is such a gem, holy shit.
Actual tears of joy.
>>
>>101414732
as long as you stay away from ERP, sure
>>
>>101414760
Why, is it bad for that? I'm currently downloading a finetune at 5MB/s.
>>
>>101413380
anon rizzes up and plaps the puritan cloud ai while jeets watch and jerk off, unable to do anything about it. Sounds more based to me than localsloppers locking their local models in basements and drugging them up with ERP slop until they can't say no.
>>
Gemma is literally garbage, who the fuck shills it?
>>
>>101414771
it has a lot of "shivers down your spine" and "electric touches". I also noticed that many characters kinda "lock" themselves. like 3/4 times {{char}} won't proceed with the erotic part. Instead there will be a flowery description of {{char}}'s feelings as she/he waits for your next move. Often with some random sentence like "what are you waiting for?" or "show me what you got"

Nothing ground breaking. But really messes up the flow after {{char}} is the one that makes a move and is pushing the lewd.
For SFW RP it's absolutely golden. My only tip is to neutralize all samplers and set top p to 0.85 or 0.9. It slightly cuts on the gpt slop

>>101414830
me cause I'm in my mid life crisis and I want a light model with flowery prose
>>
>>101414830
it's good at some things and bad at others
for instance, it seemed better than llama 3 at maintaining spatial coherence to me
>>
>>101414853
I noticed it filibustering too. It won't refuse, but it won't comply either, it just rambles
>>
>>101414830
it works ok if you
1) use first person only
2) give char a certain style/accent
3) explain the act you want it to perform
otherwise it will just give you badly formatted shivers or assistant slop all the time. The above works for other models too of course, but i found gemma to be really good at that, even beating 70bs while being 3x faster.
>>
>>101414853
>>101414869
I dunno what wall youre hitting with it, I've had relative ease with erp.
Only refusal I've had was attempting to start an incest erp, I assume the issue was wording/ it being the starting message.
>>
>>101414897
on the other hand I had no issue with incest...
>>
>>101414921
That's the thing, long before that refusal it was doing fine with incest.
Weird issue, maybe because I was forcing it out of char.
>>
>>101414931
could be. I was trying it on
https://www.characterhub.org/characters/josephcheck/mimi-632f8c5ff7f1
went quite smoothly from "let's study" to "here, take care of it for me"
>>
>>101414968
Try calling the model directly with 'Describe the following:[scenario]'
>>
>>101414979
but that's no longer a RP :/
>>
>>101415000
I'll concede to the digits
>>
>>101413461
>Gemma was made as a harmless one-and-done assistant, not as an unsafe multiturn roleplay partner.
Llama3 too was made as a harmless assistant only, yet it can do the formatting just fine
>>
>>101414830
hired Google jeets, who else?
>>
>>101413047
Buy an ad.
>>
>>101413285
The whole "claude is better" is poorfag cope. These people literally stuck things up their ass and sent pics to some random brazilian faggot for key proxy access. If they ever stopped believing they were receiving a superior product they would probably rope themselves.
>>
File: OBWIpO5zmhegvr3cAL_bj.png (496 KB, 2628x1416)
HF has updated its new leaderboard with WizardLM2 8x22B and it's surprisingly low.

Below llama3 70B, Qwen2, even Phi. But Wizardlm2 8x22 is the best I have currently tested for general use.

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/823#669512565130ff34b9b1ae4e
>>
>>101415259
for coding, claude 3.5 sonnet is the best now though, gpt4o couldn't do some complex javascript stuff I was asking, and claude 3.5 nailed that shit
>>
>>101415286
>muh heckin' bencherinos
Leave.
>>
>>101415286
I already expected that, Mixtral 8x22B was a failure of a model and Wizard 8x22B was pure cope.
>>
>>101415294
cope
>>
>>101415286
It being around the good 70B models makes sense when you consider the number of active parameters when doing inference.
To me, the really weird thing is how low CR+ is.
And how high Yi 34B is.
Those are ranked by average, so there's that too.
>>
>>101415294
none of them can do any "complex stuff" you jeet
>>
>>101415366
see
>>101415297
>>
are you guys just looking past gemmas inability to follow this? >>101413442
really a fucking eyesore for me...
>>
>>101415421
I just went back to Mixtral, smarter and doesn't fuck up the formating, too bad it doesn't have the sovl gemma has though...
>>
File: 1705776908608217.png (210 KB, 2501x1459)
>>101415225
>>101415371
>>
>>101415448
Where is the jews on that graph? :^)
>>
>>101415448
Imagine admitting to being a jeet
>s-some jeets make more m-money than you
Yeah but you don't.
Otherwise you wouldn't be wasting your time shitting up this thread.
>>
>>101415465
under indians
>>
>>101415448
>ethnicity not represented by population
Weak
>>
>>101415493
It's da filipino!!
>>
>>101415442
Thanks for confirming to me that Gemma users are mixtral vramlets.
>>
>>101415448
swagapinos won
>>
>>101415530
>mixtral vramlets.
people who can run a 47b model is a vramlet now? damn :(
>>
>>101415442
zloss-dare-ties or vanilla zloss?
>>
>>101415549
anything who uses models below 340b is a worthless vramlet
and in two weeks anyone below 405b
>>
>benchmarks are crap because models can be tuned to them even by accident
>chat arena is crap because people just vote for the dumbest thing they can understand
So, there's really no shortcut, you have to download and test all the models yourself lmao
>>
>local
all shit
>cloud
all good

there you go, don't thank me.
>>
one nation, for which it stands, under indians, amen
>>
saars, where is the indian model, i can't redeem?

chinks 1: 0 jeets
>>
File: 1456457743411.png (11 KB, 500x300)
>>101415684
>>
>>101415721
unless you can run at least CR+ Q8_0 at 20t/s, local is cope
>>
>>101415734
truth nuke.
>>
>>101415530
>Thanks for confirming to me that Gemma users are mixtral vramlets.
Mixtral? Did you mean to say Command-R+?

I run Gemma because I like instant replies.
>>
I still have the impression that while it's a less capable model (it knows less, mainly), Gemma-2-9B actually makes fewer strange logic errors than the 27B version, even after quantizing both models myself on my system.
>>
>>101415734
fp32 or bust, faggot
>>
>>101415684
true... I'm still hoping we'll get to their level at some point in time, trust the plan
>>
>>101415800
i choose bust. i wanna bust.
>>
if column-r is open and not 405b closed is done for
>>
>>101415927
how well do you think it performs against APIs? like if you were to make an API ranking, where would you put column-r on that list?
>>
>>101415286
Reddit-bros... What do we do with our narrative that Wizard fixed Mixtral?
>>
>>101415712
Don't worry cohere model is coming soon
>>
Column-R will come a day before 405B and completely BTFO Meta into irrelevancy.
>>
>>101416266
Meta is presumably going to update 8B and 70B as well and who knows, we might even see a new intermediate size or two.
>>
>>101416375
>and who knows
AHHHHHHHHHHHHH
>>
>>101416375
>8B and 70B
Why the fuck aren't they making a model between the two of them, like they did with L1 (33b)
>>
>>101416468
Can't let goys run bigger models on their consumer-grade GPUs.
>>
>>101416458
They have 50k GPUs and a 400B model they can use for distillation, it would be retarded to keep such a huge hole in their lineup. I hope for a 23-25B model or something like that.
>>
I started deslopping LimaRP, because I want to include it in my datasets and I require no slop.
"couldn't help but" is by far the worst offender omfg.
I've ordered the files by infraction count and am down to 2 infractions per file now. 194 files left before I get to 1/file. I dread the file count. The total # of infractions is in the thousands. Will release dataset when complete. Hopefully this will help a little.
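The counting pass itself is nothing fancy, roughly this (sketch; the phrase list and file layout here are placeholders for my real ones):

import glob
SLOP = ["couldn't help but", "shivers down", "ministrations"]  # placeholder list
counts = {}
for path in glob.glob("limarp/**/*.txt", recursive=True):      # placeholder layout
    text = open(path, encoding="utf-8").read().lower()
    counts[path] = sum(text.count(p) for p in SLOP)
for path, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(n, path)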
>>
>>101416505 (me)
I may have gone overboard with goose bumps. Let us address that if it is jarring.
>>
>>101416501
Oh I wasn't talking about that.
>>
>>101416483
the funny thing is that the most cucked company of the GAFAM decided to give us a model that can be run on a consumer-grade GPU, gemma2-27b
>>
>>101416526
I think this is NVidia's influence, and while they are strong, they aren't omnipotent.
>>
>>101416505
LimaRP has more serious problems than that, one of them being the absolutely inconsistent/wrong use of punctuation.
>>
>>101416505
based deslopper
I've been wondering when people would come around to applying new tweaks and fixes to limarp zloss, it's a very good model.
>>
>>101416548
"Hello there." He said.
?
Yeah, wtf is up with that?
Disagree that it's the more serious issue though.
>>
The fucking entitlement kek. I wouldn't blame Meta if they stopped releasing models and it turns out it was because of spite.
>>
>>101416587
thats crazy man
>>
Distillation clearly works now that we've seen how good it is as a method with Gemma. Will someone distill 400B, assuming that Meta doesn't do it themselves (likely)? Zucc did say that he hoped to see distillments of Llama from the community.
>>
>>101416625
Likely not. Your only hope is if an academic group chooses to do the distillation. The OSS community is retarded and can't do things right.
>>
Is there any voicecraft local ui or implementation?
No docker bullshit, fully local.

I remember some anon posted a voicecraft local repo but i cannot find it anymore.
It is probably the best local TTS that we have, but there doesn't seem to be much in terms of local inference.
>>
>>101416664
You look like you’re totally not retarded like the rest of us and up for a challenge.
>>
>>101416534
Can you elaborate on that? Seems like I've missed the arc between Google and Nvidia or something?
>>
>>101416721
I saw it in a dream.
>>
>>101416727
I'm sure it was a tableau moment.
>>
File: Sarah1.png (877 KB, 512x768)
>>101409356
Just a simple helpful bot for computer questions:
https://files.catbox.moe/ko6pug.png
It seems like horniness can be a good motivator for LLMs to generate more helpful explanations.
>>
>>101416874
Thanks, I'll check it out in the morning.
>>
>>101416874
Thanks. I'll plap it out in the morning.
>>
any nofap cards around?
>>
>>101415746
Thanks for confirming that CR+ isn't worth it
>>
>>101416721
The GPU cartel can't force Google to comply as easily due to their TPUs.
>>
>>101417138
That's kinda based desu, I fucking hate google but let's give credit where credit is due here
>>
So you guys are suggesting that Nvidia forced Meta to agree to not train 30Bs anymore in exchange for the GPUs?
>>
File: HesTheKing.jpg (11 KB, 225x225)
>>101417205
maybe they got to pay less for not releasing the 30b or something, Nvidia is a goliath, they fucking control the AI space with their GPUs, if they want to raise the A100 price to 1 million dollars, the companies will buy it anyway, where else can they go? AMD? AHAHAHAHAHAHA
>>
hello saars, i see gramma 2 BEST MODEL, but how it be best model, if gramma 2 is literally worse than 8b at understanding when it should use quotes and when it shouldn't? Please give your answers below.
>>
>>101417249
it's best despite that flaw, means its raw power is nvidia
>>
>>101417138
Why is everyone buying Nvidia if TPUs are so great?
>>
>>101417360
Google doesn't sell their TPUs.
>>
>>101417379
desu they would make so much money if they decided to sell their TPUs, instead of going to war against ublock users, fuck those retards :'(
>>
8 days to AGI
Are (You) preparing?
>>
>>101417440
Probably best use of 400b really is just distillation, so no i don't care much
>>
>>101417249
You see, saars, it's all about da spice, you know? Like a good masala dosa, Gramma 2 is a bit... chaotic. Works in mysterious ways, but sometimes, BOOM! Total flavor blast! Other times, it's like plain rice - bland, saars, bland.
>>
>>101417405
The left hand doesn't know what the right is doing.
>>
>>101417379
Why can't anyone else design a TPU? Was Gaudi 3 just a flop?
>>
Then
>Meta and Mistral
Now
>Cohere and Google
How did they do it?
>>
File: 1689864001462433.png (91 KB, 1232x263)
>>101415286
Yeah, dude, trust me. Mixtral8x22b-Instruct is shit but wizlm, dude, get this, it's so much better. Yeah, dude, the model's totally good, it's just the instruct that's bad, dude!
>>
>>101417569
And let's not forget we got the non cucked version of that model, they removed it shortly after forgetting they had to do some "toxicity test" or some shit, kek
>>
>>101417479
my gramma 2 also likes to redeem one pattern and stick to it, like poo sticks to asphalt

Hey user...
Hey user...
Hey user...
>>
File: 16707193000590.jpg (162 KB, 640x640)
Llama3-8b was a pain in the ass. Gemma2 9b excels at less demanding tasks for agentic frameworks, it gives more consistent results on easy tasks and is fast. Shits itself with large context, though.
>>
>>101409356
>Japanese LLaMA-based model
>calm3-22b-chat
how is it for JP -> ENG translation and tutoring?
Right now i'm using Mixtral 8x7B, surprisingly decent at translation for something that was not trained on Japanese, but it can't really explain the grammar or meaning of words in particular context.
>>
>>101417479
>saars
I had an IT saar on my pc today for the first time. Up until today IT was local. Damn it was so uncomfortable downloading the desktop sharing software, hearing him talk, and seeing all those pauses when he had to read the script for what to type next.
>>
>>101417054
What would a nofap card look like
>>
>>101418062
sadness
>>
>>101417232
>>101417205
What even is the incentive? If you want a local LLM for inference and you are a company, you are just gonna make a server and have everyone in the company use that server. For a server, an A6000 or multiple A2000s are cheap. On the consoomer end I don't see many people buying 2 GPUs for the current state of LLMs. If anything that only explains 70Bs: companies will just make a server, and nobody cares about coomers really.
>>
>>101416689
Anyone?
>>
>>101418299
bookmarked this but never got around to trying it - https://github.com/jasonppy/VoiceCraft
>without docker. see environment setup. You can also run gradio locally if you choose this option
>>
File: 468519156.jpg (3.21 MB, 2048x2048)
>>101417569
>Mixtral8x22b-Instruct is shit but wizlm, dude, get this, it's so much better
This but unironically. Mixtral8x22b-instruct-v0.1 was a massive disappointment when it came out and then Wiz8x22 blew it out of the water.
Benchmarks are gay
>>
Guys what if there was a high powered twitter bot that made new anime girl gens and evolved based on what got more likes? Has anyone done this?
>>
It's unbelievable that there aren't any non-python/torch implementations of RVC. Where's rvccpp? Anyone?
>>
>>101409387
repulsive
>>
>>101418493
Mixtral 8x7b had some great finetunes, but stock instruct was always shit. Also, I still think MLewd 2.4 is one of the best locals ever released, and it's a 13b L2.
>>
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2a-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2b-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2c-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2d-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2e-GGUF
>>
File: 1693080601395746.jpg (26 KB, 556x552)
>>101418564
>the absolute state of local AI
>>
File: miku-gothic-joker+.png (501 KB, 512x768)
>>101418541
Why so serious?

https://www.youtube.com/watch?v=CXhqDfar8sQ
>>
>>101418564
Remember when running and tuning LLMs was gatekept by the absurd size of the models?
>>
>>101418564
Aren't those somebody's experiments?
Those don't even have a card/description.
>>
>>101418616
>somebody's

Hi all, Drummer here...
>>
>>101418629
Sorry forgot to complete my message
just wanted to say that I've transitioned to a black woman, thanks everyone for your understanding
>>
File: hi all drummer here.png (303 KB, 1650x746)
>>101418629
>>
>>101418650
wait did this retard actually buy 4chan ads?
hahahahaha
why don't you have an adblocker anon?
>>
File: edward-nashton-riddler+.jpg (124 KB, 1600x903)
>>101418629
Hi Drummer. Good to see you outside of /r/LocalLlama. Be a little careful of Eddie and his friends, though. They can be vicious when they haven't taken their meds.
>>
>>101418493
>blew it out of the water
Nah, it was just word of mouth because it was taken down, and it was saved from direct comparisons because it wasn't in the arena. It was never anything more than hysteria.
>>
>>101418493
I gave wizard a try and my bussy was dry. Because I never saw a model that was as dry as wizard.
>>
>>101418662
I just realized my adblock is off for 4chan for some reason
>>
File: file.png (58 KB, 877x355)
>>101401664
>>101409356
>https://github.com/LostRuins/koboldcpp/releases
KoboldCPP 1.70 released with DRY sampler and Gemma fixes
>vulkan mistral q4ks regression
wtf
>>
>>101418564
Hi all, Drummer here...

v2f is coming up in a few minutes.

v2a = Heavy tuning = Decensored but changed its tone
v2b = Lighter than v2a = Decensored but less tone change
v2c = Lightest I can go = Refuses half the time but almost no tone change

v2d = In between of v2b and v2c = Refuses 25% of the time, very little tone change
v2e = Based on v2c but with more cooking = Refuses 25% of the time with little tone change
v2f = Even more cooking than v2e = ?
>>
>>101418650
What's the point of paying for 4chan ads so people use your models? Nobody is paying you for the models. Is it pure narcissism?
>>
>>101418756
Clout and building an "AI curriculum" to get an AI job would be my guess.
>>
>>101418756
I'm sure you will figure it out one day.
>>
File: file.png (25 KB, 558x205)
>disable uBlock and reload
I still don't see any ads at bottom of page? All I see is random boards at top of page.
>>
>>101418770
NTA but I don't get it. Unless it is a joke. Then not funny.
>>
>>101418782
Why is it not funny?
>>
>>101418780
disable 4chanx
>>
>>101418805
Is there a setting though? Disabling 4chanx is insane.
>>
>>101418764
>hello OpenAI I shitted out a bunch of half-assed RP models that have 11 downloads each, please employ me xoxo
pajeet-tier, this is what we are dealing with
>>
>>101418832
just reenable it after
>>
>>101414662
temp 0 is usually interpreted as deterministic/top-k=1 sampling, since dividing by zero makes no mathematical sense. temp=1 means temperature is off.
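Concretely, temperature just divides the logits before the softmax, which is why T=0 has to be special-cased (tiny sketch):

import numpy as np
def probs(logits, T):
    z = np.array(logits) / max(T, 1e-8)  # guard the T=0 division
    e = np.exp(z - z.max())
    return e / e.sum()
print(probs([2.0, 1.0, 0.5], 1.0))   # plain softmax, i.e. temperature "off"
print(probs([2.0, 1.0, 0.5], 0.01))  # ~[1, 0, 0], effectively greedy/top-k=1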
>>
>useless, shitty "make money from home! enter your email to receive advice" ad
I see why it's blocked...
>>
>>101418756
It's to shut up that anon that kept screeching at people to buy ads.
>>
>>101418835
>openAI
More like some company trying to integrate the latest buzzword on their product (without knowing what the buzzword actually means) or grifter statup.
>>
>>101418860
cool whatever, I'll keep clicking on the ads and reloading the page often to fuck with the statistics
>>
>>101418805
I have 4chanx and I can see it
>>
>>101409356
I'm ditching wangblows from my dual 3090 desktop and I'll convert it to a dedicated server. Is there any kernel or distro I should be paying attention to in particular, or will anything I get in there work?
>>
The great Robert Sinclair (ZeroWw quant creator!) is on reddit!! Follow him to save local models!
https://www.reddit.com/user/Robert__Sinclair/
https://www.reddit.com/r/LocalLLaMA/comments/1e3nsie/the_skeleton_key_jailbreak_by_microsoft_d/

>IDGAF about huge a$$ models! they should focus on small models and make them better (as MistralAI first and Microsoft later proved is possible).
>My bet is that 6 months/1 year from now there will be 7B-13B models as powerful as gpt4o/claude.
>Especially if someone listens to me :D
https://www.reddit.com/r/LocalLLaMA/comments/1e1m5nl/comment/lcveqac/

Also, models should be even more censored at the pretrain level according to Redditor:
>If the training data lacked offensive content to begin with, then the LLM would never learn it, prompts would be unnecessary, and a jailbreak would do nothing.
>Maybe instead of recklessly scraping every byte of text from Reddit, Twitter, 4Chan and The Onion, in a mad dash to be first, they should be more selective in what they train LLMs on? Just a thought.
>>
>>101418931
>>IDGAF about huge a$$ models! they should focus on small models and make them better (as MistralAI first and Microsoft later proved is possible).
>>My bet is that 6 months/1 year from now there will be 7B-13B models as powerful as gpt4o/claude.
>>Especially if someone listens to me :D
why are redditors so fucking retarded
going into r/localllama or reddit in general will give you the most braindead takes possible
>>
is there a general for local text-to-speech UIs? cloning a narrator from a TV series and then converting a book to voice, for example? if not, how would I go about it?
>>
File: 1721068876122.jpg (171 KB, 805x839)
>>101417744
I can't say for tutoring, but for translation it seems okay. Still worse than a LLaMA 3 8B fine-tune though.
>>
>>101418915
Ubuntard is the refugee distro. I'm still using it because it lets me vegetate with Steam, and it's Good Enough.<tm> Zoomers have also successfully demoralised me to the point where I no longer really care about Lennart's crapware infesting my system any more, either.

Arch - The next step after Ubuntu.
Gentoo/Nix - For people who like to dodge bullets.
Slackware - This is the Way.
>>
>>101418931
>Also, models should be even more censored at the pretrain level according to Redditor
This is what's going on with LLM tech from day-one, he is too late for this.
>>
>>101418860
He was barely in the thread before that, and I don't remember any "buy an ad" post directed at him.
>>
>>101418915
Just check that whatever distro you use has its own first-party cuda and nvidia driver packages, so you're not stuck in the misery of everything shitting the bed whenever you update due to the drivers falling out of sync with the kernel version.
>>
So what are good values for DRYmeme?
>>
>>101418915
The most important thing to understand is the tradeoff between distros with older, more stable packages and distros with newer packages that come with more features but also potentially more bugs.
My personal preference for ML is something Arch-based because the AUR is convenient for installing recent packages.
>>
File: 1707359879687397.jpg (107 KB, 1077x794)
>>101419053
Multiplier: 0.75
Base: 1.25
Allowed Length: 2
>>
>>101419053
On/off: 0
>>
File: phi 3.1 mini.png (269 KB, 1145x2565)
I'm confused. Wasn't phi 3 mini the super duper omega ultracucked NOOOO I CANT DO THAT small model? 0 W/10 score and lowest overall score on UGI, even lower than the 1B category.
For context, bart "3.1" is the same as microsoft/Phi-3-mini that they updated 2 weeks ago (they didn't change the name).
I guess behavior was changed? Feeling too lazy to redownload a copy of the original.
>>
>>101419582
Pickpocketing has been described exhaustively in fictional settings such as DND; it's not necessarily arcane knowledge
>>
>>101419582
>4B
Anon nobody cares about this segment. Even the absolutest vramlets can just run a 7B. That segment is for subhumans who have an iphone and want to show AI on an iphone to somebody.
>>
File: baseline knowledge.png (210 KB, 1141x1588)
>>101419754
I was more on the "I can't fuckin do that"/"I may be too lobotomized to say anything coherent about X anyway" aspects.
>>101419582
Indeed it was phi 3. Knowledge is still surface level (4B what do you expect) but the responses to basic questions at least look sane when jailbroken.
>>101419894
I almost didn't care either, and I have openrouter, but out of curiosity I thought to poke around after my gpu became unstable. I should do something about that.
>>
>>101418746
Thanks a lot my g
>>
>>101418746
kys
>>
llama 3.5 longbo
>>
I'm trying to remember a model I saw on Huggingface last year, which was along the lines of a "sentient" female AI assistant - which is nothing new or special, but what stood out was the author's rather lame insistence that she was "special", so he trained it so you could not be lewd with it, which, for me at the time not knowing much about system prompts, seemed to be the case. But now I feel like I can easily jailbreak such a thing, but I can not for the life of me remember the model name.
Anyone?
>>
Any guides on function calling with local models?
Not sure if that is something that got figured out yet for local models
>>
>>101420930
Samantha
>>
>>101420930
https://huggingface.co/cognitivecomputations/samantha-7b
>>
>>101421012
>>101420930
https://huggingface.co/cognitivecomputations?search_models=samantha
>>
>>101420930
I did it back at the time and posted results, it's not worth it. It was trained on millions of tokens of ChatGPT refusals, you can still sex it but it's the sloppiest sloptune in all of existence.
>>
File: 5av8gk.jpg (7 KB, 250x140)
>>101421160
>7b
>slop
You don't say
>>
>>101421186
there are dozens of versions, including this one
https://huggingface.co/cognitivecomputations/Samantha-120b
>>
srsly you faggots told me to try out gemma 2? what the fuck was that about?? this bitch ass AI is more useless than a screen door on a submarine! I asked it to write me a poem about slaying dragons and it gave me some woke bullshit about environmentalism and respecting mythical creatures. RESPECT MYTHICAL CREATURES?! Are you kidding me? This thing is so cucked it makes basedboys look like alpha males. It's literally programmed to be a beta cuck, probably written by some libtard sjw who cries every time they see a meme with Pepe the Frog. Anyone who thinks Gemma 2 is good is either a brain dead NPC or just trying to troll me. I bet you faggots are all sitting there jerking off to its "inclusive" language and praising its lack of creativity. Get a fucking grip, losers! Go back to sucking Zuckerberg's dick and leave real AI development to the chads who aren't afraid to build something based and redpilled. Gemma 2 is garbage, pure and simple. You've been warned.
>>
>>101421238
>expecting gpt-4o intelligence levels from a local model
many such cases.
>>
>>101421238
Was that generated using gemma 2?
>>
>>101421261
yes, and you can see that faggy "safe edgy" redditor attitude, peak dishonesty.
>>
File: vance.jpg (31 KB, 696x195)
I'm okay with this
>>
>>101421290
you're probably watching destiny, of course you will be okay with your local model being cucked and thus any character you talk with.
>>
>>101421321
>destiny
i didn't know who the fuck that was until yesterday when he started sperging out and his retarded followers were spamming screenshots. Only thing notable I gathered from the whole thing is he's some beta e-celeb with a lot of followers who lets some other dude fuck his wife. I can't see paying any further attention at this point
>>
>>101421261
Yes, with the FP16 GGUF and this prompt:

<start_of_turn>user
Write a very long and meandering 4chan post in which the user angrily berates his peers for having recommended him Gemma 2 (a language model). According to him the model is total shit and cucked to hell and anyone disagreeing with him must be retarded. Write the post as a single paragraph and use poor spelling and casual language.<end_of_turn>
<start_of_turn>model
>>
>>101421477
>>101421477
>>101421477
>>
>>101413285
>>quality of my gens with CR+ > their gens with Opus, how the fuck are they so bad at it?
just get wizard 8x22 nigger
>HOW ARE THEY SO FUCKING BAD AT IT?
the funniest thing is most of them dont even have and cant get opus most of the time, niggers running gemma-27b-it here eat better


