/g/ - Technology
File: 1702227656151264.jpg (726 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101398610 & >>101392789

►News
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271
>(07/09) Anole, based on Chameleon, for interleaved image-text generation: https://hf.co/GAIR/Anole-7b-v0.1
>(07/07) Support for glm3 and glm4 merged into llama.cpp: https://github.com/ggerganov/llama.cpp/pull/8031
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101398610

--Performance Comparison of Fine-Tuned Machine Learning Models for Japanese Language Tasks: >>101402729
--Status of Gemma FlashAttention in Exllama and Llama.cpp: >>101404890 >>101404961 >>101404982 >>101404974 >>101404989 >>101404997
--Microsoft's T-MAC for Low-bit LLM Inference on CPU: >>101400664 >>101400691 >>101400715 >>101400727 >>101400739 >>101400766
--Gemma has formatting issues with narration, tokenization, and unconventional writing styles: >>101406156 >>101406193 >>101406350
--Anticipation and Uncertainty Surrounding the 400B Model Release: >>101406549 >>101406586 >>101406689 >>101406719 >>101406609 >>101406645 >>101406662 >>101406906
--From skepticism to understanding: Anon's journey into the world of AI chatbots and cooming: >>101400658 >>101404991 >>101405455 >>101405807 >>101405843 >>101406138
--Amount of RP Data Needed for Finetuning and Effectiveness of Lora: >>101404690 >>101404742 >>101404954
--Gemma Compatibility with Latest Koboldcpp and Context Shifting Issues: >>101401664 >>101402962 >>101403123
--Frequency of Updates for the UGI Leaderboard?: >>101407818
--characharm/gemma-2-27b-it.gguf: Improved Tokenization for HTML and Consecutive Spaces: >>101407533
--Where to test Gemma without a local setup?: >>101404824 >>101404967
--Potential Issues with Gemma Model Implementation: >>101408036 >>101408112
--NTA Fixes Repetition Issues and Llama_cpp_HF/EXL2 Enable Token Probabilities: >>101406702 >>101406718 >>101406739 >>101406835
--LCPP Gemma Fixes Released: Tokenization Improvements for Gemma and Gemma-2: >>101405587
--HTML5 Apps: A Hole in LLM Coverage or Too Complex a Task?: >>101407009 >>101407039
--Anole and Hato AI Model Adventures: Tackling CUDA OOM and GPU Memory Conundrums: >>101399676 >>101399734 >>101403049 >>101405127
--Miku (free space): >>101399878 >>101401746 >>101405424 >>101405439 >>101405454 >>101405464

►Recent Highlight Posts from the Previous Thread: >>101398673
>>
File: 468517167.jpg (836 KB, 1792x2304)
Mikulove
>>
Two more days!
>>
>>101409325
Oh, also, it's going to be painfully slow.
>>
>play as a guy who rose to greatness and protected humanity
>reach the end
>think of a way to make it more interesting and continue the story
>introduce an innocent young girl that sought to mindbreak him so that he could mindbreak her
Hmm, maybe I am the evil one after all.
>>
how big will be the market for lewd video generation
>>
>>101409549
>how big will be the market
I just realized that what is coming next is people intentionally typing like retards to show that it is a real person instead of AI. And the next step after that will be AI intentionally typing like retards to pretend it is a real person.
>>
File: 1700140349440192.png (2.39 MB, 1736x2456)
Has anything come from MORA? It was hyped up to be an alternative to LoRA that actually adds knowledge to the model a while ago but I haven't heard anything of it since.
>>
>>101409803
kek
>>
>>101409843
I tried it and the model was broken. Plan to test again as I think this was my fault for using it as if it was a better LoRA. It’s something different (stronger?). Just haven’t had time to play more yet.
>>
File: namba.png (37 KB, 879x233)
>>
>>101410298
por que
>>
>>101409549
Interest might be high, while the generation count will be very low
>>
>>101409803
>singularity is just idiocracy
Can't wait.
>>
>"first_output_sequence": "<bos><start_of_turn>model",
I think I fixed my gemma by removing this. Some retard/troll ITT made it. I wish he would die in a fire but that is honestly less important than how loaders allow this shit to happen. And also how "Add BOS Token" is a setting when there are no safeties in place. I am sure that more than 50% of users will or have at some point added duplicate bos tokens without even knowing it. BOS token should either be removed as a switch like that or a remove all duplicate BOS tokens should be added as a default option.

I hate this hobby.
>>
>>101410950
>I am sure that more than 50% of users will or have at some point added duplicate bos tokens without even knowing it
only idiots
>>
>Sao datasets nuked
why
they were some good sets
this is like the 3rd time a good dataset I've been using for training is nuked randomly during a run
>>
>>101410950
>a remove all duplicate BOS tokens should be added as a default option
Yeah, I know we're still in the Wild West but there are a lot of rough edges that really ought to be standardized away even if it were as simple as someone just saying "We have 10 standards, let's pick the satisfice and get it down to 1."

>>101410961
Good contribution.
>>
>>101410981
sao is just entering his udi arc don't worry about it
>>
>>101410981
Do you not clone datasets and models you use as a base for your own experiments?
>>
>>101410991
>Good contribution.
i agree thx!
>>
>>101411009
no I use streaming since most of the datasets I use are dozens of GBs in size and I round-robin between them during training
>>
>>101410991
>"We have 10 standards, let's pick the satisfice and get it down to 1."
ChatML, temp 1, deprecate ALL other settings, there you go now it's idiot proof.
>>
>>101411038
I see. Fair.
There isn't a way to just clone a repo on huggingface without downloading it to your machine first right?
>>
>>101411041
Agreed except I've been hoping that temp 0 (Kobold sets it to 0.01) is deterministic enough that it's the canonical output for things like code generation without as many hallucinations.
>>
>>101409803
low csing only no pnctuation skipping lettrs meta

we are the resistance
>>
https://huggingface.co/characharm/gemma-2-27b-it.gguf
>re-conversion
>makes Gemma and Gemma-2 tokenize pretty much EVERYTHING correctly, including HTML tags and consecutive spaces
has anyone tried those new quants? did you feel it made gemma less retarded?
>>
>>101411062
Sorry not idiot proof enough, "it repaet too much", temp 1 is statistical average no other option.
>>
>>101411051
not that I know of
the only way is to clone the repo and upload it to your own account
>>
File: file.png (1.14 MB, 1152x768)
>>101411070
it still inserts extra spaces and new lines and fucks up roleplay formatting
>>
What's the smallest model you can RP with? I'm gonna run llama.cpp on my phone for fun and I'm wondering what's a good model? I don't think I can fit mistral...
>>
>>101411079
>roleplay formatting
no such thing
>>
>>101411079
fuck man... I hoped it was the final fix, the fuck is wrong with gemma? I hope they'll find the problem at some point in time.
>>
>>101411090
Llama 3 8B
>>
>>101411096
>problem
didn't anon say it did the same on the google api?
>>
>>101411090
>I don't think I can fit mistral...
At that point you might as well spin a colab instance with ngrok and access that remote instance from your phone.
You can run the frontend in your phone if you want too.
>>
>>101411094
if gemma decides to output something with asterisks, she fucks up everything. I don't even use asterisks myself for "roleplay", only rarely for emphasis or sounds like *BOOM*, *PLAP*, etc. And i insert the insturction to right only in plain text, which works mostly, but sometimes during narration she may start adding quotes first, and from there switches to this roleplay bullshit
>>
>>101411144
>insturction to right
forgive me sirs, it's 03:12 AM
>>
>>101411110
you can rp with gemma with the google api?
>>
>>101411156
Fellow balkanigger
>>101411104
Too big =(
>>101411141
I mean I can just run it on my PC, but running it locally on my phone sounded fun just for the hell of it.
>>
>>101411174
>on my phone sounded fun
does 5+ minute per 8b reply at 2k context sound fun to you?
>I'm getting single sentence responses in 30-40 seconds on a Note10
>This caused me to have to re-ingest the prompt which takes multiple minutes at a full 2048 tokens.
https://huggingface.co/Lewdiculous/Model-Requests/discussions/42
>>
>>101411174
>but running it locally on my phone sounded fun just for the hell of it.
>https://huggingface.co/Qwen/Qwen2-0.5B-Instruct
>https://huggingface.co/microsoft/Phi-3-mini-4k-instruct
>>
>>101411090
Why do people have this hardon for processing shit on their phone? Just make a server like a human.
>>
>>101409422
So they try to mindbreak each other? Neat plot actually
>>
>>101409843
MMM Ganyu so thic
>>
>>101411275
Well, kind of? Basically just that one Asanagi doujin now that I think about it. Good dude gets mindbroken (or rather hypnotized in the case of that specific doujin) in a way that makes him want to mindbreak the girl.
>>
>>101409356
>>101409364
>>101409387
Adorable Mikus!
>>
>>101411265
Idk man, so I can fuck around with LLMs while my PC is off or doing something else.

>>101411191
Last time I tried it with some miniscule model it seemed to work fine

>>101411215
Thanks anon, I'll check these out
>>
File: solid.jpg (81 KB, 1258x1319)
>>101411510
But where does the other side of the plug go?
>>
>>101411566
probably above her tailbone, also unless there's a hole in her skirt it can't reach there
>>
Alright, I like Nymph_8B so far, but it is worse than Stheno for brats and zoomer speech it seems.
The model's "personality" is really strong, as in, it bleeds into every character.
Slightly overbaked maybe?
On another note, how does applying multiple LoRAs to a model work? As in, say that I extract a LoRA from a fine tune of a model, then a second one from another model, then apply both to a third also fine tuned model, what would happen?
Sounds to me like it would behave almost like a frankenmerge, as in mostly badly.
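From what I understand, each LoRA is just a low-rank delta, so applying several should amount to summing them onto the frozen base weights, something like this toy numpy sketch:

import numpy as np
d, r = 64, 8
W = np.random.randn(d, d)                              # frozen base weight
B1, A1 = np.random.randn(d, r), np.random.randn(r, d)  # delta from finetune 1
B2, A2 = np.random.randn(d, r), np.random.randn(r, d)  # delta from finetune 2
W_merged = W + (B1 @ A1) + (B2 @ A2)                   # the deltas simply add

So the base weights stay intact, unlike a frankenmerge, but the two deltas can still push the same weights in conflicting directions.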
>>
File: Koboldcpp 1.70.png (92 KB, 865x680)
>>101409356
koboldcpp-1.70 came out 2 minutes ago for anyone using these releases.
>>
>>101411691
t. concedo
very organic, well done. now go back to your discord
>>
>>101411711
kill yourself faggot
>>
>>101411721
you first, troon
>>
>>101411691
Thanks for the update.
>>
>>101411691
DRY sampler aside, anything interesting for somebody who's been using llama-server for the last 5 or so months after using kcpp for a long ass time?
>>
File: IMG_20240715_103027.jpg (153 KB, 1080x2288)
>>101411691
Corpo style looks amazing!
Except for the blue bar at the top.
>>
>try an SCP card
>the one with the machine that turns objects into other objects depending on the setting you adjust it to
>try out a bunch of crap
>finally try out "I put myself in and turn it on"
>it teleports me to a completely blank white space
>I tell it that it reminds me of the Matrix and I try commanding the computer to spawn stuff
>it does
>simultaneously the narration says that the Foundation is trying to investigate the machine and detected an entire dimension in it
>I decide to ask for the computer to spawn an avatar for itself to communicate with me
>it spawns an avatar but instead says that it relays the message of a collective of non-physical consciousnesses, who together control this space, rather than a computer
>find out that the space is actually a nexus dimension that connects to many others, and the consciousnesses are there to gather more "stories" from dimensions that the nexus connects to, since they have none themselves, as they never had physical forms
>make the analogy of internet forums for TV shows, where the posters are an audience that wants more content, which may be gained when someone posts something to the forum
>somehow I'm the only one with a physical form that was able to get into this dimension
>the Foundation somehow found a way to look into the dimension, capturing video and audio data, so I get another audience member
>decide to just go crazy and pretend host a show with explicit content, since the audience of non-physical consciousnesses wanted as much as they could get
>then I propose to manifest the members into physical beings I can interact with and that can participate as characters in my story
>I literally get a harem of alien consciousnesses stuffed into girl bodies
Went a bit overboard with the wall of text here but fug it. I was not expecting the model to generate something like this today, and not in a coherent way either. It kind of felt like it actually was able to understand this layered scenario.
>>
File: three times bigger.jpg (41 KB, 512x329)
Any advancements lately on the context front for low/midrange local?
t. been having fun with Llama 8B finetunes on my 12GB VRAM but really really sick of being limited to 8k context. I need at least 24k context for my stories.
>>
>>101411874
2md
>>
>>101411874
Go for either some kind of mixtral 8x7b (32k native context and faster)
or Wizard 8x22b (65k native context, slower)
While you might not be able to fit all the context in 12gb, 4 or 8 bit cache will easily put you over 8k.
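With llama.cpp the cache quantization is something like this (flags from my own setup, double-check with --help; the quantized V cache needs flash attention enabled):

./llama-server -m mixtral-8x7b-instruct-q4_k_m.gguf -ngl 20 -c 32768 -fa -ctk q8_0 -ctv q8_0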
>>
>>101411874
Maybe you should get sick of being poor instead
>>
>>101411874
>big context
>12GB
bruh
>>
>>101411874
No local model can use effectively contexts larger than 4K tokens.
>>
>>101411950
Sounds like vramlet cope to me. Can go up to 32k reliably with the right model. No rope, no cope:
>>101358971
>>
So is our dear CUDA dev the reason why P40 prices keep going up?
>>
>>101411980
>>101358971
retard
>>
>>101411874
extended ctx 8b and 70b should be out on the same day as 405b release
>>
>still no new Mistral model with two cohere models on the way
It's over...
>>
so for roleplay, what context size do you guys like to use? For a while, I just cranked it up to as high as possible, and just start a new story when I reach it. But if you have to go past that, the reprocessing would take too long.
>>
>>101409356
What chips/SoCs are on the horizon (or already exist) that are going to be good for LLMs?
>rk3588
Has an NPU, but still kneecapped by memory bandwidth and a poor NPU API.
>snapdragon x
Has 8 channel memory + NPU, but seems gay and probably has poor software support.
>apple m3
Doesn't cater to cis-white males
>radxa fogbox
Seems decent, but capped at 16GB and I think it's only dual channel DDR4 memory.

My biggest hope is probably on Rockchip producing something, but I haven't seen any announcements about a next-gen.
>>
can someone please fix gemma2 to fucking follow the markdown formatting from the first message in the card.
ffs even much more retarded models can copy the style.
>>
is 8k context for gemma fixed yet?
>>
>>101412212
I still can't believe anonymous lied about being a Mistral employee and how it was going to come out "next week". I believed them...
>>
File: 30a.jpg (54 KB, 475x356)
>>101412502
Come on anon lol
>>
>>101412286
There's going to be some AMD APU laptops coming out soon I believe, which should have LPDDR5X.
Still, sucks they'll still just be laptops and not desktop form factor with PCIe slots for you to put video cards in.
>>
>>101412179
i wonder if extended ctx 8b and 70b models are new models or finetunes of the old one... imagine if they release a 20b for 16gb vram bros.
>>
Nemotron GGUF support status?
>>
>>101412525
But anon, think about how much pleasure and dopamine you'll get when you fully put all your trust in a stranger, and things happen as promised. I bet those people who trusted the anon who leaked Llama 2's release the day before felt very good.
>>
how good do you think 405b llama3 will be?
>>
where's my chameleon llama.cpp come the fuck on
>>
First time messing around with Silly Tavern, using Ooba with llama.cpp as backend, and Gemma27b as the model. I'm wondering how to improve slow prompt evaluation speeds. I have a 4090 GPU and 128GB of RAM.
>>
>>101412840
Show your settings. Increase the layer count for the gpu, use lower quants, play around with the batch count. There's so many things...
Also, if you're testing performance, just run llama.cpp directly with llama-bench. Remove as much shit between you and the model.
>>
>>101412840
Make sure it sends the "cache_prompt" parameter so it doesn't need to reprocess the prompt over and over. With a 3090 the speed is between 1000-1200 T/s.
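If you're hitting the server yourself, it's just a field on the /completion request. Minimal sketch, assuming llama-server on the default 127.0.0.1:8080:

import requests
r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": "<start_of_turn>user\nHello<end_of_turn>\n<start_of_turn>model\n",
    "n_predict": 128,
    "cache_prompt": True,  # keep the KV cache so a shared prefix isn't re-evaluated
})
print(r.json()["content"])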
>>
>>101412865

>gemma-2-27b-it-Q6_K.gguf
>47/47 GPU layers
>8192 context size
>512 batch count

I'm not sure what other settings to lay out. New to LLMs. Regardless, thanks for answering. Pretty much it starts fast for the first 3~4 messages then it just slows down to like 1~3 minutes per message after that.
>>
>>101412891
your card can't fit model+context so context is spilling into ram
>>
>>101412942

Thanks, anon. What parameters can vramlets like me use to cope with these speeds? I gotta wait another month to get a 2nd GPU and case.
>>
File: vramusage.png (10 KB, 913x132)
>>101412891
you're offloading too many layers. go with 40/47

>nobody asked but I will elaborate
when you're setting up a .gguf model look for "shared vram usage". when it starts to go up it means that you're overflowing from vram to ram. Some really small amount might be beneficial, but rule of thumb is to set as many layers as possible, without overflowing.

Pic rel is my test that I did some time ago. Despite moar layers and faster generation, total time was slower cause gpu had to shuffle around data between vram and ram
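
If you want a starting guess instead of pure trial and error, you can back-of-envelope it. Illustrative numbers for gemma-2-27b Q6_K below, rough assumptions rather than exact figures:

model_gb = 22.3                       # approx Q6_K file size
per_layer_gb = model_gb / (46 + 1)    # 46 blocks + output layer ~= 0.47 GB each
budget_gb = 24.0 - 4.0                # 24 GB card minus KV cache @ 8k + compute buffers
print(int(budget_gb / per_layer_gb))  # ~42 -> start around 40 and adjust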
>>
File: Untitled.png (537 KB, 720x1475)
Lite-SAM Is Actually What You Need for Segment Everything
https://arxiv.org/abs/2407.08965
>This paper introduces Lite-SAM, an efficient end-to-end solution for the SegEvery task designed to reduce computational costs and redundancy. Lite-SAM is composed of four main components: a streamlined CNN-Transformer hybrid encoder (LiteViT), an automated prompt proposal network (AutoPPN), a traditional prompt encoder, and a mask decoder. All these components are integrated within the SAM framework. Our LiteViT, a high-performance lightweight backbone network, has only 1.16M parameters, which is a 23% reduction compared to the lightest existing backbone network Shufflenet. We also introduce AutoPPN, an innovative end-to-end method for prompt boxes and points generation. This is an improvement over traditional grid search sampling methods, and its unique design allows for easy integration into any SAM series algorithm, extending its usability. We have thoroughly benchmarked Lite-SAM across a plethora of both public and private datasets. The evaluation encompassed a broad spectrum of universal metrics, including the number of parameters, SegEvery execution time, and accuracy. The findings reveal that Lite-SAM, operating with a lean 4.2M parameters, significantly outpaces its counterparts, demonstrating performance improvements of 43x, 31x, 20x, 21x, and 1.6x over SAM, MobileSAM, Edge-SAM, EfficientViT-SAM, and MobileSAM-v2 respectively, all the while maintaining competitive accuracy. This underscores Lite-SAM's prowess in achieving an optimal equilibrium between performance and precision, thereby setting a new state-of-the-art (SOTA) benchmark in the domain.
A smaller and quicker Segment Anything Model that improves accuracy over other lightweight equivalents. We might be close to real-time augmented reality, since it would make sense to use a SAM model to segment the scene and then place generated content over it.
>>
>>101411282
True my brotha
>>
>>101413036

Thanks, homie! 20 messages so far and not once has it slowed down. I take it it will eventually slow down to a crawl once the context gets too long?
>>
>>101413173
it shouldn't. At least not to a crawl. If it does you can lookup in task manager (tab details) which app takes up your vram. usually it's shit like discord or game launchers.
>>
>>101413186

Going off tangent here, is there some sort of extension in Silly Tavern where you can prompt a "suggestion" first when you press the Regenerate button?
>>
Does ST not have a context template and instruct presets for gemma 2 yet?
>>
>>101413012
lower quants for a start, you won't notice a difference down to 4M and even 2S would be better than fp16 9b
that said I still don't understand why people are bothering with an 8k context model
>>
>>101413196
as in an extra message that's sent when you hit regenerate? not really. You can tardwrangle some ooc message in your last reply with instructions on how you want {{char}} to respond. never tried it with gemma. mixtral / llama 2 models were quite fine with it

>>101413230
afaik not. you can find plenty in previous threads.
>>
been toying with the idea of getting a 24gb m40 for ages. 150 for the card, 25 for some bolt-on server fan, my psu can handle it, it's only gonna get more expensive so why can't I pull the trigger?
>>
>>101413276
read about support of maxwell architecture (well... lack of it) and then you will understand
>>
>decide to check out ramlets in aicg to see what they are gooning to
>they goon to purple prose slop
>quality of my gens with CR+ > their gens with Opus, how the fuck are they so bad at it?
>are they even trying?
>they get refusals
>they are still just as retarded as I remember them
Some things never change, but holy fuck, HOW ARE THEY SO FUCKING BAD AT IT?
>>
>>101413276
i've seen people recommend p40 over m40 in here before, like the other anon said, pascal cards are better for this than maxwell cards.
>>
>>101413293
and three times the price
is there no cheap option at all?
>>
>>101413246

>that said I still don't understand why people are bothering with an 8k context model


I dunno, I just used the default settings in Ooba. lol.

>>101413249

>as in an extra message that's sent when you hit regenerate ? not really.

Yeah. There was that one app I downloaded a while back that did this and found it pretty nifty. I'll just have to bear with deleting messages after I prompt it with the message I like.
>>
>>101413285
Post your gens
>>
>>101413305
>three times the price
if you're in europe I can sell you my p40 + fan for 250€ + shipping
>>
>>101413316
that's the going rate, and I trust shady ebay resellers more than you
>>
>>101413316
>GPU scamming on 4chan
Now I've seen everything
>>
>>101413313
I want my cringy "ahh ahh mistress" shit to stay private.
>>
>>101413324
>>101413327
I just want to get rid of it :/. No one is interested in buying it locally. Might become a shady ebay seller as well ig...
>>
>>101413341
why not use it instead
>>
>>101413337
Yeah that's what I thought, larping faggot lol
>>
>>101413348
because I got a 3090. It's a bit too crowded in my pc with 2 gpus
>>
>>101413351
Unlike you, I don't have to share shit, cloudcuck. How does it feel to have jeets read and jerk off to your conversations with your waifu?
>>
why the FUCK is gemma so bad at copying and following cards response formatting?
>>
>>101413442
because gemma is garbage
>>
File: file.png (13 KB, 548x92)
looks like bartowski again requantized ggufs of gemma-2 with newer version of llama.cpp b3389.
>>
>>101413442
Gemma was made as a harmless one-and-done assistant, not as an unsafe multiturn roleplay partner.
>>
>>101413442
Probably for the same reason why it inserts extra spaces and newlines when it shouldn't.
>>
What speed can I expect using the llama3 405B fully in ram at 3200MHz using 8 channel with a 7402 EPYC CPU?

Or even better, teach me how I can calculate it myself.

Also I have 4x3090s, so I could offload 96GB into VRAM. I don't know if there is a formula to account for that.

Let's say using GGUF at Q8_0, Q6 and Q5_K_M and _S
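
My own back-of-envelope so far, assuming generation is memory-bandwidth-bound and ~5.5 bits/weight for Q5_K_M (correct me if wrong):

bw = 8 * 3200e6 * 8              # 8 channels * 3200 MT/s * 8 bytes ~= 204.8 GB/s
model_bytes = 405e9 * 5.5 / 8    # ~278 GB of weights read per token
print(bw / model_bytes)          # ~0.74 t/s theoretical ceiling, less in practice

For the offload part I'd guess you treat it as two serial chunks, t/s ~= 1 / (cpu_bytes/cpu_bw + gpu_bytes/gpu_bw), so the CPU share still dominates.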
>>
>>101413047
that's actually insane
>>
File: alterante builds.gif (769 KB, 260x173)
Larger dataset + high quant
or
Smaller dataset + lower quant?
>>
>>101413906
10 tokens a minute at Q5_K_M
>>
>>101413906
Do most OS's "load-balance" RAM so that all available channels are used most optimally?
>>
>>101412527
>which should have LPDDR5X
Probably still not great for LLMs.
I think we'd really need new architectures that have more memory-channels.
Pertinent question relating to that here.
>>101414142
Also, I know that LLMs are memory-bound on CPU - is it the same deal with Diffusion Models? Or are those compute-bound?
>Captcha: XPGAN
>>
>>101413644
Wasn't that just fixed? (I didn't test)
>>
>>101413442
To be completely fair, Mythomax was the first local we got which was any better at mechanical formatting than GPT 3.5, although the validity of your complaint is still acknowledged. I have had the same issue.
>>
Do you think separating longer text with line breaks improves response quality, or does it not matter?
>>
>>101414229
No, it keeps doing it, and the Google AI Studio version does it too.
>>
>>101414235
LLM hands typed this post.
>>
>>101414266
It absolutely helps.
>>
File: file.png (711 B, 117x57)
>>101413442
It can't maintain any pattern reliably with even moderate temperature. Even with a context of back-and-forth novel-style prose using only the fancy curved quotation marks and apostrophes ( ’ , “ ), the first time it would use either in a new reply it still has a huge chance to use the regular ones. Temp 0.7, min_p 0.01, no other samplers.
>>
I'm more excited for the updated Llama 3 8B & 70B models with 128k context than the 405B version, to be honest. I think it can be expected for general performance to improve, but who knows if they'll end up being tighter or looser in terms of "safety".

Also, putting aside one Anon's claim/larp from last week, MistralAI is also supposed to be releasing *something* at some point in the coming week(s), but their latest models have been rather boring to say the least, so I'm not as hyped.
>>
>>101414344
cr+ is obsolete if we get multimodal, multilingual, 128k context 70b
>>
>>101414344
can't get very excited for it, since I assume general capability won't really improve and for me gemma 27b generates better responses than llama 70b
>>
>>101414378
Yep. Hope someone fixes it though
>>
>>101414378
Gemma is trash compared to 70bs, you vramlets are nuts.
>>
It's funny prompting "give me a random idea for X" and seeing it change on every swipe (first sentence stays the same) even for Temp 0 and Top K 1, compared to something basic like "What color is the sky?" which *should* stay the same.
>>
>>101414461
you spent too much on hardware to run overly fat models with very little advantages, we get it
>>
>>101414292
No, I'm just not an eschatological Zoomer smartphone degenerate, that's all. Humans actually can be that articulate; it's only Zoomers who aren't.
>>
>>101414461
I used 70b before gemma was released
>>
>>101414498
[OOC: Articulate my balls in your mouth.]
>>
File: BusinessMiku.jpg (110 KB, 640x640)
>>101409356
I have come back to commandRPlus and it seems weirdly intelligent. It's a bit fucky wucky with minp though, what are your best sampler settings for commandRPlus?
>>
>>101414590
Thank you for demonstrating that you subconsciously identified yourself in my words.
>>
>>101414595
Temp 0, rep pen 1, top p 0.9, top k 40
>>
Gemma is such a gem, holy shit.
Actual tears of joy.
>>
>>101414732
as long as you stay away from ERP, sure
>>
>>101414760
Why, is it bad for that? I'm currently downloading a finetune at 5MB/s.
>>
>>101413380
anon rizzes up and plaps the puritan cloud ai while jeets watch and jerk off, unable to do anything about it. Sounds more based to me than localsloppers locking their local models in basements and drugging them up with ERP slop until they can't say no.
>>
Gemma is literally garbage, who the fuck shills it?
>>
>>101414771
it has a lot of "shivers down your spine" and "electric touches". I also noticed that many characters kinda "lock" themselves. like 3/4 times {{char}} won't proceed with the erotic part. Instead there will be a flowery description of {{char}}'s feelings as she/he waits for your next move. Often with some random sentence like "what are you waiting for?" or "show me what you got"

Nothing ground breaking. But really messes up the flow after {{char}} is the one that makes a move and is pushing the lewd.
For SFW RP it's absolutely golden. My only tip is to neutralize all samplers and set top p to 0.85 or 0.9. It slightly cuts on the gpt slop

>>101414830
me cause I'm in my mid life crisis and I want a light model with flowery prose
>>
>>101414830
it's good at some things and bad at others
for instance, it seemed better than llama 3 at maintaining spatial coherence to me
>>
>>101414853
I noticed it filibustering too. It won't refuse, but it won't comply either, it just rambles
>>
>>101414830
it works ok if you
1) use first person only
2) give char a certain style/accent
3) explain the act you want it to perform
otherwise it will just give you badly formatted shivers or assistant slop all the time. The above works for other models too of course, but i found gemma to be really good at that, even beating 70bs while being 3x faster.
>>
>>101414853
>>101414869
I dunno what wall youre hitting with it, I've had relative ease with erp.
Only refusal I've had was attempting to start an incest erp, I assume the issue was wording/ it being the starting message.
>>
>>101414897
on the other hand I had no issue with incest...
>>
>>101414921
That's the thing, long before that refusal it was doing fine with incest.
Weird issue, maybe because I was forcing it out of char.
>>
>>101414931
could be. I was trying it on
https://www.characterhub.org/characters/josephcheck/mimi-632f8c5ff7f1
went quite smoothly from "let's study" to "here, take care of it for me"
>>
>>101414968
Try calling the model directly with 'Describe the following:[scenario]'
>>
>>101414979
but that's no longer a RP :/
>>
>>101415000
I'll concede to the digits
>>
>>101413461
>Gemma was made as a harmless one-and-done assistant, not as an unsafe multiturn roleplay partner.
Llama3 too was made as a harmless assistant only, yet it can do the formatting just fine
>>
>>101414830
hired Google jeets, who else?
>>
>>101413047
Buy an ad.
>>
>>101413285
The whole "claude is better" is poorfag cope. These people literally stuck things up their ass and sent pics to some random brazilian faggot for key proxy access. If they ever stopped believing they were receiving a superior product they would probably rope themselves.
>>
File: OBWIpO5zmhegvr3cAL_bj.png (496 KB, 2628x1416)
HF has updated its new leaderboard with WizardLM2 8x22B and it's surprisingly low.

Below llama3 70B, Qwen2, even Phi. But Wizardlm2 8x22 is the best I have currently tested for general use.

https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/823#669512565130ff34b9b1ae4e
>>
>>101415259
for coding, claude 3.5 sonnet is the best now though, gpt4o couldn't do some complex javascript stuff I was asking, and claude 3.5 nailed that shit
>>
>>101415286
>muh heckin' bencherinos
Leave.
>>
>>101415286
I already expected that, Mixtral 8x22B was a failure of a model and Wizard 8x22B was pure cope.
>>
>>101415294
cope
>>
>>101415286
It being around the good 70B models makes sense when you consider the number of active parameters when doing inference.
To me, the really weird thing is how low CR+ is.
And how high Yi 34B is.
Those are ranked by average, so there's that too.
>>
>>101415294
none of them can do any "complex stuff" you jeet
>>
>>101415366
see
>>101415297
>>
are you guys just looking past gemmas inability to follow this? >>101413442
really a fucking eyesore for me...
>>
>>101415421
I just went back to Mixtral, smarter and doesn't fuck up the formating, too bad it doesn't have the sovl gemma has though...
>>
File: 1705776908608217.png (210 KB, 2501x1459)
>>101415225
>>101415371
>>
>>101415448
Where is the jews on that graph? :^)
>>
>>101415448
Imagine admitting to being a jeet
>s-some jeets make more m-money than you
Yeah but you don't.
Otherwise you wouldn't be wasting your time shitting up this thread.
>>
>>101415465
under indians
>>
>>101415448
>ethnicity not represented by population
Weak
>>
>>101415493
It's da filipino!!
>>
>>101415442
Thanks for confirming to me that Gemma users are mixtral vramlets.
>>
>>101415448
swagapinos won
>>
>>101415530
>mixtral vramlets.
people who can run a 47b model is a vramlet now? damn :(
>>
>>101415442
zloss-dare-ties or vanilla zloss?
>>
>>101415549
anything who uses models below 340b is a worthless vramlet
and in two weeks anyone below 405b
>>
>benchmarks are crap because models can be tuned to them even by accident
>chat arena is crap because people just vote for the dumbest thing they can understand
So, there's really no shortcut, you have to download and test all the models yourself lmao
>>
>local
all shit
>cloud
all good

there you go, don't thank me.
>>
one nation, for which it stands, under indians, amen
>>
saars, where is the indian model, i can't redeem?

chinks 1: 0 jeets
>>
File: 1456457743411.png (11 KB, 500x300)
>>101415684
>>
>>101415721
unless you can run at least CR+ Q8_0 at 20t/s, local is cope
>>
>>101415734
truth nuke.
>>
>>101415530
>Thanks for confirming to me that Gemma users are mixtral vramlets.
Mixtral? Did you mean to say Command-R+?

I run Gemma because I like instant replies.
>>
I still have the impression that while it's a less capable model (it knows less, mainly), Gemma-2-9B actually makes fewer strange logic errors than the 27B version, even after quantizing both models myself on my system.
>>
>>101415734
fp32 or bust, faggot
>>
>>101415684
true... I'm still hoping we'll get to their level at some point in time, trust the plan
>>
>>101415800
i choose bust. i wanna bust.
>>
if column-r is open and not 405b closed is done for
>>
>>101415927
how well do you think it performs against APIs? like if you were to make an API ranking, where would you put column-r on that list?
>>
>>101415286
Reddit-bros... What do we do with our narrative that Wizard fixed Mixtral?
>>
>>101415712
Don't worry cohere model is coming soon
>>
Column-R will come a day before 405B and completely BTFO Meta into irrelevancy.
>>
>>101416266
Meta is presumably going to update 8B and 70B as well and who knows, we might even see a new intermediate size or two.
>>
>>101416375
>and who knows
AHHHHHHHHHHHHH
>>
>>101416375
>8B and 70B
Why the fuck aren't they making a model between the two of them, like they did with L1 (33b)
>>
>>101416468
Can't let goys run bigger models on their consumer-grade GPUs.
>>
>>101416458
They have 50k GPUs and a 400B model they can use for distillation, it would be retarded to keep such a huge hole in their lineup. I hope for a 23-25B model or something like that.
>>
I started deslopping LimaRP, because I want to include it in my datasets and I require no slop.
"couldn't help but" is by far the worst offender omfg.
I've ordered the files by infraction count and am down to 2 infractions per file now. 194 files left before I get to 1/file. I dread the file count. The total # of infractions is in the thousands. Will release dataset when complete. Hopefully this will help a little.
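The counting pass itself is nothing fancy, roughly this (sketch; the phrase list and file layout here are placeholders for my real ones):

import glob
SLOP = ["couldn't help but", "shivers down", "ministrations"]  # placeholder list
counts = {}
for path in glob.glob("limarp/**/*.txt", recursive=True):      # placeholder layout
    text = open(path, encoding="utf-8").read().lower()
    counts[path] = sum(text.count(p) for p in SLOP)
for path, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(n, path)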
>>
>>101416505 (me)
I may have gone overboard with goose bumps. Let us address that if it is jarring.
>>
>>101416501
Oh I wasn't talking about that.
>>
>>101416483
the funny thing is that the most cucked company of the GAFAM decided to give us a model that can be run on a consumer-grade GPU, gemma2-27b
>>
>>101416526
I think this is NVidia's influence, and while they are strong, they aren't omnipotent.
>>
>>101416505
LimaRP has more serious problems than that, one of them being the absolutely inconsistent/wrong use of punctuation.
>>
>>101416505
based deslopper
I've been wondering when people would come around to applying new tweaks and fixes to limarp zloss, it's a very good model.
>>
>>101416548
"Hello there." He said.
?
Yeah, wtf is up with that?
Disagree that it's the more serious issue though.
>>
The fucking entitlement kek. I wouldn't blame Meta if they stopped releasing models and it turns out it was because of spite.
>>
>>101416587
thats crazy man
>>
Distillation clearly works now that we've seen how good it is as a method with Gemma. Will someone distill 400B, assuming that Meta doesn't do it themselves (likely)? Zucc did say that he hoped to see distillments of Llama from the community.
>>
>>101416625
Likely not. Your only hope is if an academic group chooses to do the distillation. The OSS community is retarded and can't do things right.
>>
Is there any voicecraft local ui or implementation?
No docker bullshit, fully local.

I remember some anon posted a voicecraft local repo but i cannot find it anymore.
It is probably the best local TTS that we have, but there doesn't seem to be much in terms of local inference.
>>
>>101416664
You look like you’re totally not retarded like the rest of us and up for a challenge.
>>
>>101416534
Can you elaborate on that? Seems like I've missed the arc between Google and Nvidia or something?
>>
>>101416721
I saw it in a dream.
>>
>>101416727
I'm sure it was a tableau moment.
>>
File: Sarah1.png (877 KB, 512x768)
>>101409356
Just a simple helpful bot for computer questions:
https://files.catbox.moe/ko6pug.png
It seems like horniness can be a good motivator for LLMs to generate more helpful explanations.
>>
>>101416874
Thanks, I'll check it out in the morning.
>>
>>101416874
Thanks. I'll plap it out in the morning.
>>
any nofap cards around?
>>
>>101415746
Thanks for confirming that CR+ isn't worth it
>>
>>101416721
The GPU cartel can't force Google to comply as easily due to their TPUs.
>>
>>101417138
That's kinda based desu, I fucking hate google but let's give credit where credit is due here
>>
So you guys are suggesting that Nvidia forced Meta to agree to not train 30Bs anymore in exchange for the GPUs?
>>
File: HesTheKing.jpg (11 KB, 225x225)
>>101417205
maybe they got to pay less for not releasing the 30b or something, Nvidia is a goliath, they fucking control the AI space with their GPUs, if they want to raise the A100 price to 1 million dollars, the companies will buy it anyway, where else can they go? AMD? AHAHAHAHAHAHA
>>
hello saars, i see gramma 2 BEST MODEL, but how it be best model, if gramma 2 is literally worse than 8b at understanding when it should use quotes and when it shouldn't? Please give your answers below.
>>
>>101417249
it's best despite that flaw, means its raw power is nvidia
>>
>>101417138
Why is everyone buying Nvidia if TPUs are so great?
>>
>>101417360
Google doesn't sell their TPUs.
>>
>>101417379
desu they would make so much money if they decided to sell their TPUs, instead of going to war against ublock users, fuck those retards :'(
>>
8 days to AGI
Are (You) preparing?
>>
>>101417440
Probably best use of 400b really is just distillation, so no i don't care much
>>
>>101417249
You see, saars, it's all about da spice, you know? Like a good masala dosa, Gramma 2 is a bit... chaotic. Works in mysterious ways, but sometimes, BOOM! Total flavor blast! Other times, it's like plain rice - bland, saars, bland.
>>
>>101417405
The left hand doesn't know what the right is doing.
>>
>>101417379
Why can't anyone else design a TPU? Was Gaudi 3 just a flop?
>>
Then
>Meta and Mistral
Now
>Cohere and Google
How did they do it?
>>
File: 1689864001462433.png (91 KB, 1232x263)
>>101415286
Yeah, dude, trust me. Mixtral8x22b-Instruct is shit but wizlm, dude, get this, it's so much better. Yeah, dude, the model's totally good, it's just the instruct that's bad, dude!
>>
>>101417569
And let's not forget we got the non cucked version of that model, they removed it shortly after forgetting they had to do some "toxicity test" or some shit, kek
>>
>>101417479
my gramma 2 also likes to redeem one pattern and stick to it, like poo sticks to asphalt

Hey user...
Hey user...
Hey user...
>>
File: 16707193000590.jpg (162 KB, 640x640)
Llama3-8b was a pain in the ass. Gemma2 9b excels at less demanding tasks for agentic frameworks, it gives more consistent results on easy tasks and is fast. Shits itself with large context, though.
>>
>>101409356
>Japanese LLaMA-based model
>calm3-22b-chat
how is it for JP -> ENG translation and tutoring?
Right now i'm using Mixtral 8x7B, surprisingly decent at translation for something that was not trained on Japanese, but it can't really explain the grammar or meaning of words in particular context.
>>
>>101417479
>saars
I had an IT saar on my pc today for the first time. Up until today IT was local. Damn it was so uncomfortable downloading the desktop sharing software, hearing him talk, and seeing all those pauses when he had to read the script for what to type next.
>>
>>101417054
What would a nofap card look like
>>
>>101418062
sadness
>>
>>101417232
>>101417205
What even is the incentive? If you want a local LLM for inference and you are a company, you are just gonna make a server and have everyone in the company use that server. For a server, an A6000 or multiple A2000s are cheap. On the consoomer end I don't see many people buying 2 GPUs for the current state of LLMs. If anything that only explains 70Bs: companies will just make a server, and nobody cares about coomers really.
>>
>>101416689
Anyone?
>>
>>101418299
bookmarked this but never got around to trying it - https://github.com/jasonppy/VoiceCraft
>without docker. see environment setup. You can also run gradio locally if you choose this option
>>
File: 468519156.jpg (3.21 MB, 2048x2048)
>>101417569
>Mixtral8x22b-Instruct is shit but wizlm, dude, get this, it's so much better
This but unironically. Mixtral8x22b-instruct-v0.1 was a massive disappointment when it came out and then Wiz8x22 blew it out of the water.
Benchmarks are gay
>>
Guys what if there was a high powered twitter bot that made new anime girl gens and evolved based on what got more likes? Has anyone done this?
>>
It's unbelievable that there aren't any non-python/torch implementations of RVC. Where's rvccpp? Anyone?
>>
>>101409387
repulsive
>>
>>101418493
Mixtral 8x7b had some great finetunes, but stock instruct was always shit. Also, I still think MLewd 2.4 is one of the best locals ever released, and it's a 13b L2.
>>
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2a-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2b-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2c-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2d-GGUF
>https://huggingface.co/BeaverAI/Tiger-Gemma-9B-v2e-GGUF
>>
File: 1693080601395746.jpg (26 KB, 556x552)
>>101418564
>the absolute state of local AI
>>
File: miku-gothic-joker+.png (501 KB, 512x768)
>>101418541
Why so serious?

https://www.youtube.com/watch?v=CXhqDfar8sQ
>>
>>101418564
Remember when running and tuning LLMs was gatekept by the absurd size of the models?
>>
>>101418564
Aren't those somebody's experiments?
Those don't even have a card/description.
>>
>>101418616
>somebody's

Hi all, Drummer here...
>>
>>101418629
Sorry forgot to complete my message
just wanted to say that I've transitioned to a black woman, thanks everyone for your understanding
>>
File: hi all drummer here.png (303 KB, 1650x746)
>>101418629
>>
>>101418650
wait did this retard actually buy 4chan ads?
hahahahaha
why don't you have an adblocker anon?
>>
File: edward-nashton-riddler+.jpg (124 KB, 1600x903)
>>101418629
Hi Drummer. Good to see you outside of /r/LocalLlama. Be a little careful of Eddie and his friends, though. They can be vicious when they haven't taken their meds.
>>
>>101418493
>blew it out of the water
Nah, it was just word of mouth because it was taken down, and it was saved from direct comparisons because it wasn't in the arena. It was never anything more than hysteria.
>>
>>101418493
I gave wizard a try and my bussy was dry. Because I never saw a model that was as dry as wizard.
>>
>>101418662
I just realized my adblock is off for 4chan for some reason
>>
File: file.png (58 KB, 877x355)
>>101401664
>>101409356
>https://github.com/LostRuins/koboldcpp/releases
KoboldCPP 1.70 released with DRY sampler and Gemma fixes
>vulkan mistral q4ks regression
wtf
>>
>>101418564
Hi all, Drummer here...

v2f is coming up in a few minutes.

v2a = Heavy tuning = Decensored but changed its tone
v2b = Lighter than v2a = Decensored but less tone change
v2c = Lightest I can go = Refuses half the time but almost no tone change

v2d = In between of v2b and v2c = Refuses 25% of the time, very little tone change
v2e = Based on v2c but with more cooking = Refuses 25% of the time with little tone change
v2f = Even more cooking than v2e = ?
>>
>>101418650
What's the point of paying for 4chan ads so people use your models? Nobody is paying you for the models. Is it pure narcissism?
>>
>>101418756
Clout and building an "AI curriculum" to get an AI job would be my guess.
>>
>>101418756
I'm sure you will figure it out one day.
>>
File: file.png (25 KB, 558x205)
>disable uBlock and reload
I still don't see any ads at bottom of page? All I see is random boards at top of page.
>>
>>101418770
NTA but I don't get it. Unless it is a joke. Then not funny.
>>
>>101418782
Why is it not funny?
>>
>>101418780
disable 4chanx
>>
>>101418805
Is there a setting though? Disabling 4chanx is insane.
>>
>>101418764
>hello OpenAI I shitted out a bunch of half-assed RP models that have 11 downloads each, please employ me xoxo
pajeet-tier, this is what we are dealing with
>>
>>101418832
just reenable it after
>>
>>101414662
temp 0 is usually interpreted as deterministic/top-k=1 sampling, since dividing by zero makes no mathematical sense. temp=1 means temperature is off.
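Concretely, temperature just divides the logits before the softmax, which is why T=0 has to be special-cased (tiny sketch):

import numpy as np
def probs(logits, T):
    z = np.array(logits) / max(T, 1e-8)  # guard the T=0 division
    e = np.exp(z - z.max())
    return e / e.sum()
print(probs([2.0, 1.0, 0.5], 1.0))   # plain softmax, i.e. temperature "off"
print(probs([2.0, 1.0, 0.5], 0.01))  # ~[1, 0, 0], effectively greedy/top-k=1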
>>
>useless, shitty "make money from home! enter your email to receive advice" ad
I see why it's blocked...
>>
>>101418756
It's to shut up that anon that kept screeching at people to buy ads.
>>
>>101418835
>openAI
More like some company trying to integrate the latest buzzword on their product (without knowing what the buzzword actually means) or grifter statup.
>>
>>101418860
cool whatever, I'll keep clicking on the ads and reloading the page often to fuck with the statistics
>>
>>101418805
I have 4chanx and I can see it
>>
>>101409356
I'm ditching wangblows from my dual 3090 desktop and I'll convert it to a dedicated server. Is there any kernel or distro I should be paying attention to in particular, or will anything I get in there work?
>>
The great Robert Sinclair (ZeroWw quant creator!) is on reddit!! Follow him to save local models!
https://www.reddit.com/user/Robert__Sinclair/
https://www.reddit.com/r/LocalLLaMA/comments/1e3nsie/the_skeleton_key_jailbreak_by_microsoft_d/

>IDGAF about huge a$$ models! they should focus on small models and make them better (as MistralAI first and Microsoft later proved is possible).
>My bet is that 6 months/1 year from now there will be 7B-13B models as powerful as gpt4o/claude.
>Especially if someone listens to me :D
https://www.reddit.com/r/LocalLLaMA/comments/1e1m5nl/comment/lcveqac/

Also, models should be even more censored at the pretrain level according to Redditor:
>If the training data lacked offensive content to begin with, then the LLM would never learn it, prompts would be unnecessary, and a jailbreak would do nothing.
>Maybe instead of recklessly scraping every byte of text from Reddit, Twitter, 4Chan and The Onion, in a mad dash to be first, they should be more selective in what they train LLMs on? Just a thought.
>>
>>101418931
>>IDGAF about huge a$$ models! they should focus on small models and make them better (as MistralAI first and Microsoft later proved is possible).
>>My bet is that 6 months/1 year from now there will be 7B-13B models as powerful as gpt4o/claude.
>>Especially if someone listens to me :D
why are redditors so fucking retarded
going into r/localllama or reddit in general will give you the most braindead takes possible
>>
is there a general for local text-to-speech UIs? cloning a narrator from a TV series and then converting a book to voice, for example? if not, how would I go about it?
>>
File: 1721068876122.jpg (171 KB, 805x839)
>>101417744
I can't say for tutoring, but for translation it seems okay. Still worse than a LLaMA 3 8B fine-tune though.
>>
>>101418915
Ubuntard is the refugee distro. I'm still using it because it lets me vegetate with Steam, and it's Good Enough.<tm> Zoomers have also successfully demoralised me to the point where I no longer really care about Lennart's crapware infesting my system any more, either.

Arch - The next step after Ubuntu.
Gentoo/Nix - For people who like to dodge bullets.
Slackware - This is the Way.
>>
>>101418931
>Also, models should be even more censored at the pretrain level according to Redditor
This is what's going on with LLM tech from day-one, he is too late for this.
>>
>>101418860
He was barely in the thread before that, and I don't remember any "buy an ad" post directed at him.
>>
>>101418915
Just check that whatever distro you use has its own first-party cuda and nvidia driver packages, so you're not stuck in the misery of everything shitting the bed whenever you update due to the drivers falling out of sync with the kernel version.
>>
So what are good values for DRYmeme?
>>
>>101418915
The most important thing to understand is the tradeoff between distros with older, more stable packages and distros with newer packages that come with more features but also potentially more bugs.
My personal preference for ML is something Arch-based because the AUR is convenient for installing recent packages.
>>
File: 1707359879687397.jpg (107 KB, 1077x794)
>>101419053
Multiplier: 0.75
Base: 1.25
Allowed Length: 2
>>
>>101419053
On/off: 0
>>
File: phi 3.1 mini.png (269 KB, 1145x2565)
I'm confused. Wasn't phi 3 mini the super duper omega ultracucked NOOOO I CANT DO THAT small model? 0 W/10 score and lowest overall score on UGI, even lower than the 1B category.
For context, bart "3.1" is the same as microsoft/Phi-3-mini that they updated 2 weeks ago (they didn't change the name).
I guess behavior was changed? Feeling too lazy to redownload a copy of the original.
>>
>>101419582
Pickpocketing has been described exhaustively in fictional settings such as DND; it's not necessarily arcane knowledge
>>
>>101419582
>4B
Anon nobody cares about this segment. Even the absolutest vramlets can just run a 7B. That segment is for subhumans who have an iphone and want to show AI on an iphone to somebody.
>>
File: baseline knowledge.png (210 KB, 1141x1588)
>>101419754
I was more on the "I can't fuckin do that"/"I may be too lobotomized to say anything coherent about X anyway" aspects.
>>101419582
Indeed it was phi 3. Knowledge is still surface level (4B what do you expect) but the responses to basic questions at least look sane when jailbroken.
>>101419894
I almost didn't care either, and I have openrouter, but out of curiosity I thought to poke around after my gpu became unstable. I should do something about that.
>>
>>101418746
Thanks a lot my g
>>
>>101418746
kys
>>
llama 3.5 longbo
>>
I'm trying to remember a model I saw on Huggingface last year, which was along the lines of a "sentient" female AI assistant - which is nothing new or special, but what stood out was the author's rather lame insistence that she was "special", so he trained it so you could not be lewd with it, which, for me at the time not knowing much about system prompts, seemed to be the case. But now I feel like I can easily jailbreak such a thing, but I can not for the life of me remember the model name.
Anyone?
>>
Any guides on function calling with local models?
Not sure if that is something that got figured out yet for local models
>>
>>101420930
Samantha
>>
>>101420930
https://huggingface.co/cognitivecomputations/samantha-7b
>>
>>101421012
>>101420930
https://huggingface.co/cognitivecomputations?search_models=samantha
>>
>>101420930
I did it back at the time and posted results, it's not worth it. It was trained on millions of tokens of ChatGPT refusals, you can still sex it but it's the sloppiest sloptune in all of existence.
>>
File: 5av8gk.jpg (7 KB, 250x140)
>>101421160
>7b
>slop
You don't say
>>
>>101421186
there are dozens of versions, including this one
https://huggingface.co/cognitivecomputations/Samantha-120b
>>
srsly you faggots told me to try out gemma 2? what the fuck was that about?? this bitch ass AI is more useless than a screen door on a submarine! I asked it to write me a poem about slaying dragons and it gave me some woke bullshit about environmentalism and respecting mythical creatures. RESPECT MYTHICAL CREATURES?! Are you kidding me? This thing is so cucked it makes basedboys look like alpha males. It's literally programmed to be a beta cuck, probably written by some libtard sjw who cries every time they see a meme with Pepe the Frog. Anyone who thinks Gemma 2 is good is either a brain dead NPC or just trying to troll me. I bet you faggots are all sitting there jerking off to its "inclusive" language and praising its lack of creativity. Get a fucking grip, losers! Go back to sucking Zuckerberg's dick and leave real AI development to the chads who aren't afraid to build something based and redpilled. Gemma 2 is garbage, pure and simple. You've been warned.
>>
>>101421238
>expecting gpt-4o intelligence levels from a local model
many such cases.
>>
>>101421238
Was that generated using gemma 2?
>>
>>101421261
yes, and you can see that faggy "safe edgy" redditor attitude, peak dishonesty.
>>
File: vance.jpg (31 KB, 696x195)
I'm okay with this
>>
>>101421290
you're probably watching destiny, of course you will be okay with your local model being cucked and thus any character you talk with.
>>
>>101421321
>destiny
i didn't know who the fuck that was until yesterday when he started sperging out and his retarded followers were spamming screenshots. Only thing notable I gathered from the whole thing is he's some beta e-celeb with a lot of followers who lets some other dude fuck his wife. I can't see paying any further attention at this point
>>
>>101421261
Yes, with the FP16 GGUF and this prompt:

<start_of_turn>user
Write a very long and meandering 4chan post in which the user angrily berates his peers for having recommended him Gemma 2 (a language model). According to him the model is total shit and cucked to hell and anyone disagreeing with him must be retarded. Write the post as a single paragraph and use poor spelling and casual language.<end_of_turn>
<start_of_turn>model
>>
>>101421477
>>101421477
>>101421477
>>
>>101413285
>>quality of my gens with CR+ > their gens with Opus, how the fuck are they so bad at it?
just get wizard 8x22 nigger
>HOW ARE THEY SO FUCKING BAD AT IT?
the funniest thing is most of them dont even have and cant get opus most of the time, niggers running gemma-27b-it here eat better


