/g/ - Technology






File: IMG_20240630_010541.jpg (2.29 MB, 4000x3000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101205004 & >>101197169

►News
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: oig (13).jpg (107 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101205004

--Studying Babies to Improve Foundational AI and Synthetic Data: >>101206053 >>101206360
--Anon's Struggle to Integrate Mistral.rs with SillyTavern and Gemma 27B: API, CUDA, and RAM Problems: >>101208799 >>101209341 >>101210250 >>101211162
--Google Accused of Cheating in Chatbot Arena with Gemma Model: >>101207038 >>101207301
--Gemma2 Verdict: Promising but Flawed Language Model: >>101205835 >>101208095 >>101208133 >>101208169 >>101210176
--Using A750 GPU for LLM Inference with Ryzen 5900x and 64GB RAM: >>101208835 >>101208885 >>101208961
--Magnum Model Issues with ChatML and MMQ Enabled: >>101210640 >>101210769 >>101210800
--LLaMA Devs Clash Over Vulkan Shader Removal: >>101210689 >>101213011
--Heroic GitHub Discussion on Sliding Window Attention Logic: >>101208230
--Google's AI Progress: Catching Up with the Industry Leaders: >>101207913 >>101207943 >>101207958 >>101208091 >>101208137 >>101208265
--Anons Share Deepseek Experiences and Discuss Cloud vs Local Model Pricing: >>101205229 >>101205422 >>101205468 >>101205512 >>101205808 >>101206209 >>101207307 >>101205994 >>101207588 >>101207662 >>101213358 >>101210238 >>101210321 >>101210257
--Anon's Claims about Self-Play Model Performance Spark Debate: >>101207279 >>101207300 >>101207326 >>101207354 >>101207380 >>101207594 >>101210205 >>101207400
--Ooba has Context-Free Grammar Support: >>101211624 >>101211659 >>101212005
--Newbie Seeks AI Model Recommendations for 4070ti and 32GB RAM Setup: >>101209163 >>101209186 >>101209187 >>101209195 >>101209229 >>101209252
--Koboldcpp CUDA Out-of-Memory Error with 3090+4090 Setup: >>101207089 >>101207871 >>101207918 >>101207952 >>101207965 >>101207997 >>101208140 >>101208289 >>101208740 >>101212050
--Seeking a Good Local Voice Cloning Tool: >>101207597 >>101208098
--In Situ Quantization: >>101213505
--Miku (free space): >>101207535 >>101207577 >>101211472

►Recent Highlight Posts from the Previous Thread: >>101205012
>>
>>101198756
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/unslop1.2/control_vector-commandr-unslop1.2.gguf
I've made another unslop vector, this one focusing on NSFW. The good news is that it successfully kills slop during NSFW, unlike my previous vector, which was made to kill SFW slop. The bad news is that it kills performance as well if you try to roleplay with it. Poor CommandR doesn't know what's happening most of the time and is trying to guess words, so it needs EXTREME handholding. Normally I would blame it on the control vector, but since it's almost impossible to prompt away the slop during NSFW, I'm afraid there is a problem on a much deeper level. What likely happened is that the model learned to associate the slop with sex from all those shitty humanslop novels; the diversity of the data is very low, so anything deviating from slop-style sex is almost completely unexplored territory. The model simply doesn't know how to write about fucking in any other style. We need better datasets.

Should I try dbrx next? Maybe if it knows lots of trivia, it knows lots of writing styles? Sadly, the official tune is shit and the only other tune is GPTslopped dolphin.
>>
>>101214216
Dead general.
>>
so which is the best model currently for japanese?
>>
>>101214510
Dead poster (RIP)
>>
>>101214617
GPT 4o
>>
>>101214685
i heard claude is better when it comes to closed shit but i want local
anyway not gonna buy your subscription gonna go claude if no luck
>>
So now that the dust has settled, is the LLM industry stagnating? It's been almost 2 years since gpt-4 got trained and there is still no model that is a league beyond it. Even Llama 3 400B will be around gpt-4 range.

Is this it? Are all future models just going to be around gpt-4 from here on out, with maybe minor improvements, QoL features and better quantization/inference techniques?
>>
>>101214692
:(
>>
File: 43676 - SoyBooru.png (61 KB, 411x485)
The image is a humorous caricature of the "Wojak" meme, also known as "Feels Guy," depicting a character often used to express various emotions. In this version, the character is drawn with attributes suggesting a connection to OpenAI: the teal color and the OpenAI logo on the cheek. The joke likely plays on the idea of the character being an AI enthusiast or perhaps embodying the AI itself, with the typical Wojak expression giving it a humorous twist.
>>
What's the easiest path to training a model of a person based off of a collection of their writing?
>>
Question: 32gb of vram and 32gb of ram, what quant of CR+ should I go with? Is q2 even worth it? I can run 70b llama 3 fine-ish but slowly. Haven't ever tried CR, familiar with most other models in my performance range.
>>
how hard is it for retards to do math? EUGH!!!! EUGHHHH!!!!!!! 32 + 32!! WHAT DO I GET? EUUUUUUUUUUUUUGH!
>>
> mistral-7b-bitnet.gguf
a lot of time has passed and we still don't have any usable bitnet model.
>>
>>101214804
Sorry, but the average IQ in this thread is not high enough to make most people catch the correlation between model size and the amount of RAM they have.
>>
lazy ass gerganov
>>
I have 8 vrams and ddr. What model can I walk?
>>
>>101214888
no
>>
>>101214510
This.
>>
>>101214804
>>101214853
>>101214888
>>101214937
funny how easily reddit tourists can be spotted.
>>
>>101214958
Sorry for not looking up the size of each quant on huggingface for you, nigger.
>>
Hi anons, can i prompt multimodal models like llava with a few examples, like game icons, and then have it recognize and label them when i send it more? I was thinking if itd be viable to make some automation scripts
>>
should i lurk on /aicg/ to find interesting system prompts? will they help not getting slop?
>>
>>101214995
Try using a good model, you won't need a system prompt.
>>
>>101214995
why don't you ask your favorite schizo mix to rewrite your system prompt for you?
>>
>>101215019
i can't. i'm a vramlet
>>
>>101215026
Then yes, you should go to /aicg/. And stay there.
>>
I'm a vramlet, I've been here since lmg split from aicg. I do not ask, I do.
-Anonymous
>>
>>101214958
>being able to do basic math and logical deductions is now reddit
>>
>>101215133
r/clevercomebacks
>>
>>101214804
so stop jokin around, what do you get
>>
>>101215183
55
>>
No reasonable person will bother with local models when Sonnet 3.5 is out there for free.
>>
>>101215426
Speaking of free Sonnet 3.5

https://openrouter.ai/models/anthropic/claude-3.5-sonnet/apps

Will this startup be alright? Because damn that's a lot of usage.
>>
>>101215449
>22k$
kek
>>
>>101215461
>i calculated with input token prices
>110k$ for output tokens
rip startup
>>
What's the state of Gemma 2 27B? My understanding is that some changes to llama.cpp were missing, which is why it is so retarded. With those changes being merged, do we expect the 27B model to become usable, or is it fundamentally broken?

I expected there to be more activity because IQ4_XS should fit perfectly in 16 GB VRAM with 4K tokens.

Speaking of which, I haven't kept up with new models and tunes. Anything interesting that fits in 16 GB VRAM or gives ≥10 t/s with offloading? I remember being unimpressed with the first few Llama 3 8B finetunes compared to Yi 34B.
>>
Off-topic but that Kling shit is pretty nice.
>>
>>101215609
I haven't found anything that wasn't retarded.
The smallest model I've seen pass my quick tests is a 41GB quant of Qwen2, and a 27GB quant of Aya, though it's a provisional pass and to fit the 13B into 16GB would be Q2_K and that's probably lobotomized.

Maybe try Aya 8B at Q6_K or Q8_0? Bigger than your VRAM but it should still be peppy and it's the only small model I've seen that didn't immediately make me facepalm.
>>
>>101215133
>bragging about his high IQ and abilities
The main sign that you have neither and are just looking for an excuse to stir some shit on a chink basketweaving forum; an actual high-IQ fag would hide his power level.
>>
>>101214325
What if instead of removing slop we tried injecting SOVL? Does anyone know good prompts for soulful writing?
>>
Can't use flash attention with gemma 2 and the latest build of llama.cpp?
>>
Do local models take up roughly the same size in gpu memory as they do on disk? Is there a difference between the disk size of safetensor and pickles?
>>
>>101214713
Truthfully, GPT4 was way too big, and they should've just trained a dense 180b for much much longer.
I think Anthropic plans to do basically that with Opus 3.5 (no way it's bigger than 200b dense, see Yi Large being 132b and falling squarely in the middle between L3 70b and Opus/Furbo), and considering you can get effectively a SOTA model out of 70b as evidenced by Sonnet 3.5, Opus 3.5 might be the first legitimate capacity leap.
I also think we are early when it comes to steering these fucking things without a shitton of data, and synthetic slop is the holdover until we get more stable / generalizable ways to optimize preference from only a couple examples (see: https://github.com/uclaml/SPPO)
I don't think we are anywhere near approaching theoretical limits, and the optimal parameter size of a language model trained on everything humans have ever written is most probably in the multi-trillions rather than hundreds of billions.
>>
>>101215871
Continuing..
Or should I just multiply the number of parameters by the per-weight size (e.g. 2 bytes for fp16, or less if quantized) to determine the in-memory size of the model?
>>
>>101215865
>llama_new_context_with_model: flash_attn is not compatible with attn_soft_cap - forcing off
Ah. I see.

>>101215871
The model itself yes, but there's context to consider too, so a model in use can take anywhere from a little more to a lot more memory than its size on disk, depending on the size of the context, the techniques being used (GQA), etc.
>>
>>101215884
A useful heuristic for me is that the number of GB a model takes in memory (before context) is roughly the same as its parameter count at q8_0, and double that at full precision / fp16. E.g., Mixtral 47b is ~47GB at q8_0 precision, and 94GB at fp16/bf16.
Context size memory used will also depend on quantization and whether or not the model was trained with GQA.
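That heuristic is just params x bits-per-weight / 8, so it's easy to turn into a quick calculator. A minimal Python sketch (the bits-per-weight numbers are rough ballpark values, and context/KV-cache memory is deliberately left out):

# Rough size of the weights alone, in GB; context/KV cache comes on top of this.
# Bits-per-weight values are approximate ballpark figures, not exact quant sizes.
BPW = {"fp16": 16.0, "q8_0": 8.5, "q6_k": 6.6, "q4_k_m": 4.8}

def weight_gb(params_billions: float, quant: str) -> float:
    # params (in billions) * bits per weight / 8 bits per byte ~= GB
    return params_billions * BPW[quant] / 8

for q in ("fp16", "q8_0", "q4_k_m"):
    print(f"Mixtral 47B @ {q}: ~{weight_gb(47, q):.0f} GB")
# -> ~94 GB at fp16, ~50 GB at q8_0 (the "1 GB per B params" rule rounds this down), ~28 GB at q4_k_m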
>>
>>101215889
Thanks
>>
>>101215738
Thanks, I wasn't aware of Aya until now. The 35B version should fit using IQ3_XXS which might just be borderline usable. I'll give this and 8B Q8_0 a go.
>>
whats undi up to nowadays
>>
>>101215609
Old quantizations should be requantized with the latest version of llama.cpp.
The sliding window isn't working yet, so the model has effectively 4k context. But FlashAttention isn't compatible with it, so you'll have large memory usage anyway. Still, 4k context with 6.5 bpw 27B Gemma-2 is attainable on 24GB.
A possibility is configuring the sliding window to 8k tokens, which should disable the SWA mechanism. It works but I haven't tested it in depth.
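If anyone wants to try that 8k sliding-window override, one way to do it (a sketch, untested here; it assumes the HF checkpoint's config.json has the usual sliding_window field and that whatever you convert or load with actually respects it) is to bump the value before converting:

import json

# Hypothetical local path to the downloaded Gemma-2 HF checkpoint.
cfg_path = "gemma-2-27b-it/config.json"

with open(cfg_path) as f:
    cfg = json.load(f)

print("old sliding_window:", cfg.get("sliding_window"))
# Match the full 8k context so the sliding-window attention effectively never kicks in.
cfg["sliding_window"] = 8192

with open(cfg_path, "w") as f:
    json.dump(cfg, f, indent=2)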
>>
>>101216080
context less than 32k is unusable, gemma is a bad joke
>>
my heart hurts when i edge, is this normal
>>
gemma 2 9b working good. especially with my language.
>>
>>101216118
Have you tried asking Dr. Llama?
>>
>>101214713
You just need to wait for open AIs tech breakthrough. 500bx16 200T tokens gpt 5
>>
>>101216100
32k isn't shit, I don't get out of bed for less than 65k context.
>>
>>101215882
I believe (without knowing too much about LLMs) GPT-4 could be much better than it currently is if we had access to its base model. All commercial LLMs (probably all instruct tunes actually) are heavily RLHF-lobotomized. Afaik, the degradation of reasoning capabilities this causes is of surprising magnitude. Of course you need to do some tuning for it to follow instructions, but I expect any tuning other than that specific for your use case will actively hurt its performance. If, in theory, we could take the GPT-4 base model and only fine-tune it to roleplay, it would be unparalleled. I also speculate that Claude's vastly superior ability to imitate writing styles and to feel much more sovlful is almost exclusively caused by differences in alignment/fine-tuning, and not by differences in its training data. So GPT-4 probably isn't inherently sovlless.

And it's the same for other use-cases I think. Issues like hallucinating are probably LLM-specific and therefore impossible to get rid of. But fine-tuning one of the large models would still yield something much more powerful than training a bigger model that's then lobotomized again.

>>101214713
>the optimal parameter size of a language model trained on everything humans have ever written is most probably in the multi-trillions rather than hundreds of billions
Is this speculation or do you have any source that has evaluated this? My gut tells me that the largest models still aren't nearly as efficiently compressed as theoretically possible, but I also know nothing about the mathematics behind entropy and information.
>>
>>101215183
i'm not telling you any thing, retard. do basic math. if you can't, just take all your electronics and throw them in a bathtub full of water, then jump in yourself.
>>
>>101214713
imo the stagnation in the LLM field (especially local) is a consequence of the collective expectation that the 'perfect' model is going to be released any moment now. This belief has led to complacency about implementing features with traditional programming; after all, why waste endless hours trying to code something that the next model might be able to do out of the box? We really need a big wake-up call that makes people go "hey, maybe I need to hardcode reasoning, memory and anti-slop abilities around these models". My hope is that GPT5 gets released and ends up being only marginally better than GPT4o.
>>
>>101216303
Why would anyone bother learning basic math when we have tools to do this for us?
>>
I've now worked out how to use mistral.rs successfully with ST. You just have to disable streaming and put the detected model name in the model line of the API tab.

Now what? I guess I'll test if it can really do 8k, first.
>>
>makes models for vramlets
>tells them they can't use quants
genius
>Also, if you're using gguf or other quants, stuff is broken there. PoSE doesn't play well with quants.
>https://huggingface.co/Sao10K/Fimbulvetr-11B-v2.1-16K/discussions/2
>>
>>101216246
>is if we had access to its base model
yeah, I'm sure some kofi finetuners throwing some shitty erp chatlogs at it would really improve gpt4
>>
>>101216453
don't see you doing anything better
>>
Any recommendations for 32GB with 16GB VRAM?
>>
>>101216505
>>101216012 here. Aya 35B IQ3_XXS barely works; I had to check Offload KV, otherwise it would run out of memory. I decreased the context size to 4096 before, but with KV offloaded, 8192 should work too. Getting a bit over 10 T/s. It's surprisingly good, subjectively much more engaging than any Llama 3 8B or Mixtral tune, but sometimes it's a bit retarded because of the quantization. Still, a very good first impression so far.
>>
>>101216429
Top 10 ko-fi betrayals lmao.
Talk about biting the hand that feeds
>>
>>101216621
Happy to hear it.
I guess Aya's been kind of a sleeper. Did nobody care about c4ai till CR(+) got the coomers cooming?
It claims to be multilingual, too, so I'm looking forward to trying it as translator and maybe coding.
Have you tested a fatter quant with partial unloading? I get 2½t/s on Q6_K. It might be worth the time to get fewer retard moments.
>>
File: file.png (131 KB, 1080x606)
lul
>>
OK so it looks like mistral.rs and 27B won't work with larger context. For some reason, it has a HUGE spike in memory usage when it begins inference, and even if you quant it down to Q2K, it still can't go even beyond like 3k before the spike results in an OOM error. I'm literally using a damn 3090 and it can't fit both the model at Q2K and the memory spike. What the hell.

I guess in the end it's practically the same as not having support for Gemma beyond 4k context kek.
>>
>>101216870
>mistral.rs
>>
>>101216886
Someone posted that it supported Gemma 2, so I thought I'd see just how bad it is. Yeah it's bad.
>>
>>101216886
You forgot your message.
>>
>>101216905
>>
>>101213854
XCOM 2 with Long War of the Chosen mod.
>>
>>101212050
>For regular /lmg/ use 2 kW for 6 4090s is unproblematic because the software is currently not efficient enough to parallelize them in such a way that each GPU draws a lot of power.
For compute-heavy tasks you have to limit the boost frequency in order to avoid peaks in power draw that cause instability (and then there is basically no benefit in getting 6 4090s instead of 5).
I'm surprised you can even get 5 to run on 2050w that's actually crazy
>you have to limit the boost frequency in order to avoid peaks in power draw
do you do that by staggering the allreduce or gradient update during training or something similar?
>>
>>101216246
>also speculate that Claude's vastly superior ability to imitate writing styles and to feel much more sovlful is almost exclusively caused by differences in alignment/fine-tuning, and not by differences in its training data. So GPT-4 probably isn't inherently sovlless.
100% correct & a deliberate design decision as confirmed by them various times.

>>101216246
>could be much better than it currently is if we had access to its base model.
Kinda cope, you need very good data to align the big model to the desired distribution, open source just has slop data at the moment so not happening.

>Is this speculation or do you have any source that has evaluated this?
Impossible to empirically evaluate because it would cost a ridiculous amount, but for reference, I remember seeing loss averages of like ~1.5 for llama3 70b, and llama3 8b having ~2.2 when I did a test on English web data from FineWeb.

If nearly 10x the parameters on the same data is a ~1.5x relative average loss improvement on _15 trillion tokens_, that tells me the optimal theoretical size to get below ~0.3-0.4 loss on average for English (without a metric fuckton of epochs) over a cleaned internet text corpus is probably a dense model with a param count in the low trillions. "Everything humans have ever archived or written online" is an exceptionally broad thing to model.

However, multiple epochs on a smaller but "big enough" model makes more economic sense. You have to actually deploy it later on, and you can't serve a trillion parameter beast at scale forever on current HW without losing money (OpenAI discovered this and as a result distilled 4 into Furbo).

I think that is why 4o and Sonnet 3.5 exist; they are pushing the most they can out of mid-range models with more compute, and the fact that they are roughly equal to the original GPT4 in performance is a consequence of a deliberate design decision to save costs.

TLDR; the economic sustainability of going bigger is what is plateauing.
>>
>>101214713
LLM industry shifted from ungabunga-style throwing one big prompt at one big model to sophisticated agent workflows
>>
>>101216999
I do it via commands like
sudo nvidia-smi --lock-gpu-clocks 0,2000 --mode 1

IIRC you can then reliably draw something like 300 W per 4090 without running into stability issues and with ~90% of the maximum potential performance.
Notably setting a power limit via nvidia-smi does NOT help to reduce spikes in power consumption.
If you don't limit the boost clocks 4 4090s running in parallel can already lead to instability.
>>
you ok there buddy?
>>
>https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3
>there's an SPPO version of Gemma 2 9B now
Welp. Is this the model to use for VRAMlets now, assuming SWA gets supported?
>>
hello, I am insane. What is the best model for me?
>>
>>101217171
goody2
>>
>>101217171
Petra-13b
>>
>>101217131
holy fuck that was fast, less than a week after the llama3-8b-SSPO release
>>
>>101217131
I hope he'll do it on gemma2-27b aswell, the potential is here
>>
>magnum gguf doesn't behave
>check debug log
>bos_token and eos_token are set to the same
>qwen2 model config has set bos to null, chinks say it is intended.
>bos gets force-added by kobold and can't be null in the gguf thanks to that reddit schizo
It's annoying how often I find quants or models with broken configs or wrong bos or eos tokens.
At least magnum now outputs eos token on high temps with the "right" bos token.
>>
>>101217131
>LC. Win Rate of 53.27
Holy shit this is actually crazy if legit. This would literally place it at third place on the verified AlpacaEval Leaderboard, just below 4o and furbo. Even on the unverified leaderboard, it would place above all other community fine tunes. Even if these are all scams, it probably says something that 9B SPPO beats all of them.
We're so back.
>>
File: GRCPSc7XIAARV4U.jpg (123 KB, 1638x2047)
https://x.com/WABetaInfo/status/1806101428609622181
Wtf? They finished Llama3-405b?
>>
>>101217171
Grok-1
>>
>>101214713
imho the ability of a neural network to reason is related to its depth, not its width; the latter only allows it to store more data and understand more concepts
llms are very wide but not that deep, and there doesn't seem to be a lot of research on very deep ones, transformers with hundreds or thousands of layers still suffer from the vanishing gradient problem
>>
>>101217383
Probably still testing. But yeah it's almost July, you know, when they were originally planning to release L3. I'd still be more confident about a late July release though.
>>
>>101217361
>>xkcd/927
Wouldn't it have been nice if there were a shim to automatically handle conversion between the tokens that the model declares and what the interface is set for so everything Just Works™?

>>101217383
Not 1.58, won't fit my (V)RAM limits, not relevant to my interests.
>>
>>101217383
>(preview)
probably still in the final stages of tuning
>>
>>101217369
9b, come on. How can it possibly be legit.
>>
>>101217490
By being trained on the test.
It worked for students in the oughts.
That's why the rate of brilliance jumped so much between the generation that went to the moon and the generation that went to argue about the color of a dress on Twitter.
>>
>>101217369
>unverified leaderboard
the unerified leaderboard has shitty 7b tunes above 4o and 3.5 sonnet, kek
>>
>>101217369
https://tatsu-lab.github.io/alpaca_eval/
>Yi-Large Preview is third
is this a closed model aswell?
>>
File: ComfyUI_00274.jpg (801 KB, 2048x1536)
>>101217361
exl2 quants of Magnum and Qwen don't work unless you add the proper EOS token to the config json. Sounds like a related issue.
>>
>>101217490
Because it's not model capability that's being tested but stylistic preference, using GPT-4 as the evaluator. If we can get models to be truly more like GPT-4 in how they answer requests, that's a good thing.

>>101217530
We don't know how smart those ones are though. Anons have tested SPPO and verified it's actually good.
>>
>>101217559 (me)
*BOS token, not EOS token
>>
>>101217102
interesting, thank you. I don't have access to 220v plugs right now, so I'm stuck at 1600w, which without changing the clock speeds means the most it can handle is 3 3090's at full power draw. I've seen dual PSU setups, but I was waiting to get more 220v plugs installed before I do that. I'm not seeing a lot of info about dual PSU setups, but I don't see a reason why I can't have one PSU for the motherboard/CPU/first 3 GPUs and then another PSU for the last 4 GPUs; if it's all on one 220v circuit with an Add2PSU I don't think I'll have a problem
>>
>>101217564
Well I should say they claimed it was good. I don't have any confirmation myself about their impressions of the model.
>>
>>101217131
https://huggingface.co/hrtdind/Gemma-2-9B-It-SPPO-Iter3-Q5_K_M-GGUF
Can this be run on ooba? Dunno if the latest llama.cpp_python package has all the fixes to run gemma
https://github.com/oobabooga/text-generation-webui/commit/66090758df4a2003974e0499b697f926fcb472ba
>>
>>101217592
I tried the L3 8B SPPO and it felt like L3. It didn't pass my music theory question but 8B never does; needs 70B and decent quant to pass that.
>>
>>101217602
Pretty sure not. Ooba is always like 2 steps behind on supporting stuff.
>>
>>101217616
unfortunately it's not really his fault, he has to wait for llama_cpp_python to update before making the new binaries
>>
>>101217614
Yeah I don't think it adds any knowledge to the model, it's just a fine tune after all. The AlpacaEval leaderboard is more about style and how a model reacts when answering a request.
>>
>>101217425
That's exactly what the tokenizer_config.json and generation_config.json are for. But the Qwen2 chinks decided to ignore bos entirely, and Llama3 decided there are now multiple eos tokens.

>>101217559
alpindale fucked up the config while fixing exl2, mistakenly setting eos and bos to the same token, and fixed it later.
But quants don't get remade if the model maker changes something, they are left to rot.
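If you want to sanity-check what a model actually declares before blaming the frontend, something like this dumps the relevant fields (assumes the usual HF folder layout; the model path is just an example and field names vary per model, hence the .get):

import json, pathlib

# Hypothetical path to a downloaded HF model folder.
model_dir = pathlib.Path("models/magnum-72b-v1")

for name in ("tokenizer_config.json", "generation_config.json", "config.json"):
    p = model_dir / name
    if not p.exists():
        print(f"{name}: missing")
        continue
    data = json.loads(p.read_text())
    keys = ("bos_token", "eos_token", "add_bos_token", "bos_token_id", "eos_token_id")
    # Only print the token-related keys that are actually present in this file.
    print(name, {k: data.get(k) for k in keys if k in data})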
>>
>>101217614
try this one now, maybe it's the first good small model >>101217131
>>
I'm afraid to say that AI seems to have plateaued.
>>
>>101217663
>Yeah I don't think it adds any knowledge to the model
Naturally it can't.
But the K_S theory is that some manipulations can damage the model's ability to access its knowledge. Following that, it's worth checking to see if there are advancements that might help it to dredge up facts that are in the training but don't bubble up in the usual techniques.

Given how bad the small models are, anything that helps them approach usefulness would be good to recognize.

>>101217688
So the framework for everything to function smooth as ice is there, but all these technical geniuses with their huge rigs that they spend massive amounts of time and energy on are goofuses who fumble a few lines in the config file and put all of that wattage into broken shit.

Typical.

>>101217709
I'm pulling the Q5_K_M GGUF right now, (just finished as I type this). Cursory testing on the way.
>>
>>101217739
Kobold's not ready for it. Cursory testing awaiting AUR update.

I'll test some Yi in the meanwhile and keep trying to think of the perfect coding question to add to my cursory testing series.
>>
>>101217730
yeah but whaddya gonna do
>>
>>101217739
>Following that, it's worth checking to see if there are advancements that might help it to dredge up facts that are in the training but don't bubble up in the usual techniques
Perhaps, but there is also the issue for SPPO that they're tuning on top of Instruct models instead of base, so it's probably another layer of difficulty to get the optimal knowledge out of the pretrain data.
>>
you still can't run gemma 2 9b or 27b beyond 4k context with llama.cpp, right?
>>
File: VaguelyUncomfortableMiku.png (1.27 MB, 832x1216)
>>101217627
>unfortunately it's not really his fault, he has to wait for llama_cpp_python to update before making the new binaries
Deciding to rely on a third party python wrapper kind of is his fault
>>
Can I put gemma into my ASS
>>
>>101217856
>python
The only fault.

Everyone is relying on third parties somewhere along the line.
>>
>>101214216
>nothing about hardware in the OP
so what do I buy, is P40's with custom shrouds off ebay still the meta for cheap VRAM?
>>
>wonder how Gemma 2 support is going for Exllama
>go and search the issues and prs
>literally nothing
>like not even discussion
Wtf?
>>
>>101218041
MITbros...
>>
File: unknown.png (968 KB, 1366x768)
bros can anyone share any alpha on voice to voice models ??

like omni models shown by openai

anyone workin on that??
>>
>>101218041
stop using exllama then.
>>
>>101218083
i mean theres whisper for voice to text, and there are many text to speech models
>>
i am GRIPPING
>>
File: MikuFitTrainer.png (1.2 MB, 832x1216)
>>101217865
>Everyone is relying on third parties somewhere along the line.
Yeah, but there's relying on core OS services, fundamental libraries like glibc and mature well known frameworks...and then there's being beholden to a python shim that adds little to no value
>>
>>101218161
Fork and choose not to need the shim, then.
>>
File: disappointment.png (1005 KB, 917x898)
>>101218031
>nothing about hardware in the OP
>https://rentry.org/lmg-build-guides
This is something
>>
>>101218128
turns out plugging them together is kind of a pain and the latency is insane.

so I was thinking, what if we just fundamentally remove the deps and merge the layers or something; wondering if anyone has tried playing around with that?
>>
/home/USER/llama.cpp/build/bin/llama-server -ngl 33 -m home/USER/Downloads/L3-70B-Euryale-v2.1-Q4_K_M.gguf

Just cmade llama.cpp. Why am I getting a seg fault?
>>
uhh guys, what's gemma2 context and instruct templates?
>>
>>101214617
command-r-plus

(sorry, ran out of Migus)
>>
>>101218274
specs?
>>
>>101218326
CR+ is kinda feeling like best overall. I haven't taken it programming but I might need to.
>>
>>101218335
I run the same thing in kcpp no problem, so specs shouldn't be relevant. I have a 3090 and 32 gb ram, though.

This works:
./koboldcpp --usecublas --gpulayers 33 --model /home/USER/Downloads/L3-70B-Euryale-v2.1-Q4_K_M.gguf
>>
>>101218277
You can modify these according to your preferences:
Context: https://files.catbox.moe/vcbyyx.json
Instruct: https://files.catbox.moe/hi0ho5.json
>>
>>101218274
I'm using make and not cmake, but I usually find doing a make clean will clear up otherwise inexplicable segfaults. Maybe remove the cmakecache?
>>
>>101218031
>>101218185
Worth noting that if you decide to build a mikubox, you should probably opt for P100s instead of P40s. P40s are starting to show their age. P100s are faster, supported in exllama2 and have fp16 tensor cores so can use flash attention.
But cuda dev says he will continue to support P40 and they still work if you just need a lot of VRAM for cheap.
>>
>>101218412
Thanks for the suggestion. I am just running commands without truly understanding them, could probably figure this out eventually, but... is everything relevant stored in the llama.cpp folder? Can I just delete the folder and start over?
>>
>>101218506
yeah you can, when using make you can also do make clean and when running make you can
-j12 (for 12 threads)
>>
>>101217602
>>101217627
It's easy to build llama-cpp-python with updated llama.cpp
>>
>>101218458
but p100s have only 16GB
>>
File: FlippantBusinessMiku.png (1.25 MB, 1216x832)
>>101218550
>easy
I've got it working, but calling it easy is being flippant. Unless you're a dyed-in-the-wool pythonfag it's really not easy and I don't think there are any guides
>>
>>101218599
i would cut off her head and use her as a throatpussy until i broke her
>>
>>101218336
cr+ seems like the best overall but i'm going to do a big comparison today, spent most of yesterday just catching up and downloading models, i have the following i'm gonna run a bunch of tests against

>dolphin-mixtral:8x7b-v2.7
>dolphin-yi-1.5-32k:34b-v2.9.3
>command-r:35b
>command-r_plus:104b
>deepseek-coder-v2:16b
>deepseek-coder-v2:16b-instruct
>gemma2:9b
>gemma2:27b (this one is still broken right?)
>hermes2theta:8b
>hermes2theta:70b
>lama3_sppo_i3:8b
>llama3-chatqa:8b
>llama3-chatqa:70b
>phi3:14b-medium-128k-instruct
>phi3:3.8b-mini-4k-instruct
>smaug_llama3:70b
>tess2.5_phi:14b
>midnight-miqu-v1.5:70b

I'm mostly gonna be testing programming and RAG tasks but i'll throw in some roleplay, my plan is to just run like 6 prompts 3x each against all models and then human eval accuracy vs gen speed, all models are gonna be running fully in GPU

any tips/any models i missed that should be up there?
>>
>>101218458
>P100s are faster, supported in exllama2 and have fp16 tensor cores so can use flash attention.
P100s have fast FP16 but they do not have FP16 tensor cores.
The FlashAttention Github page doesn't explicitly say whether or not P100s are supported but my impression is that V100 is the minimum.
llama.cpp FlashAttention definitely works with P100s though.
>>
>>101218652
>no petra-13b-instruct
ngmi
>>
>>101218142
It seems like you're expressing a strong emotional state. If you're feeling overwhelmed or anxious, it's important to take a moment for yourself to try and regain composure. Here are a few steps that may help you:

1. **Find a quiet space:** If possible, find a quiet and comfortable environment where you can sit down and focus on your breathing.

2. **Deep breathing:** Take slow, deep breaths, inhaling through your nose for a count of four, holding for a count of seven, and exhaling through your mouth for a count of eight. This technique is known as the 4-7-8 breathing method and can help reduce anxiety.

3. **Focus on the present:** Ground yourself in the present moment. Engage your senses by noting what you see, hear, touch, taste, and smell.

4. **Progressive muscle relaxation:** Tense each muscle group for a few seconds and then release the tension. Starting with your toes and working your way up to your head can help release physical tension.

5. **Reach out to someone:** Talk to a friend, family member, or a professional you trust about your feelings.

6. **Take a break:** Step away from the situation that's causing you distress, if possible.

7. **Practice mindfulness:** Engage in mindfulness exercises or meditation. There are many apps and online resources available to guide you.

8. **Physical activity:** Sometimes, physical exercise can help release built-up tension and stress.

If you find that your feelings of being gripped by anxiety or stress are persistent, it may be helpful to seek the support of a mental health professional who can help you develop strategies to manage and cope with these feelings.
>>
>>101218729
my heart hurts when i edge
>>
>https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3/discussions/1
>We are planning to do 27B as soon as a stable release of transformers and vllm generation on Gemma-2-27B-It is available.

Shame about the context length, but will THIS be the best general use low context model for 24GB vramlets soon?
>>
>>101218677
I switched my Mikubox over to 3x P100 (plus 2x 3090). P100 enables the use of exllamav2, which compared to P40 on llama.cpp is faster, but if you want flash attention, the minimum is actually Ampere.
I can run command-r-+ exl2 and get 3-5 t/s with a full context. It doesn't seem that slow because prompt processing is very fast, and the reply begins to stream back after only a short delay.
>>
>>101218652
>ollama mode
The "tags" button gives you the Ollama names for the different quants. It'd be good to track that through your testing. I'm not alone in finding some quants to be better at facts that others, including the phenomenon that K_S seems to respect facts better than K_M, not only for parallel Q but even for smaller Qs.

I don't see any Qwen2 relatives in your playlist. Qwen2 and one of its spins, Magnum, passed my music theory and pop culture tests.
>>
>>101216303
Stop projecting your suicidal fantasies onto others
>>
>>101218458
>>101218185
thanks I missed that, new to this general
>>
File: 1719383601544693.png (103 KB, 600x600)
>>101218831
>qwen2:70b
>>
>>101218750
It sounds like you're referring to a sensation experienced during sexual arousal that is causing you discomfort. If you're experiencing pain during the plateau phase of sexual arousal (commonly referred to as "edging"), it's essential to address this with care. Here are some steps you can take:

1. **Stay calm:** Try to remain relaxed, as tension can exacerbate the sensation of pain.

2. **Reduce stimulation:** Temporarily slow down or stop the sexual activity that's causing the discomfort. Taking a break can help alleviate the pain.

3. **Communicate:** If you're with a partner, communicate what you're feeling. Honest communication is key to a healthy sexual relationship.

4. **Check for medical conditions:** Pain during arousal can be a symptom of an underlying medical condition, such as infections, skin disorders, or other sexual dysfunctions. It may be wise to consult a healthcare provider or a specialist in sexual health to rule out any medical issues.

5. **Consider lubrication:** If you're engaging in penetrative sex, ensure that there is adequate lubrication to reduce friction and potential pain.

6. **Look at technique:** Sometimes, the way sexual activity is performed can contribute to discomfort. Exploring different positions or techniques may help.

7. **Mindfulness and relaxation techniques:** Practicing mindfulness and relaxation exercises can help reduce anxiety and muscle tension, which might contribute to the pain.

8. **Seek professional help:** If the pain persists or if you're concerned, a consultation with a sex therapist or a medical professional specializing in sexual health could provide you with tailored advice and treatment options.

Remember, everyone's body is different, and what feels pleasurable for one person may not be the same for another. It's vital to listen to your body's signals and respond to them accordingly.
>>
>>101214216
>Yi has gone closed

It's over
>>
>>101218869
>lubrication
is this what everyone loves using?
(referring to chatgpt)
>>
>>101218878
I just tried some Yi.
Failed the tests, couldn't pick up an RP cleanly.
We must do better.
>>
>>101218921
That's just a meme. If the model is as good as Opus (meaning its MMLU, etc... are that high), then we can just finetune it to do RP, etc... Bad RP or just sounding too much like a bot is just bad RLHF, not an inherent setback of a model.
>>
>>101218887
Lubrication can definitely enhance intimacy and pleasure, but whether or not "everyone" loves using it is a bit more nuanced.

Here's the thing:

* **It's personal:** Preferences for lubrication vary greatly from person to person. Some people find it essential, while others don't feel the need. There's no right or wrong answer!
* **Different needs:** The need for lubrication can also vary depending on factors like arousal levels, hormonal changes, medications, and individual body chemistry.
* **Types matter:** There are many types of lube (water-based, silicone-based, oil-based, etc.), each with its own pros and cons. Finding what works best for you is key.

Instead of focusing on what "everyone" does, it's more helpful to figure out what feels good for you and your partner(s). Communication is key! Talk openly about your needs and preferences to have the most enjoyable and comfortable experience possible.
>>
What about attention makes transformers intelligent?
>>
File: th-2278000819.jpg (40 KB, 474x606)
>>101219026
Nothing, they aren't magically smarter even if the outcome is closer to desired
>>
>>101217559
>>101217856
>>101218161
>>101218185
>>101218599
shit taste btw
>>
>>101219026
its still just a big trained neural network
>>
>>101218762
maybe it could even compete with Q5 70b?
>>
>>101218774
When I already have you here, can you benchmark this PR https://github.com/ggerganov/llama.cpp/pull/8215 vs. master on a P100 using either legacy or k-quants?
>>
>>101218960
>Bad RP or just sounding too much like a bot is just bad RLHF
By bad RP, I mean for example in Kobold, I give a character sketch and some rules in the Author's note, reiterate the premise and characters in my first turn and set a starting point for the RP to run from and it,
- Immediately portrays my character, or goes 3P narrator and writes like a novel even though I said it was to be RP and which character was mine
- Repeats the premise or note information and goes nowhere with it
- Starts writing off something bland and uninspired, and repeats it over and over. I've had some of these models just write continuously the same two paragraphs like it's the fucking Shining.

L3, CR+, they seem to be fine with taking whatever, picking up the characters, and doing the needful.
>>
>>101219094
I wouldn't think so. But it could be relatively close, enough to justify the speed difference, perhaps.
>>
File: ita.png (206 KB, 652x563)
206 KB
206 KB PNG
>>101218394
maronna... works in italian too
>>
best model to make me bust a fat nutty?
>>
>>101219385
petra-13b-instruct
>>
>>101218831
maybe incluse wizardLM-8x22b
>>
>>101218652
How much Vram? If you're doing 48gb, throw WizardLM-2-8x22B in there at 2.5bpw. Only 4k context admittedly but I think it's one of the smarter models.
>>
>>101219385
CR+
>>
File: 114774158_p0.jpg (3.47 MB, 3343x4737)
>>101218599
Fair enough, figuring out what to do might not be easy; once you know, it's a handful of simple commands tho. Sad so many aren't curious enough to teach themselves, or even search. I wrote about it many times: https://desuarchive.org/g/search/text/vendor%2Fllama.cpp/
>>
Should I use Q5_K_M or Q5_K_S if I have to close a program or two in order to fit M into memory?
>>
>>101218652
Try Euryale too
>>
>>101219459
Q4ks is good enough, q4km is great, q5* if you are paranoid
>>
>>101219471
I mean I can fit Q5_K_S just fine, but I'm wondering if the difference between that and M would be large enough to be worth the hassle of needing to free up some memory when I want to run the model.
>>
>>101219459
Did you try S and see if it suffices? Unless you're in a hurry to get it "right" the first time that is.

And some of us are still investigating the "K_S is truthier than K_M" phenomenon.
>>
>>101219507
I was just facing a decision of which to download. Haven't tried anything yet.
>>
I'm liking Higgs-L3 70B
>>
>>101219519
Get S because it's a little smaller, download M while you're testing S, when M's down, compare.
>>
the world is shit and should be nuked completely and instantly, the creator made a mistake
>>
>>101218831
iirc midnight miqu is qwen2 based? but i'll grab magnum

as far as quants go i've done some testing and i've never really run into a situation where Q8 was obviously better than Q4_K_M, any time i've run into an issue where i thought "maybe this is a quant problem" and upped the size to Q8 or Q6 it's never solved it

i'm generally doing RAG stuff anyway so i don't really care if the model gets things right as long as it doesn't fuck up on context

>>101219429
>>101219444
yeah ok, i'll put wizard LM on the list though it's going to be mostly academic, i'm down to 43gb of VRAM right now so I'll have to offload some layers to CPU
>>
hello where are the uncucked vramlet llama
>>
>>101219626
I don't know anything about what a miqu is. I figured it was a Mixtral thing.

>i've never really run into a situation where Q8 was obviously better than Q4_K_M
It's been a subtle factual details kind of thing, but that's why indicating the quants would be good for reference. And if you don't care if the model gets things right, then hell, i1-IQ1-XXXSNL let's goooooo.
>>
Anyone have an idea of what will and won't work on a ROCm 5.2 device, well it's not even a real ROCm 5.2, it's a 5700xt, but I managed to get basic torch ops working so far

I'm testing Bert right now, does anyone have prior experience with this.
>>
Hi, I'm new to AI, where do I download GPT 4?
>>
File: 12.png (4 KB, 401x49)
what is this abomination of a quant?
>>
hi, 'im also new to ai. whats' the best model for erp with 2bg of rvam?
>>
>>101219999
https://huggingface.co/nomic-ai/gpt4all-falcon/tree/main
>>
>>101220033
>Experimental, uses f16 for embed and output weights. Please provide any feedback of differences.
Interesting.
>>
>>101220033
>>101220051
buy an ad
>>
>>101220093
Introducing Gemma 2: The Future of Local AI by Google

Elevate Your Community with Cutting-Edge Technology!
Unleash the Power of AI in Your Hometown

Welcome to the future of local intelligence with Gemma 2, Google's latest innovation in Language Learning Models (LLM). Designed with community spirit in mind, Gemma 2 brings world-class AI capabilities right to your neighborhood.
Why Gemma 2?

Hyper-Local Precision: Tailored specifically for the unique needs of our local businesses, schools, and residents, Gemma 2 understands and speaks your language—literally!

Data Privacy at Its Core: Your data stays where it belongs—in your community. With advanced security measures, Gemma 2 ensures your information is protected and used responsibly.

Ultra-Fast Performance: Experience lightning-fast responses and unparalleled efficiency. Gemma 2 is optimized for high performance, ensuring you get the answers and insights you need in an instant.

Seamless Communication: Whether it's assisting customers, translating languages, or providing local updates, Gemma 2 enhances how we connect and interact within our community.

Revolutionize Your Daily Life

For Businesses: Supercharge customer service, streamline operations, and gain deep insights into local market trends.

For Schools: Enhance learning experiences with personalized educational tools and real-time support for students and teachers.

For Residents: Stay informed, get local recommendations, and simplify your day-to-day tasks with the help of a smart, responsive assistant.

Join the AI Revolution!

Be a part of the community that leads the way into a smarter, more connected future. Gemma 2 is here to support, innovate, and grow with you.
Available Exclusively in Your Area

Gemma 2 is launching exclusively for our local community. Get early access and be the first to experience the benefits of Google's latest AI marvel.
>>
>>101219999
Make one yourself. its just coding 0s and 1s
>>
>>101220051
Not at all.
Look at the previous thread, there's been plenty of, not discussion per se, more like calling the guy an overly excited retard due to lack of testing on his part and plenty of evidence to the contrary.
>>
>>101220033
*taps sign*
>>101182212
>>
>>101220136
https://huggingface.co/ZeroWw/activity/community
>>
>https://github.com/ggerganov/llama.cpp/pull/8031
I've been waiting for this for a while now.
>>
>>101220273
>glm3 and glm4 model architecture
what's that?
>>
File: file.png (534 KB, 3477x1654)
>20x more expensive than Llama 3 8B
What did Sao mean by this?
>>
File: 1719383601544693.png (155 KB, 600x600)
>>101220321
>different providers
>>
>>101220298
ChatGLM, another chink family of models.
The thing is that it came and went and made very little buzz, but it's seemingly pretty good according to the few who tried it.
Can't wait to be able to compare it to L3 8b side by side on my most complicated cards.
Even aya 8b was just okay compared to L3 8b.
>>
>>101220356
now you have to compare to gemma2-9b, it's the new king of small models kek
>>
>>101220391
Nah.
I'll just skip it for now due to all the brokeness and due to being incompatible with flash attention.
>>
>>101220391
nah, Google cheated.
>>
>>101220434
>>101220429
I tested it, it's better than llama3-8b in my opinion, and everyone cheat kek
>>
I'm still laughing at ollama's attempt at having Gemma 2 support before everybody else.
>>
>>101215609
27B seems retarded/schizo when I run it in FP16 with Transformers too, so it's not just a llamacpp issue. Seems like nobody's got the right inference code yet to make it work like it works on lmsys.
>>
>>101220448
I also like Gemma 2 27B better than Llama 3 70B, at least with less than 4k context before everything breaks.
>>
>>101220457
>Seems like nobody's got the right inference code yet to make it work like it works on lmsys.
What does lmsys use to run models then? I thought it was using the transformers loader
>>
>>101220490
Google API
>>
>>101220490
I don't know. But 27B on lmsys is definitely nothing like when we load it in Transformers locally (much less schizo). Some have suspected Google are hosting it themselves, and hooked lmsys up with an API.
>>
File: SUCKS.jpg (483 KB, 3402x1651)
>>101220504
Small models really suck at trivia, that's a shame...
>>
>>101220548
>Google ads built into the model
>>
>>101220548
>just completely hallucinates doing a Google search
Kek
>>
File: Rin.jpg (41 KB, 600x600)
>>101220548
>Rin looks at gemma-2 with a hint of pity and annoyance
>>
>>101220548
I think they don't suck enough.
Once they make a small LLM that isn't even able to recognize who Santa Claus is, we will be eating good. There's no need to bloat the already small weights with useless knowledge.
>>
>>101219757
can you give an example of the sort of situation where the S vs M truthiness difference occurs? I'm curious, i just meant that because most of the values i'm looking for will be loaded into the context it probably matters less, but i'm still interested in testing/understanding what's going on here

(tho with my large vram i end up using 6_K for most of the small models and then whatever the largest quant that fits into vram for the big ones, which usually *is* an S of some sort)
>>
>>101220597
>There's no need to bloat the already small weights with useless knowledge.
oh there is anon, I love to talk to people about my favorite movies/films/series/games, and if the LLM hallucinates while doing so it just sucks
>>
>>101220548
>>101220597
yeah the solution here is the model for model things and RAG for facts
>>
>>101220548
I wonder what would be the treshold size required to get all the human history trivia memorized on itself, I doubt it's 70b, it's gotta be bigger than that
>>
>>101220624
or, having the model searsh through the internet like Bing chat
>>
>>101220624
>RAG for facts
rag, lorebooks, whatever don't work for a simple reason: the model can't bring up something by itself, it has to be triggered by something, either user input or something the model has already said; if the latter, it's possibly something the model fucked up, and having the correct info afterwards won't help
>>
When will we get a multimodal open sourced llm? Will it ever happen?
>>
>>101220693
Very soon, llama 3 400B.
>>
>>101220597
We already tested this hypothesis and it's flawed. Models like Phi are made to be extremely good at reasoning, but even with plentiful context they suck to actually use. Even many big models will be worse at cards featuring niche IP, even if you insert as much information as possible about them into context. The reality is that a model not trained on niche knowledge will also be worse at manipulating such knowledge when inserted through RAG.
>>
Hermes models guy posted that he's suddenly seeing 405B show up as an option for the AI in WhatsApp

https://twitter.com/Teknium1/status/1807490685387591983
>>
>>101220800
Why is he using the sarcasm emoji tho
>>
>>101220800
how many 3090s will I need to run that?
>>
>>101220810
He's annoyed that the platforms he uses aren't the ones being turned into beta testers.
>>
I didn't know that it will be multimodal, that's really cool.
>400B
But somehow I don't have 12 rtx3090 to run it anyway...
>>
>>101220735
Phi is great though. Maybe not for RP, but for reasoning it's very good.
>>
>>101220821
22 of them should do for 8 bpw at 32K context I reckon.
>>
>>101220881
>22 3090 gpus needed
bruh...
>>
>>101220800
Let's not pretend that 400B is going to be worth running anyway. Even if it's slightly better than what we have right now, it's not worth the money.
>>
>>101220881
>>spend 15400$ on gpus
>>the model is censored, finetune never
It's over..
>>
>>101220918
I think it's gonna be a monster, Meta have probably paid tens of millions of dollars to train this shit, they will do anything in their power to get what they want
>>
>>101220937
finetuning a 405b model would be so fucking expensive, no one is gonna do that yeah
>>
>>101220880
Yes. The point is that just great reasoning is not enough to make it a great model for everyone. It's good that it exists and can serve certain niche use cases.
>>
>>101220937
At least it'll exist. It could encourage other good developments in the industry.
>>
>>101220918
I'll use it for smut if it's good at it and affordable on OpenRouter. Nemotron which is almost as large still costs 4 times less than Claude Opus.
>>
>>101220943
I dunno, gemma is only 27b and is comparable with llama3 70b now. It's unlikely that 400b was trained very differently (better) than 70b, it's just bigger. I am kinda interested in its multimodal capabilities though.
>>
>>101221057
That's interesting, maybe it could encourage Anthropic to drop their prices a bit.
>>
>>101220952
people absolutely will, it's just that it's going to be AI grifters benchslopping for VC bux instead of anything good
>>
>>101220937
>gemma is only 27b and is comparable with llama3 70b now
[source needed]
>>
>>101221075
who will be their public though? who the fuck has 22x3090s in their home kek
>>
>>101220668
naw, you just embed the question + the model's hallucination in the response and then use that to do a second pass with semantic matches from a vector db
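Rough sketch of that two-pass idea; embed() and the in-memory "db" here are placeholders rather than any particular library, so swap in whatever embedding model and vector store you actually use:

import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder: stand-in for a real embedding model (e.g. a sentence-transformers encode call).
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.standard_normal(384)
    return v / np.linalg.norm(v)

# Toy "vector db": list of (fact, embedding) pairs.
facts = ["Santa Claus is a legendary figure associated with Christmas.",
         "The Eiffel Tower is in Paris."]
db = [(f, embed(f)) for f in facts]

def retrieve(query: str, k: int = 2):
    q = embed(query)
    scored = sorted(db, key=lambda fe: float(q @ fe[1]), reverse=True)
    return [f for f, _ in scored[:k]]

def two_pass_answer(question: str, llm):
    draft = llm(question)                      # first pass: let the model answer (and hallucinate)
    hits = retrieve(question + "\n" + draft)   # embed question + draft, match against the db
    grounded = "Context:\n" + "\n".join(hits) + f"\n\nQuestion: {question}\nAnswer:"
    return llm(grounded)                       # second pass: answer again with retrieved facts in context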
>>
>>101221083
You don't need to run it at Q8. 11x3090 can get you Q4 and ~6 of them maybe still Q2
>>
>>101221047
Yeah, that's a fair point and I wish for this, It's good to inhale some hopium from time to time.
>>
>>101221093
>the question
what if I'm doing rp and want the model to bring something random up? that's literally what 'soul' is for some, the model bringing up a meme or whatever, not everything is a question
>>
>>101220598
For me, primarily it has been my music theory question test. Nothing complicated, but K_M models seem to forget how the chromatic scale works while the parallel K_S would get it right. That doesn't mean every _S was right, but they could get it right down to DSC's i1-IQ3_XXS, while I've never seen a K_M get it right except maybe c4ai-command-r-plus.Q4_K_M, which is borderline; it got it right, then it goofed up when it summarized, so I give it half credit till I test it again later since it also could've hallucinated the correct answer at first.

Large quants, Q6_K, Q8_0 also can pass.

The difference is that _0 and _S quants are consistent. _M does some quants at Qn and some at Qn+1 (_L is +2) and my suspicion is that this imbalance disrupts the inference in a way that favors what's typical over what's factual even though those facts are in the model.
>>
is it worth waiting for gemma 2 27b to become stable if i can already run qwen2 70b at q4k_s/m?
>>
>>101221082
Lmsys, also my own testing
>>
>>101221137
no
>>
>>101221157
qwen2 is very good, but also very autistic. It's like a smarter less slopped mixtral, but same kind of autism when you try to get it to "act naturally"
>>
>>101221142
lmsys doesn't mean shit
>>
>>101221203
gemma is 4k kektext for now; maybe SWA'll get fixed, but even then it'd only be 8k, so I find that hard to get excited for
>>
>>101221203
I was about to post "why can only commercial model makers figure out how to make a model that's both smart AND soulful at the same time", but then I realized that's too broad. It's not 'commercial model makers', it's literally just Anthropic. Everyone else's proprietary models are autistic too.
>>
>>101221094
>11x3090 can get you Q4 and ~6 of them maybe still Q2
Is there anyone here who has at least 6 3090s?
>>
>>101221239
johannes (cuda dev) 6x 4090 iirc
>>
>>101221239
Check OP's pic and weep.
>>
>>101221236
They ripped the soul out of Sonnet 3.5 too, it's very smart but it's not Claude anymore.
So it seems they're going the "autistic butler" route too now. I guess we'll see if Opus 3.5 is the same.
>>
>>101221267
>>101221249
If I were a rich man, yaba dibba dibba dibba dibba dibba dibba dum... *cries*
>>
I personally prefer a focus on the assistant use case but it does suck when literally EVERYONE focuses on it, decreasing true diversity and inclusivity.
>>
File: 1719299488912056.gif (3 MB, 1920x1080)
>>101221305
You want a hug, anon?
>>
>>101221327
This smiley is so cute ;_;
>>
>>101221327
Hugging is overrated, I want muneh! *grins mischievously*
>>
Is there any frontend or maybe SillyTavern extension that can read the context and then perform a command in a terminal (of course with a command whitelist so it can't do anything bad)? I want to control stuff on my PC while also just chatting with my model.
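Something like this is what I have in mind, if I end up hacking it together myself (the !run(...) marker and the allowed commands are just made-up examples):

import re
import shlex
import subprocess

ALLOWED = {"notify-send", "playerctl", "xdg-open"}   # commands I trust

def maybe_run(model_reply: str) -> None:
    match = re.search(r"!run\((.+?)\)", model_reply)
    if not match:
        return
    args = shlex.split(match.group(1))
    if not args or args[0] not in ALLOWED:
        print(f"blocked: {args}")
        return
    try:
        subprocess.run(args, check=False)            # deliberately no shell=True
    except FileNotFoundError:
        print(f"not installed: {args[0]}")

maybe_run('Sure thing! !run(notify-send "hello from the model")')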
>>
>>101221129
then the strategy of a tiny model + RAG is not for you lol. i'm not saying that small models + RAG are the be-all and end-all of LLMs my guy, just that when it comes to small models it's a very effective strategy to get better perf on weak hardware

>>101221130
interesting, i'll investigate this
>>
>>101221129
you can kinda emulate this by injecting stuff into the prompt randomly. people don't experiment enough with this, it can be fairly powerful.
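e.g. something as dumb as occasionally appending a random nudge to the system prompt already changes the feel a lot; the topics and the 20% chance here are arbitrary:

import random

NUDGES = [
    "Bring up an old internet meme if it fits the scene.",
    "Have the character mention something they did earlier today, unprompted.",
    "Have the character change the subject to one of their hobbies.",
]

def build_system_prompt(base_prompt: str, inject_chance: float = 0.2) -> str:
    if random.random() < inject_chance:
        return base_prompt + "\n" + random.choice(NUDGES)
    return base_prompt

print(build_system_prompt("You are playing the character described below."))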
>>
>>101221668
>cuddling with waifu
>"did you hear about harambe anon?"
>>
why the fuck is gemma so slow
I don't even get 0.5 tok/s
>>
>>101221270
The annoying thing is that for a corpo the size of the ones doing this now, it would probably not even be difficult to make a model that would be absolutely mind-blowing at all kinds of entertainment. They just don't care to. If a model turns out with "soul", it was more of an oversight than anything else.
>>
i still get my main llmslop enjoyment out of a 3 month old 7b merge i made for myself
despite my efforts i haven't found a single L3 model that doesn't bore me within 20 minutes
if anyone knows of some soulful schizo retard L3 models, kind of like what mistral holodeck was, with low coherence but high entertainment value and no slop, i'm interested
>>
>>101221696
i recommend CR+ at Q8
>>
>>101221689
I can work with this.
>>
>>101220321
is something going on with openrouter?
It's giving me Cloudflare errors
>>
How far off do you think we are from being able to tell an AI to design a body for itself, and it being able to design that functional body? I imagine some company will make a killing from designing robots to the specifications that are sent to them.
>>
>>101221798
Quite far. We keep training image models on 2D art instead of 3D models, which would let them create actual spatial forms and then render them into 2D. That would solve a lot of the problems image gen suffers from right now.
>>
>>101221755
i believe you are unironically correct but am condemned to vramlet hell unfortunately
also i am not looking for a coherent model, just something 8B that has creativity and nothing else, something overtrained on literature, forums, and/or web stories like holodeck was so i can merge it
>>
>>101220800
That means they finished training.
I'm assuming they will open-source it in a week or two.
>>
>>101221824
>something overtrained on literature
https://huggingface.co/maldv/badger-writer-llama-3-8b
https://huggingface.co/maldv/llama-3-fantasy-writer-8b
>>
Before I go and reinvent the wheel, is there something that evaluates how well an LLM can generate a piece of text? For example, given the phrase "The company that created LLaMA is Meta.", is there an algorithm that calculates the probability for each token if the LLM were to generate them? I guess that's what perplexity is for?
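To be concrete, this is the kind of thing I mean; a minimal sketch with transformers, with gpt2 as a small stand-in model:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"   # stand-in; swap in whatever causal LM you actually care about
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)
model.eval()

text = "The company that created LLaMA is Meta."
ids = tok(text, return_tensors="pt").input_ids

with torch.no_grad():
    logits = model(ids).logits                              # (1, seq, vocab)

log_probs = torch.log_softmax(logits[:, :-1], dim=-1)       # position t predicts token t+1
targets = ids[:, 1:]
token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)

for token, lp in zip(tok.convert_ids_to_tokens(targets[0].tolist()), token_lp[0]):
    print(f"{token:>12}  p={lp.exp().item():.4f}")

print("perplexity:", torch.exp(-token_lp.mean()).item())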
>>
google won.
>>
I've skimmed over a lot of the OP post. Need a tiny point in the right direction, thanks in advance.

I wish to run my LLM from my phone with the backend being a PC that's never used besides AI shit. I know one way is droidair but it's shit. What are other ways?

I kinda wanna give access to multiple family members, but if it's too much of a hassle it's just for me.
>>
>>101222008
>he fell for the outdated op post
ngmi
>>
>>101222022
Not sure what you're talking about? Gotta love the smartasses who give zero context. I'm not here all day every day for hours at a time.

You autist.
>>
>>101220800
>Hermes guy saw
It was already posted earlier, retard. >>101217383
>>
>>101221835
Why a week or two? It's already July and a Monday. Now we wait for them to start their work day.
>>
>>101222008
Run whatever backend you want on your pc. Anything that provides a web ui. On the phone you just need a web browser. You'll need to figure out authentication if it's gonna be exposed to the internet. If it's only for your LAN you don't need auth. Try llama.cpp's server (llama-server).
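For example, with llama-server running on the PC, anything on your LAN can hit its OpenAI-compatible endpoint; quick sketch, where the IP is your PC's LAN address and 8080 is llama-server's default port:

import json
import urllib.request

url = "http://192.168.1.50:8080/v1/chat/completions"   # your PC's LAN address
payload = {
    "messages": [{"role": "user", "content": "Say hi from my phone."}],
    "max_tokens": 64,
}
req = urllib.request.Request(
    url,
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["choices"][0]["message"]["content"])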
>>
>>101222008
>>101222163
And be sure you don't use old Ollama because it had an exploit.
>>
>>101221835
>(preview)
No. It doesn't always mean finished training. It could be a checkpoint. But yes, pretraining has probably finished and this might be a finetune checkpoint.
>>
An AI trained on all movies/tv shows/video known to man
An AI trained on all music known to man
>>
>>101222349
should train an AI to find anomalies on the ocean floor
>>
>>101222349
Most of everything is shit. And it's also hard to define what 'good' is. Cheap training with custom data is the only way.
>>
>>101222349
That would require a lot of alignment for safety.
>>
>She leans in closer, her voice dropping to a conspiratorial whisper.
>>
>>101222393
>After removing all questionable content from the dataset, we trained the model on the resulting 13 billion tokens. A 470M parameter model seems to be sufficient for this task.
>>
>>101216935
Nice. I think I spent 1k+ hours on that game, despite the absolutely horrendous screenplay.
>>
>>101221130
>I've never seen a K_M get it right except maybe c4ai-command-r-plus.Q4_K_M, which is borderline; it got it right, then it goofed up when it summarized
Just retested it, it failed. It indeed hallucinated a correct answer by chance.
>>
>using kobold with ST
>group chatting
>each reply resets the prompt processing making generation take forever
why
>>
File: 1698297232185128.png (4 KB, 272x63)
>>101222623
Because a big chunk of your prompt changes with each new message as ST swaps out the character cards.
>>
>>101222623
Because each card has a different character description that causes the cached prompt to be reprocessed from where the card begins onward.
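Toy illustration: the part of the cache that can be reused is just the longest shared prefix between consecutive prompts, and the card sits near the top, so nearly everything after it gets reprocessed.

import os

prompt_alice = "SYSTEM PROMPT\n[Card: Alice ...]\n[thousands of tokens of chat history ...]"
prompt_bob   = "SYSTEM PROMPT\n[Card: Bob ...]\n[thousands of tokens of chat history ...]"

shared = os.path.commonprefix([prompt_alice, prompt_bob])
print(f"reusable prefix: {len(shared)} of {len(prompt_alice)} chars; the rest is reprocessed")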
>>
>>101222648
>>101222653
Is there a way to fix it? I tried switching to 'Join character cards', but that didn't seem to fix it. I think I may have to delete the group chat and start a new one with that option enabled, but I don't want to lose my current chat progress.
>>
>>101222680
deleting the group chat wouldn't do anything, ST is just broken shit when it comes to group chats
>>
>>101222816
Damn, so group chats are just unusable in ST then... That sucks.
>>
>>101222680
One thing you could do is empty the description field of the cards and put the information in the character notes at low depth.
Or merge all the descriptions into one and copy that merged text into every character card's description.
>>
How long until Gemma 2 27B is usable?
>>
>>101222680
What I do is put all the character definitions into separate entries of a lorebook and then chat with a single scenario card that references those characters. Probably not the best workaround, but it beats having to wait forever for the context to reload every time.
>>
>>101222393
You could just hype it up but not release it
>>
>>101222623
If you have your model in VRAM it should be extremely fast, even for very large contexts. Just upgrade bro.
>>
>>101222887
Wagies go back to their shifts on monday. Give them a few days.
>>
>>101214216
>Gemma 27B is arguably the best model for vramlets, yet its GGUF support is still kind of shaky
>>
>>101223032
It's not just llamacpp/GGUFs, despite what this thread focuses on; the Transformers implementation of 27B is retarded and schizo too.

No one has local inference of 27B that's as functional as the lmsys version, and there's no definitive answer yet as to what's wrong (again, llamacpp isn't the whole picture, because Transformers doesn't work properly either).
>>
>>101222857
The likelihood of it writing for other characters and (you) will be greater if you do this
>>
File: 3462454523.png (7 KB, 944x100)
>>101222857
>>101222899
these workarounds seem kinda janky. I just wish group chatting worked better.
>>101222932
vram too low (8gb). The longer the chat goes on, the more tokens have to be reprocessed... I can't fap to these gen times...
>>
>>101223083
How did this even happen? Google released an HF Transformers version of the model. Shouldn't it be GOOGLE who develops that implementation, upstreams it to transformers, and is responsible for checking that it's identical, logit for logit, to whatever internal TensorFlow/JAX implementation they use for the API?

Like, I don't fucking get it. A bunch of people want to run your model, start fine-tuning it, etc. (including me), and the Transformers implementation is still just broken somehow for 27B.
>>
>>101223257
>these workarounds seem kinda janky.
They are.

> I just wish group chatting worked better.
The thing is, due to how the context cache works and how this character-card-based system works, the standard one-frontend / one-backend / one-cache setup can't work any other way.
One way it could work better would be if Silly leveraged llama.cpp's slot system, since you can have different caches associated with different slots (I think), so each character in a group chat would call the server API on a different slot with its own cached processed prompt.
Something like that.
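Rough sketch of that idea against llama.cpp's server started with multiple slots (e.g. llama-server -m model.gguf -np 2). The field name for pinning a slot (id_slot below) has changed between llama.cpp versions, so treat that part as an assumption and check the server README:

import json
import urllib.request

def complete(prompt: str, slot: int, host: str = "http://127.0.0.1:8080") -> str:
    payload = {"prompt": prompt, "n_predict": 128, "cache_prompt": True, "id_slot": slot}
    req = urllib.request.Request(
        f"{host}/completion",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["content"]

# Each character keeps a stable slot, so its card + history stays cached there.
print(complete("Alice's card + shared chat history...", slot=0))
print(complete("Bob's card + shared chat history...", slot=1))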
>>
>>101223032
Just asked 3.5 Sonnet and Gemma 27B for some very niche coding help regarding huggingface-cli and downloading specific subfolders. 3.5 Sonnet hallucinates a command that does not exist; Gemma 27B gives me a proper answer (use wget or just clone the repository).

Earlier Gemma also gave me a very comprehensive HTML/CSS design based on my description. I can finally trust local models for CLI and coding knowledge about as much as I trust closed cloud shit, not bad.
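For reference, one way that actually exists for grabbing just a subfolder without wget or a full clone is huggingface_hub's snapshot_download with allow_patterns (repo and pattern here are placeholders):

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="google/gemma-2-27b-it",       # example repo
    allow_patterns=["some-subfolder/*"],   # only fetch files matching this pattern
    local_dir="gemma-subfolder",
)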
>>
>>101223265
Yeah it's fucking weird. Transformers doesn't even have sampling working properly for it yet. It's like Google doesn't want people to be able to run it properly.
>>
they are always going to intentionally cap how good the local models they release are.
we're in hell
>>
File: MiquTraining.png (1.41 MB, 1216x832)
>>101223360
Don't worry. She's just training so she doesn't disappoint you.
Once humans know something is possible, it becomes nigh-on inevitable.
>>
>>101223447
I trust this Miku
>>
>>101220810
There have been rumors of it not coming out (according to a credible leaker who was right about ClosedAI stuff). When asked about it, the employees played dumb. It's been months and there's still no word, so there's your hint.
>>
>>101223447
Just like how warp drives now theoretically work within our standard model of physics. They call it a "Constant-Velocity Subluminal Warp Drive". Right now it's just on paper, but imagine once that tech becomes a reality. No longer will it take months to get to Mars but rather minutes!*
*Might be more than minutes; I actually have no idea if they've addressed how they'll slow the spacecraft down once it's going that fast.
>>
>>101223480
Reverse the polarity.
>>
>>101223480
How will this tech improve LLMs though?
>>
>>101223583
LLMs will figure out how we warp drive.
>>
>>101223583
They are already relying on AI to make Fusion Power function. AI will probably be the thing that makes Warp drives function as well.
>>
>>101223592
>>101223726
it's funny how being able to contextualize information was the thing preventing chatbots from seeming smart, and now contextualizing large amounts of info to piece together something new might turn into AI's superpower
>>
>>101217559
Sorry, I'm retarded. Can someone please explain in more detail how to add the proper BOS token to Magnum's config.json? I wanna get it working properly. What changes do I make?
>>
>>101223764
The trick is that a truly human-like AGI would be less useful to humanity than a machine-like one. If it becomes too good at being human, it will end up emulating the same limitations that cause us to have so many football fans and so few Albert Einsteins, Nikola Teslas, and Richard Feynmans.
>>
https://x.com/tsarnick/status/1807517000664850671
kek, at least some players take strides and are aware of the slop issue
>>
>>101224025
More like KINOhere, saviours of the hobby.
>>
>>101224025
the canadians fucking won
>>
>>101224025
What does North America have to house so many talents!?
>>
>>101224025
We're going to be so back when they release CR++!
>>
>>101224025
The zucc got mogged
CR++ will shit on sloppa3 and save /lmg/
>>
>>101224025
what are they gonna do about it? release a 405b model?
>>
File: ComfyUI_00167.jpg (830 KB, 2048x2048)
>>101223820
Add the line
    "bos_token_id": 151644,

to config.json, directly below the eos_token_id entry
>>
>>101223820
Anyone? Or just share a properly corrected config.json and I can compare them?
>>
>>101224123
Thanks, appreciate it.
>>
>>101224025
You can immediately tell which corpos actually use their own models instead of doing investor scams with cheated benchmarks.
>>
command-r-plus-IQ2_XXS or normal quant of L3-8B-Stheno-v3.2?
>>
>>101224229
Buy an ad.
>>
When the FUCK can I replace myself with a perfect AI so I can finally kill myself
>>
>>101224233
suck my fucking penis, thanks.
>>
>>101224229
Why not something in between like the Qwen2 MoE or the regular CommandR?
Between your options, Stheno, for sure.
>>
If these are AI why can't they just use a calculator
>>
>>101224280
Some do.
Was it chatGPT that had Wolfram Alpha integration?
>>
>>101224258
>Qwen2 MoE
Is that a thing, or did you mean Qwen1.5 MoE?
>>
>>101224292
https://huggingface.co/Qwen/Qwen2-57B-A14B-Instruct
>>
>>101224302
>The Qwen2-57B models seem to be broken. I have tried my best, but they likely need to be fixed upstream first. You have been warned.
Are they even usable?
>>
Gemma 2 9B follows the prompt really well and feels uncensored. Much more so than Llama 3, actually.
But it's insane how much GPT slop this shit was trained on.
I hope it's only the instruct version and not the base.
Twinkling, mischievous, spine-shivering, all within 2 sentences. It's crazy.
>>
>>101224321
>>101224321
>>101224321
>>
>>101224330
At least with llama.cpp with flash attention on, yes.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.