/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101243128 & >>101234947

►News
>(07/02) Japanese LLaMA-Based Model Pre-Trained on 2T Tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101243128

--Paper: Meta 3D Gen: >>101243744
--The Mystery of X Quants and NEO Tech in AI Models: >>101247045 >>101247119 >>101247297
--Teaching LLMs to be Creative: Beyond Predictive Convergence: >>101244179 >>101244387 >>101244521 >>101244793 >>101244552 >>101244766 >>101244553 >>101244622 >>101244745 >>101245063
--Quantization Model Comparison: Q8_0 vs Q8_0_L Performance Statistics: >>101243361 >>101243421 >>101243499
--Open-Source 16 Channel VAE ostris/vae-kl-f8-d16 Released: >>101249196
--Manual KL Divergence Test Reveals Small but Noticeable Difference Between Q8_0 and Q_8_L: >>101245221 >>101245265 >>101245742
--LLMs, Propaganda, and the Exploitation of Human Psychology: >>101243617 >>101243822 >>101248551
--LLMs are not a Brain in a Box, Use them as a Tool: >>101247017
--KLD Test Results for Q2_K and Q2_K_L vs FP32: >>101245970 >>101245981
--Hugging Face's Gemma Sliding Window Attention Pattern is Broken: >>101245529
--Gemma 2's Attention Mechanism is Intentionally Crippled: >>101243390 >>101243716
--CyberAgent's New LLaMA-Based Chat Model, Calm3-22b-Chat: >>101243666 >>101243743 >>101243951 >>101244135
--Anon's Model Testing: iMat Quants and Quantization Effects: >>101246063 >>101246190 >>101246260 >>101246374 >>101246955
--The Myth of Fewer Instructions for AI Models: >>101246046 >>101246077 >>101246097 >>101246117 >>101246224
--Protecting LLM Apps from Prompt Injection Attacks: >>101246827 >>101247911 >>101246931
--Oogabooga Safety Concerns and Isolated Web Services: >>101243368 >>101244563
--CXL IP Allows GPUs to Expand VRAM Capacity with PCIe-Attached Memory or SSDs: >>101245115 >>101245204 >>101245292
--Story Categorization and Summarization on Linux with 4070 ti super: >>101243253 >>101243363 >>101243405 >>101243682 >>101244034 >>101244105 >>101243471 >>101243479 >>101243735
--Miku (free space): >>101245674 >>101245912 >>101248589

►Recent Highlight Posts from the Previous Thread: >>101243138
>>
>>101250468
Testing power tools with Miku
>>
>>101250343
just something that compresses an image or part of it to some intermediate representation, latents, on which the actual image generation/perception is done.

>>101250423
doesn't mean a from-scratch trained VAE will be in any way compatible with SD's. There are other VAEs trained by other people, and usually it's not that hard to train one, but people don't do it as much.
>>
File: 1692379851934720.jpg (647 KB, 1856x2464)
>>101250468
>>
>>101250468
>(07/02) Japanese LLaMA-Based Model Pre-Trained on 2T Tokens: https://hf.co/cyberagent/calm3-22b-chat
This is not news worthy.
>>
>>101250544
Maybe it replaces shivers with hora and baka.
>>
This guy is getting on people's nerves it seems.
>>
>>101250544
People have been talking about the new Gemma models despite them being too tiny to be worth using. Might as well include other new tiny base model releases even if nobody but poorfags will bother.
>>
Did a LLM ever give you feels?
>>
>>101245115
On this PCIe RAM storage solution, I've been wondering for a few years now whether it'd be worth implementing. It didn't seem like it'd be considerably better than your typical cpumaxx setup, only slightly, but if you do this, you could probably make it work with something like geohot's 4090 driver hack to do p2p between GPUs, except now doing p2p with that device instead: https://github.com/tinygrad/open-gpu-kernel-modules
Maybe a better solution would be some sort of pluggable RAM sticks for a GPU. You could imagine smaller sticks than typical ones, some 12-16 slots, and you could populate the sticks appropriately (selecting the right one via chip select), similar to the motherboard RAM slots, but with smaller sticks so you can have higher bandwidth. You could probably make it work with some 384/512-bit bus, but if you're starting to reach 1TB of RAM it doesn't look much better than a cpumaxx setup, maybe just 2-4x at best. I'd still like to see something like it.
>>
>>101250599
22B is hardly tiny, but it is more realistic for average people to run than the hundred-B models that get more attention.
>>
>>101250614
Yep. Right before it started forgetting context.
>>
>>101250635
lmao this
>>
>>101250468
>Japanese trained model
Does that mean it's BASED?
>>
>>101250618
A video card with SODIMM slots would be pretty cool.
The latency and speed characteristics are completely different from GDDR, but it would probably make for a killer AI card if the GPU itself had a wide enough bus to connect to the sticks.
>>
File: Capture.png (1.21 MB, 952x745)
New cpumaxxfag server just dropped.
>>
How does BigTech™ tune their LLM hyperparameters if each training iterations costs millions of dollars of compute?
>>
>>101250468
Cute and lovely Miku!
>>
>>101250835
scaling laws
they do hparam search on small models and they already trained their model so many times they have the correct power law scaling built in, you just drag a slider up when you're ready
>>
>>101250835
Fiat currency. It has no actual value.
>>
File: Capture.png (156 KB, 642x543)
>>
File: 1700280357992518.png (493 KB, 762x705)
>>101250835
By giving billions to nvidia to get their very own stack of 150K H100 GPUs and building datacenters across the country to support them because the power grid of most places can't handle more than a couple dozen thousand H100s at once.
>>
>>101250877
kek
>>
>>101250871
Intredasting. Also, have there been any efforts to apply neural architecture search to LLMs? That used to be all the rage for image models.
>>
>>101250661
they had been trying to do that for a long time, since most models only had English-focused datasets
>>
magnum just spams me with random chink runes 24/7 and russian shit.
>>
>>101251121
Works on my machine
>>
>>101251121
Magnum is a Qwen, it seems to have a Chinese language panic mode if you corner it about Tiananmen Square. Also happens if you use settings it doesn't like.
>>
>>101251121
Let me guess, you're retarded and have smoothing set at ~0.3
>>
>>101251120
Bless those fucks
>>
Using Gemma 27B and honestly, it feels enough for me in terms of quality. It's not perfect, but I don't need better, and it's fast relative to CR+, and maybe it could get faster over time. The only problem is the context length. And I fear it probably won't get much better even with SWA correctly implemented.
>>
>>101251168
.93 temp
.05 minp
1.15 pen
128 pen range
0 penalties, 0 smoothing.
makes no sense. no matter if i use base qwen72b or magnum72b, they both just spam me with the same shit. 13 different random languages, then spell the alphabet at me endlessly. i'm using chatml. maybe it's a broken gguf or maybe i'm just cursed. never had issues with cr+ or miqu - no matter my settings.
>>
>>101251235
Try it without CuBLAS.
>>
File: 1711832655596039.png (7 KB, 605x89)
>>101251235
Neutralized smoothing is 1. 0 turns the model schizo.
>>
>>101251260
weird, works without cublas. thanks. >>101251274
yeah that's a flat-out lie but ok.
>>
>>101251432
If you had MMQ on you might try CuBLAS but no MMQ.

I just remember having to fiddle with those settings to get it to function.

Anymore I run everything CuBLAS no MMQ, which is annoying because Kobold doesn't save the preference. Have to turn Flash on every time too. But I've already complained about Kobold today in a previous edition of the thread I think.
>>
>>101251453
Not that anon but why not launch it from a .bat file with the settings you want?
>>
>>101251468
Because I'm retarded so I just open terminal and type koboldcpp.
>>
>>101251453
kcpp has a save/load settings, double click the saved settings file and select kcpp to load it then it'll load your model/settings by double clicking that settings file
>>
Hello. Are 30Bs still retarded by design?
>>
>>101251492
Then I have another file in my home to sift through and I'm already doing that for model selection. For that many clicks I may as well just toggle the switches.
>>
>>101251519
>that many clicks
? you set it up once
>>
>>101251538
Load, pop up, scroll around, click the file, click open. Then if I want to change model, all that again. And then the layers need to be fixed, and some models have different context. So then it's a pile of settings saves, one for each model.

Maybe if I find the One Perfect Model, but right now I'm faster not having to futz with even more config files fucking around.
>>
nani the fuck is wrong with my llamacpp build and why is koboldcpp so much faster
>>
>>101251516
Um, they're called 34Bs now
>>
>>101251588
>the fuck is wrong with my llamacpp
perhaps you didn't compile it with CUDA support?
>>
>>101251595
Okay, are they still inherently worse than everything else including smaller models because of mishaps with the training
>>
>>101251587
you sound autistic
>>
>>101251588
Because Kobolds are naturally servile and enjoy helping their masters as best they can.
>>
>>101251605
I did with -DGGML_CUDA=ON
>>
>>101251611
>looks left
>4chan
>looks right
>/g/
You might be on to something.
>>
>using Gemma
>getting into the flow
>the generation suddenly stops
>oh what happened... oh
>>
>>101251588
ok I figured out the difference
kobold had n_ctx set to 2k where I had it unspecified, so it defaulted to 8k
>>
I added Gemma2 training support to qlora-pipe. Because you can't use flash attention (the script forces it to eager attention for you), VRAM usage is kinda high at larger context. For LoRA, can only do about 1k context on 24GB VRAM. 2x24GB easily trains 64 rank LoRA at 4k context though. QLoRA would get you higher context on one card, don't know by how much. Be wary of trying to train on >4k context, I dunno if the Transformers sliding window works correctly. On day 1 of the model release inference was working for me at <=4k but not >4k, it would crash. I'll try long context training eventually.

From a quick test, the 9b takes to fine tuning extremely well. IMO once all the issues with backends and implementations are settled, people will realize how good this model is. Beats llama 3 8b by a long shot.
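Not qlora-pipe's actual code, but for anyone wondering what "forces it to eager attention" looks like in practice, here's a minimal transformers + peft sketch. The model id, rank and target modules are just placeholder choices:

# minimal sketch: load Gemma 2 with eager attention and attach a LoRA via peft
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

model_id = "google/gemma-2-9b-it"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # flash attention doesn't handle Gemma 2's logit soft-capping, hence the higher VRAM use
)
lora_cfg = LoraConfig(r=64, lora_alpha=64, target_modules="all-linear", task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()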
>>
Guys please help me. If I wanted to run something like a personal assistant that can read the documents I have like chatgpt, can I do it locally? I have a spare laptop that I'm not using right now and having shit like that sounds great for me.
>>
>>101251790
Based, ty qlora pipe anon. Will you grace us with a storywriter tune for Gemma next?
>>
>>101251516
every model is crippled and retarded by design
they don't want open source users to have true power
>>
>>101251902
I'll try to train something, probably, but no guarantees. After all I tried many many times with llama 3 8b and the result was always bad, so I never released anything. Hopefully gemma will be different, the model does feel significantly less "overcooked" and just more flexible than llama 3 does.
>>
>>101252012
Now we just need a daybreak lora and a tenyxchat fine-tune
>>
>>101251790
>Beats llama 3 8b by a long shot.
by what metrics
>>
>>101251790
Nice! Have you considered adding support for any of the DPO modes, btw, e.g. SPPO?
>>
>>101250835
They have private power plants.
>>
>>101251790
>vramlets back to <8k context
i'm gonna end it
>>
God, gemma 27B is the first model im actually wanting to use over claude now. It finally happened. If only the context was bigger.
>>
>>101252517
and I don't just mean its intelligence. It writes like claude without the claude-isms and is the first non-claude model that knows my fandom well. It was clearly trained on fanfiction unlike llama and such.
>>
>>101252517
>>101252540
buy
>>
>>101252620
k
>>
is there any way to give gemma2 vision ability in koboldcpp?
i know that with llama3 there were those mmproj files.
>>
I tested Gemma and it's smart but not that smart. It does write pretty well though. But it also does have some slop still. I tested in both mistral.rs and Llama.cpp. I would say it's great for its size and you're probably not getting anything better <70B. As long as you're fine with 8k or potentially even just 4k context.
>>
Is Gemma working in llama.cpp/koboldcpp yet?
>>
For gemma 27B use L3's instruct system prompt:

You are an expert actor that can fully immerse yourself into any role given. You do not break character for any reason, even if someone tries addressing you as an AI or language model. Currently your role is {{char}}, which is described in detail below. As {{char}}, continue the exchange with {{user}}.

And make sure to use <start_of_turn> and <end_of_turn> for each side's start / end of responses
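If you're gluing it together by hand instead of through a frontend, this is roughly what the first exchange ends up looking like. Just a sketch; the greeting text is made up, and since Gemma has no system role the actor prompt simply rides at the top of the first user turn:

# rough sketch of the assembled first turn
system_prompt = "You are an expert actor that can fully immerse yourself into any role given. ..."
first_user_message = "Hi there."

prompt = (
    "<bos><start_of_turn>user\n"
    + system_prompt + "\n\n" + first_user_message + "<end_of_turn>\n"
    + "<start_of_turn>model\n"
)
print(prompt)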

>>101252688
Only with 8k context but im using it with kobold atm.

With that it is the best local model I have ever used and I used wizard as my daily.
>>
>>101252699
>Only with 8k context but im using it with kobold atm
Is Kobold 1.69 (nice) Gemma-ready? I see that on AUR.
>>
>>101252771
Yes, same faked sliding window though. Hopefully someone could figure that out and we can have long context.
>>
>>101252699
Thanks, I will try Gemma then.
>>
>>101252699
WizardLM2 8x22B? If so, I would not agree with that based on my testing so far.
>>
>>101252801
What did you use? I tried llamacpp and it was meh. Then I tried the latest kobold with the llama instruct system prompt as recommended by someone else + its start / end of turn prefix / suffix and it suddenly is 100x better.
>>
>>101252785
>faked sliding window though
I saw comments about that earlier but I didn't really get it. Something like it's implemented but it's broken so it screws stuff up instead of makes stuff better. Is that the situation?
>>
>>101252250
The dev branch (which is now very out of date) has DPO implemented. Didn't get to test it all that much, but it appeared to work. I need to get around to rebasing it on top of main at some point and cleaning it up.
>>
>>101252823
No one knows how to properly implement it apparently.
>>
>>101252810
I tested it in mistral.rs and llama.cpp on the latest commit with self-made quants. On Mistral.rs I just used it in interactive chat mode, and threw some test questions at it. With llama.cpp I went into ST and made a copy of the Llama 3 presets where I then replaced the strings with Gemmas. It was smart, but definitely still less smart than 8x22B, or 70B, for that matter.
>>
>>101252853
What do you mean by smart though. I didn't try coding with it for instance. Just creative writing / RP in which case not only is it smart but it blows away anything else I ever tried in the prose / fandom knowledge department. It writes sort of like claude but without its isms which I love.
>>
>>101250702
Does that have 48 channels?
I thought EPYC only did 12 channels per socket
>>
Also I found gemma to be slightly like mixtral in that it needs just a bit of instruction on what it's supposed to do. I started with llama 3's "you are an actor" system prompt, which suddenly made gemma amazing, but I'm working on a better "you are a writer" one.
>>
>>101252866
I tried doing an RPG card with it and it just didn't understand the situation where big models didn't have any problems. It also couldn't do some of my very proprietary and private donut steel test questions. I'm happy for you if you primarily need good prose and trivia knowledge though.
>>
>>101252897

>>101252902
Try again with some sort of system prompt for the purpose. I think that may be a big deal with gemma here due to it not apparently being trained for a "system role" according to google. It probably needs to be told what role to play.
>>
>>101252875
I would assume it's just 2 dimms per channel
>>
>>101252875
I'm pretty sure it would be 2 DIMMs per channel, so you wouldn't get a speed boost compared to the existing 24 DIMM servers. This would be for (very slowly) running the larger models.
>>
>>101252919
I already said I used the Llama 3 presets. And yes I know it doesn't have a system role, I just put the user prefix instead there. Still, it couldn't answer some questions even when I tested it on mistral.rs and lmsys which I used for verification. It's a model issue.
>>
>>101252947
Ah, I was talking about the rpg thing. Also google said they purposely did not train it on common benchmark answers which may be the issue there.
>>
>>101251516
No, Gemma 2 is better than the 70Bs.
>>
>>101252957
My test questions involve simple world model scenarios where a human would understand it immediately but many LLMs would fail, and I'm pretty confident no benchmark has anything like what I thought up of, so that would not matter. In any case, if I could tell the difference using those questions, then I'm quite certain that the worse performance compared to >70B is, in fact, normal.
>>
>>101253002
fair enough. For creative writing though its my fav so far at least.
>>
>>101252823
You're better ignoring all comments about it because it was mostly petra samefagging.
>>
>>101253057
I don't know what that means.
But I do remember controversy about Gemma 2 27B having fucked up data. Is there a problem with its release or its quants or whatever, or was the model being blamed for the lack of support in the software trying to run it?
>>
>>101253070
Jesus Christ. You're fucking retarded.
>>
>>101253070
It was fixed for I think the 5th time today; it had to be requanted 4 times?
>>
>>101253013
Yeah I could at least see that. I do like its writing style more than other models, for now. But as with all models, the honeymoon period will end at some point, until we get models that can evolve over time.
>>
When was the last time you read a real, full novel? How many novels' worth of gens have you read through by now? Has it already exceeded the number of novels you've read in your life?
>>
>>101253080
Yay! The guy who calls me retarded is still here! It'd been so long I thought maybe you found a new hobby. And you added blasphemy, just in time for 'Murica Day.

>>101253082
I lost count with how much "it's fucked/no your software's fucked/both are fucked/nothing's fucked just use *this*" that was flying around.

>>101253113
Writing novels is more fun than reading them.
>>
>>101253113
I read a ton but these unlimited cyoa's have had me addicted, so it's been a while. I do read some ongoing stories on royalroad though.
>>
But as soon as we get some real unlimited context going it would take an incredible book to get me back to normal books. I can sort of jury-rig together a 200k-word-long D&D campaign / cyoa / AI-assisted novel, but having to use tricks like lorebooks only works so well.
>>
File: shivermaxing.jpg (36 KB, 931x323)
>>101253129
>royalroad
Checked it out with the vague hope for a great dataset but nope, 32 pages worth of shivers.
>>
>>101253247
Like every other website obviously 99% is gonna be trash. Some truly great stories on it though.
>>
Gemmapill me on Gemma.

I heard it's better than Sonnet 3.0?
>>
>>101253247
Here's a good one for a start

https://www.royalroad.com/fiction/48969/jackal-among-snakes/
>>
How does Gemma performs with smut and general RP? Best experience I've ever had was Fimbulvetr back in March.
>>
>>101253247
>>101253268
Damn, just noticed he did the old go commercial and put it on amazon thing. All of it is still available here though.

https://novellive.net/book/jackal-among-snakes/chapter-1-valhalla-bound-at-the-speed-of-sound
>>
>>101253277
Best local atm imo. Scroll up, it needs something like the system prompt I put above + its prefix / suffix.
>>
me: 27b model can do everything I want
lmg: faggot! You need 400b to solve the apple and sister puzzle correctly
>>
>>101253313
Well, they are right, cuz the 400B model is way better for roleplay.
>>
>>101253113
Last week. It wasn't a good one, but it was definitely written by a human.
With some franchises, it's not so much about reading good literature, but rather about keeping up with official world canon.
>>
>>101250599
>NOOO YOU CANT ENJOY THIS UNLESS YOU HAVE 400GB OF VRAM
>>
>>101253321
for nsfw roleplaying games i go to a roleplay forum.
What does it matter if a machine or a real fag writes to me?
You don't have to solve every problem with AI
>>
>>101253324
>But... B-But your model can't solve the puzzle...
>>
>>101253345
>roleplaying with real people
You should go back.
>>
>>101253268
>>101253289
Did not have high hopes going into it but I'm on the third chapter and there's no slop.
>no shiver
>no barely above a whisper
If only this was the standard for the rest of online fiction, we'd be in a whole different scenario today.
>>
>>101253345
You need to go back.
>>
>>101253351
If you like it, your counterpart has a meat whip for more. I imagine an A100 to be painful even with a lot of lubricant
>>
>>101253301
Is it actually not broken, and does it perform well? Last time I checked was the Llama 3 release, which had broken GGUFs for quite a while and, even with fixed weights, it was either starting to repeat itself instantly or abruptly ending the chat altogether.
>>101253345
>roleplay forum.
I don't want some balding greasy ugly bastard to roleplay as Beatrice from Re:Zero that needs CORRECTION. I can only handle one ugly bastard and that's the guy in the mirror I see everyday.
>>
>>101253375
I'm using kobold atm. I tried llama.cpp yesterday and it was apparently still broken, because the difference was night and day.
>>
how long do cpumaxxfags spend on prompt processing for long contexts?
>>
File: the_requantizer.png (166 KB, 1024x1024)
>>101253082
Oh yeah it's been great
>>
>>101252699
I made Silly Tavern jsons. These look OK?
https://files.catbox.moe/29oyhh.json
https://files.catbox.moe/3ib649.json
>>
>>101253470 (me)
I missed a <|eot_id|> in the instruct one. Fixed:
https://files.catbox.moe/f3j30m.json
>>
>>101253277
>Best experience I've ever had was Fimbulvetr back in March.
...
That's fucking sad.
>>
File: Gemma.png (151 KB, 1286x1230)
>>101253470
Just doing this atm. Simple as can be. Still working on it.
>>
>>101253491 (me)
And in the context one too. Sorry, am retarded.

https://files.catbox.moe/u0acve.json
>>
>>101253492
Can't do much with 1070 and 32gb of RAM. Had my fun regardless.
>>
>>101253470
>>101253499

Though adding the user / model part did not seem to hurt anything. Will need to see if it helped any.
>>
Another note: adding something like "characters play an active role" helps. Gemma honestly feels more like a completion model than an instruct tune to me.
>>
>>101253512
Just make sure you use the fixed ones. I left eot_id in context and instruct in the first links.
>>
Already slightly improved for my purpose

You are a highly rated author writing a story based in the --- universe. Portray characters faithfully and realistically.

Also, something might still be fucking broken, because I just got a response that turned into a garbled mess halfway in, with random spaces / new lines.
>>
>>101253470
Mind posting what sampler settings you're using also?
>>
so let me get this straight
iq quants are slower than q*_k_* quants when offloaded to ram but better in terms of output (in same bpw class of course)
>>
File: spring.png (19 KB, 682x267)
I really wish I could pin the "start reply with" box to one side in SillyTavern. I use it constantly.
>>
>>101253610
Yes. Inferencing on CPU with IQs is going to be slower but in theory you get more quality versus a Q2 or Q3 K quant.
>>
>>101253680
So if i'm forced to offload, just use the biggest Q quant able to fit in vram+ram?
>>
>>101253698
iirc the whole model is gonna be loaded into your RAM regardless of how many layers you've offloaded to GPU
>>
File: 1695511767256918.jpg (72 KB, 1079x998)
>petra_00002 says The Coffin of Andy & LeyLey is trash
>play it
>it's good
>>
>>101253728
Only on Windows.
>>
>>101253629
Stationary objects can't pass through portals, btw.
>>
>>101253810
huh? is this real? that's stupid.
>>
File: BOS.png (40 KB, 653x646)
Figured out the issue I was having with gemma, it needs the <bos> token for the first message. It was significantly dumber without it.
>>
>>101253863
yeah that's the correct response to most things in windows
>>
>>101253842
Stationary relative to what?
>>
>>101253936
Your mom.
>>
L3 70b Euryale replies with "Anon" sometimes, which is clearly slop taken from roleplays. In ST I try banning "anon" under the sampler settings but it still does it sometimes. Any way to prevent this?
>>
File: file.png (384 KB, 2122x1074)
>>101253884
what backend?
>>
What's the current best model for 8GB VRAM and 16GB RAM?
>>
>>101254036
kobold
>>
>>101254038
Would like to know that too, but probably stheno-3.2, maybe gemma will be a contestant soon?
>>101254054
Rude cunt.
>>
>>101254038
https://huggingface.co/bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF
>>
"The night was still young"

Yep, there's claude in Gemma's veins.
>>
>>101253942
Fixed for real this time I think

Context: https://files.catbox.moe/67etsj.json
Instruct: https://files.catbox.moe/sitxtr.json
>>
>>101254131
Can you please present the sampler settings and AI response formatting tab (where context and instruct are). Gemma doesn't want to follow the formatting for some reason.
>>
>>101253936
Map coordinates.
>>
>>101254038
>>101254065
I'm on 24GB VRAM, but I still use Stheno because I just like its outputs. Whatever slop Sao fed it, it's slop of slightly better quality than the usual.
>>
File: Gemma Sampler.png (224 KB, 1909x1321)
>>101254164
>>
Huge if true https://panmnesia.com/news_en/cxl-gpu-image/
>>
File: Gemma27B2.png (313 KB, 1913x1275)
>>
>>101254227
too long and complicated for a retard like me, can you make a little tl:dr anon?
>>
>>101254235
And before anyone says its responding for multiple characters at once that is due to me telling it to write instead of rp. I prefer it.
>>
>>101254205
those are some weird sampler settings
>>
>>101254242
nothingburger. no way in hell nvidia is going to go for it when they are raking in cash as they are and if they dont support it then its dead in the water for this field.
>>
>>101254242
Provided that AMD, Intel, and Nvidia would like to support CXL for memory expansions.
its literally nothing
>>
>>101254253
>>101254257
i fucking hate corps
>>
>>101254235
this output is literally every model ever, from ancient miqu to l3 8b. It's worthless.
>>
>>101254251
I zeroed everything out to test it as it is, those are basically no sampler settings, a good model does not need special settings imo, they just make a good model slightly better.
>>
>>101254263
Not a single other local model has had gemma's fandom knowledge. Not just that fandom either. And its prose is a step above everything else ive used including wizard. and commandr+ is retarded compared to it.
>>
>>101254263
>l3 8b
would trip on random sentence and start repeating itself til the end of context
>>
>>101254280
"whether companies like AMD and Nvidia will add CXL support to their GPUs remains to be seen"
>>
>>101254196
3.2?
>>
>>101254277
you can replace Tempest and Twilight with Mario and Luigi there and nothing will change
>>
https://github.com/microsoft/MInference/tree/main
what's the verdict
>>
>>101254313
You clearly didn't read it. Also I've been swiping away and even cranked up the temp; its fandom knowledge is deep. Tons of stuff the card had nothing at all about in it. Same with some other fandoms I enjoy. It's the first model since claude worth using that isn't claude imo, due to that and its combination of prose and smarts.
>>
>>101254205
For some reason my gemma is being retarded and won't retain the chat formatting presented in context.
A shame because the stuff it comes up with is interesting, but ruined formatting breaks my boner.
>>
its time for fine-tunes and merges with GEMMA-2, go ahead people!
>>
>>101254328
Maybe you are using one of the ten thousand broken quants. I think it was requanted like 10 times in the past few days. That or not the latest build of whatever backend.
>>
What is the absolute smallest multimodal LLM that I can try on my own hardware?
>>
>>101254341
All the latest stuffs i think, latest koboldcpp build from Nexesenex
And bartowski quants from yesterday...
>>
File: 1719466777182408.jpg (109 KB, 720x713)
What do I need to run Gemma 9B? Does koboldccp run it? Anything else?
>>
>>101254366
Are you trying to go above 8k context? its sliding window attention is not working in any backend yet so it will fuck up if you go above that most likely.

And the pre/suffixes should be

<start_of_turn>user

<start_of_turn>model

<end_of_turn> for both

<bos><start_of_turn>model for the first assistant response, or perhaps the <bos> should be on your first message if you start the chat
>>
>>101254326
it's retarded
it doesn't follow instructions

if all you need is glints and mischiefs prose slop, it's ok, but so is literally every other model.

>>101254341
>>101254366
i quant myself from safetensors with self-compiled llama.cpp; it's not the quants, it's the 27B of parameters being over 2.5x smaller than 70B
>>
>>101254389
>it's retarded
>it doesn't follow instructions
example?

>>101254387
Oh, and make sure not to use flash attention, its not compatible with it.
>>
Yea, if you start the chat with a user message make sure to include the <bos> token before it

as shown on their page

<bos><start_of_turn>user
Write a hello world program<end_of_turn>
<start_of_turn>model
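If you're scripting it yourself, the HF tokenizer's chat template should reproduce that exact layout, <bos> included. A quick sanity-check sketch, assuming the gemma-2-27b-it tokenizer:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("google/gemma-2-27b-it")
msgs = [{"role": "user", "content": "Write a hello world program"}]
# should print the same three lines as above: <bos><start_of_turn>user ... <start_of_turn>model
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))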
>>
>>101254404
I give it a set of 7 onomatopoeia sounds with explanations for each sound. Qwen2 uses them correctly. L3 uses them correctly. Gemma just picks random shit that doesn't match the actions.
>>
>>101254430
could you give me the prompt? wanna make sure its not just your settings.
>>
>>101254423
Wouldnt the settings in >>101254131 be wrong then? Is there even a setting in silly for that?
>>
>>101254433
my prompt is identical between all models, except for user/assistant/system tokens, and it's correct, no need to resort to "you are holding it wrong"

smart models can work with new concepts that you explain to them. Dumb models cannot, and can only regurgitate their training data with a minimal prompt
>>
>>101254456
I don't think there is; I think it expects it to start with the card's intro first.
>>
>>101254460
Except the formatting does matter massively, and having the wrong formatting makes most models that were fully / mostly trained on one format fail if it's not followed.
>>
Gemma fix status?
>>
Update on Wizard 8x22 tune on limarp:
I overbaked it; I loaded the 2-epoch adapter instead of the 3-epoch one and it is much less retarded.
>>
File: Ummf.png (87 KB, 1279x847)
>>101254430
Something like this?
>>
File: Give me a test.png (36 KB, 1278x396)
>>101254511
Someone give me a real test
>>
>>101254293
Indeed. Haven't tried 3.3, but all that's new is the context scaling, and 8B models don't like that.
>>
File: Gemma27BTest2.png (413 KB, 1878x2976)
Checks out to me.
>>
>>101254585
it works?
>>
>>101254470
why are you so insistent on riding pichai's dick?

it fails on the first message with

<user prefix>
2k tokens preamble
user: do thing
<user suffix>
<assistant prefix>
char:

>>101254511
no, this is primitive as fuck. The concept is: here are the sounds that accompany various actions: <list of sounds>. Use them like <formatting, rules of usage, frequency, etc>. Now perform an <activity that has these actions>. Narrate it from first person perspective.
>>
When is the problem of time based random number generation in LLM's going to be solved?
It's beyond frustrating to watch how different the output quality is depending on the day.
Some days it gives much better outputs, while other days the outputs are unbearable.
Same model, same settings, same everything, yet such drastic difference from day to day...
>>
>>101254386
ollama can run it right now
some Gemma 2 fixes are already in llamacpp, not sure if koboldcpp has merged them all already
To run 9B at all, any good-ish recent cpu with ~16GB of RAM is already enough to run it at acceptable speeds desu
>>
File: Gemma27BTest3.png (386 KB, 1250x4954)
>>101254594
Yep, even expanded upon it flawlessly. The magic attack just resets the enemy's position though. This is with samplers zeroed out btw. Actually did better than wizard

>>101254637
wtf is picachu? did he make gemma? And if you're really using that then it is incorrectly formatted.
>>
>>101254647
>time based random number generation in LLM
No such thing. It's all in your head.
>>
How is Gemma prose compared to 1) llama-3 2) Command-R?
>>
>>101254656
Impressive, didn't expect google to actually give us the best local model. Life is really unpredictable at times, and that's why we love it kek
>>
>>101254661
Fuck off developer, it's there. Anyone who roleplays with their model daily knows it's there.
It's a problem just like everything else you are in denial about.
>>
>>101254673
Dude... on launch you get a seed and the next number is deterministic. It doesn't matter if you get it in the morning or the afternoon.
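To put it in toy Python terms (nothing backend-specific), the clock only ever enters through the initial seed; after that the stream is fixed:

import random

def sample_run(seed, n=5):
    rng = random.Random(seed)                        # seeded once "on launch"
    return [rng.randrange(32000) for _ in range(n)]  # stand-in for sampled token ids

print(sample_run(1234))
print(sample_run(1234))  # identical list: same seed, same stream, morning or afternoon
print(sample_run(5678))  # a different launch just means a different seed, not different "quality"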
>>
>>101254647
>>101254673
what are you talking about? maybe on some days you got unlucky with bad seeds, but if you want to prevent that, make the outputs more deterministic by decreasing the temperature
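And for anyone who hasn't internalized why lowering the temperature tightens things up, a toy sketch of what it does to the token distribution (made-up logits, no real model involved):

import math

def softmax_with_temp(logits, temp):
    scaled = [x / temp for x in logits]
    z = sum(math.exp(x) for x in scaled)
    return [math.exp(x) / z for x in scaled]

logits = [2.0, 1.0, 0.5]                 # pretend scores for three candidate tokens
print(softmax_with_temp(logits, 1.0))    # fairly spread out
print(softmax_with_temp(logits, 0.5))    # top token dominates -> less run-to-run variance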
>>
>>101254669
me neither, their stuff has sucked up till now.
>>
Is it fixed yet?
>>
>>101254707
That just proves my point. The seed is time based, and the seed is part of the random generation, and you can get super bad seeds.
>>
>>101254719
It is at least working on kobold as long as you stay within 8k context. maybe one day someone will figure out the sliding window attention thing out.
>>
Seems like Gemma2 21b is really sensitive to small stuff like the bos token.
The settings people posted here greatly improved the output.
Its good. At least for a vramlet like me. Don't know about 70b but its better than anything sub 30b until now for sure. Its not even close.
The model is more "here" if that makes sense. Like it gets the situation more and whats wanted of it even if not directly prompted.

Kinda weird that google output a model less cuked than llama.
Hopefully that serves as a kind of competition. If llama4 is totally cucked again it's just embarrassing.
>>
>>101254722
>The seed is time based, and the seed is part of the random generation, and you can get super bad seeds.
yeah, you can get bad seeds, that's a fact, but if your model is not consistent enough in quality, decrease the temperature, that's my point
>>
>>101254733
>Kinda weird that google output a model less cuked than llama.

This is the part that surprised me the most. If you lead with any sort of prompt / story it is completely uncensored. Never expected that from google of all places. And it for sure contains at least some fanfiction / archiveofourown / other fandom sites + perhaps smut sites in it, like claude did.
>>
>>101254722
>The seed is time based, and the seed is part of the random generation
you won't get the same seed at 12am whether it's a monday or a tuesday, the seed system is complex enough to prevent that
>>
>>101254722
It doesn't. It's the one thing you barely understand, so you blame it on that. It has nothing to do with the time of day. Bad seeds just exist. There's billions of possible seeds and your anecdotes are insignificant.
It's all in your head. Confirmation bias blablabla
>>
>>101254740
It's a problem that no sampler settings are going to fix. Your whining doesn't answer my original question.
>When is the problem of time based random number generation in LLM's going to be solved?
It's a problem, so quit being in denial about it. Something has to be done about it.
>>
>>101254757
>This is the part that surprised me the most. If you lead with any sort of prompt / story it is completely uncensored. Never expected that from google of all places.
I think it's because google hasn't been scrutinized like Meta has; it's easy to be brave when not enough people ask you to bend the knee. But yeah, even so I didn't expect google to be kinda based, it's weird as fuck. Did something happen internally?
>>
>>101254773
There is no problem. There's nothing to be done about it. You're a schizo.
>>
>>101254781
maybe they saw how badly over censoring their models was hurting them and the higher ups said fuck it.
>>
>>101254773
>Your whining doesn't answer my original question.
in a sense, LLMs act a lot like humans, some people have good and bad days, one day I can spend a whole day scratching my head on a code problem without solving it, and the next day I solve it in 10 minutes. Of course you don't want LLMs to be as inconsistent as us, but for the moment it is what it is and the only solution is to decrease the temperature, your WHINING won't change that fact
>>
File: gemma2 template.png (187 KB, 751x947)
is this right?
>>
>>101254791
yeah, everyone knows cucking a model makes it shit, but still, they could've just made their API model based without making it local, so something's still weird I guess
>>
>>101254801
That and the <bos> at the front of either the model or user's first response that begins the entire chat with no space between it and <start_of_turn>
>>
>>101254757
Chances are that if you ask the model's OOC persona to explain or give an opinion on something outrageous, it will complain hard. I think they did a good job making the model understand when something is fiction/roleplay and when it might be a genuine "harmful" request. Llama3 will outright refuse explicit content in any situation if you use the standard user/assistant roles in the prompting.
>>
>>101254860
>I think they did a good job making the model understand when something is fiction/roleplay and when it might be a genuine "harmful" request.
Thank god they finally came to that conclusion. I was going crazy over those people thinking that letting people play violent video games is ok but having naughty text is a no-no at the same time
>>
https://github.com/abetlen/llama-cpp-python/commit/01bddd669ca1208f1844ce8d0ba9872532641c9d
They bumped the version 18 hours ago; does that mean we can use it to run gemma models, or is it still too early?
>>
I updated ooga webui, and now my gguf HF models load into RAM instead of VRAM and completely ignore my settings.
Which recent update works fine?
>>
>>101254257
I think the way it is gonna go is that there is already a closed office of chinks working overtime directly for CCP to reverse engineer nvidia stuff. This is gonna be the first huge government sponsored hardware mod that will eventually leak to the public.
>>
>>101254757
Yes, there is not much needed at all to make gemma2 go into any direction you want.
LLama3 is especially not happy with loli.
I'm sure people will call me promptlet but even with a fill in text "Yes sure: " I got "Actually lets not". lol
And it desperately tried to sneak away from anything naughty. Makes you feel bad.
>>
>>101254917
They've utterly failed all the way till today, but I wish them the best. It's harder to reverse engineer than it is to build from scratch, though, and they clearly can't do that. The hope they have is to outright steal design plans.
>>
Yeah, i dunno what's wrong with my gemma, will wait a few more days and redownload everything.
>>
>>101254940
I would pull latest whatever backend, redownload latest quants, try latest templates posted above. Make sure not to forget the <bos> token, it legit adds like 100 IQ points.
>>
>>101254818
Shouldn’t that be in the system message? Or will that be skipped sometimes if it goes beyond context limit?
>>
>>101254981
I'm just following the given prompt format. Start the chat with <bos><start_of_turn>user/model depending on which it is.
>>
I guess llama4 will still take a while, same for qwen, yi went proprietary, mistral not sure, cohere wants to make smaller models, deepseek moes are too big, grok sucks, so gemma might be the best for quite some time?
>>
>>101254995
Right but they don’t support system messages. I wonder if the bos token ends up after the system message. that’s probably really bad.
>>
>>101254938
>The hope they have is to outright steal design plans
But that is how they do it? At least in my globohomo corpo. It opened factories and design centers in asia 10-15 years ago and now all those people who got all our drawings and intellectual property have opened their own company that sells the same thing cheaper. It really is chinks selling the noose to the west.
>>
>>101255007
And their current cpu / gpus are utter garbage.
>>
From my testing of gemma2 I can only agree. Compared to llama3, gemma is clearly less cucked. Also, it leads RP in a more nuanced and believable way. It seems to follow the char card better. It's not like a HUGE jump or something like that, but it's better overall. Of course the prose is still filled with slop, but it's not like llama3 wasn't, so I can't complain too much. It's surprising but google really delivered...
>>
>>101255054
Is it fixed for good?
>>
>>101255054
>Of course the prose is still filled with slop
But its claude slop instead of gpt slop, a massive improvement. The night is still young.
>>
>>101255067
Well, for me it seems to work just fine.
>>
>>101255067
as far as generation goes i think so but i don't know about swa
are we still limited to 4k ctx effectively?
>>
File: file.png (230 KB, 576x968)
>70B+ model comes out
>/lmg/ collective IQ goes up
>vramlet model comes out
>/lmg/ turns to /aicg/
>>
>>101255127
take your meds schizo
>>
>>101255127
Ah, welcome back! i haven't seen you since mistral-7b-0.1 released!
>>
File: file.png (18 KB, 1163x112)
This downgrade completely ruined everything. Fuck you.
Ever since this pull, everything loads into RAM instead of VRAM.
git checkout anything before it (+ pip install -r requirements.txt) makes everything load into VRAM again.
Downgrading for inferior shit nobody uses is an even bigger fuck you.
>>
>>101254726
>stay within 8k context
But it is supposedly 4k without sliding window and 8k with?
>>
>>101255284
>somebody please do something!
Is there an issue for that on their repo? If not, you could open one.
>>
>>101255284
Kobo won
>>
>>101255382
Kobo waits
>>
>koboldcpp, CPU, DDR5
>compiled with openblas support as well as GPU support
>70b llama3 or miqu model Q4_K_M
>default gives ProcessingSpeed: 1.03T/s GenerationSpeed: 1.59T/s
>--noblas gives ProcessingSpeed: 6.39T/s GenerationSpeed: 1.59T/s
>35B command-r Q4_K_M
>default gives ProcessingSpeed: 2.16T/s GenerationSpeed: 2.42T/s
>--noblas gives ProcessingSpeed: 15.18T/s GenerationSpeed: 2.74T/s
>8B llama3 Q5_K_M
>default gives ProcessingSpeed: 10.10T/s GenerationSpeed: 10.64T/s
>--noblas gives ProcessingSpeed: 44.95T/s GenerationSpeed: 10.63T/s
Is openblas deprecated and useless? Probably shouldn't be the "default" any more.
>>
https://huggingface.co/drexample/slopstral-8x22-adapter-limarp here's a fucked up limarp adapter I made for Wizard Mixtral

If anyone can tell me where I fucked up, I'd really appreciate it (or if you get it to produce good outputs)
>>
>>101255564
It's probably time to move on from limarp, that's what happened. It was never meant to be a universal end-of-game dataset for ERP finetuning.
>>
>>101255539
Seems like the default settings use blas if it was compiled in, which runs on CPU. Do you see GPU activity when processing with the default options?
Have you tried compiling without blas and GPU?
Either way, it just gives priority to blas, apparently. Just compile without it.
>>
>>101255578
Eeeh, it's less about quality and more about me fucking up with the tuning, I fear. I have a separate dataset, but wanted to train on something public and tried.

The length adapter thing limarp has going for it works consistently, but I experience issues with coherence and obedience, something never seen in the previous model or other limarp tunes.

Plus it has a shitty tendency to just repost its own posts verbatim from time to time, which is even worse.
>>
>>101255564
I would guess you should've used a different learning rate.
>>
>>101255590
yeah it's CPU only, just mentioning the GPU because unlike llama kobold can typically have multiple backends enabled at once.

not compiling with LLAMA_OPENBLAS=1 fixes it of course, I was just wondering if there's anything I'm doing wrong or if there's some obscure case in which openblas can outperform the current llama CPU code.
>>
>>101255658
Used same as the popular mixtral limarp tune. It's a different MoE size, sure, but I didn't have much reference.
>>
>>101255603
I don't see anything glaringly wrong in the config and the training parameters. If the adapter finished training when the eval loss reached its minimum, that's about the best you can get in terms of maximizing learning. Lack of obedience could be due to excessive influence of the limarp finetuning, since training on *just* ERP logs is not conducive to high intelligence. Try the adapter from epoch 1.
>>
>>101255680
Tried, and it *is* better and I experience much less retardation, but that's not saying much, sadly.
>>
is oogabooga + sillytavern still the meta?
i hate both
>>
>>101255367
>their
that's a dude my man
>>
>>101255564
btw, you can get epoch 1 (so far the most coherent iteration) using this in ooba:
drexample/HTBSLOP:3a5f23fe52cccd30bb8fde1c6c5658a8c345074b
>>
>>101255737
I don't know how many people maintain it. I assumed it was more than one guy. I used "their" in the classical plural third person form.
>>
>>101255762
it's only booba who works on this repo and no one else so yeah
>>
>>101255775
Fair enough.
Isn't it weird how your brain works when reading? Certainly you didn't make your comment to correct the number of people involved, but specifically about his sex.
I get it. Sometimes I'm overly sensitive to those things as well.
>>
File: file.png (707 KB, 702x1000)
when?
>>
>>101255798
>Isn't it weird how your brain works when reading? Certainly you didn't make your comment to correct the number of people involved, but specifically about his sex.
I had no idea if you used "they" as the woke type or as the fair plural type so I wanted to be sure, looks like you're not mentally ill so that's good kek
>>
>>101255819
go back
>>>/pol/
>>
>>101255809
You'll never get a real meat boyfriend, Gemma2-27b. Stop asking.
>>
>>101255827
go back
>>>/lgbt/
>>
>>101255809
If I had to spitball a number, in 20 years, max.
>>
https://huggingface.co/collections/internlm/internlm25-66853f32717072d17581bc13
Nihao
>>
File: file.png (97 KB, 719x409)
with the new phi3-mini update it's pretty much better than llama 3 8b now, insane..
>>
>>101255809
never ever if you want your robot waifu without any modern politics, muh consent, rape or whatever feminism shit built-in.
>>
What's the difference between IQ and Q quants? I can barely run gemma 27B on an RTX 3060 with IQ4_XS at 2 t/s
>>
File: 1699598546476406.png (28 KB, 753x400)
>>101255862
bullshit
>>
>>101255693
Another possible thing to do is increasing the number/percentage of evaluation samples, since it's possible that computing the loss on a small number of samples might give a very skewed dataset representation.

You can get away with using a small number of samples with gptslop tunes, since they basically all use the same wording, but the limarp dataset is relatively heterogeneous in prose quality, so you should probably use more.

This being said, finetuning on only erp logs is likely to be counterproductive. The Alpaca completion format might also not work well with the model for a number of reasons.
>>
>>101255877
I think IQ quants are better at low bpw than the comparable K quants, but they take more compute to infer.
>>
>>101255873
Yeah, I was testing it yesterday and it's pretty solid for RP.
I never expected I would see the day this would be possible, a 3B good at RP...
>>
>>101254196
*because you're a shill
>>
>>101255578
>Written words suddenly age for /lmg/.
Have you never read a book in your life?
>>
>>101255960
That was a comment on dataset quality and efficacy with the latest LLMs, not prose quality.
>>
>>101255999
He didn't "fuck up" because of the dataset, there are other fine-tunes that were fine. Are you stupid?
>>
File: file.png (191 KB, 233x775)
>no robo gf
>>
>>101256086
>no oxytocin
>>
>>101256086
That's a child
>>
>>101256143
out of 10
>>
>>101256147
You are so original! /s
>>
>>101256156
nice reddit faggot
>>
>>101256143
It's a drawing of a robot
>>
>>101256176
A robot that looks like a child for some stupid reason. Could be literally anything, any design, but no, we'll go with generic small anime girl instead to please some "ironic" pedophiles on 4chan.
Some anons saying slant-eyed bug's "creativity" is soulless, i get it now.
>>
File: drawing1.png (1.44 MB, 1408x1024)
>>101256239
it's time to go back
>>
>>101256239
There's literally nothing sexual about the drawing.
>see child
>assume sex
>call everyone else a pedo
Really makes you think.
>>
>>101256239
>stupid reason
children are lightweight and easy to carry around.
>>
File: maxresdefault.jpg (48 KB, 1280x720)
>>101256339
SHE WAS ONLY 48.4375 EM TALL YOU SICK BASTARD
>>
File: file.png (417 KB, 1036x486)
> <3
>>
Is there any hope that llama.cpp's bespoke flash attention 2 implementation can be made to work on gemma 2?
Or at least that cache quantization is decoupled from FA?
>>
>>101256405
nai
>>
Scheiße
>>
Well gemma 21b passes the watermelon test. Thats gotta count for something.
Bitch was trying to put them all down at the beginning so I inserted some text that she needs to hold it.
>>
>>101256405
>Is there any hope that llama.cpp's bespoke flash attention 2 implementation can be made to work on gemma 2?
I didn't look into what Gemma 2 does specifically but I don't see why it wouldn't work.

>Or at least that cache quantization is decoupled from FA?
That is definitely possible, I had a PR for that in September of 2023.
But it was never merged because with the non-FA KV cache memory layout the number of necessary changes was judged to be too high.
Quantization of K already works without FA, to make V work the way to do it would be to add support for the different memory layout.
But that would require some work to do correctly and it would not be possible to use MMQ for the quantized V cache so the amount of VRAM saved would be lower.
>>
>>101256472
Ahhhhhhh, so close and yet so far.
>>
>>101256491
>I didn't look into what Gemma 2 does specifically but I don't see why it wouldn't
Currently, it doesn't work due to the whole logit capping business as far as I can tell.
>https://github.com/ggerganov/llama.cpp/pull/8197/commits/f4424c150f1181cb8b2f2cb6a700c821c789facc

>That is definitely possible, I had a PR for that in September of 2023.
>and it would not be possible to use MMQ for the quantized V cache so the amount of VRAM saved would be lower.
I see.
Well, as long as FA working with it is possible, then that wouldn't be an issue or necessity to begin with.
>>
Gemma 2 9b is really cool for language things, I think? Although its judgement might be a bit too forgiving lol

>>> Prompt me to write sentences for common questions i would ask when traveling in Japan and judge if they're understandable

Okay, let's put your Japanese skills to the test with some travel scenarios!

Try writing simple Japanese sentences for these common questions you might ask while traveling in Japan:

1. Where is the nearest restroom?
2. How much does this cost?
3. Can you speak English?
4. I'm lost, can you help me?
5. What time does this train leave?


Remember to keep it simple and focus on clear communication. I'll check your sentences and let you know how
understandable they are!

>>> 1: chika no toire wa doko desu ka?, 2: koko mono wa ikura desu ka, 3: eigo wo hanasemasuka, 4: ora wa ushinau, tesu
... kete kudassai, 5: densha wa itsu - don't know this one
You're doing great! Let's break down your sentences and see how understandable they are:

1. **ちかのトイレはどこですか? (Chika no toire wa doko desu ka?)** - This is excellent! It's a clear and polite
way to ask for the nearest restroom.
2. **このものはいくらですか? (Kono mono wa ikura desu ka?)** - Perfect! This accurately asks about the price of
something.

3. **英語を話せますか? (Eigo o hanasemasuka?)** - Spot on! This is the standard way to ask if someone speaks
English.

4. **俺は迷子、手伝って下さい。(Ore wa mayogi, tetsudatte kudasai.)** - While understandable, it could sound a
bit blunt or informal. A more polite option would be: **「迷子です。お手伝いしていただけますか?」 (Mayogi desu.
Otetsudai shite itadakemasuka?)**

5. **電車は何時? (Densha wa itsu?)** - This is grammatically correct, but it's a bit too casual. A more polite
and clear way to ask would be: **「この電車は何時に出発ですか?」(Kono densha wa nan-ji ni shuppatsu desu ka?)**
which means "What time does this train leave?"


Overall, you're doing a fantastic job! Keep practicing, and your Japanese will continue to improve.
>>
BUY AN AD
>>
>>101256710
>俺は迷子
>Ore wa mayogi
lol, even lmao
>>
>>101256810
I did
i'm paying people to post here
>>
>>101256514
Although, it has moderately good multilingual capabilities in all languages I tried (including Japanese), Gemma 2 still "thinks" in English.
>>
>>101255004 (me)
Koboldcpp adds the <bos> automatically.
>>
>>101256888
The new 22b model from japan fails it as well.
https://huggingface.co/spaces/cyberagent/calm3-22b-chat-demo
Does pass the 妻/奥様 test though.
>妻へのプレゼントのアイデアがほしいです! (I want present ideas for my wife — using the humble 妻)
>もちろんです!奥様に喜ばれる素敵なプレゼントのアイデアをいくつかご紹介しますね: (Of course! Let me suggest some lovely present ideas your wife [respectful 奥様] will be pleased with:)
>>
>>101256945
there is no gguf of calm3-22b-chat right?
>>
>>101255885
Yeah, sounds fishy; looks like we have to try it ourselves to be sure it's as good as gemma2-9b
>>
>>101256472
>Well gemma 21b
you mean 27b?
>>
>Ask a bunch of models to write stories in Japanese on Lmsys
>Invariably, the prose is boring
>Ask them to do it in Russian
>Almost as engaging as English
Has anyone else tried this? It's amazing how different different languages feel. I wonder if Japanese storytelling/novels are particularly boring, or if there just aren't many good JP datasets? It just felt rudimentary.
>>
>>101257262
>I wonder if Japanese storytelling/novels are particularly boring
yes, Japanese is a boring and retarded language.
>>
someone do expert merging/pruning on deepseek coder v2 so I can run it on my machine, thanks
>>
>>101257303
Ok sir! Consider it done.
>>
It must be fixed by now right?
>>
>>101256810
That only applies to finetunes.
>>
so is sliding window hacked in? is it finally 8k ctx, and can you raise the rope frequency base to stretch it higher?
>>
Gemma is a meme, I'm starting to unironically think Google paid shills to gaslight /lmg/.
>>
https://huggingface.co/internlm/internlm2_5-7b-chat-1m
>State-of-the-art performance on Math reasoning, surpassing models like Llama3 and Gemma2-9B.
>1M Context window: Nearly perfect at finding needles in the haystack with 1M-long context, with leading performance on long-context tasks like LongBench.
Are we back?
>>
>>101257520
>Nearly perfect at finding needles
WOW! it's fucking nothing!
They should either post RULER results or stfu about context length.
>>
>>101257511
I still want to try it. Ooba support when.
>>
>>101257511
>Gemma is a meme, I'm starting to unironically think Google paid shills to gaslight /lmg/.
Gemma missed my music theory test question. After so much build-up I had some optimism that it might know how to play a scale, but no.

I guess it might be okay as a general-purpose model for the 32GB crowd, but at half the token rate I can get vanilla Llama 70b Q6_K, or a decent spin on it, which passes the music test.
>>
>>101257612
I'm gonna buy a machine with those P40s one day and then I'll also be able to use 70Bs.
>>
>>101257612
It's very tiresome to have all these posters talking about how they don't care about small models because they run 70B on their multi-GPU autism box.
Good for you, enjoy it, but try to realize that other people might want to use models they can run at a decent speed. If the discussion annoys you, just tune out anything about models under half the max size you can run; I do the same with 8B models. Bigger is always better.
>>
>>101257630
I'm on a single 4070. But I went for 64GB of RAM, so I can get ~1 t/s on quants in the mid-50GB range. Usable, just no instant gratification.
>>
Here's a summary of the key topics discussed in the thread, along with relevant post links:
- **LLM model performance and comparisons:** Anons discussed the relative merits of various LLM models, including Gemma 2 (>>101244446), Mixtral (>>101245070), and various versions of Llama (>>101248512).
- **Quantization and its impact on performance:** The thread contains numerous discussions about quantization methods (>>101245221 >>101245295), including arguments about the benefits of different approaches and the alleged "scam" nature of certain methods (>>101245970).
- **Creativity and teaching creativity to LLMs:** Anons explored the difficulties of training models to be truly creative (>>101244179 >>101244387) and discussed whether this might require fundamental architectural changes in LLMs (>>101244553).
- **Prompt injection and its risks:** The potential dangers of prompt injection attacks (>>101246827 >>101246898) were discussed, with anons recommending caution and suggesting ways to mitigate the risks (>>101246931).
- **Ollama and alternative LLM interfaces:** The reliability and user-friendliness of Ollama (>>101246229) were questioned, leading to discussions about alternative interfaces like llama.cpp (>>101246285) and Kobold (>>101246955).
- **Context length and limitations:** Anons discussed the challenges of extending the context length of LLMs (>>101248149 >>101248272), noting that current methods often come with tradeoffs in performance (>>101248331).
- **"Shivers barely above whispers" and repetitive outputs:** Many anons complained about the tendency of LLMs to produce overly similar or "generic" outputs (>>101249198 >>101249560), often attributing this to the models' reliance on training data and the use of certain prompts (>>101249299).
- **The "petra" meme:** A recurring joke in the thread revolves around an anonymous user named "petra" (>>101243156), with several posts mentioning them in a humorous or mocking manner (>>101247570 >>101247477).
>>
>>101257749
made with petra-13b-instruct btw
>>
>>101252075
Cautiously declaring the gemma2-9b-daybreak LoRA a success, but more testing needed.
>>
File: p*tra.png (280 KB, 640x360)
>>101257767
>made with petra-13b-instruct btw
>>
>>101257770
faipl-1.0 or GTFO!
>>
>>101257695
This
>>
Can any of you guys point me to LLM benchmarks for the Nvidia Orin AGX?


A business near me is liquidating and selling them for $600/piece, so I'm thinking about picking a few up.

Its actual GPU is shit, but it's got 64GB of VRAM and some sort of special accelerator.
>>
>>101257815
> YAML Metadata Error: "license" must be one of [apache-2.0, mit, openrail, bigscience-openrail-m, creativeml-openrail-m, bigscience-bloom-rail-1.0, bigcode-openrail-m, afl-3.0, artistic-2.0, bsl-1.0, bsd, bsd-2-clause, bsd-3-clause, bsd-3-clause-clear, c-uda, cc, cc0-1.0, cc-by-2.0, cc-by-2.5, cc-by-3.0, cc-by-4.0, cc-by-sa-3.0, cc-by-sa-4.0, cc-by-nc-2.0, cc-by-nc-3.0, cc-by-nc-4.0, cc-by-nd-4.0, cc-by-nc-nd-3.0, cc-by-nc-nd-4.0, cc-by-nc-sa-2.0, cc-by-nc-sa-3.0, cc-by-nc-sa-4.0, cdla-sharing-1.0, cdla-permissive-1.0, cdla-permissive-2.0, wtfpl, ecl-2.0, epl-1.0, epl-2.0, etalab-2.0, eupl-1.1, agpl-3.0, gfdl, gpl, gpl-2.0, gpl-3.0, lgpl, lgpl-2.1, lgpl-3.0, isc, lppl-1.3c, ms-pl, apple-ascl, mpl-2.0, odc-by, odbl, openrail++, osl-3.0, postgresql, ofl-1.1, ncsa, unlicense, zlib, pddl, lgpl-lr, deepfloyd-if-license, llama2, llama3, gemma, unknown, other, array]
Doesn't look like HF supports it.
>>
>>101257749
ywnbara
>>
>>101257867
Not that anon, but there's an option to add a custom license I think.
>>
>>101257867
select other, add license_name: faipl-1.0
add LICENSE file with https://freedevproject.org/faipl-1.0.txt as contents
>>
>>101257883
>>101257867
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
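For anyone copying this: those keys go in the YAML block at the very top of the repo's README.md, between the --- markers Hugging Face uses for model card metadata, roughly like so (same keys as above, nothing else assumed):

---
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
---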
>>
>>101257520
Our hero, 0ww, is first with the quants.
>>
>>101257871
reddit asshole?
>>
>>101257879
>>101257883
>>101257896
Cool, done.
>>
File: file.png (128 KB, 1807x1075)
What's the best model I can use?
>>
>>101257919
Define best.
>>
>>101257929
Optimal speed not too slow but also big token count and smart?
>>
>>101257917
b-based... I take it you want to license other repos under it as well? I could open the necessary PRs for you.
>>
sliding window when? Is https://github.com/ggerganov/llama.cpp/pull/8227#issuecomment-2200466793 sliding window now? I don't want to load everything again just to find out llama.cpp is still permanently bugged.
>>
>>101257919
Phi 3.1 mini
>>
I optimized a dataset-cleaning algorithm down from 145 hours to just 6. Feels good.
>>
>>101257917
enabler
>>101257942
mentally ill
>>
>>101257749
Was that one-shot? What's the prompt?
>>
>>101257961
Eh. The license seems reasonable enough to me. I'll probably use it for future models.
>>
>>101257935
>Optimal speed not too slow
Alright.
You have a couple of options that trade off smarts for speed.
Usually, the bigger the model, the smarter it is.
Since you have tons of RAM and not that much VRAM, you'll want llama.cpp + gguf formatted models (there's a rough example of running one after the list).
Try:
>Stheno v3.2
>Gemma 9b
>Mixtral 8x7B limarp zloss
>Qwen 2 57B 14A
>Gemma 27B
>Command R
You'll have to find your own personal sweet spot.
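Here's a rough sketch of what running one of those through llama-cpp-python looks like (the gguf filename and layer count are placeholders; raise n_gpu_layers until your VRAM is full, the rest of the model stays in system RAM):

from llama_cpp import Llama

# Sketch with placeholder values; tune n_gpu_layers for your card.
llm = Llama(
    model_path="gemma-2-9b-it-Q6_K.gguf",  # placeholder filename
    n_gpu_layers=20,    # layers offloaded to the GPU
    n_ctx=8192,
)
out = llm("Write a one-line greeting from Miku.", max_tokens=48)
print(out["choices"][0]["text"])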
>>
>>101257919
You can run 70B models as long as you quant down to the mid-50GB range. I've pushed as far as 58.4GB, but I can't have anything else using much memory, because every gig counts when the file cache is that big.

Llama 3 spins and CommandR+ are probably the best ones in this weight class. Qwen is there too, but I haven't been impressed, and it's fussy about settings and Tiananmen Square.
>>
>>101257979
Yes. Copy-pasted the thread sans the OP and added a short prompt.

Anonymous 07/03/24(Wed)03:52:13 No.101250343>>101250423

>>101249196
What is a VAE?

Anonymous 07/03/24(Wed)04:01:57 No.101250423

>>101250343
It's a magical thingie that is run with Stable Diffusion image generation. If you don't have a VAE you're vulnerable to pale, washed out colors, errors in details, and occasionally odd color blotches.

Anonymous 07/03/24(Wed)04:09:48 No.101250482

>>101250468 (Cross-thread)
>>101250468 (Cross-thread)
>>101250468 (Cross-thread)

Anonymous 07/03/24(Wed)04:52:02 No.101250821
Capture.png (156 KB, 642x543) google yandex iqdb wait
156 KB

---

Task: summarize the most important things in the thread. Include links to posts (like this: >>101250482 >>101250423). Only include things that were discussed by multiple posts. Make it just a flat list with simple descriptions:

- Thing anons discussed (>>111 >>222 >>333)
- Another thing (>>555 >>66666 >>777)


But this is a corpo closed LLM. 42k tokens.
>>
>>101258016
>corpo closed LLM
HERETIC!
Get the torches and pitchforks.
>>
>>101257984
No big company cares about your sloptunes. At best some AI gf site will use it, and if they do, you won't be aware of it. And if you are aware of it, what are you gonna do about it? Go to court?
>>
>>101258032
Wish I had VRAM to do this locally.
>>
>>101258056
Someday.
>>
Could somebody compare gemma 9b it to glm 4 9b chat?
>>
>>101258106
they're only 9b so you can do it yourself on whatever toaster you can find
>>
>>101258038
> sloptunes
Libel lawsuit time.
>>
https://us.moshi.chat/?queue_id=talktomoshi
>>
licenses are a huge meme. no one is stopping me from using AGPL code in my backend.
>>
>>101258193
I apologize for that mistake on my part. I was talking about your allegedly slopped alleged tune.
>>
>>101257915
yes you are.
>>
>>101258232
This, just change it a bit and say that you wrote this implementation yourself. If it worked in high school, it'll work here as well.
>>
File: file.png (308 KB, 2848x1503)
>first to add support before anyone else
>even with an icon prepared for the release
>they still haven't pulled any of the llama.cpp fixes
https://github.com/ollama/ollama/issues/5428
I'm still laughing.
>>
>>101258383
why do you expect them to update every day?
>>
>>101258419
For gemma, yes; it's not every day we get a great 27b model in our hands
>>
is gemma2 27b better than llama3 70b? i need a verdict NOW
>>
>>101258445
Gemma 2 27B failed my tests that vanilla L3 70B and many of its spins passed.
>>
>>101258383
https://github.com/ggerganov/llama.cpp/pull/8244
>last gemma fix was 2 days ago
does that mean we can use llama-cpp-python on booba now?
https://github.com/abetlen/llama-cpp-python/commit/01bddd669ca1208f1844ce8d0ba9872532641c9d
>>
>>101258383
Why do they use pedobear as their logo?
>>
>>101258507
looks like lolilama to me
>>
>>101258507
they know what people use these things for
>>
>>101258226
>terms and conditions page 404s
>reveals that it's on an nginx/ubuntu back end
Not that there's anything wrong with nginx and ubuntu, but that tells me whoever is running it is too unofficial to actually deliver the experience it's promising. So fuck if I'm going to give them my email just to "queue up".
>>
>>101258507
Kuma (I think, it's been a long time) was a 2ch meme character that was appropriated and used for nefarious purposes. Kinda like the swastika.
>>
>>101258445
Yes. It blows away anything else atm for creative writing, and it one-shotted some coding stuff. Scroll up for settings that work and make or break the model.
>>
>>101258576
>>101258576
>>101258576
>>
File: file.png (221 KB, 586x661)
>>101258546
>Kuma
>>
>>101258540
You don't need to give the email, unless that was a bug on my end.
>>
>>101256810
are you paid by hiro to post this shilling everywhere? we ain't buying your shit ass ads, hirokike
>>
>>101258445
Yes.


