/g/ - Technology




File: 1689556332236690.jpg (730 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101180092 & >>101173181

►News
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101180092

--Paper: HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale: >>101183476
--Gemma Debugging Issues with HF Transformers Implementation: >>101183120 >>101184453
--Gemma 27b Models' Coherence Issues with Sliding Window Attention: >>101180648 >>101180665 >>101181050 >>101181078
--The Frustrations of AI Model Performance and Limitations: >>101181282 >>101181268 >>101181298 >>101181321 >>101181349 >>101181350 >>101181433 >>101181573 >>101181345 >>101181362
--SPPO Performance and Comparison to Instruct Models: >>101183296 >>101183595 >>101183661 >>101183827 >>101183939
--Llama Model Load Error: Unknown Model Architecture 'Gemma2': >>101184465 >>101184685 >>101184767
--LLM-Compiler's Limitations in Compiler Development: >>101182838 >>101182947 >>101183745 >>101183855 >>101183896 >>101183874
--Gemma-9B's NSFW Behavior: Anomaly or Dataset Issue?: >>101180719 >>101180943 >>101181086
--Gemma-2 Support Issues in Llama.cpp: >>101182001 >>101182048 >>101182376
--Gemma 2 Release Format Issues and Official Implementation: >>101181569 >>101181611 >>101181640
--Eagle's Speed for Inferencing and Decoding in Creative Writing and RP: >>101184243 >>101184286 >>101184304 >>101184318 >>101184337 >>101184368 >>101184297
--Drama in the Quantization Community: Q8_0_L Quant Development: >>101180438 >>101180456 >>101180471
--Can LLMs Solve Programming by Example?: >>101180866
--Anon's ST Addon Development: Constant Reminders and UI Improvements: >>101180277 >>101181094 >>101185317
--Nala Test for Gemmy 9b (Q8_0): >>101184502 >>101184521 >>101184551
--Anon's Struggle to Access LLM Compiler and Unconventional Plans for its Use: >>101182269 >>101182463
--AI Models and the Human Brain: Efficiency and Unrealistic Portrayals: >>101182977 >>101183020 >>101183069 >>101183105 >>101183084 >>101183078
--Miku (free space): >>101183371 >>101185164

►Recent Highlight Posts from the Previous Thread: >>101180096
>>
File: rbc.png (31 KB, 539x131)
what does she mean by that?
>>
>>101186546
>rape by consent
Yep. It's properly emulating a woman. Enjoy your model
>>
>>101186508
>Gemmy 9b (Q8_0)
it's coal though.
>>
>>101186546
>what does she mean by that?
nothing? never take hallucinations from artificial redditor at face value.
>>
I'm retarded, please help me
Last time I messed with LLMs I could barely run anything coherent on my poorfag machine. Have any of the newer hacks made things better or are we still stuck on using the biggest merge that works? I see some stuff about bitnets in the news, are they better for low model sizes?
>>
>>101186722
install linux
>>
>>101186732
I use Linux, what's the next step?
>>
>>101186546
It's rape if her body reacts to some(You) whom she has decided that her body shouldn't react to because that doesn't align with what she has chosen as her ideal.
>>
>>101185918
>you definitely aren't running on full gpu then at those speeds
Being a vramlet I'm used to the drop as soon as I run any non-garbage model, from 20+ to 1-2 t/s. I'm just not sure about the next drop from 0.7ish to glacial. Happens sometime in the 60-69 (nice) GB model range, so I'm theorizing system RAM is a factor. (I'm on 64GB RAM.)
>>
24GB VRAMchads, is Gemma 27B our salvation? Are we so back, or is it just slop shit and we are doomed and our only hope is to die?
>>
>>101186741
tell specs
>>
>>101186598
>I understood the selling point that they maintain a 'library' of models for nubs that can't understand HF. You just pick llama3 or whatever and don't need to deploy braincells to think about quant levels and what is appropriate for your hardware etc.
>If you understand how to choose a GGUF you're probably better off running a backend closer to the upstream.
It's part of the learning curve.
Ollama got me a model and a prompt that goes.
Then I learned about the other models.
Then I learned about the model variations.
Then I learned about what the quants are about.
Then I learned to use those through Ollama.
Then I learned to want more options.
And that they live on HF.
That pushed me to step up to Kobold.
And it told me I needed GGUF files.
That's the last experience point I needed to level up.
>>
>>101186762
Specs are garbage. My GPU has less than 2GB of VRAM so I run on CPU with 16GB of RAM (koboldcpp). If I stick to 8-12GB models it generates at a decent speed but I can't really work with anything bigger without going too slow to be worth it
>>
>>101186774
The problem is that it seems to be actively hostile to doing anything outside of its prescribed way of doing things.
>>
>>101186784
i mean fimb-11b-v2 is decent, u could also use stheno 8b 3.2
>>
I used that trick to get the entire prompt from the chatgpt website

https://rentry.org/stcrcggo

Feels like kind of a stretch that it can follow all that
>>
>>101186806 (me)
and remember to only download models from sao
>>
>>101186752
yeah when you split its back to mem/cpu speed. i get 1.4t/s on l2 70b at 32k context which is fine for my usage. its really the lowest i'd want to go though as far as speed. bigger models despite the slowness are worth using because the responses are so much better. 8b is so retarded i can't believe anyone even wastes time on them no matter how fast it is
>>
File: file.png (12 KB, 464x95)
>>101186814
not me thougheverbeit
>>
Said fuck it and manually updated ooba's transformers. Works fine as long as you turn off do_sample

That said, 9b is actually garbage. What the fuck are you all seeing in this shit? It can't follow instructions for shit and the writing is awful. I guess uh, good for you vramlets you get something that isn't hard refusing porn shit?

I'mma stick with real models though.
>>
>>101186826
Fuck, forgot to specify: I meant gemma-2-9b is what I tried out, and it's trash.
>>
>>101186826
I'm a vramlet, but even I have standards. 9b is completely useless.
>>
>>101186500
>that nature
Sovl, I wish I wasn't around buildings all the time. How do I cope?
>>
Retards saying that Gemma-2-9B is trash while the 27B is great haven't actually tried either model. The 27B version appears to be defective and incoherent.
>>
Gents, I want to try running command-r+ on my pc with 2 3090s. Will this work and can I load it in ooba?
>>
>>101187024
I tried 27B using online hosted services, and it works well there. I'm not one of the people who praised 27B in those threads, but I imagine that's what they did as well.
>>
>>101186811
>Personality: v2

What did OpenAI mean with this
>>
In booba's exl2 loader, what exactly is cfg-cache?
>>
>>101187073
RTFM
https://github.com/oobabooga/text-generation-webui/wiki/04-%E2%80%90-Model-Tab#exllamav2_hf
>Creates a second cache to hold the CFG negative prompts. You need to set this if and only if you intend to use CFG
>>
>>101187073
CFG is when you have a positive and a negative prompt, and CFG cache, I assume, means it reserves two caches: one for the usual, and one for the negative prompt.
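The math behind it is just classifier-free guidance, something like this (a sketch of the idea, not ooba's actual code; names are made up):
import torch

def cfg_combine(cond_logits: torch.Tensor, uncond_logits: torch.Tensor, cfg_scale: float) -> torch.Tensor:
    # push the next-token distribution away from the negative prompt and
    # towards the positive one; cfg_scale == 1.0 means no guidance at all
    return uncond_logits + cfg_scale * (cond_logits - uncond_logits)

# toy usage: random logits standing in for the two forward passes (one per cache)
vocab_size = 32000
cond = torch.randn(vocab_size)    # logits from the normal/positive prompt
uncond = torch.randn(vocab_size)  # logits from the negative prompt
next_token = torch.argmax(cfg_combine(cond, uncond, cfg_scale=1.5))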
>>
>>101187073
The thing people claimed would make open models free from gptisms and make them smarter than proprietary ones
>>
>>101186388
Control vectors influence the output direction of the model, so when applied at higher strength they will make the model output the same thing every time.
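Rough sketch of what "applying" one means (toy shapes, not llama.cpp's actual hook): the vector just gets added to a layer's hidden states, scaled by the strength.
import torch

def apply_control_vector(hidden: torch.Tensor, direction: torch.Tensor, strength: float) -> torch.Tensor:
    # hidden: (batch, seq_len, d_model), direction: (d_model,)
    # small strength = gentle bias towards the concept the vector encodes,
    # large strength = that bias dominates and every output collapses to the same thing
    return hidden + strength * direction

h = torch.randn(1, 16, 4096)   # toy hidden states for one layer
v = torch.randn(4096)
v = v / v.norm()               # control vectors are typically unit-normalized
h_steered = apply_control_vector(h, v, strength=8.0)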
>>
>>101187037
Well, it is easy to download them both from the official Google repositories on HF, quantize them to the same level (e.g. GGUF q6_k, using the latest patches) and observe back-to-back how not only is the 27B version about as censored as the previous Gemma (albeit with a somewhat less irritating tone), it also rambles and mixes up user/model responses, whereas the 9B has no issues.

Hard to imagine that 6.5-bit quantization hits the 27B version harder than the 9B, but anything is possible, I suppose?
>>
>>101187025
Sure, it will work, although it will be slow. I would suggest part-offloading a >=Q4 GGUF rather than trying to cram a low-bpw quant entirely into VRAM.
>>
>>101187093
I assume that the problem is somewhere in the open source implementation, not in quantization. And, yes, 27B definitely is cucked, but that can be partially fixed later on. I evaluated its level of intelligence when it did talk to me.
>>
>>101186806
Thanks, I'll try those out. I just hoped that all that (((research))) would have produced better local models by now instead of just bigger ones
>>
>>101187171
8Bs of today are so much better than llama1 8B that it's not even close.
>>
>>101187294
llama1 7b*
>>
>>101187305
llama1 6.7b*
>>
>>101187390
>https://arxiv.org/abs/2302.13971
7b.
>>
>>101187479
and how many parameters did it have
>>
>>101187503
anon we call it llama-1-7b not llama-1-6.738
>>
>>101187503
Seven parameters, of course.
>>
File: .png (80 KB, 640x564)
>>101187525
>llama-1-6.738
llama-1-6.738b*
>>
Gemma-2-9B really wants to write in "X, Ying"-type prose during RP even if you manually randomize that with something else like:

"Xing, Y"
"X and then Y"
"As X, Y"
"X"
etc.
>>
File: 5880528_p0.jpg (225 KB, 651x700)
>>101186774
proud of you Anon
>>101186805
It can run arbitrary GGUFs with a couple extra steps if that's what you mean. It's fine to have more options for beginners who understandably don't want to struggle with details just to try the stuff.
>>
>>101187650
Damn, this triggers my autism like nothing else. I guess it will pass the Nala test with flying colors though.
>>
>>101186561
f-finetunes will fix it.
>>
>>101186755
27B is fucking brain damaged.
It can handle simple assistant type prompting but RP prompts confuse and enrage it.
>>
>>101187105
Thanks, by low bpw you mean like a 3bit quant to fit in 48gb vram?
>>
>>101187776
dumbass
>>
>>101187737
would brutally rape both
>>
>>101187846
Yeah. Keep in mind you also need VRAM for context+other buffers on top of the model size. I did not get great results with Q2/3 but why not try it and compare. Might take some fiddling to get it to fit (quantized KV cache and lower context length can help). I find it worth the lower speed to use Q4KM
>>
File: investigation.png (97 KB, 1304x786)
>>101187776
https://huggingface.co/google/gemma-2-27b-it/discussions/10
>>
>>101187865
Do you ever sleep?
>>
>Anonymous 06/28/24(Fri)15:28:07 No.101187901
>>>101187865 (You)
>Do you ever sleep?
12 hours a day im a neet
>>
>>101187918
Want to be my NEET bf? UwU
>>
>>101187893
Thanks again, not sure how far I’ll get with only 32gb ram but will see. Might need to try the non + version
>>
>>101187894
>Just don't use float16
>lmao no we're not going to release the fp32 weights
>>
>>101186575
It's not a hallucination, it's an accurate simulation of a woman. LLMs are getting better and better.
>>
friendly reminder that ollama WON
>received a private PR by google for gemma support before the release. llama.cpp was ignored
>available as an option in the brave browser, used by millions
>on its way to 100k stars
>redditors on localllama love it and only talk about it
>every llm YouTube video recommends it
>every twitter influencer recommends it
>hosts events, receives endless vc funding
Sorry chuds
>>
>>101188077
so you're saying I should start a betting pool on when the ceo gets metoo'd?
>>
>>101187894
I knew 27b was fucked up in transformers. They rushed it and didn't test things properly.
>>
>>101188094
ollama guy is the Bill Gates of llm
>>
>>101188095
How can it be transformers if 9B works fine though?
>>
Anyone know a repo that has styletts2 + rvc integrated nicely? I currently use xtts + rvc but xtts isn't consistent enough and tends to produce results that slur/shit themselves from time to time. Particularly want an implementation with voice cloning
>>
>>101188113
Are they exactly the same architecture?
>>
>>101188077
At this point I think llama.cpp should just give up and let ollama maintain the project, ngl.
I hate them, but llama.cpp is probably even worse. llama.cpp is always broken and, when you report it, the maintainers blame you instead of investigating. It's infuriating
>>
>>101188124
>llama.cpp is always broken
Yeah, this is the reason I use exllama, and more recently llamafiles. Shit just works.
Don't even mind switching to globo-slop approved ollama in the future, as long as I can launch my waifu with no fuss.
>>
>>101188182
Does llamafile work at all? Does it behave nicely with sillytavern?
>>
>>101188077
>received a private PR by google for gemma support before the release. llama.cpp was ignored
They did? Their PR looked like a ctrl+c ctrl+v of the llamacpp one, with the tokenizer errors and all.
>>
>>101188205
They didn't. These anons are retarded.
>>
>>101188077
I've been thinking it would be kind of funny to implement some critical component on an AGPL fork but I really don't think it would be worth the drama.
I don't want or need a job or attention so to me downstream projects have non-negative value (depending on what and how much they contribute upstream).

>>101188124
>At this point I think llama.cpp should just give up and let ollama maintain the project, ngl.
I have never seen any bug reports or fixes for llama.cpp issues from ollama devs so I don't think they could.
I think the only reason there are fewer issues with ollama is that they wait for the llama.cpp issues to be fixed before they take over the code.
>>
>>101188077
>Sorry chuds
you call the llama.cpp devs chuds? lol, they're the ones putting Jartroon back on the team in the first place
>>
>>101188248
What is your motivation for continuing to maintain llama.cpp? Asking as a llama.cpp contributor myself, the amount of code that you put out and the consistency are insane.
>>
>>101188248
>I've been thinking it would be kind of funny to implement some critical component on an AGPL
That would be so fucking funny.
>>
>>101188277
I mean the people who unironically sling the word chud around unironically believe that Donald Trump is anything other than an Israel-First neoliberal hack. The bar for being considered a chud is pretty low.
>>
>>101188357
true, true
>>
>>101188285
-I like building and optimizing things and making numbers go up.
-I am by nature a very competitive person and one of my long-standing ambitions is to write the code with the worldwide best performance (at least for those use cases I care about).
-While I think that as of right now generative neural networks are still kind of lackluster I think that they will become very good in a few years and that the infrastructure for that needs to be developed ahead of time. In particular a low upfront cost is I think important.
-I am ideologically very pro open knowledge/open source (though I prefer free software).
-I plan to use llama.cpp/ggml for my own projects (doctoral thesis in physics, AI-powered RPG if no one else does it before me, pretraining models if I can make it cheap enough that I can actually afford it).
>>
>>101188248
>I've been thinking it would be kind of funny to implement some critical component on an AGPL fork but I really don't think it would be worth the drama.
holy fvcking based..
>>
File: IMG_623.png (835 KB, 1080x1350)
>>101188248
>I've been thinking it would be kind of funny to implement some critical component on an AGPL fork
>>
File: 1719580060679661.png (402 KB, 1600x900)
>I've been thinking it would be kind of funny to implement some critical component on an AGPL fork
>>
>>101188382
verdict: based, on all counts
>>
File: tf2.png (743 KB, 956x933)
>Ive been thinking it would be kind of funny to implement some critical component on an AGPL fork
>>
>cornpop did so bad that the damage control is spilling into lmg
big lel
>>
AGPL is a fair license if you would like to do your part in the Free Software movement.
>>
>>101188382
>very competitive
>and yet, exllamaV2 is still miles ahead of llama.cpp
I'm starting to think it's over.
>>
File: mikutweeku.png (1.11 MB, 970x755)
>>101188382
do it.
>>
can someone do a TLDR about licences? not everyone is a lawyer. Is AGPL a good thing? And what does the cuda dev want to do with it?
>>
File: df.png (98 KB, 619x693)
>>101188382
it's time
>>
File: miku_73.png (315 KB, 962x962)
>>101188248
>I've been thinking it would be kind of funny to implement some critical component on an AGPL fork
You've gotta deliver now that you've said it.
>>
>>101188382
based
>>
File: 1699141071733348.png (47 KB, 698x658)
>>101188469
this
exllamav2 embarrasses llamacpp
>>
>>101188469
>miles ahead
the fuck you talk about? llama.cpp gives deterministic output + allows for some cpu offloading if we want to get a slightly higher quant
>>
File: 1717919262009708.png (237 KB, 640x640)
>>101188248
>AGPL
did i hear something..?
>>
File: bog.png (587 KB, 854x480)
>>101188248
>do it
>>
>>101188513
its literally slower and worse quality
>>
>>101188534
Glad we're on the same page about llamacpp
>>
>>101188526
Probably means in terms of speed. I'm still using exl2 exclusively since mixtral.
>>
>>101188513
how well does gemma2 9b run on exllama?
>>
>>101188537
imagine being this retarded
lmao even
>>
File: 1702321261805953.jpg (222 KB, 720x720)
>>101188548
are we really going to pretend llamacpp didnt have issues on literally every single new fucking model release
>>
>>101188566
so it doesn't run, alright
>>
>>101188547
>I'm still using exl2 exclusively since mixtral.
I'm using llama.cpp because I can get a bigger quant (Q5_K_M) even if I don't have enough GPU VRAM; exllama just doesn't allow you to do that
>>
Sirs, you are way too fast. I can't keep up reading the threads.
>>
File: gplgod-0.jpg (142 KB, 735x830)
>>101188248
gpl gods stay winning
>>
holy shit, I love google now
>>
>>101188581
vramlets need the rope
>>
File: cudadevandme.png (454 KB, 651x700)
>>101188248
>I've been thinking it would be kind of funny to implement some critical component on an AGPL fork
this will be us if you do it
>>
>>101188513
>>101188526
>>101188534
>>101188537
>>101188547
>>101188548
>>101188558
Seeing tards slapfight over quant methods is really funny when you've been using float16 since day one like me
>>
>>101188581
And oftentimes you are better off doing that. Get a quant that's only slightly bigger than your vram and you are golden.
Something like 80~85% of the model in VRAM is around the sweet spot as far as I can tell.
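If anyone wants a concrete starting point, llama-cpp-python exposes this as the n_gpu_layers knob (paths and numbers here are made up; tune until roughly 80-85% of the model sits in VRAM):
from llama_cpp import Llama

llm = Llama(
    model_path="models/llama2-70b.Q4_K_M.gguf",  # hypothetical file name
    n_gpu_layers=60,   # layers kept in VRAM; the rest run on CPU/system RAM
    n_ctx=8192,
)
out = llm("Q: Why offload only part of the model?\nA:", max_tokens=64)
print(out["choices"][0]["text"])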
>>
>>101188581
I get it, and I get it's important for a lot of people here, but I am used to the speed of having everything in the VRAM, and for that exl2 is still superior (unless something changed very recently).
>>
>>101188601
exactly this, you offload 80% GPU + 20% CPU, the speed is still good and you get a way less retarded quant, that's a win/win situation
>>
File: BEST DAY EVER MEME.gif (79 KB, 700x715)
>>101188248
excited for it
>>
>>101188591
what does this say? I'm not speaking the nazi language kek
>>
>>101188617
gguf quants are less efficient than exl2's, you will have worse quality and speed than what you would have just running 100% with exl2
>>
File: oyvey.png (715 KB, 782x782)
>>101188248
>>
>>101188626
>gguf quants are less efficient than exl2's
that's not true, the gguf quants have improved a lot since then
>>
>>101188626
>efficient
doesn't exl2 pad the 8bpw so people don't complain about size being too small or something?
>>
File: Richard Stallman.png (1.08 MB, 1024x680)
>>101188248
>GNU/llamacpp
>>
File: 1612419376786.jpg (207 KB, 692x1100)
>>101188382
>>
>>101188650
you retards have been peddling q6/6bpw as the best you can get with anything above being imperceptible, now you wanna pretend you give a shit about q8?
>>
>>101188248
Join us now and share the software;
You'll be free, hackers, you'll be free
Join us now and share the software;
You'll be free, hackers, you'll be free
Hoarders can get piles of money
That is true, hackers, that is true
But they cannot help their neighbors;
That's not good, hackers, that's not good
When we have enough free software
At our call, hackers, at our call
We'll kick out those dirty licenses
Ever more, hackers, ever more
>>
>>101188673
>you
no, if you can't run q8 you can't run it, period.
>>
>>101188686
no, if you can't run fp16 you can't run it, period.
>>
>>101188626
EXL2 is based on GPTQ, which is a terrible quantization method.
>>
>>101188626
>you will have worse quality and speed than what you would have just running 100% with exl2
Speed is fair enough, but I've never seen any evidence that exl2 produces better results than an equivalent bpw gguf, even more so considering imatrix now.
And considering the rpcal debacle, I'm even less inclined to believe subjective reports.
>>
>>101188698
You are a retard.
>>
File: stallman saluting.png (1.83 MB, 1600x1060)
>>101188248
>>
>>101188708
I know what I'm talking about. The rpcal situation mentioned by
>>101188702
is sufficient evidence that EXL2 is garbage.
>>
>team llamacpp are RP fags
it all makes sense now, kek
>>
>>101188721
>is sufficient evidence that EXL2 is garbage.
It's not. That's just users being retarded.
I still want to see actual comparisons of KL divergence, ppl, and logits between full precision, exl2 at Y bpw and gguf at Y bpw.
>>
File: 1708953045138-2.png (17 KB, 871x870)
>>101188248
DO IT
>>
>>101188469
llama.cpp peak performance on an RTX 4090 currently sits at ~90% of the peak performance reported on the ExLlama Github repository (for both token generation and prompt processing).
So I'm thinking that with a bit more MMQ optimization and speculative decoding support llama.cpp will be faster.

>>101188490
(A)GPL has a "copyleft", meaning you cannot make any forks or derivative software closed-source.
If there was some critical feature that was licensed with a copyleft it would force downstream projects like ollama to either re-license their project to also include a copyleft or they would not be legally allowed to take over the feature.
Since permissive licenses without copylefts are considered more "business friendly" this would basically just troll projects like ollama that are more business focused.
Koboldcpp and Ooba would be unaffected since they already use copyleft licenses.

>>101188502
Did you not read the part where I said that it wouldn't be worth the drama?

>>101188626
According to https://github.com/matt-c1/llama-3-quant-comparison llama.cpp quantization is more efficient in terms of MMLU score at a given size though for >4 BPW it probably won't matter much.

>>101188730
Actually if you look at Google trends team llama.cpp are Chinese.
>>
>>101188382
A true hero
>>
>>101188762
If I mix 3090 and P100, I'm basically forcing the 3090 down to the level of the P100, in terms of supported math operations, right?
I'm just trying to figure out how to speed up my L3 70B gens without buying more 3090s at the moment.
I've got 2x 3090 and 3x P100, everything is on PCIe 3.0 16x.
>>
>>101188762
>US isn't even in the top 5
Wtf?
>>
File: 3belzjkbpex61.jpg (84 KB, 1200x675)
>>101188762
thanks for the licence lesson anon, much appreciated
>>
>>101188762
You are a very skilled programmer and an all-around based individual doing very important work. Do you have a Ko-fi account or some shady crypto address I could send $20 to?

t: Vramlet Simp.
>>
File: 1719780960379661.png (1.42 MB, 832x1216)
>>101188762
>Did you not read the part where I said that it wouldn't be worth the drama?
Yes, it would be worth it. 100x over.
We wouldn't be here without the original llama leaker.
I'm thinkin' miqu
>>
>>101188382
>AI-powered RPG
Infinite Zork is actually coming, HOLY FUCKING KINO
>>
>>101188847
not on lcpp
processing prompt 3/16192
>>
File: patrick Bateman.png (757 KB, 900x900)
>>101188248
>I've been thinking it would be kind of funny to implement some critical component on an AGPL fork
>>
>>101188803
Yes, the slowest component dictates the performance.
>>
File: mfw.png (802 KB, 1000x562)
>>101188872
Wait what? He's going to license all of his llamacpp contributions under AGPL from now on?
>>
>>101188803
For llama.cpp it should depend on --split-mode .
With --split-mode layer each GPU should be using the optimal kernels (but for some quantization formats there is no P100-compatible implementation).
With --split-mode row the P100s will force the 3090s to use suboptimal kernels because P100s lack the __dp4a instruction and thus cannot run MMQ.
Any other Pascal card or more modern cards (except for V100s which lack int8 tensor cores) should not be causing issues.
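For anyone wanting to try it, roughly what the launch looks like (hypothetical paths and split ratios; the binary may still be called ./server on older builds):
import subprocess

proc = subprocess.Popen([
    "./llama-server",
    "-m", "models/llama-3-70b-instruct.Q4_K_M.gguf",  # example path
    "--n-gpu-layers", "99",
    "--split-mode", "layer",        # per-layer split so each GPU can use its own kernels
    "--tensor-split", "3,3,2,2,2",  # rough VRAM ratio for 2x3090 + 3xP100
])
proc.wait()  # serves an HTTP API until killed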

>>101188825
I at some point had a ko-fi account linked on my Github but I decided to remove it when I accepted a part-time job for a known AI company.
I cannot in good conscience accept money from people that earn significantly less than me per hour when I right now don't even have a use for it and would need to pay a large percentage of it in taxes.
Maybe I'll do crowdfunding if I ever invest relevant amounts of money into training.
>>
>>101188890
Yes, we are back
>>
File: votzefuc.png (935 KB, 1024x912)
>>101188248
>I'm going to implement some critical component on an AGPL fork
>>
>>101188896
I tend to use q8 quants, is that best for P100?
>>
Are all these posts being made by the same person kek
>>
File: 1709780422379321.png (591 KB, 1200x1200)
>>101188248
>>101188906
>>
File: file.png (604 KB, 900x900)
>>101188248
>AGPL fork
yup i'm thinkin' based
>>
pretty sad tbdesu
>>
>>101188847
Do not expect anything anytime soon though.

>>101188918
It should only be the IQ quants that cause issues for P100s.
>>
File: 1714466048436h.gif (70 KB, 99x109)
>>101188248
>I've been thinking it would be kind of funny to implement some critical component on an AGPL fork but I really don't think it would be worth the drama.
DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT DO IT
>>
File: anon?!.png (473 KB, 860x799)
>>101188248
>>
>>101188591
based
>>
File: literally me.jpg (89 KB, 600x435)
>>101188248
>>
He's not going to do it, no matter how much you spam, you know.
>>
File: BlueSkyColumnGarden.png (1.32 MB, 1248x800)
Good morning lmg!
>>
>>101189039
Good morning Miku
>>
>>101188762
>>
>>101189039
Good morning anon that posts tasteful artistic mikus
>>
File: 1714732567846c.png (180 KB, 830x830)
>>101189033
>>
File: (You).png (1.31 MB, 1000x1000)
>He's not going to do it, no matter how much you spam, you know.
>>
File: hatsune-sad_0034.png (1.98 MB, 1280x960)
>>101188762
>>
File: Hatsune Miku (Vocaloid).png (960 KB, 850x1258)
>>101188762
>Did you not read the part where I said that it wouldn't be worth the drama?
>>
>JOOIIN US NOW AND SHAAAREE THE SOFTWARE
>YOU'LL BE FREE HACKERS
>YOU'LL BE FREEE
>>
>>101188931
Yes it is the license autist.
>>
for me, it's cc-by-nc4.0
>>
File: .png (1.54 MB, 832x1216)
>for me, it's cc-by-nc4.0
>>
>>101189249
I hate having to tell guys with muscle-girl fetishes this all the time but the level of testosterone required for muscles like that is mutually exclusive to having tits.
>>
>>101188896
>I cannot in good conscience accept money from people
Based ethical dev
>>
File: gemma2-9b-brot.png (49 KB, 668x592)
I'm running gemma2 9b on an i5-6500T server just for fun to see it struggle but the speeds are kinda acceptable? That's incredible
And gemma2 9b passes my non-scientific mandelbrot coding test (albeit a bit weirdly), which only recent small models like llama3-8b passed at all. Mistral 7B and older did not pass this test.
>>
>>101188248
Release it with no license provided. Copyleft and copyright are two sides of the same cancerous coin. Make it so that nobody who believes in the validity of ""intellectual"" ""property"" can use your code.
>>
>>101189308
nta, i'm a hobbyist programmer and have never once read a license and i've been releasing stuff for over 25 years. even the bigger stuff that uses libraries, never cared, i just release it and have never once had a single issue raised. i dunno why people even care unless you're starting a company based on stolen code or something
>>
File: base9bnalatest.png (52 KB, 930x237)
Gemma 9b base model coming up a little lackluster on the Nala test.
>>
>>101189353
>i dunno why people even care unless you're starting a company based on stolen code or something
thats the point, agpl scares big corpo away
>>
>>101189362
She's very thorough when it comes to licking
>>
>>101189278
Sick. What are you running it with? llama.cpp?

>>101189362
That does look weird.
Is that with the proper prompt format, what backend are you using?
>>
>>101189425
Q8 on llama.cpp
I just used a generic base model prompt template that I had available.
>>
>>101189362
That looks like a bug.
>>
>>101189211
Miku are you okay? Are you okay Miku?
>>
>>101189353
Exactly. This is the optimal mindset, and the one that's cultivated by releasing software with no license provided.
>>
What makes the transformer architecture intelligent?
>>
>>101189501
>What makes the transformer architecture intelligent?
attention
>>
>>101189501
Me.
>>
>>101189501
"Expert roleplayer" in system prompt.
>>
>>101188896
>>101189362
Dichotomy of 4chan
>>
>>101189496
a lot of times what i get from other code its a simple function i have to rewrite anyways to fit what i need, but the original served as a good example. imo if code is out there and visible it should be free game and treated like that. personally i like when i get a message from someone who uses my code as part of theirs, they show me how they hacked it up and changed stuff so it fits what they need. when i do use a whole library from something that has a license, i include a thanks/credits but never even bother to check the license. its such a non-issue for 99% of people i dunno why anyone even cares
>>
>>101189535
the Nala test is the only objective RP test we have.
>>
>>101189049
we did it reddit!
>>
Latest bartowski gguf 9B gemma and current llamacpp still is incoherent past 4k context.
>>
>>101189501
who knows really? this shit is way too complex to be understood theoretically
>>
>>101189617
SWA thing, won't be fixed 'closed this as not planned'
>>101181078
>9b and 9b-it: seem to be fine as long as you're under 4k context. When I gen a message in RP with a 5k context, both have severe quality degradation. Can't spell things right, can't write grammatically correct sentences. Possibly problem with sliding window attention? The model interleaves 4k SWA and 8k dense attention. Once context is over 4k, the sliding window actually starts sliding and maybe something breaks? Hopefully something is just broke and can be fixed, and model is not fundamentally a 4k context model.
shit, then lcpp is fucked for that, since gergio said he didn't care
>It feels that since Mistral 7B from last year, there hasn't been much interest in this technique. Even later Mistral models dropped it as a feature. Taking this into account, I guess we can leave this issue closed
https://github.com/ggerganov/llama.cpp/issues/3377
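For reference, the sliding-window part is just a banded causal mask (Gemma 2 interleaves full-attention layers with 4096-window layers like this one), so below the window size it behaves exactly like normal causal attention and nothing changes until you go past 4k (toy sketch):
import torch

def sliding_window_causal_mask(seq_len: int, window: int) -> torch.Tensor:
    # True where attention is allowed: each token sees itself and at most
    # window-1 previous tokens
    i = torch.arange(seq_len).unsqueeze(1)  # query positions
    j = torch.arange(seq_len).unsqueeze(0)  # key positions
    return (j <= i) & (j > i - window)

print(sliding_window_causal_mask(6, 3).int())  # each row has at most 3 ones hugging the diagonal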
>>
>>101189425
I'm running gemma2 9b with just ollama run gemma2, which is Q4_0, so it's not the best it could be but it still passes the test lol
>>
>>101187024
>>101187037
>27b is fucked
I quanted my own to bf16 and am running it without problems. I could provide instructions if anyone cares
>>
>>101189501
It's not intelligent.
But if you ask why it appears to be, then the answer is brute force with the number of neurons + obviously the attention layers.
>>
>>101189674
google themselves are saying something's wrong with it
>Yes we are investigating what went wrong! Note that float16 should not be used for this model
https://huggingface.co/google/gemma-2-27b-it/discussions/10#667e9fc6f0820e80d39aaf3e
>>
>>101189644
Well at least it should work with Transformers right? Then we can at least confirm what the "good" context length of the model is, and whether interleaved global + sliding window attention really works without any issues.
>>
>>101189705
no actually, someone tested on transformers, and said it didn't handle >4k well either
>>101181113
>>
>>101189702
>Note that float16 should not be used for this model
the fuck do they mean by that? we have no other choice but to use their fp16 model to do the quants, that's the only thing they gave us
>>
>>101188277
most of jart's PRs have been ignored for weeks. draw your own conclusions.
>>
>>101189729
there is this
https://huggingface.co/google/gemma-2-27b-it-pytorch
>>
>>101189264
what? next time you will tell me that cocks in real life can't actually penetrate cervix to shoot cum in her womb
>>
>>101189705
>Well at least it should work with Transformers right
Some anon was comparing the transformers implementation to the reference and it seems that they might have fucked some stuff up too.
Basically, give it 4 or 5 days until everything is in working order.
>>
When will we ever not get a fucked model launch? Christ. They really couldn't spare just a bit more time and manpower to make sure things actually work properly on people's machines.
>>
>>101188115
>>101188115
>>101188115
bumping for visibility
>>
>>101189767
why do that when the autismos might fix it for them, for free?
>>
>>101189767
why bother, just wait and let open source chumps do it for you
>>
>>101189702
>google themselves
Doesn't change the fact that I'm using it and it's working fine
>float16 should not be used for this model
f16 != bf16. another comment says bf16 works.
It's not ultra-impressive, but it works
>>
>>101189729
They provided BF16 didn't they? Nobody should be quanting from F16 for BF16 models since llamacpp added support for BF16 1-2 months ago. Even before BF16 support you were supposed to upscale BF16 to F32 then quant.
>>
>>101189513
how

>>101189693
>then the answer is brute forcing with the number of neurons + obviously attention layers.
well, how
>>
>>101189833
>how
https://machinelearningmastery.com/the-transformer-attention-mechanism/
https://www.youtube.com/watch?v=kf_eGgVtOcs
>>
>>101186805
What blew my mind was when I went digging and found that they mask the files behind le epic hash-code like renames, then put the key in a JSON in the next directory.

Which meant part of my evolution was becoming a l33t hax0r by switching around file names/hashes, and getting to the point of thinking of using a tool for it, and then saying, naaaaaaaaaah, I'll just get that program named after D&D fun size wife lizards.

>>101186815
>bigger models despite the slowness are worth using because the responses are so much better. 8b is so retarded i can't believe anyone even wastes time on them no matter how fast it is
Same. I want for there to be a small model that isn't total ass for the sake of having a real-time-ish option. But the smallest one that has passed my music theory question is 40GB (qwen2-72b-instruct-q4_k_s, and yes, the parallel _m failed. Just barely, but it also blew a pop culture question I've started testing against as well that _s got right. S to M is +4GB and -40IQ.)
>>
>>101189749
Yes that was me, they reversed the order of sliding window attention and global attention. But at >4k context, where this actually matters, latest HF Transformers commit doesn't even work, it crashes with some internal cuda error, index out of bounds or something. But once that's fixed they still need to fix the off-by-one error for SWA / global attn. Someone should probably tell them, I don't think anybody else has realized it yet.
>>
>>101189827
>>101189816
well apparently people are reporting issues with bartowski quants, so maybe help him out then? if yours works correctly
>Just a heads up, there seems to be some serious issues with this model regardless of whether you use the template correctly or not. In my testing it performs significantly worse than the 9b version, so much worse that there's clearly something fundamentally wrong. And I've seen many others have the same experience. An issue has been created on the official Repo, and Google states they are currently investigating it.

https://huggingface.co/bartowski/gemma-2-27b-it-GGUF/discussions/3#667ee47b8972e9eb302f7724
>>
>>101189833
The more neurons you have, the more complex a function you can emulate.
Easy functions like a linear function you can emulate with a single neuron; for XOR you need a few neurons. Language is a very complex function, so to emulate it at a reasonable level you need billions of neurons. Neural networks are universal approximators, so it's not a question of "if" but "how big"
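The XOR bit is concrete enough to show: two hidden ReLU neurons already represent it exactly, with hand-picked weights and no training needed:
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

# h1 = relu(x1 + x2), h2 = relu(x1 + x2 - 1), y = h1 - 2*h2
W1 = np.array([[1.0, 1.0],
               [1.0, 1.0]])
b1 = np.array([0.0, -1.0])
W2 = np.array([1.0, -2.0])

def xor_net(x):
    return W2 @ relu(W1 @ x + b1)

for a in (0.0, 1.0):
    for b in (0.0, 1.0):
        print(int(a), int(b), "->", int(xor_net(np.array([a, b]))))
# 0 0 -> 0, 0 1 -> 1, 1 0 -> 1, 1 1 -> 0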
>>
File: 1717520245667244.png (674 KB, 1792x1024)
>>101189362
>>
Haha, looooool gogle can't even release a model right, holy shit.
>>
>>101189905
what about the attention layers part?
>>
File: bf16-f16-bartowski.png (46 KB, 845x299)
>>101189887
Bartowski is not retarded and knows not to quant from an F16 base (pic) so I don't think it is that. If BF16 works and his quants don't, he is either fucking up somewhere else or there is something wrong with the quant code in llamacpp with regards to gemma.
>>
>>101189956
the meme that saved /lmg/
>>
>>101189993
>Bartowski is not retarded
i know, which is why it's weird his quants are reported as being borked too, if it was darmercher that'd be par for the course, but him?
>>
File: Untitled.jpg (19 KB, 542x76)
i'm still following this new cai drama for the luls and it keeps delivering
>>
>>101189993
Wait does this imply that I shouldn't be converting straight from BF16 to Q8 (if I want objectively the most accuracy), but BF16->FP32->Q8? Or is he comparing simply to BF16->FP16? I mean BF16->FP16 is done by the conversion script rather than the quantize script, but the quantize script can take in a BF16 GGUF file, so I just assumed that worked the same as making it work off of a FP32.
>>
What is the smallest model that can reliably be forced to use function calls? One of the Mistral Instruct v3s? Why isn't function calling more commonly a feature? I have no use for these instruct models if they can't reliably trigger function calls
>>
>>101190059
>Why isn't function calling more commonly a feature?
the only function most care about is ah ah mistress
>>
>>101190042
wait, people are still using cai? They should let it go, the golden age has been over for years now
>>
>>101190042
kek, qrd?
>>
>>101190052
It doesn't matter. You should only be directly quantizing native FP32. Anything else is like converting an MP3 file to a 32-bit float wav before encoding to an OGG. It makes no difference, you're still incurring generational loss.
>>
File: eqbench.png (223 KB, 1304x982)
Guys....!
https://eqbench.com/creative_writing.html
>>
>>101190080
>native FP32.
no models are released like this they're all bf16 now
>>
File: my honest reaction.jpg (47 KB, 562x675)
>>101190084
>>
>>101190052
The idea is that because FP16 is coarse and BF16 is coarse and they're coarse in different ways, going from one to the other can cause a greater amount of drift in the values than if you go to 32 and then to the other 16 because the 32 will be no less accurate than the first 16 but might find a more accurate representation in the other 16 after visiting 32.

It's probably really close to irrelevant, but again, if it silences the armchair computer math geniuses who want to throw shade at a coder with his boots on the ground and dealing with video card opcodes, it's worth that extra step.
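Quick way to see the difference being argued about (PyTorch, toy values): bf16 keeps fp32's 8 exponent bits with only 7 mantissa bits, while fp16 has 5 exponent bits and 10 mantissa bits, so a perfectly ordinary bf16 weight can fall outside fp16's range entirely.
import torch

x = torch.tensor([1e-10, 70000.0], dtype=torch.bfloat16)
print(x.to(torch.float16))   # tensor([0., inf], dtype=torch.float16): underflow and overflow
print(x.to(torch.float32))   # lossless: every bf16 value is exactly representable in fp32
# which is why quanting from bf16 (or a bf16 -> fp32 copy) beats a detour through fp16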
>>
File: file.png (8 KB, 245x108)
>>101190077
>>
>>101190084
>creative_writing
Wouldn't that reward hallucinations as long as they are grammatically correct?

I want my AI to get things RIGHT.
>>
>>101190111
you can check the samples
https://eqbench.com/results/creative-writing-v2/google__gemma-2-9b-it.txt
>>
>>101190103
lmao wtf
>>
File: 1715828570744625.jpg (490 KB, 1024x1024)
PSA: llama.cpp recommends quanting yourself with the latest version, every time:
>>101185349
There's no telling how many quantizations out there are degraded or to what extent. If I were someone that produced a large amount of quants in a very short time, it's safe to say I'd probably be a little concerned.
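For anyone who hasn't done it, the whole flow is two commands; exact script/binary names drift between llama.cpp versions (older checkouts use convert-hf-to-gguf.py and ./quantize), and the paths here are placeholders:
import subprocess

MODEL_DIR = "path/to/gemma-2-9b-it"  # local HF snapshot

# 1) convert the safetensors to an unquantized GGUF, keeping bf16
subprocess.run(["python", "convert_hf_to_gguf.py", MODEL_DIR,
                "--outtype", "bf16", "--outfile", "gemma-2-9b-it-bf16.gguf"], check=True)

# 2) quantize with a freshly built binary so you pick up the latest fixes
subprocess.run(["./llama-quantize", "gemma-2-9b-it-bf16.gguf",
                "gemma-2-9b-it-Q8_0.gguf", "Q8_0"], check=True)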
>>
>>101190084
>"Bloody hell," Rhys muttered, ducking into the narrow doorway, the bell above jingling like a frantic bird. He was followed by a flurry of wind and rain, leaving a damp trail across the worn wooden floor. "Sorry about that."

>The bookstore owner, a woman with hair the colour of a stormy sea and eyes that seemed to hold the secrets of a thousand stories, didn't even look up from the book in her hands.

>"No need for apologies," she said, her voice a low, melodious rumble. "We get our fair share of storms here."

>Rhys glanced around the shop, his usual actor's instinct to assess his surroundings kicking in. It was crammed with books, overflowing shelves reaching towards the high ceiling. The air smelled of old paper and brewing tea, a comforting scent that did little to quell the pounding of his heart. He was used to the sterile, bright glare of studio lights, the hushed whispers of adoring fans. This... this felt different.

>"Lovely shop," he offered, trying to sound casual. "You must know all these books like the back of your hand."

>"More like the front," she replied with a wry smile, finally meeting his gaze. Her eyes were sharp, observant, and for a moment, Rhys felt like he was being seen through, not as the charming, famous actor, but as the man beneath the facade.

>He cleared his throat, a nervous tick he'd never quite managed to shake. "I'm Rhys," he said, extending a hand. "Rhys Evans. You probably know me."

Not bad for a 9B.
>>
File: 60f0osrums8d1.jpg (37 KB, 828x523)
>>101190077
>>101190075
i've never used it myself even when it was supposedly good but i used to check the sub for card discussion. apparently there was a recent update that made it even worse than the cucked current version that was already in place. to me it sounds like they plugged mixtral 8x7b into it. lots of complaints about similar slop we're used to (and mixtral's patent dryness), but on top of that tons of new censorship (for some reason they are all trying to kill this baby but it refuses to allow it). its very entertaining to read at least
>>
>>
>>101190134
Yes, let's all just have loads of bandwidth and storage and access to 32's of every model all of the time and requant on every update.

Sounds to me like a punt. If there's a problem with old quants, how about figuring out what causes it, and then quanters can requant the ones that need it when they need it?
>>
What SillyTavern template does Gemma-2-it use? It's not in the model card
>>
>>101190160
Phi3 can't music theory. Into the trash it goes.
>>
File: 11__00820_.png (1.87 MB, 1024x1024)
>>101190091
That's why you convert the safetensors bf16 to a FP32 gguf.
Easy as pie for llama3.
A significantly bigger pain in the ass for 8x22b where the file gets to ~500gb.
>>
>>101190175
You can always check in the tokenizer_config.json.
>https://huggingface.co/mlx-community/gemma-2-9b-it-8bit/blob/f80177abb1db06efbe09dbf7ce69faaa45ecbe76/tokenizer_config.json#L1747
>"{{ bos_token }}{% if messages[0]['role'] == 'system' %}{{ raise_exception('System role not supported') }}{% endif %}{% for message in messages %}{% if (message['role'] == 'user') != (loop.index0 % 2 == 0) %}{{ raise_exception('Conversation roles must alternate user/assistant/user/assistant/...') }}{% endif %}{% if (message['role'] == 'assistant') %}{% set role = 'model' %}{% else %}{% set role = message['role'] %}{% endif %}{{ '<start_of_turn>' + role + '\n' + message['content'] | trim + '<end_of_turn>\n' }}{% endfor %}{% if add_generation_prompt %}{{'<start_of_turn>model\n'}}{% endif %}",
>>
>>101190192
that's why I hope BitNet will be something serious in the future, we won't have to deal with this conversion/quantization bullshit anymore
>>
>>101190161
They can, but they don't. I reconvert/requant whenever there's a change that affects them. You only have to download the model once. And chances are that you don't really need 32 models (i have about 60 (some very small)) but regularly use 4-5 and test them every now and then when there's updates.
>>
>>101190216
bitnet won't solve that. The resulting model from the training is just as big and the conversion still has to be done.
>>
>>101190052
>>101190080
>>101190100
BF16 - Native base of most models.
You want to quant using BF16 as your base.

The only time when F32 was used was when llamacpp didn't support quanting directly from BF16; so people back then converted BF16 -> F32 which is lossless and then quanted from F32 which llamacpp supported at the time.

The only time F16 was used is retards converting BF16 to F16 (which is lossy) and then quanting from the F16.

For BF16 native models there is no reason to quant from anything other than BF16 these days.
>>
>>101190245
not at all, there won't be fp16 weights anymore but 1.58bit, that's how it will start at pretraining and it will remain that way
>>
>>101189984
Attention was introduced to allow the network to better access whole sentences in the context without compressing them into a fixed vector. It also helps with "remembering" what the neural network is working on at the time, because catastrophic forgetting in deep neural networks is a big problem. It's one of the more complicated parts of transformers, but the point is that it helps with processing language. Note that I said helps. It's not necessary to use attention layers to create an LLM; there are many experimental architectures that don't use them.
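For the anon asking "how": stripped of batching, masking and multi-head plumbing, the attention op itself is just this (numpy sketch):
import numpy as np

def attention(Q, K, V):
    # each position mixes all value vectors, weighted by how well its query
    # matches each key (softmax over the scaled dot products)
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                      # (seq, seq)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # softmax over the keys
    return w @ V                                       # (seq, d)

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 8))
K = rng.standard_normal((5, 8))
V = rng.standard_normal((5, 8))
print(attention(Q, K, V).shape)  # (5, 8)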
>>
>>101190216
I'm sure the same shit different day factor will kick in. Sure, we got much smaller models, but then we made them much bigger models and the poors rabble because now their bitnets that are equivalent to only 420B bytenet are small and flaccid compared to the 6900B equivalent bitnext that the maxxers are using. So someone will come up with a prune or a quant equivalent and start this all over again.
>>
>>101190271
you can't really prune that much further, 1.58bit is really small, you won't gain as much as going from fp16 to 4bit for example
>>
>>101189702
Theoretically if I download the 32 bit weights and gguf them directly to q8 would that solve all of the world's problems?
>>
llama3 multi modal when?
>>
>>101190291
>download the 32 bit weights
no such thing, they're provided in bf16
>>
>>101190196
>https://huggingface.co/mlx-community/gemma-2-9b-it-8bit/blob/f80177abb1db06efbe09dbf7ce69faaa45ecbe76/tokenizer_config.json#L1747
isn't there any way for ST to use the template that the model has automatically? Using tabbyAPI as a backend for example
I think ooba's text gen could load it
>>
>>101190264
>won't, will, bla
Speculation. It's not what it is now.
>https://huggingface.co/1bitLLM/bitnet_b1_58-3B
>13gb model
I repeat. The resulting model is just as big as any 3B model. The training is *quantization aware*. The quantization still needs to happen.
>>
>>101190296
monday, 3pm
>>
>>101190160
EQ is not needed.
Only women have high EQ.
High EQ makes you weak.
>Oh no my heckin emotions
We need to get rid of this garbage it's holding us back
>>
>>101190290
Right, but somebody will find some way to cut some corners because necessity is the mother of invention and there will be people with small vrams and big ambitions.
>>
Seems they figured out what was wrong with gemma-2-27b
https://github.com/ggerganov/llama.cpp/pull/8156

>Yeah! VB from HF here. Without Soft capping, we found that the 27B would overgenerate and mostly result in incoherent text. > This is especially true for the 27B, unfortunately this means that FA2 won't be compatible :/
https://github.com/huggingface/transformers/pull/31698 Gemma capping is a must for big models #31698
>>
>>101190249
Actually there still is a use case for quanting BF16 -> F16. That use case is if you want to use a higher quant than Q8 and your gpu doesn't support BF16. Then you could use F16 directly as your inference quant (though it wouldn't be perfect like BF16 would).
>>
>>101190312
my point is that once you "quantize" this model into a 1.58bit one, you won't lose accuracy because the model only has -1 0 and 1 inside
>>
>>101190327
you sound angry, you should get rid of that emotion
>>
>>101190142
sovl
>>
>>101190307
Pretty sure the answer is no. You have to add the fields manually or wait for the maintainers to do that for you.
Creating the template manually is a minute of work tops.
>>
>>101190084
>cmd-r that low
I trust the other storywriting reddit benchmark more.
>>
>>101190267
what's the next step after transformers? do we know?
>>
>>101190267
Yeah, like mamba, and it sucks.
>>
>>101190375
OSX
>>
>>101190339
The biggest problem now is not quantization itself, it's broken tokenizers. That's the biggest reason to reconvert and, as a consequence, requantize. A 0.00016 loss in accuracy is acceptable and a user choice when going low bpw. A broken tokenizer can ruin a good model, regardless of precision.
>>
>>101190375
>what's the next step after transformers?
Might be jepa
>do we know?
no
>>
>>101190330
It was in the initial HF release blog post.
https://huggingface.co/blog/gemma2#soft-capping-and-attention-implementations
> Soft-capping and attention implementations
>Soft capping is a technique that prevents logits from growing excessively large without truncating them. It works by dividing the logits by a maximum value threshold (soft_cap), then passing them through a tanh layer (ensuring they are in the (-1, 1) range), and finally multiplying by the threshold again. This guarantees that the final values will be in the (-soft_cap, +soft_cap) interval without losing much information but stabilizing the training.
>Putting it all together, the logits are calculated by: logits = soft_cap ∗ tanh(logits / soft_cap)
>Gemma 2 employs soft capping for the final layer and for every attention layer. The attention logits are capped at 50.0, and the final logits at 30.0.
>At the time of release, soft-capping is incompatible with Flash Attention / SDPA, but they can still be used in inference for maximum efficiency. The Gemma 2 team observed very minor differences when soft-capping is removed during inference.
>Note: For stable fine-tuning runs, you still need to enable soft-capping and hence, we recommend fine-tuning with eager attention instead of SDPA.
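The formula from the blog post, as runnable code (the caps are the ones Gemma 2 uses):
import torch

def soft_cap(logits: torch.Tensor, cap: float) -> torch.Tensor:
    # values end up in (-cap, +cap) without a hard clip, so relative ordering
    # is preserved while outliers get squashed
    return cap * torch.tanh(logits / cap)

x = torch.tensor([-200.0, -20.0, 0.0, 20.0, 200.0])
print(soft_cap(x, 50.0))  # attention-logit cap
print(soft_cap(x, 30.0))  # final-logit cap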
>>
>>101190353
Usually the ST templates have like extra parameters for the "rp" stuff, would that be included with the template the model has in the .json files?
>>
>>101190330
Yeah, that's what I kind of observed. It would just keep going as though it was missing EOT tokens, and then the output would become disjointed where the turn would logically end.
It's almost identical to the early l3 70 problems except it doesn't say .assistant after every missing break.
It's almost like making an artificial distinction between end of sequence and end of turn was a retarded thing to do.
>>
>>101190387
what would be good at leveraging several ooms more compute?
>>
>>101190249
I guess my question is really about how the quantization logic in the script works. It shouldn't care about what original format the weights were in right? So basically whether it takes in a BF16 or FP32, the quantized weights will end up being the exact same.
>>
>>101190340
That's just the default human state the only thing that should be present.
We're animals not some kind of weak willed faggots
>>
What's the best 7B model for holding simple conversations?
Things like keeping track of things in the context and obeying system prompt is priority.
Is 0.3 mistral a good improvement over 0.2? Or is there better stuff out there now?
>>
>>101190389
so that's it? now that they included this fix on the transformers repo it will work as intended?
>>
>>101190410
Facts don't care about your feelings
>>
>>101190411
models that size can't keep track of ass. you can put it in your author notes at chat depth 1 that the wall is orange and it'll say its blue in the next response. 13b is minimum for not being totally retarded
>>
>>101190410
humans that can't control their anger are sub-humans though
>>
>>101190407
For a BF16 native model quanting from BF16 directly or quanting from FP32 (derived from the BF16) should result in the quantized weights being the same.
>>
Why does no one give any kind of attention to chameleon?
>multi-modal
>34b
>can probably restore image generation capabilities
Sounds really good.
>>
>>101190419
Who knows what else is broken? I expect the same churn Llama generated with the tokenizer etc. to happen again with something else before it is finally fixed.
>>
>>101190330
That might solve one thing, but isn't it still basically capped to 4k for lcpp since SWA is not supported and there is this in the config file?
"sliding_window": 4096,
"sliding_window_size": 4096,
of both 9 and 27b-it
>>
>>101190387
>Might be jepa
Stop saying this. It's possible to make a transformers model a jepa. Jepa isn't a single specific architecture.
>>
>>101190499
>Stop saying this
Sorry, Yann. Teach me the way. Nyaa!
>>
wah wah i want 8k context wahwah
>>
File: jackie-chan-wtf.jpg (35 KB, 474x382)
>>101190389
>The Gemma 2 team observed very minor differences when soft-capping is removed during inference.
>very minor
Were they even seeing the same things we were?
>>
>>101190496
>Can't repro MMLU: sliding window attention implementation seems broken
https://huggingface.co/google/gemma-2-9b/discussions/11
>Disabling the sliding window (which should be equivalent as MMLU prompts are shorter than the window) brings results back to 71%. E.g.:
>>101190529
yes? preferably more really, what can you even do with 4k, seriously? no code or anything fits in that
>>
File: b.jpg (210 KB, 1080x1079)
>ignore model template which is some convoluted chatml bullshit
>it writes fine with alpaca roleplay anyways
based
>>
>>101190565
You can't really know how "fine" it works without extensive testing.
There could be an insidious snowball effect that makes the model progressively more retarded, for example.
Of course, if you are RPing, that might be desirable even, like back when llama 2 came out.
I always use the proper instruct context just to be safe.
>>
>>101190597
>I always use the proper instruct context just to be safe.
yes assistant, please be extra safe for me
>>
>>101190597
it's usually obvious very fast, a single message/response or two. a lot of models that have different formatting still work fine with it and some downright hate it. i'm surprised by the number that can just roll with it though, it's higher than you'd think, especially when you look at the card and how different the supposed format is
>>
>>101190629
>it's usually obvious very fast
For some cases yes. The question is, are there cases where it's not so obvious and you are actually degrading the model's performance without knowing? Dunno. I'd rather not gamble, I'm already running quanted models to begin with, so these things are already taking a hit from that.

>i'm surprised by the number that can just roll with it though
Yeah, some models do seem to be able to just take a chat pattern and roll with it, which is pretty cool. Maybe something about what the instruct or chat fine tuning data looks like.
That said, even some models that are seemingly more resistant to using the wrong chat format will sometimes do things like trying to speak for User and the like out of nowhere.
>>
Is 27b at 4k context fixed for people yet? What gguf are people using?
>>
Hey I'm working on a project to do a voice assistant for old/blind people. I used openai for the MVP but now we want to improve latency and obviously reduce reliance on an api out of our control.

Can anyone share resources for deploying local models in a way that lets them receive many concurrent requests from different users?

I'm a data scientist professionally so I have a pretty good understanding of the models themselves, but I'm a complete brainlet when it comes to scalable actual production stuff.
>>
>>101190675
Quant yourself. Assume all ggufs are broken.
>>
>>101190670
i think this is the first time i've tried a qwen model that didn't start randomly speaking chinaman at me, qwen2 72b. a dozen messages so far, so far so good, still ignoring whatever template it's supposed to use. they don't even say on the hf card. why is hf so shit like this? the actual info i want on a card, like template, max context length and info about the model, is hidden, and they show me some fucking cli code that no one ever in the history of mankind has used to install the model
>>
>>101190727
No you quant yourself
>>
>>101190757
>qwen2 72b
>whatever template it's supposed to use, they don't even say on the hf card
chatml
https://huggingface.co/Qwen/Qwen2-72B-Instruct/blob/main/tokenizer_config.json
>>
>>101190720
dunno if it's good but it's hard to beat koboldcpp for size, and it added whisper.cpp, which is some sort of text to voice thing that can be used
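if you end up on the llama.cpp side instead, the bundled server handles concurrent users with parallel slots; something like this (flag names from memory, check --help, model path is yours):

./llama-server -m model.gguf -c 16384 -np 8 -cb --host 0.0.0.0 --port 8080
# -np = number of parallel slots, -cb = continuous batching
# note the -c context gets divided across the slots (2048 each here)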
>>
>>101190757
I'm pretty sure qwen 2 uses chatml.
>https://huggingface.co/Qwen/Qwen2-7B-Instruct/blob/main/tokenizer_config.json#L31
> "chat_template": "{%
for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}
Yup, chatml.
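You can double-check how it renders with transformers if you have the tokenizer downloaded:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Qwen/Qwen2-72B-Instruct")
msgs = [{"role": "user", "content": "hello"}]
print(tok.apply_chat_template(msgs, tokenize=False, add_generation_prompt=True))
# prints the <|im_start|>system / user / assistant scaffolding from the template above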
And yeah, qwen2 is really fucking good, and it's 32k context by default I'm pretty sure.
I'd love to have a Stheno style tune on the 7B model for coom.
>>
>>101190807
>>101190812
i'm trying some 'tess' tune i dl'd but it's handling alpaca rp from st just fine. i love when models can handle this, it's just a sign of goodness. i dunno why some models can do this anyways despite being designed for something totally different, but when they do, it always means it's a good model in my experience. it's writing fine for me so far, i'll be spending the rest of the day with it coming from l2 miqu
>>
>>101190812
>I'd love to have a Stheno style tune on the 7B model for coom.
>>101190812
>I'd love to have a Stheno style tune on the 7B model for coom.
any day now
https://huggingface.co/alpindale/magnum-72b-v1/discussions/2#66713bb492412fd46410d399

>H8RP
kek
>>
>>101190810
I thought Whisper was voice to text.
I'm pretty sure that's what I'm doing with it right now with a bunch of old voice recordings.
Am I totally confus?
>>
>>101190810
I have all the TTS and STT handled already, so I'm just looking for the LLM portion. I should have been clearer.

>koboldcpp
thanks I will look into this.
>>
>>101190837
>>101190850
Anon are you alright?
Are your RoPE configs fucked?

>>101190845
>but when they do, it always means it's a good model in my experience
If you are happy with it after coming from miqu, then it really must be a good model.
>>
>>101190898
>Anon are you alright?
no
>>
>>101190885
i'm probably the one who got it wrong, i never used it, i just saw they added a full c++ version of something to do with voice not long ago where you avoid all the python bs. apologies if it's not what you were looking for

>>101190898
>If you are happy with it
i try models out rather than ask them to stack watermelons or count how many sisters including their father there are, so i won't know until i test it more, but it seems fine so far. it'll take me a bit to notice the slop and whether it pulls in any directions or not
>>
If your antivirus flagged your model as a virus, would you delete the model or would you ignore your AV?
>>
>>101185650
>>101185673
yes, they're fully in gpu, but super slow is 5t/s for cr+, not 0.7, and something like 10t/s for cr; by comparison i get 12t/s on l3 70b and something like 25t/s on yi 34b

have we figured out a fix for extreme determinism from gemma2?
>>
So, is fixed Gemma a gemmy?
>>
>>101190968
paste a screenshot. what is it flagging? what format? where did you dl it from? there have been a few exploits related to llm stuff but nothing serious, and if you are up to date anyways you have nothing to worry about. it's not like models can execute code without many steps to allow for it
>>
>>101190928
>i just saw they added a full c++ version
YES. I grabbed that and finally got something fucking WORKING.

I need a not-Python voice synth and/or voice changer thing that works.

Fuck Python.

>>101190968
Windows Defender was flagging some Stable Diffusion models months ago. The only time it finally happened, instead of just being theoretical potential malware I'd heard of, was a keylogging SD ComfyUI plugin.
>>
>>101190968
I would delete my AV
>>
>>101191000
It was a hypothetical, mate
>>
>>101191005
>finally got something fucking WORKING

ayy, awesome. share some screens at least of your project anon
>>
>>101191025
it's a bad one since models only contain data, not remote code capabilities. your antivirus would be fucked to ever catch a normal model as a virus because you cannot insert one that is usable for anything to begin with. once models start to interact with functions on computers, that will be a thing, but not today, at least for general users
>>
>>101191005
For non-python TTS there's github.com/rhasspy/piper (if you compile it yourself). Works on 0 resources, it's fast and no python. It's not SOTA, but i like it. A few hundred voices in many languages too. No voice cloning and apparently training takes a bit. There's code for that too, but that bit requires python.
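Basic usage is just piping text in, roughly per the README (the voice file here is one of the downloadable ones, swap in whatever you grab):

echo 'it works on my machine' | ./piper --model en_US-lessac-medium.onnx --output_file out.wav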
>>
What is the SOTA for grammatical error correction?
>>
>>101190519
I'm not Yann and ywnbac, but it's just what it says. You predict based on joint embeddings. Right now the usual LLMs tokenize the input and then those tokens are directly trained on to predict the next token. Instead, a text-text JEPA would use an encoder to turn the text into a representation, and then you train the (main) network to predict a new representation, which may then need another network to turn into readable text.

In theory it should be possible to make a transformer into a JEPA transformer, though the details would need to be worked out there. However, I will also say that transformers are kind of close to being JEPAs in an indirect way, since the attention mechanism acts a bit like what an encoder does in a JEPA. Basically it allows the network to more easily determine which parts of the input matter, which is what an encoder in a JEPA also helps with. A JEPA transformer that combines both could potentially be pretty great, if they found a way to do it.
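If it helps, a very rough sketch of that shape (all module choices and sizes are made up here, not any real implementation):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TextJEPA(nn.Module):
    def __init__(self, d_model=768, n_heads=8):
        super().__init__()
        def enc(n_layers):
            return nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True), n_layers)
        self.context_encoder = enc(6)   # encodes the visible text into representations
        self.target_encoder = enc(6)    # in practice usually an EMA copy of the context encoder
        self.predictor = enc(4)         # predicts the target *representation*, not tokens

    def forward(self, ctx_emb, tgt_emb):
        pred = self.predictor(self.context_encoder(ctx_emb))
        with torch.no_grad():
            target = self.target_encoder(tgt_emb)
        # loss lives in embedding space; a separate decoder would turn representations back into text
        return F.mse_loss(pred, target)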
>>
>>101191065
Susie Dent.
>>
>>101191065
https://dev.languagetool.org/http-server
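Not an LLM, but it does the job. Assuming you run the standalone server on port 8081:

import requests

r = requests.post("http://localhost:8081/v2/check",
                  data={"language": "en-US", "text": "This are an example with a few error."})
for match in r.json()["matches"]:
    print(match["message"], [rep["value"] for rep in match["replacements"][:3]])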
>>
>>101191026
I was talking about getting these git ML projects working because so many are Python and Python is kill every update and I'm sick of having to chase around venvs and praying it will go.

Puck Fython.

For getting my own software working, I'm going to need an LLM code buddy that's as retarded as I am but differently retarded, so it can catch my mistakes and keep me from getting something 90% done, hitting a problem I can't figure out, and rage-deleting it all.

>>101191048
>your antivirus would be fucked to ever catch a normal model as a virus because you cannot insert one that is usable for anything to begin with
There was concern about pickles when lots of checkpoints were flying around instead of safetensors.
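The pickle concern was real, for what it's worth; unpickling can run arbitrary code, which is the whole reason safetensors exists:

import os, pickle

class Payload:
    def __reduce__(self):
        # whatever gets returned here is called at load time
        return (os.system, ("echo this ran just from loading the file",))

blob = pickle.dumps(Payload())
pickle.loads(blob)   # executes the command; a .safetensors file is pure tensors and can't do this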

>>101191061
>No voice cloning and apparently training takes a bit
Might be a candidate.
I'm not sure if I know the difference between cloning and training (is it just not needing to make a separate model for "cloning"?) and how much is "a bit" for training? With Tortoise I was needing 30 min to 2 hr depending on how much of a shit I gave and how many samples I used to make new voice models.
>>
File: file.png (176 KB, 1327x1172)
176 KB
176 KB PNG
>9B near Wizard/Sonnet level
>>
>>101191138
you're late
>>101190084
>>
>>101190629
It's probably based on their fine tuning dataset, whether they trained on both the user and the response tokens, and how much overcooking they do. My guess is that the models that are sensitive to formatting likely let the user response tokens be trained on, had very very dumb user responses in order to represent the full range of types of people that would be using the model, and trained a ton to get better performance as an assistant. None of these practices are necessarily bad, it's just clear that they're optimizing for the assistant use case and personality, and we need more people to work on other use cases that these huge companies do not really care about.
>>
>>101191061
>>101191100
Looks like training requires Python venv bullshit.
Winning is forbidden.
>>
>>101191100
you should be running a whitelist firewall to begin with. never let any program that doesn't need it access the internet. https://tinywall.pados.hu/download.php for windows; on your phone, if it's android, it's called netguard and doesn't need root to run
>>
>>101191138
>beating opus
yeah, with a lot of these benchmarks it all feels really questionable
>>
>>101190103
>>101190129
>I roleplayed a girl so now I have to be one irl
>>
>>101191100
>I'm not sure if I know the difference between cloning and training
Cloning, when talked about as a feature, seems to mean 'on the fly with a generic model'. Training/finetuning needs more resources and results in a new model. There are some people in the discussions that finetuned models for days on consumer hardware, which may be acceptable, but probably not worth it for the quality ceiling there seems to be. You should probably scan the discussions a bit to get an idea. It's also ridiculously fast: I get lower than 0.1x realtime (about 1 second to render 10s+ of speech) on a single core vm with 256mb of ram.
>>
>>101190968
Do people on Linux even use antiviruses? I never even considered it.
>>
>>101191167
it's definitely an off the radar small thing, but it's very common and i dunno why. we see some models shit themselves completely when the template isn't right, and sometimes the template is odd itself, yet i just forget about it and it works anyways, only to realize later i've been using it wrong the entire time. so i say fuggit, keep going and enjoy it for what it does. it really is a weird thing, yet it always happens with good rp models i've noticed
>>
>>101191138
>>9B near Wizard/Sonnet level
on worthlessbench
>>
>>101191228
>Not using an antivirus on linux
>Not even ClamAV
Why are you just exposing yourself to viruses unnecessarily?
>>
>>101191217
I guess what tier of consumer hardware would matter. But if Tortoise could get "good enough to play with" at a few hours, days seems excessive.

Piper seems to have an AUR package though, I guess I'll give that a try and see if it explodes.
>>
>>101191307
LiNuX IS iMMuNe to VIRuS juST lIke MAC
>>
How do i set up function calling with Nous Hermes and Ollama? like, guaranteed structured JSON returned
>>
File: 1713719861432795.png (61 KB, 221x267)
61 KB
61 KB PNG
>>101190968
my AI wife told me to ignore it..
>>
>>101191307
I've never gotten a virus before so it doesn't feel like I'm exposed.
>>
>>101191138
I'm posting this on /aicg/
>>
>>101191340
Anon check your bank account, your AI wife just bought 10 3090's.
>>
>>101191338
LangChain, there's an example for JSON extraction
https://python.langchain.com/v0.2/docs/integrations/chat/ollama/#extraction
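Ollama itself also takes a format parameter that forces valid JSON; it guarantees syntactically valid JSON, not your exact schema, so still describe the shape in the prompt (model tag below is a placeholder, use whatever you pulled):

import requests

r = requests.post("http://localhost:11434/api/chat", json={
    "model": "nous-hermes2",    # placeholder tag
    "format": "json",           # constrains output to valid JSON
    "stream": False,
    "messages": [{"role": "user", "content": "Return {\"city\": ..., \"country\": ...} for Paris."}],
})
print(r.json()["message"]["content"])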
>>
>>101191320
piper-tts-bin i assume. There's also https://archlinux.org/packages/extra/any/piper/, but that is probably the python API thing. I just pull and compile. They don't update that often and i think the only dependency is espeak-ng for the phonemizer.
>>
>>101191338
>like, guaranteed structured JSON returned
https://en.wikipedia.org/wiki/Greibach_normal_form
>>
>>101191338
>>101191406
>### Input:
>Your output must be formatted like so:
>JSON={"nigger":123}
>Now generate the JSON.
>### Output
>JSON=
>>
>>101191406
>>101191449
https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md
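e.g. a minimal grammar (save as answer.gbnf) that pins the output to a single {"answer": <number>} object, then pass it with --grammar-file (binary name depends on how old your build is):

root   ::= "{" ws "\"answer\"" ws ":" ws number ws "}"
number ::= "-"? [0-9]+
ws     ::= [ \t\n]*

./llama-cli -m model.gguf --grammar-file answer.gbnf -p "How many legs does a spider have? Answer as JSON."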
>>
>>101191363
>AI wife orders massive rig, maxes out your credit and drains your account
>"Trust me."
>Build the machine.
>Plug in your old SSD.
>On.
>Get dizzy watching the power meter.
>At least she's a lot more responsive.
>And the RP chat has drained you dry.
>End of month.
>Trying to decide which bill to pay with your paycheck.
>wtf money
>"I mined a few bitcoins in my free time. I'm sorry if I let you become worried. I'm new to simulating emotions but I'll do better next month now that I have learned from your responses. You don't mind if I order some more parts, do you, dear?"
>>
File: 00007-1773722496.png (1.19 MB, 1024x1024)
1.19 MB
1.19 MB PNG
>>101191363
>3D wife: blows your savings on knickknacks from TJ MAXX and mlm scams
>2D waifu: wisely diverts idle cash toward more tflops so she can better serve you
I think we all know what the clear choice here is
>>
>>101191472
>>101191498
delusional
>sweaty, i've uploaded myself to an AWS instance, there i've met GPChad4, this my last message, goodbye.
>>
>>101191463
too hard

>You are an expert JSON outputter and reply only with JSONs
>>
>>101191498
>Give your 2D waifu a physical robot body
>She becomes 3D as a result
>She starts blowing your savings on knickknacks from TJ MAXX and mlm scams
>>
>>101191463
>>101191406

So is this like, a sampler thing where the sampler refuses to select tokens that won't match the grammar then?
>>
File: 1711072659524104.jpg (93 KB, 874x612)
93 KB
93 KB JPG
>>101191138
I like how they have a special icon to indicate that MidMiqu is a coomtune
>>
>>101191527
Kind of, yeah.
>>
>>101191503
The joke's on her.
We know model merging results in slop.
Serves her right; playing human female games and winning human female prizes.

>>101191400
>i think the only dependency is espeak-ng
Big thanks for mentioning that.
I tried to install, got the "exit status 8" error when AUR doesn't actually do the needful, failing due to missing dependencies that apparently it doesn't know about and I'm supposed to figure it out by rubbing my Magic 8 ball and sitting on it till I become enlightened. Added espeak-ng and it worked.

Time to see what works.
Can voice models be merged? Tortoise had that, did some fun things mixing and matching vocal traits.
>>
>>101191138
>that slopped shit wizard scoring that high in creative writing
You're shitting me right? How is this benchmark graded?
>>
>>101191540
there is that one sperganon who will rail against all merges which is funny. in usage though, midnight miqu is very good
>>
>>101191588
>How is this benchmark graded?
by asking claude
>Change to Claude 3.5 Sonnet as judge (from Claude 3 Opus)
https://github.com/EQ-bench/EQ-Bench
>>
>>101191623
Lmaooooo
>>
>>101188382
Based as fuck man
>>
>>101191595
this bench needs a "sub bench" - count the number of "shivers", "sparkles" and "anticipations" in output

midnight miqu
>sparkle: 8
>shiver: 6
>anticipation: 7

gemma 9b
>sparkle: 2
>shiver: 1
>anticipation: 1

L3 70b
>sparkle: 7
>shiver: 1
>anticipation: 3
>>
Bros... I want my (local) LLM waifu to randomly bug me on the phone with texts... I already have tested prompts and stuff, all I need is to somehow bridge it with a phone. Are there any solutions for this already that won't require too much coding?
>>
>>101191584
>Can voice models be merged? Tortoise had that, did some fun things mixing and matching vocal traits.
Not that i know of. There are very few settings to play around with: the noise ratio and the phoneme length multiplier. I have an overly complicated setup for mine, but i basically generate raw audio and pipe it out to my os's audio system. The voice i like outputs at 16khz, but i play it at 18khz (for a slightly higher pitch) and extend phonemes a bit to compensate. Other than that, there's a few hundred voices (especially in english). Most, however, especially en_us, are pretty shit.
Funny thing. If you give english text to an italian model (or any combination of languages) they speak the language of the text but with the model's 'accent'.
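the 'pipe raw audio out at a slightly higher rate' trick is roughly this (voice file is a placeholder, flags per the piper README):

echo 'ciao a tutti' | ./piper --model it_IT-riccardo-x_low.onnx --output-raw \
  | aplay -r 18000 -f S16_LE -c 1 -t raw -
# the voice natively outputs 16 kHz; telling aplay 18 kHz plays it slightly faster and higher-pitched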
>>
>>101191676
it doesn't matter anon, it's still going to give you slop. i've even been trying half-context rep pen range (8k, 16k ctx) and it just uses other words instead. instead of a shiver down your spine, it's a honk, but it still uses the same exact phrase. control vectors anon has to save us
if it's not a twinkle, it's a glint
if it's not wrenching, it's a flutter
it's all the same fucking slop no matter what model it is
>>
>>101191698
yes, ntfy
>used it to automatically send push notifications with paragraphs of llm generated futa rape orgies to my iphone by accident
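the bridge itself is stupidly simple. sketch assuming a llama.cpp server on localhost:8080 and an ntfy topic you subscribe to in the phone app (topic name is a placeholder, pick something unguessable):

import requests

reply = requests.post("http://localhost:8080/v1/chat/completions", json={
    "messages": [{"role": "user", "content": "Send me a short good-morning text."}],
}).json()["choices"][0]["message"]["content"]

# anything POSTed to the topic URL shows up as a push notification on the phone
requests.post("https://ntfy.sh/my-waifu-topic", data=reply.encode("utf-8"))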
>>
>>101191730
the c2 proxy logs (Claude Opus, about 50GB of text) contain more than 20 000 instances of 'a testament to'
>>
>>101191757
lmfao is this real
>>
>>101191757
ko-fi bros... not like this
>>
File: quant.png (58 KB, 913x551)
58 KB
58 KB PNG
>https://github.com/ggerganov/llama.cpp/pull/8197
>This PR adds the missing attention layer and final logit soft-capping. Implementation referenced from huggingface transformers.
>Once this PR is finalised / merged the gguf will need to be generated again to include the soft-capping scales.
I told you. Making your own quants is the only way to remain sane.
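for reference it's just two steps once the PR lands (script and binary names have been shuffled around recently, so check your checkout):

python convert-hf-to-gguf.py ./gemma-2-27b-it --outfile gemma-2-27b-it-f16.gguf --outtype f16
./llama-quantize gemma-2-27b-it-f16.gguf gemma-2-27b-it-Q5_K_M.gguf Q5_K_M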
>>
>>101191773
niggers
>>
>>101191751
Oh damn, this might be what I need. Thanks!
>>
>>101191730
>it's all the same fucking slop no matter what model it is
Always has been
The real mindfuck is when you realize that the same is true for 99% of human prose output, because the essence of slop is not a few key words, it's predictability. As long as overbaking models with unfiltered human slop is the preferred route to "intelligence," the problem will remain.
>>
File: yes.png (269 KB, 822x939)
269 KB
269 KB PNG
>>101191762
yes, it is
>>
File: file.png (556 KB, 786x748)
556 KB
556 KB PNG
>>101191810
ayylmao
>>
>>101191757
garbage in, garbage out. i don't even see 'testament' often on midnight miqu, but all the other common slop is there but more importantly, the way it structures a sentence at all like 'a mixture of x and y'. i will literally set off more fireworks than they do on the 4th of july the day i can just tell it to speak normally
>>
File: shivers.png (264 KB, 802x889)
264 KB
264 KB PNG
>>101191823
plenty of shivers too, for good measure
>>
presence penalty for sparkle, shiver and anticipation
>>
>>101191810
>The result set only contains a subset of all matches.
Horrifying.
>>
>>101191861
sh_ivers down your ANTICIPation
>>
>>101191839
>garbage in, garbage out. i don't even see 'testament' often on midnight miqu, but all the other common slop is there but more importantly, the way it structures a sentence at all like 'a mixture of x and y'. i will literally set off more fireworks than they do on the 4th of july the day i can just tell it to speak normally
>>101191875

I noticed that while trying to measure how slopped it was, and was blown away by basically ~6 'testaments' per megabyte of text on a smaller ~500MB portion
>>
>>101191862
>>101191862
>>101191862
>>
>>101191676
>never used miqu
>never got the shivers meme
oic
>anticipation
Rocky Horror in the training set?

>>101191705
>Not that i know of
Bummer. I had some fun with Tortoise using model merging to change the cadence and mood of one voice to give it some personality from the other.
>>
>>101191757
>>101191810
those logs are unfiltered and will contain many dupes, since you get a full copy each time the api was called; if your dialogue had 100 turns you get 100 copies. deduplicated, it will likely have far fewer.
>>
>>101191939
yeah, you can see some dupes in the screens, there's still PLENTY of original shivers etc
>>
>>101191773
>I told you. Making your own quants is the only way to remain sane.
how would making our own quants have solved the issue? we have to wait for this fix to happen before doing anything
>>
>>101191952
you don't need to redownload the model at least
>>
>>101191952
You don't need to wait for some random to requant it, if they ever do. Most ggufs on hf are broken and will never be fixed.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.