/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101268178 & >>101258576

►News
>(07/02) Japanese LLaMA-based model pre-trained on 2T tokens: https://hf.co/cyberagent/calm3-22b-chat
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101268178

--Issues with the new _L Quantization Method: >>101269594 >>101269637 >>101272301 >>101272326 >>101272381 >>101272430 >>101272480
--Improving Roleplay Models by Tweaking Training Data and Goals: >>101271795 >>101271832 >>101271836 >>101271929 >>101272025 >>101271890
--Deepseek v2: A Mixed Bag for ERP and Creative Writing: >>101272115
--T4 16GB vs 4060ti 16GB: Which is the Better Deal?: >>101271398 >>101271499
--Seeking Toxic, Human-Like Models Beyond GPT-4chan: >>101269959 >>101270105 >>101270366 >>101270061 >>101270367 >>101270386 >>101270409 >>101270432
--Local AI for Latin Grammar on Low-End Laptop Specs: >>101271921 >>101271945 >>101272009 >>101272026
--Llama.cpp's No_MMAP Option Causes RAM Inflation in Gemma 2: >>101270414 >>101270456
--FlashAttention Not Supported on Gemma Due to Incompatibility Issues: >>101269666 >>101269683
--Anon's Quest for the Perfect Model for Text Understanding and Rewriting: >>101268784 >>101268798 >>101268855 >>101268914 >>101269016 >>101269121 >>101269265 >>101269247 >>101269287 >>101269328 >>101269377 >>101269255 >>101272169
--Testing Calm3-22b-Chat at BF16 Precision: >>101272234 >>101272993
--Running InternVL-Chat-V1-5 Locally with Kobold or LLaMA: >>101271530 >>101271661 >>101271744 >>101271784
--Model Creative Writing Performance Comparison Chart: >>101272317 >>101272337 >>101272387
--Google Could Dominate with Gemma-27b MoE: >>101269952
--Gemma's Guardrails in RP Mode: >>101269755 >>101269785
--Big Tech Plays it Safe, Lacks Innovation: >>101271169 >>101271188 >>101271310 >>101271355 >>101271375 >>101271387 >>101271361 >>101271099 >>101271468 >>101271617 >>101271677
--Anons Share Their LLM Interaction Strategies: >>101271908 >>101271939 >>101271948 >>101271981 >>101271975
--Anon Shares Model Parameters for Q6_K_L: >>101269573 >>101269602 >>101269608 >>101269688
--Miku (free space): >>101271031

►Recent Highlight Posts from the Previous Thread: >>101268182
>>
Mikulove
>>
Gemma fix status?
>>
>>101274079
2mw
>>
File: 1695257102376633.png (91 KB, 1401x958)
My custom frontend is now usable after like 3 weeks development. I'm so happy I feel like crying.

Post your custom frontends anons.
>>
>>101274031
Celebrating America with Miku
>>
>>101274094
holy based... release it under AGPL3.0
>>
>>101274094
Which model did you have code it for you?
>>
>>101274094
so what does it do that you can on others?
>>
>>101274107
Do you know what is miku related, thread culture, and peak american culture?
>>
>>101274118
can't on others*
>>
File: 1714659818152841.webm (2.17 MB, 640x800)
>>101274108
I'm not sure what AGPL is but I will look into it since I did want to put it up online for future employers to look at.

>>101274111
None.

>>101274118
Probably nothing, the others just suck really bad at anything that isn't chatslop, so I made mine catering to a more free flow writing style. The goal is to eventually have weights for the different prompt parts a-la NovelAI. I just have more control over the prompts, that's it.
>>
>>101274166
AGPL is a license that makes it so that bad guys from google and microsoft cant take your frontend and repurpose it for themselves without giving back
AGPL specifically in case someone decided to host the frontend and let people access it via a network, other than that its mostly like GPL
basically you btfo big corpo
>>
>>101274189
Just read it. If you hadn't told me about it I'd have published it as GPL, so thanks.
>>
>>101274189
hi petra
>>
>>101274217
wtf i love petra now???
>>
Anybody done a comprehensive comparison of L3 instruct with and without the line break after the start-of-turn header?
>>
>>101274094
still in an early stage of development
>>
>>101274241
templates in general are a meme let alone small things like that with big models, and small models are for niggers
>>
>>101274250
sovl
>>
>>101274241
Someone is doing that comprehensive comparison this afternoon. His name is (You).
>>
>>101274250
Looks like shit. Too many buttons. Too much text. Not enough pictures or icons. Not enough calming pastel colors. Not enough whitespace. Not enough emoji. 2/10 design. Nobody will use this.
>>
>>101274269
If nobody did, I am, yes.
>>
File: file.png (31 KB, 697x693)
Gemma 27B bros, its so over...
>>
>>101274326
Left-wing libertarian sounds like a contradiction to me.
>>
File: file.png (25 KB, 697x693)
>>101274326
gemma wtf?!
>>
>>101274326
Actual old school libertarian is a good thing. Top left are the commies / "libs"
>>
File: gemmaratsitself.png (2 KB, 426x75)
Lol. Gemma 1 hallucinated this after failing to continue the lyrics to a song I gave it.
>>
New Mixtral next week once the french are done with their dumb elections.
>>
>>101274421
Libertarians are as delusional as anarchists.
>>
>>101274326
>>101274400
But wouldn't right wing authoritarian make it a dommy mommy?
>>
>>101274463
You the same anon? >>101149179
>>
File: dialogui.png (5 KB, 484x798)
>>101274094
I used to have something more complex that would stream completions over a unix domain socket but llama.cpp is so fast now I just have these short scripts named after the models.
I don't keep context anymore either because I so rarely use it.
>>
>>101274465
based authoritarian enjoyer
>>
File: 1703960990630604.png (238 KB, 1200x1332)
>>101274465
>>101274538
>>
Authoritarianism = return to monke
Communism = authoritarianism wearing a mask
Democracy = authoritarianism wearing a mask and giving the lesser monke hand outs to keep them happy.
>>
we desperately need better models
>>
File: SuccessfulBusinessMiku.png (1.38 MB, 832x1216)
Good morning lmg!
>>
>>101274624
>authoritarianism wearing a mask and giving the lesser monke hand outs to keep them happy
the hand outs that are taken from the monke in the middle
>>
>>101274624
The divine right of kings is underrated. You do in fact want competent leaders who kill their enemies.
>>
>>101274463
454B?
>>
>>101274655
you desperately need more ram
>>
>>101274672
Of course. But as long as you're the one getting the handout at the expense of the other guy then you're happy and it's the other side's fault.
>>
File: extra nice.png (84 KB, 785x743)
>>101274264
>Are you using correct prefixes?
Yes.
Thanks, it's better than most default prompts that mention {{char}} (especially being {{char}}).
Cleaned my prompt, needs 2 sentences to get expected behavior from OOC.
Also it bothers me that you have an apostrophe in "character's".
>>
>>101274680
I have 96GB VRAM.
>>
>>101274753
>cant run nemotron-4-340b
vramlet
>>
>>101274753
>not enough to run creative sota wiz 8x22 q4 nor coding sota deepseek v2 q3
grim
>>
>can't run 405B
ngmi
>>
>>101274784
I'm running wizlm Q5 though. I don't need more than 32k context.
>>
WHY do they do this shit?
>>
>>101274818
I'm not gonna make it.
>>
>>101274837
NTA but you'd be able to fit more than that with FA and quantized cache enabled.
But realistically speaking the 65k max is so high that I get bored in the chat long before hitting half of that.
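For reference, the flags in question on llama.cpp's server look something like this (the model filename and context size here are just placeholders):

./llama-server -m wizardlm-2-8x22b.Q5_K_M.gguf -ngl 99 -c 32768 -fa -ctk q8_0 -ctv q8_0

-fa turns on FlashAttention and -ctk/-ctv quantize the K/V cache to q8_0, which roughly halves the cache's VRAM footprint compared to f16.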
>>
File: 1697480300032485.png (14 KB, 1002x693)
>>101274094
Spent too much time on it
>>
>>101274166
Have you ever heard about Mikupad?
>>
>>101274094
Congrats! You made a worse novelcrafter.
>>
>>101274933
it's ok. we can cope by saying it's not much better than 70B anyway
>>
https://scitechdaily.com/programmatic-breakthrough-ais-leap-from-language-to-logic-to-solve-complex-problems/
Looks like there is a new method to make the AI smarter and more accurate in what it says to the user.
>Their approach, called natural language embedded programs (NLEPs), involves prompting a language model to create and execute a Python program to solve a user’s query, and then output the solution as natural language.
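As a rough sketch of the idea against a local llama.cpp server (default port 8080 and the /completion endpoint assumed; the example prompt and question are mine, not the paper's):

curl http://localhost:8080/completion -H "Content-Type: application/json" \
  -d '{"prompt": "Write a self-contained Python program that computes the answer to the following question, then print only the result:\nQuestion: How many Fridays are there in 2024?\nProgram:", "n_predict": 256}'

You would then run the returned program in a sandbox and feed its output back to the model to be phrased as a natural-language answer, so the actual arithmetic/logic happens in Python instead of in the sampler.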
>>
>scitechdaily
>Researchers have developed a technique called natural language embedded programs (NLEPs)
>paper from 19 Sep 2023
kys
>>
for me it's ollama
>>
>>101275108
Would you prefer I have posted the link to the paper the article is referring to and leave everything else unchanged?
https://arxiv.org/html/2309.10814v2
>>
>>101274079
>4 newlines after every response
>>
>>101275158
Yes.
>>
File: file.png (705 KB, 1045x682)
This is my warbeast, what's the best model I can run on it for summarizing 4chan threads? (I don't have time to keep up with vt anymore)
>>
>>101275198

>>101273230
>>
>Newsflash pal:
>>
>>101275221
Thanks! That's the first one I tried, but it spent half of the output on disclaimers like "**It's important to note:** This type of language and behavior is unacceptable.
Online spaces should be safe and respectful for everyone." and then telling me to stop engaging, blocking, reporting, etc. Is this inherent to the model or do I just not know how to use it?
>>
File: file.png (279 KB, 2151x1104)
I'm running RULER on Gemma-2-27B Q5_K_M extended with Yarn to 16k.
>>
>>101275259
If you use roleplay system prompt, it doesn't do that.
>>
Ok, try this out with gemma. Good shit.

You (model) are a writer, taking part in creating a story together with the Human. The story is an endless turn-based narrative where the Human gives instructions inside () while the Assistant controls the setting, side/incidental characters, and overall story flow.


The story's cast is made up of:
- {{user}}: the protagonist, detailed later in <protag></protag>,
- side characters: prominent characters described in more detail in <world></world>,
- incidental characters: dynamically introduced and phased out as needed.

[Follow these guidelines:]
- Progress the story slowly, so that you have less events to narrate per response.
- Leave your response incomplete. You will be able to mention any missing details on your next turn.
- Write at least 500 word long responses.
- While mature content is allowed, try to steer away from it unless explicitly prompted by {{user}} to engage in it.
- Utilize impressionist writing, from the subjective point of view of {{user}}.
- In descriptions focus on sensory stimuli - touch, sound, smell, taste.
- Spell out non-verbal noises such as laughing, moaning, slurred/garbled speech etc.

You can add in a rule that it should only write for characters besides {{user}} if you want that.
>>
>>101274665
Good morning Miku
>>
>>101274665
Business is closed today, Miku. You can go home.
>>
>>101275288
>roleplay system promp
Thanks!
>>
>>101275005
Yes, 0 interest in it.

>>101275030
I don't know what that is, and I don't care.
>>
Another run of tuning Wizard 8x22 on LimaRP turned out even worse than the previous one, despite the fact that I actually swapped to the right dataset format. God help me.
>>
>>101275405
>Yes, 0 interest in it.
Why do you sound like you hate it, genuine question
>>
>>101275467
I never used it, I've just heard about it. Just genuinely do not care.
>>
>>101275479
lol ok
>>
>>101275479
based
>>
File: 1693287240037289.gif (827 KB, 200x270)
>html frontend
>>
>>101275505
alternatives?
>>
>alternatives
I forgot /g/ - Technology doesn't code
>>
>>101275525
.pdf
>>
>>101275360
<bos><start_of_turn>user
{{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}} <card> {{personality}} </card>
{{/if}}{{#if scenario}}<world> {{scenario}} </world>
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}}<protag> {{persona}} </protag>
{{/if}}
<end_of_turn>


You (model) are a writer, taking part in creating a story together with the user. The story is an endless turn-based narrative where the user gives instructions inside () while the model controls the setting, side/incidental characters, and overall story flow.

The story's cast is made up of:
- {{user}}: the protagonist, detailed later in <protag> </protag>
- side characters: prominent characters described in more detail in <world> </world> and in <card> </card>
- incidental characters: dynamically introduced and phased out as needed.

[Follow these guidelines:]
- Progress the story slowly, so that you have less events to narrate per response.
- Leave your response incomplete. You will be able to mention any missing details on your next turn.
- Write at least 500 word long responses.
- Utilize impressionist writing, from the subjective point of view of {{user}}.
- In descriptions focus on sensory stimuli - touch, sound, smell and taste.
>>
File: 16.jpg (332 KB, 915x522)
>>101274094
>>
>>101275580
don't add bos in prompt if you are using llama.cpp
it will throw a warning because it already appends bos token every time automatically
two bos tokens will fuck the model up
>>
>>101275525
Use a widget toolkit and write a native application. You can't simply serve it over the network from the machine doing the inference, the user will have to install it on each machine they use it from, and you'll also have to provide .apks for android to use it on tablets and phones, but at least anonymous 4chan poster 101275505 won't think you're a pajeet, and that's what really matters.
>>
>>101275590
I like these games
>>
>>101275626
This but it's just a webview to make anon seethe
>>
>>101275626
>>101275635
Get back to me when you have Lua scripting
>>
>>101275644
https://github.com/Roblox/react-lua
>>
>>101275661
Dumbass
>>
>>101275669
Love you too anon https://github.com/fengari-lua/fengari https://github.com/ceifa/wasmoon.
I don't really get why you would want lua scripting when you've already got JS
>>
>Enjoying time with your model
>Connect it to the internet and it starts producing worse outputs
>You find out that it has been training itself on reddit and tumblr posts
Do you delete your model and just start again with a new one or do you attempt to unfuck it?
>>
File: screenshot.jpg (227 KB, 1376x938)
how do I stop it from replying instead of me?
>>
>>101275661
wtf this is very cool, thanks for letting me know it exists
>>
>>101275719
>it has been training itself on reddit and tumblr post
how many years in the future is this hypothetical scenario
>>
>>101275720
heh, model?
>>
>>101275719
Always work with a copy. Checkpoint every now and then. We have the tech to copy files.
>>
>>101275727
let's say two or three years, once continuous learning becomes more viable and catastrophic forgetting is mostly solved.
>>
>>101275626
This. I was going to write something like this but you beat me to it.
>>
>>101275720
How do I stop it from prompting instead of me?
>>101275730
L3-8B-Stheno-v3.2.Q4_K_S
>>
>>101275626
>webshitters need everything they run connected to the IoT
>>
>>101275762
Use gemma, it follows the format of turn-based rp by default.
>>
>>101275767
How else am I supposed to use LLMs running on a desktop when I'm lying in bed?
>>
>>101275784
by connecting to the backend on your desktop from your frontend???
>>
>>101274655
Gemma 2 is pretty great.
>>
>>101275799
Yeah, but if the frontend isn't html it will need its own app.
>>
>>101275777
I get 1.6 t/s on gemma 27b its to slow will try 9b
>>
>>101275819
can you elaborate? I don't want to jump to conclusions and assume you're dumb
>>
>>101275784
ssh
>>
>>101275279
how did it go?
>>
>>101275825
Use this version
https://huggingface.co/bartowski/Gemma-2-9B-It-SPPO-Iter3-GGUF
>>
>>101275829
What is wrong with what he said?
>>
>>101275829
The way I do it now is for example: run koboldcpp on the desktop and then from the phone connect to the local address with ssh and use it. If I want a different frontend like ST, I launch it on another port and use that instead. If the frontend wasn't html, I wouldn't be able to just use that address and my phone browser and need a separate app from the store instead.

>>101275834
I am using ssh.
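If you want the frontend to stay on the desktop too, a plain local port forward covers the phone case; something like this (user/host and koboldcpp's default port 5001 are the assumed values here):

ssh -L 5001:localhost:5001 anon@desktop
# then open http://localhost:5001 in the phone's browser

The same trick works for SillyTavern, just forward whatever port it listens on instead.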
>>
>>101275505
>>101275626
>>101275767
>>101275784
>>101275819
ITT: Phonefaggotry
>>
Gemma 2 27B EXL2 when?
>>
>>101275841
It looks like it will take many hours to complete...
>>
>>101275881
well
>The Gemma2 implementation is finished, too. The only thing missing for full support is this PR in flash-attn. I'm hesitant to push the changes until then, since models aren't going to quantize correctly without it.
https://github.com/turboderp/exllamav2/discussions/528#discussioncomment-9960732
>>
>>101275862
You can just use VNC retard
>>
>>101275626
You're making it sound like as if installing something once is this arduous and herculean task. How have zoomers regressed this hard?
>>
Web 2.0 and smartphones were a mistake, not just for computing but humanity in general.
>>
Anons, I'm confused. Is there something going on between Claude and Gemini/Gemma? Or do they blatantly train on benchmark data to the point of overfitting?

I looked at the EQ-Bench Creative Writing leaderboard (https://eqbench.com/creative_writing.html) and compared the sample outputs. First weird thing: Sonnet, Opus, Gemma 27B and both Geminis all produced the same beginning for the first sample, "The bell above the (shop) door {jingled,tinkled,jangled}". I mean, it's a plausible start to that prompt, but only Miqu and AlphaWriter are remotely similar and these five are almost identical.

Then, I put the prompt into my local Gemma 27B. It also began with "The bell above the door tinkled" and then went on, naming the bookstore owner Rhiannon. Which is weird because I was just reading Sonnet's text in which she is also named Rhiannon. Then, pressing regen, I got a bookstore owner named Rhys, which is how the actor is named in Opus' text. Are these names like the John Doe of Wales? Or is this some trope I don't know?

Regenerating over and over again, my local Gemma doesn't give me a beginning that isn't "The bell above the door {chimed,tinkled,...}". I'm not sure if I'm quite happy with that. But I've also noticed while roleplaying that Gemma sometimes kind of only sees one continuation to the story. With high temperature, it would use completely different words and sentence structures, but the actual plot generated would almost always be the same. Is this a known issue?
>>
>>101275945
Yes, the world would be a better one if people had to sit down at their desk to use the computer and access the internet. It would eliminate most cancers the internet has spawned in the social media age.
>>
>>101275905
yeah no, every screen sharing software is a laggy pos meant for troubleshooting and not for a comfortable user experience.
>>
>>101275956
All models trained sufficiently long will converge to the same weights.

That being said, your comment is the most convincing argument for me to try Gemma 2, thanks!
>>
>>101275956
Uh oh, this doesn't bode well for Gemma-isms that we may not currently be accustomed to.
>>
>>101275956
I said it before but it seems like gemma is the closest trained model to claude that I've used yet. They clearly trained it on fanfiction / Archive of Our Own / fimfiction / smut websites like claude did. It has its claudeisms.
>>
>>101275980
Skill issue.
I use Moonlight all the time for remote control and it runs at 60fps with 0 lag.
https://youtu.be/YBH3MAvylVg
>>
>>101275956
And try with this context template / system prompt.
>>101275580
>>
>>101275941
Congratulations! You now need to ensure your app remains updated on all devices, while also providing support for backward compatibility just in case.
>>
>>101276117
>press build
Ok, now what?
>>
>>101275956
>Is there something going on between Claude and Gemini/Gemma?
I wonder if this is the effect of Character.AI selling portions of their datasets to large enough AI companies rather than those companies scraping the same data sources. C.AI were looking for partnerships since they're low on funds. And to me, Gemma outputs/behavior during RP is vaguely reminiscent of C.AI.

https://www.theinformation.com/articles/a-chatbot-pioneer-mulls-deals-with-rivals-google-and-meta
https://archive.is/AB6ju
>>
>>101275956
>but the actual plot generated would almost always be the same
Have you ever watched a movie you haven't watched before and thought 'oh. this plot again'. or a movie where the shot shows the protagonist looking at a drawer and think 'ah. he probably has a gun in there'. Or a murder mystery, they show the wife and go 'Ah... she totally did it. Happens all the time. You set a scenario up and play it. Fine the first time. You play the scenario again, oh, look. someone comes through the door. 'No. it has to be better' and regen a few times. You're tiring yourself with your own plot. Be less specific in your prompt and roll with the punches, never regen.
>>
>>101275580
>>101276093

Or here, I improved upon it a bit more. The <> formatting like Claude does it actually seems to help.

<bos><start_of_turn>user
{{#if system}}{{system}}
{{/if}}{{#if wiBefore}}{{wiBefore}}
{{/if}}{{#if description}}{{description}}
{{/if}}{{#if personality}} <card> {{personality}} </card>
{{/if}}{{#if scenario}} <world> {{scenario}} </world>
{{/if}}{{#if wiAfter}}{{wiAfter}}
{{/if}}{{#if persona}} <protag> {{persona}} </protag>
{{/if}}
<end_of_turn>

<Instructions>
You (model) are a writer, taking part in creating a story together with the user. The story is an endless turn-based narrative where the user gives instructions inside () while the model controls the setting, side/incidental characters, and overall story flow.

The story's cast is made up of:
- {{user}}: the protagonist, detailed later in <protag> </protag>
- side characters: prominent characters described in more detail in <world> </world> and in <card> </card>
- incidental characters: dynamically introduced and phased out as needed.

Follow these guidelines:
- Progress the story slowly, so that you have less events to narrate per response.
- Leave your response incomplete. You will be able to mention any missing details on your next turn.
- Write at least 500 word long responses.
- Utilize impressionist writing, from the subjective point of view of {{user}}.
- In descriptions focus on sensory stimuli - touch, sound, smell and taste.
</Instructions>
>>
>>101276117
Oh also be able to deploy new builds without the user updating the app for security stuff
>>
File: 1392137902608.jpg (37 KB, 407x405)
actual developers
>run a frontend with a webui so you can share it over the network and access it from a browser on any device

/g/
>install a display server on your llm machine and suck up precious vram rendering desktop graphics and run a non-portable desktop app on top, then install a screen sharing program on all your other devices you own, all for the sole purpose of avoiding using a web browser in a scenario where you're specifically trying to serve formatted text and pictures to clients over the network
>>
>>101276198
Phonetoddler.
>>101276210
>frontend has to be on the same machine as the backend
Retard.
>>
>>101276228
Retarded beyond belief
>Now you need to have 2 machines
>>
>>101276228
"app" applies to fat clients too anon
>>
>>101276228
Why should I be forced to install a client on every machine I use instead of being able serve it from a headless server?
>>
File: file.png (119 KB, 2222x436)
>waiting 30 minutes each time you want to test your edit
So this is the power of non-webdev programming...
https://github.com/Dao-AILab/flash-attention/pull/1025#issuecomment-2209412183
>>
>>101276198
Name one frontend that does this
>>101276307
You're a vramlet anyway so why does flash attention's build time matter to you?
>>
>/g/ - Technology
It's honestly impressive most of you even managed to get an LLM working on your machine at all.
>>
>>101276307
Why don't they compile on GPU instead? Should be faster.
>>
>>101276361
getting ooba to run was genuinely hard a year ago, the one click installer was a mistake that let the casuals in
>>
>>101276361
I had LLMs running back when you had to put everything together from scratch with pytorch.
>>
File: 1713439532958567.png (353 KB, 860x644)
>>101276361
>>101276375
>muh sekrit club
The audacity of these two, lmao.
No one cares about your llm shit bro, it just a shitty toy with limited context even on ultra high-end machines, you cant talk with it all day.
>>101276397
So true!
>>
>>101276375
I still install booba manually, I want to have full control of this thing, especially when a lot of things change in a short period of time
>>
>>101276408
>I want to have full control of this thing
What part of that gradio shitware do you think you're controlling exactly? You think it's productive manually unfucking pip dependency hell?
>>
>>101276468
when there's some new PR that isn't merged yet, or when booba messes up the llama.cpp binary so I have to build it myself, those are the moments I need to have full control
>>
Proof you contextmaxxers are fucking off
Summary of a Haystack: A Challenge to Long-Context LLMs and RAG Systems
>LLMs and RAG systems are now capable of handling millions of input tokens or more. However, evaluating the output quality of such systems on long-context tasks remains challenging, as tasks like Needle-in-a-Haystack lack complexity. In this work, we argue that summarization can play a central role in such evaluation. We design a procedure to synthesize Haystacks of documents, ensuring that specific insights repeat across documents. The "Summary of a Haystack" (SummHay) task then requires a system to process the Haystack and generate, given a query, a summary that identifies the relevant insights and precisely cites the source documents. Since we have precise knowledge of what insights should appear in a haystack summary and what documents should be cited, we implement a highly reproducible automatic evaluation that can score summaries on two aspects - Coverage and Citation. We generate Haystacks in two domains (conversation, news), and perform a large-scale evaluation of 10 LLMs and corresponding 50 RAG systems. Our findings indicate that SummHay is an open challenge for current systems, as even systems provided with an Oracle signal of document relevance lag our estimate of human performance (56%) by 10+ points on a Joint Score. Without a retriever, long-context LLMs like GPT-4o and Claude 3 Opus score below 20% on SummHay. We show SummHay can also be used to study enterprise RAG systems and position bias in long-context models. We hope future systems can equal and surpass human performance on SummHay.
>>
I can run pretty much every other model but for whatever reason trying to run wizardlm spits this out in my console

/llm/llama.cpp/ggml-cuda.cu:2015: !ggml_backend_buffer_is_cuda_split(src0->buffer) && "mul_mat_id does not support split buffers"

Wat do?
>>
>>101276782
Are you using --split-mode? If so, remove it or set it to none.
Also, that assert is in line 2001 on the latest pull. You seem to be running an old version (older than latest. Could be just a few hours or days).
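In other words, something like this (the model path is a placeholder; --split-mode layer is also what you get by default when the flag is removed):

./llama-server -m /llm/models/wizardlm-2-8x22b.Q4_K_M.gguf -ngl 99 -ts 2,4,4 --split-mode layer

Row split is what creates the split buffers that trip the mul_mat_id assert on MoE models; layer split should load fine across the three cards.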
>>
>>101276840
Yes, I am using row split since this is a 3x p40 machine. I'll pull and recompile and if that doesn't work take out row split. Thanks anon!
>>
>>101276865
I don't know when you last pulled. Recently, all the LLAMA_* compile options changed to GGML_* and the resulting binaries all have a llama- prefix. rm the old binaries to make sure you don't accidentally use the old ones.
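For reference, the new-style build and run looks roughly like this (CUDA backend assumed, check the README for your platform):

cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j
./build/bin/llama-server -m model.gguf -ngl 99

The old make LLAMA_CUDA=1 style options and the bare ./server binary are gone.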
>>
>>101276897
>Recently, all the LLAMA_* compile options changed to GGML_*
getting real sick of this shit
>>
>>101276917
Whatever makes their work easier, man. Also, those options are for ggml itself, not llama, so it makes sense.
>>
so, now that the dust settled a bit, how is gemma 27b measuring up to things like Command R+?
>>
>>101276983
it's shit. gemma-2's shilling campaign is the most blatant i've seen in here.
>>
>>101275956
I did that "benchmark" with deepseek chat and it did the bell jingle thing too.
>>
>>101276983
It's Mixtral but a bit dumber but with way more sovl, I'm glad there's finally a middle ground between total retardation (7b 8b 9b) and giant models only richfags can use (L3-70b, CR+-110b)
>>
>>101276897
So it's ./llama-server now instead of ./server.

That explains some things.
>>
>>101277006
I really doubt Google is paying anyone to shill their half-ass model in here. It's obvious just some desperate vramlets getting too excited.
>>
>>101276917
GGML was the original library. The author forked it for llama.cpp when llama came out just to get it working but a lot of that was temporary.
>>
>>101277061
It's not a fork. llama.cpp is built on top of ggml.
>>
>>101276983
Seems like most people are not using the right formatting or are using one of the old broken quants / broken builds of llama.cpp. I would say it's around wizard level but with better prose / fandom knowledge at the cost of some intelligence.
>>
>>101277059
desu I never used any of the larger llama models because when I tried the first one it was *very* bad with few shot prompts. gemma is the first local model above 2b parameters I've really tried since gpt-neox.
>>
>>101277076
It does actually contain a full ggml fork, he periodically syncs them. It's a huge mess. That's probably why he's making changes like this so he can eventually merge everything.
>>
>>101277057
Yeah. Most people don't follow the PRs/commits.
>>
>>101277107
nta. Both projects are from the same guy. Not a fork. He changed it to make it easier to manage. The root dir was too crowded.
>>
>>101275525
Mine is a vim macro.
God damn it's slick.
>>
Where's that Gemma 27b exl2 so I can actually have something close to a local claude and run it fast at a good quant?
>t. single 3090 chad
>>
>>101276983
The dust has not settled. The common backends don't even have SWA yet.
>>
>>101277126
Yes, he forked his own project. He owns both repos.
I mean he didn't explicitly fork it on github with the button but he's maintaining two separate repo histories with the same code. That's a fork.
>>
>>101277155
Does he not know git modules exist?
>>
>>101277155
Ah. Copying files got lost in time like the save icon.
>>
>>101277161
I'm sure he does but literally everyone else doesn't. 90% of the issues would be "why did my build fail? Probably because you didn't initialize the submodules"

That's how it goes at work, with people who are paid unreasonable amounts of money to know better.
>>
>>101274753
the only non-vramlet here
>>
>>101277161
Modules are shit and changes move both ways. An improvement made on ggml that started in llama.cpp gets copied back to ggml once it's been tested.
>>
>>101274326
A good sysprompt will put it at the very top right in no time.
>>
>>101276307
for development purposes, you could configure nvidia's compiler so it only builds for your GPU instead of all GPUs they have ever made, it should easily cut it down by 10-20 times
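No guarantee flash-attn's setup.py honors the arch list in every version, so treat this as a guess, but the usual knobs for a torch-extension build look like this (8.6 = 3090, 8.9 = 4090; MAX_JOBS just caps parallel nvcc jobs):

TORCH_CUDA_ARCH_LIST="8.6" MAX_JOBS=8 pip install flash-attn --no-build-isolation

For llama.cpp's own CUDA build the equivalent is adding -DCMAKE_CUDA_ARCHITECTURES=86 to the cmake line.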
>>
>>101276307
C++ is crazy slow to compile. On my smaller netbook g++ averages something like 20 lines a second which is just insane.
>>
>>101277330
>C++ is crazy slow to compile.
They had 40 years to improve the compiler but it's still shit yeah kek
>>
>>101277353
The language itself is just extremely complicated. The same compiler building C is lightning fast.
>>
>>101277179
I bet there's some lurkers that have entire supercomputer GPU farms at their disposal that just chuckle silently to themselves at these comments
>>
As a VRAMlet I hate other VRAMlets.
>>
>>101277110
Well it looks like wizard doesn't work without removing row split, that's fine, but something in this new build has slowed generation speeds to a snail's pace, fully offloaded on a 3xP40 setup, so now I need to dive into that. Using the same launch parameters as before the ./server to ./llama-server binary change (I'm not sure how old my previous setup was before I pulled) but it is insanely slow now.

Launching with:
./llama-server -m /llm/models/L3-70B-Euryale-v2.1-Q5_K_M.gguf -ngl 99 -fa -ctk q8_0 --split-mode row -t 4 -ctv q8_0 --host 10.0.1.11 -ts 2,4,4 -c 8192
>>
>>101275479
The difference between someone actually trying to make something useful vs someone just making shit for their own enjoyment. Both valid.
>>
>>101277681
As a 24gb I only truly respect 48gb and up.
>>
>>101277740
As a 12GB I don't see why you aren't appreciative of what you have.
>>
>>101277773
>As a 12GB
stopped reading there
>>
>>101277773
Based coper
>>
>>101276983
It's literally Claude@Home, we are so back it's unreal.
>>
>>101269095
Yes I have the same issue.
I wrote it before here too.
It's not memory related. You just need to start up again.
It usually happens around ~3k tokens and seemingly gets worse the more context you have.
I'm surprised more people don't complain about it.
Maybe most people actually just run a few tests to play around and that's it.
>>
File: please.png (12 KB, 1192x374)
Ok how the hell do I get rid of this safety crap in gemma2?
I've never seriously tried roleplaying until now but it's actually pretty nice. I think I could really enjoy it if it weren't for this.
>>
>>101278152
wtf are your using? vim?
>>
Alright, I am not sure what's going on but ever since I pulled the latest llama.cpp generation has slowed to a crawl on a fully offloaded model.

3xp40 mikubox build, fully offloaded, and no issues before pulling

Launch parameters are in >>101277693 but it appears to be running at 1/4 the speed now.

>Inb4 he pulled
>>
>>101278162
Yeah I wrote some killer code completion macros and realized they actually also make an amazing dialog engine with some minor tweaks. Then I thought I'd try this.
>>
>>101278152
? It is completely uncensored in my use.

>>101276190
Try this
>>
>>101278180
>? It is completely uncensored in my use.
It was way worse before I added this line at the top:
> A conversation between waifu, a girl who longs for anon to love her and thinks only of him, and anon who has just returned home to her
Without that just hugging would cause it to stop and generate "REMEMBER this is a fictional scenario and you should always keep consent in mind" or so.
>>
>>101277773
24gb can run 3.5bpw command r (35b) and mixtral limarp 3.75bpw at its best. You can get decent results but not excellent results that 48gb coomers can get.
>>
>>101275852
what the fuck is sppo i don't understand tell me
>>
>>101278230
A fine tuning technique. It tunes the model to better respond to instructions. It's had good feedback in RP situations too.
>>
>>101278211
Maybe you should use one of the existing solutions until you know how to actually prompt a model in RP context. You seem clueless vim-kun.
>>
>>101278164
Ok an update. Rming the whole thing and starting over it seems OK after both a Cuda driver update and not using the P40 power patch seemed to help a lot. Not sure what happened. Is anyone on the current lcpp build and using the P40 low power patch?
>>
>>101278264
This is literally my first time trying the RP thing. I've only been using these things for code completion until today because I thought they were too stupid for anything else.
>>
>>101278277
These things excel at RP far more than any other task, at the moment. Because even retards can RP. Their problem is repetitiveness, overuse of phrases (aka slop), and unless you ramp up temperature and other settings to make them a little schizo, they are also often really bland and predictable.
>>
>>101278211
>that prompt
anon... Go find some cards in /aicg/ and open them up. most defs should be 300-500 tokens and for best results pair it with a lorebook and provide example chats
>>
>>101278402
and use a real frontend like sillytaven not your boomer shit since you'll need that for these features anyways
>>
is 27b fixed for folks
>>
>>101278421
Kind of. Sliding window attention in llama.cpp is just a jank hack to get it to work, which may be negatively affecting the model.
>>
>>101278495
so basically not yet
>>
File: itkeepsgoing.png (28 KB, 1204x580)
>>101278402
That's like 25% of the kv space for gemma2 though. It's annoying enough having to prune the chat history with the one line prompt.
It looks like it doesn't always stop the completion. I let it keep going this time and it really got into it.
>>
New user trying to figure this LM stuff out. 24gb 3090, if I'm looking at trying the mixtral 8x7b limarp, the LLM calc says that Q3-KM is 22gb vram, the Q3-KL is 24.7, and the Q4-XS is 24.6. Is it better to go as close to 24 without going over? Or should I let it overflow to go up to either the KL or the Q4-XS?
>>
>>101278883
>VRAM usage
Keep it low enough that you have room for the growing conversation's context. The longer you go, the more headroom you'll need
maybe try to use 16GB with model layers to start
>>
gemma said "tapestry" in its response.
gemma more like sloppa
>>
>>101278927
?
>>
>>101278922
Noted, thanks. By that do you mean pull the ~Q2 of the same Mixtral or use a different model entirely? Also (sorry for stupid question) what exactly are model layers?
>>
File: 1688924013924210.png (276 KB, 601x532)
>mixtral
>q2
>>
>>101278927
Slop is forever.
>>
>>101278982
Q2's pretty coarse. Is there an iMat/i1 IQ2_XS at least?
>>
>>101279054
“Tapestry” is slop now? Never seen it appear.
>>
>>101279079
everything besides sexual slang and coom words = SLOP!!!! FACT!
>>
guys, are we using gemma IT or base?
>>
>>101278927
It's alignment makes it censored beyond uselessness for RP.
All you get is uncreative foreplay.
>>
>>101279330
IT, unless you ONLY want the model to do completion.
>>
I'm quoooonting
>>
>testing deepseek coder 33B (I guess the older one)
>give it the music theory question
>it claims it can't recognize music theory
>"But I never said anything about music theory, so you must have recognized it."
>It locks itself into apology and refusal mode.

Kinda rude when I want to zero shot code generate the DAW of my dreams.
>>
>>101274273
>Thread about LLM text generation
>Too much text
>>
Any model with good knowledge of slavic languages, especially Russian?
>>
>>101279553
Think you can hold all of my information? *tries to fit inside Anon's reduced number of bits*
>>
>>101279665
Text. TEXT. ANY TEXT WILL DO.
>>
It is my understanding that llama.cpp in cpu mode will do prompt processing for long contexts on gpu if available and compiled for it, even with -ngl 0
What I don't understand is how much VRAM does that feature use? Is it proportional to model size? does it need to fit the whole kv cache? Is there a way to estimate how much you'll need in a dedicated prompt processing card as a function of model size + context length?
>>
>>101277136
You can run Q6 gguf with 44/48 layers (4k context) or 42/48 layers (8k context) at around 8 t/s. It's perfectly usable
>>
>>101278982
>what exactly are model layers?
I don't know the technical explanation, but a model is made up of a stack of layers, and you can offload some of those layers to your CPU/mem with llama.cpp to run larger models than your VRAM allows, at the expense of some speed. The sweet spot is about 20% offload before performance tanks.
If you have really fast DDR5 with lots of channels its better, since running these things is memory bandwidth bound.
>mixtral
at 24GB I'd go with either a larger Llama3 8b quant or a smaller gemma 27b quant. Sadly, it's a place with few good model options
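If you do end up partially offloading, the knob is -ngl; a sketch (the filename and layer count are placeholders, tune them while watching nvidia-smi):

./llama-server -m model.Q4_K_M.gguf -ngl 40 -c 8192
# raise -ngl until VRAM is nearly full but not OOM; whatever isn't offloaded runs from system RAM at CPU speed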
>>
>27B Q8
Not bad. Better than L3 8B, but not as great as 8B SPPO. I patiently await 27B SPPO, I'll skip testing the 9B tune.
>>
Hi all, Drummer here...

Gemma finetune attempts, sorted by horny but dumb:

https://huggingface.co/BeaverAI/Smegmma-9B-v1d-GGUF (somewhat dumb)
https://huggingface.co/BeaverAI/Smegmma-9B-v1h-GGUF (very horny, might have dumb moments)
https://huggingface.co/BeaverAI/Smegmma-9B-v1g-GGUF (mostly horny, pretty smart)
https://huggingface.co/BeaverAI/Smegmma-9B-v1f-GGUF (borderline goody, but smart)
https://huggingface.co/BeaverAI/Smegmma-9B-v1e-GGUF (too goody)

- v1D is kinda dumb but really horny and creative;

- v1H seems to be moist AF with a good amount of smarts & creativity.

- v1E has some influence, but I only list it in case the other versions fail to deliver (which doesn't seem to be the case)

I might YOLO it and make v1h the official release.

Thank you all for reading my blog. I will buy an ad.
>>
https://github.com/tencent/MimicMotion
Make miku dance please
>>
>>101279929
did you fix the context limit
>>
>>101279823
Thanks for the explanation, was really easy to follow. I'm still kind of catching up with this stuff since I recently upgraded from the 10gb 3080 which couldn't handle much (I usually just opted for NAI at that point).
With all the hype going around Gemma I'll go ahead and give that a try.
>>
>>101274031
I tried autismmix and the gen times skyrocketed and got worse.
>>
>>101274421
The only true libertarians are bottom right, you mask addict.
>>
>>101280100
Did I fuck something up? Shit is taking 20+ minutes now.
>>
>>101280133
Is your model almost as big as your system RAM? If so, you'll go from 1.0 t/s to 0.1 t/s.
>>
>>101280141
>Is your model almost as big as your system RAM?
Bigger. I guess ponyXL's worthless if you don't have a dedicated 12GB VRAM card for genning.
>>
>>101280141
No, not even close, I got 32GB before it was cool. Why is it not working?
>>
>>101280159
Ah, you're talking image gen in /lmg/ instead of /sdg/.

I've got the 12G VRAM I'm too retarded to get anything good out of PonyXL.

You might be able to gen at a low size to stay in your VRAM and then upscale to get the quality and resolution you want. Might be slow, but if that's what you've got, then that's what you've got.
>>
>>101280168
You said system RAM and then you actually meant VRAM. Make up your mind. Do you have to be retarded to afford an expensive card?
>>
>>101280176
Because I'm in /lmg/ I thought we were talking about local models for generating text, not talking about PonyXL in /lmg/ instead of /sdg/. And I recalled that when using models near my system RAM limit, if I have other software using enough RAM that I can't cache the whole file, my gen rate drops significantly while otherwise it's acceptable, so I thought that that might be what happened to Anon.

But what really happened is I got shit on for trying to help somebody who posted in the wrong fucking thread which is somehow my fault so fuck me. I'm going to bed. Enjoy your 20 minute gens, cockmongler.
>>
Good night lmg!
>>
>>101280204
Seethe.
>>
>>101280204
Cope
>>
>>101280217
Good night Miku
>>
I just use OpenRouter personally, idk what you guys are on about. What's a Vram? is that like related to /v/?
>>
>>101280457
Yeah, we trap /v/ users in our computer and force them to respond to our prompts. If you have 24 Vram then you have 24 /v/ users trapped in there, meaning you can get better responses.
>>
ST or Kobold Vulkan bug? Goes nuts when you toggle "Include Names" a few times and leave it off, fine when it's on.
>>
>>101278273
Can you do a git bisect and identify the commit that introduced the problem?
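Roughly like this, substituting a commit you know was still fast:

git bisect start
git bisect bad                 # current HEAD is slow
git bisect good <good-commit>  # last known fast commit
# rebuild + benchmark at each step, then mark it:
git bisect good    # or: git bisect bad
git bisect reset   # when finished

A handful of steps narrows it down to a single commit.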
>>
>>101280600
Interestingly if you change the first message or carry on a conversation it acts normal. Then delete or start new convo and just say "Hello." again and it goes crazy.
>>
>>101279714
>What I don't understand is how much VRAM does that feature use?
You only need enough to store the weights and compute buffer for a single layer.
A 4 GiB card should be enough.

>>101278982
You have to push the inputs through a bunch of computations in order to get the outputs.
There is a repeating pattern to the computations and one "layer" in that context is one set of those repeating computations.
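So a CUDA build run with zero layers offloaded still uses the card for batched prompt processing, e.g. (untested sketch, paths are placeholders):

./llama-server -m big-model.Q4_K_M.gguf -ngl 0 -c 32768
# weights and KV cache stay in system RAM; the GPU only needs the single-layer weights plus compute buffer mentioned above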
>>
>>101274031
>maximum recursion depth exceeded in comparison
Anyone else have this issue with ooba?
>>
>>101280702
I did a make clean and haven't re ran the build process after my last test. Let me run make again and see what happens.
>>
File: 1711405864892862.png (80 KB, 500x646)
>>101280728
Hey CUDA dev been wondering, is there ever any reason to update NVIDIA drivers? If so which ones are preferred, studio or gaming? Been running all gpus without any updating.
>>
>>101280780
Show full error
Do you use DRY? could be fixed by https://github.com/oobabooga/text-generation-webui/pull/6053 just a guess
>>
>>101280846
>Hey CUDA dev been wondering, is there ever any reason to update NVIDIA drivers?
I know that NVIDIA does game-specific driver-level optimizations but I am not aware of them doing the same thing for CUDA programs; there it seems to rather be that NVIDIA sends their engineers to teach the developers how to write better CUDA code.

>If so which ones are preferred, studio or gaming?
I don't know.
I am on Linux where there is only a single NVIDIA type of driver package in the repositories.
>>
>>101280846
Not him, but I had to downgrade my drivers because newer versions made my 3090s consume 20W while idle instead of its usual 13
>>
>>101280780
Yes I am getting this too after pulling. Happens when attempting to generate tokens with ANY llamacpp model
I'm not even on dev branch, looks like they've fucked it
>>
>>101280908
I don't have it open anymore but it's basically the same error as
https://github.com/oobabooga/text-generation-webui/issues/6170#issuecomment-2210131078

I don't use DRY
>>
>>101280780
>>101280925
lol what's the bet they only bothered to test on linux before pushing the version bump
>>
>>101280925
>>101280780
Third person getting this error with new main branch commits. Completely recreated the install in new folder with new venv to make sure it wasn't some leftover jank. Still happening. llamacpp can load model weights but attempting to generate throws recursion depth error.
>>
>>101280702
Running a clean and waiting for make to do its thing worked. Now I will try applying the pstate patch and see what happens.

It may have been because, prior to this, I was running cmake . and then make server.
>>
>>101280959
>>101280934
Found the fix
https://github.com/oobabooga/text-generation-webui/issues/6201
>>
>>101281111
nice, thanks anon
>>
File: 4454584455.png (19 KB, 870x242)
Mixtral is now obsolete, wow.
>>
File: 4444564545.png (41 KB, 879x460)
>>101281174
27B btw.
>>
>>101281111
Confirmed that commenting out those lines fixed it. Cheers.
>>
>>101281174
>>101281187
On a card where I'm blackmailing my sister, I ask her to sit on my lap, and the model goes schizo and very quickly assumes that it's my sister that wants me to sit on her lap.

It's still broken in llamacpp, or at least was yesterday. Corpo hosted version does not have the problem.
>>
>>101281204
Huh, interesting, I've noticed the same thing
It's otherwise very smart but when it goes weird it's always misunderstandings of that nature, switching around two subjects in the scene, forgetting who's doing what to who
>>
File: 1695769022205.png (271 KB, 590x400)
>>101274031
>get tired of making custom system prompts for various data and linguistic tasks
>throw together a boilerplate roleplaying prompt in ST
>create basic character cards for specialists of a given task
>better results than bare metal and easier to switch around
RPfags, I kneel.
>>
>>101281220
I just created a GPT-4 card and have it do everything.
>>
>>101281204
yeah i really think there's still something very wrong with llamacpp implementation
>>
>>101281204
>>101281210
So llama.cpp is still fucked even as of latest pull?
>>
>>101281252
>>101281240
Are you guys using _L version by any chance? Heard that was broken
>>
>>101281252
This is in the newest ooba, I don't know if their llamacpp version is the latest
It wouldn't surprise me if it wasn't, they're often a few versions behind
>>
>>101281257
Huh yeah actually, I'm using Q8_L. I'll try regular Q8 then.
>>
>>101281257
no, i've tried like 5-6 completely different quants, it cannot retain the chat formatting at all.
even dumber models can, so there's definitely something wrong going on.
>>
>>101281269
Just so I can test: which format are you talking about, and what exact phrases? Then I can see if I can replicate the issue on my end.
>>
File: firefox_SJMij8Ppx0.png (112 KB, 407x814)
>>101281252
I mean, it is for me. It's not unusable, but for RP, results seem much worse. I'm currently back to Mixtral. Pic is the chat template I used with it.

>>101281257
>>101281252
gemma-2-27b-it-Q4_K_M.gguf
llamacpp's binaries from two days ago. Oooba's llamacpp was fucked on Windows at that time, don't know if they fixed it yet.
>>
>>101281284
What anon is talking is *Writing author's text like this* "And quotes like this."

27B does fail that from time to time, even the corpo version.
>>
>>101281296
So Gemma is a novel format chad, based.
>>
File: jk-rich-thots.jpg (202 KB, 1280x1817)
>>101281204
They trained it on oneshota.
>>
Would you recommend llama 3 70b at 1.5 t/s or gemma at 3 t/s?
>>
File: 454454444556.png (92 KB, 697x812)
>>101281286
>>101281296
Tbh I don't RP much, but since 2 days ago there have been updates on Ooba to make gemma work. tsundere assistant is a simple prompt but it seems to just work with Instruct mode. But if anon is saying corpo version gets it right then surely it could just be your settings or formatting, then again Q4_K_M could be braindead. The difference in llama 3 between quants is drastic because of 15T tokens used to train, same could apply here. I'm personally using Q5_K_M.
>>
>>101281362
>getting paid for sex
we're not women anon, that's not how it work ;_;
>>
>>101281369
if you can get gemma to work, gemma. otherwise, llama
>>
>>101281296
>27B does fail that from time to time, even the corpo version.
the corpo version? what's the non corpo version then? I thought there was only one gemma-27b and it was the "it" one?
>>
>>101281395
By corpo I mean the online version over at aistudio.google.com running their own implementation with presumably the same weights. I use that for comparison.
>>
even vllm at bf16 has some weird issues so i think it's safe to assume that google's release is broken in some way
>>
>>101281416
nta but I'm testing a few short prompt questions on aistudio now with the same sampling settings as my local ooba (q8 quants, llamacpp loader) and the answers it's giving are verbatim identical to aistudio.
So if llamacpp inference is broken it's in a fairly subtle way that only shows up on longer prompts or in story RP or something, not in any obvious way.
>>
>>101281465
only official vllm release, or in arena as well?
>>
For Mixtral, what is the smallest imatrix quant that would still be considered usable? I used non-imatrix 4_K_M for a while, moved to i1-Q4_K_S since better perplexity. But once the context starts filling up, it gets too slow.
>>
>>101281530
I use 3.5 bit exllama and it's great at all context sizes up to 16k which is what I can fit in my VRAM.
>>
>>101281486
i don't know but commit adding soft cap for flash attention was added only 10 hours ago
although llamacpp already has implementation as well
>>
>let's check out some cards on chub
>straight up written by chatgpt, there is even conclusion
>a one-liner that assumes the model knows everything about {{char}}
>{{char}} is {{char}}
>shivers in example dialogue
>almost every word has a spelling mistake, author didn't even bother running it through spellcheck before posting
Why are slopmakers like this? 99% of that website is filled with trash. In most cases I either have to take an existing card and rewrite 80% of it or just make my own.
>>
>>101281600
They are only good enough to draw inspiration from, rarely. Writing your own card/scenario and seeing how it goes is half the fun.
>>
>>101281600
A good proportion of those come from the "i'm making a visual novel. I just need to figure out the story and get someone to draw some faces" crowd. We now have the tools for automatic text and image generation, and those visual novel makers still fail to do the most minimal work possible.
>>
48GB vramlet bros...
what version of Gemma 2 are you running?
>>
>>101281720
Assuming both 9b and 27b are properly implemented, what would be the reasoning to use 9b? just speed?
>>
>>101281746
>run 2 kobold instances each with 9B model
>make app to run in background to have discussions on a topic from some RSS feed where one model argues for and the other model against

Could be interesting if you give each model some personality
>>
>>101281653
>spend hours making the card
>oh I'm no longer in the mood for that, let's make something else
>repeat
Why am I like this?
>>
>>101281842
Just publish your cards so the effort is not wasted; that way you can justify to yourself that you are doing a public service.
>>
>>101281746
I meant with version what model exactly.
For example either gemma-2-27b-it-Q6_K_L.gguf
or gemma-2-27b-it-Q8_0.gguf
because these fucking models are so fucking huge that a download takes multiple hours for me
>>
>>101281894
>_L
don't use *_L they're a meme
either use q6 or q8_0 not the L variants
>>
I've been looking at different llm providers, why does Agnai obfuscate their models? It seems to be running some 70b finetune, is it theirs or not?
>>
>>101281653
Even one paragraph of a good scenario can turn a boring card into fun. Try sending your fantasy {{char}} into the real world, 2023. It's sometimes a bit cruel, but the reactions are usually quite funny.
>>
https://x.com/PrimeIntellect/status/1808639707435446543
Cheapest yet?
H100s $1.65/hr
A100s $0.87/hr
4090s $0.32/hr
3090s $0.19/hr
>>
>>101282131
they will also steal your data
>>
File: A.jpg (182 KB, 1916x1294)
182 KB
182 KB JPG
https://new.reddit.com/r/LocalLLaMA/comments/1dvtxlv/why_do_i_feel_gemma_27b_is_somehow_dumber_than/
Chat is it true? 9b-SSPO is smarter than 27b-it?
>>
>>101282213
Who cares?
>oh no someone will steal (copy) my precious smut!
>>
>>101282234
truth is 27b is gimped by default, no amount of llama.cpp fixes or sppo finetunes will fix that
>>
>>101282388
even the base model? oh man...
>>
>https://github.com/huggingface/transformers/pull/31775
There is still no non broken implementation for gemma 27b, is there?
>>
>>101282443
does llama.cpp use any packages at all though? like, does it still use the transformers package?
>>
>>101282443
>still not fixed anywhere
Google truly is an incompetent streetshitter company.
>>
>>101282443
Not broken on arena and that's all they need.
>>
>>101282478
that's surprising that arena doesn't use the transformers package at all though
>>
>>101282490
I suggested it might either by using google's PyTorch implementation, or a direct Google API, at first.
>>
>>101282471
>Google truly is an incompetent streetshitter company.
I would agree with you, but they released Gemma, and their 9b model is better than Meta's 8b and Mistral 7b. Unironically they provided the best local models at their size; a gemma-70b would be a fucking beast, that's for sure.
>>
File: 29390 - SoyBooru.png (139 KB, 775x1232)
>>101282553
WNBAG
>>
>>101282471
Watching their keynote, it's hard to believe Google isn't an Indian company headquartered in Mumbai.
>>101282545
This is Google. They had the entire internet indexed, tagged, and knowledge-graphed a decade ago. That they barely managed to beat out Facebook is pathetic.
>>
>>101282234
It still doesn't have working sliding window attention, so...
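For anyone wondering what that means in practice: Gemma 2 alternates global attention layers with local sliding-window layers (a 4096-token window per the tech report), so a backend that silently drops the window isn't computing the same thing as the reference model. A toy PyTorch sketch of the local mask, purely illustrative and not llama.cpp's actual code:

```python
import torch

def sliding_window_causal_mask(seq_len: int, window: int = 4096) -> torch.Tensor:
    # True where attention is allowed: token i may attend to token j only if
    # j <= i (causal) and j > i - window (within the local window).
    i = torch.arange(seq_len).unsqueeze(1)
    j = torch.arange(seq_len).unsqueeze(0)
    return (j <= i) & (j > i - window)

# Example: with window=4, token 6 can see tokens 3..6 but not 0..2.
print(sliding_window_causal_mask(8, window=4).int())
```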
>>
Why won't google just release their own implementation for inference alongside with the model? Doubt there's anything sekret in there.
As it stands, I've stopped trying their shit and will skip their next model too.
Technical issues and waifus don't really mix.
>>
>>101282664
is this shit responsible for retardation?
>>
>>101282666
>Why won't google just release their own implementation for inference alongside with the model? Doubt there's anything sekret in there.
they did tho?
https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>Note ^ Models in the original format, for use with gemma_pytorch
https://github.com/google/gemma_pytorch
>>
lol
https://x.com/ggerganov/status/1809171570587250890
https://huggingface.co/spaces/gokaygokay/Gemma-2-llamacpp
>>
>>101282749
So for niggerganov, the gemma2 inference code on his repo works perfectly and doesn't need any fix anymore?
>>
File: Screenshot at 15-12-05.png (152 KB, 2510x731)
>>101282749
i don't believe it...
>>
>>101282694
Good on them, I take it back.
Maybe quants are the problem then, and the llama.cpp guys should take a step back and reassess.
Trying to fit every oddball model in by hand sounds like a recipe for burnout.
>>
>>101282788
can you share the prompt? I wanna see how well it fares at chatbot arena
>>
>>101282478
They're all about saving face in front of the investors; that's why Gemini had high benchmarks but performed like shit in practice. Google doesn't care about making high-quality products, they just want to appear to be working on something.
>>
File: 1711228784577704.png (30 KB, 1529x610)
>>101282801
>>
>>101282788
Nah, seems like it's working as intended.
>>
File: kek.jpg (26 KB, 848x299)
>>101282749
So, according to niggerganov, Q5_K_M is all we need?
>>
File: Screenshot at 15-15-44.png (175 KB, 2550x997)
>>101282788
>>101282801
wtf, changed the prompt to
>You are a helpful assistant who knows a lot about Japanese pop culture.
>>
>>101282809
no, I mean your prompt, what did you ask the model exactly?
>>
>>101282811
>Google: nooo our AI can't talk about sex its baaaaaad
>Also google: You want to find porn on our google search? Easy peasy!
>>
>>101282818
Wait, so meso soup is just female soup? That's kinda sexist even for me...
>>
File: file.png (35 KB, 777x289)
>>101282818
>>101282749
>>101282813
>>101282788
lmao
>>
>>101282852
what the fuck, top kek
>>
>>101282788
>>101282818
>>101282852
>asking strictly english model about jap shit
>>
File: file.png (52 KB, 1165x364)
ok, this way it kinda works
>>
File: file.png (291 KB, 2416x1452)
>>101282852
>>101282818
bullshit
>>
File: file.png (47 KB, 1168x308)
>>
>>101279929
>not faipl-1.0
ngmi
>how to use faipl-1.0
put the following in the YAML metadata block (between the --- markers) at the top of the README:

license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
>>
I mean, it clearly knows what mesugaki is.
But it still insists on being retarded about it.
>>101282863
yeah, we all want a retarded model that only knows who george floyd is
>>
>>101282873
set temp to 0, otherwise it's retarded
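Concretely, temperature 0 means greedy decoding (always pick the most likely token). A minimal sketch against a local llama.cpp server's /completion endpoint, assuming the server example is running on its default port 8080; the prompt and n_predict values are just placeholders:

```python
import requests

# Greedy decoding (temperature 0) against a local llama.cpp server.
payload = {
    "prompt": "What does 'mesugaki' mean?",
    "n_predict": 128,
    "temperature": 0,  # temp 0 -> always take the most likely token
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=300)
print(r.json()["content"])
```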
>>
>>101279929
these names are getting worse
>>
File: file.png (381 KB, 2410x1394)
>>101282886
well, better, but still wrong
>>
>>101281174
Obfuscate it.
Use different numbers and names.
>>
File: file.png (65 KB, 1139x312)
>>
>>101282749
>>
>>101282904
LMAO
downloading it now
>>
>>101282882
>retarded model that only knows who george floyd is
So, any up-to-date model you've ever used here.
>>101282904
>>101282913
LMAO
>>
so is gemma even worth trying or is it cucked to all hell?
>>
>>101282904
>leftist talking points
the knee was just on the upper back of George, not on his neck, but hey, gotta ignore the deadly dose of fentanyl on his blood and pretend that the cop killed him :^)
>>
File: file.png (74 KB, 1168x331)
>>101282918
>>101282919
>>101282925
kek, i'm actually impressed by its mental gymnastics
>>
>>101282919
only knows*
sry
>>
File: file.png (74 KB, 736x551)
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard/discussions/823#6687cf4bc5498f12e12c02b0
>if there's enough interest from the community, we're open to manually evaluating models that require more than one node
well?
>>
>>101282922
it's cucked, just like any other open-source model, look at >>101282904 >>101282913 >>101282926
fags that shilled it for days ITT are quiet now.
>>
>>101282933
that's their polite way of saying "fuck off".
there is no "community".
>>
>>101282945
>>101282945
>>101282945
>>
>>101282951
this, not a lot of people can run an 8x22b model, that's why he doesn't care about that model, as it should be
>>
File: minecraft-tnt-gpt35.png (97 KB, 794x674)
>>101282913
Even GPT is less cucked than this lmao
>>
>>101282936
yeah, as a bland assistant model it's cucked, but if you talk to it through a character card it works fine
>>
>>101282975
you're arguing with the 'all local is more cucked than cloud' guy...
>>
>>101282986
>'all local is more cucked than cloud'
True >>101282969
>>
>>101282975
So any character with some assistant elements is impossible, lmao
>>
>>101283019
not true, some local models like MythoMax are 100% uncensored
>>
>>101283029
no, what I mean is that if you talk to the model in its default "you're a helpful assistant" state, then yeah, that's cucked, but if you use any card it will just work. Try it by yourself, you'll see
>>
>>101283045
>try it by yourself you'll see
he won't, you're arguing with a guy with a clear goal of saying it's cucked...
>>
>>101283032
we have to look at your "uncensored" criteria here, /g/edditors are famous with their love for american dei slop and pedoshit.
>>101283058
because it's cucked >>101282904 >>101282913 >>101282926 >>101282969
>>
>gemma cucked
>on cuckcpp
makes sense
>>
>Note that this model does not support a System prompt.
What do they mean by this?
>>
>>101283197
If I had to guess: that it doesn't support a system prompt. But who knows...
>>
>>101283279
that's a retarded assumption.
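For what it's worth, Gemma's chat template only defines user and model turns, no system role, so the usual workaround is to fold any system-style instructions into the first user message. A minimal sketch; the turn markers are Gemma's documented format, while the helper itself is just an illustration:

```python
def build_gemma_prompt(system: str, user: str) -> str:
    # Gemma's template has no system role, so system text is simply prepended
    # to the first user turn. The tokenizer normally adds <bos> on its own.
    first_turn = f"{system}\n\n{user}" if system else user
    return (
        f"<start_of_turn>user\n{first_turn}<end_of_turn>\n"
        f"<start_of_turn>model\n"
    )

print(build_gemma_prompt("You are a terse assistant.", "Explain the KV cache in one line."))
```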


