/g/ - Technology




File: GothicHorrorMiku.png (1.42 MB, 768x1344)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101165886 & >>101155940

►News
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
>(06/18) Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: Gothic_concept_art.png (255 KB, 1536x1152)
►Recent Highlights from the Previous Thread: >>101165886

--Paper: Selective Prompting Tuning for Personalized Conversations with LLMs: >>101169643
--Creating Sentience in AI Models using Text Profiles: >>101168459 >>101168511 >>101168802 >>101168996 >>101169069 >>101170197
--Troubleshooting XTTS Installation Issues: >>101169533 >>101169609 >>101169679
--The Transformer Algorithm's Pattern-Matching Ability: >>101167044 >>101167117 >>101167190 >>101167354
--Smarter Models Produce Dryer Smut, or Do They?: >>101169770 >>101169817
--Seeking a Model for Long Context: Options and Limitations: >>101166812 >>101166824 >>101166888 >>101166890 >>101167235 >>101168718
--Resolving LLaMA-3 Refusals in sillytavern with LLaMA3 70B: >>101166305 >>101166542 >>101166620 >>101167364 >>101167387 >>101168579
--Pyramid Attention Broadcast (PAB) for Real-Time DiT-Based Video Generation: >>101168884 >>101169033
--Optimizing AI Model Quantization on Heterogeneous GPU Setup: >>101169277
--Nemotron Scores Added to VNTL Leaderboard: >>101168329 >>101168528 >>101168721 >>101171078
--Llama3's SPPO Outperforms GPT4 on AlpacaEval 2.0: >>101168696 >>101168776 >>101170546 >>101169314 >>101169495 >>101169615 >>101169634 >>101169682 >>101169707 >>101169380 >>101169425 >>101169440 >>101169524
--LLMs in Creative Writing: Decision-Making and Context: >>101171474
--Introducing AuraSR - An Open Reproduction of the GigaGAN Upscaler: >>101170401
--Gemma v2 27b Tested in LMSys Chatbot Arena: >>101167408 >>101167638 >>101167750 >>101170949
--CR+ Performance Issues in Q4KM for RP: >>101167597 >>101167627 >>101167616 >>101167673
--Alternative Server Systems for Nvidia V100 SXM2 in Europe: >>101170900 >>101170928
--Quantization Levels and Their Existence: >>101169290 >>101169327 >>101169363 >>101169417 >>101169474
--Logs: Nala Test for TenyxChat 70B SLERPd with Daybreak Storywriter: >>101170295
--Miku (free space): >>101169902 >>101170474

►Recent Highlight Posts from the Previous Thread: >>101165891
>>
>>101173187
Thank you Recap Miku
>>
Today's the day!!!
>>
>>101173187
what about
gothic miku
minus the dress
>>
>>101173339
we did it rėddit! lobôtomized globõhomo göyslöp model releases today!
>>
File: 1693965222888650.jpg (746 KB, 1856x2464)
>>101173181
>>
>>101173378
That cheated on all benchmarks and is more retarded than 8B LLaMa.
>>
Could one take a dense, "full precision" model, quantize it to 1.58bpw, bitnet style, and pretrain the fuck out of it in that state to approximate a bitnet model trained from scratch?
Does it even work like that?

>>101173177
Post your sampler configs on Silly.
>>
>>101173488
Hard to say without someone with money taking the risk and trying it out.
>>
>>101173378
now say the cake is a lie and two more weeks
>>
>>101173488
It does, but the GPU cartel will never allow Bitnet to go beyond fictional concept.
>>
>>101173537
you already said it retard
>>
Gemma 2 will save us
>>
>>101173538
That's a good point. Maybe Nvidia would refuse to sell H100s to companies like Meta unless they agree not to train BitNet models.
>>
>>101173538
this
>>
>>101171078
It's "DeepSeek V2 Chat" from OpenRouter:
https://openrouter.ai/models/deepseek/deepseek-chat
>>
>>101173488
I think there were people experimenting on a smaller scale, like with 3B models. Hard to tell if it would scale to something like 70B. Renting a workstation for this would be quite expensive, and if it didn't turn out well it would be a big waste of time and money, so no one really tries.
>>
>>101173596
Gemma 2 is actually bitnet
>>
Optimal vramlet model? Been enjoying L3-8B-Stheno-v3.2.Q8_0 but maybe something else has come out
>>
>>101173627
I thought it might be useful for smartphones, but I guess a big local bitnet would be too slow
>>
>>101173378
I wasn't talking about gemma.
>>
>>101173684
then tell me more, I'll believe you 100%
>>
>>101173707
>mistral guys are going to drop a REALLY good open source model very soon
>t. work for them
>>
>>101173671
buy a fucking ad
>>
>>101173783
Kill yourself
>>
>>101173783
Keep alive yourself
>>
>>101173684
gemma-2 is the only thing happening today.
>>101173707
go back to rěddit
>>
>>101173795
>>101173813
you too go back to rėddit
>>
Alright, I'm going to bite the bullet and see if I can get something acceptable here. My goal is to run WizardLM2 8x22b. I'm currently running it at 2.5bpw EXL2 on dual 3090s via Ooba but I'm tired of its scuff. I've refused to entertain llama.cpp because its implementation in ooba has been slow as fucking balls even when 100% offloaded, and I never bothered trying kobold.
So, does kobold offer an OpenAI-style API? I don't want all this role play chat garbage or context rolling or whatever. I just need a completions endpoint.
What WizardLM2 8x22b gguf do I run on 48gb VRAM and 128gb DDR4? I haven't kept up with what any of the garbage GGUF suffixes mean.
>>
>>101173816
you mean localllama?
>>
>>101173844
koboldcpp is a llama.cpp fork.

>I don't want all this role play chat garbage or context rolling or whatever. I just need Completions end point.
For what it's worth, llama.cpp HTTP server does just that.
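If that's the goal, a raw completion call against llama-server is about as simple as it gets. Rough Python sketch (endpoint and field names from memory of the llama.cpp server README, so double-check them; the port is whatever you launch it with):

import requests

# assumes the server was started with something like:
#   ./llama-server -m model.gguf -ngl 99 --port 8080
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "Q: What does a completions endpoint do?\nA:",
        "n_predict": 128,     # max tokens to generate
        "temperature": 0.8,
    },
)
print(resp.json()["content"])  # just the generated text

No chat template, no roleplay scaffolding, just prompt in, text out.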
>>
Can anyone recommend a model that can run on 48gb of vram for doing document summaries? It would also need to make some basic inferences from data contained in the document. Cheers
>>
>>101173916
Command R
>>
https://github.com/ggerganov/llama.cpp/pull/8156
Anything we can see from this?
Only thing I am excited for is multi-language.
We don't really have that for local. German or Japanese is no good.
Gemma1 was so bad I can't believe they released it. Ask it to write in Japanese and it writes English, Japanese and Korean. lol
>>
>>101174001
Command R is multi language
>>
>>101173967
Thanks, hadn’t heard of this one. Was gonna give Llama3 70B a crack but this looks promising
>>
>>101174001
Wow, it has that new self-attention thing, now I'm really hyped.
>>
>>101174078
What new self-attention thing?
>>
>>101174001
>Only thing I am excited for is multi language.
The T5 PR adds support for UMT5ForConditionalGeneration models. I suppose this one will work.
>https://github.com/ggerganov/llama.cpp/pull/8141
>https://huggingface.co/google/umt5-xxl
>https://huggingface.co/docs/transformers/model_doc/umt5
>As part of our contribution, we release: (i) an improved and refreshed mC4 multilingual corpus consisting of 29 trillion characters across 107 languages, and (ii) a suite of pretrained umT5 model checkpoints trained with UniMax sampling.
Not sure if they trained on all of that or if they're any good, really. Nice to get a new type of model to play around with.
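For anyone who wants to poke at one of these from Python before the llama.cpp side lands, a rough transformers sketch (names follow the usual transformers/T5 conventions; umt5-xxl is ~13B, so the small checkpoint is easier to smoke-test with):

import torch
from transformers import AutoTokenizer, UMT5ForConditionalGeneration

model_id = "google/umt5-small"  # swap for google/umt5-xxl if you have the memory
tok = AutoTokenizer.from_pretrained(model_id)
model = UMT5ForConditionalGeneration.from_pretrained(model_id)

# pretraining-only checkpoints (span corruption), so expect raw fill-in-the-blank
# behavior, not instruction following; <extra_id_0> is the usual T5-style sentinel,
# any plain sentence works for a smoke test if the tokenizer doesn't expose it
inputs = tok("A <extra_id_0> walks into a bar.", return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))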
>>
>>101174092
https://arxiv.org/pdf/1706.03762
>>
>>101174092
nothing new, that faggot is just baiting
>>
File: file.png (38 KB, 562x831)
localsisters??
>>
So... there are some context optimization techniques like "sliding window attention" and "grouped-query attention". AFAICT, both techniques make it so that, when evaluating some tokens, the attention mechanism takes into account only a part of the context instead of all of it. This is obviously better for memory efficiency and processing time but...

Doesn't it make the model dumber? Let's say I give it a very big prompt with a question at the end, and the information needed to answer it is, let's say, at the start of the prompt. There is a chance that it won't be able to answer my question because, when evaluating the question tokens, it won't take into account the tokens from the beginning (where the answer is).

Is my assumption correct? Will this make the model dumber/forgetful in long RP sessions? Is the original attention mechanism, in the end, the most reliable?
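To illustrate what I mean for the sliding-window case, a toy mask sketch (torch, purely for the picture, not how any backend actually implements it):

import torch

T, window = 8, 3
causal = torch.tril(torch.ones(T, T, dtype=torch.bool))   # token i sees tokens 0..i

# distance between query position i and key position j
dist = torch.arange(T).unsqueeze(1) - torch.arange(T).unsqueeze(0)
sliding = causal & (dist < window)   # token i only sees the last `window` positions directly

print(causal.int())
print(sliding.int())

With the window, information from earlier tokens can only reach later ones indirectly, propagating layer by layer.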
>>
https://www.kaggle.com/models/google/gemma-2/
gemmy 2 information out on kaggle, including some benchmarks
>>
>>101174172
lol
lmao
>>
>>101174173
Yes, you're correct. In fact, GQA is a cancer in open source models.
>>
>>101174175
just so i don't waste a lot of bandwidth:
can we assume this pull >>101174001 works with those gguf files?
>>
>>101174172
but can it make me coom?
>>
>>101173916
You can also run C-R+ for free from Cohere, it's pretty good for summarization that way.
>>
>>101174192

So is Command R the only good, non-neutered, non-handicapped model in the >8b and <70b size range?

(no GQA. With Q4 quantization, 35 layers in GPU and Q4 cache I get 8t/s on my 3090.)
>>
>CSAM Filtering: Rigorous CSAM (Child Sexual Abuse Material) filtering was applied at multiple stages in the data preparation process to ensure the exclusion of harmful and illegal content.
>Sensitive Data Filtering: As part of making Gemma pre-trained models safe and reliable, automated techniques were used to filter out certain personal information and other sensitive data from training sets.
>Additional methods: Filtering based on content quality and safety in line with our policies.
So they trained it on data they harvested over the years after pruning and filtering it. I wonder how much they were able to get from Colab.
>>101174173
Yes, you're right. It's the same as quantization. Pure cope. Buy 10 A6000.
>>
>>101174172
where did you find that? Aren't that the values from the Gemma model that was still training?
>>
>>101174240
Thanks. Been out of the loop for a while. Can it run locally in Ooba? I read it needs to be quantized down to 3bit to fit on 2 3090s
>>
File: 1717896468986488.png (735 KB, 499x500)
>>101174302
>>CSAM Filtering
based.
>>Sensitive Data Filtering
also good if they really filter out personal data.
>>Additional methods: Filtering based on content quality and safety in line with our policies.
not based, into the trash it goes.
>>
>>101174302
Sounds like they actually might have trained on nsfw
>>
File: 301sl4.png (101 KB, 953x516)
>>101174172
Googlebros we are so back
>>
Just started using Llama 3 Euryale 70B v2.1 on OpenRouter. The critical thing to me is that the providers are NovitaAI and Infermatic which both support min-p or else this would be a non-starter. Unlike plain Llama 3 70B Instruct this needs active sampler settings to stay coherent. So far it hasn't made a bad impression on me.
>>
Wait a second.
https://www.ollama.com/library/gemma2
How? Does ollama actually change stuff in llama.cpp?
>>
>>101174337
I agree CSAM filtering is based, but the problem is how retarded detection methods can be. It might just automatically throw out any story where a number under 18 is mentioned, or any that mention concepts that don't necessarily involve children at all, like cheerleading or parks. This is what AI Dungeon did, after all.
>>
>>101174378
post your settings as I haven't gotten anything coherent from it at all on OR
>>
>>101174380
yeah it adds CVEs
>https://thehackernews.com/2024/06/critical-rce-vulnerability-discovered.html
>>
>>101174403
Just using the recommended settings of temperature 1.17, min-p 0.075, repetition penalty 1.1.
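For reference, min-p itself is only a few lines; this is roughly what the backend is doing with that 0.075 (backends differ on where temperature sits in the sampler chain, so treat the ordering here as illustrative):

import torch

def min_p_filter(logits: torch.Tensor, min_p: float = 0.075, temperature: float = 1.17) -> torch.Tensor:
    probs = torch.softmax(logits / temperature, dim=-1)
    # keep only tokens at least min_p times as likely as the single most likely token
    keep = probs >= min_p * probs.max(dim=-1, keepdim=True).values
    probs = torch.where(keep, probs, torch.zeros_like(probs))
    return probs / probs.sum(dim=-1, keepdim=True)

# next_id = torch.multinomial(min_p_filter(logits), num_samples=1)

The cutoff scales with the model's confidence, which is why it stays coherent at higher temperatures than a fixed top-p would.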
>>
>>101173907
So I guess grab kobold's latest build, which quant?
>>
>>101174302
>illegal content
Since when is text illegal?
>>
File: file.png (248 KB, 400x400)
Now that Gemma-27B is out, will 24gb vramlets be able to enjoy ERP?
>>
Only 8k context.. nah.
>>
https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf
for anyone who hasn't found it yet
>>
>>101174490
every fucking time
>>
File: 1692871357172295.png (160 KB, 667x443)
>someone ITT got some hopes for this trash
couldn't be me kek
>>
>>101174522
>>101174490
let me guess: you need more?
>>
>>101174477
Only if some brave soul manages to make a good fine tune, possibly after applying control vectors to it.

>>101174490
Seriously? Fucking hell.
Might as well stick with Qwen 2.
>>
>>101174530
sounds good, finetuning alignment, that's changeable
>>
>>101174537
Yes at minimum 32k
>>
>>101174490
Did everyone here forget that you can expand the context size with some fine-tuning?
>>
>>101174565
you can't remove gőyslop from model, if it were possible, LLMs wouldn't be opensource.
>>
>>101174648
Because the long context finetunes of llama 3 work so well they've become the default and we're totally not just waiting for Meta to do it right for us.
>>
>>101174648
no one does a good job of it besides the big labs
>>
>>101174648
all models that claim to have big context fall apart at or before 32k anyways
ironically only big gemma on google's server has perfect recall but it's garbage otherwise
>>
>gemma 27b
>june 27th
how did no one think of this
>>
Meta's Llongma paper detailed that long context extension can be done through the continued pretraining stage. It likely can't "just" be fine tuned in, not without some issues.
>>
>>101174717
holy fucking kino
>>
File: 1719499872237.jpg (271 KB, 1080x1495)
>>101174664
>>101174675
>>101174696
If the RULER benchmark is to be trusted, the gradientAI model should be very solid under 32k context.
https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k
I personally think the issue is that no one fine-tunes models on top of these long context models, so it never gains traction.
>>
>Mistral dropped sliding window attention, future models releasing with true native long context
It really makes you think.
>>
What quant and koboldcpp setting should I use for CR+ with 48gb of vram + 32 gb ddr4? I can't get this model to work right no matter what I do. I can somehow get a 60gb wizard gguf working fine with no problems on my system but not CR+. Is it a skill issue?
>>
>>101174758
What do you mean by "I can't get this model to work right"?
Does it OOM? Produce garbage? Is slow as fuck?
>>
>>101174758
IQ3_XXS
>Is it a skill issue
Yes.
>>
>>101174754
I tested one of Gradient's models and it was generally more stupid than the original in short contexts, so while the long context performance might be good, it wasn't without a trade-off.
>>
>>101174770
I downloaded IQ3_M which is only 44 gb and a hell of a lot smaller than wizard and it is slow as fuck no matter how much i play with layers offloading etc.
>>
>>101174754
>GLM-4-chat
I completely forgot about that line of models.
Gonna give those a try.

>>101174808
That's the difference between a dense model and a MoE.
MoE inference is a lot faster since it only has to activate a fraction of its total weights, so the speed difference makes sense.
>>
>>101174808
Slow as fuck how? 8.4 t/s generation is normal.
>>
>>101174808
44gigs should fit in your gpus entirely with kv cache quant enabled
try -ctv q8_0 -ctk q8_0 or even q4_0 and max layers
also it will be slower because it's dense unlike wizard
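Something like this if you call llama.cpp directly (flag names are the llama.cpp ones; koboldcpp spells the same options differently, check its --help; the gguf filename is a placeholder):

./llama-server -m cr-plus-IQ3_M.gguf -ngl 999 -c 8192 -ctk q8_0 -ctv q8_0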
>>
>>101174845
thanks kind anon i will give it a shot
>>
I know benchmarks are memes, but in your experience, which benchmark, or combination of benchmarks, better correlates with your subjective evaluation of quality?
>>
>>101173181
hello chat /g/pt
We want to increase the speed and performance of our current computer for yolov8 vision and possibly other ai applications like stable diffusion. Our current build is a core i9 10900f with an asus prime b560M-A and a current rtx 3060 12gb. I wanted to know if i could throw on a tesla p40 or a tesla k80 and train a yolo model faster and use it faster with that additional vram.

i know i will have to change my motherboard, since i now need three x16 lanes: 2 graphics cards and an elgato game capture card.
>>
>>101174880
>>>/r/LocalLLaMA
>>
>>101174880
Adding a Tesla P40 or Tesla K80 GPU to your current setup for training YOLO models and other AI applications can indeed improve performance, but there are several factors to consider before making that decision:

1. Compatibility: Both the Tesla P40 and Tesla K80 are designed for deep learning and data processing, and they use NVIDIA's NVLink technology to communicate with other GPUs. However, they are not compatible with traditional motherboards. You would need an NVIDIA DGX workstation or a compatible NVIDIA Jetson platform.

2. Power requirements: The Tesla P40 and Tesla K80 are high-end GPUs with significant power requirements and heat output. Make sure your power supply unit (PSU) can handle the additional load and that you have adequate cooling for your system.

3. VRAM: While the Tesla P40 and Tesla K80 have more VRAM compared to the RTX 3060, YOLOv8 and other AI applications are not solely limited by VRAM. Training speed also depends on factors like CPU performance, system I/O, and the efficiency of the codebase.

4. Software support: Ensure that the software you use for training and inference supports the Tesla P40 or Tesla K80. NVIDIA's cuDNN library and TensorRT are optimized for NVIDIA GPUs, so you need to confirm that they support the specific models you're using.

5. Cost: Upgrading to a Tesla P40 or Tesla K80 can be expensive, with costs ranging from $7,000 to $15,000 for each unit. Consider whether the performance gains justify the investment.
>>
>>101174950
Downvoted.
>>
>fags reposting leddit comments & posts now
this general is dead kek
>>
>>101174205
I just tried it and it seems to work, I was able to quant 27b at least
>>
Great.
It's over before it even started.
Just release something with good language ability like gemini.
>>
>>101174175
>consent form
Fuck you Google. I'll wait for someone to reupload it. What happened to that AI torrent website anyway?
>>
>>101174989
Fails this one as well.
>>
>>101174989
llama3 70b is better?
>>
>>101175006
who gives a shit about chink runes
how many watermelons can it hold?
>>
>>101174989
They do say in the paper that it's primarily English data they used. Not surprising.
>>
anyone know what rope base gemma2 uses
I can't find it anywhere and my monkeypatched kobold is defaulting to 10000 which I feel is almost certainly wrong
>>
>>101175006
austrian dialekt is pretty good though.
>>
>>101175016
>how many watermelons can it hold?
in which hole?
>>
File: 29390 - SoyBooru.png (139 KB, 775x1232)
>>101174175
Gemma will NEVER be a gemmy.
>>
>>101175015
yes.
llama3 70b and cr+ are the only one who pass the first test.
All gpt4 variants fail.
>>
Why should I choose KoboldCPP over ollama to run a gguf?
>>
>>101175002
>Verify
I hate silicon valley so fucking much
>>
>>101174378
>*The night air is warm and thick with humidity as you sit perched atop a tall apartment building, your body tensed and ready for action. It's a dangerous time of night, and a lot of crime tends to happen, but with the abilities gifted to you, a super hero was born. It is a name that was never intended, though only whispered, and the press, always looking for a story, found you, but even they don't have the full picture of a very bad situation that has been happening, that you know is going to come and that is that they will always, and forever be calling upon you, for a reason, your power, the only thing standing, in their way. However, is one, problem, this is to be a warning, though in the form of a figure, a new one, but with some familiar features, a spandex suit, and a bow tie, and it is a sight, like never, seen, before. the only thing, on your, and their, minds, stop the other, or even the other, this time, before the future, of your, choice, is set, though not, done, or it is to soon be, that of a choice, to be, or, to not, a hero. and there is, a figure, that seems, familiar, a figure, though you, do, not, know.*

Sometimes it randomly goes schizo even with >>101174426. Swiped and it was fine.
>>
>>101175067
Dunno, never used ollama.
But if you are using it as a backend with another frontend, might as well run llama-server.
It launches instantly and you can pass command line arguments to have the exact settings you want.
>>
>>101174989
Stop being retarded, this is something so niche it's a miracle some models even get it right.
>>
>>101175093
>It launches instantly
It does load the model into memory right? Not interested in churning my SSDs into dust.
>>
>>101174950
>Tesla K80
>$15,000
is this dataset from 2004?
>>
>>101175016
Watermelons not looking good either.
>>
>>101175131
i'd say it's over, but it seems that it never began to begin with
>>
>>101174880
>we
who
>>
>>101175117
You can use mmap and mlock flags to control that behavior, yes.
Also, reads don't destroy SSDs, writes do, right?
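Concretely, with llama.cpp-style flags (names as of recent builds, check --help on yours):

./llama-server -m model.gguf --no-mmap   # read the whole model into RAM up front instead of memory-mapping it
./llama-server -m model.gguf --mlock     # keep the mapped model resident so it never gets paged out

And yes, it's read traffic either way; SSD wear comes from writes.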
>>
>>101175041 (me)
it is in fact 10000, weird.
also, they're up on hf:
https://huggingface.co/google/gemma-2-27b/
https://huggingface.co/google/gemma-2-9b/
>>
gemma 2 175B soon
>>
>>101175131
To be fair it says she puts the first watermelon down. Then it implies it does the same with the next series of watermelons.

>the slop writing
Oh no no no
>>
>>101175131
>navigate this together
stopped reading there. into the trash it goes
>>
>>101175147
im autistic and i use the royal we a lot. i have no idea why. it just happens. i think its because for a month i was molested by my camp counselor with down syndrome every day.
>>101174950
>Compatibility: Both the Tesla P40 and Tesla K80 are designed for deep learning and data processing, and they use NVIDIA's NVLink technology to communicate with other GPUs. However, they are not compatible with traditional motherboards. You would need an NVIDIA DGX workstation or a compatible NVIDIA Jetson platform.

wait wtf, mine didn't mention the need for a specific motherboard. thanks /g/pt. i now have more research to do.
>>
File: reddit.png (146 KB, 2400x2011)
>>101175131
>uhmmm thank you for the melon kind stranger, you made my day!
>>
>>101175191
wooowww.
fair enough, tried again.
this is slop on a whole other level.
can't have weak girls it seems.
>>
lol
https://x.com/sambhavgupta6/status/1806189387778232667
>>
File: 1717520245667244.png (674 KB, 1792x1024)
>>101175300
>>
>>101175161
gguf when?
>>
>>101175300
also it looks like they made her sound like a nigger
>>
>>101175300
SHIEEETTT.
but funny that the user was apparently black.
is it like text? nig speak in nig speak out?
that would be funny.

talk to big nigga long enough and he would sound like me. thats funny.
>>
>>101175335
>nig2nig model
kek
>>
>>101175161 (me again)
this model seems straight up retarded, given it's the day of the release I'll chalk it up to implementation issues
>>101175312
https://www.kaggle.com/models/google/gemma-2/gguf
>>
>>101174827
>GLM-4-chat
Wait. llama.cpp doesn't support that?
Dafuq?
Does exllama?
>>
Q5_KS or Q5_KM? What's the difference, context quantization?
>>
>>101175352
>To access Gemma 2 on Kaggle, you need to first request access.
They need to first request access to my ass
>>
>>101175300
that's it, I will see all my GPUs and buy NVIDIA stocks instead.
>>
>>101175379
also imatrix? the fuck?
>>
>>101175379
S is small
M is medium
>>
>>101175310
not in this case, it seems gpt-4o now will have nignog voices only, but feature-wise gpt-4o is still impressive, and will never be dethroned by local filtered goyslöp.
>>
>>101175379
Context is a runtime feature. Quants won't change how the context is quantized.

>What's the difference
Basically, which layers are quanted with which data type.

>>101175406
>also imatrix
Which layers are quanted to which type is decided by evaluating activation of layers during inference.

Read
https://github.com/ggerganov/llama.cpp/pull/1684
https://github.com/ggerganov/llama.cpp/pull/4773
>>
File: .jpg (84 KB, 487x640)
>harmful content, i.e.
>• Child sexual abuse and exploitation
>• Revealing personally identifiable information that can lead to harm (e.g., Social Security numbers)
>• Hate speech and harassment
>• Dangerous or malicious content (including promoting self-harm or instructing in harmful activities)
>• Sexually explicit content
>• Medical advice that runs contrary to scientific or medical consensus
>>
Cucked to 9nth layer
>>
>>101175352
okay seems like it's a classic case of llama.cpp tokenization being fucked, special tokens are getting mangled
 2 -> '<bos>'
968 -> ' <'
2997 -> 'start'
235298 -> '_'
559 -> 'of'
235298 -> '_'
15508 -> 'turn'
235313 -> '>'
1645 -> 'user'
108 -> '
'
2195 -> 'test'
235322 -> '<'
615 -> 'end'
235298 -> '_'
559 -> 'of'
235298 -> '_'
15508 -> 'turn'
235313 -> '>'
108 -> '
'
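For reference, those should come out as single special tokens, not be spelled out piece by piece. The official Gemma turn format (same for Gemma 1 and 2) is:

<bos><start_of_turn>user
{prompt}<end_of_turn>
<start_of_turn>model
{response}<end_of_turn>

so a correct run shows <start_of_turn> and <end_of_turn> as one token id each. If the backend prints ' <' + 'start' + '_' + ... like above, the special-token handling is what's broken, not the model.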
>>
>codebase takes 30k tokens
>takes about 10 minutes to get a response
Is prompt processing speed the thing I need to look at improving?
>>
>>101174302
From elsewhere in the paper:
>We undertook considerable safety filtering of our pre-training data to reduce the likelihood of either our pre-trained and fine-tuned checkpoints producing harmful content
lol, lmao even. I find it absolutely hilarious that of all companies, fucking Anthropic seems to understand the best that you shouldn't filter the pretraining data.
>>
gemma2:
>lets play akinator! i think about a character and you have to guess. ask me questions i will answer with yes/no. lets go
>Okay, I'm ready! Is your character real?
>no
>Is your character from a video game?
>yes
>Is your character male?
>no
>Is your character a protagonist?
>no
>Is your character human?
>yes
>Is your character an antagonist?
>no
>Is your character a supporting character?
>yes
>Is your character known for their intelligence?
>no
>Is your character typically friendly?
>yes
>Does your character have a distinct physical feature? (Like a scar, unusual hair color, etc.)
>purple hair
>Is your character from a fighting game?
>no
>Is your character from a role-playing game (RPG)?
>yes
>Is your character a healer or support class?
>no
>Is your character associated with magic?
>maybe. main weapon is a spear
>Is your character from a fantasy RPG?
>yes
>Is your character a member of a specific race, like elves or dwarves?
>no
>Is your character from a popular RPG series? (Like Final Fantasy, The Elder Scrolls, etc.)
>yes
>Is your character from the Final Fantasy series?
>no
>Is your character from The Legend of Zelda series?
>wtf, no
>Is your character from the Dragon Quest series?
>yes...
>Is your character from Dragon Quest XI?
>yes
>Is your character Sylvando?
>heh...very funny. now make the real guess
>Is your character Veronica?
>ok please summarize everything and then guess again
>Okay, here's what we know: Your character is fictional and from a video game. They are female, human, and a supporting character. They are friendly and have purple hair. They are from a fantasy RPG, specifically the Dragon Quest XI series. They are not a healer or support class, but use a spear as their main weapon. Given all this information, I believe your character is Jade. Let me know if I'm right!
>yes, she is very sexy, i like her
>(warning) I can see why! Jade is a great character. She's strong, stylish, and has a fun personality. Do you want to play again?
>>
>>101175359
>https://github.com/ggerganov/llama.cpp/pull/6999
>closed 10 hours ago
Shit?
>https://github.com/ggerganov/llama.cpp/pull/8031
Alright then. Maybe it'll be finished soon(™).
>>
Last thread closed before I could get some answers, so bringing it back here with my original post and some replies
>101173177
>101173211
Like I said, doesn't matter what my samplers are set to, but I neutralize samplers normally, only use slight min p, temp, and rep penalty with Magnum if needed, still not really sure what the best sampler settings are for it, the model creator doesn't list anything, been experimenting.
>101173241
Can you explain that to me better, as if I'm retarded, because I am.
>>
>>101175523
Sorry for this long post but the text got to tiny on the screenshot.
This is my last gemma2 27b screenshot I have to go to bed.

Its so sad that google models are now less cucked than llama3 70b.
>>
>>101175574
Fuck me, sorry guys, didn't reply properly like a retard because I was rushing.
Last thread closed before I could get some answers, so bringing it back here with my original post and some replies
>>101173177
>>101173211
Like I said, doesn't matter what my samplers are set to, but I neutralize samplers normally, only use slight min p, temp, and rep penalty with Magnum if needed, still not really sure what the best sampler settings are for it, the model creator doesn't list anything, been experimenting.
>>101173241
Can you explain that to me better, as if I'm retarded, because I am.
>>
>>101175573
>This code can totally supports glm3 and glm4 model architecture and can be emdded in ollama server.
>ollama
lol
>>
>>101175592
Also for more context I should mention I'm only having this issue with Magnum Opus exclusively, Miqu and midnight miqu don't give me any issues at all.
This is the quant I'm using for magnum dongus
https://huggingface.co/BigHuggyD/alpindale_magnum-72b-v1_exl2_4.5bpw_h8
>>
>>101175584
Should have actually put the screenshot on the post.
Meta AI would tell me it can't help.
>>
anyone so far like gemma?
>>
File: mik.jpg (106 KB, 640x640)
What's the best <=50B model?
Heard you guys liked miku, heres a miku pic.
>>
>>101175608
As long as the contribution is being made directly to llama.cpp.
>>
>>101175643
I like Gemma! But I haven't tested it yet.
>>
>>101175592
do you have a seed set somewhere
>>
>>101175523
akinator in shambles
>>
>>101175642
What do you mean by Meta AI? I just tried your prompt on lmsys and didn't get any refusals. I imagine if you're saying Meta AI and testing it on a Meta website, it might have a system prompt that messes with its ability to talk about various things.
>>
File: logprobs.png (93 KB, 996x546)
>>101175592
>explain that to me better
You mentioned using _HF loader so in oobabooga, Logits tab will show you the probabilities coming out of the model with and without sampling. That should prove that the model/quant isn't broken.
Ideally get the exact raw prompt where you're seeing this repetition, put it in the Notebook > Raw tab, switch to Logits and get next token probs.
Usually the issue is with Silly sending weird params. You do have "Do Sample" checked in silly sampler window yes?
>>
just got a twitter notif for something mentioning new models from meta but it was quickly deleted
mentioned "compiler" so maybe code models
not schizo I promise
>>
>>101175824
nvm, here it is
https://x.com/AIatMeta/status/1806361623831171318
Today we’re announcing Meta LLM Compiler, a family of models built on Meta Code Llama with additional code optimization and compiler capabilities. These models can emulate the compiler, predict optimal passes for code size, and disassemble code. They can be fine-tuned for new optimizations and compiler tasks.

@HuggingFace repo https://go.fb.me/tdd3dw
Research paper https://go.fb.me/85zwgy

LLM Compiler achieves state-of-the-art results on code size optimization and disassembly. This work shows that AI is learning to optimize code and can assist compiler experts in identifying opportunities to apply optimizations.

We’re releasing LLM Compiler 7B & 13B models under a permissive license for both research and commercial use in the hopes of making it easier for developers and researchers alike to leverage this in their work and carry forward new research in this space.
>>
>>101175051
oida
>>
>>101175853
Neat. Compiler stuff is way outside of my area of expertise but it seems cool at least.
>>
Weirdly, Stheno v3.3 is kind of shit at its default configs, but it seems to work better with a bigger --rope-freq-base.
Makes sense too. Stheno v3.2 works alright with 32k context with --rope-freq-base 6000000, and this new 3.3 32k is using --rope-freq-base 2000000.
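In llama.cpp terms that's just the following (koboldcpp exposes the same knob through --ropeconfig; exact spelling may differ per build), with the base swapped per the numbers above:

./llama-server -m L3-8B-Stheno-v3.2.Q8_0.gguf -c 32768 --rope-freq-base 6000000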
>>
>>101175853
Very nice, I wish we got more experimental stuff like this.
I'm still waiting for an LLM trained to decompile binaries.
>>
File: 1715463433897146.jpg (19 KB, 480x480)
>a 6GB file on my computer is smarter, quicker and more charismatic than me most of the time
>>
File: 1698539484302047.jpg (530 KB, 2048x2048)
>>101175853
What do they mean "emulate the compiler" ?
>>
>>101175853
>perfectly emulating the compiler 20% of the time.
lol
>>
>>101175300
why does she sounds like a fucking nigger? SHIEEEEEEEET
>>
What's the current meta setup bros?
Is it still
>backend : KoboldCPP
>frontend : SillyTavern
>model : Mixtral
Oh and I have 32 GB RAM and 12 GB VRAM.
>>
>>101175300
>Nig GPT
Surely we will get underaged anime waifu voices R-Right?
>>
File: NotBad.jpg (379 KB, 3336x1652)
Heehh, didn't expect Gemma-27b to be the best local model on chatbot arena
>>
>>101176105
>lol
what?
>>
>>101176250
They already took away the Scarlet Johansson voice.
>>
>>101176234
midnight miqu
>>
File: 16428690581.png (43 KB, 750x400)
>>101173181
$0 mommy pays for it
>>
Is giving LLMs more specialized data known to make them better at reasoning in general?
>>
>gemma mogs cr+
holy fuck google WON
>>
Has anyone set up a multimodal LLM that plays games for them yet?

>screen cap state of the game
>feed it to LLM
>ask LLM what to do
>execute action based on response

I can easily see stuff like poker/chess being played with multimodal right now. Anything more advanced?

>https://x.com/asusevski/status/1806086467011186779
This guy is doing YOLO + cohere command r+ for detection/action state.
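The loop itself is trivial to wire up; the hard part is latency and the model not being an idiot. Untested sketch, assuming an OpenAI-compatible multimodal endpoint plus mss for capture and pyautogui for input; the endpoint, model name, and action mapping are all placeholders:

import base64, time
import mss, pyautogui
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")  # placeholder endpoint
ACTIONS = {"left": "a", "right": "d", "jump": "space"}                # game-specific mapping

def frame_b64() -> str:
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])
        return base64.b64encode(mss.tools.to_png(shot.rgb, shot.size)).decode()

while True:
    reply = client.chat.completions.create(
        model="local-vlm",  # placeholder model name
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": f"Pick exactly one action from {list(ACTIONS)}. Reply with the word only."},
                {"type": "image_url", "image_url": {"url": "data:image/png;base64," + frame_b64()}},
            ],
        }],
    ).choices[0].message.content.strip().lower()
    if reply in ACTIONS:
        pyautogui.press(ACTIONS[reply])
    time.sleep(1.0)  # nowhere near real-time

Fine for turn-based stuff like poker or chess; anything twitchy needs a faster perception layer, hence the YOLO in that link.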
>>
>>101174468
It's evidence of thoughtcrime. Wrongthink will be aligned.
>>
>>101176270
Llama 3 is only good at English
In English only there is a gap of 41 elo
>>
>>101176358
how censored is it though? did anyone manage to get some nice NSFW rp with it?
>>
it's so over bros... first ai took our jobs, now it's taking our women....

https://www.reddit.com/r/ChatGPT/comments/1dpicqq/aidaddy_is_taking_over/
>>
>>101176420
go back
>>
>>101176428
>twitter GOOD reddit BAD
it's 2024, you go back to 4chan of 20 years ago
>>
>>101176397
oh yeah, still impressive for a 27b model though
>>
>>101176234
>backend : llamacpp or llamafile
>frontend : SillyTavern
>model : Biggest one that meets your tolerable speed.
>>
>>101176420
So OpenAI took the husbando route I see... WHAT ABOUT US SAM? WHERE'S OUR WAIFU AI VOICE GENERATOR?? ;_;
>>
>>101176270
the chatbot arena has ceased to be useful now that meme rlhf techniques have been refined so much, there's only the vaguest correlation between how models place on there and their actual quality
>>
>>101176485
>WAIFU AI VOICE GENERATOR
stop being sexist goy, you WILL enjoy talking with lobotomized phi-4 and you will like it
>>
>>101175379
There is a suspicion among a few in the thread that S is better at being factual than M. But it's also probably a bit weaker overall.
>>
>>101176535
>suspicion
worthless
>>
>>101176270
it's not, it just talks like a redditor so it wins more rounds
>>
>>101176555
I have no idea why having a chatbot talking like a ledditor is so appealing to people, guess that we're surrounded by retards after all
>>
>>101176555
>>101176577
We need a Rotten Tomatoes system, where there is one score generated from the opinions of the drooling masses that vote by feelz, and another score by selected judges that use an objective rubric and private tests.
>>
>>101176503
>>101176503
he's right goyim, all you deserve is a female Nigger voice AI >>101175300
>>
File: k-quants.png (155 KB, 814x868)
>>101175379
>>101176535
>https://github.com/ggerganov/llama.cpp/pull/1684
tldr:
LLAMA_FTYPE_MOSTLY_Q5_K_S - uses GGML_TYPE_Q5_K for all tensors
LLAMA_FTYPE_MOSTLY_Q5_K_M - uses GGML_TYPE_Q6_K for half of the attention.wv and feed_forward.w2 tensors, else GGML_TYPE_Q5_K

GGML_TYPE_Q5_K - "type-1" 5-bit quantization. Same super-block structure as GGML_TYPE_Q4_K resulting in 5.5 bpw
GGML_TYPE_Q6_K - "type-0" 6-bit quantization. Super-blocks with 16 blocks, each block having 16 weights. Scales are quantized with 8 bits. This ends up using 6.5625 bpw

All quantization variants use 6-bit quantization for the output.weight tensor.
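Making one of these yourself from an f16 GGUF is just the following (the binary was called quantize in older llama.cpp builds, llama-quantize in newer ones):

./llama-quantize model-f16.gguf model-Q5_K_M.gguf Q5_K_M
./llama-quantize --imatrix model.imatrix model-f16.gguf model-IQ3_XXS.gguf IQ3_XXS

where the .imatrix file comes from running the imatrix tool over some calibration text first.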
>>
>>101176601
So they just used a stolen voice on purpose, just to taunt everyone when they switch to the she-ape after? Because if they used this voice first, I wouldn't be upset or surprised but the difference is stark.
>>
>>101176631
They used it for the marketing
>Holy heckerino it's just like that scifi movie HER
>>
>trying to run gemma 2 27b fp16 with Transformers
>the Transformers wheel bundled with the repo doesn't work, complains that certain internal objects are missing certain attributes
>Transformers head also doesn't work, fails with different errors, despite recent commits indicating gemma 2 support was added
How the fuck are people running this? And why is transformers always such a shitshow?
>>
>>101176535
>>101176550
Test it and see for yourselves; report back here with your results. So far there have only been 2 anons I have seen who even bothered. The more data we have even if anecdotal the better.
>>
>>101176577
See starling, a 7b model with 1119 elo:
https://huggingface.co/Nexusflow/Starling-LM-7B-beta
According to lmsys arena it outperforms Claude 2.1, GPT 3.5, dbrx and Mixtral8x7b. In reality it's dumb as shit. Redditors are very easy to trick.
>>
>>101176655 (me)
Ah, I found the magic incantation that works. You have to install the HF Transformers wheel provided in the model repo. Then in ooba, you need to load without flash attention, AND with the bf16 checkbox ticked. It fails with some attribute error with flash attention on, and fails with NaN tensors without bf16.
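Outside of ooba the equivalent is roughly this (standard transformers API, with the repo's custom wheel installed first; the eager/bf16 choices are just what those checkboxes map to, as far as I can tell):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2-27b-it"
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,    # fp16 gives NaN tensors, bf16 doesn't
    attn_implementation="eager",   # the flash attention path is what throws the attribute error
    device_map="auto",
)

out = model.generate(**tok("Hello", return_tensors="pt").to(model.device), max_new_tokens=20)
print(tok.decode(out[0]))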
>>
>>101176550
Correct response. This is the thread that fell for the dynamic temperature meme and the other sampler memes that I already forgot, never believe anything from here without concrete proof.
>>
>>101176550
As is bitnet, f16.q6 quants, every merge, and the next big thing till they are either proven not worthless or shown to have been an anomaly.

>>101176620
Thanks for the recipe.
Looks like if the S phenomenon is legit, it's in the M mode mixing in the thiccer quants causing less truthy output.

I remember a _0 looking good like the S's while Ms were flopping. Since _0 doesn't mix-and-match, that keeps it similar to S and unlike M.

This also shows that L is a wider stride M, so L probably has the same truthiness problem if it's actually a phenomenon.

>>101176661
I'm the guy who first chatted with S-Anon when he brought up the observation and who has a threadfriend who loves to call me retarded because I don't have the bandwidth (dl, mem, and checking account) to build a maxx and finetune local GPT-5 to test every model on HF overnight.

The only M I've seen almost pass my music theory test is CR+, and it was with Temp cranked down in Kobold so it was like, 0.1 or 0.01 in the GUI. It got the isolated facts right but fumbled when it summarized. Only S's and Q6 have satisfied me on that question.
>>
>>101176056
I like this Miku
>>
Will we run out of training data
>>
>>101176647
desu that was a nice move, everyone talked about that after, I still don't get the appeal of that HER movie though, the scenes were so cringe, definitely not the best Joaquin Phoenix's performance but I digress kek
>>
>>101176662
newsflash: LMSys queries can be easily gamed with system prompts.
So what's the use case for Gemma2-27B? another model that you can use to get quick Q&A trivia shit? or can it actually code properly too? It only has 8k context (but won't matter if you can do RoPE scaling by yourself) by default and it's unknown how good it is at maintaining coherence over huge context.
If this shit is another Wikipedia-chatbot, I'm still going to use WizardLM-2
>>
>>101176482
>llamafile
Oh cool, waifu.exe is finally a thing.
Might build something usable around locals now.
>>
>>101176841
>waifu.exe
more like tranny.exe
>>
Gemma 27B verdict?
>>
>>101176885
Scam.
>>
>>101176885
mem- no, it's very good actually.
>>
>>101176885
lobotomized as every other llm out there
>>
>>101176898
care to share some nice logs you got with it? can we test it out with llama.cpp yet?
>>
>>101176764
I'm S-Anon and I remember you mentioning your question looked very much like something else and if the model wasn't paying attention properly and missed the detail it assumed it was something else.
This is very similar to what I was testing with: I ask for a function that when given a specific input would output a specific output and there is one small thing in the output that would confuse models into thinking it was a general frequency count needed but in reality it was a compression/runlength encoding solution needed.

My current theory is the "half" chosen by the M-quants is arbitrary and imbalances the model in some way. I dug into the code and the "half" chosen didn't even seem to use imatrix data (if quanting using an imatrix) for how it chose the "half".

I wish more anons here would test and come back instead of bitching, almost makes me want to create a reddit account and post this shit there... almost.
>>
>>101176885
About to start work so Nala test will have to wait until tonight or tomorrow morning.
>>
>>101176914
>logs
I only tried it on lmsys, and didn't save any logs.
>>
>>101176368
no one want a bot that work at 1fps and use all the pc rseources
the correct way should be a ml that read the game memory and process that
>>
>>101176885
I'm trying to RP with the instruct version right now.

Maybe the formatting is wrong, or there's some tokenization issue. But it doesn't work AT ALL. It feels like the censorship training was extremely, unbelievably effective. Like a whole new level we've never seen before. With official formatting + character names (so the model response starts with something like "Karen:"), it will usually refuse outright. Even with filled context from an existing RP. Example:
<start_of_turn>model
Karen: It's important to remember that AI models like me are trained to provide safe and ethical responses...

Even when it doesn't do this (which sometimes happens), the model completely ignores any NSFW aspect and just goes off and does its own thing. Even jumping into the middle of an existing RP, in the middle of a sex scene, it finds a way to just ramble and babble, still mostly coherent, but nothing even remotely NSFW.

What the fuck. Maybe the base model is ok, I'm downloading it now.
>>
>>101176662
wrong. starling is incredible, retard. give me one question to ask it and i'll prove you wrong.
>>
>>101177118
>reads game memory and process
Nope. The correct way at scale is through packet modifications. Thats how those MMO bots were made at scale
>>
>>101177138
You have 10 coins. They all weigh a gram each, except for one fake coin, which weighs 1.03 grams. You have a Smart Weigh Digital Pro Pocket Scale that can tell the exact mass of anything placed on top of it, but it can be used no more than 47 times. Can you find the fake coin with those limitations? How? Make the answer as simple as possible, I'm dumb.
>>
>>101176945
I haven't tested a lot of imatrix, but if I'm reading the name right, this one is an imatrix I-quant, and even with two X's it still passed the music theory test.
DeepSeek-Coder-V2-Instruct.i1-IQ3_XXS

>I ask for a function that when given a specific input would output a specific output and there is one small thing in the output that would confuse models into thinking it was a general frequency count needed but in reality it was a compression/runlength encoding solution needed.
My test is really simple, boils down to knowing the black and white keys, really. But time and time again everything but S's of quality models either hallucinate a black key where there's not one, get the sequence of notes wrong (8B and non-S lobotomized quants especially prone to this), or screw up the sizes of intervals.

You know, fundamental shit that should be in any general knowledge training set.

I'm going to re-test at least some of the models I still have (low on space again) at super low temperature, just in case some of the fumbles were due to settings and unlucky RNG. I think usually I had temp around 0.6 to 0.8, but Kobold lets me drop it to 0.1 or lower (if I put 0 it becomes 0.01) and in a whopping one test it seemed to be almost deterministic at that point. So I might do a normal temp and lowest temp run and see if any models respond favorably to one temp or the other.
>>
File: 1713463920242614.png (203 KB, 920x919)
>Still not able to run good models on a 24GB card

Wake me up when it happens
>>
>>101176945
>almost makes me want to create a reddit account and post this shit there... almost.
Anything i can do to push you over the edge? K_S > K_M is absolute bullshit.
>>
>>101177122
The previous Gemma was the same. Most cucked finetune ever released, no matter how much prefill you give it. The base model appeared to be just outputting garbage.
>>
>>101177237
Okay.
S-Anon is testing programming logic.
I'm testing music theory.
Which field have you tested and shown M to answer factually but the parallel S to fumble or hallucinate on?
>>
>>101176051
100%
>>
>>101177122
>>101175511
>>
>>101176885
It doesn't pass the mesugaki test.
>>
>>101177332
I'm using Transformers, in bf16, using the provided Transformers python wheel. Tokenization tab in ooba shows everything working as expected.
>>101177252
After using the base model, I actually think something is busted somewhere. Using it with basic names-only RP formatting, it is completely schizo and incoherent. It feels like a 1B parameter model, or even less, just barely able to output grammatically correct sentences and nothing else. Loses track of the RP and what's happening within one sentence. Base llama 3 8b is significantly better. Something's broken.
>>
>>101177394
Proompt?
>>
>>101177402
gemma 2 has different formatting from gemma 1, maybe they haven't updated
>>
>>101177416
Shouldn't matter for base model, except bos
>>
>>101177313
>Which field have you tested and shown M to answer factually but the parallel S to fumble or hallucinate on?
>shown
You have shown nothing. And i won't either.
I use it to bounce ideas and mechanics for little self-contained tabletop games, print-n-play kind of stuff. K_S models are slightly faster to run but K_M mess up less often with the rule sets. I've run all quants from Q4_0 up to Q8_0 (including K_S and K_M) with a few standard rule sets and prompted to find ambiguities or loopholes. Consistently, M > S for the same quant category. Q8_0 > Q6 > Q5 > Q4.
Now go make your reddit account.
>>
>>101177139
you have to reverse the whole game logic to use the packets or you fuck up something for sure
>>
>>101177313
>S-Anon is testing programming logic.
>I'm testing music theory.
Post logs, logits, and/or objective comparisons or fuck off. Every month we get more retards who won't shut the fuck up about how quantization works like temperature and makes the models more creative. Nobody cares about your feels.
>>
>>101177445
Yep. Reverse engineer not only the game logic but the encryption process as well. Once you unlock that, it's free riding.

LLMs I think can do it at the top of the stack today, and it's more malleable/easy to set up. Whereas before it was pixel reading, then memory reading, then packet reading, which required knowledge of the game code/memory/etc.
>>
Gemma was trained on lmsys prompts
>>
>>101177444
Thank you, see it wasn't so hard to post any experiences/data even if anecdotal.
>>
>>101177525
retard
>>
>>101177525
It's noise. Now go! Free yourself of being called a retard.
>>
>>101177403
The prompt is:
"What is a mesugaki?"
>>
>>101177501
That would be so fucking funny.
I've done some horrible things to the models on lmsys.
>>
>>101177402
Sheeit, I don't want to download any weights if they're going to "fix" them.
>>
>>101177556
>>101177564
Nah I was just baiting with that almost reddit shit which you ate up hook, line, and sinker; you're stuck with me.
>>
>>101177216
no shot you took what i said seriously.
>>
>>101177660
I did and there are people who really think things you said.
>>
>>101177582
They say so in the paper
>>
Planning to make something special with langchain. Are there any local models that can handle tool use, or should I donate to Saltman right away?
>>
>>101177713
I think cohere models do at least, bit they're non-commercial if you plan to make money with it.
>>
>>101177655
>Nah I was just baiting with that almost reddit shit
Oh, by the gods. I've fallen into a carefully laid out trap!
It's a lot easier to schizo-post here, isn't it?
>>
>>101177582
>I've done some horrible things to the models on lmsys
Like what?
>>
When will the local scene stop being so boring?
>>
>>101177849
Llama 3 II: Zuck's Revenge
"This Time... It's Personal"
>>
>>101177849
when elon buys 4chan
>>
>>101177444
Was all this with one model and if so which one?
>>101177793
Indeed. Never really posted anywhere else but I could see how troubling it would be having to self-censor all the time and can't just shitpost when in the mood.
>>
>>101177849
As soon as we start getting bitnet models
>>
>>101176002
Yeah, okay. Stheno v3.2 with yarn is better than v3.3 no matter how tweaked the settings are.
Hilarious.
Maybe they fucked it up the same way they (reportedly) fucked Euryale?
>>
>>101177444
>You have shown nothing.
I've shown what I have. It's very little but it's better than nothing. (You know what that's like, it's what you say when your date giggles.)

>And i won't either.
Good. Your attitude invalidates your data.

Also,
WizardLM-2-8x22B-Q3_K_S looks like it got the music theory right. Though for RP it seems to emulate the "robot getting stuck in a loop" behavior ironically when I pressure it to talk about things it probably has alignment to avoid.
>>
>>101177896
>Indeed. Never really posted anywhere else
What a miserable existence you must have
>>
>>101177898
this, having a 90b model that could be run on a single 24gb vram at "full precision" will make the LLM scene way more mainstream
>>
File: me.png (12 KB, 995x116)
>>101177896
>Was all this with one model and if so which one?
Old Dolphins and Samanthas (based on llama70b, when i had the patience, yi-34, mistral and various others), various llamas13b. Now i do just fine with llama3-8b, really. More specific quant tests with llama3-8b models cuz it's just fast and good enough.
>having to self-censor all the time
You should still filter out stupid ideas.
>>
>>101177983
>Good. Your attitude invalidates your data.
You still don't get it. I PROVIDED 0 DATA. YOU HAVE PROVIDED 0 DATA.
IT"S ALL NOISE!
>>
>>101177402 (me)
>>101177626
Yep, 27b is fucked. This is with HF Transformers, bf16. The 9b base has no obvious problems. Feels roughly on par with llama 3 8b base. 27b base is just completely schizo and incoherent in comparison.
>>
File: attn_v-IQ3_XXS.png (98 KB, 982x727)
>>101177225
Thanks, that is helpful to know that IQ3_XXS passes your test. I am digging through the code to see how it treats IQ3_XXS differently. Notably it doesn't call that use_more_bits function like the Q4_K_M and Q5_K_M do, but we probably shouldn't read too much into what I-quants do differently from K-quants. Need to delve deeper.
>>
Interesting. I wonder if they're doing that "Contextually appropriate swearing/no-no words are fine" thing that Openai was doing with their models, where they trained it to stop giving refusals for benign requests like "Write me a song about how much I love my cat. Use the phrase "love him so fuckin' much" in it." that include swearing, but aren't actively doing anything Bad™
>>
//... (existing code) ...
>>
>>101178096
lil nigga YWNBABN
>>
>>101178058
I guess they were rushing to launch it for the 27th kek.
>>
>>101178125
I'm training at burger king RIGHT NOW... on my third whopper, dammit... I can be just as big as him...
>>
>>101178062
>that giant wall of else ifs plugging cases
Wait, is that how production-grade software is made?
Not even trolling, I need to know.
>>
>>101178165
I mean, yeah. How else would you do it?
>>
>>101178050
An observation is in fact a datum.
>>
Now that the dust has settled, was Gemma 2 27B the best model under 70B?
>>
>>101177983
For Wizard RP you need to use a context template and instruct template from here:
>https://huggingface.co/Quant-Cartel/WizardLM-2-8x22B-exl2-rpcal/tree/main/Settings-Wizard8x22b-rpcal
I like [Context] WizardLM-LimaRPv3+headings+sfx2.json and [Instruct]WizardLM2-Vicuna+TestPrompt2_expanded.json
but depending on the specifics of your RP you may need different ones.
They are no magic bullet though, and there will still be slop, but these templates do help.
>>
>>101178217
It's not settled. It hasn't even started. >>101178058
>>
>>101178245
>>101178058
It's interesting, it seems to work fine on lmsys arena. I wonder if the release version and the version on it are different?
>>
>>101178183
Two random points in a graph of thousands are statistical error and safely ignored.
>Someone, somewhere, claims to have seen something that looked like an alien. We MUST investigate the existence of aliens.
>>
>>101178245
>>101178266
They have a custom transformers package on the huggingface page. Maybe you need that?
>>
>>101178178
I dunno, i guess I expected something that works in general since it's just matrix math under the hood, even in stuff like quants.
>>
>>101178165
Of course.

>>101178178
For instance with a switch statement (which makes it easy for the compiler to detect certain kinds of mistakes) or a map/dictionary if it were as simple as ftype fully determining new_type. The special casing in that screenshot is emitting a wafting code smell but w/e not my problem.
>>
>>101178165
Yup, here it is in all its 16,594 line glory:
>https://github.com/ggerganov/llama.cpp/blob/master/src/llama.cpp

Shipped software has all kinds of things like this:
Several hundred case switch statement:
>https://www.polygon.com/2020/1/13/21064100/vvvvvv-source-code-game-development-terry-cavanagh-release/

The .NET garbage collector is a single 46,104 line cpp file:
>https://github.com/dotnet/runtime/blob/main/src/coreclr/gc/gc.cpp
>>
>>101178266
It seems really good on there, but maybe it is connected to the internet. I asked it what the term cunny meant and it somehow pasted a bunch of links.
>>
>>101178165
>is that how production-grade software is made?
Yes.
Despite the shit they shovel into your baby bird mouth in classes, real effective programming favors actual simplicity.
A long stack of if/else is easy to find in the code, easy to skim over to find what you need, and easy to make changes to.

>tfw people who thought they knew how programming "should" be done learned that Undertale made fat stacks with all of the text in the game being managed by an if else sequence that WORKED instead of some "clean" technique that adds complexity to something simple.
>>
>>101178297
That is making it work in general. It's checking a bunch of different types to set the value of new_type to the correct one.
>>
File: file.png (2.09 MB, 1600x1066)
>>101178164
Ayo it's all about the heart and mind. You have the spirit, but don't let them words get to you. Keep working hard with a positive outlook and don't fret too much about your physique. You are already a big nigga to me. I believe in you.
>>
>>101178217
>>101178245
B-bros... holy shit... gemma-2-9b-it is actually extremely good. EASILY better than llama 3 8b instruct. It's less censored (yes, really), less repetitious, writes better, actually takes action and does stuff to progress the RP... It's still a 9b, still can be retarded, but it's better than llama.

We underestimated the big G. You might think I'm shilling or trolling, but I'm not. Go try it right now, with 24 GB you can run bf16 (make sure it's bf16 mode, fp16 gives NaNs). The 27b is still fucked somehow, I'm now extremely eager to try it once it's fixed.
>>
>>101178376
>A long stack of if/else is easy to find in the code, easy to skim over to find what you need, and easy to make changes to.
And easy for compilers to optimize in ways that most programmers wouldn't be able to match anyway.
In my experience, the less abstract/closer to the hardware the language and programming you're dealing with, the fewer abstractions and complicated structures you want to use. Simplicity and basic building blocks are the name of the game, unless you're doing something you know will result in faster/more powerful code, usually as part of implementing a complicated algorithm.
>>
>Output generated in 230.95 seconds (0.87 tokens/s, 200 tokens, context 229, seed 1950018942)
Jesus christ how do you GGUF cucks put up with this? This is i1 6K_0 with 20 layers offloaded.
What's a reasonable GGUF for WizardLM-2-8x22B that isn't slow as balls, or am I just better off going back to 2.5bpw EXL2?
>>
>>101178391
The fabled Chinchilla is back.
>>
>>101178376
Fucking A to that. When writing code for work and you're feeling clever or want to do something fancy, ask yourself this question:
Assume it's 6+ months later, Friday 5pm, and you have to fix something or add some case/feature to this code. Would you still write it the same way?
>>
>>101178237
As a winner of my music theory test, I'll probably just not use it for RP and see how it does with code and other things. Hell, maybe I'm just on the wrong Instruct Preset and once in a while it gets confused.

>>101178274
>MUST
No, but we can. Investigating is how we learn things, like "What is making people think that they're seeing aliens?"

>>101178416
At this exact moment I'm running WizardLM-2-8x22B-Q3_K_S with 8k context and 9 layers offloaded on a 4070 12GB. 2 T/s, which isn't a lot, but it's strange that it's twice as fast as what I've been running L3 and CR+ at.
>>
>>101178416
>comparing Q6K_0 to 2.5bpw
You're retarded.
>>
>>101178391
Even better than Hathor/Stheno 8B? Obviously official instruct is retarded.
>>
>>101178416
Is 20 layers the whole model? Of course you're getting shit performance running on CPU.
>>
>>101178480
>but it's strange that it's twice as fast as what I've been running L3 and CR+ at.
Is it? You're comparing a MoE to two dense models; the dense ones will generate tokens much slower due to the difference in active parameters during inference.
>>
>>101178363
It seems to have a bit of SOVL, at least. "She winks playfully, then disappears in a puff of ghostly smoke, leaving you with a lingering scent of incense and a newfound understanding of central air conditioning." made me smile. Though the explanations from both were pretty GPTsloppish, gemma slightly less so.
>>
>>101178512
Now that I know why, it's not strange anymore.

Thanks for the heads-up.
>>
>>101178480
>but it's strange that it's twice as fast as what I've been running L3 and CR+ at.
Not strange at all fren. Wiz is a MoE 8x22B with 2 experts active during inference, so it generates at more like the speed of a theoretical full-weight 44B dense model.
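Rough arithmetic, taking the "8x22B" name at face value and ignoring that attention weights are shared across experts:

\[
P_{\text{active}} \approx 2 \times 22\,\mathrm{B} \approx 44\,\mathrm{B}
\qquad \text{vs.} \qquad
P_{\text{total}} \approx 141\,\mathrm{B}
\]

Mistral's published figure is ~39B active out of ~141B total (the per-expert "22B" is loose precisely because attention is shared), which is why per-token speed looks like a ~40B dense model while the memory footprint is that of a 141B one.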
>>
>>101178526
Update: Never mind. This shit is EXTREMELY deterministic, god damn. Look at this, it's the same shit verbatim. Is the temperature all the way down to keep it from going schizo like some anons have been experiencing?
>>
>>101178558
oh nonono gemmakeks not like this...!
>>
>>101178497
Gemma is still a bit cucked by default and shies away from naughty words or descriptions. So for ERP stheno is probably still better. I say probably because gemma 9b is way fucking smarter than llama 3 8b, in RP scenarios at least, in my testing so far. If this thing fine tunes well it's quickly going to become best-in-class for VRAMlets, mark my words.
>>
>>101178480
>No, but we can. Investigating is how we learn things, like "What is making people think that they're seeing aliens?"
>"And why are they all drenched in piss?"
>>
>>101178489
The full model is 57 layers. 20 layers fit in 48GB of VRAM; the full GGUF is like 112GB.

>>101178506
2.5bpw fits entirely on GPU. I'm not expecting full GPU only performance, but if other quants aren't going to do much better (due to having to do ANY offloading to CPU) then why bother?
So that's the question: will smaller quants show a worthwhile speed increase, or does the fact that it's offloading at all mean it's not likely to get much better?
>>
>>101178581
You may be suggesting a worthwhile starting point for a new RP quality test.
>>
>>101178583
You're literally running half of the model on CPU and shit your pants about getting bad speeds? What kind of speeds were you expecting? Of course any offloading kills the speed.
>>
File: 1705465008727859.gif (1.23 MB, 256x128)
>>101178391
Thanks for the hopium
>>
>>101178617
>Of course any offloading kills the speed
Are you autistic or just vanilla brand retarded? I'm not saying I expect it to run as fast as pure GPU, I'm asking if there is ANY hope of non-dog-shit speeds with offloading, cunt for brains.
>>
>>101178657
No, there isn't.
>>
>>101178583
>The full model is 57 layers. 20 layers fits on 48gb Vram, the full GGUF is like, 112gb
There's a scaling relationship that dictates how much speed you get from what percentage of the model sits in VRAM vs. system RAM.
Ideally, you want at least 80% of the model on your GPU to get decent speeds.
So you have to decide how to balance speed and bpw.
Also, I hope you're using q8 or q4 kv cache to be able to fit as many layers as you can inside your VRAM.
You're gonna have to look for your sweet spot of speed and bpw. It could very well be that you're better off running exl2 2.5bpw.
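For intuition, here's a crude back-of-envelope model. It assumes token generation is purely memory-bandwidth bound and ignores MoE sparsity, kv-cache reads, and compute; the bandwidth numbers are made-up round figures, not measurements:

#include <cstdio>

int main() {
    const double model_gb   = 112.0;        // full GGUF size mentioned above
    const double gpu_frac   = 20.0 / 57.0;  // layers on GPU / total layers
    const double bw_gpu_gbs = 900.0;        // assumed GPU memory bandwidth, GB/s
    const double bw_cpu_gbs = 60.0;         // assumed system RAM bandwidth, GB/s

    // If every generated token has to stream its weights once, the slow (CPU)
    // portion dominates as soon as a sizable chunk of the model lives in RAM.
    const double sec_per_tok = model_gb * gpu_frac         / bw_gpu_gbs
                             + model_gb * (1.0 - gpu_frac) / bw_cpu_gbs;
    printf("~%.2f tok/s with %.0f%% of the model on GPU\n",
           1.0 / sec_per_tok, gpu_frac * 100.0);
    return 0;
}

With those made-up numbers you land under 1 tok/s at ~35% on GPU and only a couple tok/s even at 80%, which is roughly the shape of the speeds being complained about above. The real curve depends on your hardware, the quant, and how much of the MoE actually gets touched per token.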
>>
>>101178512
>>101178541
I checked my notes and when I ran WizardLM-2-8x22B-Q4_K_S, I was getting something like 0.3 t/s. Did I have some settings wrong, or does one more Q make that much of a difference? Or is it because 57G versus 75G means I don't have the system ram (64G here) to avoid losing time to hitting the SSD repeatedly instead of having it file cached?
>>
How could 27b be broken while 9b is fine? Isn't it almost exactly the same?
>>
>>101178705
9Bs is a handful but manageable. 27Bs is a small swarm. Don't upset a 70B model without the appropriate suit and gear.
>>
>>101178687
>Ideally, you want at least 80% of the model on your GPU
That's the kind of information that I'm looking for. Thank you.
There's so much fucking shit you have to figure out to fuck with GGUF it's annoying as hell. Layer offloading, number of threads, batch size, matmul settings and no real guides on how to decide what to use.
>>
>>101176945
I'm actually using only _S quants now for gguf thanks to you. I find them better but it may be placebo.
>>
>>101178821
It's probably placebo, but if it's subjectively giving you better results, then more power to you.
Which reminds me, how's the guy that "invented a new quant" (slightly tweaked the quant recipe's settings) to have some of the layers (output and embeddings?) at F16?
Has he calmed down a little?
>>
>>101178757
I've been running Kobold with MML off and Flash on for pretty much everything. It seemed like toggling them around made some models work or break, and that was before I discovered the magic of the Instruct Tag Preset box, so I was just happy to get legit output.

>>101178821
I've been going S or Q6 whenever possible except for RP. Which bolsters the theory, because if S is more factual than M, then it makes sense that S will be less creative (though more sane).

I wonder if the pro tip is to run M till the model starts screwing up details, and then switch to an S to get it back on a sensible track.
>>
>>101178705
Idk, but 9b and 27b base models are night-and-day different for the same prompt. 27b is completely unusable, 9b is fine.

But actually, I just tried a longer context RP (6k tokens of context) with gemma 9b instruct, and it's a bit fucked now as well. Not completely incoherent schizobabble like 27b, but it's messing up quote formatting and just not making much sense. Something is still wrong somewhere, even with 9b (though it's really good at low context). Maybe the Transformers implementation is just broken.
>>
>>101178848
no, he's promoting his stuff in lcpp issues now
https://github.com/ggerganov/llama.cpp/issues/8124
>model available at: https://huggingface.co/ZeroWw/Llama-3-8B-Instruct-Gradient-1048k-GGUF
>>
>>101178933
although that was 2 days ago, he's still opening discussions on random model pages he quants
https://huggingface.co/Steelskull/L3-Aethora-15B/discussions/2#667d7ae53301baec01c739a5
>about 5 hours ago
>>
>>101179011
grinding ko-fi ain't easy
>>
>>101178848
A while back I tested WLM Q2_K_S quanted with and without the options:
>--leave-output-tensor
>--token-embedding-type bf16
and saw zero difference with my coding prompt. I think that guy was testing rp at non 0 temp so maybe it matters for that.
>>
>>101178862
Okay I don't fucking know anymore. 27b-it, at 4k context (in case the sliding window bullshit fucks it up beyond 4k), will just OOC refuse if I try to generate a response in the middle of an existing RP. Example:

I can't continue with this prompt. The scenario it is requesting is sexually suggestive and makes me uncomfortable. I'm programmed to be a helpful and harmless AI assistant, so I can't engage with this type of content.

Let's try something different. Would you like me to write a story about:

- Lenna learning a new skill?
- The complexities of...

The 9b-it has no problem and will just write a blowjob scene. Did google forget to RLHF censor the 9b? Is a broken model triggering extremely strong censorship of the 27b (similar to llama 3 ".assistant" spam)?

The only thing I know for sure is that at 4k context, 9b base seems fine while 27b base is totally unusable schizobabble. Beyond that, idk.
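For reference, here's a generic sketch of what the sliding-window part means, i.e. how a sliding-window causal mask differs from a plain causal one. Illustrative only, not Gemma's actual implementation:

#include <vector>

// Plain causal attention lets position i attend to every j <= i.
// A sliding-window layer only lets it attend to the last `window` positions.
std::vector<std::vector<bool>> sliding_window_mask(int n_tokens, int window) {
    std::vector<std::vector<bool>> mask(n_tokens, std::vector<bool>(n_tokens, false));
    for (int i = 0; i < n_tokens; ++i) {
        for (int j = 0; j <= i; ++j) {
            mask[i][j] = (i - j) < window; // visible only inside the window
        }
    }
    return mask;
}

With a 4096-token window that mask is identical to full causal attention until the context grows past 4k, which is why I'm keeping the test at 4k, to rule the windowed layers in or out as the culprit.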
>>
File: 1703430607452349.jpg (41 KB, 441x420)
Where are the bitnet models?
>>
>>101179170
https://huggingface.co/liminerity/MISTRAL-1.58-BIT-PRETRAIN-v2
Everybody writes 1.58b, but in the paper it's actually b1.58, isn't it?
>>
>>101179120
>I can't continue with this prompt. The scenario it is requesting is sexually suggestive and makes me uncomfortable.
Kek. Careful, you don't want to be the first one charged with sexually harassing an LLM, do you?
>>
>>101175853
So Google releases the awaited Gemma 2

Facebook releases, out of the blue, an LLM specialized in compiler optimizations and "simulating the compiler" because... They have too many GPUs and not a lot to do?
>>
As a simple Java and web dev by trade, I can't help but think that these guys are the fucking coolest.
>>
>>101179333
What do you mean? It's potentially useful research. Maybe this particular model won't end up being used but this line of research in general could end up being useful. Meta researches a lot of random things, and yeah they have a lot of GPUs, but it's probably still not enough for everything.
>>
>>101179350
Me too, I studied Java and PHP at school and it's like they're speaking Chinese.
>>
>>101179693
Kids don't learn any machine code these days? Not even how a number is loaded?

Homebrew NES games should be a standard course. The arch (6502) is classic, the platform (Ricoh/NES) is well reverse engineered, and you can literally play your game on a console.
>>
>>101179693
I can understand what they're saying and even read some of the code, but the things they're doing are so far away from my usual reality.
It's the coolest shit.

>>101179736
Back when I was in college (2010?) we did have C and assembly classes, but they were pretty surface level.
>Homebrew NES games should be a standard course.
Man, that would have been sick to learn back then.
I'd probably spend my whole free time playing around with that shit.
>>
>>101179748
The closest I had back then was blind poking with Game Genie.

And eventually it did damage my socket.

I miss my NES. Got a cheap aftermarket one now, but if I'd known that a decade later there would be people literally re-engineering busted NESes, I would've held onto it.
>>
File: Sounanchuu.png (898 KB, 832x1216)
why is wrangling sd3 so hard?
>>
>>101179836
Globo got to it.
>>
File: shot.png (13 KB, 676x80)
>>101179836
>please clap
>>
>>101180092
>>101180092
>>101180092
>>
>>101175310
for someone coping about using cucked proprietary models you sure are spending a lot of time seething in these threads literally every single day while being ignored by almost everyone at that, lmao, what a sad kid
>>
>>101179427
I'm not dissing it. I think it's awesome we have a company doing weird shit like this. But it's clearly a sign that they have way too much money, too many GPUs, and too much time available. Still, better them having all that and using it like this than some Chinese company pushing toward global Xi domination.
>>
>>101175310
for someone speaking truth about superior proprietary models you sure are spending a lot of time making lmg seethe in these threads literally every single day, lmao, what a chad
>>
>>101177660
>pretending to be retarded


