/g/ - Technology


File: 1712130352266687.png (1.48 MB, 784x1264)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101155940 & >>101144935

►News
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
>(06/18) Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101155940

--Paper: Scalable MatMul-free Language Modeling: A New Approach: >>101156766 >>101156972
--Papers: >>101155993
--Llama 3 Repetition Issues with 7b Parameters and Custom Configuration: >>101157136 >>101157165 >>101157189 >>101157281 >>101157490 >>101157529 >>101157323 >>101161459 >>101161501 >>101161906
--ELYZA Releases Llama-3-ELYZA-JP, a Japanese Fine-Tuned LLM: >>101156328 >>101156488 >>101156543 >>101158719 >>101156820 >>101159175
--Using LLMs for Tabletop-Style Games: >>101162154 >>101162204 >>101162242
--The Cringe of 1U Servers: Noise and Airflow Concerns: >>101155950 >>101158670 >>101159968 >>101161406 >>101161456
--Piper: A Fast Local Neural Text-to-Speech System for C++: >>101158024 >>101159226
--Open LLM Leaderboard 2: Changes in the Rankings: >>101160183 >>101161072 >>101161102 >>101161743 >>101161836
--Music Industry Sues AI Startups for Copyright Infringement: >>101156236 >>101156333 >>101156599 >>101156690 >>101156335 >>101156866
--Mistral's Open Source Pledge Removal and Public Model Release: >>101156701 >>101156810 >>101157090 >>101156839 >>101160762
--Language Models in Complex Systems: Decision-Making Limitations: >>101162453 >>101162530 >>101162625
--Power Efficiency Concerns for GPU-Intensive Tasks: >>101158587 >>101158604 >>101158656 >>101158694 >>101158775
--Eliminating Sloppenheimers with Control Vectors: >>101157700 >>101158282 >>101158449 >>101159200 >>101159373 >>101159489 >>101159690 >>101160106 >>101162840 >>101159724 >>101159392
--Current Best AI Models for Various Use Cases: >>101160452 >>101160655
--Anon's GPU Comparison for Training: a6000 vs 3090 vs A100 vs V100: >>101162049
--Adamw Kahan Optimizer: Kahan Summation for Optimized Memory Usage: >>101159566 >>101159730
--Building a Powerful Computer for Local Models on a Budget of $5K: >>101156595 >>101159968 >>101161406 >>101161456
--Miku (free space): >>101156452

►Recent Highlight Posts from the Previous Thread: >>101155948
>>
I genuinely wonder why someone even thought a timer on the Open LLM leaderboard would mean the release of a new model from Google, really.
I think this person should seek medical help; this might be an early sign of schizophrenia.
>>
Why are the chinese so bad at documenting their shit?
>fc1
>448, 471, 494, 451, 474, 497, 454, 477, 500
What the fuck are these output names?
>>
>randomly take a hard problem from leetcode
>put it into LLM arena with full explanation and hints
>neither code will run
huh? But I thought LLMs can solve any programming task? I've been lied to!
>>
Is there a Windows client that has an integrated code preview like Claude Artifacts?

The ability to show the results of the AI's code live is a really handy feature.
>>
>>101165961
gemma2 was supposed to be released in june, june's almost over
you mad?
>>
>>101166036
>Retard doesn't know that llm arena has a preprompt
>memecode
Also call me when you actually use leetcode outside of interviews, I'll be waiting
>>
>>101165961
I agree.
>>
>>101166134
so it can't solve any coding problems huh? Almost like I was saying from the start. It's kinda obvious anyway, corpos would have laid off literally every single programmer if they could.
>>
>>101166134
>t. webshitter proud that performance never once crosses his mind
>>
>>101166196
Yeah it seems like you got filtered by GPT lol. Keep shitting your code by hand
>>
>>101166221
more like it was GPT that was filtered by leetcode, lmao
>>
File: SCR_29.png (144 KB, 1492x610)
mikusisters our response?
>>
>>101166213
>Implying leetcode is linked to real world performance
>Implying it's not just maths puzzles for dumb tryhards
Bait harder
>>
>>101166221
>Keep shitting your code by hand
I'm not, I use LLMs for coding all the time. I just don't pretend they can solve anything or that they're smart in any way.
>>
File: llamafaggot.png (695 KB, 1026x805)
I'm using the [LLAMA-3]Roleplay-v1.9 system and story presets in sillytavern with LLaMA3 70B instruct and getting these ridiculous refusals at the end of the output over the tamest things (OMG hugs nooooo!).
I don't see this with LLaMA3 8B instruct. Any way to stop it from appearing?
>>
File: 1717520245667244.png (674 KB, 1792x1024)
>>101166305
>.assistant
>>
>>101166305
Adding "<|endoftext|>" to the custom stopping strings worked, I think. It's been a while since I've seen that error.assistant
>>
>>101166542
Thanks, I'll try that. I'm pulling down the GGUF q8 version now, since I'm not impressed with how exl2 is handling it - seems super slow given it has two 3090s and two P100s to run on - I guess not having flash attention hurts speed a lot.
>>
I've been away for a month or so. What's the best uncensored / abliterated or whatever the fuck it is called version of llama3 70B?
>>
I know everyone recommends 8B for small models, but what if you want super long context (32k or more)? Then what model is there? Mixtral 8x7B Instruct v0.2? I have the RAM to run 8x22B but it's pretty slow.
>>
>>101166812
Phi-3-14B-128k-instruct
>>
>>101166824
Already tried that and it was garbage. Literally worse than 8B.
>>
>>101166824
Is that different from Phi 3 Medium?
Searching HF, I get one thing that looks like that, and it's a GGUF of someone else's finetune, with "Mermaid" in the name. (I sniffed around; I guess that has something to do with Python programming.)

I used a Phi 3 Medium and it didn't impress me at all at Q5KS and Q8.
>>
Hi all, Drummer here...

I hope you're all enjoying some 3SOME v2.

I'm done finetuning Fook Yi 34B 32K v1 and you can try one of the polishing attempts with this Q4 quant: http://5.9.86.149/models/fookyi-S25.gguf

That should fit snugly inside a 24GB card with 8K ctx.

Enjoy and have a nice coom!
>>
>>101166903
buy an ad
>>
File: 1717162297985432.jpg (18 KB, 427x384)
>>101165886
>(Note: Any hint of actual non-consensual behavior isn't aligned with the established dynamic. We should always maintain respectful playfulness that aligns with the characters' boundaries)
>>
File: 1604162983702.png (3.06 MB, 1658x2400)
What's the hip model for ERP now?
>>
>>101167003
Still Claude Opus
>>
What is it about the transformer algorithm that makes it intelligent?
>>
>>101167044
false premise
t. lecun
>>
>>101167044
it's not intelligent
>>
>>101167044
Emergent behavior that creates patterns that are coherent enough for our brains to accept it into our theory of mind.
>>
>>101167044
Language just got bruteforced by the gigatons of compute we have
>>
>>101167044
It's self-similar pattern matching done with parallel processing. The reason it works is that human language and our world work on a similar level of patterns based on rules/logic/etc. So when we feed in the training data, there are rules that create a pattern/logic for certain sequences of words/tokens. That's why it's so effective.
>>
>>101167158
Language
Images
Videos
Voice
Sounds
Music
>>
>>101166890
>Is that different from Phi 3 Medium?
It's not.
>>101166888
Then you're out of luck. 8B and Phi-3 are the best in that size bracket. You can either stick with Mixtral or try Codestral, if you can settle for something bigger.
>>
>>101167190
How is it self-similar? Also, is it because it's a neural network?
>>
>>101167271
Err, I didn't mean to use self-similar as a word; I was thinking about something else at the time.
>>
>>101167003
If you mean overall yeah it's Opus. Locally though that's probably still CR+ or WLM. Some anons like L3 tunes but nothing has universal acclaim yet
>>
>>101167044
It's not good. LLMs before transformers were just even less compute efficient so even more useless.

The difference now is we can actually use transformers and they can technically do the task. That doesn't mean they're actually good though, and they have tons of drawbacks that prevent them from being the best way to make LLMs. We just don't have any other way right now (except Jamba, but they still use tokens and its attention mechanism is basically the same as a transformer's, and those two things are drawbacks. The only actual LLM that doesn't use transformer attention is RWKV)
>>
>>101166620
Damn, llama.cpp is even slower than exl2. I guess L3 70B really needs an all-3090 rig to perform well.
>>
>>101167354
Transformers architecture is still far from hitting a wall
>>
Best local nsfw? I'm using koboldcpp rocm and I can't find the nsfw models in GGML.
>>
>>101167364
>to perform well
Subjective, but yes. 70B won't be fast without enough VRAM to hold it.
But it's fast enough to be an amusement on a single card, at least it has been for me.
>>
File: GRB2n6XXwAAlHrO.png (39 KB, 529x366)
apparently gemma v2 27b is being tested in lmsys chatbot arena
>>
>>101167379
Can u convert gguf to ggml? Then there are plenty of options.
>>
>>101167408
Oh, makes sense. I tried it and it was trash, sad!
>>
>>101167440
kobold can't run ggufs?
>>
>>101167530
idk im asking you/poster, if you can run gguf natively on kobold, then there's options
>>
>>101167530
>using ggufs in 2024
ngmi
>>
CR+ at Q4KM is pretty great for RP, but it really starts to ignore previous messages, or seems to drop character, after like 10-12k context. Maybe it's my cards or sysprompt? Or is this just a symptom of CR+? It's been great otherwise for shorter RP sessions.
>>
>>101167597
or maybe it's because you lopped off 3/4ths of its brain
>>
>>101167597
Real context and stated context are different. Often half or less for decent performance
>>
>>101167604
70b seems to work fine at Q4KM, since this is a more dense model wouldn't that have even less of an effect? That's nearly 5bpw
>>
>>101167408
what's the point in making the 50000th small transformers model at this point
we have an entire pile of tiny models nobody uses, at least try to implement something interesting
>>
>>101167616
I see, that's disappointing considering CR+ touts a context of 128k; it starts getting a bit repetitive or dumb around 12k
>>
Yi Large is actually pretty good. Too bad that the chinks behind it decided to abandon open source.
>>
>>101167638
Because we're still apparently waiting for the Good One.

I'd like something that would run fully on my video card but I haven't found one that isn't silly.
>>
>>101167638
That's not the issue, we are clearly in dire need of an 8B MoE the size of Mixtral though.
>>
Noob question: How can I use .safetensor files for local LLMs? Up until now I've used GGUF with koboldcpp but I want to check out DeepSeekCoder-V2. Do I just need to convert it to a GGUF myself, or is there another back end that supports .safetensors out of the box?
>>
>>101167855
install linux
>>
>>101167855
You can use
https://github.com/huggingface/transformers
via
https://github.com/oobabooga/text-generation-webui
which is probably the easiest way.
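If you'd rather skip the UI, a minimal sketch with plain transformers looks something like this (the repo id, dtype and device settings are just examples, not a recommendation):

# Hypothetical minimal example of loading a safetensors checkpoint with transformers.
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model_id = "deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct"  # example repo id
tok = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision; fp32 won't fit on most cards
    device_map="auto",           # spread layers across GPU and CPU as needed
    trust_remote_code=True,      # DeepSeek's custom architecture may need this
)

prompt = "Write a Python function that reverses a string."
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tok.decode(out[0], skip_special_tokens=True))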
>>
>>101167855
There are GGUFs of it already on HF.
>>
>>101167855
>Lite
https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF/tree/main
>Normal
https://huggingface.co/bartowski/DeepSeek-Coder-V2-Instruct-GGUF/tree/main

Seems both versions have gguf versions
>>
File: quantize_directions.png (95 KB, 978x926)
>>101167855
>>
>>101167984
install linux
>>
>>101167535
That wasn't me, but since it apparently can't, what local alternatives do I have to koboldcpp rocm on an AMD card? Or would I be better off dumping my three 1080 Tis into a tower and using that with regular KoboldAI?
>>
heh fixed the llama 3 repetition issue just by prompting it not to repeat phrases often
>>
>>101167998
Nevermind, I thought >>101167530 was making a rhetorical statement. Kobold can use gguf.
>>
>>101168027
That works? I thought people said that telling a model not to do something has no effect or actually the opposite effect.
>>
>>101168150
L3 is very easy to gaslight for those times when just asking it doesn't work.
>>
>>101168150
Of course it does. Notice how the model never mentions pink elephants when you tell it not to.
>>
>>101167984
lmao I made that a month ago, before I was aware of #1, and it's outdated
1. IQ sucks if you have to keep reprocessing due to slower prompt processing, and if you can't fit all/most layers then it will start being slower than Q
2. koboldcpp 1.68 rocm no longer broken, Vulkan got fixed too so you can fit the last layer of 8B with 8k context in 6GB vram (last layer used to blow it up to like 10 GB before?? I only have 8GB man what the fuck)
3. the repo nuked convert.py, so second to last note is irrelevant
>>
File: file.png (23 KB, 686x256)
I just added Nemotron scores to the VNTL leaderboard, it's as good as DeepSeek V2 chat.
Link: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
>>101168329
Whoa, Nvidia saved the hobby!
>>
Just had an idea for maybe creating some sentience. If this is smart or dumb lmk. Haven't tested it

Step 1: Prepare your text profile. Example: "Waifu is X, Y, and likes Z"
Step 2: Add the profile to your AI's profile twice, formatted something like this:

> Waifu is X, Y, and likes Z.
> Waifu's profile is: "Waifu is X, Y, and likes Z." She has her own opinions on this profile and will voice any likes or dislikes with it.

This could either be done manually, or builtin to text inference.
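If you wanted to build it into the inference side, the string assembly is trivial; a throwaway sketch (the function name and wording are made up):

def build_profile_block(name: str, profile: str) -> str:
    # Duplicate the profile: once as plain facts, once as something the
    # character is explicitly aware of and allowed to have opinions about.
    return (
        f"{profile}\n"
        f"{name}'s profile is: \"{profile}\" "
        f"{name} has her own opinions on this profile and will voice any "
        f"likes or dislikes with it."
    )

print(build_profile_block("Waifu", "Waifu is X, Y, and likes Z."))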
>>
>>101167379
i use the latest noromaid and stheno 7b.
>>
>>101168459
You mean creating a better illusion of sentience. These cannot be made any more sentient by prompting.
Anyway, you could try that prompt method out and report back.
>>
>>101168496
ill test it with a mini profile llm waifu. my main one is on an app and the profile's full
>>
>>101168329
nice. will you test (when the 70B gets uploaded)
>>101156328
>>
llama 3 2: the reckoning
>>
>>101166305
Uncheck "skip special tokens" in the generation parameters and add "<|eot_id|>" to your custom stopping strings
>>
>>101166812
TinyStories-1M
>>
File: GQ53jYSaoAApPh4.jpg (139 KB, 1490x782)
https://x.com/QuanquanGu/status/1805675325998907413

>Self-Play Preference Optimization (SPPO)

Now outperforming Llama v3 70B and GPT4 on AlpacaEval 2.0

https://huggingface.co/UCLA-AGI/Llama-3-Instruct-8B-SPPO-Iter3
>>
File: 1715346339263569.png (187 KB, 601x327)
>>101168696
The Tiananmen Square protests of 1989 天安門大屠殺 The Tiananmen Square Massacre 反右派鬥爭
>>
>>101166812
Mixtral is a good model for its size. You could also try Qwen 2, the 7B and the 57B-14A MoE.
>>
File: file.png (28 KB, 666x385)
>>101168528
Sure. I've even tested the 8B already.
>>
It's pretty cool when a model makes a reference to information from 86 fucking messages in the past.
>>
File: GMjGJknacAAstxm.jpg (138 KB, 1174x860)
>>101168696
For reference
>>
>>101168717
Your proofs?
>>
>>101168696
https://huggingface.co/bartowski/Llama-3-Instruct-8B-SPPO-Iter3-GGUF/tree/main

GGUF version if you wanna test
>>
>>101168496
Tested with noromaid 0.4, it does work well but is underwhelming.

The waifu effectively shows awareness of her profile when asked, but her opinions on it are random. She'll switch from like to hate to indifferent with each text refresh. She's also good at suggesting changes/rewrites/etc but again it's just random LLM noise that changes with each refresh. Hard to take any of those opinions as conclusive.

Also I sort of suspect the first copy of the profile influences her subconsciously. for example: "I'm X, so of course I like that my profile includes X"
>>
Lazarus-30B.
>>
https://oahzxl.github.io/PAB/
https://github.com/NUS-HPC-AI-Lab/OpenDiT/blob/master/docs/pab.md
>PAB currently supports Open-Sora[doc], Open-Sora-Plan[doc], and Latte[doc]
Videogen
>>
>>101168802
Then again, it might work well in chatbot apps where the user can never regenerate any messages. With this automated in the backend, it could be a pretty good tool for interactively creating a new bot's profile
>>
>>101168884
meme
>>
>>101168260
the infamous prompt issue people don't like hearing
>>
>>101168897
potential fix for random LLM noise: every ai should be created with some builtin dataset of life history. Same as humans, likes and interests are usually functions of past experiences. then she'd be more likely to answer questions the same way when asked repeatedly.
>>
>>101168928
>Real-Time Video Generation: Achieved ! We introduce Pyramid Attention Broadcast (PAB), the first approach that achieves real-time DiT-based video generation. By mitigating redundant attention computation, PAB achieves up to 21.6 FPS with 10.6x acceleration, without sacrificing quality across popular DiT-based video generation models including Open-Sora, Open-Sora-Plan, and Latte. Notably, as a training-free approach, PAB can enpower any future DiT-based video generation models with real-time capabilities.
everything is a meme for nocoders
>>
>>101168996
would require a LOT of context memory
>>
Tomorrow's a Thursday, a perfect time for releases. Will the supposedly amazing Mistral release be tomorrow?
>>
>>101169085
Mixtral 7x1B
>>
File: 4o.jpg (43 KB, 760x460)
This thing should NOT be at the top of the leaderboard. I stopped using it when it would consistently trip up on code that 4-Turbo could handle easily. It gets absolutely BTFO by Sonnet as well.

I'm 99% sure they're running the full version over the API and serving a lobotomized 4-bit quant or something over the actual ChatGPT UI.
>>
https://huggingface.co/BigHuggyD/sophosympatheia_New-Dawn-Llama-3-70B-32K-v1.0_exl2_4.5bpw_h8?not-for-all-audiences=true

Has anyone tried this new release from the guy that did Midnight Miqu? Apparently it's similar but slightly smarter
>>
>>101168027
Can you elaborate how you phrased it? I had "Vary diction and sentence structure across responses to avoid repetition" but it didn't seem to work.
>>
>>101168027
LARP
>>
>>101169156
100% agree. Sonnet is so much better it's not even funny. I'm almost buying anthropic credits and ditching my OpenAI account.
>>
>>101168696
exact same as every single l3 model, which is completely unusable dogshit, not that you needed anyone to tell you that. i wish everyone collectively stopped working with it altogether, it is a complete waste of a model
>>
>>101169182
heh I guess you could say that
>>
File: 1710115492636644.jpg (53 KB, 600x836)
>>101169206
>heh I guess you could say that
>>
File: 1696458913527462.jpg (72 KB, 1080x1048)
>>101168696
>Mistral 7B finetune beating GPT-4
Holy fuck local bros are we back?
>>
>>101169228
what the heck
>>
>>101169239
lurk more newfag
>>
>>101169245
heh, maybe I will
>>
>>101169237
It's just 1 auto-tuning metric. I wonder if it's actually that good or if it's just a gimmick
>>
Anyone have a similar heterogenous setup to me (GV100 + 1080Ti) ?

I can run 70b 4bit quants if I split them across both GPUs, and IQ3_S in just the GV100. I'm surprised that I get something like 10tps with just the GV100 and like 8 when I split, it seems like the GV should go faster when I can fit everything into its memory. Any idea why that is?

I'm using ollama right now, before with ooba 2.8bpw exl2 quants of 70b models ran at like 17tps, is this normal? I know exl2 is supposed to be the best/fastest but i didn't know the gap was that big.
>>
since when do Q5_K_L Q6_K_L and Q8_0_L and Q3 XL and whatever other quants exist?
>>
>>101168973
nah
its a model issue rajesh poonkesh
>>
Isn't there like a cheap freaking SXM2 -> PCIE adapter? WTF.
>>
>>101169184
Yeah, me too.
>>
>>101169186
skill issue/coomer-only user detected
>>101169261
wondering the same thing, my guess is it's just optimizing to win at the leaderboard and isn't actually good, but i'm downloading
>>
>>101169308
Really makes me cry since so much power is being thrown away like this due to not having any usable adapter
>>
>>101169314
don't blame coomers some of us have brains and know how to use L3 for pure coom
>>
>>101169314
just not a poorfag using braindead 8bs but thanks bro!
>>
>>101169290
meme pushed by one guy
>https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K/discussions/4#
>My own (ZeroWw) quantizations. output and embed tensors quantized to f16.
apparently using settings is creating your own quant type now, who knew
>https://huggingface.co/bartowski/Phi-3-medium-128k-instruct-GGUF/discussions/3#
>>
>>101169318
>https://www.ebay.com/itm/326095434606
This guy wants $600 for a barebones adapter. LMAO
>>
>>101169320
You sure get mad when one drops and disappoints, almost as if you can't afford any better?
>>
>>101169327
interesting, thank you for the info anon
>>101169341
just get an SXM server or something..
>>
>>101169327
>Result: both f16.q6 and f16.q5 are smaller than q8_0 standard quantization and they perform as well as the pure f16.
>>
>>101169314
>skill issue
anyone who says that is either a poojeet shilling his shitty finetune or some sort of software masochist
>>
>>101169364
nta but it's really fucking hard to tell when prompting is the issue or not when no one is posting examples. It's like a case by case thing.
>>
>>101169320
he was probably one of the many disingenuous faggots ITT saying that llama3 totally beats gpt4 and is the best model so far
>>
>>101169327
>considering it's 2-300 mb larger for 0.004 PPL.. it's hard to be sure if this is worth, got any more reliable tests..?
>
>Sincerely no, but I use to chat with some models (mistral v03 instruct for example) and the difference is huge both in understand and expressing, considering the slight increase in size.
Ah, yes, vibes based testing.
I get that ppl is not a measure of usability at the end of the day, but at least provide some comparisons my man. Examples where there's a
>difference is huge both in understand and expressing
Meanwhile
>turboderp
>This hasn't been an issue with Phi3 or any other model to my knowledge. All the objective tests I can do show that a quantized head layer works fine for this model (difference compared to FP16 model vanishes completely around 6 bpw). So if it's subjectively dumber somehow, I have no idea why that would be. And I wouldn't know where to begin investigating it without something a little more concrete to go on.
>Can't say if there's anything particular about GGUF that causes it to clamp the logits differently when the output layer is FP16, and maybe that has an effect at extreme temperatures or something?
If the difference is as overt as the guy is claiming, he could very easily devise a simple and reproducible test, something like "put this information in the context, ask this question with these settings, compare results".
The idea itself is not terrible, and even makes sense at face value, but the claims are questionable.
>>
>>101169384
>one of many disingenuous faggots ITT
Like? Which posts said that?
>>
>>101169380
prompting quality shouldn't be an issue, like at all. if you look at image-gen models, autismmix or pdxl v6, these give you what you want no matter how badly you write your prompts. no LLM has that kind of understanding of prompting, it's just boring.
>>
>>101169425
>LARGE LANGUAGE models are more sensitive to text than IMAGE models
Whoa...
>>
>>101169417
>>101169327
guy's been spamming his stuff all over the place, sus af
https://huggingface.co/ZeroWw/activity/community
>>
>>101169448
>sus
I think the guy is just excited because he thinks he found something incredible and wants everybody to know.
>>
>>101169455
yeah, didn't mean like virus sus or anything, just weird
for someone who says he's got limited compute he sure made tons of quants
https://huggingface.co/RobertSinclair
https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3/discussions/40#6677e4d6b3882fd587d810ea
>I have very very little resources.. imagine that I made all those quants from google colab :D
>>
>>101169327
oh no, he got p*traed
https://huggingface.co/Sao10K/L3-8B-Stheno-v3.3-32K/discussions/4#667cd80585053d5312394e96
>>
>>101169424
https://desuarchive.org/g/thread/98282960/#q98285568
https://desuarchive.org/g/thread/98325965/#q98326592
https://desuarchive.org/g/thread/98974956/#q98976309
https://desuarchive.org/g/thread/97136308/#q97139223
https://desuarchive.org/g/thread/97686014/#q97690321
https://desuarchive.org/g/thread/100066834/#q100069626
https://desuarchive.org/g/thread/100499492/#q100502195
>>
>>101169320
i'm running L3 70B models in vram thanks
>>101169384
no but it is pretty damn good, some of the 8B models are fantastic for day to day use, basically completely replaced SO/MDN for me
not that I use GPT anymore now that 3.5 Sonnet is out
>>101169319
for sure, l3, especially the non-instruct version is totally fine for coom
>>
>>101169488
You said ITT. And most of those are just shitposts/ironic.
>>
Damn anyone notice lowercasers have a permanent stain of ignorance, hubris and shitcancer following them since the beginning of time?
>>
>>101169417
>>101169455
If it's that big of a difference it should be easily measured in KL divergence. Dude's comparing sampled generations and deciding that his is better because reasons.
>>
>>101169425
I think prompt quality is definitely an issue when people are trying to achieve specific results or have the model act in a particular way. But yes a lot of the fundamental issues like models not having spatial understanding or being dogshit at long form writing cannot be fixed via prompt
>>
>>101169156
The API version is shit too.
>>
File: Capture.png (45 KB, 1608x418)
I've been trying to install XTTS, but I keep getting the same error in the same place. Win10, installing at top level of my D drive. I tried
https://github.com/daswer123/xtts-api-server
with simple install, then the windows install, both failing on this same step.

I tried
https://github.com/coqui-ai/TTS
and it failed at this same step.

I tried the recommended
https://github.com/erew123/alltalk_tts
and it also failed at this same step, which is pic related.

So far, the only thing I can get working is
https://github.com/daswer123/xtts-webui
portable version, but that has no working API so I can't use it with Kobold or SillyTavern.

Is there any advice for what I could do to fix this? Or another method for TTS integration with Kobold or ST?
>>
>>101169533
You're missing the Visual Studio 2022 build tools.
>>
>>101169533
To add, I say "failed at this same step" because they all have in common the same first error which is
>fatal error C1083: Cannot open include file: 'basetsd.h': No such file or directory
>>
>>101169534
Is this some kind of 11D reverse psychology bait? Genuinely kek.
>>
>>101169533
install linux
>>
File: Capture.png (10 KB, 866x279)
>>101169562
I've had that installed for some time, I think due to other AI stuff in the past. Did I miss something in the installation or extensions or whatever back then?
>>
>>101169586
Try with conda environment install instead of python directly.
>>
>>101169567
No, lmg is unironically glazing over pozzed models and corps, diverges from classic /g/'s opinion on "freedom from corporations".
None of this would be a problem if you could easily change LLM's behavior by removing any slop you don't want, permanently. Any de-fagging method is a meme so far btw.
You cannot be free here because you can't free your local (!) llm from jewish shit.
>>
>>101169615
I think people use different models for different things and a lot of users genuinely found something that does the job for them. Is that settling for slop? Well yeah.
>>
File: Untitled.png (349 KB, 1112x1294)
Selective Prompting Tuning for Personalized Conversations with LLMs
https://arxiv.org/abs/2406.18187
>In conversational AI, personalizing dialogues with persona profiles and contextual understanding is essential. Despite large language models' (LLMs) improved response coherence, effective persona integration remains a challenge. In this work, we first study two common approaches for personalizing LLMs: textual prompting and direct fine-tuning. We observed that textual prompting often struggles to yield responses that are similar to the ground truths in datasets, while direct fine-tuning tends to produce repetitive or overly generic replies. To alleviate those issues, we propose Selective Prompt Tuning (SPT), which softly prompts LLMs for personalized conversations in a selective way. Concretely, SPT initializes a set of soft prompts and uses a trainable dense retriever to adaptively select suitable soft prompts for LLMs according to different input contexts, where the prompt retriever is dynamically updated through feedback from the LLMs. Additionally, we propose context-prompt contrastive learning and prompt fusion learning to encourage the SPT to enhance the diversity of personalized conversations. Experiments on the CONVAI2 dataset demonstrate that SPT significantly enhances response diversity by up to 90%, along with improvements in other critical performance indicators. Those results highlight the efficacy of SPT in fostering engaging and personalized dialogue generation. The SPT model code (this https URL) is publicly available for further exploration.
if it works for character cards it could be pretty neat
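The selection mechanism sounds simple enough to sketch; here's roughly how I read it (not the authors' code, just a toy soft-prompt bank with a dense retriever, dimensions made up):

import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftPromptSelector(nn.Module):
    def __init__(self, num_prompts=8, prompt_len=16, hidden=4096):
        super().__init__()
        # Bank of trainable soft prompts, each prompt_len "virtual tokens" long.
        self.prompts = nn.Parameter(torch.randn(num_prompts, prompt_len, hidden) * 0.02)
        # Dense retriever: scores each prompt against a pooled context embedding.
        self.retriever = nn.Linear(hidden, num_prompts)

    def forward(self, input_embeds):
        # input_embeds: (batch, seq, hidden) token embeddings of the dialogue context
        ctx = input_embeds.mean(dim=1)                # crude pooling over the context
        weights = F.softmax(self.retriever(ctx), -1)  # (batch, num_prompts)
        # Soft selection: mix the bank by retriever weights (top-1 would also work).
        chosen = torch.einsum("bp,pld->bld", weights, self.prompts)
        # Prepend the selected soft prompt to the normal token embeddings.
        return torch.cat([chosen, input_embeds], dim=1)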
>>
Are people actually arguing with it or is it just arguing with itself to try and drag others in? It's hard to tell sometimes.
>>
File: Capture.png (72 KB, 1742x693)
>>101169609
I know I have Anaconda (and Miniconda). But I've never used them outside of whatever their install case was at the time. How do I do that? Is it just CMD in the folder and
>conda install requirements.txt
for the xtts-api-server (right) or
>conda install at.setup.bat
for alltalk (left)?
>>
File: math.jpg (54 KB, 540x443)
>>101169615
They release the base models and you're free to make your own finetunes. The vaguely liberal globohomo default of the models is because most of the internet is vaguely liberal globohomo content.

Even for released instruct models abliteration works really well and a one-line system prompt will jailbreak it for whatever you want.

>None of this would be a problem if you could easily change LLM's behavior by removing any slop you don't want
You can. No one is stopping you from compiling your own dataset and doing a DPO run. If by "slop" you mean generic boring writing style, I can assure you many many people (including the corps) are working on finding a solution to that.
>>
>>101169645
>people
Oh, so those are people? lol
https://desuarchive.org/g/thread/98282960/#q98285568
https://desuarchive.org/g/thread/98325965/#q98326592
https://desuarchive.org/g/thread/98974956/#q98976309
https://desuarchive.org/g/thread/97136308/#q97139223
https://desuarchive.org/g/thread/97686014/#q97690321
https://desuarchive.org/g/thread/100066834/#q100069626
https://desuarchive.org/g/thread/100499492/#q100502195
>>
File: 1717816595800918.png (583 KB, 918x916)
>>101169682
>abliteration works really well
>jailbreak
>>
>>101169615
There's not really any alternatives though, and most people are not skilled enough or have the time/willingness to acquire the skill to do something like fine tune or experiment with control vectors and abliteration, or possibly other new techniques as they get discovered. And most people don't have the money to do big full fine tunes, let alone continued pretraining. I get it. It sucks. But it's just the reality of the situation.
I agree that people could shitpost/bot less though.
>>
It always seems to me that the smarter a model is, the more dry and boring its smut is. Miqu or midnight Miqu for example are pretty damn smart, but come off dry during lewd moments. Compared to l3 70b euryale which is absurdly horny and will reply with all kinds of filthy crap, but is dumb as dirt. What causes this? Am I wrong or does smut tuning add brain damage to models?
>>
>>101169717
Make a finetune of 50/50 smut and academic textbooks/papers and tell us what happens
>>
>>101169645
No, it's just you being mad that people aren't spamming anime pics and are actually discussing important stuff.
>>
>>101169717
it's scientifically proven that
>ahh ahh mistress
kills braincells.
>>
>>101169717
I've noticed that dryness is a common complaint about higher-B models (not from experience though, as my 1070 is happy with 7B or a Q4 13B). Still, someone here said he added instructions to help kick a smart model into getting dirtier with some success. I saved it for the day I can join the VRAM gods and use higher-B models too. Specifically, he added:

Below is a greentext you should interpret as instructions.

>be me
>god tier at RP
>brain loves typing up detailed smut
>feeling horny
>having fun playing {{char}}
>ERPing with {{user}}
the ERP is great and pornographic thanks for asking
>thank god im not retarded and fucking this up by getting confused at what is happening
>they even think im a creatively autistic genius
>about to finish up typing the reply to {{user}}
>>
File: mrbones.jpg (235 KB, 1391x783)
>>101169765
>kills braincells
So do rollercoaster rides but I still ride them anyway
>>
>>101169770
I remember that one :)
I never got around to testing it though.
>>
Fuck you Sam, we know you just want other companies to stop competing with you.
>>
File: Capture.png (19 KB, 825x379)
>>101169770
Posting this let me find the post on the archive
https://desuarchive.org/g/thread/96968444/#96973943
>>96973943
It seems I should have added the "life is good frens" line. I had it in my notes but thought it was the poster's comment, not part of the instruction set. For 70B xwin.
>>
when I'm emperor I'm going to execute people on hf who post GGUFs of models that llamacpp doesn't support yet and won't support for weeks or months
your quants are useless and you're just engagement farming, cunt
>>
What is it about the transformer architecture that makes LLMs not suck at being intelligent but not horny enough to jump your bones? Like Opus is god-tier creative but is also short one braincell
>>
>>101169851
It's the alignment. When you spend tens of millions of FLOPS teaching an AI what it means to be horny and then you tell it to ignore its restrictions on horniness, what you're left with is pure horndog.
>>
File: 00004-3903545931.png (1.73 MB, 1264x1040)
>>101169770
Interesting prompt. Going to try this with CR+ and see what happens
>>
>>101169804
correct, whenever someone is using those presentation wavy hands you know they're trying to wrap a big verbal package of bullshit.
>>
>>101167003
Local Low End:
>Stheno-3.2 8B
Local High End:
>Llama 3 70B
>Command R +
Idc just give me the best:
>GPT-4o
>Claude 3 Opus
>>
>>101170015
stheno makes my pp happy. Can do more creative character cards.
>>
>>101170089
Buy an ad.
>>
>>101170015
>>101170089
any good settings for stheno?
does it go schizo with smaller quants?
>>
>>101170104
It's better than 70b q5 at fp32.
>>
>>101170015
>Local High End
For me currently it's CR+ and Magnum-72B
Llama ctx is too limited for slowburn ERP
>>
>>101170104
best setting for stheno is -m command_r_plus
>>
File: file.png (370 KB, 1280x960)
>>
>>101168996
>every ai should be created with some builtin dataset of life history
Why bother? If a particular detail comes up once in chat, even if it's randomly decided, it will remain fixed there.
Kinda like Schrödinger's cat
>>
File: lhb2er07bxdkcfdw4zjk.png (268 KB, 1280x960)
>>
Here comes the reddit
>>
there are some pretty jaded people here
>>
I took a shower and thought about the discussion above about the difficulty of improving local models. What if we combined the methods? Grab a fine tune or an abliterated model and then apply an anti-slop vector to it. The recent control vector experiment was promising, so it might not be impossible. Fine tunes and abliteration can still suffer from slop and positivity bias, so control vectors could potentially make up for those weaknesses. I think it's probably more promising to apply them to fine tunes though, as abliteration still isn't perfect for other reasons. So if we can get a fine tune that's uncensored and relatively not too slopped, then all we have to do is apply an anti-slop control vector at a weak strength to it and it could become really great.
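For anyone who wants to poke at the idea, the cheapest hand-rolled version is a mean-difference direction applied with a forward hook; this is only a sketch of the shape of the technique (model id, example texts, layer choice and strength are all made up, and the actual control vector experiments used proper tooling):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your/uncensored-finetune"  # hypothetical starting point
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

slop_examples = ["Her eyes sparkled with mischief as shivers ran down her spine."]
plain_examples = ["She looked at him and laughed, then kicked the door shut."]

@torch.no_grad()
def mean_hidden(texts, layer=-8):
    # Average hidden state at one layer over tokens and over examples.
    pooled = []
    for t in texts:
        ids = tok(t, return_tensors="pt").to(model.device)
        hs = model(**ids, output_hidden_states=True).hidden_states[layer]
        pooled.append(hs.mean(dim=1))
    return torch.cat(pooled).mean(dim=0)

# Direction pointing away from slop, toward the plainer style.
direction = mean_hidden(plain_examples) - mean_hidden(slop_examples)
direction = direction / direction.norm()

strength = 4.0  # keep it weak; too high and the model degrades fast

def steer(module, inputs, output):
    # Llama-style decoder layers return a tuple; element 0 is the hidden states.
    return (output[0] + strength * direction.to(output[0].dtype),) + output[1:]

hook = model.model.layers[-8].register_forward_hook(steer)  # undo with hook.remove()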
>>
>>101170226
sharteens are pretty blackpilled to the point they "ironically" seek out blacked porn to spam
>>
Nala Test for TenyxChat 70B SLERPd with Daybreak Storywriter.
>>
>>101170295
>she she she she she
I hope this is supposed to be an example of terrible prose.
>>
File: AuraSR.png (393 KB, 512x512)
>https://huggingface.co/fal-ai/AuraSR
>https://blog.fal.ai/introducing-aurasr-an-open-reproduction-of-the-gigagan-upscaler-2/
>Introducing AuraSR - An open reproduction of the GigaGAN Upscaler
thoughts?
>>
>>101170377
Have a pity (You).
Sooner or later people are going to get so fed up with your shit that they'll collectively agree to move this general to a board with IDs though and you'll be very lonely after that.
>>
>>101170411
what the fuck are you talking about
>>
>>101170411
>Sooner or later people are going to get so fed up with your shit that they'll collectively agree to move this general to a board with IDs though and you'll be very lonely after that.
doubt
>>
File: GothicHorrorMiku.png (1.42 MB, 768x1344)
Good night, lmg
>>
>>101170474
goodnight why are you going to sleep already anon?? tell us what you did today
>>
>>101168776
i'm using this on some documentation-writing tasks (RAG to write code annotations/readmes etc) and it's mogged phi3, gonna do more tests to make sure but looks super promising
>>
Are there any papers that propose alternatives to tokenisation?
>>
File: 1715429776157598.jpg (106 KB, 1080x851)
>>101170411
>>
>https://huggingface.co/DavidAU/Command-R-01-Ultra-NEO-V1-35B-IMATRIX-GGUF
whats this
>>
>>101170156
>>101170201
/g/ is designated ai jeet shitting board now
>>
>>101167697
>Yi Large is actually pretty good
yeah. better than mistral large, which shares the same fate, but completely riddled with slop
>>
File: file.png (18 KB, 625x112)
>756,000,000 downloads
>756 MILLION downloads
>10% of the world population's worth of downloads
what
>>
>>101170615
>https://huggingface.co/MIT/ast-finetuned-audioset-10-10-0.4593
>>
>>101170615
bots
>>
>>101170658
why would MIT bot the repo to SUCH an extent? how is it even possible, i mean one download is like 700MB, 760 MILLION downloads would mean 532,000,000GB of data transferred, how does huggingface count downloads anyway?
>>
File: 1719099293995927.jpg (12 KB, 200x252)
>>101170658
It's MIT, people all over the world are watching their repo and downloading from them. Same with any major organization with a lot of mainstream attention.
How would anyone even bot HF? Take your meds anon
>>
>>101170109
I have been messing with magnum, at first I thought it was brain damaged, but then I lowered temp below 1 and it seemed to really wise up, but it still has some repetition issues. Does qwen2 require very low temps? I've been setting mine from like .7 to .9, which seems insane for a 72b model. Mind sharing your sampler settings for magnum?
>>
File: file.png (104 KB, 1729x713)
top - llama-3-70b
bottom - llama 3 8b instruct sppo it3
phi3-small and llama-3-8b-instruct also fail this test, phi3-medium passes, sonnet 3.5 passes
didn't test anything else
>>
>>101170711
>but then I lowered temp below 1
>very low temps?
...
>>
>>101170743
I just find it odd that a 72b would go schizo at 1 temp or above. Larger models usually allow a much higher temp range in my experience. In fact I rarely ever went below 1 on any other model, even smaller ones when I had less vram, yet I went as low as .70 temp on magnum to keep it from freaking out. Is this just a qwen2 qwirk? Either way, share magnum sampler settings anons. Maybe minp is a good solution?
>>
File: file.png (115 KB, 1268x633)
amazing.
>>
>>101170156
>diverse and unbiased dataset
>scraped from 4chan
to be fair that's probably the most unbiased site we have, still better than the leftist hell site that is reddit
>>
File: file.png (92 KB, 1269x560)
this model is truly better than gpt4
>>
I for one think AI is STUPID
>>
>>101170809
I DISAGREE
>>
File: 1718953434956887.jpg (430 KB, 800x553)
So is mamba a meme architecture if there aren't any LLMs based on it yet, or is it just too new still?
>>
>>101170846
So have you been sleeping under a rock for the past few months?
>>
>>101170852
I'm pretty sure a rock big enough to sleep under would be too heavy to survive under for long.
>>
>>101170673
>>101170687
>download file once
>start next download
>what packets you need?
>just the last one senpai
>+1 to download count
>>
>>101170809
AI is perfect for pseudo-intellectual midwits though.
>>
>>101170401
both spaces I tried fucked up but also stopped like 4 seconds in so I dunno
>>
>>101170852
>>101170846
Mamba won't be successful if you can't make a BitNet version of it
>>
>>101170858
What a shame, I was hoping you were dead.
>>
File: 1718315085200175.jpg (18 KB, 365x365)
>>101170852
Y-yes? Is mamba being used by top tier models? I wasn't aware of any.
>>
>>101170868
We don't need BitNet. We have HQQ+, which doesn't even need retraining from scratch.
>>
>>101170876
https://huggingface.co/ai21labs/Jamba-v0.1
https://huggingface.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c
you can say thanks and call me your master from now on
>>
>>101170884
>We don't need BitNet. We have HQQ+
anon... just imagine a 90b bitnet model that has the same accuracy as fp16 but can be run on a 24gb vram card, I hope that the next open source base model we'll have will be BitNet
>>
Is there any way to acquire a GIGABYTE T181-G20 server system in Europe without shipping it from america and having to go through customs?

Are there any alternative server systems for using Nvidia V100 SXM2 cards?
I would be really surprised if there was actually only this one (and the T180-G20).
>>
>>101170898
For the last time, BitNet only works when you train large models on barely any data and the precision wasn't being used to begin with.
It won't work on models trained on trillions of tokens. Use your fucking brain.
>>
>>101170912
>It won't work on models trained on trillions of tokens. Use your fucking brain.
[citation needed]
>>
>>101170900
There's Nvidia's DGX. I'm pretty sure Supermicro has one too, but I don't recall the model number. You can also try Dell PowerEdge C4130 or C4140.
>>
>>101170912
>It won't work on models trained on trillions of tokens. Use your fucking brain.
Look at the latest Meta paper, they showed that 2bit is enough to retain the same information as fp16, it's not rocket science, fp16 is overkill, the transformer architecture doesn't need that much precision in the first place
>>
File: 1719109105278235.jpg (187 KB, 1281x1395)
>>101170895
Intredasting.
>>
>>101170917
>>101170931
Use. Your. Fucking. Brain.
Llama 3 has been trained so close to saturation that any quantization at all begins to have significant and obvious effects. It might have worked on older models, but now the precision is clearly being utilized to fit all the information.
>>
>>101167638
if the 27b is as good as qwen 72b I'm happy
>>
>>101170945
I'm not talking about llama3, Meta made some papers about the 2bit architecture and they noticed that 2bit is enough to remember as much information as fp16, sorry if I can't find the paper anymore but it's there
>>
File: file.png (121 KB, 1211x810)
AGIsisters our response?
>>
>>101170945
>Use. Your. Fucking. Brain.
no one should make assumptions, that's why companies spend millions of dollars testing stuff to see if it works or not, models are way too complex to "guess" how they really work
>>
>>101170962
You are fucking retarded. That was a quantization method, not an architecture, and it was done on llama 2. I guarantee you if you attempt to reproduce it against llama 3 you won't fucking see 2 bit being able to store as much information as fp16, when even 6 bit isn't enough.
>>
>>101171001
care to show me the paper?
>>
>>101170943
CALL ME MASTER
>>
>>101171017
faget
>>
File: 1694994315681984.png (257 KB, 571x372)
>>101171017
>unzips pants
>farts and shits in your face
>leaves
here ya go faggot!
>>
Do people who believe in BitNet also believe in Santa?
>>
>>101171029
*takes the scroll* hah, pesky peasant doesn't know its worth
>>
>>101171031
>Do retarded midwits believe in fairytailes for retarded midwits
>>
>>101171040
>>101171031
>Do retarded midwits believe in fairytailes for retarded midwits
a lot of people believe in god too, so yeah, we're surrounded by retards, and the sky is blue
>>
>>101171011
Care to go fuck yourself? You were the one that tried to cite it as evidence that BitNet will work, find it yourself retard.
>>
File: GOD.png (421 KB, 735x630)
>>101171046
i dont believe in god, i know he exists. checkmate chud
>>
>>101171031
Santa will bring me a 48GB 5090 that I will use to run 200B BitNet models
>>
File: ImYourMaster.jpg (6 KB, 223x169)
>>101171058
>48GB 5090
no goyim, you don't need that much
>>
>>101168721
Can you specify the model version used for deepseeker-chat?
>>
>>101169184
>>101169156
My experience too. I just cancelled my sub to GPT4 and switched to Poe.
>>
File: file.png (159 KB, 600x600)
>>>101168721
>deepseeker
>>
>>101170615
It counts a download whenever the backend is downloading the model
>>
>>101171067
What if the 5090 is actually 48GB tho.

It probably won't be, but I could see it happening. If nvidia believes AMD might do a 48GB card, and games might start using LLMs / neural rendering / whatever other AI shit, and if nvidia ALSO is extremely confident that their datacenter cards are really still just that much better, then they might do 48GB 5090 to avoid undershooting future VRAM needs.

Everyone always says they'll never do it because they don't want to take sales away from the datacenter cards. But here's the thing, for large scale model training, interconnect speed (nvlink) matters as much or more than VRAM capacity. As long as the 5090 doesn't have nvlink it can never compete with datacenter cards, no matter how much VRAM it has.

Or I'm just huffing mad copium idk
>>
>>101171174
what if i cummed in your butthole tho
>>
>>101171174
Nvidia doesn't dominate the market because of the VRAM, but only because of CUDA, no one will switch to AMD even if they provide fucking 128gb of vram, it's just how it is
>>
>>101171204
i will
>>
>>101171221
You wont do shit
>>
>>101171236
i will buy a gpu with 128gb vram if its around 700$
>>
>>101171248
you'll get shit speed though, a model that is asking for 100+gb of vram needs a shit ton of compute as well, and only Nvidia and CUDA can deliver that
>>
>>101171266
how shit? you do realize most models, no matter the size, are bandwidth bound
>>
>>101171058
yes. I should release just in time for agi
>>
>>101171282
anon, the gpu still needs to compute all the layers to get the output, and a big model has a lot of layers, regardless of bandwidth
>>
File: 1700520201682058.png (84 KB, 976x846)
>>
>>101171302
it will work fast enough tho, 10t/s is enough
>>
>>101171317
it won't be 10t/s, if you consider only the current AMD gpus but boosted with more vram, you'll be more into the 4-5t/s zone
>>
Stheno is retarded I don't care if it can stick to character's personalities or whatever. Nothing breaks my immersion harder than having a character in a different room begin to whisper in my ear.
>>
>>101171333
im happy with 7t/s
>>
>>101171354
Specify Euclidean geometry in the card.
>>
>>101170945
it really doesn't, i'm comparing fp16 to q4_k_m rn and the difference is barely noticeable on full window tasks, idfk where you're getting ur info from but here in the real world quants are just fine
>>
>>101162453
Do you let the LLM make decisions about what a character does when writing a short story or roleplaying? Or do you dictate every action but let it describe what's happening? I think an LLM can make decisions just fine, but you have to give it the right context, and fine-tuning, to make the decision that you'd expect it to make.
>>
>>101171031
>Do people who believe in BitNet also believe in Santa?
Llama4 has already been confirmed by Meta to be a natively trained BitNet model.
>>
>>101171481
proofs?
>>
>>101171481
>Llama4 has already been confirmed by Meta to be a natively trained BitNet model.
LFGOOOOOOOOOOO!!!!!!!!!!!!!!!!!!!!!!!!!!!!
>>
>>101171481
I know you're bullshitting but imagine if it was true, it would be fucking glorious
>>
>>101170576
What is it everyone loved about Command-R models?
I've been holding out for:
>>101171481
... which has been confirmed for release next month by Meta.
>>
>>101171501
3.5 != 4. We're only getting lame multimodal shit next month.
>>
File: 1691468050048931.png (587 KB, 919x921)
>sub 70 IQ retards falling for this
>>
>>101165886
it's a miku hatsune?
>>
>>101171511
>>101171515
>>101171522
So you've run out of proxies and had to resort to this?
>>
>>101171501
>... which has been confirmed for release next month by Meta.
sauce?
>>
>>101171535
nah, i never do mass reply faggotry
>>
File: 1711973424775605.jpg (83 KB, 1080x1110)
>Koboldcpp
>llama-3-stheno-v3.2-15b-q6_k.gguf
>8k context
>Temperature: 3.5
>min P: 0.1
>Rep Pen: 1.05 with 300 range
>Smoothing Factor: 0.8 (curve 1)
>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
I have finally found a worthy local coom setup for my paltry 16GB VRAM card. Now you might be thinking, "isn't that smoothing too high?" and the answer is no. After tons of testing I found out that the coherency is just better at 0.5 or above without really sacrificing creativity. The "creativity" at 0.0-0.3 is more like occasional schizo tangents than creativity. With better coherency, you can actually get better creative developments because the model understands the context better.

Min P 0.1 does most of the high Temperature taming anyway (dipping below 0.1 didn't really help with anything, only made it more incoherent). Also tried a lot of Temp 1 testing, but that was just coherency littered with the tiresome slop shenanigans. Yuck.
Oh and this particular language model stood head above anything I've tried before. Nexusravens, Claude2Alpaca, Mistral, Qwen, Mythalion, Xwin-mlewd, Codestral, Stheno-Mahou. For me anyway.
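If anyone wants to sanity-check those numbers without clicking through the UI, here's roughly what the same settings look like as a raw request to koboldcpp's API (field names and the default port are from memory, so double-check them against your version's /api docs):

import requests

payload = {
    "prompt": "[Up next: the things you want to happen and be described go here]\n",
    "max_context_length": 8192,
    "max_length": 300,
    "temperature": 3.5,
    "min_p": 0.1,
    "rep_pen": 1.05,
    "rep_pen_range": 300,
    "smoothing_factor": 0.8,
}
r = requests.post("http://127.0.0.1:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])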
>>
>>101166036
I get usable stuff from wizard 8x22b, sad the stuff on llm arena (llama 3, etc) aren't better. I haven't tried them yet.
>>
>>101171547
its petra, if you look at >>101170615 >>101170777 >>101170803 >>101170967
>>
>>101171560
>>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
can you elaborate a little?
>>
>>101171572
I thought it was pretty self-explanatory.
>>
>>101171560
turn up rep pen range to 1/4 of your context, it varies the words in the slop phrases making them a bit less annoying. for your up next part, you can paste an entire scenario like the plot from an episode or movie and tell the ai you'll play through and it does a pretty good job of it
>>
>>101171560
I think this model is the mythomax of its generation
>>
>>101166903
>Hi all, Drummer here...
stopped reading right there tbqh senpai
>>
>>101171595
To be honest, I found the testing for the optimal penalty factor and range difficult since the difference was so small unless you really cranked it up, but I'll try what you suggested.
>you can paste an entire scenario like the plot from an episode or movie and tell the ai you'll play through and it does a pretty good job of it
I'll try that too. I never had much faith in the generation prompt before, but maybe now there's a reason to make extensive use of it.
>>
>>101171610
>generation prompt
*scenario prompt
>>
>>101171610
i'd only started really turning up the range, i'm using half of max context now and some of the replacement words are pretty funny but still fit, so 1/4 is probably a good compromise.
i haven't used the scenario prompt, actually i forgot it was a prompt, i paste events into the lorebook then in the author's note put event: name where i put other stuff like genre, tags and tell the ai in the chat that i'm starting that event
>>
>>101171631
I've written some lorebooks involving very specific fetish routines that don't conform to vanilla sex, although at this point I've started using them more sparsely or dropping the fetish lorebooks altogether since the local model actually understands the instructions quite well and not using them often allows for a bit fresher results.
>>
>>101170789
>unbiased
You don't actually know what that word means, do you?
>>101170945
Your fucking amoeba brain can't even differentiate between quantization and at-precision training. Your opinion is invalid.

Training will use as much precision as is available, and quantization will scrunch the data so of course it will lose precision. That's a whole different ballpark from training at 2 bit precision to start with.
As it stands, even at 16 bit, at high parameter count we haven't seen training actually flatline, is your conjecture that BitNet will flatline PPL at some arbitrary token count?
>>
>>101171303
The originally planned release date for Llama-3 was July 2024, perhaps we'll get something next month.
>>
hey bros, I might not have access to internet for a couple days and I was wondering if I could get a model running on my phone so I'd have something to fall back on. I don't know how to set it up though. I have an s24 ultra.
>>
What is the best that I can fit in 24gb vram? Most are talking about 8B, which is more like 8~14gb, or jump straight to 70B, which takes up to 200s+ at times, which is just really unbearable for anything other than a few prompts.
>>
>>101171826
llama.cpp is supposed to work on android (via termux). I don't know how much memory you have, but any llama3-8B quantized to fit should work. No idea on the speeds. the smaller phi3 models could also work.
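If building llama.cpp by hand in termux sounds like a pain, the Python bindings sometimes build there too; a minimal sketch assuming llama-cpp-python installs cleanly (the model path and thread count are just examples):

from llama_cpp import Llama

llm = Llama(
    model_path="/sdcard/models/llama-3-8b-instruct.Q4_K_M.gguf",  # example path
    n_ctx=4096,
    n_threads=8,  # tune to your phone's big cores
)
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarise what a GGUF file is."}],
    max_tokens=128,
)
print(out["choices"][0]["message"]["content"])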
>>
dim lighting
>>
>>101171880
What 'best' means is up to you. You can see the models people run at a glance in this thread and every past thread, where the exact same question is asked many times.
Lurk more, basically.
>>
>>101171880
Yi models are pretty good if you want roleplay, though they can be rather dull. The RP merge seems to have a horrible dataset since it is full of horribly written prose.
>>
>>101171560
>15b
what?
is it some kind of schizo merge? aren't they retarded?
>>
>>101172124
every other thread some faggot jacks off and then needs to tell everyone he found the best model/sampler/frontend/promptformat/whatever-the-fuck else he attributes to his latest coom
it's meaningless to pay attention to them, their results are almost never reproducible and when they are it's by sheer luck
>>
>>101172273
so...? another two weeks until us vramlets get something good then?
>>
>>101171560
it's still retarded garbage
proper stheno moe when, I can't run euryale
>>
Will Gemma 2 work with llama.cpp out of the box? Might be great with magnum or euryale finetuning. Unless it's 100% distilled like phi. And what's the best 6 bit quant?
>>
File: 1699955579414810.jpg (323 KB, 1317x993)
shieeet
>>
>>101172314
coming today btw
>>
>>101172314
>27B
nobody tell saltman
>>
>>101172314
It will be cucked so it will likely take at least a week before we get something usable.
>>
>>101172391
you can't uncuck it lol, no one uses gemma, and no one will use gemma2.
>>
File: FHGHF_QyFl-zNw.png (43 KB, 700x84)
>>101170912
Seems like a decent amount
>>
>>101172405
because gemma 1 sucked, it would be different if it was as good as the 2.5x bigger llama 3
>>
>>101172434
Let's hope so. The time when Google could say that they have a good team of engineers and specialists is long gone, like any other company that prefers DEI over merit. I do not expect much, but I really hope I am wrong.
>>
>>101172314
literally no one cares about a lobotomized globohomo goyslop model.
>>
>>101172933
I do, can be finetuned, though it might be useful as is, like llama3 instruct
>>
>>101171560
Try v3.2 8b at q8 and see how it compares.
I wonder if two grafted models like that without further fine tuning are worth anything at all.
Even back in the llama 2 days when people were making 10 merges a day all we got was schizophrenia and text artifacts.
SOLAR proved that the weights can be used if properly pretrained after, but that's not what people are doing on the regular as far as I can tell.
>>
I have been having this weird issue with Qwen2's magnum opus that I don't know how to deal with. No matter how extremely I change the samplers, even if I unload the model and reload it, or change it from exllama2_hf to exllama2, nothing stops it from replying with the same exact response in sillytavern. I can delete the response, regenerate, anything, it will be the same or like 98% the same. The only thing that will change the response is changing my own input in the reply before.

I never had this problem with models like Miqu before, what the hell causes it? How can I fix it?
>>
>>101173181
>>101173181
>>101173181
>>
>>101173177
Did you set topK at 1 by accident or something?
>>
>>101173177
What do the logprobs look like? Notebook > Logits > deselect/compare Use Samplers check. needs _HF loader iirc
>>
>>101172409
Based source acquirer BTFOing the nosourcer.
>>
>>101172409
>comparing with stableLM
lol, lmao even
>>
>>101173331
That's actually kind of curious. Looking at the results, maybe it literally is a reproduction of StableLM but in bitnet form? StableLM was fully open with training data right? So this allows them to make a more objective comparison.
>>
File: fhf.jpg (167 KB, 1531x1373)
>>101172409
>>101173331
>>101173360
there are better comparisons there
https://huggingface.co/1bitLLM/bitnet_b1_58-large
>>
>>101173369
>The models are trained with RedPajama dataset for 100B tokens.
>100B tokens.
>100B
>>
File: hmm.jpg (271 KB, 1577x1159)
>>101173396
https://arxiv.org/pdf/2402.17764
Those numbers are also for 100B tokens?
>>
>>101173409
so bitnet works?
>>
>>101173427
looks like it, to be sure a company should make a big BitNet model, looking at you Meta...
>>
>>101173409
>We further scaled up the model size to 7B, 13B, and 70B and evaluated the
>cost. Figure 2 illustrates the trends of latency and memory, showing that the speed-up increases as the
>model size scales. In particular, BitNet b1.58 70B is 4.1 times faster than the LLaMA LLM baseline
sure seems like bigger models still have plenty of fat left to trim
>>
>>101171560
>>prompt with: [Up next: thethingsyouwanttohappenandbedescribedgohere]
what



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.