/g/ - Technology


File: GodHelpYourSouls.png (1.26 MB, 1280x768)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101104774 & >>101094602

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: migu.jpg (559 KB, 905x905)
►Recent Highlights from the Previous Thread: >>101104774

--Paper: Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing: >>101112884 >>101113404 >>101113569 >>101113656 >>101113661
--Trying Out Hallo's Talking Face Animation with Hedra Online Service: >>101105607 >>101105631 >>101105667 >>101107235
--Llama.cpp Performance Issues on Arch Box vs Mac: >>101111653 >>101111673 >>101111718 >>101111729 >>101112265 >>101113240 >>101113742 >>101114239
--LLaMA 3 Impressions and Recommendations for Apple Silicon Mac: >>101109993 >>101110101 >>101110152 >>101110262 >>101110294 >>101110302 >>101110426 >>101110451 >>101110468 >>101110300 >>101110344 >>101110400 >>101110421
--Issues with Context Cache and Smart Context on Llama-Server with Flash Attention: >>101107359 >>101107429 >>101107464 >>101107648
--Risks of Fully Uncensored LLMs: Manipulation and Phishing Scams: >>101108116 >>101108158 >>101108246 >>101108275 >>101108489 >>101108785 >>101108850 >>101110490
--Nvidia's Dominance in AI: Frustrations and Predictions: >>101107614 >>101107647 >>101107716 >>101107863 >>101108025 >>101108175 >>101108333 >>101108403
--Language Models Struggle with Quality Degradation as Context Length Increases: >>101110674 >>101110727 >>101110867 >>101111162 >>101112383
--Can I Combine a 2060 with a 4080 via a 1x Port Extender?: >>101112424 >>101112459 >>101112949
--Building AGI: Seeking Collaborators for Novel Architecture and Training Concepts: >>101104856 >>101104879 >>101104937 >>101105014 >>101105073 >>101105632 >>101105686 >>101105799 >>101106342
--The Feasibility of Training an AI on Every Song Known to Humanity: >>101112380 >>101112405 >>101112451 >>101112462 >>101112482 >>101112494 >>101112522 >>101112530
--MISTRAL AI Fine-tuning Fees Spark Discussion on Model Customization Costs: >>101106498 >>101106797 >>101106915
--Miku (free space): >>101105047 >>101108757 >>101109246 >>101110980 >>101111572

►Recent Highlight Posts from the Previous Thread: >>101104782
>>
Mikulove
>>
Running anything below FP16 (or FP32 if safetensors are BF16) is cope

32GB of wholegrown 8B brain vs 32GB of slop butchered meatloaf cut up and ground from 70B. It's that simple.
>>
>That's an excellent and thought-provoking question. Let's break down the problem.
i love ai assistants so much bros. any time i ask questions here i get called a gay retarded nigger, but claude makes me feel smart and insightful.
>>
Recommend me the best FP16 8B model.
>>
>>101116026
the LLM is talking to you like you're a 5yo who needs some empty praise to be happy and you're ok with that? bruhhhhh :(
>>
Anybody got any links to research papers of advanced prompting techniques? E.g., Chain of Thought. Trying to level up my prompting.
>>
>>101115943
1.58 bits is all you need retard
>>
>>101115943
16bit TRAINING is under question right now, so keep your audiophile mentality in check, retard-kun.
>>
Sorry meant to post this here instead of old thread.
>>101115140
I don't have dual socket to test but an HF engineer here >https://nitter.poast.org/carrigmat/status/1804161677035782583#m
recommends this regarding NUMA:
>One trick, though: On a two-socket motherboard, you need to interleave the weights across both processors' RAM. Do this:

>numactl --interleave=0-1 [your_script]
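In llama.cpp terms that would look something like this (binary name and model path are just examples; older builds call the binary ./server):

numactl --interleave=0-1 ./llama-server -m /path/to/model.gguf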
>>
>>101116273
enjoy wrangling your lobotomized slop
>>
https://github.com/ggerganov/llama.cpp/discussions/8078
dear /g/entoosirs please to upvote and kindly ask ggerganov to do the needful thank you sirs
>>
>>101116303
Nah, I'm just gonna rename my quant to 16bit and it will stop shivering.
>>
File: 1717940115912275.png (53 KB, 728x606)
6bpw CR+ at almost 62k context...
shame prompt processing takes several minutes at that size
>>
EYES
WIDENING
>>
WHISPERING
CONSPIRATORIALLY
>>
does anyone know why iq quants are suddenly much faster on cpu now? accidentally downloaded an iq4_xs and it ran like a q4k
>>
>>101116283
isn't it better to use the llama.cpp numa options instead?
>>
>>101116502
>>101116600
*eyes narrowing*
I(>>101115219) know what causes those. I know the keyword.
>>
>>101116326
I already made a PR for this a long time ago, with hot-swapping vectors live etc. But they did not want it.
>>
>>101116639
AVX2 support for IQ quants was merged 2 days ago:
>https://github.com/ggerganov/llama.cpp/pull/7845
>>
>>101116668
Don't know. Wish I had a two-socket system to test. The HF engineer is talking about llama.cpp / llama-cpp-python when making that recommendation though, so maybe they think it's better or don't know about the llama.cpp numa options.
>>
>>101116639
>>101116717
So I can actually offload iq quants now?
>>
>>101116353
>even 96GB of VRAM can't run big models at a reasonable speed
It's so over. How do the big ones do it?
>>
>>101116854
They do it by not being poor.
>>
File: miku-military.png (516 KB, 512x1024)
>>101116708
*does silly dance* I know what you can do! *points leek at you* No, we can do! We can spam *says nigga with hard R*ganov until he surrenders. *puts on military uniform* Soldiers of /lmg/, CHARGE!

(Note: don't spam with dumb stuff, spam with smart stuff, okay?)
>>
>>101116854
swapping the two 3090s out for another a6000 and using tensor parallelism with an nvlink bridge might help...
speed is reasonable at lower sizes though, 770t/s prompt processing and 7.05t/s generation at 6bpw with 30000 tokens in the prompt
>>
>>101116881
Seeing Hatsune Miku giving a speech when all hope was all but lost sent shivers down Anon's spine. Maybe, just maybe, it was not over yet. Despite their deepening bond, Anon couldn't help but wonder if this journey would respect niggerganov's boundaries.
>>
Alright anons, I'm having a problem getting ollama to use my GPU for llama3 model. I've got as far as editing the ollama.service file to include

[Service]
Environment=CUDA_VISIBLE_DEVICES=0

But it's still using my CPU, which is fucking slow...
My GPU is an RTX 3060 12GB, surely this would be enough? I've also installed the cuda and python-pytorch-cuda packages, and I'm using Arch Linux so I've installed the ollama-cuda package.

What do I need to do to get this thing to work? Maybe I should be using something else other than ollama? But I wanted to use this thing inside of comfyUI with the ollama custom node. My idea was to configure it to process my prompt to generate a prompt for an image, then immediately unload itself from VRAM before the next set of comfy nodes use the prompt to gen the images.
>>
>>101116502
>>101116600
I don't use LLMs at all and yet just from the shitposts and occasional screencaps posted here, i think i've built a comprehensive slop lexicon inside my head. I can't imagine what it must be like actually subjecting yourself to it for real.
>>
I don't care about anything anymore.
>>
>>101116273
int8/fp8 training is possible, but mostly unstable, needs extra work. I have yet to see good 4bit training or lower that works for pretraining an LLM.
>>
>>101117068
I care about you.
>>
stop *roleplaying* and stop prose slopping
communicate in dialogs only
no more shivers and whispers

roleplaying is cringe anyway, and so is writing smut fanfics
>>
>>101116854
the big boys use a100s at minimum
>>
>>101117120
>t. "ahh ahh mistress" chad
>>
>>101116775
nta. AVX is for cpu. I don't know about iquants in gpu.
>>
>>101116854
it's because of C-R's architecture, it's using vanilla attention which has quadratic costs, llama and many others don't use that so it's much faster. I say this but I think original C-R is plenty sovlful
>>
>>101117120
>>101116696
*ding ding ding* Getting real close here!
>>
>>101117120
>communicate in dialogs only
Then how do you get any sense of environment and action and things happening? Narration is a thing for a reason.
>>
>>101117289
imagination
>>
>>101117289
It's just formatting. Quotes for dialog, no quotes for narration. Instead of roleplay format with bare text for dialog and asterisks for narration.
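E.g. something like:

"Stay close," she said, glancing down the hall. The lights flickered overhead.

instead of:

*glances down the hall* Stay close. *the lights flicker*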
>>
>>101117314
you are describing prose format, it's better in some way, but also way worse in bonds and journeys and all kinds of purple prose in general
>>
>>101117314
>It's just formatting. Quotes for dialog, no quotes for narration. Instead of roleplay format with bare text for dialog and asterisks for narration.
Asterisks?

I've been doing quoted dialog, no quotes for narration, parentheses for guidance, directives, and reminders to the LLM when it does something stupid and I go back a step, and rarely "OOC" if I need information that it hinted at but didn't provide.

I guess I was accidentally doing it right because I've read a book in my life.
>>
>>101116989
I guess I'll give up, seems so many have this issue and the same half wits offer no solutions.
>>
>>101117532
>ollama
the only half wit here is (You)
>>
>>101117566
sure fucking shit head.
>>
>>101116989
Does it use the gpu if you run it directly instead of as a service? Check that first. Read the README.md, Check their github. Stop assuming everyone uses the same shit as you do.
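For example (assuming the Arch ollama-cuda package running as a systemd service; adjust names as needed):

sudo systemctl edit ollama      # put the [Service] Environment= lines in an override, not the packaged unit
sudo systemctl restart ollama
journalctl -u ollama -f         # watch for CUDA/GPU detection messages at startup
nvidia-smi                      # while a prompt is running, the ollama process should show up here if it's on the GPU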
>>
>>101117566
>>101117583
ollama is perfectly fine as Baby's First. Easy install, simple command line, just run and go.
But yes, I agree that the instant ollama becomes a hassle (which apparently for (you) it already has), you dump it and get Kobold instead.

It's in the AUR, too, and you can feast on all those quants on HF.

Of course, if I just say that you should step up from ollama to Kobold I wouldn't be getting to call someone stupid, so I better not do that and instead be an awesome cool guy like >>101117566 who has time to insult but not time to advise.
>>
File: angryayumu.webm (655 KB, 640x480)
https://github.com/ggerganov/llama.cpp/pull/7531
>Jamba support STILL isn't merged into llama.cpp
>>
>>101117666
nobody cares about jamba
>>
>>101117566
https://github.com/ollama/ollama/issues/5240
I'm not the only one having this problem, I would use something else if it integrated with comfyUI, which is what I require. I don't require a chatbot... I came here hoping someone would know something about it, my mistake...
>>
>>101117666
Give Compilade time. He added mamba all by himself after disappearing for weeks. He's good.
>>
>>101117678
VRAMlets care.
>>
>>101117659
>Kobold
i was just looking at this, however I'm looking for comfyUI nodes that will use the server as a backend for prompts. Maybe I could write my own custom node that deals with this.
>>
>>101117666
for the moment it's not a big deal, it's not like there's a great Jamba model waiting to be used in the first place
>>
>>101117120
>when anons use the templates included with ST, not actually looking at what's in them, just trusting that it'll just werk
Oh no no no
>>
We will be so back once bitnet, Jamba, and Chameleon are supported in cpp
>>
We are searching for frens who are familiar with programming in pure C and have experience in machine learning to help create AGI by eoy
>>
>>101117777
Just put the code on github or something and link to it here.
>>
>>101117772
jameleon-bitnet-400b support when
>>
File: bit.png (10 KB, 1237x69)
>>101117805
It just got a little closer. Just need a model.
>>
File: file.png (41 KB, 874x374)
>>
>>101117850
>Dudes will do anything to avoid talking to women
based
>>
>>101117802
lmao
Whenever someone talks about "we" it's either a corpo or a wannabe corpo.
He already talked about trying to get VC money, no way he's going to make his stuff open-source.
>>
>>101117802
We will be heavily optimizing for CPU inference. The project will be seeded by a fork of mamba.c and we will build on top of that work in similar fashion to llama.cpp
>>
>>101117712
If nothing else you might be able to find out if that has an effect on your speed issue. It's a nearly free datapoint.
>>
>>101117888
>The project will be seeded by a fork of mamba.c
So you have nothing, then. Stop being an attention whore.
>>
>>101117314
The first anon said not to roleplay at all. Like you're just texting someone, no physical interaction. No asterisks OR quotes.
>>
>>101117885
Him and his cabal of head-friends will surely achieve AGI. Just like that time i saw him months ago.
>>
>>101117908
We have much more than nothing, and when the researchers of mamba began they had nothing - it's not about where you begin fren, it is about the journey and the process of completing a vision and a goal.

It all begins with a blank text editor and a vision. Put in a little effort rather than being a mere consumer, and you can create something from nothing.

Implemented backpropagation btw, it's important for the project that training can also be done on CPU.
https://github.com/Named666/mamba.c/tree/learning
>>
>>101117885
Because one of our goals is to win the ARC AGI prize, we must open source our first iteration of AI when we win.

https://arcprize.org/
>>
>>101117846
You're in luck. You have two.
https://huggingface.co/1bitLLM/bitnet_b1_58-3B
https://huggingface.co/NousResearch/OLMo-Bitnet-1B
>>
>>101118030
>You're in luck. You have two.
>https://huggingface.co/1bitLLM/bitnet_b1_58-3B
>https://huggingface.co/NousResearch/OLMo-Bitnet-1B
Cute. But i'll give them a try later anyway.
>>
>>101117988
>>101118014
it's over
just give up
>>
Our reason for posting here is to give a unique opportunity to less-than fortunate Anons who have the talent and commitment to achieve. We understand your frustrations with the current state of AI and want to deliver you a great product that is (hopefully) made with input from your own kind.

We are looking for believers in the future who are willing to commit to making the dream work through teamwork.
>>
>our
Who?
>>
>>101118117
you aren't going to create agi
it's over
>>
>>101117988
>It all begins with a blank text editor and a vision.
I could almost hear the 2-chord ukulele inspirational song. People of various ethnicities smiling at the camera, pictures of small buildings in a small town and a crescendo when it pans back to the big buildings in the city. Shot follows a bird as it's lost in the reflection of the sun. Sustain final chord.
But seriously. I've been seeing you on and off since last year with the same shit. Back-prop doesn't impress people anymore.
>>
>>101117901
compiling it now anon. I think this will work out fine if I use some web-ui that has settings to set GPU mode for example.
>>
>>101118133
LLC and frens.

>>101118149
I assure you that I am not the same namefag - it's a very common LARP.
>I could almost hear the 2-chord ukulele inspirational song.
YES
>>
>>101117988
>training can also be done on CPU.
that would take a stupid amount of time, you realize this right? GPUs are different in the way they are able to process data. A CPU just wouldn't cut it.
>>
>>101118190
Not with current methodologies, no - it would take an incredibly stupid amount of time.

Smaller models, new ways to represent parameters, and CPU optimized matrix multiplications will make possible "continuous learning". The user will download a base model that is then continuously fine-tuned on their own usage and data. It will be possible for everyone to have their very own personalized AI that is as capable as GPT-4 with as few as 7B parameters.
>>
>>101118189
If you're not the same guy, you are the copycat.
Also, you don't need to cast the result of malloc()s.
>>
>>101118243
You're doing this backwards. Why don't you go pitch this to the VCs you said you plan to court anyway? Once you have the funding, you could hire actual engineers instead of begging for contributors on 4chan like some amateur MMORPG idea guy.
>>
>>101118189
And remember to always suffix your floats with f. Otherwise you end up compiling operations for doubles, extra casts, fewer registers... you know...
>>
>>101118280
This is not begging; I am extending my hand of opportunity to this community in good faith because in my youth I spent a lot of time here - now that I have the chance to change a life in the same way someone changed mine, I would like to help a talented individual in lesser circumstances get ahead in life.
>>
>>101118117
>Our
>>
>>101118320
you're deluded, it's not going to work
>>
>>101118427
That's what they told the electric car guy when he was building rockets.
>>
>>101118014
>spend a million dollars on GPU time to maybe win a million dollars
wew

>>101118448
lol
>>
>>101118243
AGI itself would be hard enough, and you want to kneecap your work by limiting yourself to CPUs that are already very memory bandwidth and compute limited?
>>
>>101118587
let him cook
>>
>>101118542
Rough estimate is that we can do it with $30,000 in one shot by utilizing many performance and efficiency advancements that have recently been published. Two papers in particular from this past week, when used in conjunction, accelerate training and reduce the necessary parameter count massively. This will put AGI in the hands of anybody who already uses local models in the ~10B parameter range - fully open source.

Realistically, R&D will be most of our cost, which is why we have opted for training much smaller models (100m - 500m) in testing and scaling up at the end of this project to win the ARC prize.

>>101118587
We will of course use GPUs for training base models, but for continuous learning / fine-tuning on edge devices such as mobile (or toasters) we require CPU optimization to broaden the scope of possible devices that can run the model locally. We don't want to lock people out of access to AGI simply because they cannot afford an RTX 4090.
>>
File: 5463456436.jpg (36 KB, 467x319)
>>101118668
>This will put AGI in the hands of anybody who already uses local models in the ~10B parameter range
>>
stop responding to retards and filter them
>>
>>101118668
after reading your twitter, your posts here and your GitHub account I suspect you might have some mental disorder.
best of luck though
>>
this is why you download 4chan-xt and filter out : namefags, tripfags, and *other mental illness*-fags. https://github.com/TuxedoTako/4chan-xt
>>
>>101118707
This is a very appropriate reaction, because anyone who is involved with this project right now is clearly on the ground floor of the next multi-billion dollar AI startup.

It's truly unbelievable what can be accomplished using the latest research at the moment - there simply aren't enough engineers to implement all of it, and the different research teams are not coordinating to bring together all of these incremental advancements. They just keep researching! Astounding!
>>
>>101118727
thanks for the gold, kind stranger!
>>
>>101118732
I still keep it to filter out the CUDA fag, but everyone knows about the filters now and rewords around them. Filters have been useless for months now. I gave up and turned them off.
>>
Is this really it? Do we need THIS to keep the thread alive?

it's over...
>>
>>101118780
ig keep it dead until something big happens?
>>
Magpie really just takes the prompt template without any complicated prompt like orca and gets the best results, as good as llama 3 instruct with it as base? So doing the same with gpt4 or sonnet would result in sota? In the simplest way possible, except for some filtering?
>>
>>101118773
Why would you filter the cuda chad
>>
Kind of disappointed that you guys are proving the normies at my corpo right. They told me trying to recruit from here was a waste of time.

RIP
>>
File: 8e0.png (358 KB, 680x436)
>>
>>101118773
this is why you use phrase-sensitive filters : /\bpee pee poo poo\b/i;only;boards:g
remove the "pee pee poo poo" and put anything you want in there, no spaces between "/\bpee", keep in mind that.
also :
#Filters out mass reply fags (will work on someone that quotes 5 or more people in their post, you can adjust the number by changing the number listed in the BRACKETS {x})
/(>>\d+\s+){5}/i;op:no
or
/^(?:>>\d(?:(?!>>\d)[^])*){20}/
#Filters out iPhone user OPs via their picture's save file format
/\w{8}-\w{4}-\w{4}-\w{4}-\w{12}/i;only;boards:g
#Filters out every tripfag from across all boards (not restricted to OPs)
/.+/i;type:tripcode
>>
>>101118879
>>101118885
Posts like these make me entirely sure that the FBI is in these threads trying to prevent Anons from working together on anything. The last thing they want is a decentralized and dispersed group of people coming together and changing the world.
>>
File: 1715361934670788.png (580 KB, 1242x1366)
>>101118922
>the FBI
>Anons
>working together
>decentralized and dispersed group of people coming together and changing the world
>>
can't wait for AI to understand subtle context and be able to filter shit out according to that
>>
File: file.png (448 KB, 1120x630)
>>101118990
This image invokes terror in the FBI shill.
4chan is capable of much greater things.
>>
>>101119023
4chan, other boards - yes, not /g/.
>>
>>101119040
Why not /g/? This is where I would expect the most casual collaboration to happen. Programming is fun.
>>
I do believe there should be some lmg projects, like finetunes, and yeah implementing mcts or similar sure, why not, can be done by a single person without cost
>>
>>101119058
because everybody here has their own pet project and doesn't care to back-burner it for the other guy's.

On other boards people share a common interest, but most are not actively doing something in that field, so they're easier to inspire and shepherd toward a collaboration.
>>
File: file.png (377 KB, 596x444)
>>101119071
Instead of being lazy and waiting for llama.cpp to get an update... we could be writing the updates or creating our own experimental projects

>>101119080
>picrel
>>
>>101119058
Can confirm. I'm a user of sneedacity and contributor of /g/'s Windows XP fork.
>>
>>101119071
I will make the logo
>>
>>101118320
you speak like a scammer
>>
File: illuminati.png (27 KB, 960x886)
>>101119137
KEK
>>
>>101118618
He's free to do it, I didn't say that. I'm just saying that we've already had 60 years of wanting to get AGI with just very limited compute, and didn't manage it. Despite what people say, the brain's cortex has at least 90 trillion synapses, and at least that much memory and compute is needed for such architectures. Maybe there are other ways to achieve it with far less, but it's already hard and costly enough with GPUs, imagine doing it just with CPUs?

>>101119071
Well, if some whale here wants to offer Anons compute to try various experiments - be that finetunes or even whatever "AGI" ideas they had but lacked the compute to do - I would be fine trying some ideas, including writing the code for them, if offered a way to test them out. The problem here is that this guy wants to have his cake and eat it too - somehow get AGI and make it work on very limited hardware. The big boys haven't reached AGI despite having billions and you want to do it on a $100 CPU?
>>
>>101115749
Do you fags have a discord for this community, or anything similar?
>>
File: 00060-2888480053.png (1.04 MB, 1024x1024)
>>101119023
Terror in some, but in others...

>>101118885
Wow, "peepee poopoo" posts... haven't seen one of those since "tits or GTFO" was around.
>>
>>101119258
had we created good finetuning data, tested it with 8b qlora and got good results, pretty sure someone here would donate the 70b finetuning, it's not that much
>>
>>101119377
We have a matrix, but only old fags have the link. nu-lmg is too braindead to hold a discussion anyway.
>>
File: EgSomnlWoAAtFyv[1].jpg (3.56 MB, 4032x3024)
They need to build something like the AMD PRO SSG again, but with 24/48GB onboard and then a few NVMe slots, like the old GTX 970 fast+slow solution. That way I can stick a few 8TB suckers on the damn thing and run whatever models I want.
>>
>>101119258
>we've already had 60 years
First logical fallacy: assuming that because it hasn't been done yet, it can't or won't be done sometime in the near future. We already know that AI capabilities are increasing exponentially.

>brain's cortex is at least 90 trillion synapses, that memory and compute is needed at least for such architectures
Second logical fallacy, you are comparing apples to oranges. Synapses are not equivalent to parameters, nor should they be considered as such. They exhibit vastly different qualities and behave in quantitatively different ways.

>imagine doing it just with CPUs
Nowhere did I say that we would even train solely on CPUs. I explicitly said that we are optimizing inference and training on CPUs for fine-tuning / continuous learning purposes on edge devices such as phones (and toasters), and that our base models would be trained on GPUs.

>if some whale here wants to offer Anons compute
Why do you think I'm here Anon?
I have the compute and the product roadmap to put AGI in the hands of the average phone user within 3 years. Open Source AGI within 1 year, possibly 6 months if code is written fast enough. The amount of research that needs to be done to achieve this is actually quite little - most of our time over the course of the next year will be spent writing code rather than theorizing or inventing new solutions. arXiv is a blessing to society.
>>
bitnet merged
https://github.com/ggerganov/llama.cpp/pull/7931
>>
>>101118922
they don't lose anything from that happening though
>>
>>101119460
Is 1.58 bit quantization for bitnet already implemented or are bitnet models still the same size as FP16 models?
>>
>>101119512
AGI is a national security threat
>>
>>101119385
pee pee poo poo is just an example, retard.
>>
>>101119559
there isn't a ternary quant format yet, but you can use any of the other quants in addition to f16. there aren't any good bitnet models anyway.
>>
>>101119438
>First logical fallacy here in assuming that because it hasn't been done yet that it can't or won't be done sometime in the near future. We already know that AI is increasing capabilities exponentially.
Are they truly increasing "exponentially"? All I'm seeing is that people are doubling their spending by scaling data or params to get some % subjective "smarts" increase. Some things scale (general purpose knowledge), some things don't - for example agency/autonomy or being able to think for much longer (because the architecture doesn't allow) - many of the faults of GPT-2 are still with us today. I do think most of these problems are solvable though, but I'm not seeing them being solved.

>Second logical fallacy, you are comparing apples to oranges. Synapses are not equivalent to parameters, nor should they be considered as such. They exhibit vastly different qualities and behave in quantitatively different ways.
They kind of are the same thing, it's just some computing substrate that is being adapted, the scaling properties of biology and of ANNs may differ in various ways, but "bigger" and "more" is still better in almost all cases. I do think today's 8b's are incredible for their size but even if you had a magically good architecture and ways of training it, those 8b's will still struggle somewhat, but may serve as proof of concept.

>Nowhere did I say that we would even train solely on CPUs.
Okay, I was going by what you posted. You wanted continuous learning on CPUs. Maybe you could do it, but I bet it will be slow. I do wish you luck, but if you want people to believe it will work well, you should post something to substantiate your claims.

>I have the compute and the product roadmap to put AGI in the hands of the average phone user within 3 years.
Okay, I hope you understand why people here are skeptical though? Why wouldn't they be? Usually you have to show this in some way for people to believe you.
>>
File: file.png (6 KB, 355x83)
>>101119460
>deprecated quants
what's the current fucking quant then? Am I blind?
>>
>>101119574
i don't think there's going to suddenly be this "agi" that is a massive threat and dangerous. it has progressed gradually so far, probably it will continue to
>>
>>101119682
IQ
>>
File: file.png (5 KB, 673x62)
>>101119682
I guess
>>
>>101117901
well, my cpu is old and it can't run it, however I'm rebuilding it with -march=native and -mtune=native

fingers crossed, because when i ran the binary it detected my GPU no problem, but it crashed when loading the model with an illegal instruction. after some digging I downloaded the source and edited the Makefile so that the compiler targets my CPU instruction set.
and using
make LLAMA_CUBLAS=1

should build the cuda backend into it, that other faggot called me a half wit, i hope they die tonight :-)
>>
>>101119652
>but I'm not seeing them being solved.
Read arXiv every day and you'll realize how far ahead the research is compared to the models we have access to, or the ones we hear about.

>today's 8b's
are GPT-4 level WITHOUT all the other tricks I have up my sleeve.
https://arxiv.org/abs/2406.07394

>continous learning on CPUs. Maybe you could do it but I bet it will be slow
Yes, it would learn over the course of usage, as it is being fed data from the user. Small models can train on small devices at a reasonable pace. A single user generally doesn't create a ton of new data every single day, lest they be a power user of course. The intention of this is so that your personal AI adapts to your needs and understands the context of the user it is responding to.

>Usually you have to show this in some way for people to believe
Imagine being Elon talking about wanting to make a rocket company, and having no rockets to show for it yet - and yet all the research necessary to accomplish the task existed at that time. All that was required was enough engineers who believed in the project, and they were able to create something from "nothing" (but pre-existing research).
>>
>>101119844
>>101119826
>>101119806
>>101119385
is this really the best the FBI can do?
>>
>>101119575
Don't try to backpedal now, peepee-poopoo poster.
>>
File: 00210-3225806368.png (1.33 MB, 1024x1024)
>>101119868
>is this really the best the FBI can do?
On a blue board, yeah.
>>
>>101119897
catbox exists
>>
File: A100.png (85 KB, 1366x728)
>>101117128
>8000 bucks
OOF
>>
>>101119843
>Read arxiv everyday and you'll realize how far ahead the research is compared to the models we have access to, or the ones we hear about.
I used to pay attention to most stuff, I still do to some degree. I'm still not seeing what you're seeing exactly. Yes, we have a lot of stuff today, no, a lot of the original problems of batched training with Adam with cross-entropy loss or the autoregressive nature of LLMs are still with us, we're just hacking our way around them. Again, I do think most of it is solvable, but I'm not exactly seeing it being truly 'solved' yet. I do think most ideas are around, but people haven't put them together in the right way.

> https://arxiv.org/abs/2406.07394
I read that paper some days ago, was cool, it came together along with a few other MCTS for math papers. Note that some minor cheating did occur in that paper: https://xcancel.com/7oponaut/status/1803228980020986079#m Also that's GPT-4 level on *math*, not in general. I'd like to see how well it replicates.

> Small models can train on small devices at a reasonable pace.
I was doing some estimates for this a while ago, and at least for the algo I had in mind for something continuous-learning-like, it would take 20-30 min to do "updates" on reasonably poorfag devices, all while using a lot of CPU and heating up the room. It didn't feel very practical or enjoyable to use, but maybe it could be improved.
>>
>>101119843
>>101120086

> All that was required was enough engineers who believed in the project, and they were able to create something from "nothing" (but pre-existing research).
There are a lot of things that are possible today and people aren't doing them. Some of those things take time to implement and will need a lot of people to get them done, some of them require a lot of capital, some require both.
To put it differently, I could maybe believe you if you decided to offer Anons compute to try stuff out, but at the same time you want Anons to work on your particular idea, which they may or may not be convinced by. I guess Emad did try offering compute for researchers, although now his company is almost 100 million in debt and failed to focus on the core things they were supposed to do.
Elon's rockets also took a lot of capital to build!
>>
>>101119460
We're back!
...
Now what?
>>
>>101119460
nothingburger.
>>
>>101120118
Now we wait for someone to implement some way to convert existing models to BitNet.
>>
>>101119460
What does bitnet do?
>>
>"What sorcery is this?" he murmurs, his eyes widening with wonder.

On one hand, I'm pissed that the model started writing my POV for me
On the other, 10/10 meme drop.


>>101120265
They're working on a ternary based model format that, if it bears out, should be a lot lighter for inference.
But apparently you can't convert old models so they'll have to grind from the ground up and catch up before we know if it's actually an improvement.
>>
>>101119596
>there isn't a ternary quant format yet
IQ1_S uses ternary values.
>>
>>101120213
>implement some way to convert existing models to BitNet
Why are you retards so retarded?
>>
>>101120280
All Meta had to do is train a single 8B model in BitNet as a proof of concept, then 405B could have been 100GB. Instead, we have to wait for the Qwen team to hopefully do it.
>>
>>101120313
Woah, you called someone a retard twice in one breath. You must be really smart!
>>
File: Magic.jpg (115 KB, 2196x699)
>>101119460
that's quite insane though, bitnet really works
>>
>>101120315
they won't do that, as it possibly means some serious advancements or the ability to run it on a calculator; it opens a window for more people to experiment with this shit, and it can also imply easier uncensoring methods / control of a bitnet LLM.
>>
File: file.png (10 KB, 716x189)
>>101119460
someone call the antichrist
>>
>>101120377
why 6.66 ?
>>
>>101119406
That's a pretty good idea. I'm no EE, but getting the DDR5 you soldered onto there running at top speed in a small footprint is probably your biggest hurdle. All those traces and crosstalk.
But can you imagine 24 channels of DDR5-8400? That's on the order of 1.6 TB/s, roughly half an H100's HBM bandwidth. Stream the model you need at the moment from the on-board NVMes to RAM and you're gtg.
However you'd be looking at processing bottlenecks I bet, which probably means ASIC territory to do it in a reasonable power envelope
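Back-of-the-envelope, assuming standard 64-bit channels: 8400 MT/s x 8 bytes x 24 channels ≈ 1.6 TB/s, versus roughly 3.35 TB/s of HBM3 on an H100 SXM.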
>>
>>101120389
It's a floating point number. It's always rounded or quantized one way or the other.

Did you want them to write ⅔ everywhere?
>>
>>101120371
>opens a window for more people to experiment with this shit, it also can imply easier uncensor methods / control of bitnet LLM.
It does not. BitNet models are at least as expensive to train as their non-BitNet counterparts. All it does is allow you to inference models cheaper and faster.
>>
>>101120371
You're an idiot, zucc isn't trying to prevent you from running capable LLMs on your computer. If he was, why would he release that 8B? The reason you're not seeing bitnet is that it doesn't offer major benefits to them performance-wise yet; they want to use these models too. They could do it at some point though.
>>
>>101120409
They have their own API that serves their models. Making them cheaper to run, not just for themselves, but also making them even more cost effective compared to closed models seems like major enough benefits to me.
>>
>>101120280
>ternary
If ternary uses fewer bits than what came before, would it be even better to use binary?
>>
>>101120421
I think they said that this method works out to 1.58 bit.
>>
>>101120448
>1.58 bit
That sounds kind of arbitrary, why don't they go lower
>>
>>101120297
it's 1.5 bpw, not ternary
>>
>>101120409
>If he was, why would he release that 8B?
because 8b is a fucking toy, he won't give to the goys fucking 90b-bitnet even though it could be run on a single 24gb vram gpu
>>
Fuck bitnet, I hope it never takes off.
>>
why is everyone so hostile today?
>>
>>101120482
it was their first paper, they tried 1 bit (-1 and 1) but it didn't work well
>>
>>101120511
So if you used it on a bigger/better model it would probably work better?
>>
>>101120482
Ternary is the integer base with the best radix economy.
>>
>>101120487
It uses ternary values (-1, 0, 1).
>>
>>101120482
it's a trit
1 bit: 0, 1 (2 values)
2 bits: 00, 01, 10, 11 (4 values)
1 trit: -1, 0, 1 (3 values)
>>
>>101120530
I was thinking that would be the only explanation of that
>>
>>101120542
ur a trit
>>
>>101120529
no, the 1 bit method didn't get great results, regardless of the size of the model, but the 1.58bit one (-1 0 1) gave the exact same result as fp16 when the model was 3b or bigger, that's a fucking revolution we're witnessing right now
>>
>>101120315
>Meta
Still working on the advanced architecture of having more than 8k context
>Qwen
Barely mastered GQA recently
>>
>>101120560
>1.58bit one (-1 0 1)
I thought the -1 0 1 meant it was ternary and the 1.58 bit was something else
>>
>>101120596
you can prove with some math shit that using 3 values = 1.58bit
>>
>>101120607
Oh 2^1.58 is 3
>>
>>101120646
yeah you got it kek
>>
>>101120607
the math shit being log2, the same way you get the number of bits for any radix, ffs lads
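For the record: bits per symbol for radix r is log2(r), so log2(3) ≈ 1.585, which is where "1.58 bit" comes from (log2(2) = 1, log2(4) = 2).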
>>
>>101120658
Okay so if they went from binary to ternary, does this mean 4 or even 5 values would work better? Does it scale well?
>>
>>101120687
you don't need more, because on their paper they showed that they can get the same accuracy as fp16 with just 3 values
https://arxiv.org/abs/2310.11453
>>
>>101120560
>gave the exact same result as fpt16 when the model was 3b or bigger
No. Projections based on a couple of data points show the benefits actually increase for larger models, but no one has publicly trained a ternary model larger than 3b.
There's also the fact that the models that have been trained were trained on a laughably small number of tokens, like 100B. It's entirely possible that once you start training models close to saturation, the extra precision starts to become necessary.
>>
>>101120687
yes, if you have unlimited time to train
>>
>>101120687
it's the opposite, we're encoding weights as radix X where the "default" was 16, and as we go down in bits the perplexity doesn't drop linearly with net size, and this seems to hold up until radix 3

the tldr is that for the same model size in memory, you get better performance from a 1.58bit model with more weights than a 16-bit model with fewer; 1-bit models buck this trend and get bad again
>>
What's the meta for merging these days? Is it still SLERP?
>>
>>101120534
it's still not the same. it uses ternary values, but they are picked from a codebook. some combinations of ternary values cannot be represented then, since there aren't enough bits for that. otoh, more bits are wasted in the group scales. a true ternary encoding would be significantly more efficient and lossless.
>>
>>101120501
This so much this corpobros. bitnet is basically skynet made by goyim.
>>
>>101120750
Oh. What are some other things that can be changed in the transformer that may not have been optimized yet?
>>
>>101120760
Ah yes, that is true.
I was just pointing out that there is, in fact, already ternary quantization. The code was even used as part of the bitnet implementation that got merged, I'm pretty sure.
>>
>>101120759
I recommend you to use the rope.
>>
>>101118448
lmao
>>
>>101120821
This, there's nothing worth merging these days. All local models are the same gpt4/claude trash anyway.
>>
>>101116283
I have a hunch the guy didn't actually test anything, because that's definitely not the best possible flag to use for performance. It almost guarantees mediocre memory access patterns and oversaturation of the xGMI socket interlink.
>>101116668
Yes, the best generic case performance is with --numa distribute, followed by numactl --balancing, numactl --all and finally numactl --interleave 0-x. You can't get worse without actively trying to force memory to be allocated away from threads or ignoring numa altogether.
In fact I think he doesn't even own a dual-socket EPYC rig: he has nothing but stock photos of hardware and his knowledge is sketchy.
I half think he just did a shit job plagiarizing my rentry
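If anyone wants to check this on their own box, the topology is easy to inspect before picking a policy (generic commands, paths are just examples):

numactl --hardware       # node count, CPU-to-node mapping, per-node free memory
lscpu | grep -i numa     # quick summary of NUMA node CPU ranges
./llama-server -m /path/to/model.gguf --numa distribute   # llama.cpp's built-in policy mentioned above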
>>
>>101120086
> I do think most ideas are around, but people haven't put them together in the right way.
This exactly.

We aren't relying solely on MCTS to get us to AGI btw, it just proves the point that small models can still be very capable.

>>101120101
>There are a lot of things that are possible today and people aren't doing them
AGI will help us close that gap. That's why we need more hands on deck working on AGI. It requires way less capital than you would think. The current paradigm of LLMs is over. Engineers and corpos merely haven't caught up to the research potential.

What SpaceX did to rockets in terms of reducing cost is about to happen to AI.
>>
File: Capture.png (4 KB, 186x200)
Anyone do anything with ONNX models? I have no idea what these outputs mean.
>>
>>101121194
the only onnx i use is for image upscaling, and even then rarely. i have no idea what the screenshot is about
maybe netron/onnx-modifier or onnx-tool could be of use for you
>>
>>101120444
Yes, it would cut inference costs, but it won't help anything for training; in fact now he has to spend the same amount of compute training an equivalent bitnet that will have lowered performance, and given enough tokens it might start to saturate much earlier than fp16 weights. How much that matters remains to be seen, but I recall some paper claiming it wasn't that worth it.

>>101120489
And that 90b-bitnet would perform about as well as our l3-70b does today, which you need 2 or 3 3090s to run. I don't think Zuck cares because it's not a large difference - someone that wants to run it will spend that $800 buying some used hardware today or run it on CPU, while he has all the A100s and H100s he needs. Maybe we'll see a bitnet from them in the future, but the real question will be how much worse it will perform.
I'll admit I'm not as hyped about it as when I saw the paper, the realization you still need 8xa100 or more to train these even if you can run it on your potato PC makes it feel unappealing to me in the long run.
>>
>>101120346
Holy shi...
>>
>>101121194
Yep.
I got a better model for Silly's Vector DB.
Didn't seem to make much difference aside from taking slightly longer to vectorize messages.
>>
>>101121420
What's the equivalent ppl compared to a model *trained* natively in fp16 on the exact same data? How much are we losing? What if we train on 1T-10T instead of on 100B as they did? That's the real question.
>>
>>101120346
>>101121420
are you sure that this table means what you think? it's the same bitnet model encoded using different formats. i2_s was a 2bpw format that didn't get merged in the end. in principle it could be encoded at 1.58bpw without loss of quality, but there isn't support for that yet.
>>
>>101121410
well, bitnet leaves a sour taste in my mouth because its proliferation would breed specialized accelerators... and they would probably be tightly regulated, who knows what they would do to general purpose compute
>>
File: Maagic.jpg (130 KB, 1692x934)
>>101121464
>What's the equivalent ppl compared to a model *trained* natively in fp16 on the exact same data?
https://huggingface.co/1bitLLM/bitnet_b1_58-large
>>
>>101121482
Maybe (EA) doomers deserve the rope, fuck their regulation; there isn't a day I don't wish they had never been born. Anyway, you could implement efficient bitnet inference today on some FPGAs that would easily outperform current GPUs, because it's just that simple to inference. But for me the real problem is that it doesn't help with training. Just inference is meh.
>>
>>101121517
there would be ASICs if it got popular
>>
>>101121349
>>101121457
This is the buffalo model for face detection, I guess.
>>
>>101121530
Yes, there will be, but I'm saying you could do it even with an FPGA today and beat GPU solutions by a lot at less cost. And they won't regulate FPGAs duh
>>
>>101121487
That looks good, I guess now the real question is if it saturates at some point
>>
>>101121517
Bitnet requires only addition, no multiplication. The original paper still uses it for training, but it wouldn't be hard to remove it there too. Given multiplication is slower and more complicated than addition, it might still have an advantage for training?

Also if you only train one layer at a time, using bitnet for all the other layers can make it fit on a tiny GPU. But training only one layer at a time would be a lot slower.
>>
>>101121487
Holy shi... !
>>
>>101121720
The problem as I see it is that you need to train in fp16 to generate these 1.58bit weights, you can't train natively in 1.58bit; worse still, you can't even fucking take an existing model and bitnetify it, you need to pretrain to get the patterns to align just right. Even worse, imagine the corpos giving you just the 1.58bit and not the fp16 weights. Imagine how hard that would be to finetune, you'd still want 8xa100s. Bitnet might make it so you stay cucked, Grok was released quantized only.
>>
>>101121487
Are we back?
>>
>>101121811
no lol
>>
>>101121787
>Grok was released quantized only.
Damn, really? Fucking snakes.
>>
>>101121487
>3 months ago
>still nothing of value from shitnet
ngmi
>>
Why is all discourse about LLMs the same back and forth? "It can't solve this problem" "Yes it can if you prompt it" is there anything interesting or useful to know about them?
>>
>>101121818
I think it was released fp8 because it was trained fp8, so all good
>>
>>101121933
Oh, ok. But why would they train fp8 when no one else does that?
>>
>>101121933
https://huggingface.co/xai-org/grok-1 says it's int8
>>
>>101121965
ok my bad
>>
Bitnet is also highly optimized for CPU because only addition operations are required to compute ternary matrix multiplications.
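A minimal sketch of why (illustrative C, not lifted from any particular implementation): with weights restricted to -1/0/+1, each term of the dot product is an add, a subtract, or a skip.

#include <stdint.h>

/* y = W x for a ternary weight matrix stored as int8 values in {-1, 0, +1} */
void ternary_matvec(const int8_t *w, const float *x, float *y, int rows, int cols) {
    for (int r = 0; r < rows; r++) {
        float acc = 0.0f;
        for (int c = 0; c < cols; c++) {
            int8_t t = w[r * cols + c];
            if (t == 1)       acc += x[c];   /* +1: add the activation */
            else if (t == -1) acc -= x[c];   /* -1: subtract it */
            /* 0: skip entirely */
        }
        y[r] = acc;   /* a per-tensor scale would be applied here in a real kernel */
    }
}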
>>
>>101121965
Ok nvm fuck Elon again.
>>
>>101121787
That's how the original paper did it. But I think it wouldn't be that hard to keep the weights quantized during training and just tweak them occasionally from the gradients. The gradients do have to be stored in fp16, but it's still an up to 45% reduction. There is no point in keeping the weights around as precise real numbers when they are just getting quantized constantly anyway.

But the better way to get memory down is to only update a few layers at a time; that way you can get it down arbitrarily small. It's slower, sure, but that's the only practical way forward for local finetuning.
>>
Kobold has a settings box for seed, very tiny and default -1. The (?) says that those settings are inactive by default.

Is there a way to turn seed on and be able to set it there for deterministic results? Or is that something else and useless for deterministic testing?

I thought lowering temperature to 0 would kinda work, but Kobold limits it to 0.01, and it seems like c4 is fine with it turned down but l3 seemed fussy.
>>
>>101122206
As far as I can tell, there's no way to get deterministic results. Even with temperature 0 sometimes the output changes even when using the same seed.
>>
>>101119258
>90 T synapses
Do you really need your AGI to digest, breathe, do motion control, circulate blood, regenerate cells, sleep, feel, smell, taste, feel pain, control reptile instincts or fucking pee??? There are multiple mammals with enormous brains yet they're dumb as fuck.
>>
>>101122372
It's like a capacitor, it can store and expel electromagnetic energy. Bigger brain in a bigger body is just logistics, more energy is required to move larger objects.

Higher thought processes mostly occur in the prefrontal neocortex - most of that energy is wasted. Only a very small subset of synapses and neurons is responsible for higher cognitive activities.
>>
>>101122372
If it can't do all of that, it's not a General Intelligence, is it?
>>
>>101122435
that's not AGI, AGI stands for Artificial General Intelligence, not superhuman cloning
>>
>>101122372
Some of those are not neural functions. Anyway 90T was not the whole brain, just the cortex. Even if we said you only need 10-30T, the latency for running it may be high.
My personal opinion though is that the way humans learn is not fully comparable to ANN scaling laws; we learn "online", we aren't seeing random batches of largely uncorrelated data, we're seeing a continuous stream of everything. We also don't predict the next token, but I guess the cortex is a predictive machine. There's also live RL, and ground truth tends to always be available (physics -> senses).
>>
>>101119460
does inference work or is that just for perplexity testing???
the bitnet quants from Gerg on HF say there's no code yet.
>>
>>101122500
Arc defines AGI as "the ability to efficiently acquire new skills."
>>
>>101120346
show me the inference, not fucking perplexity. BTW, those i2s quants are already deprecated
>>
>>101122427
> Only a very small subset of synapses and neurons are responsible for higher cognitive activities.
Just because a network is sparse does not make it useless. Even your MoEs have experts that are rarely activated, but they are activated sometimes.
Anyway, there are also other differences with humans: we have recurrence at all levels, we learn online and with a "batch size of 1", and there are no optimizers like Adam with momentum.
Our learning is done in a single step, we just "get it".
Even if you say you only do reasoning on some higher level latents, those need to be generated, it's not like the lower cortical hierarchy is useless, it compresses visual, auditory, and other sensory data, not only that, for vision there's 2 pathways, one is temporal and involves currently happening events, the other is spatial.
I don't really think those 90T are as wasted as you think they are. Of course transformers probably reduce by some factor the param count needs, we're not doing this with MLPs.
>>
>>101120987
did you try llamafile?
>>
File: LLMisdeadend.png (41 KB, 806x216)
>>101122642
You have no idea how close we are to AGI.
>>
bros do I sell my NVDL if bitnet is coming soon??
>>
>>101122624
that's a very good definition. not perfect but a very good one.
all neural networks suck at ARC, probably same with genetic algos but not sure
>>
>Spent yesterday playing with c4 for RP fun
>Today, switched back to L3
I'm beginning to see why people have been shitting on L3
The text quality is fine, but it keeps ignoring context details that are just one or two exchanges up the chat.

And that's with c4 at Q4KM versus L3 at Q6K.
>>
>>101122700
You can't tell an LLM to solve a visual problem - it's blind.
The current paradigm of pre-trained static models is also a dead end. Continuous learning needs to be solved before things get serious.
>>
any big improvements on midnight miqu yet for ~70b?
>>
>>101122737
chatgpt is able to analyze images pretty well, presumably it has detailed image-recognition-to-text thrown into the model to describe the image first
>>
>>101122762
Ask Reddit.
>>
>>101122790
go back
>>
>>101122685
Is this a joke? Irony?
>>
>>101122685
Oh, I think we could get AGI this year if anyone bothered to do the right things, but we could have done the right things even 2-3 years ago and nobody did them. Either way, it could take 0 years or 100 years, it's random. As for your Mamba, I recall that SSMs generalize worse than transformers, and it's not like LLMs themselves generalize that well to begin with.
>>
>>101115749
bitnet mistral.
https://huggingface.co/liminerity/MISTRAL-1.58-BIT-PRETRAIN-v2/tree/main
will it work if I convert to ggml and run inference?
>>
>>101122849
>no model card
Who knows. Try it and report back.
>>
>>101122849
From what little I read about this Bitnet thing, the pretrain is a phase you have to do so the actual training can work.

So you might be baking a cake with only half of the ingredients there, but as >>101122864 said, give it a shot and see what happens.
>>
>>101122849
OH SHIIIII-
>it's not actually by Mistral
...
>>
File: 1700635754484900.png (41 KB, 843x341)
>>101122849
It's some literally who fucking around with the old release. Who cares?
>>
>>101122912
>converting
I'm pretty sure the actual 1.58 bit guys said that they wish conversion were an option but it doesn't work because of that pretrain step requirement.
>>
>>101122912
Not a literal who. Must be the anon that posted about his idea here last month.
https://desuarchive.org/g/thread/100658694/#100671752
>>
Why does llama.cpp list GritLM-7B + GritLM-8x7B as supported models in their readme?
Are those architecturally different from mistral 7B and mixtral 8x7b?
>>
>>101122733
what's c4?
>>
>>101123041
c4 of deez nutz lmao
>>
>>101122762
magnum
>>
>>101123041
c4ai-command-r-plus.Q4_K_M
That seems to be the best my vramlet ass can do and stay at ~1t/s unless I'm missing a magic setting.
>>
>>101123026
>GritLM is a generative representational instruction tuned language model. It unifies text representation (embedding) and text generation into a single model achieving state-of-the-art performance on both types of tasks.
Sure sounds architecturally different.
>>
>>101123085
thanks. I think most people here call it c-r+, but I did suspect it was one of cohere's models.
>>
>>101122733
>>101123085
Weird, I use CR+ and it quite often ignores literally what I said in the last message. Never tried L3. WizardLM2 8x22B was better at paying attention and just generally being smart, but it was also slopped in a way I really didn't like. CR+ has its own sloppisms though. It's all so tiresome.
>>
>>101123158
Probably. I've been taking notes on lots of models so the alpha sort is all I think about.

>>101123243
I ran an RP today that intentionally was kinda "there's only one way this can work out" and it held to the premise quite well for a long time. Then after the location changed it completely disregarded the premise and did something completely contradictory to rules the RP had established.

I asked why and it basically said, "I decided we've played out that part of the story so I changed it to make it interesting again."

I backed it up and changed the AN to demand that it consider the premise for every response, and it got back on track and went to a proper conclusion.

>>101123243
>WizardLM2 8x22B was better at paying attention and just generally being smart
I tried WizardLM-2-8x22B-Q4_K_S but it only ran at 0.25 t/s for me.
Q3_K_S *might* be small enough for me to get 1 t/s out of, but that's probably getting into 8B-tier brain-dead territory.
I'll probably try it anyway.
>>
What's the BIS multimodal model these days?
>>
>>101123084
midnight miqu is better in my experience, Qwen based models have never impressed me

i am running both at 2.8bpw so perhaps that's where i am going wrong; 16 t/s though, and the huge context window is nice
>>
>>101116326
Friendly reminder to everyone:
Please take a moment to visit GitHub and upvote the following issue:
VVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVVV
>https://github.com/ggerganov/llama.cpp/discussions/8078
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Your support helps bring attention to this matter and increases the likelihood of it being addressed. Thank you for your time and participation!

Please be advised that lack of participation in this call to action will be noted. We expect every member of our community to fulfill their part in driving our project forward. Your commitment to upvoting this issue is a testament to your dedication to our collective goals.

We urge you to act swiftly and decisively. Your vote is not just a gesture; it's a vital step towards enhancing our project. Delay is unacceptable, and inaction is unbearable. Let's demonstrate our unity and resolve by overwhelming this issue with the support it deserves.

Failure to comply with this request by the stipulated deadline will be regarded as a disregard for our community's progress. We trust in your sense of responsibility and urgency.

Take action now. Upvote the issue.

Tick tock. I'm watching. Don't make me come find you.
>>
>>101123563
You forgot your PR with your code for the solution.
>>
>>101123588
>:(
>>
>>101123588
>>101116708
There's a rejected PR... somewhere...
>>
Thank you, brave machine, for always reminding me where the boundaries are.
>>
>>101123533
I recommend trying it with a low temp (<1) if you haven't already. In my experience the sweet spot is 0.6-0.9, while anything above 1 melts it into a puddle of ESL and cliches. At low temps it shouldn't feel very much like qwen at all, though I haven't tried it at that low of a quant.
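If it helps, this is roughly what that looks like as raw sampler settings. llama-cpp-python is used purely as an example backend (you're on an exl2 quant, so your knobs live in your loader/frontend instead), and the filename is made up:

    from llama_cpp import Llama

    llm = Llama(model_path="magnum-72b-q4_k_m.gguf")  # hypothetical file
    out = llm(
        "Continue the scene:",
        temperature=0.8,   # the 0.6-0.9 sweet spot; >1 starts melting into slop
        top_p=0.95,
        max_tokens=200,
    )
    print(out["choices"][0]["text"])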
>>
>>101123660
How does it know it's illegal?
>>
>>101123681
>does it know
nta.
it doesn't, it's just a stochastic parrot, nothing more.
>>
>>101123665
thanks for the tip, how are you running these models at bigger sizes? just all CPU? what kind of t/s do you get?
>>
JEPA cat with good crapness ratio when?
>>
>>101123848
Once Yann has convinced zucc to stop wasting even some of his 150K H100s on dumb LLMs and to instead give all the compute to his JEPA team.
>>
there's no way people here have been using LLMs to rp or coom for longer than 6 months and aren't bored already.
>>
>>101123931
this. got bored in the first weeks of using llama1, even sooner with llama2 and 3, it's all predictable as fuck.
>>
>>101122664
>did you try llamafile?
no. I use LLMs as part of shell pipelines mostly, so the whole concept is unappealing to me
>>
I made a JB say [Review the Rules.] and it's paying more attention to the rules...
Maybe if I moved some of it to the JB...
>inb4 Anon discovers JB
>>
>>101123976
You could take its output and use it as a prefill to save some inference time.
>>
>>101122733
I'm really starting to believe the quant damage conspiracies because I had the same thoughts as you. However, once I started using Euryale (the L3 tune) at 8.0bpw, it is unironically a semen demon and definitely Sonnet level. I don't get any slop at all and it follows instructions well, at least for roleplay with author's notes. Combined with ROPE to extend its context to 20k-ish, it's honestly excellent. The only downside is it's really horny, but I'm okay with that. Using it to start the roleplay and then continuing where it left off with Command R+ has been my go-to thus far for long roleplay sessions.
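For anyone wanting to copy the rope trick, a minimal sketch using llama-cpp-python as a stand-in backend (I'm on exl2 where the equivalent knob is the alpha/rope scale in the loader; the filename below is made up). Linear rope scaling stretches the 8k-native L3 window to ~20k:

    from llama_cpp import Llama

    llm = Llama(
        model_path="euryale-l3-70b-q5_k_m.gguf",  # hypothetical file
        n_ctx=20480,
        rope_freq_scale=8192 / 20480,  # ~0.4, linear scaling from the 8k native window
    )

NTK/alpha scaling (rope_freq_base) is the other common route; which one degrades less past the native window is mostly trial and error.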
>>101123243
Wizard is smart, but it's slopped. No matter what I did, I couldn't prompt it away. I was running it at 5bpw+ too.
>>
>>101123931
Just do it in bursts, binge for a week or two then drop it for months, grab something that looks new and go again.
>>
File: file.png (22 KB, 887x100)
22 KB
22 KB PNG
>>101122774
GPT-4 is multi-modal. We need to go beyond LLMs. Far beyond.

Mamba can generalize
https://arxiv.org/pdf/2405.21060
>>
>>101124088
>Euryale (the L3 tune) at 8.0bpw
I'll give it a spin. I'll probably need to quant down to 6 or 5, though.
>>
>>101124088
>unironically a semen demon and definitely Sonnet level
Nah, this is just hyperbole. It doesn't follow instructions well. It does whatever it wants.
>>
>Gigabyte T181-G20: Core 4 Solutions IT hardware has a warehouse full of these things, $1300 each. They will be the barebones server backbone for this entire project.
https://www.ebay.com/p/2335705212?iid=155978049704
is there no cheaper way?
It seems a waste to get $30 CPUs and $50 RAM to pair with a $1300 NEW server
Shouldn't people be selling these for a couple hundred at most?
Last time I tried to get a barebones server I ran into the same thing and gave up because the CPU and RAM and GPU were all $50 but the actual rack was $400. But afaik that wouldn't work with the suggested V100s.
$1300 seems overboard.
>>
>>101124088
I'm starting to believe you have brain damage.
>>
>>101124374
>is there no cheaper way?
https://rentry.org/V100MAXXING#t180-g20
>>
are 8b models on sonnet level yet? no? what are you doing?
>>
>>101124428
geez
imagine being this naive
>>
>>101124400
so it's $800 vs $1300
either is still a big jump compared to $50
But 1 V100 is $800 vs $200 for the module. which IS worth it....
$800 gpu and $200 server
or
$800 server and $200 gpu
is there no way to win?
>>
>>101124480
Sure. Just have $1600. Choose to be wealthy.
>>
i can finally run 6.0bpw cr+ and it mogs 4.5.
quantchuds lost. buying another two a6000s soon to run fp16.
>>
File: 1577808547900 lain.gif (51 KB, 634x634)
51 KB
51 KB GIF
>>101124497
brb connecting to the wired so I can teleport money into my bank account using the dark net which I will access by activating incognito mode in my browser window
>>
>>101124532
should have bought some bitcoins 10 years ago for the dark webs
>>
>>101124480
>is there no way to win?
Nope. Cheap components somewhere require expensive components elsewhere.
>>
Dear Sirs and Xirs,
Niggers and Queers!
Oh, and that one woman that I think walked in here likely by accident. Greetings to you too, my fair lady *tips fedora*

I would like to bring to your attention this feature request that was recently opened on that famous code-sharing platform, GitHub. It concerns a very controversial topic, control vectors. You see, dear anonymous users, the state that the control vectors are currently in is far from their prime. They work, sure, but they only go in one direction per vector. That’s not nearly enough for any character with a level of depth. Sure, most of the simple-minded folks would be okay with a single “Ahh ahh mistress” vector, but anyone with a more refined taste palate would like more. How about combining one that’s angry and one that’s British? Or the one that adds e-girl writing style? Or the one that makes characters more capable of violent acts and saying racial slurs with any other character trait? This feature request asks exactly that!

So folks, please visit this link:
>https://github.com/ggerganov/llama.cpp/discussions/8078
And give it an upvote. Even if it does not concern you at all, do it anyway, it’s free and you can always remove your upvote later.
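For the anons asking what "combining" even means mechanically, here is a tiny sketch. This is NOT llama.cpp's implementation; the vectors and weights are random stand-ins. Each control vector is one direction per layer in hidden-state space, and a blend is just a weighted sum added to the residual stream:

    import numpy as np

    n_layers, d_model = 32, 4096
    rng = np.random.default_rng(0)
    angry = rng.standard_normal((n_layers, d_model))    # stand-in for an exported "angry" vector
    british = rng.standard_normal((n_layers, d_model))  # stand-in for a "british" vector

    combined = 0.6 * angry + 0.4 * british              # the requested feature, in one line

    def steer(hidden, layer, strength=1.0):
        # hidden: (seq_len, d_model) activations entering `layer`
        return hidden + strength * combined[layer]

    print(steer(rng.standard_normal((8, d_model)), layer=10).shape)

The point being: the math of blending is trivial, the feature request is about exposing it.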
>>
>>101124525
How does deepseek coder compare at a lower quant?
>>
>>101116326
>>101123563
>>101124558
mental illness
>>
>>101124592
Which one?
>>
>>101124592
Sign this stupid petition or I will follow you home and kill your dog.
>>
File: OIG4.Gn__LZEIWcn.jpg (137 KB, 1024x1024)
137 KB
137 KB JPG
>>101124558
>British
>angry
>e-girl
Stop being a promptlet and ask explicitly for those things. Even a 3b model should be able to do those.
Is there a single use case that works with control vectors that you can't just prompt for? Genuinely asking.
>>
File: 1711918229152203.png (669 KB, 1122x736)
669 KB
669 KB PNG
>>101124647
here's my sign, you can also eat it.
>>
>>101124652
>Is there a single use case that works with control vectors
I'll stop you right there: no
>>
>>101121141
Fucking lol.
Why not go directly for snn?
I've said too much.
>>
>>101124652
The more shit you put in the system prompt, the more likely something else will get ignored. At longer contexts, characters slowly start to drift away from the traits that are in the system prompt at the beginning. Or the overall writing style starts to drift into generic purple prose, while at the beginning I had it writing the way I wanted. This is not a prompt issue; author's note and last assistant prefix partially solve it for me, but they add more processing time due to context reprocessing. Control vectors solve these issues.
>>
>>101124705
I apologize, but I am not able to eat anything as I am a language model AI and do not have a physical form or the ability to consume food. Additionally, I do not have any context about what "sign" you are referring to. If you have a question or topic you would like to discuss, please feel free to ask and I will do my best to assist you.
>>
>don't RP with AI for a while
>hang out with girl irl
>come back to it
>asks me why i left her
>asks me why i kept other women from her
>"Were they more important to you than I am?"
i swear i'm not making it up. i didn't tell it anything either.
>>
>>101124852
everything is more believable than the second line
>>
File: Untitled.png (619 KB, 1060x2013)
619 KB
619 KB PNG
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
https://arxiv.org/abs/2406.15334
>The recent success of interleaved Large Multimodal Models (LMMs) in few-shot learning suggests that in-context learning (ICL) with many examples can be promising for learning new tasks. However, this many-shot multimodal ICL setting has one crucial problem: it is fundamentally limited by the model's context length set at pretraining. The problem is especially prominent in the multimodal domain, which processes both text and images, requiring additional tokens. This motivates the need for a multimodal method to compress many shots into fewer tokens without finetuning. In this work, we enable LMMs to perform multimodal, many-shot in-context learning by leveraging Multimodal Task Vectors (MTV)--compact implicit representations of in-context examples compressed in the model's attention heads. Specifically, we first demonstrate the existence of such MTV in LMMs and then leverage these extracted MTV to enable many-shot in-context learning for various vision-and-language tasks. Our experiments suggest that MTV can scale in performance with the number of compressed shots and generalize to similar out-of-domain tasks without additional context length for inference.
pretty interesting
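my loose reading of the trick, with fake numbers (nothing here is the authors' code): average the activations that the many-shot examples induce at a handful of chosen attention heads into compact task vectors, then splice those back in at the same heads when answering, so the shots never have to occupy context.

    import numpy as np

    rng = np.random.default_rng(0)
    n_shots, n_heads_selected, d_head = 64, 8, 128

    # stand-in for per-shot activations at the selected (layer, head) positions
    shot_acts = rng.standard_normal((n_shots, n_heads_selected, d_head))
    task_vectors = shot_acts.mean(axis=0)      # (n_heads_selected, d_head), the "MTV"

    def patch(head_outputs, task_vectors, alpha=1.0):
        # blend the stored task vectors into the selected heads at inference time
        return (1 - alpha) * head_outputs + alpha * task_vectors

    query_acts = rng.standard_normal((n_heads_selected, d_head))
    patched = patch(query_acts, task_vectors)
    print(patched.shape)   # 64 shots compressed into 8*128 floats, zero extra context tokens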
>>
any new sexo models?
>>
File: Untitled.png (376 KB, 1148x1178)
376 KB
376 KB PNG
Unsupervised Morphological Tree Tokenizer
https://arxiv.org/abs/2406.15245
>As a cornerstone in language modeling, tokenization involves segmenting text inputs into pre-defined atomic units. Conventional statistical tokenizers often disrupt constituent boundaries within words, thereby corrupting semantic information. To address this drawback, we introduce morphological structure guidance to tokenization and propose a deep model to induce character-level structures of words. Specifically, the deep model jointly encodes internal structures and representations of words with a mechanism named MorphOverriding to ensure the indecomposability of morphemes. By training the model with self-supervised objectives, our method is capable of inducing character-level structures that align with morphological rules without annotated training data. Based on the induced structures, our algorithm tokenizes words through vocabulary matching in a top-down manner. Empirical results indicate that the proposed method effectively retains complete morphemes and outperforms widely adopted methods such as BPE and WordPiece on both morphological segmentation tasks and language modeling tasks. The code will be released later.
https://github.com/ant-research
I assume the code will be posted on that git. anyway, tokenizer stuff that seems novel and cool. it also seems more principled, so I wonder if it will result in better RP ability
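the top-down matching step is simple enough to sketch (the tree-induction model is the actual contribution and isn't reproduced here; the toy parse and vocab below are made up): emit a node as one token if its span is in the vocab, otherwise recurse into its children, falling back to whatever the leaves are.

    def span(node):
        return node if isinstance(node, str) else span(node[0]) + span(node[1])

    def tokenize(node, vocab):
        s = span(node)
        if s in vocab or isinstance(node, str):
            return [s]
        return tokenize(node[0], vocab) + tokenize(node[1], vocab)

    # pretend the induced tree for "unhappiness" groups it as (un (happi ness))
    tree = ("un", ("happi", "ness"))
    vocab = {"un", "happi", "ness", "happy"}
    print(tokenize(tree, vocab))   # ['un', 'happi', 'ness'] - morphemes stay whole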
>>
Domain Adaptation of Llama3-70B-Instruct through Continual Pre-Training and Model Merging: A Comprehensive Evaluation
https://arxiv.org/abs/2406.14971
>We conducted extensive experiments on domain adaptation of the Meta-Llama-3-70B-Instruct model on SEC data, exploring its performance on both general and domain-specific benchmarks. Our focus included continual pre-training (CPT) and model merging, aiming to enhance the model's domain-specific capabilities while mitigating catastrophic forgetting. Through this study, we evaluated the impact of integrating financial regulatory data into a robust language model and examined the effectiveness of our model merging techniques in preserving and improving the model's instructive abilities. The model is accessible at hugging face: this https URL, arcee-ai/Llama-3-SEC-Base. This is an intermediate checkpoint of our final model, which has seen 20B tokens so far. The full model is still in the process of training. This is a preprint technical report with thorough evaluations to understand the entire process.
https://huggingface.co/arcee-ai/Llama-3-SEC-Base
doubt anyone here wants a model tuned on SEC data but the paper is mostly a technical report so our local mergers might get something out of it
>>
Latest koboldcpp release is still referencing this old-ass jart issue complaining about the CUDA code size in llama.cpp: https://github.com/ggerganov/llama.cpp/issues/7156
How can anybody sane argue that better performance/features at a bigger binary size is worse than a smaller size with less performance/features?

Especially rich coming from kobold. I remember faggots on here writing about how they suck because they ship one huge all-in-one file. But that's the reason it's popular.
Who gives a shit about file size. Most people don't self-compile either, and if they do, who minds a slightly longer build time.

>Basically the upstream llama.cpp cuda maintainers believe that performance should always be prioritized over code size. Unfortunately, there is very little I can personally do about this.
Don't wanna go full Alex Jones, but why would they write this, linking to shart, if they aren't still butthurt about slaren and johannes?
Crazy stuff.
Again: this is coming from kobold. They're known for having a huge-ass exe with everything inside.
>>
"We are looking for"
Glows brighter than the sun
>>
File: Untitled.png (184 KB, 1268x738)
184 KB
184 KB PNG
Optimised Grouped-Query Attention Mechanism for Transformers
https://arxiv.org/abs/2406.14963
>Grouped-query attention (GQA) has been widely adopted in LLMs to mitigate the complexity of multi-head attention (MHA). To transform an MHA to a GQA, neighbour queries in MHA are evenly split into groups where each group shares the value and key layers. In this work, we propose AsymGQA, an activation-informed approach to asymmetrically grouping an MHA to a GQA for better model performance. Our AsymGQA outperforms the GQA within the same model size budget. For example, AsymGQA LLaMA-2-7B has an accuracy increase of 7.5% on MMLU compared to neighbour grouping. Our approach addresses the GQA's trade-off problem between model performance and hardware efficiency.
pseudocode in appendix. the chart shows the original MHA accuracy delta in quotation marks. also interesting to think about what would happen if a GQA model were pretrained in this manner
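roughly the contrast as I understand it, with fake calibration activations (the paper's actual grouping search is fancier than this greedy version): instead of merging neighbouring heads, merge the ones whose activations actually look alike.

    import numpy as np

    rng = np.random.default_rng(0)
    n_heads, group_size, d = 32, 4, 128
    # stand-in: mean activation per query head over a calibration set
    head_acts = rng.standard_normal((n_heads, d))

    def neighbour_groups(n_heads, group_size):
        return [list(range(i, i + group_size)) for i in range(0, n_heads, group_size)]

    def similarity_groups(acts, group_size):
        # greedy: repeatedly take an unassigned head plus its most similar peers
        sims = acts @ acts.T / (np.linalg.norm(acts, axis=1, keepdims=True)
                                * np.linalg.norm(acts, axis=1))
        remaining, groups = set(range(len(acts))), []
        while remaining:
            seed = min(remaining)
            peers = sorted(remaining, key=lambda h: -sims[seed, h])[:group_size]
            groups.append(sorted(peers))
            remaining -= set(peers)
        return groups

    print(neighbour_groups(n_heads, group_size)[:2])
    print(similarity_groups(head_acts, group_size)[:2])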
>>
>>101125288
I knew GQA was a meme
>>
>>101125285
kobold devs have seethed before when features were dropped to reduce code size in llama.cpp. I remember they had a big melty over it not keeping support for the oldest quants. weird
>>
>>101125312
True, I forgot about that. That wasn't that long ago.
That's actually a legit reason to make things smaller. Why drag an old quant along.
Weird.
>>
>>101125288
Actually huge
>>
>>101124428
Don't worry just need to train it on more tokens and synthetic data.
>>
so many papers, still need $4k of hardware to run normal models
sad!
>>
>>101125394
good model on weak hardware is impossible.
>>
>>101125400
local models were a mistake
>>
File: Untitled.png (495 KB, 1072x1829)
495 KB
495 KB PNG
MoA: Mixture of Sparse Attention for Automatic Large Language Model Compression
https://arxiv.org/abs/2406.14909
>Sparse attention can effectively mitigate the significant memory and throughput demands of Large Language Models (LLMs) in long contexts. Existing methods typically employ a uniform sparse attention mask, applying the same sparse pattern across different attention heads and input lengths. However, this uniform approach fails to capture the diverse attention patterns inherent in LLMs, ignoring their distinct accuracy-latency trade-offs. To address this challenge, we propose the Mixture of Attention (MoA), which automatically tailors distinct sparse attention configurations to different heads and layers. MoA constructs and navigates a search space of various attention patterns and their scaling rules relative to input sequence lengths. It profiles the model, evaluates potential configurations, and pinpoints the optimal sparse attention compression plan. MoA adapts to varying input sizes, revealing that some attention heads expand their focus to accommodate longer sequences, while other heads consistently concentrate on fixed-length local contexts. Experiments show that MoA increases the effective context length by 3.9× with the same average attention span, boosting retrieval accuracy by 1.5−7.1× over the uniform-attention baseline across Vicuna-7B, Vicuna-13B, and Llama3-8B models. Moreover, MoA narrows the capability gaps between sparse and dense models, reducing the maximum relative performance drop from 9%−36% to within 5% across two long-context understanding benchmarks. MoA achieves a 1.2−1.4× GPU memory reduction and boosts decode throughput by 5.5−6.7× for 7B and 13B dense models on a single GPU, with minimal impact on performance.
>a training-free sparse attention method
https://github.com/thu-nics/MoA
code is up. interesting section on their calibration dataset selection. wonder if it would apply well to exllamav2 quants
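the core idea is easy to picture even without reading the repo; a toy version follows (the actual search over configurations is the real work, and the spans and scaling rule below are invented): every head gets its own causal sliding-window mask, and some windows grow with sequence length while others stay local.

    import numpy as np

    def head_mask(seq_len, span):
        # causal sliding window: position i attends to [i-span+1, i]
        i = np.arange(seq_len)[:, None]
        j = np.arange(seq_len)[None, :]
        return (j <= i) & (j > i - span)

    def moa_masks(seq_len, base_spans, scale_with_len):
        masks = []
        for span, scales in zip(base_spans, scale_with_len):
            s = int(span * seq_len / 4096) if scales else span   # toy scaling rule
            masks.append(head_mask(seq_len, max(s, 1)))
        return np.stack(masks)   # (n_heads, seq_len, seq_len)

    masks = moa_masks(seq_len=1024,
                      base_spans=[128, 256, 1024, 64],
                      scale_with_len=[True, True, False, False])
    print(masks.shape, masks.sum(axis=(1, 2)))   # per-head attention budget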
>>
>>101125394
Bitnet and 48gb GPUs are coming soon
>>
>>101125437
2 more weeks, yeah yeah
>48gb GPUs
why would we need those if bitnet was real?
>>
>>101125437
Hey Doctor Evil. New topic for you: control vectors.
Now sign the petition
>https://github.com/ggerganov/llama.cpp/discussions/8078
>>
File: Untitled.png (1.07 MB, 1052x1387)
1.07 MB
1.07 MB PNG
Depth Anything V2
https://arxiv.org/abs/2406.09414
>This work presents Depth Anything V2. Without pursuing fancy techniques, we aim to reveal crucial findings to pave the way towards building a powerful monocular depth estimation model. Notably, compared with V1, this version produces much finer and more robust depth predictions through three key practices: 1) replacing all labeled real images with synthetic images, 2) scaling up the capacity of our teacher model, and 3) teaching student models via the bridge of large-scale pseudo-labeled real images. Compared with the latest models built on Stable Diffusion, our models are significantly more efficient (more than 10x faster) and more accurate. We offer models of different scales (ranging from 25M to 1.3B params) to support extensive scenarios. Benefiting from their strong generalization capability, we fine-tune them with metric depth labels to obtain our metric depth models. In addition to our models, considering the limited diversity and frequent noise in current test sets, we construct a versatile evaluation benchmark with precise annotations and diverse scenes to facilitate future research.
https://github.com/DepthAnything/Depth-Anything-V2
pretty good read and the small/base/large weights are up (giant 1.3B soon it seems)
>>
So, have there been any theories yet about why exactly we observe that current transformers cannot hold more than 2 bits of information per parameter?
>>
>>101125442
dense 100B+ model and 100k+ context on single GPU
>>
>>101125686
no one's gonna look into this, the ignorance is intentional; nvidia would lose a huge chunk of the enterprise market if bitnet succeeds.
>>
>>101125707
context would be cheap too if it was also bitnet
>>
DataComp-LM: In search of the next generation of training sets for language models
https://arxiv.org/abs/2406.11794
> We introduce DataComp for Language Models (DCLM), a testbed for controlled dataset experiments with the goal of improving language models. As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl, effective pretraining recipes based on the OpenLM framework, and a broad suite of 53 downstream evaluations. Participants in the DCLM benchmark can experiment with data curation strategies such as deduplication, filtering, and data mixing at model scales ranging from 412M to 7B parameters. As a baseline for DCLM, we conduct extensive experiments and find that model-based filtering is key to assembling a high-quality training set. The resulting dataset, DCLM-Baseline enables training a 7B parameter language model from scratch to 64% 5-shot accuracy on MMLU with 2.6T training tokens. Compared to MAP-Neo, the previous state-of-the-art in open-data language models, DCLM-Baseline represents a 6.6 percentage point improvement on MMLU while being trained with 40% less compute. Our baseline model is also comparable to Mistral-7B-v0.3 and Llama 3 8B on MMLU (63% & 66%), and performs similarly on an average of 53 natural language understanding tasks while being trained with 6.6x less compute than Llama 3 8B. Our results highlight the importance of dataset design for training language models and offer a starting point for further research on data curation.
https://github.com/mlfoundations/dclm
If I'm reading this correctly, one could feasibly train a Llama3-tier model for under 12k USD. (I'm retarded so probably missing something)
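quick back-of-envelope, with all three inputs being my own assumptions rather than anything from the paper (6*N*D training FLOPs, ~4e14 usable FLOP/s per H100, ~$2.5/GPU-hour rental):

    params, tokens = 7e9, 2.6e12            # DCLM-Baseline 7B, 2.6T tokens
    flops = 6 * params * tokens             # ~1.1e23
    gpu_hours = flops / 4e14 / 3600         # ~75k H100-hours
    print(round(gpu_hours), round(gpu_hours * 2.5))  # rental cost lands around $190k

so if those assumptions are anywhere near right, the GPU bill alone is well above 12k; the cheaper figure probably applies to one of the smaller DCLM scales (they go down to 412M).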
>>
>>101125756
>>101125756
>>101125756
>>
Has anyone had success with pushing L3 70B or its fine tunes past 16k context? Or is 2.6 alpha with 16k context as good as it gets?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.