I tried optimizing a quantized model with gradient descent to minimize the error relative to the original model.
Specifically, I hacked torchtune to replace the bf16 weight matrices in all the nn.Linear layers with the separate qs/scales/d components of the quantized Q6_K format. These are stored as floating-point "latent weights" (like in the BitNet paper), but the forward pass rounds, clamps, and runs the normal Q6_K dequantization on the fly, so the weights that actually get used in the linear layers are always exactly those of some valid Q6_K quant.
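For anyone who wants to see what that looks like, here's a stripped-down sketch of the idea (not my actual torchtune code, names made up): one float scale per 16-element block and a straight-through estimator for the round/clamp. The real Q6_K layout (int8 sub-block scales inside 256-element super-blocks with an fp16 d) has more moving parts, but the mechanics are the same.
[code]
import torch
import torch.nn as nn
import torch.nn.functional as F

class RoundClampSTE(torch.autograd.Function):
    """Round+clamp in the forward pass, straight-through gradient in the backward."""
    @staticmethod
    def forward(ctx, x, lo, hi):
        return x.round().clamp(lo, hi)

    @staticmethod
    def backward(ctx, grad_out):
        return grad_out, None, None

class LatentQuantLinear(nn.Module):
    """Linear layer whose weight is rebuilt every forward pass from float "latent"
    quant components, so the effective weight is always a valid quantized tensor.
    Simplified vs. real Q6_K: just one float scale per 16-element block here."""

    def __init__(self, weight_fp: torch.Tensor, block_size: int = 16):
        super().__init__()
        out_f, in_f = weight_fp.shape
        assert in_f % block_size == 0
        w = weight_fp.float().reshape(out_f, in_f // block_size, block_size)
        # Naive round-to-nearest init: scale * q ~= w with 6-bit q in [-32, 31].
        scale = w.abs().amax(dim=-1, keepdim=True) / 31.0 + 1e-12
        self.q = nn.Parameter(w / scale)    # latent 6-bit codes (kept as floats)
        self.scale = nn.Parameter(scale)    # latent per-block scales (kept as floats)

    def effective_weight(self) -> torch.Tensor:
        q = RoundClampSTE.apply(self.q, -32.0, 31.0)  # snap to valid 6-bit codes
        w = q * self.scale                            # dequantize on the fly
        return w.reshape(w.shape[0], -1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.linear(x, self.effective_weight())
[/code]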
I took a normal Q6_K quant of Llama 3 8B Instruct and optimized each layer separately to minimize the error it introduces relative to the same layer in the original model. Doing one layer at a time keeps VRAM usage down, which matters because I eventually want to apply this to L3 70B. The whole run took about 6 hours on 1x 4090. Then I converted the results back to a Q6_K GGUF.
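Each per-layer run boils down to a plain distillation loop on cached activations, roughly like this (simplified; Adam and MSE on the layer outputs are just placeholders here, not necessarily what you'd keep):
[code]
import torch
import torch.nn.functional as F

def tune_layer(latent_layer, fp16_layer, calib_batches, steps=1000, lr=1e-4):
    """Fit one latent-quant layer to reproduce the original fp16 layer's outputs.
    calib_batches: list of activation tensors captured at this layer's input."""
    opt = torch.optim.Adam(latent_layer.parameters(), lr=lr)
    for step in range(steps):
        x = calib_batches[step % len(calib_batches)]
        with torch.no_grad():
            target = fp16_layer(x)                  # what the original layer produces
        loss = F.mse_loss(latent_layer(x), target)  # error introduced by the quant
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latent_layer
[/code]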
KL-divergence on wiki.test.raw improved by a small amount:
old: 0.004234
new: 0.003945
delta: 0.000289 (7%)
I have a few ideas to try next to improve the results:
>Train 2-4 layers together. My hope is that this will give the optimizer some flexibility to have the layers cancel out each other's errors.
>Train the layers sequentially. First, train layer 0 as normal. Then, instead of training layer 1 to map layer_0_fp16_output -> layer_1_fp16_output, train it to map layer_0_quant_output -> layer_1_fp16_output. This lets the layer 1 optimizer know about the errors introduced by layer 0 so it can correct for them (rough sketch below).
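Rough sketch of that second idea: same kind of loop as above, but with two activation streams, one from the already-quantized prefix and one from the original fp16 model (all names made up):
[code]
import torch
import torch.nn.functional as F

def tune_layer_sequential(latent_layer, fp16_layer, quant_batches, fp16_batches,
                          steps=1000, lr=1e-4):
    """Like tune_layer, but the input comes from the already-tuned quantized
    layers 0..i-1 while the target is still the original model's activation at
    layer i, so the optimizer can also correct upstream quantization error."""
    opt = torch.optim.Adam(latent_layer.parameters(), lr=lr)
    for step in range(steps):
        x_quant = quant_batches[step % len(quant_batches)]  # layer i-1 quant output
        x_fp16 = fp16_batches[step % len(fp16_batches)]     # layer i-1 fp16 output
        with torch.no_grad():
            target = fp16_layer(x_fp16)                     # layer i fp16 output
        loss = F.mse_loss(latent_layer(x_quant), target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return latent_layer
[/code]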
But I'd love to hear other suggestions too. I'm sure there are anons ITT who know a lot more about ML than I do.