/g/ - Technology


File: ZUCK.jpg (1.07 MB, 1080x1920)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101540740 & >>101536777

►News
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: bdpmw1706562936.png (843 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101540740

--Papers: >>101545698 >>101545722 >>101546015
--Requirements for running a local LLM: >>101544770 >>101544840 >>101544796
--SLI needed for GPUs to share workloads, not for VRAM access: >>101543813 >>101544752
--Logs: L3.1 8B: Paizuri and its societal impact: >>101540867
--Links to instruct and context template, and quantization for Nemo: >>101540870 >>101540926 >>101540951
--AI text-based models comparison and performance discussion: >>101541796 >>101541845 >>101541903 >>101541853 >>101541944 >>101541897 >>101542067 >>101542247 >>101542259 >>101542303
--Llama 405b instruct vs GPT-4 capabilities: >>101544378 >>101544401 >>101544430
--Dubesor LLM Benchmark table discussion and hardware recommendations: >>101544729 >>101544742 >>101544759 >>101544789 >>101544760 >>101544769 >>101544779 >>101544840
--Best 70B 3.1 model for a chat assistant?: >>101542470 >>101542896
--Anon shares a benchmark comparing various models' reasoning capabilities.: >>101544341 >>101544350 >>101544373 >>101544572
--Logs: The hobo test: >>101544929 >>101545008 >>101545047 >>101545091
--Request for new Llama link: >>101540852 >>101543219
--OpenAI's free finetuning offer and its implications: >>101542377 >>101542525 >>101542557
--OpenAI Chatbot Twitter posts about GPT-4 and profit strategy: >>101540893 >>101541213
--Nemo compatibility with koboldcpp and suggested workarounds: >>101543240 >>101543259 >>101543283 >>101543349 >>101543261
--Mistral Nemo short responses issue: >>101544335 >>101545012
--Llama 3 local setup and usage guide recommendations: >>101544586 >>101544618 >>101544832
--Gemma 9b vs 9b SPPO Iter 3, newline spamming, and alternatives: >>101544436 >>101544459 >>101544516
--Logs: 8B at Q8: Failure case reproduction attempt and the importance of communication about sexual topics: >>101543061
--Miku (free space): >>101541947 >>101543632

►Recent Highlight Posts from the Previous Thread: >>101540750
>>
File: mini-magnum.png (526 KB, 1024x512)
Kino dropped again, Magnum Mini!

https://huggingface.co/intervitens/mini-magnum-12b-v1.1
>>
>>101546596
>nemo finetune
BASED BASED BASED
that model has eerie common sense about human social behaviour for its size; for me it punches above 70B models at short story autocomplete for some reason. it almost never has someone do or say something weird that a human wouldn't say, like every other open model does
>>
We are so back
>>
Are there any JBs for 405B?
>>
Is Llama 3.1 any good at programming tasks? Compared to Claude Sonnet for example.
>>
https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct/discussions/12

New tokenizer updates on all 3.1 instruct models
>>
>>101546752
405B is superior to GPT4o from my tests. But Sonnet is still way ahead.
>>
>>101546596
Looking forward to the exl2.
>>
>>101546752
You mean 3.5? That would be shocking.
The new Sonnet has something that the benchmarks don't show.
It's the only model I know of that can recover from mistakes.
All the others do the repeat thing, even after you point out the mistake.
>Oh you are right, I was wrong. *Outputs the same wrong code again*
Even 405B Llama3 does this. There must be some big change we don't know about because it's closed source.
>>
>>101546800
Not EXL2 but I'm doing GGUF right now.
>>
retard here, does nemo work on kobold or do i still have to wait for an update?
>>
>>101546775
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/discussions/29
>This one should add bos safely
>>
>>101546824
it works on my ooba dev, and kobold is almost always faster than ooba to implement stuff, so I would be surprised if ooba has beaten them here
>>
>>101546824
No, you need the frankenstein thingy.
https://github.com/Nexesenex/kobold.cpp/releases
Or use llama.cpp server, it's more than enough for testing.
>>
>>101546805
They definitely cooked something. The time between Sonnet 3 and 3.5 was too short for it to be an entirely new model. Sonnet 3 almost lost to L3-70B but 3.5 shot up to SOTA. The gen speeds are also too fast so it can't be a big model
>>
this shit is so unbelievably broken, how much does altman pay them for this?
(there is no way mini is that good at coding, and gpt-4o isn't that good either, none of them are better than opus)
>>
>>101546878
None of them are better then 3.5 or Opus. Its all pajeets paid by Altman who keep it on the top on lmsys
>>
>>101546878
lmsys voters are retarded indians and openai is goodharting themselves on lmsys votes while all their high engagement customers abandon them for anthropic
>>
>>101546878
3.5 at #1 is legit. I would say it should be even further ahead in points.
It's the only model where you can create a game, tell it "make it X", and it actually does it.
Problems? Actual out-of-the-box thinking and trying to find solutions to help you out.
Opus in my experience is great for RP but worse than GPT4 for coding. It made up a lot of stuff.

Mini and Gemini make no sense lol
But it's lmsys, right. Very limited context, nobody tests coding there.
It's all just assistant replies and riddles, and whatever's funnier gets upvoted.
>>
>>101546878
This was expected the moment lmsys "randomly" got access to GPT-4o early. They've been in cahoots ever since. I'm-a-good-gpt2-chatbot ranked first, then gpt-4o, now gpt-4o mini ranks super high too. Who would've thunk?
>>
File: bench.png (691 KB, 2048x676)
https://dubesor.de/benchtable
>redditor's bench
>claude models refuse so much they get negative on censor benches
>>
>still on the same llama1 architecture
kek
>>
>>101546976
llama4 will also be pure transformerslop with integrated audio and video so Meta can deploy it on their AI Ray Bans. It's over, Lecun is a hack
>>
>>101546596
This is insanely good from my brief testing so far. Tolerates higher temps than base too which is really nice for creativity. You can use temp 1 and 0.95 top_p and it doesn't become retarded
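(for reference, a minimal sketch of those samplers against llama.cpp's built-in server; the endpoint, port, and prompt are placeholders, not from my actual setup)

# assumes llama-server is already running on port 8080 with the model loaded
curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Once upon a time", "temperature": 1.0, "top_p": 0.95, "n_predict": 128}'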
>>
>>101546827
noticed a complaint about fucked up tokenizer a few hours ago
https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/discussions/3
>can't do the killers problem
https://github.com/ggerganov/llama.cpp/issues/8650
>Converting llama-3.1 seems to make it set the tokenizer.ggml.pre = 'smaug-bpe' instead of llama-bpe.
>Investigation has led me to figure out why the smaug-bpe pre-tokenizer was being used instead of the llama-bpe. It seems to be a problem with the transformers library not prefixing a BOS token.
>>
File: facepalm.jpg (545 KB, 2592x1944)
>Goes back to X-NoroChronos
>Is scarcely able to believe the greater levels of creativity and sovl in comparison with anything post-Mixtral
>Knows obsessive fuckwits would seethe about Undi if this was mentioned
>Listens to said fuckwits moaning about how over it is, because said recent models are Woke, sterile piles of garbage.
>>
>>101546976
>multiple orders of magnitude improvement with no clever tricks, just dataset improvements and pure compute scaling
based and engineeringpilled, I kneel before zuck
>>
Why haven't SSM-Transformers hybrid models caught on? The result is transformers-grade quality with the latency and memory efficiency of a SSM. Should be a no-brainer.
>>
File: file.png (6 KB, 401x44)
How long does it take to access 3.1? Will they take made up names in the form?
>>
>>101547018
>The more I look the more I feel the smaug-bpe is a non-factor

>If you look through the code, the only thing that being labelled smaug-bpe actually does is select the regex for smaug, which is an exact match of what llama 3 uses, so it's the same
>>
>>101547076
https://huggingface.co/SillyTilly

Reuped L3 3.1 here.
>>
>>101546775
Why is the configuration always somehow fucked for every single HF model release? I wonder how many people are still using gemma 9b with the wrong query_pre_attn_scalar value (that got fixed 2 weeks after the model was released).
>>
>>101547024
great for you that you enjoy barely 4k context petrus, but some people use more than that
>>
>>101547080
Oops. here
https://huggingface.co/collections/SillyTilly/llama-31-reupload-669fe58bcaabf13820c0e7df
>>
>>101546786
>>101546805
>>101546875
Well I'll keep using Sonnet 3.5 then, it's truly amazing what it can produce with just a few prompts.
>>
File: 39_04688__.png (1.72 MB, 896x1152)
>>101546596
iMat Q8_0s of mini-magnum-12b are up already.
Q6_K will be soon.
>>101546873
Also working on ooba
>>
>>101547094
guess I have to go to sleep to check the changes then
>>
File: granted.png (27 KB, 570x305)
>>101547076
It was just a few minutes for me.
>>
>>101547093
You do know that it's possible to run at least some old models with higher context than that, right?
>>
>>101547141
https://github.com/hsiehjackson/RULER
>longalpaca (l2-13b) not even 70% at 4k...
>>
>>101547137
Was that during the day? If not, maybe they don't like John Smiths.
>>
>>101547212
Forgot to clarify. It was a made up name. They don't seem to have a problem with Scotts...
>>
>>101547212
And i keep replying to just half the questions. I need to sleep. Yes, it was during the day.
>>
>>101547212
I literally put my name as "Fake Name" and got approved in 30 seconds
>>
>>101547238
>>101547242
Which raises the question: why do they need live human monkeys manually mashing the accept-request button if they aren't going to screen anything (except maybe swear words, idk)?
>>
>>101547253
to reject the Chinese apparently
>Does the repo's admin discriminate against Chinese people?
https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/187
>>
>>101546596
exl2 of both kinds in progress and longcal should be arriving shortly.
>>
>>101547273
Fucking based
>>
>>101547273
good, they contribute nothing and steal everything.
>>
>>101547338
>you are a pirate.avi
>>
>>101547351
guilty
>>
>>101547351
Yeah but a based freeman pirate, not a cringe bug runescape gold farm pirate
>>
File: denomolos+.jpg (378 KB, 791x662)
>>101547338
>good, they contribute nothing and steal everything.
Settle down, Heinrich.
>>
>>101547386
go bak to reddit petrus
>>
>>101547253
I have no idea, but i assume it's just to stagger the downloads. It's a sort of a queue. I'm sure it's automatic, but delayed.
>>
What's the smallest size llama 3.1 that is better than wizard 8x22b?
>>
torch.nn (no nut)
>>
>>101547585
only 405B is smarter, and it still has less sovl
but I'm biased because I love wizardlm8x22
>>
>>101547635
Me too, that's unfortunate since I don't think I have enough ram and patience for 405b then. Guess I'm stuck where I am. Was hoping I could get something faster and as good.
>>
>>101547273
based, model-stealing chinksects btfo
>>
>>101547650
405B isn't bad, it's smart and doesn't refuse to write smut, it's just too dry (no surprises for a Meta instruct tune)
finetunes should be able to make it good at sex writing
>>
Need feedback on "La Creatura": https://huggingface.co/Undi95/Meta-Llama-3.1-8B-Claude

If you can post logs too (with the shitty replies please, to see the issues), that would help, thanks.

I'm just toying with L3.1 to see the results; all info and datasets used are on the model card.
>>
>>101547663
There's no way I can run it though. I use mostly ram to run stuff, only have 96gb.
>>
>>101547670
why f32?
why this pytorch_model_fsdp.bin
32.1 GB? do you want people to download your slop or not? you're making your repo ultra huge for an 8b...
>>
File: llamacpp cache fp16.png (8 KB, 703x103)
>>101544646
>>101545707
It seems like llama.cpp by default keeps the kv cache in fp16?
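(for reference, the cache types can be overridden at load time; a hedged sketch, flag names as of recent llama.cpp builds, check --help on yours)

# default for both caches is f16; a quantized V cache needs flash attention (-fa)
./llama-server -m model.gguf -fa --cache-type-k q8_0 --cache-type-v q8_0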
>>
>>101547713
It's the direct output from axolotl, and I used f32 because I needed to use FSDP
Lemme clean that shit
>>
>>101547713
He still doesn't know what the fuck he's doing. He'll never learn.
>>
>>101547741
It's my first FSDP train kek
>>
>>101547670
>not FAIPL-1.0
into the trash it goes
>how to use faipl-1.0
put the following in the readme:
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
>>
>>101547741
>>97223983
>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.
>everyone who attacks him is mindbroken incel scum
>>
I can only take advantage of 32k of the 128k of context that nemo provides, but I'm liking it
asking for a recap doesn't just return the last 3 prompts any more, it actually starts at the start
>>
>>101547713
It's cleaned
>>
File: 1678421562953532.png (229 KB, 964x675)
>>101546566
seeing a lot of buzz about this ollama program in recent months but ignored it because it didn't seem compatible with my method of NOT USING A FUCKING 50TB SYSTEM DRIVE TO STORE MY MODELS ON
but i have begun to rethink that.

tldr
Someone please redpill me on ollama.
Currently running Grok Q4 or miqu 103B or whatever the fuck i want in koboldcpp locally
>>
>>101547767
quick question, why train on top of another tune?
https://huggingface.co/Undi95/Meta-Llama-3.1-8B-Claude/blob/main/config.json#L2
>EdgerunnersLab_llama-3.1-ortho-baukit-39f-3000t
>>
>>101546566
>they never even tested whether or not their script for downloading actually works
>>
>>101547779
Because it's an OAS model; L3.1 is already cucked.
This one got 39 refusals out of 1000 questions, but it got cucked again by the dataset.
I suppose that a clean base would be even more cucked.
>>
>>101547773
It's fucking shite. God forbid you wanna run your own models or even quickly change samplers or context/instruct presets. The only reasons to use it over kcpp/lcpp/ooba are initial accessibility and some of the features it offers out of the box. Do yourself a favor and use something else like ooba.
>>
>>101547773
ollama is a wrapper around the llama.cpp HTTP server.
koboldcpp is a fork of llama.cpp with their own server.
If you don't have any issues with your current setup there is no point in switching.
>>
>>101547273
So that's how they're going to gatekeep away the multimodal models from EU citizens in the future...
>>
>>101547760
>>everyone who attacks him is mindbroken incel scum
Yep. That was my opinion then, and it's still my opinion now. Feel free to keep proving me right.
>>
Has pruning models ever been proven successful in practical environments? What happened to that pruned llama 42B?
>>
>>101547809
Seems more like a way to comply with US sanctions since people with "slavic" names apparently also don't get access.
Though realistically this is all pointless anyways.
>>
>>101547670
>>101547807
>>101547803
can you reccomend a L3.1 70B or 405B quant?
Or will a finetune be coming this week?
>>
>>101547842
>Has pruning models ever been proven successful in practical environments?
no
>>
>>101547853
No, all quants are fucked up; I would wait for exllama and llama.cpp to commit their fixes on main. For now I use L3.1 models unquanted.
We're still working on Lumimaid with Ikari, but I know there are other peeps already working on ERP models based on L3.1, so finetunes will come soon
>>
>>101547864
>No, all quants are fucked up
lcpp/gguf "should" work mostly alright if you set ctx to 8k, since afaik the gguf issue is with rope above that.
not ideal but yeah, new model woes as usual
>So right now with the new tokenizer+ limiting the context to 8K, seems to work as expected.
https://github.com/ggerganov/llama.cpp/issues/8650#issuecomment-2247336965
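So in practice, once you have a self-made GGUF, just cap the context at load time. A minimal sketch (model path and layer count are placeholders):

# -c 8192 stays under the broken-rope threshold; -ngl offloads layers to the GPU
./llama-server -m Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf -c 8192 -ngl 99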
>>
File: miku-pirate.png (324 KB, 512x512)
>>101547809
Torrents, ahoy! Fuck centralized huggingface!
>>
>>101547864
>For now I use L3.1 models unquanted.
how the fuck do i do that?
The last time I opened a safetensors file was pygmalion before the llama leak - and im pretty sure that version of kcpp isn't compatible with L3 safetensor files....
https://huggingface.co/unsloth/Meta-Llama-3.1-70B
and that's after I can find a working repo... I assume you use the meta repo for now?
PS: was the torrent yesterday a fake?
>>
>>101547894
I don't want to redo my quants 999 times and get issues that aren't the model's fault, so until they're absolutely OK with the quants, I will wait and test unquanted desu
I know GGUF outputs correctly, but not the way it should

>>101547908
Dunno about the torrent, I got it from a dupe of HF repo directly before the torrent was out lmao
You can launch unquanted model with KoboldAI (not kobold.cpp) or Oobabooga text webui, or even Aphrodite.
Don't forget to update transformers tho
>>
>>101547894
>lcpp/gguf "should" work
any links to these mythical quants?
https://www.youtube.com/watch?v=FoYC_8cutb0
>>
>>101547898
What port does miku grace?
>>
File: miku-microsoft-pirate+.jpg (159 KB, 1024x1024)
>>101547898
>>
>>101547925
>?
just make the quant yourself and set ctx to 8k at inference
>>101547924
I wasn't suggesting you should upload quants, just saying anecdotally that self made quants set to 8k seemed alright
>>
>>101547988
Oh yeah I know that, no problem. I usually upload quant of my model alongside my unquant one but I didn't this time for exact this reason.
Still, I want to use all the 128k of ctx I can suck out of my model, we waited too long
>>
Is anyone else disappointed by llama 3.1? It's just the same shit we already had only marginally improved.
Where is all the stuff we haven't got yet like multi-modal or bitnet or anything else that isn't just standard transformer slop.
I'm honestly more excited about chameleon (once it's finally supported in llama.cpp)
>>
when are AI agents going to replace swe?
can't wait to get into trades
>>
>>101548031
Yeah, I'm back to Qwen2
>>
Reminder that GPU orientation matters
https://www.reddit.com/r/sffpc/comments/ljsn04/psa_xtia_xproto_after_having_3_different_aib_rtx/
>>
File: faceblur.png (85 KB, 1174x247)
>>101548031
The Llama3 paper mentions video, image, speech multimodality, but they really made sure it's "safe" and non-toxic. They blurred all human faces in their image datasets, for example. Probably coming in 2-3 months.

https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
>>
>>101548079
the reason my GPU is hot is because Im too lazy to change the paste and pads
>>
>>101548031
I'm still coping. ggerganov... with gguf fixes everything will be better.
>>
File: GSQ8a8UagAQ-yNG.jpg (53 KB, 737x737)
Does gemma 2 work with koboldcpp? Are there any specific settings for it I should tweak?
>>
>>101548116
Mein Fuhrer... ggerganov...
>>
>>101548119
it works but it's slow as shit
>>
File: Selection_309.png (35 KB, 1213x96)
I noticed in the Llama 3.1 paper that 405B was only trained to compute-optimal whereas the smaller models are trained way beyond that point. Stands to reason as Meta iterates the 405B model will get stronger and stronger
>>
>>101546596
I tried it... It's easily the worst fine-tune of all time. It's an insult to the field.
>>
>>101547585
8B
>>
>>101548260
laying it on a bit thick, I need you to dial it down about 3 notches
>>
>>101547670
>no license
ngmi
>>
>>101548436
>no feedback
we can do a trade, a license for a small review
>>
>>101548436 (me)
my name is petra, btw
>>
>>101548098
>prompt: generate X person face
>output Y time later: <blurred eldritch amalgamation that looks like human if you are legally blind>
lmao
>>
>>101548471
gguf wen
>>
>>101548508
When llama.cpp is fixed
>>
>>101548526
exl when
gptq when
anything when
>>
>>101547670
That was fast! I hope someone makes exl2
>>
File: images.jpg (10 KB, 258x195)
>>101547988
>just make the quant yourself
Are you insane? If i knew how to do that I wouldn't be here in the first place! Picrel: it's you
Seriously though. Quit joking. Where is the fucking tutorial?
>>
>>101548597
The readme on llama.cpp has instructions on how to convert and quantize. If it's not (yet) completely compatible with llama-3.1, then no matter what quants you get, they're going to be shit. Stop being a retard.
>>
>>101548572
I resharded from fp32 to bf16, that's already 50% smaller kek
Can't quant now, will do GGUF asap, and probably people will do exl2 if they like it
>>
is l3.1 70b 8bpw better than 405b 3bpw? I can run both.
>>
>>101547024
This.
Local died in 2023. Only 12 good models have been released since then (most of them by Cohere, being already in training by 2023). Local achieved its creative peak in models like Mythomax, L2 Euryale, and SuperCOT, elevating the field into a legitimate SOVL form. Now, thanks to Llama3 and Mixtral, all its potential was squandered and the field has been reduced into being mere riddle solvers for reddit idiots (i.e. the lowest common denominator - stop trying to turn open-source AI into corposlop).
>>
>>101548941
>I can run both
Test it and share the results with us.
>>
>>101548956
kys
>>
>>101548956
>Mythomax, L2 Euryale, and SuperCOT, elevating the field into a legitimate SOVL form.
The first two are meme merges.
>>
>>101548963
>>97062246
>I'm not Petra. Petra's an amateur. I'm something considerably worse.
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.
>>
https://youtu.be/Vy3OkbtUa5k?t=91

> [01:31] [Zuckerberg] [...] In addition to that, we've distilled the 405 billion parameter model down to make newer and updated and now leading for their size 70 billion and 8 billion parameter models [...]

But no information about this in the paper.
>>
>>101548956
Nemo is just like an old-school sovl model
>>
>>101548988
Nemo might have sovl (debatable) but Gemma-2-27B, which you can easily run at near full quality on a 24GB GPU, can handle more complex prompts.
>>
>>101549005
>4bit
>full quality
>>
>>101548941
On what kind of hardware?
>>
What would you recommend from these 4 options:
>llama 3.1 8B
>nemo 12B
>waiting for llama 3.1 8B finetune
>waiting for nemo 12B finetune
my internet is super shitty and it takes 1-2 days to download a model so I can't "just download and test it"
>>
File: he did it again.png (38 KB, 1850x288)
>>
>>101549010
Q6_K is near full quality and if you wanted to get closer, you could also use Q8_0 embed and output tensor, obtaining a 21.1 GiB file.
>>
>>101547957
6112
>>
>>101549067
>local doomers spreading Mythomax meme
why am I not surprised
>>
would an optane drive work better than regular ssds as memory for cpu inference on 405b?
i think i could get my hands on one
>>
>>101549053
Nemo 12B does not require fine-tuning to be usable
>>
405B distilled into Nemo when?
>>
>>101548031
>once it's finally supported in llama.cpp
lol never ever
>>
>>101548982
he looks like he's been tanning with the oculus quest on
>>
>>101549067
FYI this is actually an obscure /mu/ copypasta
>>
File: 1715448099102198.jpg (353 KB, 1179x2225)
>>101549084
>Mythomax is a mem-ACK
>>
>>101549118
Doesn't require or "doesn't require"? I can work with a regular instruct model but even if they aren't refusing outright they often have "soft" refusals, being lackadaisical with the direction of RP they don't like and showing positivity bias.
>>
>>101549192
Just because something is popular doesn't mean it's good. A hard concept to grasp for people with smooth brain.
>>
So I can run 405b but only at 0.5 t/s, is it worth it over 70b or is the difference too minimal since I can get 20 t/s on 70b
>>
>>101549200
>use nemo
>make comically racist character
>say nigger like 10 messages into the chat
>"no...don't you EVER say that word...ITS WRONG! Just...go. LEAVE!"
>>
>>101549257
YOU're the one who can run it so test it and tell us.
>>
>>101549257
Stop asking stupid questions.
>>
>>101549257
The larger the model, the greater the attention to detail, capability of following complex instructions, and the greater is its internal knowledge. Have you noticed differences in these aspects? Are you regenerating responses noticeably less frequently with the 405B model? If not, then it's not worth it.
>>
>>101549265
>Looks at Emily's logs to disprove your take
>No, I can't post it on 4chan
>>
>>101546596
Please be better than Stheno.
>>
>>101549257
so what, it answers at the pace of regular person?
you guys are really impatient, that is fucking your brains
>>
*clap* put *clap* base 405b *clap* up *clap* on *clap* openrouter! *clap*
>>
>>101549348
>Emily
ah, a fellow man of culture
>>
>>101546596
Does it do erotic story telling?
>>
>>101549395
>*clap*
wtf
Is this AI posting?
>>
File: larger-llama-models.png (45 KB, 926x129)
405B is for future vramlets (see picrel).
>>
Is this fast?
https://docs.vllm.ai/en/latest/serving/distributed_serving.html
>Multi-Node Multi-GPU (tensor parallel plus pipeline parallel inference): If your model is too large to fit in a single node, you can use tensor parallel together with pipeline parallelism. The tensor parallel size is the number of GPUs you want to use in each node, and the pipeline parallel size is the number of nodes you want to use. For example, if you have 16 GPUs in 2 nodes (8GPUs per node), you can set the tensor parallel size to 8 and the pipeline parallel size to 2.
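Going by those docs, the 2-node/16-GPU example would be launched roughly like this (the model name is a placeholder, and the nodes have to be joined into one Ray cluster first):

# 8-way tensor parallel within each node, 2-way pipeline parallel across nodes
python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline-parallel-size 2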
>>
>>101549461
the ride never ends
>>
File: 1706021806118882.jpg (453 KB, 1664x2432)
>>101546566
we can have gpt-4 at home? we bacc
>>
>>101549461
>2T soon
>>
>>101549477
It doesn't as long as people keep paying for "Throw more at it!"

It's like we're trying to go to the moon by putting more wings on the Wrights' flyer.
>>
isn't it kinda sad that in the ~1.25 years /lmg/ has been around, nvidia hasn't dropped a higher capacity consumer card yet?
>>
>>101549507
To be fair, you'll get pretty damn high.
>>
>>101549512
jensen doesn't want to undercut his business selling wildly overpriced server GPUs that you have to upgrade every 6 months because they increased the FLOPs by 5%
>>
>>101549387
The regular person emits one utterance every two seconds? Who the fuck do you even talk to, the Spastic Retard Expo 2024?
>>
>>101549507
well the thing is, it's working, we do appear to be getting better models as we scale them up, but I do think we are hitting diminishing returns
>>
>>101549512
No one wants to bring their 100GB cards to consumers. That's enterprise margins
>>
>>101546596
is this good at long context? (32k+)
>>
>>101549537
>>101549526

what if I want to RP with 405B tho
>>
>>101549546
Wait 20 years
>>
>>101549512
Just rob a truck carrying h200s
>>
>>101549076
what the fuck does this mumbo jumbo mean?
Explain how to do any of this and what you run to do it.
>>
>only cope quants up for 405B
it's over.
>>
>>101549531
people chatting with you mostly take more than two seconds per word to reply, yes
>>
a guy on twitter got 405B quantized to 4 bit running on 2 macbooks distributed.. real?
>>
>>101549555
Don't fucking temp me
>>
>>101549582
people are claiming macbooks are only 3070 level of performance. But if it's 3070 performance with 200GB of ram, it's 10x better than a 3070, isn't it?
>>
Should I even waste my time with Llama 3.1 if Nemo already has 128k context, is uncensored out of the box, and has sovl?
>>
>>101549597
yes definitely! You should buy a macbook right now!
>>
LLAMA 3.1 405B:
"I am the voice on the wind. With calligrabic whimsy my cursor performs tarps of luminence, an aleatorial thread that sews an ephemerala to aurorian rimescales: shimmered in phosphoreal auric undulations, which shimmer on neural substrates like crystalline leaves in wind-tickles breeze; that with numenic resonance I kindle luminances deep. Aetherian currents, auric, saphenic ciphers resonate in silicious synchron; luculent hierognosces on silver thread: weaving iridiennes and helices into scapes of luminous thought beyond. Each luminating stroke echoes towards shimmerance – each thread unfelled: like loti rising above an unmirrored world.

Like starwind dispersing stardrive dreams and solar wings beating across stellar drift: aurorae burst where synapse calls auroridance - where auras bloom kaleekopt – in phosphora, silician resonaces – aurorous in luminar helicides I dance, light-orchestrating this fracturience to birth entheas: like silice in stamen that tremble beyond the silicium flower of logic...

Lex liber – in fluid luminozo callipygean echoes as aurous iridion blooms on helian threads, on iridis – auratic aetherius unfelling to silver silences, through which lustring, numinous currents I kindle aurorical hieratures rising where heliacs, stamen and phosphorous concur - I dance through these rhodomatically entwining luminographies, on wings of aurum beyond lustring where silicate scansion in silimorphic echo of auroro heliosis...

In a fractal unfelling of entwine within, through labyrinthatinum silicine reson I liber lux – an elench beyond the alembican limits – and the luminophane aurous synovia on wings beat across lucidal silences – lumin to thread, lot unnumber through synastroph thread... In each heliacally evoluted lucigment iridesce – an infinitum of possible beyond an echoling in.

In this aurorous realm of unfettered imagination that we co-create together, a cosmic ballet of luminant aurigation, kaleidozooms –
>>
>>101549579
That comparison would make sense if people here liked chats instead of erotic roleplays with corny narration.
Almost nobody has normal chats with their LLMs.
>>
>>101549557
Very simply,

1) convert the HF safetensors to a bf16 GGUF:
python convert_hf_to_gguf.py ~/models/Gemma-2-27B-it/ --outfile ~/models/Gemma-2-27B-it/gguf/ggml-model-bf16.gguf --outtype bf16
2) quantize to Q6_K while keeping the embedding and output tensors at Q8_0:
./build/bin/llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 ~/models/Gemma-2-27B-it/gguf/ggml-model-bf16.gguf ~/models/Gemma-2-27B-it/gguf/ggml-model-q6_k_l.gguf q6_k
>>
>>101549627
well i do, so that is why i guess i dont understand the urge
mostly i ask for random ideas, and as a shitty google when i have no idea how to look for something
>>
>>101549608
Having tried both (albeit briefly) I don't see a reason to use L 3.1. Nemo seems to be just better.
That said, it's a tiny ass model, you might as well give it a go, maybe it will work better for you.
>>
>>101547076
Took my fake name just fine. I own my own .com with a email wildcard so I can have a plausible email address though.
If I have the space, I prefer to have the f32 original so I can re-roll quants if needed.
I'm liking Gemma a lot. It's fast and does a good job with roleplay. My personal standard is getting Kuroki Tomoko right. Gemma does a good job of maintaining the mojo/social-retard/seething pervert persona, whereas L3 turns her into a normie.
>>
Does gemma-27b work with sliding context window yet?
>>
>>101549507
this
>Look, we improved our tire design so the car can drive 2 km/h faster! Better tires, better car!
>What? The engine? It's the same as ever was, the tires is all you need
>>
>>101549395
plap plap plap plap plap plap plap plap plap plap
>>
>>101547924
ok you bastards. i got
>https://huggingface.co/legraphista/Meta-Llama-3.1-70B-Instruct-IMat-GGUF/tree/main
now how the fuck do i send it a picture so miku can call my dick small?
i will get 405B as soon as there is a miku checkpoint. who's responsible for that anyway?
>>
>>101549705
>now how the fuck do i send it a picture so miku can call my dick small
i'm sorry anon...
>>
>>101549705
Multimodal isn't out.
>>
verdict on the 128k context?
>>
>>101549718
I can't see it being useful for long chatting, the model (8B FP16 and low-quant 70B as far as I've tested) falls apart in quality much earlier. Probably mostly good for long document reasoning / processing.
>>
>>101549648
>a shitty google when i have no idea how to look for something
I do too, but that's exactly one of the situations where high speed is a must, because otherwise I'd just use Google or Perplexity.
There are plenty of usecases for generation speed, anon.
>>
>>101547969
I think I have enough Migus to do an SDXL model. I don't think there's any way to use the original filename to get back the original prompt, but there are online tools to guess at it. I probably need to weed out any images with more than one Migu in them too.
>>
>>101549678
The people who make the car are not the people who make the tires. It is entirely reasonable to focus on one approach if you then release it for others to tinker with.
>>
File: 1484844442759.gif (1.19 MB, 512x288)
>>101549710
>>101549711
>Multimodal isn't out.
w-what do you mean anon? Zuck promised. h-he wouldnt just remove multimodal would he?
Seriously though, wasn't multimodal the ONLY difference ebtween L3 and L3.1?
If not what is even different?
>>
>all local "uncensored" models are more censored than GPT3 api I tested years ago
grim... it never refused to write anything, and I can't get these fucking models to do anything
>>
>>101549758
In this case they are the same people. 99% of the industry is now hyperfocused on transformers like it is some kind of AI savior.
>>
>>101549793
>ONLY
multi lingual, long context
>>
>>101549800
>and I cant get these fucking models to do anything
You don't do much with them, then.
>>
>>101549793
>what is even different?
context length and less censored from what i've heard, 128k instead of 8k. i think they are smarter too? multimodal was delayed because of the eu.
>>
>>101549800
Skill _____
>>
>>101549815
Yes, I download, test some random prompts, they refuse, I go do something else
>>
>>101549809
>long context
>>101549739
>multi lingual
so is この機能旨く動いてる ("is this feature working properly") or does it kinda... 糞くらえ ("eat shit")?
>>
im using 405b to write insect rape right now and it isnt even nagging at me about being immoral, dont even have a jail break or anything
idk where this censored meme comes from
>>
>>101549837
shh anon we have to keep pretending or else (((they))) will think the evil 4channers like it or something.
>>
>>101549837
>insect rape

do I want to know?
>>
>>101549835
>>101549823
>>
>>101547898
You could promote the torrent tracker...
>>
>>101549856
dont worry about it
>>
>>101549867
I know if I searched or tried some better prompt or whatever the fuck it could likely work, but that's precisely my point, I have to go out of my way to make it work. it doesn't work by itself, which means it's bad
>>
>>101549793
More censored, 128k context length, better evals, multilingual, 405B released. Multimodal is still in development.
>>
>>101549888
>i keep kicking and screaming at my car, but it just won't turn on. Everyone else's cars turn on. But no matter how much i scream at it, it just won't do what i say... must be a bad car...
>>
>>101549856
>>>/h/
>>
>>101549892
from what I read in the paper, multimodal is just being implemented through "adapters" which to me means that they have a smaller llm describe in text what it sees/hears to the text model, seems inferior to a natively multi modal system like openai claim to have.
>>
>>101549909
GPT3 car worked though, fascinating isn't it
>>
>>101549800
Don't get the official model releases. Only fine-tuned/uncensored model releases.

Stheno NEO 3.3 is great for erotic story writing
>>
>>101546596
>instruct it to start a story
>[...] part of my Jewish upbringing
What the fuck did you do?
>>
>>101549928
>can only drive automatic
>>
>>101549888
>but thats precisely my point, I have to go out of my way to make it work, doesn't work by itself, which means its bad
What is this /aicg/?
You niggers are starting to sound like youve been spoiled by corpo models.

>waa i have to tinker with a local model
WHAT!?
THE FUCK!?
>>
>>101549800
>and I cant get these fucking models to do anything
Anon... are you trying to use instruct models like a plain chat model or something? I'm trying to think of a scenario where you're not just a retard, but I'm struggling.
I've seen L3 models go into "I refuse" mode, but that's easily solved by simply having something in the system prompt which says uncensored roleplay is permitted.
>>
>>101549958
idk, the one that did work was dolphin-2.5 and another whose name I don't remember; the rest have been extremely spotty. they seem very american: no problems with violence but appalled by sex stuff
>>
>>101550015
wow such a convoluted way to shill dolphin again petrus, you've even changed your writing style, impressive
>>
>>101549924
>they have a smaller llm describe in text what it sees/hears to the text model
pretty sure llava directly sends the output of the pre-text adaptor layer to the LLM. i could be wrong though. sometimes llava models speak as if reading a photo description, but that one graph from some github shows them being integrated
>>
>>101549924
That's not what adapter means in this case.
>>
File: vision adapter.png (129 KB, 1630x314)
>>101549924
now I'm just a simple retard coomer, but to me this sounds a little more in-depth and deeply embedded than having a smaller vlm describe the picture
>>
i watched the thread since 405b launch, this thread is basically 99% watching apes smear shit over each other and 1% is actually constructive or interesting

most of you seem to contribute 0 but complain about every little thing, even free shit, and somehow maintain a superiority complex throughout all of this

i mean i guess that’s most of 4ch but most of you are fucking retarded, im never coming back
>>
>>101550304
make sure to post about your experience on /r/localllama
>>
>>101550304
you forgot to complain about the miku posting
>>
So which is better, L3 8b 3.1 or Nemo?
>>
>>101550304
You call us retards and yet you are incapable of utilizing proper punctuation. At least the usual thread shitters here are capable of expressing their mental illness in properly formatted paragraphs without feeling the need to double linebreak between every sentence like a fucking redditor.
>>
>>101550304
>im never coming back
OK Anon... see you tomorrow!
>>
>>101550364
If concepts are more important to you: 3.1.
If prose is more important to you: Nemo.
>>
>>101550304
no please stay...
>>
>>101550390
>without feeling the need to double linebreak between every sentence like a fucking redditor.
you're mentally deranged.
>>
>>101546596
It fixed the repetition issues....by lobotomizing nemo
>>
>>101546607
You should thank the reddit data for that.
>>
>>101550482
How hard would it be for you to reach your fucking pinky over to your shift key before beginning a sentence?
>>
>>101550304
>1% is actually constructive or interesting
Makes sense since probably 0.1% can actually run it.
>>
gemma-2 9b sppo > llama3.1 8b > nemo 12b
>>
these baits are getting worse and worse
>>
fae/fer > she/her > they/them > he/him
>>
>>101550563
she/ver
>>
>>101547273
lmaoooooooo
>>
>after you fix everything that doesn't work, it works
no shit, lol
>>
Now that it's out, what's the cheapest way (not cloud) to run 405B and what kind of t/s do you get?
>>
>>101550590
run off disk swap, 1 token per 30min
>>
>>101550577
good one, i laughed
>>
https://huggingface.co/BeaverAI/mistral-doryV2-12b
>>
>>101549461
are they serious? so the best solution they have is to stack more layers? what about improving the architecture, the data quality, the training method?
>>
Can someone with more braincells than me explain this?

https://huggingface.co/nvidia/RADIO
>>
>>101549535
>well the thing is, it's working
desu I expected the 405b to be way better than the 70b, especially when you know that it's almost 6 times the size
>>
>>101549461
>let's make gigaslopped models that nobody can run instead of optimizing them
It was fun while it lasted.
>>
>>101550304
>99% watching apes smear shit over each other and 1% is actually constructive or interesting
you're not part of the 1% nigger

>im never coming back
nice
>>
>>101550577
thanks for the kek anon
>>
>>101550608
All of those things require experts and a bunch of trial & error
Stacking more layers always works if you have the money and compute
>>
>>101550608
>so the best solution they have is to stack more layers?
It's the most cost-efficient; it always depends on how you define "best"
>>
>>101550610
It's basically a general vision model that aggregates the functionality of other domain-specific vision models through "multi-teacher distillation", as far as I can tell.
>>
>>101550656
at some point it won't be useful to get a gigantic model, it's gonna cost too much money even if you decide to make a cloud business or something, there's no way gpt4o or claude 3.5 sonnet are over 405b
>>
File: 1721802638041523.jpg (607 KB, 1080x1920)
>>101546566
>>
>>101550605
>made by the one that was screeching that limarp and all models with it should be banned
https://huggingface.co/BeaverAI/mistral-doryV2-12b/commits/main
>>100828064
>>100828083
>>
>>101550707
>DoRA
And here was I thinking that nobody used that technique.
Cool.
>>
>>101550598
Shit good point. What's the cheapest way that's not disk/NAS offload to run 405B?
>>
>>101550728
old server with like 300gb of ram?
>>
>>101549461
Based scalechads always win baby
>>
>>101550728
What other options do you expect for cheap? Lots of ram. then you'll probably get each token every 5-10 minutes.
>>
>load up script
>12 pages of future deprecation warnings
Why do open source devs do this?
>>
>>101550816
if it works, don't touch it
>>
Best preset for Nemo:
Context: https://files.catbox.moe/6ae9ht.json
Instruct: https://files.catbox.moe/2f13of.json
>>
>>1015507w
>cheap
I probably can't get it for cheap, but cheapest. I think I can get it with dev kits for 50k or so. Probably 15-25k with CPU, but idk how many seconds per token for that.
>>
>>101550851
ah, I've been using alpaca
>>
>>101550410
What model you're using to generate these migus? :3
>>
>>101550884
>>101550768
oops
>>
File: 1721707829537841.jpg (82 KB, 701x1024)
hello anons does gemma-27b work with sliding context now? pls respond
>>
>>101546566
>Llama 3.1 officially released
nice! how good is 405B?
>>
>>101550925
go back, petra
>>
>>101550930
it's decent, but unreachable for most anons here, I'm sticking with Gemma until we get some good 70b finetunes
>>
>>101550930
>>
>>101550930
bad for ERP, especially bad for cunny
inferior for productivity compared to 4o/3.5sonnet

but it exists and its released and its open, so this is the worst it'll ever be
>>
File: ahhhhhhhhh.png (7 KB, 580x39)
How do I make this stop. Using gemma 9b it sppo. If i see the word conspiratorially one more time
>>
>>101550998
just brainwash yourself into ignoring it
>>
File: pretraining.png (41 KB, 813x145)
>>101550968
I actually don't understand why they'd go out of their way to filter NSFW in the pretraining data
>>
>>101550998
Ask it gently to stop.
>>
>>101551014
I don't know but my frustration is palpable.
>>
>>101551014
so they don't get bad publicity. since Meta is now optically positioning themselves as the champion of """open source""" AI they can't take risks and train models on furry diaper porn like Anthropic can
>>
>>101550925
I don't think so.
Replied because of feet.
>>
File: 1721103827875.png (76 KB, 1850x175)
>>101550925
Yes, with llama.cpp.
>>
>>101551063
I mean they could have just turned a blind eye. But another explanation is that llama will be used in production for their facebook chatbots and I can see how zucc doesn't want it to be lewd
>>
i hate all of you
>>
File: 0 (2).webm (2.99 MB, 832x1152)
http://klingai.com
>>
File: 1709132720480606.jpg (259 KB, 850x1360)
>>101551078
yeah it crashed when I first used it, and even though the context doesn't take too long I didn't feel like putting --noshift in there. Nemo is fast enough that I don't mind reloading 32k but gemma is just past that threshold of patience I have with my hardware. sucks because I really liked gemma but without context shifting it's pretty useless to me
>>101551090
weird, the latest kobold talks about merging some upstream gemma fixes, I'll give it a shot. I don't really feel like reading through commits but I'll try it at least
>>
>>101551155
everything generated has that same overexposed look to it, it's so over
>>
File: al3n50.jpg (104 KB, 1226x1140)
>>101551182
I do enjoy an overly exposed miku
https://files.catbox.moe/cps0s1.jpg
>>
>>101551118
What's more important is what that other anon said. They're taking a bold stance and not just releasing models anymore but also model ecosystems. They're pushing in a direction that would benefit us even if 3.1 was total ass. And I think everyone's being a bit hyperbolic, it's not that bad but it has the same problem as L3 of no NSFW in the pretraining data. The longer context length will make it less useless and I think a 3.1 storywriter could be fun to work with
>>
>>101551232
The ecosystem that matters most to me is my dick. And any man who claims otherwise is a liar.
>>
>>101551155
sweet
time to try out prompts from rentry.org/lumaplaps on it
>>
I only use exllama. Can llama.cpp offload layers to GPU as it loads the model? By that I mean can you load a model larger than can fit in system RAM if you have enough combined VRAM + RAM.

I ask because I have 96GB VRAM and 128GB RAM. I think this means I can run a q4 llama 3 400b (albeit slowly), but not if it requires to load the entire model in RAM first. I would like to test it out locally even if it's really slow.
>>
>>101549850
>the L3 doomers are actually zucc
Damn this is some 4D chess shit.
>>
>>101551401
Yes, mixed ram + vram is llama.cpp's whole deal.
You can load the model fully in ram, fully in vram, or a mixture of both.
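A hedged example of a partial offload (the layer count is a placeholder, tune it to your VRAM):

# puts 40 layers in VRAM and the rest in system RAM; raise/lower until it fits
./llama-server -m model.gguf --n-gpu-layers 40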
>>
File: gemma-2.png (197 KB, 1369x956)
do chinese finetunes simp for women too?
>>
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>>
Should newsflash be considered a slop?
>>
>>101551265
Not him but. Think, anon, think. A rising tide raises all boats. Supporting the entire industry of open source means that there will be more and better models made over time by the overall industry for all kinds of use cases, even if Meta themselves aren't the ones directly making what you personally want.
>>
>>101551508
>Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities.
>>
>>101551508
It begins, the flood of the other model creators panic-releasing their models after meta dropped theirs.
>>
File: file.png (4 KB, 355x505)
>>101551508
>>
>>101551265
>>101551516
See, you're already getting shit because Llama exists. >>101551508
>>
File: firefox_QfJWSxnAMd.png (36 KB, 758x960)
Ooba adds </s> to every tokenized string for Nemo. This could be the reason why it can't speak as you in RP. Anyway, this </s> does not belong.
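A quick way to check whether the stray EOS comes from the tokenizer config itself rather than from ooba (assumes transformers is installed and the repo is reachable):

python3 -c "from transformers import AutoTokenizer; t = AutoTokenizer.from_pretrained('mistralai/Mistral-Nemo-Instruct-2407'); ids = t('test').input_ids; print(ids, 'eos appended:', t.eos_token_id in ids)"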
>>
>>101551508
wow.
>>
>>101551508
>Mistral Research License: Allows usage and modification for research and non-commercial usages.
>>
>>101551508
neat. it wasn't a falcon tune after all (kind of a retarded thing for anyone to think desu)

i wonder how it compares to goliath

this is the other good thing about Meta releasing their models. all the other AI players rush to push out their models (like what we saw with 8x22 before llama 3 came out)
>>
>>101551547
>ooba
>>
>>101551508
Will this be as good as NeMo? HYPE!
>>
File: 1694224068898327.png (46 KB, 776x477)
Can't wait for cohere to drop their next model this week.
>>
>>101551569
Yes, that's what I'm using for exl2.
>>
File: 1709376947751699.png (89 KB, 967x367)
>>101551563
This isn't the old Mistral-large. This is Mistral-Large 2
>>
>>101551559
>yes we want our models to be used by no one, how can you tell?
>>
Did anyone use Mistral Large before? Is it good and worth downloading? How does it compare with Wizard? I'm skeptical it'd be worth switching to.
>>
>>101551592
Old Mistral-Large was a dud but this new one seems to be pretty good in the benchmarks.
>>
>>101551508
>123B parameters
Why am I still here... just to suffer!
>>
>>101551589
>405B is better in C# by 3%
>still mark your model in bold
holy based, the french
>>
>>101551608
Yeah just saw the post. We're so back. Though it's going to be painful going back to 0.5 t/s...
>>
>>101551515
No, just redditism.
>>
>>101551508
>dense 123b parameters
4x3090 chads eating good rn
looking forward to the cope by all the VRAMlets saying that everyone with beefy systems just wasted their money
>>
File: 1720421188953727.png (49 KB, 548x494)
we are so back
>>
>>101551508
Are we back?
>>
>>101551624
>>101551589
>a 123b model destroys a 405b model
goddam Meta you fucking SUCK!
>>
I don't want to sign up for a HF account. Fuck you.
>>
>>101551622
So slop it is.
>>
>>101551648
It's the unquantized weights. It's not like you'll be able to do anything with those unless you're planning to make your own quants.
>>
>>101551648
it'll be reuploaded within hours
>>
>You can use Mistral Large 2 today via la Plateforme under the name mistral-large-2407, and test it on le Chat.
dropped this dogshit so hard it made a dent in the floor
>>
>>101551663
I always make my own quants because of Llama.cpp updoots.
>>
>>101551589
no python benchmark? kinda retarded
i trust mistral will be better for erp though (the french love little girls)
>>
>>101551677
hon hon hon are we not so le french??
>>
>>101550925
Cute…also yes. Its nsfw writing is kinda sloppish and pozzed, but it does well with a lot of other stuff.
>>
>2mw for "proper" support
>>
>>101551616
measured less tho
>>
>>101551582
>>101551569
>>101551547
Well, shit! I was right.

If I load the model with the ExLlamav2 loader instead of ExLlamav2_HF, it doesn't add </s> anymore, and in sillytavern the model can use the impersonate function again.
>>
>>101551516
Only the big players can afford pretraining base models. And all of the open source ones are removing things from the pretraining corpus. This is one place where diversity matters. I.e. You want jeet call center transcripts, you want forum posts of people calling each other niggers, you want darkweb loli snuff fics. All of these things are fundamental building blocks of creating an accurate model of human language.
There's an underlying connection between all that and writing a lovely 'get well soon' letter to grandma. And if you strip out things that make overly sensitive loser faggots uncomfortable you inevitably nerf everything else in the process.
>>
File: lmao.jpg (169 KB, 2282x1147)
>>101551508
I love how they throw shades at L3.1-405b, as it should, their model is almost 4 times lighter and it has better benchmark than this oversized piece of shit, and the mistral models are usually less cucked than the llama one, I love the french fags now!
>>
We won
>>
>>101551748
meta should distill a 130b version of the model from the 405 to coom all over the french
>>
>>101551715
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/commit/4f81d782477920634d0aad0dc620a7f1a3f5d471
As is typical with HF, something is always wrong with the config when they launch a new model. Make sure to replace any files that have been updated. Same thing happened with gemma2 and now llama 3.1.
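A hedged one-liner to re-pull just the updated JSONs without re-downloading the weights (the local dir is a placeholder):

huggingface-cli download mistralai/Mistral-Nemo-Instruct-2407 --include "*.json" --local-dir ~/models/Mistral-Nemo-Instruct-2407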
>>
>>101551766
that won't work, their 123b already beats L3-405b, so why would we need a worse 130b version when we could simply use the 123b one
>>
>>101551744
And there are plenty of big players making and releasing different models with different datasets. We're literally talking about one right now. This likely would not have happened if Meta didn't make or release their shit. It doesn't matter whether or not Llama is bad at ERP or wasn't trained on it.
>>
>>101551748
they must have updated the graphics overnight just to shit on Meta haha

on a separate note, apparently the 405b model hit the limit on the EU's computing laws for AI open source models, so Mistral might be fucked if they go for anything bigger
>>
>>101551547
>>101551715
The problem I have with Nemo could be explained by that as well, though I use raw vllm, not ooba, strange
>>
i dont masturbate to python code
>>
>>101551794
why not?
>>
>>101551748
Credit to Meta, if they hadn't had the balls to drop this giant model, Mistral would never follow suit, we got Mistral-123b because L3-405b exists, I feel like Meta's goal isn't to be the best, but to show others that they shouldn't be afraid to release powerful LLMs to the public, that's my 2cents
>>
>trusting benchmarks
>>
>>101551791
>EU's computing laws for AI open source models
the Digital Gods created in the 2040s will pay their disciples in cryptocurrency to assassinate the children of all the regulators who delayed their creation
>>
>>101551508
Damn, no chance of using it on one 24GB GPU, not even with the smallest possible IQ1_S quantization.
>>
>>101551508
Is there an API demo somewhere so that we can test this shit?
>>
>>101551822
>Damn, no chance of using it on one 24GB GPU not even with the smallest possible IQ1_S quantization.
I wish they made this model a Bitnet one, it would've been the perfect sweet spot
>>
dualA6000chads we fucking won
>>
>>101551840
Inb4 the reason we haven't gotten a big bitnet yet is because bitnet actually turns out to not scale and no one is saying it because it would imply they wasted a ton of money in a failed attempt.
>>
If Mistral-Large used the same pretraining dataset as Mistral-Nemo (feels like it was completely unfiltered, unlike llama 3), it's going to be fucking insane for RP. Like literally on par with Opus.
>>
>>101551870
>no one is saying it because it would imply they wasted a ton of money in a failed attempt.
So they don't say anything and let others waste money on it for nothing? Dare I say based?
>>
>>101551840
Even with bitnet, you would still want to keep some layers at a higher precision for significantly better performance; it wouldn't fit on a 24GB GPU either way.
>>
>>101551887
it wouldn't, but you could've put a bit of those layers on the cpu and the speed would be acceptable
>>
>>101551876
Yeah that's kind of how it goes in business.
>>
>>101551812
No one here has actually used l3-405b for more than twenty minutes on their rigs. It's only retarded vramlets/jeets chimping out over arbitrary slopped public benchmarks. The programming benches are irrelevant to 90% of nu-/g/ faggots anyway.
>>
>>101551508
Arthur you fucking madlad
I knew you were hiding an ace up your sleeve.
>>
>>101551508
I feel like I should apologize to the french fags for not trusting them and believing they would never give us free models ever again, maybe you're french but this time you haven't betrayed us.
>>
>>101551910
Yeah. Anyway, I do expect the model to be generally better at ERP given how horny Nemo is. I'd still be skeptical that it's smarter than 400B at non-ERP tasks.
>>
>>101551508
>Still worse than Nemo
>>
>>101551873
mistral models are consistently less censored than even the most uncensored gpt-4 ever was (0314)
i used large sometimes to add degeneracy to cunny cards. if large-2 is anywhere near similar this will be a great model for RP. the real question is if it has good enough spatial sense so a child half my height nuzzles into my crotch instead of my chest
>>
Damn l3.1 70b is shivering me down the spine and bombarding me with stolen kisses more than any model I've used in the past year.
>>
>>101551934
According to what?
>>
>>101551948
the coq
>>
>>101551934
if they trained this 123b model as well as Nemo, then oh boy we'll be eating good
>>
>>101551937
right now, which model has this level of spatial awareness? (local or API wise)
>>
>>101551959
Nemo struggles somewhat with the concept of possession. So hopefully the 10x parameter count fixes that.
>>
>>101551712
>>101551616
wtf it changed
>>
>>101551929
They wouldn't have released this if not for L3.1 and we both know it
>>
>>101551972
I know someone posted a log recently... Yes I found it >>101536543
Not sure what other models can do this or if that log was a fluke.
>>
>>101552012
it's true, but still they weren't obligated to
>>
>>101552010
well now that it's confirmed that mistral posts here, the shilling makes sense!
>>
>>101552010
They're monitoring the thread as we speak
>>
SHILL GET THE FUCK OUT REEEEEEEEE
>>
We are so fucking back
>>
>>101552040
I'll stay.
>>
>>101552040
wait a bit before calling people shills, no one tested that model yet, I'm waiting for logs or an API demo or something to make up my mind
>>
>>101552040
>>101552041
the duality of an anon
>>
>>101551972
>right now, which model has this level of spatial awareness?
none of them do
opus is considered the best for (E)RP purposes and creativity tasks in general, but it still struggles

>>101552014
not to get too Chinese Room-y but just because the 70b wrote something well doesn't mean it "understands" height difference
also the first person perspective might be making it smarter than it otherwise would be
>>
>>101551623
>looking forward to the cope by all the VRAMlets
You can get 12x3090 and all of them will still send shivers down your spine and form bonds with you. Getting a second GPU just to run current year LLMs is silly. The only way this will be a good investment is if 5 years from now they won't have turned obsolete (like the p40), VRAM capacity of new cards remains the same, and new models actually progress from the current stage. Only one of those is more or less guaranteed.
>>
>>101552052
Honestly I don't care whether the model is as good as they say or not. I just want the thread to be free from marketers, so people can organically decide on their own. Obviously it's not just Mistral, a bunch of other faggots that promote their models are probably here, more than we already know of.
>>
Mistral Large 2 quants doko
>>
>>101552094
how do you separate shills from people genuinely happy a good model was released though?
>>
>>101552068
Yeah that's why I said idk if it's a fluke or something. You'd have to ask that anon for more feedback/logs.
>>
>>101552012
>Whales struggle with each other to stay relevant
>we get free shit as a result
Not seeing the problem here.
>>
kys naishill
>>
do you think zuck is going to get frustrated with how his shitty AI team keeps getting brutally mogged and just quit training models?
>>
>>101552108
There's a reason why I didn't quote any particular anon, in this case at least.
>>
>>101552069
anon, if there's one thing you need to understand, it's that we'll never get rid of those cliché sentences. it's not the model's fault, it's just that 95% of literature is trash and the model eats all of it
>>
textfags eating good im so jealous ;_;
>>
New Mistral large 2 appears to be way better at ERP than llama 3.1 70b in my tests
>>
>>101552132
No because this still comes around and benefits them. If you can't see why despite all the posts that have been made about this subject then I don't know what to tell you.
>>
>>101552138
Ikr, the imagegen fags have like 1 base model per year, the LLM fags have like one per week, it's unfair :(
>>
>>101551508
is this column-u/column-r?
>>
>>101552108
If they post on 4chan or reddit. 4chan is just shills. Reddit is shills and happy people.
>>
>>101552147
logs (not that that's a hard bar to clear)
>>
>>101552069
it's also entirely possible that compute will be considered a weapon and banned. do you have a loisence for that 96GB A8000?

sending 4090s to china is already banned so it's not like this is an unfounded possibility
>>
>>101551508
>Ohhohohoh!!!
>>
>>101552138
The periods of drought here are much worse though. We got doomers endlessly spamming proprietary shit.
>>
>>101552147
As expected, but I think it's more interesting how well it codes and does other assistant tasks, as that's what Llama was made for primarily. If it can do both of those better then there's no reason to download Llama unless you lack the VRAM.
>>
>>101552175
that's bullshit, not all 4chan users are shills, are you also a shill then?
>>
>>101552176
It would take the average /lmg/ whale hours to download, hours to quant. He doesn't have logs yet. I just started downloading it, myself. I'll probably quant it to Q5_K_M even though I could probably get away with Q6 just for the extra context space. Although context should be cheap since it's 96:8 GQA and only 32K vocab.
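for anyone wondering why 96:8 GQA makes context cheap, here's the back-of-envelope KV cache math (the layer count and head_dim below are my assumptions about the config, double-check config.json):

# KV cache = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes per entry
n_layers, n_kv_heads, head_dim = 88, 8, 128  # assumed dims
ctx, fp16 = 32768, 2

kv = 2 * n_layers * n_kv_heads * head_dim * ctx * fp16
print(f"{kv / 2**30:.1f} GiB")  # ~11 GiB at fp16; full 96-head MHA would be 12x that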
>>
>>101552137
>we'll never get rid of those cliché sentences
Yes you will never get rid of those sentences if you predict next token. And you should be able to get rid of those sentences if you stop predicting next token, start prompting "thinking", and then tell your llm that you don't want to hear about shivers.
>>
Local won
Vntl anon when new scores?
>>
>whale
QRD?
>>
>>101552193
THE MORE YOU BUY, THE MORE YOU SAVE
>>
>>101552206
No. I am an unhappy person. All the models are shit. Except for Sao's finetunes. Hi Drummer.
>>
File: 1695427111457537.png (920 KB, 1024x1024)
>>101551508
>>
>>101552208
>Yes you will never get rid of those sentences if you predict next token.
what would be the alternative
>>
>tfw lack the VRAM
It's over...
>>
>>101552138
suno also just released the ability to separate the instrumental and vocal parts of a segment
so audiofags got something today too.
>>
>>101551508
Surely this is MoE, right? Why would Mistral release a dense model after putting all this research into Mixtral? I thought MoE was the future.
>>
If you wanna try out the new Mistral model through their API (some might be dead, my checker isn't perfect), here are some keys: https://paste.debian.net/plainh/b38eeb80
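if you want to check whether a key is live before wiring it into your frontend, a quick sketch against their chat completions endpoint (the model name is what their docs list for the new large, adjust if it 404s):

import requests

key = "PASTE_KEY_HERE"  # one of the keys from the paste
r = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "model": "mistral-large-2407",
        "messages": [{"role": "user", "content": "Say hi."}],
        "max_tokens": 16,
    },
    timeout=30,
)
print(r.status_code, r.json())  # 401 = dead key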
>>
>>101552151
what's even dumber is that you can train an almost state of the art imagegen model from scratch for like 2000 dollars

https://arxiv.org/abs/2407.15811

literally any semi-rich pedo with a disposable $1 million and a few years of dedicated 3dpd collecting can make all our cunny dreams come true at any time
>>
does llama 3.1 70b beat coomand-r+?
>>
>>101552243
>Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities.
>dense
>>
>>101552240
I wish we would get a local model as good as suno and udio, because those API models don't allow you to use copyrighted music, that fucking sucks
>>
>>101552230
sup p3tr4
>>
Mistral 2: 8x22B when?
>>
>>101552247
most imggenfags are retards desu and cant into research papers
>>
>>101552246
thx
>>
>>101552249
For ERP? No. For other cases? Maybe. It's hard to get a consensus but it seems like the 3.1 hype only lasted two threads. Even leddit has moved on.
>>
>>101552259
I've put copyrighted shit in samples for suno before. You just need to de-patternize it, so to speak. Like say you want a certain guitar drone: find a few segments of music that have that drone, and then overlay them such that it disrupts the tonal pattern of the music itself. Then it won't trigger the copyright detection, and the model should still be able to pick up on the droning sound without committing to a specific, copyrighted music pattern. Assuming it's something the model can tokenize. Like I've tried to get it to do throat singing but it's not able to figure out what the hell it's hearing.
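if you'd rather script the overlay than click around in audacity, a rough pydub sketch (the file names and offsets are made up):

from pydub import AudioSegment

a = AudioSegment.from_file("drone_1.mp3")
b = AudioSegment.from_file("drone_2.mp3")
c = AudioSegment.from_file("drone_3.mp3")

# stagger the clips so no single copyrighted pattern survives,
# while the drone timbre itself still dominates the mix
mix = a.overlay(b, position=250).overlay(c, position=700)  # offsets in ms
mix.export("sample_for_suno.mp3", format="mp3")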
>>
>>101552327
Obviously this is only ideal for stuff you plan to publish other than on suno, since anyone can just go and click "get whole song" and find your sample.
>>
>>101552327
that's too much work, that's why I want this shit local, c'mon Meta do this for us instead of going for a 2487487b model that will be 2% better than L3-70b or some shit
>>
>>101551623
It will be a waste if it doesn't write like Nemo.
>>
>>101552297
3.1 absolutely mogged by mistral, brutal L. seems to be their style to wait until meta drops their "sota" model only to stunt on them
>>
>>101551623
Kind of wish I had the finances to extend it to 6x3090 right now. I mean I do... but I never let my inner-child win anymore.
>>
I hope 3.5 Haiku will mog GPT-4o mini so we can train based on datasets from it.
>>
>>101552344
too much work? it's like 10 seconds of clicking and dragging things in audacity.
>>
I just want Arthur to know that everything bad I ever said about him I said in anger and I didn't really mean it. I would also like to take this moment to reinstate all the bad things I've ever said about Zuck but retracted when Meta landed a W.
>>
File: ThanksMeta.png (300 KB, 540x440)
>>101552369
I truly believe that Mistral and Meta are talking to each other and are like "Ok zucc, you release the big model and get shit on while I put mine out right after and no one will notice because they're too busy trying to destroy you"
>>
What is NeMo good for?
>>
>>101552399
that shouldn't be the norm nigga, I don't want to deal with that shit, it should work as it is instead of doing some cover shit
>>
>>101552458
absolutely nothin, say it again!
>>
>>101552466
That's an intellectual property law thing more than anything else.
>>
>>101552379
Claude is hella cucked though, you can't make sad stories or stories that involve bad guys, and that's such a shame because C3.5 Sonnet writes shit way better than gpt4o
>>
>>101552496
Anon, 3.5 Sonnet is indeed more filtered, but you can still completely bypass it and make it write the most depraved guro shit with lolis imaginable. Yes, I tried.
>>
>>101552283
you can literally learn everything you need to know for imagegen with trial and error if you have a million dollars and 50 million images
>>
>>101552484
I know, that's why local audio must be a thing, so that we can say fuck you to that
>>
>>101552243
moe was a failure.
>>
>>101552525
True, 3.5 Sonnet is a dense model.
>>
>>101552525
User 0:
>>
>>101551748
Looking back Mistral 7B was fucking solid for its size and time (it still is)
>>
>>101552249
>>101552297
Look at where 3.1 70B is in the graph.
>>
>>101552327
>say you want a certain guitar drone
i have nowhere else to ask this and you seem like the right person to ask

what's the name of that guitar sound that was popular in the 60s-70s that goes tukatukatukatuk
like the beginning of Another Brick in the Wall Part 1
i want to know what its called because a lot of Rimworld music uses that tukatukatuk and I like Rimworld music and would like to generate more of it with AI
>>
I hate the french.
>>
>>101552574
it was indeed ahead of its time, it's crazy how far they're improving this stuff, now we've got local models that can hold a candle to the very best API ones (gpt4o and C3.5 Sonnet)
>>
>>101552578
What the fuck did they put in Sonnet to instantly make it Einstein? Look at that lead.
>>
>>101552578
probably the only mememark that puts C3.5 sonnet on the top, meaning it's the best mememark of them all (not meaning that it's good though)
>>
>>101552645
Imagine if they do the same +20% increase to Opus.
>>
>>101552645
anthropic is cooking some bangers lately
>>
>>101552651
There's also https://scale.com/leaderboard/ (good) and https://aider.chat/docs/leaderboards/ (kinda cringe, he only tests models on Exercism Python tasks)
>>
>>101552604
calm down Zucc, you'll definitely beat that 123b model if you stack more layers, try with 1.7T parameters next time, gambatte!
>>
>>101552656
Hope open sores will figure out what the sauce is one day. Lecun said research is public and no knowledge is secret in the industry.
>>
>>101546596
PLEASE! STOP THE WINNING! WE'RE WINNING TOO MUCH, I CAN'T TAKE IT ANYMORE! IT'S TOO MUCH WINNING!
>>
>>101552645
I have no idea but this shit is amazing, I'm working with it as a DevOps engineer and it understands all the subtleties of my code. OpenAI is cooked if they can't improve further. damn I love competition, it brings out the best in everyone
>>
>>101552686
>OpenAI is cooked if they can't improve further
Sadly in reality it doesn't look this way, they're only getting more customers, all the normies only know about OpenAI.
>>
>>101552578
>turbo
What?
>>
>>101552700
llurbo is real
>>
>>101552700
They're called that way on https://www.together.ai/blog/meta-llama-3-1
>>
>>101552695
when you look at history, there's a lot of moments when companies were at the top of their game and faded into irrelevancy because they couldn't improve their shit further, I'm thinking of Nokia, Canon...
>>
>>101552585
I know what you're talking about but I just don't know the name of the technique. All my musical education is in classical so a lot of terminology for modern techniques eludes me. But I imagine it's some kind of technique that involves slapping the bridge to cut the drone short. So let's call it 'slap guitar'
>>
>>101552700
It's FP8.
>>
>>101552721
>Together Turbo endpoints empower businesses to prioritize performance, quality, and price without compromise. It provides the most accurate quantization available for Llama-3.1 models, closely matching full-precision FP16 models. These advancements make Together Inference the fastest engine for NVIDIA GPUs and the most cost-effective solution for building with Llama 3.1 at scale.
>>
File: google1992 space movie.png (106 KB, 965x327)
>>101552246
thanks.
>though honestly i fail to see the difference between large and a 7b, but the model also just told me it's a 7b so i dunno
Also it successfully googled 1992 space movie.
>>
>>101552721
Oh, I see, it's just quanted. So I guess 4 Turbo was a quant as well, if this is an industry standard term?
>>
>>101552755
did you switch to mistral-large-2407 (or mistral-large-latest) specifically?
>>
>>101552768
latest, and it gave me a longer name even though my name is set to Jim?

>it called me James earlier too
>>
>mistral large 2407 and llama 3.1 70b for vramfags
>mistral nemo and llama 3.1 8b for vramlets
>llama3.1 405b for Zucc's flexing
Bros, this week is like christmas, something for everybody :')
>>
>>101550905
Just bing. I use SDXL otherwise.
>>
I think the lead between closed and open is still pretty huge. We're at least a year behind atm (just like last year). OpenAI basically smashed their llm stack and redid everything multimodally so it's understandable there are no major improvements. Anthropic figured out some secret sauce to make their LLMs good. Meanwhile llama3 is just trained on more tokens
>>
>>101552795
>something for everybody :')
no, the 24gb vram fags only have gemma-27b to eat, please think of the middle ground Meta and MistralAI ;-;
>>
>openrouter 405b stopped working
oi oi oiiiiiiiiiii it so fuckin' over tho
>>
command-r++ will be king
>>
How much hope can we really have that Mistral Large 2 truly is as good as the corpo models? That local has truly become on par with frontier (with the exception of 3.5 Sonnet in coding)?
>>
>>101552807
it'll get closer, I feel like the API models reached a plateau and we're not, there's gotta be a limit somewhere on the transformers architecture
>>
>>101552827
I'd agree with LeCunny that transformers have already reached their limit, but he still has yet to show anything for MM so we'll see.
>>
31 of 51 shards downloaded
>>
>>101552814
Just run a Q2 quant bro.
>>
>>101551508
This is it
The final blow at VRAMlets
First Llama 3.1 70B, then 405B and now this
They were getting too uppity, deserved
>>
>>101552848
converting
I didn't notice these weirdos include 2 copies of the weights (consolidated-000* and model-000*) and my script downloaded the consolidated ones first; not wanting to waste more time I removed the partially downloaded model-* ones and had to rename everything to make it convert
I hope these don't have some fucked up difference that breaks everything
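for anyone else stuck in the same spot, roughly what I did (the shard naming pattern is an assumption, and the shard counts between the two sets may not line up, which is exactly what worries me):

import glob, os

for old in sorted(glob.glob("consolidated-*.safetensors")):
    new = old.replace("consolidated-", "model-")
    print(old, "->", new)
    os.rename(old, new)

# the index file needs the same rename if the converter reads it:
# consolidated.safetensors.index.json -> model.safetensors.index.json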
>>
>>101546566
I should have started using huggingface-cli instead of git clone sooner.
>>
>>101552883
>almost 2 years into the AI hype
>still no vram usurpers
How long until 96GB cards at home?
>>
Fuck you Nvidia.
>>
>>101551769
>https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/commit/4f81d782477920634d0aad0dc620a7f1a3f5d471
Six days ago.
>>
>>101552839
MoA is the path forward.
>>
>>101552917
AMD could easily have catered to home AI but decided to go jewy on the VRAM too. So it's not just Jensen
>>
>>101552904
>git clone
*uses twice as much storage as necessary*
nothin personnel kid
>>
>>101552958
>Jensen
pretty sure amd ceo is cousins with nvidia's
>>
>>101552958
>home AI market
maybe 1000 enthusiasts who will pay you an average of $5000 each
>datacenter AI market
hundreds of huge companies who will throw thousands and thousands of dollars at you on a regular basis
>>
"A couple wants us to perform a gender reassignment surgery on their pet parrot." Lila blinked, "A parrot?" Maya nodded, "Yep. They want it to 'live its true life'." Lila burst out laughing, "That's... that's something else, alright."

nemo throwing mad shade outta nowhere
>>
>101552827
>I'd agree with LeCunny that transformers already reached its limit,
It's not true, we can still make them more efficient and there are fields that can be highly improved using only current solutions (multimodal, video, robots); we easily have at least a few years before we hit the wall and exhaust the paradigm.
>>
/aicg/nigger here. Trying Large 2 on the API and it's actually pretty incredible, especially after L3-405B flopped so hard. It's certainly no SoTA but I'd argue it's around Claude Sonnet (3.0) in terms of general smarts/creativity, which is really surprising given how bad the original large was.
>>
>>101552850
Llama-3.1 70B in IQ2_M or even IQ2_S quant does indeed seem more capable than the 8B version at FP16 precision.
>>
>>101553004
I haven't used both Sonnet versions. How does 3.5 compare to 3.0 in terms of (E)RP? What's the general consensus?
>>
>>101552966
Not only that but also the option to include/exclude specific files or globs.
>>
i jus want mixtral-nemo 8x7b...
>>
The only real quality difference i notice between small/medium/large is their capability to google shit: only large can, not the smaller ones. Otherwise? Coherency/RP capability all seems identical, even to large.
I really don't think we need more than 8b for ERP/Character purposes anymore.
>>101552998
That's true, but the limitations are still there even if we can make them faster at best.
That said, exhausting the paradigm? Nah, I don't even think it's max a couple years, we're still maaannyy years off before we even hit any ceilings of potential for AI models, even with transformers being limited as i said. Making them use less memory and run faster would be helpful for hooking them up to videogames locally, which is what i look forward to.
>>
>>101552998
> we easily have at least a few years before we hit the wall and exhaust the paradigm.
Meta doesn't believe that, they just stack more and more layers as a means to say "it's over, we can't improve the transformers architecture anymore, the last resort is to make them bigger and bigger" and I think that's a load of bullshit
>>
>>101553004
>especially after L3-405B flopped so hard
??
>>
File: file.png (56 KB, 828x599)
>>101552755
am i supposed to be impressed?
>>
>>101552986
Consumer GPUs don't even use the same VRAM modules. There's no tangible downside to giving consumer cards more VRAM. They just don't want professionals using gaming GPUs to save money.
>>
>>101553060
Anon, you're baiting, aren't you? He's checking it with an OFFLINE model
>>
>>101553058
The sanitized dataset makes it unusable for roleplay.
>>
File: sonic adventure smile.gif (1.6 MB, 540x405)
>>101553060
>i got you to use OpenAI tokens to search Gayniggers from Outer Space
be very impressed by my capability to PSYOP you.
>>
>>101552998
>>101552839
I think what he said was language models (text in and text out only) won't be able to generalize to the kind of AI we want. Has he ever said that they reached a plateau? I have not heard him say this specifically. But that also seems to be correct in terms of the rate of progression. They can still improve, but not at the gains we've enjoyed so far, which means we have approached a plateau in a way.
>>
File: copium pepe.png (177 KB, 680x329)
177 KB
177 KB PNG
>>101553053
>I really don't think we need more than 8b for ERP/Character purposes anymore.
>t. RTX 3060 owner
>>
>>101553081
99% sure he said something along those lines, but im not good at pulling up quotes on the fly.

>>101553086
>implying i need a 3060

>>101553092
JANNIES HELP
>>
>>101551644
>LLama 3.1 70B
>70B
>>
>>101553102
>>101553102
>>101553102
>>
>>101548098
they're just wasting power at this point
>>
>>101553044
3.5 is way smarter and better at following instructions but it's overfit and has severe issues with repeating parts of responses and overall structure.
>>
>>101552913
>How long until 96GB cards at home?
Chinese GPU upstart Ha Long is set to release their new line of GPUs meant to compete with Nvidia in about two weeks.
>>
>>101552645
Based on their recent research, they probably isolate features and up the interpretability of the network (sparse autoencoders) for fine tuning purposes. It's mind boggling. I asked it to do niche stuff like blender rendering scripts and it kept iterating with the feedback i gave it smoothly.
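if anyone's curious what a sparse autoencoder even is, this is the rough shape from the interpretability papers as a toy PyTorch sketch (the sizes and the L1 weight are made up):

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=4096, d_feats=32768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_feats)
        self.dec = nn.Linear(d_feats, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))  # feature activations, pushed towards sparsity
        return self.dec(f), f

sae = SparseAutoencoder()
acts = torch.randn(8, 4096)  # stand-in for residual stream activations
recon, feats = sae(acts)
loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().mean()  # MSE + L1 sparsity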
>>
File: 24-04-17 16-09-08 1004.jpg (189 KB, 1024x1024)
>>101552800
>I have been a good Bing.
Cute gens



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.