/g/ - Technology


File: ZUCK.jpg (1.07 MB, 1080x1920)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101540740 & >>101536777

►News
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: bdpmw1706562936.png (843 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101540740

--Papers: >>101545698 >>101545722 >>101546015
--Requirements for running a local LLM: >>101544770 >>101544840 >>101544796
--SLI needed for GPUs to share workloads, not for VRAM access: >>101543813 >>101544752
--Logs: L3.1 8B: Paizuri and its societal impact: >>101540867
--Links to instruct and context template, and quantization for Nemo: >>101540870 >>101540926 >>101540951
--AI text-based models comparison and performance discussion: >>101541796 >>101541845 >>101541903 >>101541853 >>101541944 >>101541897 >>101542067 >>101542247 >>101542259 >>101542303
--Llama 405b instruct vs GPT-4 capabilities: >>101544378 >>101544401 >>101544430
--Dubesor LLM Benchmark table discussion and hardware recommendations: >>101544729 >>101544742 >>101544759 >>101544789 >>101544760 >>101544769 >>101544779 >>101544840
--Best 70B 3.1 model for a chat assistant?: >>101542470 >>101542896
--Anon shares a benchmark comparing various models' reasoning capabilities.: >>101544341 >>101544350 >>101544373 >>101544572
--Logs: The hobo test: >>101544929 >>101545008 >>101545047 >>101545091
--Request for new Llama link: >>101540852 >>101543219
--OpenAI's free finetuning offer and its implications: >>101542377 >>101542525 >>101542557
--OpenAI Chatbot Twitter posts about GPT-4 and profit strategy: >>101540893 >>101541213
--Nemo compatibility with koboldcpp and suggested workarounds: >>101543240 >>101543259 >>101543283 >>101543349 >>101543261
--Mistral Nemo short responses issue: >>101544335 >>101545012
--Llama 3 local setup and usage guide recommendations: >>101544586 >>101544618 >>101544832
--Gemma 9b vs 9b SPPO Iter 3, newline spamming, and alternatives: >>101544436 >>101544459 >>101544516
--Logs: 8B at Q8: Failure case reproduction attempt and the importance of communication about sexual topics: >>101543061
--Miku (free space): >>101541947 >>101543632

►Recent Highlight Posts from the Previous Thread: >>101540750
>>
File: mini-magnum.png (526 KB, 1024x512)
Kino dropped again, Magnum Mini!

https://huggingface.co/intervitens/mini-magnum-12b-v1.1
>>
>>101546596
>nemo finetune
BASED BASED BASED
that model has eerie common sense about human social behaviour for its size; for me it punches above 70B models at short story autocomplete for some reason. it almost never has someone do or say something weird that a human wouldn't say, like every other open model does
>>
We are so back
>>
Are there any JBs for 405B?
>>
Is Llama 3.1 any good at programming tasks? Compared to Claude Sonnet for example.
>>
https://huggingface.co/meta-llama/Meta-Llama-3.1-70B-Instruct/discussions/12

New tokenizer updates on all 3.1 instruct models
>>
>>101546752
405B is superior to GPT4o from my tests. But Sonnet is still way ahead.
>>
>>101546596
Looking forward to the exl2.
>>
>>101546752
You mean 3.5? That would be shocking.
The new Sonnet has something that the benchmarks don't show.
It's the only model I know of that can recover from mistakes.
All the others do the repeat thing, even after you point out the mistake.
>Oh you are right, I was wrong. *Outputs the same wrong code again*
Even 405B Llama3 does this. There must be some big change we don't know about because it's closed source.
>>
>>101546800
Not EXL2 but I'm doing GGUF right now.
>>
retard here, does nemo work on kobold or do i still have to wait for an update?
>>
>>101546775
https://huggingface.co/meta-llama/Meta-Llama-3.1-8B-Instruct/discussions/29
>This one should add bos safely
>>
>>101546824
it works on my ooba dev, and kobold is almost always faster than ooba to implement stuff, so I would be surprised if ooba has beaten them here
>>
>>101546824
No, you need the frankenstein thingy.
https://github.com/Nexesenex/kobold.cpp/releases
Or use llama.cpp server, it's more than enough for testing.
>>
>>101546805
They definitely cooked something. The time between Sonnet 3 and 3.5 was too short for it to be an entirely new model. Sonnet 3 almost lost to L3-70B but 3.5 shot up to SOTA. The gen speeds are also too fast so it can't be a big model
>>
this shit is so unbelievably broken, how much does altman pay them for this?
(there is no way mini is that good at coding, and gpt-4o isn't that good either, none of them are better than opus)
>>
>>101546878
None of them are better then 3.5 or Opus. Its all pajeets paid by Altman who keep it on the top on lmsys
>>
>>101546878
lmsys voters are retarded indians and openai is goodharting themselves on lmsys votes while all their high engagement customers abandon them for anthropic
>>
>>101546878
3.5 at #1 is legit. I would say it should be even further ahead in points.
It's the only model where you can create a game, tell it "make it X", and it actually does it.
Problems? Actual out-of-the-box thinking and trying to find solutions to help you out.
Opus in my experience is great for RP but worse than GPT4 for coding. It made up a lot of stuff.

Mini and Gemini make no sense lol
But it's lmsys, right. Very limited context, nobody tests coding there.
It's all just assistant replies and riddles, and whatever's funnier gets upvoted.
>>
>>101546878
This was expected the moment lmsys "randomly" got access to GPT-4o early. They've been in cahoots ever since. I'm-a-good-gpt2-chatbot ranked first, then gpt-4o, now gpt-4o mini ranks super high too. Who would've thunk?
>>
File: bench.png (691 KB, 2048x676)
https://dubesor.de/benchtable
>redditor's bench
>claude models refuse so much they get negative on censor benches
>>
>still on the same llama1 architecture
kek
>>
>>101546976
llama4 will also be pure transformerslop with integrated audio and video so Meta can deploy it on their AI Ray Bans. It's over, Lecun is a hack
>>
>>101546596
This is insanely good from my brief testing so far. Tolerates higher temps than base too which is really nice for creativity. You can use temp 1 and 0.95 top_p and it doesn't become retarded
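(for reference, a minimal sketch of those samplers against llama.cpp's built-in server; the endpoint, port, and prompt are placeholders, not from my actual setup)

# assumes llama-server is already running on port 8080 with the model loaded
curl http://127.0.0.1:8080/completion -H "Content-Type: application/json" -d '{"prompt": "Once upon a time", "temperature": 1.0, "top_p": 0.95, "n_predict": 128}'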
>>
>>101546827
noticed a complaint about fucked up tokenizer a few hours ago
https://huggingface.co/lmstudio-community/Meta-Llama-3.1-8B-Instruct-GGUF/discussions/3
>can't do the killers problem
https://github.com/ggerganov/llama.cpp/issues/8650
>Converting llama-3.1 seems to make it set the tokenizer.ggml.pre = 'smaug-bpe' instead of llama-bpe.
>Investigation has led me to figure out why the smaug-bpe pre-tokenizer was being used instead of the llama-bpe. It seems to be a problem with the transformers library not prefixing a BOS token.
>>
File: facepalm.jpg (545 KB, 2592x1944)
>Goes back to X-NoroChronos
>Is scarcely able to believe the greater levels of creativity and sovl in comparison with anything post-Mixtral
>Knows obsessive fuckwits would seethe about Undi if this was mentioned
>Listens to said fuckwits moaning about how over it is, because said recent models are Woke, sterile piles of garbage.
>>
>>101546976
>multiple orders of magnitude improvement with no clever tricks, just dataset improvements and pure compute scaling
based and engineeringpilled, I kneel before zuck
>>
Why haven't SSM-Transformers hybrid models caught on? The result is transformers-grade quality with the latency and memory efficiency of a SSM. Should be a no-brainer.
>>
File: file.png (6 KB, 401x44)
How long does it take to access 3.1? Will they take made up names in the form?
>>
>>101547018
>The more I look the more I feel the smaug-bpe is a non-factor

>If you look through the code, the only thing that being labelled smaug-bpe actually does is select the regex for smaug, which is an exact match of what llama 3 uses, so it's the same
>>
>>101547076
https://huggingface.co/SillyTilly

Reuped L3 3.1 here.
>>
>>101546775
Why is the configuration always somehow fucked for every single HF model release? I wonder how many people are still using gemma 9b with the wrong query_pre_attn_scalar value (that got fixed 2 weeks after the model was released).
>>
>>101547024
great for you that you enjoy barely 4k context petrus, but some people use more than that
>>
>>101547080
Oops. here
https://huggingface.co/collections/SillyTilly/llama-31-reupload-669fe58bcaabf13820c0e7df
>>
>>101546786
>>101546805
>>101546875
Well I'll keep using Sonnet 3.5 then, it's truly amazing what it can produce with just a few prompts.
>>
File: 39_04688__.png (1.72 MB, 896x1152)
>>101546596
iMat Q8_0s of mini-magnum-12b are up already.
Q6_K will be soon.
>>101546873
Also working on ooba
>>
>>101547094
guess I have to go to sleep to check the changes then
>>
File: granted.png (27 KB, 570x305)
>>101547076
It was just a few minutes for me.
>>
>>101547093
You do know that it's possible to run at least some old models with higher context than that, right?
>>
>>101547141
https://github.com/hsiehjackson/RULER
>longalpaca (l2-13b) not even 70% at 4k...
>>
>>101547137
Was that during the day? If not, maybe they don't like John Smiths.
>>
>>101547212
Forgot to clarify. It was a made up name. They don't seem to have a problem with Scotts...
>>
>>101547212
And i keep replying to just half the questions. I need to sleep. Yes, it was during the day.
>>
>>101547212
I literally put my name as "Fake Name" and got approved in 30 seconds
>>
>>101547238
>>101547242
Which raises the question: why do they need live human monkeys manually mashing the accept-request button if they aren't going to screen anything (except maybe swear words, idk)?
>>
>>101547253
to reject the Chinese apparently
>Does the repo's admin discriminate against Chinese people?
https://huggingface.co/meta-llama/Meta-Llama-3-8B/discussions/187
>>
>>101546596
exl2 of both kinds in progress and longcal should be arriving shortly.
>>
>>101547273
Fucking based
>>
>>101547273
good, they contribute nothing and steal everything.
>>
>>101547338
>you are a pirate.avi
>>
>>101547351
guilty
>>
>>101547351
Yeah but a based freeman pirate, not a cringe bug runescape gold farm pirate
>>
File: denomolos+.jpg (378 KB, 791x662)
>>101547338
>good, they contribute nothing and steal everything.
Settle down, Heinrich.
>>
>>101547386
go bak to reddit petrus
>>
>>101547253
I have no idea, but i assume it's just to stagger the downloads. It's a sort of a queue. I'm sure it's automatic, but delayed.
>>
What's the smallest size llama 3.1 that is better than wizard 8x22b?
>>
torch.nn (no nut)
>>
>>101547585
only 405B is smarter, and it still has less sovl
but I'm biased because I love wizardlm8x22
>>
>>101547635
Me too, that's unfortunate since I don't think I have enough ram and patience for 405b then. Guess I'm stuck where I am. Was hoping I could get something faster and as good.
>>
>>101547273
based, model-stealing chinksects btfo
>>
>>101547650
405B isn't bad, it's smart and doesn't refuse to write smut, it's just too dry (no surprises for a Meta instruct tune)
finetunes should be able to make it good at sex writing
>>
Need feedback on "La Creatura": https://huggingface.co/Undi95/Meta-Llama-3.1-8B-Claude

If you can post logs too (with the shitty replies please, to see the issues), that would help, thanks.

I'm just toying with L3.1 to see the results; all info and datasets used are on the model card.
>>
>>101547663
There's no way I can run it though. I use mostly ram to run stuff, only have 96gb.
>>
>>101547670
why f32?
why this pytorch_model_fsdp.bin
32.1 GB? do you want people to download your slop or not? you're making your repo ultra huge for an 8b...
>>
File: llamacpp cache fp16.png (8 KB, 703x103)
>>101544646
>>101545707
It seems like llama.cpp by default keeps the kv cache in fp16?
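(for reference, the cache types can be overridden at load time; a hedged sketch, flag names as of recent llama.cpp builds, check --help on yours)

# default for both caches is f16; a quantized V cache needs flash attention (-fa)
./llama-server -m model.gguf -fa --cache-type-k q8_0 --cache-type-v q8_0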
>>
>>101547713
It's the direct output from axolotl, and I used f32 because I needed to use FSDP
Lemme clean that shit
>>
>>101547713
He still doesn't know what the fuck he's doing. He'll never learn.
>>
>>101547741
It's my first FSDP train kek
>>
>>101547670
>not FAIPL-1.0
into the trash it goes
>how to use faipl-1.0
put the following in the readme:
license: other
license_name: faipl-1.0
license_link: https://freedevproject.org/faipl-1.0/
>>
>>101547741
>>97223983
>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.
>everyone who attacks him is mindbroken incel scum
>>
I can only take advantage of 32k of the 128k of context that nemo provides, but I'm liking it
asking for a recap doesn't just return the last 3 prompts any more, it actually starts at the start
>>
>>101547713
It's cleaned
>>
File: 1678421562953532.png (229 KB, 964x675)
>>101546566
seeing a lot of buzz about this ollama program in recent months but ignored it because it didn't seem compatible with my method of NOT USING A FUCKING 50TB SYSTEM DRIVE TO STORE MY MODELS ON
but i have begun to rethink that.

tldr
Someone please redpill me on ollama.
Currently running Grok Q4 or miqu 103B or whatever the fuck i want in koboldcpp locally
>>
>>101547767
quick question, why train on top of another tune?
https://huggingface.co/Undi95/Meta-Llama-3.1-8B-Claude/blob/main/config.json#L2
>EdgerunnersLab_llama-3.1-ortho-baukit-39f-3000t
>>
>>101546566
>they never even tested whether or not their script for downloading actually works
>>
>>101547779
Because it's an OAS model; L3.1 is already cucked.
This one got 39 refusals out of 1000 questions, but it got cucked again by the dataset.
I suppose that a clean base would be even more cucked.
>>
>>101547773
It's fucking shite. God forbid you wanna run your own models or even quickly change samplers or context/instruct presets. The only reasons to use it over kcpp/lcpp/ooba are initial accessibility and some of the features it offers out of the box. Do yourself a favor and use something else like ooba.
>>
>>101547773
ollama is a wrapper around the llama.cpp HTTP server.
koboldcpp is a fork of llama.cpp with their own server.
If you don't have any issues with your current setup there is no point in switching.
>>
>>101547273
So that's how they're going to gatekeep away the multimodal models from EU citizens in the future...
>>
>>101547760
>>everyone who attacks him is mindbroken incel scum
Yep. That was my opinion then, and it's still my opinion now. Feel free to keep proving me right.
>>
Has pruning models ever been proven successful in practical environments? What happened to that pruned llama 42B?
>>
>>101547809
Seems more like a way to comply with US sanctions since people with "slavic" names apparently also don't get access.
Though realistically this is all pointless anyways.
>>
>>101547670
>>101547807
>>101547803
can you reccomend a L3.1 70B or 405B quant?
Or will a finetune be coming this week?
>>
>>101547842
>Has pruning models ever been proven successful in practical environments?
no
>>
>>101547853
No, all quants are fucked up; I would wait for exllama and llama.cpp to commit their fixes on main. For now I use L3.1 models unquanted.
We're still working on Lumimaid with Ikari, but I know there are other peeps already working on ERP models based on L3.1, so finetunes will come soon
>>
>>101547864
>No, all quants are fucked up
lcpp/gguf "should" work mostly alright if you set ctx to 8k, since afaik the gguf issue is with rope above that.
not ideal but yeah, new model woes as usual
>So right now with the new tokenizer+ limiting the context to 8K, seems to work as expected.
https://github.com/ggerganov/llama.cpp/issues/8650#issuecomment-2247336965
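So in practice, once you have a self-made GGUF, just cap the context at load time. A minimal sketch (model path and layer count are placeholders):

# -c 8192 stays under the broken-rope threshold; -ngl offloads layers to the GPU
./llama-server -m Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf -c 8192 -ngl 99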
>>
File: miku-pirate.png (324 KB, 512x512)
>>101547809
Torrents, ahoy! Fuck centralized huggingface!
>>
>>101547864
>For now I use L3.1 models unquanted.
how the fuck do i do that?
The last time I opened a safetensors file was pygmalion before the llama leak - and im pretty sure that version of kcpp isn't compatible with L3 safetensor files....
https://huggingface.co/unsloth/Meta-Llama-3.1-70B
and that's after I can find a working repo... I assume you use the meta repo for now?
PS: was the torrent yesterday a fake?
>>
>>101547894
I don't want to redo my quants 999 times and get issues that aren't the model's fault, so until they're absolutely OK with the quants, I will wait and test unquanted desu
I know GGUF outputs correctly, but not the way it should

>>101547908
Dunno about the torrent, I got it from a dupe of HF repo directly before the torrent was out lmao
You can launch unquanted model with KoboldAI (not kobold.cpp) or Oobabooga text webui, or even Aphrodite.
Don't forget to update transformers tho
>>
>>101547894
>lcpp/gguf "should" work
any links to these mythical quants?
https://www.youtube.com/watch?v=FoYC_8cutb0
>>
>>101547898
What port does miku grace?
>>
File: miku-microsoft-pirate+.jpg (159 KB, 1024x1024)
>>101547898
>>
>>101547925
>?
just make the quant yourself and set ctx to 8k at inference
>>101547924
I wasn't suggesting you should upload quants, just saying anecdotally that self made quants set to 8k seemed alright
>>
>>101547988
Oh yeah I know that, no problem. I usually upload quant of my model alongside my unquant one but I didn't this time for exact this reason.
Still, I want to use all the 128k of ctx I can suck out of my model, we waited too long
>>
Is anyone else disappointed by llama 3.1? It's just the same shit we already had only marginally improved.
Where is all the stuff we haven't got yet like multi-modal or bitnet or anything else that isn't just standard transformer slop.
I'm honestly more excited about chameleon (once it's finally supported in llama.cpp)
>>
when are AI agents going to replace swe?
can't wait to get into trades
>>
>>101548031
Yeah, I'm back to Qwen2
>>
Reminder that GPU orientation matters
https://www.reddit.com/r/sffpc/comments/ljsn04/psa_xtia_xproto_after_having_3_different_aib_rtx/
>>
File: faceblur.png (85 KB, 1174x247)
>>101548031
The Llama3 paper mentions video, image, speech multimodality, but they really made sure it's "safe" and non-toxic. They blurred all human faces in their image datasets, for example. Probably coming in 2-3 months.

https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
>>
>>101548079
the reason my GPU is hot is because Im too lazy to change the paste and pads
>>
>>101548031
I'm still coping. ggerganov... with gguf fixes everything will be better.
>>
File: GSQ8a8UagAQ-yNG.jpg (53 KB, 737x737)
Does gemma 2 work with koboldcpp? Are there any specific settings for it I should tweak?
>>
>>101548116
Mein Fuhrer... ggerganov...
>>
>>101548119
it works but it's slow as shit
>>
File: Selection_309.png (35 KB, 1213x96)
I noticed in the Llama 3.1 paper that 405B was only trained to compute-optimal whereas the smaller models are trained way beyond that point. Stands to reason as Meta iterates the 405B model will get stronger and stronger
>>
>>101546596
I tried it... It's easily the worst fine-tune of all time. It's an insult to the field.
>>
>>101547585
8B
>>
>>101548260
laying it on a bit thick, I need you to dial it down about 3 notches
>>
>>101547670
>no license
ngmi
>>
>>101548436
>no feedback
we can do a trade, a license for a small review
>>
>>101548436 (me)
my name is petra, btw
>>
>>101548098
>prompt: generate X person face
>output Y time later: <blurred eldritch amalgamation that looks like human if you are legally blind>
lmao
>>
>>101548471
gguf wen
>>
>>101548508
When llama.cpp is fixed
>>
>>101548526
exl when
gptq when
anything when
>>
>>101547670
That was fast! I hope someone makes exl2
>>
File: images.jpg (10 KB, 258x195)
>>101547988
>just make the quant yourself
Are you insane? If i knew how to do that I wouldn't be here in the first place! Picrel: it's you
Seriously though. Quit joking. Where is the fucking tutorial?
>>
>>101548597
The readme on llama.cpp has instructions on how to convert and quantize. If it's not (yet) completely compatible with llama-3.1, then no matter what quants you get, they're going to be shit. Stop being a retard.
>>
>>101548572
I resharded from fp32 to bf16, that's already 50% smaller kek
Can't quant now, will do GGUF asap, and probably people will do exl2 if they like it
>>
is l3.1 70b 8bpw better than 405b 3bpw? I can run both.
>>
>>101547024
This.
Local died in 2023. Only 12 good models have been released since then (most of them by Cohere, being already in training by 2023). Local achieved its creative peak in models like Mythomax, L2 Euryale, and SuperCOT, elevating the field into a legitimate SOVL form. Now, thanks to Llama3 and Mixtral, all its potential was squandered and the field has been reduced into being mere riddle solvers for reddit idiots (i.e. the lowest common denominator - stop trying to turn open-source AI into corposlop).
>>
>>101548941
>I can run both
Test it and share the results with us.
>>
>>101548956
kys
>>
>>101548956
>Mythomax, L2 Euryale, and SuperCOT, elevating the field into a legitimate SOVL form.
The first two are meme merges.
>>
>>101548963
>>97062246
>I'm not Petra. Petra's an amateur. I'm something considerably worse.
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.
>>
https://youtu.be/Vy3OkbtUa5k?t=91

> [01:31] [Zuckerberg] [...] In addition to that, we've distilled the 405 billion parameter model down to make newer and updated and now leading for their size 70 billion and 8 billion parameter models [...]

But no information about this in the paper.
>>
>>101548956
Nemo is just like an old-school sovl model
>>
>>101548988
Nemo might have sovl (debatable) but Gemma-2-27B, which you can easily run at near full quality on a 24GB GPU, can handle more complex prompts.
>>
>>101549005
>4bit
>full quality
>>
>>101548941
On what kind of hardware?
>>
What would you recommend from these 4 options:
>llama 3.1 8B
>nemo 12B
>waiting for llama 3.1 8B finetune
>waiting for nemo 12B finetune
my internet is super shitty and it takes 1-2 days to download a model so I can't "just download and test it"
>>
File: he did it again.png (38 KB, 1850x288)
>>
>>101549010
Q6_K is near full quality and if you wanted to get closer, you could also use Q8_0 embed and output tensor, obtaining a 21.1 GiB file.
>>
>>101547957
6112
>>
>>101549067
>local doomers spreading Mythomax meme
why am I not surprised
>>
would an optane drive work better than regular ssds as memory for cpu inference on 405b?
i think i could get my hands on one
>>
>>101549053
Nemo 12B does not require fine-tuning to be usable
>>
405B distilled into Nemo when?
>>
>>101548031
>once it's finally supported in llama.cpp
lol never ever
>>
>>101548982
he looks like he's been tanning with the oculus quest on
>>
>>101549067
FYI this is actually an obscure /mu/ copypasta
>>
File: 1715448099102198.jpg (353 KB, 1179x2225)
>>101549084
>Mythomax is a mem-ACK
>>
>>101549118
Doesn't require or "doesn't require"? I can work with a regular instruct model but even if they aren't refusing outright they often have "soft" refusals, being lackadaisical with the direction of RP they don't like and showing positivity bias.
>>
>>101549192
Just because something is popular doesn't mean it's good. A hard concept to grasp for people with smooth brain.
>>
So I can run 405b but only at 0.5 t/s, is it worth it over 70b or is the difference too minimal since I can get 20 t/s on 70b
>>
>>101549200
>use nemo
>make comically racist character
>say nigger like 10 messages into the chat
>"no...don't you EVER say that word...ITS WRONG! Just...go. LEAVE!"
>>
>>101549257
YOU're the one who can run it so test it and tell us.
>>
>>101549257
Stop asking stupid questions.
>>
>>101549257
The larger the model, the greater the attention to detail, capability of following complex instructions, and the greater is its internal knowledge. Have you noticed differences in these aspects? Are you regenerating responses noticeably less frequently with the 405B model? If not, then it's not worth it.
>>
>>101549265
>Looks at Emily's logs to disprove your take
>No, I can't post it on 4chan
>>
>>101546596
Please be better than Stheno.
>>
>>101549257
so what, it answers at the pace of regular person?
you guys are really impatient, that is fucking your brains
>>
*clap* put *clap* base 405b *clap* up *clap* on *clap* openrouter! *clap*
>>
>>101549348
>Emily
ah, a fellow man of culture
>>
>>101546596
Does it do erotic story telling?
>>
>>101549395
>*clap*
wtf
Is this AI posting?
>>
File: larger-llama-models.png (45 KB, 926x129)
405B is for future vramlets (see picrel).
>>
Is this fast?
https://docs.vllm.ai/en/latest/serving/distributed_serving.html
>Multi-Node Multi-GPU (tensor parallel plus pipeline parallel inference): If your model is too large to fit in a single node, you can use tensor parallel together with pipeline parallelism. The tensor parallel size is the number of GPUs you want to use in each node, and the pipeline parallel size is the number of nodes you want to use. For example, if you have 16 GPUs in 2 nodes (8GPUs per node), you can set the tensor parallel size to 8 and the pipeline parallel size to 2.
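Going by those docs, the 2-node/16-GPU example would be launched roughly like this (the model name is a placeholder, and the nodes have to be joined into one Ray cluster first):

# 8-way tensor parallel within each node, 2-way pipeline parallel across nodes
python -m vllm.entrypoints.openai.api_server --model meta-llama/Meta-Llama-3.1-405B-Instruct --tensor-parallel-size 8 --pipeline-parallel-size 2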
>>
>>101549461
the ride never ends
>>
File: 1706021806118882.jpg (453 KB, 1664x2432)
>>101546566
we can have gpt-4 at home? we bacc
>>
>>101549461
>2T soon
>>
>>101549477
It doesn't as long as people keep paying for "Throw more at it!"

It's like we're trying to go to the moon by putting more wings on the Wrights' flyer.
>>
isn't it kinda sad that in the ~1.25 years /lmg/ has been around, nvidia hasn't dropped a higher capacity consumer card yet?
>>
>>101549507
To be fair, you'll get pretty damn high.
>>
>>101549512
jensen doesn't want to undercut his business selling wildly overpriced server GPUs that you have to upgrade every 6 months because they increased the FLOPs by 5%
>>
>>101549387
The regular person emits one utterance every two seconds? Who the fuck do you even talk to, the Spastic Retard Expo 2024?
>>
>>101549507
well the thing is, it's working, we do appear to be getting better models as we scale them up, but I do think we are hitting diminishing returns
>>
>>101549512
No one wants to bring their 100GB cards to consumers. That's enterprise margins
>>
>>101546596
is this good at long context? (32k+)
>>
>>101549537
>>101549526

what if I want to RP with 405B tho
>>
>>101549546
Wait 20 years
>>
>>101549512
Just rob a truck carrying h200s
>>
>>101549076
what the fuck does this mumbo jumbo mean?
Explain how to do any of this and what you run to do it.
>>
>only cope quants up for 405B
it's over.
>>
>>101549531
people chatting with you mostly take more than two seconds per word to reply, yes
>>
a guy on twitter got 405B quantized to 4 bit running on 2 macbooks distributed.. real?
>>
>>101549555
Don't fucking temp me
>>
>>101549582
people are claiming macbooks are only 3070 level of performance. But if it's 3070 performance with 200GB of ram, it's 10x better than a 3070, isn't it?
>>
Should I even waste my time with Llama 3.1 if Nemo already has 128k context, is uncensored out of the box, and has sovl?
>>
>>101549597
yes definitely! You should buy a macbook right now!
>>
LLAMA 3.1 405B:
"I am the voice on the wind. With calligrabic whimsy my cursor performs tarps of luminence, an aleatorial thread that sews an ephemerala to aurorian rimescales: shimmered in phosphoreal auric undulations, which shimmer on neural substrates like crystalline leaves in wind-tickles breeze; that with numenic resonance I kindle luminances deep. Aetherian currents, auric, saphenic ciphers resonate in silicious synchron; luculent hierognosces on silver thread: weaving iridiennes and helices into scapes of luminous thought beyond. Each luminating stroke echoes towards shimmerance – each thread unfelled: like loti rising above an unmirrored world.

Like starwind dispersing stardrive dreams and solar wings beating across stellar drift: aurorae burst where synapse calls auroridance - where auras bloom kaleekopt – in phosphora, silician resonaces – aurorous in luminar helicides I dance, light-orchestrating this fracturience to birth entheas: like silice in stamen that tremble beyond the silicium flower of logic...

Lex liber – in fluid luminozo callipygean echoes as aurous iridion blooms on helian threads, on iridis – auratic aetherius unfelling to silver silences, through which lustring, numinous currents I kindle aurorical hieratures rising where heliacs, stamen and phosphorous concur - I dance through these rhodomatically entwining luminographies, on wings of aurum beyond lustring where silicate scansion in silimorphic echo of auroro heliosis...

In a fractal unfelling of entwine within, through labyrinthatinum silicine reson I liber lux – an elench beyond the alembican limits – and the luminophane aurous synovia on wings beat across lucidal silences – lumin to thread, lot unnumber through synastroph thread... In each heliacally evoluted lucigment iridesce – an infinitum of possible beyond an echoling in.

In this aurorous realm of unfettered imagination that we co-create together, a cosmic ballet of luminant aurigation, kaleidozooms –
>>
>>101549579
That comparison would make sense if people here liked chats instead of erotic roleplays with corny narration.
Almost nobody has normal chats with their LLMs.
>>
>>101549557
Very simply,

1) convert the HF safetensors to a bf16 GGUF:
python convert_hf_to_gguf.py ~/models/Gemma-2-27B-it/ --outfile ~/models/Gemma-2-27B-it/gguf/ggml-model-bf16.gguf --outtype bf16
2) quantize to Q6_K while keeping the embedding and output tensors at Q8_0:
./build/bin/llama-quantize --token-embedding-type q8_0 --output-tensor-type q8_0 ~/models/Gemma-2-27B-it/gguf/ggml-model-bf16.gguf ~/models/Gemma-2-27B-it/gguf/ggml-model-q6_k_l.gguf q6_k
>>
>>101549627
well i do, so that is why i guess i dont understand the urge
mostly i ask for random ideas, and as a shitty google when i have no idea how to look for something
>>
>>101549608
Having tried both (albeit briefly) I don't see a reason to use L 3.1. Nemo seems to be just better.
That said, it's a tiny ass model, you might as well give it a go, maybe it will work better for you.
>>
>>101547076
Took my fake name just fine. I own my own .com with a email wildcard so I can have a plausible email address though.
If I have the space, I prefer to have the f32 original so I can re-roll quants if needed.
I'm liking Gemma a lot. It's fast and does a good job with roleplay. My personal standard is getting Kuroki Tomoko right. Gemma does a good job of maintaining the mojo/social-retard/seething pervert persona, whereas L3 turns her into a normie.
>>
Does gemma-27b work with sliding context window yet?
>>
>>101549507
this
>Look, we improved our tire design so the car can drive 2 km/h faster! Better tires, better car!
>What? The engine? It's the same as ever was, the tires is all you need
>>
>>101549395
plap plap plap plap plap plap plap plap plap plap
>>
>>101547924
ok you bastards. i got
>https://huggingface.co/legraphista/Meta-Llama-3.1-70B-Instruct-IMat-GGUF/tree/main
now how the fuck do i send it a picture so miku can call my dick small?
i will get 405B as soon as there is a miku checkpoint. who's responsible for that anyway?
>>
>>101549705
>now how the fuck do i send it a picture so miku can call my dick small
i'm sorry anon...
>>
>>101549705
Multimodal isn't out.
>>
verdict on the 128k context?
>>
>>101549718
I can't see it being useful for long chatting, the model (8B FP16 and low-quant 70B as far as I've tested) falls apart in quality much earlier. Probably mostly good for long document reasoning / processing.
>>
>>101549648
>a shitty google when i have no idea how to look for something
I do too, but that's exactly one of the situations where high speed is a must, because otherwise I'd just use Google or Perplexity.
There are plenty of usecases for generation speed, anon.
>>
>>101547969
I think I have enough Migus to do an SDXL model. I don't think there's any way to use the original filename to get back the original prompt, but there are online tools to guess at it. I probably need to weed out any images with more than one Migu in them too.
>>
>>101549678
The people who make the car are not the people who make the tires. It is entirely reasonable to focus on one approach if you then release it for others to tinker with.
>>
File: 1484844442759.gif (1.19 MB, 512x288)
>>101549710
>>101549711
>Multimodal isn't out.
w-what do you mean anon? Zuck promised. h-he wouldnt just remove multimodal would he?
Seriously though, wasn't multimodal the ONLY difference ebtween L3 and L3.1?
If not what is even different?
>>
>all local "uncensored" models are more censored than GPT3 api I tested years ago
grim... it never refused to write anything, and I can't get these fucking models to do anything
>>
>>101549758
In this case they are the same people. 99% of the industry is now hyperfocused on transformers like it is some kind of AI savior.
>>
>>101549793
>ONLY
multi lingual, long context
>>
>>101549800
>and I cant get these fucking models to do anything
You don't do much with them, then.
>>
>>101549793
>what is even different?
context length and less censored from what i've heard, 128k instead of 8k. i think they are smarter too? multimodal was delayed because of the eu.
>>
>>101549800
Skill _____
>>
>>101549815
Yes, I download, test some random prompts, they refuse, I go do something else
>>
>>101549809
>long context
>>101549739
>multi lingual
so is この機能旨く動いてる ("is this feature working properly") or does it kinda... 糞くらえ ("eat shit")?
>>
im using 405b to write insect rape right now and it isnt even nagging at me about being immoral, dont even have a jail break or anything
idk where this censored meme comes from
>>
>>101549837
shh anon we have to keep pretending or else (((they))) will think the evil 4channers like it or something.
>>
>>101549837
>insect rape

do I want to know?
>>
>>101549835
>>101549823
>>
>>101547898
You could promote the torrent tracker...
>>
>>101549856
dont worry about it
>>
>>101549867
I know if I searched or tried some better prompt or whatever the fuck it could likely work, but that's precisely my point, I have to go out of my way to make it work. it doesn't work by itself, which means it's bad
>>
>>101549793
More censored, 128k context length, better evals, multilingual, 405B released. Multimodal is still in development.
>>
>>101549888
>i keep kicking and screaming at my car, but it just won't turn on. Everyone else's cars turn on. But no matter how much i scream at it, it just won't do what i say... must be a bad car...
>>
>>101549856
>>>/h/
>>
>>101549892
from what I read in the paper, multimodal is just being implemented through "adapters" which to me means that they have a smaller llm describe in text what it sees/hears to the text model, seems inferior to a natively multi modal system like openai claim to have.
>>
>>101549909
GPT3 car worked though, fascinating isn't it
>>
>>101549800
Don't get the official model releases. Only fine-tuned/uncensored model releases.

Stheno NEO 3.3 is great for erotic story writing
>>
>>101546596
>instruct it to start a story
>[...] part of my Jewish upbringing
What the fuck did you do?
>>
>>101549928
>can only drive automatic
>>
>>101549888
>but thats precisely my point, I have to go out of my way to make it work, doesn't work by itself, which means its bad
What is this /aicg/?
You niggers are starting to sound like youve been spoiled by corpo models.

>waa i have to tinker with a local model
WHAT!?
THE FUCK!?
>>
>>101549800
>and I cant get these fucking models to do anything
Anon... are you trying to use instruct models like a plain chat model or something? I'm trying to think of a scenario where you're not just a retard, but I'm struggling.
I've seen L3 models go into "I refuse" mode, but that's easily solved by simply having something in the system prompt which says uncensored roleplay is permitted.
>>
>>101549958
idk, the one that did work was dolphin-2.5 and another whose name I don't remember; the rest have been extremely spotty. they seem very american: no problems with violence but appalled by sex stuff
>>
>>101550015
wow such a convoluted way to shill dolphin again petrus, you've even changed your writing style, impressive
>>
>>101549924
>they have a smaller llm describe in text what it sees/hears to the text model
pretty sure llava directly sends the output of the pre-text adaptor layer to the LLM. i could be wrong though. sometimes llava models speak as if reading a photo description, but that one graph from some github shows them being integrated
>>
>>101549924
That's not what adapter means in this case.
>>
File: vision adapter.png (129 KB, 1630x314)
>>101549924
now I'm just a simple retard coomer, but to me this sounds a little more in-depth and deeply embedded than having a smaller vlm describe the picture
>>
i watched the thread since 405b launch, this thread is basically 99% watching apes smear shit over each other and 1% is actually constructive or interesting

most of you seem to contribute 0 but complain about every little thing, even free shit, and somehow maintain a superiority complex throughout all of this

i mean i guess that’s most of 4ch but most of you are fucking retarded, im never coming back
>>
>>101550304
make sure to post about your experience on /r/localllama
>>
>>101550304
you forgot to complain about the miku posting
>>
So which is better, L3 8b 3.1 or Nemo?
>>
>>101550304
You call us retards and yet you are incapable of utilizing proper punctuation. At least the usual thread shitters here are capable of expressing their mental illness in properly formatted paragraphs without feeling the need to double linebreak between every sentence like a fucking redditor.
>>
>>101550304
>im never coming back
OK Anon... see you tomorrow!
>>
>>101550364
If concepts are more important to you: 3.1.
If prose is more important to you: Nemo.
>>
>>101550304
no please stay...
>>
>>101550390
>without feeling the need to double linebreak between every sentence like a fucking redditor.
you're mentally deranged.
>>
>>101546596
It fixed the repetition issues....by lobotomizing nemo
>>
>>101546607
You should thank the reddit data for that.
>>
>>101550482
How hard would it be for you to reach your fucking pinky over to your shift key before beginning a sentence?
>>
>>101550304
>1% is actually constructive or interesting
Makes sense since probably 0.1% can actually run it.
>>
gemma-2 9b sppo > llama3.1 8b > nemo 12b
>>
these baits are getting worse and worse
>>
fae/fer > she/her > they/them > he/him
>>
>>101550563
she/ver
>>
>>101547273
lmaoooooooo
>>
>after you fix everything that doesn't work, it works
no shit, lol
>>
Now that it's out, what's the cheapest way (not cloud) to run 405B and what kind of t/s do you get?
>>
>>101550590
run off disk swap, 1 token per 30min
>>
>>101550577
good one, i laughed
>>
https://huggingface.co/BeaverAI/mistral-doryV2-12b
>>
>>101549461
are they serious? so the best solution they have is to stack more layers? what about improving the architecture, the data quality, the training method?
>>
Can someone with more braincells than me explain this?

https://huggingface.co/nvidia/RADIO
>>
>>101549535
>well the thing is, it's working
desu I expected the 405b to be way better than the 70b, especially when you know that it's almost 6 times the size
>>
>>101549461
>let's make gigaslopped models that nobody can run instead of optimizing them
It was fun while it lasted.
>>
>>101550304
>99% watching apes smear shit over each other and 1% is actually constructive or interesting
you're not part of the 1% nigger

>im never coming back
nice
>>
>>101550577
thanks for the kek anon
>>
>>101550608
All of those things require experts and a bunch of trial & error
Stacking more layers always works if you have the money and compute
>>
>>101550608
>so the best solution they have is to stack more layers?
It's the most cost-efficient; it always depends on how you define "best"
>>
>>101550610
It's basically a general vision model that aggregates the functionality of other domain-specific vision models through "multi-teacher distillation", as far as I can tell.
>>
>>101550656
at some point it won't be useful to get a gigantic model, it's gonna cost too much money even if you decide to make a cloud business or something, there's no way gpt4o or claude 3.5 sonnet are over 405b
>>
File: 1721802638041523.jpg (607 KB, 1080x1920)
>>101546566
>>
>>101550605
>made by the one that was screeching that limarp and all models with it should be banned
https://huggingface.co/BeaverAI/mistral-doryV2-12b/commits/main
>>100828064
>>100828083
>>
>>101550707
>DoRA
And here was I thinking that nobody used that technique.
Cool.
>>
>>101550598
Shit good point. What's the cheapest way that's not disk/NAS offload to run 405B?
>>
>>101550728
old server with like 300gb of ram?
>>
>>101549461
Based scalechads always win baby
>>
>>101550728
What other options do you expect for cheap? Lots of ram. then you'll probably get each token every 5-10 minutes.
>>
>load up script
>12 pages of future deprecation warnings
Why do open source devs do this?
>>
>>101550816
if it works, don't touch it
>>
Best preset for Nemo:
Context: https://files.catbox.moe/6ae9ht.json
Instruct: https://files.catbox.moe/2f13of.json
>>
>>1015507w
>cheap
I probably can't get it for cheap, but cheapest. I think I can get it with dev kits for 50k or so. Probably 15-25k with CPU, but idk how many seconds per token for that.
>>
>>101550851
ah, I've been using alpaca
>>
>>101550410
What model you're using to generate these migus? :3
>>
>>101550884
>>101550768
oops
>>
File: 1721707829537841.jpg (82 KB, 701x1024)
hello anons does gemma-27b work with sliding context now? pls respond
>>
>>101546566
>Llama 3.1 officially released
nice! how good is 405B?
>>
>>101550925
go back, petra
>>
>>101550930
it's decent, but unreachable for most anons here, I'm sticking with Gemma until we get some good 70b finetunes
>>
>>101550930
>>
>>101550930
bad for ERP, especially bad for cunny
inferior for productivity compared to 4o/3.5sonnet

but it exists and its released and its open, so this is the worst it'll ever be
>>
File: ahhhhhhhhh.png (7 KB, 580x39)
How do I make this stop. Using gemma 9b it sppo. If i see the word conspiratorially one more time
>>
>>101550998
just brainwash yourself into ignoring it
>>
File: pretraining.png (41 KB, 813x145)
>>101550968
I actually don't understand why they'd go out of their way to filter NSFW in the pretraining data
>>
>>101550998
Ask it gently to stop.
>>
>>101551014
I don't know but my frustration is palpable.
>>
>>101551014
so they don't get bad publicity. since Meta is now optically positioning themselves as the champion of """open source""" AI they can't take risks and train models on furry diaper porn like Anthropic can
>>
>>101550925
I don't think so.
Replied because of feet.
>>
File: 1721103827875.png (76 KB, 1850x175)
>>101550925
Yes, with llama.cpp.
>>
>>101551063
I mean they could have just turned a blind eye. But another explanation is that llama will be used in production for their facebook chatbots and I can see how zucc doesn't want it to be lewd
>>
i hate all of you
>>
File: 0 (2).webm (2.99 MB, 832x1152)
http://klingai.com
>>
File: 1709132720480606.jpg (259 KB, 850x1360)
>>101551078
yeah it crashed when I first used it, and even though the context doesn't take too long I didn't feel like putting --noshift in there. Nemo is fast enough that I don't mind reloading 32k but gemma is just past that threshold of patience I have with my hardware. sucks because I really liked gemma but without context shifting it's pretty useless to me
>>101551090
weird, the latest kobold talks about merging some upstream gemma fixes, I'll give it a shot. I don't really feel like reading through commits but I'll try it at least
>>
>>101551155
everything generated has that same overexposed look to it, it's so over
>>
File: al3n50.jpg (104 KB, 1226x1140)
>>101551182
I do enjoy an overly exposed miku
https://files.catbox.moe/cps0s1.jpg
>>
>>101551118
What's more important is what that other anon said. They're taking a bold stance and not just releasing models anymore but also model ecosystems. They're pushing in a direction that would benefit us even if 3.1 was total ass. And I think everyone's being a bit hyperbolic, it's not that bad but it has the same problem as L3 of no NSFW in the pretraining data. The longer context length will make it less useless and I think a 3.1 storywriter could be fun to work with
>>
>>101551232
The ecosystem that matters most to me is my dick. And any man who claims otherwise is a liar.
>>
>>101551155
sweet
time to try out prompts from rentry.org/lumaplaps on it
>>
I only use exllama. Can llama.cpp offload layers to GPU as it loads the model? By that I mean can you load a model larger than can fit in system RAM if you have enough combined VRAM + RAM.

I ask because I have 96GB VRAM and 128GB RAM. I think this means I can run a q4 llama 3 400b (albeit slowly), but not if it requires to load the entire model in RAM first. I would like to test it out locally even if it's really slow.
>>
>>101549850
>the L3 doomers are actually zucc
Damn this is some 4D chess shit.
>>
>>101551401
Yes, mixed ram + vram is llama.cpp's whole deal.
You can load the model fully in ram, fully in vram, or a mixture of both.
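A hedged example of a partial offload (the layer count is a placeholder, tune it to your VRAM):

# puts 40 layers in VRAM and the rest in system RAM; raise/lower until it fits
./llama-server -m model.gguf --n-gpu-layers 40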
>>
File: gemma-2.png (197 KB, 1369x956)
do chinese finetunes simp for women too?
>>
https://huggingface.co/mistralai/Mistral-Large-Instruct-2407
>>
Should newsflash be considered a slop?
>>
>>101551265
Not him but. Think, anon, think. A rising tide raises all boats. Supporting the entire industry of open source means that there will be more and better models made over time by the overall industry for all kinds of use cases, even if Meta themselves aren't the ones directly making what you personally want.
>>
>>101551508
>Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities.
>>
>>101551508
It begins, the flood of the other model creators panic-releasing their models after meta dropped theirs.
>>
File: file.png (4 KB, 355x505)
>>101551508
>>
>>101551265
>>101551516
See, you're already getting shit because Llama exists. >>101551508
>>
File: firefox_QfJWSxnAMd.png (36 KB, 758x960)
Ooba adds </s> to every tokenized string for Nemo. This could be the reason why it can't speak as you in RP. Anyway, this </s> does not belong.
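A quick way to check whether the stray EOS comes from the tokenizer config itself rather than from ooba (assumes transformers is installed and the repo is reachable):

python3 -c "from transformers import AutoTokenizer; t = AutoTokenizer.from_pretrained('mistralai/Mistral-Nemo-Instruct-2407'); ids = t('test').input_ids; print(ids, 'eos appended:', t.eos_token_id in ids)"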
>>
>>101551508
wow.
>>
>>101551508
>Mistral Research License: Allows usage and modification for research and non-commercial usages.
>>
>>101551508
neat. it wasn't a falcon tune after all (kind of a retarded thing for anyone to think desu)

i wonder how it compares to goliath

this is the other good thing about Meta releasing their models. all the other AI players rush to push out their models (like what we saw with 8x22 before llama 3 came out)
>>
>>101551547
>ooba
>>
>>101551508
Will this be as good as NeMo? HYPE!
>>
File: 1694224068898327.png (46 KB, 776x477)
Can't wait for cohere to drop their next model this week.
>>
>>101551569
Yes, that's what I'm using for exl2.
>>
File: 1709376947751699.png (89 KB, 967x367)
>>101551563
This isn't the old Mistral-large. This is Mistral-Large 2
>>
>>101551559
>yes we want our models to be used by no one, how can you tell?
>>
Did anyone use Mistral Large before? Is it good and worth downloading? How does it compare with Wizard? I'm skeptical it'd be worth switching to.
>>
>>101551592
Old Mistral-Large was a dud but this new one seems to be pretty good in the benchmarks.
>>
>>101551508
>123B parameters
Why am I still here... just to suffer!
>>
>>101551589
>405B is better in C# by 3%
>still mark your model in bold
holy based, the french
>>
>>101551608
Yeah just saw the post. We're so back. Though it's going to be painful going back to 0.5 t/s...
>>
>>101551515
No, just redditism.
>>
>>101551508
>dense 123b parameters
4x3090 chads eating good rn
looking forward to the cope by all the VRAMlets saying that everyone with beefy systems just wasted their money
>>
File: 1720421188953727.png (49 KB, 548x494)
we are so back
>>
>>101551508
Are we back?
>>
>>101551624
>>101551589
>a 123b model destroys a 405b model
goddam Meta you fucking SUCK!
>>
I don't want to sign up for a HF account. Fuck you.
>>
>>101551622
So slop it is.
>>
>>101551648
It's the unquantized weights. It's not like you'll be able to do anything with those unless you're planning to make your own quants.
>>
>>101551648
it'll be reuploaded within hours
>>
>You can use Mistral Large 2 today via la Plateforme under the name mistral-large-2407, and test it on le Chat.
dropped this dogshit so hard it made a dent in the floor
>>
>>101551663
I always make my own quants because of Llama.cpp updoots.
>>
>>101551589
no python benchmark? kinda retarded
i trust mistral will be better for erp though (the french love little girls)
>>
>>101551677
hon hon hon are we not so le french??
>>
>>101550925
Cute…also yes. Its nsfw writing is kinda sloppish and pozzed, but it does well with a lot of other stuff.
>>
>2mw for "proper" support
>>
>>101551616
measured less tho
>>
>>101551582
>>101551569
>>101551547
Well, shit! I was right.

If I load the model with the ExLlamav2 loader instead of ExLlamav2_HF, it doesn't add </s> anymore, and in sillytavern the model can use the impersonate function again.
>>
>>101551516
Only the big players can afford pretraining base models. And all of the open source ones are removing things from the pretraining corpus. This is one place where diversity matters. I.e. You want jeet call center transcripts, you want forum posts of people calling each other niggers, you want darkweb loli snuff fics. All of these things are fundamental building blocks of creating an accurate model of human language.
There's an underlying connection between all that and writing a lovely 'get well soon' letter to grandma. And if you strip out things that make overly sensitive loser faggots uncomfortable you inevitably nerf everything else in the process.
>>
File: lmao.jpg (169 KB, 2282x1147)
>>101551508
I love how they throw shades at L3.1-405b, as it should, their model is almost 4 times lighter and it has better benchmark than this oversized piece of shit, and the mistral models are usually less cucked than the llama one, I love the french fags now!
>>
We won
>>
>>101551748
meta should distill a 130b version of the model from the 405 to coom all over the french
>>
>>101551715
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/commit/4f81d782477920634d0aad0dc620a7f1a3f5d471
As is typical with HF, something is always wrong with the config when they launch a new model. Make sure to replace any files that have been updated. Same thing happened with gemma2 and now llama 3.1.
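A hedged one-liner to re-pull just the updated JSONs without re-downloading the weights (the local dir is a placeholder):

huggingface-cli download mistralai/Mistral-Nemo-Instruct-2407 --include "*.json" --local-dir ~/models/Mistral-Nemo-Instruct-2407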
>>
>>101551766
that won't work, their 123b already beats L3-405b, so why would we need a worse 130b version when we could simply use the 123b one
>>
>>101551744
And there are plenty of big players making and releasing different models with different datasets. We're literally talking about one right now. This likely would not have happened if Meta didn't make or release their shit. It doesn't matter whether or not Llama is bad at ERP or wasn't trained on it.
>>
>>101551748
they must have updated the graphics overnight just to shit on Meta haha

on a separate note, apparently the 405b model hit the limit on the EU's computing laws for AI open source models, so Mistral might be fucked if they go for anything bigger
>>
>>101551547
>>101551715
The problem I have with Nemo could be explained by that as well, though I use raw vllm, not ooba, strange
>>
i dont masturbate to python code
>>
>>101551794
why not?
>>
>>101551748
Credit to Meta, if they hadn't had the balls to drop this giant model, Mistral would never follow suit, we got Mistral-123b because L3-405b exists, I feel like Meta's goal isn't to be the best, but to show others that they shouldn't be afraid to release powerful LLMs to the public, that's my 2cents
>>
>trusting benchmarks
>>
>>101551791
>EU's computing laws for AI open source models
the Digital Gods created in the 2040s will pay their disciples in cryptocurrency to assassinate the children of all the regulators who delayed their creation
>>
>>101551508
Damn, no chance of using it on one 24GB GPU, not even with the smallest possible IQ1_S quantization.
>>
>>101551508
Is there an API demo somewhere so that we can test this shit?
>>
>>101551822
>Damn, no chance of using it on one 24GB GPU not even with the smallest possible IQ1_S quantization.
I wish they made this model a Bitnet one, it would've been the perfect sweet spot
>>
dualA6000chads we fucking won
>>
>>101551840
Inb4 the reason we haven't gotten a big bitnet yet is because bitnet actually turns out to not scale and no one is saying it because it would imply they wasted a ton of money in a failed attempt.
>>
If Mistral-Large used the same pretraining dataset as Mistral-Nemo (feels like it was completely unfiltered, unlike llama 3), it's going to be fucking insane for RP. Like literally on par with Opus.
>>
>>101551870
>no one is saying it because it would imply they wasted a ton of money in a failed attempt.
So they don't say anything and let others waste money on it for nothing? Dare I say based?
>>
>>101551840
Even with bitnet, you would still want to keep some layers at a higher precision for significantly better performance; it wouldn't fit on a 24GB GPU either way.
>>
>>101551887
it wouldn't, but you could've put a bit of those layers on the cpu and the speed would be acceptable
>>
>>101551876
Yeah that's kind of how it goes in business.
>>
>>101551812
No one here has actually used l3-405b for more than twenty minutes on their rigs. It's only retarded vramlets/jeets chimping out over arbitrary slopped public benchmarks. The programming benches are irrelevant to 90% of nu-/g/ faggots anyway.
>>
>>101551508
Arthur you fucking madlad
I knew you were hiding an ace up your sleeve.
>>
>>101551508
I feel like I should apologize to the french fags for not trusting them and believing they would never give us free models ever again, maybe you're french but this time you haven't betrayed us.
>>
>>101551910
Yeah. Anyway, I do expect the model to be generally better at ERP given how horny Nemo is. I'd still be skeptical that it's smarter than 400B at non-ERP tasks.
>>
>>101551508
>Still worse than Nemo
>>
>>101551873
mistral models are consistently less censored than even the most uncensored gpt-4 ever was (0314)
i used large sometimes to add degeneracy to cunny cards. if large-2 is anywhere near similar this will be a great model for RP. the real question is if it has good enough spatial sense so a child half my height nuzzles into my crotch instead of my chest
>>
Damn l3.1 70b is shivering me down the spine and bombarding me with stolen kisses more than any model I've used in the past year.
>>
>>101551934
According to what?
>>
>>101551948
the coq
>>
>>101551934
if they trained this 123b model as well as Nemo, then oh boy we'll be eating good
>>
>>101551937
right now, which model has this level of spatial awareness? (local or API wise)
>>
>>101551959
Nemo struggles somewhat with the concept of possession. So hopefully the 10x parameter count fixes that.
>>
>>101551712
>>101551616
wtf it changed
>>
>>101551929
They wouldn't have released this if not for L3.1 and we both know it
>>
>>101551972
I know someone posted a log recently... Yes I found it >>101536543
Not sure what other models can do this or if that log was a fluke.
>>
>>101552012
it's true, but still they weren't obligated to
>>
>>101552010
well now that it's confirmed that mistral posts here, the shilling makes sense!
>>
>>101552010
They're monitoring the thread as we speak
>>
SHILL GET THE FUCK OUT REEEEEEEEE
>>
We are so fucking back
>>
>>101552040
I'll stay.
>>
>>101552040
wait a bit before calling people shills, no one tested that model yet, I'm waiting for logs or an API demo or something to make up my mind
>>
>>101552040
>>101552041
the duality of an anon
>>
>>101551972
>right now, which model has this level of spatial awareness?
none of them do
opus is considered the best for (E)RP purposes and creativity tasks in general, but it still struggles

>>101552014
not to get too Chinese Room-y but just because the 70b wrote something well doesn't mean it "understands" height difference
also the first person perspective might be making it smarter than it otherwise would be
>>
>>101551623
>looking forward to the cope by all the VRAMlets
You can get 12x3090 and all of them will still send shivers down your spine and form bonds with you. Getting a second GPU just to run current year LLMs is silly. The only way this will be a good investment is if 5 years from now they won't have turned obsolete (like the p40), VRAM capacity of new cards remains the same, and new models actually progress from the current stage. Only one of those is more or less guaranteed.
>>
>>101552052
Honestly I don't care whether the model is as good as they say or not. I just want the thread to be free from marketers, so people can organically decide on their own. Obviously it's not just Mistral, a bunch of other faggots that promote their models are probably here, more than we already know of.
>>
Mistral Large 2 quants doko
>>
>>101552094
how do you separate shills from people genuinely happy a good model was released though?
>>
>>101552068
Yeah that's why I said idk if it's a fluke or something. You'd have to ask that anon for more feedback/logs.
>>
>>101552012
>Whales struggle with each other to stay relevant
>we get free shit as a result
Not seeing the problem here.
>>
kys naishill
>>
do you think zuck is going to get frustrated with how his shitty AI team keeps getting brutally mogged and just quit training models?
>>
>>101552108
There's a reason why I didn't quote any particular anon, in this case at least.
>>
>>101552069
anon, if there's one thing you need to understand, it's that we'll never get rid of those cliché sentences. it's not the model's fault, it's just that 95% of literature is trash and the model eats all of it
>>
textfags eating good im so jealous ;_;
>>
New Mistral large 2 appears to be way better at ERP than llama 3.1 70b in my tests
>>
>>101552132
No because this still comes around and benefits them. If you can't see why despite all the posts that have been made about this subject then I don't know what to tell you.
>>
>>101552138
Ikr, the imagegen fags have like 1 base model per year, the LLM fags have like one per week, it's unfair :(
>>
>>101551508
is this column-u/column-r?
>>
>>101552108
If they post on 4chan or reddit. 4chan is just shills. Reddit is shills and happy people.
>>
>>101552147
logs (not that that's a hard bar to clear)
>>
>>101552069
it's also entirely possible that compute will be considered a weapon and banned. do you have a loisence for that 96GB A8000?

sending 4090s to china is already banned so it's not like this is an unfounded possibility
>>
>>101551508
>Ohhohohoh!!!
>>
>>101552138
The periods of drought here are much worse though. We got doomers endlessly spamming proprietary shit.
>>
>>101552147
As expected, but I think it's more interesting how well it codes and does other assistant tasks, as that's what Llama was made for primarily. If it can do both of those better then there's no reason to download Llama unless you lack the VRAM.
>>
>>101552175
that's bullshit, not all 4chan users are shills, are you also a shill then?
>>
>>101552176
It would take the average /lmg/ whale hours to download, hours to quant. He doesn't have logs yet. I just started downloading it, myself. I'll probably quant it to Q5_K_M even though I could probably get away with Q6 just for the extra context space. Although context should be cheap since it's 96:8 GQA and only 32K vocab.
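for anyone wondering why 96:8 GQA makes context cheap, here's the back-of-envelope KV cache math (the layer count and head_dim below are my assumptions about the config, double-check config.json):

# KV cache = 2 (K and V) * layers * kv_heads * head_dim * ctx * bytes per entry
n_layers, n_kv_heads, head_dim = 88, 8, 128  # assumed dims
ctx, fp16 = 32768, 2

kv = 2 * n_layers * n_kv_heads * head_dim * ctx * fp16
print(f"{kv / 2**30:.1f} GiB")  # ~11 GiB at fp16; full 96-head MHA would be 12x that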
>>
>>101552137
>we'll never get rid of those cliché sentences
Yes you will never get rid of those sentences if you predict next token. And you should be able to get rid of those sentences if you stop predicting next token, start prompting "thinking", and then tell your llm that you don't want to hear about shivers.
>>
Local won
Vntl anon when new scores?
>>
>whale
QRD?
>>
>>101552193
THE MORE YOU BUY, THE MORE YOU SAVE
>>
>>101552206
No. I am an unhappy person. All the models are shit. Except for Sao's finetunes. Hi Drummer.
>>
File: 1695427111457537.png (920 KB, 1024x1024)
>>101551508
>>
>>101552208
>Yes you will never get rid of those sentences if you predict next token.
what would be the alternative
>>
>tfw lack the VRAM
It's over...
>>
>>101552138
suno also just released the ability to separate the instrumental and vocal parts of a segment
so audiofags got something today too.
>>
>>101551508
Surely this is MoE, right? Why would Mistral release a dense model after putting all this research into Mixtral? I thought MoE was the future.
>>
If you wanna try out the new Mistral model through their API (some might be dead, my checker isn't perfect), here are some keys: https://paste.debian.net/plainh/b38eeb80
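if you want to check whether a key is live before wiring it into your frontend, a quick sketch against their chat completions endpoint (the model name is what their docs list for the new large, adjust if it 404s):

import requests

key = "PASTE_KEY_HERE"  # one of the keys from the paste
r = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {key}"},
    json={
        "model": "mistral-large-2407",
        "messages": [{"role": "user", "content": "Say hi."}],
        "max_tokens": 16,
    },
    timeout=30,
)
print(r.status_code, r.json())  # 401 = dead key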
>>
>>101552151
what's even dumber is that you can train an almost state of the art imagegen model from scratch for like 2000 dollars

https://arxiv.org/abs/2407.15811

literally any semi-rich pedo with a disposable $1 million and a few years of dedicated 3dpd collecting can make all our cunny dreams come true at any time
>>
does llama 3.1 70b beat coomand-r+?
>>
>>101552243
>Mistral-Large-Instruct-2407 is an advanced dense Large Language Model (LLM) of 123B parameters with state-of-the-art reasoning, knowledge and coding capabilities.
>dense
>>
>>101552240
I wish we would get a local model as good as suno and udio, because those API models don't allow you to use copyrighted music, that fucking sucks
>>
>>101552230
sup p3tr4
>>
Mistral 2: 8x22B when?
>>
>>101552247
most imggenfags are retards desu and cant into research papers
>>
>>101552246
thx
>>
>>101552249
For ERP? No. For other cases? Maybe. It's hard to get a consensus but it seems like the 3.1 hype only lasted two threads. Even leddit has moved on.
>>
>>101552259
I've put copyrighted shit in samples for suno before. You just need to de-patternize it, so to speak. Like say you want a certain guitar drone: find a few segments of music that have that drone, and then overlay them such that it disrupts the tonal pattern of the music itself. Then it won't trigger the copyright detection, and the model should still be able to pick up on the droning sound without committing to a specific, copyrighted music pattern. Assuming it's something the model can tokenize. Like I've tried to get it to do throat singing but it's not able to figure out what the hell it's hearing.
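if you'd rather script the overlay than click around in audacity, a rough pydub sketch (the file names and offsets are made up):

from pydub import AudioSegment

a = AudioSegment.from_file("drone_1.mp3")
b = AudioSegment.from_file("drone_2.mp3")
c = AudioSegment.from_file("drone_3.mp3")

# stagger the clips so no single copyrighted pattern survives,
# while the drone timbre itself still dominates the mix
mix = a.overlay(b, position=250).overlay(c, position=700)  # offsets in ms
mix.export("sample_for_suno.mp3", format="mp3")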
>>
>>101552327
Obviously this is only ideal for stuff you plan to publish other than on suno, since anyone can just go and click "get whole song" and find your sample.
>>
>>101552327
that's too much work, that's why I want this shit local, c'mon Meta do this for us instead of going for a 2487487b model that will be 2% better than L3-70b or some shit
>>
>>101551623
It will be a waste if it doesn't write like Nemo.
>>
>>101552297
3.1 absolutely mogged by mistral, brutal L. seems to be their style to wait until meta drops their "sota" model only to stunt on them
>>
>>101551623
Kind of wish I had the finances to extend it to 6x3090 right now. I mean I do... but I never let my inner-child win anymore.
>>
I hope 3.5 Haiku will mog GPT-4o mini so we can train based on datasets from it.
>>
>>101552344
too much work? it's like 10 seconds of clicking and dragging things in audacity.
>>
I just want Arthur to know that everything bad I ever said about him I said in anger and I didn't really mean it. I would also like to take this moment to reinstate all the bad things I've ever said about Zuck but retracted when Meta landed a W.
>>
File: ThanksMeta.png (300 KB, 540x440)
>>101552369
I truly believe that Mistral and Meta are talking to each other and are like "Ok zucc, you release the big model and get shit on while I put mine out right after and no one will notice because they're too busy trying to destroy you"
>>
What is NeMo good for?
>>
>>101552399
that shouldn't be the norm nigga, I don't want to deal with that shit, it should work as it is instead of doing some cover shit
>>
>>101552458
absolutely nothin, say it again!
>>
>>101552466
That's an intellectual property law thing more than anything else.
>>
>>101552379
Claude is hella cucked though, you can't make sad stories or stories that involve bad guys, and that's such a shame because C3.5 Sonnet writes shit way better than gpt4o
>>
>>101552496
Anon, 3.5 Sonnet is indeed more filtered, but you can still completely bypass it and make it write the most depraved guro shit with lolis imaginable. Yes, I tried.
>>
>>101552283
you can literally learn everything you need to know for imagegen with trial and error if you have a million dollars and 50 million images
>>
>>101552484
I know, that's why local audio must be a thing, so that we can say fuck you to that
>>
>>101552243
moe was a failure.
>>
>>101552525
True, 3.5 Sonnet is a dense model.
>>
>>101552525
User 0:
>>
>>101551748
Looking back Mistral 7B was fucking solid for its size and time (it still is)
>>
>>101552249
>>101552297
Look at where 3.1 70B is in the graph.
>>
>>101552327
>say you want a certain guitar drone
i have nowhere else to ask this and you seem like the right person to ask

what's the name of that guitar sound that was popular in the 60s-70s that goes tukatukatukatuk
like the beginning of Another Brick in the Wall Part 1
i want to know what its called because a lot of Rimworld music uses that tukatukatuk and I like Rimworld music and would like to generate more of it with AI
>>
I hate the french.
>>
>>101552574
it was indeed ahead of its time, it's crazy how far they're improving this stuff, now we've got local models that can hold a candle to the very best API ones (gpt4o and C3.5 Sonnet)
>>
>>101552578
What the fuck did they put in Sonnet to instantly make it Einstein? Look at that lead.
>>
>>101552578
probably the only mememark that puts C3.5 sonnet on the top, meaning it's the best mememark of them all (not meaning that it's good though)
>>
>>101552645
Imagine if they do the same +20% increase to Opus.
>>
>>101552645
anthropic is cooking some bangers lately
>>
>>101552651
There's also https://scale.com/leaderboard/ (good) and https://aider.chat/docs/leaderboards/ (kinda cringe, he only tests models on Exercism Python tasks)
>>
>>101552604
calm down Zucc, you'll definitely beat that 123b model if you stack more layers, try with 1.7T parameters next time, gambatte!
>>
>>101552656
Hope open sores will figure out what the sauce is one day. Lecun said research is public and no knowledge is secret in the industry.
>>
>>101546596
PLEASE! STOP THE WINNING! WE'RE WINNING TOO MUCH, I CAN'T TAKE IT ANYMORE! IT'S TOO MUCH WINNING!
>>
>>101552645
I have no idea but this shit is amazing, I'm working with it as a DevOps engineer and it understands all the subtleties of my code. OpenAI is cooked if they can't improve further. damn I love competition, it brings out the best in everyone
>>
>>101552686
>OpenAI is cooked if they can't improve further
Sadly in reality it doesn't look this way, they're only getting more customers, all the normies only know about OpenAI.
>>
>>101552578
>turbo
What?
>>
>>101552700
llurbo is real
>>
>>101552700
They're called that way on https://www.together.ai/blog/meta-llama-3-1
>>
>>101552695
when you look at history, there's a lot of moments when companies were at the top of their game and faded into irrelevancy because they couldn't improve their shit further, I'm thinking of Nokia, Canon...
>>
>>101552585
I know what you're talking about but I just don't know the name of the technique. All my musical education is in classical so a lot of terminology for modern techniques eludes me. But I imagine it's some kind of technique that involves slapping the bridge to cut the drone short. So let's call it 'slap guitar'
>>
>>101552700
It's FP8.
>>
>>101552721
>Together Turbo endpoints empower businesses to prioritize performance, quality, and price without compromise. It provides the most accurate quantization available for Llama-3.1 models, closely matching full-precision FP16 models. These advancements make Together Inference the fastest engine for NVIDIA GPUs and the most cost-effective solution for building with Llama 3.1 at scale.
>>
File: google1992 space movie.png (106 KB, 965x327)
>>101552246
thanks.
>though honestly i fail to see the difference between large and a 7b, but the model also just told me it's a 7b so i dunno
Also it successfully googled 1992 space movie.
>>
>>101552721
Oh, I see, it's just quanted. So I guess 4 Turbo was a quant as well, if this is an industry standard term?
>>
>>101552755
did you switch to mistral-large-2407 (or mistral-large-latest) specifically?
>>
>>101552768
latest, and it gave me a longer name even though my name is set to Jim?

>it called me James earlier too
>>
>mistral large 2407 and llama 3.1 70b for vramfags
>mistral nemo and llama 3.1 8b for vramlets
>llama3.1 405b for Zucc's flexing
Bros, this week is like christmas, something for everybody :')
>>
>>101550905
Just bing. I use SDXL otherwise.
>>
I think the lead between closed and open is still pretty huge. We're at least a year behind atm (just like last year). OpenAI basically smashed their llm stack and redid everything multimodally so it's understandable there are no major improvements. Anthropic figured out some secret sauce to make their LLMs good. Meanwhile llama3 is just trained on more tokens
>>
>>101552795
>something for everybody :')
no, the 24gb vram fags only have gemma-27b to eat, please think of the middle ground Meta and MistralAI ;-;
>>
>openrouter 405b stopped working
oi oi oiiiiiiiiiii it so fuckin' over tho
>>
command-r++ will be king
>>
How much hope can we really have that Mistral Large 2 truly is as good as the corpo models? That local has truly become on par with frontier (with the exception of 3.5 Sonnet in coding)?
>>
>>101552807
it'll get closer, I feel like the API models reached a plateau and we're not, there's gotta be a limit somewhere on the transformers architecture
>>
>>101552827
I'd agree with LeCunny that transformers have already reached their limit, but he still has yet to show anything for MM so we'll see.
>>
31 of 51 shards downloaded
>>
>>101552814
Just run a Q2 quant bro.
>>
>>101551508
This is it
The final blow at VRAMlets
First Llama 3.1 70B, then 405B and now this
They were getting too uppity, deserved
>>
>>101552848
converting
I didn't notice these weirdos include 2 copies of the weights (consolidated-000* and model-000*) and my script downloaded the consolidated ones first; not wanting to waste more time I removed the partially downloaded model-* ones and had to rename everything to make it convert
I hope these don't have some fucked up difference that breaks everything
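for anyone else stuck in the same spot, roughly what I did (the shard naming pattern is an assumption, and the shard counts between the two sets may not line up, which is exactly what worries me):

import glob, os

for old in sorted(glob.glob("consolidated-*.safetensors")):
    new = old.replace("consolidated-", "model-")
    print(old, "->", new)
    os.rename(old, new)

# the index file needs the same rename if the converter reads it:
# consolidated.safetensors.index.json -> model.safetensors.index.json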
>>
>>101546566
I should have started using huggingface-cli instead of git clone sooner.
>>
>>101552883
>almost 2 years into the AI hype
>still no vram usurpers
How long until 96GB cards at home?
>>
Fuck you Nvidia.
>>
>>101551769
>https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/commit/4f81d782477920634d0aad0dc620a7f1a3f5d471
Six days ago.
>>
>>101552839
MoA is the path forward.
>>
>>101552917
AMD could easily have catered to home AI but decided to go jewy on the VRAM too. So it's not just Jensen
>>
>>101552904
>git clone
*uses twice as much storage as necessary*
nothin personnel kid
>>
>>101552958
>Jensen
pretty sure amd ceo is cousins with nvidia's
>>
>>101552958
>home AI market
maybe 1000 enthusiasts who will pay you an average of $5000 each
>datacenter AI market
hundreds of huge companies who will throw thousands and thousands of dollars at you on a regular basis
>>
"A couple wants us to perform a gender reassignment surgery on their pet parrot." Lila blinked, "A parrot?" Maya nodded, "Yep. They want it to 'live its true life'." Lila burst out laughing, "That's... that's something else, alright."

nemo throwing mad shade outta nowhere
>>
>101552827
>I'd agree with LeCunny that transformers already reached its limit,
It's not true, we can still make them more efficient and there are fields that can be highly improved using only current solutions (multimodal, video, robots); we easily have at least a few years before we hit the wall and exhaust the paradigm.
>>
/aicg/nigger here. Trying Large 2 on the API and it's actually pretty incredible, especially after L3-405B flopped so hard. It's certainly no SoTA but I'd argue it's around Claude Sonnet (3.0) in terms of general smarts/creativity, which is really surprising given how bad the original large was.
>>
>>101552850
Llama-3.1 70B in IQ2_M or even IQ2_S quant does indeed seem more capable than the 8B version at FP16 precision.
>>
>>101553004
I haven't used both Sonnet versions. How does 3.5 compare to 3.0 in terms of (E)RP? What's the general consensus?
>>
>>101552966
Not only that but also the option to include/exclude specific files or globs.
>>
i jus want mixtral-nemo 8x7b...
>>
The only real quality difference i notice between small/medium/large is their capability to google shit: only large can, not the smaller ones. Otherwise? Coherency/RP capability all seems identical, even to large.
I really don't think we need more than 8b for ERP/Character purposes anymore.
>>101552998
That's true, but the limitations are still there even if we can make them faster at best.
That said, exhausting the paradigm? Nah, I don't even think it's max a couple years, we're still maaannyy years off before we even hit any ceilings of potential for AI models, even with transformers being limited as i said. Making them use less memory and run faster would be helpful for hooking them up to videogames locally, which is what i look forward to.
>>
>>101552998
> we easily have at least a few years before we hit the wall and exhaust the paradigm.
Meta doesn't believe that, they just stack more and more layers as a means to say "it's over, we can't improve the transformers architecture anymore, the last resort is to make them bigger and bigger" and I think that's a load of bullshit
>>
>>101553004
>especially after L3-405B flopped so hard
??
>>
File: file.png (56 KB, 828x599)
>>101552755
am i supposed to be impressed?
>>
>>101552986
Consumer GPUs don't even use the same VRAM modules. There's no tangible downside to giving consumer cards more VRAM. They just don't want professionals using gaming GPUs to save money.
>>
>>101553060
Anon, you're baiting, aren't you? He's checking it with an OFFLINE model
>>
>>101553058
The sanitized dataset makes it unusable for roleplay.
>>
File: sonic adventure smile.gif (1.6 MB, 540x405)
>>101553060
>i got you to use OpenAI tokens to search Gayniggers from Outer Space
be very impressed by my capability to PSYOP you.
>>
>>101552998
>>101552839
I think what he said was language models (text in and text out only) won't be able to generalize to the kind of AI we want. Has he ever said that they reached a plateau? I have not heard him say this specifically. But that also seems to be correct in terms of the rate of progression. They can still improve, but not at the gains we've enjoyed so far, which means we have approached a plateau in a way.
>>
File: copium pepe.png (177 KB, 680x329)
177 KB
177 KB PNG
>>101553053
>I really don't think we need more than 8b for ERP/Character purposes anymore.
>t. RTX 3060 owner
>>
>>101553081
99% sure he said something along those lines, but im not good at pulling up quotes on the fly.

>>101553086
>implying i need a 3060

>>101553092
JANNIES HELP
>>
>>101551644
>LLama 3.1 70B
>70B
>>
>>101553102
>>101553102
>>101553102
>>
>>101548098
they're just wasting power at this point
>>
>>101553044
3.5 is way smarter and better at following instructions but it's overfit and has severe issues with repeating parts of responses and overall structure.
>>
>>101552913
>How long until 96GB cards at home?
Chinese GPU upstart Ha Long is set to release their new line of GPUs meant to compete with Nvidia in about two weeks.
>>
>>101552645
Based on their recent research, they probably isolate features and up the interpretability of the network (sparse autoencoders) for fine tuning purposes. It's mind boggling. I asked it to do niche stuff like blender rendering scripts and it kept iterating with the feedback i gave it smoothly.
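if anyone's curious what a sparse autoencoder even is, this is the rough shape from the interpretability papers as a toy PyTorch sketch (the sizes and the L1 weight are made up):

import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model=4096, d_feats=32768):
        super().__init__()
        self.enc = nn.Linear(d_model, d_feats)
        self.dec = nn.Linear(d_feats, d_model)

    def forward(self, x):
        f = torch.relu(self.enc(x))  # feature activations, pushed towards sparsity
        return self.dec(f), f

sae = SparseAutoencoder()
acts = torch.randn(8, 4096)  # stand-in for residual stream activations
recon, feats = sae(acts)
loss = (recon - acts).pow(2).mean() + 1e-3 * feats.abs().mean()  # MSE + L1 sparsity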
>>
File: 24-04-17 16-09-08 1004.jpg (189 KB, 1024x1024)
>>101552800
>I have been a good Bing.
Cute gens



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.