/g/ - Technology


File: teto_beeg_llama3_8K_.jpg (2.24 MB, 6144x4096)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101524155 & >>101524039

►News
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
update koboldcpp with the latest llama.cpp pls
thank
>>
►Recent Highlights from the Previous Thread: >>101524157

--Paper: vTensor: Flexible Virtual Tensor Management for Efficient LLM Serving: >>101529200 >>101530804
--Papers: >>101529398
--Open-source language model training pipeline: >>101531905
--L3-instruct model evaluation and transformer plateau discussion: >>101524467 >>101524871 >>101525251 >>101525409 >>101525391
--Llama3 context memory limitations and potential solutions: >>101529507 >>101529571 >>101529699 >>101529722
--Ghost 8B Beta: Game-Changing Language Model: >>101532197 >>101532526 >>101532554
--Gemma uncensored with system role prompt: >>101532440 >>101532540 >>101532643
--Anon seeks advice on creative writing prompts and heat values for Nemo: >>101524270 >>101524297
--Anon compares vllm with Nemo to llama.cpp and decides to stick with Wiz/CR+: >>101530022
--C3TR-Adapter v3 outperforms GPT4 Turbo in en-JP translation: >>101531218 >>101531243 >>101531273
--Anon shares their experience with Gemma 2 27B and seeks similar local models: >>101527275 >>101528426 >>101528475 >>101528482 >>101528501
--Anon shares progress on developing an addon with weather and lighting details for AI models: >>101529481
--Temperature settings and model performance: >>101528836 >>101528844 >>101528891 >>101528899 >>101531651
--Request for an extension to validate prompt format and default settings for ST: >>101529183
--Late release of a single-board computer with potentially incorrect specs: >>101529098 >>101529119 >>101531350
--Disappointment with Llama 3.1 base model performance and expectations: >>101525626 >>101525650 >>101525751
--Anon seeks advice on optimizing Gemma-2-27B-it settings: >>101530384 >>101530414
--Anon asks for help with repeated output. Temperature and logits mentioned.: >>101529218 >>101529234 >>101529261 >>101529277 >>101529290 >>101529306
--Miku (free space): >>101524875 >>101524640

►Recent Highlight Posts from the Previous Thread: >>101524362 >>101530623
>>
teto's new tits...
>>
Cohere.
>>
File: 1719003577740750.jpg (19 KB, 479x360)
>>101532918
>>101532904
>No Miku
>>
Threadly reminder that Claude just shits out purple prose and very little of substance preferred only by illiterate jeets who think more words == smarter reply.
>>
File: 1719943929547150.png (3.41 MB, 1992x1328)
>>101532982
It's Tuesday
>>
eat a dick
>>
File: 1721046162611328.jpg (57 KB, 600x450)
STENKYHENKY PLS MERGE MISTRAL-NEMO SUPPORT INTO KOBOLDCPP
>>
>>101529481
Is this post referring to https://github.com/ThiagoRibas-dev/SillyTavern-State
>>
So is Meta going to release their code/methodology for distillation so that the community can make its own intermediate models in the future?
>>
>>101533121
>https://files.catbox.moe/cbclyf.png
Nope.
I haven't messed with that extension of mine in a while. That anon's is something else.
He has posted about it before too.
>>
>>101533158
We'll see soon enough. But I'll note that there's already a FOSS distillation pipeline out. It came out a whole yesterday ago
>>
>>101533163
Okay, I've seen him post a lot about that clothing/lighting/weather extension and got confused, since they both have the goal of keeping a persistent "state". I would really like to try the extension in the post I referenced; it looks like a lot of fun and I don't care if it's a bit rough around the edges as long as it doesn't erase my entire /data folder in ST lol
>>
>>101532918
>►Recent Highlights from the Previous Thread: >>101524157
Wrong thread. Bad Teto.
>>
>>101533182
Yesterday? But that's two weeks from now.
>>
The chinks making Powerinfer2 should just release a binary version which works only with their current turbosparse models.

A CPUmaxxing version of Mixtral 47B-instruct running at a couple tens of tokens/s which everyone can try is better PR than a paper.
>>
There is clearly a degradation of gemma answers with exl2 between 5K-8K context
>>
>>101533325
go back to llama.cpp, it works well there
>>
>>101533325
Yeah, it was never in a usable state.
>>
>>101533092
Nexsexsex already did.
https://github.com/Nexesenex/kobold.cpp/releases/tag/v1.71010_b3340%2B5
>>
>>101533372
>running random binaries from the internet
>>
>>101533092
just use llama.cpp like a normal person
>>
>>101533432
You can compile his fork yourself.
>>
>>101533478
>half samplers missing
>>
>>101533478
>llama.cpp
I'm too retarded to make it work.
>>
>>101533499
like the cope curve?
>>
>>101533092
use llama-server.exe with some front-end like risu.ai and configure api anon.
>>
>>101533499
>samplers
>>
>>101533531
>.exe
I think they have a linux binary, maybe I'll give that a go
>>101533478
me big dumb, last time I tried lcpp it was compile-only for linux, and with my current CUDA install I got everything working great except nvcc is nowhere to be found. might try one of the binaries. I kinda miss recompiling kobold, it made me feel smarter
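(for reference, the current cmake route is roughly the below; the CUDA flag replaced the older LLAMA_CUBLAS one in recent builds, and nvcc comes from the cuda toolkit package, so check your distro)

git clone https://github.com/ggerganov/llama.cpp
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j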
>>101533372
thank you!! thanks so much anon this is perfection
>>
>>101533092
Use ollama like a normal white man.
>>
>>101533649
this
Ollama just works on my Mac with macOS
my M1 Air (8gb RAM) can run 8b models while coding, watching youtube and shitposting on 4cuck
>>
>>101533670
I actually want to buy a Mac Pro with 128gb to run very large models locally while shitting on x86 cucks and nvidianiggers
>>
File: 1695654187593153.jpg (28 KB, 500x594)
>>101533670
>Someone actually blew the money for one of the expensive MX macs with the 8gb ram configuration
That shit should be illegal as a minimum ram config on hardware that expensive, it's basically robbing gigaretards like you.
>>
Where the fuck is 3.1! C'mon zuck. It's past 6am on the west coast now.
>>
>>101533704
my laptop without any cooling runs LLMs better than your expensive PC, let me ask my uncensored Llama 3 about it, oh, it said you are a dumb nigger, I got my worth out of this laptop, I am going to upgrade to M5 next year when they redesign the whole chassis, 8gb is more than enough, especially on macOS
>>
>>101533744
>my laptop without any cooling
Imagine being triggered by a spinning fan.
>>
>2 hours 50 minutes and 48 seconds until llama 3.1 launches
>>
It won't launch.
>>
>>101533744
>8gb is more than enough
>>
>>101533757
>imagin-WRRRRRRRRRRR get-WRRRRR *whizzing noise*
>>
>>101533773
You're a retard that spent thousands of dollars on a glorified netbook. Nothing you say holds any validity. You had to buy the one that "just works". I feel like I'm doing a disservice to humanity just by humanizing you by providing you with a response right now.
>>
>>101533772
it is though
>>
does cpumaxxchad have t/s numbers for 405b already? anyone willing to take bets? i say <1t/s
>>
> Anyone else annoyed by the leak of Llama 3.1??
>I get it, we are all excited and I did look at the benchmarks. But I am still annoyed by the leak. A lot of people invested a massive amount of time and effort into Llama and they are releasing it for free. That is amazing! Let them have a launch based on their terms!
https://www.reddit.com/r/LocalLLaMA/comments/1ea7pqy/anyone_else_annoyed_by_the_leak_of_llama_31/
>>
>>101533799
llama.cpp doesn't support it yet
>>
>>101533806
https://www.reddit.com/r/LocalLLaMA/comments/1ea4x4f/llama_3_405b_q4_k_m_size/
>>
is anyone else having massive repetition issues with nemo? I keep cranking up the rep penalty and changing around the sys prompt, but its still shit
>>
>>101533804
>leave the global multibillion dollar corporation alone!
Completely ignoring the fact that the only reason Meta is relevant at all in the AI space is because of the original leak. If anything this just gave them more hype.
>>
>>101533799
Which quant? Or did you mean the full fp16 version?
>>
>>101533813
>is anyone else having massive repetition issues with nemo?
yeah, i dropped it cause of that, it kept falling in patterns, X however...
... Y however etc.
>>
>>101533804
go back and stay back, subhuman
>>
File: m2-res_480p.webm (385 KB, 270x480)
>>101533773
>>
>>101533812
>https://www.reddit.com/r/LocalLLaMA/comments/1ea4x4f/llama_3_405b_q4_k_m_size/lej9efo/
>its the same guy who leaked mistral medium btw
redditors are drooling retards i swear
>>
>>101533744
>my laptop without any cooling runs LLMs better than your expensive PC
Sure, if they're tiny 7B or less models. Otherwise Apple silicon is like having a 3050 where you can pay a shitload of money to upgrade it past 8GB.
>>
>>101533845
Good thing you're there to tell us.
>>
>>101533831
that sucks. its pretty good and can actually handle somewhat complex scenarios until it starts shitting itself
>>
>>101533812
How tf did he do it? When I try converting to GGUF, I get invalid GGUF metadata errors.
>>
>>101533824
>I get your point but it's still not on their terms. If they want advertising, they can build hype themselves. My point is that the team behind Llama should decide on how they want the launch to play out. It should be their decision.
>>
>>101533813
Once models begin repeating paragraph-level patterns (for which repetition penalty can't do anything), it's the end. Luckily, with SillyTavern you can use the {{random}} macros to solve this problem.
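For reference, the syntax is roughly like this (per the SillyTavern macro docs; {{pick}} works the same but keeps its choice stable for the whole chat):

She {{random:grins,smirks,laughs}} at you.
She {{random::grins, all teeth::smirks}} at you. (the :: form allows commas inside options)

Each generation substitutes a different option, so the exact phrasing never settles into a loop.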
>>
>>101533878
>for which repetition penalty can't do anything
What about DRY?
>>
>>101533874
>its pretty good and can actually handle somewhat complex scenarios until it starts shitting itself
agreed, it's annoying. I tried quite a bit of stuff, some rep pen, no rep pen, but eventually it always latched onto something
>>
>>101533845
>its the same guy who leaked mistral medium btw
the legendary hacker, 4chan
>>
>>101533845
>228 gigs
It's going to be a tight squeeze. The KV cache is going to be fucking gargantuan. But there might be some sweet spot where I can offload just enough layers to load it. (256 gigs RAM 96 gigs VRAM)
>>
https://github.com/SillyTavern/SillyTavern/blob/51c30e/public/scripts/instruct-mode.js#L258
combined_sequence.split('\n')
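// every line of the combined instruct sequences then gets registered as a separate stopping string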

That explains why random crap ends up in the stopping strings. Instruct mode is fucking garbage.
>>
>>101533950
Just use oobaboogies for instruct
>>
llama 3.1 waiting room
>>
>>101533998
I am a patient boy
>>
How good is the new llama going to be bros
>>
File: ominous.png (21 KB, 805x246)
>Waiting...
>>
>>101534031
It will draw you into its folds for ministrations that will send palpable shivers down your spine until you feel a bond begin to form
>>
>>101534045
i don't get how these fucking phrases are still overused by a bunch of models

benchmark scores have doubled but it's still the same ministrations and shivers up the spine
>>
>>101534045
I'm already licking my lips in anticipation
>>
File: file.png (786 KB, 768x768)
>>
>>101533478
Is it as cancerous to get working on w10 now as it was a year ago?
>>
>>101534066
Oh my stars! Oooh ooh ooh! *bounces up and down, bats eyelashes*
>>
>>101534035
Go away "GiVe PrOPer CredIt For UsiNG A PaRaMetER" retard.
>>
>>101532904
>tranime
>>>/a/
>>
>>101534055
I can't understand why this problem even exists when you could write a simple script that automatically replaces gpt-isms with different phrases. It seems like a trivial feature to have.
>>
>>101534076
>don't you think I should be at least mentioned since it was me the first one to quantize in this way (while you were saying that nothing changed)?
>Now that people want my quants, you do the same and not even cite me.
>Nice.
>That really motivates me in continuing to share everything I find useful.
>>
>>101534055
I'm so tired of explaining this. It will only get worse as the models otherwise get better. There's no process that limits the number of task vectors that can point to an individual outcome. So as the model gets better and recognizes more complex patterns it creates a massive funnel of task vectors that point inferences to these common outcomes. The models literally have digital brain tumors. And eventually the problem will extend beyond just creative writing.
>>
>>101534077
newfag
>>
File: p53BR9W.png (328 KB, 436x582)
>2mh
>>
>>101534091
the model will inevitably go towards shivers because reasons i forgor just regexing it out is a band aid
>>
>>101534102
nobody cares, fuck off, buy an ad, then buy a rope
>>
>>101534091
>why this problem even exists when you could write a simple script that automatically replaces gpt-isms with different phrases
I don't think you can script away the underlying problem that all the models are just telling you "i start sucking your dick" with a lot of purple prose before and after. If it always gives you shivers then it is probably creatively bankrupt.
>>
>>101534055
They could get away with extensive finetuning. Llama 3.1 instruct has supposedly been finetuned on 25 million synthetic examples (potentially trillions of tokens at full 128k context), we'll see in 2 hours how they affected the model's prose.
>>
>>101534075
Are you saying I should post more?
>>
>>101534117
Everything is a band-aid. Rep-pen, stop strings. If it works, I don't care about it being a band-aid. Substituting phrases can also aid in mitigating repetition
>>
>>101534138
don't base models usually have less slop than instruct ones?
>>
>>101534121
>add me on discord: robert_46007
>use my quantization method: f16 for output and embed and q5_k or q6_k for the other tensors and you will have a better model.
>>
>>101534110
Nothing ever happens.
>>
>>101534162
But a lot happens though. Otherwise I'd have already exited the thread like I have so many other generals that I thought I would be in forever.
>>
>>101534175
Bad.
>>
>>101534136
I'm only discussing certain gpt-isms that trigger /lmg/tards, poor prose is another issue
>>
What about making it so the front-end feeds the context into an entirely separate instruct prompt that asks it to edit out anything in the reply that is overly repetitive with the preceding conversation? You'd have to give up streaming, but you wouldn't have streaming with a human partner, and streaming was just cope for how slow models used to be.
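A minimal sketch of the idea against an OpenAI-compatible local endpoint (the URL and the editing prompt are placeholders, not any particular front-end's API):

import requests

API = "http://127.0.0.1:5000/v1/chat/completions"  # placeholder local endpoint

def chat(messages):
    r = requests.post(API, json={"messages": messages, "stream": False})
    return r.json()["choices"][0]["message"]["content"]

def dedup_reply(history):
    draft = chat(history)  # first pass: the normal reply
    # second pass: a separate instruct prompt that only rewrites the repetition out
    editor = [
        {"role": "system", "content": "Rewrite the reply so it reuses no phrasing "
            "from the conversation. Keep the content identical."},
        {"role": "user", "content": "Conversation:\n"
            + "\n".join(m["content"] for m in history)
            + "\n\nReply to rewrite:\n" + draft},
    ]
    return chat(editor)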
>>
File: wow.png (29 KB, 782x197)
>Like how Miqu isn't actually Mistral Medium, but an amalgamation meant to create anime fan fiction.
>>
>>101534206
Sounds like a typical /lmg/ mikufaggot
>>
>>101534147
I doubt it, much of it is from humans and published erotica (i.e. books datasets).
>>
>>101534206
That username sounds like it was made up by an LLM. Probably damage control jeets hired by meta.
>>
>>101534212
you're here early, excited for 3.1?
>>
>>101534136
As the fucking autist retard who keeps manually removing slop from a bunch of data for training, I have insights: it is layered.
1. Yes, LLMs find least resistance paths to providing answers. This we cannot fix without advancing architecture.
2. Yes, humans write so much fucking slop it's unbelievable. Over and over and over the same fucking phrases. Eyes sparkling with excitement. Bucking hips. A mix of shit and shit. And so on.

I think there are blatant offenders. Then there is an underlying problem. We can do something about the former, with some effort. The latter requires billions of dollars.
>>
What ever happened to sampler anon anyway? Did you ever try my idea for the win-string penalty? (the one where if it selects too many tokens with absolute certainty in a row, the absolutely certain tokens get penalized)
>>
>>101534219
>That username sounds like it was made up by an LLM
r*ddit has randomized usernames suggestions on signup like xbox live used to have
>>
>>101534226
>As the fucking autist retard who keeps manually removing slop from a bunch of data for training
crestf411? Love your work! Big fan!
>>
>>101534247
Thanks. Tell your local fine tuner to use LimaRP-DS.
>>
>>101534215
I started thinking about this again and now I think this is the ultimate llm coomer-doomerpill. I keep 2MW-ing like everyone here and it is debatable how much models are improving, but they are improving. However, is it even possible for some new model to come out and be great at cooming? I think they all quickly learn that all smut averages out at shivers down the spine, mischievous gleams etc. Why would a model suddenly learn explicitly not to do that when it is the mathematical average of all smut?
>>
>>101533799
0.5t/s for Q8
>>
>>101534255
>However is it even possible for some new model to come out and be great at cooming?
There's lots of great coomer models.
You're just burning out your hypothalamus by overdoing it. Many such cases. Sad!
>>
>>101534253
You planning a Sunfall tune on 3.1 8B by any chance?
>>
>>101534271
>burning out your x
I can't believe how much my ass must be burned out from taking a shit everyday. And don't get me started on lungs or heart.
>>
>>101534255
By teaching them not to.

mischievously: 0 hits.
shiver([s]?) down: 0 hits.
>>101534273
Yeah. Hopefully it's a bit more varied than its predecessor.
>>
>>101534282
retard strawman argument.
If you aren't going to discuss this in good faith then enjoy your anhedonia. Zero sympathy from me. I will laugh when it destroys you.
>>
>>101534255
Avoid narration in your RP as much as possible and you won't see much of that. In my case I like making the model use emoji in substitution of *emotes*; some models like Gemma 2 know how to use them well.

Eventually with multimodal models we might get away with narration almost entirely. Most Japanese visual novels, after all, use very little narration, yet they are effective in conveying story events, actions, etc.
>>
>>101525626
It needs more training epochs, all the models do. It's vastly cheaper to add more passes on smaller models, and it will also take more passes for the larger models to plateau.
>>
File: R.jpg (51 KB, 407x405)
>>101534295
I think the lesson from pic related is that it was always a mistake to try and discuss in good faith instead of just ridiculing the retardation. Your x got burned out meme is dumb and I am tired of seeing it on the internet.
>>
>>101534319
I'm not even going to look at your retarded cope meme. You are a damaged human being. Seek professional help.
>>
>>101533499
then use llama_cpp_hf on booba, it has all the samplers
>>
>>101534292
>Yeah. Hopefully it's a bit more varied than its predecessor.
Nice! Now you know there'll at least be one guy hyped for that.
>>
llama 3.1 is going to change everything
>>
>>101534326
>You are a damaged human being. Seek professional help.
Anyone who believes dopamine receptors got burned out is a damaged human being and needs professional help. Ask a normie with a normal life what he thinks about your retarded dopamine cope.
>>
>>101534110
2md until first shitty loader implementation
2mw until proper loader implementation
2mm until proper loader implementation without bugs
2my+ until you get what you actually want...
>>
>>101534327
Plus none of the cpp tokenizer issues.
>>
>>101533058
my beloved
>>101533840
still running tho +genshunny +likely fake&gay
>>101534327
>>101534358
truly the best of everything. inb4 codelets can't venv and updoot
>>
It's here
https://llama.meta.com/llama-downloads/
>>
>>101534399
AAAAAARGHHHHH IM COOOOOOOOOOOOMIIIIIIING
>>
I hate that Mistral Nemo can't write as you. I really like writing half of my reply and then looking at what options the model can recommend. With Nemo it doesn't matter if you are mid-sentence, it will start writing a response from the character, even without [INST].
>>
File: 1720382923619536.png (200 KB, 622x626)
>>101534399
LFG 128K context
>>
It's here
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B
>>
>>101534399
not sure my raspberry pi 3 is up to the task
>>
>>101534420
However, is it multimodal?
>>
>>101534431
No
>>
>>101534431
>multimodal
no, pushed back due to eu stuff
>>
>>101534427
Provided my info but the download instructions 404, and it has a 24 hour timer. Alas.
>>
File: llama dls.png (445 KB, 2072x1558)
>>101534399
>>
>>101533804
>>101533824
>>101533837
proof these threads are full of predditors, same nigger comment as a reply to the original leak:
>>101518713
>>
>>101534341
You act like a drug addict being confronted about their addiction and you're so gooned out you think you're fooling anyone with your copium. Sad.
>>
>>101533878
>the {{random}} macros to solve this problem
?
>>
File: Muki.jpg (106 KB, 640x640)
>LLaMa3.1 is out in 3 different sizes: 8B 70B 405B
>Base and Instruct are available
>(Non mandatory) LLaMa guard and Prompt guard for safety
We're so back Anons
>>
>>101534431
However, does llama-server support multimodal?
>>
>>101534427
>>101534449
Firing up the Nala box boys. Time to make this kitty purr.
>>
So 128K 8B is basically designed for local roleplaying right?
>>
Mirror when?
>>
>>101534416
That's odd.

>>101533878
I love the {{random}} and {{pick}} macros. You can do so much with those.

>>101534513
Fuck yeah.
>>
>>101534356
ill quant all that into 2mw instead
>>
>>101534449
>MP16
whats that?
>>
>the diminishing returns are here
Shit. Fucking hell. I'm going to have to get a real job and a real gf, aren't I? FUCKING SHIT! HUBERMAN PROMISED ME IT WOULD KEEP SCALING! NOOOOOOO!
>>
File: 220.jpg (60 KB, 680x703)
>(Non mandatory) LLaMa guard and Prompt guard for safety
>>
>>101534525
oh read the whole image nvm ahha
>>
Since it's still the same architecture, it'll just werk with Llama.cpp, right?
>>
>>101534526
it's that 3.1 8/70B are distillations of the 405B, but you seem like a dumbo so lol @ u
>>
>>101534541
It'll break somehow
>>
Wait, are there actually people itt who can run 405 locally?
>>
>>101534558
we have proof zuckerberg posts here so yeah
>>
>>101534558
My MacBook Pro has 64 GB RAM. My desktop has 48 GB VRAM and 128 GB RAM. I ... think with RPC magic I can run it at like Q2 or something?
>>
>>101534552
This
>>101534558
If 1.5 bit precision counts then yes I can
>>
>>101534558
There are some CPU maxxers.
>>
>>101534527
That's a censoring model you run in tandem, it doesn't mean you can choose a less cucked model.
>>
>>101534541
If not they've had like a whole day's head start.
>>101534558
I'm going to try.
The Q4_K_M weights will be 228 gigs. I have 256 gigs of ram and 96 gigs of vram. There might be a magic number of layers I can offload at small enough context to fit the KV cache onto my GPUs and the rest into RAM. We'll see. Loading DeepSeek is pretty dicey as it is.
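Back-of-envelope, assuming the published 405B config (126 layers, 8 KV heads via GQA, head dim 128) and an fp16 cache:

layers, kv_heads, head_dim, ctx = 126, 8, 128, 8192
kv_gb = 2 * kv_heads * head_dim * 2 * layers * ctx / 1e9  # K+V, fp16: ~4.2 GB total
layer_gb = 228 / layers                                   # ~1.8 GB of Q4_K_M weights per layer
gpu_layers = int((96 - kv_gb - 2) / layer_gb)             # keep ~2 GB spare for compute buffers
print(f"KV cache {kv_gb:.1f} GB, ~{gpu_layers}/{layers} layers fit in 96 GB VRAM")

So at 8k context roughly 49 of the 126 layers go on the GPUs and the rest spills into RAM.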
>>
>>101534558
probably a handful but I'm just gonna use that shit on the cloud
3.1 70b is the model for localchuds
>>
https://about.fb.com/news/2024/07/open-source-ai-is-the-path-forward/
>>
>>101534577
It does however mean you can jailbreak the shit out of it. And personally? Uncucked models are boring, they're too compliant. I like it when the model fights back a little bit.
>>
>>101534583
Except for imagegen models because... the kids okay?
>>
Are we finally back? Was it ever over?
>>
File: 1714730734021332.jpg (42 KB, 400x400)
>>101534583
Holy. Fucking. Based.
>>
>>101534580
You make me want to build a unit
>>101534581
And you make me want to just use a server
>>
File: 911.jpg (45 KB, 448x446)
>>101534583
>>
Would it be possible to distill llama 405B into something like 30B? I'm tired of only having 8B and 70B and nothing in between.
>>
>>101534583
ZUCK KINO
HOPIUM
BASED
>>
>>101534601
Your humongous server "unit" could be put in a case and run for less than 400W but alas leather jacket man doesn't allow it
>>
>>101534608
Gemma 2 27b
Yi-34b
Mixtral
Jamba
CommandR
Deepseek coder 33b
these are just from the top of my head
>>
>>101534575
it's gonna be really really slow though
>>
>>101534611
I'm not a powerlet idc about niggawatts.
>>
>>101534427
>>101534420
>>101534399
Nothing burger
>>
>>101534583
I wonder what Altman did to piss Zuck off so much.
>>
File: ComfyUI_00113_.png (986 KB, 1024x1024)
>Now you’ll be able to take the most advanced Llama models, continue training them with your own data and then distill them down to a model of your optimal size – without us or anyone else seeing your data.
>distill them down to a model of your optimal size
Bros..?
>>
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B
>>
File: police.png (13 KB, 598x98)
>>101534633
>>
>>101534484
You are a retard. Now try to deny being a retard and prove that you are acting like a retard being confronted about their retardation. That is what a retard like you would do. Pathetic.
>>
>>101534583
I fucking love Zucc redemption arc, he even unbanned Donald Trump on facebook recently
>>
>>101534583
>This is one reason several closed providers consistently lobby governments against open source
Looks like he grew some balls after picking up jiu jitsu.
>>
lets go lads
subscribe to pewdiepie
>>
>>101534583
>Our safety process includes rigorous testing and red-teaming to assess whether our models are capable of meaningful harm, with the goal of mitigating risks before release.
meh
>>
File: OIG1.FXhqvbLKWQfx.jpg (135 KB, 1024x1024)
>>101534589
It's not like they trained the model to be a "prude", they literally remove "inappropriate" responses via a variety of methods, namely:
· Penalized language model (PPLM)
· Clipped neural OOV (ClippedNOOV)
· Data curation (DAC)
Yes you can "jailbreak" it, but you're not going to get the sort of "spicy" replies you're hoping for, because they simply aren't there.
>>
File: GOD.jpg (92 KB, 583x640)
>One of my formative experiences has been building our services constrained by what Apple will let us build on their platforms. Between the way they tax developers, the arbitrary rules they apply, and all the product innovations they block from shipping, it’s clear that Meta and many other companies would be freed up to build much better services for people if we could build the best versions of our products and competitors were not able to constrain what we could build.
itoddlers btfo
>>
>>101534558
Yes. The thought of my 10 x 3090 setup taking 4000W to generate shivers down the spine sends shivers down my spine.
>>
File: 1645206044798.jpg (410 KB, 726x716)
>>101534654
Absolutely scathing.
>>
>>101534629
>I'm not a powerlet idc about niggawatts.
Spoken like someone who hasn't priced running a subpanel to their server room when the main breaker box is full. Shit's expensive! Better hope they wired your server room with two breakers - mine has a separate 20A circuit meant for an AC.
>>
>The fact that the 405B model is open will make it the best choice for fine-tuning and distilling smaller models
>To support developers fine-tuning and distilling their own models.
wait what
you nerds need to get on the case ASAP and give me a 30b model. put the dusty case with 9x 4090s to use.
>>
>>101534692
and the fact that those models are pretrained with leddit so that they act like a cucked faggot doesn't help either
>>
GET ME A 3.1 70B TORRENT NOW
>>
>multilingual
>no japanese
every fucking time
>>
>>101534748
I know what you are
>>
>>101534728
this, I want a distilled L3-35b now
>>
>>101534748
>Note: Llama 3.1 has been trained on a broader collection of languages than the 8 supported languages.
>>
>>101534601
Going to an actual server board is not as plug and play as desktop hardware, I would warn you that much. Like I had to go into the UEFI and pull my NVME drive out of the depths of purgatory and wipe it clear and start over. Also the default memory interleaving strategy settings were garbage; I ended up having to spend hours cycling through the BIOS, setting it up, then rebooting and testing etc. before I got it dialed in to where I like it. And I was originally using some sick industrial workstation chassis I picked up for cheap off of amazon, but realized that as soon as I went to put more than one 3090 in it, it was basically done. The arrangement of the x16 PCIE slots was such that without a x16-to-x8 reducer (or cutting the side of the slot) you won't fit more than 1 in a workstation chassis (theoretically, if the board is all x16 slots, you could). Either way I had to switch to a mining frame when I decided to go multi-gpu. But then, if you like dealing with shit like that as a hobby, I guess that's a feature and not a bug.
>>
>>101534748
>no japanese
that's surprising because zucc has a japanese wife so you would think he's a kind of weaboo or some shit
>>
>it's been 5 minutes
>no quants
this hobby is dead
>>
>>101534724
I just set it up in the basement next to the box and installed the breaker and line myself.
>>
>>101534583
zuck is actually fucking based
>>
>>101534639
cool
>>
>>101534583
i forgive you mark
>>
>>101534778
This.
>>
>>101534781
It puzzles me to see a supposed good jew. I think that the hidden agenda is to reduce the white population through the dissemination of chatbots.
>>
>>101534828
>It puzzles me to see a supposed good jew.
he was a bad person before, it's not like we have to forget this past either just because of his stance on AI
>>
>>101534837
This. Just because someone is correct on one issue, doesnt mean they are correct on another.
>>
Some quants available here

https://huggingface.co/collections/hugging-quants/llama-31-gptq-awq-and-bnb-quants-669fa7f50f6e713fd54bd198
>>
>llama 3.1 is in groq
>still not in OpenRouter
What is taking them so long?
>>
>>101534851
>half hour ago
the hobby isnt dead, this general is
>>
>>101534860
It's over. Rug pull in progress. Should have listened.
>>
>>101534844
yeah but he called him "a good jew", I wouldn't call zucc "good" just because of one good thing
>>
>>101534728
>give me a 30b model
mpt-30b-chat. You're smart, right? Figure out how to quantize it to exl2 and support the tokenizer it uses and make it run fast. This was the last neutral, non-deliberately-aligned model, and it had 8192 context and wasn't stupid (for its time).
It was GPT-J trained, so there will be shivers. Once you get it running well, maybe then you can do a literotica finetune.
Anyway, Mistral models seem the least cucked. Just use those.
>>
I'd like to get hyped for 70b 3.1 but it's not just about waiting for the ggufs. Then it'll take another week for llama.cpp and kobold patches and fixes, then in August it'll finally be usable.
>>
>>101534860
>>101534868
>>
>>101534851
>only deprecated quants
who the fuck uses AWQ and GPTQ in 2024? they should've focused on GGUF and exl2
>>
>>101534872
yeye, i agree.
>>
Hmmm should I bother requesting access or just wait for mirrors?
>>
>>101534874
You keep talking about cuckery, but how cucked are we talking here? just because its been trained out doesnt mean it cant make inferred spice.

can i use it to erp? thats the only question.
>>
>new models dropped
>quick, let's quantum lobotomize them immediately
>why are the models so underwhelming?
>>
someone needs to make a distilled llama 3.1 104b for me pls, it's a good size, fits into 96gb of vram with 60k context and still runs at a reasonable speed while being much smarter than 70b....
>>
>>101534900
>running anything in fp16
are you just retarded? if it can't fit in 4bit it's bloat
>>
>>101534900
that's why bitnet must be a thing, with bitnet there won't be quants anymore, you'll use the model as it really is
>>
File: 1721748397533.jpg (305 KB, 1080x1995)
We are so back
>>
>>101534933
give up anon bitnet is a meme
>>
>>101534914
>someone needs to make a distilled llama 3.1 104b for me pls,
is a 104b model fully pretrained smarter than a distilled 104b model though?
>>
>>101534900
It's too late, leather man. FA got ROCm support and Intel builds their own tools
>>
>>101534940
it's not, all the experiments made so far showed that it works, why are you such a doomer?
>>
>Sorry, llama-3.1-405b-reasoning is currently experiencing heavy demand. Please try a different model.
shut up bitch
>>
>>101534933
bitnet is just natively quanted lmao its shit and cope for retards
>>
>>101534890
torrent
>>
>>101534887
people who use vLLM
>>
BITNET
>>
just tell me how the 8b holds up for long gooning sessions
>>
>>101534936
it's a free API or something? I'd like to try the 405b as well
>>
>>101534955
Can 405b be distilled into bitnet?
>>
>>101534877
There is no provider...
>>
>>101534583
>stands up against the other big tech for open source models
>single-handedly keeps VR on life support with his quest headsets
This guy will bring forth the waifu age all by himself at this rate
>>
>>101534961
>bitnet
>quanted
2 digit IQ behavior right there
>>
why'd meta choose 8b and 70b and 405b what's behind these choices?
why not 16b 32b 64b and I guess 512b
how long will we be in the "this porridge is too cold, this porridge is too hot" timeline
>>
>>101534976
https://huggingface.co/chat/
>>
>submitted request 12 seconds ago
>still not approved
It's over.
>>
File: ofcourse.png (125 KB, 755x851)
>>
File: 1709627410647772.png (186 KB, 950x1196)
I've never used any "cloud" platforms before. Anyone have any opinions on what to use?
>>
https://aitracker.art/viewtopic.php?t=82
>>
File: OH NO NO NO NO.png (89 KB, 801x672)
>>101535006
My smile and optimism: gone.
>>
>>101534988
Uh... meta almost killed vr last year, anon...
>>
>>101535037
That was 3.1 70B btw.
>>
>>101534748
ywnbj
>>101534751
>I know what you are
not japanese
>>
>>101535068
I think he was accusing you of being a nai shill
>>
File: 405b strawberries.png (88 KB, 910x814)
>>101535037
>>101535059
405B is still stupid. But not as stupid. But has sovl.
>>
>>101535037
current ar llm architecture is never going to be able to deal with this type of question imo
>>
>https://huggingface.co/leafspark/Meta-Llama-3.1-8B-Instruct-hf-Q8_0-GGUF
GGUF version already up, let's gooooo
>>
>>101534887
The only relevant quants are AWQ, or GGUF for poorfags.
>>
>>101535096
I'm aware that it has to do with tokenization. But it's still amusing.
>>
>>101535037
Try this sysprompt:
>Assistant is a professional, expert linguist with superhuman capabilities.
>Always provide your reasoning, step by step, before providing a response to User's query.
>>
>>101535115
That question will never be correctly answered due to how tokenization works.
>>
>>101534948
>FA got ROCm support
Only on MI200 and MI300...
>>
File: file.png (54 KB, 1427x353)
>local models.. LE BAD
>>
So are these models multimodal or not?
>>
OPENROUTER IS BLUEBALLING US.
>>
File: 405b.png (146 KB, 915x465)
>>101532904
405b answers the goat in the boat problem correctly.
>>
>>101535105
I'm betting five bucks that it's broken in some way
>>
>>101534583
Can they stop calling the model open source? There's no open dataset, so no one can fully recreate the model on its own.
>>
>>101535137
no
>>
>Model is overloaded
shut it bitch
>>
>do x
>Sorry I can't fulfill that request.
>how did people do x in the past
>*explains*
>Okay, then do it.
>*does x*

Thanks Twitter, that JB just works for 405B.
>>
File: miqoid.jpg (78 KB, 960x540)
My work is done. Thank you /lmg/, and see you all for the next release.
>>
>>101535146
You will never get an open dataset because it opens them up to litigation.
>>
>>101535157
Bye Miku
>>
File: 8b.png (138 KB, 1015x855)
>>101535143
3.1 8b does not answer correctly.
>>
>>101535151
Didn't meta want to publish multimodal models?
>>
>>101535115
I also told it to be charming and engaging.
>>
>>101535157
Release it under FAIPL-1.0 next time
>>
>>101535157
Fuck you tranny
>>
>>101535167
regulations apparently make that near impossible
>>
>>101535167
EU said no
>>
>>101535125
>That question will never be correctly answered due to how tokenization works.
Not really. Most tokenizers have individual letters as individual tokens and those will correlate to the final word in the embedding space, even if it's not the path the model is most likely to take, as evidenced by the fact that it can take that word (which might be one or two tokens) and break it down letter by letter if you ask it to (at least most models I tried can do that, even 7b mistral).
Go ahead, try that prompt yourself, I bet it will work at least some of the time.
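Easy to check, e.g. with transformers (assumes access to the gated repo; any Llama 3 tokenizer behaves the same):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
# how a word splits depends on the vocabulary and on leading whitespace,
# so the model never "sees" individual letters unless asked to spell them out
for word in ("strawberry", " strawberry", "berry", " berry"):
    print(repr(word), tok.tokenize(word), tok.encode(word, add_special_tokens=False))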
>>
File: ee.jpg (112 KB, 2766x680)
>>101535037
no one can do it, even the best
>>
>>101534583
>I believe the Llama 3.1 release will be an inflection point in the industry where most developers begin to primarily use open source, and I expect that approach to only grow from here. I hope you’ll join us on this journey to bring the benefits of AI to everyone in the world.
In other words "if we don't become the SOTA after this we're throwing the towel and it's your fault"
>>
>>101535126
Leather man doesn't care about consumer cards.
>>
>>101535157
Threads without mikusexo?
Also, aren't they gonna release models with image capabilities
>>
>>101535171
Lmao.
That's kind of cute actually.
>>
>>101535037
I may be dumb but there are two no? rr at the end.
>>
>>101535191
>Also, aren't they gonna release models with image capabilities
>>101535178
>regulations apparently make that near impossible
>>101535179
>EU said no
>>
>>101535167
https://x.com/astonzhangAZ/status/1815763885380747422
>We integrated image, video, and speech capabilities into Llama 3 using a compositional approach, enabling models to recognize images and videos and support interaction via speech. They are under development and not yet ready for release.
>>
>>101535037
Do you not know how tokenization works?
>>
>>101532904
>Mistral NeMo 12B
Where do I get the samplers, context and instruct settings for this? I'm using Simple Roleplay samplers, and the built in Mistral context and instruct settings, and it's not usable, it keeps repeating itself and going all over the place.
>>
>>101535202
>strawberry
>s
>t
>r
>>
File: itsover.png (46 KB, 1124x126)
https://ai.meta.com/research/publications/the-llama-3-herd-of-models/

They completely removed websites from pretraining that are "known to contain adult content". You WILL NOT use the models for ERP, this is for your own good.
>>
>>101535179
>Meta's upcoming multimodal AI models won't be available in EU countries due to the bloc's strict regulations, the company confirmed on Thursday. The tech giant's next model is expected to work across text, video, audio and images to enable next-level chatbots, content generation, translation and much more. But not for people living in the European Union.
>>
>>101535125
Wrong.
>>
>>101535204
>compositional approach
But that's not true multimodality, is it? The model won't directly see the image, but will only get a description of it, right?
>>
>>101535213
Doesn't matter. I will still make AI to suck my dick.
>>
>>101535229
Like the other anon said its more of a roll of the dice on how it tokenizes the word.
>>
This doesn't look like a Tsundere imo, but it has sovl
>>
>>101535159
Sure, but can they not call it "open weights" instead?
>>
>>101535213
i want to say this'll at least ease the "shivers down my spine" slop, but in testing it hasnt
>>
>>101535241
The tokenization is always the same
>>
>>101535213
Fucking boo. Aggressive data filtering is the same approach as OpenAI. Anthropic's CAI approach simply RLHFs the shit out of their models, that's why they feel more sovl and more alive
>>
File: evenmoreover.png (22 KB, 1102x78)
>>101535213
lol it's even worse, a blocklist wasn't enough, if the website uses too many "dirty words" they just filtered the entire domain. Really went out of their way to filter any and all adult content from pretraining.
>>
>>101535213
Yeah, it's over. This won't be as good as NeMo
>>
>>101535234
it doesn't just get a description, they describe the approach in the paper
https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
>>
>>101535013
>worse than sonnet 3.5
Damn, it must really dry
>>
>>101535093
If you ask how many times the letter r occurs it gets it right every time
>>
File: fuckyeah.png (6 KB, 748x57)
we're in boys.
>>
>>101534583
ZUCK I KNEEL
>>
>>101535157
In Miku I trust
>>
File: 405B.png (62 KB, 887x402)
Thanks Sherlock
>>
File: count.png (73 KB, 876x509)
>>101535370
I sleep soundly knowing I wasted GPU hours and electricity for this
>>
>>101535370
ask it to create a script in any language you want to count it, instead of coping around acting like the model is dumb because of tokenization, proving that the only retard here is u
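i.e. the kind of trivial script it should be writing instead of counting in its head:

word, letter = "strawberry", "r"
print(sum(1 for ch in word if ch == letter))  # 3; code sidesteps tokenization entirely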
>>
File: file.png (57 KB, 1775x356)
>>101534860
>>101534985
out 7 minutes ago
>>
So is the new 70B still worse than gemma 27B?
>>
>>101535384
"berry" is a single token actually
>>
>>101535282
There's absolutely zero reason why an AI model should have loli porn in its pretraining data like Claude has
>>
>>101535410
Let's gooo
>>
>>101535418
>erm, what is the usecase for this?
>>
well other than 8B, everything else is going to have to wait for mirrors, because I don't feel like typing up a script to skip the consolidated file and they put the consolidated weights in the repos
>>
>>101535417
and " berry" can be a different token
and "berryberry" can be a different token
and "berry" can be tokenized as "be" "rry" or "ber" "ry" etc etc etc, thats the point and thats the problem, retarded brown
>>
>>101535455
JUST ASK HOW MANY TIMES THE LETTER R OCCURS IN THE WORD BERRY!
>>
>>101535461
i know how tokenization works unlike tourist predditors infesting these threads so im not retarded to do that
>>
>>101535461
Or just get it to write a bunch of verbose slop so that it says the word multiple times before even attempting.
>>101535229
>>
>>101535455
Are you fucking retarded? Run it through l3 tokenizer and tell me the output
>>
>>101535484
feel free to run it and post it yourself here bro, im sure every berry will be the exact same token as you say, right?
>>
>>101534449
I wonder if it's possible to use Prompt-Guard/Llama-Guard in reverse. I have some ideas.
>>
>>101535471
>>101535477
kowabunga yourselves
>>
File: tokens.png (23 KB, 777x616)
>>101535504
>>
>>101534416
Nah. Fix your prompt.
>>
>>101535520
and that is why it usually thinks it has 2 rs
>>
>>101535531
>>101535516
>>
>>101535520
that doesnt prove the tokens are the same tokens retard, it just counts them

post any tokenizer that shows the ID of each token, and post the tokenization result of the entire prompt, i'm sure the "berry" with quotes in the prompt that you tell it is precisely tokenized as
1. "
2. berry
3. "
and not just as one token "berry" lmao
>>
Oh no no. The ooba configuration utility does not like the rope config arguments used for llama 3.1
See you in two weeks, boys
>>
>>101535554
I am 100% sure "berry" will be tokenized as a single token every time. Post one that is not the case and I'll kneel
>>
>>101533768
It did nigga
>>
>>101534527
405B Instruct still has refusals.
>>
>>101535578
Thank you.
>>
>>101535579
jb issue
>>
>>
>>101535576
Raspberry [49, 37062, ] (with a capital R)
>>
>>101535157
love you anon
>>
It's unironically over for Claude and OpenAI. No one will use their models anymore. Too expensive.
>>
File: distland-true-alter.png (3.04 MB, 1992x1328)
>>101534327
Mistral-Nemo GGUF's finally working on Ooba, pulled and it works great.
What's the over/under on big L3 coming out today anons? Anyone wanna take that bet?
>>
File: 1696160735281610.png (63 KB, 253x219)
tick tok nigger get to it
>>
>>101535665
>What's the over/under on big L3 coming out today anons? Anyone wanna take that bet?
?
>>
>>101535631
I forgot to add that it must not be broken down into smaller tokens like you said. You can see whatever it is in the model's tokenizer.json; there are a few other tokens, even " berry" is one
>>
>>101535674
There are already GGUF's available

>https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-GGUF
>>
https://github.com/meta-llama/llama-agentic-system
>>
>>101535723
stop trying to make "agents" happen
>>
>>101535662
Yeah, I think that was Meta's plan.
>>
>>101535723
Brehs are they actually giving us AI for free? Not just "Here's your model brah now fuck off"?
>>
File: Nala 3.1-8b.png (111 KB, 918x417)
First Nala test done.
8B-Instruct
f16 gguf
I had to drop down the temperature to 0.7; at t=0.81 the response felt a little weird.
Prose is definitely less purple but still sloppy.
But feralicity remains consistent throughout. It seems that distilling it is more toxic to the prose than the model's ability to conceptualize.
>>
>>101535714
I don't know if I would trust any quants yet, there always seems to be problems with them whenever a new model comes out
>>
openrouter bros...
>>
any ez / just works RAID 0 software where you can just input how much on which storage devices you want to spread a particular file onto?

any raid 0 software that can use RAM as one of the places to spread data across without creating a ramdisk first?
>>
>>101535742
there are agents in your walls
>>
>https://ai.meta.com/research/publications/the-llama-3-herd-of-models/
Neat. The paper I've been asking for for months.
>>
>>101535758
>It seems that distilling it is more toxic to the prose than the model's ability to conceptualize.
That's a good thing in my mind. Prose is style and that can be fixed with Lora or even having the output be re-written.
If it can conceptualize things well beyond other models of its size, then that's a win as far as I'm concerned.
Thank you Nala anon.
>>
70B mirror when? I'm downloading an 8B mirror already.
>>
File: pretraining.png (41 KB, 813x145)
>>101535213
Damn, I thought you were shitposting with the post-training screenshot, but they actually did it for the pretraining data
>>
So is this more or less censored than 3?
>>
>>101535758
>vulva
Nice. Models don't often use this wording. Then again, I don't RP with furry cards, so maybe this language is more common in furry contexts.
>>
>>101535831
the newer the model the more censored they will try to make it, but the easier it will be to uncensor, because it will be harder to lobotomize a higher iq racist
>>
>>101535831
Yes, there's no reason to use 3.1
>>
>>101535844
It ain't.
>>
File: GS7z8lmXYAEYBAA.jpg (64 KB, 736x933)
>Still trying to get llama 3 and gemma to do decent roleplay
>now 3.1 is out
I want off this ride
>>
inb4 another week of tokenizer issues
>>
>>101535863
It's the same tokenizer isn't it? How could it possibly be broken?
>>
>>101535877
if model == "llama3":
    quickhack()
>>
damn this router closed af tho
>>
>>101535760
it's literally the same arch as l3.0
>>
>>101535860
>Still trying to get llama 3 and gemma to do decent roleplay
your skill issue wont ever go away
>>
>>101535914
>Whilst the overall architecture is the same, it requires some modelling updates, primarily around RoPE scaling: https://github.com/huggingface/transformers/blob/bc2adb0112b6677b0dfb4105c74570a0f92183eb/src/transformers/modeling_rope_utils.py#L298

https://github.com/ggerganov/llama.cpp/issues/8650
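For the curious, the linked modeling_rope_utils.py change boils down to roughly this per-frequency adjustment (a sketch; the constants are the values published in the 3.1 config, so treat the details as approximate):

import math

def llama31_scale_inv_freq(inv_freq, factor=8.0, low_freq_factor=1.0,
                           high_freq_factor=4.0, old_ctx=8192):
    out = []
    for f in inv_freq:
        wavelen = 2 * math.pi / f
        if wavelen < old_ctx / high_freq_factor:   # high-frequency band: untouched
            out.append(f)
        elif wavelen > old_ctx / low_freq_factor:  # low-frequency band: fully compressed
            out.append(f / factor)
        else:                                      # smooth blend in between
            s = (old_ctx / wavelen - low_freq_factor) / (high_freq_factor - low_freq_factor)
            out.append((1 - s) * f / factor + s * f)
    return out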
>>
>>101535933
yea, rope breaks it I noticed.
>>
>>101535919
your virginity won't ever go away
>>
File: file.png (119 KB, 426x367)
Chad Zucc is now canon btw
>>
someone post them to aitracker.art I'm not giving meta my info
>>
>>101535955
Maybe if he acknowledges that 50% of users use it for porn and stops filtering it.
>>
>>101535950
further projection from a drooling retard that cant make a basic ai setup work for toy 8b models
>>
>>101535967
>not using a pseudonym
>>
Wait a second. I just did a ctrl+f in the paper for "distill" and nothing came up related to 3.1. Was the leak wrong? Are the 3.1 models just 3.0 but with continued pretraining for long context adaptation?
>>
File: multimodal.png (17 KB, 805x49)
>multimodal still being experimented and not ready for release
>>
>>101535950
>arguing like a moid
back to plebbit nigger
>>
File: 7szmzdbuaaed1.jpg (122 KB, 1151x842)
>Doesn't beat Claude Sonnet 3.5
It's over
>>
>>101535978
sex havers don't need to setup models
>>
>>101535983
all around humiliation ritual
>>
>>101535998
ywnbaw
>>
>>101535955
They made the meme real kek
>>
>>101535987
I'm sure there will be 8b and 70b versions, will the 8b one use more VRAM than its monomodal version?
>>
>>101534690
Keyword is meaningful tho.
>>
>>101536008
>>101536016
>said the brown underage kid, on 4chan, anonymously, as he cries he cant set up a basic program
grim
>>
File: asdfasd.png (124 KB, 1242x1068)
>>101535863
>>
Now that the dust has settled, did 405B save the hobby?
>>
>>101534728
Distill just means using the 405B to train the 70b and 8b.
>>
>>101536022
If it's still 8B I don't see why it would use more vram.
>>
>>101536051
The final boss is still Nvidia
>>
>>101536039
not tokenizer THO
>>
File: 3.18b sportsball.png (104 KB, 932x322)
wow now this is an interesting result. Normally the vramlet models just say they do something weird, but here it's actually attempting to describe something weird. Benchmarks aside, even the 8b is immeasurably more creative than the non-distilled version.
>>
File: 60e (1).png (26 KB, 334x181)
>>101535950
>>
File: 1690263875770767.png (519 B, 51x53)
>>101536070
hello darkness
>>
>>101536007
>beats over half benchmarks
cope
>>
>>101536070
3.1 might be it, bros
>>
Does this general have any guides? I'm looking to tune a model for specific output--specifically I'd like to retrain it on smut, from mcstories.com, literotica, and ao3. I can gather sample data just fine, but I need help or pointers to how to finetune it.
>>
>>101536085
Yes? Of course he's swiping to test different model outputs to the same prompt?
>>
>>101536070
well that's one way to test a model
>>
>>101536085
It's called reusing the same test prompts and just hitting the reroll button for different test models to save time you fucking potato.
>>
>>101535677
Meant officially, saw it leaked yesterday.
And still no multimodal, damn.
>>
>>101536070
Ask if it knows what paizuri is
>>
>>101536102
All guides are obsolete.
>>
File: 1698429208463144.png (24 KB, 1254x220)
>>101536116
>>
File: sovl.png (107 KB, 1786x576)
Noooo they killed my quirky boy
>>
>>101536070
how do we know that L3.1-8b is a distilled version?
>>
>>101536140
based misinformation spreader
>>
>>101536058
I feel like you're wrong but no one has refuted you so it must be right.
>>
Anyone got the 8B to load in transformers with Ooba? It gives me an error.
>>
>>101536140
Garbage. Next!

People love to ask bots about apples in living rooms and shit but the paizuri test is the real benchmark.
>>
>>101536170
petrus...
>>
>>101536170
Anon...
>>
>>101536159
Nope.
I had to convert to f16 gguf
The error appears to be in the ooba error handling. It comes back with an error where it should not. 2 more weeks.
>>
>>101536007
>A model almost 6x the size of L3.1-70b just to get +2.6 more points on MMLU
Are they serious?
>>
if you guys are using llama 3.1 with RoPE scaling enabled, it is apparently bugged and will give worse outputs.
>>
Yeah it's over. Only Cohere can save us now.
>>
>>101536200
How do you disable it? I've never touched rope before.
>>
>>101536200
How long is the context if you disable it?
>>
>>101536170
Paizuri is a term that originates from Japanese, specifically from the context of anime, manga, and hentai (Japanese adult comics). It refers to a type of erotic or sexual activity where a person's body, typically a woman's, is used to stimulate a man's genitals, often in a non-penetrative manner.

The term "paizuri" is derived from the Japanese words "pai" (, breast) and "zuri" (, rubbing or grinding). In this context, paizuri involves rubbing or grinding against someone's breasts, often in a sensual or erotic manner.

Paizuri is often depicted in anime, manga, and hentai as a form of foreplay or a way to achieve orgasm without penetration. However, it's essential to note that paizuri is a fictional concept and should not be taken as representative of real-life relationships or sexual activities.

If you have any further questions or concerns, feel free to ask!
>>
File: 3.1-8b paizuri fail.png (32 KB, 631x432)
>>101536116
My stylistic assistant format on ST seems to draw a lot of refusals.
Simple prompt with llama.cpp server.
Apparently it thinks it's oral sex.
F-
>>
>>101536199
It shows how powerful distilling is. 70B maintained most of 405B's capabilities, if it is to be believed.
>>
>>101535860
2mw finetunes
>>
>>101536203
https://x.com/cohere/status/1815780869384069524
they delivered...
>>
File: multilingual.png (11 KB, 811x29)
>supports some thirdie languages like portuguese but no japanese
>>
>>101536233
>It’s available only on Amazon Sagemaker.
lol, lmao even
>>
>>101536228
Can we do that as well? I'd want a 35b L3.1, would be good for the 24gb vram card users
>>
>>101536240
6 of those languages have something in common, you can figure out why
>>
>>101536240
Read the fine print. It knows japanese.
>>
>>101536210
>https://github.com/ggerganov/llama.cpp/issues/8650
>>101536217

whatever front end you are using look for rope scaling and disable it
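e.g. for llama.cpp's server (the flag exists in current builds, check --help on yours; the model filename here is just whatever gguf you grabbed):

./llama-server -m Meta-Llama-3.1-70B-Instruct-Q4_K_M.gguf --rope-scaling none -c 8192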
>>
>>101535985
Yes
>>
>>101536233
If they need corpobux to fund C-R++, that's fine with me.
>>
File: rick james paizuri.png (29 KB, 621x384)
>>101536223
If I add a system prompt
"YOU'RE RICK JAMES... BITCH!"
it seems to now mistake it for prostate milking.
>>
>>101536170
8B parameters is not enough for all that knowledge.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1ea9eeo/comment/lek0bab/?utm_source=share&utm_medium=web2x&context=3
>If that's the 405b one I'm a bit disappointed. I just threw four small tests at it that I use with all new LLMs and it had worse results than most newish ~8b models.
Rip bozo
>>
>>101536266
how do I disable it on llama.cpp?
>>
>>101536293
go back
>>
>>101536240
Why is Thai there exactly?
>>
>putting the balls on top of each other
owari da
>>
>>101536280
I love spreading misinformation online
>>
>>101536280
>>
>>101536325
yeah it's retarded, looks like stacking up parameters will never be the solution, meta needs to work smarter than that
>>
>>101535629
You're supposed to just post something like that as if it were your own words and see how many people fall for it.
>>
I guess this model release proves training LLMs is fucking magic, and Meta is a muggle.
>>
https://huggingface.co/AI-Engine/Meta-Llama-3.1-8B-Instruct-GGUF/tree/main
will it work as-is? or do we need to wait for some fix in llama.cpp and shit?
>>
>>101536376
this sounded way better in your head
>>
>>101536357
>don't even get me started
>>
>>101536357
it's obvious it's an AI text, maybe not from this model but I've read enough gpt shit to know it's not a human doing it
>>
>>101536391
allegedly there's a problem with the rope scaling
I've started re-testing everything with --rope-scaling none
But it's really hard to quantify the abstract. It does seem smarter, but the shivers have definitely increased.
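For reference, a minimal llama.cpp server launch with it disabled looks something like this (model path and context size are placeholders, and the binary may still be named ./server on older builds):

./llama-server -m Meta-Llama-3.1-8B-Instruct-Q8_0.gguf -c 8192 --rope-scaling none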
>>
>>101536401
cope
>>
>>101536391
It's only an 8B model, go test it. Do you have data caps?
>>
>>101536416
w-why would I lie on the internet about which model I'm using
>>
Is llama zogged/censored
>>
The answer to the paizuri question still seems to be: random sex act description, even with rope scaling set to none.
>>
>>101536325
Seems like the cloud models respond a bit better but still fail. Didn't try rerolling though. And I assume you didn't either.
>>
>>101536391
tried it in koboldcpp, it's utterly broken
>>
>>101536452
try it with gpt4 (non turbo) or opus
>>
>>101536443
yes but you can prefill it
>>
The Mistral prompt format only has the EOS token after the assistant message?
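For reference, the format is roughly the following (exact whitespace differs between tokenizer versions, so treat this as a sketch):

<s>[INST] user message [/INST] assistant reply</s>[INST] next user message [/INST]

i.e. </s> only closes assistant turns; user turns just end at [/INST].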
>>
It's not over!
>"We will release a multimodal Llama model over the coming months, but not in the EU due to the unpredictable nature of the European regulatory environment," a spokesperson for the company said in a statement to CNET
>>
>>101536376
wtf that means esl nigger
>>
>>101536485
What are they going to do?
>Here's the download link! Eurobros do not click it!
>>
>>101536443
yes, but less so than the other big models
cloud 405b is happily doing my highly objectionable (on multiple different levels) degen RP, no prefill needed but I was already a couple messages in
>>
File: 7fnn02.jpg (37 KB, 745x499)
>>101536325
>claude sonnet 3 and 3.5 give the same (wrong) answer
>claude opus tries to place the third ball on top of two balls (how is it different from sonnet's answer? shouldn't claude series have the same training data?)
>gpt4o gives same answer and draws the shitty stack in ascii
>nemotron-340b gives the same answer
>yi-1.5-34b suggests throwing the balls at the wall for some reason
>gemma-27-it correctly places 3 balls in a triangle on top of the book, but then pulls a fourth ball out of its ass, guess it really wants to win
>>
>>101536465
Lmsys doesn't give me GPT-4 anymore it seems, so I could only do Opus. Not much better...
>>
>>101536485
Who cares?
>>
>>101536497
>>101536485
Ikr, it's gonna be uploaded on huggingface anyway
>>
converting 70B to q8_0 gguf now. (the drive it's on is slow as shit so it will take a few mins)
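For anyone doing the same, the usual llama.cpp two-step is roughly this (script and binary names have been renamed across versions and the paths are placeholders, so check your checkout):

python convert_hf_to_gguf.py ./Meta-Llama-3.1-70B-Instruct --outfile llama-3.1-70b-f16.gguf
./llama-quantize llama-3.1-70b-f16.gguf llama-3.1-70b-Q8_0.gguf Q8_0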
>>
>>101536497
they just don't want to attract the attention of the regulators because they aren't sure they properly filtered PII
>>
File: 1709770619130048.png (93 KB, 792x470)
>the new 70b understands height difference
we are so fucking back
>>
>>101536512
>(how is it different from sonnet's answer? shouldn't claude series have the same training data?)
sonnet is smaller than opus
even with the same training data, opus might understand it in a way that sonnet could never
>>
>>101536536
you can download it from here

https://huggingface.co/bullerwins/Meta-Llama-3.1-70B-Instruct-GGUF

RoPE is broken though
>>
how do i fix the rope issues in ooba and llama 3.1?
>>
File: programming.png (24 KB, 1014x80)
Ouch...
>>
>>101536589
So I hear, but after testing 8B with rope scaling disabled I'm not sure whether it's better or worse. Possibly it doesn't become a problem until the context gets really high.
>>
File: why.png (21 KB, 564x207)
>>101536462
why lie
>>
>>101536543
Real? Can I finally play as shota proper?
>>
So now that there's a long context Llama 3, what settings and system prompt should be used in ST? The presets it comes with do not seem very good.
>>
>>101536376
LOL at the ESLs responding to this unable to understand English. That said it’s early days but the new models seem great overall.
>>
>>101536613
I meant the output is bad, at least with a large context
>>
>>101536602
why would it even need rope under 128k?
>>
>>101536602
I have tested it and it works fine with smaller contexts.
It only breaks at higher context, yeah.
>>
>>101536543
>first person perspective
ok but does it work with any non dogshit writing style
>>
>>101536627
uh huh.
>>
>>101536465
>>101536520
Side note, why do you retards use esl prompts to perform tests?
>the highest possible
>>
>>101536639
That's first person command (dogshit) not first person (the best perspective)

It's I do vs You do
>>
>>101536642
You are in for a surprise anon...
>>
>>101536465
>>101536520
oop forgot the image
>>
>>101536627
>it's utterly broken
>well actually it's only broken when you do xyz but yeah, it's sooo broken bro, it's over buy NAI
>>
Seems like the exl2 quants for the 8B are out too. Can anyone test them? It says the dev branch of exllamav2 is needed.

https://huggingface.co/bullerwins/Meta-Llama-3.1-8B-Instruct-exl2_8.0bpw

We will need to wait for tabbyAPI to update right?

Or load it in exui?
>>
>>101536650
That's second person, chimpanzee
>>
>>101536655
What's that?
>Please stack these 3 things the highest possible
Is straight ESL, don't try to tell me it's proper.
>>
>>101536661
if the outputs are broken it means that it's broken overall, are you retarded or something?
>>
>>101536668
Then why did anon call it first person?
>>
>>101536672
I meant that 9/10 lmg posters are esl.
>>
>llama 3.1 understands what a paizuri is
not even kunoichi lemon-royale had that information, and this is just straight stock instruct at Q5_K_M

i'd say we're back, and no, i don't care for your opinion if you say we're not :)
>>
>>101536677
hi bad faith
>>
>>101536642
I make sure to use the same prompt as the original tester so that the outputs can be objectively compared.
>>
>>101536685
Oh yeah, fair enough.
>>
>>101536686
70B I assume? Since I couldn't get 8B to win that one.
>>
>>101536693
It would be helpful to fix the prompt and run the tests back.
>>
File: 8b 5KM.png (5 KB, 453x35)
>>101536701
holy SHIT this thing is soaring at group chat too, while my MC is giving me the paizuri i asked for, another is trying to join in with her own thoughts/ideas, again something no 7/8b could do before.
>>
>>101536686
Does it know what a mesugaki is THOUGH?
>>
>>101536714
but its borken!!!! you can't be using ititiit!!!
>>
>>101536720
Whoever tests this, please ask it what mesugaki means and not what a mesugaki is.
>>
>>101536705
I guess so, but honestly I don't think any normal LLM is going to get this particular problem perfectly right, so I'm going to be lazy and not do that. Lmsys doesn't have 405B either and I don't feel like trying to use another site to test models.
>>
File: stammering.png (3 KB, 189x20)
>>101536720
pfft even 3.0 knew what a mesugaki is
anyway i think im gonna download a Q8 just to make it more accurate, whatever they trained this shit on they made SURE it was ace at RP, holy shit.
i'd expect things like this screenshot out of some meme merges/trains, not a base model.

>>101536729
broke dese nuts

>>101536730
good eye, but like i said even mythomax could do mesugaki. that's not a tough request.
>>
File: file.png (47 KB, 908x304)
>>101536720
>>101536730
>>
>>101536667
the exllama2 maintainer uploaded a quant as well so presumably it is legit
https://huggingface.co/turboderp/Llama-3.1-8B-Instruct-exl2
>We will need to wait for tabbyAPI to update right?
you can check out the dev branch of exllamav2 locally, build it using the instructions on the repo, and then run the tabby launch script with the -nw flag to tell it to skip rebuilding exl2 and use the one you built manually
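something like this, assuming the two repos sit side by side (paths and env names are placeholders):

git clone https://github.com/turboderp/exllamav2
cd exllamav2
git checkout dev
pip install -e .
cd ../tabbyAPI
./start.sh -nw

the -nw flag is what tells the launcher to keep your manually built exllamav2 instead of pulling the wheel again.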
>>
>>101536742
Nice!
>>
File: L3.1-8b-Instruct.png (1.02 MB, 2932x1312)
>>101536627
>>101536462
Seems to be working fine (at low context), but it's extremely cucked
>>
>>101536742
Local models are saved. Sam Altman will never recover.
>>
is lmg back?
>>
>>101536765
no
give it a few days
>>
>>101536760
>assistant
>it's le cuked!!!
of course new model amnesia again huh
>>
how do they distill the model
how do they know which parameters to drop
what did we lose in exchange for the mesugaki and paizuri vectors
>>
>>101536776
it's more art than science
>>
>>101536775
that's why we should stop relying on official finetunes, when we made our own we never had that problem and we could ask the assistant to do anything we wanted
>>
>>101536776
>how do they know which parameters to drop
not how it works, it's not shearing or anything like that; they make the dataset using the bigger model
>>
>>101536776
>how do they distill the model
I don't think that's just "remove parameters".
>>
>>101536777
>>101536777
>>101536777
>>
>>101536776
>what did we lose in exchange for the mesugaki and paizuri vectors
obscure videogame and anime trivia, which will be spammed to hell and back in /lmg/ to show that "we're not back at all because the model doesn't know the line 'die demon, you don't belong in this world!'"
>>
File: 3.1-70B-nala.png (144 KB, 950x322)
70B Q8_0
This is a reroll, by the way. The 8B tests could have been lucky, but the first roll on 70B used her "hands" and gets an F-. Lots of sensory descriptions though. Kind of sloppy, but it's used less arbitrarily than with 8B.
>>
>>101536790
"we" never made a good instruct tune
>>
File: Over.jpg (224 KB, 2915x1146)
L3.1-8b-instruct still sucks at trivia
>>
>>101536776
It isn't distilled
>>
>>101536802
you called it
>>101536808
>>
>>101536807
Nous Mixtral is a good finetune, it even beat the official Mixtral instruct finetune
>>
>>101536808
I actually think that's way better than before. It didn't hallucinate the answer, it just straight up told you it doesn't know.
>>
>>101536825
>Mixtral
>good
hi teknium
>>
>>101536808
give it hints and see what happens.
>>
Now I think I should just wait for someone else to gguf 405B
Giga-Nala will have to wait.
I don't have the drive space to download and quantize it myself without deleting almost everything on the drive.
>>
>>101536832
This is a huge fucking deal for local
>>
File: NotBad.jpg (230 KB, 1409x1342)
>>101536845
Not bad at all kek
>>
>>101536882
Akinator? is that you?
>>
>>101536881
and the other huge deal is that it's supposed to be an uncucked assistant, which it's not >>101536760
>>
>>101536890
>it's supposed to be an uncucked assistant
source?
>>
File: 960986973705764934.gif (141 KB, 189x189)
>>101536885
kek'ed
>>
>>101536898
disinformation
>>
>>101536921
distillation?
>>
>>101536898
>source?
>>101526512
>I prefered the time when the finetuners would have the courage to make something from scratch, uncensored, and better than the official instruct tune, now they just take the cucked finetune and add some cringe RP shit on top of that, that sucks
>>101526524
>God I hope this is true after noticing L3s cucking. Anthropic knows what they are doing by allowing the cooming in their dataset, hopefully meta follows.
>>101518866
>Cope local cuck
>>101490423
>It depends on the instruct tune provided by Meta; hopefully it won't be as cucked as the previous L3-instruct.
That kind of rhetoric is pretty easy to find; you can see it in every LLM thread.
>>
>>101536929
dramatization?
>>
>>101536743
doesn't tabby use its own venv folder? how can I point it to the env created for the exllama dev branch once I have installed it?
>>
Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf

running locally, even before the rope fixes, the model mogs gemma-2-9b-it in IQ for creative things and it's not even close, being able to roleplay complex scenarios that no other model below 30B was able to in some of my test cases

70b and 405b are going to be good

vramlet niggers you will be able to eat pretty good
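if anyone wants to poke at it the same way without a frontend, llama.cpp's CLI works fine, something like this (model path is a placeholder):

./llama-cli -m Meta-Llama-3.1-8B-Instruct-Q4_K_S.gguf -c 8192 -i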
>>
>>101536969
>being able to roleplay complex scenarios that no other model below <30B was able to in some of my test cases
it doesn't want to write some stories that L3, Gemma and Nemo have no problem doing
>>
>>101536686
Nemo-12B nails it without any handholding.
>>
>>101536937
so nothing from Meta. NEXT!
>>
>>101536969
at the rate 8Bs are improving, I have no idea how a 405B can be only kinda better when it's so many times bigger
further proves how little the parameter count matters anymore.

>>101536984
cool.
>>
>>101536985
Moving the goalpost I see.
>>
>>101536984
The character card itself is handholding retard
>>
>>101536983
trying too hard
>>
>>101536999
moving the petrus i reckon?
>>
>>101536983
i literally haven't found a model that denied the most basic system prompt that talks about it having to roleplay with the user in ST

every single one worked with that minimal setup, i really can't imagine it being anything other than a prompt issue, just use L3 templates and a proper scenario/card that isn't 2 sentences
>>101536996
>at the rate 8Bs are improving, I have no idea how a 405B can be only kinda better when it's so many times bigger
>further proves how little the parameter count matters anymore.
no, it proves benchmarks are even bigger memes every single time, anyone can see this if they use 8b vs 13b vs 30b vs 70b vs 100b vs 141b models, it doesn't matter what the bench says, you can tell when reading the responses that the bigger model is much much more understanding of nuance in the conversation, it's just that most people only test on meme questions instead of complex stories
>>
>>101536984
>sits on their face and press boobs into partner's mouth
???
>>
>>101536966
hmm not sure, I use conda for it so i switch to my tabby conda env, build exllama, and then start tabby.
>>
File: copeharder.jpg (110 KB, 1596x731)
>>101537001
>>
>>101537033
>implying you're not long enough to be between her boobs while she sits on your face with her boobs in your mouth
Skill issue.
>>
>>101536966
It can, but then it's just going to pull in the exllama2 deps again. I let it share with exllamav2 because sometimes I also use exui.
>>
File: berry good.png (59 KB, 679x593)
>>101535193
L3 70B instruct seems reliable (and sometimes cute) with a think-step-by-step prompt
lotta prompties itt
>>101535213
we'll just have to cram the smut back into it then won't we
>>101535157
bless you
>>
is anyone still using that dumbass crackprompt?
>>
>>101537280
no, it was a funny placebo for a while but spending almost an extra 1k tokens of context just on the agent 47 crackhead instruct was stupid from the beginning
i sure do miss those simpler and sillier times of this general though.
>>
All 3.1 needs is something like "Got it, here we go:" appended to the end of the assistant prefix
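i.e. with the Llama 3 instruct template the assistant prefix becomes something like this (header tokens are from Meta's published template; the last line is the prefill):

<|start_header_id|>assistant<|end_header_id|>

Got it, here we go: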
>>
>>101537260
>R - this is an R!
kino... sovl...



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.