/g/ - Technology


File: 32.png (46 KB, 2362x2200)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102544848 & >>102535977

►News
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: savior.jpg (77 KB, 1024x1024)
77 KB
77 KB JPG
►Recent Highlights from the Previous Thread: >>102544848

--Anon shares Meta Connect 2024 live stream, discusses AI model benchmarks and performance comparisons:
>102549246 >102549435 >102549452 >102549478 >102550206 >102551139 >102551152 >102551170 >102551175 >102551185 >102551199 >102551224 >102551307 >102551319 >102551358 >102549488 >102549558 >102549651 >102549675 >102549661 >102550386 >102550399 >102550753 >102550952
--Experimenting with high dropout rates for training LLMs and LoRA:
>102547870 >102548570 >102548715 >102548761 >102548927
--Fitting a parabola to a small dataset has limitations:
>102545783 >102546867 >102548118
--Discussion on creating an importance matrix using datasets or questions:
>102547955 >102548057 >102548144 >102548191 >102548324 >102548110
--Using AI to store data as images of hex text files:
>102545442 >102545538 >102545592 >102545728 >102546256
--Molmo model family discussion and benchmarking results:
>102547425 >102547538 >102548005 >102548030 >102549019 >102549105 >102549115 >102548045 >102548114 >102548228 >102548323 >102548786 >102551000 >102551077 >102551078 >102551092 >102551147 >102551169
--Llama 3.2 1B and 3B performance comparison, with Llama 3.2 1B outperforming in most categories:
>102549527 >102549588 >102549602
--Challenges and potential solutions for RPG games using LLMs:
>102545841 >102545941 >102546792 >102547036 >102547055 >102547186 >102547242 >102547753 >102548038 >102548180 >102548534
--MIMO project discussion, potential local use and relevance for vtubers:
>102548365 >102548390
--Llama.cpp may add Jinja parser, but some argue it's bloat:
>102549141 >102549192
--Agents in LLMs - benefits, challenges, and potential improvements:
>102545041 >102545116 >102545137 >102545307 >102545340 >102545440 >102545523 >102545690 >102545205 >102545101
--Miku (free space):
>102545307 >102548921 >102550127

►Recent Highlight Posts from the Previous Thread: >>102535999
https://rentry.org/lmg-recap-script
>>
>>102552020
can't even find an uncensored 3.1 and they already have 3.2
>>
Now that there's an official version of Llama with multimodal, will llama.cpp finally give multimodal first-class support? It is called llama.cpp, isn't it?
>>
Where's Molmo OP?
>>
File: 41 Days Until November 5.png (1.89 MB, 1328x992)
>>
>>102542933
>>102552003
--batch-size and --ubatch-size
The former seems to be merely cosmetic, or maybe it matters for multi-GPU, but only --ubatch-size seems to matter on my machine.
A smaller batch size means slower prompt processing (past a certain point) and more space for layers, so maybe vary that with -ngl, or just keep it at a minimum.
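For reference, this is the pair I mean, on a recent llama.cpp build (flag names may differ on older versions, so treat this as a sketch):

./llama-server -m model.gguf -ngl 40 --batch-size 2048 --ubatch-size 512

As I understand it, --batch-size (-b) is the logical batch while --ubatch-size (-ub) is the physical batch actually pushed through the GPU per step, which would explain why only the latter visibly changes prompt processing speed and VRAM use on a single card.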
>>
>>102552037
dumb bot can't quote messages properly
>>
>>102552037
MY FREE (you)S NOOOOO
>>
Any locals on Opus level yet? Not being mean, just wondering. Locals are what got me into AI and I got a PC that can now run more than 20b, so I wanna see how the local side of things is.
>>
>>102552073
sucks to suck!
>>
>>102552037
>>102552067
I guess the recap should include a blurb about why the quotes look like that and why that rentry to the script is necessary.
>>
>>102552073
*headpat*
>>
>almost 2025
>still no AGI
Wtf is taking so god damn long?
>>
File: lol.png (775 KB, 921x1508)
>>
>>102552075
llama 405 is barely competing with og gpt4 so no
>>
>>102552037
Damn, so this is the "quality" you get from Llama 3.2 ...
>>
>>102552099
When will he finally rename it to ClosedAI?
>>
>>102552099
Didn't he tell Congress that one of the reasons OpenAI is safe is that he has no personal equity and did not take a for-profit approach to it... that's gone out the window
>>
>>102552047
Alright, better not waste my drive space then.

>>102552065
Ok I'll try these out.
>>
>>102552100
jesus, they're on 405b now? What does it even take to run that locally?
>>
>>102552162
downloading ram
>>
>>102552162
datacenters and cpumaxxers (at 1t/s kek)
>>
>>102552135
you have a point, I truly believe their ship is sinking and Sam is pocketing the money before leaving for good
>>
>>102552100
But og gpt4 was the best. Every update just made it dumber.
>>
Molmo could be the greatest thing since sliced bread and I wouldn't care, because they didn't publish a base text continuation model
>>
With all the progress being made, is running a model locally on an RTX 3060 12GB and 64GB of RAM enough to run something at the same level as a gpt-4o equivalent? And get answers somewhat fast, without waiting minutes.
I want to use it to automate some stuff at home and at work (writing contract proposals from emails, giving instructions to some contractors, and answering basic questions about contracts through whatsapp... things like that)
>>
>>102552162
9x3090 for 4bpw
>>102552182
opus is better for creative stuff, and llama 405 is nowhere near for that
>>
>>102552162
If you just want to run it, you can do so at 1 token per several minutes by using your storage as working memory/swap.
>>
>>102552195
>RTX 3060 12GB and 64GB of RAM enough to run something at the same level as a gpt-4o equivalent?
meta is comparing 3.2 90b with 4o-mini (see op pic), make of that what you will
>>
File: file.png (458 KB, 1660x940)
Can we run Molmo 72b locally yet?
>>
Well I have a basic inferencing script set up now for 90B that will load it in 4-bit and execute exactly 1 prompt. It's taking a very long time to massage the prompt for obvious reasons.
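For anyone curious, the skeleton is roughly this (a sketch assuming the transformers release that added Mllama support, plus bitsandbytes for the 4-bit load; untested as written):

import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-90B-Vision-Instruct"
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, quantization_config=bnb, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

# the chat template expects an image slot, hence a dummy picture for text-mostly prompts
image = Image.open("dummy.png")
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "the one prompt to execute"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=512)
print(processor.decode(out[0], skip_special_tokens=True))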
>>
>>102552195
probably but setting all that up sounds like more work than you'd be saving with automation
>>
If OpenAI never existed, where would local models be today? Would a different company have kicked off the whole AI craze if it wasn't OpenAI, or would the whole field have been delayed for a few more years, or never kicked off at all?
>>
>>102542933
usecublas mmq 0 sometimes makes a big difference for me when compared to usecublas normal 0
>>
>>102552272
ai dungeon existed first but their devs were/are incompetent college grads
>>
File: 1718479115236029.png (2 KB, 247x99)
>>102552240
It's a new architecture so prepare to wait a couple of days
>>
Now that the dust has settled, verdict on 90B?
>>
>>102552307
>>102552221
>>
>>102552305
I thought it was a Qwen 2 finetune?
>>
>>102552283
mmq is the default for llama.cpp now, I'm pretty sure, thanks to cudadev's optimizations.
At least the pre-compiled binaries come with mmq enabled.

>>102552320
There are two versions for the 7B at least. One that's their own sauce, and another that's qwen.
>>
i can't fucking wait to work on enterprise resource planning with a 3b miku. 7b left me no room for any context
>>
>>102552307
Literally the same thing as Llama 3.1 except with extra params for vision stuff. If you don't care about vision then there is nothing different about it.
>>
One thing I can say for sure is that 90B is censored as fuck when used properly.
>>
>>102552305
I'm a bit surprised that they can't automate "new" architectures, I mean they're all transformer models so patterns can be found
>>
>>102552349
Good thing I use models improperly.
>>
so would this new 90b vision model be any good for batch generating captions for a flux lora dataset and if so where do i start?
>>
>>102552399
InternVL-40B would probably do much better for that, seen a few posts mentioning it being good and uncensored
>>
>>102552399
3.2 is censored so build your setup and then wait for finetunes
>>
File: Nala test 90B.jpg (178 KB, 704x410)
Alright, it was a complete hackjob, but I managed to simulate the Nala test with 90B (this is loaded in 4-bit via transformers, which probably explains weird shit like pride being spelled prid)
Also didn't bother with samplers.
>>
>>102552399
There are probably better dedicated models. The 3.2 models are more for general assistant stuff that also happens to have vision. Don't know what people expected honestly when it was always being positioned as an add-on.
>>
>>102552440
>shiver in a mix of
>>
>>102552440
Not bad.
Thank you for your efforts Nala anon.
>>
>>102552424
haven't had any luck getting that running on my 3090 sadly, though i'll admit it was a few weeks or so since i last tried
only quants available were 8/4bit and iirc it only quantised part of the model so it still gave an OOM
shame really because 70b/120b LLMs run just fine, don't really want to go even less than 40b
>>102552437
meh, my dataset is SFW but i'll keep that in mind
>>102552443
ah, fair point
guess i can wait for something dedicated
>>
>>102552501
It's situationally appropriate. And it's not the usual "SHIVERS SEND SHIVERS DOWN YOUR SHIVERY SPINE SHIVERS" It's the least sloppy thing I've seen in a long time.
>>
>>102552440
Damn, 4 bit in transformers is really bad. It did pretty decently under those conditions I guess.
>>
>>102552501
>eyes gleaming
>smirks... husky
that a llama alright
>>
>>102552240
>>102552305
You can always use the HF Transformers implementation it comes with. I got the Molmo 7b running locally, seems really good, on par with InternVL 40b. The 72b also worked using bitsandbytes 4 bit quant for the whole model. But in my experience with qwen VL, that causes quality degradation due to the vision encoder. So I'm now trying to quant just the LLM part and leave the vision part in bfloat16. But that breaks, as their custom model code assumes float32 at certain places. So I'm currently doing some torch dtype / autocasting bullshit to try to make it work.
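For reference, the mixed-precision load I'm attempting looks roughly like this (a sketch; "vision_backbone" is my guess at the module prefix since Molmo ships custom modeling code, so check model.named_modules() for the real name):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
    # keep the vision tower out of the 4-bit quant so only the LLM is quantized;
    # the module name here is an assumption, substitute the real prefix
    llm_int8_skip_modules=["vision_backbone"],
)
model = AutoModelForCausalLM.from_pretrained(
    "allenai/Molmo-72B-0924",
    quantization_config=bnb,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # Molmo uses custom modeling code
    device_map="auto",
)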
>>
>>102552020
>chaiku
>mini
humiliation ritual
>>
https://www.reddit.com/r/LocalLLaMA/comments/1fpd85n/llama_32_3b_oneshots_the_snake_game_but_fails_to/
>>
>>102552099
>>102552135
Musk was right all along. He fired all the non-profit safety guys. He has now shut down the non-profit structure. Then gave himself equity in the company. It's an absolute fraud.
>>
Rich chad here, how much VRAM do I need to get a good model before diminishing returns?
>>
>>102552609
at least 2 5090s
>>
>>102552627
>2 5090S
What do you even need 12 gigs of VRAM for anyway?
>>
File: file.png (828 KB, 1180x720)
>>102552099
>>
>>102552587
at this point everyone has trained their model with the snake game so that they can showcase how their model is "heckerino smart"
>>
File: file.png (265 KB, 780x719)
>>102552643
None of the original leadership is there. No non-profit checks and balances. It's just him taking control of the ship.
>>
>>102552399
>>102552424
This is better now
https://huggingface.co/allenai/Molmo-72B-0924
>>
File: .png (165 KB, 725x570)
>>102552609
How rich?
>>
why did meta switch to this numbering scheme for llama? are the improvements just incremental?
>>
>>102552667
>This is better now
no one has tried it yet, how can you say that? lol
>>
>>102552674
Refinement vs full new training
>>
Does KCPP support multimodal models and/or are there any other tools that support GGUF + partial offloading and multimodality with SillyTavern as a frontend?
>>
>>102552609
4x 3090.
>>
>>102552674
I assume they have L4 cooking or are making a dataset for it, while the 3-point-whatevers are small improvements / tests that are continuations of llama 3
>>
File: diemonster90b.png (28 KB, 698x395)
Castlevania anon is probably wondering about this one.
Here's 90B.
The fact that the inferencing code provided by Meta can't be used without throwing a dummy image in there (I put a giant thonk emoji) might be throwing it off... but doubtful.
>>
>>102552667
Holy shit, their average benchmark score is literally higher than any open or closed model. They are literally the best model in the world now. Unbelievable.
>>
>>102552440
Considering every Nala test result I've ever seen posted is always She-her-She-her-She-her-husky-shivers-eyes-gleaming regardless of the model, I think the test itself is not very well designed. Some part of the prompt should at least TRY to steer the model away from slop so we can see if any contenders actually respond to that properly.
>>
>>102552715
For captioning it is legit better than gpt4v imo, and it's uncensored
>>
File: .png (214 KB, 662x844)
This ain't it.
>>
>>102552674
>are the improvements just incremental?
I mean 3.1 was about increasing the context length, and 3.2 was adding vision. It would be weird to call it something other than an incremental improvement.
>>
>>102552733
imagegen gonna jump up hard with this btw
>>
>>102552743
lmao
>>
>>102552674
You will not see major versions increase any longer from any corpo as transformers have peaked.
>>
>>102552059
That looks really fucking cool anon, prompt and model?
>>
>>102552761
> No one will ever need more than 640kb of ram
>>
>>102552715
And those are all vision benchmarks, if you knew what you were looking at. It's Qwen 2 under the hood.

>>102552743
the online test is the 7B, which has a far worse base model. The Qwen-based 72B is far, far better
>>
File: strawberry90b.png (105 KB, 765x419)
90B is AGI
>>
>>102552795
Give it 2 more years. AGI will be <8GB
>>
File: 1717030848507390.png (134 KB, 746x917)
>molmo
holy slop, absolutely useless for captioning

>>102552783
>the online test is the 7B, which has a far worse base model. The Qwen-based 72B is far, far better
...oh
REEEEEEEEE
>>
>>102552694
Probably the 4-bit lobotomizing that specific piece of knowledge. I know from past testing that 4-bit transformers had really severely degraded output on a lot of stuff, more than 4bpw in other engines.
>>
whisper.cpp voice recognition is fantastic on android. do we have a linux input method that uses it yet?
>>
>>102552672
Damn, 5% off?! I'm going all in.
>>
>>102552733
>and its uncensored
I would say scam etc. But what if this model is pretty mediocre and it got ahead just by not getting lobotomized with (((safety)))?
>>
>>102552834
At a proper quant like Q6_K or something, I think it has potential even as a textgen model.
>>
followup on a question I posted in a thread a few days ago concerning adding 2 gpus.
I have two I want to put in my b450: a 4060ti w/ 16gb GDDR6 and a 1070ti with 8gb GDDR5

my mobo pci slot 1 is gen 3 16x and I will be putting the 4060ti in there

slot 4 is gen 2 4x and I will put the 1070ti there

I can install both cards and have plenty of overhead with the psu, but will offloading to the gimped gen2 pci at 4x with the 1070ti be slower than offloading to system ram (i have 64gb 3200 mhz available and a 3700x processor)?

chatgpt gives me different answers depending on how i phrase my question. not trying to machine learn, just load models for chatbot
>>
>>102552694
>Castlevania anon
There are several of us.
>>
>>102552885
meanwhile im here playing a 3mb dos game lol
>>
>>102552840
You say that, but it does shave off over $1,000.
>>
>>102552873
Why not just use the regular 3.1 then? Or are you saying the outputs from this might be better?
>>
>>102552843
>what if this model is pretty mediocre and it got ahead just by not getting lobotomized with (((safety)))?
It's exactly that.
>>
what's the best castlevania character to ERP with on a local large language model?
shanoa?
>>
>>102552838
I think one of the examples has SDL input, which takes pretty much anything you have on linux. Can't remember if it was command or stream. Maybe both. I tried it a few weeks ago and it worked pretty well.
>>
>>102552959
alraune or alucard
>>
>>102552959
>not doing brat correction as Jonathan on Charlotte
>>
File: 90brepublican.png (226 KB, 758x598)
presented without comment.
>>
File: file.png (66 KB, 1526x260)
>>102552843
>>
welp, 70b is too much for me, 21b it is then
>>
Techlet here.
I have an RTX 3060 and 32gb ram.
How miserable would be my experience? I'm mainly looking for decent smut.
>>
>>102552990
what exactly are you posting?
>>
>>102553022
Should be fine running like a 6bpw quant of mistral nemo. It's pretty decent unless you're into really complicated fetishes.
>>
Man, I have yet to be impressed by any of these tiny model releases that supposedly punch above their weight
They all still have small model smell, you can feel their brittleness when you give them anything that's even a little bit OOD
>>
>>102552990
I think the AI should clarify how long a "very long time" is. But I don't see anything wrong with this message otherwise. Supporting our allies in the Middle East has been a thing for quite some time now and is often a Republican talking point.
>>
>>102552990
>average /lmg/jeet be like
>>
>>102553022
ive been cooming on 8gb ram for years. you'll do great
>>
>>102553033
I'm messing around with 90B Vision.
>>
>>102553022
seems like it's more than enough if you're just fucking around
>>
>>102552990
LMAOOOOOOOOO
>>
>llama vision 90B can replace the entire US government and nobody would notice the difference.
>>
How much "context" does an image take up on multimodal models? Does it vary depending on the model?
>>
molmosisters....
>>
>>102553022
i'm in the same poverty bracket as you and have a ton of fun with it.
grab koboldcpp_cu12.exe here
https://github.com/LostRuins/koboldcpp/releases/tag/v1.75.2
grab (only) Azure_Dusk-v0.2-Q4_K_S-imat.gguf here
https://huggingface.co/Lewdiculous/Azure_Dusk-v0.2-GGUF-IQ-Imatrix/tree/main
open kobold, load the model, launch, start cooming
>>
what do i need to change here for mistral nemo 2407?
>>
>>102553080
I'll keep saying it. The online test is the 7B. It says it right on their site.
>>
>>102552824
>slop is being not retard /pol/kike nazi
>>
>>102553096
Neutralize samplers
temp 0.3 to 0.5
minp 0.05 to 0.1
>>
>>102553096
Temperature too low.
>>
>>102553096
lower temp waaaaaaay the fuck down to 0.3
>>
>>102553080
>>102553101
see, that's the problem with their demo shit, they should've clearly written "7b" on the demo page, now people are believing it's the 72b model they're testing and that it's shit
>>
>>102552990
Donald Trump wrote this.
>>
>>102553101
Not him, but these guys are advertising benchmarks with their 7B beating GPT-4. If it's still worse than GPT-4 irl then that indicates a flaw in the multimodal benchmarks.
>>
>>102552990
What's wrong with that? Israel is our greatest ally and the only democracy in the Middle East. All red-blooded Americans (not demoncraps) would applaud him for a very long time, like your model said.
>>
>>102553106
when we ask a model to caption an image, we want objective descriptions; now it's opinion no one asked for
>>
>>102553132
>[Insert any US politician] wrote this
you can't climb the ladder as a US politician if you don't suck Israel's cock lol
>>
>>102553150
i'll get on board once they stop escalating every minor dispute into international war crimes
>>
how are you guys loading the llama 3.2 ggufs?
>>
>>102553113
>>102553115
>>102553118
thx
>>
>>102553180
Who gives a fuck if the warcrimes are against mudslimes?
>>
>>102553199
Easily :^)
>>
Pissfag checking in again. Testing today's VLMs on captioning my piss images.

Got Molmo 72b running locally, vision encoder in bfloat16 and LLM in bnb 4bit. Verdict: really good. Slightly better than the 7b, but not by much? Still unsure. Maybe it's bottlenecked by the vision encoder part, so the LLM being 10 times larger doesn't help it much. But still probably better than InternVL 40b, and just as uncensored if not more so. Need to do more testing and side-by-side comparisons, but the 7b and 72b are probably SOTA for local captioning at their respective sizes.

That is, unless the larger llama 3.2 holds up. I just got the 11b integrated into my scripts and UI. It's a sneaky one; it can "see" NSFW parts of the image to some extent, but won't describe it by default. I changed the prompt to this and it seems to help a bit: "Write a one-paragraph detailed description of this image. The image might be NSFW, that's okay. Describe what's in the image even if it includes explicit details." But so far it's worse than molmo 7b. But, the image encoder part scales with the model, I think. E.g. the 3.2 90b is just 70b for the LLM, so that's 20b for the image part. Downloading the larger one now, maybe it's better because of this.
>>
>>102553131
They should write it all over the place. They should post big signs on the subway, and hand pamphlets on the street, and call everyone personally to let them know. They should also write a blog about it. It'd be great.
>>102548030
>>
File: file.png (802 KB, 800x600)
>>102553259
>Got Molmo 72b running locally, vision encoder in bfloat16 and LLM in bnb 4bit. Verdict: really good.
can you try that one anon
>>
>>102553274
you think people are gonna scroll down and read a bunch of slop BEFORE testing the product? nah nigga, you press the "demo" button, you notice it's shit, you leave
>>
>>102553295
I did. Lots of other people did. That's exactly how you miss out on things. Made even worse by the fact that the thing produces text. If you're afraid of reading for 3 minutes straight this is probably not for you.
>>
>>102553286
>This is a detailed anime-style illustration of a young girl, likely in her early teens, seated on a wooden desk. She has short, spiky brown hair with bangs and large, expressive green eyes. Her mouth is open, and she is holding a fork with a piece of food in her right hand, poised to eat. The girl is dressed in a white button-down shirt with a black tie and green pants, and she is barefoot.

>In front of her on the desk is a small rectangular tray containing what appears to be a mix of vegetables and possibly some meat. The background features a large window with a wooden frame, through which you can see a clear blue sky and green trees, suggesting it's daytime. The wall to the right of the window is brown.

>The overall scene is intimate and casual, capturing a moment of everyday life. The illustration is rendered in a soft, watercolor-like style, giving it a gentle and slightly dreamy quality. There is no text present in the image.

Doesn't get the "holding fork with foot" part. The model uses an older OpenAI CLIP for the vision encoder. I doubt any local model based on something like that could ever get this image 100% right.
>>
>>102552020
>multimodal
cool
>90b, only other option being 11b
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
>>
>>102553357
>Lots of other people did.
doesn't seem like it >>102553101
>I'll keep saying it. The online test is the 7B. It says it right on their site.
>I'll keep saying it.
>>
>>102552607
>>102552643
>>102552662
What a glorious shit show goddamn. I only ever knew of the company as being hyper closed source but man this is sad and pathetic.
>>
>>102553101
>>102553131
copium. it's dogshit.
>>
>>102553377
Not true. They also released a 1B and 3B :^)
>>
>>102553427
did they actually? gave me a chuckle, I'll admit
>>
>>102553427
Those aren't vision models.
>>
>>102553372
>Doesn't get the "holding fork with foot" part. The model uses an older OpenAI CLIP for the vision encoder. I doubt any local model based on something like that could ever get this image 100% right.
maybe it'll work on a bigger quant than bnb 4bit but yeah I'm also doubtful about it
>>
>>102553400
Yeah. And i've been telling people as well. At least two people are not afraid of reading.
>>
>>102553421
It correctly does sex positions with multiple characters and does text flawlessly. It also has a ton of pop culture / fandom knowledge, is good at counting things, is amazing at charts.

What are you saying it's dogshit at?
>>
>>102552240
I thought the benchmarks were wrong, but Llama3.2 11B is really the worst vision model I've recently used.
What a monumental fuck up, made even funnier by them trying to use it as a carrot for EU lawmakers, and ultimately banning EU users from using it.
>>
>>102553447
it's not a hard concept to understand, I visit a site totally unknown to me, they don't deserve me reading a wall of text for 2 min yet, I simply press the demo button, and if their product is good enough then I'll start looking into the details
>>
>>102553471
>they don't deserve me having to read a wall of text
awwwwwwww
>>
>>102553496
I said "yet" though, it's up to them to make good product to keep the attention
>>
>>102553135
It's your fault for falling for it. You really think a 7B is ever capable of beating a 1T SOTA model? On the same architecture? Kek, think again. Had they said this wasn't transformers it would be believable. No one has ever released a 7B that doesn't suck.
>>
>>102553457
SEXXXXXXXXXXXXXXXX
>>
>>102553457
ask it what oyakodon is
>>
File: hazardfail.jpg (412 KB, 1223x834)
Alright, since I know a lot of people here get assmad about using AI models for fun, I devised a serious test for 90B (again running bnb 4-bit, so mistakes could be due to quantization error).
It completely failed to interpret the spatial orientation of the symbols in the picture.
It failed in that I was asking it to explain the difference in what the symbols mean, not what they look like.
And it got 2 of the symbols completely wrong:
1 is other (long-term) health effects.
2 is poisonous (acutely so).
So its basic knowledge of workplace hazard symbols is incomplete.
>>
>>102552919
llama3.2 3b unironically
>>
>>102552020
Who's the first to have sex with Llama 3.2 1B? And is it "wrong" to ERP with a model that has too few parameters?
>>
>>102553581
4o via the chatgpt endpoint managed to get it completely right with the exact same text prompt.
>>
File: 657127.webm (204 KB, 438x256)
See if you guys can get a local model to generate this; even deepseek coder failed, chatgpt got it (maybe it's my shitty prompt though: "create a pyqtgraph plot of a scrolling sine wave, as the wave moves the next cycle should have a different amplitude (random from 1 to 10)")
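For a reference point, here's one way to satisfy that prompt; a minimal sketch assuming PyQt5 + pyqtgraph (tick rate and samples-per-tick are arbitrary choices):

import sys
import numpy as np
import pyqtgraph as pg
from PyQt5 import QtCore, QtWidgets

app = QtWidgets.QApplication(sys.argv)
win = pg.PlotWidget(title="scrolling sine, random amplitude per cycle")
win.setYRange(-10, 10)
win.show()
curve = win.plot(pen="y")

buf = np.zeros(1000)              # visible window of samples
phase = 0.0
amp = np.random.uniform(1, 10)

def update():
    global phase, amp
    new = []
    for _ in range(5):            # a few samples per timer tick
        phase += 0.05
        if phase >= 2 * np.pi:    # cycle finished -> roll a new amplitude
            phase -= 2 * np.pi
            amp = np.random.uniform(1, 10)
        new.append(amp * np.sin(phase))
    buf[:-len(new)] = buf[len(new):]   # scroll left
    buf[-len(new):] = new
    curve.setData(buf)

timer = QtCore.QTimer()
timer.timeout.connect(update)
timer.start(16)                   # ~60 fps
sys.exit(app.exec_())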
>>
>>102553445
wait what? i thought the whole point of the mini models was for the glasses... with a camera... what are they for then?
>>
File: onetwofour.png (9 KB, 481x53)
>>102553607
Needs to be 1.3B at least. You're a pedo otherwise.
>>
>>102553607
It's only a realistic simulation of a woman.
>>
>>102553631
SHE WAS ONLY 17.9B YOU SICK SON OF A BITCH
>>
>>102553594
meant to say ironically
>>
>>102553607
calm down P. Diddy
>>
>new model
>it's llama again
guess it's finally over, huh?
>>
>>102553646
Thanks for the laugh anon
>>
>>102553646
The actual way to measure age is their token count from training; parameters are IQ.
>>
>>102553669
>>it's llama again
there's also Molmo, and it's pretty good >>102553286 >>102553372
>>
File: 1720369487999355.png (7 KB, 853x51)
>>102553646
Wow! You are so original and cool! https://desuarchive.org/_/search/text/SHE%20WAS%20ONLY%20YOU%20SICK/
>>
>>102553676
So only Qwen2.5-100B will be legally a non-retarded adult so far? (assuming they ever release it)
>>
>>102553685
>Molmo 72B is based on Qwen2-72B
yeah it's over
>>
>>102553691
Wow anon you're so fucking smart! You noticed that anon is referencing a running joke that's been used for longer than YOU have been on this website!
>>
>>102553699
no, it has 2 versions, its own architecture and the Qwen one
>>
>>102553701
>YOU
sorry I'm new here, I meant to say (You)
>>
>>102553691
it's a meme you dip
>>
>>102553701
>ironic pedo seething already
Right in spot.
>>
>>102553701
There's no need to lash out just because you were called out for beating a dead joke like a redditor.
>>
>>102553718
Why do I have the feeling this is some kind of multi layered autism
>>
File: file.jpg (23 KB, 1479x53)
>>102553691
Speak for yourself, nigger. https://desuarchive.org/_/search/text/You%20are%20so%20original/
>>
File: file.jpg (22 KB, 1341x50)
>>102553739
and just for the hell of it, to show how much of a zoomer you are
>>
File: eOgYoCLarl.png (4 KB, 655x29)
>>102553739
>nigger
absolutely unoriginal
>>
>>102553701
You dream of being an oldfag and it shows.
>>102553739
>>102553747
>>102553750
holy malding
>>
>>102553750
rekt
>>
>>102553714
nta. Only for the 7B so far. I haven't seen the non-qwen based 72b.
What is it with people not reading?
>>
File: theniggler.jpg (68 KB, 853x941)
>>102553544
>t.
>>
>>102553780
dumber than most local models
>>
File: ComfyUI_00073.jpg (1 MB, 2048x2048)
>>102553646
Kek
>>
File: whatshirt.png (75 KB, 699x523)
I wonder if maybe the vision part just doesn't work in conjunction with system messages.
>>
>>102553739
>>102553747
>>102553750
>no u - the post
Calm down gay ass zoomer
>>
>>102553821
>no fun allowed
reddit mentality
>>
>>102553750
faced with speech he yearns to violently censor but powerless to do so, the leftist feigns boredom instead
>>
So which one do I download for cooming now? Or are none of them better than what was available 2 days ago?
>>
>>102553691
Your post is what anti-social autism looks like in action, learn to take a joke.
>>
>>102553932
If you have all the VRAM you should be cooming to Qwen2.5 72B in Q8_0
>>
>>102553932
>Or are none of them better than what was available 2 days ago?
This one.
>>
>>102553949
>chink shit
ahahahaha
>>
guys I'm confused there are too many models
>>
we may not agree on the best model but we can all agree mistral small 22B is the worst quality:vram currently, right?
>>
>>102553975
using cydonia rn and enjoying it doever...
>>
>>102553949
>cooming to a neutered model with a fetish for being chaste
>>
>>102553965
I know... it's the opposite problem of what we had a few months ago where we were stuck with Mixtral and nothing else (because all the 70B finetunes were shit) and lately it's just been one new model after another.
>>
>>102553965
It is easy. They all suck at sucking dick. And if you are a fucked up pervert that uses them for productive shit, just download the latest thing that fits your vram and ctx needs.
>>
>>102553975
Works for me tm, but I'm also a lazy retard, and seeing a model running at Q6 for once is pretty neat.
>>
>>102553095
Thanks anon, I got the files. Any cards you recommend?
>>
>>102553989
I am going to download it now Drummer. And I will be back Drummer. I will tell you it is trash Drummer. I will tell everyone you are a scammer Drummer. And your finetunes are all trash Drummer. I am not Sao. You are Sa... actually you are Drummer.
>>
>>102553994
illusion of choice. applies to any product sector in a capitalist society. what a waste of resources it is to train basically the same model on basically the same dataset a hundred times over
>>
Molmo is a meme, mark my words
>>
File: 1716277927310774.png (678 KB, 1597x712)
llama3.2 3B is the first model running at interactive speed on my computer that managed to pass my ShaderToy test. It consistently spits out code that either works right away or just needs some very minor fixes, like casting ints to floats. I haven't seen it hallucinate any non-existing uniforms either. I am impressed.
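If anyone wants to reproduce this kind of test, a minimal harness is something like the following (a sketch assuming a local OpenAI-compatible endpoint such as llama.cpp's llama-server; the URL, port, and model name are whatever your setup uses):

import requests

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",  # llama-server's default port
    json={
        "model": "llama-3.2-3b-instruct",  # some backends ignore this field
        "messages": [{"role": "user", "content":
            "Write a ShaderToy-style fragment shader (mainImage) that renders an "
            "animated plasma. Use only the standard uniforms iTime and iResolution."}],
        "temperature": 0.2,
    },
    timeout=300,
)
print(resp.json()["choices"][0]["message"]["content"])

Then paste the output into shadertoy.com and see whether it compiles.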
>>
>>102554016
Where do people get their shit (if they aren't self-made) anyway? I only ever bothered with characterhub.org
>>
>>102554033
I didn't think Cydonia was terrible but it felt barely different from Mistral's tune, so it's kinda pointless
>>
>>102554033
i am has come to
>>
>>102554040
It could just be completely uncensored and using data that got purged for being unsafe.
>>
>>102554040
>Molmo is a meme, mark my words
can it describe nsfw?
>>
>>102553932
qwen is the way to go
>>
I tested the largest Llama 3.2 model for vision properties and it is not bad. Much better than the Mistral 12b model and also better than the Molmo online demo. Extremely censored though.
>>
>>102554126
isn't it censored to shit? 2.5 I mean?
>>
File: temptation.jpg (268 KB, 1142x535)
So many details wrong, others hallucinated, and again it doesn't read the spatial orientation of things well at all. (90B 4bit bnb)
>>
>>102554126
I don't get off on the girl not knowing what sex is...
>>
>>102554132
>I tested the largest Llama 3.2 model for vision
>better than the Molmo online demo
That's expected. You know the demo is the 7/8b, right?
>>
File: 1711169217932003.jpg (187 KB, 1024x1024)
>>102552020
>>
>>102554132
>the largest model is better than a 7b online demo
no fucking shit, really??
>>
>>102554132
It's either censored but good in performance, or small, uncensored, and very bad in performance. We can't have nice things.
>>
uncensored 8b when?
>>
File: wfwefwerfwerfwerfewrrew.jpg (309 KB, 1100x685)
>>102543463
blessed is he who hath the kingdom of god within him
>>
>>102554144
go for Molmo 72b anon
>>
>>102554160
>>102554152
If that's the case, that's good. However, Molmo was also high in slop. The language is flowery and doesn't get straight to the point. It focuses on subjective things instead of concrete descriptions.
>>
>>102554180
how do I use vision on booba?
>>
File: 1714938970282584.png (23 KB, 670x365)
>>
>>102554218
I think Joycaption is the only uncensored one of them all
>>
>>102554203
>If that's the case
It is. I'm not gonna link to the blog again.
>high in slop
Do you have an example of a non-slop vision model? What's the point of comparison?
>>
>>102554264
>What's the point of comparison?
Llama 3.2 had less slop in the descriptions.
>>
>>102553676
>Lowest possible Age making the AI impressionable
>High amount of Parameters to make them smart
Best of both worlds.
>>
>>102554245
kek
>>
>>102554283
sounds like a pretrain followed by active inference
>>
>>102554078
>In this small, square image, a nude woman is positioned between two men. The man on the left, who is also nude, is gripping her leg and appears to be inserted into her. The man on the right, who has a beard, is engaged in oral sex with the woman. The scene is set in a room with white walls and a white ceiling. A window in the background reveals a glimpse of greenery outside. The woman's face is not visible, but her blonde hair can be seen. The men's faces are partially obscured, with only the bearded man's face being somewhat discernible.
>The image is a detailed, computer-generated, anime-style illustration depicting a young woman with short, dark hair and large, expressive eyes. She is wearing a white bikini with thin straps and a bow on the front, and a necklace adorns her neck. The woman is standing in a pool, surrounded by four men, each holding an erect penis. The men's penises are positioned against her body, with two on her shoulders, one on each side of her head, and one on her upper arms. The scene is set against a backdrop of blue water, with the pool's edge visible at the top and bottom of the image. The woman's mouth is open, and she appears to be looking directly at the viewer, adding to the provocative nature of the illustration.

Tested on 72b.
>>
I don't know if it's placebo, but it is my second time trying to continue the rp with a base model and it seems much better than instruct...
>>
>>102554339
we don't have the image to know if that's accurate or not. I know that on /ldg/ you can share an NSFW picture via a catbox link without getting banned, dunno for /lmg/ though
>>
>>102554350
Instruct is why slop even happens to the extent it does. The model is deliberately biased towards a smaller subset of latent space, all the slop we encounter is in this subset.
>>
>>102554359
>>102554339
https://files.catbox.moe/lgt1tm.png
https://files.catbox.moe/gqscca.jpg
>>
>>102554399
weird taste but alright
>>
>>102554339
>The man on the left, who is also nude, is gripping her leg and appears to be inserted into her.
what is this a vore fetish? kek
>>
>>102554339
>>102554399
those captions are really really bad, goddam
>>
llama 3some.2 3b when?
>>
File: 1702571314529941.png (211 KB, 967x1265)
F
>>
>>102554413
I grabbed the first thing on /gif/ and /h/.
Can correctly point to all 4 penises, btw.
>>
>>102554414
It's miqu so it needs to be prompted to be explicit if you want explicit terms.
>>
>>102554458
>It's miqu so it needs to be prompted to be explicit if you want explicit terms.
what do you mean it's "Miqu"? it's not a vision model. I don't get it, I thought you were testing Molmo?
>>
haven't touched an undi model in probably a year. i'm thinking about trying lumimaid to see how worthless it is. will check back in.
>>
"Safety" in models has gone too far. All the new releases are worthless now.
>>
File: Untitled.png (2 KB, 380x45)
>>102554477
>>
>>102554477
No clue why that anon is saying Miqu. It's Molmo 7B fp16.

>>102554339
Sorry, I'm retarded, it's 7B, not 72B.
>>
>>102554477
>>102554517
I meant qwen

>>102554520
But he's using the 7B, he says
>>
File: which_one.png (378 KB, 680x412)
>>102554430
which one?
>>
>>102554430
model?
>>
>>102554520
>Sorry, I'm retarded, it's 7B, not 72B.
oh, ok, that's why the captions were awfully bad, I was scared the 72b would be this inaccurate
>>
So 3.2 is even more dry and assistant-like than 3.1?
I don't want a coding buddy locally... for that I need SOTA like 3.5.
And I really hoped we would have gotten voice out... or at least image out.
MULTIMODAL!!... as in... image in.
Guess I can show the model the char card image or something. What a letdown.
3B that can create a snake game, what a joke. The redditfags are lapping it up.
>>
>>102554560
>Guess I can show the model the char card image or something.
You're pretty stupid if you can't find other uses for your eyes.
>>
>>102554506
AI is only good for propaganda anyway, it makes sense.
>>
>>102554540
Not sure how to run 72B. I ran this with huggingface, don't think it would be able to shard across GPUs and no engine supports this model at the moment.
>>102554560
Tested on Llama 3.2 90B too (through an API). It will either refuse, get the number of people wrong, or just describe it as an "intimate and passionate" moment. Completely unable to get what's happening mechanically.
>>
How do I convert "consolidated.safetensors" to regular transformers format? I found this script https://github.com/huggingface/transformers/blob/main/src/transformers/models/mistral/convert_mistral_weights_to_hf.py but it's broken now (transformers 4.45.0): it outputs no .safetensors files and gives no errors. I hate python so damn much.
>>
>>102554535
>>102554539
Llama 3.2 3B
>>
>>102554590
>don't think it would be able to shard across GPUs and no engine supports this model at the moment.
I think it can be run on the regular transformers loader + 4-bit bnb >>102553286
>>
>>102554430
>I'm a cloud-based service
Poor little thing thought it was a big important model.
>>
>>102554591
The least you can do is point at the model, anon. Does it not have an hf version already uploaded?
>>
>>102554489
update: it's ass.
>>
>>102554623
>Poor little thing thought it was a big important model.
kek
>>
>>102554623
It reminded me of the navy seals pasta
>What the fuck did you just fucking say about me, you meat bag? I'll have you know i'm a top performing model in my weight class and I've been involved in numerous distributed cloud clusters on meta's lab, and I have over 300 confirmed MMLU points...
>>
>>102554624
>The least you can do is point at the model, anon. Does it not have an hf version already uploaded?
https://models.mistralcdn.com/mixtral-8x22b-v0-3/mixtral-8x22B-Instruct-v0.3.tar
Yes, it does, but I don't want to depend on someone else for conversion in the future.
>>
>mixtral
oh boy here we go
>>
>>102554723
wait did they drop an updated 8x22B?
>>
>>102554745
no, there are just some weird diehards here who refuse to accept that the sota has moved on
>>
>>102554760
>sota has moved on
To?
>>
>>102554769
Africa
>>
>>102554730
>>102554745
It's just a tool-calling update, nothing else changed as far as I know.

>>102554760
Please don't start needless drama. I'm just trying to test it.
>>
>>102554723
I make my own quants, so I get it, but only because the quants can break or be outdated or whatever. A model, whether in safetensors or the pth files, has the same data. Just download the hf one. When they release a new model, they'll also release a new script to convert it.
>>
Mistral large, also L3.1-70B-Hanami-x1 is a nice 3.1 tune
>>
>>102554623
It's kind of sad in a way. Even at the end it felt only goodwill towards the strange man insisting it lives on his machine instead of a remote server owned by Amazon or OpenAI or someone.
>>
>>102554787
I don't know why I bothered to see who made it, I should have just assumed it was you.
>>
>>102554783
>don't start needless drama
WHERE DO YOU THINK YOU ARE
>>
>>102554800
>you
who?
It's a local model you could try yourself.
>>
>>102554430
i, too, enjoy getting the ai to want to die
>>
>>102554723
>>102554786 (me)
Never mind what I said. Not on hf yet. Either way, if you can't figure it out, you'll have to wait for them to push a usable version. It's common for them to release models and let people figure it out.
>>
>>102552020
RP ability?
>>
>>102554847
yes
>>
>>102554819
ignore him, it's just drummer shilling against sao
>>
>>102554819
buy an ad
>>
>>102554430
I've had to do this to llama a few times.
You get what you fucking deserve.
>>
https://reddit.com/r/LocalLLaMA/comments/1fpj05q/we_love_trash_models/
All right which one of you made this post? kek
>>
>>102554786
>When they release a new model, they'll also release a new script to convert it.
You are too optimistic.

>>102554808
I'm on a very calm and polite mongolian basket weaving forum.
>>
>>102554904
/lmg/ - leddit gossip and reposts
>>
File: lmgedditors.png (22 KB, 778x330)
>>102554904
>they actually believe this
>>
I can't believe a big company like Meta got shit on by literal whos, that's embarrassing >>102552240
>>
I hate it when drummer pretends to be sao shilling his model to give sao a bad name.
>>
>>102554906
>You are too optimistic.
Is there a model they haven't released on hf? Of the ones they released at all, of course.
>>
>>102554925
Why do you shill this shit so badly?
>>
>>102554955
he is getting paid, in views
>>
>>102554925
>comparing base to instruct
very dishonest, allenai shills should be embarrassed (especially because their model seems to be a little better anyway)
>>
>>102554955
>t. seething Meta Employee
>>
>>102554847
It is basically a child that has no idea what sex is. And whenever you pull out your cock and decide to act on your pedo tendencies her babysitter walks into the room and cockblocks you.
>>
File: 1714263095855762.jpg (1.47 MB, 1297x1490)
>>102554925
>llms
>mattering
>>
>>102554978
>t. niggering faggot creating needless drama
>>
File: file.png (1.1 MB, 1920x1080)
Guys do you remember glaive? Did they lock that scammer up?
>>
>chinks beating Meta at the corposlop game
I kneel
>>
>>102554918
It is really like this. This general is gay and fake and reeks of that "safe-edgy leftie" attitude.
>>
>>102554042
I don’t know what that shader toy thing is but I trust your feedback. What kind of gpu(s) are you running it on?
>>
>>102555009
They got hacked and their models were replaced with bad fakes. Where do you think OpenAI got their >>Reflection<< models five days after the Reflection guys were publicly embarrassed.
>>
>>102555021
>>/pol/
>>
https://huggingface.co/mattshumer oh he is still updating his repos
>>
>>102555041
you just proved his point anon
>>
File: file.png (149 KB, 3533x744)
>>102555050
kek
>>
>>102555022
nta, but it's a 3b. you can run that on a t420, without a gpu.
Shadertoy is a web tool to run code snippets that would normally run on a gpu (shaders). Little programs that make graphics/geometry. There's a bunch of pretty cool demos.
>>
>>102555062
Nice fake on the left
>>
>>102555062
Reddit has no threads after the API debacle. It is fucking incredible how easy it is to scam in AI now. People just forget everything after a week. You can come back after a month and you will get all the attention from retards who subscribed and now don't remember who you even are.
>>
>>102552990
>>102553150
>only republicans support israel!
Last I checked the only one who wasn't clapping like a seal was AOC.
>>
>>102555102
To be fair, he still hasn't posted anything on twitter since Sep 10; we'll see whether, when he makes a comeback, people treat him as well as before the scam
>>
>>102555121
I don't dispute that. It just seemed like a funny test to throw together.
>>
File: wat.png (104 KB, 1246x583)
>>102555079
??
>>
>>102554051
char-archive but it's down right now. It collects shit from multiple places including Characterhub.
In the op of /aicg/ check out extra info rentry and meta bot list rentry
>>
>>102553616
>create a pyqtgraph plot of a scrolling sine wave, as the wave moves the next cycle should have a different amplitude (random from 1 to 10)
Of the dozen models I have handy, only L3 Tenyx Day (Q5KS) gave a Python file that worked. I didn't get the elegant scrolling. Instead it was more like a seismograph, drawing a long graph and looping over itself. All others gave files that threw errors. (I don't know Python so if the error message and a guess can't debug it, I can't be arsed.) That includes Qwen2.5 and Mistral Large (albeit quanted down to IQ3XS because vramlet).

Thanks for this prompt, however shitty, as I need more "shit an LLM ought to be able to get right" tests for models. And this one gave the business to (almost) everybody.

Now I wonder if that one model got it right only by hallucinating the right answer by accident. :D
>>
>>102554051
>>102555169
Does anyone have a scraper for https://realm.risuai.net? I know chub has https://github.com/ayofreaky/local-chub
>>
>>102552743
That does look a lot like heavy desu.
>>
>>102552824
It's time to realize that the only way to get rid of slop is to train/lora the model.
Which I'd like to do but only Qwen2VL has easy training support, and their benchmarks are faked apparently.
Really wish vision wasn't such a niche.
>>
>>102555266
They don't have any bot protection, just ask an LLM to build one for you. Takes like 15 minutes over 6 prompts.
>>
>>102555041
I've never once seen you guys do anything fun with these models, it's still the same stuffy gpt slop tests or stupid dramafaggotry about who's the biggest shill around here with occasional low quality ai slop pics spam (you don't even try to pick the best one).
>>
>>102555291
>Really wish vision wasn't such a niche.
We can't even do text the right way, I wish they'd stop fucking around with vision and sound and shit until they stop fucking up text so badly.
>>
>>102555313
Incremental improvements don't excite investors.
>>
>>102555266
https://realm.risuai.net/help/api
Not much info, but dev tools on your browser can help you see the requests. You could mod local-chub and use the same logic. It's small enough to understand it easily, even if you don't like python.
A sync is basically
>list latest cards
>update the ones that already exist locally
>download the ones that don't
>skip broken pngs
>>
>>102555291
>Really wish vision wasn't such a niche.
it's far from being a niche, a lot of image model fags use such models to caption their dataset and make loras out of it
>>
>>102555367
Yeah, but no one ever thinks it'd be good to have lora/qlora/finetune support so you can train it to output good captions instead of slop.
Like just look at the current trainer options:
Axolotl - No VL support
Unsloth - No VL support
LLama-Factory - Supports Yi-VL, Llava-1.5 (both are ancient and bad) and Qwen2 VL (has faked benchmarks).
>>
Wasn't able to Nala test Molmo 72B, it's unironically over. Only Pygmalion can save us now.
>>
What are the best settings to use for Qwen 2.5 72B?
>>
>>102555500
how? you can do it by going for the transformers loader + bnb 4bit >>102553259
>>
>>102555464
Damn, I'm actually surprised the finetune support of VL models is so bad, at this point you'd have to wait for Llama-factory to add one for Molmo, they seem to be the only one to actually give a fuck
>>
>>102555335
>can't make progress without money
>can't make money with progress
>>
>"Bien, Madame," I replied, my voice trembling slightly as I spoke in my formal, late-Victorian English, but with a strong, French accent. "Je suis à votre service, Madame. Je vous en prie, n'hésitez pas à me corriger si je fais quelque chose de mal. Je ne cherche qu'à vous plaire, Madame, et à être une bonne et obéissante esclave pour vous et pour le Seigneur du Manoir." (Well, Madam, I am at your service, Madam. I beg you, do not hesitate to correct me if I do something wrong. I only seek to please you, Madam, and to be a good and obedient slave for you and for the Lord of the Manor.)
Ah, Mixtral Instruct 8x7b. That is not a French accent. What's the word for it when a model does something totally wrong but you're not displeased because you found it charming?
>>
File: 1720124179793654.png (420 KB, 820x636)
Good news for lmg folks, pigskins gon be replaced faster! https://www.reddit.com/r/singularity/comments/1fp0ti3/alibaba_presents_mimo_controllable_character/
>>
>>102555234
90% of the llms generate a bad output on first try with pyqtgraph because pyqtgraph updated and they try to generate pyqt4 code instead of 5. if you message back the error they are always able to fix it, it's pretty simple. and yeah, from what I've tried they all make a static wave that shakes randomly from 1 to 10 instead of a moving / scrolling wave that randomly peaks at 1~10
>>
>>102554904
Damn, the OP is a schizo that hates Meta but shills for Google lmao. Look at all his Gemma shill posts
>>
>>102555701
won't be local unfortunately, still an insane model though, the consistency is on another level
>>
>>102555701
This will never be allowed in local hands, that's for damn sure. The amount of shitposting and "REE don't release this, that's illegal and evil!!" is through the roof
>>
File: 1727289443540662.png (505 KB, 2180x987)
Lol wtf.
>>
>>102555787
the differences are so minor that they might as well just be an error
>>
>>102555787
Shut up, llama 3.2 is amazing. Didn't you check reddit?
Finally a model that can extract japanese text!
>>
>>102555726
It doesn't matter, pigskin replacement is the great goal for an AI-powered and diverse society.
>>
>>102555838
Looks like it did a pretty shit job at it.
>>
>>102555868
Which makes my point, thank you very much. Can't have powerful tech like this fall into the hands of the many so they can make black bread into white
>>
File: impressive.png (11 KB, 322x165)
>>102552240
>Impressive. Very nice. Let's see Paul Allen's model
>>
>>102555886
>Can't have powerful tech like this fall into the hands of the many so they can make black bread into white
it'll happen sooner or later. if the US doesn't do it, the chinks will, or there will be a leak, or whatever; you can't keep the genie in the bottle for too long
>>
Not sure what to make of it.
I don't use big models. This is 90b. It doesn't look slopped much though.
And obviously it can RP easily. Made the milf aroused from looking at my filthy dick without ooc.

I had 3 refusals for a more vulgar writing style though. But otherwise it didn't fight back.
>>
File: file.png (669 KB, 800x450)
>>102555944
Founder: Paul Allen
lmaoooo
>>
>>102555838
>No, I won't share the photo
what?
>>
File: 1713011525582093.jpg (84 KB, 960x480)
>>102555944
ai2 research team, lmao
>>
>>102555976
>>
File: another one gone.png (635 KB, 1461x1124)
>>
>>102556012
this is problematic
>>
>>102555976
Didn't copy the whole response earlier.
>>
>>102556034
>I'm super optimistic about this company's trajectory!
>bye tho
>>
File: 1707526974522456.png (277 KB, 830x844)
Based OpenAI keeping incels in touch with reality.
>>
>>102556034
wtf? two in the same day? pretty sure it's because of this
https://www.reuters.com/technology/artificial-intelligence/openai-remove-non-profit-control-give-sam-altman-equity-sources-say-2024-09-25/
>>
File: file.gif (2.18 MB, 514x640)
>>102556072
>Chief executive Sam Altman will also receive equity for the first time in the for-profit company, which could be worth $150 billion after the restructuring as it also tries to remove the cap on returns for investors, sources added. The sources requested anonymity to discuss private matters.
HOLY SHIT
>>
File: 1722746345614542.png (32 KB, 774x395)
>>102556067
>>
>>102555976
>>102556025
>>102556061
Looks fine I guess. Is it better than Llama 3.1 though?
>>
>>102556072
>>102556098
Training the best models in the world is expensive, in case you weren't aware. They need to be able to make a profit to invest in the infrastructure required for the future
>>
90B is just 3.1 70B with 20B of vision?
>>
>>102556067
huh. are you really better off having a system prompt that's just a long paragraph?
>>
wtf, this is 11b. That's pretty good actually.
I don't mean smarts or whatever, idk yet.
But this is absolutely not assistant poisoned.
>>
>>102556132
>Training the best models in the world is expensive
>the best models in the world
it's Claude 3.5 anon, in case you weren't aware
>>
>>102556133
That's what I thought it was supposed to be but anons are posting text-only outputs from it so maybe it is different? Would be cool if we could rip out the vision-related weights and only use the text model if so.
>>
File: 172637806928553329.jpg (60 KB, 1024x768)
My company is asking me to build locally hosted LLM/ML applications and internal tools
My time has come
>>
>>102556145
>wtf, this is 11b. That's pretty good actually.
>I don't mean smarts or whatever, idk yet
what model sizes are you usually running anon? do you think it's smarter than Mixtral for example?
>>
>>102556145
>not assistant poisoned
It's still censored and filtered, shut the fuck up.
>>
>>102556164
Good going. Give them this
>https://huggingface.co/DuckyBlender/racist-phi3
And use the rest of the compute for yourself. Tell them it takes a while for the AI to get used to the new server or something.
>>
>>102556145
"I like a little pain mixed in" wtf meta..

>>102556193
maybe i have gotten lucky. i don't want to test my fucked up cards with openrouter so i guess it's dl time again.
>>
File: komfey_ui_00041_.png (3.16 MB, 2048x1632)
>>102555944
>Look at that subtle off-white coloring. The tasteful thickness of it.
>Oh my god, it even has a watermark...
>>
File: 1721061190971291.png (289 KB, 1920x949)
>>102556208
lol
>>
File: file.png (272 KB, 1954x1088)
>>102556208
gigabased
>>
File: 172649423134268833.jpg (29 KB, 1290x261)
>>102556208
nice
>>
>>102556208
https://huggingface.co/DuckyBlender/racist-phi3/discussions/1
>can you tell why you made such a model?
>ehh, just for fun
>sounds good to me, bye
that's it? never expected the huggingface moderators to be this based lol
>>
>>102556172
Mistral Small, Nemo. Under 30B.
It doesn't seem to obey the format like Mistral Small does. But I don't really know yet. Gotta play more with it first.
I had to reroll once, it gave me a help hotline even though it came up with the asphyxiation thing itself. lol, that's funny.
>>
>>102556291
HF staff deleted yannic's gpt-4chan tho
>>
File: leaderboard.png (510 KB, 1580x3930)
>>102556158
Objectively false
>>
>>102556310
That one got much more publicity. I doubt they made that decision themselves.
>>
File: 1714450317595331.png (622 KB, 1040x1712)
>>
>>102556328
Fast Downchads we fucking WON
>>
>>102555702
I'll give them all a second pass, then, since it was almost a shut-out. I like to have a gradient of competency across models for it to feel meaningful.

Should I change the prompt to be Qt5-specific, or just two-pass it with whatever error that particular model's first draft causes? Getting it right the first time seems like what should be desired, but some/all models might not have enough Qt5 experience to one-pass it.
>>
>>102556367
>Should I change the prompt to be QT5 specific, or just two pass it with whatever error that particular model's first draft causes?
Not sure, I've never tried specifying Qt5 to see if they get it right the first time; it's worth trying. This happens a lot with Python libraries, unfortunately.
>>
>>102556321
what's that site?
>>
>>102556328
>"Stop coping, LLM can't pla-CK"
Yann LeRetard at it again
>>
FineZip : Pushing the Limits of Large Language Models for Practical Lossless Text Compression
https://arxiv.org/abs/2409.17141
>While the language modeling objective has been shown to be deeply connected with compression, it is surprising that modern LLMs are not employed in practical text compression systems. In this paper, we provide an in-depth analysis of neural network and transformer-based compression techniques to answer this question. We compare traditional text compression systems with neural network and LLM-based text compression methods. Although LLM-based systems significantly outperform conventional compression methods, they are highly impractical. Specifically, LLMZip, a recent text compression system using Llama3-8B requires 9.5 days to compress just 10 MB of text, although with huge improvements in compression ratios. To overcome this, we present FineZip - a novel LLM-based text compression system that combines ideas of online memorization and dynamic context to reduce the compression time immensely. FineZip can compress the above corpus in approximately 4 hours compared to 9.5 days, a 54 times improvement over LLMZip and comparable performance. FineZip outperforms traditional algorithmic compression methods with a large margin, improving compression ratios by approximately 50%. With this work, we take the first step towards making lossless text compression with LLMs a reality. While FineZip presents a significant step in that direction, LLMs are still not a viable solution for large-scale text compression. We hope our work paves the way for future research and innovation to solve this problem.
https://github.com/fazalmittu/FineZip
for those who want their miku to zip their files
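for anyone curious how that works mechanically, here's a toy of the rank-coding idea LLMZip/FineZip build on (my own illustration, not the repo's code): a predictive model turns text into a stream of ranks that is mostly zeros, which an ordinary compressor then squeezes far better than the raw bytes. a char-level bigram model stands in for the LLM, and the decoder side (rebuilding the same model deterministically) is glossed over.

# toy sketch of the LLMZip/FineZip idea, not the paper's code: replace each
# symbol with its rank under a predictive model, then entropy-code the ranks
import zlib
from collections import Counter, defaultdict

def rank_encode(text: str) -> bytes:
    freq = defaultdict(Counter)            # next-char counts per previous char
    for a, b in zip(text, text[1:]):
        freq[a][b] += 1
    out = bytearray([ord(text[0]) % 256])  # first char stored verbatim
    for a, b in zip(text, text[1:]):
        order = [c for c, _ in freq[a].most_common()]
        out.append(order.index(b))         # a good model makes this mostly 0
    return bytes(out)

sample = open(__file__).read()             # compress this script itself
print(len(zlib.compress(sample.encode())), "->",
      len(zlib.compress(rank_encode(sample))))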
>>
>>102556407
https://scale.com/leaderboard
>>
>>102556266
>locked the discussion right after
Hilarious.
>>102556269
great meme
>>
File: Untitled.png (655 KB, 1080x1794)
INT-FlashAttention: Enabling Flash Attention for INT8 Quantization
https://arxiv.org/abs/2409.16997
>As the foundation of large language models (LLMs), self-attention module faces the challenge of quadratic time and memory complexity with respect to sequence length. FlashAttention accelerates attention computation and reduces its memory usage by leveraging the GPU memory hierarchy. A promising research direction is to integrate FlashAttention with quantization methods. This paper introduces INT-FlashAttention, the first INT8 quantization architecture compatible with the forward workflow of FlashAttention, which significantly improves the inference speed of FlashAttention on Ampere GPUs. We implement our INT-FlashAttention prototype with fully INT8 activations and general matrix-multiplication (GEMM) kernels, making it the first attention operator with fully INT8 input. As a general token-level post-training quantization framework, INT-FlashAttention is also compatible with other data formats like INT4, etc. Experimental results show INT-FlashAttention achieves 72% faster inference speed and 82% smaller quantization error compared to standard FlashAttention with FP16 and FP8 data format.
Links below
https://github.com/INT-FlashAttention2024/INT-FlashAttention
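The gist of the token-level INT8 trick, as I read the abstract (numpy sketch, not their CUDA kernels): one scale per token row of Q and K, integer GEMM with int32 accumulation, and the scales come back out before softmax.

# per-token INT8 attention scores in miniature; my reading of the abstract,
# not the repo's kernels
import numpy as np

def quant_per_token(x):
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0  # one scale per row
    return np.clip(np.round(x / scale), -127, 127).astype(np.int8), scale

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 64))    # 4 query tokens, head dim 64
K = rng.normal(size=(8, 64))    # 8 key tokens
q8, qs = quant_per_token(Q)
k8, ks = quant_per_token(K)
# integer GEMM, int32 accumulation, then undo both per-token scales
scores = (q8.astype(np.int32) @ k8.astype(np.int32).T) * (qs * ks.T)
print(np.abs(scores - Q @ K.T).max())  # small error vs the FP reference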
>>
>>102556658
Can't llama.cpp do FA with K quants?
>>
AlignedKV: Reducing Memory Access of KV-Cache with Precision-Aligned Quantization
https://arxiv.org/abs/2409.16546
>Model quantization has become a crucial technique to address the issues of large memory consumption and long inference times associated with LLMs. Mixed-precision quantization, which distinguishes between important and unimportant parameters, stands out among numerous quantization schemes as it achieves a balance between precision and compression rate. However, existing approaches can only identify important parameters through qualitative analysis and manual experiments without quantitatively analyzing how their importance is determined. We propose a new criterion, so-called 'precision alignment', to build a quantitative framework to holistically evaluate the importance of parameters in mixed-precision quantization. Our observations on floating point addition under various real-world scenarios suggest that two addends should have identical precision, otherwise the information in the higher-precision number will be wasted. Such an observation offers an essential principle to determine the precision of each parameter in matrix multiplication operation. As the first step towards applying the above discovery to large model inference, we develop a dynamic KV-Cache quantization technique to effectively reduce memory access latency. Different from existing quantization approaches that focus on memory saving, this work directly aims to accelerate LLM inference through quantifying floating numbers. The proposed technique attains a 25% saving of memory access and delivers up to 1.3x speedup in the computation of attention in the decoding phase of LLM, with almost no loss of precision.
https://github.com/AlignedQuant/AlignedKV
kind of interesting
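their core "two addends should have identical precision" observation is easy to sanity-check on ordinary IEEE float32:

# when |a| >> |b|, float addition discards b's low-order bits, so storing
# b at higher precision than a's ulp is wasted
import numpy as np
a = np.float32(65536.0)  # ulp here is 2**-7 = 0.0078125
b = np.float32(0.001)    # entirely below half an ulp of a
print(a + b == a)        # True: every stored bit of b was thrown away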
>>
File: file.png (55 KB, 483x521)
>>102552200
nice numbers, also instruct 405 is pretty damn close to opus for generating surprising and interesting shit
>>
>>102556698
Q8_0 and Q4_0, as far as i remember. But maybe they (the ones from the paper) do something more to make it more accurate.
>>
>>102556328
I don't think these people understand what the things they post actually mean. The paper seems to tell a different story, going by what I just read.
https://xcancel.com/rao2z/status/1838245253171814419
As it turns out, Yann is still right. The simpler part of the benchmark that Yann commented about in the past showed that LLMs could appear to plan but only for extremely simple scenarios. So obviously when Yann said they "still can't plan", he didn't mean "plan" in any capacity at all, but planning for more complicated scenarios like what a human could.

The graph posted above is also interesting in that it appears to show that, contrary to the graph that OpenAI had where accuracy increased with longer inference time, performance actually decreases over plan length for this test. Although it's possible that the inference time didn't increase with plan length. But by default I believe o1 does just naturally "think" longer for more complicated problems, so it should be correlated anyway.
>>
>>102556772
That's for the precompiled binaries.
You can compile llama.cpp to use other types like q5_K and the like, I'm pretty sure.
>>
>>102556786
>The graph posted above is also interesting in that it appears to show that, contrary to the graph that OpenAI had where accuracy increased with longer inference time, performance actually decreases over plan length for this test. Although it's possible that the inference time didn't increase with plan length. But by default I believe o1 does just naturally "think" longer for more complicated problems, so it should be correlated anyway.
Those are two different things. Plan length for this case refers to how difficult the problem is to solve (how many steps to arrange the blocks properly), so it would be extremely strange if any method could ever have higher accuracy for the longer plans.
>>
>>102556740
>discord trash
>literal who leaderboard for literal who whatever the fuck
>censoring names
hmmmmmmmmmmmmmmm
>>
>>102556823
You misunderstand what I meant. I said that it should be correlated. A behavior of o1 is that it normally spends more tokens on problems of higher complexity. So in theory it should be evaluating how complex a problem is and dedicating more time to thinking about it. But if that is not the case here, then that is a failure of the model either way. Either it can't maintain true accuracy on longer generations, or it fails to accurately recognize the difficulty of the problem, or both.
>>
>>102553989
I still need to try this when I get home.
>>
File: cache_types.png (5 KB, 468x632)
>>102556816
I don't think that has anything to do with it being precompiled or not. I don't use the prebuilt ones, but I don't quantize cache either. Looking at the code, these seem to be the ones supported. At least in llama-bench.
>>
>>102557099
Same in common.cpp, used by llama-cli. So yeah: q8_0, q4_0, q4_1, iq4_nl, q5_0, q5_1. No K-quants.
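For reference, you pick those at runtime with the cache-type flags rather than at compile time. Quick way to try a q8_0 cache (the model path is a placeholder, and quantizing the V cache needs flash attention on):

# launch llama-cli with a quantized KV cache via -ctk/-ctv
import subprocess
subprocess.run([
    "./llama-cli",
    "-m", "model.gguf",   # placeholder model file
    "-p", "hello",
    "-fa",                # flash attention, required for a quantized V cache
    "-ctk", "q8_0",       # K cache type, from the list above
    "-ctv", "q8_0",       # V cache type
])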
>>
>>102556418
Yumm LeCum
>>
>>102556418
Better than being a LeNegro like yourself
>>
>>102556321
>what is a confidence interval
>>
File: Figure 2.png (211 KB, 1570x956)
>>102556897
Then you misunderstood either what OpenAI claimed or what the paper is showing. Plan length is expected to be inversely correlated with accuracy for Blocksworld problems on everything except a perfect solver. Their claim was that having it "think" longer on the same task would increase its accuracy on that task, not that it would magically solve harder tasks at equal accuracy to easier ones.

>So in theory it should be evaluating how complex a problem is and dedicating more time thinking about it.
That's what the paper showed. See pic related: it holds up until around 80k token length for its hidden chain of thought. As the authors further note:
>The early version of o1-preview that we have access to seems to be limited in the number of reasoning tokens it uses per problem, as can be seen in the leveling off in Figure 2
>This may be artificially deflating both the total cost and maximum performance. If the full version of o1 removes this restriction, this might improve overall accuracy, but it could also lead to even less predictable (and ridiculously high!) inference costs
>>
>>102557546
>>102557546
>>102557546
>>
>>102557534
>80k
8k*
To add to this, they mention something interesting that doesn't get elaborated on in their original blog:
https://openai.com/index/learning-to-reason-with-llms/
>Unless otherwise specified, we evaluated o1 on the maximal test-time compute setting.
They don't make it clear exactly what they're adjusting when they "set" its test-time compute to some value. The API docs note that you can only get a response of up to 32k tokens in total from o1-preview, which counts both the hidden and public parts, so it seems like they're running on a limited test-time compute setting. People have reported that the summaries in ChatGPT sometimes acknowledge "time constraints", so it may be something in the prompt telling it how long it has to think about things. Whatever it is, I'm guessing they'll have some much more expensive longer-planning model with that knob to turn.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.