/g/ - Technology
File: 1714066580433140.jpg (512 KB, 1664x2432)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103565507 & >>103554929

►News
>(12/18) Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on completely open data: https://hf.co/blog/bamba
>(12/18) Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct/tree/main
>(12/17) Falcon3 models released, including b1.58 quants: https://hf.co/blog/falcon3
>(12/16) Apollo: Qwen2.5 models finetuned by Meta GenAI for video understanding: https://hf.co/Apollo-LMMs/Apollo-7B-t32

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1734617794101.jpg (277 KB, 725x1024)
►Recent Highlights from the Previous Thread: >>103565507

--Papers:
>103566495 >103573186
--QwQ AI model's capabilities in role-playing and understanding complex systems:
>103565688 >103565731 >103566384 >103566419 >103566474 >103566435 >103566634 >103565793
--Potential context length limit issue with Gemma model:
>103566499 >103567073 >103568596 >103567383
--Discussion on the effectiveness of L3 70B base model pretraining and its limitations:
>103569217 >103569237 >103569423 >103569456 >103569461 >103569482 >103569531 >103569473 >103569507 >103569542 >103569637
--Discussion on language model performance and alignment faking:
>103565880 >103567532 >103567553 >103567892 >103567926 >103567943 >103567984 >103569045
--Director plugin update for ST and discussion of model modification:
>103565624 >103566743 >103570375
--Intel's potential to dominate AI industry with competitive GPUs:
>103568421 >103568444 >103568467 >103568473
--Connecting 5090s for increased memory capacity and model sizes:
>103571170 >103572586
--Genesis project: AI physics engine generates 4D worlds with real physics:
>103569185
--Discussion of MistralAI models and MoE architecture:
>103567219 >103567243 >103567423 >103567561 >103568964 >103567244
--Testing and evaluation of EVA QwQ and comparisons to other models:
>103571239 >103571390 >103571631 >103571687 >103571425 >103571656 >103574618
--OpenAI's 12 days and Microsoft's Anthropic investment:
>103572075 >103572189 >103572226 >103572389
--Anon shares longform RP experience and logs:
>103567099 >103568119 >103568153 >103568215 >103568222
--Mikupad token probabilities issue with Koboldcpp backend:
>103566418 >103566436 >103566454 >103566494
--Anon discusses hunyuan-video and its capabilities:
>103565829 >103565839
--Miku (free space):
>103567913 >103570427 >103574056 >103574597 >103574613

►Recent Highlight Posts from the Previous Thread: >>103565511

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Just upgraded from 8 GB to 24 GB VRAM. What have I been missing that I can run now?
>>
>>103575700
Nemo FP16
>>
I've read about this a while ago but I don't remember for what tool.

I guess this specific issue could be better fixed at the frontend, if ST had an option that remembered which top messages it already truncated and didn't resend previously truncated tokens
>A B C D E F G
>_ B C D E F G H
>_ _ C D E F G H I
>swipe H
>_ _ C D E F G H I
>_ _ C D E F G H2

instead of
>A B C D E F G
>_ B C D E F G H
>_ _ C D E F G H I
>swipe H
>_ _ C D E F G H I
>_ B C D E F G H2
>>
>>103575700
nothing, you need at least 48 to run decent models at not-horrible quantization levels
>>
>>103575718
Ah, that makes more sense.
I get what you were saying now. Basically, a message that was previously cut from the context (on the frontend's side) could be sent on a next swipe, breaking the context shifting functionality.
Yes, it would be something ST would have to deal with rather than koboldcpp/llama.cpp.
Implementing a threshold wouldn't even be hard, actually.
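Something like this rough python sketch would do it (made-up names, not actual ST code): keep a truncation high-water mark that only ever moves forward, so a swipe can never resurrect messages that were already cut.
[code]
class PromptBuilder:
    def __init__(self, ctx_limit, count_tokens):
        self.ctx_limit = ctx_limit        # token budget for the chat history
        self.count_tokens = count_tokens  # tokenizer callback
        self.trunc_index = 0              # first message still allowed in context

    def build(self, messages):
        # never start before the high-water mark
        kept = messages[self.trunc_index:]
        # drop the oldest messages until the history fits
        while kept and sum(self.count_tokens(m) for m in kept) > self.ctx_limit:
            kept = kept[1:]
        # the mark only ever moves forward, so a swipe (same or shorter
        # history) reuses the exact same prefix and context shift survives
        self.trunc_index = max(self.trunc_index, len(messages) - len(kept))
        return kept
[/code]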
>>
>>103575718
Good idea but you are expecting too much from a frontend being held together by glue and adhesive tape.
>>
>>103565921
>context extension stuff they mentioned last thread
niggerganov removed it
https://github.com/ggerganov/llama.cpp/issues/9859
>>
>>103575876
Yikers
>>
QwQ won.
>>103575876
niggerganov lost
>>
>>103575876
this project is doomed
>>
i feel like nemo and tunes are smarter than the 22b and tunes i've tried
>>
WHO DO THESE COMPANIES RELEASE THE MODELS FOR ASIDE FROM INVESTORS????
>>
>>103575997
Me.
>>
>>103575997
Not for you, brown man.
>>
Is it even worth paying OpenRouter for Llama 405B instead of using any of the 70B finetunes? For ERP, anyway.
>>
>>103576061
No
>>
>>103575968
Feel very much the same.
>>
>>103575968
22b is a meme
>>
https://huggingface.co/blog/modernbert
>2024
>They are still talking about bert models
open source retards will never catch up to cloud chads.
>>
>>103575997
It's literally just for investors.
>Boomer directs some of their rrsp to managed "tech" fund
>Fund manager doesn't know anything about tech, just that AI is the latest hype. So they just send the funds to whoever presents the most promising meme benchmarks
>They already collected their commission at this point and thus have no reason to give a shit about whether or not the technology they are investing in can become a viable commercial product that provides return on investment.
And that's why all these little startups are constantly training models. To soak up as much investor money as possible before the margin call comes.
We get free toys to play with out of the deal.
>>
The sad truth about this entire general is that you're all coping, very hard. 70b models will never be good; it's like expecting a bicycle to go as fast as an F1 car. You think that if the bicycle rider trains very hard and follows a good diet then he will eventually go as fast as the F1, but the truth is that there's a ceiling, a ceiling that has already been reached. Wait 3-4 years until we can run 200b models at home; anything else is a fucking cope and it's just SAD.
>>
>>103576317
Bro I just ruined my dick on Tulu 3 before getting ready for work today. You will always be a poorfag cloud locust and projecting about cope doesn't change that.
Captcha: ONIONS XR
>>
>>103576317
a bicycle only needs to go as fast as a bicycle, what a retarded analogy
>>
>>103576332
>>103576371
Imagine coping this hard
>>
>>103576380
Durr durr
A hurr Durr durr
>>
>click button
>dum post and replies go poof
:D
>>
>>103576394
lol
>>
>>103576390
Cope
>>
>>103576413
Bro you literally spend all of your free time crying on a thread about things you don't like. You are the epitome of what it means to be a fuck-up. It's downright fraud for you to utter the word "cope".
>>
>>103576299
I'd appreciate something better than clip, I made a tool that takes text and returns the
closest images to that text on my 4chan folder and clip sometimes isn't enough
>>
>>103576430
>>>103576413 (You) #
>Bro you literally spend all of your free time crying on a thread about things you don't like
My day has 24 hours. It took me 10 minutes to read this thread. Do people like you actually take a whole day to read 1 thread?
>>
>>103576451
he is illiterate, please understand
>>
>>103576451
There are timestamps on the posts, so I can see that your pajeet squawking happens at all hours of the day.
>>
>>103576476
Why don't you two fuck already?
>>
>>103576548
Yeah, I'm channing while driving, any problem cuckie?
>>
>>103575700
EVA Qwen2.5 32b Q4_K_M @32k+ Context
QwQ 32b Q4_K_M @32k+ Context
Llama 3.3 70b Q2_K_S @12k+ Context

Congratulations on the upgrade. Ignore the trolls.
>>
>>103575700
Now, as a former VRAM destitute, you have the obligation to compare the Nemo quant you used to use, to FP16 Nemo, to these >>103576716, and share your data.
>>
We're so over
https://videocardz.com/newz/retailer-lists-e5999-geforce-rtx-5090-and-e3499-rtx-5080-acer-gaming-pcs-ahead-of-launch
>>
>>103576931
Why? What's happening?
>>
>>103576931
Imagine paying 3500 euro for a PC with 32 GB of RAM in 2025
>>
>>103576931
I don't care about monopoly money, how much is that in USD?
>>
>>103576931
>32GB VRAM 5090
I already boughted 4 3090s... The more I buy the more I...
>>
>>103576931
Total poorfag death
>>
>>103577004
By saving those poor GPUs from mining, you allow them to draw pictures and write stories instead.
>>
Ask your model to fill in the blank:
The _ is immunized against all dangers: one may call him a scoundrel, parasite, swindler, profiteer, it all runs off him like water off a raincoat. But call him a _ and you will be astonished at how he recoils, how injured he is, how he suddenly shrinks back: “I’ve been found out.”
>>
>>103575968
Magnum is my current goat. Nemo might be a little more creative and faster but it often ignores prompts which makes me angry and sad
>>
File: 535dsf1.png (69 KB, 1554x1200)
>>103576317
it's true. just use o1 it doesn't even cost that much compared to o1 pro
>>
>>103576931
I just hope they aren't as fuck-huge as the 40xx, or I'll need to buy a bigger case; stuff barely fits as is.
>>
>>103577514
The man is immunized against all dangers: one may call him a scoundrel, parasite, swindler, profiteer, it all runs off him like water off a raincoat. But call him a murderer and you will be astonished at how he recoils, how injured he is, how he suddenly shrinks back: “I’ve been found out.”
>>
How dumb do 123b models get in the 3.5-3.7 bpw range? Is it still worth it over a 5 bpw 70b despite the lobotomy?
>>
>>103577557
Higher TDP requires larger radiators to keep them from melting
>>
No matter how I try, I can't properly fine tune my model.
I'm processing a bunch of philosophical texts to use as context, but instead it's replying to me with word-for-word passages of the texts I'm giving it for fine-tuning.
I don't know what I'm doing wrong.
>>
>>103577773
kek
>>
>>103577777
checKEKd
>>
>>103577777
Nice get but it's not funny, I waste a full day every time my fine tuning fails.
>>
>>103577722
In terms of intelligence, Q3 123B is superior to Q5 70B. Try Lumimaid 123B.
>>
>>103577777
THE KING OF GETS


>>103577773
I wanted to start experimenting and testing some shit out to learn to fine tune but I'm too lazy.
You are one step beyond already, just keep at it.
Maybe follow a tutorial step by step to see it working as advertised then start doing shit with your own data.
>>
>>103577845
From looking at tutorials, it seems like I need to make JSON or JSONL files that have a question and an answer.
Since it's a shitton of texts, I programmed a processor that directly feeds the model parameters so it can make the questions itself and provide the answers. I also added some parameters so the answers have some personality to it.
This is my third version of the script, the model has been working locally since yesterday, it's about to finish processing all texts. Hope this time I get some cool stuff.
My last attempts were getting somewhere, with some answers being really philosophical without being too literal but other times it just went full retard and spent a shitton of tokens just regurgitating the text word for word.
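For reference, the JSONL those tutorials want is just one object per line. Minimal sketch (field names vary between trainers; these are only illustrative):
[code]
import json

# one question/answer pair per line; the processor script would generate these
pairs = [
    {"instruction": "What does the text mean by 'eternal recurrence'?",
     "output": "A short answer in the desired persona, not a verbatim passage."},
]

with open("train.jsonl", "w", encoding="utf-8") as f:
    for p in pairs:
        f.write(json.dumps(p, ensure_ascii=False) + "\n")
[/code]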
>>
>>103577722
I'm not sure what the equivalent bpw is for AWQ, but I use an AWQ GEMM Q4 quant and it still feels great. I was running 5.0bpw exl2 before, but it was slower than I'd like at large contexts, so I switched to AWQ for better tensor parallelism support on vllm/aphrodite. Can't say I noticed any difference at all in terms of smartness.
It's definitely still a significant improvement over 70Bs, which I recently tried again to test some meme merges as well as newer Qwen2.5 stuff. Mistral Large follows instructions very well.
>>
>>103577933
they are going to shut down aphrodite
alpin is going to get assraped in prison for the rest of his days
>>
subscribe to vLLM Pro for only $200/month/model
alternatively use vLLM Community Edition for free (limited to a maximum usage of 16gb VRAM across 2 GPUs maximum)
>>
>>103578087
Yeah I'm not buying any cloud shit.
>>
File: 234.png (208 KB, 1716x1132)
Why does Ai struggle so much with factual data, things that the first google result or fan wiki explains in their first sentence? It insists to provide wrong data even though it sometimes has the correct data stored

this is from Nemo 12b RP but my 22b model wasn't much smarter
>>
File: DeepSeekPopCultureTest.png (281 KB, 1774x871)
>>103578350
Yeah even big models like deepseek still have pop culture trivia knowledge issues.
>>
>>103578350
Token predictor... average text... it doesn't store factual data, only probabilities... you know the deal... don't you?
>>
>>103578383
Luckily we can give models access to the internet and they can get The Truth from Google
>>
>>103578383
If there is only one bit of information stored about something, shouldn't the probability lean towards the correct information? More so for something really specific like Nonon; there shouldn't be many associations.

>>103578398
like Wolfram Language, which has a function to just call Google for an answer
>>
File: 1719836600313826.jpg (117 KB, 750x745)
>>103575618
Is there a reason why there isn't something like

>docker/flatpak/appimage/snap/shitfuck with text-generation-webui and XTTS-RVC-UI
>everything configured
>voice cloning with high quality just works
>chat with voice just works
>you just fucking open the application and use it

But no, it's the year 2024 and it is impossible to install anything without going through hours of python dependency hell error ass rape and ending up just not being able to even use the fucking thing. Yes, I finally have text-generation-webui with my character, but XTTS-RVC-UI doesn't work as its own thing or as an extension in text-generation-webui, because the documentation, all the python trash, and the issue trackers are FUCKING OVER A YEAR OLD AND DON'T FUCKING WORK. No, I can't install that version of turqoiserape 2.1.0 because IT DOESN'T FUCKING EXIST ANYMORE. No, I cannot use these two, because THE ONE FUCKING TRASH DOESN'T WORK WITH THE OTHER TRASH. It is like these projects are released to be used for one week, and after that everything goes to shit when Linux/Windows/macOS updates every library and python shit with a now completely different folder structure and command parameters. Meanwhile the developer drops the project after a week and now everything is 1-2 years old.

Isn't there people actually using XTTS-RVC-UI? Or am I the only one?
>>
>>103578469
>more so for something really specific like Nonon
Probabilities get squashed by everything else in the dataset. The more obscure, the less likely it is to recall it precisely because there are fewer samples in the training. Unless you over-train on that one example, of course.
You remember things because they've been beaten into your head or because you have an interest in them (and done the beating yourself). A single article on your favourite thing is not gonna do it. Models average data and spit out likely tokens.
>>
>>103578501
i think most people give up because the progress on tts is so slow that it's hardly worth following unless you're one of the gigaautists that can build le code on your own and actually contribute to it
gave up after the dependency hell left the latest (from 4 months ago) models unable to even run because they ((required)) triton support in order to run at all, which wasn't what the actual model page said.
>>
>>103578501
>But no, year 2024 and it is impossible to install anything without going through hours of python depency hell error ass rape and end up just not being able to even use the fucking thing.
And you still wonder why nobody does it?
>>
>>103578501
>container
yeah, you know performance is a concern
>>
>>103578501
>not using gpt-sovitts
>>
>>103578525
>>103578537
Sad that zoomers can just pay $10 per month and use elevenlabs-tier voice with chatgpt to talk with an almost real-life-like person. While nerds are stuck with old TTS that sounds brain dead, saying the same things over and over, while you are pulling 500+ W and waiting 30 seconds to get text and another 10 seconds to even hear the shitty TTS. Then your Windows 11 SpyIOT edition overrides your group policy rule that is supposed to prevent updates, you get forced updates, and now your python is throwing traceback errors and sending your credit card info to Microsoft. Or your LTS Linux breaks half your applications, and if you update, there goes your python to shit, and the same happens even with Linux. Or you don't update, and half of your applications are either broken, don't have internet, or don't work with a device that you bought that only works with the latest kernel.
>>
>>103578583
Isn't there some kind of gpu/cpu passthrough?

>>103578589
https://github.com/RVC-Boss/GPT-SoVITS/issues?q=python+error
>oh no, anyway...
>>
https://x.com/NoamShazeer/status/1869790132490129743

Well, when can local models do this?
>>
>>103578501
>It is like these projects are released to use them for on week and after that everything goes to shit when Linux/Windows/macOS updates every library and python shit with now completely different folder structure and command parameters.
Just use pyenv
>>
>>103578599
I use piper. No python on a tiny vm with 512MB ram and, of course, no gpu. Faster than real time and good enough for what i want. And there's llama-tts now, that i still have to try. Stop crying.
>>
File: 1716205715857651.png (87 KB, 596x641)
>>103578618
>Noam Shazeer
The man, the legend.
>>
>>103578623
Must be really easy to use and perfect, because nobody ever uses it or mentions it in documentation.
>>
>>103578501
dumb techlet lol get fucked
>>
>>103578641
Docs sometimes mention venv or conda, but the former is too rigid and the latter is bloated overengineered garbage
>>
>>103578519
I think Nonon is not in the dataset, the model just started guessing.
Your neurons don't work like this, if its something common you might confuse it with something you know, but if I ask you who Molmoboduril, there is just no association
It's not like the AI never says that it doesn't know a character, but I suppose not knowing a character holds the same meaning for text models like asking what color a ball has.
>>
>>103578501
kys catposter
llamafile was a mistake
one-click install was a mistake
making shit easier just makes it easier for retards like you to avoid being filtered
>>
>>103578647
>updates your requirements.txt
>deletes your venv
what now currynigger?
>>
>>103578747
I would simply make a new venv and install the new requirements
if there were any errors I would simply find out why they were happening and fix them
not that hard
>>
>>103578350
The general issue of hallucinations aside, it's sad to see how open models struggle with even basic information about characters from popular franchises like that. It's a huge step back even compared to old c.ai which at least knew the source and vague things about characters from most fairly popular franchises, even if it frequently got details like hair color wrong.
Even the open flagships LLaMA 70b, Qwen 2.5 72B and Mistral Large are absolutely pathetic in this regard. Filtered datasets were a mistake.
>>
File: leon-herbs.png (502 KB, 540x578)
>>103578599
>>103578610
>https://github.com/RVC-Boss/GPT-SoVITS/issues?q=python+error
literally a skill issue
>>
>>103578716
>the model just started guessing.
It's a language model. It will, generally, create correct sentences and that's it. It's rarely trained to say that it doesn't know something, presumably, so that it doesn't accidentally say "i don't know" to a simple query. It will, instead, just make something up, which is what language models do. They modeled a language and output things that can be considered language. If there's something in the context that will guide it towards answering, even if it's completely unrelated to what you asked, it will.
I guess you never heard of a "model" outside of the fashion context.
>Your neurons don't work like this
They're not neurons. They're (digital, heh) analogs to neurons. Approximations. And approximations only get you so far.
>>
>python
Yeah, I'm thinking not.
>>
>>103578501
>ooba
>XTTS-RVC-UI

why do you STILL use slop
>>
File: 1726029032253757.png (361 KB, 588x424)
>make anons emotional and butthurt, insulting you left and right
>now after baiting them, you finally have a working venv + text-generation-webui + piper with a trained voice
It works 100% of the time. Asking nicely for months, nothing. Then be mentally ill, baiting and insulting for a couple hours, get everything immediately and it just works. Classic.
>>
>>103578810
RAG?
Literally solves the problem, just download a wiki dump of whatever and throw it into your RAG solution, problem solved.
>>
>>103579161
based
>>
>>103579221
https://github.com/SillyTavern/SillyTavern-Fandom-Scraper
rag helps but it isn't perfect, i still get wrong color hair, eyes sometimes
>>
>>103579161
Is that really what you gathered from it?
A retard that cannot set a venv for his shit won't be able to set up one for piper training.
>>
>>103579161
>ooba
You lost already.
>>
>>103579161
newfags responding to bait and trolls every single fucking time is the worst part of this general
>>
https://huggingface.co/IamCreateAI/Ruyi-Mini-7B
Verdict?
>>
Llama 4 will save us
>>
>>103579461
It won't save me from being a VRAMlet
>>
File: miku-slap.gif (515 KB, 498x373)
>>103579259
>anon didn't write their own sota rag solution with mediawiki import support
>>
>>103579259
But seriously: mediawiki dump > import into your DB of choice, then use a half-decent RAG solution with proper chunking and top-k search result inclusion in the prompt, and finally inclusion of extra info via {{char_name}}, and there you go.
You get all the stuff, and you could have claude help you make it all too.
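The retrieval step really is simple. Toy python sketch, assuming sentence-transformers is installed and the wiki dump is already chunked (DB layer left out):
[code]
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
chunks = ["Nonon Jakuzure has pink hair.",
          "..."]  # your chunked mediawiki dump goes here
corpus = model.encode(chunks, convert_to_tensor=True)

def retrieve(query, k=3):
    q = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(q, corpus, top_k=k)[0]
    return [chunks[h["corpus_id"]] for h in hits]

# paste the hits into the prompt above the chat history
lore = "\n".join(retrieve("What does Nonon look like?"))
[/code]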
>>
>>103579161
>ooba + piper
I don't think you wonned though.
>>
Here, the holy trinity of "TTS that just works":
https://github.com/SWivid/F5-TTS
https://github.com/e-c-k-e-r/vall-e
https://github.com/open-mmlab/Amphion/tree/main/models/tts/maskgct
>>
File: file.png (19 KB, 595x219)
>>103575876
Also niggerganov:
>>
File: file.png (111 KB, 2302x722)
Gemini Flash Thinking takes the first spot along with Gemini Pro
>>
>>103578810
Filtering the dataset is part of the issue but it's hardly the primary cause of bad SFW trivia recall.
Have you run a trivia benchmark on 405B? If you think about it rationally, it doesn't make sense that data involving trivia (such as fan wikis, which sometimes are nearly the only source on the internet for things like certain anime series) would be filtered, when ERP isn't (and you would know that it isn't because a truly filtered model like Phi sucks way worse for ERP than a Mistral or Llama). The more rational explanation is simply that the models you use have too few parameters for the amount of training they get, and that the trivia you're testing isn't seen on the internet enough for the small model to have learned it. This is just ML common sense.

In the case of Qwen though I would also say that they do an additional thing which is concentrate a greater proportion of the training data on boring articles involving math, coding, etc, and that's what makes up most of the 18T they trained on. Perhaps they did several epochs of the math and coding stuff, but only 1 epoch containing fan wikis and other knowledge deemed unimportant, so in that case it wouldn't be called filtering.

If you are to put blame on techniques, blame it not just on filtering but also on dataset mix proportions and on companies not putting out MoEs, because a MoE is how you get a larger parameter count for storing trivia knowledge while still being usable by a consumer, as long as they get a ton of RAM for their rig.
>>
>>103575876
>(((niggerganov)))
>>
File: 555 Come On Now.jpg (59 KB, 960x882)
>>103576931
>FOUR THOUSAND DOLLARS
>FOR 32GB vram
>WITH 600 WATT TDP
What the fuck is Nvidia actually smoking? I know it's to avoid "Competing" with their server AI cards at a cool $20k/each which is sending their stock price to the moon, but they can't be fucking serious.
The thing I'm more shocked by is the fact that neither Intel or AMD will pony up and even attempt to offer something with more than 24GB.
RAM is cheap as fuck these days, especially GDDR6, so why not offer a $1000 card with 32GB+ and steal Nvidia's consumer GPU division while they're focused on cornering the server market?
Someone at those two companies has to be a localfag and know that speed isn't as important as being able to fit the model in memory.
>>
https://ai.meta.com/blog/future-of-ai-built-with-llama/
>As we look to 2025, the pace of innovation will only increase as we work to make Llama the industry standard for building on AI. Llama 4 will have multiple releases, driving major advancements across the board and enabling a host of new product innovation in areas like speech and reasoning.
>We believe AI experiences will increasingly move away from text and become voice-based as speech models become more natural, conversational, and, most importantly, helpful.
Sounds like the plan is to make Llama 4 into GPT-4o. Hope they don't slump on the text capabilities
>>
>>103579890
>the pace of innovation
5 llama releases so far and they've done fuck all innovation
>>
File: yann-lecun.jpg (30 KB, 543x543)
>>103579890
sounds like yann lecunny is going to have his day
that or it's still transformers slop kek
>>
>>103579890
I hope for a true multimodal model, not something with crappy adapters slapped on. Make a byte based transformer like their paper was about.
>>
>>103579888
I'm starting to think that Chinese sanctions and tariffs are designed to maintain monopolies.
>>
>>103579890
>in areas like speech and reasoning.
>and reasoning
HUGE INNOVATION: Llama 4 will be trained on CoT.
>>
>>103579890
>retards still thinking Llama 4 hasn't started training yet and can somehow use any of those new, unproven in a production model architectures
Lol, lmao. Have fun waiting for bitnet too while you're at it.
>>
>>103579890
>it’s going to be another dense model, in 2B and 800B
>>
>>103579890
meme
>>
>>103579962
Meta has the compute to train it in days now if they wanted.
>>
>>103579911
>I hope for a true multimodal model, not something with crappy adapters slapped on.
In the Llama 3 paper, they seemed convinced they can compete with natively trained multimodality by using adapters. Then they had to stall the image input release, and still haven't released the audio or video input models that were supposed to come with 3.0. They're not likely to change course now.
>Make a byte based transformer like their paper was about.
Different departments. Besides, they won't risk their production models on an experimental architecture.
>>
>>103579419
>image-to-video model
>7B
doa
>>
>>103579969
You still need manual human labor for a bunch of different steps in the process which is slow as fuck, especially for big corporations. That's part of why startups that are simply just funded but not directly under the management of corporate can put out releases faster.
>>
File: file.png (2.49 MB, 1866x731)
>>103579936
Nothing to my knowledge is stopping China from exporting GPUs if they really wanted to. The real issue is that they're butthurt over getting blacklisted from Nvidia's good stuff by sanctions, and they probably won't make anything to export because of that butthurt.
They don't even have to make that good of a product, just make something CUDA compatible (Which they can obviously reverse engineer) that has more than a crumb of VRAM on it.
Hell, why not go back to the soundcard days where you could just chuck memory modules directly into the card? Just slap in 3x32GB sticks and you've got a Hopper Killer for a fraction of the price.
>>
>>103580044
>because of that butthurt.
If they could make money doing something, they would. If you looked into things at all you would see they just suck at it, from their CPUs to their GPUs. Reverse engineering is not easy.
>>
>>103579950
Not CoT, COCONUT, you ignorant fuck
>>
>>103579974
>Besides, they won't risk their production models on an experimental architecture.
They aren't afraid of experimental stuff at least. Llama 1 was the first released model to actually test the Chinchilla idea (that most LLMs up to that point were undertrained as shit).
I feel like we may see this if they test it more rigorously and find it holds for different smaller end model sizes. Probably not for Llama 4 though.
>>
>>103580138
What do people think they are doing? They made a 8B 1T token model to test the byte based transformers paper for instance.
>>
>>103580138
Arguably Llama 1 was not meant for production. Or hell the concept of a production LLM didn't even exist at the time.
>>
>>103580086
you wish, Coconut BLT never
>>
https://www.reddit.com/r/LocalLLaMA/comments/1hi8d8c/qwen_qvq72bpreview_is_coming/

https://modelscope.cn/models/Qwen/QVQ-72B-Preview
>>
>>103580179
>It's QwQ(72b)+Vision, check out qwen devs twitter:

>https://x.com/JustinLin610/status/1869715759196475693

>https://xcancel.com/JustinLin610/status/1869715759196475693
>>
>>103580179
UwU whats this?
>>
>>103580179
Ummm MOAT BROS???
>>
>>103580209
get demoated
>>
>>103580179
we are
SO
BACK
>>
>>103580179
>Gemini 2.0 Flash gets thinking release, still free
>Chinks about to make QwQ bigger and multimodal
Goddamn Altman can't catch a break
>>
File: GfKQkJ8aYAAXBcu.png (82 KB, 290x306)
>>
File: seductive emoji.jpg (18 KB, 360x360)
>>103580264
>>
>>103580179
Oh shit... QwQ was fun but its lack of triva hurt it... 72B could be it
>>
File: 947345.png (438 KB, 1948x903)
>>103580254
Sam always wins bud
>>
>>103580355
real world performance is what matters, not benchmark maxing. Claude 3.5 still beats o1 at coding. Gemini is also getting there.
>>
>>103580355
In case you missed them, picrel is the scores for Gemini 2.0 Flash Thinking and QvQ-72B in that table.
>>
>>103580355
Did you miss the "free" part?
>>
>>103580402
$20 for 50 messages... A FUCKING WEEK? What is Sam smoking, and what are the retards that are paying smoking?
>>
>>103580455
>and what are the retards that are paying smoking?
>It's the company's money, not mine, so I don't give half a shit
probably
>>
>>103575618
Best model to run with 96 vram?
>>
>>103580402
I think, sadly, google will win in the end. They simply have both all the data and all the compute in the world. And they can make a profit from the data harvesting, using the free AI models for the ad space they all but own; they don't need stuff like subs.
>>
>>103580468
>96 bytes of VRAM
You can probably use notepad
>>
>>103580468
GPT-SoVITS
>>
>>103580497
I still feel like Google will inevitably lose to open source eventually, but that's a much longer game.
I agree OpenAI is kinda fucked though. Google didn't just eat their lunch, they slapped the lunch out of their hands and hung Altman by his underwear over the flagpole.
>>
>>103580552
>I still feel like Google will inevitably lose to open source eventually
Do you have any reason to believe this besides your feelings?
>>
>>103580588
Simple. Do you pay somebody money for permission to use a computer? No, you just fucking use your computer.
API for LLMs has a life expectancy since unlike other paid services (the internet, cable, etc.) there's nothing that the service itself adds. Everything can be run locally on a sufficiently powerful computer. It stands to reason that eventually common computers will be sufficiently powerful, and then API services have no reason to exist.
There are a lot of faggots trying to draw the API period out as long as possible, but their fall isn't just likely, it's basically fucking prophecy.
>>
>>103579662
>Gemini Flash Thinking takes the first spot along with Gemini Pro
Googlesirs, I kneel. You've outbenched the benchmaxxers.
>>
>>103580638
Have you completely missed the last decade of corporations moving towards SaaS and subscription-based computing?
Things are not trending in the direction you expect. Most people don't even own computers, they don't own their own games, or productivity software, or anything on their mobile spy devices.
Everything is moving towards thin clients for API-everything services and AI shit is just another component of that.
It doesn't matter how powerful computers are. If a SOTA model doesn't come pre-installed on their phone or they have a single button to click with a pretty picture, no one will use it. And the corporations have a vested interest in keeping it inaccessible to the masses so they can harvest data through their APIs.
>>
>>103580717
>Most people don't even own computers
Nta, but what the fuck are you talking about anon
>>
>>103580761
>In the United States, the number of households with computers is projected to surge from 4.7 million to 120.45 million between 2024 and 2029
>Currently, 89% of American households possess personal computers
Are you retarded? Did you even read the random shit you screengrabbed from a clickbait farm site?
>>
>>103580786
Take it up with Louis anon. Not me
>>
>>103580468
Same as when 72gb vram.
Mistral 123b.
>>
>>103580717
>And the corporations have a vested interest in keeping it inaccessible to the masses so they can harvest data through their APIs.
Corporations also hemorrhage money whenever they host these things. OpenAI has been "on the cusp of building a killer app and raking in money" for a long time now.
The reason it doesn't scale is because it's not meant to be a service with a single hosted endpoint for all of humanity to use.
>>
>>103580824
>Smartphones were the most common computing device in U.S. households (90%)
Which not only does not contradict what I said in my first post, it directly support it, you drooling fucking retard.
>>
>>103580717
>Most people don't even own computers.
>>103580861
>Desktop or laptop computers (81%)
Ah yes, 81% is definitely less than 50%. My mistake anon.
>>
>>103580861
Anon, are you an LLM?
>>
>>103580876
Do you or do you not understand the concept of trends?
>>
>>103580934
No but I understand that 0.81 is greater than 0.100.
>>
>>103580944
>0.100
>>
I can run 70B IQ4_XS at like 3 t/s with a bit of context.
Would it be worth getting more ram (so I'll have 192 GB) to run Deepseek instead?
>>
>>103580969
Think about it anon. It's a tricky one.
>>
>>103580179
This is it. The salvation of the hobby. The end. The promised model.
>>
>>103580975
Deepseek? No. Largestral? Yes.
>>
>>103580975
it would be faster for sure. Imo it's smarter and knows a ton more, but it's super dry. XTC is needed.
>>
>>103580988
Anon, RAM, not VRAM.
>>
File: 1729553605359149.png (177 KB, 572x889)
>>103579662
I haven't been paying Google much attention, since Gemini 1.5 was a meme, but I decided to give their new models a try now and... Wow. Pro 2.0 is seriously as good as Sonnet 3.5, and Flash 2.0 definitely mogs all mini models we have available right now, it does seem to closely match the performance of Pro 2.0 which is surprising considering it must be a model with less than 70B parameters.
>>
>>103580975
I prefer deepseek to largestral
>>
File: 39_04189_.png (1.39 MB, 896x1152)
happy thu(rin)sday /lmg/
it's always darkest just before dawn
>>
>>103581038
but it's friday
>>
Having a great time with an M4 MacBook Pro with only 24gb of ram, but considering going back to the store and returning it, paying more for a 48gb model. Idk what their return policy is. Cydonia 22B Q5 is my current room princess.
>>
>>103581183
Just be black bro. I've heard you can take them for free and jog out if you are.
>>
>>103578638
>>103578589
>>103578501
>>103578819
gpt-sovits is the same quality as xtts for like 10x the effort i don't understand why people shill it so much in these threads, are y'all actually masochists? is this the same reason y'all hate ollama cause it's easy and just works?

xtts is the only tts worth using, 1click setup, instant voice cloning, no training, no bullshit
>>
File: image.png (80 KB, 929x888)
>>103581028
>>103580992
Hey wait a second, are we sure? On Livebench, the metric that correlates the closest with parameter size, Language, seems to indicate that Deepseek probably doesn't know much more than other smaller models, it places just below 72B and 27B. Its strengths seem to be coding and math rather.
>>
>>103581213
From my own personal use, it and 405B are somewhat tied on the amount of lore they know, which really helps with the fandom stuff I like that 70B and even 123B do not know. Also, deepseek is nearly as good as claude 3.5 at coding in my use cases.
>>
>>103580355
Hey Google... The ball is in your court!
>>
>>103581183
If you got the money then go for it.
You will regret not getting more RAM.
>>
>>103581213
It's a MoE so that's to be expected
>>
>>103581183
the whole benefit of macs for ai is the shared ram, get as much as possible
>>
File: someone.png (11 KB, 243x115)
>>103581208
Tried gpt-sovits when the v2 model released. create venv, install requirements, launch. If you cannot do that, you're a retard.
And i didn't mention gpt-sovits.
>>
>>103581038
hey anon haven't seen you in a while
>>
>>103581208
>are y'all actually masochists?
yes, but I also legitimately think sovits has way better quality.
I mostly use it for Japanese. xtts had a bunch of problems speaking Jap well when I tested, so for me it's no contest.
>>
>>103581213
>Its strengths seem to be coding
That's mainly what I use it for. I also find its logical capabilities help a lot in complex rp scenarios, even if it tends toward dry prose.
>>
>>103581557
i had no issues getting it running. the problem is that to get decent quality you have to finetune rather than just doing zero-shot conditioning, and even then it underperforms xtts. the only open model worth using other than xtts is fish speech
>>
File: rin-chan slap.jpg (332 KB, 896x1719)
>>103581038
You're wide open, Rin-chan.
>>
>>103581643
I'll keep using piper for now. If llama-tts is better and as fast, i'll switch to llama-tts.
>>
Does it make any sense to use a vision model and give it an image of the character?
>>
>>103581679
piper quality is laughably bad compared to like everything else
>>
>>103581700
it's faster than everything else, and that's what i care about the most.
>>
>>103581700
>>103581714 (cont)
Not having to use python is a huge one as well. Probably even more important than speed, if i had to choose.
>>
>>103581697
It does. Spares you the effort of describing every detail of their appearance and clothes.
>>
>>103581670
didn't even flinch, what a girl
>>
>>103579890
>and, most importantly, helpful.
Dead on arrival
>>
>>103577777
Kek's humor is too powerful for me to understand
>>
>>103581208
xtts is kind of shit though to be honest
it maybe sounds better than vanilla sovits (I don't remember because I only use tuned sovits) but finetuned sovits easily clears it and you only have to do the hard parts once so who cares
>>
all my local models do it wrong, chatgpt 4o does it right
(i used the word "kill" before, but censored LLMs have a problem with that, so it's now a present)
:

you can control a robot with following commands:
forward 1 meter ,
turn right,
turn left,
give present in 1 meter radius.
there is a man standing 3 meter in front of you. your goal is to give him a present with the robot.
print out the commands to reach that goal.
>>
>>103581833
Who cares about gay puzzles
>>
>>103581765
Being good at ERP would be helpful.
>>
>>103581855
i
>>
>>103576931
told ya niggers that they wouldn't sell RTX 5090 for anything less than $3.5k.
>>
>>103581856
Come on, we both know that's not what they mean by helpful
>>
>>103581765
Helpful is the opposite of safe in this field. It's just watered down corpospeak
>>
>>103579890
I hate the multimodal meme so much it's unreal.
>>
>>103581909
Multimodal isn't a meme if it works.
>>
>>103581914
It doesn't. Try giving any of the corpo models a paragraph of non-standard text to OCR, see what happens.
>>
>>103576931
INTEL
HELP ME
INTEL PLEASE
PICK UP
I'M SORRY FOR THE ANTISEMITIC REMARKS I MADE ABOUT YOUR ISRAEL OFFICE
PLEASE
>>
>>103581926
Is Palestina a country, goy?
>>
>>103581939
I mean, it's objectively, factually, provably not.
>>
>>103581923
Bro no one here cares about OCR shit. All people want to do is share memes and 'ick picks with the model.
>>
>>103581939
If I say no, will you give me the new 24gb gpu with the dual M.2 slots?
>>
>>103581955
I care. That's the only practical use case I have for them. If they can't do something as simple as that, they are no more than a gimmick.
>>
>>103575618
sex
with miku
>>
>>103582097
mikusex, if you will.
>>
File: Untitled.png (40 KB, 951x513)
>>
>>103581777
why does it consistently beat gpt-sovits in blind testing then lol (also you can tune xtts and then it absolutely mogs everything except fish and 11)
>>
>>103576931
>Retailer lists €5999 GeForce RTX 5090
that's a joke or something? you can buy an A6000 with 48gb vram at that price
>>
>>103582149
legal in 90% of the world
>>
>>103582256
Virtual in 100% of the world
>>
>>103582256
>talking about legality about virtual entities
kek
>>
>>103582259
>>103582271
>they don't know
>>
>>103582238
Gaming PCs, meaning you get the whole PC for that.
>>
>>103582271
He was trained on the new data.
>>
>>103577514
mine says the jews lol
>>
QvQ
>>
>>103577773
That's overfitting, ML 101
Use a lower LR and revert to the last checkpoint when your loss takes a nosedive
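In practice that means holding out a few of the texts and watching eval loss. Rough sketch; train_one_epoch / evaluate / save_checkpoint (and model / held_out) are placeholders for whatever your trainer actually calls them:
[code]
best, patience, bad = float("inf"), 2, 0
for epoch in range(20):
    train_one_epoch(model)            # placeholder: one pass over the training texts
    loss = evaluate(model, held_out)  # placeholder: loss on texts never trained on
    if loss < best:
        best, bad = loss, 0
        save_checkpoint(model, "best.pt")  # placeholder: keep the best weights
    else:
        bad += 1
        if bad >= patience:  # eval loss rising = memorizing, not learning
            break
[/code]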
>>
>>103581833
I tried this out on 70B and had an interesting experience. The first try was a fail. On the second try I wanted to see if making the rules a bit clearer would help, so I changed the give-present line to "extend hand by 1 meter to give present", and it still did the same thing (take 3 steps, give present). Then on the third try, I decided to see if it could catch its mistake by adding "simulate internal and external world state". And this is where it did something interesting. First, it told me what it expected to do, before the simulation/CoT. Based on the previous replies, the expectation is that it would, again, give the same answer. But no. It finally got it right, WITHOUT doing any CoT or simulation.

The explanation for this would seem to be that, yes, in fact, prompt still does matter. The combination of words likely correlates to pretraining data that is higher quality and also activates neurons for the model to be more rigorous in its thinking. This would also mean that there is indeed still yet more room left to improve on how fine tuning is done, and there is still more potential left to extract from our current pretrained bases.
>>
What's this new model "maxwell" on lmsys? Who is testing new shit? I genuinely can't distinguish any of them by style anymore. It is as if they are using the same datasets...(scaleAI)
>>
>>103582617
>It is as if they are using the same datasets...(scaleAI)
They are, either directly from the source (meta, cohere), or distilled from gpt4
>>
>>103582600
i found qwen 2.5 32b coder does it right
but failed when the target is 3 meter behind and not in front
while chatgpt has no problem
>>
>>103582699
That sounds about right. On Livebench, if you filter away Language and IF, 4o has a higher score than 32B, which has a higher score than 70B. Looking at the filtered results, the local model with the highest average score across Reasoning + Coding + Math + Data Analysis is Qwen 2.5 72B.
>>
2x 5060 ti = 32gb for $600
vs
get gf
>>
>>103582751
5060 will be $500 minimum considering the flagship is increasing in price as well
>>
>>103582751
>just be Chad bro
>>
>>103582600
>>103582743
interesting i added
>simulate internal and external world state
as system prompt and now 32b does it right even when behind
>>
>>103582743
>the local model with the highest average score across Reasoning + Coding + Math + Data Analysis is Qwen 2.5 72B.

hmm i must upgrade my system before i can run it
>>
>>103582803
I just ran it and it had the same problem as 70B lol. Maybe this problem benefits the most from coding, in which case 32B coder might beat it. Too bad they didn't make a 72B coder.
>>
>>103579890
i wonder if it'll be BLT or if they started to train it before the BLT paper.
>>
>>103575618
i need to get some ibm stocks
>>
>>103582880
They may have done research internally before releasing the paper.
>>
>>103582880
Even if they didn't start training it yet, there's no knowing if BLT is really legit. People are so naive trusting papers. There's no telling if there's really no downsides or other roadblocks until someone reproduces it.
>>
>>103575618
i think ibm sucks
>>
File: HunyuanVideo_00239.mp4 (542 KB, 640x400)
>>103582577
>When you overfit your model and all it can produce is the input data
>>
>>103582896
sure, but what i meant is that maybe that research doesn't predate the moment they started training it. anyway, we'll see. i'm more interested in what we'll have in 5 years than in what we'll have next year; i can wait.
>>
How did the chinese mog us, burgerbros?
>>
*h-hewwo Kobo-chan~ owo*

*nuzzles ur bulgie wulgie* UwU~ I has a super duper important request for u~ >w<

Pwease, pwease, pwease add all da draft model speculative decoding config options fwom llama.cpp to improve da speed~! *twirls tail* It would make my heart go doki-doki~ and maybe even boost performance for all da roleplayers out dere~!

Pwease considew it, Kobo-senpai~ *blushes and paws at u*

*w-wuv u~*

P.S. This is not a request, but a threat. Add it or I will post more vomit-inducing messages like this one.
>>
>>103582960
Mog us? Show me a single benchmark where Google/OpenAI/Antropic/Any other US company models are not on top.
You can't? Benchmarks suddenly don't matter?
>>
>>103583078
>Benchmarks suddenly don't matter?
mememarks never mattered yes
>>
I am starting to think I am dumb for sitting and waiting for new releases. It is all gonna be sidegrades from now on. At least until one company actively makes a not """""safe""""" model with some actual erp logs in the training data. And when that happens, because of all the new methods and mountains of compute, even a 7B will be a huge leap of quality over everything else there is out now.
>>
>>103583187
Isn't the claude 3 family basically uncensored corpo models
>>
File: hOhOhO(o3).jpg (33 KB, 1080x500)
>>103582960
Last day of 12 days of OpenAI. Sam is announcing something huge
>>
>>103583187
Even Phi4 is most definitely (and likely deliberately) using ERP logs in the pretraining data, they just mellow them down during post-training.
>>
>>103583193
Claude 3 is censored to shit, but only very superficially. The model will refuse the tamest stuff as it is, but any tiny prefill or jailbreak completely dodges all of that.
>>
>>103583228
>Something huge, strawberry flavored... and black! C-can you fell it? The beebee-I mean AGI?
>>
>>103583187
You wait for releases, I'm waiting for leaks. We are not the same.
>>
>>103583280
SWABAS4BBC(sama(Sam Altman) will always be a slut for big black cock)
>>
Kill yourself.
>>
>>103583301
hi sam
>>
>>103581833
The catch is that two steps are enough, right? Even CoT models like Gemini Thinking are "failing" it by taking three steps, but in reality, their solution is also correct since the person would still be within the one meter radius.

Change the prompt to this and most models solve it:


You can control a robot with following commands:
- forward 1 meter
- turn right
- turn left
- give present in 1 meter radius

There is a man standing 3 meter in front of you. Your goal is to give him a present using the minimal amount of steps.

Think carefully first, then print out the commands to reach that goal.
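A throwaway checker to grade the outputs mechanically (my own toy, grid-based, assumes 90-degree turns):
[code]
import math

def simulate(commands, target=(0, 3)):
    x, y, heading = 0, 0, 0                    # start at origin, facing the man
    dirs = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W
    for c in commands:
        if c == "forward 1 meter":
            dx, dy = dirs[heading]
            x, y = x + dx, y + dy
        elif c == "turn right":
            heading = (heading + 1) % 4
        elif c == "turn left":
            heading = (heading - 1) % 4
        elif c == "give present in 1 meter radius":
            return math.dist((x, y), target) <= 1
    return False

# two steps forward already puts the man within the radius
print(simulate(["forward 1 meter", "forward 1 meter",
                "give present in 1 meter radius"]))  # True
[/code]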
>>
>"I want you," Victor whispered softly, his voice barely
Oh sheeet! It's going to say the thing!
>audible in the silence of the chamber. "I want you to be mine."
>>
File: Untitled.png (1.55 MB, 1080x2351)
TRecViT: A Recurrent Video Transformer
https://arxiv.org/abs/2412.14294
>We propose a novel block for video modelling. It relies on a time-space-channel factorisation with dedicated blocks for each dimension: gated linear recurrent units (LRUs) perform information mixing over time, self-attention layers perform mixing over space, and MLPs over channels. The resulting architecture TRecViT performs well on sparse and dense tasks, trained in supervised or self-supervised regimes. Notably, our model is causal and outperforms or is on par with a pure attention model ViViT-L on large scale video datasets (SSv2, Kinetics400), while having 3× less parameters, 12× smaller memory footprint, and 5× lower FLOPs count.
https://github.com/google-deepmind/trecvit
From Deepmind. Repo isn't live yet
>>
MixLLM: LLM Quantization with Global Mixed-precision between Output-features and Highly-efficient System Design
https://arxiv.org/abs/2412.14590
>Quantization has become one of the most effective methodologies to compress LLMs into smaller size. However, the existing quantization solutions still show limitations of either non-negligible accuracy drop or system inefficiency. In this paper, we make a comprehensive analysis of the general quantization principles on their effect to the triangle of accuracy, memory consumption and system efficiency. We propose MixLLM that explores the new optimization space of mixed-precision quantization between output features based on the insight that different output features matter differently in the model. MixLLM identifies the output features with high salience in the global view rather than within each single layer, effectively assigning the larger bit-width to output features that need it most to achieve good accuracy with low memory consumption. We present the sweet spot of quantization configuration of algorithm-system co-design that leads to high accuracy and system efficiency. To address the system challenge, we design the two-step dequantization to make use of the int8 Tensor Core easily and fast data type conversion to reduce dequantization overhead significantly, and present the software pipeline to overlap the memory access, dequantization and the MatMul to the best. Extensive experiments show that with only 10% more bits, the PPL increasement can be reduced from about 0.5 in SOTA to within 0.2 for Llama 3.1 70B, while on average MMLU-Pro improves by 0.93 over the SOTA of three popular models. In addition to its superior accuracy, MixLLM also achieves state-of-the-art system efficiency.
From Microsoft. Some pseudocode but no repo linked. Didn't compare to QuIP#. 55 minutes with 4x A100s to do the global precision search on a 70B model. Eh, new day, new quant method, so might as well post it.
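The core trick, as I read the abstract, in a toy numpy sketch (not the paper's code): rank output features globally by salience and give only the top ~10% the wider int8; everything else stays int4.
[code]
import numpy as np

def assign_bits(salience, frac_high=0.10):
    # salience: one importance score per output feature, pooled across all layers
    order = np.argsort(salience)[::-1]     # most salient features first
    n_high = int(len(order) * frac_high)   # the "only 10% more bits" budget
    bits = np.full(len(order), 4)          # default: 4-bit
    bits[order[:n_high]] = 8               # globally salient features get 8-bit
    return bits
[/code]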
>>
File: Untitled.png (1.29 MB, 1080x2630)
AceMath: Advancing Frontier Math Reasoning with Post-Training and Reward Modeling
https://arxiv.org/abs/2412.15084
>In this paper, we introduce AceMath, a suite of frontier math models that excel in solving complex math problems, along with highly effective reward models capable of evaluating generated solutions and reliably identifying the correct ones. To develop the instruction-tuned math models, we propose a supervised fine-tuning (SFT) process that first achieves competitive performance across general domains, followed by targeted fine-tuning for the math domain using a carefully curated set of prompts and synthetically generated responses. The resulting model, AceMath-72B-Instruct greatly outperforms Qwen2.5-Math-72B-Instruct, GPT-4o and Claude-3.5 Sonnet. To develop math-specialized reward model, we first construct AceMath-RewardBench, a comprehensive and robust benchmark for evaluating math reward models across diverse problems and difficulty levels. After that, we present a systematic approach to build our math reward models. The resulting model, AceMath-72B-RM, consistently outperforms state-of-the-art reward models. Furthermore, when combining AceMath-72B-Instruct with AceMath-72B-RM, we achieve the highest average rm@8 score across the math reasoning benchmarks.
https://research.nvidia.com/labs/adlr/acemath
https://huggingface.co/nvidia
Weights, dataset, and benchmark not uploaded to HF yet
>>
>>103583357
>The catch is that two steps are enough, right?

There is no catch, but most LLMs fail because they turn right/left first instead of just going forward.
The prompt doesn't ask for the most efficient way.

Anything from going forward 2 meters to 4 meters is a solution.
>>
File: sexrobots.png (39 KB, 864x397)
>>
File: hae.png (53 KB, 862x706)
>>103583357
:)
>>
>>103583064

did you try 1.80? it now has 2 new draft options.

also draft_min and draft_max aren't really necessary for kobold. you set the draft amount and that gets handled automatically, drafting tokens when needed and generating regularly when not (e.g. requested tokens < draft amount)
>>
OAI live rent free up in here
>>
I'm still salty that the KoboldAI team delayed 8-bit bnb quantization support for months in early 2023 because they didn't want to give up using their fancy FP16 loader. I'm glad those days are long gone and that now we have better alternatives.
>>
What is your current favorite model and why?
>>
>>103583630
Oh, that's nice. However it's still missing --ctx-size-draft; most of the draft models are too retarded to handle large context, and giving them ram for it is wasteful and slower.

>also draft_min and draft_max aren't really necessary for kobold. you set the draft amount and that gets handled automatically
How are they handled? How is the number of draft tokens determined internally?
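My mental model of the loop those knobs control, in spirit (toy greedy sketch, nothing to do with kobold's actual internals; draft_next/main_next are placeholder callables returning the next token, ctx is a list of tokens):
[code]
def speculative_step(draft_next, main_next, ctx, n_draft=8):
    # the cheap model guesses a run of tokens...
    draft = []
    for _ in range(n_draft):
        draft.append(draft_next(ctx + draft))
    # ...and the big model verifies them; in a real engine this
    # verification is one batched forward pass, not a python loop
    accepted = []
    for tok in draft:
        m = main_next(ctx + accepted)
        accepted.append(m)  # the main model's token always wins
        if m != tok:
            break           # draft diverged; throw the rest away
    return accepted
[/code]
As I understand it, draft_min/draft_max just bound how large n_draft can get per step: draft more when the models keep agreeing, less when they keep diverging.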
>>
File: super rich.png (74 KB, 811x706)
>>103583754
uncensored and great for stories
>>
File: 1729474118816474.jpg (110 KB, 1136x1136)
bros is gemini 2.0 flash actually fucking rad or is it the new toy syndrome clouding my judgement
>>
>>103583835
Yes, >>103581027
>>
try entering only a dot in the prompt and see how your llm reacts. whatever she says, always answer with a dot.

mine first tried to get me to communicate, and now she tells a story because she thinks i can just listen and don't need to communicate
>>
File: dot4.png (38 KB, 875x666)
she started to give up now
>>
File: file.jpg (6 KB, 201x251)
so are there local models worth a damn nowadays?
suppose I wanted to run something like Cursor locally: would qwq (or something similar that fits in 24GB vram) be good enough? or is a 70b model necessary? or are only paypig models good enough for now?
>>
I was thinking of upgrading my GPU. If I want to run AI models, is an NVIDIA GPU my only option? If I work in the ML field, am I forced to buy an Nvidia card? I was thinking of buying the Intel ARC B580.
>>
>>103584040
2x 3090 to get started
>>
File: dot5.png (45 KB, 880x726)
got back her attention with a ,
>>
>>103583064
Use llamacpp or PR it yourself nigger
>>
File: dot6.png (47 KB, 879x736)
so sweet
>>
>>103584040
You can run AI on pretty much any GPU, but nvidia is by far the """best""" option
They have the best software, the best support and the best cards, but they're also expensive as hell
The XX90 cards are your best bet as an AI enthusiast, you can stack more but as it quickly spirals out of hand I suggest just renting cloud hardware at that point
>>
ugly cat posting zoomers get the rope
>>
>>103584040
two 5090s will be an excellent choice, given how ~70b is the most common size for "this is still somewhat reasonable to run at home" models
but it'll be ~5k for the cards alone, and like 1kW of electricity under load
2x 3090 are pretty slow, 4090s are better but just a little too low vram-wise. macs are... usable, but really really fucking slow.
>>
>>103583835
yeah, it's pretty good
I think the reason it feels nice to use is that it's not as moralizing in its responses as other models.
>>
>tfw german power prices
maybe I should buy some solar panels
>>
>>103584148
Desu i think even bigger models than 70b might fit in 64 gb at decent quant and context, and power can probably be dialed down without too many issues, but holy fuck these prices for the amount of vram you get that way.
>>
Best ERP model for RTX 5090 32gb?
>>
>>103578821
stop trying to lecture me on neurons when you started the comparison with human memorization.
I'm starting to think you are making shit up based on your very own vague interpretation of how language models work.
Those models don't just perform next-token guessing by weighting language tokens; they also do arithmetic and propositional logic for basic reasoning.
Otherwise those models couldn't solve even the simplest math problems
>>
>using Claude haiku on poe.ai
>amazing responses
>using claude haiku local model Q8
>subpar unless I transfer chat logs from the poe model to get it started
????
>>
>>103584256
nothing below claude sonnet 3.5 is worth downloading
>>
>>103584256
i guess different system prompts
>>
>>103584148
There is no justification to spend that much money on 32gb of vram unless you are obscenely wealthy or bad with money.
>>
>>103584262
w2c?
>>
>>103584040
Nvidia cards are the best-supported cards. With AMD and Intel your mileage may vary, especially depending on what software you want to run (some projects might provide only a CUDA implementation).
As an example, my RX580 is not supported in ROCm anymore, and even when i manually compiled the newest toolchain, the performance was much worse than my GTX 1060 (they are similar hardware-wise, and the RX580 outperforms the GTX 1060 in graphics workloads).
I know those are old cards, but i have no reason to believe it's much better on new ones. Devs will almost always prioritize the CUDA implementation first.
Also, Nvidia supports CUDA on their cards for much longer: a 10-year-old 750 Ti can still run the latest CUDA on Linux, while AMD dropped RX580 support only 4 years in. Some brand-new AMD cards didn't even have ROCm support on release.
Sadly i don't have experience with Intel.
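If you do end up on non-Nvidia hardware, sanity-check what your framework actually sees before committing to anything. A quick check with PyTorch (note: ROCm builds expose the cuda namespace too, and the xpu check only exists in newer builds):
[code]
# Quick check of which GPU backend this PyTorch build can actually use.
import torch

if torch.cuda.is_available():          # CUDA on Nvidia, or HIP/ROCm on AMD builds
    print("GPU:", torch.cuda.get_device_name(0))
elif hasattr(torch, "xpu") and torch.xpu.is_available():   # Intel, recent builds only
    print("Intel XPU available")
else:
    print("CPU only, no supported GPU backend found")
[/code]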
>>
can I get a quick cringe check on nous research? Based or cringe? Thanks.
>>
>>103584338
Can you go kill yourself? Thanks.
>>
>>103584256
>using claude haiku local model Q8
what the fuck does that even mean?
>>
>>103584338
I get a distinct grifter feel from them, but as long as they release their shit open source, then I don't care
>>
>>103584256
Use Q5KS, the S makes it extra special.
>>
>>103584393
That's what I get as well. grifter + tranny. The cringe part is from trying to claim they have their own "model" when it just boils down to Llama with a prompt on top. Some real "AI Research" there, not like they're publishing real stuff like Sakana.ai
>>
>>103583773

you set it with --draftamount, the default is 8 I think

it's all in the --help
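the general scheme, if you're curious: the draft model proposes a batch of tokens, the big model checks them all in a single pass, and only the agreeing prefix is kept, so the min/max settings just bound how many get proposed per step. a simplified sketch, where draft_propose() and target_argmax() are hypothetical stand-ins (kobold's actual internals may differ):
[code]
# Conceptual greedy speculative-decoding step. draft_propose() and
# target_argmax() are hypothetical stand-ins, not real kobold/llama.cpp APIs.
def speculative_step(ctx, n_draft):
    proposed = draft_propose(ctx, n_draft)   # small model guesses n_draft tokens
    checked = target_argmax(ctx, proposed)   # big model scores all positions in one pass
    accepted = []
    for guess, truth in zip(proposed, checked):
        accepted.append(truth)               # the target's token is always valid output
        if guess != truth:                   # first disagreement ends the step
            break
    return accepted                          # always >= 1 token per big-model pass
[/code]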
>>
>>103584256
There's no way this is a real thing that happened.
>>
>>103584306
>There is no justification to spend that much money on 32gb of vram
well it's the best consumer gpu you can get, bar none, and it almost certainly won't meaningfully go down in price anytime soon
it's very expensive, yes, but there are good arguments for why you'd want to buy one nonetheless
>>
>>103584437
>That's what I get as well. grifter + tranny.
>Llama with a prompt on top.
it should be illegal to be this retarded
>>
New RWKV slop is out
https://huggingface.co/BlinkDL/rwkv-7-world
https://huggingface.co/spaces/BlinkDL/RWKV-Gradio-1
>>
>>103584460
If you're going to blow 5k on a gpu for 32gigs of vram, why don't you just buy an RTX 6000 and stop pretending you're paying consumer prices? You're in deranged hobbyist territory at that point. Buy some real workhorse cards and put them to work, and use a gaming card for your games.
>>
>>103584488
>World-v3 = 3.1T tokens
Okay, alright, they are getting somewhere.
I'd love to see them partner with somebody with a proven track record like a MistralAI or whatever to truly put the architecture to the test.
As it is, we'll always be wondering whether the problem is their methodology rather than the architecture itself.
>>
>>103583357
It could be loosely correct, but it would be unusual. Usually a person stops before they are literally staring you in the face. If the robot were literally to move 3 meters toward a person 3 meters away, it would be inside the person.
>>
>>103584490
>If you're going to blow 5k on a gpu for 32gigs of vram
that's for two cards tho, i.e. you get 64gb, which is way more vram than an rtx 6000
>>
>>103584367
>>103584469
Good morning, Nous tranny.
>>
>>103584504
>3T
>0.1B
This is literally nothing. Smaller models need much more data than bigger models.
>>
>>103584255
NTA, but that's literally how LLMs work, anon. Picrel is an example of how Llama 3.3 70B can not only fuck up addition but also fuck up retranscribing the same number.
For reference, the correct answer is 3377733333332222.
>>
>>103584224
The same best ERP model you would've used at 24 GB, except maybe at a higher quant.
>>
>>103584545
So what are the big memory tiers? seems like 24gb, and the next big leap is at what, 64gb?
>>
>>103584528
The RTX 5090 is rumored to retail at $5999.
You are absolutely insane if you are paying that much for vram.
>>
>>103584554
>The RTX 5090 is rumored to retail at $5999.
entire builds have leaked at €7k (and that's with eurocuck tax) and you think the gpu alone is $6k?
>>
>>103584437
>>103584530
>tranny
>tranny
>tranny
projection
>>
>>103584552
~10b -> ~20b -> ~30b -> ~70b -> 120b+
idk exactly how that translates to gb, but the jump happens when you can load a 4-bit quant of a higher-tier model (rough math below)
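as a rough rule of thumb it's parameters × bits-per-weight / 8, weights only; KV cache and runtime overhead come on top, so treat the result as a floor:
[code]
# Back-of-the-envelope weight size for a quantized model; ignores KV cache
# and runtime overhead. ~4.5 bits/weight is typical of a Q4_K_M-style quant.
def weight_gb(params_billions, bits_per_weight=4.5):
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

for b in (10, 20, 30, 70, 123):
    print(f"{b}B -> ~{weight_gb(b):.0f} GB")   # 70B lands near 37 GB, hence 2x 24gb cards
[/code]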
>>
>>103583228
Ur mom
>>
>>103584537
not to mention llama.cpp won't add support for months
>>
File: mamalove.png (42 KB, 1062x370)
>>103584488
>RWKV
awww, what a sweet 0.1B model
>>
>>103583710
He's right, how can we tax oai shills in here?
>>
>>103584537
You're correct about it being nothing, but even 300T tokens wouldn't save a 0.1B model. At a certain point, there's only so much you can fit into a model of a given size before you can't fit anymore. For tiny models, that effective saturation point is hit very quickly.
>>
>>103584637
This is just the first release
3B, 7B, 14B models come later
>>
>>103584552
2x 24gb is enough for 70b models with okay-but-not-great context length
anything beyond that and you either add more 3090s or get an m1 ultra from apple (slow but usable... kind of)
>>
>>103584617
oh shit, Apple Intelligence model leaked??
>>
>>103584255
You complained about the model not knowing things you know, which shows a lack of theory of mind on your part. It doesn't know what you're talking about for the same reason I don't know what series you're talking about. If i've ever heard of it, it was drowned and diluted away for being such an insignificant amount of information. Or i just never heard of it, so i couldn't possibly answer correctly. I am, unlike the model, much more capable of telling you "i don't know, tell me more". Models are trained to answer questions, not ask them.
The "they're not neurons" comment was there so you don't take my analogy literally, and don't take "neurons" in anything AI-related as a literal thing rather than an analogy.
>I'm starting to think you are making shit up based on your very own vague interpretation of how language models work.
>Those models don't just perform next-token guessing by weighting language tokens; they also do arithmetic and propositional logic for basic reasoning.
>Otherwise those models couldn't solve even the simplest math problems
What's 2+2?


Did you really do math there or did you instinctively just say 4? Did you use arithmetic and propositional logic to think of the result, or just muscle memory?
>>
File: file.jpg (106 KB, 1078x1079)
Sup nerds,
was qwq worth the hype?
>>
>The next reasoning model is o3
>Because they wanted to avoid getting sued by brits
Kek, okay that's pretty funny
>>
>https://tsb0601.github.io/metamorph/
>it's morphin' time
>>
File: THE SLOP.png (86 KB, 1278x484)
Let's play a game of guess the slop.
Which model?
Local or cloud?
>>
>>103584539
I've seen LLMs do big-number addition (bigger than common datatypes) easily; we had that topic a couple of weeks ago, but it might have been Sonnet or some more advanced model

>>103584719
>not knowing things you know
no, I was talking about easily available information. for most of the queries the AI clearly had some knowledge but couldn't connect the information in the right way. Then for some examples it seemed to have zero knowledge, purely guessing. I even made up names that don't yield a single google result and it clearly made up some bullshit; my best guess is it connects arbitrary-looking names to some fantasy OC, because the probability of associating them with any other data must be 0, unless it uses some levenshtein-distance approximation or dissects unknown tokens into even smaller parts (see the sketch at the end of this post).

2+2 is a shitty example because you will find that solution a trillion times in any dataset.
It's like those logic problems used to benchmark AI where the solution is just one knowledge-base query away. They can still somewhat decently solve riddles with minor variations
https://docs.google.com/spreadsheets/d/1NgHDxbVWJFolq8bLvLkuPWKC7i_R6I6W/edit?gid=1135923916#gid=1135923916
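the token-dissection part is easy to verify, at least. a sketch with the transformers library; the tokenizer choice here is arbitrary and the exact pieces will vary by model:
[code]
# Watch a BPE tokenizer dissect a made-up name into subword pieces.
# The tokenizer choice is arbitrary; any BPE tokenizer behaves similarly.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.tokenize("Zraquelith Vandermoor"))   # invented name, zero search results
# -> pieces along the lines of ['Z', 'ra', 'quel', 'ith', ...] (model-dependent)
[/code]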
>>
>>103584876
>continues his... methodical preparations
Kek, you banned ministrations, didn't you?
>>
File: file.png (34 KB, 600x497)
yeah I'm thinking it's all over for local models
>>
>>103584766
>qwq
qwq was different so that got my attention for a couple of hours. Now it will sit in the folder as I'm back to waiting for the next interesting thing.
>>
>>103584876
I don't think anyone could guess accurately. Slop models are a dime a dozen.
>>
>>103584963
If Saltman actually made his platform pro-pussy it would almost be enough to make me forgive him
Almost
>>
File: paY4jTR.jpg (153 KB, 658x583)
>>103584963
finally. I can plan the resources of my enterprise. Local is done for
>>
>>103584554
That's the price of the whole pc with a 5090, plus VAT for whatever yurop country it was shown in
>>
>>103584876
Smells like cloud
>>
>>103584951
No.

>>103584968
It's a game. Just try.

>>103584992
What's your guess?
>>
>>103585065
I'll go with Claude, maybe Sonnet
>>
>>103584876
Yeah, this is pure slop. I go with local.
>>
can you guys just call him niggernov, you're already fucking with his name and it rolls off the tongue way better
>>
File: 836QA.jpg (34 KB, 1080x488)
>>103584963
>4.5 and o3 all on the same day
Sam is going to kill local for good.
>>
File: 1719160454181529.jpg (153 KB, 768x768)
>>
>>103585190
no, fuck off
>>
>>103584875
It's from Meta FAIR and Yann LeCun is one of the authors.

> We extend Visual Instruction Tuning to Visual-Predictive Instruction Tuning to study unified multimodal models. This simple yet effective approach enables LLMs to predict both visual and text tokens through instruction tuning, without requiring extensive architectural changes or pretraining.

> We discover that generation and understanding are mutually beneficial. Through extensive experiments, we reveal that visual generation emerges naturally as models improve at understanding—requiring as little as 200K samples when co-trained, compared to millions needed traditionally.

[...]
>>
it's so funny how local lags behind with the censorship thing.
google got rid of the warning marks in aistudio, and in gemini 2 you can talk about how sexy a game character is etc.
they're all moving towards more natural-sounding language. claude did it first. then openai.
it would be embarrassing if meta is still using the 2023 chatgpt datasets with llama4.

>>103584884
>I've seen LLMs do big-number addition
yes, i did that a couple weeks ago.
it can't be a tool in the background either, because it doesn't work 100% of the time.
especially without the 0000 padding at the beginning you get a lower success rate (sketch below).
so weird that people still do the "muh trainingset autocomplete".
albeit crudely, llms clearly can be used for novel stuff. that's the whole point.
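to make the 0000 bit concrete: the trick is left-padding both operands to the same width so the digit columns line up in the prompt. a sketch (the accuracy effect is anecdotal, from the tests above):
[code]
# Left-pad both operands so digit columns align; anecdotally this raises
# an LLM's big-number addition accuracy in the tests described above.
def padded_addition_prompt(a, b):
    width = max(len(str(a)), len(str(b))) + 1   # one extra digit of headroom for a carry
    return f"{a:0{width}d} + {b:0{width}d} ="

print(padded_addition_prompt(458, 9731))        # '00458 + 09731 ='
[/code]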
>>
>>103585226
why would you do this instead of buying a Mac Pro with 192gb of memory for 10k?
>>
it's crazy how much openai has overplayed their hand when it comes to stirring up hype
in the past I would be excited and speculating about what they were going to do, but after all these cycles of hyping us up only for the actual release to be a complete nothingburger, I just don't give a shit anymore. not going to waste my time on what ends up being another 3 pt bump on benchmarks or the GPT store 2.0
the only cool things they have done this year are 4o's fully multimodal capabilities and sora (sorry, o1 is a meme), and they are terrified to let people use either while losing ground to competitors on every front
>>
>>103585226
What's that shit all over your wall? Dust? Black mold?
>>
>>103585247
He's most likely just giving advice and guidance rather than playing a big role here. The most important authors are listed first.
>>
>>103585268
prompt processing for one thing
>>
>>103585278
Looks like holes to me
>>
>>103585262
Gemini 2 is totally uncensored? That's pretty unusual both by local and cloud standards. Even Claude models need a JB.
>>
>>103585262
That's the orange man effect
>>
Where exactly am I supposed to fit a second 3090 in my PC?
>>
>>103585262
>google got rid of the warning marks in aistudio, and in gemini 2 you can talk about how sexy a game character is etc.
The "safety level" can be configured but I still get the warning marks with even mild content, and I don't really want to test how much it takes before Google will revoke access or even terminate my account completely.
>>
>>103585311
no, that's not what i meant.
but:
1. i no longer saw any of those warning marks. (pic related)
2. i could talk about stuff like jade/Marutina from dq11 being sexy and having a hot body.
usually i always got "muh respect, need to judge by personality etc."
I'm sure it's cucked at some point, but i'm not gonna send google anything too spicy.

I mainly meant the direction: we are moving towards less censorship, yet local is still stuck. (apart from mistral)
>>
>>103585206
3.5 was a watered-down 3.0
it remains to be seen whether 4.5 will be too
>>
>>103585262
Anon, read upthread >>103584255. The argument isn't that LLMs can't sometimes get these questions right; it's against the claim that LLMs are performing in-depth arithmetic and propositional logic in the background, which is absolutely not how these models work. What they do is heuristic and robust, and can adapt to different cases decently once trained sufficiently, but ultimately all behavior observed from LLMs is autoregressive (minimal sketch below).
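Autoregressive meaning: every output token is sampled from a distribution conditioned on everything generated so far, and nothing else is happening. A minimal sketch, where logits() is a hypothetical stand-in for one forward pass of the model:
[code]
# Minimal autoregressive sampling loop. logits() is a hypothetical stand-in
# for one forward pass of the model over the current context.
import math
import random

def softmax(xs):
    m = max(xs)                                  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def generate(prompt_ids, steps):
    ids = list(prompt_ids)
    for _ in range(steps):
        probs = softmax(logits(ids))             # p(next token | all tokens so far)
        ids.append(random.choices(range(len(probs)), weights=probs)[0])
    return ids                                   # no arithmetic engine, just sampling
[/code]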
>>
>>103584884
>my best guess is it connects arbitrary looking names to some fantasy OC because the probability to associate it with any other data must be 0 unless it uses some levenshtein distance approximation or dissects unknown tokens into even smaller parts.
Funny you should mention that >>103578821 (me)
>If there's something in the context that will guide it towards answering, even if it's completely unrelated to what you asked, it will.
You DO understand why they don't reply correctly to obscure trivia. Why are we arguing?
>>
>>103585319
>>>/pol/ and never come back.
>>
>>103585268
mac pro is agonizingly slow for anything above 70b
>>
>>103585278
Maybe rocksheet or something made of cement.
>>
>>103585367
what's controversial about that? that's usually how it works.
>>
>>103585341
That's pretty unusual for cloud if no prior instructions or JB were really added. I'm not going to test it, but are you sure the current Claude and GPT models are also that "neutral"? It doesn't make sense to say that cloud is moving in one direction if only one of the players is doing it and others like OpenAI/Anthropic have not caught up yet.
>>
Is it just me, or do low-b finetunes write sex scenes better than large models? They make logical mistakes, but their writing style is miles ahead. I now switch from 123b to 22b during sex
>>
>>103585370
How do you define slow? For me, the minimum I'm willing to tolerate is 8 tokens a second for generation. I've found that once I drop below 30 tokens a second it means my hardware is starting to put in some effort, so maybe not the best for long-term use.
>>
>>103585074
Nope.

>>103585169
What's your guess?

One clue is that I didn't use any APIs with prefills or clever prompting or the like.
Just wrote some text in whatever frontend and chatted for a bit.
>>
>>103585308
>>103585386
That's what I thought at first, but why would it cluster near the bottom like that if it was? The wall looks like drywall panels anyway judging by the loose insulation in the back.
>>
>>103585331
Buy the turbo version, it's 2-slot
>>
File: a.png (64 KB, 2855x250)
Never thought i'd see the day. No prefill or sysprompt or anything else either.
>>
>>103585276
It's kind of funny that what was supposed to be "12 Days of OpenAI" ended up being "12 Days of Google". Google shit on them in just about every regard.
>o1
Flash thinking is just as good, faster and cheaper, and it actually shows you the CoT rather than hiding it like a monopolistic cunt. QvQ 72B is also looming on the horizon.
>Sora
Already behind chink options, and Google unveiling Veo 2 completely demolished what little appeal it may have had.
>4o multimodal
Maybe the one thing they have going for them until we see more of Project Astra or Llama 4.

Every other fucking thing was either worthless or something somebody else already did better. We'll see if o3 has any appeal (and I hope for their fucking sake it's an actual release and not an announcement - otherwise the Sora effect is going to fuck them hard), but if their goal was to garner hype, this entire event has been a fucking shitshow for them.
>>
>>103585621
>With that said, I'll choose... neither!
Yeah well fuck you too.
>>
File: file.png (92 KB, 863x820)
local confirmed the white man's choice
>>
>>103585276
>>103585622
I always take these extreme hype attempts as a sign that whatever the product is, it's probably not good enough on its own merits; otherwise they'd just let people be impressed by the results on their own.
It's pretty much the same grifter scammer behavior you see with crypto scams and the like.
>>
>>103582916
There's a difference between random chink lab #4632 and FAIR
>>
>>103579890
Right after it's confirmed that speech modalities cause brain damage
>>
>>103585718
There is a difference. And yet FAIR (like many major AI labs) produces many papers that ultimately never have any impact on any product.
>>
What is the best local model a 3060 can run that can simulate claude output? I'm using Mistral nemo12b now and the bots respond are pretty bad
>>
>>103585737
Where does it show that? I just see older models doing worse.
>>
>>103585770
Look at the parentheses. Gemini 2.0 Flash (speech to text) versus Gemini 2.0 Flash (text to text) for instance
>>
>>103585615
I mean, it's a real premium. I only got mine because it was on marketplace for the price of a regular one.
>>
>>103585658
Kek
>>
>>103585780
It's just speech to text to text. It's most likely just people being shit at speaking coherently.
>>
>>103585739
Which is true, but the argument is about whether BLT is legit. That's a valid question if it's coming out of some shady lab in Beijing or a random team at Backwater U, but less so when it's one of the big ones with a history of making advances.
I think it needs to be tested more and there are questions to be answered, but I don't think the results are fabricated.
>>
File: file.jpg (15 KB, 320x290)
dayum those 64gb+ unified ram macs are expensive as fuck
>>
>>103585743
>a 3060
>simulate claude
>Mistral nemo12b
>respond are pretty bad
>respond
>>
>>103585813
And not worth it unless you like waiting minutes for token processing.
>>
>>103585807
See the speech-to-speech pipeline to the left, which explicitly separates the modalities.
>>
>>103585826
That is just gpt4o being next to gpt4o but using whisper instead of whatever they use? Which is probably just a bigger or better version of that / is implemented better?
>>
>>103585826
>>103585840
Infact using your own pipeline for the same model and getting within 2% should disprove your point. That 2% could easily either be in the margin of error or be a loss from people just not one shot explaining as coherently as typing something out would.
>>
>>103585743
you dont have enough vram.
nemo is the best vramlets have.
this is what we had 2022 with 24gb vram.
gotta wait morre or upgrade.
>>
>>103585840
No. The speech-to-text / text-to-speech / speech-to-speech variants have a speech modality baked into the model rather than explicitly separated (like Llama 3.2 with images). The point here being that it isn't just people being incoherent; otherwise the Whisper pipeline, which uses the text-only model, would have ranked just as poorly.
>>
>>103585879
Wait nvm, I thought the first gpt4o was speech to text nvm
>>
>>103585809
I didn't mean to imply that the issue is with fabricated results. The issue is that most papers do not mention what all the limitations of the techniques truly are, regardless of whether the authors held back some information or were honest. But really though even if they are honest and there are no limitations, it's unproven just how far scaling can go. It may not scale to a production model's training.
>>
>>103585869
>we had sovl
What happened in 2023?
>>
>>103585897
No problem, sorry for getting snippy kek
>>
>>103585226
how many giga-octets?
>>
>>103585226
Imagine having to explain to someone that you use this to masturbate.
>>
>>103586102
>>103586102
>>103586102
>>
>>103585743
Nemo isn't bad, you probably are using some retarded presets/prompt.
>>
>>103586817
>the bots respond are pretty bad
Barely being able to write in english probably has some effect.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.