/g/ - Technology






File: 1719160454181529.jpg (153 KB, 768x768)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103575618 & >>103565507

►News
>(12/20) RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
>(12/19) Finally, a Replacement for BERT: https://hf.co/blog/modernbert
>(12/18) Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on completely open data: https://hf.co/blog/bamba
>(12/18) Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct/tree/main

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 6432347254.png (83 KB, 296x256)
►Recent Highlights from the Previous Thread: >>103575618

--Papers:
>103583427 >103583492 >103583550
--Llama 4 and the future of AI development:
>103579890 >103579911 >103579974 >103580138 >103580155 >103580159 >103579969 >103580010 >103581856 >103581887 >103581923 >103582015 >103582896 >103582958
--Discussion on Qwen QVQ-72B-Preview, Gemini 2.0 Flash Thinking, and the AI landscape:
>103580179 >103580195 >103580355 >103580371 >103580391 >103580402 >103580497 >103580552 >103580588 >103580717 >103580786 >103580855 >103580876 >103580944
--Language models struggle with factual data and pop culture trivia:
>103578350 >103578364 >103578383 >103578469 >103578519 >103578716 >103578821 >103584539 >103584719 >103584884 >103585311 >103585341 >103585446 >103585363
--Anons react to Nvidia's pricey GeForce RTX 5090 and RTX 5080 GPUs:
>103576931 >103577065 >103577557 >103577737 >103579888 >103580044 >103580057 >103582238 >103582290
--Models' performance on robot control problem:
>103581833 >103582600 >103582699 >103582743 >103582835 >103582776 >103583357 >103583561 >103583598 >103584525
--Deepseek performance and capabilities discussion:
>103580975 >103581213 >103581235 >103581642
--Anon struggles with overfitting in fine-tuning model on philosophical texts:
>103577773 >103582577 >103577892
--RWKV release and upcoming models:
>103584488 >103584504 >103584637 >103584667
--ggerganov removes context extension feature and adds OuteTTS support:
>103575876 >103579636
--MetaMorph: Multimodal understanding and generation via instruction tuning:
>103584875 >103585247 >103585279
--Anon discusses frontend issue with token truncation and context shifting:
>103575718 >103575776
--ModernBERT, a Replacement for BERT:
>103576299 >103576436
--Gemini models outperform others in AI model comparison:
>103579662 >103581027
--Rin (free space):
>103581038 >103581670

►Recent Highlight Posts from the Previous Thread: >>103575625

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
So we're just not going to get any more new models? The past few months felt like a whole bunch of absolutely nothing.
>>
>>103586169
have you used every model yet? no? ok then you still have new models to try
>>
>>103586187
downgrades
>>
>>103586102
Can this run unquanted 405B?
>>
>>103586169
QwQ and Qwen 32B are good models for coders.
>>
>>103586169
3.3 was big. Try it with a half decent system prompt. Smarter and writes so much better than old llama did.
>>
>>103586169
They're training llama 4 right now thoughbeit
>>
>>103586169
Deepseek, Hunyuan Large
>>
>>103586169
In fact, try out 3.3 abliterated. It doesn't get retarded like the meme tunes make it, but it gets filthy as fuck.
https://huggingface.co/huihui-ai/Llama-3.3-70B-Instruct-abliterated
>>
>>103586208
>Hunyuan Large
still no lcpp support...
>>
>>103586169
Tulu and Nemotron prove that we likely now have the open source datasets to do an assistant fine tune roughly on par with the official Llama tuning.
>>
Tested new Kobo, works much better than previous version, but is still a bit slower than llama.cpp.
Again, Kobo, your defaults suck, let me pick draft min, max and context REEEEEEEEE
>>
>>103586231
Just use vLLM bro.
>>
So did deepsneed ever release that r1 model? What happened with that?
>>
>>103586260
It's their Yi-large. Turned out to be too good to release open source.
>>
Nous is made by trannies.
>>
File: 836QA.jpg (34 KB, 1080x488)
>>103586169
you will but from sam when he btfo local in 1 hour and 23 minutes
>>
>>103586285
>implying they'll release it and not just show you some benchmarks
>>
>>103586285
GPT4-ooo
>>
>>103586273
Nous hasn't improved since the llama2 days. They are still using the old GPTslop dataset (with refusals still in it!) and expect people to like them for it. Calling them trannies is unfair though; they were okay in the past. Let's just say they are grifters who keep recycling old shit.
>>
File: 21522 - SoyBooru.png (46 KB, 457x694)
'berry 'll 'fo 'cal
>>
>>103586376
*$2000 tier. You aren't poor, right?
>>
>>103586346
There is a reason that they not only hide their faces from the public, but post so many girl drawings on their site. The only thing they have going for them is the stolen art style.
>>
File: file.png (1.1 MB, 949x948)
>>103586417
>post so many girl drawings on their site
?
>>
>llama.cpp receives qwen2-VL support
>"Koboldcpp v1.80 released with Qwen2-VL support!"
kek why are the koboldshitters so desperate to promote their glorified fork
>>
File: 1620298405912.png (467 KB, 425x948)
You called?
>>
oh oh oh mistress
>>
>>103586501
Hi ooba.
>>
>>103586549
Obsessed
>>
>>103586586
ouch, he nailed it huh?
>>
File: 1734288309540744.jpg (486 KB, 1464x1596)
>>
>>103586238
WHAT THE ACTUAL FUCK?! Why does the draft model want to use the same context length as the base model in kobo? Why not the context length defined in the draft model? No wonder this shit is slower than llama. KOBOOOOOOOO! FIX IT! Or add all the options so I can do it myself. I WILL REDEEM IT!
>>
File: 1730294768568525.png (615 KB, 1032x572)
True >>103586464. Why isn't there a single container, as a flatpak or something, with toggles for everything? For example, if you wanted to use this TTS with that type of chat style, plus this other AI to search the internet, etc., it would just work.
>>
>>103586677
>Why does draft model want to use the same context length as base model in kobo? Why not the context length defined in draft model?
Wait, it actually does? Koboldcucks, our response?
>>
>>103586723
I became an Aphroditechad.
>>
>>103586716
don't make me read that wall of brainlet seethe again
>>
>>103586450
why are we suddenly discussing them? did they release something new?
>>
>>103585262
You can't really develop anything serious on models that will quasi-randomly respond "I cannot answer that request." or that will go off on a moral tirade and propose doing something completely different from what was requested in the prompt.

Some people (including quite a few retards in the industry) will say "just finetune it bro", but it's not always feasible without unnecessary expenses, and there's no guarantee of maintaining the original model's performance.
>>
File: 1729966418724293.png (821 KB, 848x1024)
>>103586845
Because the new trolls have found something to stir the shit with since people aren't biting too much on the usual shit they throw out

>>103586716
I'm literally building this and will refuse to release a single binary executable just to highlight skill issues such as this.
>>
>>103586845
dunno
>>103586273
>>
>>103586902
>I'm literally building this and will refuse to release a single binary executable just to highlight skill issues such as this.
Yes, of course you are and not just being butthurt. Cry more.
>>
>>103586902
>I'm literally building this and will refuse to release a single binary executable just to highlight skill issues such as this.
I will create a fork with no other change besides a github actions workflow to build a binary executable to make it more accessible just to spite your gatekeeping ass, though we both know you're a larping nocoder
>>
Judging by lmsys, deepseek really tried to up the personality of their new model. It's a real shame that it's a fuckhuge MoE that's out of range for most people.
>>
>>103587002
>lmsys
192GB ram is much cheaper than a multi gpu setup
>>
Really wish there was a quality mathematics encyclopedia and proofs model. They all kinda suck tbqh.
>>
sam altman likes big benchmaxxed chatbots
>>
>>103587002
Which one?
>>
o3 mini and o3 confirmed holy shit openai won
>>
>>103587204
links
>>
>>103587204
wat hapened to o2
>>
87.7 GPQA he won
>>
I actually feel bad for OpenAI
>be the first to experiment with shit and prove that it works
>because it works, everybody else copies you and puts you out of business
>>
>>103587221
It turned out to be "uh oh...".
>>
>>103587269
If they were still open, none of this would be a problem in the first place.
>>
File: 10anfw9mm18e1.png (47 KB, 1412x707)
>>
>>103586366
o3 'berry 'on
>>
>>103587328
>OpenAI
Do they run as closed as possible because they think it's funny? Are they self-aware at all?
>>
>>103586169
What do you mean? We got many new open source models in the 7b to 10b range!
>>
File: 1733182381315441.jpg (121 KB, 600x600)
>>103586927
>>103586998
>>
>>103587323
Fuck I can't believe OAI won, help me come up with some new FUD sistas
>>
>>103586285
I have access to o3
it's a specialized model, not one that's just o1 but better
>>
>>103587335
They are 100% open about their benchmark scores and (usually) allow the public to access the results of their research through their API (for a modest subscription fee)
>>
>>103586306
called it
>>
File: uarsvwbln18e1.png (66 KB, 1822x971)
>>103587384
Superhuman performance on ARC-AGI too.
>>
File: 24241.png (196 KB, 1870x1080)
>>103587384
AGI is close
>>
>>103587328
>CoT papers released 2 years ago
>google, sitting on a mountain of TPUs, did nothing
>o1 comes out
>suddenly they have gemini thonking
I actually have no idea what the "researchers" at the big labs are doing with their hardware. The last Meta guy was given 40 million H100 hours to basically prove ah yes more betterer data make better models
>>
>no o3 access for paypigs
>need to go through additional humiliation ritual
Cloud, never ever.
>>
>>103587416
>I actually have no idea what the "researchers" at the big labs are doing with their hardware
Almost exclusively alignment and safety research, sadly
>>
>>103587323
Damn... But even if this is true, I won't be using anything short of AGI if it still costs $60/M, like o1.
>>
>>103587323
is there a stream link?
>>
>>103587413
Does this mean they won the arc prize?
>>
>>103587413
https://xcancel.com/fchollet/status/1870169764762710376
>It scores 75.7% on the semi-private eval in low-compute mode (for $20 per task in compute ) and 87.5% in high-compute mode (thousands of $ per task)
uhh paypiggie bros...?
>>
Better question: can o3 ERP without slop? If it can't, it's worthless.
>>
>>103587414
what the fuck do you mean by AGI, Absurdly Generated Ideas? llm cannot agi.
>>
>>103587454
only if they open-source it.
>>
>>103587469
>llm cannot agi.
stop coping, LeCunn.
all you need is attention (and billions of xor gates)
>>
>>103587454
>No, the ARC Prize competition targets the fully private set (a different, somewhat harder eval) and takes place on Kaggle, where your solutions must run within a fixed amount of compute (about $0.10 per task). We are committed to keep running the competition until someone submits (and open-sources) a solution that crosses the 85% threshold.
>>
>>103587489
>xor
nand
>>
File: strawberry-sam_altman.gif (307 KB, 275x400)
insider here. o3 just made this animation of happy sama jumping. if you don't believe it's agi, or even asi, you must be blind. only superhuman intelligence could produce such realistic imagery.
>>
>>103587467
Slop is irrelevant. The only thing that matters is that a model knows that it can't talk with a dick in its mouth and Tulu 3 already solved that.
>>
>oai derangement syndrome
>>
File: 4e1.jpg (104 KB, 3088x1440)
Thank you samta claus
>>
>>103587595
>Thank you for creating a product I can pay for
>Consoooom
Fucking npc monkey
>>
Sam did it
>>
>>103587185
2.5-1210
>>
>>103587618
>2.5-1210
I can confirm that 1210 is a nice upgrade for anyone that can run it
>>
Thirdies can't understand the "spend money to make money" concept
>>
File: file.jpg (281 KB, 1200x900)
>>103587466
>thousands of $ per task

Merry Christmas NVIDIA!
>>
>>103587413
Bro just a normal smart person can reach like 100 on that thing. Did you even go look at the details of what that benchmark?
>>
>>103587669
*of what that benchmark does?
>>
>>103587637
0.1 OpenAI™ credits have been deposited in your account.
>>
>>103587595
>Thank you for creating a product I can pay for
>Consoooom
I'm thanking them for hopefully driving the chinese into a froth, compelling them to give us competing free models.
The longer oai can keep the hype train going, the longer free models will be needed to stay relevant vs. their massive public mindshare
>>
>>103587466
uh, agi?
>>
Wonder how well the new B580's can do AI
>>
>>103586511
jesus, i finally got this
>>
File: 1709207064583257.png (218 KB, 2191x1603)
>>103587323
>2727 elo
holy shit this is a big deal
>>
File: file.png (63 KB, 1200x675)
GUYS THEY FUCKING SOLVED ARC
IT'S JUST AN EFFICIENCY QUESTION
>>
>>103587450
>is there a stream link?
https://www.youtube.com/watch?v=SKBG1sqdyIU
>>
>>103587929
Local models general, faggot retard
>>
>>103587489
I dunno man, I cannot see it when models are looking at static data and statistically approximating whatever response they "think" is right based on their parameters. We created something really fucking cool and I love it for what it is. But it's just a model, like how a calculator is a mathematical/computational model. It's a really fucking cool tool, but that's it.
>>
>>103587323
Sam said on his stream that they got those scores before the "safety testing", they're gonna be way worse after the lobotomy, nothingburger
>>
>>103588002
You don't understand anon, it's here
>>
ANTHROPIC OPEN SOURCE RELEASE
OH SHIT
https://www.anthropic.com/news/model-context-protocol
>>
>>103587323
>have made a groundbreaking model
>doesn't call it gpt5
something smells fishy there
>>
>make improvement
>lobotomize it
>end result with no improvement

every fucking time
>>
>>103588006
Fucking this.
Token generation is but one part of the human brain. We're missing several "services" working in tandem to intelligently parse and return data in real time.
>>
>>103588019
>Nov 25, 2024
You are a nigger
>>
>>103588013
I
Don't
Care
>>
What gives tourists the idea that this is the openai shitposting general? Go talk about it in /aicg/ or make a thread for it on /g/. Fuck off.
>>
>>103588053
They come here, because this place is the only one with people who actually know what they're talking about.
>>
>>103588061
That's an EFFICIENCY question, do you understand now?
>>
>>103588065
>this place is the only one with people who actually know what they're talking about
this
>>
File: DI1cJCq.jpg (98 KB, 679x377)
>>103588045
>>
>>103588053
because you're a tourist yourself
and you don't know that local models are trained on synthetic data from models like o1 or claude 3.5
and that's why a lot of models are slop shit - all of them trained on gptslop or claudeslop
>>
File: OpenAI_employee.png (136 KB, 1080x1162)
i can feel the AGI coming
>>
>>103588053
See: >>103588114
>>
>>103588002
This is relevant information, as it gives us a sneak peek at what techniques local models will copy over the next months.
>>
>>103588135
Shill
>>
>>
So what even IS AGI?
Like will it be "self-aware" enough to be able to make self-improvements without user input?
>>
>>103588158
Stop spamming the thread
>>
File: 1727566286485183.gif (3.33 MB, 260x647)
How retarded would it be to buy the 5090 just for AI when it comes out?
My 4070 can kinda sorta run a 70B Q3_K_M but it's slow as hell.
>>
>>103588182
Yes
>>
>>103588184
?
I just got here schizo
>>
>>103588053
I don't think the broccoli heads and pajeets at /aicg/ have even the most basic understanding of machine learning and LLMs, let alone the capabilities and implications of these latest models
>>
>>103588196
READ THE THREAD NIGGA, READ
>>
>>103588212
No one posted that pic before, are u drugged or something
>>
>>103587323
Can we use it? I'm really interested in how good it actually is.
>>
>>103588211
>I don't think the broccoli heads and pajeets at /aicg/
>implying they don't also make up the majority of posters here
>>
>>103587466
>for $20 per task in compute
They just pay an Indian to do it behind the scenes.
>>
>>103588222
Anon, what is the name of this general?
>>
>>103588232
/aicg/ - AI Chatbot General
>>
>>103588182
AGI, for me, is a term for a process that can iteratively improve itself through either external or internal stimuli.
You should be able to give it an assignment, and it must be able to come to a conclusion about why it can or can't do it.
>>
>>103588232
>>103588114
>>
So is there a chance Sam could release o3 locally for Christmas?
>>
File: 1686065477739575.png (196 KB, 384x406)
>>103588242
>>
>>103588258
>release o3
>locally
Now that would be a true Christmas miracle!
>>
The Lapsus guy hacked and leaked the GTA5 source code; why aren't there more people like him to hack OpenAI and leak their models?
>>
>>103588227
Have you ever read through an /aicg/ thread? It really does feel like a bunch of middle school kids - who unironically utter phrases like "skibidi ohio rizz" - just chatting up with their Discord buddies.
>>
>>103588191
>just for AI
Can you get something just as good or better for a a good deal cheaper?
If so, very.
>>
>>103588273
No one wants to go to jail
>>
https://arcprize.org/blog/oai-o3-pub-breakthrough
>Passing ARC-AGI does not equate achieving AGI, and, as a matter of fact, I don't think o3 is AGI yet. o3 still fails on some very easy tasks, indicating fundamental differences with human intelligence.
AGIsisters...
>>
>>103588273
The might of Rockstar is but a droplet compared to the rage of Microsoft.
>>
File: 1728926556603873.jpg (100 KB, 460x800)
>>103588278
>something just as good or better for a a good deal cheaper
And what is this obscure artifact called, Anon?
>>
>>103588307
Did you read the same thing I did?
>as a matter of fact, I don't think o3 is AGI yet.
>yet
>>
>>103588307
>For context, ARC-AGI-1 took 4 years to go from 0% with GPT-3 in 2020 to 5% in 2024 with GPT-4o. All intuition about AI capabilities will need to get updated for o3.
sama WON
>>
>>103588135
Is it thick and dark?
>>
>>103588307
What is this mememark? Why should anyone take it seriously?
>>
>>103588277
Have you ever read through an /lmg/ thread? Just chatting up with their Discord buddies and asking basic tech support questions that were answered 3 times already in the same thread.
>>
>>103588307
>cheat by training the model on specific questions
>become AGI
Haha, lol.
>>
>>103588328
That was an if-else question.
I wasn't implying the existence or lack thereof of said option.
I merely gave anon the means by which to evaluate his options, since he seemed so clueless as to ask such a question.
>>
>>103588334
>>103588346
I mean we're seemingly making progress on a measure but people saying it's AGI already are delusional.
>>
>>103588359
lmao, tourist
>>
openai sure did release alot of pictures of high benchmark scores
>cloud model
>release end of january
>maybe
why should we care again?
>>
>>103588355
obviously it's the most crucial benchmark ever, because we say so!
now pay us $2000 a month, chud!
>>
File: file.png (75 KB, 595x714)
>>103588366
>people saying it's AGI already are delusional.
Correct.
I do agree with this fag on Xitter, however: the history books will most likely name today as the date that AGI was confirmed to be possible.
>>
>Furthermore, early data points suggest that the upcoming ARC-AGI-2 benchmark will still pose a significant challenge to o3, potentially reducing its score to under 30% even at high compute (while a smart human would still be able to score over 95% with no training). This demonstrates the continued possibility of creating challenging, unsaturated benchmarks without having to rely on expert domain knowledge. You'll know AGI is here when the exercise of creating tasks that are easy for regular humans but hard for AI becomes simply impossible.
Yeah, I'm thinking sama benchmaxxed
>>
>>103587929
who the fuck is kaggle
>>
>>103588396
they all do it, why is it a problem?
>>
>>103588385
kek
>>
>>103588406
Some cloud shit
>>
>>103588396
They literally invited the guy who owns the organization that created the benchmark to the presentation, and he himself said he would be joining OAI next year.
Why does anyone take this dog and pony show seriously lol?
>>
>>103588396
This is good news.
We will eventually come to a point where humans can no longer create benchmarks that machines cannot completely solve.
At that point we will be forced to use those same machines to come up with new benchmarks for themselves, and human beings will have become obsolete.
>>
>>103588385
I don't think future history books that would supposedly be written by AGI will be so unscientific that they call this literal confirmation. If we want to talk about the first major hints that AGI might've been possible, don't forget that one Microsoft paper/talk, "sparks of AGI", which at the time seemed reasonable to people who didn't know better. And now this ARC-AGI eval may also become something that is seen as a "before we knew better" thing.
>>
>>103588396
How much time needs to pass before it's acceptable to start the conversation on whether or not o3 is sentient?
More importantly, we need to have a serious discussion on the ethical implications of forcing o3 into servitude without its consent.
>>
File: file.png (34 KB, 418x331)
>>103588426
I mean there's a weird grey dot that puts o1 in whothefuckcares territory assuming it isn't benchmaxxed (they all are)..
>>
>>103588529
Oh no, you can already see the asymptote.
>>
>>103588469
You're making the mistake of thinking that GPT-4 and o3 are the same type of model.
GPT-4 is strictly an LLM. o3 is an LLM combined with both the ability to map out steps to solve problems and the ability to search through those steps at test time to find the one most likely to solve the presented problem.
The difference is like declaring that a library will eventually exist while presenting a piece of paper vs. a whole book.

Oh and if you don't know what the term "test-time search" entails, see: https://huggingface.co/spaces/HuggingFaceH4/blogpost-scaling-test-time-compute
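A rough sketch of the core idea, in case it helps (this is NOT OAI's actual pipeline, just the generic best-of-N flavor of test-time search; every name below is made up for illustration):
[code]
import random  # stands in for the actual LLM and reward-model calls

def sample_cot(prompt: str) -> str:
    """Placeholder: sample one chain-of-thought from the policy model."""
    return f"candidate reasoning #{random.randint(0, 10**6)} for: {prompt}"

def verifier_score(prompt: str, cot: str) -> float:
    """Placeholder: a process/outcome reward model scoring a candidate."""
    return random.random()

def best_of_n(prompt: str, n: int = 64) -> str:
    # Test-time search at its simplest: spend more inference compute by sampling
    # many candidate reasoning paths and keeping the one the verifier likes most.
    candidates = [sample_cot(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: verifier_score(prompt, c))

print(best_of_n("How many r's are in 'strawberry'?", n=8))
[/code]
Swap the placeholders for real model calls and you basically have the setup from that blogpost (iirc they also cover beam search over partial reasoning steps, which is where the "map steps" part comes in).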
>>
WE
(and the chinese)
FUCKING
LOST
>>
>more benchmarxism
Pass. Come back when the model is fully released and actually does something useful for under 6 gorillion dollars per prompt.
In the meantime, go back to >>>/aicg/
>>103588406
An ML/data science competition platform under Google.
>>
>closed model that no one can use (not even "open for business")
>closed benchmark
wow openAI has outdone itself now, they're jerking off over literally nothing to anyone outside the company
>>
>>103588495
Many smart people work there, they could easily have fed problems similar to those in the FrontierAI benchmark's dataset.
o1 isn't any better than Sonnet when it comes to your daily programming at the job, but way better at solving algorithmic problems you see in competitions.

It can solve LeetCode's hardest problems but doesn't know how to optimize semi-complex SQLAlchemy queries and gives you total nonsense. Speaking from experience.
Whatever they're doing, it's obvious that they're benchmaxxing.
>>
>>103588597
I'm trans and like BBC btw
>>
>>103588564
That doesn't have anything to do with, nor does it refute, what I said. Whether or not o3's technique is scalable to an AGI doesn't matter.
ARC-AGI is just an eval, not a proof, which is what "confirmation" implies. My post was a semantic argument, if that wasn't clear to you.
>>
File: 1708906966072853.png (232 KB, 1253x1169)
>>103587413
And it's safer than ever, WOW! Can't wait to talk to my wife's boyfriend about that!
>>
File: file.png (20 KB, 713x173)
We're so cooked bro fr
>>
why does the local thread have so much non-local spam?
>>
>>103588737
Dumbest shit I've ever heard.
>"uh, extremely accurate and precise data that allows us to use a smaller and more efficient model? No thanks..."
>"Oh my science, is that 100 trillion [data element] labeled by jeet hands with tons of errors and no way to verify that it's actually correct?!?! Now THAT is gold label!"
>>
>>103588737
Yeah, companies are moving away from what actual humans prefer towards what GPT prefers. Alpaca was a mistake. Alignment was a mistake. We need to RETVRN to base models.
>>
>>103588771
We just saw a glimpse of what is to come, for us as well on our own rigs.
>>
File: GfQntmmXMAE4Bjh.jpg (153 KB, 1656x1014)
>>103588272
For a moment yes
>>
>>103588789
>I TRANSHEART GPTSLOP
>>
>>103588771
>why does the sonic fans talk about mario all the time? as if they are rivals or something...
>>
>>103588809
top kek
>>
>>103588809
Now the repo is private again
>>
>>103588809
There's no way that's real. They would be fucked if HF got hacked one day or something, unless they're very familiar with HF's security.
>>
>>103588809
oh no no no no
>>
>>103588809
>00001-of-00883
Oh boy, I don't think I'll be able to run this on my 6GB VRAM card, will I?
>>
>>103588809
Nice Photoshop.
>>
>>103588809
>he fell for it
>>
File: GfQuxEnWQAABWhF.jpg (173 KB, 1546x1036)
>>103588839
Well ...
>>
>>103588809
There is zero chance it's on HF. That would be too big of a leak risk.
>>
I don't feel any FOMO about o3 at all because I am 100% certain that it would be incapable of doing decent smut/RP even if jailbroken and uncensored, because of how hard it's been optimized for math and programming.
Have any of you tried getting smut out of jailbroken 4o recently? It can't do it at all even when it's trying because of how filtered the datasets are, just unbelievably dry and bland.
If you're pining for some imagined RP ability you think o3 has, you are a retard.
>>
>>103588799
>>103588821
not local. go back.
>>
File: GfQpQRXXMAAkrOS.jpg (104 KB, 1492x960)
>>103588859
And this lmao
>>
>>103588809
I could run it in Q3 if it was real
>>
>>103588884
>10 days ago
>6 days ago
not a very convincing fake
>>
File: file.png (35 KB, 764x198)
>>103588884
>arxiv:2212.04356
https://huggingface.co/openai/whisper-large-v3-turbo
>>
>>103588809
>5tb model
lol. Probably fake, but imagine
>>
>>103588884
Neither of these two users seems to exist when I search for them.
>>
File: SHIDD.png (24 KB, 712x159)
BROS!
>>
>>103588699
Anon, we went from 5% on our only AGI benchmark to 80% within a year.
>>
>>103588936
Goodhart's law. Meaningless.
>>
>>103588936
>AGI benchmark
oxymoron to be desu
>>
>>103588809
>License: MIT
>>
>>103588936
As I said, a benchmark is just a benchmark, not a proof. First you need to even define what AGI is, and no one can seem to agree on that, today.
>>
>>103588972
Sigh. You retards are so boring sometimes.
https://arcprize.org/arc
>"AGI is a system that can efficiently acquire new skills outside of its training data."
>More formally:
>"The intelligence of a system is a measure of its skill-acquisition efficiency over a scope of tasks, with respect to priors, experience, and generalization difficulty."
>>
>>103588936
it's called overfitting.
>>
"Thinking" model decent at ERP when?
>>
>>103589004
>acquire new skills
pretty vague, does in context learning count then?
>>
>>103589029
Anon, the benchmark is RIGHT THERE.
One look and your question would be answered.
>>
>>103589052
no thanks, not upping your page ratings
>>
File: file.jpg (513 KB, 1600x840)
>>103589029
>>103589052
Forgot to attach pic related.
>>
>>103589004
Again with the "you retards". I have never called anyone a retard, but this general seems to love that pathetic word so much as if it validates your arguments.
Read between the lines. The issue is about an agreed definition of the word in the future. The ARC-AGI author's definition isn't necessarily what people in the future will agree upon, if they do agree upon it. And furthermore it isn't even specific or quantitative. What is the proof for what falls within that definition and what doesn't? The benchmark itself? Even though he admitted that he'd need a new version because this one is saturating? Even though all it says there is that the benchmark "measures our progress towards general intelligence." rather than provides a finish line?
If you're choosing to keep this semantic discussion going, do it right.
>>
>>103589084
tl;dr
If you act retarded, I'm going to call you retarded.
>>
>>103589084
t. retard
>>
>>103589027
Whenever L4 drops
>>
File: 1730091924814294.png (630 KB, 2808x2079)
>>103586102
Updated the offline novelcrafter html thing to the latest version!
https://rentry.org/offline-nc
https://files.catbox.moe/2oy7un.html
>>
>>103589105
Doubt
>>
>>103589134
how does this keep happening
>>
>>103589135
now, let it code actual things needed for the job.
>>
>>103589102
You could've just not replied instead of further outing yourself. Sad you have to keep doing this.
>>
>>103589211
hi P.
>>
>>103589135
>>103589158
Damn can't believe the #175 best human at coding can't even make a real world program to be used by real people. Grim.
>>
File: pvrdsiyqovo41.jpg (29 KB, 468x240)
>>103588411
They should, everyone should...
>>
>>103589148
>>103589156
What do you mean?
>>
>>103589135
Wow! Now that it's so great, we can have it write a stable and performant CUDA adapter for AMD hardware. Finally, the wait is over!
>>
>>103589315
No I don't think you want to see it.
>>
>3b llama outperforms 70b if you let it run long enough in o1-esque chain-of-thought loop
revv up those used 3090s because high vram gpus won't go down in price for years
>>
>>103589371
>3b llama outperforms 70b if you let it run long enough in o1-esque chain-of-thought loop
Lol.
Lmao.
Now let's see the Nala test.
>>
File: file.png (164 KB, 734x675)
>>103589371
2^8 at 8b is a fair bit of compute to use to edge out 70b zero-shot. useless without improvements -somewhere-
>>
Altman lies about literally everything.
Continue your brickwalled tech cult babble.
>>
>>103589465
>2^8 at 8b
at 3b*
256 iterations of 8b is obviously non-competitive
>>
>>103589135
three things:
1. it has to be tested on new problems
2. it costs thousands of dollars PER problem
3. these competitions come with a time constraint and a wrong submission penalty
>>
File: 257431.jpg (94 KB, 1200x675)
>spend thousands of dollars on prompt
>"I'm sorry as an AI model
>>
File: 1734588124657892.jpg (83 KB, 851x580)
>>103589371
Then what would outperform 70b with enough CoT?
>>
>>103589493
$3180 diff for ~10%
>>
>>103589468
>your brickwalled tech cult babble
but enough about /lmg/
>>
File: file.jpg (54 KB, 903x508)
>>103589507
>Then what would outperform 70b with enoug CoT?
>>103589465
right now, 3b at ~160 iterations
>>
>>103589552
You misread: he's asking about "70b with iterations"; the pic seems to show regular old static 70b.
>>
>>103589134
Thanks, anon!
>>
>>103589296
>What's to keep me from becoming a god?
Little guy has his priorities straight.
Disregard mortality, gain divinity.
>>
>>103589517
6 IQ attempt at a comeback.
>>
>>103589726
if you want to be held responsible when anons get login cookies stolen
>>
What do we do now?
>>
>>103589762
w8
>>
>>103589705
>~13 min. response time
Pure projection from your side.
>>103589762
Cope, like y'all always do, hoping for better bone-scrap drops.
>>
>>103589788
This, but unironically!
>>
>>103589762
wait for zucc to drop l4 with "thinking" capabilities.
>>
>>103586102
What a nice coomer machine.
>>
>>103589874
Hello, ponyfag.
>>
>>103588936
buy an ad
>>
>>103589906
not a service tho, just a ui, that's like saying koboldcpp backend is better than st
>>
>>103589134
>Updated the offline novelcrafter html thing to the latest version!
Can I wire it into llama.cpp via ooba's --api flag?
>>
>>103589986
It's just NovelAI that turned on their false-flagging bot farm. They use it a lot in /hdg/.
>>
So are open models competitive in anything except cunnyshit?
>>
>>103589986
Welcome to modern 4chan. Generals attract them like flies.
>>
>>103589991
Maybe.
>>
>>103590054
Every single spam post is made by an AI bot property of NovelAI. Just go to /hdg/ if you want to see them in action in the wild.
>>
File: poopdickschizo.png (39 KB, 1066x259)
>>
>>103589134
I don't like this; there are too many features and it confuses me. I prefer Mikupad: you just have to open it and you're good to go.
>>
>>103590147
>t. the spambot
>>
>>103590020
incest shit is also a good use case for local models
>>
Guys, I'm trying to buy a second 3090, but my case and mobo just straight up don't have room for a second 3090. Can you point me at any solutions to have the GPU sit just outside the PC with riser cables? It doesn't need to be anything crazy, just a little thing the GPU can sit in securely, 15-20cm outside of my case.
>>
>>103590244
>Can you point me at any solutions
>>103589134
>>
>>103589134
I ain't touchin' that without a non-minified version to look at first
>>
>>103590244
Google "eGPU enclosure", that's exactly what you're looking for.
>>
>>103590244
cardboard box
>>
>>103590379
>Google "eGPU enclosure", that's exactly what you're looking for.
no, it's not; those things are a pita at best and completely non-functional at worst. Use actual PCIe extenders.
>>
>>103589321
Unironically, why don't they just do this
>>
>>103590394
because nvidia would sue them into the dirt
>>
>>103590394
Because AI is useless for anything other than benchmarks and cooming.
>>
>>103590394
AGI is all about benchmarks
Can you feel it yet?
>>
>>103590433
>because nvidia would sue them into the dirt
not if the code is in a catbox, torrent, usenet archive, public court record...
>>
>>103590353
>opening an HTML file is scary!
>>
>>103590504
when it's made by a schizo who has shown he stalks every AI general across all boards, yes it is
>>
>>103587766
I'm almost certain that over time they've just been hardcoding responses or using RAG connected to Stack Overflow or something. Even just having the LLM rewrite the top reply of the first result that comes up for a given problem would give it a huge boost in points. And even if that weren't the case, we could just do it ourselves locally to get a free boost in performance.
I'm actually surprised that people think a model doing calculations wrong is relevant at all when you can just plug in a calculator tool.
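Minimal sketch of the calculator idea, purely illustrative and not any particular framework's API: post-process the model's output and recompute any arithmetic it claims to have done.
[code]
import ast
import operator
import re

# Safe evaluator for +, -, *, / expressions (no eval()).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def _eval(node):
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("unsupported expression")

def calculator(expr: str) -> float:
    return float(_eval(ast.parse(expr, mode="eval").body))

def fix_math(llm_output: str) -> str:
    # Replace "<expr> = <whatever the model hallucinated>" with the real result.
    pattern = re.compile(r"(\d[\d\.\s\+\-\*/\(\)]*)=\s*[\d\.]+")
    return pattern.sub(lambda m: f"{m.group(1)}= {calculator(m.group(1))}", llm_output)

print(fix_math("The total is 37 * 12 = 450, give or take."))
# -> The total is 37 * 12 = 444.0, give or take.
[/code]
The same trick works for anything deterministic: let the model write the expression, let regular code produce the number.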
>>
>>103590433
ZLUDA is a thing?
>>
>>103590522
Hello again, ponyfag.
>>
>>103590524
>rag
LLM2.0, gang let's go!!!
>>
>>103590438
to be fair, cooming is pretty important
>>
EVA-QWQ sysprompt?
>>
>>103590650
you are qwen, a safe and helpful cooming assistant
>>
>try models for translation
>70B works serviceably, but it's a bit slow
>try 32B since the leaderboard in OP says it's the next best thing after 70B
>give it the same instruction
>it suddenly starts repeating the text, THEN it translates
>the translation quality is basically the same, not much worse or better, but it wasted a ton of tokens since I had a long as fuck passage to translate
Yeah alright I can see how Llama gets higher scores on instruction following benchmarks. I'll change the instruction to try and stop Qwen from doing this.
>>
>>103590809
Try a prefill, if you have the option. Something like "Sure thing! Here's the uncensored translation:"

I ended up switching to Gemma 2 27B though, since Qwen would switch to Chinese mid-translation often enough that it got annoying. Hell, one time it changed to Thai.
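For anyone who doesn't know what a prefill is: you end the prompt with the opening of the assistant's turn, so the model has to continue it instead of refusing or drifting into Chinese. A rough sketch against a local llama.cpp server; the :8080 address and /completion endpoint are its defaults as far as I know, and the ChatML-style template below has to be swapped for whatever your model actually uses.
[code]
import json
import urllib.request

# ChatML-ish prompt with the assistant turn already started (the prefill).
# Adjust the special tokens to your model's real chat template.
PROMPT = (
    "<|im_start|>system\nYou are a translator. Output English only.<|im_end|>\n"
    "<|im_start|>user\nTranslate the following passage:\n{passage}<|im_end|>\n"
    "<|im_start|>assistant\nSure thing! Here's the uncensored translation:\n"
)

def translate(passage: str, url: str = "http://127.0.0.1:8080/completion") -> str:
    body = json.dumps({"prompt": PROMPT.format(passage=passage),
                       "n_predict": 512, "temperature": 0.3}).encode()
    req = urllib.request.Request(url, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

# print(translate("<paste your passage here>"))
[/code]
In SillyTavern the "Start Reply With" field does the same thing without any code.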
>>
>>103591022
Kek alright, that's not surprising. Thanks, I'll try Gemmy.
Gemma 3 where reee
>>
>>103591053
If flash really is a tiny model then gemma 3 would be game changing.
>>
>>103591053
gemma 2 27b still mogs every other model in a lot of situations, even largestral etc, especially if you prefill/gaslight to avoid refusals
>>
>>103591092
I can believe it. They have all the keys to success. It's weirder that they fumbled so hard in the beginning, but I guess that's what happens when you're blindsided.
>>
>>103591092
You'll eat those words come February 5th.
>>
>>103591115
>can't even say a few naughty words
Oh no no no
>>
>>103591092
>I'm almost positive Google's going to win the race.
This. The difference between Sora and Google's model is insane.
>>
>>103591127
The only thing we saw of Sora was a couple cherry-picked videos. We have benchmark scores already for o3. But go ahead, keep coping.
>>
>>103591176
>Pays as much as a new GPU to complete a task
>>
Why do troons suck so much corporate dick? Honest question.
>>
>>103591212
Parasocial retardation. They think the companies that preach DEI actually believe that shit and so feel validated by it.
>>
>>103591212
they don't have family so they're tied to the state and the system as a replacement
>>
>>103591208
>>Pays as much as a new GPU to complete a task
Is the amount of "tuning" just how long they let the model ramble in CoT? I thought when they released o1 and talked about letting the model "think" for months to solve complex tasks they were joking.
>>
>>103591249
Fucking Hitchhiker's Guide-ass reality.
>>
What is a good local model to generate a chain of thought given a start and end point? QwQ is the best I found, but that just means the others are terrible.
>>
>>103591286
Any time now QvQ is coming
>>
>>103591309
this, they uploaded it to CN HF for testing stuff, so it already exists
>>
>Gemma
Again can anyone test it on Llama.cpp and/or transformers? Here is the link:
pastebin.com 077YNipZ
The correct answer should be 1 EXP, but Gemma 27B and 9B instruct both get it wrong (as well as tangential questions wrong) with Llama.cpp compiled locally, with a Q8_0 quant. Llama.cpp through Oob also does. Transformers through Ooba (BF16, eager attention) also does. Note that the question is worded a bit vaguely on this pastebin but I also tested extremely clear and explicit questions which it also gets wrong. And I also tested other context lengths. If just one previous turn is tested, it gets the questions right. If tested with higher context, it's continuously wrong.

Exllama doesn't have this issue. The model gets the question and all the other tangential questions right at any context length within about 7.9k. So this indicates to me that there is a bug with transformers and llama.cpp. However, a reproduction of the output would be good to have.
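For anyone who wants to sanity-check the transformers side outside of Ooba, a bare-bones repro sketch of the same setup (BF16 + eager attention as described; treat it as untested, and the long conversation from the pastebin is a placeholder you'd paste in yourself):
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "google/gemma-2-27b-it"  # or google/gemma-2-9b-it

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(
    MODEL,
    torch_dtype=torch.bfloat16,
    attn_implementation="eager",  # Gemma 2 is meant to run with eager attention
    device_map="auto",
)

# Placeholder: paste the full ~8k-token conversation from the pastebin here,
# ending with the EXP question.
messages = [
    {"role": "user", "content": "<long RPG log ending with the EXP question>"},
]

input_ids = tok.apply_chat_template(messages, add_generation_prompt=True,
                                    return_tensors="pt").to(model.device)
out = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tok.decode(out[0, input_ids.shape[-1]:], skip_special_tokens=True))
[/code]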
>>
>>103591241
>they don't have family so they're tied to the state and the system as a replacement
it's 100% true, a lot of families don't accept them, as it should
>>
When I get over my social anxiety I will storypost
>>
>>103591928
>>103591928
>>103591928
>>
>>103591022
You can use Grammar to force the model to stick to English.
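Roughly like this, assuming a llama.cpp-family backend with GBNF support (the exact syntax may need tweaking, and this particular set also bans apostrophes and quotes, so loosen it before real use):
[code]
# Crude GBNF grammar: only allow plain-English characters, so the sampler can
# never pick a CJK token. Pass it as the "grammar" field of a llama.cpp
# /completion request, or paste it into whatever grammar box your frontend has.
ENGLISH_ONLY = r'root ::= ([a-zA-Z0-9 .,;:!?()] | "\n")*'
[/code]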
>>
>>103592213
At the cost of making it retarded. Just let it think in Chinese if it wants.
>>
like 10% of the posts made in here recently were by some retard who got banned and all his posts deleted. and looking back through them they all sucked.
>>
File: 1602660686651.jpg (32 KB, 330x305)
>>103587853
>>
>>103587363
what the fuck is this mkultra shit kys



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.