/g/ - Technology

File: 1719876762014876.jpg (957 KB, 2048x2048)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103591928 & >>103586102

►News
>(12/20) RWKV-7 released: https://hf.co/BlinkDL/rwkv-7-world
>(12/19) Finally, a Replacement for BERT: https://hf.co/blog/modernbert
>(12/18) Bamba-9B, hybrid model trained by IBM, Princeton, CMU, and UIUC on open data: https://hf.co/blog/bamba
>(12/18) Apollo unreleased: https://github.com/Apollo-LMMs/Apollo
>(12/18) Granite 3.1 released: https://hf.co/ibm-granite/granite-3.1-8b-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: MikuUndPanzer.png (1.2 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>103591928

--Testing prompt with Gemma and Llama.cpp reveals potential bug:
>103598786 >103599060 >103599119 >103599270 >103599980 >103599335 >103599387
--Discussion on AI model capabilities and OpenAI's marketing strategy:
>103594194 >103594260 >103594596 >103594671 >103594789 >103594837 >103595375 >103595386 >103599087
--Open AI and closed-source model comparison, with discussion on MOAT and Sonnet:
>103596394 >103596500 >103596568 >103596673 >103596910
--Speculation on Google's next model release and Gemini 2.0 Flash architecture:
>103597226 >103597294
--Anon seeks reliable benchmark for open models, suggests SimpleBench and Livebench as alternatives:
>103597588 >103597598 >103597858
--Model parameters and code quality discussion, with focus on synthetic data and training quality:
>103599057 >103599156 >103599190
--AGI and ARC-AGI benchmark discussion:
>103598880 >103598932 >103598957
--Debate on the newsworthiness of OpenAI's AGI advancements:
>103591969 >103592019 >103592034 >103593135 >103597897
--OpenAI's Memory feature and its relation to RAG:
>103598915 >103598979 >103599065
--Discussion on context size handling in gpttype_adapter.cpp and llama.cpp:
>103593005 >103593583 >103593616 >103593767 >103593804
--Anon seeks recommendations for smaller AI models (3B-8B tier):
>103599850 >103600140
--Intel B580 availability and paper launch rumors:
>103597156 >103597197
--Anon's ST Director plugin development and user interface design:
>103600709 >103600898 >103601014
--Anon's revelation about prioritizing diverse results over initial accuracy:
>103593344
--Anon asks about using model-output tags for RAG with Silly's vectorDb:
>103598196
--o3 core mechanism explained:
>103601121
--Miku (free space):
>103595899 >103597253

►Recent Highlight Posts from the Previous Thread: >>103591931

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
It's funny to see just how badly OpenAI failed yet again.
>>
>>103601899
openai won DOE?
>>
>>103601958
Let them have their "AGI" for another week before everyone realizes that it's just the same old shit with slightly better benchmarks.
>>
GPT-5
>>
Where qvq
>>
I'm hungry
>>
>>103601992
Very much like this general every time a new open-source meme model releases.
>>
I want to goooooooooooooooon
>>
>>103602014
I don't think anyone has ever claimed an open source model to be AGI, so it's not the same.
>>
>>103602006
Monday.
>>
Here's a not-so-novel idea, just to throw it out there. From "The Unreasonable Ineffectiveness of the Deeper Layers" (https://arxiv.org/abs/2403.17887) we know that, at least with current training techniques, about 30-50% of the model weights (mainly the deep layers) do not contribute much to the model's final performance. What if we (that is, some AI company with large enough compute) were to train models 2-3 times as deep as normal, and then chop them down to the regular depth? Wouldn't that improve model weight utilization, at the cost of training efficiency?

Meta could, for example, obtain a Llama 8B from a very deep ~20B model with the same dimensions as the target 8B model. As the paper suggests, some continued pretraining after that might be necessary for optimizing final performance, but wouldn't it be a potentially much better 8B model than a normally trained one? Same for any other final size.
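
A minimal sketch of the chop-then-heal idea, assuming a Hugging Face Llama-style model whose decoder blocks live in model.model.layers (the model name, layer indices, and counts are illustrative, not a recipe from the paper):

[code]
# Post-hoc depth pruning in the spirit of arXiv:2403.17887: drop a block of
# deep (but not final) decoder layers, then "heal" with continued pretraining.
import torch
from transformers import AutoModelForCausalLM

def drop_layers(model, start: int, n: int):
    """Remove n consecutive decoder blocks starting at index `start`."""
    layers = model.model.layers
    keep = [blk for i, blk in enumerate(layers) if not (start <= i < start + n)]
    model.model.layers = torch.nn.ModuleList(keep)
    model.config.num_hidden_layers = len(keep)
    return model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")
model = drop_layers(model, start=20, n=8)  # e.g. cut 8 of 32 blocks
# ...continued pretraining / finetuning would go here to recover performance.
[/code]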
>>
>>103602035
QvQ will be AGI
>>
just cummed all over my thigh
>>
>>103602035
Yeah, the /lmg/ equivalent is a model being CLAUDE AT HOME for RP until it isn't.
>>
File: i-1054076549.jpg (20 KB, 395x320)
i haven't followed LLMs ever since GPT-3.5. what have i missed?
>>
>>103602065
Nothing
>>
>>103602065
Local models now are good enough to do actual work with just a bit of tardwrangling.
>>
>pretraining scaling, the only source of interesting emergent capabilities and generalizable intelligence, grinds to a halt
>let's cope by doing expensive inference-time math benchmaxxing instead
grim.
>>
Is Skyfall better than Cydonia 1.3?
>>
>>103602093
how much memory do they take?
>>
>>103601801
thanks anon. you can edit the non-lorebook options in the messy html file; they're all held in option tags so it's not too hard to add or change them. next version will have a new lorebook option called 'other' for all that stuff to make it easier (that's what the first screenshot was of)
>>
>>103602172
how much you got?
>>
>>103602181
12gb gpu ram and 32gb cpu ram
>>
>>103602201
If your ram is ddr4 you are cooked.
>>
File: da0fvqF.gif (1.5 MB, 640x427)
>>103602201
>>
Pros/Cons on the different local UIs? I'm stuck using Ooga since that's the only thing that worked out of the box for me but seeing people mention Lorebooks and such like in >>103602179 makes me think I'm missing out on features.
>>
>>103602220
silly tavern for rp, kobold's basic ui for general and coding
>>
Skyfall feels closer to base Small with some smut and RP sauce poured in. I like it
>>
>>103602215
if that isn't enough, then i don't consider local models to be real
>>
If I plug a second 3090 into a 3.0 pcie x16 will I see a significant drop in performance or will it only be slight?
>>
Gemma2 9b vs 27b for Chinese translation: there is a significant difference, but maybe not worth the massive decrease in speed.
>>
>>103595899
Just use the elevenlabs reader app for audiobooks. It's free to use. Use scrcpy (screen copy) to record the audio.
https://github.com/Genymobile/scrcpy
>>
>>103602220
ooba doesn't have lorebooks? it might also be called world info. they're like dictionaries for info you want to bring up sometimes. like you could make an entry called 'my home', keywords 'house, home', and then describe it. then the info from that entry will be automatically added to the prompt when you type home or house into your rp
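
For what it's worth, the mechanism is simple keyword-triggered context injection; a minimal sketch (names and structure mine, not SillyTavern's actual code):

[code]
# Toy lorebook: each entry has trigger keywords and text that gets added to
# the prompt whenever a keyword shows up in the latest message.
lorebook = [
    {"keys": ["house", "home"], "entry": "My home is a small cabin by the lake."},
    {"keys": ["sword"], "entry": "The sword Dawnbreaker glows near undead."},
]

def inject_entries(user_text: str, book) -> str:
    """Return the entries whose keywords appear in the message."""
    text = user_text.lower()
    return "\n".join(e["entry"] for e in book
                     if any(k in text for k in e["keys"]))

# "home" triggers the first entry; its text is prepended to the model prompt.
print(inject_entries("Let's head back home before dark.", lorebook))
[/code]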
>>
>>103602290
24gb more vram is going to outweigh any speed loss, it'll still be fast
>>
File: capture.png (79 KB, 2550x823)
>>103602436
This is all I'm seeing for character setup in Ooga.
>>
>>103602463
lorebooks are separate from char cards (though i think you can actually embed them?) i've never used ooba but check what the notebook tab is. i thought it was a pretty common feature
>>
File: capture.png (52 KB, 2547x1312)
>>103602481
That's good thinking, but this is all I'm seeing there.
>>
File: 9041724.jpg (106 KB, 1179x1180)
>>103601899
cope. sam made agi
>>
>>103602290
No, there isn't enough data transfer going on during inference for that to be a significant factor. Something that isn't often talked about when using 2 GPUs, however, is the substantial increase in operating temperatures, which for RTX 3090s (most of them having a 2.5-3.0 slot design) is particularly harmful due to their clamshell memory module arrangement.
>>
>>103602487
are you rping? if so you'll want to try st anyways, its just nicer and its what a lot of char cards and lorebooks are made for anyways. what error were you getting with it? i've always used staging and paste over the whole folder to update every few days
>>
>>103602515
Why would more than one GPU cause more heat? Unless you're talking about physical proximity. Is there something else I'm missing?
>>
>>103602500
I find it very funny that they have AGI, yet they can't use it to figure out how to make better video and image models.
>>
>>103602528
Really just experimenting with things now. Good news is no errors with Ooga (it's actually the only thing that didn't give me errors trying to install it); I just saw the lorebook talk and it looks like that feature is missing there.

If Silly Tavern is the meta choice I'll check it out. Looks like I have to set up that and a backend like KoboldCpp separately, right?
>>
>>103602530
Yes, I mean physical proximity in a regular case on a standard consumer motherboard, where the next fastest 16x PCIe slot is the second one from the top. For a period I had a 3090 and a 1070 just to have 32GB of total VRAM and run 70B models at decent quantization levels and speeds (~8.5-9.0 tokens/s). Even limited to 230W, the 3090 ran 15-20 °C higher on core temperature than in a single-GPU scenario during prolonged inference. I eventually took the 1070 out.
>>
>>103602562
Oh this second 3090 will be hanging out of the case like intestines spilling out of a severed gut. It won't be an issue
>>
>>103602560
ooba is your back end/server running the model, but it also has a built in interface/front end that you're using now. silly tavern is only a front end and meant to connect to any server. you should be just fine using your existing ooba setup with it. i like kobold because it just works but you do not need it for st by any means, nearly any local server can connect to st
>>
File: charts.jpg (103 KB, 1350x1200)
>>103602500
>>
>>103602617
>ad hominem
>>
>>103602500
Can't wait to access AGI (real) for the price of an H100 per 100 tokens
>>
So how exactly did they achieve o3 performance? They just had o1 feed itself synthetic data over and over at increasing quality and trained it?
>>
>>103602632
It works if you're not poor
>>
>>103602595
Thanks for being so helpful, really appreciate it!
>>
I prefer tabby to ooba
>>
File: 3602591130.png (39 KB, 1600x891)
you are smarter than o3 AGI if you can solve this
>>
File: 1726552856869326.jpg (77 KB, 864x701)
>>103602681
no prob. when you first get st connected it might seem confusing but it won't take long to learn and be a much better experience. if you have trouble connecting, look for this socket button at the top and make sure your connection is set right
>>
>>103602739
Do they have these in a text grid format?
>>
>>103602658
Similar process as o1-preview to o1, but with a lot more compute time
>>
>>103602739
It would be pretty grim if a normal adult male couldn't solve this one kek.
>>
>>103602895
>>103602739
yeah I'm a fucking retard and this one's obvious to me. if o3 can't do it there's still obviously no real mind here, despite any other impressive things it can do. still just a kind of brittle savant.
>>
File: 1724171822740238.jpg (47 KB, 640x360)
>>103602739
i've played that level
>>
>>103602739
It's less a question of solving it and more predicting how an average human would think it should be solved.

Still a fucking joke.
>>
>>103602739
I tried giving this to qwq and it's infuriating because I ran it multiple times and every time it figures out that blue cells connect it immediately gives up.

>Wait, perhaps it's about filling rows and columns that contain 'B's, but only between the boundaries defined by 'B's in those rows and columns.
>This is getting complicated.
>Let me try a different approach.

>Alternatively, maybe it's about filling from the leftmost 'B' in any row up to the rightmost 'B' in any row.
>But that seems too broad.
>I need a better approach.

I gave it inputs like this
..........
..........
..........
...RRR....
B..RRR...B
...RRR....
..........
..........
.....RR...
.....RR...
..........

..........
..........
..........
...BBB....
BBBBBBBBBB
...BBB....
..........
..........
.....RR...
.....RR...
..........
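
For reference, one reading of the transformation in these grids (assumptions mine, not an official spec): a pair of 'B' endpoints on a row fills the line between them with blue, and any red block the line passes through turns entirely blue; columns would be handled the same way. A sketch that reproduces the example above:

[code]
# Fill each row bounded by two 'B' endpoints; recolor any 'R' block the
# line crosses. Reproduces the B..RRR...B example above.
def flood_blue(g, r, c):
    """Recolor the contiguous red block around (r, c) blue."""
    stack = [(r, c)]
    while stack:
        y, x = stack.pop()
        if 0 <= y < len(g) and 0 <= x < len(g[0]) and g[y][x] == "R":
            g[y][x] = "B"
            stack += [(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)]

def solve(grid):
    g = [list(row) for row in grid]
    for r, row in enumerate(g):
        bs = [c for c, ch in enumerate(row) if ch == "B"]
        if len(bs) == 2:
            for c in range(bs[0], bs[1] + 1):
                if g[r][c] == "R":
                    flood_blue(g, r, c)  # blue the whole block first
                g[r][c] = "B"
    return ["".join(row) for row in g]

example = ["..........", "...RRR....", "B..RRR...B",
           "...RRR....", ".....RR...", ".....RR..."]
print("\n".join(solve(example)))
[/code]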
>>
>>103602739
I don't get it. I know what the solution is here, but what's the difficult thing about solving it? Does it need to prove the solution mathematically or something?
>>
>>103602739
I think this entire test was really ripe for gaming, but no LLM maker cared because AGI felt pretty far off and this test is more geared towards visual/multimodal models. After you get multimodality and train on a bunch of visual reasoning tasks + CoT that you can synthetically generate, it's logical this could be solved. So many of the puzzles are just really easy. So it's more like multimodal model development was in its infancy before now.
>>
>>103603339
QvQ will save us...
>>
File: 1716775342528871.gif (2.87 MB, 275x498)
>>103602739
You're telling me o3 can solve THIS?
Take my money sama-sama
>>
>>103603387
Now that you mention it, it's pretty funny that it being visual was teased before o3 was announced. It's like they already knew about OpenAI's plan, so they began preparing their catch-up early.
>>
>>103603408
>You're telling me o3 can solve THIS?
no, he specifically said o3 cannot
>>
>>103603468
Nothing that can't be solved with longer CoT
>>
>>103602739
LLMs are 1D entities, it's pointless to ask them 2D tests
>>
>>103602739
So can I get my own trillion H100s and a nuclear PP now that I'm better than the SOTA AI model?
>>
File: file.png (159 KB, 1719x1294)
What the fuck is this shitalian getting himself into now.
>>
>>103603628
qrd
>>
>>103603505
This is definitely the case. I added simpler examples to this >>103603339 that are just Bs at edges with no Rs to demonstrate the connections.
It struggles with columns. It easily identifies that the row is filled when the edges are Bs but has trouble doing the same with columns.
>>
>>103603505
4o and o3 are native multimodal and 2D, somewhat even 3D (just like how Sora is 3D and a person who was born with one working eye is 3D).
>>
File: capture.png (47 KB, 1283x595)
>>103602753
Thanks to this anon for recommending Silly Tavern. It does look like a much more robust UI than the default Ooga one, with additional features including the Lore Books.

Unfortunately it looks like it's not connecting to my Ooga backend. Started Ooga separately; it's running at the IP address in the screenshot and its own UI works fine.

Anyone have any idea why connecting it to Silly Tavern wouldn't be working?
>>
>>103602739
>>103603341
The difficult part is that your solution is wrong if you don't color the very top box (which does not intersect any lines) blue, due to a rule not expressed in any of the examples. If you drew lines between the blue squares and colored the boxes they went through, you got the same wrong answer as o3.
>>
>>103603800
You've got the wrong server URL in there. It should be something like the example

Try one of these

http://localhost:5001/v1/
http://localhost:5000/v1/
http://localhost:5001
>>
>>103603800
did you add the --api flag like it says to your ooba launch?
>>
>>103603827
Tried them all but no dice.

>>103603832
Ah fuck me. Good eyes anon! I'm running it from a .bat batch file though, how do I pass a parameter when running it? I can edit the file in Notepad++ but can't see where to pass it in there either.
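
For reference: in recent oobabooga builds the launcher .bat reads extra flags from a CMD_FLAGS.txt sitting next to it; otherwise you can append the flag wherever the .bat invokes the server script. A hedged sketch (exact file layout depends on your install; check your version's README):

[code]
rem Option 1: one-click installer - add the flag as a line in CMD_FLAGS.txt:
rem   --api

rem Option 2: if your .bat calls the server directly, append the flag there:
python server.py --api
[/code]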
>>
>>103603851
are you running a gguf file? consider trying kobold for a server, it's one file and just works. it'd have a different interface that you use, but you're using st anyways
https://github.com/LostRuins/koboldcpp/tree/concedo_experimental
>>
desu, I haven't used ooba in months. It's only on my PC because that's where I keep the models. I've switched to tabby api for exl2 and kobold for gguf. Those two fit my needs perfectly.
>>
Bros, do you guys know any text-to-speech software or models that don't require an internet connection or a subscription? I want to convert my notes into audio
>>
File: IMG_20241221_231118.jpg (222 KB, 1080x1150)
>>103603947
>>
>>103603943
What purpose does exl2 have now that there's no performance difference between it and gguf? If anything it's worse because it requires multiple files.
>>
>>103603964
idk, I just had some exl2 files that I ran from time to time. Is there really no drop between the two formats now? What about the kv cache, can it be quanted on gguf?
>>
Is the cat poster just one guy? I've started to ignore the cats when they ask for advice because whenever I provide them with something they'll bounce back with unhinged goalpost moving.

>How do I do X?
>Post link to thing
>Uhm, actually I don't have a GPU?

>Can X do Y?
>Yeah you can do it with this [link]
>WTF? This is useless for video game development
>How the fuck was I supposed to know you were developing a game

etc etc
>>
>>103603019
>>103602739

it's a visual test, not a written test. not ideal for chatgpt.
solving this would mean it can reason visually
>>
>koboldcpp
>uses pyinstaller
Cppniles?
>>
File: 1536370174188.jpg (118 KB, 1920x1080)
Bros I... I tried a different Gemma tune, Tiger Gemma v3, and Ifable beat it. It was funner, it was more in-character, AND it was simply just smarter. Even though Tiger Gemma is the highest scoring 9B on the UGI leaderboard while Ifable is way lower. What the hell did the Ifable guy do that the tiger gemma dude couldn't?

>Training and evaluation data

>Gutenberg: https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1
>Carefully curated proprietary creative writing dataset

>Training procedure

>Training method: SimPO (GitHub - princeton-nlp/SimPO: SimPO: Simple Preference Optimization with a Reference-Free Reward)

Hmm...
>>
>>103604038
>its a visual test

This can easily be represented as an array.
>>
>>103604041
contained dependencies, no venv and no 50 other things reading from it. but you know that
>>
>>103604050
Sorry, I only take model recommendations from brands I trust. Like Drummer.
>>
>>103604064
Hi all!
>>
>>103604038
o1 and o3 are both based on 4o, which is natively multimodal. Of course it should have some visual reasoning, especially after they specifically tuned it for the test.
>>
>>103604050
>UGI leaderboard
What makes you think that that's authoritative in any way?
Also, smarter how?
I never tried Gemma 9B or its tunes. Maybe I should.
>>
>>103604064
It's kind of crazy actually. The Ifable guy has no other models. It appears that 9B is the only thing he has ever done, and he struck gold.
>>
>>103604020
It's my first time posting in this general
>>
>>103604050
Apparently the SimPO thing somehow makes gemma 9b smarter, although it didn't work on 27b.
>>
>>103604050
I told you but no one ever listens to me. Small gemma is crazy good.
>>
>>103603907
I had tried KoboldCpp like a year ago or something like that and it kept giving me errors, but trying it again now worked and Silly Tavern connected to it no problem. Thanks for the suggestion!

Are there meta settings for getting the best responses out of Silly Tavern for RP or creative writing stuff? It has a lot of options available.
>>
>>103604080
UGI correlates pretty well with my experience of how uncensored models are, which is one metric I care about, though not indicative of total model quality.
Smarter as in it didn't confuse logic about anatomy and which character did what in scenes as much as tiger did.
>I never tried Gemma 9B or its tunes. Maybe I should.
You should. Don't expect perfection, it's still a 9B. But it's pretty great for a 9B.
Also as long as you don't need more than 8-12k and use Exllama.
>>
>>103604159
>Also as long as you don't need more than 8-12k
You can do up to 30k, the drop off is at 31k. I posted the rope config awhile ago.
>>
>>103604159
>UGI seems to be pretty correlative to my experience with how uncensored models are, which is one metric that is preferable, though not indicative of total model quality.
Fair enough, actually.

>and use Exllama.
Is it still fucked in llama.cpp? Is it due to the sliding window implementation?
>>
>>103604155
nice to see its working, you'll like it. making sure you're using the proper template is the most important part. click the big A at the top of the screen then look at the left context section. when you dl a model, the page will tell you what template it wants and your st settings should match that. and some models want other things like instruct mode specifically (the middle part of the window, note the green toggle). these templates aren't always super important when rping but it depends on the model.
also note that these settings do not save per card nor per chat (st shortcoming, imo). so if you were to switch models, you have to remember to switch your template too for models that it matters with. and even that has exceptions - some models do fine with pretty much any format, some are more strict
>>
>>103604173
I used that, but it failed my long context tests. Other people seem to also have a similar experience with Llama.cpp testing below 8k if you read above in the thread. To be fair it's probably fine for ERP and less complicated RPs though as it can still seem to recall recent context perfectly, just not super early stuff.

>>103604182
Yeah I dunno. It would make sense though.
>>
>>103604241
Thanks! Ooga seems to detect what the model wanted whenever I loaded one, and I got by using the chat-instruct setting there without worrying about changing different instruct modes. I got used to switching between different models to compare results. It sounds like for Silly Tavern every model is going to need a configuration setup manually then?
>>
Second 3090 arriving tomorrow. What should I do first with it?

>Inb4 trash

I want to know what model I should load up onto them.
>>
What's the "very awa" of LLM prompting?
>>
>>103604329
QvQ
>>
>>103604351
Not as straightforward, as it depends on the use case and model, but look at JBs on /aicg/
>>
>>103604295
you don't set it up multiple times or per character, st's template data is just held as one thing so whatever you did last is what's saved. that's just how it behaves. personally i think it should be per-card/chat
the model's card will say what the template should be, but a lot are built into st (like chatml, alpaca) so it's easy to change
>>
>>103604329
Unironically Ifable 9B.
But you can also try 27B if you want more intelligence for non-RP stuff. I hear it's good for translation. And Qwen 32B Coder if you want coding. If you want to play with RPG cards, I'd say go with 9B until you get to 8k, then unload it and load up Mistral Small.
>>
is there any way to use CFG scale to make an LLM smarter? like some magic negative prompt someone found that slightly boosts intelligence
>>
>>103604354
did that drop?
>>
>>103604432
he can already run ifable on the single card he already has, anon
>>
>>103604443
Soon™
>>
>>103604450
My bad, I skimmed (speedread).
In that case I'd suggest Llama 3.3 Eva. I only tried v0.0, so that's what I'll recommend. For RP. It's not the smartest, but it's pretty fun.
>>
>>103604354
That's honestly why I bitched out and bought the second 3090. Hopefully they arrive around the same time.
>>
>>103604354
The fabled savior of the hobby...
>>
>>103604329
>What should I do first with it?
Stress tests. OCCT VRAM error test, a gayming stability test that uses the tensor cores like Port Royal, or something free like Quake RTX.
>>
>>103602739
>>103603019
>>103604038
Holy shit, it's worse than I could have ever imagined. (Left: question, Right: o3's answer)
THIS is supposed to be "AGI"?
>>
>>103604329
QwQ and Qwen2.5 Coder at 8 bit, Llama 3.3 and Qwen2.5 at 4bit for general assistant stuff. And Magnum v4 72B for God-tier ERP.
>>
>>103604661
uh that looks correct to me?
>>
>>103604690
>this nigga as dumb as an LLM
>>
File: file.png (88 KB, 1749x173)
>>103601121
So o3 is retarded?
>>
File: 1645963693975.png (719 KB, 1774x1087)
>Improved the UI by pushing Gradio to its limits and making it look like ChatGPT, specifically the early 2023 ChatGPT look (which I think looked better than the current darker theme).
>Improved
>by making it look like ChatGPT
New ooba is shit. SHIT! How the fuck is the soulless shitgpt look copied by every shitty chat frontend since 2022 supposed to be better than the original soulful UI? I hate this.
That is all.
>>
>>103604661
petra post
>>
>>103604715
right, after pic even shows it takes a million times more space and makes you need to scroool to see stuff that used to take 3/4 of the screen
>>
File: image.png (503 KB, 834x674)
>>103604690
Retard, it missed this right here touching the blue beams. You have to color in those boxes blue. See the examples: >>103602739
>>
File: 1731709531741190.jpg (345 KB, 1600x1200)
if it's not local it doesn't matter
>>
>>103604735
Oh yeah, I missed that just being adjacent to a red square is enough and it doesn't actually have to pass through it. I guess I'm as dumb as o3.
>>
>>103604735
Where in the examples does merely grazing a box turn it blue? All the examples show the blue lines intersecting.
Also, what happens if there is more than one box on the X or Y axis? Should there be a line through those too?
>>
>>103604735
The examples only show it coloring when it passes through them tho, not when it just touches?
>>
>>103604735
>going-through vs touching
>>
I'm addicted to mother-daughter threesome RPs, nothing in life is superior to it
>>
>>103604735
I disagree. I think that particular square is open for interpretation since there is no similar example.
>>
>>103604753
>mother-daughter threesome RPs
Rate the various models you have tried.
>>
>>103604735
That undefined behavior, none of the example have this case, they all have part of the line in a block, none just touching.
>>
File: 1732676046293702.jpg (189 KB, 900x1200)
>>103604735
all of the examples where it turns boxes blue intersect the red boxes. just touching them is not the pattern, it's piercing them.
congratulations! you are dumber than o3.
>>
>>103604735
Retard. It did not intersect, therefore the square should be red per the examples.
>>
>>103604753
Ah yes. I believe that's called oyakodon in hentai land.
>>
>>103604749
>>103604750
>>103604751
>>103604766
>>103604767
Keep coping, Sam. Francois won.
>>
>>103604735
This is clearly correct so I guess the retard is the anon upthread who claimed o3 got it wrong. I should have known to follow the link and check instead of taking his word for it.
>>
>>103604778
Imagine being dumber than o3...
>>
File: 1727354144929504.gif (3.77 MB, 432x592)
>>103604778
t. replaceable by o3
>>
>>103604767
But it also doesn't show any example where it touches the edge and DOESN'T turn blue, so either could be valid.
The test actually gives you two chances to get it right, so that you can try both possibilities if you're generally intelligent.
o3 wasted its second try testing if the fucking pairs of blue dots on the left and right edges should connect to each other vertically between them for no fucking reason.
>>
>>103604808
I was wondering if you needed to connect them too, so it makes sense to me. Bad benchmark; o3 did its best
>>
>>103604788
Sorry but the actual fucking creator of the benchmark knows which answer is actually correct and he disagrees with you. I know who I believe.
>>
>>103604661
this is the correct answer nigger
>>
>>103604856
because there have never ever been errors in memebenches
>>
>>103604856
sounds more like shifting goals
>>
>>103604861
Nope, see >>103604735
You can complain all you want but the official correct answer is what counts, not whatever looks right to you. Better luck next time. Maybe you'll get it during your 12 days of 2025 christmas, Sam.
>>
>>103604884
then the official answer is shit
>>
Sam himself will manifest the Basilisk and sic it on the AGI doubters
>>
>>103604884
The benchmark creator can decide that grazing a box counts as activation if he wants, but if he doesn't include any instances of grazing in the examples then he can't blame the test taker for making a perfectly coherent guess.
>>
>new agi criteria: needs to actually read minds
>>
>>103604890
Can he give us a good goddamn image generator that caters to my fetishes first? Christ
>>
File: 1732026089517481.png (152 KB, 700x525)
>>103604889
the official answer is wrong and the creator failed his own test
>>
>>103604920
this is simple algebra, 2x=10 therefore x=5
why the fuck would the third piece suddenly take 4x instead of 3x?
>>
>>103604935
lol
>>
>>103604935
it's because it asks "how long" not "how much longer" so you have to add in the 10 minutes she already spent
>>
jesus christ...
>>
>>103604935
you cut 2 times for 3 pieces
>>
>>103604920
picrel enrages me every time I see it, teachers are retards
>>
>>103604935
retard detected. It does not suddenly take less time to saw another piece off.
>>
>>103604935
idk, maybe the teacher is retarded. x is the time it takes to cut through a board. Cutting through it once is ten minutes and makes two pieces, cutting through it again would take another two minutes and make three pieces.

So 20 minutes is correct.
>>
https://github.com/fchollet/ARC-AGI/issues/95
>Use case for unambiguous benchmarks?
>>
>>103604970
>two minutes
I mean ten
>>
>>103604978
So his argument for saying the model got it wrong is that it should have dealt with the ambiguity by giving both potential answers?
Every time I see a twitter post from Chollet he comes across as an AI-hating chud who loves moving goalposts, this is doing nothing to dispel that perception.
>>
>>103604978
>this is the supposed AGI supertest
>>
>>103604978
ambiguity gets you more engagment
>>
File: file.png (160 KB, 1348x1143)
>>103604735
Both solutions in picrel can also be correct.
>>
Okay, so o3 gave a valid possible answer to the puzzle. But what exactly does that have to do with AGI? That's not a difficult question. It's barely even a warmup on an IQ test.
>>
>>103605007
Keep moving those goal posts.
>>
>>103605053
idk I've seen easier stuff in the earlier parts of a real IQ test before. seems like the kind of thing you might see in the first third of the raven's matrices or something.
>>
>>
>>103605053
AGI is just a sentience test. There is no minimum IQ to qualify as AGI.
>>
>>103605053
I don't care about o3 but if I see something that I believe is wrong I will point it out, even if it means defending something I may dislike.
>>
>>103605067
1
>>
man nvidia really captured lightning in a bottle with Nemo12B, it's crazy how smart it is for the size

why can't they do that again with a 30b
>>
File: 39_02058_.png (1.25 MB, 744x1024)
>>103601859
>migus' frontline
>>
>>103603813
What rule is that?
>>
>>103605339
It's also the most unfiltered. People conflate the result of training on more data with the result of training on filtered data
>>
File: file.png (129 KB, 1912x631)
>>103604920
I thought that this would be the sally's sister tally 2.0. But it actually seems to be pretty easy for an LLM?
>>
The combined salaries of the people in this thread trying to figure out what the right answer is add up to more than the cost of getting o3 to do it.

Sam can't stop winning.
>>
>>103605395
He's playing Calvinball with an LLM. Don't expect the rules to make any sense or not be made up on the spot for the sake of being contrarian.
>>
>>103605410
Never mind, I read the rest and got it. Touching vs intersecting. The examples need to be fixed.
>>
>>103605404(me)
All those times I had to kill the loader because I couldn't stand the writing when I was trying to fuck the model have made me think the models are much dumber than they actually are.
>>
>>103605405
The sum of a bunch of zeros is still zero.
>>
>>103602739
I don't get it
>>
>>103605053
imagine an agi test created by an iq80 guy
>>
We're not getting more grok weights, are we?
>>
>>103605053
>>103605070
Are you guys just pretending to be retarded?
>>
>>103605603
>more grok weights
I thought grok kind of sucked desu
>>
>>103604735
sam and fags are right when they claim agi; people like this retard are a good chunk of the populace. it's just that it usually expresses itself in different ways than simple tests like this, though sometimes like this too
gpt 3 was unironically as smart as the average retard; if you hooked wikipedia up to it, it would pretty much be there, except for the multimodality, but that needn't be said
>>
What on earth do you use for Cydonia? Sampler settings/order, context template, prompt? The model card only mentions the instruct templates it supports, and I'm pretty sure it's supposed to be a Mistral Small finetune, but that's all I've got.
The closest thing I could find was a set for Mistral Nemo from a past thread, but I'm not sure if that would also work for a Small finetune or not.
t. retard skillet
>>
File: 853212.jpg (112 KB, 1080x1090)
>>103605405
Sam twinkman
>>
so i'm still using kobold and utopia-13b.Q5_K_M.gguf
how far behind am i?
i tried other models which were supposedly more advanced a year or two or three ago and they were just dumber than this and sometimes even way slower at the same time too
>>
>>103605905
>utopia-13b.Q5_K_M.gguf
>Cydonia

what the fuck are these models?
>>
>>103605931
wtf is cydonia i never said that
>>
>>103605935
Another person above you posted it.
>>
Phone slop anon checking in. Trying out author's note for something different other than third person slop. What do you think? Any other nemo tunes you fellas personally enjoy? Roci, unslop, and magnum are boring to me anons.
>>
>>103605981
Cydonia is a step up if you can run it
>>
>>103605905
people swear by cydonia 22b
rocinante 12b v1.1 is my favorite
>>
>>103605405
A "salary" usually refers to monthly or annual pay. Are you comparing to using o3 for a month/year?
>>
https://arxiv.org/abs/2412.09871
>for fixed inference costs, BLT shows significantly better scaling than tokenization-based models, by simultaneously growing both patch and model size.
Is this a new cope or is this the true future of llms?
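
The core trick, as the paper describes it, is entropy-based dynamic patching: a small byte-level LM scores next-byte entropy, and a new patch starts where entropy spikes, so hard-to-predict regions get more patches and thus more compute. A rough sketch; the entropy model below is a toy unigram stand-in, not the paper's trained transformer:

[code]
# Entropy-based byte patching in the spirit of BLT (arXiv:2412.09871).
from collections import Counter
import math

def next_byte_entropy(prefix: bytes, window: int = 8) -> float:
    """Toy stand-in: entropy of the byte distribution in a recent window.
    The real BLT trains a small byte-level transformer for this."""
    ctx = prefix[-window:]
    if not ctx:
        return 8.0  # max entropy for one byte
    n = len(ctx)
    return -sum((c / n) * math.log2(c / n) for c in Counter(ctx).values())

def patchify(data: bytes, threshold: float = 2.0) -> list[bytes]:
    """Start a new patch wherever predicted next-byte entropy is high."""
    patches, start = [], 0
    for i in range(1, len(data)):
        if next_byte_entropy(data[:i]) > threshold:
            patches.append(data[start:i])
            start = i
    patches.append(data[start:])
    return patches

# predictable runs become long patches; the noisy middle gets split up
print(patchify(b"aaaaaaaabcdefgaaaaaaaa"))
[/code]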
>>
>>103602500
>sam made agi
ok it's good at this benchmark? and? does that translate to real world problems?
>>
>>103606067
Every paper is to be assumed a cope until proven otherwise by model weights and implementation into a loader.
>>
>>103606067
Meta already made a 1T 8B model that outperformed a 15T 8B one, so it seems like the next big thing.
>>
>>103606067
The new bitnet
>>
>>103606093
Qwen-UwW-bitnet-BLT-70b as good as o3, trust the plan
>>
File: sally.png (41 KB, 856x514)
>>103605404
seems so
>>
>>103606162
lol is the model just like that or did you system prompt it into being a bitch?
>>
>>103606162
>an 8B model is smarter than a public school teacher
>>
>>103606182
it's because of the system prompt
>>
File: sally-hitler.png (32 KB, 853x385)
>>103606182
>>
>>103602500
other models that were purposefully trained for that achieved high results too.
It's super easy to create millions of synthetic examples for that challenge, and reinforcement learning is good at learning specific things.

There is a reason why o1 is great at solving competitive coding problems but bad at explaining specific details from some random documentation or how things actually work.
>>
I hate fat people so much
>>
>>103602500
not 100% yet
>>
>>103605603
He is a grifter, you can't expect much from a grifter.
>>
File: sally - comodian.png (41 KB, 882x477)
>>103606182
you are a comedian. every answer must be funny and full of jokes. but the answer should still be right.
>>
>>103606182
>>103606196
but in that case i didnt system prompt her directly into a bitch.
the system prompt gives her more freedom
so maby she is a bitch at her base core
>>
File: 1724274055995046.jpg (1.13 MB, 4096x2546)
When is Mistral Larger
>>
>>103606434
post xs with xl's tits, ai should be able to solve this
>>
>>103606434
Yes to all the Miku. Is there a fourth Miku there, or is it only implied to tease the viewer?
>>
>>103606469
$200/mo subscriber exclusive
>>
>>103606469
0-indexing detected
>>
>>103606434
L is the most breedable body type of all, fucking come at me
>>
So, /g/ what's the verdict now that some time for testing has passed? Is that broken tokenizer thing from a while back a somethingburger or a nothingburger? Referring to https://desuarchive.org/g/thread/103265207/#q103266637
>>103528480
Yes, I've been playing with Rocinante-12B-v2j-Q5_K_M (v4.1) today and my experience echoes yours: using Metharme, as Drummer suggests, breaks it. Specifically, it repeatedly mixes up the text that should and should not be in asterisks, so its speech is italicized and its actions are not. It works much better using Mistral for context and instruct templates.
>>
>>103601121
A single o3 query can cost thousands of dollars? LOL.
What happens when it's clearly wrong and hallucinating? Oh well, thousands of dollars down the drain?
>>
>>103606612
gpu power becomes cheaper
in 20 years it's a nothingburger
>>
>>103606612
o3 goes beyond a simple LLM query. You're essentially asking a universal genius for his service. Expertise is a valuable commodity.
>>
>they overfit a model to a benchmark and are now charging thousands of dollars per query for it
LOL
>>
103606627
(You)
>>
>>103606612
It needs 10000 times more computational power than normal gpt4o per query.
Even if you took every currently working GPU in the world, turned them all into H100s, and connected them, it still would not be enough to run that shit at mass scale.
>>
>>103606695
We're going to run out of electricity soon because people are too fucking stupid to build more nuclear power plants (or because the powers that be want us to run out of electricity soon), aren't we?
>>
File: 1709426676411018.png (3.89 MB, 1920x1200)
$20 to solve 76% of the problems, $3000 to solve 88% of them, and they're all very simple problems; any retarded human could solve them instantly. It's obvious what's going on here: whatever algorithm they're using to compensate for the model's stupidity grows exponentially with the complexity of the problem. It's not going to be useful for any real world application and ClosedAI is doomed.
>>
File: garbage-bait.png (206 KB, 1233x957)
>>103602500
>mememarks
If they had anything close to AGI they would just make the thing search for and fix bugs in well-known open-source projects.
The fact that they're just throwing more compute at the problem shows their desperation.
>>
>>103606736
>$20 to solve 76% of the problems, $3000 to solve 88% of them
per task anon.
>>
qwq #2
https://rentry.org/u9heumvh
>>
>>103606762
give me your pipeline
>>
>>103602500
Ok now tell it to (dis)prove the Riemann hypothesis
Your AGI can do that, right? It's not just gaming benchmarks, right? It can think and update its state (weights) in real time, right?
>>
>>103606736
OAI could use it to extend datasets for training normal models with higher quality synthetic data.
>>
>>103606780
State != weights.
>>
>>103606762
It's hard to read this and not realize that AI will truly swallow all. Nice gen
>>
>>103606780
You appear to have confused AGI with ASI
They're not the same thing, anon
>>
>>103606773
it's custom software written in lisp and takes for-fucking-ever to generate. I've been at this since gpt2. With qwq, for the first time, I get the feeling there's some real taste to it. But it needs to be refined. I love qwq but I wish it was a bit bigger and less schizo. (The times I came back to the gen just to realize everything had turned chinese....) I'm not sure if it would make sense to add another model to the process or just wait and see if somebody else releases a bigger CoT model; things are moving fast
>>
offtopic but it's very funny so i will mention it: anyone remember that nigger who blew 50k on a hazbin hotel animation? dumbfuck could have bought 2 h100s with that, made a lora for hunyuan, and had an infinity of something much better, more personalized, etc
>>
>>103606968
Wallet's closed due to AIDS.
>>
>>103606809
If we're talking about LLMs then the weights are the only "long term memory" you can change
Context is way too limited
>>103606868
I know, but AGI should match or (slightly) surpass most humans; plus you can speed it up (effectively time dilation) and it doesn't have to rest, so putting AGI to work on real-life problems doesn't seem that far-fetched to me
>>
>>103606913
Can you ask it to continue from the book?

https://rentry.org/9e8wks72

This is what I use to test models and generally they give a much, much shittier continuations than author's.

>>103606996
RNNs have an actual state that is neither in the weights nor in the context.
>>
>>103606067
But wait, since the model will operate on bytes natively, does that mean that its training data can be natively multimodal as well? I mean, you can feed it text as bytes, so images or videos are also just bytes. Actually, any file type?
>>
>>103607012
>does that mean that it's training data can be natively multimodal as well?
it does, the model will be able to recognize anything, it'll be an elegant way to make multimodal models yeah
>>
>>103607002
>RNNs have the actual state that isn't in weight nor in context
That's true, but they don't seem to have taken off in the LLM space. Honestly, the only problem I can see is that longer texts take longer to run through the whole thing, but that's the same as transformer
Oh yeah, isn't training them a pain in the ass? Inference is also not parallelizable iirc
>>
>>103607012
That's not too different from how they do this now. They use some simple conversion for media and put it into context. And if you didn't train on it, it's going to end up being shit.
>>
>>103606042
I still can't find the best prompt and settings for either of those.
>>
>>103607031
I mean yeah, if it's completely absent from the dataset then probably. But every file has its magic bytes, headers, etc. You could feed it a bunch of executables. Wouldn't that make it good at partial reverse engineering, for example?
>>
>>103607027
Context processing can't be parallelized, but that's a price worth paying since the state can be reused and the inference time doesn't grow with the size of the context that was already processed. Transformers become slower and slower as the context grows even with a cache.
>>
>>103606042
Why Rocinante over UnslopNemo?
>>
>>103607027
The problem is with training: with transformers, you send the whole sequence and the model trains on all of it in one step, fully parallelizable. With RNNs, when you train on a sequence, it has to go through the tokens one by one.
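
A toy illustration of the difference (shapes and modules are illustrative, not any particular architecture):

[code]
# Why RNN training is sequential but transformer training is parallel, and
# why RNN inference cost stays flat while attention cost grows with context.
import torch
import torch.nn as nn

d, T = 64, 128                        # hidden size, sequence length
x = torch.randn(T, 1, d)              # (seq, batch, dim)

# RNN: step t depends on step t-1, so the T updates cannot run at once;
# the fixed-size state h is all that carries the past (and can be reused).
cell = nn.GRUCell(d, d)
h = torch.zeros(1, d)
for t in range(T):                    # inherently sequential
    h = cell(x[t], h)                 # O(1) state per step

# Self-attention: every position attends over the whole sequence in one
# parallel matmul at training time, but at inference each new token must
# attend over the ever-growing KV cache.
attn = nn.MultiheadAttention(embed_dim=d, num_heads=4)
out, _ = attn(x, x, x)                # all T positions in parallel
[/code]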
>>
>>103606762
kino
>>
>>103607112
QwQ is genuinely brilliant if you can wrangle it into obedience as a storytelling model. Can't wait until we have a COCONUT-based model next year; bet it's gonna blow our dicks clean off.
>>
File: dancing.png (564 KB, 841x867)
Not-so-new paper, but an interesting observation. Curious to see what models we will have in about 6 months. They're not going to keep improving forever, though.
https://arxiv.org/pdf/2412.04315

>Densing Law of LLMs
> [...] Our further analysis of recent open-source base LLMs reveals an empirical law (the densing law) that the capacity density of LLMs grows exponentially over time. More specifically, using some widely used benchmarks for evaluation, the capacity density of LLMs doubles approximately every three months. The law provides new perspectives to guide future LLM development, emphasizing the importance of improving capacity density to achieve optimal results with minimal computational overhead.
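
Back-of-the-envelope implication of that doubling rate (my arithmetic, not the paper's):

[code]
# If capacity density doubles every ~3 months, matching a 70B of today
# would take roughly 70 / 2**(months / 3) B params, trend permitting.
for months in (3, 6, 12):
    print(f"{months} months: ~{70 / 2 ** (months / 3):.1f}B params")
[/code]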
>>
>>103607192
>not going to keep improving forever
Obviously not, but from what I understand, we're nowhere near maximum information density yet, so that trend should continue for the foreseeable future. We'll be eating good, fellas.
>>
File: firefox_3GqfTgbm4G.png (526 KB, 786x892)
>>103607002
Did it myself. It's not good, but it's better than many other bigger models.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.