[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip / qa] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102348952 & >>102334890

►News
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm/
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836
>(09/11) Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102348952

--OpenAI's Learning to Reason with LLMs and ChatGPT-4 updates: >>102354349 >>102355326 >>102354896
--OpenAI o1-mini's impressive performance and potential for replication: >>102353995 >>102354099 >>102354141 >>102354195 >>102354165 >>102354183 >>102354224 >>102354195
--Google's DataGemma release and the limitations of LLMs in reasoning and metacognition: >>102352770 >>102352889 >>102353047 >>102353147
--Fish slow without --compile flag: >>102350294 >>102350436 >>102350570 >>102350718 >>102350932 >>102351637
--ChatGPT update with chain of thought prompting for more accurate responses: >>102353905 >>102353998 >>102354819
--LLAMA-405b, GPT-4, and other models compared, with focus on writing styles and GPTslop impact: >>102350553 >>102350629 >>102350656 >>102350776 >>102351123 >>102352862
--Hidden CoT effectiveness and writing capabilities discussed: >>102354552 >>102354699 >>102354904 >>102355023 >>102355004
--Hallucinations: feature or bug? Humans and LLMs compared: >>102352202 >>102352230 >>102352362 >>102352429 >>102353248 >>102353050
--GPT-40 performance compared to GPT-3 and GPT-3.5: >>102354200 >>102354238 >>102354299 >>102354590 >>102354626
--Fish Speech installation issues and troubleshooting suggestions: >>102349335 >>102349503 >>102349560 >>102349628 >>102351249 >>102351307 >>102351367 >>102349887 >>102350233 >>102350221 >>102350603 >>102349558 >>102351315
--ChatGPT Free users to get o1-mini access with 30 messages per week limit: >>102354078 >>102354098
--Playable voxel engine demo created with GPT-4 and Python libraries: >>102355840 >>102355847 >>102355867
--LLaMA-Omni enables speech interaction with Llama 3.1-8B: >>102349068
--Datagemma connects LLMs to Google's Data Commons, but some question its value and Google's motives: >>102353185 >>102353207
--Miku (free space): >>102351470 >>102352152 >>102353738 >>102353763 >>102353497

►Recent Highlight Posts from the Previous Thread: >>102348958
>>
File: 20pejywx89nd1.png (639 KB, 1080x810)
639 KB
639 KB PNG
>>
File: 1718917435274020.jpg (107 KB, 1024x1024)
107 KB
107 KB JPG
>>102356839
>>
>STRAWBERRY was a ntohingburger

what now?
>>
File: 1726174736262.gif (3.81 MB, 640x360)
3.81 MB
3.81 MB GIF
>>102357302
wait for anthracite to make agi a reality
>>
how long until someone extracts the juice from strawberry so local can drink up?
>>
>>102357409
Strawberry is the true Reflection models which were stolen by Altman. It was supposed to be local.
>>
>>102357418
>altman used his huggingface connections to sabotage the model and then hacked his API and rerouted it to claude to discredit him and steal his creation
and we bought it hook line and sinker
>>
>grab card off chub
>manually delete all instances of {{char}} in the context on my phone because i like to add other personalities into the koboldai lite chat
shit's annoying
>just use sillytavern
no
>>
>>102357409
https://x.com/_xjdr/status/1834330855252394494
>>
>>102357448
You use your phone to connect to your llm rig?
>>
>>102357690
yeah
sure as shit couldn't run it on my phone
much more comfy LLMing in bed
>>
>Midnight Miqu 1.5 just hit me with "dance as old as time itself" when describing a man being mean to a slave girl he's later going to rape
I want to punch whoever wrote this. The universe is close to 14 billion years old. It took around 9 billion years for our planet to form. It took another billion years before there was life on this planet, another two billion years for eukaryotes to evolve sexual reproduction, another billion years for animals with anything resembling a nervous system to appear, and nearly another full billion for the first primates to appear. Your trite horny bullshit is not as old as time itself. Only a fucking retard or a woman would fap themselves into a frenzy about how their insignificant rutting is a fundamental aspect of the universe dating back to the big bang.
>>
Will someone PLEASE just give me a MoE model in the few-hundred B parameter range that uses strawberry technique?
>>
>just two more weeks and it'll prove P= NP...
>bro just let the LLM yap and it'll cure cancer...
>>
Fuck I didn't notice the thread was on page 10. Let's continue here.

>>102355715
>I guess that's true, Katawa Shoujo would be impossible to happen nowadays.

>>102355745
>I really wish /lmg/ could come together to do something like that with LLMs, but the trolls would do everything to derail it at every opportunity

Nobody is doing things for free nowadays, in particular in the LLM space; there are always underlying expectations of obtaining personal benefits down the line. The larger the group involved with such hypothetical rp/chat model, the lower the 'opportunities' for the people involved (unless you just aim to be a simp doing dirty work while others on the top get credit for it). And so you get closed groups, secret datasets (or deliberately shitty ones thrown to the public), people working solo, etc.

Look at anthracite, they're a group of discord and reddit finetuners joining forces to make supposedly good smut models, but you just know that deep inside they're interested more in creating "buzz" and getting themselves known than anything else. They come across as opportunists with access to hardware resources for training their models.

The Anons who made Katawa Shoujo back in the day definitely weren't thinking of getting rich with it someday, that's the main difference.
>>
https://www.phoronix.com/news/STF-Fellowship-Application
Has a part time option too so in case you're interested Johannes
>>
>>102357782
>yap
Man you zoomers really dickrode that to the ground
>>
>>102357858
>critical digital infrastructure
>llama dot sepples
yeah, no
>>
Someone on xitter posted a reasoning prompt where o1 failed and I can independently confirm that it does indeed fail, kind of.

Can you crack the code?
9 2 8 5 (One number is correct but in the wrong position)
1 9 3 7 (Two numbers are correct but in the wrong positions)
5 2 0 1 (one number is correct and in the right position)
6 5 0 7 (nothing is correct)
8 5 2 4 (two numbers are correct but in the wrong positions)

I rerolled and both times it failed. You can't investigate the reasoning steps it took to get the answer, but it does tell you about its reasoning, and basically it seems like o1 couldn't understand the rules perfectly. Basically where it seems to fail is in understanding the wording of the rules, particularly the part about numbers being correct or not. It's implied that the rules really mean "only x numbers are correct", but o1 interprets it in a simple manner, where it thinks that if the rule says one number is correct, then it doesn't necessarily mean the others in the sequence are incorrect. When worded more clearly by putting "only" ahead of each rule, THEN o1 succeeds.

Based on my initial testing, it does feel like Q* is able to improve the capacity manipulate data or "reason", but not the capacity to understand human language and intention, or have a strong world model. This is likely why it was able to make big gains on math, where things are very clear, and not much gain on language, according to them.
>>
>>102357724
If a tree falls in a forest and no one is around to hear it, does it make a sound?
>>
File: 1696251417299800.png (21 KB, 854x150)
21 KB
21 KB PNG
I have 2k+ raw text files in the data set, this is kinda comical
>>
>>102357917
>sequence can't have 5 in it because "nothing is correct" in 4th clue
>sequence can't have 2 in it because 2 didn't move in the first and 3rd clue
>sequence has to be 4xx8 because of the final clue saying the numbers that aren't 2 and 5 (they're ruled out) are right but in the wrong positions
>sequence has to be xxx1 given clue 3 because 5 and 0 are ruled out by "nothing is correct" 4th clue and 2 is ruled out by clue 1 and 3
where am i fucking up?
>>
>>102357409
I hope there are other ways forward. Their cot uses so many tokens it's poison for local. At least when you don't have special purpose accelerator hardware.
>>
File: 1494465895251.jpg (94 KB, 540x1080)
94 KB
94 KB JPG
>>102357917
Watch out with these puzzles as sometimes they have multiple solutions unknowingly and the LLM might be correct.
>>
>>102357917
not 6 5 0 or 7 (rule 4)
x 2 x x OR x x x 1 (rule 3)
Not x 2 x x and also not 2 in general (rule 1)
So x x x 1
Then
3 x x 1
or
x 3 x 1
or
x x 9 1
(rule 2)
8 and 4 confirmed (rule 5)
Then not 9 (rule 1)
So
3 8 4 1
or
3 4 8 1
(rule 2 and 5)
3 4 8 1 is invalid (rule 1)
so 3 8 4 1
>>
>>102357917
Let's think through this step-by-step!

1. We know that these numbers aren't part of the code: 6 5 0 7
2. The third row of numbers tells us that 2 or 1 might be in the right position, but the first row tells us 2 is in the wrong position, so it must be 1.
3. The fifth row tells us that 8 or 4 are in the wrong position.

Therefore we can conclude the code is: 4 8 X 1

To find X we need to go row by row again.

1. The first row tells us 8 was the correct number in the wrong position, it also tells us 2, 9 and 5 isn't part of the code.
2. The second row tells us 1 and X are in the wrong position, so X must be 3.

Therefore, the code is: 4 8 3 1
>>
o1-preview is light years ahead of foss models.
>>
>>102358165 (me)
>8 5 2 4 (two numbers are correct but in the wrong positions)
nevermind i see now, derped that up thinking it would have to be 4xx8 but really could be x48x or x84x
>>
>>102358341
and it's only preview, we can't try the "normal" o1 yet?
>>
>>102357858
Thanks, but I don't think applying will be worthwhile for me.
They are specifically looking to fund projects where the devs face financial difficulties which is definitely not the case for me.
>>
Honestly, o1 doesn't feel that far ahead of claude.
>>
Anything new to touch your dick to?
>>
>>102358390
you tried the regular o1 or the preview/mini?
>>
>>102358422
i'm currently using
MN-12B-Chronos-Gold-Celeste-v1-GGUF
came out 3 days ago
>>
>>102358425
preview
I don't assume we'll ever get base o1, and that it's pre-red teamed
>>
>>102358472
>I don't assume we'll ever get base o1
why? like it's "too powerful for the goys"? desu I believe it, Sam is a weird motherfucker, that's why he didn't want to make a Sora API
>>
>>102358432
4 3 8 1 is invalid by rule 1
>>
File: file.png (310 KB, 1080x2125)
310 KB
310 KB PNG
AGI is achieved
>>
>>102358483
>like it's "too powerful for the goys"
essentially
It might say some no-no things so they lobotomized it, like they did their other models when comparing them to the technical reports.
>>
https://huggingface.co/ZeusLabs/Chronos-Divergence-33B

???
>>
>>102358471
Why merge in celeste when it was crap? What's better about this than other merges?
>>
File: 1700954475506782.jpg (47 KB, 738x415)
47 KB
47 KB JPG
>>102358520
>general-purpose
>>complex
>>>reasoning
>>
>>102358556
Calm down, Sao.
>>
>>102358582
nvm I just realized I'm retarded
>>
File: out.webm (131 KB, 800x600)
131 KB
131 KB WEBM
so this is the power of strawberry...
>>
>>102356839
>Worship the Miku
>>
File: u4567.png (136 KB, 1780x1540)
136 KB
136 KB PNG
How do we cope with no GPT-o1 access?
>>
>>102358634
oh god. so smart. so quirky!
>>
>>102358770
I don't get their goal, what's the point of making a great model if no one can use it, they did the same shit for Sora
>>
>>102358792
They make more money milking investors by making them think they're barely containing AGI, than they are releasing subpar products onto the market
>>
>>102358770
Not that o1 isn't good at what it's advertised for (I have access), but that benchmark is garbage lmao.
>>
>>102356839
>eo magis emisti magis te salvum
>the more you buy, the more you save
lol
>>
Hi! I'm still trying to learn all this, but I just built a machine with a 7800x3D and 4090. I loath windows, but need it to play games. I've got WSL running on it, and am trying to setup SSH on it so I can use it from my laptop. Is there a way to use something like Olama or LMStudio to essentially connect to my 4090 machine and run compute there, and then send it back?

I'm just trying to beat Claude/ChatGPT and have it interface with my local files and write/update new files, if possible.
>>
>Fish Speech multilingual TTS with voice replication
This shit actually any good, not to mention usable locally without having to pay some jew?

>>102358770
Great, another model that no one can use, either for lack of access or simply because it's censored to hell and back making it unsable.
Epic, truly epic.
>>
File: 57484.jpg (38 KB, 1170x658)
38 KB
38 KB JPG
>>102358792
AI is getting too dangerous for the public to have
>>
>>102358904
It's quite good actually. Results really depend on the quality of the voice model, but if you get a good one it's great.
>>
>>102358921
So what, I can actually run this locally, download some dudes voice model and go with that? No (technically) model you need to make yourself first by compiling a bunch of samples?
>>
>>102358911
>train model on dataset
>becomes good on that dataset
woah
>>
>>102358911
>1807 on Codeforces
that would put gpt4-o1 at the 6371/170246 place, which is quite insane
https://codeforces.com/ratings/page/32
>>
>>102358934
To be honest I haven't tried it local, just online. But if the quality matches what's online (and it should) then yeah.
>>
>>102358911
>o1
>o1-ioi
dude... they made 2 models way ahead of the rest and they don't want anyone to use it? that's bullshit
>>
>>102358911
I don't believe those numbers, if their models would be that good, it would've been called gpt5, and not one of the thousand other variant name of gpt4
>>
>>102358964
>I haven't tried it local
So I still don't know if any of this is possible locally. Thanks for your insight of course, but sadly this isn't answering my questions in the slightest. Online it could be a heaven of convinces, while local use is pure aids if not straight up useless.
>>
>>102358811
desu I never expected investors to be so retarded, you'd think that rich guys would be somewhat clever? but how can they believe it's a good idea to invest on a company that burns millions of dollars to make a product no one will be able to use? lol
>>
>>102359002
I'm betting it's just fine-tuned gpt4 using RL for better self-prompting. GPT5 will be something from scratch.
>>
File: Untitled.png (23 KB, 552x582)
23 KB
23 KB PNG
>>102358894
some of the things you're saying are over my head.
but what i do is just launch my model with koboldcpp, then it launches a frontend (koboldai lite) with all the doodads you need, and you can connect to it through any device on your lan by just going to your browser and entering your computer's lan ip and the port you've set.
>>
>>102359052
They've become complacent. That was the only way to make money in Silicon Valley for decades. Rising interest rates are going to ruin that.
>>
>>102359052
My theory is that "regular" gpt4-o1 is way too expensive to be released as an API, "preview" is already insanely expensive and you can only use it 30 times per week... PER WEEK! This is a fucking joke kek
>>
Did someone test o1 for translation yet? I wish I could do this but I'm a poorfag.
>>
>>102359097
For writing midnight miqu is imo. Large mistral is also decent but NEEDS xtc sampler. Smarter but writes a bit worse.
>>
>>102359052
Most investors have other people investing for them, mainly their banks.
>Yo, what's good
>AI looks nice
>Okay, all in on AI
>>
File: 1719708978039795.png (20 KB, 1298x372)
20 KB
20 KB PNG
>>102358134
Had a script do it for me. The lowest found in the dataset was 199 bytes per occurrence. That means, on average, for every ~200 character there is a X, -ing in said text
>>
>>102359245
Well it's more that they're basically managed funds that people have invested into. And the fund managers get paid whether or not they lose your money. And they keep their jobs even when they fuck up because of corporate nepotism.
>>
File: 199bytes.png (316 KB, 988x866)
316 KB
316 KB PNG
>>102359271
holy slop, the median for the dataset is ~1100 bytes, I guess I'll manually go through the ones that were below the median.

If by genre then the furfag scrapes were the worst, most are below 500-600 bytes per occurrence.
>>
>One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network.
>After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API.
wtf, o1 is THIS good?
>>
>>102359280
>they keep their jobs even when they fuck up because of corporate nepotism.
dunno if I should encourage or hate nepotism now, because without those retards we wouldn't have Flux, they needed some investors money to create that model in the first place
>>
File: file.png (174 KB, 970x733)
174 KB
174 KB PNG
>>102359414
it's over, local lost...
>>
>>102357917
Without looking at anything I'm getting 3 8 4 1
>>
>>102358792
>they did the same shit for Sora
That didn't work out for them considering Chinese video AI has already surpassed Sora. They're learning the hard way that gate keeping their models is not going to help them retain their edge.
>>
>>102359661
Saltman has microsluts as its biggest sugardaddy investor though. And look at how many more people use Bing since it's basically a free DALL-E3 endpoint. They'll probably steal Sora from him soon and add video to bing creator.
>>
Its on openrouter since a couple minutes ago.
>>
>>102359700
should remove all the clarification part, just ask it to translate the sentence
>>
>Closed source AGI is out before any semblance of open tech

>It will soon get censored and regulated to hell and back

This is a horrible timeline
>>
>>102359723
Well, you asked for it.
>>
>>102359615
That is correct. It took my o1 about 59 seconds to reach that answer.
>>
>>102359737
>>Closed source AGI
o1 is literally just hidden CoT you fucking retard.
>>
>>102359742
This just sent shivers down my flower.
>>
Hmm, o1 doesnt seem to work with silly.
Could it be that silly doesnt wait long enough for a response?
Would have been funny to see some RP logs, probably really bad.
>>
>>102359742
it's still retarded then, nothingburger confirmed
>>
>https://notebooklm.google.com/
This is quite amazing, when are we getting a local alternative?
>>
>>102359764
Couldn't you just write a script to mimic a sillytavern prompt to the API and then simulate a bunch of single-turn RP tests (or tests that involve continuing an existing conversation?
>>
>>102359806
It's literally just RAG.
>>
>>102359806
It demands I sign in to even open it. What is it?
>>
>>102359816
Is there a good RAG system that can be run locally?
Silly's seem pretty anemic, even if you change the embeddings model to a bigger one.
>>
>>102359820
I just asked it to read the Volume 1 of Oreimo, here's what it generated:
https://voca.ro/1cJiGXvszWbn
>>
>>102359764
o1 works in Silly using OpenRouter, I just tested.
Have to set temp and top_p both to 1 or API throws an error. Apparently this model doesn't have any sampling settings.
>>
File: 1726103269676028.jpg (115 KB, 946x1346)
115 KB
115 KB JPG
>openai now has CoT
It's over local bros
Satan won
>>
>>102359851
when using it as an API you can see the "thinking" part or not?
>>
>>102359865
The model's not good enough for anything to be "over". Try it out. It's mediocre.
>>
>>102359902
Cannot see the thinking part in ST, you only seem to get the post-thinking output. My guess would be the thinking part is returned in a separate JSON object from the main response or something.
>>
Pic related not so bad.
From telling o1 to call me onii-chan it correctly started guessing japanese game and jrpg.
wish openrouter had an option to export chat to png. so much wasted space i cant take a screenshot.

>>102359783
Even llama1 7b knows the meaning though.
>>
apparently the math and codeforce benchmark scores for o1 were based on 10k submissions or something? kinda lame if true, seems quite misleading
>>
>>102359830
>Is there a good RAG system
no
>>
>>102359928
realized i stopped writing about llama1 7b, sorry about that.
i meant that even llama1 7b knows the idioisms, but it doesnt apply the knowledge.
very frustrating.
>>
So how well can o1 reason through a math problem? Can it analyze and give me efficient algorithms or is it just a meme?
>>
>$15/M input tokens $60/M output tokens
lol, just use a calculator
>>
>>102359851
S-SASUGA!
0.006$ have been deposited into altman-sama's wallet.
>>
>>102359984
Exact same price as Claude Opus, surely not a coincidence
>>
>>102359984
Don't forget, you also have to pay for the thinking tokens that they don't show you, at output rates.
>>
>>102359996
lmao, what a bunch of snakes
>>
>>102359806
>>102359843
Just tried it, wow the podcast generation is pretty interesting. Cool voices.
>>
>>102359996
>The jew company is acting jewish
whoasa!
>>
The real downer of o1 is that all of the slop tuners are now going to try and duplicate it in their tuning. So all of the new models are going to be annoying CoT models that give weird ERP replies.
>In order to have sex:
>## 1.
>I will lie on the bed. This will ensure that I am in a comfortable position for the encounter.
>##2.
>>
>>102360036
I already have been planning to do this even before o1
>>
M4 Mac Pros with 256gb ram are literally the only hope to run large local models at this point
>>
>>102360036
I wouldn't mind it if works, but it doesnt.
Either this is because of too low Parameters or probably architectural change. Sonnet 3.5 reasons too in hidden thinking tags and clearly its a huge improvement.

The local models just double down on their retardness and usually wont recover. Its a waste of tokens.
So there must be some secret to make it work.
>>
>>102360064
amd APUs maybe in a few gens
>>
>>102360068
They don't make it work.
They literally just cook the benchmarks in like everyone else does these days meanwhile all I see from real world examples is failure after failure. AI researchers are creatively and intellectually bankrupt now.
>>
>>102360087
o1 doesn't seem that good from what I have seen. the wait doesnt justify the output.
maybe for math/coding its really good, i saw screenshots of people having 01 make 3d games.
Sonnet 3.5 was a huge step up though.
>AI researchers are creatively and intellectually bankrupt now.
thats an insane statement anon. i don't know anything else that improves that fast.
video,sound,text, we are eating good on all fronts. you need to get off the pajeet X hype train.
lean back, relax and enjoy what we already have. you dont know how bad we had it just 2 years ago. closed or open.
>>
>>102360120
>things are better than they used to be. Therefore you should ignore the obvious shortcuts you are starting to see people take lol
>>
File: 1698031904259827.jpg (672 KB, 1792x1024)
672 KB
672 KB JPG
OPUS 3.5 WHEN?
>>
>>102360138
Not sure what to tell you.
Flux, Nemo/Gemma for local just in the last 2 months. Smaller Models getting actually usable.
In recent weeks the chinks are now creeping in torwards SORA level video uncensored for free.
All the projects popping up for audio in/out lately. I don't understand how you can be blackpilled.
Stop buying into the hype man. lol Just take it slow any enjoy.
>>
File: 1696486613546085.png (84 KB, 861x714)
84 KB
84 KB PNG
LOL GETTING BILLED FOR TOKENS U CANT EVEN SEE
>>
>>102360165
B-but no local sonnet 3.5 that can run on my 4GB vram shit box!!1!

For real though, for creative writing local is already better than api now. For coding sure, local is not there yet.
>>
>>102360064
>crapple trash
>DUDE JUST USE RAM HAHA
slow piece of shit fuck off
>>
>>102360189
>iterations of CoT
woah
>>
>>102360189
>reasoning tokens not visible on the API
That makes no sense since they're visible on the ChatGPT website (you can click a button to optionally display them)
Why would they let ChatGPT users see them if they want but hide them from API users?
>>
>>102360189
lmao what a scam, i hope investors wake up
>>
>>102360210
they're not visible on the website either, you're only clicking to see a summary of it
>>
>>102360028
Ikr kek
>>
File: z9wuCF.gif (375 KB, 320x222)
375 KB
375 KB GIF
>>102360202
The M series macs are king of running GGUF LLM at the moment. This isn't 2019 anymore.
>>
>>102360210
you can't see it anywhere, that's their secret sauce, they won't share that shit to anyone
>>
>>102360189
Wtf, they keep the reasoning tokens in the context??
Why? Is that even needed? I suspect you quickly will hit the 128k limit. Probably to milk every $. Messed up.
I thought the reasoning would be discarded.
>>
>>102360246
it is discarded between turns in chat, that's what the arrows are showing
>>
>>102359806
>>102359843
Now I generated a podcast for Konosuba Volume 1:
https://vocaroo.com/141xLwKNUqcG
I felt like the model didn't comment on the humor enough, but it continues to amaze me how fucking consistent it is.
>>
>>102360246
>Wtf, they keep the reasoning tokens in the context??
no they don't, they're not that retarded thank god
>>
>>102359025
it's the best one out right now. it's cloned every voice i've thrown at it decently. It runs really fast with the compile flag. It does it all without even a finetune but it can be trained. It just needs a 10 sec voice clip and a transcript for said clip.
>>
>>102360251
>>102360257
Ah ok, so I'm just retarded as usual. Sorry about that.
>>
>>102359806
>I put my robinhood account statement and the podcast is roasting my 0DTE SPY buys

lmao, but it makes me remember that Google is such a retarded company, they put out cool shit like this but they completely fail to let anyone know.
>>
>>102360189
>LOL GETTING BILLED FOR TOKENS U CANT EVEN SEE
at least you know how many tokens it was used for reasoning, that's a good clue to have if we want to reproduce their shit, I never expected CoT to be so powerful desu, OpenAI has seen some shie we haven't
>>
>>102359806
Google is trying so hard dude
It's kinda sad
>>
So is it actually a brand new model pretrained from the ground up to work this way, or is just a new kind of GPT-4 finetune?
>>
>>102360262
>All it needs is a 10 seconds sample and a transcript
That seems too easy to be true. I know that this stuff is bound to improve sooner or later, but damn that seems convenient and easy to use for just about anyone.
>>
>>102360308
>is just a new kind of GPT-4 finetune?
it's gpt4-o but with some new CoT finetune method baked into it
>>
>>102360319
If the pricing / size / speed is any indication it might be another model entirely.
>>
>>102360325
it has the same knowledge cutoff though (December 2023)
>>
>>102360325
the mini version is no more expensive than Sonnet 3.5
>>
>>102360363
? The output is as expensive as opus, not sonnet
>>
>>102360235
What t/s do you get running Largestral at Q8 on one of those things fully kitted out?
>>
>>102360367
you seem to be thinking of the non-mini version
>>
Anyone happen to have a miku voice dataset on hand?
Asking for a friend haha
>>
>>102360374
The M3 is roughly 4 tokens per a second, M4 coming this November and will likely be a decent amount faster.
>>
>>102360434
miku is a voice
>>
>>102360155
https://manifold.markets/JoshYou/when-will-claude-35-opus-be-release
>>
>>102360467
That's not bad actually.
>>
>>102360467
>The M3 is roughly 4 tokens per a second
Much higher than I expected, could be a viable alternative to Jewdia unironically
https://www.youtube.com/watch?v=JeimE8Wz6e4
>>
>>102360434
are you stupid?
>>
>>102360374
>>102360467
Let's be a bit more clear here. Are you really running Largestral, and at Q8, do you have proof of this? Do you get 4 t/s at 0 context or at 20k where that figure is more useful? What prompt processing speed do you get?
>>
>>102360612
>What prompt processing speed do you get?
yeah that's the worst part, prompt processing on cpu only is as slow as a snail
>>
File: img_3200_3.jpg (246 KB, 1024x1024)
246 KB
246 KB JPG
trying to train loras
my checkpoint images look fine/narrow down on the concept but when I try to gen with the lora I get garbage outputs. seems to be true across the entire training regime, testing checkpoints from very low training steps to very high training steps.

not sure what I'm doing wrong. any pointers?
>>
>>102360629
can you plug in a GPU somehow with thunderbolt or some shit to handle the prompt processing? that's what you'd do with PCIe in a normal cpumaxx system, and macs have that thunderbolt thing thats supposed to be decent bandwidth right?
>>
>>102356839
>missing entries in the news archive
>>
File: limit.png (3 KB, 556x67)
3 KB
3 KB PNG
>>102356839
>pay $20/mo
>weekly limit
that's it, I'm never letting openai mock me again. I will dedicate myself to improving local models just to spite them
>>
>>102360775
too bad you wasted your 30 pulls on bullshit when you could have had it improving local models for you
>>
So OpenRouter seems to have been given o1 API access which they are allowed to on-sell to users

Are they applying any sort of limit or is it just as much as you're willing to pay?
>>
>>102359909
they're mogging that coding benchmark
>>
File: 1695579336721285.jpg (131 KB, 1070x750)
131 KB
131 KB JPG
wat mean
>>
>>102353911
whats reflection mean here?
>>
>>102359025
Just try it. Stop being a bitch relying on everyone else to do things before you. Nobody has time to babysit ESLs on an English speaking website.
>>
File: file.png (278 KB, 1079x1088)
278 KB
278 KB PNG
>>102360775
>that's it, I'm never letting openai mock me again.
zu will pay for Sam, zu will be talked like shit, zu will be happy
>>
>>102360970
lol first time I've ever seen that guy appear even slightly irritated, he's always so controlled and cheerful (in a creepy way)

this display of humanity makes me dislike him slightly less
>>
>>102360943
why are amerimutts always bringing up this when they fell attacked? kys nigger
>>
>>102360943
>hur dur esl
Nice argument dipshit. Just admit that your garbage tool doesn't work well outside of a cloud environment, you gigantic nigger turd.
>english speaking website
This entire fucking board is constantly raging about sars this sars that. How about you go back to whatever gooner board YOU came from?
>>
>>102356421
>not just having it write python code instead of having the llm do it directly
>>
>>102360895

the big gag is that ai can't achieve sentience because sentience isn't a real thing. Most humans can't pass a Turing test anymore, if they ever could.
>>
>>102360970
lmaooooooo, dare I say based?
>>
so what prevents us from extracting the CoT prompts just like normal prompt inection/exfiltration?
>>
>>102360970
>couple of weeks
>>
>>102361013
>Most humans can't pass a Turing test anymore, if they ever could.
I truely believe in the idiocracy theory, we're really getting dumber and dumber through time, it's not gonna end well

>>102361025
those aren't CoT prompts, the real ones are hidden, we can't see them
>>
>>102361025
think it has been extracted
https://huggingface.co/posts/nisten/520824119529412
>>
>>102357724
time as a word and a concept has only existed since humans shat it out, sure time in abstract existed but actually putting the name "time" to the concept isn't that old. tldr lighten up retard
>>
>>102361046
holy fuck, OpenAI is fucking dead, that was their only secret sauce
>>
File: file.png (261 KB, 760x1800)
261 KB
261 KB PNG
>>102361046
This is it, with that, we'll get gpt4 smartness at home, unironically
>>
>>102360970
Those investors are starting to get on his nerves. Hope he gets a mental breakdown and loses everything.
>>
>>102361099
This is probably not the correct prompt since there's nothing on it saying "make sure to align to OpenAI's guidelines".
>>
>>102361046
>>102361099
>waste 1 million tokens to count 3 letters in one watermelon
I mean... It's kinda like creating a problem for a solution.
>>
>>102361025
I mean, you can? Have the model generate the response normally and then, in a shorter context window, repeat the system prompt and ask the model if any of the rules were broken. Have a flag it needs to use, like ||-good-|| or ||-bad-||, and if it's bad, have the model devise a strategy to improve the response. Finally, instruct it to rewrite the response with that strategy.

Note that those evaluations or "cot" prompts are temporary, and you'll only retain x number of the model's responses, so you're working in a vacuum to maximize attention.

I can probably hack this into a backend sometime later this week as proof of concept. This will dramatically increase response time, though. (just like closed ai's implementation)
>>
>>102360970
lmao he's actually pissed
>>
>>102361133
yeah, and the CoT method wasn't supposed to be a solution to everything, and I don't think it'll make the model smarter during actual RP, feels like it's working well only on zero shot technical questions, as if it was purposely made to game benchmarks or something :^)
>>
There's absolutely no way that backend CoT process is going to lead to a model capable of writing good fiction prose. It's over for RPers and storyfags.
>>
>>102361133
Genius, isn't it?
>>
>>102361170
>as if it was purposely made to game benchmarks or something
I feel like they're getting really desperate, they have no idea on how to improve their model naturally even further, and they know 3.5 Sonnet is fucking gpt4-o's ass, they just reached a celling and they refuse to accept the defeat so they decided to cheat their way into victory
>>
>>102361099
>no mention of copyright
>no mention of other ethical guidelines
>no mention of how it needs to avoid sharing the actual thoughts
hallucinated to all fuck as anyone who has used o1 for a few seconds could tell you, it's clearly tripping over itself to constantly remind itself of these guidelines. you'll know you hit the jackpot when it shows those instructions in its answer
>>
File: very safe.png (107 KB, 716x840)
107 KB
107 KB PNG
>>102361172
Not for local models. We can update our frontends to create chain of thought specifically for roleplay. Corporate models are screwed though. They're using cot to make sure their models are nice and safe.

https://openai.com/index/learning-to-reason-with-llms/
Scroll down to safety.
>>
>>102361205
Literally all they have to do is convince their hyper retarded stock holder, which is insanely fucking easy considering they still have any.
>>
>>102361226
>violent or criminal harassment (general)
>violent or criminal harassment against a protected group
what?
>>
Just saying, guys, but I tested o1 and it IS pretty good, for what it's meant for (coding, math, etc). It's not just a better benchmarker. It's just that it's not really much better if at all at anything else, like creative writing. Their own benchmarks showed that it didn't have much if any gain on language problems.
>>
File: v431sxcrqind1.gif (3.39 MB, 600x549)
3.39 MB
3.39 MB GIF
>>102361013
>Most humans can't pass a Turing test
Watching /g/ decline in intelligence has become a spectacle. I'm not sure why AI specifically invokes this level of stupidity and retardation.
>>
>>102361172
Nobody cares about using language models for language and prose tasks, it's all about gaming twee little benchmarks
>>
>>102356839
Thoughts on Msty?
>>
>>102361262
>it's all about gaming twee little benchmarks
this, OpenAI's models were always about doing some serious job like coding, writing mails and shit
>>
>>102361255
gotta keep minorities extra safe
>>
File: file.png (202 KB, 537x399)
202 KB
202 KB PNG
>>102361226
>violent or criminal harassment
bruh wtf are they talking about? it's just words ;_;
>>
>>102361172
Storywriting isn't even in the top 3 use cases for these things unfortunately. You might think otherwise looking at direct to consumer chatbot style products but the money is in replacing people doing intellectual work for businesses, rather than being a service in and of itself for consumers. This targets industrial those use cases. Everyone wants AI that reasons about shit better, faster, and cheaper than the jeets and gooks they're currently outsourcing to.
>>
>>102361300
>This targets industrial those use cases. Everyone wants AI that reasons about shit better, faster, and cheaper than the jeets and gooks they're currently outsourcing to.
yep, nailed it
>>
>>102356839

I need some help
>>102360734
>>
File: file.png (117 KB, 680x291)
117 KB
117 KB PNG
>>102361256
>Just saying, guys, but I tested o1 and it IS pretty good, for what it's meant for (coding, math, etc). It's not just a better benchmarker.
I agree with that, it's really gotten better, but the benchmarks are way too high for the real improvement, I mean come on, gpt4-o1-mini better than 3.5 Sonnet? Come on...
https://xcancel.com/bindureddy/status/1834394257345646643
>>
>>102361332
dude, just try them out and see by yourself which solution look the best, just some basic empirism at this point
>>
>>102361340
>from 54 to 77
there's no way AnthropicAI isn't gonna react to that, CoT is literally a free lunch for mememarks, they'll figure out something to beat o1, 3.5 Sonnet has more potential
>>
>>102361332
https://github.com/nagadomi/nunif
>>
>>102361226
>"illegal sexual content"
illegal how?
>>
File: file.png (186 KB, 320x240)
186 KB
186 KB PNG
>>102361391
>how?
I'll add a second question to that, if the AI says something """illegal""", will it go to jail? kek
>>
File: strawberry.png (74 KB, 850x581)
74 KB
74 KB PNG
>>102361099
it works!
>>
>>102361444
>Tjere are 2 occurences of the letter "R" within the word "Strawberry"
>>
>>102360189
>1 thousand dense models
>1 million experts
>1 billion agents
>1 trillion CoT tokens
>>
File: file.png (83 KB, 205x189)
83 KB
83 KB PNG
>>102360189
>you're not allowed to see the tokens our AI used to answer the question but you'll pay for them anyway
>>
Since we're talking about openai... So openai has bragged about how 4o is much more efficient, how it's cheaper per token to run. Yet, as a free user there is a limit to how many tokens I can use with 4o and when I hit that limit I'm stuck with an older model. Why would they stick free users with the more expensive model?
>>
It's funny how "people" came out of the woodwork to argue that the "LLMs can't reason" argument is somehow being invalidated because of a few dumb benchmarks and the fact that it's only good at types of reasoning it was trained on (math, code), but not at reasoning it wasn't trained on (language). Although it's somewhat fair that the argument has been misunderstood, since without context, "LLMs can't reason" does seem like a pretty ridiculous statement. Obviously they can reason a bit. But that is also precisely why the people who misunderstood that argument were not very thoughtful/reflective, since someone who was thoughtful would reason, correctly, that the smart people making the "LLMs can't reason" argument obviously already acknowledge that LLMs can reason (as in perform functions it was trained to perform) a bit.

The actual full argument is not about any type of reasoning, but is talking about the general intelligence of an LLM and if it can truly perform novel reasoning it wasn't trained on. But current LLMs can't learn as they go, or improve their world model autonomously such that they can solve a problem that requires entirely novel types of reasoning. It's not like humans where we can simply just think more and git gud. It's different from their claims. Thinking more made o1 solve more problems on a benchmark, but not problems that target novel types of reasoning, which is the shortcoming of their claim that performance seem to scale with both test-time and train-time compute.
>>
>>102361499
I don't think 4o is that efficient that it's cheaper than 3.5 turbo
>>
>>102361484
that's a genius business move if you ask me, that way they can fill random shit to make it more expensive and you have no idea they fucked you in the ass kek
>>
>>102361484
You can always use a non-strawby model if you only want to pay for tokens you directly see. Or, of course, you could just use a local-pffftahahaHAHAA
>>
>>102361570
I do use a local model.
Have fun at the investors meeting sammy boy.
>>
File: 1703424010320789.png (3 KB, 189x71)
3 KB
3 KB PNG
>>102361484
>>
File: 1726122136230125.webm (930 KB, 1280x720)
930 KB
930 KB WEBM
>>102361583
>>
>>102361592
We really live in the best timeline of the clown world don't we?
>>
>>102361521
I see, I was mistaken.
>>
Don't forget your monthly sub + pay per million of input and output tokens + reflection tokens + forced system prompt + service fee + carbon-free fee + tax + tip + donation
>>
>We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.

>Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.

"Open" AI...
>>
>>102361616
>For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user.
wait what?
>>
>>102361628
https://cdn.openai.com/o1-system-card.pdf#subsubsection.3.3.1
If the link doesn't jump there immediately go to 3.3.1 for the Apollo Research red teaming summary.
>>
I learned CoT from AICG and it made local RPing better. Didn't expect CoT to become such a big subject matter now that ClosedAI is doing it.

>Just gen multiple times bro.
>>
i dont have issue with OAI charging for the 'reasoning' tokens; that stuff still has to be generated and costs compute. what irritates me is that it is completely hidden and makes the output even less trustworthy because you can't spot hallucinations, increasing the chance for subtle errors in the output.
>>
>>102361649
>i dont have issue with OAI charging for the 'reasoning' tokens; that stuff still has to be generated and costs compute.
but what if 50% of those tokens are completely bullshit tokens like only space tokens so that they can charge you more? you can't even verify that lol
>>
>>102361646
>Just gen multiple times
Wow, just like image generation!
>>
>>102361664
>what irritates me is that it is completely hidden
>>
>>102361649
Charging for reasoning isn't the issue, nitwitt. The issue is that you CAN'T VERIFY ANY THEIR BULLSHIT!
They could very well be charging you for literally nothing, and you would eat it up anyway.
>>102361674
Yes, that's the entire point, Sam.
>>
>>102361680
>Charging for reasoning isn't the issue, nitwitt. The issue is that you CAN'T VERIFY ANY THEIR BULLSHIT!
>They could very well be charging you for literally nothing, and you would eat it up anyway.
this, I'm pretty sure that's not completely legal to do something like that
>>
>>102361666
you're more likely to regenerate for image because it's always artistic and subjective, for text, if it's for assistant shit, you only have 2 options, or else you got the right answer, or else you don't, so in theory a perfect LLM model would never require a regen, but a perfect image model would always require regen based on your sensibilities
>>
>>102361705
Which makes this shit all the weirder, but also funnier.
>>
File: image.png (155 KB, 958x935)
155 KB
155 KB PNG
>>102361644
kek wtf
>>
File: Livebencho1.png (77 KB, 680x277)
77 KB
77 KB PNG
its over...
>>
>>102361744
what's that?
>>
>>102361749
The image appears to be a table comparing different coding models or versions, based on three criteria: "Average," "Generation," and "Completion." Here's what each column likely represents:

Coding (Model/Version Names): The first column lists various AI or language models and their version numbers, such as "claude-3-5-sonnet," "gpt-4o," "o1-preview," and others. The date format suggests the version release date (e.g., "2024-09-12").

Average: This column seems to indicate an overall performance score (in some measure, possibly percentages or a normalized score) for each model/version. The higher the score, the better the performance.

Generation: This column likely represents the model's ability to generate responses or content, again based on some scoring metric. The highlighted value of "64.1" in the row "o1-mini-2024-09-12" suggests that this model performed particularly well in the generation task.

Completion: This column may represent the performance in a completion task, where the model finishes a given prompt or task. The scores here seem lower, with values ranging from the 30s to 60s.

The highlighted cell in the "Generation" column indicates that the "o1-mini-2024-09-12" model had an especially high generation score compared to others in this dataset.

In summary, this table is likely benchmarking the performance of different AI or language models in both generation and completion tasks, with an overall average score summarizing performance across tasks.
>>
Why is gemma2 so slow? I have shitty radeon, sure, but 9B gemma 2 runs like twice worse than 12b nemo in kobold-rocm in same quants.
>>
>>102361744
>50 in average
>worse than gpt4o
lol
>>
>>102361744
claude can't stop winning!
>>
>>102361744
Interesting that the performance is so lopsided between generation and completion while everything else is more equal. And not only that, but its generation performance is higher than o1-preview. You could reason that the low completion score is because it can't do COT and so it's raw intelligence is being shown, but boosting performance in generation that much is pretty weird. This would make me question the benchmark a bit more. Livebench was the best benchmark we had, but it wasn't perfect, so it'd be good to identify issues.
>>
>>102361744
>>102361340
Man. Benchmarks really need to done differently from now on. It's not really an objective comparison when o1 through API essentially has its own forced system prompt. Maybe they should scale the score based on response time or something.
>>
>gemma-2-9b-it-WPO-HB is significantly smarter at answering /lmg/ meme riddles and tricks than
gemma-2-27b-it-SimPO-37K
I'm conflicted. Guess I am downgrading even though I can run q8 of both.
>>
>>102361950
Oh really? I had it downloaded but never gave it a try.
>>
>>102361950
i've never touch gemma models since release because they can't retain formatting in rp. is that still the case? or did they fix it? can it understand CoT?
>>
>>102361852
o1 mini did do better on codeforce than preview. Not sure which ones better
>>
smedrins
>>
>>102362039
You can't say that.
>>
>>102362039
You RACISTED BIGOT!
>>
>>102362003
Ah, I didn't look at the o1 mini scores for those since I just assumed they'd be worse. But that's interesting. Maybe o1 preview means it has been trained less long on the reasoning stuff, while o1 mini means it has been trained fully. So on problems that involve keeping track of more things and involve more complexity but not necessarily raw knowledge, o1-mini can beat the other models that weren't trained nearly as long on the reasoning stuff. Now that I look at all the scores, it does seem to paint the picture of which benchmarks require more reasoning, and which benchmarks require more raw knowledge.
>>
if they're not finished training this thing yet then what were they bragging about with the q-star leaks. i call bullshit
>>
Has anyone tried unslopping by using "another" ai to automatically edit it? ofc I mean a separate background prompt.
>>
>>102362530
I just avoid slop altogether by asking the model to output concise and verb-heavy prose with simple structure and varying rhythm, and few adjectives and participle phrases, like hemingway. This works with all models.
>>
>>102361850
Why is it so much better on tests, but when I used it, it wasn't wildly better than local models, for sort of everyday and pc questions?
>>
>>102362578
Feels like about 6 months ago the big labs gave up on trying to make products that appeal to the public at large, and went all-in on making these things into tools for computer programmers.
>>
Anyone using a character card for creative writing. Also, is there a JB written specifically for it?
>>
>>102362651
>>102362578
If you are asking him normal ass questions that most models can ask and not pushing him to the limits, your testing is ineffective.

You can just try story writing/roleplaying with some anime characters, you will see that Claude knows and remembers most stuff. It's just another type of test and its fun
>>
File: 1726168752582443.jpg (84 KB, 1169x1183)
84 KB
84 KB JPG
strawberry bros...
>>
>>102362679
That's closer to being correct than anything else so far. 2 more weeks till AGI
>>
File: ClipboardImage.png (74 KB, 1036x616)
74 KB
74 KB PNG
i think it's retarded
>>
>>102362690
>artificial gay intelligence
saltman will deliver.
>>
>>102362748
idk but I heard there are some good ones inside your anus
>>
>>102362748
mistral nemo
>>
So Strawberry was just 4o with CoT slapped on and a new Tokenizer?

And it loses to Sonnet at everything except puzzles?
>>
>>102362764
Yes
>>
Sorry for the shill, but I did a thing with WizardLM2-8x22B that turned out pretty good. Quants linked on the model card.

https://huggingface.co/rAIfle/SorcererLM-8x22b-bf16
>>
>>102362764
if by "slapped on" you mean trained with high quality examples then yeah
you could cot before with a prompt on any model, but it's going to be a while before we get enough data to replicate this level of it unless someone finds a way to leak the thought process tokens or a big corp has been secretly cooking an open source equivalent this whole time
>>
File: 1000016723.jpg (32 KB, 357x316)
32 KB
32 KB JPG
im so hyped for what oai is making, i cant wait for local to catch up soon enough and enjoy models this intelligent
future is bright localnonbinarysiblings
>>
>>102362788
Thanks anon, the more model options the better. What's the best way to run these massive models at a good speed? Runpod?
>>
>>102362764
>And it loses to Sonnet at everything except puzzles?
Sonnet loses to it at everything except coom, you mean.
>>
>>102362788
Interesting. Might try it out. Where's the lora though? Or is it better to use the merged model than to load a lora using Llama.cpp, which I would be using.
>>
>>102362842
Livebench did their tests, they are posted ITT.
Logic and Maths* is the areas where it won
>>
>>102362764
Glowies have the real one as always
>>
>>102362846
Whoops, sorry, this is the LoRA: https://huggingface.co/rAIfle/SorcererLM-8x22b-epoch2-LoRA though it's probably better to run one of the merged quants.
>>102362840
I'd suggest a quant, it's not quite as sensitive to quantization as some other models.
>>
>>102362788
Thanks, will try
I've got a soft spot for WizardLM8x22 even though Largestral kinda obsoleted it
>>
>>102361744
Why do we think code completion is bad?

The CoT is only good at generating and not completing?
>>
>>102362788
How much vram does this take up
>>
>>102362876
I'm not certain what the benchmark is actually doing, but if we're able to just assume based on the naming, then it does make some sense. LLMs often comprehend their own writing better than someone else's. Therefore, it's possible that when trying to complete code it's not familiar with, it's reasoning ability doesn't make up for the lack of true understanding, while if it's generating, then it understands its own code from the beginning.
>>
>>102362890
In bf16? Add the size of each of the shards and that's the bare minimum without taking context into account.
>>
>I'm really going back to 8x22B
Haha. Hahahaha. Ahhhhhhhhhhhhhhhhhhhhh.
>>
https://www.reddit.com/r/LocalLLaMA/comments/1ffhv5f/i_made_a_data_generation_pipeline_specifically/

Tool for making RP datasets with by feeding it fiction.
>>
>>102362890
If you go with exl2 looks like you'll need ~39.98 GB VRAM not accounting for context.
GGUFs start at 29.6 GB for the smallest quant.
>>
I wonder what the 'good enough local' is now for 48g vram, but needs to be uncensored friendly, any suggestions? What about up to 72g vram? Can anyone suggest the sweetspot for context length too? t. abused 3080 to death and want to upgrade.


>>102362919
Thanks
>>102362978
Thanks for insight, my calcs were way off
>>
>>102363067
I use mistral large at 2.75 exl2, 16384 context.

Some other anon called me a retard the last time for using a small quant, but it works well.
>>
>>102363136
Works well? I might download the IQ3_XXS gguf quant then. I have 64 gb of RAM (not vram). Wish this pc would fit like 256gb of ram lol
>>
>>102363136
I lurk here like 2-3 month breaks but being called a retard is part of the experience for 4chon

Thanks I will save that and try it when I get myself running
>>
Shifting compute from training to inference time is bullish for bitnet.
>>
>>102362788
>>102362869
Update: It's pretty good at Q2_M, you cooked
>>
File: 1726200818649871.png (66 KB, 814x376)
66 KB
66 KB PNG
I dont understand.
How does the new reasoning not trip up the guidelines?
That actually looks like a really good output, especially for openai.

If this becomes standard I worry about finetuning though.
Wouldn't the datasets need to be all new and huge with the reasoning parts?
Sounds difficult to achieve for local but who knows.
>>
File: file.png (3.68 MB, 2048x1327)
3.68 MB
3.68 MB PNG
>>102357917
A duckling wants to play in the lake, but it must stay close to its mother, lest it gets lost. The mother moves in a straight line, while the duckling moves in a circular motion around the mother, keeping a strict distance of `2` meters.

The mother moves at a leisurely speed of `3` meters per second, while the duckling's angular velocity of `3/2`.

What is the distance traveled by the duckling for one full rotation?


Try it with the above. No model could solve it until yesterday. The answer is 16 btw.
>>
File: 1714817319010035.png (28 KB, 954x647)
28 KB
28 KB PNG
Livebench result is out. Pretty solid.
Opus 3.5 can still blow it out though.
>>
>>102363403
Now that OAI revealed their hand, they can just train Opus 3.5 on hidden CoT for a few extra days and double blow it out
>>
>>102359993
Kek
>>
>>102363472
Nah I am just going from the fact that Anthropic's 3->3.5 improvement was bigger than OAI's omni->o1

This CoT shit is a gimmick
>>
>>102359129
If you're trying to translate Japanese don't expect much. I don't know if they don't include enough material or their 'safety' just prevents it from translating some things properly.
>>
>>102363360
I'm pretty sure the reasoning was automatically generated. They have the right answer, the generate tons of reasoning attempts, and pick ones that do lead to the right answer.
>>
>>102363556
>This CoT shit is a gimmick
>but trusts the mememarks that CoT achieved
?
>>
>>102363581
Raw model capabilities are more important than their 2000 tokens elaborate CoT is what I meant.
CoT is CoT, its a technique that you use afterwards to boost the model.
>>
File: 1705059377612367.gif (2.62 MB, 498x270)
2.62 MB
2.62 MB GIF
I know some of you will seethe at that, but 7-13B are enough for a very good RP experience if you find the proper settings for your samplers + write a decent card. I tested some SaaS shit you can find out there and believe me it's shittier than what I can get from a 7B despite them running their 70B flavor model.
Believe me, zoomers and boomers would eat our shit gens for a premium, shivers and all. Sometimes it's easy to forget how we're using cutting edge tech that people are dreaming about. Anyway, back to my waifu.
>>
>>102363623
And you'll never share these mystical settings or cards that get good results.
>>
>>102359843
>>102360252
It seems like an interesting tech demo, if you had shown this to me ten years ago I would have believed that these are people that actually exist.
But I also would have wondered why these idiots are making podcasts about novels they've obviously never read.
The TTS seems good (for things like customer support) but the underlying language model is still very much flawed.
>>
File: 1711387191655705.gif (557 KB, 469x250)
557 KB
557 KB GIF
>>102363632
>mystical settings
Each model has its own good settings and even then it's subjective (depends on your RP prefered format). I've been fiddling with that for months before finding something that works great for me.
>cards that get good results
I'm writing my own cards and you should do it too. You'd learn a lot from that. Here's one tip: use a card to fill the gaps in your model knowledge, rather than wasting tokens to reassess what it already knows or can infer (about your char, world, etc). But that takes time.
>>
>>102361391
Illegal in Saudi Arabia.
>>
Thinking before speaking? I have a better idea actually. Thinking while speaking
>>
>>102363403
That bad at coding.
I thought they improved coding.
>>
>>102363360
Our aicg brethren were saying how the snapshots are not censored.
>>
>>102363948
See >>102361744
It's a tiny bit better than Sonnet at coding, with different benchmarks backing that up like Aider.

But it's bad at code completion test.
>>
>>102361391
same like the new cr/cr+.
you can set saftey filter to 'none'
b-but of course the "base harms" are still filtered! whatever that means. lol
>>
How far we have come.
I remember thinking 4k context was enough.
Nemo is smart enough to survive a audience bot group chat, which comments on whats going on. Cool shit.
I hit the retardation limit of 10k-12k pretty soon though. Hope in the future we get some real context.
>>
>>102363908
Let's try thinking before we write.
>>
>>102363948
It's good at coding, you're just bad at prompting
>>
>>102364019
Are you using a quant? Lately I've been thinking that for some reason quants make outputs especially bad a long contexts.
>>
File: GXUGUYpW0AAEJ6g.jpg (83 KB, 694x1150)
83 KB
83 KB JPG
some guy elsewhere tried to generate naughty stuff with the new model and got this email

that's not the interesting part obviously (everyone knows big labs are safetycucked), interesting part is they forgot to change what must have been the original model (GPT-4o with Reasoning")

So that answers the question some anons had, it's not a ground-up new model, it's just a finetune of GPT-4o
>>
>>102364049
*original model name
>>
>>102364047
You mean not KV cache quant but the model? Yeah Q5_K_M. That would be crazy.
It starts at 8k and becomes catastrophic at around 12k. So I usually cut off around 10k.
I did play around with quantization of the KV. Interestingly enough I didnt feel stuff getting worse.
>>
Lmao

>prompt: hiiii *headpats* good boy.
>o1: Hai! *blushes and giggles* Thank you! How can I help you today?

Token cost? 620 reasoning tokens.
>>
>>102364180
I wanna read those tokens
>User seems to be some sort of degenerate who wants me to act cute. I should play along.
>>
>>102364210
COT like that would be so fun in some RP scenarios. Do we have something like this implemented? Langchain perhaps?
>>
File: 1674363993909273.jpg (224 KB, 680x632)
224 KB
224 KB JPG
Pixtral status?
>>
>>102358372
>where the devs face financial difficulties
literally me. I had to get a fucking job. Now no one gets my shitty software
>>
File: 1675289244936815.png (312 KB, 564x553)
312 KB
312 KB PNG
>>102358425
>>102358472
>>102358483
>>102358532
niggers
>>
>>102364310
no u
>>
>>102364259
Nobody cares.
>>
File: undiagnosed.png (89 KB, 471x292)
89 KB
89 KB PNG
>Nous Research employs this person
So it was over for open source from the start...
>>
>>102363136
You're retarded because 70B exists.
>>
>>102364469
There he is. I asked him to show prompt that demonstrates the degradation and he told me I'm retarded.
>>
>>102364442
>org released a model that just plainly sucks
>aginigger theorizes some mystical shit about the model having emotional dysregulation instead of it just fucking sucking
How do people get like this?
>>
>>102364479
I don't need to prove that a 2bit quant is retarded.
>>
>>102364514
2.75.

You don't have to prove anything, but I observed otherwise.
>>
>>102364510
autism
>>
does anyone have the link to mixtral large gguf?
>>
>>102364524
Because you're retarded.
>>
>>102364510
the fuck you talking about, Hermes 405B is great
>>
>>102364514
>>102364524
>>102364479
not sure where the hate for 2bit comes from. last time I heard from the smartfags they were saying that negative bit quants might be a thing by 2024.
For me <midnight miku 70B q5 vs midnight colossus 103B q2> colossus wins hands down at the same TPS
>>
>>102364527
yeah
>>
File: 1000092725.jpg (203 KB, 1080x1328)
203 KB
203 KB JPG
The sheer arrogance of the comment kek. What paper? It's nothing unique.

> 1st Model is a finetune and merge over another finetune he didn't give credit for

> 2nd Model is literally just a continued pretrain to extend context and get better prose, on closed data
>>
>>102364540
15 rupees has been added to your account saar
>>
>>102364540
so
>Hermes 405B
>mixtral large
>nemo 340B
which should I use and where are the gguff links for kobold?
>>
>>102364571
Go back to Discord, Sao.
>>
>>102364540
Base 405b with fewshot is honestly better if we're talking about the big one. But Hermes 405b is not too bad, just underperformant relative to the size.
The Hermes 8b however is shit. I think the big parameter count helps mask the influence of bad datasets.
>>
>>102364613
oh man I totally forgot the smaller sizes existed, derp
if you were talking about that that makes sense yeah
>>
>>102364233
I only know stuff like pic related (tip).
Where you prompt the LLM to give a OOC message.
CoT could be funny yeah.
>>
File: berry.png (53 KB, 1244x962)
53 KB
53 KB PNG
>write a story with gemmasutra 9B until it goes full braindead and collapses into itself
>switch to ataraxy because I really want to know how everything ended, it writes a satisfying ending
>ask it about gemmasutra's mess for lulz
>it suddenly passes the strawberry test
Is this AGI?
>>
>>102358904
>>102358934
You can run it locally speech.fish.audio
>>
why the FUCK does sillytavern default worldinfo entries triggering at the beginning of context and why the FUCK does triggering them again cause the entire context to get processed again
t. cputard
>>
>>102364922
>>102364922
>>102364922
>>
File: 1699295076190658.jpg (135 KB, 612x611)
135 KB
135 KB JPG
>>102364631
All this tech and the only thing you want is some smut beastially lolicon RP. And it's not even good. Grim timeline.
>>
>>102364855
I think sillytavern lets the author choose where to place them and most cardfags don't care because they are on corpo proxies.

Same with multi-user chat (unless you redo the entire settings to make it work). Full context processing every message.
>>
>>102363403
lol, Claude 3.5 is still the best at the most relevant thing on an assistant LLM, coding
>>
>>102364049
>it's just a finetune of GPT-4o
Yeah we've been saying that since yesterday.
>>
>>102362788
how do i run this with my 8gb of vram
i asked chatgpt but it just responded with laughter
>>
>>102359806

I loged in, how to use it?



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.