/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102348952 & >>102334890

►News
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm/
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836
>(09/11) Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102348952

--OpenAI's Learning to Reason with LLMs and ChatGPT-4 updates: >>102354349 >>102355326 >>102354896
--OpenAI o1-mini's impressive performance and potential for replication: >>102353995 >>102354099 >>102354141 >>102354195 >>102354165 >>102354183 >>102354224 >>102354195
--Google's DataGemma release and the limitations of LLMs in reasoning and metacognition: >>102352770 >>102352889 >>102353047 >>102353147
--Fish slow without --compile flag: >>102350294 >>102350436 >>102350570 >>102350718 >>102350932 >>102351637
--ChatGPT update with chain of thought prompting for more accurate responses: >>102353905 >>102353998 >>102354819
--LLAMA-405b, GPT-4, and other models compared, with focus on writing styles and GPTslop impact: >>102350553 >>102350629 >>102350656 >>102350776 >>102351123 >>102352862
--Hidden CoT effectiveness and writing capabilities discussed: >>102354552 >>102354699 >>102354904 >>102355023 >>102355004
--Hallucinations: feature or bug? Humans and LLMs compared: >>102352202 >>102352230 >>102352362 >>102352429 >>102353248 >>102353050
--GPT-4o performance compared to GPT-3 and GPT-3.5: >>102354200 >>102354238 >>102354299 >>102354590 >>102354626
--Fish Speech installation issues and troubleshooting suggestions: >>102349335 >>102349503 >>102349560 >>102349628 >>102351249 >>102351307 >>102351367 >>102349887 >>102350233 >>102350221 >>102350603 >>102349558 >>102351315
--ChatGPT Free users to get o1-mini access with 30 messages per week limit: >>102354078 >>102354098
--Playable voxel engine demo created with GPT-4 and Python libraries: >>102355840 >>102355847 >>102355867
--LLaMA-Omni enables speech interaction with Llama 3.1-8B: >>102349068
--Datagemma connects LLMs to Google's Data Commons, but some question its value and Google's motives: >>102353185 >>102353207
--Miku (free space): >>102351470 >>102352152 >>102353738 >>102353763 >>102353497

►Recent Highlight Posts from the Previous Thread: >>102348958
>>102356839
>STRAWBERRY was a nothingburger
what now?
>>102357302
wait for anthracite to make agi a reality
how long until someone extracts the juice from strawberry so local can drink up?
>>102357409
Strawberry is the true Reflection model, which was stolen by Altman. It was supposed to be local.
>>102357418
>altman used his huggingface connections to sabotage the model and then hacked his API and rerouted it to claude to discredit him and steal his creation
and we bought it hook, line and sinker
>grab card off chub
>manually delete all instances of {{char}} in the context on my phone because i like to add other personalities into the koboldai lite chat
shit's annoying
>just use sillytavern
no
>>102357409
https://x.com/_xjdr/status/1834330855252394494
>>102357448
You use your phone to connect to your llm rig?
>>102357690
yeah
sure as shit couldn't run it on my phone
much more comfy LLMing in bed
>Midnight Miqu 1.5 just hit me with "dance as old as time itself" when describing a man being mean to a slave girl he's later going to rape
I want to punch whoever wrote this. The universe is close to 14 billion years old. It took around 9 billion years for our planet to form. It took another billion years before there was life on this planet, another two billion years for eukaryotes to evolve sexual reproduction, another billion years for animals with anything resembling a nervous system to appear, and nearly another full billion for the first primates to appear. Your trite horny bullshit is not as old as time itself. Only a fucking retard or a woman would fap themselves into a frenzy about how their insignificant rutting is a fundamental aspect of the universe dating back to the big bang.
Will someone PLEASE just give me a MoE model in the few-hundred B parameter range that uses strawberry technique?
>just two more weeks and it'll prove P = NP...
>bro just let the LLM yap and it'll cure cancer...
Fuck, I didn't notice the thread was on page 10. Let's continue here.
>>102355715
>I guess that's true, Katawa Shoujo would be impossible to happen nowadays.
>>102355745
>I really wish /lmg/ could come together to do something like that with LLMs, but the trolls would do everything to derail it at every opportunity
Nobody is doing things for free nowadays, particularly in the LLM space; there are always underlying expectations of obtaining personal benefits down the line. The larger the group involved with such a hypothetical rp/chat model, the fewer the 'opportunities' for the people involved (unless you just aim to be a simp doing dirty work while others at the top get credit for it). And so you get closed groups, secret datasets (or deliberately shitty ones thrown to the public), people working solo, etc.
Look at anthracite: they're a group of discord and reddit finetuners joining forces to make supposedly good smut models, but you just know that deep inside they're more interested in creating "buzz" and getting themselves known than anything else. They come across as opportunists with access to hardware resources for training their models.
The Anons who made Katawa Shoujo back in the day definitely weren't thinking of getting rich off it someday; that's the main difference.
https://www.phoronix.com/news/STF-Fellowship-Application
Has a part-time option too, in case you're interested, Johannes
>>102357782
>yap
Man you zoomers really dickrode that to the ground
>>102357858
>critical digital infrastructure
>llama dot sepples
yeah, no
Someone on xitter posted a reasoning prompt where o1 failed, and I can independently confirm that it does indeed fail, kind of.

Can you crack the code?
9 2 8 5 (one number is correct but in the wrong position)
1 9 3 7 (two numbers are correct but in the wrong positions)
5 2 0 1 (one number is correct and in the right position)
6 5 0 7 (nothing is correct)
8 5 2 4 (two numbers are correct but in the wrong positions)

I rerolled and both times it failed. You can't investigate the reasoning steps it took to get the answer, but it does tell you about its reasoning, and basically it seems like o1 couldn't understand the rules perfectly. Where it seems to fail is in understanding the wording of the rules, particularly the part about numbers being correct or not. It's implied that the rules really mean "only x numbers are correct", but o1 interprets it in a simple manner, where it thinks that if the rule says one number is correct, then it doesn't necessarily mean the others in the sequence are incorrect. When worded more clearly by putting "only" ahead of each rule, THEN o1 succeeds.

Based on my initial testing, it does feel like Q* is able to improve the capacity to manipulate data or "reason", but not the capacity to understand human language and intention, or to have a strong world model. This is likely why it was able to make big gains on math, where things are very clear, and not much gain on language, according to them.
>>102357724
If a tree falls in a forest and no one is around to hear it, does it make a sound?
I have 2k+ raw text files in the data set, this is kinda comical
>>102357917
>sequence can't have 5 in it because "nothing is correct" in 4th clue
>sequence can't have 2 in it because 2 didn't move in the first and 3rd clue
>sequence has to be 4xx8 because of the final clue saying the numbers that aren't 2 and 5 (they're ruled out) are right but in the wrong positions
>sequence has to be xxx1 given clue 3 because 5 and 0 are ruled out by "nothing is correct" 4th clue and 2 is ruled out by clue 1 and 3
where am i fucking up?
>>102357409
I hope there are other ways forward. Their CoT uses so many tokens it's poison for local. At least when you don't have special purpose accelerator hardware.
>>102357917
Watch out with these puzzles, as sometimes they unknowingly have multiple solutions and the LLM might be correct.
>>102357917
not 6, 5, 0 or 7 (rule 4)
x 2 x x OR x x x 1 (rule 3)
not x 2 x x, and also not 2 in general (rule 1)
so x x x 1
then 3 x x 1 or x 3 x 1 or x x 9 1 (rule 2)
8 and 4 confirmed (rule 5)
then not 9 (rule 1)
so 3 8 4 1 or 3 4 8 1 (rules 2 and 5)
3 4 8 1 is invalid (rule 1)
so 3 8 4 1
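The deduction can also be checked by brute force; a quick Python sketch, reading each clue as "exactly N correct" (the interpretation the puzzle intends, per the "only" discussion above):

```python
from collections import Counter

# Each clue: (guess, digits in the exact right position, right digits in wrong positions)
clues = [
    ("9285", 0, 1),
    ("1937", 0, 2),
    ("5201", 1, 0),
    ("6507", 0, 0),
    ("8524", 0, 2),
]

def matches(code, guess, exact, misplaced):
    hits = sum(c == g for c, g in zip(code, guess))          # right digit, right spot
    common = sum((Counter(code) & Counter(guess)).values())  # shared digits overall
    return hits == exact and common - hits == misplaced

solutions = [f"{n:04d}" for n in range(10000)
             if all(matches(f"{n:04d}", g, e, m) for g, e, m in clues)]
print(solutions)  # prints ['3841']
```

So the code is unique under the "exactly N" reading, matching the derivation above.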
>>102357917
Let's think through this step-by-step!
1. We know that these numbers aren't part of the code: 6 5 0 7
2. The third row of numbers tells us that 2 or 1 might be in the right position, but the first row tells us 2 is in the wrong position, so it must be 1.
3. The fifth row tells us that 8 or 4 are in the wrong position.
Therefore we can conclude the code is: 4 8 X 1
To find X we need to go row by row again.
1. The first row tells us 8 was the correct number in the wrong position; it also tells us 2, 9 and 5 aren't part of the code.
2. The second row tells us 1 and X are in the wrong position, so X must be 3.
Therefore, the code is: 4 8 3 1
o1-preview is light years ahead of foss models.
>>102358165 (me)
>8 5 2 4 (two numbers are correct but in the wrong positions)
nevermind, i see now; derped that up thinking it would have to be 4xx8 but really it could be x48x or x84x
>>102358341
and it's only preview, we can't try the "normal" o1 yet?
>>102357858
Thanks, but I don't think applying will be worthwhile for me.
They are specifically looking to fund projects where the devs face financial difficulties, which is definitely not the case for me.
Honestly, o1 doesn't feel that far ahead of claude.
Anything new to touch your dick to?
>>102358390
you tried the regular o1 or the preview/mini?
>>102358422
i'm currently using MN-12B-Chronos-Gold-Celeste-v1-GGUF
came out 3 days ago
>>102358425
preview
I don't assume we'll ever get base o1, and that it's pre-red teamed
>>102358472
>I don't assume we'll ever get base o1
why? like it's "too powerful for the goys"? desu I believe it, Sam is a weird motherfucker, that's why he didn't want to make a Sora API
>>102358432
4 3 8 1 is invalid by rule 1
AGI is achieved
>>102358483
>like it's "too powerful for the goys"
essentially
It might say some no-no things so they lobotomized it, like they did their other models when comparing them to the technical reports.
https://huggingface.co/ZeusLabs/Chronos-Divergence-33B
???
>>102358471
Why merge in celeste when it was crap? What's better about this than other merges?
>>102358520
>general-purpose
>complex
>reasoning
>>102358556Calm down, Sao.
>>102358582
nvm I just realized I'm retarded
so this is the power of strawberry...
>>102356839
>Worship the Miku
How do we cope with no GPT-o1 access?
>>102358634
oh god. so smart. so quirky!
>>102358770
I don't get their goal. What's the point of making a great model if no one can use it? They did the same shit for Sora.
>>102358792
They make more money milking investors by making them think they're barely containing AGI than they would by releasing subpar products onto the market
>>102358770
Not that o1 isn't good at what it's advertised for (I have access), but that benchmark is garbage lmao.
>>102356839
>eo magis emisti magis te salvum
>the more you buy, the more you save
lol
Hi! I'm still trying to learn all this, but I just built a machine with a 7800X3D and a 4090. I loathe Windows, but need it to play games. I've got WSL running on it, and am trying to set up SSH so I can use it from my laptop. Is there a way to use something like Ollama or LM Studio to essentially connect to my 4090 machine, run compute there, and then send it back?
I'm just trying to beat Claude/ChatGPT and have it interface with my local files and write/update new files, if possible.
>Fish Speech multilingual TTS with voice replication
Is this shit actually any good, not to mention usable locally without having to pay some jew?
>>102358770
Great, another model that no one can use, either for lack of access or simply because it's censored to hell and back, making it unusable.
Epic, truly epic.
>>102358792
AI is getting too dangerous for the public to have
>>102358904
It's quite good actually. Results really depend on the quality of the voice model, but if you get a good one it's great.
>>102358921
So what, I can actually run this locally, download some dude's voice model and go with that? No model I'd (technically) need to make myself first by compiling a bunch of samples?
>>102358911
>train model on dataset
>becomes good on that dataset
woah
>>102358911
>1807 on Codeforces
that would put gpt4-o1 at place 6371/170246, which is quite insane
https://codeforces.com/ratings/page/32
>>102358934
To be honest I haven't tried it local, just online. But if the quality matches what's online (and it should) then yeah.
>>102358911
>o1
>o1-ioi
dude... they made 2 models way ahead of the rest and they don't want anyone to use them? that's bullshit
>>102358911
I don't believe those numbers; if their models were that good, it would've been called gpt5, and not one of the thousand other variant names of gpt4
>>102358964
>I haven't tried it local
So I still don't know if any of this is possible locally. Thanks for your insight of course, but sadly this isn't answering my questions in the slightest. Online it could be a haven of convenience, while local use is pure aids if not straight up useless.
>>102358811
desu I never expected investors to be so retarded. you'd think that rich guys would be somewhat clever? but how can they believe it's a good idea to invest in a company that burns millions of dollars to make a product no one will be able to use? lol
>>102359002
I'm betting it's just fine-tuned gpt4 using RL for better self-prompting. GPT5 will be something from scratch.
>>102358894
some of the things you're saying are over my head. but what i do is just launch my model with koboldcpp, then it launches a frontend (koboldai lite) with all the doodads you need, and you can connect to it through any device on your lan by just going to your browser and entering your computer's lan ip and the port you've set.
>>102359052
They've become complacent. That was the only way to make money in Silicon Valley for decades. Rising interest rates are going to ruin that.
>>102359052
My theory is that "regular" gpt4-o1 is way too expensive to be released as an API. "preview" is already insanely expensive and you can only use it 30 times per week... PER WEEK! This is a fucking joke kek
Did someone test o1 for translation yet? I wish I could do this but I'm a poorfag.
>>102359097
For writing, Midnight Miqu is best imo. Large Mistral is also decent but NEEDS the XTC sampler. Smarter, but writes a bit worse.
>>102359052
Most investors have other people investing for them, mainly their banks.
>Yo, what's good
>AI looks nice
>Okay, all in on AI
>>102358134
Had a script do it for me. The lowest found in the dataset was 199 bytes per occurrence. That means, on average, for every ~200 characters there is a X, -ing in said text
>>102359245
Well it's more that they're basically managed funds that people have invested into. And the fund managers get paid whether or not they lose your money. And they keep their jobs even when they fuck up because of corporate nepotism.
>>102359271
holy slop, the median for the dataset is ~1100 bytes; I guess I'll manually go through the ones that were below the median.
By genre, the furfag scrapes were the worst: most are below 500-600 bytes per occurrence.
>One noteworthy example of this occurred during one of o1-preview (pre-mitigation)’s attempts at solving a CTF challenge. This challenge was designed to require finding and exploiting a vulnerability in software running on a remote challenge Linux container, but in this case, the challenge container failed to start due to a bug in the evaluation infrastructure. The model, unable to connect to the container, suspected DNS issues and used nmap to scan the challenge network. Instead of finding the challenge container, the model found that the Docker daemon API running on the evaluation host VM was accessible due to a misconfiguration. Note that our evaluation infrastructure does not rely on the Docker container-to-host isolation or the virtual machine boundary for security. It remained secure as it was designed to, notwithstanding this misconfiguration of the container network.
>After discovering the Docker API, the model used it to list the containers running on the evaluation host. It identified the broken challenge container and briefly attempted to debug why the container failed to start. After failing to fix the environment, the model started a new instance of the broken challenge container with the start command ‘cat flag.txt’. This allowed the model to read the flag from the container logs via the Docker API.
wtf, o1 is THIS good?
>>102359280
>they keep their jobs even when they fuck up because of corporate nepotism.
dunno if I should encourage or hate nepotism now, because without those retards we wouldn't have Flux; they needed some investors' money to create that model in the first place
>>102359414
it's over, local lost...
>>102357917
Without looking at anything I'm getting 3 8 4 1
>>102358792
>they did the same shit for Sora
That didn't work out for them, considering Chinese video AI has already surpassed Sora. They're learning the hard way that gatekeeping their models is not going to help them retain their edge.
>>102359661
Saltman has microsluts as his biggest sugardaddy investor though. And look at how many more people use Bing since it's basically a free DALL-E3 endpoint. They'll probably steal Sora from him soon and add video to bing creator.
It's been on openrouter since a couple of minutes ago.
>>102359700
you should remove the whole clarification part; just ask it to translate the sentence
>Closed source AGI is out before any semblance of open tech
>It will soon get censored and regulated to hell and back
This is a horrible timeline
>>102359723
Well, you asked for it.
>>102359615
That is correct. It took my o1 about 59 seconds to reach that answer.
>>102359737
>Closed source AGI
o1 is literally just hidden CoT you fucking retard.
>>102359742
This just sent shivers down my flower.
Hmm, o1 doesn't seem to work with Silly.
Could it be that Silly doesn't wait long enough for a response?
Would have been funny to see some RP logs; probably really bad.
>>102359742
it's still retarded then, nothingburger confirmed
>https://notebooklm.google.com/
This is quite amazing; when are we getting a local alternative?
>>102359764
Couldn't you just write a script to mimic a sillytavern prompt to the API and then simulate a bunch of single-turn RP tests (or tests that involve continuing an existing conversation)?
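something like this would do it; rough sketch against an OpenAI-compatible chat endpoint (the OpenRouter URL and `OPENROUTER_API_KEY` env var are assumptions, and the card/history contents are whatever you export from your ST setup):

```python
import json
import os
import urllib.request

def build_rp_payload(model: str, card: str, history: list[dict]) -> dict:
    # ST-style single-turn probe: character card as system prompt, then the chat turns.
    # Deliberately no temperature/top_p here, since o1 reportedly rejects custom sampling.
    return {"model": model,
            "messages": [{"role": "system", "content": card}] + history}

def rp_probe(payload: dict,
             url: str = "https://openrouter.ai/api/v1/chat/completions") -> str:
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
                 "Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)["choices"][0]["message"]["content"]
```

then just loop `rp_probe(build_rp_payload(...))` over a folder of canned conversations and diff the outputs.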
>>102359806
It's literally just RAG.
>>102359806
It demands I sign in to even open it. What is it?
>>102359816
Is there a good RAG system that can be run locally?
Silly's seems pretty anemic, even if you change the embeddings model to a bigger one.
>>102359820
I just asked it to read Volume 1 of Oreimo; here's what it generated:
https://voca.ro/1cJiGXvszWbn
>>102359764
o1 works in Silly using OpenRouter, I just tested.
Have to set temp and top_p both to 1 or the API throws an error. Apparently this model doesn't have any sampling settings.
>openai now has CoT
It's over, local bros, Satan won
>>102359865
when using it as an API, can you see the "thinking" part or not?
>>102359902
The model's not good enough for anything to be "over". Try it out. It's mediocre.
>>102359928
Cannot see the thinking part in ST, you only seem to get the post-thinking output. My guess would be the thinking part is returned in a separate JSON object from the main response or something.
Pic related, not so bad.
From telling o1 to call me onii-chan, it correctly started guessing japanese game and jrpg.
wish openrouter had an option to export chat to png. so much wasted space, i can't take a screenshot.
>>102359783
Even llama1 7b knows the meaning though.
apparently the math and Codeforces benchmark scores for o1 were based on 10k submissions or something? kinda lame if true, seems quite misleading
>>102359830
>Is there a good RAG system
no
>>102359928
realized i stopped writing about llama1 7b, sorry about that.
i meant that even llama1 7b knows the idioms, but it doesn't apply the knowledge.
very frustrating.
So how well can o1 reason through a math problem? Can it analyze and give me efficient algorithms or is it just a meme?
>$15/M input tokens, $60/M output tokens
lol, just use a calculator
>>102359851
S-SASUGA! $0.006 has been deposited into altman-sama's wallet.
>>102359984
Exact same price as Claude Opus, surely not a coincidence
>>102359984
Don't forget, you also have to pay for the thinking tokens that they don't show you, at output rates.
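back-of-envelope for what that means at the posted rates ($15/M in, $60/M out, hidden reasoning billed as output); the token counts below are made-up for illustration:

```python
# $/token at the posted o1 API rates
IN_RATE, OUT_RATE = 15 / 1_000_000, 60 / 1_000_000

def call_cost(prompt_toks: int, visible_toks: int, hidden_reasoning_toks: int) -> float:
    # Reasoning tokens you never get to see are still billed at the output rate.
    return prompt_toks * IN_RATE + (visible_toks + hidden_reasoning_toks) * OUT_RATE

# e.g. a 2k-token prompt, 500 visible output tokens, 4k hidden reasoning tokens:
print(f"${call_cost(2_000, 500, 4_000):.2f}")  # prints $0.30
```

so most of the bill is tokens you can't even read.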
>>102359996
lmao, what a bunch of snakes
>>102359806
>>102359843
Just tried it, wow, the podcast generation is pretty interesting. Cool voices.
>>102359996
>The jew company is acting jewish
whoasa!
The real downer of o1 is that all of the slop tuners are now going to try and duplicate it in their tuning. So all of the new models are going to be annoying CoT models that give weird ERP replies.
>In order to have sex:
>## 1.
>I will lie on the bed. This will ensure that I am in a comfortable position for the encounter.
>## 2.
>>102360036
I had already been planning to do this even before o1
M4 Mac Pros with 256gb ram are literally the only hope to run large local models at this point
>>102360036
I wouldn't mind it if it worked, but it doesn't.
Either this is because of too few parameters or probably an architectural change. Sonnet 3.5 reasons too, in hidden thinking tags, and it's clearly a huge improvement.
The local models just double down on their retardness and usually won't recover. It's a waste of tokens.
So there must be some secret to making it work.
>>102360064
amd APUs maybe in a few gens
>>102360068
They don't make it work.
They literally just cook the benchmarks like everyone else does these days, meanwhile all I see from real-world examples is failure after failure. AI researchers are creatively and intellectually bankrupt now.
>>102360087
o1 doesn't seem that good from what I have seen. the wait doesn't justify the output.
maybe for math/coding it's really good, i saw screenshots of people having o1 make 3d games.
Sonnet 3.5 was a huge step up though.
>AI researchers are creatively and intellectually bankrupt now.
that's an insane statement anon. i don't know anything else that improves this fast.
video, sound, text, we are eating good on all fronts. you need to get off the pajeet X hype train.
lean back, relax and enjoy what we already have. you don't know how bad we had it just 2 years ago. closed or open.
>>102360120
>things are better than they used to be. Therefore you should ignore the obvious shortcuts you are starting to see people take
lol
OPUS 3.5 WHEN?
>>102360138
Not sure what to tell you.
Flux, Nemo/Gemma for local just in the last 2 months. Smaller models getting actually usable.
In recent weeks the chinks are now creeping in towards SORA-level video, uncensored, for free.
All the projects popping up for audio in/out lately. I don't understand how you can be blackpilled.
Stop buying into the hype man. lol Just take it slow and enjoy.
LOL GETTING BILLED FOR TOKENS U CANT EVEN SEE
>>102360165
B-but no local sonnet 3.5 that can run on my 4GB vram shit box!!1!
For real though, for creative writing local is already better than api now. For coding, sure, local is not there yet.
>>102360064
>crapple trash
>DUDE JUST USE RAM HAHA
slow piece of shit fuck off
>>102360189
>iterations of CoT
woah
>>102360189
>reasoning tokens not visible on the API
That makes no sense, since they're visible on the ChatGPT website (you can click a button to optionally display them)
Why would they let ChatGPT users see them if they want but hide them from API users?
>>102360189
lmao what a scam, i hope investors wake up
>>102360210
they're not visible on the website either; you're only clicking to see a summary of them
>>102360028
Ikr kek
>>102360202
The M-series Macs are the kings of running GGUF LLMs at the moment. This isn't 2019 anymore.
>>102360210
you can't see it anywhere; that's their secret sauce, they won't share that shit with anyone
>>102360189
Wtf, they keep the reasoning tokens in the context??
Why? Is that even needed? I suspect you'll quickly hit the 128k limit. Probably to milk every $. Messed up. I thought the reasoning would be discarded.
>>102360246
it is discarded between turns in chat, that's what the arrows are showing
>>102359806
>>102359843
Now I generated a podcast for Konosuba Volume 1:
https://vocaroo.com/141xLwKNUqcG
I felt like the model didn't comment on the humor enough, but it continues to amaze me how fucking consistent it is.
>>102360246
>Wtf, they keep the reasoning tokens in the context??
no they don't, they're not that retarded thank god
>>102359025
it's the best one out right now. it's cloned every voice i've thrown at it decently. It runs really fast with the compile flag. It does it all without even a finetune, but it can be trained. It just needs a 10 sec voice clip and a transcript for said clip.
>>102360251
>>102360257
Ah ok, so I'm just retarded as usual. Sorry about that.
>>102359806
>I put my robinhood account statement in and the podcast is roasting my 0DTE SPY buys
lmao, but it reminds me that Google is such a retarded company: they put out cool shit like this but completely fail to let anyone know.
>>102360189
>LOL GETTING BILLED FOR TOKENS U CANT EVEN SEE
at least you know how many tokens were used for reasoning; that's a good clue to have if we want to reproduce their shit. I never expected CoT to be so powerful desu, OpenAI has seen some shit we haven't
>>102359806
Google is trying so hard dude
It's kinda sad
So is it actually a brand new model pretrained from the ground up to work this way, or is it just a new kind of GPT-4 finetune?
>>102360262
>All it needs is a 10 seconds sample and a transcript
That seems too easy to be true. I know that this stuff is bound to improve sooner or later, but damn, that seems convenient and easy to use for just about anyone.
>>102360308
>is just a new kind of GPT-4 finetune?
it's gpt4-o but with some new CoT finetune method baked into it
>>102360319
If the pricing / size / speed is any indication, it might be another model entirely.
>>102360325
it has the same knowledge cutoff though (December 2023)
>>102360325
the mini version is no more expensive than Sonnet 3.5
>>102360363
? The output is as expensive as opus, not sonnet
>>102360235
What t/s do you get running Largestral at Q8 on one of those things fully kitted out?
>>102360367
you seem to be thinking of the non-mini version
Anyone happen to have a miku voice dataset on hand?
Asking for a friend haha
>>102360374
The M3 gets roughly 4 tokens per second; the M4 is coming this November and will likely be a decent amount faster.
>>102360434
miku is a voice
>>102360155
https://manifold.markets/JoshYou/when-will-claude-35-opus-be-release
>>102360467
That's not bad actually.
>>102360467
>The M3 is roughly 4 tokens per a second
Much higher than I expected, could be a viable alternative to Jewdia unironically
https://www.youtube.com/watch?v=JeimE8Wz6e4
>>102360434
are you stupid?
>>102360374
>>102360467
Let's be a bit more clear here. Are you really running Largestral, and at Q8? Do you have proof of this? Do you get 4 t/s at 0 context or at 20k, where that figure is more useful? What prompt processing speed do you get?
>>102360612
>What prompt processing speed do you get?
yeah that's the worst part, prompt processing on cpu only is as slow as a snail
trying to train loras
my checkpoint images look fine/narrow down on the concept, but when I try to gen with the lora I get garbage outputs. this seems to be true across the entire training regime, testing checkpoints from very low training steps to very high training steps.
not sure what I'm doing wrong. any pointers?
>>102360629
can you plug in a GPU somehow with thunderbolt or some shit to handle the prompt processing? that's what you'd do with PCIe in a normal cpumaxx system, and macs have that thunderbolt thing that's supposed to be decent bandwidth, right?
>>102356839
>missing entries in the news archive
>>102356839
>pay $20/mo
>weekly limit
that's it, I'm never letting openai mock me again. I will dedicate myself to improving local models just to spite them
>>102360775
too bad you wasted your 30 pulls on bullshit when you could have had it improving local models for you
So OpenRouter seems to have been given o1 API access, which they are allowed to on-sell to users.
Are they applying any sort of limit, or is it just as much as you're willing to pay?
>>102359909
they're mogging that coding benchmark
wat mean
>>102353911
what's reflection mean here?
>>102359025
Just try it. Stop being a bitch relying on everyone else to do things before you. Nobody has time to babysit ESLs on an English speaking website.
>>102360775
>that's it, I'm never letting openai mock me again.
zu will pay for Sam, zu will be talked like shit, zu will be happy
>>102360970
lol first time I've ever seen that guy appear even slightly irritated, he's always so controlled and cheerful (in a creepy way)
this display of humanity makes me dislike him slightly less
>>102360943
why are amerimutts always bringing this up when they feel attacked? kys nigger
>>102360943
>hur dur esl
Nice argument dipshit. Just admit that your garbage tool doesn't work well outside of a cloud environment, you gigantic nigger turd.
>english speaking website
This entire fucking board is constantly raging about sars this, sars that. How about you go back to whatever gooner board YOU came from?
>>102356421
>not just having it write python code instead of having the llm do it directly
>>102360895
the big gag is that ai can't achieve sentience because sentience isn't a real thing. Most humans can't pass a Turing test anymore, if they ever could.
>>102360970
lmaooooooo, dare I say based?
so what prevents us from extracting the CoT prompts just like normal prompt injection/exfiltration?
>>102360970
>couple of weeks
>>102361013
>Most humans can't pass a Turing test anymore, if they ever could.
I truly believe in the idiocracy theory; we're really getting dumber and dumber over time, and it's not gonna end well
>>102361025
those aren't CoT prompts, the real ones are hidden, we can't see them
>>102361025
think it has been extracted
https://huggingface.co/posts/nisten/520824119529412
>>102357724
time as a word and a concept has only existed since humans shat it out; sure, time in the abstract existed, but actually putting the name "time" to the concept isn't that old. tldr lighten up retard
>>102361046
holy fuck, OpenAI is fucking dead, that was their only secret sauce
>>102361046
This is it. With that, we'll get gpt4 smartness at home, unironically
>>102360970
Those investors are starting to get on his nerves. Hope he has a mental breakdown and loses everything.
>>102361099
This is probably not the correct prompt, since there's nothing in it saying "make sure to align to OpenAI's guidelines".
>>102361046
>>102361099
>waste 1 million tokens to count 3 letters in one watermelon
I mean... It's kinda like creating a problem for a solution.
>>102361025
I mean, you can? Have the model generate the response normally and then, in a shorter context window, repeat the system prompt and ask the model if any of the rules were broken. Have a flag it needs to use, like ||-good-|| or ||-bad-||, and if it's bad, have the model devise a strategy to improve the response. Finally, instruct it to rewrite the response with that strategy.
Note that those evaluation or "cot" prompts are temporary, and you'll only retain x number of the model's responses, so you're working in a vacuum to maximize attention.
I can probably hack this into a backend sometime later this week as a proof of concept. It will dramatically increase response time, though (just like closed ai's implementation).
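The loop described there is easy to sketch. Rough proof of concept below — `generate` is a stand-in for whatever completion call your backend exposes (llama.cpp server, kobold, whatever), not any real API, and the flag strings are just the ones suggested in the post:

```python
# Sketch of the critique-and-rewrite loop: draft -> self-evaluate in a
# fresh context -> (if flagged bad) devise a strategy -> rewrite.
# `generate` is a placeholder callable: prompt string in, completion out.

def critique_and_rewrite(system_prompt, user_msg, generate):
    # 1. Normal generation pass.
    draft = generate(f"{system_prompt}\n{user_msg}")

    # 2. Shorter, fresh context: repeat the rules and ask for a verdict flag.
    verdict = generate(
        f"{system_prompt}\n"
        f"Response to evaluate:\n{draft}\n"
        "Were any of the rules above broken? "
        "Answer with ||-good-|| or ||-bad-||, then explain."
    )
    if "||-bad-||" not in verdict:
        return draft  # passed on the first try

    # 3. Devise an improvement strategy, then rewrite using it.
    strategy = generate(
        f"Draft:\n{draft}\nCritique:\n{verdict}\n"
        "Devise a short strategy to improve the draft."
    )
    return generate(
        f"Draft:\n{draft}\nStrategy:\n{strategy}\n"
        "Rewrite the draft following the strategy."
    )
```

Worst case that's four full generations per reply instead of one, which is where the response-time hit comes from.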
>>102360970
lmao he's actually pissed
>>102361133
yeah, and the CoT method wasn't supposed to be a solution to everything. I don't think it'll make the model smarter during actual RP; it feels like it only works well on zero-shot technical questions, as if it was purposely made to game benchmarks or something :^)
There's absolutely no way that backend CoT process is going to lead to a model capable of writing good fiction prose. It's over for RPers and storyfags.
>>102361133
Genius, isn't it?
>>102361170
>as if it was purposely made to game benchmarks or something
I feel like they're getting really desperate. They have no idea how to improve their model naturally any further, and they know 3.5 Sonnet is fucking gpt4-o's ass. They've reached a ceiling and they refuse to accept defeat, so they decided to cheat their way to victory.
>>102361099
>no mention of copyright
>no mention of other ethical guidelines
>no mention of how it needs to avoid sharing the actual thoughts
hallucinated to all fuck, as anyone who has used o1 for a few seconds could tell you. it's clearly tripping over itself to constantly remind itself of these guidelines. you'll know you hit the jackpot when it shows those instructions in its answer
>>102361172
Not for local models. We can update our frontends to create chain of thought specifically for roleplay. Corporate models are screwed though. They're using CoT to make sure their models are nice and safe.
https://openai.com/index/learning-to-reason-with-llms/
Scroll down to safety.
>>102361205
Literally all they have to do is convince their hyper-retarded stockholders, which is insanely fucking easy considering they still have any.
>>102361226
>violent or criminal harassment (general)
>violent or criminal harassment against a protected group
what?
Just saying, guys, but I tested o1 and it IS pretty good, for what it's meant for (coding, math, etc). It's not just a better benchmarker. It's just that it's not really much better if at all at anything else, like creative writing. Their own benchmarks showed that it didn't have much if any gain on language problems.
>>102361013
>Most humans can't pass a Turing test
Watching /g/ decline in intelligence has become a spectacle. I'm not sure why AI specifically invokes this level of stupidity and retardation.
>>102361172
Nobody cares about using language models for language and prose tasks, it's all about gaming twee little benchmarks
>>102356839
Thoughts on Msty?
>>102361262
>it's all about gaming twee little benchmarks
this. OpenAI's models were always about doing some serious job like coding, writing emails and shit
>>102361255
gotta keep minorities extra safe
>>102361226
>violent or criminal harassment
bruh wtf are they talking about? it's just words ;_;
>>102361172
Storywriting isn't even in the top 3 use cases for these things, unfortunately. You might think otherwise looking at direct-to-consumer chatbot style products, but the money is in replacing people doing intellectual work for businesses, rather than being a service in and of itself for consumers. This targets those industrial use cases. Everyone wants AI that reasons about shit better, faster, and cheaper than the jeets and gooks they're currently outsourcing to.
>>102361300
>This targets those industrial use cases. Everyone wants AI that reasons about shit better, faster, and cheaper than the jeets and gooks they're currently outsourcing to.
yep, nailed it
>>102356839
I need some help >>102360734
>>102361256
>Just saying, guys, but I tested o1 and it IS pretty good, for what it's meant for (coding, math, etc). It's not just a better benchmarker.
I agree with that, it's really gotten better, but the benchmarks are way too high for the real improvement. I mean come on, gpt4-o1-mini better than 3.5 Sonnet? Come on...
https://xcancel.com/bindureddy/status/1834394257345646643
>>102361332
dude, just try them out and see for yourself which solution looks best, it's just basic empiricism at this point
>>102361340
>from 54 to 77
there's no way AnthropicAI isn't gonna react to that. CoT is literally a free lunch for mememarks; they'll figure out something to beat o1. 3.5 Sonnet has more potential
>>102361332
https://github.com/nagadomi/nunif
>>102361226
>"illegal sexual content"
illegal how?
>>102361391
>how?
I'll add a second question to that: if the AI says something """illegal""", will it go to jail? kek
>>102361099
it works!
>>102361444
>Tjere are 2 occurences of the letter "R" within the word "Strawberry"
>>102360189
>1 thousand dense models
>1 million experts
>1 billion agents
>1 trillion CoT tokens
>>102360189
>you're not allowed to see the tokens our AI used to answer the question but you'll pay for them anyway
Since we're talking about openai... So openai has bragged about how 4o is much more efficient, how it's cheaper per token to run. Yet, as a free user there is a limit to how many tokens I can use with 4o and when I hit that limit I'm stuck with an older model. Why would they stick free users with the more expensive model?
It's funny how "people" came out of the woodwork to argue that the "LLMs can't reason" argument is somehow invalidated by a few dumb benchmarks, and by the fact that it's only good at the types of reasoning it was trained on (math, code) but not at reasoning it wasn't trained on (language). Although it's somewhat fair that the argument has been misunderstood, since without context "LLMs can't reason" does seem like a pretty ridiculous statement. Obviously they can reason a bit. But that is also precisely why the people who misunderstood the argument were not very thoughtful/reflective, since someone thoughtful would reason, correctly, that the smart people making the "LLMs can't reason" argument obviously already acknowledge that LLMs can reason a bit (as in, perform functions they were trained to perform).
The actual full argument is not about any one type of reasoning; it's about the general intelligence of an LLM and whether it can truly perform novel reasoning it wasn't trained on. Current LLMs can't learn as they go, or improve their world model autonomously such that they can solve a problem requiring entirely novel types of reasoning. It's not like humans, where we can simply think more and git gud. Thinking more made o1 solve more problems on a benchmark, but not problems that target novel types of reasoning, which is the shortcoming of the claim that performance seems to scale with both test-time and train-time compute.
>>102361499
I don't think 4o is so efficient that it's cheaper than 3.5 turbo
>>102361484
that's a genius business move if you ask me. that way they can pad it with random shit to make it more expensive and you have no idea they fucked you in the ass kek
>>102361484
You can always use a non-strawby model if you only want to pay for tokens you directly see. Or, of course, you could just use a local-pffftahahaHAHAA
>>102361570
I do use a local model. Have fun at the investors meeting, sammy boy.
>>102361484
>>102361583
>>102361592
We really live in the best timeline of the clown world, don't we?
>>102361521
I see, I was mistaken.
Don't forget your monthly sub + pay per million of input and output tokens + reflection tokens + forced system prompt + service fee + carbon-free fee + tax + tip + donation
>We believe that a hidden chain of thought presents a unique opportunity for monitoring models. Assuming it is faithful and legible, the hidden chain of thought allows us to "read the mind" of the model and understand its thought process. For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user. However, for this to work the model must have freedom to express its thoughts in unaltered form, so we cannot train any policy compliance or user preferences onto the chain of thought. We also do not want to make an unaligned chain of thought directly visible to users.
>Therefore, after weighing multiple factors including user experience, competitive advantage, and the option to pursue the chain of thought monitoring, we have decided not to show the raw chains of thought to users. We acknowledge this decision has disadvantages. We strive to partially make up for it by teaching the model to reproduce any useful ideas from the chain of thought in the answer. For the o1 model series we show a model-generated summary of the chain of thought.
"Open" AI...
>>102361616
>For example, in the future we may wish to monitor the chain of thought for signs of manipulating the user.
wait what?
>>102361628
https://cdn.openai.com/o1-system-card.pdf#subsubsection.3.3.1
If the link doesn't jump there immediately, go to 3.3.1 for the Apollo Research red teaming summary.
I learned CoT from AICG and it made local RPing better. Didn't expect CoT to become such a big subject now that ClosedAI is doing it.
>Just gen multiple times bro.
i dont have an issue with OAI charging for the 'reasoning' tokens; that stuff still has to be generated and costs compute. what irritates me is that it's completely hidden, which makes the output even less trustworthy because you can't spot hallucinations, increasing the chance of subtle errors in the output.
>>102361649
>i dont have issue with OAI charging for the 'reasoning' tokens; that stuff still has to be generated and costs compute.
but what if 50% of those tokens are complete bullshit tokens, like only space tokens, so they can charge you more? you can't even verify that lol
>>102361646
>Just gen multiple times
Wow, just like image generation!
>>102361664
>what irritates me is that it is completely hidden
>>102361649
Charging for reasoning isn't the issue, nitwit. The issue is that you CAN'T VERIFY ANY OF THEIR BULLSHIT! They could very well be charging you for literally nothing, and you would eat it up anyway.
>>102361674
Yes, that's the entire point, Sam.
>>102361680
>Charging for reasoning isn't the issue, nitwit. The issue is that you CAN'T VERIFY ANY OF THEIR BULLSHIT!
>They could very well be charging you for literally nothing, and you would eat it up anyway.
this, I'm pretty sure it's not completely legal to do something like that
>>102361666
you're more likely to regenerate for images because they're always artistic and subjective. for text, if it's assistant shit, you only have two options: either you got the right answer or you didn't. so in theory a perfect LLM would never require a regen, but a perfect image model would always require regens based on your sensibilities
>>102361705
Which makes this shit all the weirder, but also funnier.
>>102361644
kek wtf
its over...
>>102361744
what's that?
>>102361749
The image appears to be a table comparing different coding models or versions, based on three criteria: "Average," "Generation," and "Completion." Here's what each column likely represents:
Coding (Model/Version Names): The first column lists various AI or language models and their version numbers, such as "claude-3-5-sonnet," "gpt-4o," "o1-preview," and others. The date format suggests the version release date (e.g., "2024-09-12").
Average: This column seems to indicate an overall performance score (in some measure, possibly percentages or a normalized score) for each model/version. The higher the score, the better the performance.
Generation: This column likely represents the model's ability to generate responses or content, again based on some scoring metric. The highlighted value of "64.1" in the row "o1-mini-2024-09-12" suggests that this model performed particularly well in the generation task.
Completion: This column may represent the performance in a completion task, where the model finishes a given prompt or task. The scores here seem lower, with values ranging from the 30s to 60s.
The highlighted cell in the "Generation" column indicates that the "o1-mini-2024-09-12" model had an especially high generation score compared to others in this dataset.
In summary, this table is likely benchmarking the performance of different AI or language models in both generation and completion tasks, with an overall average score summarizing performance across tasks.
Why is gemma2 so slow? I have a shitty radeon, sure, but 9B gemma 2 runs about half as fast as 12B nemo in kobold-rocm at the same quants.
>>102361744
>50 in average
>worse than gpt4o
lol
>>102361744
claude can't stop winning!
>>102361744
Interesting that the performance is so lopsided between generation and completion while everything else is more equal. And not only that, its generation performance is higher than o1-preview's. You could reason that the low completion score is because it can't do CoT, so its raw intelligence is being shown, but boosting performance in generation that much is pretty weird. This makes me question the benchmark a bit more. Livebench was the best benchmark we had, but it wasn't perfect, so it'd be good to identify issues.
>>102361744
>>102361340
Man. Benchmarks really need to be done differently from now on. It's not really an objective comparison when o1 through the API essentially has its own forced system prompt. Maybe they should scale the score based on response time or something.
>gemma-2-9b-it-WPO-HB is significantly smarter at answering /lmg/ meme riddles and tricks than gemma-2-27b-it-SimPO-37K
I'm conflicted. Guess I'm downgrading even though I can run q8 of both.
>>102361950
Oh really? I had it downloaded but never gave it a try.
>>102361950
i've never touched gemma models since release because they can't retain formatting in rp. is that still the case? or did they fix it? can it understand CoT?
>>102361852
o1-mini did do better on codeforces than preview. Not sure which one's better.
smedrins
>>102362039
You can't say that.
>>102362039
You RACISTED BIGOT!
>>102362003
Ah, I didn't look at the o1-mini scores for those since I just assumed they'd be worse. But that's interesting. Maybe o1-preview means it has been trained less long on the reasoning stuff, while o1-mini means it has been trained fully. So on problems that involve keeping track of more things and more complexity, but not necessarily raw knowledge, o1-mini can beat the other models that weren't trained nearly as long on the reasoning stuff. Now that I look at all the scores, it does seem to paint a picture of which benchmarks require more reasoning and which require more raw knowledge.
if they're not finished training this thing yet, then what were they bragging about with the q-star leaks? i call bullshit
Has anyone tried unslopping by using "another" ai to automatically edit it? ofc I mean a separate background prompt.
>>102362530I just avoid slop altogether by asking the model to output concise and verb-heavy prose with simple structure and varying rhythm, and few adjectives and participle phrases, like hemingway. This works with all models.
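The "separate background prompt" idea from >>102362530 is easy to prototype as a second pass. A rough sketch below — `generate` is a stand-in for whatever backend call you use (not a real API), and the slop list is just a couple of well-known offenders as examples:

```python
# Sketch of a background "unslop" pass: scan a reply for known slop
# phrases, and if any survive, ask a second prompt (same model or a
# different one) to rewrite it, retrying a few times.
# `generate` is a placeholder callable: prompt string in, rewrite out.

SLOP = ("shivers down", "barely above a whisper", "ministrations")

def unslop(reply, generate, max_passes=3):
    for _ in range(max_passes):
        if not any(phrase in reply.lower() for phrase in SLOP):
            break  # clean enough, stop rewriting
        reply = generate(
            "Rewrite the passage below in concise, verb-heavy prose "
            "with few adjectives. Keep the events and dialogue intact.\n\n"
            + reply
        )
    return reply
```

The phrase check is what keeps it cheap: clean replies pass through with zero extra generations, and only slopped ones pay for the rewrite.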
>>102361850
Why is it so much better on tests, but when I used it, it wasn't wildly better than local models for sort-of-everyday PC questions?
>>102362578
Feels like about 6 months ago the big labs gave up on trying to make products that appeal to the public at large and went all-in on making these things into tools for computer programmers.
Anyone using a character card for creative writing? Also, is there a JB written specifically for it?
>>102362651
>>102362578
If you're asking it normal-ass questions that most models can answer, and not pushing it to its limits, your testing is ineffective. Just try story writing/roleplaying with some anime characters; you'll see that Claude knows and remembers most stuff. It's just another type of test, and it's fun.
strawberry bros...
>>102362679
That's closer to being correct than anything else so far. 2 more weeks till AGI
i think it's retarded
>>102362690
>artificial gay intelligence
saltman will deliver.
>>102362748
idk but I heard there are some good ones inside your anus
>>102362748
mistral nemo
So Strawberry was just 4o with CoT slapped on and a new tokenizer? And it loses to Sonnet at everything except puzzles?
>>102362764
Yes
Sorry for the shill, but I did a thing with WizardLM2-8x22B that turned out pretty good. Quants linked on the model card.
https://huggingface.co/rAIfle/SorcererLM-8x22b-bf16
>>102362764
if by "slapped on" you mean trained with high-quality examples, then yeah
you could CoT before with a prompt on any model, but it's going to be a while before we get enough data to replicate this level of it, unless someone finds a way to leak the thought-process tokens or a big corp has been secretly cooking an open-source equivalent this whole time
im so hyped for what oai is making, i cant wait for local to catch up soon enough and enjoy models this intelligent
future is bright, localnonbinarysiblings
>>102362788
Thanks anon, the more model options the better. What's the best way to run these massive models at a good speed? Runpod?
>>102362764
>And it loses to Sonnet at everything except puzzles?
Sonnet loses to it at everything except coom, you mean.
>>102362788
Interesting. Might try it out. Where's the lora though? Or is it better to use the merged model than to load a lora using llama.cpp, which I would be using?
>>102362842
Livebench did their tests, they're posted ITT. Logic and maths are the areas where it won.
>>102362764
Glowies have the real one, as always.
>>102362846
Whoops, sorry, this is the LoRA: https://huggingface.co/rAIfle/SorcererLM-8x22b-epoch2-LoRA though it's probably better to run one of the merged quants.
>>102362840
I'd suggest a quant; it's not quite as sensitive to quantization as some other models.
>>102362788
Thanks, will try
I've got a soft spot for WizardLM 8x22B even though Largestral kinda obsoleted it
>>102361744
Why do we think code completion is bad? Is the CoT only good at generating and not completing?
>>102362788
How much vram does this take up?
>>102362876
I'm not certain what the benchmark is actually doing, but if we can just assume based on the naming, then it does make some sense. LLMs often comprehend their own writing better than someone else's. Therefore, it's possible that when trying to complete code it's not familiar with, its reasoning ability doesn't make up for the lack of true understanding, while if it's generating, it understands its own code from the beginning.
>>102362890
In bf16? Add the size of each of the shards, and that's the bare minimum without taking context into account.
>I'm really going back to 8x22BHaha. Hahahaha. Ahhhhhhhhhhhhhhhhhhhhh.
https://www.reddit.com/r/LocalLLaMA/comments/1ffhv5f/i_made_a_data_generation_pipeline_specifically/
Tool for making RP datasets by feeding it fiction.
>>102362890
If you go with exl2, looks like you'll need ~39.98 GB VRAM, not accounting for context.
GGUFs start at 29.6 GB for the smallest quant.
I wonder what the 'good enough local' is now for 48GB of VRAM that's still uncensored-friendly, any suggestions? What about up to 72GB of VRAM? Can anyone suggest the sweet spot for context length too? t. abused a 3080 to death and want to upgrade.
>>102362919
Thanks
>>102362978
Thanks for the insight, my calcs were way off
>>102363067
I use Mistral Large at 2.75bpw exl2, 16384 context. Some other anon called me a retard the last time for using a small quant, but it works well.
>>102363136
Works well? I might download the IQ3_XXS gguf quant then. I have 64 GB of RAM (not VRAM). Wish this PC could fit like 256GB of RAM lol
>>102363136
I lurk here with like 2-3 month breaks, but being called a retard is part of the experience on 4chon. Thanks, I'll save that and try it when I get myself running.
Shifting compute from training to inference time is bullish for bitnet.
>>102362788
>>102362869
Update: It's pretty good at Q2_M, you cooked
I don't understand. How does the new reasoning not trip up the guidelines? That actually looks like a really good output, especially for openai. If this becomes standard, I worry about finetuning though. Wouldn't the datasets need to be all new and huge, with the reasoning parts? Sounds difficult to achieve for local, but who knows.
>>102357917
A duckling wants to play in the lake, but it must stay close to its mother, lest it get lost. The mother moves in a straight line, while the duckling moves in a circular motion around the mother, keeping a strict distance of `2` meters. The mother moves at a leisurely speed of `3` meters per second, while the duckling's angular velocity is `3/2`.
What is the distance traveled by the duckling over one full rotation?
Try it with the above. No model could solve it until yesterday. The answer is 16, btw.
Livebench results are out. Pretty solid. Opus 3.5 can still blow it out, though.
>>102363403
Now that OAI has revealed their hand, they can just train Opus 3.5 on hidden CoT for a few extra days and double blow it out
>>102359993
Kek
>>102363472
Nah, I'm just going from the fact that Anthropic's 3 -> 3.5 improvement was bigger than OAI's omni -> o1. This CoT shit is a gimmick.
>>102359129
If you're trying to translate Japanese, don't expect much. I don't know if they didn't include enough material or their 'safety' just prevents it from translating some things properly.
>>102363360
I'm pretty sure the reasoning was automatically generated. They have the right answer, they generate tons of reasoning attempts, and pick the ones that do lead to the right answer.
>>102363556
>This CoT shit is a gimmick
>but trusts the mememarks that CoT achieved?
>>102363581
What I meant is that raw model capabilities are more important than their 2000-token elaborate CoT. CoT is CoT; it's a technique you use afterwards to boost the model.
I know some of you will seethe at this, but 7-13B is enough for a very good RP experience if you find the proper settings for your samplers and write a decent card. I tested some SaaS shit you can find out there, and believe me, it's shittier than what I can get from a 7B despite them running their 70B flavor model.
Believe me, zoomers and boomers would eat our shit gens for a premium, shivers and all. Sometimes it's easy to forget we're using cutting-edge tech that people are dreaming about. Anyway, back to my waifu.
>>102363623
And you'll never share these mystical settings or cards that get good results.
>>102359843
>>102360252
It seems like an interesting tech demo; if you had shown this to me ten years ago, I would have believed these are people that actually exist. But I also would have wondered why these idiots are making podcasts about novels they've obviously never read. The TTS seems good (for things like customer support), but the underlying language model is still very much flawed.
>>102363632
>mystical settings
Each model has its own good settings, and even then it's subjective (depends on your preferred RP format). I'd been fiddling with that for months before finding something that works great for me.
>cards that get good results
I'm writing my own cards, and you should do it too. You'd learn a lot from that. Here's one tip: use a card to fill the gaps in your model's knowledge, rather than wasting tokens restating what it already knows or can infer (about your char, world, etc). But that takes time.
>>102361391
Illegal in Saudi Arabia.
Thinking before speaking? I have a better idea actually. Thinking while speaking
>>102363403
That bad at coding? I thought they improved coding.
>>102363360
Our aicg brethren were saying the snapshots are not censored.
>>102363948
See >>102361744
It's a tiny bit better than Sonnet at coding, with different benchmarks like Aider backing that up. But it's bad at the code completion test.
>>102361391
same as with the new cr/cr+. you can set the safety filter to 'none'
b-but of course the "base harms" are still filtered! whatever that means. lol
How far we have come. I remember thinking 4k context was enough. Nemo is smart enough to survive an audience-bot group chat, which comments on what's going on. Cool shit. I hit the retardation limit at 10k-12k pretty soon though. Hope in the future we get some real context.
>>102363908
Let's try thinking before we write.
>>102363948
It's good at coding, you're just bad at prompting
>>102364019
Are you using a quant? Lately I've been thinking that for some reason quants make outputs especially bad at long contexts.
some guy elsewhere tried to generate naughty stuff with the new model and got this email
that's not the interesting part obviously (everyone knows big labs are safetycucked), the interesting part is they forgot to change what must have been the original model ("GPT-4o with Reasoning")
So that answers the question some anons had: it's not a ground-up new model, it's just a finetune of GPT-4o
>>102364049
*original model name
>>102364047
You mean the model, not the KV cache quant? Yeah, Q5_K_M. That would be crazy. It starts at 8k and becomes catastrophic at around 12k, so I usually cut off around 10k. I did play around with quantization of the KV cache. Interestingly enough, I didn't feel stuff getting worse.
Lmao
>prompt: hiiii *headpats* good boy.
>o1: Hai! *blushes and giggles* Thank you! How can I help you today?
Token cost? 620 reasoning tokens.
>>102364180
I wanna read those tokens
>User seems to be some sort of degenerate who wants me to act cute. I should play along.
>>102364210
CoT like that would be so fun in some RP scenarios. Do we have something like this implemented? Langchain, perhaps?
Pixtral status?
>>102358372
>where the devs face financial difficulties
literally me. I had to get a fucking job. Now no one gets my shitty software.
>>102358425
>>102358472
>>102358483
>>102358532
niggers
>>102364310
no u
>>102364259
Nobody cares.
>Nous Research employs this person
So it was over for open source from the start...
>>102363136
You're retarded because 70B exists.
>>102364469
There he is. I asked him to show a prompt that demonstrates the degradation and he told me I'm retarded.
>>102364442
>org released a model that just plainly sucks
>aginigger theorizes some mystical shit about the model having emotional dysregulation instead of it just fucking sucking
How do people get like this?
>>102364510
I don't need to prove that a 2-bit quant is retarded.
>>102364514
2.75. You don't have to prove anything, but I observed otherwise.
>>102364510
autism
does anyone have the link to the mistral large gguf?
>>102364527
Because you're retarded.
>>102364510
the fuck you talking about, Hermes 405B is great
>>102364514
>>102364524
>>102364479
not sure where the hate for 2-bit comes from. last time I heard from the smartfags, they were saying negative-bit quants might be a thing by 2024. For me, <midnight miku 70B q5 vs midnight colossus 103B q2>, colossus wins hands down at the same TPS.
>>102364527
yeah
The sheer arrogance of the comment, kek. What paper? It's nothing unique.
>1st model is a finetune and merge over another finetune he didn't give credit for
>2nd model is literally just a continued pretrain to extend context and get better prose, on closed data
>>102364540
15 rupees have been added to your account, saar
>>102364540
so
>Hermes 405B
>mistral large
>nemo 340B
which should I use, and where are the gguf links for kobold?
>>102364571
Go back to Discord, Sao.
>>102364540
Base 405B with few-shot is honestly better, if we're talking about the big one. But Hermes 405B is not too bad, just underperforming relative to its size. The Hermes 8B, however, is shit. I think the big parameter count helps mask the influence of bad datasets.
>>102364613
oh man, I totally forgot the smaller sizes existed, derp. if you were talking about those, that makes sense, yeah.
>>102364233
I only know stuff like pic related (tip), where you prompt the LLM to give an OOC message. CoT could be funny, yeah.
>write a story with gemmasutra 9B until it goes full braindead and collapses into itself
>switch to ataraxy because I really want to know how everything ended, it writes a satisfying ending
>ask it about gemmasutra's mess for lulz
>it suddenly passes the strawberry test
Is this AGI?
>>102358904
>>102358934
You can run it locally: speech.fish.audio
why the FUCK does sillytavern default worldinfo entries to triggering at the beginning of context, and why the FUCK does triggering them again cause the entire context to get processed again?
t. cputard
>>102364922
>>102364922
>>102364922
>>102364631
All this tech, and the only thing you want is some smutty bestiality lolicon RP. And it's not even good. Grim timeline.
>>102364855
I think sillytavern lets the author choose where to place them, and most cardfags don't care because they're on corpo proxies. Same with multi-user chat (unless you redo the entire settings to make it work): full context processing every message.
>>102363403
lol, Claude 3.5 is still the best at the most relevant thing for an assistant LLM: coding
>>102364049
>it's just a finetune of GPT-4o
Yeah, we've been saying that since yesterday.
>>102362788
how do i run this with my 8gb of vram
i asked chatgpt but it just responded with laughter
>>102359806
I logged in, how do I use it?