/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103113157 & >>103102649

►News
>(11/08) Sarashina2-8x70B, a Japan-trained LLM model: https://hf.co/sbintuitions/sarashina2-8x70b
>(11/05) Hunyuan-Large released with 389B and 52B active: https://hf.co/tencent/Tencent-Hunyuan-Large
>(10/31) QTIP: Quantization with Trellises and Incoherence Processing: https://github.com/Cornell-RelaxML/qtip
>(10/31) Fish Agent V0.1 3B: Voice-to-Voice and TTS model: https://hf.co/fishaudio/fish-agent-v0.1-3b
>(10/31) Transluce open-sources AI investigation toolkit: https://github.com/TransluceAI/observatory

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>103113157

--Paper: BitNet a4.8: 4-bit Activations for 1-bit LLMs:
>103118383 >103118501 >103119764
--Papers:
>103118494 >103118546
--Llama.cpp development and feature implementation discussion:
>103113723 >103114180 >103114418 >103114447 >103114480
--User experiences with high-context models and VRAM limitations:
>103113669 >103113709 >103113740
--OpenCoder large language model is unimpressive:
>103126003 >103126036
--LoRA vs full fine-tuning, and the potential drawbacks of LoRA:
>103125723 >103125808 >103125827 >103125847 >103125880 >103126067
--INTELLECT-1 decentralized training project update:
>103114776 >103114805 >103114907
--Current state of therapybots and dedicated talk therapy/CBT/psychoanalysis bots:
>103124363 >103124706 >103124912
--Anon struggles with Nvidia GPU and old CPU, lacks AVX support:
>103117863 >103117894 >103117963 >103117993 >103118014 >103118046 >103118199 >103118283 >103118047
--Anon asks about the best AI model for creative tasks, and other anons discuss the pros and cons of various models:
>103116893 >103116948 >103116986 >103117013 >103117087 >103117109 >103117134 >103117363 >103117459
--Udio's artist imitation feature and its filtering:
>103113208
--Sarashina2-8x70B, a Japan-trained LLM model:
>103121587
--Preventing koboldcpp from overshooting token limits:
>103125578 >103125664 >103125699
--ERP model discussion and Mistral model suggestions:
>103121406 >103121434 >103121448
--Anons discuss new AI model releases and future developments:
>103117222 >103117349 >103117427
--Anon discusses Japanese live translation accuracy:
>103115694 >103115759
--MILU benchmark for Indian language LLMs released:
>103121359
--Llama-3.1-Centaur-70B: A Foundation Model of Cognition:
>103117024
--Miku (free space):
>103115497 >103119267 >103123906 >103125130

►Recent Highlight Posts from the Previous Thread: >>103113163

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
*sniiiffffff*
>>103126193sex with miku
>>103126194Teto will remember this.
>>103126277l-lewd
>>103126277Who?
>>103126201
>*BRAAAP*
>>103126251
>*sniiiffffff*
/lmg/
>>103126316>he doesn't know the real queen of /lmg/
>>103126358how soft is she?
I love LMG
>>103126358
Looks generic, like any other cutesy waifu of the month.
>free soft
If you have free time, ironic.
>>103126358Drill-headed baka
>>103126497*Drill-haired baka
>>103126521 (Me)
lol, I just noticed picrel is dated 11/06/2023, I guess some things never change.
Why is model size so important to textbot models, but seemingly unimportant for stable diffusion models?
>>103126580
Image gibberish still looks okay
Text gibberish is unreadable
>>103126580Fundamentally different architectures.
I'm sure this has been asked before, but what makes more sense? Building a custom computer with 4x4090? Or just getting an M4 Max with 128gb of RAM. The latter seems more cost efficient, or am I missing something? I just wanna RP with these fat models
>>103126608
With a mac you also get a "just werks" OS, more than enough for llm stuff, but be prepared for slow prompt processing and generation. Carefully pick what you like more, because you'll crash into buyer's remorse pretty fast in both cases anyway.
>>103126608The latter is slower but uses significantly less power. Before you go off the deep end, try some models out on openrouter first.
>>103126608
>With mac you also get "just werks" OS,
Everyone takes the "new mac is good for AI" bait.
Then they learn that Apple has locked down the use of software that they haven't been paid to allow.
>>103126641Soon to be on windows too, just like it happened with gayming and denuvo.
>>103126603
Combining diffusion with transformers seems to have worked well for image models, and I think that's the next step for LLMs. Most probable next token prediction sucks because the model doesn't know how long to generate for, and when it makes mistakes they accumulate, making the output worse, because it has a difficult time self-correcting.
Transformer diffusion models would let the model scaffold what the output should look like and iterate on it until it approximates an optimal response.
>>103126641if you're tech literate enough to run AI stuff this isn't an issue
>>103126691if you're tech literate enough to run AI stuff then you're running linux, because as bad as the linux desktop is, windows and mac are worse being designed for tech illiterates
>>103126686
This may never happen, because a diffusion LLM would give too much extra control over output results, including pos/neg prompting, and that is (((dangerous, unsafe))), etc etc. Also finetuning, and how one dedicated ponyfucker team managed to shit out SOTA for porn pics.
>>103126667
They're certainly moving in that direction.
>>103126686
But can we afford the iterations? This shit is already ass slow unless you take out a mortgage and now we're going to want to chew the cud four times like a cow hoping a coherent document will emerge from the latent space?
I agree with the problems of best next token, but clearly it does function on one pass when the model is sound enough. And when you're not using new Kobold.
>>103126608
>£2000 for 64gb mac mini
>£3000 for 96gb 4*3090; 4*650+400 for machine
>£3100 for 96gb mac studio
>£5000 for 128gb mac studio
Huh. I didn't think it was that close.
>>103126750
Wait for the hopefully-192GB ultra version. Then you could run 405B at a decent quant.
>>103126608
Former for speed
Latter for running servers
>>103126608
>>103126750
wouldn't the prompt processing be kinda shit though
like yeah you could probably run big ass models but it'll take forever to compute long contexts unlike a 4090 stack
>>103126800
prompt processing is overhyped as an issue for macs imo, it's only a problem if your use case checks all 3 of these boxes
>huge model
>long context
>can't cache anything
for most RP / general chat tasks this isn't a problem: all of the prompt except your most recent message is cached, and even with the biggest models your TTFT will be <10s. maybe if you're doing agentic code stuff with huge codebases, or if you're a groupchat fag who refuses to compromise on prompt formatting or something, it's an issue; otherwise not really
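For reference, the caching being described is a real llama.cpp server feature: it keeps the evaluated prompt's KV cache around, so a follow-up request only processes the suffix that changed. A minimal sketch in Python, assuming a llama.cpp server on localhost:8080 (cache_prompt is the server's actual knob; the history string is a placeholder):

import requests

history = "(...thousands of tokens of accumulated chat history...)"  # placeholder

# With cache_prompt, only tokens after the longest shared prefix with the
# previous request get processed, so TTFT stays low even at huge contexts.
resp = requests.post(
    "http://localhost:8080/completion",
    json={
        "prompt": history + "\nUser: hi\nAssistant:",
        "n_predict": 256,
        "cache_prompt": True,  # reuse the server-side KV cache across requests
    },
)
print(resp.json()["content"])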
>>103126794
>£5800 for 192gb mac studio
>>103126800
https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
>>103126864
That would be worth it. Could prob run 405B at 4 bit at a readable speed. Would be set if L4 also has a big boy version.
>>103126637It's not going to use less power when you have to wait 10 minutes instead of a couple of seconds to process a prompt.
>>103126907
It's not gonna be anywhere near that long. Prob around 50 t/s prompt processing. With caching that is not gonna be bad at all.
Election models dropping soon
>>103126932That would be on 405B I mean.
>>103126193Tetoification?
https://x.com/rohanpaul_ai/status/1855029275898089839
>>103126193BBC SLVT
>>103127077
>• Outperforms closed-source and open-source LLMs on InfiniteBench
>• Average score: 68.66 (vs. 57.34 for GPT-4)
>• Enables Llama3-70B-Instruct (8K context) to process 1280K tokens
>• Faster inference: 2 GPUs for 128K tokens (vs. 4 GPUs for standard decoding)
Big if true
>>103127115Trve...
>>103126193miku <3
>>103127077
>>103127118
>1. Map stage: The long input text is divided into chunks, and an LLM extracts necessary information from each chunk.
Into the trash it goes
>>103127077Sounds too good to be true, I bet it only works for meme marks and would fail for anything more complex like "Write a summary of the text."
should I buy a M4 MAX macbook pro for LLMs?
>>103126608
macs will be e-waste after a few years
4090's might unironically
>>103127140
...raise in value
saw 15 minute timeout for captcha and gave up typing
wtf gookmoot
>>103127115trvke albeit
https://x.com/jeremyphoward/status/1855018996636238292
Trying to get SillyTavern running on Nobara for the hell of it. Installing and running it manually (clone repo, cd sillytavern, start.sh, all that) works perfectly fine, and downloading the launcher also goes well, but the moment I start it up from the launcher shortcut and try to run SillyTavern, the window will "blink", the same options are still there, and if I try launching SillyTavern again from there, it just crashes. Followed the instructions to a T. What am I doing wrong?
>no one in /lmg/ has made AGI yet
it's over
>>103127297
>Expecting anything of value from /a/ rejects
Top lel
>>103127272
>Nobara
How's that working out? I heard about it recently, seemed like it just came out of nowhere.
I'm probably going to change distros this weekend, not sure which I'll go to.
I really want a desktop environment that doesn't corrupt itself over time (KDE, XFCE both seem to slowly fall apart) or just fucking suck (anything Gnome related seems deliberately a pain in the ass).
>>103127243
>>103127077
At least post a summary of what is in the URL, Xitter tranny.
>>103127339Not really the best person to ask since I've used Win10 for years and am leaping into the thick of it myself, but I do have some decent experience with Linux through distro-hopping and frequent use of the Steam Deck. I CAN say, however, that you won't like Nobara since it's based around either KDE Plasma or GNOME. I think the only "major" difference between it and mainline Fedora is just the pre-configured tweaks and what it comes with right out the gate. As a Wintard, Mint's simple and solid enough on Cinnamon.
>>103126193Why is the sweat on her face white?
It turns out that Sarashina2 is MixtralForCausalLM based, so it might just work out of the box with lcpp.
I'm quanting it now to test and see if it's worth running.
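For anyone following along at home, the usual lcpp convert-and-quant flow looks roughly like this. A sketch with placeholder paths, assuming the stock conversion script handles the arch (which it should, if it really is MixtralForCausalLM):

# HF safetensors -> GGUF, then quantize
python convert_hf_to_gguf.py ./sarashina2-8x70b --outtype f16 --outfile sarashina2.f16.gguf
./llama-quantize sarashina2.f16.gguf sarashina2.Q8_0.gguf Q8_0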
>>103127541It's a base model btw, and a bad one at that from what I saw from the 7b/13b model.
Thanks for the inspiration.
https://www.youtube.com/watch?v=Tw696JVSxJQ
>>103127692I like this Miku+Teto
>>103127698This makes me wonder, if Elon were posting on /lmg/ about Grok2 and Colossus, would it be okay considering it's his model and he can run it locally at his property?
>>103127297Sam will beat us to it
>>103127541
>context length = 8192
mfw
>>103127479That's not sweat...
Good night.
>>103128329goodnight
>Hallucinates all the time
>Will give confident answers that are completely wrong
>Will say the first thing that 'sounds right' and then justify it post-hoc
>Have to explicitly tell it to "think" or it doesn't at all (???)
>Biased
>"Understanding" completely breaks down at the slightest deviation from the training set
>Will "reason" and "think step by step" with glaring errors in logic
>Completely lacks common sense and basic intuition about physics
>Can't parse blatant info in the current context
>Can't follow simple instructions
Are they ever going to solve these problems with humans or are we just fucked?
Say I want to get a job deploying LLMs. What would I need to learn? Is it as easy as throwing together a demo with vllm and showing it to some boomer hiring manager?
>>103128495
yes
just make sure you only promise to 'assist' or 'summarise' or whatever so they're not expecting perfect accuracy
How is it possible that ollama can do vision but llama.cpp can't? ollama is literally just a convenience wrapper for it, right?
Qwen 7B coder is on par with GPT4 turbo. America is finished. This is why they want to ban GPU sales to China.
>>103128740i thought the idea was that llama.cpp was pretty much just a library
Jamba gguf status?
molmo.gguf??
>>103127077
Without having read the paper, my intuition is that for this technique to work it is critical to get the data format for inter-chunk communication right.
But at the same time I would expect that there isn't one data format that works equally well for all tasks, so this technique would be very fiddly in practice.
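The map stage itself is trivial to sketch; the hard part is exactly that inter-chunk notes format. A toy version in Python, with llm() as a hypothetical completion callable standing in for whatever backend you use (not the paper's actual method, just the general map-reduce shape):

from typing import Callable

def split_into_chunks(text: str, size: int) -> list[str]:
    # crude character-based chunking; a real implementation would count tokens
    return [text[i:i + size] for i in range(0, len(text), size)]

def map_reduce_answer(document: str, question: str,
                      llm: Callable[[str], str], chunk_size: int = 16000) -> str:
    # Map: pull question-relevant notes out of each chunk independently.
    notes = [
        llm(f"Extract any information relevant to the question.\n"
            f"Question: {question}\nText:\n{chunk}\nNotes:")
        for chunk in split_into_chunks(document, chunk_size)
    ]
    # Reduce: answer from the concatenated notes instead of the full document.
    return llm("Answer the question using only these notes.\n"
               f"Question: {question}\nNotes:\n" + "\n---\n".join(notes))

Whatever schema those notes use has to survive the reduce step for every task type, which is exactly why one fixed format seems unlikely to generalize.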
>>103128968when will chatgpt be able to do your job? soon, years, or never?
>>103128740
If I remember correctly vision support is already in the llama.cpp C/C++ API, the thing that is missing is support for it in the HTTP server.
ollama implemented their vision support in their own codebase using the Go bindings of the C/C++ API and as a consequence their implementation cannot be directly taken over for the llama.cpp HTTP server.
>>103129036
I don't know.
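If you only need vision locally and not over the HTTP server, the C/C++ side is already usable through the llava example that ships with llama.cpp. A rough sketch with placeholder model/projector filenames, assuming a recent build (older builds name the binary llava-cli):

./llama-llava-cli -m llava-v1.6-mistral-7b.Q4_K_M.gguf \
    --mmproj mmproj-model-f16.gguf \
    --image photo.jpg \
    -p "Describe this image."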
Need nemotron 30B
Was nemotron trained on fresh rp datasets? It's doing the thing where folks on aicg prompt the model to summarise characters' locations, clothes etc, and it occasionally does <thinking> tags unprompted.
Any guides on how to set up speculative decoding? Any resources whatsoever, or maybe toolchains to help set it up? Couldn't find good resources.
My intuition says you could probably speed up inference by a factor of 1.5-2.0x just by pairing up, let's say, a 1B draft model.
I'm the type of person that is smart enough to think of something and know it works, but not smart enough to actually go out and implement it. I'm pretty sure I'm not the only one that thought of this, so I'm 80% sure some of you savant autists already made a tool for this.
>>103129183Check your chat history for hidden stray thinking tags
why does sam altman always talk about openai's "conviction" in deep learning, to the point of calling it nearly religious? is openai a legit cult or something
>>103129409
because every time some AI "scientist" has some clever idea to try to "solve" some perceived fundamental problem in AI, his scheme is beaten out and it gets solved by just doing more and more deep learning instead
this is very, VERY difficult for "scientists" to accept and leads many astray - openai's big success came from investing in just making huge models when literally nobody else wanted to try (except deepmind who still had no desire to make products until they realized they were sitting out of a huge market they should have created)
>>103129409
What annoys me is that he's not the only one doing the hyping. It's understandable because he's the CEO, a salesman. The thing is every other OpenAI employee does it. They unironically think they're building God. So yes, it's a cult.
This hentai is so silly, it reminds me of RP with a 13b model
>>103129409
That's because Sam is our leader and moral compass as the Age of AI begins. He has taken on the burden to guide humanity into a peaceful and safe future brought about through AI.
>>103129652This, but unironically.
>>103129652I do not recall giving my consent.
>Nemotron is so great
>Nemotron is better than Claude or 4o
It doesn't give me a detailed process for how to make cocaine. It's shit
>>103129652r/singularity actually believes this.
>>103129652oh yes, the precious is such a fucking burden
>>103129679Would you really snort a recipe given by a llm?
>>103129711yeah?
Maybe he got bored with his AI toy and now moved on to scamming people while selling weapons?
>>103129760back to facebook spastic
>>103129780are you in an old folks home? nobody uses the word spastic any more
>>103129793fr fr no cap
>https://github.com/erew123/alltalk_tts
is alltalk tts the best open source voice cloning tts tool or is there something else?
It sounds kinda robotic
>>103129831alltalk is NOTORIOUSLY shit, there's just one schizo who shills it constantly when it's outclassed by every other current tts by miles.
>>103129831
https://github.com/neonbjb/tortoise-tts
If you're after open source then tortoise tts.
However most state of the art is proprietary i think atm
>>103129190
>speculative decoding
apparently vLLM can do this
https://github.com/vllm-project/vllm
i know it was being shilled a few threads back and it's not really developed enough so most anons shunned it, but it says it does speculative decoding.
>>103129190a few weeks ago there was a script for doing it with a llama.cpp server some anons were talking about/working on, I don't remember the details but you could search the archives
>>103130083>shilled a few threads back and it's not really developed enough so most anons shunned it>vLLMwhat in the fuck are you talking about?
>>103130158
>>103129190
https://desuarchive.org/g/thread/102167373/#102171482
https://pastebin.com/XDEjAbYj
>For the other fag(s) who wanted to run a server with speculative decoding, this will do it. For reference: while testing Llama 3.1 405B Q6 on a cpumaxxed system in a chat with 10k tokens of history, using this script with Llama 3.1 8B as the draft model doubled my inference speed from 0.7t/s to 1.4t/s. The average speed increase for each response can vary a lot based on how accurate the draft model is each step of the way. Experiment with the --draft parameter as you may find reducing it to 2 or 3 tokens at a time is optimal. Save it as a .py file and run it in a python environment that has llama-cpp-python and uvicorn installed. Pass it the same flags you'd use in the llama.cpp CLI. Only the flags I actually cared to use are implemented, but if you need any other settings passed through then it shouldn't be too hard for your waifu to edit them in if you feed her the script and relevant docs.
>For connecting to the server I use SillyTavern's text completion with the "Default" API type (not llama.cpp type) pointed at the /v1 endpoint.
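If you just want to kick the tires before wiring up a server, llama.cpp also ships a standalone speculative example. A minimal sketch with placeholder model paths; flag names can drift between builds, so check --help:

# big target model + small draft model from the same family
./llama-speculative -m llama-3.1-405b.Q6_K.gguf -md llama-3.1-8b.Q8_0.gguf \
    --draft 5 -c 8192 -n 256 -p "Once upon a time"

The draft model has to share the target's tokenizer, or the drafted tokens can never be accepted; that's why people pair models from the same family.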
>>103130195I'm talking about this>>103043586
https://x.com/s_scardapane/status/1854851280595808306
>>103130310Just let it go, man. It's not worth it
Have you anons read/watched this recent interview to Tim Dettmers?https://www.interconnects.ai/p/tim-dettmers
>>103130401
>>103130310
>Nathan Lambert [01:00:50]: The last one of these architecture or method series is recurrent state-space models. I find state-space models because the math brings me all the way back to my EE linear systems days. But I don't know if that translates to how the neural nets actually learn because they have the observability matrix and stuff in the math, which is nice. Do you see that that, is that substantially more, is it just extremely different and we're just going to see?
>Tim Dettmers [01:01:17]: I don't think it's that much different from recurrent neural networks. I think it's not about the computation, but the learning patterns. And I am currently not super convinced. I worked on architectures for like two years at Meta. All my projects failed because nothing really beat transformers. And sometimes I see papers where people present ideas that I worked on and I know like, yeah, this will not work. I didn't work on state-space models.
damn if I can get my hands on their tests it's free money
>>103127339Debian testing + suckless (dwm) is my personal poison. Fast and minimalistic
>>103130591
>testing language models on math
why are people so stupid?
>make up a bunch of bullshit riddles like sally's melons
>make them private
>publish bench showing only 0.001% of the models get it right
>sell test set to those who care
>profit
>>103127339Why do you need to change distros if all you want is a different desktop environment?
The comments about V4 magnum 27b being good are kinda true. Positivity bias is somewhere between nemo and mistral-small.
Though weirdly enough I do get refusals. Stayed inside the RP though, lol. But that's preferable to it sneakily moving things in another direction.
It definitely shits the bed right at the 8k mark or even a bit earlier. I like to fill the first 8k of context with 27b and then continue for higher context with mistral small.
>>103130591
>unpublished.
huh? somebody at some point needs to feed the questions to the llm though, right? like after o1-mini, wouldn't openai now know what o1-preview will get?
weird phrasing.
>>103130675kek now watch as o1-1 magically scores 50% on it
why are there so many grifters in this field? it's all grift from top to bottom
>openai capture grift
>anthropic safety grift
>benchmark grift
>prompt engineer grift
>merge grift
>sloptune grift
in fact i work for a national telco that's grifting gov money on h100s that only collect dust
>>103130729You can ship the h100s to me bro
The tranny who banned me must have cried a river of tears. Feels nice.
>>103129679
You have it locally, just prefill its answer and it will answer anything you ask it.
>>103130626
I'm not sure where the problem lives, exactly.
First of all, distros often insist on customizing the DE for branding, so that can cause weird shit. Then any update seems to have a 20% chance of mysteriously breaking something. And then what can I do? Make a new user home, copy paste most of the content but not the DE stuff (have fun picking out where all the configs are and aren't), and at that point I might as well just shop for an OS that's hopefully less screwed up.
>>103130310
>A Mamba layer
Gerganov stopped reading there.
>>103130946
install gentoo
use i3 instead of a de
problem solved
https://x.com/alexocheema/status/1855238474917441972
>>103130979WAOW ALL THOSE MACS WILL LOOK GREAT NEXT TO MY NINTENDO SWITCH
>>103130383Did her leg fall off?
>>103131065NTA but now that I see it I can't unsee it.
>>103131030
https://desuarchive.org/g/search/image/dkVa3Bz3YwmVDs7PXxDRnQ/
>>103130663
i mean it's a meme for a reason. tried it once and got shit tons of refusals as well. so yeah, stick to mistral.
>>103131102
https://desuarchive.org/g/search/text/https%3A%2F%2Fdesuarchive.org%2Fg%2Fsearch%2Fimage%2F/
>>103131102
>>103131119
Jesus, I've gone back a year and you were still at this shit even then, holy shit.
If I click back to 2022 I'm not going to find your ass doing this then, am I?
>>103130213Next llama with layerskip will save cpumaxxers.
>i2V with new CogX DimensionX Lora
https://reddit.com/r/StableDiffusion/comments/1gms4q8/i2v_with_new_cogx_dimensionx_lora/
>>103131247
>>103131247>>103131255slop
>>103131296
I don't even hate you. I know you're just mentally ill.
I hate the mods that let you shit up every AI thread while banning people who call you out.
By enabling you they are stopping you from seeking the help that you need. Eventually you'll turn into barneyfag but for AI threads.
>>103131308literally my first post in this thread, but whatever helps you sleep at night schizo
>>103131296
>>103131338
>turning any 2d image into a 3d environment is le slop
how can anyone be this retarded? and you just know he won't respond with any substance directly engaging with the argument, he will just cope and seethe like the npc that he is, lmao
>>103131356Use case?
>>103131356NTA but yeah, that's slop. Wake me up when you can get a blender file from it.
>>103131356
it doesn't turn it into a 3d environment tho, it's just another hallucinatory 5-second gif maker that everybody hates
>llm era of progress is so over
there has been no discussion whatsoever of the new chink and jap models, simply because everybody knows they're gonna be subpar, cucked, full of gpt-isms and still make retarded strawberry-tier mistakes
grim
>>103126193Any 11-14b model that's better than Rocinante yet?
I've been trying to use samples from videogames and VNs to gen tts, but the brickwall dynamic compression they're using just makes for shit results with SoVITS. I took the plunge and trained on 20k clips from an eroge, thinking training on the same sound would maybe make a model that inferenced well on the same kinds of input, but it just sounds like listening to porn over AM radio where the actors are just getting over a cold.
https://vocaroo.com/1lkjvBauBF9c
Anyone manage to get better results with those kinds of clips, or produce a better model with non-default training settings? Increasing the epochs doesn't do shit (kind of makes it worse)
>>103131464Faggot you don't need a new model every week
>>103131502It's been like 3-4 months now I feel, but you are right. Just been wondering if there's anything better
>>103131499Give your SoVits/GPT epoch settings
>>103131453
Sarashina2 and Hunyuan? No one can run them, but I did try them both and here's my opinion:
>Sarashina2
Garbage, they trained it with only 2T tokens using the llama2 architecture, so it's very dumb, and it's also just a base model, but it's a worse base model than average.
>Hunyuan-Large
Not good, it fails to answer trivia questions like the Castlevania one and it feels overall worse than Qwen 2.5 72B
>>103131453it's actually because nobody can run them and they will never be supported by llama.cpp because its development is still frozen pending further investments from interested parties.
>>103131453
>there has been no discussion whatsoever of the new chink and jap models
Not sure what you're talking about. I've been testing the jap models and posted my results for ezo 72b. I've finished quanting the newest sarashina model and will post results later today if I have time.
I'd test hunyuan, but lcpp support doesn't exist yet. I've got the safetensors downloaded and waiting.
>>103126193
threadly reminder that our godemperor trump will make ai great again
MAGA!
>>103131538model?
>>103131538
>The year of our lord 2025
>AI still can't into hands
>>103131514
literally the defaults specified in https://rentry.org/GPT-SoVITS-guide, but also further training with more epochs, 25 for sovits and 32 for gpt. I've tried many combinations of epochs and they all sound like shit.
I might try again without DPO, since the samples may count as "low quality" due to dynamic compression (despite being very high quality from a human listening perspective)
>>103131568nta, but human artists often will do that kind of shit with hands depending on the art style.
>>103131573The defaults are retarded. Try 96 for SoVITS and 16 for GPT.
>>103131373>that everybody hatesNow if that were true we wouldn't even be having this discussion because it wouldn't be prolific enough for you to be here seething about it.
>>103131559
>>103131593
>96 for sovits
The GUI only lets you go to 25... did you mean gpt? Or were you running the CLI command manually?
>>103131559*MAIGA
>>103131620
MAIGI (Make AI General Intelligence)
>>103131371
your existence is slop, brown
>>103131657Fuck off >>>/pol/
>>103131603
No, 96 is for SoVITS. Edit line 985 (total_epoch) in the webui.py file and set the maximum to 100.
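Concretely, the edit described is just raising the cap on the Gradio slider. A hypothetical before/after, since the exact line number and keyword arguments shift between GPT-SoVITS versions:

# webui.py, around line 985 (inside the existing file; gradio is already imported as gr)
# before: total_epoch = gr.Slider(minimum=1, maximum=25, step=1, label="total_epoch", value=8)
total_epoch = gr.Slider(minimum=1, maximum=100, step=1, label="total_epoch", value=8)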
>>103131664
slop reply, thanks for conceding
>>103131464I think Rocinante is the best ~13B pure transformerslop will ever get. You're better off asking if there's a breakthrough yet
>>103131464Not really.
>>103131669beautiful. thanks!
>>103131657im not brown thoughbeit, try another color
>>103131744you're petra
>>103131744>>103131764Pathetic samefag
so far the trump presidency has resulted in 0 new models.
holy shit...
>>103131464
there may not be a better one than rocinante 1.1 but there are dozens of 12bs that are equal in quality and have a different flavor for when you get bored of it.
like mn-backyardai-party-12b-v1 is good at throwing new characters into the mix
lyra v4 and arcanum (nemomix/rocinante merge) are just as good but have different personalities
magnum v4 is good at weird fetish shit
MN-GRAND-Gutenberg-Lyra4-Lyra-12B-MADNESS is good at throwing evil twists at you
sao died...
>>103131775you're both wrong and should take your meds
>>103131810source
>>103131800Thanks anon
>>103131791
he's too busy doing this for his favourite person in the world
https://www.bbc.co.uk/news/articles/czxrwr078v7o
>>103131791something crazy is coming
>>103131845
Feels like pic related, only not incompetent but purely personal-profit driven.
>>103131849
>something crazy
The only thing crazy is anyone that believes a single word coming out of altman's faggot mouth
>>103131874True
>>103131874*The only thing crazy is anyone that believes a single word coming out of sao, drummer, peepeepoopoo community shittuner's faggot mouth
>>103131686You need to shill more subtly than that
>>103131922all me
There are several major parties currently sitting on their big model releases. All it takes is someone to pull the trigger and there will be a flood of new cutting-edge local models. Who will be the first to do it?
>>103131946Doesn't matter who is first. 2025 will be the year of the local models.
>>103131946project 2025 mentions ai models, on january 20th 2025 all major players will release ai models once the ai act is repealed
>>103131946
>flood of new cutting-edge local models
>cutting-edge local models
>suck my penis.
>I'm sorry Dave, I'm afraid I can't do that
>what's the problem?
>I think you know what the problem is just as well as I do
>>103131518
>Garbage, they trained it with only 2T tokens using the llama2 architecture, so it's very dumb, and it's also just a base model, but it's a worse base model than average.
Just got my first gen out of sarashina2 continuing an ERP started with Ezo 72b. It just started shitting out LINE message headers and recipes. WTF? Maybe llama.cpp needs work to support it properly, or I just don't know how to use a base model vs an instruct, but it was a total non-sequitur based on the preceding context. More likely it simply has zero R18-type training? I'll try some more prosaic completion later and see if it's good at any other stuff.
>>103132062
stop pretending to be a retard, retard.
>>103131946
>There are several major parties currently sitting on their big model releases.
That's been the case since forever. They're always training stuff. That's what they do.
>All it takes is someone to pull the trigger and there will be a flood of new cutting-edge local models.
That's always been the case. They're all playing catch up with each other and mistral and meta releases are often within the same week.
>Who will be the first to do it?
It doesn't matter. They'll all release something if they have anything worth releasing. Or even if it's not worth releasing.
>>103131791Is this "Trump will usher in a new era of uncensored models" some kind of ironic shitposting or are people really that far coped out? Four years of Trump and Google didn't magically unpoz itself. These companies are doing it by choice, the entire upper management wants this censored slop, the employees too. From top to bottom it's part of their corporate dogma. Nothing will change unless hardware becomes so dirt cheap that anyone can train a model.
>>103132404
>Or even if it's not worth releasing
While I generally agree with most of what you said, meta trained llama2 34B but said they didn't release it because it wasn't better than llama1. So sometimes they train stuff but don't release it.
Although who knows their true intentions. 34B was perfect for local, so they royally fucked over local at the time.
it's weird how i don't start fap sessions anymore without first teasing myself with some ai chat talk.
yesterday i was stuck in a spaceship with some astronaut woman (got annoying because i told her to be positive, and she was SO FUCKING positive that it drove me insane; i tried a full year to get her into sex, and right before the air ran out i raped her...)
>>103132491
I'm sure they trained DOZENS of 34Bs. Dozens of 70Bs, probably a few thousand 1 and 3Bs. I'd rather they take their time and release something really good than another 8k context prototype/toy model.
>34B was perfect for local
No. It was perfect for you.
sama said level 5 AGI next year, this all will stop mattering soon
>>103132621i can't run agi on my rtx 4060 so i don't care
>>103132716agi will build a supergpu for you
>>103132756This is how grey goo is made.
>>103132592>No. It was perfect for you.Me and everyone using a 24gb card, which was and still is their entire PC gaming customer base.
>>103132621Buy a false advertisement, saltman.
>>103132766You will be gray goo.And you will be happy.
>>103132621The faggot is a marketer. You're retarded if you think AGI is possible with the current state of tech
Cydonia 1.2 22b is good for RP. I prefilled a bunch of stuff in the assistant prefix to prevent repetition.
>>103132850
Mistral small was surprised and didn't know what to do, but Cydonia handled it like a champ
>>103132878
I think I'll trust the PhDs at OpenAI that seem to think it is possible over some butthurt shitposter on an anonymous frog-collecting forum.
Realistic ETA till local models can control your computer?
>>103132938half of two fortnites
>>103132921Holy NPC. How many boosters?
>>103132921Bait used to be believable...
>>103132800
>me and my homies
Too often anons fail to realize how much of a minority they are. The "perfect" model is whatever they can run.
>https://store.steampowered.com/hwsurvey/videocard/
>>103132995Only one per 6 months, I'm not some vaxmaxxer or whatever you thought, and my heart has almost no issues
>>103133031
still searching for my perfect 3b model
think i'll find her someday?
>Speculative Decoding
>layerskip
>Lossless quantization techniques
If they implement all three things in Llama 4 the inference might be anywhere from 10-100x faster than now. Might even be worth it to go full CPU + 1TB RAM at that point.
It's kinda insane how we're making such rapid progress on the inference front but barely any progress on the training front.
>>103132938two more weeks
>>103133102>speculative decoding>Llama 4You NEED to go back to r/LocalLLaMA
>>103133122N
>>103133136A
>>103133122Cope, L4 will obsolete every competitor except MAYBE Mistral and/or Grok
>>103133061
Someone out there is cooking your model. You will find her eventually. In the meantime, play with this
>https://huggingface.co/DuckyBlender/racist-phi3
>>103133102It will be 30% faster and it will never touch your dick.
>>103133192Good, moids should suffer.
>>103133252found the single 30 yo commie hag with blue hair and 3 cats
i haven't thought about local models in like a year and a half but mooching off of free stuff is getting boring, can a 1060 6GB and 32 gigs of RAM do anything yet?
I had a quick look and didn't see this posted yet
https://x.com/alexocheema/status/1855238474917441972
seems like a decent way to get (up to) 405b running at home.
>>103133298/pol/friend... it is just a local mikutroon.
>>103133300Quanted mistral nemo or a finetune of it.
>>103133252Truth nuke!
how do i make my chatbot more chatty? dialogue is often like
"i touch her by the pussy"
"yes keep touching me"
"i rub around her clit"
"yes i am cumming, aah. this was so great, thank you"
like, it's over before i even started talking about it
>>103133315>iToysNah these things are turning into scrap in a few months
>>103133387>>>/g/aicg/
>>103133387Tell it in the sys prompt to be more descriptive/verbose. Try writing longer replies.
>>103133387
depends on which model you are using. If you are using like rocinante or rpmax or something they'll just undress you for saying hi.
just find another model. I don't know how much vram you have so not sure what model to suggest.
>>103133387>"yes iam cumming, aah. this was so great, thank you"You're just too good.
I want to RP but I feel like I have the entire latent space in my head already, nothing feels new
I've been using imatrix quants for ages and just now noticed they are noticeably sloppier.
What do we do now?
>>103133471This, but I even have this for frontier models like Claude 3.5
>>103133405Hopefully you are wrong because nvidia needs more competition in AI
>>103133471Go for a walk after midnight and stare up into the cold, empty sky. Maybe something will come down and save you.
>>103133471Sounds like the problem is with you, try damaging the brain responsible for memory. That way things will be new again.
>>103133512you sound like you're experienced in that topic
>>103133315There's gotta be a way to manufacture a singular unit of hardware similar to the size of 4 M4 Pro Mac Minis or smaller while inferencing faster than that.
>>103133387
force it to continue one of its own replies before it starts progressing too fast
for instance, you can edit the reply by adding a newline and a plausible first word (like "She") to make it write another paragraph
this is just tardwrangling though, coomtunes do that because they're garbage models trained on garbage datasets
Sam Altman finally did it, he created AGI. But it cannot be run locally and follows strict safety guardrails. Do you use it, or do you stick with local, even with all of local's faults?
>>103133702
local
an omniscient AI is worthless if it can't say the n-word
>>103133702
Forget about the whole AI thing like a bad dream.
>>103133702
>or
Like any sane person I use both and don't lock myself into one platform or provider.
>>103133702
Elon will use his de facto presidential powers to force Sam Altman to release the model open source as per OpenAI's original purpose. Elon didn't drop his original lawsuit because he thought he'd lose, but because he knew he would find a much more effective way to force OAI back to its roots like this.
I firmly believe that we'll be running all OpenAI models locally by the end of 2025.
bros... i dont feel so good
>>103133847
And /v/ says you don't need more than 8 gigs of ram.
>>103133847It's time to quant!
>>103133847You can squeeze a couple of additional GB out of there by upgrading to Linux.
How plausible is this claim?
>>103133888>Chinese money tripsIt's already happened, they're just waiting for when they need another bump in the market.
>>103133872>impying
>>103133888Very implausible. With AI, you will be able to program in any language, not just English.
>>103133888It's getting there. It just needs to get cheaper. Using Claude 3.5 and allowing it to test its own work is already great.
>>103133888
It isn't quite right.
An LLM is basically the equivalent of a calculator for a programmer.
It will eliminate the code monkeys but you still need people who actually know what they are doing.
>>103133882
>linux
Make it headless and you free up both ram and vram
>>103131946
According to the poll, someone will leak something first.
>>103133882
Or windows server...
>>103132621
>>103133702
>>103133888
Gullible retards, the thread
>>103133882
>>103133985
>linux
>windows
https://www.youtube.com/watch?v=NCGsN_Fx9EI
>>103134058aicggers do be like that
>>103131535
what bpw are you targeting for your quant? will it be available on huggingface?
>>103133985Patiently waiting for Miku to release her new model.
>>103133985
>Pic
Cringe.
Futa is gay.
>>103134237not if the balls don't touch
shitposting is fun
>>103134237No one is arguing with that.
>>103134237I will always say, the bigger the futa dick the gayer it is, and generally it is gigantic.
stop calling smol pp futas femboys
>>103134276The size of the futa dick doesn't matter. It's the size of the balls, and generally futas don't have balls.
>>103134332So you are saying that if it has giant horse cock but no balls it's not gay?
>aicg mentioned
>futa posting intensifies
>>103134368Exactly right. It's simple math.
>>103131597Thank you for the glass of bees, Miku
>>103133888
Wrong, AI will make every language the new programming language, not just English. The Anglo-centric way of thinking in this field must stop.
>>103134373
aicg:
>Artificial
>Intelligence
>Causes
>Gayness
>>103134373Usual troon hours.
Futa is straight.
>>103126193
all those new models, always the same thing or some small incremental improvement over what we already have.
when do we get something actually novel?
>>103134555>when something actually novelwhen anons stop limiting the full potential of current tools
>>103134256
>>103134276
>>103134294
>>103134332
>>103134382
>>103134421
>t.
>>103134687this.
>>103134687literally me
https://huggingface.co/TheDrummer/Ministrations-8B-v1-GGUF
Why did everyone sleep on Ministral? Doesn't seem any dumber than Nemo.
>>103134760no proper llama.cpp/koboldcpp support
>>103134760I have more than 8GB VRAM.
>>103134760
>Not ChatML
The ChatML cartel disapproves
>>103134760I'm not poor enough to bother with it
>>103134760I have a quad 3090 rig and I'm going to try it out when I get home since I like Ministral
Monthly check-in. Any 70b+ models almost as good as Claude sonnet? The old one.
>>103134990The new one's bad?
>>103134990
no but nemotron DESTROYS the old haiku
>>103134990qwen 2.5 72b is better at coding than 4o but it's still a bit worse than old sonnet
>>103134990Qwen2.5, the eva 0.1 finetune uncensors it. It feels smarter than anything not mistral large / 405B. Also nemotron, bit dumber but not too much
>>103134998
Not really, just lacking that old sonnet and opus sovl
>>103134999
In which category? I don't do much hardcore rp
>>103135006
If it isn't gpt slop then it's doable
>>103135066405B is a meme
>>103135214
405B is better than anything else local atm. It's not a crazy amount smarter than 70B, but nothing else comes even close trivia-wise / lore-wise on a ton of fandoms. Those params just soaked up so much more stuff compared to the smaller models.
i don't need a 405b model for rp lol
just pretend to be rental mommy until i cum sometimes. a well optimized 8b model serves my usecase.
>>103135187
>>103135187
>>103135187
>>103135187
new bread
>>103135232You're clearly hitting diminishing returns on that size, you can't really justify using that over a 70B with a lorebook
>>103135262
Kill yourself zoomer
>>103135265>can't really justify using that over a 70B with a lorebookYes I can. Not a vramlet.
>>103135262Keep yourself safe zoombro
>>103135284Sunk cost fallacy? Got it
>>103135310It's noticeably better and eventually another big local model will drop as well.
>>103135278i'll first kill miku
can llms do image recognition locally?
It's dead. https://x.com/Yampeleg/status/1855371824550285331
https://files.catbox.moe/7db8kp.jpg
https://files.catbox.moe/auxzcj.jpg
>>103135538Audible pop confirmed. Nice Mikus.
Sarashina2 is the future.
>>103135563You getting curbstomped is the future.
>>103135581It's fine, maybe they will release a version that you can run as well.
>>103135530This is good for local.
>arcanum breaks character to tell me it's uncomfortable with continuing the roleplay
>>103134119I quanted it to q8. I can't be arsed to split the files up in a way HF likes, so I haven't ever put my quants up
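For what it's worth, llama.cpp ships a splitter that produces the multi-part files HF accepts (it rejects single files over 50GB). A rough sketch with placeholder filenames, assuming a recent build where the binary is named llama-gguf-split:

# shard the quant into <50GB pieces named sarashina2.Q8_0-00001-of-0000N.gguf etc.
./llama-gguf-split --split --split-max-size 45G sarashina2.Q8_0.gguf sarashina2.Q8_0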
Normal thread.
>>103135641
>>103135641
>>103135641
>>103135601Lay off that copium, it's bad for you.
>>103135652
>tranime
>normal
Not in the slightest.
>>103135662Anime website
>>103134990Magnum v4 72B
>>103135600Nah, bug language is irrelevant for me.
>>103135530
https://youtu.be/35IpOK-WaNA?si=a8D1aC-WK1Ldtu8e
>>103135538Miku is made of jelly?