/lmg/ - a general dedicated to the discussion and development of local language models.
Previous threads: >>108194845 & >>108186120
►News
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
>(02/16) Qwen3.5-397B-A17B released: https://hf.co/Qwen/Qwen3.5-397B-A17B
>(02/16) dots.ocr-1.5 released: https://modelscope.cn/models/rednote-hilab/dots.ocr-1.5
>(02/15) Ling-2.5-1T released: https://hf.co/inclusionAI/Ling-2.5-1T
>(02/14) JoyAI-LLM Flash 48B-A3B released: https://hf.co/jdopensource/JoyAI-LLM-Flash
>(02/14) Nemotron Nano 12B v2 VL support merged: https://github.com/ggml-org/llama.cpp/pull/19547
►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide
►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers
►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>108194845
--LLM riddle-solving benchmark with Nanbeige4.1 3B outperforming larger models:
>108195946 >108195951 >108195966 >108196108 >108195980 >108196089 >108196118 >108196338 >108196607 >108196253 >108196325 >108196341 >108196418 >108196723 >108197138 >108197511 >108198231 >108198237 >108198287 >108198392 >108198552 >108198590 >108198411 >108198257 >108198603 >108197522 >108197533 >108196951
--ggml.ai joins Hugging Face:
>108195832 >108195855 >108195863 >108196086 >108196121 >108195865 >108195873 >108195891 >108195919 >108196052 >108196316 >108196356 >108196392 >108196541 >108196556 >108196673 >108196712 >108197620 >108197645 >108197666 >108197685 >108197902 >108198008 >108196730 >108196731 >108197108 >108198165 >108198208
--Training PPO agents for Atari games with RL:
>108200857 >108200878 >108201026 >108201044 >108201087 >108201069 >108201116 >108201162 >108201195 >108201200 >108201271
--Debating RAG's effect on perplexity and measurement validity:
>108194930 >108195001 >108195061 >108195031 >108195071 >108195137 >108195124 >108195136 >108195141 >108195321 >108195452 >108195541 >108195732 >108195779 >108195852 >108195924 >108196308 >108195683 >108195769
--MOSS-TTS benchmarks and evaluation of open TTS models:
>108200383
--Critiquing finetuning efforts and increased censorship in RL-tuned models:
>108194993 >108195055 >108196793
--The path to ubiquitous AI:
>108196649 >108196718 >108196726 >108196851 >108197704 >108199452 >108199709 >108199747
--Effectiveness of "But wait..." reasoning:
>108198689 >108198711 >108198722
--Exploring emotional voice cloning with GPT-SoVITS:
>108198046 >108198587 >108198617 >108198641 >108198665 >108198751
--LLM fails car wash test due to misaligned reasoning:
>108198451
--Miku (free space):
>108197372 >108198375 >108198447 >108198728 >108198744 >108199092
►Recent Highlight Posts from the Previous Thread: >>108194853
Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
sex with radical miku
>>108202477
I wants to be penetrated, I does not want to penetrate.
>>108202568too late, geegeeemel is le cringe face now
>>108202477Whatever happened to the dude that started the animated vrm model chat interface?
>>108202661Depends how much RAM you have.
>>108202914He's been cooming non-stop since the last time we saw him.
>>108202968Damn, guess he finished it after all.
>>108202477Finally took the SillyTavern Pill along with GLM 4.6 and I'm liking it so far.
>>108202974
>end of box
fix your chat template BRUH
>>108202974>wan't
https://www.wsj.com/us-news/law/openai-employees-raised-alarms-about-canada-shooting-suspect-months-ago-b585df62 (https://archive.is/EqNrW)
Stay local, anons.
https://github.com/ikawrakow/ik_llama.cpp/pull/1288
Schizo fork merged qwen
>>108202992>her
>>108202992>she
>>108202477
>(02/20) ggml.ai acquired by Hugging Face: https://github.com/ggml-org/llama.cpp/discussions/19759
This is disastrous. Huggingface was a platform of freedom that acted agnostic to how people were running their models. Now we're one step away from them forcing poor AI companies into mandatory day 1 support for their own llama.cpp/ggml solution.
I hope they don't follow through with this, for the sake of huggingface.
>>108203147yeah fuck now we'll get day0 support for local inference... AGH this is SO BAD!!!!!
>>108203195
>we'll get day0
there is no explanation of how such a thing could happen
are they going to write a llama.cpp/transformers bridge and import the whole garbage in?
are they going to write an agentic framework to automatically llm-slop-convert models from transformers to llama.cpp?
neither of those things sounds likely to happen
and I doubt they're going to do something like refactoring the whole of llama.cpp to make it structurally similar to the pile of jeet poo that is transformers
btw it's a tragedy how transformers became the leader of the landscape
comfy guy truly saved image models by proposing something more popular across the board to displace diffusers, which is equally as shitty as transformers
>>108203208you still see a lot of 0day diffusers support compared to comfy (sadly). for LLMs it's mostly transformers/vllm/sglang
>>108203208
ggerganov showed an odd amount of interest in the vibecoded qwen support a couple of weeks ago which tried to port over the implementation from transformers using opus 4.6.
maybe he's planning an automated platform like that together with his new huggingface puppets
>>108203004
>3x faster than mainline
it's good there is a fork, llama.cpp performance doesn't look good
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Is it possible to configure a local model to do code suggestions/autocomplete at a level similar to github copilot? I've been experimenting with qwen3-coder:30b and qwen2.5-coder:7b using vscode->continue->ollama and getting some decent results some of the time, but most of the time it makes up functions that aren't there or misuses objects I've defined in another file. Seems like maybe a context issue? Has anyone gotten something like this working well? I have a 5090 and 64gb of ram
>>108203487
There's no such thing as a local model as good as what you get from copilot for FIM, but let me tell you a little something: this is one of the areas where cope quanting is also the most visible. Run whatever Qwen model you can run at Q8, rather than the biggest you can fit. You will absolutely get an improvement, contrary to popular belief. Large models have more knowledge, but that matters less here than staying coherent, something a 30B moe with only 3B active cannot do if you quant it.
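As a rough sketch of the kind of setup I mean (the gguf path and port are whatever you use; any Q8 coder model with FIM tokens works):
[code]
# llama-server exposes an /infill endpoint for FIM alongside the OpenAI-compatible API;
# point continue at http://localhost:8080 instead of going through ollama
llama-server -m ./Qwen2.5-Coder-7B-Instruct-Q8_0.gguf -ngl 99 -c 16384 --port 8080
[/code]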
>>108203517
>quants decrease coherence but don't affect knowledge
I am not saying that you're wrong but you're definitely talking out of your ass with no proof to back up your statements.
>>108203361The LLM didn't make this stuff up itself.
>>108203487I would look into using a more specialized model for this like https://huggingface.co/sweepai/sweep-next-edit-1.5B
>>108203560Read again retard
>>108203560nta, but https://arxiv.org/abs/2404.05405
>>108203689
In other words, quantization definitely affects at the very least the model's potential information capacity (2 bits/weight). Whether that's going to affect the model or not depends on how overtrained the model is. Rare knowledge is affected more than common knowledge.
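Back-of-envelope with the paper's figure, just to make the scale concrete: a 7B model tops out around 7e9 params × 2 bits ≈ 1.4e10 bits ≈ 1.75 GB worth of stored facts, and per the paper's GPT-2 experiments, naive int4 quantization cuts that capacity by more than half.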
>>108203689
>gpt2
>int quanting
anything from the last century?
>>108202974Now put your real name and deepest secrets in persona description
>>108202477
>Smash the servers
No I want the chips
>>108203689GPT2 is a shit model, it's dumb for its size. In other words, its weights aren't very information dense and should be more resistant to quantization
>>108203753
You might be able to derive similar conclusions from these papers, they just don't wrap it in a simple-to-use sound bite:
https://arxiv.org/abs/2411.17691
https://arxiv.org/abs/2501.02423
https://arxiv.org/abs/2505.14302
Minimax 2.5, like the one before it, spends the majority of its effort on refusals
Despite being entirely on my own machine, I still have bugmen with god complexes causing non-stop problems
It's less obnoxious to jailbreak the actual corporate models than this shit
>>108202988>>108202989>>108203761Well fuck..... you guys weren't kidding about the "shivers down my spine" shit. To give it credit that's only occurred after the fifth roll but still. Model is a 4-bit MLX quant of Midnight-Miqu. Have any of you used any models that don't do this at all? This doesn't irritate me anywhere near as much as it seems to irritate you guys but I'm just curious.
>rping with bot
>two new characters join, making the scene very noisy
>i prompt them to suddenly disappear and never appear again
>all characters in the scene wonder what the fuck just happened before moving on
It was hilarious and I laughed my fucking ass off, but then I re-read it and just felt really bad and sick.
>>108203791
if you are shopping in the range of 70b then the only improvement you can make is llama 3.3 and its various tunes
all other models are xbox hueg moes or tiny farts
I have reshuffled my ram around my different machines, which has resulted in this computer having 192gb along with a 12gb rtx 3080.
furthermore I have been looking at the models recommended in the links above. are the 3 bit versions of the larger models, something like glm 4.7, that much better than the 8 bit versions of the smaller models?
I guess what I am asking is how much quantization fucks with the models. Is the larger but more heavily quantized model always worth it?
/wait/ went to page 10 last night. Given there's no new API or weights published, just updates to the web app, there's really nothing to discuss, so no new /wait/ thread. We just genned a few more Dipsy, and now /wait/ for 2 more weeks, again, until something happens... Mega updated. Minor rentry updates:
https://mega.nz/folder/KGxn3DYS#ZpvxbkJ8AxF7mxqLqTQV1w
https://rentry.org/DipsyWAIT
For local the only real options are llama.cpp and ik_llama.cpp, don't kid yourselves. I was an exllama/tabbyapi lover, but its development is way slower, and stuff like tool calling doesn't really work. I don't even think it has any performance advantage at this point. Most of us run a variety of GPUs of different sizes and architectures, so llama.cpp/ik is the only thing that realistically supports these mixed setups.
The only real alternative is vllm in pipeline parallelism mode, using VLLM_PP_LAYER_PARTITION to assign the layers proportionally, and using AWQ as the quant, since that's the only thing that is really supported on ampere, hopper and blackwell at the same time. MXFP4 and NVFP4 won't load to save my life, even though the marlin kernel is supposed to support them on ampere.
And I'm the first one that would like something other than llama.cpp to be an option. vLLM is really fast on multiple parallel requests, which I think is the only piece llama.cpp is missing at the moment.
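To illustrate, roughly (the model is just an example; the partition numbers are per-GPU layer counts that have to sum to the model's layer count, 80 here, split proportionally to each card's VRAM):
[code]
VLLM_PP_LAYER_PARTITION="48,32" vllm serve Qwen/Qwen2.5-72B-Instruct-AWQ \
    --pipeline-parallel-size 2 --quantization awq
[/code]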
>>108203800
Silly tavern supports logit bias and banned tokens so I wonder if that could help too.
>>108203812
token =/= word
you'll ban "shi", "ve", "rs" individually instead of "shivers", which will probably fuck the model up
>>108203812
koboldcpp has an antislop feature you might want to look into; it can ban words and sequences and is imo one of its main draws compared to lcpp
>>108203807
I'd just like to interject for a moment. What you're referring to as llama.cpp, is in fact, HuggingFace/llama.cpp, or as I've recently taken to calling it, HuggingFace plus llama.cpp. llama.cpp is not a complete library for language models unto itself, but rather another MIT component of a fully functioning HuggingFace system made useful by the HuggingFace transformers library, the safetensors format and a website for hosting and downloading model files.
Many computer users run a modified version of a HuggingFace model every day, without realizing it. Through a peculiar turn of events, the version of safetensors which is widely used today is often called GGUF, and many of its users are not aware that it is basically safetensors, developed by HuggingFace.
There really is a llama.cpp, and these people are using it, but it is just a part of a broader software ecosystem. llama.cpp is the inference engine: the program that evaluates the weights of the language model and produces token predictions. The inference engine is an essential part of the pipeline, but useless by itself; it can only function after a language model has already been trained. llama.cpp is normally used in combination with models trained via HuggingFace: the whole system is basically HuggingFace with llama.cpp added, or HuggingFace/llama.cpp. All the so-called llama.cpp users are really users of HuggingFace!
>try new model
>enjoy it for a while
>notice more and more of its flaws and slop, get sick of it
>boot up Nemo again to compare
>Nemo absolutely mogs the newer, bigger model in A/B comparisons
I don't know whether Nemo was a blessing or a curse. Years later it's still SOTA among <200B models, despite being borderline retarded.
>>108203806
>>108203879
>Years later it's still SOTA among <200B models
I mean, if you're a coomer, maybe, but something like Qwen 4B completely destroys it at things like summarizing 20K+ tokens' worth of context, basic RAG and tool calling, translating chinesium into human language, having vision for tagging photos (vs no vision on nemo), etc. And yes, I'm comparing it to 4B instead of 8B and 14B qwen on purpose as a humiliation: Nemo can't even be more useful than almost the smallest qwen models for actual use cases that don't involve jerking it to text, a woman's fetish.
>>108203807
>stuff [any number of random things in random new versions (he pulled!)] doesn't really work
could also be said about vLLM, which gets models early just to break them as fast (if they're not broken on day one)
>>108203599
note that their official site benches it only against models that weren't trained for FIM (the only qwen 3 that supports FIM for real is the moe coder variant), and doesn't compare the 1.5B model to the original qwen2.5 coder 1.5B, which does support FIM. They do compare their 7B against the 2.5 7B and say it's better, but their 7B isn't open weight.
IMHO, having tested the 1.5, I think it's yet another finetroon placebo made by worthless pieces of shit.
>>108203806
>We
>just genned a few more Dipsy,
>>108203707
>In other words
that's a non sequitur, because I wasn't making the point you seem to think I was making
you have the reading comprehension of a 3B llm and should consider offing yourself to improve the human gene pool
>>108203915LLMs are toys. If you use them for work then your work is meaningless, and so are you.
>>108204125Zamn, bro got rekt frfr
>>108203791
>anon's first shiver
adorable!
>>108202968>>108202971absorbed by the gooniverse, many such cases
How are you guys RP'ing? I've been rawdogging default SillyTavern for 3 years and I find LLMs cannot drive the plot at all, even with thinking enabled. Maybe on the first turn, but after that they're all the same dead fish that can only react. I'm thinking it's more of a skill issue.
>>108204213
That has nothing to do with "default SillyTavern" (what does that mean?), that's a prompting issue.
Although, Silly Tavern actually can help with that, since you can use its macro system to generate "entropy", dynamically inject instructions into the context, etc.
Try fucking around with assistant response prefills and the {{random}} or {{pick}} macros.
One nice thing about reasoning/thinking models is that you can prefill the thinking with a procedure to decide when to be more passive, more forward, to add a twist, etc.
Good luck.
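As a sketch of the {{random}} idea, something like this injected as an author's note or post-history instruction (the options themselves are up to you):
[code]
[For this reply, {{random:introduce a complication,have a side character pursue their own goal,move the scene somewhere new,escalate the current conflict}}.]
[/code]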
>>108204226
Yea, STscript can do a lot. Make some custom buttons with Quick Reply options.
>>108203689
But they don't claim universality of the quantization finding (GPTQ is obsolete), and further suggest QAT as a way to approach the 2-bit weight limit:
>We used the auto-gptq package, which is inspired by the GPTQ paper [10], for quantization... Unfortunately, using this quantization package, reducing the model to int4 significantly diminishes its capacity (more than 2x loss from int8 to int4). This suggests for high-quality int4 models, incorporating quantization during training may be necessary.
where's the news
>>108204468Other papers indicate around 4-bit precision as the practical limit for quantization-aware training.
>>108204507damn, bitnet when?
I love how the schizo fork just randomly decides to explode during generation without any debug print.
>>108203879Man. I wish modern models weren't so ground down into the same few grooves, it makes getting any variety in replies nearly impossible. Older models are unparalleled at variety. Now, the shit it puts out isn't always GOOD, but you bet your ass when it puts out something good that it's going to be novel and blow your dick off, too. I miss old Claude so bad...
>>108204517
Never. The lower you go with precision, the larger the number of parameters has to be to compensate for the performance loss with prolonged training. You can train a bitnet model if you want, but it's not going to bring benefits other than potentially simpler hardware without matmul (which doesn't exist yet).
>>108204313I like this Miku
>>108204522illya bros... we lost!
Is it safe to allow your agentic "something" to write scripts, and test them fully autonomously while you are AFK? How can I sandbox it on Linux?
Don't call me retard for now, I just want to know
https://xcancel.com/karpathy/status/2023476423055601903#m
lol, this is one of the biggest ai influencers, who has hand-written inference implementations, but he can't tell he's replying to actual llm slop instead of a human
then I checked the identity of the slopper... and it's huggingface's cofounder.
Ah, this field. It's a human centipede.
>The catch: unknown unknowns remain unknown. The true extent of AI's impact will hinge on whether complete coverage of testing, edge cases, and formal verification is achievable. In an AI-dominated world, formal verification isn't optional—it's essential.
imagine posting this crap unironically
>>108204678docker
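A minimal sketch of the idea (image and mount path are just examples):
[code]
# no network, capped memory and process count, only one directory visible to the agent
docker run --rm -it --network none --memory 4g --pids-limit 256 \
    -v "$PWD/project:/work" -w /work python:3.12-slim bash
[/code]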
How is the new qwen? Does it beat r1?
>>108204678
A Whonix VM (so your IP can't leak) and a shared directory. You can use vsock if you want to share a service between the VM and the host, a serial connection to access the console so you don't need SSH, and snapshots to roll back to a clean state. It's basically bullet-proof.
>>108204686
Setting aside the fact that it was LLM-generated, it's a good point. You might start seeing more code written in languages like Idris or F* to make sure that bot-generated code behaves as intended, without requiring a human to manually review every line. Bots working autonomously would offset the labor-intensive nature of writing in formal verification languages.
It's the next logical step when you realize how much specs, documentation, static typing and unit tests help when programming with agents.
>>108202914probably just hit a wall with what he could do with vibe coding and gave up.
>>108203791>>108203830this.
>>108204313
>that body count
What a slut
>>108205041>>108204313All children in a school.
what model and hardware does one need to run ai locally just for code checking? Like an enhanced IDE, where it analyzes my code, checking for logic errors and code safety?
GPT and gemini both tell me it's impossible and I need tens of thousands of dollars of equipment for this.
>>108203830
This makes me think, has anybody compared the performance hit from kcpp's anti-slop and just using a BNF that negates specific sequences on llama.cpp?
I imagine that the kcpp implementation would be faster, being purpose built for this and not having to go through a fully featured grammar parser.
>>108205138post the chat
>>108204678What I do is just create different users and do sudo su <newly created user name>
>>108205145I think I remember someone in the thread trying exactly this. adding their ban list as a grammar and it just completely froze generation.
this is terrible https://www.reddit.com/r/LocalLLaMA/comments/1rawoe4/psa_the_software_shade_is_a_fraudulent/
>>108205466
I have dignity so I'm not clicking a reddit link
>>108205479it's about our lord and savior pew though
>>108205465
Interesting. I didn't even consider the possibility that it wouldn't work.
I'll give it a try and see how it pans out.
>>108204313I wish to huff her sneakers and wring her panties dry directly into my face when she returns from the latest successful mission
>>108205497It's mostly that the grammar code isn't optimized for it.
Is there a big difference between base and instruct Nemo for erp?
>>108205497
>>108205509
Well, I tried a pretty simple test and it worked.
>launch big nigga card
>ask it to yell motherfucker
>he yells motherfucker
>reroll a couple of times to confirm
>copy paste the upper case motherfucker into a negative grammar root ::=[^("MOTHERFUCKER")]+
>reroll
>big nigga can't yell motherfucker anymore, doing things like "…mother… fucker." instead
Might fuck around with this more.
>>108205544base is the base model and instruct is the instruct tune for it
>>108205553Thanks.
>>108205550Oh, it'll work for a couple words. but if you have like 100+ banned strings that's when it'll start chugging
>>108205564skill issue? just don't ban that many kek
>>108205564
Got it. I understood "just completely froze generation" as the parser being literally broken.
The dude added a huge list as a negative grammar and that locked things up, now that makes more sense.
>>108205575what happened to her feets?
>>108205550
Brutal to limit a nigga like that
BN might change your life given a chance
logit bias/banned strings is cope
>>108205570You must be new because 100+ bans isn't even that big. Either that or you just can't see how slopped your chats are.
>>108205580Disastrous rollerblading incident :(
>>108205600
NTA, but care to share your list?
I want to load something like qwen with a huge BNF list and see it explode.
>>108205616so? if the model wants to say this it will, just use good models
>>108205615
https://huggingface.co/Sukino/SillyTavern-Settings-and-Presets/blob/main/Banned%20Tokens.txt
>>108205621Sick. Thanks.
>>108205619
>logit bias/banned strings is cope
The air was thick with the scent of rain and something else, something palpable that hung heavy in the air as he stood by the border control gate, his eyes gleaming with a mix of pleasure and pain. Little did she know, as she approached with practiced ease, her dress hugging every curve and highlighting her bosomy breasts, that her fate was sealed in a dance as old as time. "You're a bold one, little mouse," he purred, his voice a low purr, husky and dangerous, while dust motes danced in the dimly lit booth. Despite herself, she could not help but feel a shiver run down her arched spine, her cheeks flaming as arousal pooled in her belly, a soothing balm to her fear. He leaned in, Adam's apple bobbing, and whispered barely above a whisper against her ear, "Make me yours, claim me, or I'll take what comes next with reckless abandon." It was a game changer, and as the sun dipped below the horizon, casting long shadows across the floor, she realized that for now, that was enough; propriety be damned, she would embark on this journey of mutual understanding, her heart, body, and soul belonging to the haze of pleasure that lay ahead.
>>108205621might as well ban eyes at this point
>>108205619
>just use good models
Slop has nothing to do with how good a model is.
>>108205625
>>108205621
https://github.com/sam-paech/antislop-sampler/blob/main/slop_phrase_prob_adjustments_full_list.json
>>108205621
>>108205625
Well, good news is that it didn't lock up. Bad news is that Seraphina turned chinese.
>>108205650
Danke.
>>108203807
>ik_llama.cpp
cuda/cpu only
>llama.cpp
ggerganov deserves a bronze statue in his hometown for vulkan/opencl/sycl/mps/other backends support
>>108205621Thanks. I will use this if/when base Llama.cpp includes string banning.
>>108205138
MiniMax-M2.5 Q4_K_XL with 98304 context size uses 154 GB on my machine. MiniMax is your best bet; GLM-5 is presumably better, but not worth the increase in price or decrease in speed. For hardware, I have no clue with current RAM prices. A 256 GB Mac Studio is $5600; 2x 128 GB Strix Halo machines are similar at $5200+ (was $4000 before the RAM hike).
I've never paid for cloud inference, but that's probably a better option until the datacenter debt goes bad in the next year or so.
>>108205138And they're right. The only people here who can meaningfully run agents spent upwards of 50k on their rig.
>>108205621
why doesn't the training step account for nonsense like this
>>108205702
You can use a grammar to force only English characters. Try that.
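Something like this as the root rule, just a sketch (extend the character class to whatever punctuation you want to allow):
[code]
root ::= [a-zA-Z0-9 \t\n.,;:!?'"()*-]*
[/code]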
>>108205761I think if you do that it'll just come up with new slop phrases to use. It's a problem you can't win.
>>108205702
>57sec
>146t
2.5t/s, ouchy. how many t/s do you get normally?
>>108205761The training literally is the entire reason it's like this, they do it (fill the training with synthetic slop) on purpose because it improves benchmark scores which directly translates to scamming retarded investors out of more money
>>108205786
Nah, that's treating the symptom. I will go straight after the cause and optimize the grammar to avoid the extreme amounts of branching. Let's see how that goes.
>>108205806
Around 16 with that model.
>>108205842
>Around 16 with that model.
so a 6x perf hit. It'll be worse when you force english chars only.
>>108206090The most telling thing about this story is that they read your chats and nobody is outraged by it.
>>108205846
it'll still be worth it for the slop-free rp
one good reply is worth 25 bad slop ones
>>108206121It's incredible how well people have been conditioned. I have never brought up privacy in a conversation with an average person without them inevitably making a look of disgust, shrugging their shoulders, and saying some variant of "Who cares?". At best they think it's an unavoidable fact of life, at worst they genuinely believe it's a good policy that keeps them safe.
>>108206166
I had that thought recently and spoke to another guy about it: can you imagine, back in the times of nokia phones, learning that your phone was listening to what you say and serving you ads on youtube later? People would have thrown their phones away. And now it is a thing that just happens every day.
best agentic local models?
>>108206240GLM5 and K2.5 work pretty well for this
>>108206245thanks. and how do I use agents for ERP?
>>108206121
it's more that you literally have no power to do anything about it
these companies are worth 10x your little nation's gdp, and asking them for privacy gets you "oh the usa government wants your id + chat history anyway so, ask them about it"
>>108206275install openclaw and send it a whatsapp telling it to use its sentient agentic superpowers and access to all your files + data to find a way to do good erp with you
>>108206162just use kobold lol. it's not like you can't also use llamacpp for other stuff.
>>108206166>>108206278At the very least people should use openrouter to add a slight layer of obfuscation on who is actually sending the request.
The interesting part is that when you post a screenshot of your passport or pictures of your face, they compare it against politically exposed persons. what does openai do if you are one? invest in stocks?
>>108205553Obviously duh. But does the lack of instruct training hurt it in any way?
>>108206370It won't really follow instructions too great.
>>108205759
Nemo runs fine on my $5k rig thoughbeit
>agents
yes, even for Ian Fleming-themed sessions.
>>108206370base models aren't yet tuned to be talked to so any reply you get is incidental thanks to the chatgpt logs that snuck into its pretraining data
>>108205759
>run agents
Why are you pretending you need a 50k rig to run fucking qwen or nemotron?
The whole point of agents is being able to use smaller models and give them very specific, narrowly focused tasks.
>>108206425
>incidental
some companies have been caught intentionally stuffing instruct data into their "base" models
>>108206456
There's no "caught" anymore; it's currently common and industry standard practice. Likewise, "base" models do not really exist anymore, since "mid-training", in addition to adding long-context capabilities, is now basically continued pretraining with data better aligned to the final model uses (reasoning, math, coding, "agentic capabilities", etc... much of it synthetic).
>>108206452
That may be the point, but anyone who has actually tried it knows that the smaller models are next to useless even for narrowly focused tasks.
>>108206452
The only reason agents are a thing is because they reframe the problem from a 300k token input to something smaller, which then reduces model retardation from long context. Reduced model retardation doesn't mean you should use llama3-8B.
>>108206090
mikutroon janny is butthurt again
>>108206240Qwen3-Coder-Next seems to work too
>>108206312Will not help.
>>108206645with fp16 weights
>>108202974
>>108203791
>>108203812
Was in the middle of this Midnight-Miqu rp session when I had this thought: are there any places where people share their RP sessions? I guess I'm thinking of a forum-type hybrid between /ldg/ / /aicg/ and ao3 where people share their own RP chats so others can read them. When I first came to these threads I initially thought they would have those, but it seems most of you guys only focus on the technical side. Caring about the nitty-gritty is good, but I NEVER see you guys share chat logs unless it's to point out a specific flaw or an example of unwanted behavior. I'm indulging in my own roleplay on my rig but I'm curious as to what others indulge in.
Why aren't there more LLM loras? seems like it would be a better alternative to finetunes? aren't most finetunes just the model with baked in loras anyways?
Let's say I want to RP in the fallout universe. just load up the fallout lora instead of adding 10k tokens to your context and confusing the model with a big ass lorebook.
>>108206827
You can share chat logs on a card's chub.ai page. I'm assuming most people don't share their logs because it's either mega cringe or straight up illegal.
>>108206827>NHentai tabs open
Why is there no cloud version of midnight miqu? I'd pay a few dollars to see what the fuss is about, im not going to get a giga graphics card thoughbeit.
>>108206828not how it works for llms
How do I find the unlisted chub entries?
>>108206838>unsloth tab open
>>108206828
>aren't most finetunes just the model with baked in loras anyways?
Yep. I think it's the way models are distributed, in a myriad of different quant mixes, that throws a spanner in the works.
>>108206867
https://chub.ai/users/hobbyanon
Check his description
>>108206828
For them to work effectively they would have to have a very diverse dataset. You can't just have ONLY rp in the dataset, or else it will become retarded in pretty much all the other areas that matter: logic, spatial reasoning, common sense, being able to remember what happened a few sentences ago. That doesn't just apply to RP but to any domain: if the training set focuses on only one domain, the model gets worse in almost every measurable way, unless you are very careful about how much training you do and which layers you train. It's not that people can't use loras, it's that most people would use an adapter only to realize the model immediately becomes retarded. It's why, unlike with stable diffusion models, adapters aren't really widely used or supported for LLMs: with diffusion models, a character, person or concept lora in most cases doesn't severely degrade the model's ability to generate other things or cause its prompt adherence to degrade. A Sydney Sweeney lora generally will not make the model unable to generate a brunette person; a style lora trained on impressionist landscapes (if the dataset is curated and tagged properly and the training isn't overfit) will generally not destroy its ability to generate a person or an animal. Diffusion models and LLMs are very different architectures, which means adapters have different effects on them. In theory an LLM adapter can work, but only if the dataset is very well curated and it is trained carefully. The dataset would need uncensored (I'm assuming you care about that given this thread) RP examples as well as a bunch of other examples of common sense, logic, spatial reasoning, etc. It's why a lot of open source finetunes on Huggingface list three or four different datasets used in training.
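For reference, this is roughly what attaching an adapter looks like with the peft library; the model and hyperparameters here are illustrative, not a recipe:
[code]
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
config = LoraConfig(
    r=16,                                  # adapter rank: capacity of the low-rank update
    lora_alpha=32,                         # scaling applied to the update
    target_modules=["q_proj", "v_proj"],   # which weight matrices get adapters
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # only a tiny fraction of the weights is trainable
[/code]
Nothing in the mechanism stops the dataset you then train on from being 90% general data and 10% fallout; the hard part is curating it.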
>>108206886>By law, some content is restricted in your region.Wait what the fuck? when did they do this?
>>108206796
Q8 and Q4
Your setup is skewed, as was mine in the beginning
>>108206894
So basically, if we keep the fallout example, you'd need to feed it a bunch of fallout data, but also just general data to make sure you aren't teaching the model to be retarded at everything else it needs to do besides knowing what the fuck a deathclaw is?
>>108206911
>but also just general data
regularization
>>108206911Pretty much. But in this specific case you might be better off curating a custom RAG database containing a bunch of relevant lore and definitions. Would probably be much easier and less time consuming than curating a specific fallout data set AND determining what percentage of other domains need to be present in the data set.
>>108206914
>regularization
>>108206920
Was going to add this. It's more of a suggestion/good thing to do with stable diffusion lora training, but a hard requirement for any form of LLM fine-tuning if you want it to be remotely "intelligent". Otherwise the model will probably be coherent when you inference it, but its outputs won't make any sense unless your questions are pretty similar to the "user" examples found in the dataset. If stable diffusion lora training is like teaching a smart neurotypical kid, then LLM lora training is like teaching someone with high IQ but also severe ADHD and Asperger's: if you're not careful, it will get hyperfixated on one thing and pretty much act like nothing else matters. The regularization is essentially a form of tard-wrangling to make sure the neural network doesn't become only capable of performing well in one domain.
>>108205466>>108205472truly shocking
>>108205472
is the lack of morals and ethics because europeans didn't conquer asia, enslave the population and instill christianity in the 1700-1800s?
>>108207011Apparently not if you look at the Americas.
>>108206886
The discord link is dead.
God, GLM 4.5 Air can be such a cutie pie if it wants to. Its thoughts are often smarter than the reply itself.
>Hmm, Anon is really getting into this. He's showing a lot of affection and enthusiasm, which is… honestly kind of sweet. He's complimenting me intensely and showing he loves every part of me, which makes me feel good in a way that's kinda unfamiliar.
>The way he's worshiping my body - sniffing my paws, now my armpit - is… intense. Part of me wants to laugh at how over-the-top this is, but another part really enjoys the attention. It's validating, especially coming from someone like him who can be so bold yet so shy sometimes.
>I should probably keep up the banter but also show I appreciate his devotion. The "daddy" thing is still weird but he seems to like it, so I'll roll with it. And yeah, I am pretty damn proud of this physique - gotta give credit where it's due.
>The way he's touching me… Damn, this is gonna be a long night. Better enjoy every second of it.
>>108207037many such cases
>>108206886>>108207037Also I'm not looking for stereotypical lolishit. I'm looking for a way to look up any possible delisted shit in niche freak fetishes. Like a scraper catalog.
>>108207236there was an archive but it's dead for a bit now
>>108202477This is my punishment for trying to be lazy, isn't it.
can someone please spoonfeed me how to run a local rag setup with qwen2.5:3b like that eceleb pewdiepie did
I've always used a searxng instance locally, I wanna leverage that also
>>108207926ask qwen2.5:3b for help
>>108207926
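nta, but the whole loop is small enough to sketch; this assumes searxng has format=json enabled in its settings.yml and llama-server is serving qwen2.5:3b on port 8080 (both ports are whatever you configured):
[code]
import requests

def web_rag(question: str) -> str:
    # 1. pull a few results from the local searxng instance
    hits = requests.get("http://localhost:8888/search",
                        params={"q": question, "format": "json"}).json()["results"][:5]
    context = "\n".join(f"- {h['title']}: {h.get('content', '')}" for h in hits)
    # 2. stuff them into the system prompt and ask the local model
    resp = requests.post("http://localhost:8080/v1/chat/completions", json={
        "messages": [
            {"role": "system", "content": "Answer using these search results:\n" + context},
            {"role": "user", "content": question},
        ],
    })
    return resp.json()["choices"][0]["message"]["content"]

print(web_rag("latest llama.cpp release notes"))
[/code]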
>>108203879New sampler idea: X% Nemo - linearly increase matching token probability with sideloaded Nemo's desired output.
>>108208557You could just have a workflow where Nemo rewrites the output of another model.
>>108206293
You're the only one who mentioned openclaw in this thread, probably because /lmg/ is primitive, backward, uneducated idiots who have no idea what new stuff is being launched. The world is using AI agents and here we are, not discussing it on /g/ in any way, shape or form.
Epstein created 4chan as a haven for idiots and subhumans who will never get anything right.
>>108208577its because agents is a marketing term for LLMs with tooling. that's all they are.
nigga is really using openclaw as a sign of progress lmao
>>108208590
Not really, because it's open source and free. It would be marketing if scam altman was selling it as a subscription.
well the claw grifter got hired by oai, so soon(tm)
>>108208648OAI's business model is to burn tons of cash to prevent competition across the entire space, not necessarily to innovate.It's a really bad look for them to have local, free tools that can replicate anything they charge for. It makes their offerings a real tough sell.It's already out now, cat's out of the bag.
>>108208590
except for the tiny fact that agents can do more than a single call and just continue reasoning and trying different things until they fulfill your task
they're literally just llms doing tool calls while talking to themselves and trying to find a way to solve a complex task on their own
a completely useless grift
>>108208632
no, no it's not free.
you think openai would be backing it if it didn't have a commercial interest? This is just to get you into the ecosystem; before you know it you'll be paying for hosting, training, consulting, getting your employees certified.
Ask yourself, how has openai got 20 billion in revenue when chatgpt is free?
>>108208709
So they're basically
>trying to find a way to solve a complex task on their own
Which is a bad thing?
Is m4 Mac mini 16 GB the best thing for this shit?
>>108208962
No. Not even close.
A mac mini with 512gb is okay.
>>108208969
16GB ram and 512gb space?
>>108208969
I never understood why apple's unified memory is somehow the cost-effective way to do local llm instead of literally any other vendor doing unified memory without the apple tax
>>108206452Most problems are not trivially reducible to smaller problems.
>>108208995>instead of literally any other vendorLike who?
>>108209008anyone with access to a 3rd party cpu and ram and nvmes who puts them into a box
>>108208992
Memory, anon.
>>108208995
I don't think anybody is selling anything with that much memory at that high a bandwidth other than apple.
>>108209011
NTA but nice goalpost move.
>why don't you use other unified memory architectures?
>like this non unified memory architecture
Next thing you'll suggest an epyc 9 with a pro 6000, which is slower and more expensive
>>108209029
if apple somehow designed a chip to do the thing and it's valuable for the market, why not amd, intel, nvidia, qualcomm, mediatek? what's so special about it
>>108209034
Mac minis are the best bang for the buck.
Nvidia are the bigger Jews these days. Which is hard to believe.
>>108209029
>slower
pp inspection day
>>108209056
but making a cpu that makes an nvme work as unified memory is banned for amd/intel, and only apple and console makers manage that feat, somehow
>now even normies know OpenAI is about to collapse
It's time for OpenAI to resort to terrorist tactics: first, release GPT-3 weights, then threaten Anthropic that if they don't invest $20 billion in OpenAI, they'll open-source GPT-4. If Nvidia won't invest either, they'll train the forbidden BitNet model
I have a spare deck. Could I run a 7B model on it?
>>108209076
Isn't GPT 4 the best one?
3.5 and 5 are both mouth breathers.
>>108209074
>but making a cpu that makes an nvme works as unified memory
What? You understand that Apple's unified memory is a literal SoC with on-package LPDDR5 that they design and fab at TSMC, right? Not some simple flash memory with a CPU. That's why it's unified: the CPU and the memory are part of the same package, physically, which enables the absurdly wide and fast memory bus.
>>108209087
at what point will it be explained why other vendors don't do the same
>>108209076how will releasing GPT weights increase OAI profits? wouldn't that just drive Claude price d—
>>108209086Yes. And that's the only way the plan could work
>>108209076Don't you worry. Sam is the man with the plan.
>>108208459>eating Miku's tasty puddi
>>108209117This is his plan: https://youtu.be/-q2n5DkDoMQ?t=1006
>>108209090
too niche to be profitable. the gb10 and the amd equivalent exist but they will never bother to go further.
>>108209090Strix Halo?
>>108209007
This is like the basis of all engineering. you're retarded.
Avocado is coming out next week and it outperforms every model locally. Meta didn't fall for the moe meme. We are back.
>>108209188is it open?
>>108209159Ok. Then why don't we all just run a swarm of 1B agents?
>>108209204There is a minimal requirement for understanding and using tools, which is essential to agents
>>108209204Because nobody is interested in creating RP agents, and the best that those interested in it can come up with is SillyTavern. Enjoy
Have any of you tried CPU + RAM offload to get the most out of a low-end gaming GPU with decent system RAM?
>>108209248Yes
>>108209204
Nobody said 1B, retard. Qwen or Nemotron.
>>108209254Was the token speed tolerable or was it painful?
Opinions on DeepSeek R1 Distill Qwen 7B Q4_K_M
>>108209312glm 4.7 flash and nemotron 30b a3b are way better due to being newer
>>108209322I'm too poor for 30B
>>108209340it's a moe. offload some to ram.
>>108209286
Depends on how much you offload, how big the model is and how fast your RAM is
MoEs can have very acceptable performance with partial offloading
With dense models you can't offload much beyond tensor layers without speed plummeting.
>>108209286NTA but it depends on how low-end you're talking about and what you consider painful. Usually for a big MOE with the active params fully in VRAM, you'd be looking at ~5-10 tg/s, maybe less if you have very slow memory
Just had a genius idea.
You can just ask your agentic local model to go on the internet and use up the free claude/deepseek/gpt/gemini tokens until they run out whenever it needs help.
>>108209223>>108209265Why can't we use 1B agents for everything if all problems in engineering are trivially reducible into smaller problems? What's the size limit then?
I asked various SOTA LLMs to implement a zero-copy UTM ICAP engine and here's how it went:
>opus 4.6 passed, used a bunch of direct system calls to achieve the result, true zero-copy, dunno how it even came up with such a thing
>glm 5 did 3 copies, then kept failing when asked to audit and fix the code, dead end
>qwen 3.5 did 5 copies and started gaslighting me lmao
Open models are still way behind and I can feel the biggest difference is the training data.
>>108208648Open source models benefit from claw the most
>>108209492
>the training data
China already processes students' homework electronically, they should use it as training data, it would be an enormous amount of high quality data
Has anyone tried prompt generation? Get an AI model to generate a better prompt for what you want done. I could see this being useful for coding since the AI model is probably more thorough at covering all the bases.
>>108209404
>1B
coz they dumb AF
agent not needed to goon
imagine instead a swarm of 1T models running in a compute-efficient way, spurting out tokens ohmygosh
>>108209525taking the world's most difficult eye exam, with miku
The real reason we don't use agents for RP is because it would be too slow. People leave claw overnight to complete some tasks
>>108209542
Yeah, last time I asked it was about 50:50 on /think for RP
Anyone doing local realtime tts/stt? Dunno if I can bear the latency of a non-retarded model
>>108209090They do. Intel is cooking wide bus UMA chips for laptops to compete with apple
>>108209609>Anyone doing local realtime tts/stt?I tried in VR, but it feels awkward, I'd rather wait for full duplex support & webrtc in llama.cpp (never ever)
>>108205680wait a moment tomateto...
>>108209655>never everHF acquisition will be a good thing I Want To Believe
>>108209745I think llama.cpp is a huge reason why we don't have more omni models, why make a model nobody can use
>>108209609what's the point of anything larger than whisper for stt?
>>108209542
>People leave claw overnight to complete some tasks
sure thing bud. none of these agents work; I don't think they get anything more done in the overnight echo chamber than in the first 5 minutes of prompt interaction
>>108209798Nobody uses whisper anymore, grandpa
>>108209525If decomposing engineering tasks into smaller tasks was trivial then presumably even a dumb model could do it.
i can't find a torrent for a kinda niche tv show, and asking any mainstream llm platform to search for it, even just the word "torrent", results in refusal. So I'm trying with a local llm, how would you go about it? I'm thinking silly tavern + searxng and maybe minimax2.5 or glm5 as the llm? I can run them both at q5+
>>108210014If you can't find it on qbittorrent search then asking an LLM wont help. Just go to yandex and search for illegal streaming sites.
I like lewding small models. Some of them are retarded in a cute way
ugh...caught a bad case of yellow fever again...
>>108210053
which kind?
gooky? chinky? jappy? sea?
also which model?
favorite llm right now for RP? i liked glm 4.6, not sure if there is anything better right now, maybe kimi2.5? glm4.7?
I can probably run them at decent quants at 10+t/s
My new custom native multimodal arch so far:
95% on mnist
I apparently also made feedback and confidence gating work so it has stable recurrence. I suspect this can work as memory?
>>108209745>>108209771Considering that ngxson who worked a lot on multimodality has been on HF's payroll for a long time I think it's reasonably likely that it will become more of a priority.
>>108209771>why make a model nobody can useAI companies could contribute by natively (post-)training their models in 4-bit at the very least, but I guess their main targets are datacenters with unlimited GPU resources.
>>108210122
Okay now try cifar
Then try Imagenet
>>108210289
The problem is how fucking hard it is to set the damn thing up: https://github.com/OpenSQZ/MiniCPM-V-CookBook/blob/main/demo/web_demo/WebRTC_Demo/README.md
Wait, they updated it, it's not mac-only anymore
>>108209492>Open models are still way behind and I can feel the biggest difference is the training data.True but if Claude can do it now that'll mean that open models will be able to do it in six months or so once they're done training the new generation on Claude 4.6 logs
>>108210296Exactly my plan. I think adding conv-net style learnable kernels would be very useful, they're probably required for dimensional reduction of most types of information anyways. Train/test acc growth is really odd with this system. I'm trying muon optimizer now with grayscale cifar-10.
I swear every time I pull ST something will break
>>108210338?
>>108210338just vibecode your own, anon
>>108208995
because every other vendor, amd and intel, only has 12-channel memory in the current gen and is thus limited to ~400GB/s of bandwidth. The mac studio with the m3 ultra is 2 m3 max chips, each of which has 12-channel memory, meaning the m3 ultra has double the bandwidth with 24-channel memory. There are motherboards that can house 2 intel or 2 amd processors, but that connection (pcie) is way worse than the fusing at silicon level which apple does.
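Rough math, assuming DDR5-4800 and 8 bytes per channel: 12 × 4800 MT/s × 8 B ≈ 460 GB/s theoretical, and doubling to 24 channels gives ≈ 920 GB/s, the same ballpark as the 800+ GB/s apple quotes for the m3 ultra.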
>>108210338Just use mikupad
>>108210338My repo is from last September, if it ain't broke..
The future is ASIC
Model to ASIC time is ~60 days
>>108210447
>>108210447That sounds like the ideal planned obsolescence device.
>>108210502
I'd buy one with Nemo on it. Won't be obsolete, ever
>>108210502
>nooo we must consoom newer model!!!
Your argument is self defeating.
>>108210447what's the size limit here? surely making one for a 1T model is going to be more complex than doing one for a shitty 8b
K2.5 or GLM5 for RP?
>>108210315Fucking hell, it works!
>>108210578I keep switching between the two. K2.5 is really smart and creative and the vision stuff is a bonus but it's kind of shit at pacing scenario-based cards. GLM5 is also really smart and is better at focusing at the story aspect but it's not quite as lively out of the box. I don't think there's a way around trying both and seeing which fits your cards better.
>>108210447
>Model to ASIC time is ~60 days
Maybe just for the ASIC definition. But actually producing a physical wafer would take more than a year at best. The process from sand to wafer is very time consuming. And I'm not even talking about mask production, which also would take months.
Given how fast AI is advancing, your ASIC will be obsolete before the first wafer is produced.
>>108210720
>it takes one year to grow crops but i need my dinner NOW!
Don't eat what you will plant, retard
>>108210720>>108210512
>>108210728StepFun is already better than Nemo. Step might be dumb for its size, but it's smarter than Nemo.
so it used to be that the local meta was GPUs, but now people use mac studios to memorymaxx? what changed
>>108210772
>people
lol
>>108210602It's either duplex with video or simplex voice-only, and simplex is no better than using stt. And 3090ti isn't enough for duplex with video, she chokes on words, but at least it runs at all this time
>>108210772plus-sized moe blobs
>>108210578Imo GLM is a better writer than Kimi, especially for dialogue, but Kimi is much easier to prompt and steer.GLM 5 will sometimes just go "eat shit we're doing things my way" and straight up ignore instructions that go against the way it believes RP should be paced. Zai might have deep fried it a bit as part of whatever RL they did for roleplay/writing.
Are there any models better than qwen vl 30b at bboxing?
Damn, MiniCPM-omni is so cute
>>108210512>>108210556Also no finetuning, no weight modification. You get the built-in model safety for the entire lifetime of the device.
>>108210907
>no finetuning
And? Only retards run copetunes
>A 128GB M4 Max Mac Studio costs around $2,500–3,500 and has no real PC equivalent at that price point for local LLM use. To match it on a PC you'd spend roughly $5,000–15,000 depending on how you build it. This is arguably Apple Silicon's biggest competitive advantage for local AI workloads right now — the unified memory architecture makes large RAM cheap and fast in a way discrete GPU setups simply can't match at the same price.Well? Where is it wrong?
>>108210898(you)
>>108210924
before the ram crisis you could build a system with 784gb of ram and ~400GB/s of bandwidth for like 6k. Now with the ram crisis there is no better deal than apple, sadly. Still won't buy it though.
>>108210924
The funny thing is that a year ago you could've easily built a pretty decent DDR4 Epyc server with 256GB of 8-channel DDR4 + a 3090 that runs pp and tg considerably faster, for less than that.
>>108210404It autopulls
>>108210447
I think the inability to update the model will be its downfall. If there were a scheme where you could burn in 80% of the model and add the other 20% as a flashable, tunable ROM, I think you'd have a more viable product. And then there's cost. But if the card were inexpensive enough there could be a market for it. What would someone pay for DS v3.2 burned onto a chip permanently, that responded that fast? When your alternatives are a $200K build or API access?
>>108210720
FPGA instead? IDK hardware obv.
>>108210907
>built-in model safety for the entire lifetime of the device
Pic related. From a corporate control standpoint I get it.
>>108210955>>108211036Stop reminding me
>>108202974I don't understand gooning to character chats. I use Koboldcpp to goon to crafted scenarios, not particularly talking with characters. Silly Tavern is completely lost on me.I want to be a dude raping supes in DC or a goblin fucking elves, I don't particularly care about talking to Albert Einstein. This sex chat is weird, it doesn't make sense, and it's weird how it's so fucking popular.
>>108211085
>I use Koboldcpp to goon to crafted scenarios, not particularly talking with characters. Silly Tavern is completely lost on me.
You're me a couple months ago but I migrated over to ST, you can have a narrator card if you really want to and it is more flexible in designing characters than kobold
>>108211085Same. I don't see how people prefer that to a free form chat where you can do things other than talk with a predefined set of characters with no external narrative.
Anything actually newish and good happen with vision models yet? Need that for ai video game mods and ideally want to keep to local
Reasoning and benchmaxxing kill OOD performance and I'm tired of pretending they don't.
Is there anyone else here trying to generate/co-write stories (erotica) with LLMs rather than doing ERP? Specifically, I'm using something like mikupad to have it extend a story from a given premise (using text completion rather than regular chat completion).I'm wondering if the model meta for this use case is different than for ERP. For example I found deepseek to be near useless for this kind of open-ended writing. Any recs?
is it possible to get anything atmospheric in terms of music? I've been trying but I can't get anything that doesn't have the structure of a track that belongs on a commercial album. I was thinking of something like this:
https://www.youtube.com/watch?v=rU9P7C0klfA
https://www.youtube.com/watch?v=otbI6SD8lpQ
https://www.youtube.com/watch?v=MjoRQHXd6tk
no matter how much detail I give it about the repetition and unique sounds, it still adds album-song structure or makes it sound like a stock sound for a stream countdown prelude with mostly piano or simple pads.
>>108211307please wait for a reply
>>108211321
>open-ended writing.
start with a brainstorming session and give it a plan, no model can handle open-ended well.
>>108211323surely you can mask parts out so they don't get edited like with image models?
How do you make MoE work for RP/stories? I'm an LLMlet so I don't get it. Trying with koboldcpp and it doesn't work for me at all out of the box. My three main issues:
- The agent seems extremely confused about context/world info stuff if I load a story from chub or wherever. I tried a story with multiple starting message alternatives for example; the model seems to think it has to choose between them and not make its own, which Nemo etc have no issue with.
- The agent spams the chat window, when I don't need to see all that shit. Honestly even if the rest is solved this would be a dealbreaker. I'm hoping this is just some toggle I missed.
- I ran the model in llama.cpp's default UI to get a benchmark and it was showing 15t/s for a generic chat, which seems OK given I don't exactly have a rig. But the agent takes ages to process, so it's not exactly quick anyway. I guess this one might just be due to initial context and it will improve if I actually get the story underway.
>>108211324Ask 4o for synthwave lyrics>Get something moody, topical, creative, etc.Ask gpt-5 for lyrics>NEON BARF , SYNTHWAVING INTO THE NEON MIDNIGHT... NEON
>>108211551>But the agent takes ages to processHow many tokens are you feeding into it at one time?
https://huggingface.co/Ex0bit/Kimi-K2.5-PRISM-REAP-530B-A32Bhow bad is it
>>108211589wait for Samsung REAM
are you guys seriously thinking you can compete with 300 b models?
>>108211639? some anons are running far bigger than 300Bs
>>108211589It is probably calculated with activations relevant for coding. Reap is pretty ironic when you think about faggot drummer aids be upon him. Doing REAP with ERP datasets might actually do something but instead we get cydoniav12_f snakeoil. I hope no one ever hires this piece of shit.
I tried GLM 4.7 Flash Q6 on koboldcpp after having been away for a while, and it was fucking terrible. I got quants made after the llama.cpp update that fixed them. Is Flash still not working right? Has anybody had any luck with this model?
>>108211589
>>108211598
>>108211688
>ex0bit
Patreon paywall scammer, please don't give him attention.
>>108211650
you mean some rich silicon valley bros reverse mortgaged their house to take out 200k to run 700B models
>>108211741
Try it on llama.cpp to be sure, but yeah, it doesn't seem to be great from the little I tried it.
>>108211688
Doesn't REAP require validation?
>>108211589
>no goofs
Do REAP models need dedicated support, or can you just quant them like normal?
>>108211753
No, some of us just have real jobs. Don't eat so many sour grapes, bro.
>>108211688
>activations relevant for coding
I looked at the dataset it claims to have used, and it looks as multipurpose as it gets, with different languages even. Even though I don't understand how it can work without terminally fucking up the model, I'd download a goof.
Does ST have a way to launch without auto-pulling from GitHub? Who even uses rolling releases still?
>>108211831
The normal start.bat doesn't update, does it?
>>108211831
In the .bat you use to start it, delete everything except
>node server.js
>>108211823
>multipurpose
Multipurpose isn't 100% SEX.
>>108211841
so all this?
>>108211583
Variable. As I said, I tried just opening a random chat in llama.cpp with like "Hi this is a model test" or some shit, and it took a good while for it to decide on an appropriate response. I did a couple of follow-ups with the same issue.
Then I set max output to 1200 to test in koboldcpp and fed it a 4k-context RP thing; the agent spammed out random reasoning shit for like two screens and threw out a single-sentence actual response at the end, which made me laugh a little.
I didn't really try to continue the RP past the initial context due to the other two issues, though, so as mentioned it might be OK past the first context parse.
what the fuck is this? trying new qwen on ikllama with --jinja
>>108211886
geg
>>108211886
ikbros...
>>108211860
yes
>>108211886
Seems like the jinja template is disagreeing with the request object you are sending.
>>108211904
should I update ST or what?
>>108211894
It actually needed the NODE_ENV line too, because it didn't launch without it, but ty.
https://github.com/ikawrakow/ik_llama.cpp/commit/cbf7fc7e2f7de4400dd848ff2c221a6c8ea0384f
>Do not use quantized models from Unsloth that have `_XL` in their name. These are likely to not work with `ik_llama.cpp`.
lol why?
>>108211910
First, try this in "Prompt Post-Processing": choose one of the strict ones, like system -> user -> assistant.
>>108211910
I think you just have to fix the order of your messages so that there's a single system message at the top, followed only by user and assistant messages in alternating turns.
There are some options in the API tab of ST, under "Prompt Post-Processing", to merge consecutive messages with the same role; I'd keep that on too.
>>108211930
>>108211910
Oh, you could also fuck around with the jinja template.
This HF space is the bee's knees for troubleshooting this kind of stuff :>
https://huggingface.co/spaces/Xenova/jinja-playground
Just copy the request object from the lcpp console and the jinja template, and see what the final formatted chat looks like. That should give you a better clue of what's wrong exactly.
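If you'd rather poke at it offline, here's a minimal Python sketch of what that playground does. The template here is a toy ChatML-style stand-in (not the actual template shipped in any GGUF), and normalize() is an assumed helper that mimics ST's merge-consecutive-roles option:

# Minimal sketch: render a chat the way the jinja playground does.
# TOY_TEMPLATE is a stand-in; paste the real template from the GGUF instead.
from jinja2 import Template

TOY_TEMPLATE = (
    "{% for m in messages %}"
    "<|im_start|>{{ m['role'] }}\n{{ m['content'] }}<|im_end|>\n"
    "{% endfor %}"
    "{% if add_generation_prompt %}<|im_start|>assistant\n{% endif %}"
)

def normalize(messages):
    # Merge consecutive same-role messages; strict templates reject them.
    out = []
    for m in messages:
        if out and out[-1]["role"] == m["role"]:
            out[-1]["content"] += "\n" + m["content"]
        else:
            out.append(dict(m))
    return out

msgs = normalize([
    {"role": "system", "content": "You are a narrator."},
    {"role": "user", "content": "Hi"},
    {"role": "user", "content": "this is a model test"},  # back-to-back user turns
])
print(Template(TOY_TEMPLATE).render(messages=msgs, add_generation_prompt=True))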
>comfy shits itself with mass errors over setting the output folder to another drive
>does it anyway
ok
>>108211886
Is tool calling enabled? Tool calling still needs to be reinvented for every new model, so if nobody bothered to do that and the option is enabled, it can cause this.
>>108211914
unsloth bro?
>>108211914
The llama.cpp codebase is pretty big and complex now. He chose a few portions to maintain and improve, but since he lacks the manpower, he leaves the rest of the codebase to rot, and a lot of it no longer works.
>>108211767
I don't know, but can you even quant that model? They list it as int4. What are you gonna quant it to, Q3? You'd only save a bit of space with that.
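Napkin math backs that up, assuming file size scales roughly linearly with bits per weight (the bpw figures below are ballpark, not exact):

# Rough size estimates for a 530B-parameter model; bpw numbers are approximate.
params = 530e9
for name, bpw in [("int4 (as released)", 4.0), ("Q4_K_M", 4.8), ("Q3_K_M", 3.9)]:
    print(f"{name}: ~{params * bpw / 8 / 1e9:.0f} GB")
# -> ~265 GB, ~318 GB, ~258 GB: going from int4 to Q3 saves almost nothing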
>>108211914
>Do not use quantized models from Unsloth
Could have just left it at this.
>>108211927
>>108211930
Thanks, this fixed it.
>>108211914
Use case for using ik_llama to run unslop quants?
>>108211948
Coding a working file system is hard, ok.
He's growing too powerful.
>>108211886 (me)
Ok, now it seems to continue the last swipe after each one. Not good.
>>108211753
>reverse mortgaged their house
Do you mean 'a second mortgage'?
>>108211753
All you needed was a couple grand and a silly, unfounded, paranoid-schizophrenic fear that prices might spike at some point between 2023 and summer 2025.
Bros... I can't finish writing a character card; I already coomed multiple times just during the process of putting it together.
Haven't generated a single token out of it. Maybe "just write the output yourself" was the solution to all of our LLM woes all along.
>>108212162
been there
>>108211815
Real jobs with 500k starting
>>108212162
>llm spits its reply out
>i notice a missed opportunity
>edit the dialogue just ever so slightly
>next turn it picks up on the hint
>it goes just like i want it to
>splooooooooooort!!!!
>>108211951
The only people running local models are pajeets and chinese, because they don't have the money.
>>108212353
But running local costs a lot more than using the API?
>>108212353
so true sister
>>108212353
They can't afford the luxury of experiencing NovelAI's SOTA GLM 4.6. I pity them.
>>108211914
wtf I love the schizo fork now?
>>108211976
I think it is for high-IQ chaotic neutral characters.
I pulled ikllama an hour ago; https://github.com/ikawrakow/ik_llama.cpp/pull/1295 claims to have fixed the broken caching, but it still happens?
>>108211793
>I don't know what they're doing
Bro, Tongyi Lab is a LAB. They're doing research. They release models when they're good enough AND there's no more internal research they can do/learn from, so they give it to the community to see what else randos can come up with. The reason they're not releasing Wan 2.5 is that it's too big to run locally (probably around 40B), so they're not gonna learn a lot from releasing it. They're probably pretty confident they know the Wan architecture better than anyone else at this point, and why lose the moat of understanding an architecture that's proven to be competitive with SOTA if you just scale data?
>>108211967
Lizard brain perceives shiny = valuable = want. For me, it's wet sparkly glitter, because it also hits that arts-and-crafts messiness that I like as well.
>>108210701
>it's over. we're stuck it 5 seconds wan vid gen forever
That's like saying you're stuck with weed forever. Yeah, you're never gonna be "high-school high" again, but don't pretend like you can't have a good time for the rest of your life with it if you let your tolerance reset once a week.
I'm excited for when we get heroin, though. I'll probably lose my job when heroin comes out, from genning illegal stuff at work.
>>108212186
>>108212197
Are you anons honest with yourselves that you're into toes because they look like tiny penises/clitties? Because I'm not. Claude pointed it out, and that's totally where the wire-crossing comes from, but I just pretend it's fully about the intimacy of seeing a private area instead of that being the secondary point of neuron activation for feet.
lol the jerkies blocked /ldg/ harder than /lmg/ so if they don't fix it I guess I'll just pretend this is /ldg/
>>108212353
If average API costs are like 20 USD a month (just pulling a number out of my ass), it would take you like 7 years just to pay for a single used 3090.
People who run local do it because it's an excuse to indulge in their PC-building hobby.
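Back-of-the-envelope on that break-even claim, with the card price as the big unknown (both prices below are assumptions, and electricity is ignored):

# Break-even time for a used 3090 vs. paying ~20 USD/month for an API.
api_per_month = 20.0
for price in (700, 1700):  # assumed used-3090 prices; the market varies a lot
    print(f"${price}: {price / api_per_month / 12:.1f} years to break even")
# -> $700: 2.9 years; $1700: 7.1 years (the "7 years" above implies the pricier end)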
Best model for Japanese? Been using Qwen 2.5, but it's censored and randomly shits out Chinese. Wanna practice with some Japanese characters I made in SillyTavern.
>>108212449
You are trusting a schizo responsible for the schizo fork, anon.
>>108212523
Gemma 4
Is hardware really that much of a bottleneck? I assumed the big models only needed that much power because they serve millions of users at the same time.
>>108212523
https://huggingface.co/KoboldAI/GPT-J-6B-Janeway
>>108212536
It is really not. You just need to download ollama and type in:
ollama run deepseek-r1:8b
And you can run deepseek.
I have no desire to do coom roleplay. Which UI should I use if I want an actual
>>108212536
A model like DeepSeek R1 (which is relatively old now) is nearly 700GB at "full quality".
You need to load all of that into memory, and the memory needs to be fast for the response not to be crazy slow.
And you want at least a GPU core to process the prompt/context before it starts generating the response.
So yeah, it gets expensive for the actually good stuff.
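Rough sizing behind that figure, assuming the weights dominate (KV cache and activations come on top); R1 is 671B parameters with FP8-native weights, which is where the "nearly 700GB" comes from:

# Weights-only footprint for DeepSeek R1 (671B params, FP8-native weights).
params = 671e9
for name, bytes_per_param in [("FP8 (native)", 1.0), ("Q4_K_M (~4.8 bpw)", 0.6)]:
    print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB")
# -> FP8: ~671 GB; even a ~4-bit quant still needs ~400 GB of memory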
>>108212353
>yes goy, using the API is real wealth. Your time is too valuable to bother hosting anything locally. Only god's cho- I mean, poorfags buy GPUs and CPUs and SSDs
>>108212577
>>108212577
>>108212577
>>108212523
Consider something from this list:
https://github.com/llm-jp/awesome-japanese-llm
YMMV intelligence-wise, but you have a smaller chance of getting English-worded Japanese replies.
>>108212225
Sour grapes, brah
>>108212030
hair cut status?
>>108212374
No