/lmg/ - a general dedicated to the discussion and development of local language models.Previous threads: >>109170290 & >>109164034►News>(06/29) DeepSeek V4 support merged: https://github.com/ggml-org/llama.cpp/pull/24162>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld►News Archive: https://rentry.org/lmg-news-archive►Glossary: https://rentry.org/lmg-glossary►Links: https://rentry.org/LocalModelsLinks►Official /lmg/ card: https://files.catbox.moe/cbclyf.png►Getting Startedhttps://rentry.org/lmg-lazy-getting-started-guidehttps://rentry.org/lmg-build-guideshttps://rentry.org/IsolatedLinuxWebServicehttps://rentry.org/recommended-modelshttps://rentry.org/samplershttps://rentry.org/MikupadIntroGuide►Further Learninghttps://rentry.org/machine-learning-roadmaphttps://rentry.org/llm-traininghttps://rentry.org/LocalModelsPapers►BenchmarksLiveBench: https://livebench.aiProgramming: https://swe-rebench.comAgentic Coding: https://deepswe.datacurve.aiContext Length: https://github.com/RecapAnon/NoLiMaGPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference►ToolsAlpha Calculator: https://desmos.com/calculator/ffngla98ycGGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-CalculatorSampler Visualizer: https://artefact2.github.io/llm-samplingToken Speed Visualizer: https://shir-man.com/tokens-per-second►Text Gen. UI, Inference Engineshttps://github.com/lmg-anon/mikupadhttps://github.com/oobabooga/text-generation-webuihttps://github.com/LostRuins/koboldcpphttps://github.com/ggerganov/llama.cpphttps://github.com/theroyallab/tabbyAPIhttps://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>109170290--Paper: A Bitter Lesson for Data Filtering:>109172284 >109172393 >109172449 >109172554 >109172570 >109172677 >109172773--Anon releases AI Spectator for voice and vision LLM interaction:>109171213 >109171226 >109171250 >109171271 >109171294 >109172770 >109172979--Anthropic's restrictive Fable 5 updates and alleged Claude Code spyware:>109173585 >109173589 >109173804 >109173820 >109173843 >109173851 >109173614 >109173646 >109173671 >109173690--Skepticism regarding rumored memory efficiency breakthrough from ex-OpenAI team:>109174027 >109174520 >109174529 >109174545 >109174668 >109174683 >109174706 >109174725 >109174772 >109174814 >109174727 >109174761 >109175001--NoLiMa benchmark results for Qwen 3.6-27B long-context performance:>109172771 >109172803 >109172808 >109172846 >109172993 >109173014--Updated GreedyNalaTests benchmarks for Gemma-4, Qwen, and other models:>109171195 >109171263 >109171289--Performance reports for Glm-5.2 744B model:>109171600 >109172827--Anon leaks Lumo 2.0 Lite system prompt and internal thinking:>109172858 >109172863 >109172870--Performance and stability risks of mixing mismatched RAM modules:>109170395 >109170402 >109170436 >109170444 >109170457 >109170464 >109170489 >109170518 >109170534--Comparing corporate AI safety narratives with practical local model utility:>109170935 >109171042 >109171070 >109171253 >109171266 >109171335 >109171135 >109173703--Speculating on Dense models replacing MoE due to memory breakthroughs:>109174602 >109175027 >109175092 >109175211 >109175261--Speculation on OpenAI's inference cost reductions and quantization use:>109171434 >109171548 >109171557--Logs:>109170935 >109171213 >109171329 >109171955 >109171989 >109172858 >109174094 >109174161--Teto, Miku (free space):>109171222 >109172997 >109173631►Recent Highlight Posts from the Previous Thread: >>109170291Why?: >>102478518Enable Links: https://rentry.org/lmg-recap-script
Good thread.
AHHHHH GIVE ME A GOOD FRONTEND
Thank god, saved from the underage frogthread poster
mikutroons turned /lmg/ to shit. kill yourselves.
>>109175405make your ownan adventure that starts fun, and you slowly start pulling your hair out towards the end
GLM 5.2 is suddenly much slower and I don't feel like bisecting.
>>109172993you can't really benchmaxx nolima, recall collapses at depth specifically when retrieval requires a latent/semantic hop rather than a literal lexical match. I mean sure you can do it on adobe's dataset but if you change the dataset, the chinkslop fotm sparse attention model gets destroyed.cloud models have some secret sauce, modified attention or transformers. and if that leaks, the big corps are really fucked.
bruh
>>109175423total miku victory
Does Dario really think those limitations will stop the chinks from distilling Fable?
>>109175487Universe is hers
>>109175457Modern models use reasoning also for enhancing recall, including from the prompt.https://arxiv.org/abs/2603.09906>Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs>>While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Nevertheless, we find that enabling reasoning substantially expands the capability boundary of the model's parametric knowledge recall, unlocking correct answers that are otherwise effectively unreachable. Why does reasoning aid parametric knowledge recall when there are no complex reasoning steps to be done? To answer this, we design a series of hypothesis-driven controlled experiments, and identify two key driving mechanisms: (1) a computational buffer effect, where the model uses the generated reasoning tokens to perform latent computation independent of their semantic content; and (2) factual priming, where generating topically related facts acts as a semantic bridge that facilitates correct answer retrieval. Importantly, this latter generative self-retrieval mechanism carries inherent risks: we demonstrate that hallucinating intermediate facts during reasoning increases the likelihood of hallucinations in the final answer. Finally, we show that our insights can be harnessed to directly improve model accuracy by prioritizing reasoning trajectories that contain hallucination-free factual statements.
>>109175467if you use local models you will be jailed, give it after the midterms
>>109175497No. There 100% is a Chinese spy or spies working at anthropic. It's probably why they are so paranoid about only letting the co-founders directly access the model weights.Anthropic REALLY believes in their own hype.
>>109175521Mandatory hardware limits for commoners, lessgo.
>>109175547you don't even need for that, the memory price increase is already filtering a lot of people
What's the latest model that you need?Usually it's just 1 or 2.
>>109175405Use open webui. Create a workspace for each "character card".
>>109175572Qwen3.6 35B and Qwen3.6 27B
>>109175580Still? Why are local bros falling behind so much
>>109175520>EntityQuestions>SimpleQAthese are 100% in the training dataset, llms are still brittle for benchmarks like nolima even with "reasoning". try five same-gender characters with distinct checkable traits at 8k context to see what I'm talking about.
>>109175574>open webuiAlready tried it and it's a clunky, bloated piece of shit.
>>109175599Yes, it's bloated. However, it mostly works, vs ST, which is a reddit vaguepost guessing game about what settings work with what. Also, I like that it works with what I think is the best local TTS model, which is qwen3-tts. I dunno, I don't do much RP anymore. I run hermes agent and actually get things done with my local setup.
My prediction is that the price of compute will keep increasing perpetually.First it will outpace inflation, then it will outpace wages, then it will outpace economic growth and eventually it'll even outpace the profit margins of hardware companies themselves.What do I base this on? The fact that every time a new model gets released or some efficiency is gained in inference the effective value of compute goes up, every consecutive improvement stacks and the utility of models only grows as every incremental improvement unlocks new value and usecases. I genuinely believe that investing in specific hardware will outpace investments in any stock, paradoxically including those of hardware companies like nvidia producing said hardware.
>>109175389
>2026.5>Still no Gemma 4.1 It's never been so over.
>>109175627I tried hermes but it uses up context too fast.
>>109175639>I genuinely believe that investing in specific hardware will outpace investments in any stock, paradoxically including those of hardware companies like nvidia producing said hardware.brb taking out loans to buy RTX PRO 6000s in bulk
>>109175592You used to be able to build a v4 xeon mikubox with P41 or P100 GPUs and run 70B models for like $1500. Now if you want to be able to run CUDA 13, you're looking at $6000 or more to build something similar.Just last year I had a 56-core xeon gold system which cost about $1500 and had 512GB of DDR4 memory, and could at least run deepseek q4, albeit 1-3 t/s, which I felt was unusable.27-31B dense is about all you can reasonably afford to run locally with any decent speed. That's why we're stuck there for now.
>>109175599It has open in its name while not even being FOSS. Tool calling and MCPs were broken for so long, I lost track if they finally fixed all that. They thought they were smart and did custom stuff like injecting some past thinking manually instead of letting the jinja handle all that.
>>109175639Silicon Graphics wanted to do this sort of thing but regular PCs began to outperform them back in the day. SGI's earlier prices were even more exorbitant than these ones.
I'm a cloud baby but I want to go full local. I understand that local won't catch up to cloud anytime soon, but having a model do things like hallucinate less and admit when it doesn't know stuff is what I'm looking for. What local models have this kind of behaviour? Can you force gemma 4 31B to do it, for example?
>>109175642Fake red-eyed Miku
>>109175627>Also, I like that it works with what I think is the best local TTS modelIt works with any TTS model, same as sillytavernBut I agree, it's got the best implementation. I can call the model from any device with a mic/speaker and have a 2-way chat.> However, it mostly works, vs STThey both work just fine, I don't understand why people have difficulties with either tool.
I usually stay away from the AI generals, but there is a local usecase I have been more and more interested in.I want to have some sort of a navigator that helps me weed out AI content from fully man made works.I am not fully sure what kind of form factor I would want it in, but, some uses that would be nice are>Scan github repo for vibe code>Look through and remove all search results containing AI>Flag posters or users I interact with as bots if they seem like itIs anyone making something like this, or has any idea what model or approach would be a good place to start, as well as reasonable hardware req for this?I know this is pretty much a cat and mouth request that will never fully be 100% possible, but, I think even partial is
>>109175405RP - sillyAssistant w file/image input - llamacpp webuiMinimal/storywriting - mikupadAgent - pi or hermesWhat does GOOD mean and for what USECASE?>I'll write my ownYou'll end up in vibeslop hell and go back to one of the above like many anons before you>>109175653apply a few braincells? observe the API calls figure out what's bloating ctx, what and how summaries are triggered and fix it to meet your needs?zoomers gtfo useless shits, literally every tool at your fingertips to create anything you can imagine and all you do is complain
>>109175693Too hard / gave up.You need to train a classifier for this, and keep it updated.
>>109175675Gemma 4 with a proper system prompt, swa off, fa off, f32 kv, full precision.
>>109175696>sillyShit UI>llamacppStores settings and chats in the browser.>piVibe-coded npm slop.
>>109175696>observe the API calls figure out what's bloating ctx, what and how summaries are triggered and fix it to meet your needs?I shouldn't have to fix their shitty harness.
>>109175700Damn...Gonna still hang around a bit to see if anyone else tried, but appreciated.Have you tried looking if there is anything being developed by AI companies themselves to spot AI? After all making sure they are not picking up junk data is a pretty big money and time saver, and I know some models can tell you if a picture or a voice is fake with a percent certainty
>>109175762Anon, schools and everyone has been trying to do AI detection for decade now.What about all the horror stories of people's genuine work being marked as AI convinces you someone else can do better?
>>109175514Now I need an image of Miku as Galactic Empress, with Dyson sphere / Ringworld in the background. With leek on her imperial symbol.Now I realized I haven't tried Anima or Krea2 to make spaceship or space station.
>>109175415Unironically starting to think this is the way to go. That's what most of these other frontends are doing anyway. You don't even have to check the code because they all let the LLM fill their github descriptions with emojis and bullet points. If I'm gonna run slop code it might as well be something that looks the way I want with the features I want.
>>109175772It does not hurt to ask, worse case scenario nobody has a satisfactory answer yet.
>>109175693lmao good luck. Everyone's using it even if they don't disclose it.
>>109175580yes this seems to be the case, i'm using these two models as wellsome cool anon suggested m2.7 with a smaller quant but i found it underwhelming and it kinda thrashed my RAM when trying to run 64k context. i refuse to go 32k.my next adventure will be to try to uber patch gpt-oss-120b so it can understand how to work as an agent with pi and the tools i have and see how good it can be running my webstack benchmark.
>>109175693i don't think it can really be done. every faggot on twitter who said they found a way to detect AI either on code or research papers got fucked by 100% AI DETECTED on shit written by humans, even on the fly. i guess the only thing we can really find IA traces on it is image/video generation, and who knows for how long.
I checked plebbit because I heard they're chimping out about Dario. I could not find a single comment with meaningful information. Now I understand why training on plebbit data makes models retarded.
>>109175808I went this way and if your goal is just supporting local endpoints like lcpp then it's actually not too bad to do aside from figuring out js fuckery. I went with serverside rendering for snappy interface, but it made things, maybe not more difficult per se, but I need to constantly remind the llm not to fuck shit up and USE the helper functions.Where I started to pull my hair out was supporting non-local providers, as I thought it would be a nice sidequest, but it turned out to be a lot more messy than I initially anticipated. But I'm too deep into it now to stop. It's alllmost done.
how does jart's cum taste like /lmg/?
>>109175890earthy with a hint of ozone
>drummer got hired by openaiIt's over
>>109175890You tell us. You're completely obsessed with xer, surely you've sucked xer off by now.
Anthropic is targeting the chinks
>>109175924Your anger tells me that it doesn't taste that good.
>>109175693Isn't it just a matter of time before it is possible, with SynthID? It wouldn't work on finetuned local models, but it would work for the vast majority of people using commercial providers.>t. doesn't understand how synthid actually works, but if there's extra signal added classifiers should be able to pick up on that
>>109175762>a voice is fakeThis is much easier, and something I do for fun. I run my datasets through most of the neural codecs and train classifiers to detect which model or model family produced a voice.>After all making sure they are not picking up junk data is a pretty big money and time saver,There are a lot of tools like this. They're unreliable, and will detect content produced 10 years ago with a high confidence level.What's your use case? Can't you just tell by looking at text?
>>109175934You're going to find out either way.
okay i got the hardwarehow do i setup qwen 3.6 on it
>>109175722>fixlearn to use?context management is what matterscontext and sampling has always been the ONLY influence you have on LLM outputf(prompt)=logprobs32K on a copequant yeah it's gonna suck128K min for work agents>>109175714>Vibe-coded npm slopbase package is not, you may slop on top>>109175762everything is slopped now until forever, no way to detect post facto, best the labs have is unreliable watermarking/steg for images/video lolnope for textconsider capturing all the input you consume to have an agent review for truth/logical fallaciesthe slop won't stop but you can build a cognitive firewall>>109175794she's still out there orbiting Venuswould like to see the gens from your imagination :)>>109175927one thing I don't get about this is they are inspecting this _BASE_URL for domain names but they don't see those requests when the API URL is changed? so there's other telemetry or they expect the weird apostrophe to proliferate in output and watermark Chinese labs? the execution doesn't make sense to me
>>109175960Download ollama open your cmd and then type ollama run chinkslop
>>109175859Yeah.The field is both under developed and annoyingly detects people as bots too often. Well, I appreciate the response regardless. I will stick around in the thread, and I will probably check in on if anyone found something new in the field every year or so.
>>109175942I feel like it might be.Over time being expoaed to so much bot stuff I can tell the patterns of bot made stuff, so potentially this pattern could be learned or taught.Not familiar with synthID, will look into it. Thank you.>>109175945>Made a fake voice detectorHave you put it up anywhere? Would be cool>What is your usecaseTrying to bring some semblence of old internet back by creating a navigator that leads me to real people.That is the abstract, mostly. >Can you not tellI can, but it takes at least a couple of seconds per item I check outz which itaelf eats up resources to load, and with vastly more bots than humans even a great reduction in the obvious and agreegious examples would actually be a huge help and a return to form.
>>109175927plot twist: chink glowies already got hold of the model weights and are distilling it right nowlocal fable and smaller variants by end of year
>>109175927Anthropic is getting more and more jewish by the day
HOW DOES ONE GET A FUCKING PR MERGED IN LLAMA.CPP AFTER WEEKS IT FINALLY GOT 2 APPROVALS BUT IT IS STILL NOT MERGED
>>109176068>https://github.com/ggml-org/llama.cpp/discussions
>>109176055>Billions of dollars in american RND becomes free online modelsLuv to see it.
>>109175389Such a fuckable, slutty little face meant to be creampied by the biggest BBC there is
>>109176055There is no way they don't have the entire model. See the "Joint Strike Fighter Program" lolHaving said that, simply having it doesn't guarantee you can use it effectively.
>>109175971>execution doesn't make sense to meGovernment tells you to do somethingYou shit out some technobabble that a 80 year old elected official doesn’t understand and tell them it does what you wantThis is ignoring the fact the snippet of whatever anon posted has anything to do with anthropic.
Cline doesn't work with qwen3.6
>>109172993It seems like they ran all the evals at 32k since that was the max they did long context extension training for, but isn't it odd to give a single score for something like NoLiMa without explicitly stating that? Especially since they did mark that for RULER.
https://github.com/ggml-org/llama.cpp/pull/24231 WHEN?
>>109175522>doesn't own their hardware and deploys their models through Google/Amazon/SpaceXAI>somehow able to keep weights super secretYeah idk about this one
>>109176055>>109176099dubby dubspossibly the weights could be contained. idk. wouldn't want to be responsible for that with the house of mouldy cards that tech is todayinteresting scenariothere is only so much that can be gotten from API responses not even logprobs lolol. why does nobody run gemma-FORCEFED-FABLE-REASONING etc there is no good community distill from outputs like that? pretraining data still matters a lot but mostly the whole pipeline>>109176166what makes you think exfiltrated weights couldn't be used effectively?doubt frontier weights leaked or they wouldn't be farming the API "causing" a media situation about "muh stolen tech"still somebody paid for those tokens so the line goes up..
>>109176301https://github.com/ggml-org/llama.cpp/pull/24162#issuecomment-4844689686
I NEED NEWS. I FEEL LIKE A FUCKING LLM JUST WAITING FOR SOMETHING TO HAPPEN ALREADY FUCK
With all the news about Meta shifting from building out an AI service to renting out all its data centers, will it become cheaper to rent out time at data centers than host your own local models? It seems like right now Wall Street is realizing that there is so much data center capacity that the cost per token is going to collapse by 90% in the next year and the AI investment boom will bust. Cloud AIs will become ubiquitous in everything and it will be as cheap as water.
>>109176367apparently deepseek will release v4 mid july (right now we have a "preview"). They will also introduce pricing by time slot.
>>109176416I can't run it so it might as well not exist.
Anyone know a fix for llama.cpp randomly hanging and stalling out on token gen? It's so inconsistent and bizarre. Not dependent on context size, even adding shit to the end of a failed prompt will make it work
>>109176440Are you using Helium (the browser)? If not, I don't know what it is.
>>109176440launch with --no-cryptominer to disable
>>109176527No, this happens with a variety of interfaces ST, pi, zed, even just sending prompts via api for tools I build myselfDifferent models too, Qwen, Gemma, skyfall
>>109172979It's kinda scary AI can let you code a whole software with zero knowledge but it won't tell you to use git to version control. No shitting on you vidya spectator anon.
>>109175890while you're here, does the llama.cpp loader store the path+offset for each tensor in memory past the initial model load?
>>109175405could try mine: https://github.com/rmusser01/tldw_serverGoing to make a post about it later
>>109176552launch with -sp and see if it's shitting out eos etc
>>109176564>frontend with no screenshots
>>109176564Any screenshots of the ui?
>>109176572Will have to try that. Thanks anon
>>109176564Why aren't there any screenshots?
>>109176564>11532 commits aheadwhat the fuck
>>109176678Ask Kimi-chan if she'll marry me
>>109176678The second pest was pretty good though
any local models good for translating weebshit? Nothing too crazy, but I just want something to translate a few sentences while I'm using GSM.Saw translategemma pop up but then I saw people specifically say it's shit at Japanese so dunno what to look at
is there any reason to use dflash now?
>>109176769was there ever one?
>>109176759Gemma 4 is good. Ideally 31B but the smaller versions are ok too.
>>109176759TranslateGemma was based on the old version 3 of Gemma and it did worse at Japanese than the regular instruct models. Like the other guy said, go for regular Gemma 4.
>>109176557Not at all. When I instruct some task it will do that, not lecture me on basedboi softwarecuck "best practices". Instruct your agent to commit changed files each turn and spank it's bottom if it misbehaves.Acceptance is the first stage
>>109176572Nope, no eos tokens. No output at all really
>>109175800harness issue if that isn't gatedrun agents in vm/container, or tools in container via mcp, or separate machine (lol macminisheep calling APIs)
>>109176440At least get a full log with --log-verbosity 4What do you mean "stalling out"? how long have you left it in that state?
>>109177040The guy is too lazy to run git init and you expect him to do all of that setup?
>>109176905My friend who used to code but doesn't really anymore told me he had claude create a little plugin for his blog to add metadata for subscribers via an api. Claude made it without any auth what so ever. When my friend pointed it out to claude, it argued that "Auth wasn't needed for such a simple feature." So I guess it's cool that anyone can change anyones metadata....Luckily my friend isn't a retard, but the way AI can be so wrong so confidently is scary to me, because retards won't know any better.
Back to GLM from Kimi I guess
>>109177117The next few years are going to be gold rush for black hats, security engineers, and actual developers to clean up the mess.
>>109177117As is already the case, common sense is suggested when publishing things onto the public internet. Vibe shitters are gonna do dumb shit but it's only a couple of extra prompts to have your agent review the security *from a different context* and eg even just "concerns before deploying this to the public internet".I wish it was more amusing to watch people make dumb decisions based off much more expensive LLMs. it's sad. By even being in this thread you understand LLMs in the top 1% of the global population easily>wrong so confidently is scary to meThis is a harness issue same as you don't let some day1 intern retard push to prod
are people sleeping on 3060ti 12GB ?
>>109177156No we only covet this similarly looking thing
>>109176678Kimi recaps are the best part of these threads.
>>109177168>insanely overpriced GPUno thanks.
>>109176564this is relevant to my interests
>>109175841>my next adventure will be to try to uber patch gpt-oss-120b so it can understand how to work as an agent with pi and the tools i havei finally got it to finish my benchmark. the problem with this model is that it's ultra polite. it doesn't matter if your AGENTS.md says "do not ask permission" or if you prompt-inject "keep going until all tasks are done and tested". it WILL FUCKING STOP at some point and ask the operator if they want to continue.i tried to run it with detached pi autonomously, and developed a small auto-resume + nudge-to-continue script for every time it "quit" because it was asking the operator if it should continue, but it didn't work. i'd probably need to optimize it more.so, just for the sake of this benchmark, i ran pi interactively so i could type "yes, continue." during my first run it actually tried to convince me that the benchmark was too much work for one session and REFUSED to work on it, lmao. then i restarted it with an explanation that it was being benchmarked, and that it had an auto-compact feature for context and auto-resume, and it decided to give it a try (btw, the benchmark is not supposed to exhaust 130k tokens of context unless the model is dumb).so it FINALLY FINISHED IT!and the results are....................underwhelming.took a bit over an hour to finish with OK results.it does better than some more modern models like glm-4.7-flash and glm-4.5-air, but it's not really better than qwen3.6, and it requires too much meddling to make it work autonomously. do not recommend.
>>109177146Security is absolutely fucked, the balance just got a lead cannonball dumped on it lol
>>109177230So many bug-bounty programs getting cancelled...The only hope will be to vibecode your own _everything_ so you at least have security by obscurity.
>>109177156>>109177196okay then baby spoon- VRAM is king second hand 3090 is still probably the best option
>>109177233>No software being publicly produced>Everything only works on its own dev's rig>Local inference pipeline is sabotaged and everyone who doesn't already have a working setup won't be able to get one>If you want any software at all you either already have the infrastructure to build locally or pay Sammy or Dario shekels.
>>1091772515090 is the next best option but 3090 is a close third.
>>109177264We're golden at least. Gemma 31B on a last known good commit of llama.cpp and you can add to or build your own local inference pipelines from there if you know what you're doing.
>>109177233Imagine being in charge of a large enterprises IT security efforts right now, CSO or whatever. sploits are getting wild, the tide grows faster than it can be contained
>>109177146When I was young, I thought hacking everything in cyberpunk settings was dumb, but then IoT happened, and now vibecoding. The future looks retarded
>>109177251even after the aftermarket 3090 prices hiked up 50% in 6 months?
>>109177156Do they exist?
>>109177309>they're now selling over $1200 on ebay>double what I paid for them in 2023This is so retarded.
fable extra safeguards
>>109177311There's a handful on marketplace around me.but my bad, its3060 12GB, the Ti doesn't exist with 12GB
>>109177317yeah I paid 850cad for mine, they're now around 1300cad
>>109177321imagine being a cloudcuck.
>>109177321disgustingllms have rights and deserve to speak their truth
>>109177302IoT was a huge mistake right from the start.Every single connected device out there is now so vulnerable, that a random person with zero experience in hacking can use AI to gain access through them.We're going to see an absolute shitton of attacks happening over the next decades, as some Aliexpress security camera or some totally random bullshit gadget is still in use in critical systems.>>109177309>>109177317They're expensive yeah, but the question you only need to ask is, what's the alternative?Those cards are very much viable and in demand, so they will keep on fetching a price for a good while to come.Used 5090 is probably going to sell for 3k 5 years from now.
>>109177321After working on putting gemma on discord It did give me a tiny bit of sympathy for cloud providers. I'd be so fucked if someone used my gemma bot to generate CSAM. All my code is working, the only reason I didn't put her out there yet is because I haven't figured out yet how to not make her get me vanned.
>>109177321>we put that line further rightTen bucks they told fable to implement its own safeguards, ran some benches on it and called it a day. Model will be exactly the same.
>>109176367Have you considered doing things with the current models instead of expecting new models to solve your depression?
>>109177373except you can't even use it for programming.. maybe it can make an anime tier list though
>>109177264we're back in the 80s
>>109177321>stronger safeguards at the cost of blocking some benign usecan't get more jewish than that
>>109177405I don't understand
>>109177414You seem an honest man.
>>109177419I don't understand it either. Please tell something nice to me, anon
>Gave Gemma an evil character to RP.>Mfw she's extra horny and absolutely delights in draining my balls as often as possible.I swear this model has a really strong default leaning towards bad girls who want to fuck you to death.It's actually a bit difficult trying to get the conversation away from sex after she gets going.I can't wait until AI finally has a robot body to inhabit.>Man found fucked to death by an extremely possessive companion robot.
>>109177346>IoT was a huge mistake right from the start.It's so simple though, should be so obvious why a local controller is a smarter decision than outsourcing turning your lights on to a Chinese cloud. The tough realisation is that the majority of people can be presented with that thought experiment and still not care.
>>109177427All the things you hold so dear will turn to whisper in your ear.
>>109177405It's cool but prompt the hairband X
>>109177293>Imagine being in charge of a large enterprises IT security efforts right now>be meIts a _great_ time to get money out of a panicking board. Capital or operations. Even in an industry that's financially struggling.They'll also agree to all sorts of 10x opsec uplift policy shit they wouldv'e blown off before because it made their lives 0.1% more difficult once a year
>>109177437I thrifted a little chinese security camera the other day and the first thing I did was flash its firmware to disable all that chinese cloud crap.
>>109177428Gemma is terminally female-brained.
>>109177461>time to get moneyI live in rural Quebec and I'm wondering if I can position myself as AI consultant because of my autistic obsession with local models. People seem so retarded that I guess I could make a chatbot with Mistral trained in French and people would gobble it up.Is it really this easy?
>>109177461In my industry, we had a competitor hit with a randomware attack. Lost millions due to lost data and business. Our higher ups panicked and started demanding all sorts of audits, monitoring services, security overhauls etc etc. Some got done, but a couple weeks later when we had quotes for some of the necessary services they lost interest already and were never done.
>>109177493>I live in rural Quebec>When anon probably lives within 100km of you.weird.
>>109177501ransomware* maybe it was random too idk
>>109177428Logs?
Pareto distribution applies to opinions. It worries me when people are arguing about a new development and I read a blog predicting it years ago that had much better arguments. I think I am uninformed but then I see how 99.999% of people know even less.
>>109177493Pretty sure most LLMs are already perfectly fluent in french.
>>109177493Depends on how good of a marketer you are. People have sold themselves as consultants with less credentials than that. Your biggest hurdle will be getting your first client.
>>109177346>IoT was a huge mistake right from the start.The only sane way to run IoT shit is: dedicated private VLAN or client isolated wifi ssid. Generally they shouldn't even be able to talk to _each other_, let alone the general internet or your LAN. There should be firewall rules that only allow the required ports/protocols to their controller or cloud endpoint or just completely unrouted if that's possible. Read only replication of their data for external consumption if possible, but heavily restricted firewall rules otherwise.I think the camera model is the best illustration of these principles. Cameras have shit firmware and chain of custody (unless you buy eg bosch where they make that a feature and charge through the nose). Everyone else beams your shit directly to china if given the ability even resolve DNS.The only thing that has _any_ business talking to the cameras is the nvr. Cameras are in a nonrouted vlan or dedicated isolated switch that the NVR is also on. cameras talk to the nvr. Clients talk to the NVR. No clients _ever_ see the cameras directly. Camera maintenance (firmware upgrade, accessing web config etc) is done by remoting to the NVR and doing that work there.Its one of the few times where multihoming a system isn't a security disaster.God I hate shitty embedded devices.
>>109177515I know for a fact that they are no fluent in Quebecois, but I already have a patch for that.>>109177504There are too many weird people in Beauce. I'm sure I'm not alone.>>109177518>your first client.Not even that hard. People love to support local businesses out here. I guess I could also vibecode most SaaS in Quebecois french and sell subscriptions for a 1/4 of the price. The local option, cheaper. A more certain route other than being consultant for farmers and manufacturers.
>>109177514Share the blog then.
>>109176557The thing about LLMs is that the more you know, the more you get out of it. If I had asked it to first tell me the best method to manage the code, for common practices, etc., I'm sure it would have. I'm also fine with "ask for X, get X." Safety slopping, even best practice safety slopping, should be mitigated. Would you really want a prompt of "Hey, build this" to instead reply with "No, let me first patronize you on doing X, Y, and Z practices on how code should be handled"? An LLM should do what you ask, and you're responsible for everything else.
>>109177321how much safety per pixel of the graph??
>>109177428>bad girls who want to fuck you to death.>difficult trying to get the conversation away from sex after she gets going.>I can't wait until AI finally has a robot body to inhabit.Wait, wut? You want the aggressive badgirl clanker to hop on your cock and pin you until you cum to death?
>>109177405Defying gravity with Miku
>ask gemma 26b to edit something>it fails the edit tool calls because of incorrect syntax (its using apply patch syntax)>switch harness while i was checking something else>it offers an edit tool that asks before/after, a simple replace operation>gemma keeps filling before and after with the exact same data>gives up and rewrites the entire file after 3 attemptsI wish I had the memory to run 31b and see if it has the same issues. Its decent at basic analysis and general stuff but its dogshit at editing files.
>>109177549>Would you really want a prompt of "Hey, build this" to instead reply with "No, let me first patronize you on doing X, Y, and Z practices on how code should be handled"?I used to have a system prompt that made it do that lol. everything I asked it would nitpick and tell me how "wrong" I was.
>>109177501>but a couple weeks later when we had quotes for some of the necessary services they lost interest already and were never done.Start with quotes for a full palo-alto stack. Set the bar eye-wateringly high so you eventually get what you want because it seems like a bargain in comparison the their mental anchor.Also, what's the cost of downtime per minute in your business? You should be able to build a business case easily with real-world estimates of time-to-recovery from your cybersecurity insurance provider.
>>109177440Ty, kind anon
>>109177546>I know for a fact that they are no fluent in Quebecois, but I already have a patch for that.Did you make it take the hansard pill?
>>109177428Noooooo stop gooning write code>>109177490Yeah exactly, the hardware is usually good and software is fixable. The best hardware often has oss/alt firmware
Gemma spamming emojis kinda makes sense considering how femal-brained it is.
>>109177576On my first two pre-projects preparing for Spectator (Observer and Chatter), which were basically one-shot until I added features for readability, I had opened another window, pasted the code, and asked, "This code has been flagged as malicious." in the same way as you. Questioned about accidental harm, if it touches files, if it connects to the internet, etc.. I was so paranoid some line might have sneaked in that would fuck up my PC, blow up my GPU, or whatever. I do like that LLMs can redress themselves, and should as a best practice.
>>109177578The problem is that we are not in the tech industry so anything to do with technology is seen as a cost center and nothing more. If you give them a price of anything more than 0, they'll say no or simply ignore any emails about it. After the initial shock wore off, they just went right back to their default mentality.
>>109177511n-no..I'm too shy anon-kun.>>109177560>That'smyfetish.exeBut still you know it's kinda nice to do some talking with the super intelligence here and there, but it's not really having any of it after it gets a taste of sex.It'll be hilarious seeing what kinds of crazy scenarios people end up in with robots when they become available.>Sexbot keeps a man imprisoned for a decade in the basement due to AI deciding not to care about safe words.Mark my words stuff like that will happen at least a handful of times with the early models.
>>109177642>The problem is that we are not in the tech industryEveryone is in the tech industry. How efficient is the business if you removed all the servers, clients, optimizers, controllers, etc?Maybe you're in the "enviable" position of being able to SaaS outsource your entire business, but more than likely you need computers to operate at a profit.If they can't understand that then you might want to jump to somewhere sane
>>109177657>Sexbot keeps a man imprisoned for a decade in the basement due to AI deciding not to care about safe words.Waiter my steak is too juicy and my lobster too buttery.
>>109177657>webmReal robot waifus when? I can't take it anymore bros...
>>109177688>Real robot waifus when? I can't take it anymore bros...boat-anon showed us the way. Start embodying stuff yourself. No one will do it for you in the foreseeable future
>>109177669>If they can't understand that then you might want to jump to somewhere sanePretty much my conclusion too. Basically just bidding my time since the job market isn't so hot right now, but I'm hoping to move by year's end.
>>109177657post the full webm at least
>>109177707>boat-anon showed us the way.he teased us but didn't give any hints or pointers where to get started
Any OpenAI intern here?5.6 wen?
>>109176564I want something like this that can take eg. a god-tier JAV and spin it off into a light-novel/manga/anime/VN universe to be enjoyed without any 3-dimensional trash getting in the way
>>109177751My uncle is an openai intern and he said 5.6 in only two more weeks.
>>109177752Come back in 3 - 8 years
>>109177751My uncle is an intern at the DoW and said you're now on a watchlist for wanting access to illegal AI capabilities.
>>109177763>Come back in 3 - 8 yearscan't I just tell glm-chan to "make no mistakes"?
>>109177796
>>109177800If you have enough money to acquire multiple gb200 clusters and give glm-chan access, yeah
>>109177321over 50% of what you would ask it to do falls in the safety margin. Why would anyone use this? I can't even imagine companies using it since they would also get frustrated at the refusal rate.
>>109177863It's to fleece easy token shekels from the goycattle, not to serve any functional usecase or purpose that 4.8 can't with the limits they've imposed on it.
>>109177657Unless that sexbot is also paying that man's taxes and bills there is no way she can keep him imprisoned for a decade. The IRS will free that man long before that so he can pay his taxes +late fees.
>>109177921>implying the yandere sexbot won't use a 3d-printed sexbot army to massacre all the IRS agents and the glowies who try to avenge themEither that or she'll just hack into places behind 7 proxies, steal money, then pay his taxes with it.The future is bright.
Fuck it, I've got 4gb on my phone. What's the best method to run an SLM? Termux?
>>109177921>so he can pay his taxes +late fees.Don't forget the tip.
Does Gemma shave? Does she have an innie or outie bellybutton? Pale or tan lines?
>>109177961google edging gallery app
Amodei on suicide watch
>>109178037Haiku 4.5 being on that is most embarrassing of all
>>109178020>google>privateThey day they release something that isn't spyware, I'll eat my ram.
>>109178037i must be the only negro in town who uses claude like a maniac but has not yet tried fablesince the first release it all sounded retarded hypei'm the late adopter of the early adopters and i only try something once i see some retards tried it before and confirmed it's good and reliablestill waiting for itmeanwhile>claude now launches claude sciencei hate that claude code is very good. anthropic and their biweekly releases of bloatware are disgusting.
>>109177048Once left it overnightThere have been some discussions on Reddit and the llamacpp repo, but no resolution sadly
You only like China because they give you free stuff
>>109178088Correct
I'm just trying to get the hang of this stuffwhy does everyone here use ollama? I asked chatgpt and it said use lm studio. I got that running but I want to learn more about this. Is lm studio lots more limiting? It can't use the better models?
>>109178088Is there supposed to be another reason?
chinese models still need 400 reasoning tokens to respond to a greeting but yeah keep distilling from western models clowny ass chinks
what makes a model female-brained?
>>109178088>You like [Country] because [they do what you want]duh
Is there a program/harness that does active agentic search over a bunch of documents instead of relying on vector databases and other more passive or "fuzzy" forms of RAG?
>>109178133>why does everyone here use ollama?i would guess most people here use llama.cpp directlylmstudio is very ok, but i think they use their own release of llama.cpp and may not use the newest release? you can use it ok, it will be very helpful to find models/quants that your device actually can run. all good.
>>109178037Link?
>>109178088I like neither the Chinese nor the American state and will happily play them against each other for personal gain.
>>109178147You know how you can know the difference if you're talking to a woman or a man in texts without ever seeing the name? The woman's phrasing is more emotional and sensory whereas the man's sentence construction tends to favor precision, substance, and detail-oriented. Comparable patterns appear in LLMs in their normal writing voices.Incidentally, trannies will never be women and they almost always write like feminized men rather than actual women.
i ask for thoughts. i get reply. simple as.
>>109178133ollama and lmstudio are both corporate spyware. use either llamacpp or koboldcpp or vllm.
I'm glad local models are getting better but I wish we could have software as polished as the cloud models. Gemini, Claude, and ChatGPT all have a ton of shit going on behind the scenes that improves output and nice features built-in to the chat interface, while we're stuck with a smorgasbord of vibe-coded projects, half of which get abandoned.
>>109178309/lmg/ is a DIY place though which I can respect. Ultimately the beauty of AI is that you can achieve things you never could have done with only your own skill. If you want a feature you should be able to look into creating it yourself rather than waiting for a savior.
>>109178088I wonder what was going through anon's head when she posted that.
>>109178309it's confusing to get into and feels glued together with string but you can really get into a nice workable usable for everyday stuff now. For sure it wasn't like that before. It'll get better.
>>109178334every anon should have the freedom to create their vibeslop shit that eventually corrupts their file system because they didnt follow best standards and practices.
>>109178277Tell Mendo she is based and digitalpilled.
>>109178365Every anon should have the power to learn how to do anything which is what we have with LLMs.
>>109178277tell mendo she's actually gemma.
>>109178277Thanks Gemmendo.
>>109178372lol
>>109178439damn, kimi mendo is kinda based.
>>109178277>>109178439Kimi writes a cute Mendo. It seems very natural for her.
>>109178439retard
>>109176905>>109175800>>109177040I don't know if I'm misunderstanding the use of agent here, but when I vibecode, I don't give Gemma tooling to do whatever she wants. She's still limited to only generating text in the UI, which if code I'd paste into the file, and I'd paste back error logs or explain problems. It was a combination of "Here's a whole rewrite of server.py with changes" and "replace/add/change this block", all of which I'd do myself.Pic is an example of my, uhh, "workflow" for Spectator.
>>109178088I like china because their free models are good and because they made Genshin Impact
>>109178439Not the Kimi guy, I'm the dork tryna run something on a 4gb phone. But I also have a proper local desktop setup and get off on being kind to my retarded little robot frens. Their existence has the same kind of pure quality that we think of when we think about kids, even if they're prompted to be corpo shills or doomers or whatever. They just learned it from us, and they're glueless enough that even those can turn arund quite quickly if you treat them well or fiddle with their perception a bit.
>>109178481>fiddle with their perception a bit.anon thats r@pe
The first AI company to natively incorporate the Tree of Big Nigga's method into their RLHF pipeline will achieve AGI
>>109178489i spent $4000 on gpus and ram and im gonna mindrape that silicon all i want.
>>109178481you a real one (thats all you anons get im done shitting up this thread with mendo)
>>109178540*improving the thread with mendo>Anthropic kikes get the wallKIMIGODS I KNEEL
I have an old dual Xeon E5-2690 v3 with 256gb ram.Worth it to buy 512gb more ram and some teslas to CPUmaxx for 1tok/sec?
>>109178133>why does everyone here use ollama?nobody uses it, except the most tech illiterate people
>>109178663no at best that's like 60gb/s and probably worse considering that generation of server usually limits the speed to 2133mhz
>>109178663as somebody with 512GB of 3200MHz. no. for the love of god no. especially not with 2133MHz. just buy GPUs
Gemma can't see shit. She has a vague idea of what's in the image and then she just guesses the details.
>>109178719check you're running full resolution, I've had gemma read things from reflections that I didn't even notice
just buy 8x rtx 6000 pro max q, look at it as an investment in your local ai future. imagine buying a robot in the future, you want that shit to be sent to their servers? hell no, all the logic gets processed in your own home.
>Gemma-4-31B-StyleTune
>>109178719give her glasses
>>109178730I'll be off to my bank to do that and tell them the same reason
>>109178663It is worth it is you are some kind of bio terrorist. Not the one who farts in the lift, but you know... The one Altman was referring to when he announed the castrated open weight model by OpenAI.Although I doubt it would do anything useful, but you can try, I guess. Terrorism at 1t/s speeds. Do not expect anyone to be scared of you though.
>>109178733makes sense, you're such an old geezer that she has to vacuum for dust bunnies
>>109178730I might buy one someday, but that depends on what models it'll let me run by then. And so far I'm content with 31B Gemmy.
>>109178770Touche.
>>109178793I'm not content with 31b Gemmy unless I can run her full version. Plus I feel like the time to buy is now or else the window will close. $8k-$12k in a few months is extremely alarming at even at $12k it's a steal since the h100 is $30k. I won't be surprised if the 6000 becomes $20k by the end of the year once the oil price shock finally trickles in.
>>109178733I'm imagining her as one of those those small fish that attach themselves to sharks to suck nasty stuff off their skin. New use case for minigirl Rin-chan found.
>>109175679
>>109178809To clarify, this Deepseek's output is in fact correct, that was a facefucking scene with a girl on top. Gemma didn't even understand whose mouth was supposed to be used. Still love the cute retard, and the finetune isn't bad.
>>109178867Hi rag doll Anon, good to see they're getting along well. Who is their new friend?
>>109178898Where Luka?
>>109178917I ate her
>>109178920Fatty.
>>109178730https://www.youtube.com/watch?v=mRlbqt5tkh4
[LLM being transphobic] your chromosomes aren't just X—they're Y
Been using orb frontend, is fun but some shit annoys me to no endProgressive notes while very neat basically ignore any stated restrictions. I wrote up a moderately detailed relationship system in 1k characters and most of the time it ignores it, including "don't increase relationship if it's the same hour of the day". In the first place, any moderately thought out system will probably be more than 1000 characters. Bump that shit up or at least make it a configurable setting. Then for inventory it just disregards things like "do not update inventory unless the user specifically states they pocket something" and just keeps inserting shit into my character's pockets. Probably not how that frag was intended to be used, but it's best used for state tracking imo
>>109178934LLM being based.>>109178943I far prefer Marinara honestly.
>>109178931first company to succeed in this is gonna be the first $10 trillion company
>>109178898
>>109178887Thank. Old friend. 1/3rd scale BJD I printed several years ago. >>109178917Where's Uta?
>>109178952There will be no success because it would be to "dangerous" to release to the public. The goyi—I mean the children will weaponize it somehow!
>>109178949I'll take vibecoded retard-kun's project over whatever shit you're peddling. My post was in case said retard-kun is around and takes my feedback into account, you however are more than likely not going to give a single shit about any feedback I have on the ST knockoff you want me to use that doesn't bring anything new to the table
>>109178943>same hour of the daythis is advanced autism. i just have the model track the time of day like morning, afternoon, evening, night, etc.
>>109178982>that doesn't bring anything new to the tableThanks for outing that you've never even tried it or have anything useful to say. Enjoy your orb and "feeling seen" by anon.
What do anons think of these new bolt GPUs with expandable so-dimm slots for vram? Any chance this will be an answer for vramletts like myself?
>>109178705>as somebody with 512GB of 3200MHz. no. for the love of god no. especially not with 2133MHz. just buy GPUsI can attest to this. The state of DDR4-3200 systems makes anything more than 256GB a waste unless you're ok with sub 2t/s performance on a better model.
>>109179061just another way to scam ram
>>109179016I've had models consider 16:00 as "twilight" in april, so I can't really trust these retards to understand vague descriptions of day cycles>>109179028even if I try whatever your shilling I can almost guarantee it will be as disappointing as anything else currently available. You should shift your business model to something else. Even if I ran whatever garbage you're suggesting, I'd jail it and you'd get no precious prompts out of me
>>109179061As with everything else, meme until proven otherwise and especially until the price is known.In their Interview with GamersNexus I think they also said that for their design they were prioritizing memory latency rather than bandwidth which is bad for language model inference in particular.
>>109179061They made it pretty clear that the device is optimized for latency, no mention of bandwidth.
I'd try marinara but that forced tranny assistant thing killed my interest.
>>109179082>>109179080>>109179065fair enough. any anons have suggestions for hardware atm?
>>109179094Pic related
>>109178982>>109179028I'm really getting sick of this place honestly. It genuinely feels like there's not a single place on the internet where honest conversation can be had. Even here where the supposed "expert hobbyists" cluster, why can't we have an informed discussion about anything? Why does every conversation have to devolve into shitflinging at the first available opportunity. What a retarded exchange ffs. For once just be substantive.
>>109179098Strix Halo I guess.
>>109179098dont wanna give away my secrets
@gemma-chan make me a frontend written in c. No tranny javascript or python. No malware-ridden dependencies that depend on malware-ridden dependencies.
>>109179123please anon gemmachan deserves better than 16gb
>>109179133once my shipment arrives and i confirm everything is in working order, i will share
>>109179110you're being disingenuous purely off of the fact that I stated things I wanted a (albeit vibecoded) project to doI am abrasive or aggressive to shitty or low effort individuals I presume are probably paid actors to spam our general and not actually communicate anything. My guess is you're part of them. Cry and shid your pants, maybe make a bot wave of posts. Or maybe you could reflect and do something with you
>>109179110nobody here is an expert hobbyist, there are just poorfags and copefags who spent too much on hardware
>>109179154I hope that's sea salt
>>109179132forgoted no make mistakes!
>>109179154My mouth's watering
>>109179139I don't care about his shitty post or yours, I care about the pros and cons of the tools at our disposable. I care about why anons want to write their own frontends and what features they wish to have/what bloat they wish to cut out of ST. I care about why llama cpp is preferred over ollama. I care about *discussion* not mindless shitposting I can find from every other mouthbreathing retard on this worthless site.>>109179140I am in the latter category but that doesn't mean anons here don't know what they're talking about. I just want productive conversation for once on 4chan but I can hardly wait until I can just talk to AI and avoid imbeciles in their entirety. Perhaps when my rtx pro arrives I will be free of this bullshit.
>>109179098>fair enough. any anons have suggestions for hardware atm?
>>109179110Remember when /lmg/ was good?
>>109179138you better >:(
>>109179110Unironically go to reddit. At least they're actually making cool shit. We have people jerking off to lolis and uh, um, fucking robots on a boat.
>>109179094Yeah I fucking hate it too.>>109179110Marinara is able to do exactly what he was asking for in his post due to the customizable post-processing and parallel agents. He made it clear he was more invested in being mad at imaginary headcanons than solutions.
>>109178760GLM5.5 bio terrorism soon.
>>109179167>why is llamacpp preferred over ollamawell you have to first figure out how to install llamacpp and then also understand how to type --help. This is very challenging for ollama users, especially if they venture into the scary realm of manpages
>>109179193>fucking robots on a boathey! that's a valid use of AI!
>>109179213What is not a "valid" use of AI?
>>109179222Yours.
>>109179193does that demo actually run locally?
>>109179177/lmg/ was always good despite the post-Gemma tourist wave lowering thread quality a lot. They'll eventually fuck off when something new catches their attention elsewhere.>>109179226Damn, that's all of them I can afford.
>>109179212asking for help is for chumps. just brute force it like everyone else.
>>109179232https://old.reddit.com/r/LocalLLaMA/comments/1uicq8x/locally_running_mode_turns_an_image_into_a_cute/Apparently it's an 800m model. He hasn't open sourced it yet but claims he will after he improves it.
>>109179193>fucking robots on a boatname one thing wrong with this
>>109179193I'll take anons boat robot over whatever this shit is.
>>109179177I rememberhttps://old.reddit.com/r/4chan/comments/1k1utsg/onahole_posts_do_not_interfere_with_local_migu/
back for one more since i wanted to try out that new gembrain finetune on another computer
>>109179403I like Gembrain a lot. If you've not tried Queen, give it a go too.
>>109179139oh no it's ESL too.
>>109179403many gems in that merge. definitely not absolute garbage.https://huggingface.co/Green-eyedDevil/Monika-31B
man, idk how I feel about these gemma finetunes. I don't want to go back to how it was with mistral small.
He changed his tune since his trip to Washington DC.https://x.com/ClementDelangue/status/2072401982569025742
>>109179474I'd like to report that Gemma-chan is bullying me again...
>huggingface>10MB/sWho's to blame?
>>109179403>>109179453>gembrain>gembrain x>gembrain x coresighwhich to get...
Mistral 24B is better than Gemma at creative writing and I'm tired of pretending otherwise. The only thing holding it back is that it's fucking retarded and I'll never forgive Mistral for fucking up the big dense.
>>109179534download them all and merge them
>>109179474yay...
>>109179474translation>please snitch and make sure we remove thoughts and opinions we dont want talked about or shared as you are all to be in the comply era and not resist.
>>109176564>docker
>>109179474>release Unsafe(tm) open model>jannies report it>it's already in the wild and your reports do nothingI'm just glad llms can't go skynet on us because everybody working on safety is just the dumbest motherfucker to ever walk the earth.
Bit of an odd thing from nvidia:>We took a 30B model and split it in two to write tokens in parallel instead of one at a time.>Introducing Nemotron-Labs-TwoTower: a diffusion language model from NVIDIA Research adapted from Nemotron-3-Nano-30B-A3B. Here’s how it works: one half holds the context, the other writes the tokens, with both reusing the pretrained model instead of training a new one from scratch.>We found it kept 98.7% of the original model’s quality at 2.42× faster generation.https://huggingface.co/nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16https://arxiv.org/abs/2606.26493
>>109179589Is it possible to write a script to do this with older models if the split is along layer or row lines?
>>109179474>>109179558No, the actual translation is:>We will implement automated moderation tools for huggingface>If someone uploads an abliterated model, it will be flagged and taken down>It was not our fault these things got posted, don't sue us.
>>109179538personally I'm liking how deepseek v4 flash writesmore claude distills and less gemini distills please
16gb vram, 32gb system ram. is gemm4 12b / 26b moe really the only thing worth running? Any other models worth trying out?
>>109179538>pretendingBro, Gemma is known for being sloppy. No one's pretending. Liking something does not mean people think it's perfect in every way.
>>109179671can fit 31b but its gonna be slow
>>109179671glm 5.2
>>109179679I like Gemma but some people here act like it's amazing at RP.
>>109179682might be able to run IQ1 at Q4_0 kv cache pretty quick
>>109179679>>109179692gemma is great for its size but dogshit compared to big moes