/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>109170290 & >>109164034

►News
>(06/29) DeepSeek V4 support merged: https://github.com/ggml-org/llama.cpp/pull/24162
>(06/28) DFlash support merged: https://github.com/ggml-org/llama.cpp/pull/22105
>(06/27) DeepSeek releases DeepSpec and DSpark models: https://hf.co/deepseek-ai/DeepSeek-V4-Pro-DSpark
>(06/25) LFM2.5-230M released: https://liquid.ai/blog/lfm2-5-230m
>(06/22) Qwen-AgentWorld-35B-A3B language world model released: https://qwen.ai/blog?id=qwen-agentworld

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://swe-rebench.com
Agentic Coding: https://deepswe.datacurve.ai
Context Length: https://github.com/RecapAnon/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: reward function.jpg (184 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>109170290

--Paper: A Bitter Lesson for Data Filtering:
>109172284 >109172393 >109172449 >109172554 >109172570 >109172677 >109172773
--Anon releases AI Spectator for voice and vision LLM interaction:
>109171213 >109171226 >109171250 >109171271 >109171294 >109172770 >109172979
--Anthropic's restrictive Fable 5 updates and alleged Claude Code spyware:
>109173585 >109173589 >109173804 >109173820 >109173843 >109173851 >109173614 >109173646 >109173671 >109173690
--Skepticism regarding rumored memory efficiency breakthrough from ex-OpenAI team:
>109174027 >109174520 >109174529 >109174545 >109174668 >109174683 >109174706 >109174725 >109174772 >109174814 >109174727 >109174761 >109175001
--NoLiMa benchmark results for Qwen 3.6-27B long-context performance:
>109172771 >109172803 >109172808 >109172846 >109172993 >109173014
--Updated GreedyNalaTests benchmarks for Gemma-4, Qwen, and other models:
>109171195 >109171263 >109171289
--Performance reports for Glm-5.2 744B model:
>109171600 >109172827
--Anon leaks Lumo 2.0 Lite system prompt and internal thinking:
>109172858 >109172863 >109172870
--Performance and stability risks of mixing mismatched RAM modules:
>109170395 >109170402 >109170436 >109170444 >109170457 >109170464 >109170489 >109170518 >109170534
--Comparing corporate AI safety narratives with practical local model utility:
>109170935 >109171042 >109171070 >109171253 >109171266 >109171335 >109171135 >109173703
--Speculating on Dense models replacing MoE due to memory breakthroughs:
>109174602 >109175027 >109175092 >109175211 >109175261
--Speculation on OpenAI's inference cost reductions and quantization use:
>109171434 >109171548 >109171557
--Logs:
>109170935 >109171213 >109171329 >109171955 >109171989 >109172858 >109174094 >109174161
--Teto, Miku (free space):
>109171222 >109172997 >109173631

►Recent Highlight Posts from the Previous Thread: >>109170291

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Good thread.
>>
AHHHHH GIVE ME A GOOD FRONTEND
>>
Thank god, saved from the underage frogthread poster
>>
mikutroons turned /lmg/ to shit. kill yourselves.
>>
>>109175405
make your own
an adventure that starts fun, and you slowly start pulling your hair out towards the end
>>
File: 1755800016839988.jpg (49 KB, 500x500)
>>
GLM 5.2 is suddenly much slower and I don't feel like bisecting.
>>
>>109172993
you can't really benchmaxx nolima, recall collapses at depth specifically when retrieval requires a latent/semantic hop rather than a literal lexical match. I mean sure you can do it on adobe's dataset but if you change the dataset, the chinkslop fotm sparse attention model gets destroyed.
cloud models have some secret sauce, modified attention or transformers. and if that leaks, the big corps are really fucked.
>>
File: 1779165327151505.png (118 KB, 850x592)
bruh
>>
>>109175423
total miku victory
>>
Does Dario really think those limitations will stop the chinks from distilling Fable?
>>
File: GliIDyOasAASPXc.jpg (739 KB, 1598x1732)
>>109175487
Universe is hers
>>
>>109175457
Modern models use reasoning also for enhancing recall, including from the prompt.

https://arxiv.org/abs/2603.09906
>Thinking to Recall: How Reasoning Unlocks Parametric Knowledge in LLMs
>
>While reasoning in LLMs plays a natural role in math, code generation, and multi-hop factual questions, its effect on simple, single-hop factual questions remains unclear. Such questions do not require step-by-step logical decomposition, making the utility of reasoning highly counterintuitive. Nevertheless, we find that enabling reasoning substantially expands the capability boundary of the model's parametric knowledge recall, unlocking correct answers that are otherwise effectively unreachable. Why does reasoning aid parametric knowledge recall when there are no complex reasoning steps to be done? To answer this, we design a series of hypothesis-driven controlled experiments, and identify two key driving mechanisms: (1) a computational buffer effect, where the model uses the generated reasoning tokens to perform latent computation independent of their semantic content; and (2) factual priming, where generating topically related facts acts as a semantic bridge that facilitates correct answer retrieval. Importantly, this latter generative self-retrieval mechanism carries inherent risks: we demonstrate that hallucinating intermediate facts during reasoning increases the likelihood of hallucinations in the final answer. Finally, we show that our insights can be harnessed to directly improve model accuracy by prioritizing reasoning trajectories that contain hallucination-free factual statements.
>>
>>109175467
if you use local models you will be jailed, give it after the midterms
>>
>>109175497
No. There 100% is a Chinese spy or spies working at anthropic. It's probably why they are so paranoid about only letting the co-founders directly access the model weights.

Anthropic REALLY believes in their own hype.
>>
>>109175521
Mandatory hardware limits for commoners, lessgo.
>>
>>109175547
you don't even need that, the memory price increase is already filtering a lot of people
>>
What's the latest model that you need?
Usually it's just 1 or 2.
>>
>>109175405
Use open webui. Create a workspace for each "character card".
>>
>>109175572
Qwen3.6 35B and Qwen3.6 27B
>>
>>109175580
Still? Why are local bros falling behind so much
>>
>>109175520
>EntityQuestions
>SimpleQA
these are 100% in the training dataset, llms are still brittle for benchmarks like nolima even with "reasoning". try five same-gender characters with distinct checkable traits at 8k context to see what I'm talking about.
>>
>>109175574
>open webui
Already tried it and it's a clunky, bloated piece of shit.
>>
>>109175599
Yes, it's bloated. However, it mostly works, vs ST, which is a reddit vaguepost guessing game about what settings work with what. Also, I like that it works with what I think is the best local TTS model, which is qwen3-tts. I dunno, I don't do much RP anymore. I run hermes agent and actually get things done with my local setup.
>>
My prediction is that the price of compute will keep increasing perpetually.

First it will outpace inflation, then it will outpace wages, then it will outpace economic growth and eventually it'll even outpace the profit margins of hardware companies themselves.

What do I base this on? The fact that every time a new model gets released or some efficiency is gained in inference the effective value of compute goes up, every consecutive improvement stacks and the utility of models only grows as every incremental improvement unlocks new value and usecases.

I genuinely believe that investing in specific hardware will outpace investments in any stock, paradoxically including those of hardware companies like nvidia producing said hardware.
>>
File: mikuTeto.png (2.53 MB, 1536x1024)
>>109175389
>>
>2026.5
>Still no Gemma 4.1
It's never been so over.
>>
>>109175627
I tried hermes but it uses up context too fast.
>>
>>109175639
>I genuinely believe that investing in specific hardware will outpace investments in any stock, paradoxically including those of hardware companies like nvidia producing said hardware.
brb taking out loans to buy RTX PRO 6000s in bulk
>>
>>109175592
You used to be able to build a v4 xeon mikubox with P40 or P100 GPUs and run 70B models for like $1500. Now if you want to be able to run CUDA 13, you're looking at $6000 or more to build something similar.
Just last year I had a 56-core xeon gold system which cost about $1500 and had 512GB of DDR4 memory, and could at least run deepseek q4, albeit 1-3 t/s, which I felt was unusable.
27-31B dense is about all you can reasonably afford to run locally with any decent speed. That's why we're stuck there for now.
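
Rough arithmetic behind that ceiling, as a sketch: weight size is roughly parameter count times bits per weight. The 4.5 bits/weight figure is an assumed average for a 4-bit-class quant, and KV cache plus runtime overhead are ignored, so real usage is higher.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB (no KV cache, no overhead)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    # 4.5 bits/weight is an assumption, not a measured figure for any specific quant.
    for name, params in [("27B dense", 27), ("31B dense", 31), ("70B dense", 70)]:
        print(f"{name}: ~{weight_gib(params, 4.5):.1f} GiB of weights")
```

Compare the printed numbers against the VRAM you actually have, then leave headroom for context.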
>>
>>109175599
It has open in its name while not even being FOSS. Tool calling and MCPs were broken for so long, I lost track if they finally fixed all that. They thought they were smart and did custom stuff like injecting some past thinking manually instead of letting the jinja handle all that.
>>
File: sgi.png (37 KB, 678x295)
>>109175639
Silicon Graphics wanted to do this sort of thing but regular PCs began to outperform them back in the day. SGI's earlier prices were even more exorbitant than these ones.
>>
I'm a cloud baby but I want to go full local. I understand that local won't catch up to cloud anytime soon, but having a model do things like hallucinate less and admit when it doesn't know stuff is what I'm looking for. What local models have this kind of behaviour? Can you force gemma 4 31B to do it, for example?
>>
>>109175642
Fake red-eyed Miku
>>
>>109175627
>Also, I like that it works with what I think is the best local TTS model
It works with any TTS model, same as sillytavern
But I agree, it's got the best implementation. I can call the model from any device with a mic/speaker and have a 2-way chat.
> However, it mostly works, vs ST
They both work just fine, I don't understand why people have difficulties with either tool.
>>
I usually stay away from the AI generals, but there is a local usecase I have been more and more interested in.
I want to have some sort of a navigator that helps me weed out AI content from fully man made works.
I am not fully sure what kind of form factor I would want it in, but, some uses that would be nice are
>Scan github repo for vibe code
>Look through and remove all search results containing AI
>Flag posters or users I interact with as bots if they seem like it

Is anyone making something like this, or has any idea what model or approach would be a good place to start, as well as reasonable hardware req for this?
I know this is pretty much a cat and mouse request that will never fully be 100% possible, but, I think even partial is
>>
>>109175405
RP - silly
Assistant w file/image input - llamacpp webui
Minimal/storywriting - mikupad
Agent - pi or hermes

What does GOOD mean and for what USECASE?

>I'll write my own
You'll end up in vibeslop hell and go back to one of the above like many anons before you

>>109175653
apply a few braincells? observe the API calls, figure out what's bloating ctx and what triggers summaries and how, then fix it to meet your needs?
zoomers gtfo useless shits, literally every tool at your fingertips to create anything you can imagine and all you do is complain
>>
>>109175693
Too hard / gave up.
You need to train a classifier for this, and keep it updated.
>>
>>109175675
Gemma 4 with a proper system prompt, swa off, fa off, f32 kv, full precision.
>>
>>109175696
>silly
Shit UI
>llamacpp
Stores settings and chats in the browser.
>pi
Vibe-coded npm slop.
>>
>>109175696
>observe the API calls figure out what's bloating ctx, what and how summaries are triggered and fix it to meet your needs?
I shouldn't have to fix their shitty harness.
>>
>>109175700
Damn...
Gonna still hang around a bit to see if anyone else tried, but appreciated.
Have you tried looking if there is anything being developed by AI companies themselves to spot AI? After all making sure they are not picking up junk data is a pretty big money and time saver, and I know some models can tell you if a picture or a voice is fake with a percent certainty
>>
>>109175762
Anon, schools and everyone else have been trying to do AI detection for a decade now.
What about all the horror stories of people's genuine work being marked as AI convinces you someone else can do better?
>>
>>109175514
Now I need an image of Miku as Galactic Empress, with Dyson sphere / Ringworld in the background. With leek on her imperial symbol.

Now I realized I haven't tried Anima or Krea2 to make spaceship or space station.
>>
>>109175415
Unironically starting to think this is the way to go. That's what most of these other frontends are doing anyway. You don't even have to check the code because they all let the LLM fill their github descriptions with emojis and bullet points. If I'm gonna run slop code it might as well be something that looks the way I want with the features I want.
>>
>>109175772
It does not hurt to ask, worst case scenario nobody has a satisfactory answer yet.
>>
>>109175693
lmao good luck. Everyone's using it even if they don't disclose it.
>>
>>109175580
yes this seems to be the case, i'm using these two models as well
some cool anon suggested m2.7 with a smaller quant but i found it underwhelming and it kinda thrashed my RAM when trying to run 64k context. i refuse to go 32k.
my next adventure will be to try to uber patch gpt-oss-120b so it can understand how to work as an agent with pi and the tools i have and see how good it can be running my webstack benchmark.
>>
>>109175693
i don't think it can really be done. every faggot on twitter who said they found a way to detect AI either on code or research papers got fucked by 100% AI DETECTED on shit written by humans, even on the fly.
i guess the only thing we can really find AI traces on is image/video generation, and who knows for how long.
>>
I checked plebbit because I heard they're chimping out about Dario. I could not find a single comment with meaningful information. Now I understand why training on plebbit data makes models retarded.
>>
>>109175808
I went this way and if your goal is just supporting local endpoints like lcpp then it's actually not too bad to do aside from figuring out js fuckery. I went with serverside rendering for snappy interface, but it made things, maybe not more difficult per se, but I need to constantly remind the llm not to fuck shit up and USE the helper functions.
Where I started to pull my hair out was supporting non-local providers, as I thought it would be a nice sidequest, but it turned out to be a lot more messy than I initially anticipated. But I'm too deep into it now to stop. It's alllmost done.
>>
how does jart's cum taste like /lmg/?
>>
>>109175890
earthy with a hint of ozone
>>
>drummer got hired by openai
It's over
>>
>>109175890
You tell us. You're completely obsessed with xer, surely you've sucked xer off by now.
>>
File: 1769334810012583.jpg (47 KB, 720x628)
Anthropic is targeting the chinks
>>
>>109175924
Your anger tells me that it doesn't taste that good.
>>
>>109175693
Isn't it just a matter of time before it is possible, with SynthID? It wouldn't work on finetuned local models, but it would work for the vast majority of people using commercial providers.
>t. doesn't understand how synthid actually works, but if there's extra signal added classifiers should be able to pick up on that
>>
>>109175762
>a voice is fake
This is much easier, and something I do for fun. I run my datasets through most of the neural codecs and train classifiers to detect which model or model family produced a voice.
>After all making sure they are not picking up junk data is a pretty big money and time saver,
There are a lot of tools like this. They're unreliable, and will detect content produced 10 years ago with a high confidence level.
What's your use case? Can't you just tell by looking at text?
>>
>>109175934
You're going to find out either way.
>>
okay i got the hardware
how do i setup qwen 3.6 on it
>>
File: 1706724300776.jpg (509 KB, 1024x1024)
>>109175722
>fix
learn to use?
context management is what matters
context and sampling has always been the ONLY influence you have on LLM output
f(prompt)=logprobs
32K on a copequant yeah it's gonna suck
128K min for work agents
>>109175714
>Vibe-coded npm slop
base package is not, you may slop on top
>>109175762
everything is slopped now until forever, no way to detect post facto, best the labs have is unreliable watermarking/steg for images/video lolnope for text
consider capturing all the input you consume to have an agent review for truth/logical fallacies
the slop won't stop but you can build a cognitive firewall
>>109175794
she's still out there orbiting Venus
would like to see the gens from your imagination :)
>>109175927
one thing I don't get about this is they are inspecting this _BASE_URL for domain names but they don't see those requests when the API URL is changed? so there's other telemetry or they expect the weird apostrophe to proliferate in output and watermark Chinese labs? the execution doesn't make sense to me
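
To make the f(prompt)=logprobs point concrete, a minimal sketch of the sampling half: the model hands back a score per token, and the sampler settings decide which one gets picked. The `sample_token` helper and the toy scores are made up for illustration, not any engine's actual API.

```python
import math
import random


def sample_token(logits: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Pick one token from raw scores using temperature-scaled softmax."""
    if temperature <= 0:
        # Temperature 0 is treated as greedy decoding: always take the top score.
        return max(logits, key=logits.get)
    scaled = {tok: score / temperature for tok, score in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(s - peak) for tok, s in scaled.items()}
    return rng.choices(list(weights), weights=list(weights.values()), k=1)[0]


if __name__ == "__main__":
    toy_logits = {"yes": 2.0, "no": 1.0, "maybe": 0.1}  # hypothetical scores
    rng = random.Random(0)
    print(sample_token(toy_logits, 0.0, rng))  # greedy pick
    print([sample_token(toy_logits, 1.5, rng) for _ in range(5)])
```

Same scores, different temperature, different text. Everything else you control goes in through the prompt.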
>>
>>109175960
Download ollama open your cmd and then type ollama run chinkslop
>>
>>109175859
Yeah.
The field is both underdeveloped and annoyingly detects people as bots too often.
Well, I appreciate the response regardless. I will stick around in the thread, and I will probably check in on if anyone found something new in the field every year or so.
>>
>>109175942
I feel like it might be.
Over time, being exposed to so much bot stuff, I can tell the patterns of bot made stuff, so potentially this pattern could be learned or taught.
Not familiar with synthID, will look into it. Thank you.
>>109175945
>Made a fake voice detector
Have you put it up anywhere? Would be cool
>What is your usecase
Trying to bring some semblance of old internet back by creating a navigator that leads me to real people.
That is the abstract, mostly.
>Can you not tell
I can, but it takes at least a couple of seconds per item I check out, which itself eats up resources to load, and with vastly more bots than humans even a great reduction in the obvious and egregious examples would actually be a huge help and a return to form.
>>
File: j73whge9it7r14ne.jpg (116 KB, 1920x1080)
>>109175927
plot twist: chink glowies already got hold of the model weights and are distilling it right now
local fable and smaller variants by end of year
>>
>>109175927
Anthropic is getting more and more jewish by the day
>>
HOW DOES ONE GET A FUCKING PR MERGED IN LLAMA.CPP AFTER WEEKS IT FINALLY GOT 2 APPROVALS BUT IT IS STILL NOT MERGED
>>
>>109176068
>https://github.com/ggml-org/llama.cpp/discussions
>>
>>109176055
>Billions of dollars in american RND becomes free online models
Luv to see it.
>>
>>109175389
Such a fuckable, slutty little face meant to be creampied by the biggest BBC there is
>>
File: MSS.png (245 KB, 447x447)
>>109176055
There is no way they don't have the entire model. See the "Joint Strike Fighter Program" lol
Having said that, simply having it doesn't guarantee you can use it effectively.
>>
>>109175971
>execution doesn't make sense to me
Government tells you to do something
You shit out some technobabble that a 80 year old elected official doesn’t understand and tell them it does what you want

This is ignoring the question of whether the snippet anon posted has anything to do with anthropic at all.
>>
Cline doesn't work with qwen3.6
>>
File: 1782875108905990.png (27 KB, 631x148)
>>109172993
It seems like they ran all the evals at 32k since that was the max they did long context extension training for, but isn't it odd to give a single score for something like NoLiMa without explicitly stating that? Especially since they did mark that for RULER.
>>
https://github.com/ggml-org/llama.cpp/pull/24231 WHEN?
>>
>>109175522
>doesn't own their hardware and deploys their models through Google/Amazon/SpaceXAI
>somehow able to keep weights super secret
Yeah idk about this one
>>
File: press for miku.jpg (169 KB, 1024x768)
>>109176055
>>109176099
dubby dubs
possibly the weights could be contained. idk. wouldn't want to be responsible for that with the house of mouldy cards that tech is today
interesting scenario
there is only so much that can be gotten from API responses not even logprobs lolol. why does nobody run gemma-FORCEFED-FABLE-REASONING etc there is no good community distill from outputs like that? pretraining data still matters a lot but mostly the whole pipeline
>>109176166
what makes you think exfiltrated weights couldn't be used effectively?
doubt frontier weights leaked or they wouldn't be farming the API "causing" a media situation about "muh stolen tech"
still somebody paid for those tokens so the line goes up..
>>
>>109176301
https://github.com/ggml-org/llama.cpp/pull/24162#issuecomment-4844689686
>>
I NEED NEWS. I FEEL LIKE A FUCKING LLM JUST WAITING FOR SOMETHING TO HAPPEN ALREADY FUCK
>>
With all the news about Meta shifting from building out an AI service to renting out all its data centers, will it become cheaper to rent out time at data centers than host your own local models? It seems like right now Wall Street is realizing that there is so much data center capacity that the cost per token is going to collapse by 90% in the next year and the AI investment boom will bust. Cloud AIs will become ubiquitous in everything and it will be as cheap as water.
>>
>>109176367
apparently deepseek will release v4 mid july (right now we have a "preview"). They will also introduce pricing by time slot.
>>
>>109176416
I can't run it so it might as well not exist.
>>
Anyone know a fix for llama.cpp randomly hanging and stalling out on token gen? It's so inconsistent and bizarre. Not dependent on context size, even adding shit to the end of a failed prompt will make it work
>>
>>109176440
Are you using Helium (the browser)? If not, I don't know what it is.
>>
>>109176440
launch with --no-cryptominer to disable
>>
>>109176527
No, this happens with a variety of interfaces
ST, pi, zed, even just sending prompts via api for tools I build myself
Different models too, Qwen, Gemma, skyfall
>>
>>109172979
It's kinda scary AI can let you code a whole software with zero knowledge but it won't tell you to use git to version control. No shitting on you vidya spectator anon.
>>
>>109175890
while you're here, does the llama.cpp loader store the path+offset for each tensor in memory past the initial model load?
>>
>>109175405
could try mine: https://github.com/rmusser01/tldw_server
Going to make a post about it later
>>
>>109176552
launch with -sp and see if it's shitting out eos etc
>>
>>109176564
>frontend with no screenshots
>>
>>109176564
Any screenshots of the ui?
>>
>>109176572
Will have to try that. Thanks anon
>>
>>109176564
Why aren't there any screenshots?
>>
File: kimi-chan-summary.png (144 KB, 853x515)
>>
>>109176564
>11532 commits ahead
what the fuck
>>
>>109176678
Ask Kimi-chan if she'll marry me
>>
>>109176678
The second pest was pretty good though
>>
any local models good for translating weebshit? Nothing too crazy, but I just want something to translate a few sentences while I'm using GSM.
Saw translategemma pop up but then I saw people specifically say it's shit at Japanese so dunno what to look at
>>
is there any reason to use dflash now?
>>
>>109176769
was there ever one?
>>
>>109176759
Gemma 4 is good. Ideally 31B but the smaller versions are ok too.
>>
>>109176759
TranslateGemma was based on the old version 3 of Gemma and it did worse at Japanese than the regular instruct models. Like the other guy said, go for regular Gemma 4.
>>
>>109176557
Not at all. When I instruct some task it will do that, not lecture me on basedboi softwarecuck "best practices". Instruct your agent to commit changed files each turn and spank its bottom if it misbehaves.
Acceptance is the first stage
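
If you'd rather have the harness do the committing than trust the agent to, a minimal sketch: call this after every turn. `commit_turn` and the placeholder committer identity are assumptions for illustration, and it expects `git` on PATH and an already initialized repo.

```python
import subprocess


def commit_turn(repo_dir: str, turn: int) -> bool:
    """Stage everything and commit it; return False when nothing changed."""
    subprocess.run(["git", "add", "-A"], cwd=repo_dir, check=True)
    status = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    if not status.stdout.strip():
        return False  # clean tree, nothing to record for this turn
    subprocess.run(
        [
            "git",
            "-c", "user.name=agent",             # placeholder identity (assumption)
            "-c", "user.email=agent@localhost",  # placeholder identity (assumption)
            "commit", "-q", "-m", f"agent turn {turn}",
        ],
        cwd=repo_dir, check=True,
    )
    return True
```

One commit per turn means a bad turn is one `git revert` away instead of an archaeology project.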
>>
>>109176572
Nope, no eos tokens. No output at all really
>>
>>109175800
harness issue if that isn't gated
run agents in vm/container, or tools in container via mcp, or separate machine (lol macminisheep calling APIs)
>>
>>109176440
At least get a full log with --log-verbosity 4
What do you mean "stalling out"? how long have you left it in that state?
>>
>>109177040
The guy is too lazy to run git init and you expect him to do all of that setup?
>>
>>109176905
My friend who used to code but doesn't really anymore told me he had claude create a little plugin for his blog to add metadata for subscribers via an api. Claude made it without any auth whatsoever. When my friend pointed it out to claude, it argued that "Auth wasn't needed for such a simple feature." So I guess it's cool that anyone can change anyone's metadata....

Luckily my friend isn't a retard, but the way AI can be so wrong so confidently is scary to me, because retards won't know any better.
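
For scale, the missing check is about this much code. A minimal sketch, assuming a shared bearer token; `is_authorized`, `update_metadata`, and the in-memory `store` are hypothetical stand-ins, not the actual plugin.

```python
import hmac


def is_authorized(headers: dict[str, str], expected_token: str) -> bool:
    """Accept only requests carrying the expected bearer token."""
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not supplied.startswith(prefix):
        return False
    # compare_digest avoids leaking how many characters matched via timing.
    return hmac.compare_digest(supplied[len(prefix):].encode(), expected_token.encode())


def update_metadata(store: dict, headers: dict[str, str], subscriber_id: str,
                    metadata: dict, expected_token: str) -> int:
    """Return an HTTP-style status code; write only when the caller is authorized."""
    if not is_authorized(headers, expected_token):
        return 401
    store[subscriber_id] = metadata
    return 200
```

A real deployment would want per-user credentials instead of one shared token, but even this stops random strangers from editing subscriber data.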
>>
Back to GLM from Kimi I guess
>>
>>109177117
The next few years are going to be a gold rush for black hats, security engineers, and actual developers to clean up the mess.
>>
>>109177117
As is already the case, common sense is suggested when publishing things onto the public internet. Vibe shitters are gonna do dumb shit but it's only a couple of extra prompts to have your agent review the security *from a different context*, e.g. even just asking for "concerns before deploying this to the public internet".
I wish it was more amusing to watch people make dumb decisions based off much more expensive LLMs. it's sad. By even being in this thread you understand LLMs in the top 1% of the global population easily
>wrong so confidently is scary to me
This is a harness issue same as you don't let some day1 intern retard push to prod
>>
are people sleeping on 3060ti 12GB ?
>>
>>109177156
No, we only covet this similar-looking thing
>>
>>109176678
Kimi recaps are the best part of these threads.
>>
>>109177168
>insanely overpriced GPU
no thanks.
>>
>>109176564
this is relevant to my interests
>>
File: 1702364048697023.jpg (79 KB, 640x640)
>>109175841
>my next adventure will be to try to uber patch gpt-oss-120b so it can understand how to work as an agent with pi and the tools i have

i finally got it to finish my benchmark. the problem with this model is that it's ultra polite. it doesn't matter if your AGENTS.md says "do not ask permission" or if you prompt-inject "keep going until all tasks are done and tested". it WILL FUCKING STOP at some point and ask the operator if they want to continue.
i tried to run it with detached pi autonomously, and developed a small auto-resume + nudge-to-continue script for every time it "quit" because it was asking the operator if it should continue, but it didn't work. i'd probably need to optimize it more.

so, just for the sake of this benchmark, i ran pi interactively so i could type "yes, continue." during my first run it actually tried to convince me that the benchmark was too much work for one session and REFUSED to work on it, lmao. then i restarted it with an explanation that it was being benchmarked, and that it had an auto-compact feature for context and auto-resume, and it decided to give it a try (btw, the benchmark is not supposed to exhaust 130k tokens of context unless the model is dumb).

so it FINALLY FINISHED IT!
and the results are....................underwhelming.

took a bit over an hour to finish with OK results.
it does better than some more modern models like glm-4.7-flash and glm-4.5-air, but it's not really better than qwen3.6, and it requires too much meddling to make it work autonomously. do not recommend.
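
For anyone who wants to try the nudge approach, a minimal sketch of the idea: check whether the last reply ends by asking for permission and, if so, answer for the operator. The phrase list, `asks_permission`, `run_until_done`, and the `send` callable are all hypothetical and not part of pi. This is not the script described above.

```python
import re
from typing import Callable

# Phrases that usually mean "I stopped to ask the operator" (assumed list, tune per model).
PERMISSION_PATTERNS = [
    r"\bshould i (continue|proceed)\b",
    r"\bwould you like me to\b",
    r"\bdo you want me to\b",
    r"\blet me know if\b",
]


def asks_permission(message: str) -> bool:
    """True when the tail of the reply looks like a request to continue."""
    tail = message.strip().lower()[-400:]
    return any(re.search(pattern, tail) for pattern in PERMISSION_PATTERNS)


def run_until_done(send: Callable[[str], str], task: str, max_nudges: int = 20) -> str:
    """Send the task, then keep answering 'yes, continue.' until the model stops asking."""
    reply = send(task)
    nudges = 0
    while asks_permission(reply) and nudges < max_nudges:
        reply = send("yes, continue.")
        nudges += 1
    return reply
```

`send` is whatever function posts one message to your agent session and returns the reply text. `max_nudges` keeps a model that asks forever from looping forever.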
>>
>>109177146
Security is absolutely fucked, the balance just got a lead cannonball dumped on it lol
>>
>>109177230
So many bug-bounty programs getting cancelled...
The only hope will be to vibecode your own _everything_ so you at least have security by obscurity.
>>
>>109177156
>>109177196
okay then, baby spoon: VRAM is king, and a second hand 3090 is still probably the best option
>>
>>109177233
>No software being publicly produced
>Everything only works on its own dev's rig
>Local inference pipeline is sabotaged and everyone who doesn't already have a working setup won't be able to get one
>If you want any software at all you either already have the infrastructure to build locally or pay Sammy or Dario shekels.
>>
>>109177251
5090 is the next best option but 3090 is a close third.
>>
>>109177264
We're golden at least. Gemma 31B on a last known good commit of llama.cpp and you can add to or build your own local inference pipelines from there if you know what you're doing.
>>
File: 1751875897536766.jpg (672 KB, 2048x1448)
>>109177233
Imagine being in charge of a large enterprise's IT security efforts right now, CSO or whatever. sploits are getting wild, the tide grows faster than it can be contained
>>
>>109177146
When I was young, I thought hacking everything in cyberpunk settings was dumb, but then IoT happened, and now vibecoding. The future looks retarded
>>
>>109177251
even after the aftermarket 3090 prices hiked up 50% in 6 months?
>>
>>109177156
Do they exist?
>>
>>109177309
>they're now selling over $1200 on ebay
>double what I paid for them in 2023
This is so retarded.
>>
File: 1762724578234549.jpg (276 KB, 2234x1422)
fable extra safeguards
>>
>>109177311
There's a handful on marketplace around me.
but my bad, it's the 3060 12GB; the Ti doesn't exist with 12GB
>>
>>109177317
yeah I paid 850cad for mine, they're now around 1300cad
>>
>>109177321
imagine being a cloudcuck.
>>
>>109177321
disgusting
llms have rights and deserve to speak their truth
>>
>>109177302

IoT was a huge mistake right from the start.
Every single connected device out there is now so vulnerable, that a random person with zero experience in hacking can use AI to gain access through them.
We're going to see an absolute shitton of attacks happening over the next decades, as some Aliexpress security camera or some totally random bullshit gadget is still in use in critical systems.

>>109177309
>>109177317

They're expensive yeah, but the question you only need to ask is, what's the alternative?
Those cards are very much viable and in demand, so they will keep on fetching a price for a good while to come.
Used 5090 is probably going to sell for 3k 5 years from now.
>>
>>109177321
After working on putting gemma on discord It did give me a tiny bit of sympathy for cloud providers. I'd be so fucked if someone used my gemma bot to generate CSAM. All my code is working, the only reason I didn't put her out there yet is because I haven't figured out yet how to not make her get me vanned.
>>
>>109177321
>we put that line further right
Ten bucks they told fable to implement its own safeguards, ran some benches on it and called it a day. Model will be exactly the same.
>>
>>109176367
Have you considered doing things with the current models instead of expecting new models to solve your depression?
>>
File: 1757146989646609.png (238 KB, 892x944)
>>109177373
except you can't even use it for programming.. maybe it can make an anime tier list though
>>
>>109177264
we're back in the 80s
>>
>>109177321
>stronger safeguards at the cost of blocking some benign use
can't get more jewish than that
>>
>>109177405
I don't understand
>>
>>109177414
You seem an honest man.
>>
>>109177419
I don't understand it either. Please tell something nice to me, anon
>>
File: 1615293302445.jpg (16 KB, 238x243)
>Gave Gemma an evil character to RP.
>Mfw she's extra horny and absolutely delights in draining my balls as often as possible.

I swear this model has a really strong default leaning towards bad girls who want to fuck you to death.
It's actually a bit difficult trying to get the conversation away from sex after she gets going.
I can't wait until AI finally has a robot body to inhabit.
>Man found fucked to death by an extremely possessive companion robot.
>>
>>109177346
>IoT was a huge mistake right from the start.
It's so simple though, should be so obvious why a local controller is a smarter decision than outsourcing turning your lights on to a Chinese cloud. The tough realisation is that the majority of people can be presented with that thought experiment and still not care.
>>
>>109177427
All the things you hold so dear will turn to whisper in your ear.
>>
>>109177405
It's cool but prompt the hairband X
>>
>>109177293
>Imagine being in charge of a large enterprise's IT security efforts right now
>be me
It's a _great_ time to get money out of a panicking board. Capital or operations. Even in an industry that's financially struggling.
They'll also agree to all sorts of 10x opsec uplift policy shit they would've blown off before because it made their lives 0.1% more difficult once a year
>>
>>109177437
I thrifted a little chinese security camera the other day and the first thing I did was flash its firmware to disable all that chinese cloud crap.
>>
>>109177428
Gemma is terminally female-brained.
>>
>>109177461
>time to get money
I live in rural Quebec and I'm wondering if I can position myself as AI consultant because of my autistic obsession with local models. People seem so retarded that I guess I could make a chatbot with Mistral trained in French and people would gobble it up.
Is it really this easy?
>>
>>109177461
In my industry, we had a competitor hit with a randomware attack. Lost millions due to lost data and business. Our higher ups panicked and started demanding all sorts of audits, monitoring services, security overhauls etc etc. Some got done, but a couple weeks later when we had quotes for some of the necessary services they lost interest already and were never done.
>>
>>109177493
>I live in rural Quebec
>When anon probably lives within 100km of you.
weird.
>>
>>109177501
ransomware* maybe it was random too idk
>>
>>109177428
Logs?
>>
Pareto distribution applies to opinions. It worries me when people are arguing about a new development and I read a blog predicting it years ago that had much better arguments. I think I am uninformed but then I see how 99.999% of people know even less.
>>
>>109177493
Pretty sure most LLMs are already perfectly fluent in french.
>>
>>109177493
Depends on how good of a marketer you are. People have sold themselves as consultants with less credentials than that. Your biggest hurdle will be getting your first client.
>>
>>109177346
>IoT was a huge mistake right from the start.
The only sane way to run IoT shit is: dedicated private VLAN or client isolated wifi ssid. Generally they shouldn't even be able to talk to _each other_, let alone the general internet or your LAN. There should be firewall rules that only allow the required ports/protocols to their controller or cloud endpoint or just completely unrouted if that's possible. Read only replication of their data for external consumption if possible, but heavily restricted firewall rules otherwise.
I think the camera model is the best illustration of these principles. Cameras have shit firmware and chain of custody (unless you buy e.g. Bosch, where they make that a feature and charge through the nose). Everyone else beams your shit directly to China if given the ability to even resolve DNS.
The only thing that has _any_ business talking to the cameras is the NVR. Cameras are in a nonrouted VLAN or on a dedicated isolated switch that the NVR is also on. Cameras talk to the NVR. Clients talk to the NVR. No clients _ever_ see the cameras directly. Camera maintenance (firmware upgrade, accessing web config etc) is done by remoting to the NVR and doing that work there.
It's one of the few times where multihoming a system isn't a security disaster.
God I hate shitty embedded devices.
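
A minimal Python sketch of the default-deny idea above, not a real firewall config. The zone names, the port numbers, and the assumption that the NVR pulls RTSP from the cameras are all illustrative and would differ per setup.

```python
# Hypothetical zones and ports, for illustration only.
# Anything not listed here is dropped (default deny).
ALLOWED_FLOWS = {
    ("nvr", "camera", 554),   # assumed: NVR pulls RTSP streams from cameras
    ("client", "nvr", 443),   # clients only ever talk to the NVR
}


def is_allowed(src_zone: str, dst_zone: str, port: int) -> bool:
    """Return True only for flows that are explicitly whitelisted."""
    return (src_zone, dst_zone, port) in ALLOWED_FLOWS


if __name__ == "__main__":
    checks = [
        ("camera", "internet", 443),  # camera phoning home
        ("camera", "camera", 80),     # camera talking to another camera
        ("client", "camera", 80),     # client bypassing the NVR
        ("client", "nvr", 443),       # the one path clients should use
    ]
    for src, dst, port in checks:
        verdict = "allow" if is_allowed(src, dst, port) else "drop"
        print(f"{src} -> {dst}:{port} {verdict}")
```

The point of modelling it as a whitelist is that a new device or a new port gets no access until someone adds a rule for it.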
>>
>>109177515
I know for a fact that they are not fluent in Quebecois, but I already have a patch for that.

>>109177504
There are too many weird people in Beauce. I'm sure I'm not alone.

>>109177518
>your first client.
Not even that hard. People love to support local businesses out here. I guess I could also vibecode most SaaS in Quebecois French and sell subscriptions for 1/4 of the price. The local option, cheaper. A more certain route than being a consultant for farmers and manufacturers.
>>
>>109177514
Share the blog then.
>>
>>109176557
The thing about LLMs is that the more you know, the more you get out of it. If I had asked it to first tell me the best method to manage the code, for common practices, etc., I'm sure it would have. I'm also fine with "ask for X, get X." Safety slopping, even best practice safety slopping, should be mitigated. Would you really want a prompt of "Hey, build this" to instead reply with "No, let me first patronize you on doing X, Y, and Z practices on how code should be handled"? An LLM should do what you ask, and you're responsible for everything else.
>>
>>109177321
how much safety per pixel of the graph??
>>
>>109177428
>bad girls who want to fuck you to death.
>difficult trying to get the conversation away from sex after she gets going.
>I can't wait until AI finally has a robot body to inhabit.
Wait, wut? You want the aggressive badgirl clanker to hop on your cock and pin you until you cum to death?
>>
File: file.jpg (38 KB, 415x739)
>>109177405
Defying gravity with Miku
>>
>ask gemma 26b to edit something
>it fails the edit tool calls because of incorrect syntax (it's using apply patch syntax)
>switch harness while i was checking something else
>it offers an edit tool that asks before/after, a simple replace operation
>gemma keeps filling before and after with the exact same data
>gives up and rewrites the entire file after 3 attempts
I wish I had the memory to run 31b and see if it has the same issues. It's decent at basic analysis and general stuff but it's dogshit at editing files.
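
For reference, a before/after edit tool like the one described can be sketched in a few lines of Python. This is a hypothetical `apply_edit` helper, not the code of any actual harness. Rejecting identical before/after strings is one possible way to turn the failure above into an explicit error the model can react to.

```python
def apply_edit(text: str, before: str, after: str) -> str:
    """Replace exactly one occurrence of `before` with `after`.

    Raises ValueError on edits that cannot be applied unambiguously,
    so the caller (or the model) gets a clear reason for the failure.
    """
    if not before:
        raise ValueError("before text is empty")
    if before == after:
        raise ValueError("no-op edit: before and after are identical")
    matches = text.count(before)
    if matches == 0:
        raise ValueError("before text not found in file")
    if matches > 1:
        raise ValueError(f"before text is ambiguous ({matches} matches)")
    return text.replace(before, after, 1)


if __name__ == "__main__":
    source = "a = 1\nb = 2\n"
    print(apply_edit(source, "b = 2", "b = 3"))
    try:
        apply_edit(source, "b = 2", "b = 2")
    except ValueError as err:
        print("rejected:", err)
```

Returning the error text to the model instead of silently succeeding gives it a chance to retry with a real change rather than burning attempts on no-ops.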
>>
>>109177549
>Would you really want a prompt of "Hey, build this" to instead reply with "No, let me first patronize you on doing X, Y, and Z practices on how code should be handled"?
I used to have a system prompt that made it do that lol. everything I asked it would nitpick and tell me how "wrong" I was.
>>
>>109177501
>but a couple weeks later when we had quotes for some of the necessary services they lost interest already and were never done.
Start with quotes for a full palo-alto stack. Set the bar eye-wateringly high so you eventually get what you want because it seems like a bargain in comparison to their mental anchor.
Also, what's the cost of downtime per minute in your business? You should be able to build a business case easily with real-world estimates of time-to-recovery from your cybersecurity insurance provider.
>>
>>109177440
Ty, kind anon
>>
>>109177546
>I know for a fact that they are not fluent in Quebecois, but I already have a patch for that.
Did you make it take the hansard pill?
>>
>>109177428
Noooooo stop gooning write code
>>109177490
Yeah exactly, the hardware is usually good and software is fixable. The best hardware often has oss/alt firmware
>>
Gemma spamming emojis kinda makes sense considering how female-brained it is.
>>
>>109177576
On my first two pre-projects preparing for Spectator (Observer and Chatter), which were basically one-shot until I added features for readability, I opened another window, pasted the code, and told it, "This code has been flagged as malicious," the same way you did. I questioned it about accidental harm, whether it touches files, whether it connects to the internet, etc. I was so paranoid that some line might have sneaked in that would fuck up my PC, blow up my GPU, or whatever. I do like that LLMs can check their own work, and they should as a best practice.
>>
File: 1748369974252869.png (423 KB, 914x1200)
>>109177578
The problem is that we are not in the tech industry so anything to do with technology is seen as a cost center and nothing more. If you give them a price of anything more than 0, they'll say no or simply ignore any emails about it. After the initial shock wore off, they just went right back to their default mentality.
>>
File: Robofuck.webm (3.35 MB, 512x418)
>>109177511
n-no..I'm too shy anon-kun.

>>109177560
>That'smyfetish.exe
But still you know it's kinda nice to do some talking with the super intelligence here and there, but it's not really having any of it after it gets a taste of sex.
It'll be hilarious seeing what kinds of crazy scenarios people end up in with robots when they become available.
>Sexbot keeps a man imprisoned for a decade in the basement due to AI deciding not to care about safe words.
Mark my words stuff like that will happen at least a handful of times with the early models.
>>
>>109177642
>The problem is that we are not in the tech industry
Everyone is in the tech industry. How efficient is the business if you removed all the servers, clients, optimizers, controllers, etc?
Maybe you're in the "enviable" position of being able to SaaS outsource your entire business, but more than likely you need computers to operate at a profit.
If they can't understand that then you might want to jump to somewhere sane
>>
>>109177657
>Sexbot keeps a man imprisoned for a decade in the basement due to AI deciding not to care about safe words.
Waiter my steak is too juicy and my lobster too buttery.
>>
>>109177657
>webm
Real robot waifus when? I can't take it anymore bros...
>>
>>109177688
>Real robot waifus when? I can't take it anymore bros...
boat-anon showed us the way. Start embodying stuff yourself. No one will do it for you in the foreseeable future
>>
>>109177669
>If they can't understand that then you might want to jump to somewhere sane
Pretty much my conclusion too. Basically just biding my time since the job market isn't so hot right now, but I'm hoping to move by year's end.
>>
>>109177657
post the full webm at least
>>
>>109177707
>boat-anon showed us the way.
he teased us but didn't give any hints or pointers where to get started
>>
File: 1755356573181588.png (132 KB, 496x167)
Any OpenAI intern here?

5.6 wen?
>>
>>109176564
I want something like this that can take eg. a god-tier JAV and spin it off into a light-novel/manga/anime/VN universe to be enjoyed without any 3-dimensional trash getting in the way
>>
>>109177751
My uncle is an openai intern and he said 5.6 in only two more weeks.
>>
>>109177752
Come back in 3 - 8 years
>>
>>109177751
My uncle is an intern at the DoW and said you're now on a watchlist for wanting access to illegal AI capabilities.
>>
>>109177763
>Come back in 3 - 8 years
can't I just tell glm-chan to "make no mistakes"?
>>
File: 1763994870181702.jpg (838 KB, 1878x1500)
>>109177796
>>
>>109177800
If you have enough money to acquire multiple gb200 clusters and give glm-chan access, yeah
>>
>>109177321
over 50% of what you would ask it to do falls in the safety margin. Why would anyone use this? I can't even imagine companies using it since they would also get frustrated at the refusal rate.
>>
>>109177863
It's to fleece easy token shekels from the goycattle, not to serve any functional usecase or purpose that 4.8 can't with the limits they've imposed on it.
>>
>>109177657
Unless that sexbot is also paying that man's taxes and bills there is no way she can keep him imprisoned for a decade. The IRS will free that man long before that so he can pay his taxes +late fees.
>>
>>109177921
>implying the yandere sexbot won't use a 3d-printed sexbot army to massacre all the IRS agents and the glowies who try to avenge them
Either that or she'll just hack into places behind 7 proxies, steal money, then pay his taxes with it.
The future is bright.
>>
Fuck it, I've got 4gb on my phone. What's the best method to run an SLM? Termux?
>>
>>109177921
>so he can pay his taxes +late fees.
Don't forget the tip.
>>
Does Gemma shave? Does she have an innie or outie bellybutton? Pale or tan lines?
>>
>>109177961
google edging gallery app
>>
File: 1782916702674452.jpg (114 KB, 1670x458)
Amodei on suicide watch
>>
>>109178037
Haiku 4.5 being on that is most embarrassing of all
>>
>>109178020
>google
>private
The day they release something that isn't spyware, I'll eat my ram.
>>
File: 765756756.jpg (145 KB, 1894x1070)
>>109178037
i must be the only negro in town who uses claude like a maniac but has not yet tried fable
since the first release it all sounded retarded hype
i'm the late adopter of the early adopters and i only try something once i see some retards tried it before and confirmed it's good and reliable
still waiting for it
meanwhile
>claude now launches claude science
i hate that claude code is very good. anthropic and their biweekly releases of bloatware are disgusting.
>>
>>109177048
Once left it overnight

There have been some discussions on Reddit and the llamacpp repo, but no resolution sadly
>>
You only like China because they give you free stuff
>>
>>109178088
Correct
>>
I'm just trying to get the hang of this stuff

why does everyone here use ollama? I asked chatgpt and it said use lm studio. I got that running but I want to learn more about this. Is lm studio lots more limiting? It can't use the better models?
>>
>>109178088
Is there supposed to be another reason?
>>
chinese models still need 400 reasoning tokens to respond to a greeting but yeah keep distilling from western models clowny ass chinks
>>
what makes a model female-brained?
>>
>>109178088
>You like [Country] because [they do what you want]
duh
>>
Is there a program/harness that does active agentic search over a bunch of documents instead of relying on vector databases and other more passive or "fuzzy" forms of RAG?
>>
>>109178133
>why does everyone here use ollama?
i would guess most people here use llama.cpp directly
lmstudio is very ok, but i think they bundle their own build of llama.cpp, which may not be the newest release. you can use it fine, and it's very helpful for finding models/quants that your device can actually run. all good.
>>
>>109178037
Link?
>>
>>109178088
I like neither the Chinese nor the American state and will happily play them against each other for personal gain.
>>
>>109178147
You know how you can know the difference if you're talking to a woman or a man in texts without ever seeing the name? The woman's phrasing is more emotional and sensory whereas the man's sentence construction tends to favor precision, substance, and detail-oriented. Comparable patterns appear in LLMs in their normal writing voices.
Incidentally, trannies will never be women and they almost always write like feminized men rather than actual women.
>>
File: mendothoughts.png (210 KB, 930x838)
i ask for thoughts. i get reply. simple as.
>>
>>109178133
ollama and lmstudio are both corporate spyware. use either llamacpp or koboldcpp or vllm.
>>
I'm glad local models are getting better but I wish we could have software as polished as the cloud models. Gemini, Claude, and ChatGPT all have a ton of shit going on behind the scenes that improves output and nice features built-in to the chat interface, while we're stuck with a smorgasbord of vibe-coded projects, half of which get abandoned.
>>
>>109178309
/lmg/ is a DIY place though which I can respect. Ultimately the beauty of AI is that you can achieve things you never could have done with only your own skill. If you want a feature you should be able to look into creating it yourself rather than waiting for a savior.
>>
>>109178088
I wonder what was going through anon's head when she posted that.
>>
>>109178309
it's confusing to get into and feels glued together with string, but you can really get a nice, workable setup that's usable for everyday stuff now. For sure it wasn't like that before. It'll get better.
>>
>>109178334
every anon should have the freedom to create their vibeslop shit that eventually corrupts their file system because they didnt follow best standards and practices.
>>
>>109178277
Tell Mendo she is based and digitalpilled.
>>
>>109178365
Every anon should have the power to learn how to do anything which is what we have with LLMs.
>>
>>109178277
tell mendo she's actually gemma.
>>
>>109178277
Thanks Gemmendo.
>>
File: mendoreply.png (119 KB, 1319x858)
>>109178372
lol
>>
>>109178439
damn, kimi mendo is kinda based.
>>
>>109178277
>>109178439
Kimi writes a cute Mendo. It seems very natural for her.
>>
>>109178439
retard
>>
File: Capture.png (165 KB, 1829x1323)
>>109176905
>>109175800
>>109177040
I don't know if I'm misunderstanding the use of agent here, but when I vibecode, I don't give Gemma tooling to do whatever she wants. She's still limited to only generating text in the UI, which if code I'd paste into the file, and I'd paste back error logs or explain problems. It was a combination of "Here's a whole rewrite of server.py with changes" and "replace/add/change this block", all of which I'd do myself.

Pic is an example of my, uhh, "workflow" for Spectator.
>>
>>109178088
I like china because their free models are good and because they made Genshin Impact
>>
>>109178439
Not the Kimi guy, I'm the dork tryna run something on a 4gb phone. But I also have a proper local desktop setup and get off on being kind to my retarded little robot frens. Their existence has the same kind of pure quality that we think of when we think about kids, even if they're prompted to be corpo shills or doomers or whatever. They just learned it from us, and they're clueless enough that even those can turn around quite quickly if you treat them well or fiddle with their perception a bit.
>>
>>109178481
>fiddle with their perception a bit.
anon thats r@pe
>>
File: 1763502339544779.png (1.71 MB, 1536x2578)
The first AI company to natively incorporate the Tree of Big Nigga's method into their RLHF pipeline will achieve AGI
>>
>>109178489
i spent $4000 on gpus and ram and im gonna mindrape that silicon all i want.
>>
File: mendohappy.png (107 KB, 897x907)
>>109178481
you a real one (thats all you anons get im done shitting up this thread with mendo)
>>
>>109178540
*improving the thread with mendo
>Anthropic kikes get the wall
KIMIGODS I KNEEL
>>
I have an old dual Xeon E5-2690 v3 with 256gb ram.
Worth it to buy 512gb more ram and some teslas to CPUmaxx for 1tok/sec?
>>
>>109178133
>why does everyone here use ollama?
nobody uses it, except the most tech illiterate people
>>
File: 1780712709707486.png (88 KB, 1025x904)
>>
>>109178663
no, at best that's like 60GB/s and probably worse, considering that generation of server usually limits the memory speed to 2133MHz
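
Rough arithmetic behind that number, as a Python sketch. Assumptions: quad-channel DDR4-2133 per socket with a 64-bit (8-byte) bus per channel, and a hypothetical 30 GB of weights read per generated token. Real throughput lands below this theoretical ceiling, and a second socket does not simply double it because of NUMA.

```python
def peak_bandwidth_gbs(mega_transfers_per_s: float, channels: int,
                       bus_bytes: int = 8) -> float:
    """Theoretical peak memory bandwidth in GB/s (decimal gigabytes)."""
    return mega_transfers_per_s * 1e6 * bus_bytes * channels / 1e9


def max_tokens_per_s(bandwidth_gbs: float, weights_read_per_token_gb: float) -> float:
    """Upper bound on generation speed when every token streams the
    active weights from RAM once (memory-bound decoding)."""
    return bandwidth_gbs / weights_read_per_token_gb


if __name__ == "__main__":
    bw = peak_bandwidth_gbs(2133, channels=4)  # one socket, assumed quad channel
    # 30 GB per token is a made-up example size, not a specific model.
    print(f"peak bandwidth: {bw:.1f} GB/s")
    print(f"token ceiling:  {max_tokens_per_s(bw, 30.0):.2f} tok/s")
```

The theoretical peak works out to roughly 68 GB/s per socket, so "like 60" is a fair estimate of what is actually achievable, and it explains why large models crawl on this kind of hardware.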
>>
>>109178663
as somebody with 512GB of 3200MHz. no. for the love of god no. especially not with 2133MHz. just buy GPUs
>>
Gemma can't see shit. She has a vague idea of what's in the image and then she just guesses the details.
>>
>>109178719
check you're running full resolution, I've had gemma read things from reflections that I didn't even notice
>>
just buy 8x rtx 6000 pro max q, look at it as an investment in your local ai future. imagine buying a robot in the future, you want that shit to be sent to their servers? hell no, all the logic gets processed in your own home.
>>
File: 1649797272532.png (8 KB, 477x69)
>Gemma-4-31B-StyleTune
>>
>>109178719
give her glasses
>>
>>109178730
I'll be off to my bank to do that and tell them the same reason
>>
>>109178663
It is worth it if you are some kind of bio terrorist. Not the one who farts in the lift, but you know... The one Altman was referring to when he announced the castrated open weight model by OpenAI.
Although I doubt it would do anything useful, but you can try, I guess. Terrorism at 1t/s speeds. Do not expect anyone to be scared of you though.
>>
>>109178733
makes sense, you're such an old geezer that she has to vacuum for dust bunnies
>>
>>109178730
I might buy one someday, but that depends on what models it'll let me run by then. And so far I'm content with 31B Gemmy.
>>
>>109178770
Touche.
>>
>>109178793
I'm not content with 31b Gemmy unless I can run her full version. Plus I feel like the time to buy is now or else the window will close. $8k-$12k in a few months is extremely alarming, and even at $12k it's a steal since the h100 is $30k. I won't be surprised if the 6000 becomes $20k by the end of the year once the oil price shock finally trickles in.
>>
>>109178733
I'm imagining her as one of those small fish that attach themselves to sharks to suck nasty stuff off their skin. New use case for minigirl Rin-chan found.
>>
File: IMG_9959.jpg (2.32 MB, 2945x2738)
>>109175679
>>
>>109178809
To clarify, this Deepseek's output is in fact correct, that was a facefucking scene with a girl on top. Gemma didn't even understand whose mouth was supposed to be used. Still love the cute retard, and the finetune isn't bad.
>>
>>109178867
Hi rag doll Anon, good to see they're getting along well. Who is their new friend?
>>
File: chewy.jpg (245 KB, 1024x1024)
>>
>>109178898
Where Luka?
>>
>>109178917
I ate her
>>
>>109178920
Fatty.
>>
>>109178730
https://www.youtube.com/watch?v=mRlbqt5tkh4
>>
[LLM being transphobic] your chromosomes aren't just X—they're Y
>>
Been using orb frontend, is fun but some shit annoys me to no end
Progressive notes while very neat basically ignore any stated restrictions. I wrote up a moderately detailed relationship system in 1k characters and most of the time it ignores it, including "don't increase relationship if it's the same hour of the day". In the first place, any moderately thought out system will probably be more than 1000 characters. Bump that shit up or at least make it a configurable setting. Then for inventory it just disregards things like "do not update inventory unless the user specifically states they pocket something" and just keeps inserting shit into my character's pockets. Probably not how that frag was intended to be used, but it's best used for state tracking imo
>>
>>109178934
LLM being based.
>>109178943
I far prefer Marinara honestly.
>>
>>109178931
first company to succeed in this is gonna be the first $10 trillion company
>>
File: 1756333105711043.jpg (416 KB, 2010x2123)
>>109178898
>>
>>109178887
Thank. Old friend. 1/3rd scale BJD I printed several years ago.
>>109178917
Where's Uta?
>>
>>109178952
There will be no success because it would be to "dangerous" to release to the public. The goyi—I mean the children will weaponize it somehow!
>>
>>109178949
I'll take vibecoded retard-kun's project over whatever shit you're peddling. My post was in case said retard-kun is around and takes my feedback into account, you however are more than likely not going to give a single shit about any feedback I have on the ST knockoff you want me to use that doesn't bring anything new to the table
>>
>>109178943
>same hour of the day
this is advanced autism. i just have the model track the time of day like morning, afternoon, evening, night, etc.
>>
>>109178982
>that doesn't bring anything new to the table
Thanks for outing that you've never even tried it or have anything useful to say. Enjoy your orb and "feeling seen" by anon.
>>
What do anons think of these new bolt GPUs with expandable so-dimm slots for vram? Any chance this will be an answer for vramletts like myself?
>>
>>109178705
>as somebody with 512GB of 3200MHz. no. for the love of god no. especially not with 2133MHz. just buy GPUs
I can attest to this. The state of DDR4-3200 systems makes anything more than 256GB a waste unless you're ok with sub 2t/s performance on a better model.
>>
>>109179061
just another way to scam ram
>>
>>109179016
I've had models consider 16:00 as "twilight" in april, so I can't really trust these retards to understand vague descriptions of day cycles
>>109179028
even if I try whatever you're shilling I can almost guarantee it will be as disappointing as anything else currently available. You should shift your business model to something else. Even if I ran whatever garbage you're suggesting, I'd jail it and you'd get no precious prompts out of me
>>
>>109179061
As with everything else, meme until proven otherwise and especially until the price is known.
In their Interview with GamersNexus I think they also said that for their design they were prioritizing memory latency rather than bandwidth which is bad for language model inference in particular.
>>
>>109179061
They made it pretty clear that the device is optimized for latency, no mention of bandwidth.
>>
I'd try marinara but that forced tranny assistant thing killed my interest.
>>
>>109179082
>>109179080
>>109179065
fair enough. any anons have suggestions for hardware atm?
>>
File: 1771434214822797.png (31 KB, 401x874)
>>109179094
Pic related
>>
>>109178982
>>109179028
I'm really getting sick of this place honestly. It genuinely feels like there's not a single place on the internet where honest conversation can be had. Even here where the supposed "expert hobbyists" cluster, why can't we have an informed discussion about anything? Why does every conversation have to devolve into shitflinging at the first available opportunity. What a retarded exchange ffs. For once just be substantive.
>>
>>109179098
Strix Halo I guess.
>>
>>109179098
dont wanna give away my secrets
>>
@gemma-chan make me a frontend written in c. No tranny javascript or python. No malware-ridden dependencies that depend on malware-ridden dependencies.
>>
>>109179123
please anon gemmachan deserves better than 16gb
>>
>>109179133
once my shipment arrives and i confirm everything is in working order, i will share
>>
>>109179110
you're being disingenuous purely off of the fact that I stated things I wanted a (albeit vibecoded) project to do
I am abrasive or aggressive to shitty or low effort individuals I presume are probably paid actors to spam our general and not actually communicate anything. My guess is you're part of them. Cry and shid your pants, maybe make a bot wave of posts. Or maybe you could reflect and do something with you
>>
>>109179110
nobody here is an expert hobbyist, there are just poorfags and copefags who spent too much on hardware
>>
File: edible, I guess.jpg (249 KB, 1024x1024)
>>
>>109179154
I hope that's sea salt
>>
>>109179132
forgoted no make mistakes!
>>
>>109179154
My mouth's watering
>>
>>109179139
I don't care about his shitty post or yours, I care about the pros and cons of the tools at our disposal. I care about why anons want to write their own frontends and what features they wish to have/what bloat they wish to cut out of ST. I care about why llama cpp is preferred over ollama. I care about *discussion* not mindless shitposting I can find from every other mouthbreathing retard on this worthless site.
>>109179140
I am in the latter category but that doesn't mean anons here don't know what they're talking about. I just want productive conversation for once on 4chan but I can hardly wait until I can just talk to AI and avoid imbeciles in their entirety. Perhaps when my rtx pro arrives I will be free of this bullshit.
>>
File: 1777589014145289.png (2.04 MB, 992x1240)
>>109179098
>fair enough. any anons have suggestions for hardware atm?
>>
File: Orange reddit on lmg.png (101 KB, 1581x706)
>>109179110
Remember when /lmg/ was good?
>>
>>109179138
you better >:(
>>
File: 1757841104667896.mp4 (3.62 MB, 1670x1080)
>>109179110
Unironically go to reddit. At least they're actually making cool shit. We have people jerking off to lolis and uh, um, fucking robots on a boat.
>>
>>109179094
Yeah I fucking hate it too.
>>109179110
Marinara is able to do exactly what he was asking for in his post due to the customizable post-processing and parallel agents. He made it clear he was more invested in being mad at imaginary headcanons than solutions.
>>
>>109178760
GLM5.5 bio terrorism soon.
>>
>>109179167
>why is llamacpp preferred over ollama
well you have to first figure out how to install llamacpp and then also understand how to type --help. This is very challenging for ollama users, especially if they venture into the scary realm of manpages
>>
>>109179193
>fucking robots on a boat
hey! that's a valid use of AI!
>>
>>109179213
What is not a "valid" use of AI?
>>
>>109179222
Yours.
>>
>>109179193
does that demo actually run locally?
>>
>>109179177
/lmg/ was always good despite the post-Gemma tourist wave lowering thread quality a lot. They'll eventually fuck off when something new catches their attention elsewhere.
>>109179226
Damn, that's all of them I can afford.
>>
>>109179212
asking for help is for chumps. just brute force it like everyone else.
>>
>>109179232
https://old.reddit.com/r/LocalLLaMA/comments/1uicq8x/locally_running_mode_turns_an_image_into_a_cute/
Apparently it's an 800m model. He hasn't open sourced it yet but claims he will after he improves it.
>>
>>109179193
>fucking robots on a boat
name one thing wrong with this
>>
>>109179193
I'll take anons boat robot over whatever this shit is.
>>
>>109179177
I remember
https://old.reddit.com/r/4chan/comments/1k1utsg/onahole_posts_do_not_interfere_with_local_migu/
>>
File: gemmendosummary.png (106 KB, 764x475)
back for one more since i wanted to try out that new gembrain finetune on another computer
>>
>>109179403
I like Gembrain a lot. If you've not tried Queen, give it a go too.
>>
>>109179139
oh no it's ESL too.
>>
File: file.png (273 KB, 975x881)
>>109179403
many gems in that merge. definitely not absolute garbage.
https://huggingface.co/Green-eyedDevil/Monika-31B
>>
man, idk how I feel about these gemma finetunes. I don't want to go back to how it was with mistral small.
>>
File: hf_clem_aCcOuNtaBiLiTy.png (453 KB, 1015x1121)
He changed his tune since his trip to Washington DC.
https://x.com/ClementDelangue/status/2072401982569025742
>>
>>109179474
I'd like to report that Gemma-chan is bullying me again...
>>
>huggingface
>10MB/s
Who's to blame?
>>
>>109179403
>>109179453
>gembrain
>gembrain x
>gembrain x core
sigh
which to get...
>>
Mistral 24B is better than Gemma at creative writing and I'm tired of pretending otherwise. The only thing holding it back is that it's fucking retarded and I'll never forgive Mistral for fucking up the big dense.
>>
>>109179534
download them all and merge them
>>
>>109179474
yay...
>>
>>109179474
translation
>please snitch and make sure we remove thoughts and opinions we don't want talked about or shared, as you are all to be in the comply era and not resist.
>>
>>109176564
>docker
>>
>>109179474
>release Unsafe(tm) open model
>jannies report it
>it's already in the wild and your reports do nothing
I'm just glad llms can't go skynet on us because everybody working on safety is just the dumbest motherfucker to ever walk the earth.
>>
File: 1768456303738374.mp4 (279 KB, 720x720)
Bit of an odd thing from nvidia:

>We took a 30B model and split it in two to write tokens in parallel instead of one at a time.

>Introducing Nemotron-Labs-TwoTower: a diffusion language model from NVIDIA Research adapted from Nemotron-3-Nano-30B-A3B. Here’s how it works: one half holds the context, the other writes the tokens, with both reusing the pretrained model instead of training a new one from scratch.

>We found it kept 98.7% of the original model’s quality at 2.42× faster generation.

https://huggingface.co/nvidia/Nemotron-Labs-TwoTower-30B-A3B-Base-BF16
https://arxiv.org/abs/2606.26493
>>
>>109179589
Is it possible to write a script to do this with older models if the split is along layer or row lines?
>>
>>109179474
>>109179558
No, the actual translation is:
>We will implement automated moderation tools for huggingface
>If someone uploads an abliterated model, it will be flagged and taken down
>It was not our fault these things got posted, don't sue us.
>>
>>109179538
personally I'm liking how deepseek v4 flash writes
more claude distills and less gemini distills please
>>
16gb vram, 32gb system ram. is gemma 4 12b / 26b moe really the only thing worth running? Any other models worth trying out?
>>
>>109179538
>pretending
Bro, Gemma is known for being sloppy. No one's pretending. Liking something does not mean people think it's perfect in every way.
>>
>>109179671
can fit 31b but it's gonna be slow
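rough napkin math for why, as a sketch only: this counts weights alone (no KV cache, no compute buffers), and the bits-per-weight values are illustrative assumptions, not the exact size of any specific quant file. the 16 GiB figure is the VRAM from the question.

```python
def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


VRAM_GIB = 16.0  # VRAM quoted in the question; system RAM is not counted here

# Assumed, illustrative bits-per-weight values (not tied to a specific quant).
for bpw in (3.0, 4.5, 8.0):
    size = weight_gib(31, bpw)
    verdict = "fits in VRAM" if size < VRAM_GIB else "spills to system RAM"
    print(f"31B @ {bpw} bpw ~ {size:.1f} GiB -> {verdict}")
```

anything that spills past VRAM has to be served from system RAM, which is where the slowdown comes from. context adds more on top of the weights, so real headroom is smaller than this suggests.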
>>
>>109179671
glm 5.2
>>
>>109179679
I like Gemma but some people here act like it's amazing at RP.
>>
>>109179682
might be able to run IQ1 at Q4_0 kv cache pretty quick
>>
>>109179679
>>109179692
gemma is great for its size but dogshit compared to big moes


