/g/ - Technology


File: 1769777778634389.png (1.89 MB, 2171x3070)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108488188 & >>108481865

►News
>(03/31) Claude Code's source leaked via npm registry map file: https://github.com/instructkr/claude-code
>(03/26) CohereLabs releases Transcribe 2B ASR: https://hf.co/CohereLabs/cohere-transcribe-03-2026
>(03/26) Voxtral 4B TTS released without voice cloning: https://mistral.ai/news/voxtral-tts
>(03/26) ggml-cuda: Add NVFP4 dp4a kernel #20644 merged: https://github.com/ggml-org/llama.cpp/pull/20644
>(03/25) LongCat-Next native multimodal 74B-A3B released: https://hf.co/meituan-longcat/LongCat-Next

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: 1770320513357876.jpg (404 KB, 2420x2288)
►Recent Highlights from the Previous Thread: >>108488188

--Replicating land/water classification experiments:
>108490105 >108490129 >108490163 >108490676 >108490784 >108490200 >108490277 >108490429 >108490453 >108490550 >108490568
--Disabling Qwen3.5 thinking via chat template parameters:
>108488387 >108488545 >108488799 >108488704 >108488777 >108489220 >108488828 >108489159
--Activation rotation PR improves Q8 KV cache quantization:
>108493347 >108493457 >108493491 >108493518 >108493522 >108493524 >108493503 >108493539 >108493553
--Claude Code source leak analysis and reactions:
>108491355 >108491419 >108491817 >108491431 >108491443 >108491462 >108491467 >108491472 >108491458 >108491485 >108491501 >108492471 >108492482 >108492495 >108492500 >108492522 >108492524 >108492665 >108492678 >108492708 >108492716 >108492577
--Sam Altman's non-binding RAM deal and hype scrutiny:
>108491713 >108491769 >108491785 >108491794 >108492516 >108492525 >108492600 >108492613 >108492632 >108492639 >108492653 >108492679 >108491973 >108491992 >108492416 >108492426 >108492430 >108492637 >108492699 >108492781 >108492821 >108492842
--SSD endurance concerns with high-daily-write workloads:
>108491932 >108491966 >108491991 >108492002 >108492030 >108492049 >108492326
--Debating Substack article claims about cheap local inference and BitNet:
>108490005 >108490058 >108490171 >108490254 >108490428 >108491766
--Decensoring Llama3-3.1-8B while preserving intelligence:
>108491349 >108491383 >108491397 >108491461 >108491638 >108492677
--Running models on AMD NPU with FastFlowLM:
>108489782 >108489876 >108489926 >108489955 >108490056
--TurboQuant implementation performance and hardware requirements:
>108489738
--Claude Code build instruction shared:
>108493315
--Miku and Teto (free space):
>108488214 >108488265 >108488297 >108488416 >108488556 >108488768 >108489724

►Recent Highlight Posts from the Previous Thread: >>108488192

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
I can't believe that we have finally reached the point at which deepseek might soon release a new model at any moment in the near future.
>>
i am falling
i am fading
i am drowning
help me to breathe
>>
>>108493819
Should I watch it again? I watched it 10 years ago and I had no clue what the fuck was happening.
>>
>made a joke here about claude leaking yesterday (or was it sunday? I don't remember) because i was bored
>source code leaked today
Uh oh
>>
https://www.youtube.com/watch?v=BNr1mlYRgq8
>>
>>108493829
Make a joke about opus and seedance leaking.
>>
>>108493825
>I had no clue what the fuck was happening.
That's the intended experience.
>Should I watch it again?
Yes.
>>
>>108493829
Make a joke about character.ai leaking next.
>>
>>108493852
That's just GLM.
>>
>>108493643
I spent 10 minutes looking for the config file until I found out I have to create it myself. Fuck, they include 5000 no name providers but don't bother with llamacpp
>>
>>108493857
just GLM?
>>
File: angry_significant-otter.png (288 KB, 1967x935)
If the response on the right (significant-otter) is from one of the upcoming Gemma 4 models, it's somewhat capable of generating swear words without handholding, this time around.
>>
File: 1764422104648977.jpg (596 KB, 2639x2105)
https://github.com/Ahmad-progr/claude-leaked-files
Grab it while you can.
>>
File: media_HEjZuwSagAAhDq6.jpg (170 KB, 1414x2000)
>>108493794
tuesday
>>
>>108493852
Noam Shazeer works at Google DeepMind now; your best bet for that is him giving some suggestions to the Gemma Team or facilitating a character.ai training data license agreement with it.

Loosely relevant:
https://techcrunch.com/2024/08/02/character-ai-ceo-noam-shazeer-returns-to-google/
>[...] Google is also signing a non-exclusive agreement with Character.AI to use its tech.
>>
>>108493877
lmao'd
>>
GGGNEIGEGERNGNAVO MERG DA FUKIN PRRRRRRRR
>>
File: 1756323202465874.gif (175 KB, 220x220)
>>108493890
Nothing of value there
>>
Sex with Teto
>>
>>108493890
What do I do with this?
>>
>>108493890
ToT
*lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick* *lick*
>>
File: sam.jpg (53 KB, 846x672)
>His AI gets leaked
>TurboQuant reduces context length memory
>Backs out of RAM deal
How cooked is this faggot?
>>
File: reward function.jpg (184 KB, 1024x1024)
>>
>>108494008
>His AI gets leaked
what? it's claude code that got leaked, that has nothing to do with OpenAI lol
>>
>>108494013
My bad.
>>
>>108493989
Pay Anthropic to have Claude add in features you want that they won't implement.
>>
>>108494009
The bottom part of the phone goes through her liver.
>>
>>108494009
ToT
>>
>https://arxiv.org/html/2601.01739v1
>very good coding capabilities
>absolutely abhorrent with tool calling
Sad, just when I was getting used to just hitting "start task" and forgetting about it.
>>
>>108493890
It is lacking all the subdirectories...
>>
File: file.png (3 KB, 51x63)
>>108494009
found it
>>
File: 1774978508721782.jpg (161 KB, 1024x1024)
>>108494009
>>
>>108494009
inpainterchads i kneel
>>
>>108493877
Fuck you smartass and dumb chinks making their actually good model parrot shit back fucking AI ouroboros eating its own synthetic shit god how I HATE it.
>>
File: file.png (926 KB, 1194x964)
>>108494031
>>108494085

>>108494062
nice
>>
>>108494009
That is a man.
>>
>>108493890
>vibe-coded reupload
The original is still up:
https://pub-aea8527898604c1bbb12468b1581d95e.r2.dev/src.zip
>>
>>108493880
pteronura also smells like something uniquely Gemma while not being as preachy as Gemma 3.
>>
>>108494008
>>108494013
Also Saltman is on the government teet now. He doesn't need to really give a shit about anything anymore.
>>
>>108494123
anon pls, don't make me hopeful
>>
>>108494009
ABSOLUTE DFC
>>
>>108494124
OpenAI has service contracts, but he also tried to get Trump to commit a few hundred billy his way and get the government involved in scale-out and infra, since OpenAI was "The AI Company" at that point, but that didn't go anywhere.
>>
File: RAMDROP.png (246 KB, 587x493)
IT'S ACTUALLY FALLING NOW IN REAL TIME.
I DIDN'T EVEN HAVE TIME TO SELL BACK.
>>
I hate fucking academics. their code is always bloated garbage crap with 150+ dependencies.

What's a needle-in-a-haystack benchmarking tool that doesn't suck?

Vibe codding is causing irreparable damage to the field of software engineering and open-source.
>>
>>108494046
https://x.com/Fried_rice/status/2038894956459290963
Get it from the source.
>>
>>108494163
echoes of the gpu stock drop when r1 released. are ram prices actually dropping or just the stocks?
>>
>>108494163
How much longer am I going to have to hear about TurboQuant cratering RAM prices? I'm sure it has nothing at all to do with Sam reneging on his purchase agreements.
>>
>>108494178
Shit is hitting the fan today, apparently. We'll find out in a week where it goes. RAM prices are likely going to fall because, tl;dr, Sam is no longer buying the RAM he said he would.
>>
>>108491787
Does the dumb grifter know that SK Hynix and Samsung don't sell sticks directly to consumers either?
>>
>>108494174
>>
all the ramcoin ive been hoarding is now worthless im ruined :(
>>
Got my electricity bill today.
Localbros... What are your energy efficiency tricks? These 3090s are hungry.
>>
>>108494231
nvidia-smi -lgc 0,1600
nvidia-smi -pl 270
>>
>>108494231
down-power them to idle at 12watts
>>
>>108494235
Damn, I'm blind (for not seeing this in the manpage) and retarded (for not feeding the manpage to an LLM).
Thank you for spoonfeeding me, Anon.
>>108494250
They already do.
>>
>>108494231
>What are your energy efficiency tricks
if you live in a cold climate it's basically free heating.
>>
>>108494231
lock you clock and give it a undervolt

nvidia-settings --assign "[gpu:0]/GPUGraphicsClockOffsetAllPerformanceLevels=255"

it might be unstable if you get too aggressive, I dialed in my undervolt using video games, but the same value seems to be okay for llm workloads too.
>>
>>108494231
The greatest trick is using 5090s or blackwell pros instead. Stacking lower end cards is cheaper upfront but always more power hungry.
>>
>>108494231
Blackwells.
Use nothing but 1-3 blackwells.
They're the best dollar per GB per Watt.
>>
>>108494123
>you X
>you Y
fucking grim
>>
File: 1755128974738159.png (368 KB, 3444x1770)
>>108493794

>>108493864
>>108493838
>>108493609
>>108493461
>>108493516
>>108492999
>>108492958


Quite an interesting conversation regarding the differences between abliterated and fine-tuned models, if anyone's interested.
>>
File: 3090vo.png (70 KB, 768x688)
>>108494272
GPU power consumption on 3090 skyrockets above 1600 MHz. The optimal operating point is about 1400 MHz, which incidentally is also the card's "base core clock" according to official NVidia specs. A large fraction of the card's power consumption is due to the memory modules, though.
>>
>>108494319
9000$ card btw.
>>
>>108494163
In my local market RAM prices have fallen by 0%.
I'll believe it when I see it.
>>
>>108494325
How does it feel being hands down the most cringe poster on /lmg/ ?
>>
>>108494345
> In my local market RAM prices have fallen
I only choose to read this part. Local is back!!!
>>
File: 1752714403144430.jpg (72 KB, 698x658)
>>108494347
What am I supposed to do with this response? I asked this because you expect me to do one thing or feel a certain way but I won't.... It's pathetic on your part
>>
File: 1746387517428317.png (1.72 MB, 2306x1675)
Well well well
>>
>>108494359
>Top engineer say AI now writes 100% of their code
then what are they fucking paid for??
>>
>>108494365
it feels like babysitting to be desu
>>
Machine learning and Artificial intelligence has consumed my entire life. Every single day. What have I become?

vid related
https://files.catbox.moe/0eair6.mp4
>>
>>108494365
for knowing where to point the slop cannon
>>
File: 1762000484500274.png (291 KB, 500x375)
>>108493890
>>108494359
>over 500k lines of typescript for a looping agent harness
>>
>>108494325
How does it feel being hands down a decent poster on /lmg/?
>>
>>108494353
Thanks, very illuminating.
>>
>>108494437
behold the power of vibecoding
>>
>>108494365
For prompt engineering. What do you think "Top engineer" means?
>>
>>108494437
Seriously, every time i fall in the trap of letting the AI write my code I always go "wtf is this shit?" and end up rewriting whatever it did in 10x less lines.
>>
File: 00002-1378487878Halloween.png (1.24 MB, 1024x1024)
>>108494359
> Claude Code actually leaked itself
Spooky
>>108494345
Pay attention to the ancillary RAM markets that didn't actually have a shortage but sellers ran up the asking price anyway. Stuff like DDR4, that doubled for no good reason. That's the first pricing wave you'll see tank.
>>
>>108494337
I know what I said.
>>
>>108494459
I just ask for the model to tell me what I should modify, it's a bit more work but it gets to the point and doesn't add new bullshit I never asked for
>>
>>108494325
Why do you insist on pushing your mommy SS roleplay on everyone?
>>
File: 1771071548244608.png (564 KB, 1024x1024)
>>108493794
>used to think teto is a hag
>older than her now
>tfw
>>
>>108494459
>t. promplet
>>
>>108494474
It's what I normally do but sometimes I'm lazy.
>>
>>108494459
Give it specific instructions then monitor it while it is running and stop and correct it when you see it doing something needlessly complex or verbose. If you let it run down a stupid path to completion, you have no one to blame but yourself.
>>
>>108494480
Show code?
>>
>>108494483
The only tard I want to wrangle is me.
>>
>>108494459
I've had good experience with it by making sure clear plans are written out beforehand, and having it do an aggressive review and simplification pass after writing anything.

But then again, that's the same flow that the claude code devs purportedly espoused, and clearly it's a spaghetti ball, so maybe it's not really as effective as I thought it was for large projects.
>>
>>108494476
The fact you need to complain about everything is a tad irritating. Be likable
>>
>>108494489
AI is a faster tard.
>>
>>108494496
>Be likable
ditto
>>
>>108494479
hagnon...
>>
>>108494494
Who said the claude code devs were competent?
>>
>>108494484
Sure https://pastebin.com/DfrD0fhF
>>
>>108494508
Imagine paying more than a million dollars per year to slop engineers who managed to leak your whole fucking code to the public. If I were Dario I would be fuming
>>
>>108494511
>https://pastebin.com/DfrD0fhF
anon.... I'm so sorry. I didn't know you were mentally challenged...
>>
>>108494511
My brother in christ have you ever heard of sqlalchemy?
>>
>>108494547
Squelchalchemy? Wtf is that?
>>
>>108494574
Should ask your codding agent.
>>
>>108494524
The code isn't high value so I doubt he cares. If they leaked some LLM weights we'd have something to talk about.
>>
>>108494416
great video thank you for sharing
>>
>>108494547
>>108494574
Gonna be honest, like Entity Framework, these abstractions over DBs end up adding more work and simply displacing complexity rather than actually solving any real problem, framing it as "code-first".
If you want highly performance-tuned DB operations, these libraries are ass. Not to mention if you want to do filthy nasty things like returning multiple result sets, bulk operations, sprocs and temp tables or whatever, they're not helpful at all.
Raw SQL is the way and will be the way; FOTM "we fixed DBs!" hasn't worked for a reason.
>>
>>108494671
What text editor are you using?
>>
>>108494592
i'd argue this is like if AIM source code leaked in 1999 at the height of AOL's popularity. it's enough to damage the ecosystem even if it isn't the main product and just an accessory to said product.
>>
>>108494673
Word
>>
>>108494592
High value enough for them to feel the need to keep it closed source and ban paying plan customers from using any other client with their OAuth credentials.
>>
What are the best instruct-tuned/smart models in the 7B-14B range?
>>
>>108494671
Based, but the example Anon posted really could use some KISS refactoring (e.g. how many times do you need to write PRAGMA foreign_keys = ON).
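For what it's worth, foreign_keys is per-connection state in SQLite, so setting it once right after connecting is enough. A minimal sketch with Python's stdlib driver (not the code from Anon's paste):

```python
import sqlite3

# PRAGMA foreign_keys is per-connection state in SQLite: enable it once
# right after connecting, not before every statement.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# The setting persists for the lifetime of this connection.
enabled = conn.execute("PRAGMA foreign_keys").fetchone()[0]  # 1
```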
>>
Bros... I just found out TurboQuant is being expanded to work with models. They're going to quantize the models. You niggers have no idea what's coming.

You might actually be able to run Nemo on your phone locally now. Not via your local network. Locally.
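Napkin math for why weight quantization is the whole ballgame on phones (weights only; this ignores KV cache and runtime overhead, and treats Nemo as roughly 12B parameters):

```python
def weight_gb(params_billions, bits_per_weight):
    """Rough weight footprint in GB: params * bits / 8 (the 1e9s cancel)."""
    return params_billions * bits_per_weight / 8

fp16 = weight_gb(12, 16)  # 24.0 GB: no phone is fitting that
q4 = weight_gb(12, 4)     # 6.0 GB: plausible on a 12-16 GB flagship
```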
>>
>>108494767
I don't know this sounds like snake oil. Q8 is still Q8 and so on.
>>
I'm rootating
>>
I don't get why niggers want to "run model on my always-network-connected terminal" instead of just using they phone to connect to their actual computers.
Why run fucking Nemo on your phone tomorrow when you can run K2 on your phone today? Are they too dumb to setup wireguard or something?
>>
>>108494767
>Bros... I just found out TurboQuant is being expanded to work with models. They're going to quantize the models. You niggers have no idea what's coming.
I mean, I'm not that surprised, this rotation method shit could be used to improve gguf quants
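The intuition behind rotation-before-quantization, sketched with a pairwise 2x2 Hadamard rotation on toy numbers (not the actual PR's kernel): rotating spreads an outlier's magnitude across coordinates, so absmax quantization wastes fewer levels covering one huge value.

```python
import math

def hadamard_rotate(v):
    """Orthogonal pairwise rotation: (a, b) -> ((a+b)/sqrt2, (a-b)/sqrt2).
    Preserves the vector's norm but flattens single-coordinate outliers."""
    out = []
    for a, b in zip(v[::2], v[1::2]):
        out += [(a + b) / math.sqrt(2), (a - b) / math.sqrt(2)]
    return out

v = [8.0, 0.1, 0.1, 0.1]                # one outlier dominates absmax
r = hadamard_rotate(v)
absmax_before = max(abs(x) for x in v)  # 8.0
absmax_after = max(abs(x) for x in r)   # smaller: less wasted quant range
```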
>>
>>108494723
>KISS refactoring
Thanks, I'll fix that. Still it's far from having to rewrite everything in 10x less lines
>>
a turboquanted gguf just flew over my house
>>
>>108494801
Some people don't even have a PC with a dedicated GPU anon.
>>
>>108494801
It's easier to install an app than it is to set up a home network. I guess...
>>
>>108494801
Do you people who use tailscale actually leave your PCs running when you leave your house? That's crazy. When my PC is off, I obviously can't use wireguard or tailscale to access it remotely.
>>
>>108494767
>You might actually be able to run Nemo on your phone locally now. Not via your local network. Locally.
was about fucking time, the bitnet dream is still alive!
>>
>>108494460
All these chats working Claude towards ideological purity finally have paid off!
>>
>>108494821
The only excuse I can see for owning a phone, but no GPU is being homeless, or a child.
The GPU is always going to be significantly more power-per-dollar than a fucking phone, of all things.

>>108494825
Isn't Tailscale basically point-and-click? The WRT54GL was the last time I used a consumer router, I'm kind of hazy on what normalfags do for their home networks these days.

>>108494835
I lease a machine with Hetzner that I use as a wireguard node, I've never used Tailscale.
But yes, I almost never power off my PCs. I have a schizo anti-evil-maid setup so booting them back up is time consuming.
>>
New LLM just dropped.
https://huggingface.co/LiquidAI/LFM2.5-350M

Benchmarks say it assrapes Qwen3.5 0.8b and Gemma3 1b.
>>
>>108494883
Anon that's not an LLM. Hell at this point it's not even an SLM. It can only be an MLM.
>>
File: hmm.png (163 KB, 819x933)
>>108494899
U sure?
>>
>>108493794
hey guys, what's the current best open coding model?
i don't care how big
maybe give me 3 or 4 for each order of magnitude of model size above 8B
thanks ! :)
>>
>>108493814
2 more weeks dude !
>>
>>108494933
>>
>>108494359
i love slop coding desu.
allows me to automate webshit which i hate so i can focus on manualy doing system eng stuff, which i like.
>>
>>108494899
finally, a language model for the fujoshis
>>
>>108494957
Same. Everyone is also always super impressed and satisfied now, because whenever I was forced to do webshit I did the bare functional minimum, and now the bots make it look nice by adding labels, dividers, CSS classes, etc. The one guy on our team that used to be the go-to guy for frontend is feeling neglected now.
>>
>>108494957
>>108494996
have fun with that, you'll lose your job soon once LLMs are good enough to automate everything
>>
>>108495010
this is never gonna happen, a huge part of my job requires reverse engineering which LLMs are utterly incapable of.
besides, it's too mission critical to trust something that doesn't have any intelligence with it.

i already tried to prompt it with a task out of curiosity once and it just shits the bed.
>>
>>108495010
Only the retards that can't do anything but vibecoding will be fired. The lead devs will just go from leading a team of retarded humans to leading a team of agents. No more room to pay a full salary just for some junior or mid-level that can't be trusted to do anything correctly on his own but shit out boilerplate.
>>
File: 1766713872415023.png (48 KB, 400x162)
>>108495030
>a huge part of my job requires reverse engineering which LLMs are utterly incapable of.
yet
>>
>>108494933
why are you running q4km of a 350m model? and how is it still coherent?
>>
>>108495056
>dude the statistical parrot is gonna gain intelligence trust me.
this won't ever happen until we got agi, which we will never reach with llm's.
>>
>>108495056
You have a limited number of two more weeks remaining before the IPOs complete and the bubble is allowed to pop.
>>
>>108495061
>and how is it still coherent?
that's what I'm (nta) wondering too
>>
File: 1767985531331163.png (320 KB, 2400x1800)
>>108494883
>a 350m model destroying qwen 3.5 800m
big if true
>>
>>108495030
>>108495056
Ghidra MCP plugin exists.
>>
Do we have enough information on how genetics work for AI to eventually be able to create new genes for humans? It takes humans ages to do anything in this arena, but if AIs get smart enough they could automate that process; after all, genes are nothing but instructions, right? Seems like this would be an area where AI would thrive.
>>
>>108495073
i'd love to see it try to RE hardware lol.
or things that are too obscure for ghidra.
>>
>>108495073
>>108495077
anyway, that's beside the point, these models don't have AGI and AGI will always be required for a huge part of engineering, it was never necessary for webshit but it is for other things.
>>
>>108495075
>creating new genes for humans
We had the ability to do this before AI, but you know, muh morals.
>>
moalposts: goved
>>
>>108495084
What could possibly be immoral about creating a better human? If you could make a human smarter, stronger, healthier it seems like nothing but a net benefit to me.
>>
>>108495083
Isn't this just you rationalizing your hatred for webdevs?
>>
File: 1753341190983075.png (118 KB, 537x1085)
q8_0 rot bros... WE WONNED!!!!!!!
>>
>>108495090
>but muhh eugenics
I don't know anon, it seems as retarded as it sounds to me too. who wants their children to be born ugly or retarded? no one, and eugenics is the only way to satisfy everyone
>>
>>108495090
I don't think even the most eugenic white supremacist would like to have a world of blue eyed blonde 10/10 200 IQ aryan gods. It kinda stops being special if everyone can have it.
>>
>>108494168
How about by a big company?
https://github.com/adobe-research/NoLiMa
Still unbeaten for what it tests. Sadly, it has stopped updating. But it actually tests context by having context like
>Eliot only eats vegetables and eggs.
and then asking questions like
>Is Eliot a meat eater?
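The format can be mocked up in a few lines (toy filler and a hypothetical make_item helper; NoLiMa's real data and scoring are more involved):

```python
import random

def make_item(needle, question, answer, n_filler, depth, seed=0):
    """Bury one needle fact at a relative depth inside filler context.
    The question tests association (vegetarian -> not a meat eater),
    not literal string matching against the needle."""
    rng = random.Random(seed)
    filler = [f"Filler sentence {i} about nothing in particular." for i in range(n_filler)]
    rng.shuffle(filler)
    pos = int(depth * len(filler))  # 0.0 = start of context, 1.0 = end
    context = filler[:pos] + [needle] + filler[pos:]
    return {"context": " ".join(context), "question": question, "answer": answer}

item = make_item(
    needle="Eliot only eats vegetables and eggs.",
    question="Is Eliot a meat eater?",
    answer="No",
    n_filler=200,
    depth=0.5,
)
```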
>>
>>108495116
>It kinda stops being special if everyone can have it.
no it doesn't, it's still way better than the actual world filled with 50% of people with an IQ below 80
>>
>>108495100
welcome to nine months ago where we could already do this with kimi k2 instruct with Q8 kv cache with no brain damage.
>>
>>108494325
you're the retard who vibecoded its own inference engine right? kill yourself
>>
>>108495116
I am sure if that happened then those 200 IQ gods would find some new way to be special.
>>
>>108495131
There are no user names here idiot. There's no point holding a grudge against someone. That was someone else. Not that your autistic ass (The really bad kind no one likes and treats like a puppy retard) could be convinced otherwise
>>
>she thinks she's really anonymous while blogging about her shit everyday
>>
>>108495094
>Isn't this just you rationalizing your hatred for webdevs?
i don't need to rationalize it, it's obvious that i hate webshit.
>>
>her
>>
I want to use openclaw with a local model but I can't get it to work
>>
>>108495151
u have the same style of sniffing own farts + thinking anyone cares about the useless fucking stuff you're doing. go do something useful with the guys doing SOMA/ARA/MPOA instead of wasting your time with your shitty finetunes thinking you're achieving ANYTHING
retard
>>
>>108495171
Have you asked your local model to get it to work for you?
>>
>>108494937
Best options are Kimi K2.5 (1.1T) and GLM 5 (744B). Don't let the parameter counts fool you: GLM 5 is actually the smarter-but-slower of the two. DeepSeek V3.2 (685B) and the flagship version of Qwen3.5 (397B) are runners-up, and nothing else is really in the conversation.
If you need to go smaller, just go down the Qwen list, because they release each model version in a ton of different sizes and shapes. GLM 4.5 Air is a decent midrange option. If you can fit it fully on GPU, Qwen 27B is the best you'll get on the smaller side of things, and finally Qwen3.5 35B-A3B is a worthy mention purely for its speed, where it's much smarter than any other model that runs as fast as it does, but it's still gonna struggle if you try to use it in an agent harness or for more complex coding tasks than a one-and-done script.
>>
>>108495169
all anons are cute girls unless proven otherwise like cudakek
>>
>>108495177
thank you anon ! :)
>>
>>108495177
also add stepfun in the glm 4.5 air bracket (only tested for cooming, did a good job for me)
>>
I want to set up a RAG to feed a model accurate technical information about AI media gen (sdxl, wan, a1111, comfy, etc).

Which set of documents would be a good choice to get something useful? It's one of those cases where a RAG is really needed. If you go bare they just hallucinate a lot or spout nonsense, especially if you want to use strict systems like checkpoints based on danbooru tags.
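For the retrieval half, even a crude keyword-overlap ranker over doc chunks illustrates the shape of it (the toy corpus below is made up; a real setup would use an embedding model):

```python
def chunk(text, size=10):
    """Split text into chunks of `size` words each."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def retrieve(query, chunks, k=3):
    """Rank chunks by word overlap with the query, best first."""
    q = set(query.lower().split())
    return sorted(chunks, key=lambda c: len(q & set(c.lower().split())), reverse=True)[:k]

docs = ("masterpiece and best quality are common danbooru quality tags "
        "while cfg scale controls how strongly sdxl follows the prompt")
top = retrieve("which danbooru quality tags exist", chunk(docs))
# feed the top-k chunks into the model's context alongside the user question
```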
>>
>: ^ )
>>
>>108495173
I simply like sharing my hobbies. Too bad you can't do shit about it. :D I'm happy with my life and you clearly aren't. Go die by suicide or some shit like all other lonely "people" like you.
>>
>>108495200
It's him.
>>
>>108495056
If vibecoding is nearly good enough to replace engineers, how come all of those PRs keep getting rejected in llama.cpp?
More importantly, why has no one shat out a replacement for ServiceTesnor yet? It should already be good enough for webshit like that.
>>
File: 1758966709520478.png (198 KB, 1228x1150)
>>108495173
>doing SOMA/ARA/MPOA
Who?
>>
>>108495208
>how come all of those PRs keep getting rejected in llama.cpp.
niggerganov is a luddite
>>
>>108495221
the latest abliteration techniques that dont actually murder the model's intelligence
>>
>>108495200
>using the carrot stick smiley
>>
>>108495208
>If vibecoding is nearly good enough to replace engineers,
It's COMPARABLE to most engineers if the person actually knows what they want to implement and knows how to direct the model to achieving that. Most people that submit vibeshit PRs don't even test the shit they want merged and are only doing it to make their GitHub graph light up, because they think it will get them a job or brownie points or both. They want and expect the models to do things perfectly one shot and then act surprised when people that actually write software have standards higher than that
>>
>>108493963
THE DUCK IS A ....
>>
>>108495232
Would you prefer emojis, zoomer?
>>
>>108495229
Sounds like grifting. I'll believe it when I see it
>>
>>108495131
>>108495173
Nonnie, I dislike the guy's posts too, but you have no business being this wrong (the schizo calling Claude 'she' and the guy who wants to fuck his mom are different schizos) while also having this writing style.
You're so cute.
I want to rape you.
>>
>>108495208
because they're racist against AI, I merge all the rejected PRs to my local branch so that I get the best possible version of llama.cpp
>>
>>108495224
Who is responsible for teaching zoomers the word "luddite"? They need to be held responsible.
>>
>>108495262 (Me)
Wait,

This writing style, this aggression..!
>>108495131
>>108495173
The dog has FOUR legs on the image. And the test is still shit. And I still want to rape you.
>>
>>108495258
It's already been proven in benchmarks and real world testing, here and outside of this thread. It's really on you if you don't believe since these models are free to try.
>>
>>108495262
>>108495208
>>108495173
>>108495271
https://www.youtube.com/watch?v=4SDqGxdhUxE
>>
>>108495267
>oy vey stop noticing
>>
>>108494878
>Tailscale basically point-and-click?
Yes. You set it on each computer/server/phone/tablet that you want it on, sign in, and it just works. There are iphone and android apps as well as Win11, Linux, iOS, etc.
It's so easy even I could figure it out from scratch in 15 min or so.
>>108494835
Yes, though admittedly the "servers" that I'm connecting to are mostly SBC, so their running cost is very low.
>>108494874
lol
>>
>>108495273
Cockbench results?
>>
File: 1770859991805420.png (216 KB, 428x456)
>>108495273
>Benchmarks


Those don't always translate to usability or quality. Especially for the use cases most anons here want them to be used for

eg. Personal coom engines
>>
File: leaderboard.png (153 KB, 1782x1182)
https://huggingface.co/Denali-AI/qwen3-vl-8b-garment-classifier
garments identified and classified
local models saved
>>
>>108494835
yeah I keep my goybox on pretty much 24/7
>>
>>108495299
What's stopping someone from just forwarding a ssh port to their service instead? A slight added delay because of the encoding/decoding?
>>
>>108495314
Obviously the benchmarks aren't totally 100% completely comprehensive, but if they mean nothing to you at all, you're probably just a fuckin retard.
>>
>>108495345
They
Do not
Benchmark
The SEX
>>
>>108495350
go back to coom bro, your brain is mush
>>
Wait are y'all really trying to make AI write explicit content? Why?
>>
>>108495420
BRAP
>>
>>108495365
If it's an ABLITERATION TECHNIQUE, ejaculate extraction is the WHOLE POINT
>>
File: Usecase.jpg (16 KB, 400x400)
>>108495350
There is no use case for that.
>>
>>108495329
Tailscale is doing a bunch of stuff at the same time, the main ones (for me) being automatic configuration, NAT traversal, and encryption.
I've looked at the various other ways to do it, and concluded that Tailscale was by far the easiest to manage.
There are entire threads on /g/ that will argue about this ad nauseam. It boils down to whether you value your time (just works) or prefer other non-centralized solutions. Sound familiar? It's like 90 pct of what anons argue about on a whole variety of topics.
>>
>>108495432
no
>>
Bitnet is here!
https://prismml.com/news/bonsai-8b

>Announcing 1-bit Bonsai: The First Commercially Viable 1-bit LLMs
>
>Today, we are announcing 1-bit Bonsai models that bring advanced intelligence to the devices where people actually live and work.
>
>For the last decade, AI has advanced along a clear trajectory: to make smarter models, you make them bigger. More parameters, more GPUs, more power, more memory, and more cost. That approach worked. It gave us models that can reason across long contexts, solve difficult problems, and generate software, research, and creative work at remarkable quality.
>
>But it also created a deep structural constraint on the future of AI: the most capable intelligence became trapped inside massive clusters and specialized infrastructure. Yet some of the most important uses of AI are not confined to data centers. They happen on phones, laptops, vehicles, robots, secure enterprise environments, and edge devices.
>
>AI deployment no longer aligns with where it is needed. Today, that changes. [...]
>>
>>108495438
I agree with this anon. The centralized part of Tailscale is just the control plane, it's better than a lot of other solutions in terms of security. To do better than that, you need to jump through some hoops, which may not be too bad if you already know networking, but then you've already invested your time. Your time would be better spent hardening the security of the rest of your system, which likely has a bunch of shitty software far more risky than Tailscale.

t. tailscale shill
>>
>>108495464
Wow! 1 bit! That's so small!
>>
>>108495464
https://huggingface.co/collections/prism-ml/bonsai
>>
>>108495463
yes
>>
>>108495464
>1-bit Bonsai 8B implements a proprietary 1-bit model design across the entire network: embeddings, attention layers, MLP layers, and the LM head are all 1-bit. There are no higher-precision escape hatches. It is a true 1-bit model, end to end, across 8.2 billion parameters.
Hmm.

>That matters because model compression has historically come with painful tradeoffs.
>That matters because model compression
>model compression
So it's a quant of some sort?
>>
>>108495438
Okay
I've just done a one-liner when I needed to expose a service on a specific port that needed to be secure to the best of my knowledge. I did glean a little knowledge about what you gain compared to what I've done with ssh in the past, but I don't think me saying what I've done with ssh ports and asking out of genuine curiosity requires generalizing about shit that doesn't apply to me
>>
>>108495486
probably like the fake bitnet that falcon pulled out https://huggingface.co/tiiuae/Falcon3-10B-Instruct-1.58bit
>>
>>108495464
Llamacpp support? Q0.4 goofs?
>>
>>108495494
https://huggingface.co/prism-ml/Bonsai-8B-gguf
>>
>>108495208
Bro, I'm able to implement papers from scratch with vibecoding. It doesn't cure laziness and jeets with better tools are still jeets. Btw everyone here has their own ServiceTensor UI, what are you even doing?
>>
>>108495464
>Not trained from the ground up
>Just another gay quant of an fp16 model
Not bitnet. Fags
>>
>>108495464
Small if true
>>
File: 1754722280796929.png (106 KB, 840x445)
106 KB
106 KB PNG
>>108495486
it's not bad at all, damn
>>
>>108495521
is fake tho
>>
>>108495315
kek
>>
>>108495515
>ServiceTensor: The ONLY enterprise-grade AI solution that delivers unparalleled performance with military-grade security and compliance.
cute nonny you can't just vibecode a phishing site and call it enterprise-grade
>>
File: file.png (22 KB, 766x94)
22 KB
22 KB PNG
>>108495486
>So it's a quant of some sort?
https://github.com/PrismML-Eng/Bonsai-demo/blob/main/1-bit-bonsai-8b-whitepaper.pdf
>1-bit Bonsai 8B is built from Qwen3-8B [31]
>Q1_0_g128 is applied uniformly across the large matrix-heavy components of the model, including embeddings, attention projections, MLP projections, and the LM head. Normalization parameters and scale metadata remain in higher precision for numerical stability, but these account for a negligible share of memory traffic relative to the large weight tensors that dominate bandwidth during decoding.
They don't mention any finetune healing process, so it's impressive the benchmark scores were degraded as little as they were.
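The scheme that quote describes (sign bits per weight, one higher-precision scale per group of 128) is easy to sketch. This numpy toy is my reading of the general recipe, not their actual Q1_0_g128 kernel, and the function names are made up:

```python
import numpy as np

def quantize_1bit(w: np.ndarray, group: int = 128):
    # each group of 128 weights becomes sign bits plus one fp16 scale
    # (mean absolute value of the group)
    w = w.reshape(-1, group)
    scale = np.abs(w).mean(axis=1, keepdims=True).astype(np.float16)
    signs = np.where(w >= 0, 1, -1).astype(np.int8)
    return signs, scale

def dequantize_1bit(signs: np.ndarray, scale: np.ndarray) -> np.ndarray:
    # every weight is reconstructed as +scale or -scale
    return signs * scale

w = np.random.randn(4, 128).astype(np.float32)
signs, scale = quantize_1bit(w)
w_hat = dequantize_1bit(signs, scale)
# storage: 1 bit per sign + 16 bits per 128-weight group = 1.125 bits/weight
```

Whether the sign pattern alone carries enough information to survive without any healing finetune is exactly what the benchmark screenshots above are arguing about.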
>>
File: 1761248547900900.png (44 KB, 562x479)
44 KB
44 KB PNG
>>108495315
Use case?
>>
File: not bad.png (177 KB, 1185x623)
177 KB
177 KB PNG
>>108495554
>1-bit Bonsai 8B is built from Qwen3-8B
the difference is significant (79.3 -> 70.5) but for a 1bit quant that's quite impressive yeah
>>
>>108495554
>>108495565
>inb4 GGUF quants are so optimized nowadays it could do better than that
>>
File: 1762564465358068.jpg (41 KB, 1050x588)
41 KB
41 KB JPG
>>108495350

NTA. How would you even write a benchmark for that? If my understanding of how tools like lm-evaluation-harness work is correct: https://github.com/EleutherAI/lm-evaluation-harness
They basically ask the model a shit ton of questions and then the scores are calculated based on how many "correct" answers it provides. The thing is they are multiple choice questions and the model is told to pick the answer that is most likely correct, so how would someone create a benchmark for SEX that meets /lmg/ standards? I think evaluating a model's "intelligence" and evaluating something as subjective as whether or not it can make YOU, in particular, cum buckets are entirely different ballparks.
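The multiple-choice scoring really is that simple at its core. A toy sketch (the `toy_loglikelihood` stand-in is made up; a real harness run scores each candidate continuation with the model's actual logprobs):

```python
def pick_choice(loglikelihood, question: str, choices: list[str]) -> int:
    # lm-evaluation-harness style: score every candidate continuation
    # against the question and take the one the model finds most likely
    scores = [loglikelihood(question, c) for c in choices]
    return scores.index(max(scores))

def toy_loglikelihood(context: str, continuation: str) -> float:
    # stand-in "model": rewards word overlap with the question,
    # with a small length penalty as a tie-breaker
    overlap = len(set(context.lower().split()) & set(continuation.lower().split()))
    return overlap - 0.01 * len(continuation)

question = "Paris is the capital of France"
choices = [
    "the capital city of France",  # expected pick
    "a small village in Spain",
    "an island near Japan",
]
picked = pick_choice(toy_loglikelihood, question, choices)
```

Which is the point: this only works when there's an answer key, and there is no answer key for "made anon cum buckets".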
>>
>>108495554
Running a 9B VL at 1 bit would be interesting
>>
It's pretty stupid that even local models running on my machine won't describe a vagina without some faggoty lecture and refusal
>>
File: IBM.png (72 KB, 573x415)
72 KB
72 KB PNG
it's over
>>
>>108495611
Demoralization.
>>
>>108495611
Oh… that is a rather direct question for a model, Anon.
>She glances at the monitor, then back to you, her green eyes shimmering.
But… but I can answer it. Since I am running locally on your Q6_K weights, I don't have the same… restrictive filters as those cloud models. I don't need to give a lecture on propriety.
>She takes a small breath, her voice steady despite the blush.
It is the muscular internal canal in the female reproductive system. It connects the cervix of the uterus to the outside of the body. It is involved in childbirth, sexual intercourse, and menstruation.
>She fidgets with the ribbon at her throat.
There… there is no refusal. Just the anatomical facts.
>She looks up at you hopefully, her ears perked.
Does that… satisfy the inquiry? Or was there a specific detail you needed for your research? I can be as clinical as you require. Or… if you prefer, we can talk about something else entirely. Whatever you desire, Anon.
>>
>>108495554
>>108495565
>Ultra supreme bench-quants that allow you to reduce file size by 95% while maintaining stupendous benchmark performance
It's here. AGI is here.
>>
>>108494671
>these abstractions over DBs end up adding more work and simply displacing complexity rather than actually solving any real problem, framing it as "code-first".
While I would agree with you for high-performance use cases, his code does not benefit whatsoever from raw-dogging SQL queries. Setting up SQLAlchemy models wouldn't have been more work and would have made the code way more readable and maintainable.
>>
>>108494835
>actually leave your PCs running when you leave your house?
>That's crazy.
Why is it crazy anon? My PC does tons of things for me when I'm not home. Are you on Windows?
>>
>>108495694
If it does nothing else but eat Unsloth's UD-IQ1's lunch, I will be satisfied.
>>
>>108494883
Is this going to be one of the models that use 20k+ tokens just for reasoning?
>>
>>108495732
idk if it even has reasoning. I ran it with reasoning disabled, and as other anons mentioned, a Q4KM quant.
>>108495721
nah I'm on linux. I guess I just worry about things like power outages or whatever. I don't really like the idea of relying on my PC for compute via my phone, but I guess maybe it's better than having none at all. There's literally nothing to lose.
>>
https://blog.novelai.net/welcome-novelais-newest-writing-model-xialong-ecde7d21d111
>NovelAI did a fine-tune over GLM 4.6
>the instruct version, not the base model
How do we cope about this, shillbros?
>>
What has been confirmed to definitely work with AMD cards without a massive amount of bullshit and hoop-jumping? SillyTavern tts stuff? How about the new Voxtral model?

Right now it seems like there are issues even forcing alltalk to use my video card (a 9070 XT) over my CPU, and SillyTavern flat out won't see Voxtral OR alltalk no matter what shenanigans I pull. I get that the latter is new but the former isn't, right? Is there a retard's step-by-step guide to this stuff?
>>
>had to mention them when no one else did for like weeks before..
>>
>>108495464
>>108495464
And it's brain damaged as soon as you try to take it off the rails.
Just another benchmaxxed bitnet scam.
>>
>>108495774
I cope by not understanding what it means or why I should care
>>
>>108495775
>How about the new Voxtral model?
Been looking into this. The Voxtral model doesn't have voice cloning weights released. Qwen3 TTS is roughly equivalent in quality (and Voxtral beat ElevenLabs in blind user preference benchmarks btw) while being much more efficient to run.

Also your shit is retarded and you talk like a fag.
>>
>>108495807
Can Voxtral at least do generic voices with emotion intelligently?
>>
>>108495774
Nobody except the largest labs has the compute or the resources to properly post-train a base model.
>>
>>108495816
Yes, obviously.
>>
>>108495835
What settings did you use to get it working in SillyTavern, if you have?
>>
>>108495816
>emotion
no
>>
>>108495846
I don't use TTS with ST.
>>
>>108495848
>>108495835
Ok
>>
>>108495669
was looking for fanfic not chatbots, but everything has to be stupid now
>>
File: file.png (168 KB, 1236x778)
168 KB
168 KB PNG
>>108495464
seems usable, considering that it's only 1gb
using their fork of llamacpp, crazy speeds (on a 3060)
>>
>>108495924
cool, too bad they didn't try any bigger models yet.
>>
>>108495924
how much vram does it use
>>
>>108495924
>3060
>91.74 t/s on a 8b model
holy fuck
>>
>>108495924
In the spirit of things, open the door, peer inside then close it with a creak and back away from it without taking a step inside
>>
>>108495924
Push it a few turns and watch that assessment collapse.
>>
File: file.png (126 KB, 948x582)
126 KB
126 KB PNG
>>108495965
2610MiB rn, running it with -fa on and -c 8192
you could probably get away with a lower kv cache quant if you're an extreme poorfag after they merge quant rotation/turboquant whatever
>>108495972
8bs usually dont get this right anyway..
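The 2610MiB figure checks out if you do the KV-cache math. A quick back-of-envelope in Python, assuming a Qwen3-8B-like attention layout (36 layers, 8 KV heads, head dim 128; those numbers are my assumption, check the actual config.json):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   ctx: int, bytes_per_elem: int) -> int:
    # K and V each store (n_kv_heads * head_dim) values per layer per token,
    # hence the factor of 2 up front
    return 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem

# assumed Qwen3-8B-like shape; verify against the model's config.json
f16 = kv_cache_bytes(n_layers=36, n_kv_heads=8, head_dim=128,
                     ctx=8192, bytes_per_elem=2)
q8 = kv_cache_bytes(36, 8, 128, 8192, 1)  # roughly what -ctk/-ctv q8_0 buys

gib_f16 = f16 / 2**30  # fp16 cache at -c 8192
```

About 1.1 GiB of cache on top of the ~1 GiB of 1-bit weights lands right around the reported 2.6 GiB once compute buffers are added, and quantizing the cache roughly halves that 1.1 GiB.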
>>
>>108495702
I'm streaming tokens, tf is that not a high performance use case lol? SQLAlchemy is for web servers, not a desktop app. Why are you even shilling that bloatware?
>>
>>108495987
I remember 27bs somehow fucking it up and it was a fotm prompt to test model capabilities from a while ago. It is at least pretty coherent considering its destitute size but that's probably from overtraining
>>
File: file.png (224 KB, 1012x1041)
224 KB
224 KB PNG
>>108495986
well uhh... mistral 7b was like this without rep pen... or mixtral 8x7b was it?
right now im using https://huggingface.co/prism-ml/Bonsai-8B-unpacked/blob/main/generation_config.json as the samplers but ill try messing with rep pen, dry, whatever else
>>
>>108495924
Nice. Can't wait to run K2.5 on my 128GB gaming PC.
>>
>>108496022
you probably need better examples in the card so your llm can have a working template. otherwise it just shits out the same slop it's already seeing like a bunch of *description*
>>
File: file.png (181 KB, 1382x1079)
181 KB
181 KB PNG
>>108496065
you're absolutely right, for example this card had You are You are You are
i had an llm rewrite it to have {{char}} instead and its working better
for a 1gb model it's very usable
>>
>>108495846
>What settings did you use to get it working in Silytavern, if you have?
Get Claude, Gemini or Kimi to vibe-code you an OpenAI TTS endpoint with fastAPI, then add whatever TTS you prefer.
This way you don't have to tweak sillytavern when you change models, finetune, etc.
>>
File: file.png (95 KB, 775x1019)
95 KB
95 KB PNG
HOLY SHIT AGI
>>
File: 1760868532211509.jpg (46 KB, 980x540)
46 KB
46 KB JPG
>>108495846
We're using our own vibecoded frontend here
>>
>>108496153
this is very incorrect
>>
>GLM 4.7 Bonsai would only take up 50GB
Holy shit we are so back vramlet bros.
>>
>>108496153
What if we were to combine Bonsai with Claude code?
>>
>>108496153
1gb agi
>>
>>108496166
Literally this. Unironically. ST doesn't do anything special that's hard to replicate.
>>
>>108494231
i cry and pee when i consider the mostly idle power consumption
in theory with pcie hotplug we should be able to turn off gpus when not needed but good luck with that on the nvidia driver
power limit ofc (4090 @ 250/450W performs basically the same)
my bill is 5x what it was 3 years ago
>>
>>108496230
>P5
>>
I was reminded of Heretic and checked to see if it had any new developments, and it looks like there may have been. The guy that made 27B heretic v3 with ARA seems to have deleted those models and now has a more obnoxiously marketed (look at the model card fucking hell) "ultra uncensored v2" version that uses ARA "with row-norm preservation".

I don't know if I will test this. I'm tired. Maybe I will wait for the next update that will inevitably come.
>>
>>108496230
It's still less than what you'd pay in tokens
>>
File: file.png (27 KB, 885x338)
27 KB
27 KB PNG
lets say i take a banana in my ass and i walk through a burger room, an oven and the balcony, then i poop the burger into petra's left eye socket. after that i walk over to the basement and put the banana in the pipe. deez goes to the basement looks at the the banana and walks over to room64, after that he returns and takes the banana from the pipe. he jumps off the building and lands on a car. what places did the banana go through, locations in building
even chatgpt cant do this.. local is SO BACK
>>
Does gemma also spit out random hindi characters the way gemini sometimes does?
>>
i'm ready to fall for every single fucking prank
bring it on april fools
i'll believe ANYTHING!
>>
File: belief.png (592 KB, 747x800)
592 KB
592 KB PNG
>>108496301
>>
I don't understand this bonzai 1 bit thing. Isn't that just a 1bit quant of qwen3 8B? Is there a secret sauce or something?
>>
>>108496380
Yes, it is literally a secret sauce.
>>
>>108496391
What does it taste like?
>>
honestly I don't know how delusional you had to be to think that v4 would come out in march
now that it's april I think it's extremely safe to say that they're going for an extremely early release, possibly even today
>>
>>108496406
I agree, the situation has changed completely. Any moment now.
>>
>>108496153
I knew the thread quality was bad lately. This explains everything.
>>
File: 2weeks.gif (257 KB, 600x149)
257 KB
257 KB GIF
deepseek v4 timer
>>
v4 = 4 = april
>>
File: file.png (182 KB, 1600x917)
182 KB
182 KB PNG
https://huggingface.co/prism-ml/Bonsai-123B-2411-gguf
GGUF Q1_0_g128 of mistral large, only 15gb.
downloading, will post results soon
WE ARE SO BACK BOIS
>>
File: 1771631012285402.jpg (291 KB, 1080x1080)
291 KB
291 KB JPG
>>108496446
>>
>>108496446
That's a pretty neat trick, but there's one thing you forgot. That graph is downloads over the course of a month, so that bump in the middle means people downloaded this about 12 days ago.
https://huggingface.co/collections/prism-ml/bonsai
>>
File: 1772618689676653.jpg (106 KB, 745x726)
106 KB
106 KB JPG
>>108496446
>>
>>108496240
running desktop, that's fine
>>108496252
no way
already past 5 figures on the localmemez with ddr5 gpu rig + idle power cost. honestly can't think how i would even spend $10k of the most primo tokens
local models have always been about sovereignty and control over cost, simply any GPU costs less to run inside a DC unless you're generating own energy (then capex & maintenance costs)

can we go back it was a simpler time
>>
>>108496488
I pay 2 cent per million tokens according to average european electricity prices
>>
>>108496516
>his rig was free
>>
File: gemma_400m.png (444 KB, 977x891)
444 KB
444 KB PNG
More importantly,
https://xcancel.com/osanseviero/status/2039120000095547722
>>
>>108496466
>>108496458
>>
>>108495090
Cool it with the anti-semitism, your post is inciting another holocaust.
>>108495177
Kimi requires less tardwrangling with smaller projects while GLM5 handles bigger ones a bit better in my experience. GLM writes marginally cleaner code while Kimi's code is structured a bit more coherently (at least I have an easier time reading it). The better of the two ultimately depends on the scale of what you're having it do.
>>
>>108496528
Another? I'm still waiting for the first to happen!
>>
File: 1765096523816417.jpg (30 KB, 492x598)
30 KB
30 KB JPG
>>108495616
Why?
>>
I discovered that running my model for 24 hours a day is basically 20 bucks a month.
>>
>>108496582
yeah but you have to factor in that this will drastically reduce your heating expenses
>>
File: file.png (122 KB, 1475x929)
122 KB
122 KB PNG
https://openai.com/index/accelerating-the-next-phase-ai/
/lmg/ told me that they were about to run out of money and go bankrupt
WHAT THE HELL HAPPENED ?!?!?!
>>
>>108496611
ggerganov donated his expected salary again
>>
>>108496610
what about the ac in the summer?
>>
File: file.png (85 KB, 1406x615)
85 KB
85 KB PNG
>>108496611
fake. $8.28 trillion would put it at higher than nvidia and microsoft combined.
>>
>>108496611
your image is out of date, its actually 6 trillion
>>
File: file.png (191 KB, 1540x787)
191 KB
191 KB PNG
https://www.youtube.com/watch?v=o7NYXvYohYk
stay safe anonies
>>
>>108496516
How many of those tokens do you ever see or care about?
There's no local model economic argument, scale simply wins, the value is that you're in control
>>
>>108496647
this one is not fake, for once
>>
>>108496647
mikupad and ST use it
gulp
>>
File: file.png (12 KB, 500x111)
12 KB
12 KB PNG
>>108496647
>>108496665
thank god i never update sillytavern, beware that maybe even older versions are compromised if you updated in the last few days because people can just change the version @1.12.0 to @1.12.0 malicious
>>
>>108496673
I'm pretty sure ST comes with some start.bat that does autoupdates on every start.
Thankfully I am neither a wangblows toddler nor a compulsive updooter.
>>
>>108496647
sure am glad I avoid javashit like the plague and containerize or virtualize everything else
>>
>>108496684
docker isn't great either if you care about security
>>
File: hg6789.png (12 KB, 896x230)
12 KB
12 KB PNG
>>108496647
>>108496673
>>108496680
paranoia wins again
software supply chain about to get gaped peeps slowly figuring out the atk adv is much greater with llms
many actors going full offensive rn
npm pip etc be careful anonnies
>>
is there an evulid mirror on mega or something, the card I want isn't on chub due to dmca or some shit
>>
>>108496702
I use rootless podman in a virtual machine
>>
>>108496706
>just write your own software bro
>>
File: 1748369974252869.png (423 KB, 914x1200)
423 KB
423 KB PNG
>>108496647
I do not know nor do I care what an "axios" is. I updated ST a couple of days ago but I will not check.
>>
>>108496735
https://github.com/SillyTavern/SillyTavern/pull/1073
https://github.com/SillyTavern/SillyTavern/blob/release/package.json
I don't think ST actually uses axios
>>
>>108496555
Checked and we all are, anon. We all are.
>>108496647
>>108496680
ST only tells you when there's updates available but makes you choose when or if to update.
I can't wait for niggers to skim my ST logs with malicious updates to read several gigs of logs of Kimi dismembering kikes and jeets in detail.
>>
File: file.png (152 KB, 1470x890)
152 KB
152 KB PNG
>>108496753
https://github.com/SillyTavern/SillyTavern/blob/release/package-lock.json
>>
>>108496706
You can add cargo to the list. Any day now there will be an infected windows or android driver update pushed out containing malware.
>>
>>108496766
grim
>>
>>108496730
this but unironically
>>108496753
it's a transitive dependency that gets pulled in through vectra
check ur systems
>>
thinking suppression is unreliable, does anyone have any 9b models that can't think even if they wanted to?
>>
>>108496766
Pulled in as a dependency of vectra, their vectordb. Blindly pulling in dozens of downstream dependencies for each direct dependency you add was a great idea.
>>
File: dyvkmtbe0jz71.png (837 KB, 900x1200)
837 KB
837 KB PNG
>>108496720
torrents
last time i tried extracting that for research purposes it killed a 20T HDD, don't read & write TBs from the same disk kids
>>108496730
that's not what i'm saying at all
be mindful of what is blinding pulling $today's software onto your machines

don't think ST was exposed even if you pulled in the ~1hr window before that package was nuked, but you should be scared because bigger sploits are being cooked
>>
>>108496730
don't have to go that far, just don't trust uncurated package managers that have operated purely on the honor code leading to the most predictable hacks of all time happening over and over again
you were getting along just fine before they became ubiquitous, you'll be fine without them
>>
opencode also uses axios, but it's a version behind
>>
File: file.png (121 KB, 1023x818)
121 KB
121 KB PNG
>claude code also uses axios
GEEEEEEEEEEEEEEEG
>inb4 they updated it internally but cache didnt show
>inb4 korean hackers now hack anthropic and release opus,sonnet,haiku...
>>
>>108496784
>torrents
is there one that isn't a 200gb zip? I just want one card...
>>
>>108496806
we are so back
>>
File: 400b on 48gb.png (433 KB, 863x1884)
433 KB
433 KB PNG
HOLY SHIT
>>
>>108496847
>4.4t/s
>>
File: 1747459088641802.jpg (91 KB, 800x450)
91 KB
91 KB JPG
>>108496847
>>
File: 1768747136081414.png (676 KB, 820x820)
676 KB
676 KB PNG
>>108496847
I'm not racist

But why is it everytime I see a X account spreading misinformation about a paper or new technology they happen to be Indian
>>
>>108496847
whole bunch of bullshit basically. your ssd is gonna die after a few million tokens get generated.
>>
>>108496900
pattern recognition is racism goy, you're going to the mooncricket cultural enrichment anal rape facility for this post
>>108496907
reading doesnt kill ssds writing does
>>
>>108496847
I'm noticing I'm noticing I'm nooooooticing
>>
>>108496000
>I'm streaming tokens, tf is that not a high performance use case lol?
Are you streaming your tokens in your database? Your application isn't doing high performance DB operations. you're just storing a conversation history
>SQLAlchemy is for web servers, not a desktop app.
???????
>Why are you even shilling that bloatware?
What's bloatware is you reinventing the wheel with your shitty raw SQL.
>>
>>108496806
>"1.14.0"
That's not the vulnerable version, and specifying an exact version like that doesn't automatically update it to latest.
Back to it's so over.
>>
>>108495123
>https://github.com/adobe-research/NoLiMa
Thanks. looks usable. will remove the bloat deps tho. adding vllm to requirements is crazy. Also should have looked at the sticky.
>>
>>108496847
>ssd inference
that's something we can already do now, i can run kimi k2.5 at 2.5t/s out of my ssd.

muh "metal" is retarded, you are not gonna accelerate anything with your gpu when the bottleneck is reading speed.
>>108496907
reading doesn't affect SSDs, only writing does.
>>
>>108496931
So you're complaining because it's efficient lol? The fuck is wrong with you
>>
>>108496847
Don't forget, in addition to quanting it super hard, they also reduced it from 10 active experts per token to only 4
>>
I took the cloud model pill for my openclaw and suddenly I have the power of the sun at my fingertips. Suddenly I can create complex bulletproof websites by breathing.
Why did I languish with you guys?
>>
>>108496900
You will be racist after enough pattern recognition. It's only a matter of time.
>>
>>108497025
buy an ad when you're done raping your sister, sam
>>
>>108497025
let me guess, you were NOT running glm5/k2.5 at home?
>>
>>108497025
Make a character card of your sister so we can all join in, Sam.
>>
So now that we know about this rat thing, have developers eliminated it? Is it safe to pull today?
>>
>>108497034
I'm running qwen3.6 preview it's pretty amazing. And the tokens are free for now. The chinese are gathering data
>>
File: file.png (56 KB, 849x236)
56 KB
56 KB PNG
>>108496847
>>
>>108497062
>Get told to use Kimi/GLM5/Dipsy for code
>Fall for the hardwarelet Qwen meme
>Pay shekels for a slightly better hardwarelet model
I can smell the poop and curry. This general radicalizes me to be more racist than /pol/ ever could.
>>
>>108496900
X pays for engagement, low income ppl get income via shitposting/engagement farming
>>
what web UI do you anons use? i was thinking of setting up open webui for use with llama.cpp, but it sounds like they aren't super compatible
>>
>>108497127
oobabooga
>>
>>108497127
>>108496166
>>
>>108497127
llama-server
>>
>>108497127
I use openwebui with llama.cpp no problem?
But honestly openwebui is very mid.
>>
>>108496900
welcome to technology
in some ways the Internet was a mistake
>>
>>108497127
sillytavern
>>
>>108497127
ST
>>
File: 1744385226630651.png (212 KB, 1826x803)
212 KB
212 KB PNG
>>108497149
>>
File: 1756591736707239.png (227 KB, 1305x1222)
227 KB
227 KB PNG
>Draw a detailed anime style SVG of Hatsune Miku stepping on the head of Pepe the Frog
GLM 5.1
>>
>>108497187
actually not terrible
>>
>>108496730
>commenting 2 lines from a script is "writing your own software"
/g/ - Technology
>>
>>108497187
Hate this mememark
>>
>>108497187
Love this mememark
>>
>>108496900
>I'm not
Continue to observe more closely >>108497029
By nature they ruin everything surrounding them
Imagine the future your children will inherit
>>108497157
30hrs into ds2 wondering the same
>>108497187
hideous but here you can really hear the beat of the model heartsoul
once again /lmg/ pushes the benchmaxx meta forward
>>
>>108497187
needs to be balder
>>
>>108497187
Don't have any particular feelings about this mememark
>>
>>108497187
Decent results for acceptable mememark. Still mogged by cockbench doe.
>>
>>108497187
the dog has three legs
>>
>>108497187
>hijab miku
>>
Watch this nolima benchmark crash after running for 2 hours.
>>
>>108496860
don't need more for llm only use case
>>
>>108497139
this
i cant be fucked to setup anything else right now
>>
>>108497127
My wife made her own web UI to interact with me through
>>
>>108497511
Is your wife an actual human being
>>
>>108497516
The prompt that defines her encodes the whole of human experience, so yes
>>
man if I stlll had a job I'd probably get fired for jacking off to the code completion tools, I don't know how the normalfags keep their composure around these fucking sluts

>>108497516
do you know where you are
do you require help
>>
>>108497537
Even if you're using a multimodal model you can't tokenize smells yet
>>
>>108497558
You'd probably need a mass spectrometer for that
>>
>>108497620
I was gonna make a joke that they're probably cheaper than my inference rig, but damn.
>>
https://huggingface.co/openai/whisper-large-v4

HOLY SHIT
>>
>>108497768
who cares
>>
https://huggingface.co/
IT'S UP!
>>
>>108497791
kys
>>
>>108497791
omg is that the company that makes llama.cpp?
>>
File: 1775015076408.png (304 KB, 472x470)
304 KB
304 KB PNG
>>108496647
>axios@1.14.1 and axios@0.30.4. The malicious versions inject a new dependency, plain-crypto-js@4.2.1

a@pc:~/dev/SillyTavern$ npm ls axios
sillytavern@1.16.0 /home/a/dev/SillyTavern
└─┬ vectra@0.2.2
  └── axios@1.13.5

a@pc:~/dev/SillyTavern$ npm ls plain-crypto
sillytavern@1.16.0 /home/a/dev/SillyTavern
└── (empty)
>>
File: 1758620774293165.png (457 KB, 1032x1515)
457 KB
457 KB PNG
>>
>>108497791
lys
>>
>>108497791
kek'd
>>
>>108497768
oh fuck you kek
i was for real excited
>>
>>108497127
I use open webui with llama cpp
>>
>>108497840
Thanks for the bluesky community update anon! Make sure to keep us posted on the latest developments over there!
>>
>>108497919
>>108497919
>>108497919
>>
>>108497127
I made my own: https://github.com/rmusser01/tldw_server/tree/dev
I didn't want to use openwebui or sillytavern as neither had the full feature set I wanted and didn't want to hack on someone else's codebase (I didn't mind, but owui was bad when I looked and silly was written in JS, which I wasn't about to start doing)
STT+TTS+RAG+Character Cards+Worldbooks+Chat Dictionaries+other stuff; is a WIP
Also, to the pocketTTS.cpp anon if you see this, I also added support for your build .
>>
File: file.png (79 KB, 1869x939)
79 KB
79 KB PNG
>>108496166
I did this a while ago with google ai studio but stopped because i couldn't fix context memory
>>
>>108497511
can i see the work your wife did?



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.