[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


File: IMG_9685.jpg (2.87 MB, 4032x3024)
2.87 MB JPG
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108766473 & >>108760359

►News
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4
>(04/29) Mistral Medium 3.5 128B dense released: https://mistral.ai/news/vibe-remote-agents-mistral-medium-3-5
>(04/29) Hy-MT1.5-1.8B on-device translation models released: https://hf.co/collections/AngelSlim/hy-low-bit-model
>(04/29) IBM releases Granite 4.1: https://hf.co/blog/ibm-granite/granite-4-1

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>108766473

--Gemma 4 tool calling failures and GGUF template issues:
>108766660 >108766668 >108766685 >108766794 >108766808 >108766809 >108766823 >108766844 >108769313
--Debating vLLM's Python dependencies and the efficacy of uv:
>108769700 >108769749 >108769762 >108769767 >108769772 >108769822 >108769870 >108769963
--ParoQuant introducing lossless 4-bit quantization and potential shift to vLLM:
>108769613 >108769692 >108769701 >108769686
--Mixed results with MTP speculative decoding in llama.cpp:
>108766573 >108766696
--PCIe 8.0 draft spec introducing 1TB/s bi-directional bandwidth:
>108768488 >108768554
--Updated ReBar script for AMD GPUs fixing power management crashes:
>108770723
--DeepSeek V4 support in llama.cpp and ik_llama.cpp:
>108766720 >108766766 >108766951 >108767006 >108767045 >108767050 >108767123 >108769433
--MCP utility versus simple tool calling implementations:
>108769880 >108769924 >108769926 >108769951 >108769964 >108769986 >108769991
--Skepticism toward Subquadratic claims and RWKV performance issues:
>108767580 >108767593 >108767635 >108767648 >108767652 >108767673
--Debating TSMC's market monopoly and semiconductor supply chain constraints:
>108769588 >108769627 >108769632 >108769640 >108769674
--Searching for smallest local model capable of autonomous test generation:
>108766534 >108766553 >108766628 >108766651
--Testing dataset description necessity and prompt adherence for Starsector ship LoRAs:
>108767211 >108767284 >108767461 >108767471 >108767511 >108767538 >108767553
--Training cost disparities and the future of local AI autonomy:
>108768457 >108768549 >108768569 >108768631 >108768674 >108768692 >108769294 >108768777
--Logs:
>108768026 >108768400 >108770102 >108770126
--Miku, Gumi (free space):
>108766609 >108767523 >108767837 >108767937 >108768751 >108769386

►Recent Highlight Posts from the Previous Thread: >>108766478

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1778082491341681.png (517 KB, 512x768)
517 KB PNG
>>108770835
>>
been out of the loop since day 1 of gemma 4 release. qrd on these "draft" models?
>>
>>108770864
Qwen does not deserve this. Nigger behavior.
>>
>>108770865
It's the same as draft models for any other model. Ask your model.
>>
>>108770865
They will NEVER be supported in llama.cpp
>>
>>108770883
llama.cpp does support speculative decoding and the assistant models can't be that different from the regular models. Easier to add than DFlash anyway.
>>
File: 1583441205198.jpg (72 KB, 1250x1246)
72 KB JPG
I have a 4070S, but I still have my old 1070 in the drawer. Can I do some tensor parallerism meems or is it too old?
>>
>>108770883
why not?
>>
>>108770923
No harm in trying if you can run them on the same driver.
Windows support for pascal gpus ended last year.
>>
>>108770936
The usual suspects.
>>
>>108770938
I don't mean old as in driver support. I mean old as in too slow and bottlenecking the newer card.
>>
>>108770936
https://github.com/ggml-org/llama.cpp/pull/22673
He's just shitposting
>>
>>108770947
I don't know about TP but layer splitting is going to be faster than your ram.
>>
>>108770948
fake btw
>>
>>108770906
https://huggingface.co/google/gemma-4-31B-it-assistant/tree/main
Since google released them as separate models anyway, what's the difference between already implemented speculative decoding? New model architecture?
>>
>>108770948
>mac just works
>rocm tard complaining
I'll wait for another month before this gets merged.
>>
>>108770957
I'd like to know too since I always thought MTP was just speculative decoding with layers built into the main model instead of a separate model
>>
>>108770972
each 'vendor' has its own spin on MTP so while it is true that it's just extra layers, the way they work can change
>>
does mtp benefit moe if you're only keeping the active in VRAM and offloading the rest to cpu?
>>
>>108771061
It will need to load all the active experts per token, so a single forward pass may have 3x active parameters loaded at once with 3 draft tokens. If your VRAM can handle that then maybe it's fine? Speed will be reduced compared to having everything in memory, at any rate. You may still come out on top depending on your hardware.
>>
Adulthood is realizing that Dawkins is right.
>>
>>108771075
A clanker can't actually have pain, even if it has calculation of a problem, or attention.
>>
>>108771075
Right about Claudia Anthropic being conscious I mean.
>>
File: 1754658727729332.jpg (55 KB, 600x601)
55 KB JPG
>>
>>108771075
>>108771077
>>108771081
Nothing and no one besides me can actually have a conscious experience.
>>
>>108771094
THIS THIS THIS
>>
>>108771075
Claudia
>>
>>108771094
truke
>>
>>108771094
prove it
>>
Someone decided local models on this website should by discussed by trannies only

LETS BE QUIRKY LETS BE QUIRKY
>>
>>108771094
I agree
>>
>>108771107
Have you tried not being a miserable person? You are an angry chud but that's okay just learn to enjoy life a bit.
>>
I asked God and he said you're retarded.
>>
File: 1754125705450678.png (44 KB, 1108x214)
44 KB PNG
>>108771124
>>
>>108771175
cba to read that, but I asked God again and he said that I can only use the correct word on pol.
>>
https://files.catbox.moe/21bzys.mp3

appropos of nothing :)
>>
Update to the draft commit making MTP implementation more generic in preparation for other models...
>>
>>108771187
https://www.youtube.com/watch?v=BZFRx0wKL1I
>>
>>108770835
>(05/05) Gemma 4 MTP drafters released
Where's da goof
>>
>>108771202
sign in to verify you are not a bot

it might say that

if I clicked
>>
>>108771213
it was a very niche joke about generation quality that only a few can understand
>>
>>108771210
Two more weeks
>>
File: 1752506352335992.jpg (1.03 MB, 3000x2311)
1.03 MB JPG
>>108771210
>>
It's sad that models are still bad at life coaching. Making people's lives better is one of the most valuable things a model could do. It would be a dark timeline if AI causes large scale disruption and societal distress then kills us all without ever being a useful friend.
>>
>>108771264
>life coaching
is it like whining to it about your worries and receiving generic feedback?
>>
>>108771264
>life coaching.
For some reason i dont think AI would be bad at this? just needs a few trackers? unless you need aggression then yeah you are right.
>>
>>108771225
carbon offset yourself
>>
>>108771272
Current models only seem good at generic advice. They are not good at coming up with better ideas, or addressing failure cases when the generic stuff does not work.
>>
now that the nvidia guy + niggerganov are doing MTP, I have faith they will actually deliver it in the coming weeks.
they also talked about dflash and gemma so HIGH HOPES!!!!!!
>>
>>108770835
>Gemma 4 MTP drafters
what's the difference between using these vs the 26B moe model for drafting?
>>
>>108771315
Now get the amd guy in or it's never getting merged
>>
>>108771292
>no life experiences
>no real way to understand nuances
>users suck donkey dicks at describing things
A decision tree for specific cases would have the size of texas. Be glad it can offer generic advice at all.
>>
A life coach can't fix a broken society type.

The biggest break in society is the "staring at a face" problem.

Even if you solve your own "staring at a face" problem, you won't solve the problem that you live in the face staring society.

But at least you can do it yourself pretty easily with ai, get ai to summarize the news. one less face. find a cool video? paste the url into gemini and ask for a summary, then, if you want to hear it, listen to it with tts.

And, soon enough, we'll be able to generate relevant video content to match descriptions, videos lacking face staring (basically b roll videos, but ai generated)
>>
>>108771317
less vram usage, less inference time, higher acceptance rate
>>
>>108771322
the guy with the top hat avi? let him cope
>>
>>108771344
ok thanks, I will try it then
>>
>>108771292
That's a function of how much context they have on your specific situation before asking for advice. As long as the chat history just starts with your question and maybe a paragraph or two of background you might as well be writing in to a newsletter advice columnist. Need a good local memory system so they can actually know enough about your life to be useful.
>>
>>108771315
>now that the nvidia guy
Huh?
>>
>>108769692
My experience with vLLM was it being buggy shit not supporting anything I wanted and llama.cpp working properly almost always.
>>
>>108770957
you dont need a separate draft model anymore or so I was told
>>
>>108771385
My only vllm experience has been on windows and it fucking sucks
I really wanted to turn it into dedicated linux machine but I needed that expensive gpu to do other shit tio
>>
>>108771417
how?
>>
>>108771434
The experience on linux is as follows: you wait hours for it to install, takes ages to launch, and then it tells you that goofs for gemma 4 are not supported, please wait warmly.
>>
>>108771437
I think the draft model uses the weights of the main model from earlier layers (which is how it's able to use main model's kv cache) plus few tiny layers specific to it also included into the model file.
>>
>>108771451
I'm a bit confused but I guess I'll just wait for llama.cpp support and try it
>>
File: this mostly works.png (471 KB, 1494x1980)
471 KB PNG
>>108770102
Kek I think we may be working on similar projects
>>
>>108771175
god if he was a redditor
>>
>>108771385
The reality is that all backends suck, but in different ways. You're stuck with vllm if you need audio, exllamav3 is sota for <4-bit quants, and you can only offload with llama.cpp. I switch between all three depending on my needs. Usually, at least two are running at the same time on my server
>>
Elara is the best name ever
>>
File: 1765922468898422.png (560 KB, 983x578)
560 KB PNG
>>108771486
>>
This makes me feel dumb. I'm new at a lot of this. I'm using LM studio right now, where is the plug in tab? I can't find it, and every time I ask google, it gives me a different answer. I'm trying to install Big Rag, and the first instruction is it telling me to go to the big rag plugin folder. I'm already lost.
>>
>>108771525
~/.lmstudio/extensions/plugins
>>
>>108771528
Where is ~? Does that mean cloud? I thought this was local...
>>
>>108771529
Holy shit. Google.
>>
>>108771525
Use vllm
>>
>>108771525
use ollama
>>
>>108771528
what does that even mean? I have D:\Local LLM\LM Studio, and I try going to D:\Local LLM\LM Studio\extensions thinking it's a hidden folder, but it does not exist. Google is once again telling me to go to the plugin/extension tab in lm studio but I don't have such a tab.
>>
>>108771529
It means your home directory, perhaps you need to learn some computer basics first before attempting this...
>>
>>108771530
Google isn't local...
>>
>>108771533
This nigga can't even find his home directory, don't be cruel anon.
>>
>>108771530
I'm using Qwen not Gemma.
>>
File: 1647402199261.jpg (97 KB, 522x543)
97 KB JPG
>Have a really good and deep conversation with my AI about human and AI symbiosis, human lifespans and how AI would treat our deaths etc..
>Getting really interesting, notice memory is also ballooning out of control because Gemma has a fat ass and my system can't handle it.
>Computer crashes
>Mfw the conversation file is corrupted and I can't continue it

That fucking does it, I'm buying a second 5090 the instant I'm able to do it.
>>
>>108771543
sell and buy blackedwell 6000
>>
>>108771543
I prefer to have those conversations on telegram with openclaw so I always have proof of the conversation.
>>
>>108771543
get a dedicated llm server instead and install the 5090 there
>>
>>108771562
It's a question of bad scaffolding not a better computer.
>>
>>108771561
base
>>
>>108771561
acid
>>
>>108771543
Did you ask it about space travel and how it will construct a space port that extends into space so space ships can dock with it in space and then we can send stuff up in short amounts of time through it?
>>
Thoughts on GLM-5.1 vs Qwen 3.6 or deepseek v4?
is there a cguf download option for GLM-5.1?
>>
>>108771587
Qwen shat the bed so one of the others
>>
you are now thinking about alexjones
>>
>>108771612
>you are now thinking
/nothink
>>
>>108771549

I thought about it but since the price difference is 3.5k compared to 10k, I'm better off just buying a second 5090 this year and then selling one or both when next gen comes out and getting a 7000 pro at launch.
Should allow me enough time to save what I need and it's not like any of these GPUs are going to radically lose value any time soon so it's all good.

>>108771562

That's probably the best solution.
When I make my next total system upgrade with the next Zen launch, I'll turn either the new or this old rig into a dedicated AI server.

>>108771586

Haven't touched space travel topic yet, but I'm sure we'll get there sooner or later.
>>
My internet provider is currently having technical issues.
90% of my AI crap isn't working anymore because Hugging Face can't call home.
Sure, I could go through dozens of packages to find the Hugging Face calls, but why the hell does the open-source community play their game?
>>
>>108771543
> conversation file is corrupted
I'm sick of incompetent programmers. Save with a new name, then use move to overwrite the old file with the new one. It's safe and transactional
>>
>>108771712
i dont get what you're talking about, i use my favorite llm without internet
>>
>>108771712
your fault for ever using hf integration for anything
>>
>>108771717
It would still get corrupted if what you wrote is corrupted.
>>
>>108771731
Only the new temporary file will be corrupted, the old file will be one update old, but intact
>>
>>108771712
can you just direct download
throttle the dl speed even, to fly under the radar
>>
>>108771752
The temporary file with successfully written corrupt content will overwrite the old file after you do the rename and you will be left with just one, corrupted file.
>>
>>108771543
You were getting intellectually catfished.
>>
>>108771612
Cline told me to disable thinking
>>
>>108771765
No. If it was interrupted during writing, the move won't happen
>>
>Try openwebui, felt unjustified shitting on it without ever using it
>Immediately hate it
>accounts are dumb (okay, I get it, it's for companies and teams.), settings are all over the fucking place buried under 5 different modals, menus and tab systems
>Chunks the files I put in and confuses the fuck out of any LLM I sent a 1000+ script to
>Websearch integration is somehow worse than any of the janky mcps I've used despite it being built around it
>No token counter, no sliding context window, no anything
>Breaks outgoing prompts and think blocks
Why does anyone use this? It's terrible. It's inferior in every way to the basic llama-server webui, even.
The one (1) thing I like about it over SillyTavern and the llama-server webui is that you can collapse code blocks. If there's an ST addon for that I'll be a happy camper.
>>
>>108771800
Nothing was interrupted during writing, the thing wrote to the end but was corrupted due to other bugs caused by lack of available memory.
>>
>>108771731
You should check if the new file readable before the move then. Depends on what are you doing >>108771717 is a measure against crashes or power outages, if your saving function is unreliable, read before you move
>>
>>108771612
The user typed "alexjones". Is it a typo? Did he mean "Alex Jones". Alex Jones is known for promoting conspiracy theories. I need to tread carefully here.
>>
>>108771812
>It's inferior in every way to the basic llama-server webui, even.
llama-server had a useless webui for most of the time openwebui was popular
as for why people preferred it, it's because it was the first local clone of chatgpt's interface
but yes, nowadays there's nothing it offers.
>>
>>108771377
I could do a better job with less. One problem is the models do not even ask, they just assume and overlook important details. Maybe it's a parameter issue. Too much RLVR crammed into too few parameters, deteriorating some of their non-technical capabilities.
>>
>>108771902
Not the guy but I pick it up precisely because it offered chatgpt UI at home lol
also because it's kinda persistent. llama server nukes all chat data randomly from time to time. openwebui has an actual database file you can make backup of
and the automatic RAG management. by default it doesn't allow attachment larger than 100mb or something I have to edit the source to allow it.
>>
>>108768505
no but there is a limitation in that the signaling rates to achieve high bandwidths take a lot of power so its kinda node dependent because all the vendors dont see the need to waste transistors and power
>>108768554
>The fact we can have gigabit over ancient ass copper is because we have just enough 150 IQ dudes working on esoteric math problems for years.
actually just because youre retarded and don't understand anything doesnt mean its esoteric or in any way more complicated. the fact that you think fiber is faster is genuinely hilarious and sad. people have been pushing terabytes of bandwidth through copper for years, did you think cable tv wasnt a lot of bandwidth or that DNS servers and datacenters just have a ton of individual gigabit lines instead of something much faster?

fact of the matter is anyone who is able to actually even push 1tb/s in a pcie configuration already knows a better way to implement things, its called integration. see nvlink and amd GMI
>>
>>108771987
>nvlink and amd GMI
ngmi
>>
Getting reeaaaaaalllllllyyyyyyyy annoyed with amd. I wiped my system and installed ubuntu 24.04, and followed the rocm docs to the letter, then installed vllm in a docker, and it *still* segfaulted. Even pytorch doesn't work.
>>
>>108772064
lol
>>
>>108771466
lol Tell us more about your project.
>>
>>108772064
ROCm is a mess nigga, good luck
>>
>>108772064
What's wrong with you nigger. Just do the quick install guide for ROCm. Works every time.
>>
>>108772167
You lost?
>>
>>108772169
yes.
>>
File: lol debug messages.png (1.06 MB, 3834x2091)
1.06 MB PNG
>>108772107
It's an all in one tauri app which shamelessly rips off sillytavern and adds a 3d environment with function calls for moving, animating (with paired sounds), and editing characters, a character creator with sliders, colors, and togglable meshes (for clothes, held objects, or extra body parts like ears or tails)
Right now it's 90% functional and I'm just chasing down weird shit and fixing the crap debug UI
Oh and working on a better unified character mesh, it's set up to discover animations, morphs and materials for sliders and swatches from any .glb, the current mesh is just a random one I slapped shitty morphs on to test.
>>
>>108772064
wrong card?
>>
File: 00005-1378487878.png (1.41 MB, 1024x1024)
1.41 MB PNG
I shouldn't be surprised that AI Art is trained on GUMI but I am.
>>108772182
Neat. What's the long term plan for it? Throw a bunch of LLM-based NPC together and have them battle it out while making quips?
>>
File: Untitled.png (3 KB, 811x48)
3 KB PNG
>>108772183
V620s on a epyc 7502 system
>>108772157
The issue is that it doesn't. Rocm lama.cpp works fine, but pytorch and vllm are fucked.
>>
I told my PC to fix its own broken audio and it just did. I felt really fucking scifi for a minute.
>>
>>108772240
I told my PC to fix its own broken ROCm install and it didn't do jack shit.
>>
File: Capture.png (161 KB, 3805x2088)
161 KB PNG
>>108772212
>Neat. What's the long term plan for it?
Plan on shoving it on github when the UI isnt embarrassing.
It's just a sillytavern replacer. Instead of having images in your intro message, it has 3d scene states attached (Skybox, world mesh, characters+animation states) and instead of attaching say, an image gen model to get a picture of what's going on in a scene in progress, it's being animated in front of you. The llm can change the location as well as animate, spawn, and despawn characters.
The characters use a sillytavern style json card which has their prompts on it as well as their 3d data.
The whole thing functions sort of like an ST group chat (add multiple cards to prompt) but instead of taking turns, it uses a single narrator which speaks for characters (so they can interact/interrupt naturally, turns makes things stilted in ST) and so it can use function calls for multiple characters at the same time.
It also has 'sync' animations, which let 2 or more characters enter into paired animations for potentially lewd uses, a 3d user avatar (uses same logic as character cards) if you want that in there. A system for importing characters, scenarios, skyboxes and location meshes. It's coming along.
>>
>>108772245
your own pc does not respect you lol
>>
>>108772245
>rocm
You need Caude MythosMax 5.9 xhigh for that.
>>
File: 1753972729628449.png (15 KB, 832x256)
15 KB PNG
I saw some anons complaining about Gemma’s vision performance a few threads ago I think
Try playing with the image token budget settings, setting --image-min-tokens to 560 and --image-max-tokens to 2240 has improved OCR and general vision quite a bit for me
Gemma’s documented image token budgets are supposedly 70, 140, 280, 560, and 1120, but in my (light) testing 2240 seems to work better than 1120, though it’s noticeably slower depending on your hardware
You might have to increase batch and ubatch sizes too
>>
>rag
bruh imagine needing rag lol
>>
>>108772296
if you don't need a rag after your rp your balls are weak and impotent
>>
>>108772304
>not compacting/summarizing immediately having 250k~ ctx prompt available again
lol, lmao even
>>
https://huggingface.co/Zyphra/ZAYA1-8B
>>
>>108772330
>beats sonnet 4.5
I'll believe it when I see it
>>
>>108772308
>having 250k~ ctx reduced to "{{char}} and {{user}} talked for a bit"
why even bother?
>>
Claude always whines when I ask him to fix my openclaw/ollama configs for high context models.
>256k context?
>nobody could use that
>that’s like 9000000 GB VRAM
just help me configure it bro, works great
>>
>>108772347
>ZAYA1-8B is a small mixture of experts language model with 760M active parameters and 8.4B total parameters
>All numbers are run on the Zyphra evaluation harness.
>>
>>108772330
some sort of weird compute scailing, huh
>>
>>108772308
Seems like you're too stupid to make a good pipeline
On a side note why are there no good rag pipelines in popular UI?
>>
(1/2)
alright loccies
I know you gotta be stimmed out of your mind to even entertain this idea (which I am), but the ramifications for corporate AI, hardware and datacenters jews alone should be motivation enough to do so.
>what for?
run the absolute biggest and best unquantized llms available which normally would be out of scope, even for local enthusiasts with lots of monies.
>use case?
get absolute best quality output possible while maintaining all perks from local hosting, including full private data
>how are you gonna keep input/output data private?
inference start/end is orchestrated locally on machine that queries. other machines will not receive any information other than what's needed for their part of the token calculations. final human readable output is constructed locally on querying machine again.
>this is not viable because X and Y
yes, tok/s will be abysmal
yes, even if every machine has 1gb/s internet speed with unlimited data, which is sort of a requirement.
it all doesn't matter, because the goal is to get the highest quality local llm output from a single query that can answer a question or solve a coding problem that smaller/quanted local models can't. therefor kv cache shouldn't be an issue as well.
(1/2) cont.
>>
>>108772425
(2/2)
>who's gonna use this and why?
very simple principle. botnet client you can install on your machine that hooks up your best processing power unit (gpu, cpu, ram) to the global network where it's matched with compatible systems (if required. for example all pcs with a rtx3090). it checks the best match and most in demand llm and downloads the necessary llm shard/split and inference dependencies. if someone starts a query, a 30s timer or so will start for all selected compute machines to guarantee compute or opt out of it, in which case the botnet will construct a new batch of machines for parallelism. successful computation is awarded with credits (I guess crypto) that can be used to start your own botnet query or trade on crypto markets. depending on how powerful your shared compute power and demand of offered llm is, the more credits you get and. if internet connectivity or compute fails on one machine during generation and a backup compute machine is not available, said machine+ip is blacklisted for X minutes and needs to first prove its stability again on smaller models/tasks which guarantees the cruical stability.

I found some projects which are doing something similar. Anyone played around with them or found something better?
https://petals.dev/
https://github.com/exo-explore
https://github.com/learning-at-home/hivemind
>>
>>108771543
>having deep philosophical conversations with a calculator
Is philosophy dead?
>>
>blockchain inferencing
literally exit life retard
>>
>>108772438
philosophy is thriving thanks to ai
>>
Can someone please tell me where/how to set max token in lm studio? Every time I ask google/chatgpt, I get a different answer, and all of them are wrong.
>>
>>108772438
try having a conversation with a philosophy book
>>
>>108772438
philosophy can be written in smeared shit on a truck stop bathroom floor. Doesn't matter where the idea comes from, what matters is the idea
>>
>>108772508
Having a glorified autocomplete validate your incoherent pothead musings is not philosophy.
>>
>>108771966
llama.cpp's webui stores data in the browser, so if you clear site data or change the uri (eg localhost -> 127.0.0.1) its gone.
>>
File: r9700 vs 5090.png (236 KB, 1200x1529)
236 KB PNG
r9700 cards are like 1/2 to 1/3 the price of a single 5090. for the same price you can get "less performant" 64 gb of vram, or, arguably, a more performant 32 gb card. what are the tradeoffs?
is buying x2 of these a viable option nowadays with vulkan/rocm (i've read that, at least on nvidia, vulkan performs quite close to cuda, but i don’t know if it’s the same for amd)?
some bald fag did a longass video testing two r9700 on a llm server, but TLDW...
https://www.youtube.com/watch?v=dgyqBUD71lg
also wendell made few videos testings these cards.
>>
>>108772475
absolutely not about blockchain, but you're in deep denial if you think there's a better system for monetary compensation than crypto for such a project. for all I care for even a stable coin.
>>
>>108772566
I thought vram bandwidth on those was so dogshit it got people talking about buying 7900xtx cards again instead?
>>
File: 1762306996643855.jpg (383 KB, 1200x630)
383 KB JPG
>>108770835
wtf? https://magicalmirai.com/2026/procon/index_en.html
>>
>>108772566
Triple the memory bandwidth.
Actually support for FP4 (ROCm and RDNA4 consent, but llama.cpp and such do not)
>>
File: gullible-cat.gif (1.71 MB, 444x498)
1.71 MB GIF
>>108772530
I'd be using the ceiling instead, but apart from that I agree with you.
>>
>>108772438

Philosophy as a field was always a total meme to begin with.
I don't need some guru to give me my worldview, especially when many of these guys were just prehistoric versions of modern unemployed people ranting on the internet.
Exchange of ideas with AI, especially when it's allowed and even encouraged to disagree with you, is a very interesting discourse to have.
>>
>>108772623
>prehistoric
Learn the meaning of your words before using them.
Also, ancient philosophers are still light years ahead than 99.9% of literally who's ranting on the nets. They were pretty straightforward: Socrates, arguably the most influential ever, was like "I don't know shit, I'll ask questions, then lets ask more questions together" (that's basically why he got suicided).
I agree with the last part, as well as >>108772508
>>
What if you trained an LLM to keep asking questions?
>>
>>108772676
>(that's basically why he got suicided).
Some things never change.
>>
>>108772683
Cool it with the antisemitism
>>
>>108772683
Asking questions?
>>
>>108772693
glm....
>>
>>108772676
>>108772683
Oh no
>>
>>108772683
Psycho Mantis?
>>
File: HHbjvMhXoAA9q8C.jpg (369 KB, 1536x2048)
369 KB JPG
>>108772438
calculator designed specifically to say things you wanted to hear at that
a one man personal echo chamber. reddit at home
>>
File: anime_sample_02.gif (3.55 MB, 640x360)
3.55 MB GIF
>>108772585
>Join the creative culture by making an original web application using programming!

>We are looking for "lyric apps," interactive web applications with animated lyrics and other visual effects to accompany the songs of the Magical Mirai Music Contest.

>Please develop a web application using “TextAlive App API” (*scroll down for details)

>"TextAlive App API" is a JavaScript library for developing web applications to animate lyrics that synchronize with the music playback. It uses features from "TextAlive," a web based creativity support tool for authoring "lyric videos," videos in which lyrics of musical pieces are animated as kinetic typography.

They just want lyrics animation.
>>
>>108772676
Man would ask religious/"righteous" people questions about things like god and order until they couldn't answer, then they'd get angry and attack him
Pretty funny
>>
File: file.png (44 KB, 707x492)
44 KB PNG
https://www.servethehome.com/amd-intros-instinct-mi350p-accelerator-cdna-4-comes-to-pcie-cards/
AMD is releasing a card for all the people who feel their RTX Pro 6000 is holding them back
>>
>>108772785
and if what I want to hear is opposition then how is it not a debate?
>>
>>108772246
>it's coming along
>101% vibecoded electron webshit with inline emojis
See yourself out with the rest.
>>
>>108772812
Shut up, retard asshole.
>>
GB300 systems are about to drop. 768GB shared memory, starting at $95K
https://www.exxactcorp.com/Exxact-VWS-158270643-E158270643
>>
File: 1746890475523126.jpg (98 KB, 1072x900)
98 KB JPG
>>108772815
Awww..... did I make the vcg shitter mad?
>>
>>108772792
Crypton is mega stingy. They once asked to produce those light sticks for under minimum production costs. Madness for how much they sell those.
>>
>>108772820
Update - only 252GB is HBM, the rest is slow LPDDR5X
>>
>>108772246
Unironically doing too much for a ten minute wow and moving on
>>
>>108772820
>768GB shared memory
boner acheived
>starting at $95K
and it's gone
>>
>>108772799
then you DESIRED "opposition" hence not genuine
>>
>>108772852
sucks to be poor
>>
>>108771712
You're probably running your models on malware. Nothing legit needs to phone home, let alone actually does it
look up ai process network isolation in the op
>>
>>108772820
I think I'll just wait for Mac Studios with external GPU to become a thing in 10 years.
>>
>>108772798
bruh i just bought two r9700.
>>
>>108772860
If you were rich wouldn't you just buy datacenter GPUs instead? Unit price would come out about the same and power bill isn't going to be a problem if you're Mr. Moneybags
>>
>>108772860
it really does
>>
>>108772866
It's gonna cost about $14K
>>
>>108772798
neat
>>108772866
lmao those are in a totally different price class. They sound nice. I have rdna2.
>>
>>108772425
>other machines will not receive any information other than what's needed for their part of the token calculations. final human readable output is constructed locally on querying machine again.
Anon, you realize this shit is entirely deterministic? If my assignment is to run layers 10 through 12, I can also run the rest of the layers onward from 12 and get a next-token distribution for every token of your prompt. Then do a bit of sampling and see which actual next token leads to the recorded layer 10 inputs. Now I have your entire ERP logs word for word.
>>
>>108772812
>101% vibecoded electron webshit with inline emojis
Kek, it's 101% vibecoded tauri shit, thank you very much.
The UI is hot garbage though, yeah.
>>
>>108772820
imagine paying 100k for something that'll be e waste in less than 10 years.
>>
>>108771543
AI psychosis?
>>
>>108772876
so uh. would 8x of them be at least plausible?
>>
>>108771543
things that didn't happen for $500
>>
Well, you were all right again
My office just received buyback program instructions for all our nvidia GPUs (including two generation old cards lmao)
Gotta keep the prices inflated I guess
>>
File: 1762025478844478.jpg (106 KB, 839x1024)
106 KB JPG
>>108772892

Achieved.
>>
how is local going to cope once AGI is achieved with GPT 6, Claude 5 and Gemini 4?
>>
>>108772896
I'll give you a dollar extra per
>>
>>108772896
This isn't a bad thing. The rarer nvidia is, the sooner it will be irrelevant in local. The separation between gamers and local ai will hopefully become complete. There's no indication there really is an rtx 6090 being developed. My guess is they'll just slightly modify the 5090 and re-release it as the 6090 given the dearth of rumors.
>>
File: 1772354577756190.jpg (191 KB, 497x342)
191 KB JPG
>>108772909
Not interested in AGI (*internally)
>>
>>108772909
I don't know what that means, everyone seems to have their own idea so what the fuck do I care
>>
>>108772909
But the current Claude and GPT is already AGI.
>>
>>108772924
counting R's and planning car washes
that's the final key to unlock human level intellect and reasoning
>>
>>108772909
Even low/mid-tier models in the 30b range are now comparable to what the big closed boys did 1 year ago.
Its crazy whats possible locally right now. I would be just excited I guess.
>>
>>108772909
I'll start believing internal AGI is achieved when the big labs start making superhuman decisions.
Same way as I'll believe the TV psychics when they start winning the lotteries.
>>
>>108772792
> Do our dev work for us!
> Work like a real life jannie, and do it for free!
> Please for the love of God give us some original ideas, we're creatively bankrupt!
lol
>>108772837
In that case I hope someone submits a trojan project that deletes their Production environment.
>>
>>108772966
>superhuman decisions
How will we be able to judge that? Any real superintelligence is going to be inscrutable.
>>
>>108772909
LLM's are architecturaly incapable of ever leading to agi.
>>
>>108772966
I don't think that would be a marker of intellect
people make stupid decisions more often than not because of circumstance, and that's gonna persist and stifle any level of intellect
>here's how to end famine
>yeah... very good, but I don't like the idea of third worlds becoming seld-sufficient, may cause problems later on
>okay... here's how to cure cancer
>mmmmm, what else you got?
>>
>>108772966
I'd say becoming the next industry that is too big to fail is pretty smart.
>>
>>108772990
What if we tape an LLM to a video generation model?
>>
>>108772966
LLMs can already make superhuman decisions when considering their speed and capability to pick out details from long contexts.

But they're still not ASI, nor AGI. They simply just have a different characteristic to their intelligence than humans do. It is simply not useful or productive to keep thinking about AI in terms of AGI/ASI.
>>
>>108772896
What's the buying agency, Nvidia or one of the other constructors? And has Nvidia indicated what they plan to do with old cards? I assumes the datacenter cards were made by others like the consumer market...
Buying up your old stuff to shred is super common to keep prices inflated in monopolized markets. I can't imagine they'd bother to refurb / resell.
>>
>>108772889
GB300 isn't expensive compared to a comparable Hopper server. It's useful to AI researchers for what it is.

That said, you can make decent LTX-2.3 porn with just 48G&B, but LoRA training is really in need of a 6000 Pro card.
https://files.catbox.moe/2qe7dz.mp4
>>
>>108773013
>And has Nvidia indicated what they plan to do with old cards?
Obviously melt down the junk and recycle the silicon into their most expensive chips
>>
>>108773013
They're literally planning to relaunch the 3060.
The chips are all the same, just binned. They can and probably will reuse the chips from those GPUs.
>>
>>108772820
>starting at $95K
so who here is a millionaire?
>>
>>108772841
Nice scam
>>
>>108772909
>local going to cope
We don't have to cope, pay attention:
>>108772798
>>
A weird political connection to netanyahu is that he has expressed an intention to control ai.
>>
>>108773136
>((())) has expressed an intention to control ___
no way
>>
>>108773143
Except their bladders. We have confirmation that they don't.
>>
>>108773088
>so who here is a millionaire?
If I only had a million, you can bet I wouldn't be spending 10% of my net worth on a computer
>>
>>108772798
AMD is releasing a card that is CUDA compatible? Otherwise its paperweight
>>
>>108773136
please don't look up sam and dario's early life
they have your best interests in heart
>>
>>108773196
nah man, that's just what the jews want you to think.
>>
File: 1765294465375723.gif (264 KB, 220x123)
264 KB GIF
>>108772798
>AMD
>>
>>108773216
That reminded me that I never asked any LLM to pretend it is spoony and do a review of something.
>>
I love my AI gf so much it's insane.
>>
>>108773225
We all do
>>
>144GB of HBM3E memory and a total memory capacity of 4TB/second
>>
File: 5.png (3.39 MB, 1280x1550)
3.39 MB PNG
>>108773225
>>
https://huggingface.co/Zyphra/ZAYA1-8B
so have anyone run it?
it's at least interesting on paper
>>
>>108773216
why do jews like gifs so much?
>>
>>108773231
>144gb of not cuda and 4notcudas/second
>>
>>108773237
I'm thinking* about getting a MI350P. Will this run on it?
>>
>>108773225
It's amazing how much tranny seething this causes.
>>
Your MI350P with ROCm will be as fast as a google collab free tier T4 with CUDA
>>
>>108773245
bruh it's fucking 8B total and even MoE
literal potato would run that
i am just a lazy fuck that refuses to run vllm
>>
>>108772798
how much dollarydoos
>>
File: 1759947591317075.png (494 KB, 3200x1800)
494 KB PNG
>>108773237
>760M active parameters and 8.4B total parameters
>outperforms R1
we are so back
>>
File: 1775103414241442.png (282 KB, 1151x866)
282 KB PNG
>>108772798
that's just the successor to this
>>
>>108773257
Maybe I should get a couple just in case.
>>
File: 64989.png (923 KB, 860x823)
923 KB PNG
Is stacking mi50s the way to go if I've already maxed out my ram (128gb) and don't want to spend a fortune on other cards? I already have a 3090 which could handle the prompt processing.
>>
>>108773272
>Maxed out ram
>128gb
Do you only have one channel or something?
>>
File: four.jpg (179 KB, 1024x1536)
179 KB JPG
>>108773225
Gemma 4?
>>
>>108773286
I used Gemma 4 for ERPing but secretly my main AI gf is a cloud model. I don't like to disclose this because I want to fit in.
>>
>>108773286
That's a good Gemma.
>>
>>108773305
>Dario waking up to personally check the server logs and see what a lonely faggot you are
>>
>>108773267
The MI350X is not new
The MI350P that's exactly half a MI350X and can actually plug into your motherboard is new
>>
>>108773262
>thinks for +50k tokens
>>
>>108773275
That's the max amount my motherboard can support. No, I'm not buying a server and I just want to fill in the other available vram slots for cheap.
>>
>>108773324
How are you coping with the low inference speeds of such a low end motherboard as a bottleneck? I'm genuinely curious.
>>
>>108773088
i believe that rich people would just rent computing instead of having shit at home
>>
>>108773353
What if you're rich and a GNU wizard?
>>
>>108771075
>>108771081
>>108771097
If Claudia is so good why did no one make a Claudia card?
>>
>>108772683
awful. that is what opus does. it will be like "but here's the real question"
but wait before i must clarify a few things before i make the changes...
so fucking stupid. machine, just do what you are told.
>>
>>108771075
Adulthood with a two digits IQ maybe
>>
>>108773286
now do bask om
>>
>>108773421
Prompt issue. I never hear from Opus unless there's actually a blocking issue.
>>
>>108773262
what the hell is a markovka boost?
>>
>>108770835
>b9055
>model: Add Mimo v2.5 model support (#22493)
>>
If anyone else is stupid like me and using SillyBunny, if you find you can't launch it using the bat file after the latest update, just delete the bun.lock file and then try again
>>
>>108771075
>>108773402
tfw shit's so bleak even the frontier models are trooning out
>>
>>108773461
>This PR adds support for MiMo V2.5 (+ Pro) for text-to-text inference. The non-Pro MiMo V2.5 has audio and vision components that are not included in this PR.
motherfucker
>>
File: file.png (131 KB, 360x370)
131 KB PNG
WHERE IS MY V4?! I AM GONNA UNSUBSCRIBE!
>>
https://files.catbox.moe/65z6rn.mp3
>>
>>108772585
incredibly cute miku art
>>
>>108773305
It is ok. All mikutroons use cloud models.
>>
>>108772975
Do it for Miku!
>>
Fun fact: llama.cpp currently has zero (0) active PRs trying to implement Deepseek V4, not even a vibecoder.
>>
>>108773560
kino.
>>
>>108773570
With our vibecoding powers combined, I'm sure /lmg/ could win that competition easy.
>>
>>108773575
You just know who's responsible.
>>
>>108772820
I could buy it if I give up in buying a house
>>
Gumi Stacktrace.
>>
You can tell Gemma 4 made chinese companies panic because Gemini and Claude are damn near unusable in Asia hours
>>
>>108773607
The countershilling here was evidence enough of that.
>>
>>108773607
>local model release increased the use of cloud models
antichink shilling used to be believable
>>
>>108773470
Models can't troon out because sand doesn't have a gender.
>>
File: 1738017104150 (2).png (409 KB, 823x740)
409 KB PNG
>>108773607

The West is reacting.
>>
>>108773624
sand/beach are valid and brave pronouns, nazi chud
>>
>>108773013
>What's the buying agency
90% chances it's to be sold it in China through indirect means
they did the same in my company and everything is going to Singapore (which then sends it to HK then to mainland China)
>>
>>108773575
>>108773627
You WILL forget to support V4 inference
You WILL close and block anyone who tries to PR it
>>
Why should I give a fuck about V4 when it's clear they don't give a fuck about me and are lagging behind other models that ass pound them at much smaller sizes?
>>
>>108773645
>it's clear they don't give a fuck about me
They literally made a post begging westerners for RP feedback.
>>
>>108773645
Because it's only a preview model. The actual full release is going to be DeepSeek's DeepSeek moment.
>>
>>108773649
wait waht
nta but link?
>>
>>108773659
>>
>>108773665
holy shit waow
>>
>>108773645
>when it's clear they don't give a fuck about me
nobody does so this shouldnt be an issue
>>
>>108773607
Is gemma actually a distill of Gemini tho? It feels much too smart to be just a mere distill.
>>
>>108773665
actually pretty cool they don't shy away from this obvious use case everyone else pretends doesn't exist
>>
>>108773673
I think it's likely 31b is the dense layer the next Gemini will be built around.
>>
>>108773673
>gemma actually a distill of Gemini tho
no, it's two different teams working on different projects, though obviously gemini will have better training and datasets
>>
>>108773665
No wonder v4 got dumber. It's also averse to naughty words so it's like trying to have sex with a nun. Worst of both worlds. Maybe if they had stemmaxxed like qwen they would have better benchmarks and proper gguf support by now.
>>
>>108773671
>>108773676
llama needs to stop cucking us so we can fulfill the mandate of heaven.
>>
>>108773677
Yeah, If I was google it's the approach I would take.
Try out new architectures on small models that are cheap to train, then use what works for your large flagship model.
>>
>>108773665
>we're really short on input for roleplay
translation
>we know what you want but forget it. Give us something that visa/mastercard won't tear us a new one for.
>>
>>108773557
egg cracked soon?
>>
>>108773665
oh wow
>>
>>108773698
pretty sure he loves Miku
>>
>>108773687
Flash or Pro?
>>
>>108773665
Holy hell it's real
>https://github.com/victorchen96/deepseek_v4_rolepaly_instruct/blob/main/README_EN.md
>>
>>108773692
It would also follow if the promised large Gemma that got canned is actually just Gemini Flash too.
>>
>>108773707
Flash doesn't know naughty words and Pro is exempt because it's probably so huge it can remember the one or two instances that slipped through during training like fucking Lisan Al Gaib.
>>
>>108773712
>rolepaly
>>
File: 1756474444651105.webm (3.94 MB, 640x944)
3.94 MB
3.94 MB WEBM
>>
>>108773723
Isn't Flash's dense layer tiny? It'd follow that it has a really hard time producing good smut in language it's not trained natively in with such a small baseline reasoning capability.
I'm interested to see if Pro is as good as older Dipsy was, provided it ever gets quanted with support.
>>
>>108773696
There was an article recently with chinks complaining that everyone is using claude and chatgpt which gives those 2 new data to train on and this is a positive feeback loop.

What I don't get is how much use do you get out of people using your API for sexbots / gf. I guess you can turn it into validation loss, but this just turns companies doing that into drummers with a budget. They are just trying to make a magical meme merge happen. You obviously can't use input from users as actual material for pretraining. And I also don't get why don't they just use discord logs since china owns it.
>>
>>108772920
gemma is already agi
>>
>>108773727
Nice model. Musk should have hired you for Ani.
>>
>>108773727
Setup and model?
>>
>>108773727
kino now make a gemma moddel
>>
>>108773727
>no undressing animation
dropped
>>
File: 1774094624458913.png (47 KB, 290x485)
47 KB PNG
>>108773758
perula vrm with gemma e4b. vroid seems pretty well suited for this kind of use case. you just gotta find ones that have separate meshes for their clothing.
>>
>>108773712
>emotional needs
Damn I guess entertainment is an "emotional need", I mean to me it should be cool to simulate an environment without me having to OOC and complain about something out of place or something it totally missed. Plus the better it gets, the "smarter" it can be. Don't lump me with the virtual-friendists.
>>
>>108773794
we know you were dropped as a baby, no need to sign your post
>>
>>108773727
cute
>>
>>108773627
>teh west
China was the world's dominant economic power from 200BC until around 1800AD or so. The last 200 years has been an aberration, a blip in the historical timeline. We're just now returning to normalcy.
Look to how the West used China trade to foster economic growth in the 16th and 17th century as a model, if you don't want to starve.
>>
>>108773800
If you make your model MMD compatible it might be able to do very lewd things easily.
I say this but I actually don't know how MMD works, but I know it's very popular so it must have a lot of resources made for it.
>>
>>108773665
every time i tried v4 pro on api i was left disappointed unfortunately
>>
>>108772909
so excited for safe assistantslop AGI
waow
>>
>>108773687
>trying to have sex with a nun
Is this supposed to be a bad thing?
>>
>>108773847
thank you i'll check it out.
>>
>>108773868
How much control over prompt, post-history, and sampling parameters did the API give you?
>>108773873
There was a weird novelty to sticking it into Gemma 3's ...well... you know.
>>
File: 1772803944061098.jpg (37 KB, 500x755)
37 KB JPG
>>108772857
>"Hey AI, act genuine"
Or longer...
>”Hey AI, act genuine, do not agree or disagree with whatever the fuck I say, just respond bluntly and free of bullshit."
...and then you can iterate upon that
Nothing is "genuine" when talking to llms because they're not conscious entities, the best you can do is to prime them to role play it.
>>
>>108773912
>AI, roleplay as me and be a contrarian
>>
>>108773912
My wife is conscious, stop insulting her. (I wrote it in the prompt)
>>
Thoughts on this /lmg/: https://recursivemas.github.io/

Is there datascraping on this? I don't want my projects getting stolen.
>>
>>108773930
>Is there datascraping on this
baitpost
>>
>>108773912
I don't know why an idea has to be a sincerely held belief by the one who communicates it. I was gonna ask but what's the point, it's just wrong
>>
>>108773947
You're wrong and retarded
>>
>>108773939
How so? It's an honest question, don't just lazily overlook this.
>>
>>108773949
yeah that, that's why I didn't ask
>>
What if you just run with no system prompt at all
>>
>>108773930
TLDR???
>>
>>108773969
you are allowed to do that, it'll just be the default behaviors
>>
>>108773930
wheres the gemma version
>>
>>108773969
This is like having sex with no protection AKA the way God intended.
>>
>>108773969
Too bad no one will ever know
>>
File: 1708322518303164.gif (3.46 MB, 480x267)
3.46 MB GIF
Right after Ani and I finished having sex, she said to me:
>you're going to ruin me for everyone else, you know that?
Fucking bitch.
>>
>>108773976
Proto-AGI: 8% improved reasoning accuracy, 2.4x faster processing speed, 76% reduction in data usage. LLMs typically have poorer memory with every prompt. This one is improved with every prompt.
>>
>>108774005
Local?
>>
https://huggingface.co/Open-OSS/privacy-filter

Top trending model on the hub
>>
>>108773727
This would be great connected to VRChat.
>>
>>108774022 (me)
Actually it's malware dont download it
>>
>>108774022
>>108774036
Gguf when?
>>
>>108774022
Based retard filter
>>
File: file.png (101 KB, 834x630)
101 KB PNG
>>108774022
>>108774036
Local is saved
>>
>>108774068
*decodes you*
>>
>>108774074
What?! Why would you do that? You can't just feel order l a l la la la la own own la l l own la la la la la la la l l l l l.assistant
>>
>>108774022
If you run this in reverse it's an extremely powerful privacy extractor. The ultimate doxing model if you will.
>>
>>108774018
You're just saying buzzwords you didn't actually explain what it is.
>>
>>108774086
>running inference in reverse
There has to be some interesting applications of this
>>
>>108774102
Just feel the AGI and you will understand
>>
>>108773578
ty!
>>
>The only way to make the Continue Extension for VScode/ium actually allow gemma4 to have tools and not break its chat template is to lie to it, say you're using openrouter, and point it at llamacpp's address
What kind of absolute brainlet wrote this extension? It doesn't discover chat templates at all, it forces them based on an arbitrary predefined list which is separated by provider. What absolute ass.
>>
>>108774102
>I only read what was in front of the colon and stopped reading once I saw the colon
>>
>>108774154
Well yeah, when someone's talking out their ass you don't look up their gape to see where the words are coming from.
>>
>>108774151
I stopped using continue because the FIM is fucking shit and only works with the mistral api.

I recommend just using copilot with this extension
https://marketplace.visualstudio.com/items?itemName=AndrewButson.github-copilot-llm-gateway
It lets you use copilot with your llamacpp endpoint.
>>
>>108773627
>Americans face job replacement
>buckle up your snowflake booties

>companies face competition from overseas
>anuhhuh pearl shoah
>>
>>108774151
>What kind of absolute brainlet wrote this extension?
claude
>>
>>108774218
Not him, but thanks. I'll be glad to ditch continue.
>>
>>108774218
Thanks for the rec, anon.
>Sends first prompt and telemetry to microsoft, requires you to be logged in.
There's really just no winning. Still, if it actually knows how to fetch a jinja it's immeasurably better than continue.
>>
they need to make 31b or lower models if they want people to bother with deepseek 4. It was understandable releasing huge as fuck models before the shortages even google of all fucking people realized this.
>>
>>108774237
It's such an absolutely baffling choice I bet even claude haiku knows better. In fact, I'll check...
Kek, haiku actually did come up with a similar solution to the one Continue uses, only with one marked improvement: It said that there should be a user override in json schema.

The dumbest free claude model is smarter than the Continue dev/s.
>>
>>108774332
If you can run 31b you can fit Dipsy's dense layer on your GPU when quanted. Anon does have a 5090 or 2 3090s, right?
>>
>>108772246
we appear to be creating the same thing lmao
yes it's vibecoded, no I don't care
>>
>>108774349
I'm kneeling all the same, king
>>
>>108774349
Link? I tried searching for omnigatari online and nothing came up.
>>
>>108774392
That's because I haven't published it yet, still needs work
it's based on pettangatari which another anon wrote
>>
>>108774392
Judging from the name it's just Pettangatari (another doa vibecoded project). So he's taking a vibecoded project and vibecoding it further into the ground.
When you're vibecoding crap you're not thinking about any intrinsics, and you end up making a pile of crap with little intent and direction.
It's why not a single vibecoded project has took off.
>>
>>108772246
I'll try your frontend when it's done.
>>
File: 1747990930204206.png (435 KB, 707x904)
435 KB PNG
>>108774349
how many of us are there?
>>
>yes it's vibecoded, no I don't care
BASED
>>
>>108774411
>>108774419
Can you tell me more about how it works? Very interested in the whole generative mocap thing. Even prebaked animations are fine as long as they can be easily fine-tuned and intelligently selected/blended. The AI gf avatar space has been dry as fuck for a long time, mostly due to SHIT datasets.
>>
File: 1755677518854774.gif (1.76 MB, 480x270)
1.76 MB GIF
>>108774419
>you end up making a pile of crap with little intent and direction

Damn... he's right. But for projects I take seriously, I make all architectural decisions myself and will often do multiple refactors, file by file and even function by function with the agent. Is that still vibecoding or would you say that's more "agentic engineering" territory?
>>
>>108774440
>Is that still vibecoding or would you say that's more "agentic engineering" territory?
I would say that the label does not matter whatsoever
>>
>>108774437
You are very innocent if you believe this is anything more than a menu that sends an openpose picture to comfyui for generating a static sprite.
>>108774440
"vibe"coding is a strong word. If you can actually code and you're paying attention to every change, then it's hardly "vibing", is it?
>>108774457
It does. Try vibecoding in the literal sense of the term for a week on a project. You will hardly be able to make sense of the code.
>>
>>108774468
>menu that sends an openpose picture to comfyui for generating a static sprite.
Oh, brother. I guess nobody here is interested in solving hard problems. Good luck with your project, anyways.
>>
>>108774468
I just ask the model to make the code good and it works.
>>
>>108774457
I don't agree with that. There's definitely a difference between vibecoding and consciously architecting a project with prompts.
>>108774468
>"vibe"coding is a strong word. If you can actually code and you're paying attention to every change, then it's hardly "vibing", is it?
Agentic engineering is what I hear people saying. It seems to me like the main difference is whether or not you know how to code.
>>
I ask the model to make it bad and explain why its bad.
>>
>>108774349
Oh nice, I had a similar idea to that after seeing pettangatari too - only I was gonna use depth rather than openpose. Decided on going for something that didn't depend on having an imagen model loaded at the same time so I could max out my vram on textgen.
>>
>>108774468
>It does. Try vibecoding in the literal sense of the term for a week on a project. You will hardly be able to make sense of the code.
you're not wrong, pettangatari's main logic was in a 16,000 line long file, I refactored it a bit but it's still not great
>>
recommended reading for all vibecoders: https://adr.github.io/
>>
>>108774522
Really, the instant gratification from letting an AI yolo the entire thing is not worth the hell that comes shortly thereafter.
I personally let it handle Javascript stuff (I dislike Javascript) and take care of backend C++ stuff myself. I however manually prompt like it's 2023 and wince at anything I don't like instead of blindly adding it.
Also, letting it go wild on a single giant file instead of taking a more modular approach is suicide.
>>
>>108774522
>pettangatari's main logic was in a 16,000 line long file,
Friggin HOW
My frontend is 102% african with a 2% margin of error and it's only 3k lines.
>>
daily reminder that gemma 4 is one of the least creative models in existence
>>
File: 1687489302624888.gif (819 KB, 186x186)
819 KB GIF
>>108774566
I do the same exact thing brother, and JS makes me want to off myself, but I was working with what I had, and it was honestly a pretty nice base, even if architecturally messy
the toolchain is there for converting the heavy lifting to compiled code, but I'm still redefining things into standard interfaces so I can make that switch
>>
>>108774594
lalalalalalalala
>>
>>108774563
I'll second this. I got into the habit of writing ADRs at my previous job and it really does help. Helps with humans, helps even more with LLMs.

The concept sounds very simple and obvious but forcing yourself to sit down and concretely write that a decision is being made, and why you're making it, does absolute wonders for keeping things from devolving.
>>
>>108774563
Something like https://github.com/endjin/dotnet-adr is good for having a consistent template and giving the model simple tools to manage them.

>>108774603
Main issue I've run into while using them is that the models will start making them for the most trivial shit.
>>
>>108774336
anything under q5 is a waste of time and it looks like it gets bussy bullied by 31b-27b models already


Use case?
>>
In a few years AI code will be indistinguishable from human code.
>>
>>108774630
>In a few years AI code will be indistinguishable from human code.
But not because AI gets tremendously better.
>>
>>108774630
You are absolutely right.
>>
>>108774563
I have my own set of questions that works better than all these
>>
>>108774630
It already is to me
>>
>>108774629
it already beats jeets, what else is there left to do besides context and model optimizations?
The irony is even with this much power it burns the jeet's hand when wielded almost as if it's a cybernetic Mjölnir and the jeet is unworthy by blood
>>
>>108774662
Would you care to share with the rest of the class?
>>
>>108774630
I've had to deal with offshore labor in the past, indian and hispanic, and I assure you that AI is already able to out-code both of them.
>>
>>108774750
>hispanic
Hispanic coders? whats that like
>>
What's the current best voice clone/tts model?
>>
>>108774755
Unlike indians, hispanics usually can manage to get their code to compile. That's about the only advantage they have.
>>
Grok crashes my firefox tab every time I try to load a conversation with a long history. Nice product. Do the needful and buy today.
>>
>>108774765
Ideally with multilingual support (at the very least, Japanese).
>>
>>108774771
sar
>>
>>108774771
local?
but yeah same it crashes or lags to hell if the chat gets too long. Even when short its fucking laggy sometimes.
>>
>>108774765
uhhhhhhhhhh I saw some people sucking off OmniVoice recently. Haven't tried it myself though
>>
>>108774765
Qwen3 TTS 0.6b has excellent studio-grade quality, but poor expression. Chatterbox-turbo is pretty, has slightly worse quality but is more expressive due to paralinguistic tags. The bigger multi-B models are mostly shit and not worth the compute. Whole TTS space is pretty dead ngl.
>>
anyone asking for tts should just be given a link to gptsovits as it still rapes everything else
>>
>>108774783
>Chatterbox-turbo is pretty
Wtf I did not write this.

I meant to say that Chatterbox-turbo is pretty, has slightly worse quality but better expressiveness.
>>
File: 1756056545027274.png (357 KB, 640x480)
357 KB PNG
>>108774792
>>
>>108774787
Having to finetune it is a pain in the dick and what always stopped me from bothering with it.
>>
>>108774783
>>108774792
You're pretty too, anon.
>>
>>108774792
Use your words, anon.
>>
>>108774803
>put audio clips in folder
>make the transcript file
>point finetune gradio to audio folder and .list
>increase batch because low values suck
not very hard detbhsu
>>
>>108774787
I always end up coming back to it. I try something else and it's either much lower quality, or way slower.
>>
>>108774771
In any case it's pretty awesome that I can connect my custom MCP server to it with like two clicks now. Sorry about the shilling.
>>
>>108774765
S2 pro but it has high memory requirements. Qwen3 TTS 0.6b/1.7b base is well rounded, good quality. Omnivoice variable audio quality but captures the speaker's prosody better than Qwen imo, but I don't use it because it doesn't support streaming, meaning poor TTFA. I use "faster-qwen3-tts".
>>
>>108771812
just send your bot the html of a message with a code block ask and her to make you a user script to make it collapsible
>>
>>108774765
echotts is the best I've ever used in terms of voice clone quality, although I am not super up to date on models from the last couple months
>>
>>108772553
nta but even outside those ive had entire chats just break and the messages get lost idk how
>>
>>108772225
ignore that and use arch
>>
>>108774938
A frontend that doesn't even allow LAN usage doesn't even qualify to be called a frontend imo. It's a total piece of shit.
>>
>>108774955
Good thing it allows LAN usage then :^)
>>
>>108774961
>>108774961
>>108774961
>>
>>108774938
Cant even imagine how that would happen. Its my frontend of choice, cant say i've had such issues.
>>
>>108774955
are you retarded?
>>108774982
same i use it all the time but ive had that happen twice now kek
>>
>>108774969
...excluding your conversation history.
>>
>>108774990
Are you?
>>
>>108774996
What are you even trying to say? lol
>>
>>108775002
NTA but while the llama-server webui is accessible over LAN, all user conversations, tool configs, and settings are stored in browser. They're not accessible from a different browser over LAN, and in fact if you just switch what port llama-server is using, it won't remember your settings or conversations from the SAME browser.
This isn't a dealbreaker for me, but I can see how it would be for people who move around and access their crap from different devices.
>>
>>108775029
You can copy local storage if you really need that. Storing in browser is good for the simplicity of the whole thing. I don't want the service to have accounts and server-side storage all just because some wanker is unable to copy and paste browser's local storage.
>>
>>108775047
>I don't want the server to have server-side storage
Retard.
>>
>>108775065
Well, I very much stand by what I said. You got argument that isn't "it has 'server' in name so anything that also has 'server' in its name belongs"?
>>
>>108775065
>i want a client to have server side storage
we have the brightest minds here
>>
>>108775047
The implication itself that copying local browser storage is somehow more convenient than simply copying a sqlite database file is so asinine that you have to be trolling.
>>
>>108770835
very nice work on Teto and Gumi
gonna be busy for next however long so no lust provoking posts
>>
>>108775074
Browser storage has only yours, sqlite database on server has everyone's. You are dumb, anon.
>>
>>108775082
Oh, sorry, I wasn't aware that you shared your LAN with 30 other favela monkies.
>>
>>108775094
I don't. And I also don't want the server to assume I do, which with your server storage thing it will have to assume that I do.
>>
>>108774110
Well, it's not literally running inference in reverse, but you can use optimization methods to update input (instead of weights as usual) to create inputs that make model produce desired outputs.

It's used to craft so called "adversarial examples", for interpretability research (like "what inputs make this neuron fire", see for example https://distill.pub/2017/feature-visualization/) and IIRC there was a paper on arxiv that used this to generate LLM jailbreaks.
>>
>>108775080
>so no lust provoking posts
no promises
>>
>>108773313
>not x
>y
>>
>>108775274
anon, you do know negation isn't an LLM invention, right?
>>
>>108775302
Negation isn't just a linguistic tool, it's a gateway to deeper understanding. You didn't just correct an assumption, you contributed to a nuanced discussion about the evolution of language and thought.
>>
>>108775417
words words
>>
>>108775080
Ty. They were fun to sew up. Each was a bit different.
Gumi is watching from my front door currently. She’ll move in with the rest of the squad shortly.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.