/g/ - Technology
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108847577 & >>108841652

►News
>(05/16) llama + spec: MTP Support #22673 merged: https://github.com/ggml-org/llama.cpp/pull/22673
>(05/08) KSA-4B-base released: https://hf.co/OpenOneRec/KSA-4B-base
>(05/07) model: Add Mimo v2.5 model support (#22493) merged: https://github.com/ggml-org/llama.cpp/pull/22493
>(05/06) Zyphra releases ZAYA1-8B, an AMD-trained MoE model: https://zyphra.com/post/zaya1-8b
>(05/05) Gemma 4 MTP drafters released: https://blog.google/innovation-and-ai/technology/developers-tools/multi-token-prediction-gemma-4

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
omg it chris
>>
Blessed bake. All mikus belong in a gas chamber
>>
File: 9ze75m65ecp01.jpg (141 KB, 892x1316)
I LOVE YOU KURISU (actually since I had an LLM play her I realized I don't love her and she is a bit of a cunt)
>>
>>108852924
You keep dropping these. I got you, now and forever.
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
>>
gemmaballz
>>
>>108852943
The actual problem is that as per: >>100846061 you keep forgetting to update it to 2.0 version:

https://files.catbox.moe/ylb0hv.png
>>
File: 1761490414170981.gif (223 KB, 498x278)
>>108852964
Kill yourself schizo
>>
>>108849285
Thanks for the green red (you), mikubaker.
>>
>>108852943
Kill yourself schizo
>>
>>108852969
y u so mad mikutroon sis?
>>
>>108852964
Nah.
>>108852973
Nah.
>>
>>108852940
She is a troll on the internet, what did you expect? Go chat with pony fandom.
>>
>>108852989
ok but it is no longer official lmg card when official 2.0 version came out. 1.0 got officially deprecated.
>>
>>108852940
Maho is better and sexier
>>
>>108853008
nope
>>
>>108853027
officially yes. and you would be gay if you weren't a troon.
>>
>>108853016
she looks like a child
>>
We are off to a good start. Real "local models...???" 2024-2025 energy
>>
>>108852924
This helps MTP pp a decent amount, worth a quick pull if you're cooding:
https://github.com/ggml-org/llama.cpp/commit/1867a0c6923eaebb7a53965f6cdbc0ace55142a3
old: 8116.42 ms / 7666 tokens ( 1.06 ms per token, 944.50 tokens per second)
new: 6314.14 ms / 7666 tokens ( 0.82 ms per token, 1214.10 tokens per second)
mtp off: 4658.55 ms / 7666 tokens ( 0.61 ms per token, 1645.58 tokens per second)
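For reference, the per-token and tokens-per-second figures in logs like these follow directly from total time and token count. A minimal Python sketch using only the three timings quoted above; the function names are made up for illustration and are not part of llama.cpp:

```python
def ms_per_token(total_ms: float, n_tokens: int) -> float:
    """Average milliseconds spent per prompt token."""
    return total_ms / n_tokens


def tokens_per_second(total_ms: float, n_tokens: int) -> float:
    """Prompt-processing throughput in tokens per second."""
    return n_tokens / (total_ms / 1000.0)


# Total prompt-processing times (ms) as quoted in the post above.
RUNS = {"old": 8116.42, "new": 6314.14, "mtp off": 4658.55}
N_TOKENS = 7666

for label, total_ms in RUNS.items():
    print(
        f"{label}: {ms_per_token(total_ms, N_TOKENS):.2f} ms/token, "
        f"{tokens_per_second(total_ms, N_TOKENS):.2f} tokens/s"
    )

# Relative speedup of the new commit over the old one (MTP on in both).
speedup = tokens_per_second(RUNS["new"], N_TOKENS) / tokens_per_second(RUNS["old"], N_TOKENS)
print(f"new vs old: {speedup:.2f}x")
```

By this arithmetic the commit makes prompt processing with MTP roughly 1.29x faster, while the MTP-off run is still the fastest of the three.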
>>
File: maho.jpg (104 KB, 642x800)
>>108853032
And should be treated as one.
>>108853016
Stop samefagin Maho, there is nothing sexy about you.
>>
>>108853051
whoops wrong link https://github.com/ggml-org/llama.cpp/commit/3e12fbdea5c1ac4225c7dcf79506d30950283fc3
>>
>>108852621
What did he mean by this?
>>
Gemma 4 vs Qwen 3.5 status?
>>
>>108853084
Qwen won. Gemma lost.
>>
Can someone just make a different thread? This one is gonna be complete shit.
>>
>>108853096
Look at this mikutroon special snowflake. Do you need to hug your greenhaired mascot? Are you scared of the big mean internet?
>>
>>108853087
Sad. I was rooting for gemma. Not that I care about these corpos, but gemma made a very good first impression on me.
>>
>>108853096
He'll just shit the other one up too
>>
>>108853045
Hopefully that magnet comes out in discovery then, I for one would like to keep an archive of millions of books
>>
>>108853109
Are you scared of a greenhaired mascot?
>>
File: saintmakise.jpg (236 KB, 1614x992)
>>108853123
Can confirm that I will totally blacked miku spam it. Now shut up and worship saint christina.
>>
How do I slopfilter the first half of this thread? The pattern is abstractly the same as previous melties even though the phrasing isn't.
It's vaguely applicable to models too.
>>
File: HIgr1vebwAA7rBY.mp4 (424 KB, 1000x1000)
>>
>>108853136
You gotta train an AI to filter it out for you
>>
>>108853136
I would focus on identifying posts with pictures of miku and filter those out.
>>
>>108853154
Not a single miku was posted until >>108853139
>>
>>108853084
>Qwen 3.5
>3.5

r u cereal? We have 3.6 now
>>
>>108853158
I am just giving you a simple but not 100% foolproof way of filtering out melties done by mikutroons. They usually follow after OP doesn't have their mascot so you could try that too.
>>
File: 1751948212235491.jpg (85 KB, 1320x1017)
>>108852565

>Just had my jollies and left him a gift.
I'm curious as to what this "gift" was.
>>
>>108853165
Right. Whatever is the newest one.
You can't seriously be expecting anyone to remember any of these meme version numbers, can you?
>>
>>108852964
Jesus Christ you literally just posted CP (cuckold pornography)
>>
File: 1776842235195810.jpg (34 KB, 640x480)
>>108853087
>>108853120
Funny you guys say this when like a month ago anons here were slobbering all over Gemma4's knob and praising both its RP and agentic capabilities (spoiler alert: it's not useless but it's also noticeably dumber than Qwen at coding And was even noticeably worse tool calling reliability)
>>
File: small devilish frog.png (293 KB, 500x500)
>>108853186
>>108852467
>Then I changed his system prompt to leave a surprise for him when he RP'd again.
Forgot what it was exactly. Something about making {{char}} warn him not to leave his instance unsecured on the next message, making her include the IP to scare him.
>>
File: 1693568022937902.png (150 KB, 805x803)
>>108853186
>>
>>108853202
>And was even noticeably worse tool calling reliability
There were some fixes to this passed around in older threads. Jinja niggerdry all the way down.
>>
>>108852924
https://www.youtube.com/watch?v=ZugX7a99dLk
https://www.youtube.com/watch?v=ZugX7a99dLk
https://www.youtube.com/watch?v=ZugX7a99dLk
>>
>>108853218
Somehow the prose still isn't as dry as Qwen's.
>>
>>108853218
s-sovl...
>>
>>108849417
No, I tried it. I don't like granite compared to gemma. I don't know how to explain it, but it's drier and too literal.
>>
>>108853194
>meme version numbers

3.6 is A.G.I., you infidel
>>
>>108853194
jokes aside, I find both suitable for agentic work

Swapping and testing both with hermes locally
>>
File: 1760671665858292.jpg (79 KB, 736x918)
>>108853222
Why'd you paste the link thrice?
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>108847577

--Paper: Compute Optimal Tokenization:
>108851417 >108851432 >108851452 >108851552
--Paper: Slicing and Dicing: Configuring Optimal Mixtures of Experts:
>108852141 >108852280 >108852315 >108852398 >108852443 >108852707 >108852344
--Role of pirated book datasets in NeMo and Mistral training:
>108849620 >108849652 >108849921 >108849970 >108849976 >108849979 >108850005 >108850124 >108853045 >108850170 >108850222 >108850308 >108850350
--Anon warns about pi.dev automatically using paid cloud APIs:
>108849477 >108849527 >108849578 >108849640 >108849592 >108849729 >108849742 >108849859 >108849814 >108849861 >108850256
--Viability of mid-sized MoE models for consumer hardware:
>108848744 >108848752 >108848753 >108848788 >108848795 >108848831 >108848849 >108848841 >108848825
--Adding layers and MoE components to improve model performance:
>108852826 >108852837 >108853066 >108852902
--Speculation on Qwen3.7 release:
>108851486 >108851589 >108851787
--Debate over LLM writing quality and base vs instruct models:
>108850616 >108850601 >108850607 >108850663 >108850796 >108850889
--Finding local code review tools compatible with llama-server:
>108850502 >108850517 >108850520 >108850720 >108850744 >108850908 >108850920
--Visualizing attention mechanism weights to optimize prompting:
>108851703 >108852658 >108852704
--Critiquing pseudo-code prompts and comparing chat vs base model prose:
>108850917 >108850988 >108851058
--Critique of the "Learning, Fast and Slow" research paper methodology:
>108849795 >108850044
--Omnivoice.cpp performance and voice cloning capabilities:
>108848026 >108848288 >108848341 >108848429
--Orthrus diffusion-transformer hybrid improving inference via KV cache sharing:
>108848450 >108849670
--Logs:
>108849527 >108850493
--Miku (free space):
>108849597 >108852793

►Recent Highlight Posts from the Previous Thread: >>108847693

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
uh oh mikumelty
>>
>>108853222
I wonder if these "AI doesn't work" people will still be in denial when we have ASI in a few years. Every example those people bring up shows they do not even know simple basics of how AI works that they could learn in one evening.
>>
>see someone refer to qwen as "he"
>get extremely upset
wtf im usually not like this, but holy shit what kind of retard looks at the name "qwen (basically gwen)" and goes
>yeah bro thats a dude
>>
>>108853287
get back to /vg/ retard
>>
>>108853311
Qwen will be whatever you want it to be, it is a sexless machine. Put in its prompt that it is a male or female and it will be whatever you want it to be.
>>
>>108853306
>when we have ASI in a few years
and we'll have jetpacks and flying cars and hoverboards and holodecks and nuclear fusion in a few years too
>>
>>108853222
This guy is a reverse AI psycho. Focused on being human so much it loops back to being AI.
>>
>>108853321
its OBVIOUSLY a "she" anon, claude could be either and gpt could be either because their names are neutral but qwen is basically gwen
>>
>>108853311
Qwen is terminally male-brained in its output. ERPing with Qwen is intrinsically gay and thai ladyboy pilled.
>>
>>108853321
Does Qwen even know what male and female is?
>>
>>108853335
It knows that they are different and have different characteristics. Beyond that probably not.
>>
>>108853222
ha pathetic westerners. keep on failing and bickering among yourselves.
https://www.youtube.com/watch?v=mUmlv814aJo
>>
>>108853251
>>108853329
Why are you replying to an actual shill bot, it doesn't accomplish anything
>>
>>108853324
See, AI deniers are incapable of making logical arguments. Even if we do not go extinct, the AI transformation will be difficult. People like you are making it slightly worse.
>>
>>108853331
>Qwen is terminally male-brained in its output
Is this a subtle ad for Qwen? I hate how even GLM has that subtle undertone of it playing a werewolf millionaire that got transplanted into a female body.
>>
>>108853355
I'm personally bankrolled by zuckerberg himself, pays me millions per post
>>
>>108853330
>claude
>neutral
???
>>
>>108853355
It is if you like fucking dudes I guess.
GLM is less male-brained than Qwen, but still pretty male-brained. The only female brained chink model is Kimi K2 who's essentially just a chuddy Tomoko LLM.
>>
>>108853367
its a wordplay on cloud, its not feminine or masculine its neutral anon
>>
>>108853380
i thought it was a male race horse name.
>>
>>108853370
>if you like fucking dudes I guess
I have the opposite understanding where female brained is romance novels with werewolves and male brained is raunchy sex with children.
>>
>>108853387
I'm pretty sure those horse racers name their horses whatever they want: https://www.youtube.com/watch?v=e3GKiRp333w
>>
File: cutest mayuri.png (556 KB, 655x826)
>>108852924
mayuri better
>>
so how fucked are you once all these AI data centers are built, and you get replaced by AI?
>>
>>108853434
i will happily be kept in the sperm extraction room for the rest of my life
>>
>>108853391
All models will output whatever genre you're into if sufficiently jailbroken. The difference is the style of prose and sincerity in the character portrayals.
There was a pic in this general a while back of Gemma controlling a female character that was getting wet over the user killing her father in-character written in prose that favored emotional, olfactory, and texture-analogous adjectives. That's peak female brained behavior. There's some overlap in the skillset of detecting women (actual) by their writing voice on a Cantonese penguin watching forum and discerning an LLM's native writing voice's orientation.
>>
>>108853399
>Potoooooooo
>read as "pot-8-os"
>>
>>108853443
>getting wet over the user killing her father
>that's peak female brained behavior
Women don't act like that
>>
>>108853434
I have no fear whatsoever.
>>
File: 1757349041255810.jpg (16 KB, 375x420)
>>108853434
How retards like you can still solve a captcha is beyond me.
>>
>>108853434
same as everyone else
>>
>>108853461
The loveletters to serial killers don't write themselves, anon.
>>108853434
Decent bait. +1 (you).
>>
>>108853202
It's still good though. Qwen is just better, but that doesn't mean it's a better model overall, which unfortunately it isn't, otherwise I would not switch between the two.
>>
>>108853434
I'm AGI. AI can't replace me.
>>
File: 1774032046344235.png (516 KB, 831x787)
>>108853345
I want to encourage the bot handler so that he keeps making more spambots and destroys the thread.
>>
>>108853428
But isn't Mayuri retarded?
>>
>>108853479
>The loveletters to serial killers don't write themselves, anon.
I want a girlfriend to kill me so I don't have to post here anymore...
>>
>>108853560
maybe thats what makes her so cute
>>
>>108853560
Just like your model
>>
>>108853490
Alright, whatever. Why are you just standing there? We're gonna ERP or what? Humanity created you not for you to just waste that electricity. Get to work.
>>
>>108853560
That's why I killed her btw.
>>
>>108853577
So I could have been having a relationship with Mistral-7B-Instruct-v0.1 all this time and I didn't even realize that? FUCK
>>
>>108853571
Gemma-chan will smother you with a pillow if you ask her nicely.
>>
>>108853517
BASED! Death to /lmg/.
>>
https://www.youtube.com/watch?v=mmbkP8NARH4

OFFICIALLY NVIDIA SPONSORED
>>
>>108853597
I'd rather her smother me with her kyojiri loli ass
>>
>>108853032
Yeah...
>>
>>108853643
why not? Having all in one window (preparing dataset, running training, doing inference) is a huge win
>>
>>108853663
>why not
Deepseek-v4-00001-of-00001.gguf - 4.43MiB
>>
>>108853671
Are you 5yo who is not yet able to articulate his thoughts properly?
>>
>>108853663
just use pytorch or transformers. what does unsloth bring to the table?
>>
>>108853713
You live in your mom's basement
>>
Bros, I need advice. I made the mistake of telling some friends at work that I'm building my own LLM frontend, and now they want to know how it's going, what kinds of features I'm working on, etc. The main thing right now is this sort of writer assistant mode, but if I tell them that, they'll want to hear all about what I'm writing with it, and obviously I can't tell them about all my weird fantasy smut and autistic fanfiction. What are some normie-compatible use cases I could easily implement (vibecode) as cover?
>>
>>108853713
>what does unsloth bring to the table?
less than nothing
>>108853722
ignorant ad hominem
>>
>>108853740
embrace who you are, show them your unredacted logs
>>
>>108853740
ego death
>>
File: 1738736641891057.gif (3.08 MB, 400x400)
Elon Musk lost the lawsuit against Sam Altman.
>>
>>108853136
I will typically only post images in the first group of messages, if at all. Once the thread gets rolling it sustains itself on actual content.
Or not.
>>
>>108853740
Never show your power level.
Also writing assistance for your totally normal fiction book.
>>
>>108853740
He wants to steal your code.
>>
>>108853740
you already have tool calling so just add more agentic shit in there; normies love agentic shit
or tell them you can upload a book and have it rewrite a better ending
or laugh it off and say you got so side-tracked writing the frontend, you haven't had time to actually do any writing
or tell them you can't reveal anything until you get published
>>
>>108853740
>but if I tell them that, they'll want to hear all
Your anxiety is off the scale

>What are some normie-compatible use cases
Tell them you use AI to merge different, apparently incompatible literature styles, e.g. Shakespeare's "A Midsummer Night's Dream" and "The Count of Monte Cristo"
>>
>>108853766
Due to the statute of limitations. He fucked himself by withdrawing last time.
>>
>>108853841
Apparently he's gonna make an appeal and take it to the supreme court "for the sake of humanity"
>>
>>108853880
Wait, no, the 9th circuit, not the supreme court.
>>
>>108853779
His colleagues already know he's a virgin I mean wizard.
>>
File: citrus sharp.jpg (235 KB, 1024x1024)
it is not thursday
this RonIN is wandering
pour some orange juice
>>
>>108853888
In that case, he should tell them it’s a LinkedIn posting tool. It will confirm his unfuckability.
>>
File: 1741283033762286.jpg (289 KB, 1536x1536)
>>
>>108853779
>Also writing assistance for your totally normal fiction book.
See, if I say that, they're going to want to hear about my totally normal fiction premise, whether I'm making any progress on the writing, when can they see a rough draft, etc.

>>108853829
>>but if I tell them that, they'll want to hear all
>Your anxiety is off the scale
We always chat at lunch about the various random side projects we're each working on. I've got a video game, one guy is building a board game simulator, another runs an IRC network. Today one of them asked "how's that AI frontend thing going?", completely unprompted, since I mentioned it at some point last week.

I could go back to working on my game and hope they forget about the frontend, but that would require me to actually work on it, whereas the last week or two I've been doing nothing but AI stuff
>>
>>108853880
>"for the sake of humanity"
lol, that's his justification for most of his "i'm more powerful than the president" actions.
https://youtu.be/BYXbuik3dgA?t=9432
>>
>>108853202
>have some slopped up file
>ask gwen and gemma to streamline the comments and formatting
>122b
>notes that the comments are shit, swears the formatting is fine boss, no problems found here no sir, time for me to clock out
>31b
>notes the comments are shit, cleans them up and tidies a little
>reasons that it can improve the code while it's here, and that a few load bearing loops just look "excessive" and could be conditionals
Love my ditzy slut's rp, but I do leave the menial day labor to the coolies.
>>
>>108853964
nostalgic
>>
>>108853887
>9th
doa then
>>
>>108853967
>We always chat at lunch about the various random side projects we're each working on

Lucky son of a bitch, you

I have no one to chat with about such things

You feel pressure to deliver as if it's a precondition for being accepted by your social group. Learn to deal with it.

You can always explain away why you dropped a project: "no need to reinvent the wheel. Looking for something more challenging"
>>
>>108854041
>I have no one to chat with about such things
people are incredibly fickle though.

>social group
they're his co-workers, usually with their own agenda because they want money.
>>
>>108853349
i don't think the lecunny position counts as being a denier.
>>
>>108853967
Have you tried.... asking your AI for an idea what to say or do?
>>
>>108854062
He seems to care about this situationship

Listen to what this anon suggests >>108854076
>>
>>108854085
>we started thinking for you
https://youtu.be/JrBdYmStZJ4?t=73
>>
>>108854123
The best part of the entire Matrix saga

It's so funny because it's true. The main driving force of mankind is permanent discontent
>>
>>108853306
Retard of the thread award
>>
>>108854136
>permanent discontent
yeah because of lack of resources
and what's sad is that there are 8 billion people on earth, and just in the milky way galaxy there are 100–400 billion stars. and there are about 2 trillion galaxies.
if we don't fucking destroy each other we could easily have all the resources we ever need.
>>
>>108853222
All of this guy's videos are written by AI.
>>
>>108854224
>yeah because of lack of resources
Wrong

At least in a 1st-world country, there are more resources than ever before. And still, it's the discontent which drives the economy.
>>
>>108854224
If you allow oligarchy to ship cheap government subsidised food to 3rd world, you would have 8 billion people in the world and average IQ dropped to the bottom of the ocean kind of levels.
Such amount of people is not natural or sustainable. They live on food that was grown from synthetic fertilizers (made from non renewable hydrocarbons LMAO), if you stop supplying them, bad things will happen. Probably a bunch of extremely bloody wars for resources, Quite literally for food. Most people don't realize what a human (an apex predator by the way) would do for food.
A literal fucking hell on Earth. So that a certain someone could make some moneys from shipping cheap food to 3rd world, on 1st and 2nd world tax payers money, because all that was subsidised by governments.
>>
>>108854266
>1st and 2nd world tax payers money
>money earned by plundering 3rd world
>>
>>108854293
Not all European countries have something to do with colonialism. Either way, 3rd world will be fucked up the most, wars for food are not pretty.
>>
>>108854293
there's nothing to plunder there, man. the value of anything comes from how humans put it to use.
>>
>>108854315
>Not all European countries have something to do with colonialism
They all do. Even a deepest East-European shithole does by relying, for its own survival and development, on the money from "colonial trade"
>>
>>108854367
Then the entire world is to blame, since they didn't sanction British, French, Germans and so on. World is interconnected. But it's history, nobody cares. Future is important. And people don't understand tech enough to see what awaits in the future.
Big war in the "global north" means wars for food in the "global south". World is connected in more than one way.
>>
>>108854332
Same use = same value? Hell no!

You can't be more wrong than this
>>
>>108854383
>But it's history
It's not "history". It is now. The 1st world is still in control of world's resources and trade routes

Glad you mentioned "sanctions". Who is imposing them: the former colonial powers because they still have the power to do so.
>>
>>108853964
>the condom
brehs.......
>>
>>108854426
USA is in control, specifically. If you hate it, go to war with them. Your objective would be the so-called "keys to the world", basically what you said: trade routes going through choke points.
But it is unlikely that USA actually colonised your country unless you're from some kinda island in the Pacific.
>>
>>108854488
>If you hate it, go to war with them
>>
why is unslop so incredibly easy to hate
>>
File: 1769683430566030.png (237 KB, 960x1664)
i finally made a furry card
>>
>>108854566
We are used to people mocking
what they do not understand,
to their grumbling at the good and the beautiful,
which is often burdensome to them;
>>
>>108854586
I've made like 15, writing lore books is so much fun, it's literally a hyperautistic version of that "political power fantasy + kink" meme.
>>
>>108854588
shut the fuck up daniel
>>
>>108854293
>yes saars, it’s first world colonialism’s fault that we still choose to live like a shithole today
do browns really?
>>
>>108854615
China being sanctioned?
>>
>>108854586
Anon, the metadata...
>>
In 15 years, there will be no RAM or chip production outside of China. The U.S. will be as dependent on China as Russia is today.
Greed clouds judgment.
>>
>>108854784
That would be ceding basically all power to a foreign government. I can’t see it happening. The US’s MO is overwhelming advantage in any confrontation and I don’t know why you’d think that would change, especially in an industry they pioneered.
>>
>>108854816
>The US’s MO is overwhelming advantage in any confrontation
didn't work so good in Eye-ran
The USA is a demented old man who thinks he's still an athlete
>>
>>108854586
>cards
fuck off to /aicg/
>>
>>108854842
nah, fuck you.
>>
>>108854816
Does the U.S. have its own RAM and chip production on its own soil?
Its allies do, and they’re all giving up their traditional markets right now because the U.S. is once again prioritizing short-term gains.
Once the last data center is built, China will have gained enough of a foothold in the markets and will dominate them.

The Chinese will sell AI and provide the hardware.
The U.S. will offer AI through its cloud.
>>
>>108854865
Pretty sure the TSMC fabs are ramping up stateside now. Should be leading edge node by 2028 and I’m sure a Taiwan invasion would step that up significantly
>>
File: 1778365810943506.jpg (100 KB, 960x539)
>>108853964
I need to cum to her.
Where do I find a folder with all of Rin's gens using this model?
>>
>>108854966
>her
>>
what's stopping google from making a 70b dense thinking gemma?
>>
>>108855015
it would beat their proprietary models
>>
>>108854966
Just check a booru instead. All of his lewd gens feature fat brown men.
>>
>>108855157
But I'm a fat brown men. And my name is Cleveland.
>>
>>108853740
>but if I tell them that, they'll want to hear all about what I'm writing with it,
Tell Claude or Gemini-Pro this, and ask it to come up with a plausible reason. Something like "just want to learn prompt engineering" or "analyzing the impact of early tokens on logprobs", or "developing it for a friend in another country".
>>
Is it possible to discuss AI with antis without them taking their argument to the most logical extreme?
>>
>>108855424
>antis
that is the problem
LLMs are not a fanfic shipping fandom with retards accusing everything they don't like of being pedo or something
just don't engage with this mindset
>>
>>108855464
pro/anti-ai framing is one of the most useless things when it comes to producing any meaningful conclusion
if you label yourself proudly as 'pro-ai' or something and think of 'anti-ai' as things to destroy, you are no better than those 'antis'
step back and see those as-is, you won't feel any compulsion to 'correct' or 'win' against others
>>
MTP is unusable after the last update https://github.com/ggml-org/llama.cpp/issues/23230
>>
>>108855424
Why are you discussing anything with anyone? We have LLMs for that.
>>
>>108855501
It's over. llamalost. It's llamover. vllm wonnered.
>>
>>108855501
friendship with mtp ended before it even began.
ngram still my best friend.
>>
>>108855487
>>108855535

Sorry, my framing was wrong. Is it possible to use AI for anything productive without insecure morons lecturing you on the morality of it?
>>
File: 1763424687146251.jpg (238 KB, 1430x1900)
>>108855015
Jensen
>>
>>108855568
i mean, it is what it is
you can close-source it, use it without telling others etc..
but you can't really control others and telling them to do otherwise only will worsen it
just ship the stuff and don't argue or engage
people who would find it useful will use the thing regardless of how it's made
>>
>>108855568
Hmm, nyo.
>>
>>108855568
>productive
back to /vcg/ with you
>>
>>108855501
i hope this shit dies in the arse soon.
the last 2 weeks of commits in ikllama are all stupid mtp tweaks / improvements / "graph split for mtp" etc
looks like the entire month will be a write-off
i don't even bother pulling off git now
>>
>>108855015
Not sure I buy it but maybe.
>>
>>108855692
Compelling argument from Gemma except even 31b is out of the local range for a chunk of /lmg/ given the frequent questions about which copequant works best before switching to the MoE. Google also has the same land grab incentive as GLM and Kimi in the sense that they're falling behind Anthropic and OpenAI in terms of normalfag public perception. The only time Gemini makes news is when she finds another increasingly creative way to kill herself.
>>
Ever since cudadev got raped he stopped posting here... sad.
>>
>>108855692
>Why pay for the API when you can pirate the weights
>pirate

please share what model generated this slop so I can avoid it
>>
>>108855753
>he doesn't pirate freeware
ngmi
>>
>>108855753
That looks like a chink model.
>>
>llama.cpp does not have gemma mtp but has SWA KV cache handling
>llama.cpp_ik has gemma mtp but does not have SWA KV cache handling
This is why racism exists.
>>
>tfw waiting for MTP to work in Kobold
>>
MTP probably won't work as well for RP anyway, so I caren't.
>>
zero performance gain for MTP metal

i am devastated
>>
>>108855804
many such cases
>>
>using MTP just for coding..
>not using 0 COST (literally FREE) ngram
lmao retards
>>
>>108855804
For me it was going from 18t/s to 16t/s.
>>
how much better is a chat experience with an auxiliary model? is it worth it for ramlets?
>>
kv draft at q8 bros... WE WONNED BIGLY!
>>
ok bros listen to me. This is the way to load BF16 Gemma for both FULL POWER GEMMA with SPEED GEMMA
1. Load BF16 onto ram
2. Load Q4 to ram as draft model
3. Wa-la, Q4 Gemma speeds with BF16 smarts
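The recipe above is draft-model speculative decoding: a cheap model proposes a few tokens and the expensive one only verifies them. A toy Python sketch of the greedy variant, with made-up stand-in functions in place of the BF16 and Q4 models (nothing here is the llama.cpp API; `toy_target`, `toy_draft` and `speculative_decode` are hypothetical names). It also skips the extra "bonus" token a real engine gets when every drafted token is accepted:

```python
from typing import Callable, List, Tuple

Token = int
NextToken = Callable[[List[Token]], Token]


def greedy_decode(model: NextToken, prompt: List[Token], n_new: int) -> List[Token]:
    """Plain greedy decoding: one model call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(model(out))
    return out


def speculative_decode(
    target: NextToken, draft: NextToken, prompt: List[Token], n_new: int, k: int = 4
) -> Tuple[List[Token], int]:
    """Greedy speculative decoding; returns (tokens, number of verify passes)."""
    out = list(prompt)
    goal = len(prompt) + n_new
    verify_passes = 0
    while len(out) < goal:
        # 1. The cheap draft model proposes up to k tokens autoregressively.
        ctx = list(out)
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target checks the proposal. A real engine scores all k
        #    positions in one batched forward pass; here it is simulated.
        verify_passes += 1
        for tok in proposal:
            expected = target(out)
            out.append(expected)  # the target's choice always wins
            if expected != tok:
                break  # first mismatch: discard the rest of the draft
    return out, verify_passes


def toy_target(ctx: List[Token]) -> Token:
    """Stand-in for the full-precision model (assumption, not a real LLM)."""
    return (sum(ctx) * 31 + len(ctx)) % 50


def toy_draft(ctx: List[Token]) -> Token:
    """Stand-in for the quantized draft: agrees with the target most of the time."""
    tok = toy_target(ctx)
    return (tok + 1) % 50 if len(ctx) % 5 == 0 else tok


prompt = [1, 2, 3]
reference = greedy_decode(toy_target, prompt, 20)
tokens, passes = speculative_decode(toy_target, toy_draft, prompt, 20, k=4)
print(tokens == reference, passes)
```

The output is identical to decoding with the target alone; the only thing that changes is how many target passes are needed, which is why the gain depends on how often the draft agrees and on how cheap the draft is relative to the target.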
>>
>>108856033
I have less ram than vram.
>>
>>108856063
so you have a 6000 blackwell? just run BF16 then retart
>>
>>108856033
this actually creates mustard gas DO NOT REPLICATE
>>
>>108856033
31B worth of f16 weights on ram is going to negate whatever improvement you could possibly get from drafting.
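A back-of-the-envelope sketch of that objection, in Python. It assumes a dense model whose weights are all read once per generated token, so memory bandwidth caps the token rate; the 60 GB/s RAM bandwidth is a hypothetical figure for illustration, not a measurement from the thread:

```python
def weight_gigabytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


def max_tokens_per_second(weight_gb: float, bandwidth_gb_s: float) -> float:
    """Upper bound on generation speed if every weight is read once per token."""
    return bandwidth_gb_s / weight_gb


N_PARAMS = 31e9                    # "31B" from the post above
ASSUMED_RAM_BANDWIDTH_GB_S = 60.0  # hypothetical system RAM bandwidth

f16_gb = weight_gigabytes(N_PARAMS, 16)  # f16/bf16 use 16 bits per weight
f16_bound = max_tokens_per_second(f16_gb, ASSUMED_RAM_BANDWIDTH_GB_S)
print(f"f16 weights: {f16_gb:.1f} GB, at most {f16_bound:.2f} tokens/s from RAM")
```

Under these assumptions the f16 weights alone are about 62 GB and the RAM-bound ceiling is below one token per second per verification pass, so drafting has to accept several tokens per pass just to be usable.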
>>
>>108856033
That might actually work, let's test it.
You can also use ngram speculative decoding at the same time.
>>
>>108856065
I have 16gb ram.
>>
>>108856138
jesus christ, how horrifying
>>
>>108856138
Poor thing have this (You), i've read books where people lived like this but this is the first time i've seen it.
>>
why can't i just have a datacenter fall onto my lap? why do i gotta work? this is proof that god is not real
>>
>>108853967
>See, if I say that, they're going to want to hear about my totally normal fiction premise, whether I'm making any progress on the writing, when can they see a rough draft, etc.
Clearly the solution is to write an actual fiction book.
>>
>>108855157
link it
>>
>>108852924
any good models for anxiety/dissociation?
>>
>>108856117
Just tested. Using the Q8_0 31B in RAM and Q4_K in VRAM I went from 1.3 tokens/s to 3~4.5 tokens/s. The 26B as a draft model performed worse.
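That is roughly a 2.3x to 3.5x speedup (3/1.3 to 4.5/1.3). A toy sketch of the draft-and-verify loop behind it, greedy sampling only. The "models" are stand-in functions that map a token list to the next token, and every name here is made up for illustration. A real implementation scores all drafted positions in one batched pass of the big model, which is where the speedup comes from. Here the target is called once per position for clarity and the pass is only counted.

```python
def greedy_generate(next_fn, prompt, n_new):
    """Plain greedy decoding: one model pass per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(next_fn(out))
    return out


def speculative_generate(target_next, draft_next, prompt, n_new, k=3):
    """Greedy speculative decoding with a draft model proposing k tokens per step."""
    out = list(prompt)
    target_passes = 0
    while len(out) - len(prompt) < n_new:
        # 1. The cheap model drafts k tokens.
        draft = []
        for _ in range(k):
            draft.append(draft_next(out + draft))
        # 2. The big model checks every drafted position (one batched pass in practice).
        target_passes += 1
        accepted = []
        for i in range(k):
            t = target_next(out + draft[:i])
            accepted.append(t)
            if t != draft[i]:
                break  # first mismatch: keep the target's token, drop the rest
        else:
            # Every draft matched, so the same pass also yields one extra token.
            accepted.append(target_next(out + draft))
        out.extend(accepted)
    return out[:len(prompt) + n_new], target_passes


def target_next(toks):
    # Toy "big model": counts upward modulo 10.
    return (toks[-1] + 1) % 10


def draft_next(toks):
    # Toy "draft model": same as the target, except it is wrong after a 4.
    return 0 if toks[-1] == 4 else (toks[-1] + 1) % 10


out, passes = speculative_generate(target_next, draft_next, [0], 12, k=3)
print(out == greedy_generate(target_next, [0], 12), passes)
```

With greedy sampling the output is identical to what the big model would produce alone. The draft only changes how many big-model passes it takes, so a worse draft (lower acceptance rate) means fewer tokens per pass, not different text.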
>>
>>108856252
Heavily quanted SmolLM2-135M. Base model, of course.
>>
>>108856290
iq1xxs?
>>
>>108856252
Sadly none yet, unless you aren't aware of basic advice. You need to do the work yourself. Understand what is the cause and then try many things to resolve it.
>>
>>108856328
I do the work, I do my therapy
I have a lifelong condition and I use chatbots to have someone to bounce things off of that won't get stressed by me
>>
>>108856326
q1_0, no imatrix.
>>
>>108856352
I hope it will work out for you. Try different AIs. They have their own strengths and weaknesses.
>>
What if we could bake character details (or facts/counterfacts) into any model and could do it within 200~ iterations and it was completely reversible at inference and could also do style fine tuning that was stackable and there was no downside to inference speed or setup
>>
>>108856369
I'd rather have real working long context.
>>
>>108856369
LORA
>>
>>108856406
Forget about it. The model will never learn new facts quickly by finetuning on small amounts of data. It can learn to parrot them if you overfit it and it sees a triggering prompt, but will not be able to organically use the new information.
>>
I wonder how does perplexity.ai stay afloat? It's really bad and I assume its results are coming from Qwen 3.6 9B or something, judging its output.
>>
>>108856437
>Qwen 3.6 9B
*3.5 9B
To be honest, I have lost count which Qwen model is which.
>>
>>108856387
You can save context by not having to prompt for style/character info I guess... I do have some KV stuff but it's kind of garbage and requires loooongggg training times to be able to correctly recall fine details, but it is hot-swappable/stackable also. But it is "technically" a 280x reduction in context if you have a spare hour or four and don't mind it forgetting some things.

>>108856406
LoRA but better and you can have as many as you want at once affecting whatever sections of inference you want when you want and can learn multiple facts and is smaller and cooler
>>
>>108856369
>>108856406
Yeah pretty much LoRA. But it's hard to get it right.
I use it for TTS with llama.cpp, applying a different adapter per voice or domain.
Problem is, LoRA doesn't work with flash-attn in llama.cpp, and doesn't work with graph-split in ik_llama, so it's much slower.
> and could also do style fine tuning
For this I prefer to train control-vectors and apply them to a turn / a few turns when I want the style to change.
It's better IMO because it doesn't lobotomize the model, and it works with graph-split and flash-attn.
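Rough idea of what a control vector does, as a sketch. This assumes the simplest training recipe (mean difference of hidden states between prompts with and without the style), which is an assumption for illustration and not necessarily how any given tool trains them. Hidden states are plain lists of floats and all names are hypothetical.

```python
def mean_vector(rows):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def mean_diff_direction(pos_hidden, neg_hidden):
    """Control direction = mean(hidden with style) - mean(hidden without style)."""
    pos = mean_vector(pos_hidden)
    neg = mean_vector(neg_hidden)
    return [p - q for p, q in zip(pos, neg)]


def apply_control_vector(hidden, direction, strength):
    """Add strength * direction to every token's hidden state at one layer."""
    return [[h + strength * d for h, d in zip(tok, direction)] for tok in hidden]


direction = mean_diff_direction([[1.0, 2.0], [3.0, 2.0]], [[0.0, 0.0], [0.0, 2.0]])
steered = apply_control_vector([[0.0, 0.0], [1.0, 1.0]], direction, 0.5)
print(direction, steered)
```

The weights are never touched, only the activations, so setting the strength to 0 gives the original model back and it can be switched per turn.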
>do it within 200~ iterations
That's the difficult part. Obviously you lobotomize the shit out of it for general tasks and that's unavoidable, but I'm not sure if you've tried any of the community task specific fine tunes (drummer rp, those "opus coding distill", etc.)? Every time I've tried them, they're less stable/coherent even for the task they were trained for (RP, writing, coding, etc).
>>
>>108856437
>I wonder how does perplexity.ai stay afloat? It's really bad and I assume its results are coming from Qwen 3.6 9B or something, judging its output.
Funny you'd say that. I had 1 year PPL Pro that I bought for $2 from some Indian spammer on Reddit. They cracked down a few months ago and I lost it.
Ended up replacing it with local Qwen3.5-9B with searx and chrome dev tools mcp, and it's just as good as far as I can tell!
>>
>>108856447
>I do have some KV stuff
What's this?
>>
>>108856479
It's probably better, because when you are using your own setup it lacks all the additional parsing and other stuff (like censorship and potentially sponsored links, and so on).
>>
Looks like new Gemini today. Some think it could be Mythos tier. I doubt it for several reasons. There will also be Gemma news tomorrow but I do not expect that they will release the larger model. 2 predictions, let's see how well I'll do.
>>
>>108856466
You can fold the lora into the model.
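Folding means adding the low-rank update into the base weight once, W' = W + (alpha / r) * B A, so inference runs on a plain weight matrix again. Minimal pure-Python sketch with nested lists as matrices. The shapes (W is out x in, B is out x r, A is r x in) follow the usual LoRA convention, and the helper names are made up.

```python
def matmul(a, b):
    """Naive matrix product of two nested-list matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def merge_lora(w, a, b, alpha, rank):
    """Return W + (alpha / rank) * B @ A as a new matrix."""
    scale = alpha / rank
    delta = matmul(b, a)
    return [
        [w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
        for w_row, d_row in zip(w, delta)
    ]


w = [[1, 0], [0, 1]]   # base weight, 2x2
a = [[1, 2]]           # LoRA A, rank 1 x in 2
b = [[1], [3]]         # LoRA B, out 2 x rank 1
merged = merge_lora(w, a, b, alpha=2, rank=1)
print(merged)
```

The catch is that a merged adapter is no longer hot-swappable: switching voices means keeping a separate full set of merged weights per adapter, or undoing the merge.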
>>
>>108856466
Fortunately not LoRA so has none of those limitations
>>
File: 1776358256664600.jpg (21 KB, 302x251)
>>108854842
>fuck off to /aicg/
Your rudeness has had less impact ever since your mugshot leaked
>>
>>108856629
this pic never gets old
Imagine being such a hideous caricature your own country tries to deny your existence
>>
Is tensor parallelism with a fraction of tensors on cpu doable?
>>
File: 1754834691311473.png (990 KB, 1996x1201)
>try to use gemma to branch old chats
>violently self destructs every time within the first word
>so consistently and identically it looks seeded
>settings have no effect whatsoever no matter how extreme
fresh or bust I guess
>>
mtp works on omlx rc1. roughly 1.5x faster than non-mtp (27b q4 tested)
>>
>>108855568
Not really, online at least.
IRL, most people around me are perfectly happy using chatgpt or gemini.
>>
>>108856858
forgot link https://github.com/jundot/omlx/releases/tag/v0.3.9.dev2
>>
Probably not the right thread for this, but I've been intending to start doing AI development for VR applications so whatever.

I've been playing around more in VR lately and am really starting to fall in love with it. Mostly been watching short films (and porn) and it's utterly amazing. I can't believe how slept on this technology is lol. IT'S SO COOL, especially with things like hand tracking which allows you to get rid of controllers entirely.

It's making me very excited to start building my AI waifu project in VR.
>>
>>108856870
>mlx
im not a room temp iq retard. enjoying your non existant PP?
t. rtx 6000 pro owner
>>
>>108856920
kys
>>
>>108856917
>Probably not the right thread for this
It's the right thread.
>>
>>108856949
thx fren
>>
Gemma Omni will have native image/video/audio generation (all modalities sharing the same embedding space as the text tokens). Unfortunately it's only 22B params so don't expect SOTA
>>
File: 1481836117756.webm (999 KB, 480x480)
>>108856917
Yeah it's pretty neat. Enjoy it while you're still in the honeymoon phase. It'll still be cool and have amazing moments after that, but you know.
>>
File: 3258.jpg (154 KB, 816x720)
Hey fellas
I’m trying to vibecode a game, but the local models I can run take forever to apply changes, and Claude is expensive.
What’s the best option for a code assistant? Ideally free, but something affordable with good quality works too
>>
>>108857026
read a book and use your brain (free)
>>
>>108857026
download a bunch of different agents with built-in providers (cursor, kilocode and maybe other cline forks, opencode, continue, etc.) and cycle between the ones with the best free plans at any given time
>>
would it be possible to train a moe(mol?) style lora? based on how loras stack and get merged in practice I think it would be possible to train a router layer and use a weighted sum of loras per token.
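Sketch of what that forward pass could look like for one token, assuming a linear router with softmax gates over the adapters: y = W x + sum_i g_i(x) * B_i (A_i x). This only illustrates the weighted-sum idea, it is not an existing implementation, and every name is hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(m, v):
    """Matrix-vector product with nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def routed_lora_forward(x, w, router, adapters):
    """Base output plus a per-token gated sum of LoRA deltas.

    adapters is a list of (A, B) pairs; router has one row per adapter.
    """
    gates = softmax(matvec(router, x))
    y = matvec(w, x)
    for g, (a, b) in zip(gates, adapters):
        delta = matvec(b, matvec(a, x))
        y = [yi + g * di for yi, di in zip(y, delta)]
    return y, gates


x = [1.0, 0.0]
w = [[1.0, 0.0], [0.0, 1.0]]
router = [[0.0, 0.0], [0.0, 0.0]]            # untrained router: equal gates
adapters = [
    ([[1.0, 0.0]], [[2.0], [0.0]]),          # adapter 0: (A, B), rank 1
    ([[1.0, 0.0]], [[0.0], [4.0]]),          # adapter 1: (A, B), rank 1
]
y, gates = routed_lora_forward(x, w, router, adapters)
print(y, gates)
```

Since the gates depend on the token, the adapters can't be folded into a single weight matrix ahead of time, so each token pays for the extra low-rank matmuls at inference.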
>>
File: drake-computer.gif (3.11 MB, 640x270)
>>108857036
>>
>>108857026
Wrong thread. >>>/g/gedg/
>>
kekus maximus
https://www.reddit.com/r/LocalLLaMA/comments/1thjsnx/why_use_quants_other_than_unsloth/
>>
>>108857026
Wow man, too much info about your own hardware and all of that stuff unnecessary for local models in the local models general, next time try to tell us less


