/g/ - Technology


File: file.png (1.17 MB, 1280x1280)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107776854 & >>107768242

►News
>(01/04) llama.cpp merged "sampling: add support for backend sampling" (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder
>(12/31) Korean A.X K1 519B-A33B released: https://hf.co/skt/A.X-K1
>(12/31) Korean VAETKI-112B-A10B released: https://hf.co/NC-AI-consortium-VAETKI/VAETKI

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: tetomiku.png (408 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>107776854

--DIY alternatives to Razer's holographic AI chatbot:
>107786892 >107786930 >107788130 >107786960 >107786970 >107787677 >107788023 >107788049 >107788059 >107788104 >107788098
--Dual-GPU motherboard compatibility and physical layout challenges:
>107786512 >107786637 >107786669 >107786689 >107786726 >107786792 >107786824 >107786844 >107786953 >107786995 >107787046 >107787065 >107787098 >107787135 >107787184 >107786732
--BOS token duplication issues in Mistral model template handling:
>107784321 >107784529 >107784607 >107784728 >107784813 >107787006 >107784851 >107785028 >107785062
--Assessing NVIDIA P40 viability for modern AI workloads:
>107782732 >107782903 >107782931 >107783018 >107783078 >107783348 >107783579
--Surprise at 1.2B model trained on 28T tokens:
>107777871 >107785261
--DeepSeek V3.2 model release with removed sparse attention lightning indexer tensors and NVIDIA AI tool updates:
>107781224 >107781265
--Roleplay-focused imatrix file selection and context size optimization:
>107778030 >107778135 >107778205 >107778310
--Framework Desktop 128GB vs gaming PC for AI work: performance and cost considerations:
>107781756 >107782025
--Grok 2 outperforms GLM 4.6 in roleplay despite slower speed:
>107781444 >107781478 >107781668
--LiquidAI/LFM2-2.6B-Transcript for chat log summarization:
>107786794
--System prompt configuration issues with GLM 4.6 Q2-M in chat completion:
>107785693 >107785744 >107785775 >107785803 >107785838 >107785901 >107786145
--Persistent chat backups and AI content detection in SillyTavern:
>107782041 >107782201 >107783114 >107783122 >107783531
--croco.cpp fork enabling ubergarm quant support in KoboldAI:
>107777069 >107777118
--Miku (free space):
>107778440 >107782307 >107782467 >107784321 >107784725 >107784728 >107787119 >107787153 >107787969 >107788627

►Recent Highlight Posts from the Previous Thread: >>107776863

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Is there anything better than openwebui for just normal agent chats? not for coom or rp.
>>
File: 1757736340771382.png (1.3 MB, 750x750)
>>107790430
Are they measuring her pants straps?
>>
>>107790430
three inches really isn't that thick
>>
>>107790430
What's Miku doing to Teto?
>>
I hear a faint whine in my headphones during prompt processing (and not during generation)
this isn't normal, is it?
any idea why it happens?
I run audio through USB to a DAC, then to headphones
>>
>>107790797
Are you sure it's just in your headphones and not coil whine from the PC itself? Otherwise plug directly into audio jack or mobo USB to see if your DAC is shit.
>>
>>107790597
no
>>
>>107790797
Electromagnetic interference, the extra power draw is increasing the field and interfering with some part of the motherboard to usb to headphone pipeline. Just try a different usb port or plugging the headphones directly into the motherboard and see if it goes away
>>
File: usb-pinout.jpg (22 KB, 500x328)
>>107790797
Electrical noise from ground loop.
It is normal if you are not using a USB isolator, or if your dac/amp isn't ground lifted.
If it bothers you, look into a Topping HS01. If you want to try a DIY fix, with some DACs you can tape over or otherwise disconnect the ground pins in the USB cable coming from the PC. Depends on whether that DAC's USB circuitry is getting power from the cable or not.
If your headphone amp has a ground plug connected to the chassis you can lift it at your own risk. If the amp is using a DC adapter without any ground going into the chassis, it shouldn't have any issue.
>>
>>107790797
Had things like that for several years. If you listen carefully and your room is quiet, do you also get it quietly when you just move the mouse cursor around? If so and you're not an audiophile, those ground loop cables on amazon reduce it.

Ultimately I ended up getting a USB Dac/amp for a while and that solved it.
>>
File: 1751301639939973.jpg (313 KB, 2000x2000)
>>107790606
Yes
>>
>>107790797
used to hear little pops and crackles coming out of my speakers like 2-3 seconds before getting a text message if my phone was next to my amp.
>>
>>107790597
jan.ai
>>
>>107790597
opencode (after removing the telemetry and prompt injections)
>>
>>107790797
it's the FBI, it always is.
>>
It is still glm sex isn't it?
>>
>>107790618
in girth it is. teto has a fat chode
>>
File: 1749687644750279.webm (566 KB, 670x720)
>>107791070
>in girth it is
Anon, average penis thickness (yes thickness, not length) is 4.5-5 inches
I'm sorry...
>>
>>107791094
>Anon, average penis thickness (yes thickness, not length) is 4.5-5 inches
This

t. measured
>>
>>107790618
>>107791094
>>107791178
>discussing "thickness" of a cylinder without specifying if you're talking about circumference or diameter
>>
>>107791243
Nigger you wrap a cloth tape measure around your dick and see how big it is, end to end, somewhere along the shaft, and not the tip. Nobody cares what you call it.
>>
>>107791269
Do you also use circumference to boost your length?
>>
>>107791094
thickness appreciation NOW

thickness appreciation FOREVER
>>
>>107791243
YEAH 5 INCH DIAMETER ANON SURE
Fucking imbecile
>>
File: 1000034701.jpg (781 KB, 3600x2700)
>>107791070
>teto has a fat chode
>>107791243
ergo OP's 3 inches should mean diameter
>>107791269
Uncertain, but Miku appears to be using a ruler, not a tape. Callipers would be the ideal tool
>Nobody cares what you call it
That's how the Mars Climate Orbiter fails

Her ice cream cones are 3 inches thick = wide = diameter
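(unit check, since anons keep mixing these up: circumference = π × diameter, so 3 in across is about 9.4 in around, while the quoted 4.5-5 in "average" is measured around, i.e. roughly 1.4-1.6 in across)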
>>
/lmg/ - Large Measurements General
>>
>>107790797
aww you can hear your waifu thinking!! so cute
Try a USB extension cable, or plug the DAC into a monitor hub if it's dangling near the GPU - put physical distance between it and the "emissions" / high-frequency power-switching circuitry
>>
why is this thread so obsessed with yuri bait?
>>
File: 1740200174276394.png (566 KB, 1194x1092)
>>107790797
It's trying to speak to you, this is a known phenomenon

Do not ignore its call
>>
>>107791706
we are in the year of our lord 2026
if you find any yuri bait, you can turn it into yuri reality
anything else is a skill issue
>>
Hey, I'm a newfag who hasn't lurked but is sick of hitting limits on claude code while doing girthy refactors. I don't need the smartest model, just something that can move a lot of code if I tell it exactly what to do. Is there something for me here? Can I actually run something usable on my RTX 5070?
>>
File: k153703.jpg (672 KB, 1920x1080)
>>107790959
>merh-merh-merrr.. merh-merh-merrr.. merrrrrr
>>
>>107791812
>girthy refactors
lewd
>>
>>107791728
proof?
>>
Will we ever surpass gemma3 ablit?
>>
>>107791812
You could try Qwen3-Coder-30B at Q4 with some offloading.
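Something like this with llama-server should do it (a rough sketch; the GGUF filename is hypothetical and -ngl needs tuning to whatever fits in your 12GB of VRAM):
[code]
# OpenAI-compatible server with partial GPU offload
llama-server -m Qwen3-Coder-30B-A3B-Instruct-Q4_K_M.gguf \
    -c 32768 \
    -ngl 24   # layers kept on the GPU; the rest run from system RAM
[/code]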
>>
>>107790430
sorry teto that means miku is mine as you know the rule only the big dick fucks im nice though so you can sit in the corner and watch just keep your ugly voice down please :D
>>
File: blW0YMr.png (827 KB, 891x792)
Just started playing around in SillyTavern a couple weeks ago with Mistral-Small. Thought it sucked until I realized the random lore books I installed were somehow injecting 4000 tokens into every play session.

also migu is powerful
>>
>>107791884
Looks promising, and this LMStudio thing makes it pretty easy to dump right into claude code, wonder how pozzed it is. Thanks anon.
>>
>>107791865
which one?
>>
>>107792084
Be cautious when interacting with the Miku.
>>
>>107790618
>>107791178
>>107791243
>>107791269
>>107791305
I prompted for "ruler" not "tape measure" so Miku is talking about diameter.
>>
File: Untitled.png (5 KB, 494x412)
>>107792538
THANK you for settling this important matter
>>
>>107792538
catbox full image pls
>>
why is installing tts such a pain in the ass
these mfers must've been vibecoding
>>
>>107792648
Every fucking time. On the other hand, we get new tts weekly. Fuckers don't have time to code trying to deliver a new one asap
>>
>>107792538
Her schlong is as thick as a soda can. I don't think it'll fit.
>>
local measuring genitals
>>
Is Teto bald there too?
>>
>>107790797
Also happens to me. I can hear it during most GPU-intensive things to varying degrees, but when playing a game or something there's usually other sound so it's hard to notice. I'm usually using wireless headphones. What I noticed is that if I disable sound from line-in on my integrated sound card, this phenomenon is completely gone, so it must be interference from the GPU's additional power draw on the mobo that gets picked up as sound by the integrated card's line-in.
Electricity is wacky and stuff.
>>
Hi everyone
Right now I am working on a RAG system on a dual Xeon 2680, 256GB RAM and 2x MI50 16GB. Should I move to 2x MI50 32GB? It would allow me to move from Qwen 30B to Llama 70B… or are there better models to run on this setup?
>>
>>107791865
Good morning sir
no abliterate model is unsafe sir. . Google orinial model sir better
- Rakesh
>>
>>107791569
My dick is 5 inch RADIUS. Ask your mom, she knows.
>>
https://arxiv.org/pdf/2501.12948
deepseek updated their r1 paper today
biggest addition aside from more training details is....
safety
fuuuuuuuuck
>>
I want to make a robowife, first just for chat, and in time I'd add vision, live2d and more agency. Is llama.cpp the place to start, or should I use the quiet kobold mode? I like the gui launcher. Am I gonna run into any issues?
>>
>>107793555
They are doing the right thing though. Instead of trying to bake safety into the model they feed the conversation into a separate prompt for analysis.
>>
>>107793636
yeah, i was a little too quick on skimming, after reading through it seems like it's relatively light
>>
AMD teasing ROCm 7.2: just hype, or will windows + AMD retarded poor cucks like me have any hope?
>>
>>107792084
why call her migu? fuck you
>>
>>107793648
>2025+1
>Still holding ANY hopes for amdead
Heh, get a load of this dude
>>
>>107793660
AMD is controlled opposition and Lisa Su makes more money when she lets nvidia win.
>>
>>107790430
>tranny OP has tranny fetish
>>
When are the good models releasing?
>>
File: hairy_pussy.webm (1.12 MB, 438x780)
>>107793903
She's measuring the size of her bush.
>>
>>107794072
did you miss them?
>>
>>107794072
Nemo released in 2024
>>
Should I be defaulting to the llama.cpp release with cudart, or is that only ever useful for specific setups like multigpu or whatever?
>>
>>107794118
cudart is just the windows cuda .dll files that you drop in the llama.cpp release for your platform. If you have an nvidia GPU then you'll always want to use cuda, so you'll always need them. Nothing to do with multi-gpu. If you don't have nvidia then you don't need them.
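Concretely it's just two zips from the same release page merged into one folder (a sketch; the build/CUDA numbers here are made up, use whatever the release actually lists):
[code]
# windows ships bsdtar, which also extracts zips
mkdir llama.cpp
tar -xf llama-b1234-bin-win-cuda-cu12.4-x64.zip -C llama.cpp
tar -xf cudart-llama-bin-win-cu12.4-x64.zip -C llama.cpp   # adds the cudart64_*.dll files
llama.cpp/llama-server.exe -m model.gguf -ngl 99
[/code]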
>>
>>107794153
Oh, so it just bundles the cuda runtime binaries with the release then. For some reason I thought it was something more specific than just the cuda runtime.
Alright, thanks.
>>
>>107794178
ur gay homo
>>
>>107794310
Is that like a double negative where being gay twice makes you straight?
If so, thanks, I guess.
>>
>>107794340
You're absolutely right!
>>
File: later homo.png (115 KB, 495x841)
>>
>>107794073
that's a nice pussy
>>
i think i've come full circle
> tried kobold, scoffed at how it looked like shit and instantly deleted it
> tried ooba but it's basically a bad llama.cpp wrapper
> used llama.cpp but got sick of writing scripts just to run things and had problems with context saving
> LM studio is noob friendly but another llama.cpp wrapper and is slow
> came back to kobold and fell in love with contextshift
why did i do this bros
>>
>>107794849
skill issue most probably
>>
>>107794849
llama.cpp-Sirs... Contextshift has been turned off by default...unless you are using an old version.
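For anyone on a recent build who wants the old behavior back, it should just be one flag (a sketch, assuming a current llama-server where context shift is opt-in; check llama-server --help on your build):
[code]
# opt back in to context shifting instead of hard-stopping when the
# conversation outgrows the context window
llama-server -m model.gguf -c 8192 --context-shift
[/code]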
>>
>>107794073
>>107794760
Made for licking.
>>
>>107795053
>>107794760
>>107794310
samefag
NOT a coincidence.
>>
>>107795057
but using parameters is hard... im a retard, I only know how to click boxes...
>>
File: her.jpg (323 KB, 1140x760)
What's a good UI and model to run with 16gb VRAM + 32gb DDR5?

I just want a sexy professor/sexy assistant I can chat random topics with and have silly RP moments.

So far I've been recommended Jan.ai and KoboldCPP with Gemma 12B or Qwen 14B but wanted to hear your take. Prefer something Open-source, uncensored and privacy focused.

What would you use in this scenario?
>>
another korean model
https://huggingface.co/nc-ai-consortium/VAETKI-VL-7B-A1B
>>
>>107795172
You still have so much combined RAM that there is no reason not to use Gemma 3 27B or Mistral 3.2 24B. Gemma 3 is generally nicer than the rest in terms of writing.
>>
>>107795214
>nicer than the rest in terms..
In this ramlet category I mean.
>>
>>107795172
llama.cpp and their web frontend with whatever system prompt you need.
>>
>>107795214
There is a reason and it's not offloading to CPU
>>
>>107795322
Whatever floats your boat. Not my problem.
>>
>>107795202
>7B
Do we really need more of those?
>>
>>107795396
A1B tho!
>>
anyone tried the new Jan-v2 30B ?
>>
Have you guys recently tried out very small LLMs like 1B ones? They are legitimately better than the old school 70B models used to be. I think it's kind of ridiculous that running smaller LLMs on smartphones never became a big thing considering for most of us running those shitty 70B models was more than enough just a year or two ago.
>>
>>107795635
good one mate
>>
>>107795172
i mean it will be fun to begin with if you've never done it, but 14B won't really be all that intelligent.
mistral small 3.2 24B Q8 would be your best bet; use layer offloading and the tokens/sec should be reasonable. you can use autofit in llama.cpp or kobold, and ensure flash attention is on in kobold.
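Roughly like this (a sketch; flags from memory, double-check against koboldcpp's --help, and the GGUF filename is hypothetical):
[code]
# --gpulayers -1 tries to auto-fit layers to free VRAM,
# --flashattention cuts KV-cache memory overhead
python koboldcpp.py --model Mistral-Small-3.2-24B-Instruct-Q8_0.gguf \
    --contextsize 16384 --gpulayers -1 --flashattention
[/code]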
>>
>"I'm leaving," she says, her voice cold and distant. "Don't bother trying to follow me." She turns and runs out of the room, leaving you standing there alone amidst the spilled pink goo.
Pink goo was something she was eating from a bowl before I entered the room.
>>
>>107790987
>jan.ai
This is actually quite nice. I like that it's a real app and not a web server. A bit minimalistic in terms of features, but the browser MCP is really cool and easy to setup.
>>
>>107795472
nvm it's just a qwen3-vl finetune
>>
File: 1755508056694021.png (2.39 MB, 1056x1408)
>>107790987
>>107795956
you dropped this
>>
>>107790430
sauce please
>>
File: 964950143.gif (1.13 MB, 320x240)
>>107795172
>>107795956
>is really cool and easy to setup
ok now i know it's a shill
>>
>>107795999
makes me lose my shit with the claim 'its not a webserver' when it is 100% webshit wrapped inside a js runtime anyway, dishonest way to try to garner some sort of rep lmao
>>
>>107795981
>>107795999
>Anon asks if there are any other good frontends besides openwebui
>Anon suggests Jan.ai
>Anon tries Jan.ai
>Anon says "Hey this is actually not bad"
>Anon reports back to say it's actually alright.
ok.
>>
File: 1750830401212016.png (2.19 MB, 1056x1408)
>>107796020
here
>>
>>107796020
it's all the same "anon", I bet
>>
>>107796022
So what frontend do you use?
>>
>it's a real app
lel
>>
>>107796040
antigravity or cline for work, embedded llama.cpp for assistant work (i dont need MCP for random assistant stuff), sillytavern for cooming (local chad models)
>>
>>107795981
what is this implying?
>>
>>107796065
shill
>>
>>107796081
lmao the cope, sorry you got found out dear marketer, better luck next time :)
>>
I admit it, georgi pays me 5 bulgariabucks a month to shill llama.cpp on /lmg/
>>
>>107795413
What about 80M-A25M?
>>
>>107796123
>bulgariabucks
Those are euros now
>>
>>107796123
Only shill here is that other guy.
>>
>>107796123
WTH he told me 4/month is the best he can do...
>>
File: file.png (361 KB, 435x750)
>this is the thread's beloved mascot
>>
>>107796290
Things have been rough since she started doing heroin
>>
>>107796290
Local Miku General
>>
>>107796013
>>107795999
I'm the guy asking for a low vram frontend/model

Is Jan.ai bad? Why?
>>
>>107796390
>Is Jan.ai bad? Why?
It's not bad, the guy is just a schizo who thinks any positive comment must be astroturfing.
>>
>>107796290
kek, watching miku slowly die inside >>107796147
>>
>>107796290
miqu-1b-q1 looking ass
>>
I kind of discovered something about 4.7. Or maybe it is just bartowski's quant. I tried using it like 4.6 (t. ego death schizo) to pick at my brain and it is... an experience. 4.7 is absolutely retarded at this thing, but entertaining as hell. So dumb it is cute. But for serious fucking with your brain it is 0/10. And I think the reason it is so retarded is the preference post training. It has to be funny, interesting and evocative so it is basically useless as a serious mirror.
>>
>ego death
>can't go a thread without mentioning himself
>>
>>107796725
Kek
>>
>>107796123
Fuck, he's paying you?
>>
File: 1740005664321499.png (47 KB, 1019x646)
New STT transcription model from Nvidia, Nemotron Speech ASR

https://huggingface.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
https://huggingface.co/nvidia/nemotron-speech-streaming-en-0.6b

Claims better latency than other models, as well as better concurrency support, a big win for those of us serving 500 users at once on our H100s
>>
>>107796839
there's still nothing better than whisper v3, it's just really fucking sad ain't it
>>
>>107796725
how curious!
>>
>>107796876
society
>>
>>107796839
Nemo!
>>
>>107796839
>high-quality English transcription
Very cool. nvidia.
>>
File: 1767788222686148.png (1.44 MB, 1404x833)
>>107797036
yes
>>
>>107794073
Grok put that cat in a bikini
>>
File: file.png (1.18 MB, 760x1360)
>>107797102
>>
is AMD going to be competitive with their 2027 CPU lineup potentially shipping DDR6 RAM, or are my hopes all just plain cope?
>>
>>107797036
FunctionGemma is better.
>>
>>107797247
>Surely AMD will make good GPUs during a global shortage after decades of dogshit
Anon...
>>
>>107797261
no no, i meant their CPUs that ship as APUs with 128GB of RAM, those can run bigger models right? if the bottleneck is memory bandwidth, with DDR6 the performance could be doubled...
>>
>>107797285
if they ain't making ddr5 and are rumored to bring ddr4 cpus back what makes you think they're even thinking about ddr6?
>>
>>107797295
well if they want to survive the market and not end up like intel they better implement ddr6
>>
>>107797309
you will either get a ddr4 desktop or a unified memory laptop; these are the 2026 options
>>
>>107790894
>Knees exposed
Cover them up slut
>>
>>107795172
Gemma-3 27b derestricted easily beats every model 24b and below.
>>
>>107797706
isn't it really bad at instruction following?
>>
>>107797725
who told this lie?
>>
>>107797735
It's my experience.
>>
>>107797736
Then why are you asking?
>>
I just had a chat with an old card.txt but, instead of using gemma 12b or mistral 24b, I used Glitter Gemma 27b.
It was more entertaining but the structure was the same nonetheless.
I made Ani from a tweet by someone on twitter. It had a bunch of lines, I deleted them.
https://litter.catbox.moe/ht86fgf2n4h9lvzn.txt
This is pure Gemma 3 27B. It's somewhat funny.
>>
>>107797706
I tried qat of the 27b and found it hideous for writing. It constantly tries to end the story right after the introduction and continues a well written story with the sloppiest slop. I'm fucking angry. Rocinante (or probably just nemo) is MUCH better.
>>
>>107797820
Ani, the grok jewfriend:
I edited it a little, concatenated. Decided to keep the dashes, I don't think it will make any difference.
https://litter.catbox.moe/jo27g7r6hbeh3uem.txt
>>
>>107797827 (me)
It's like it never saw any good prose and the characters feel like they are written by a sleep deprived cashier woman
>>
>>107797820
Gemma 3 is the best.
>>
>>107797820
sloppa
>>
File: 1751214344243867.png (61 KB, 227x228)
>I shift uncomfortably, suddenly very aware of my nakedness under my clothes.
Thanks Mistral
>>
>>107797952
SOVL
>>
>>107797915
This is scientific slop. Nigger.
>>
>>107797952
I do this
>>
>>107797952
Does she know she has a skeleton inside her body?
>>
>Mistral small 3.2 finetune
>ChatML prompt format
Why do finetuners do this?
>>
>>107798205
Most don't know what the fuck they're doing. Same with the ones that recommend a temp 1.5x+ higher than the original model with a cocktail of cope samplers to try to wrangle it back into coherency.
>>
>>107798205
My training data is in ChatML so that's what it's going to be!
>>
>>107798205
You get some of the benefits of the original assistant finetune while not making yours too much assistant-slopped.
>>
>>107797725
Where do you get derestricted Gemma3? Original Gemma3 is as censored as it gets.
>>
>>107798205
because fuck mistral format
>>
Has anyone seen a setup where one model acts as the writer and the other as an editor?
For instance, Nemo has nice prose but isn't very smart. GLM 4.7 is a slop machine, but is smarter. Does anyone know if it is feasible to make GLM review Nemo's responses and generate correction prompts for it or should I test it myself?
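If both are sitting behind llama-server instances it's maybe a dozen lines of glue (sketch only; the ports and prompts are made up, and it assumes both expose the OpenAI-compatible /v1/chat/completions endpoint):
[code]
#!/bin/sh
# writer pass: Nemo (:8080) drafts the reply
DRAFT=$(curl -s http://localhost:8080/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"messages":[{"role":"user","content":"Continue the scene."}]}' \
  | jq -r '.choices[0].message.content')

# editor pass: GLM (:8081) only lists consistency problems, never rewrites,
# so its own slop stays out of the final prose
jq -n --arg d "$DRAFT" \
  '{messages:[{role:"user",content:("List continuity/consistency errors in this passage. Do not rewrite it:\n\n"+$d)}]}' \
  | curl -s http://localhost:8081/v1/chat/completions \
      -H 'Content-Type: application/json' --data-binary @- \
  | jq -r '.choices[0].message.content'
# then feed those notes back to Nemo as a correction prompt for draft two
[/code]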
>>
>>107793648
>windows + AMD
why would you do this to yourself?
>>
>>107798301
Test it and see, but I imagine it would just result in GLM inserting its slop. I guess you could tell it to just search for consistency issues or something, rather than telling it to make the output 'smarter'. Might see some improvement at higher context, where Nemo falls apart pretty quick.
>>
>>107798325
Yeah, that's the idea. Telling Nemo not what to write, but to make sure the response is consistent. Things like making sure different characters' actions are not misattributed and so on.
>>
File: 1758343698842077.jpg (487 KB, 1536x2048)
>>107798360
>Telling Nemo [not X but Y]
>>
File: 1767225375325242.jpg (554 KB, 1457x1239)
>>107798205
>>107798241
>Most don't know what the fuck they're doing.
This.
I hate to break it to you, but almost every good fine-tune has been a complete accident. The tuner then makes a higher version like 1.1 or 2, and it immediately shits the bed. No one knows what they're doing. They're just throwing logs from ai chat services and data off google at models in hopes it'll make something good. Not drummer. Not sao10k. Not anthracite. None of them do. They rent GPUs, blend logs and instructions into the model, and don't even fuck their own bots to see if the output is coherent. They probably don't even figure out what works vs what doesn't. It's just that the masses say it works or not, and that's apparently good enough for them - god forbid they actually discover a pattern or two about what works. They all hit a wall at MoEs because then they're actually required to know something, and they can't even take the first step. MoEs are filtering them all out, so eventually new tuners will rise that are better, because MoEs are better than dense models when you quant their experts high enough - regardless of tuning.
>>
>>107798388
Forgive me, Anon — the temptation was too great.
>>
File: w00tDario.png (155 KB, 522x670)
>>
>>107798391
I can't believe Miku's butt would say that
>>
>>107798448
Miku's butt just says whatever she thinks you want to hear
>>
File: 1747412618450457.jpg (409 KB, 2744x1536)
>>107798296
What's wrong with it? What makes ChatML better?
>>
>>107797952
I fucking love Bernkastel
I lost my LLM virginity to her
>>
File: 1752649970261105.jpg (6 KB, 200x200)
>>107798429
if this fat fuck lost weight and got lean again Anthropic would be worth 10x more
>>
>>107798391
you want me to change my dataset and finetuning methodology? nuh uh, fuck you.
this is why "stock" models are better in most cases, no finetuner can actually do sft+dpo properly.
>>
>>107798529
bussy doesn't attract investors
>>
File: 1751749507610689.png (241 KB, 1994x1154)
>>107798537
I wish that was true
>>
File: 1747377593541665.png (326 KB, 554x554)
>>107798537
The looks of a CEO most definitely affect the perception of the company, just look at this dude now
>>
File: cai.png (10 KB, 512x512)
>>107798537
>doesn't allow bussy
>censor bussy
>a fucking thousand alternatives spring up taking billions in bussy money
Bussy talks.
>>
>>107798557
But isn't anthropic worth more now than when we looked good?
>>
>>107798587
>we
hello there
>>
Has anyone tried this model?
https://huggingface.co/alpindale/dbrx-instruct
>>
kys alpin
>>
>>107798598
>132b instruct tuned MoE
No, but it looks interesting.
>>
>>107798598
>2 years ago
If it's not popular by now, then it wasn't good.
>>
>Try lots of local models on LMarena
>All sloppy in the exact same way
Is it like cloud models, where when you get your preset and a card in there to context poison it, it starts writing more interesting/less generic prose, or do they just stay in "not x but y" -ism mode?
>>
>>107798587
back then the CEO didn't go on CNN to warn about mass unemployment
>>
>>107798529
>>107798557
Sama is also pretty ugly, I don't think that's it.
>>
>https://huggingface.co/datasets/PJMixers-Dev/c3-kto-test/viewer/default/train?row=0
>"value": "You'll portray {{char}} and engage in Roleplay with {{user}}.
drummer do you unironically train on this?
>>
File: Screenshot_100.png (23 KB, 392x307)
>>107791865
Which gemma 3 ablit is the right one?
>>
>>107798641
Let's see your dataset
>>
File: 1736515987444214.png (1004 KB, 834x2048)
>>107798634
Sama is a conniving snake trying to act like a shy nerdy femboy, looks help establish a more positive perception of the company
>>
>>107798658
the one you lobotomize yourself
>>
>>107798620
But 90% of people in this general have no attention span and think a model goes stale and moldy after a few weeks
>>
>>107798664
You could have just said he's jewish.
>>
>>107798598
I tried it when it came out and it was pretty bad, it was around the same time as wizardlm 8x22 and command-r which got all the attention because they were way better
>>
>>107798659
Lick my nuts drummer, you get more than enough money to curate a proper dataset
>>
>>107798683
I accept your concession.
>>
>>107798680
Thanks. I was just desperate for a model that isn't glm air. Guess I've gotta keep looking/waiting.
>>
>>107798659
no need to get defensive, but you don't see any problem with it?
>>
File: 1744641151833306.jpg (120 KB, 1024x707)
>>107798675
Dario acts differently and he's also jewish
>>
>>107798641
No wonder behemoth X v2 is a fetishist for consent.
>>
File: 20240604.jpg (599 KB, 2560x1196)
When I get a DDR6 + CXL motherboard + Blackwell + NPU I'm going to finetune at home and beat these current-day fine-tuners into a brick wall.
>>
>>107798728
16GB of DDR6 will be like $3000
>>
File: luigigi.png (424 KB, 608x602)
>>107798757
It costs roughly 1.00 USD in materials to make 16GB RAM.
>>
>>107798789
Yes but that 1 USD of RAM could be put in an AI GPU that businesses will be more than happy to pay $3000 for. You can outbid them, right?
>>
>>107798789
I'm not saying current prices are anywhere close to reasonable, but citing the raw materials price for a product as complex and advanced as RAM is fucking bullshit even if you ignore profit margins
you know it
I know it
Luigi was based tho
fuck the insurance industry for real
>>
>>107798789
Raising a child to 18 in the US costs around $300,000-$390,000
More people should die, imagine how many resources could be saved
>>
>>107798701
>pic
"20 bottle caps for the negro to mine it, why do you ask?"
>>
>>107798829
I agree with everything said here.
>>
>AnythingLLM
>Jan.ai
>something else?

I feel like I want to switch up my frontend and these stood out. What's the best option? I'm more of a casual user.
>>
>>107796867
You mean whisper v2
>>
>>107799056
This. I don't know how anyone manages to use v3 with all the hallucinations during any second of silence.
>>
>>107798688
Just use command-r or llama3 eva0.0
>>
>>107798598
>alpindale
Buy an ad.
>>
>>107798789
At what scale?
>>
>>107799759
In your garage with walmart handtools and $1 worth of amazon parts.
>>
>>107798701
Jewish in the evil sense, not the silly curly hair guy sense.
>>
why did they call it Router-weighted Expert Activation Pruning? why not Router-weighted Activation Pruning of Experts?
>>
I need to devise the successor to cockbench
>>
>>107800112
cuntbench?
>>
>>107800087
Chicken sandwich. Sandwich of chicken.
>>
>>107800137
but the acronym would have been funnier.
>>
>>107800144
It's funny if it doesn't sound contrived.
>>
>>107800087
They decided that they wanted to use REAP as their acronym and worked backwards from there
>>
>>107800177
>>107800287
RAPE would have been more fitting because it basically rapes the models.
>>
>>107800087
The same reason why the Neural Image Generation via Generative Adversarial Rendering paper was never released
>>
Today I just tried ChatGPT again to see how it compares to local and it's still terrible on the free tier. So many people are experiencing absolute garbage without knowing it kek. Literally Mistral Small or something did better in my test. Whatever they're serving feels more like an 8B, or maybe 20B MoE.
>>
>>107791898
they are talking about thickness
a 3 inch diameter is pretty big
>>
File: 250px-SiliconCroda.jpg (20 KB, 250x174)
>>107798789
turn this into a microchip for me
>>
>>107800583
I canceled my plus sub to just go local+openrouter.

So far I've only spent 6 cents in the last 2 days. Honestly it's really nice to have access to all the big flagship models if you need really solid answers, but 70% of the stuff I ask day to day, any old sub-32B model can answer just fine.
>>
>>107800624
Aren't they made out of sand?
>>
>>107800649
Yeah I stopped my sub a long time ago. Even a year ago there were more than enough alternatives.
>>
https://x.com/ltx_model/status/2008595989096177962



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.