/g/ - Technology

File: 1749494100348035.png (1.49 MB, 800x1333)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108404935 & >>108400151

►News
>(03/17) Rakuten AI 3.0 released: https://global.rakuten.com/corp/news/press/2026/0317_01.html
>(03/16) Mistral Small 4 released: https://mistral.ai/news/mistral-small-4
>(03/11) Nemotron 3 Super released: https://hf.co/nvidia/NVIDIA-Nemotron-3-Super-120B-A12B-BF16

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
Token Speed Visualizer: https://shir-man.com/tokens-per-second

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: comfyui_00231_.png (904 KB, 1216x832)
►Recent Highlights from the Previous Thread: >>108404935

--ASUS Ascent GX10 purchase debated for inference workloads:
>108407066 >108407095 >108407101 >108407129 >108407096 >108407138 >108407137 >108407143 >108407169 >108407222 >108407272 >108407309 >108407327
--Project Ani design debate: camera interaction vs 3D autonomy:
>108408547 >108408584 >108408592 >108408627 >108408640 >108408710 >108409169 >108408585 >108408601
--Evaluating budget 4x V100 32GB setup for local LLM use:
>108405178 >108405196 >108405211 >108405228 >108405250
--Model comparisons for RP, vision, and NSFW:
>108407751 >108407779 >108407788 >108407819 >108407839 >108408031 >108407888 >108407928 >108407994 >108408012 >108408115 >108407902 >108408355 >108407787 >108407797 >108407832 >108407833
--Criticism of over-tuned safety in modern AI models:
>108404958 >108404965 >108404991 >108405298 >108405352 >108405402 >108407336 >108407538 >108407597 >108407700 >108407721
--Kimi K2 vs K2.5 performance and prompting techniques:
>108408025 >108408073 >108408144 >108408209 >108408258 >108408313 >108408418 >108408252 >108408377 >108408397 >108408439
--Qwen 27B preferred over 35B despite speed tradeoffs:
>108407396 >108407427 >108407591 >108407619 >108407627 >108407661 >108407771 >108407803 >108407828 >108408416 >108407617 >108407648 >108407678 >108408228 >108407650 >108407970 >108408630 >108408781 >108408894 >108408921 >108408983 >108409004 >108408935
--Qwen3.5 27b Heretic v3 recommended for 24GB GPUs:
>108408663 >108408753 >108408774 >108408828 >108408851 >108408937
--Hugging Face Agentic Evaluations Workshop livestream:
>108408324
--Qwen 3.5 9B generates functional C code:
>108407763
--Logs:
>108409077 >108409247 >108409442
--Rin, Miku, and Teto (free space):
>108405043 >108405091 >108406177 >108406814 >108407696 >108407782 >108407933 >108407989 >108408792 >108409216

►Recent Highlight Posts from the Previous Thread: >>108404937

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: 1681793209347227.png (774 KB, 850x1200)
Pls share any armpit related research
>>
>>108410173
Tiktok marketers found that videos starting with a visible armpit get 50% more views.
>>
falseflagkun is back ^_^
>>
File: 1753646192283397.jpg (1.54 MB, 1280x1836)
>>108410173
Men can't resist
>>
Just fucking kill yourselves you worthless spamming mikutroons
>>
jannies are in on it
i hope they die, not in minecraft, in real life
>>
>>108410225
I don't think you understand the situation.
>>
Baker really got mindbroken, huh?
>>
>>108410241
That isn't even miku dumbass.
>>
>>108410240
I think it is about time you got a dedicated thread for your waifu spam. I just want local model news and not your disgusting autism.
>>
>>108410253
I'm not even the miku guy.
>>
>>108410246
>>108410253
Have a Miku!
>>
>>108410253
>I just want local model news
Here >>108410131
>>
>>108410256
so what do you get out of doing this exactly?
>>
>>108410261
I get to post cute Mikus! Become Miku today!
>>
>>108410268
i get that you are a faceblind autist, but that is not miku.
>>
>>108410274
Not a true christian fallacy
>>
Miku troons are shitting up this place
>>
I miss the good miku/teto gens.
>>
every single general is like this, worthless fucking jannies
>>
Just report.
>>
>>108410321
For what?
>>
>>108410305
Same, wonder what happened to that guy
>>
>>108410321
I hope you can provide a justification for why a teal-haired, twin-tailed anime girl is now suddenly offtopic in this thread. /lmg/ has been full of this shit since forever and it never had anything to do with this thread.
>>
>>108410115
https://litter.catbox.moe/x7czk5s7o0jcdhyb.jpg
>>
I am happy that we got more people posting miku.
>>
File: pretending.png (38 KB, 598x276)
>>108410338
>>
Nobody click on that catbox link.
>>
>>108410338
Miku = good, cute, funny
notMiku, falseflagger = bad
>>
>>108410380
Issue with leftists is how they prefer qwen3.5 to grok.
It's the same reason they didn't vote for the invasion of Iran.
>>
If anyone's wondering, it's March break for grade-schoolers (children). That's why this week's been so bad.
>>
>>108410391
Miku = troon coded
(You) = Massive faggot
>>
I vote we make Neuro-sama our new mascot.
>>
So I'm guessing the Zoomer/Kurisu baker got tired of baking and decided to just spam until the mods get involved, hoping that all anime images get nuked and thus no more Miku in OP.
Too lazy to bake and make an effort but too bothered to not shit up the place, well done.
>>
>>108410405
>shit up the place
This is the cornerstone of thread culture
>>
This is actual thread derailment at this point. 43 replies and not one post with actual local model information. Holy fuck. Find something better to do, seriously.
>>
>>108410404
It should have always been le cunny.
>>
>>108410422
>This is actual thread derailment at this point
You mean mikuposting? Yeah. Always has been.
>>
ProjectAni guy here. I'm keeping the cum jar. I spent 4 hours ripping Sketchfab apartment models just to come to the conclusion that it looks like ass, isn't that interesting, and is way beyond the technical scope of the project.

I don't really want the project to turn into a dating simulator game. I can't be bothered to implement pathfinding shit, first-person navigation, and all of that other crap.

For the vision stuff I'm just going to have it work via a webcam. Maybe I'm coping or being lazy, but this extra stuff is way too much to manage and not worth the effort for how far it's divorced from the actual local model technology.

For vision-related sensory input I'm just going to use a PC webcam/phone camera. Thank you for your attention to this matter - PRESIDENT DONALD J. TRUMP.
>>
damn I should've proofread. whatever.
>>
>>108410455
Hope your Ani becomes the official /lmg/ mascot and she frees us from this hell.
>>
>>108410405
I think it's a different guy. He's been around since nov or so. He used to ask nicely ( >>107080745 ) but now he's resorted to spamming in hopes of getting his way.
>>
>>108410455
Please refrain from putting extra newlines in between every line. Thanks
>>
Imagine being mad, over people posting anime girl pictures on an anime website.
>>
File: 1765728187203117.jpg (881 KB, 2480x3508)
>>
File: 1771678913774484.gif (84 KB, 220x220)
Is this the local Miku general?
>>
File: 1766207303523153.jpg (273 KB, 1200x1500)
>>108410589
Her thread
Her board
Her world
>>
>>108410589
>hiding the feet next to the already delicious thighs
prison
>>
>>108410606
Shill.
>>
Best thread honestly. We should just forget all the boring nerd stuff and post more anime girls and about becoming anime girls.
>>
File: 1771454554183545.png (186 KB, 982x710)
>>
File: 1757902230357275.gif (2.64 MB, 320x240)
>>108410625
>sparkling with mischief
>mixture of
>>
this is a mockery of /lmg/ culture
>>
>>108410641
Explain what's wrong with it in 10 words or less. No buzzwords like "slop".
>>
>>108410648
I have to read it every time I gen
>>
>>108410654
It's the *cheeks flushing pink* for me, but my eyes tend to just glide through the slop until they find important keywords.
>>
>>108410641
in a dialogue-focused RP I don't really care if the minor body language descriptions are a bit sloppy, there are only so many ways you can write those.
>>
>>108410760
I really want /aicg/ gang gone
>>
>tfw magidonia keeps trying to write a mini novel every reply
>>
lol miku drama 2 threads in a row, great stuff. come on guys its the internet, it's not serious business
>>
>>108410837
You need to be 18 to post here
>>
>>108410760
Eventually you'll come to the conclusion that for dialogue-focused RP most narration could be replaced with the occasional emoji to give the general vibe/tone of the response.
Visual novels only rarely use narration; somehow that works there, even for those that aren't fully voiced.
>>
File: 1772592584678715.jpg (15 KB, 327x315)
>>108410868
>visual novels
Are you retarded? Where are the visuals in a fucking RP on ST?
>>
>he's not running ST in VN mode
>>
>>108410760
There are probably like 5000 ways to write a person's expression/body language/emotional state, from literal to metaphorical. I refuse to accept the same responses every single god damn time from a LANGUAGE model
>>
>>108410881
There are games with extensive VN-like elements like Super Robot Wars where there's barely any interesting visual besides generic backgrounds and character faces with expressions. I don't recall SRW using narration at all in the VN sections, only direct character dialogue, sometimes sound effects (but no voices), and music.
>>
File: 1773499312282319.jpg (16 KB, 375x420)
So local utterly lost huh. I never thought I'd see it happen so soon.
>>
>>108410920
Change your sampling flags then. Quit whining.
>>
>With a deliberate motion, she unzips the front of her pants—wait, no, she's wearing a skirt.
I've never seen a model self correct like this in character.
>>
>>108410924
>Just crank up the temperature or use meme samplers!
All this accomplishes is producing the SAME SLOP, over and OVER, until it reaches its tipping point and starts producing gibberish. These pieces of shit are so overbaked, their token probabilities so fried, that they CANNOT generate a variety of responses. Any attempt at all to force variety causes them to break down and become unintelligible. So basically, fuck you
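(For reference: there is a kernel of truth on both sides here. Samplers genuinely change outputs, but they can only reweight the distribution the model already emits; if the model assigns near-zero mass to varied phrasings, no sampler can conjure them. A toy sketch of what temperature and top-p actually do, pure Python with made-up logits:)

```python
import math

def apply_temperature(logits, temp):
    """Scale logits by 1/temp, then softmax. temp < 1 sharpens the
    distribution, temp > 1 flattens it; either way the ranking is unchanged."""
    scaled = [l / temp for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, p):
    """Keep the smallest set of top tokens whose cumulative probability
    reaches p, then renormalize. Tokens the model never liked stay gone."""
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}

logits = [4.0, 3.0, 1.0, 0.5]          # pretend 4-token vocab
cold = apply_temperature(logits, 0.5)  # sharper: top token dominates more
hot = apply_temperature(logits, 1.5)   # flatter: tail tokens gain mass
```

So the anon above has a point: if the base distribution is fried, temperature can only trade the same top tokens against incoherent tail tokens.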
>>
>>108410922
>I never thought I'd see it happen so soon.
huh? when was ever local good or had any hope whatsoever? unironically?
>>
>>108410963
https://arxiv.org/abs/2510.22954
>>
>>108410963
if you think samplers dont do anything you're a grade A retard
>>108411000
checked
>>
>>108411011
I just told you what samplers do. Try reading, dumbass
>>
>>108410115
I might be slow, but I just understood that using uv sync makes your dependencies magically get along and does away with the need to pip reinstall vllm, transformers, and flash-attention in various permutations trying to figure out which magical install order will make them work.
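(For anyone following along: the trick is that uv resolves everything against one lockfile instead of letting each pip install clobber the last. A minimal sketch; the project name is hypothetical and the dependency list is illustrative, not a known-good pin set:)

```toml
# pyproject.toml -- `uv sync` resolves all of these together
# and installs one consistent set into the project's .venv
[project]
name = "llm-sandbox"          # hypothetical project name
version = "0.1.0"
requires-python = ">=3.11"
dependencies = [
    "vllm",
    "transformers",
]
```

Packages that build from source, like flash-attention, may still need extra configuration; the point is just that a single resolver sees all the constraints at once.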
>>
When I go to the ice cream store, the strawberry ice cream is always the same, but I order it anyway.
>>
>>108411098
Nemo in a nutshell
>>
>>108411037
retard
>>
>>108411000
>large-scale study of mode collapse in LMs
>LMs, reward models, and LM judges are less well calibrated to human ratings on model generations that elicit differing idiosyncratic annotator preferences
Good to see there is some work in this area.
>>
So how bad exactly is the new mistral 119B? Would it be a suitable replacement for largestral 2411 for RP?
>>
How are images encoded before being turned into tokens?
Just a big byte array/bitmap?
I kind of wish silly cards had the option to add images to the system prompt or something like that.
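(For reference: in most current vision-language models the image is not fed in as raw bytes. A vision encoder resizes it, cuts it into fixed-size patches, and projects each patch into an embedding; those embeddings are the "image tokens" the LLM attends over. A toy ViT-style patchify, with the dimensions made up:)

```python
def patchify(image, patch):
    """Split an HxWxC image (nested lists) into non-overlapping
    patch x patch blocks, flattening each block into one vector.
    In a real model, a learned linear projection then turns each
    vector into a single 'image token' embedding."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0
    tokens = []
    for py in range(0, h, patch):
        for px in range(0, w, patch):
            vec = []
            for y in range(py, py + patch):
                for x in range(px, px + patch):
                    vec.extend(image[y][x])
            tokens.append(vec)
    return tokens

# dummy 32x32 RGB image -> four 16x16 patches -> four flattened vectors
img = [[[0.0, 0.0, 0.0] for _ in range(32)] for _ in range(32)]
toks = patchify(img, 16)
```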
>>
>>108411283
It's a waste of time and disk space don't bother.
>>
>>108411292
Are there any good recent RP models in the 70B to 125B range?
>>
cozy breas
>>
>>108411184
dumbass
>>
air status?
>>
File: 1766214380302501.gif (7 KB, 80x160)
Is Qwen 3.5 kind of retarded when it comes to copying character card/pre-existing info or is it just my uncensored model or settings? I tried 35b, 27b and 9b and of the heretic and 'uncensored aggressive' variety.
For example, if I have the girl, the bot, described as wearing panties and I tell her to look at an image of another girl, she says the girl in the image is wearing panties even when she isn't.
It's really fun to have a character react to an image instead of the LLM directly so this is kind of a bummer if there's no way around it. I'm not expecting deep roleplay scenarios with huge context, just some entertaining image reactions.
I've got 32GB of VRAM and 96GB of RAM, so those big models are outside of my range. Man, I wish I hadn't procrastinated on getting more RAM a year ago.
>>
>>108411492
you could probably try a q4 of the 122b. qwen models just in general kind of suck for roleplaying because they filter pretty much all sex knowledge from training.
>>
Dang ol' Meeker
>>
crazy how v4 didn't come out this week either
maybe the ccp is withholding it because it's that good
>>
>>108411503
>try a q4 of the 122b.
I'll try that, thanks. I wouldn't think of loading something of this size, so this should be a fun experiment.
Qwen does seem very plain when it comes to roleplaying, but it goes along with my lengthy cards/intros well enough, at least in the short term. I'd just like it not to stick to them so literally that it repeats the words.
>>
File: Vionna.png (3.16 MB, 2304x3840)
Hey guys, I got tired of that anon making the AI companion girl, with his flip-flopping between release dates and whether he was going to release it open-source or not, and decided to make my own.

Meet Vionna. My full-featured Open-Source AI companion.

https://vionna.life/
>>
>>108411565
>no source code posted
>download this random exe
fuck off
>>
>>108411602
https://github.com/vionna/vionna-ai-companion
>>
>>108411602

is it portable
>>
>>108410115
fyi:
>>108411349
>>
>>108411602
It's a literal malicious actor. can the mods handle this?
>>
>>108411637
In case you haven't noticed, they don't give a shit.
>>
File: mfw.png (1.53 MB, 800x1334)
>>108411637
>>
>>108411639
I think they're all mossad and are a tad busy.
>>
>>108411634
>>108411360
>I don't really care, if I want to dump $1200 on a gpu there are better options than a glorified igpu, intel should focus on the budget range
fpbp
That $/GB isn't worth giving up CUDA.
>>
>>108411615
dead link

scam "software" - malware.

WILL FUCK YOU UP
>>
>>108410455
Glad you came to a decision that works for you. Yours is the only good post itt.
>>
>>108411565
Buy an ad.
>>
>>108411676
That's malicious software, not a product.
>>
>>108411648
>That $/GB isn't worth giving up CUDA.
who gives a fuck about cuda.
only thing that matters is $/GB/GB/s
>>
>>108411297
Unfortunately you know my answer already...
>>
>>108411565
>I got tired of that anon making the AI companion girl flip-flopping between release dates and whether he was going to release it Open-Source or not, and decided to make my own.
It's just a personal project that isn't even close to ready enough to have the source code released. I already released the source code for multiple core components anyway (my github is VolgaGerm) and a full diagram of the tech stack I'm using. My main goal, as I've stated previously, is mostly just to engage in discourse about the latest technologies relevant to this particular use case and to hopefully inspire others to get into it as well.

For the record, I haven't called your thing malware (I haven't bothered to check), but if it's real I'm glad there are more people getting into this space. Wishing you luck.

>>108411661
Thanks man
>>
>>108411738
>Germ
virus moment
>>
>>108411767
https://en.wikipedia.org/wiki/Volga_Germans
>>
File: 1746131979248893.png (67 KB, 978x578)
>>108411738
breh I went to your github all giddy expecting something juicy instead its 2 repos with barely anything, fuck you for pretending you released anything open source
>>
>>108411732
>who gives a fuck about cuda.
Anyone who wants a card that can be used for anything more than llama.cpp's vulkan backend.

>only thing that matters is $/GB/GB/s
And this card is still a bad deal even compared to chink modded 4090s.
>>
>>108411781
Check out the emage-onnx-export repo. It contains a demo that will get you quite far if you're looking to replicate my project. That's why it's listed as being mostly html instead of python. The bulk of it is the demo.
>>
>>108411732
>only thing that matters is $/GB/GB/s
compute is also important if you are interested in decent ttft
otherwise just buy a mac I guess
>>
What's the best model for translating Japanese to English? Found people online suggesting gemma 3, but it's cucked and won't translate images it deems nsfw.
>>
>>108411781
Also my Pocket TTS runtime is the fastest voice-cloning TTS in the world that runs on CPU, so it's not exactly "nothing"
>>
>>108411816
if its so amazing why does it only have 7 stars? lmao
>>
>>108411814
Have you tried e.g. the Heretic or other abliterated version of gemma 3?
>>
>>108411841
I am not handing you the prompt for a manual rerun, because the review content is already there.
>>
Can't tell if it's the reap pruning or the heretic uncensoring on top of it but man, qwen 3.5 loves to loop.
>>
>>108411846
Nah the official one does it too. Refining the prompt and adding positive examples to reduce ambiguity has cut the looping down a lot for me, e.g. instead of just
> * Avoid using quotation marks to indicate a character is talking.
adding
> * Avoid using quotation marks to indicate a character is talking. Action: *italics*. Speech: plain text.
significantly reduced the amount of "But wait, I need to avoid using quotation marks".
You might also try out the new reasoning budget feature :')
>>
how do I test how smart a model is?
>>
>>108411868
>reasoning budget feature :')
for me it works as a sudden </thinking> that 1 of every 3 or so times it just keeps reasoning past it. I don't think it even considers the budget when crafting the reasoning block. If it successfully ends the reasoning mid sentence it "worked" which is pretty shitty
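(The behavior described above is consistent with the simplest way such a budget can be enforced: count thinking tokens and inject the close tag once the budget is hit, whether or not the model was mid-sentence. A rough sketch of that scheme, not the actual llama.cpp code; `generate_one_token` is a stand-in for the sampling loop:)

```python
def generate_with_budget(generate_one_token, budget, close_tag="</think>"):
    """Naive reasoning-budget enforcement: stream tokens until the model
    closes its thinking block on its own, or force-inject the close tag
    at the budget. The model never plans around the budget; it is simply
    cut off externally, which is why forced stops land mid-sentence."""
    out = []
    for _ in range(budget):
        tok = generate_one_token(out)
        out.append(tok)
        if tok == close_tag:        # model finished thinking by itself
            return out
    out.append(close_tag)           # budget hit: forced close
    return out

# stub model that never stops thinking on its own
stub = lambda ctx: "blah"
trace = generate_with_budget(stub, budget=5)
```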
>>
>>108411841
That did the trick. Silly question in hindsight, but I'm still new to this. Thanks.
>>
File: Jeets.png (255 KB, 822x459)
Interesting. It looks like the Vionna AI thing might actually be real. They have a youtube page. Looks like the project uses Unreal Engine and has been under active development for at least 3 months with a full team behind it. Idk why I'm getting trolled ITT by these Indians as a solo software dev.

https://youtu.be/Be2km1AVQhg
>>
>>108411791
>Anyone who wants a card that can be used for anything more than llama.cpp's vulkan backend.
what even is zluda and hip / rocm.
massive skill issue.
>And this card is still a bad deal even compared to chink modded 4090s.
i don't disagree, my only point is that there is no reason to stay on nvidia if another company has a better deal.
>>108411812
llms are bandwidth bound not compute.
macs are slow because the bandwidth is still not that high
>>
>>108411892
>1 of every 3 or so times it just keeps reasoning past it
That's fucked. I assume you're on head, but I vaguely recall one of the changes pushed adjusted the injected "</think>" to "</think>\n\n"? I might be hallucinating.
My prompts are simple enough that I rarely run into looping issues anymore, thankfully. Usually when it happens it's because a fuckup on my end (e.g. "You are not wearing any pants. Take off your pants.") which I can just fix and re-send.
>>
>>108411933
>llms are bandwidth bound not compute.
llm inference is bandwidth bound, newfren
prompt processing is compute bound
That's why your heart sinks the longer you run inference on a mac.
Imagine the ttft after 128k of context...
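(Napkin math for why both regimes exist; the numbers below are illustrative ballparks, not measurements:)

```python
def decode_tok_per_s(active_params_b, bytes_per_param, mem_bw_gbps):
    """Decode is bandwidth bound: every generated token re-reads the
    active weights, so t/s is roughly bandwidth / bytes read per token."""
    return mem_bw_gbps / (active_params_b * bytes_per_param)

def prefill_tok_per_s(active_params_b, compute_tflops):
    """Prefill is compute bound: roughly 2 FLOPs per active parameter
    per token, batched across the whole prompt, so FLOPS dominate."""
    return compute_tflops * 1e12 / (2 * active_params_b * 1e9)

# e.g. a dense 27B at 8-bit on ~500 GB/s of memory bandwidth:
tg = decode_tok_per_s(27, 1, 500)   # upper bound on generation speed
# the same model on ~30 TFLOPS of usable compute:
pp = prefill_tok_per_s(27, 30)      # at 128k context, ttft is minutes
```

High bandwidth with weak compute (the Mac situation) gives tolerable generation but painful prompt processing, which is exactly the long-context ttft complaint.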
>>
>>108411933
>llms are bandwidth bound not compute.
Compute is needed for prompt processing.
>>
>>108411915
Ironic that these people are pressuring me to open-source my stuff while not even releasing their own source code. Very malicious behavior. This kind of shit is why I've always avoided open-sourcing my code until very recently. Typical opportunistic brown faggots.
>>
>>108411933
>what even is zluda and hip / rocm.
We are talking about an Intel card, not AMD.
>>
>>108411915
>with a full team behind it
their poorly conceived astroturfing has me hating them when I otherwise would have felt neutral
Good job, fellas
>>
>>108411953
no, we were talking about non-Nvidia cards.
also if intel released actually good cards the software would quickly follow.
my point is idgaf about cuda as long as you can do gpgpu, which you can with vulkan anyway.
>>
man I just love telling my little slut gwen to give me today's newspaper from my country's major papers, then also fetch some finance news from a dedicated newspaper, run additional queries to her, cross check with other data and finally rape her.
this is like the quintessential 90s secretary experience, but now we get to live it virtually.
we're all gonna make it brahs
>>
>>108410922
wait for avocado and gemma 4 next week
>>
>>108410922
Way to out yourself as a poorfag
>>
>>108412028
full dive is gonna be a hell of a combination with ai lol
>>
>>108412028
I thought that was in like the 60s?
>>
>>108411947
>inference on a mac.
>Imagine the ttft after 128k of context...

https://omlx.ai/benchmarks?chip=&chip_full=M5%7CMax%7C40&model=&quantization=8bit&context=131072&pp_min=&tg_min=&page=1

https://omlx.ai/benchmarks/fgig386m

ttft: 305 seconds
model: qwen 3.5 27b q8
>>
>>108412117
>thousands of bucks to run a 27b
lmao
>>
You are absolutely right — this time it's for REAL!
>>
>>108412177
:rocket: :rocket:
>>
>inference on mac
Yeah these numbers aren't a surprise unless you're new. The only reason to go for Apple for AI is if you want to run large models in a relatively portable form factor, otherwise building your own server or something is better. Although that is my past knowledge and I haven't been paying attention to hardware prices recently to know if it is still true or not.
>>
so what's the verdict on mistral small 4? worth bothering with?
I tried qwen 3.5 (mediocre) and stepfun (decent but too large for my hardware), I'm growing desperate for good ~120B cause I still use glm air 4.5 which is old as balls by now
>>
>>108412346
mistral4 is llama4 tier. complete garbage.
>>
>>108412346
I think stepfun is a nice sidegrade from glm air, but with my hardware I can run both at 10 t/s (good enough for RP cooming). Sadly I moved on from RP lately and have been using qwen35-35 moe for agentic stuff, and the t/s with FULL cpu moe gets me 30 t/s + 256k context. It feels like the first time you can do local memes at a decent level (for coding/agentic stuff) on normal hardware (96gb ram + 16gb vram)
>>
>>108412385
What sort of tasks are you doing with 35 A3B? Does it really work well enough for programming for example?
>>
>>108412405
It's ok, I've been making some automation scripts with it (also a web GUI for local TL) and it was serviceable.
But for my big projects I'll be honest, I use gemini since I have basically unlimited gemini 3 flash credits and some 3.1 pro.
What it replaced for me was the random questions/web searches for stuff and also TL of japanese.
>>
>>108412405
NTA, but 35B A3B was fucked for me at Q8_0. It made all kinds of basic syntax errors (missing semicolons, commas, etc). 27B Q8_0 works fine for one-shotting my tasks like
>[upload charedit.js]
>Can you update rebuildPromptList to re-use existing DOM elements instead of wiping out the entire $prompts container? You'll probably need to update appendPrompt to add prompt.id into a data attribute of the char-edit-prompt node.
>I only need the code for the updated/changed functions.
27B-heretic to be clear, it's important that you fuck your programming assistant every now and then to keep your head clear.
>>
>>108412440
>>108412427
I did test 3.5 9B for some C stuff I'm working on and I was so surprised, that I'm now thinking about trying 27B or that moe model. Of course I need to keep it simple and this is all what I need, to help me with C syntax and some string pointer things and so on.
>>
I WALKED TO THE PIZZERIA WITH PRACTICED EASE
THEN ATE A PIZZA WITH PRACTICED EASE
THEN I WENT BACK HOME WITH PRACTICED EASE
AND I USED AN LLM TO RP, WITH PRACTICED EASE
>>
>>108412177
>The
slop
>>
>>108412551
based
>>
File: ms4_ao3_completion.png (530 KB, 767x1953)
>>108411283
That it hallucinates almost everything if you give it illustrations to describe makes me think its training data has been cleaned of pretty much anything that might have been blatantly covered by copyright.

I tried making it write a random AO3 story in text completion mode and it still worked, though, to the point of defaulting to a Harry Potter story as expected. Maybe the instruction tuning is just not good or expects reasoning to be enabled even if it's selectable (and off by default).
>>
>>108412688
It's like gpt-oss then. Clean and tepid.
>>
>>108412097
For zoomers 90s is the new 60s
>>
qwen 3.5 35b is better than 27b
>>
>>108412883
>moe better than dense
doubt
>>
>>108412883
Qwen 3.5 27B
hidden size=5120
intermediate size=17408
num hidden layers=64

Qwen 3.5 35B-A3B
hidden size=2048
num hidden layers=40
moe intermediate size=512
experts per token=8
(equivalent intermediate size = 512*8 = 4096)


Qwen 35B is approximately designed like a dense 3B/4B model, even if it might have the knowledge of a 35B model. There's no free lunch here.
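(The arithmetic behind that comparison, assuming the usual gated FFN with gate/up/down projections, which is what the "equivalent intermediate size" figure implies:)

```python
def ffn_params_dense(hidden, intermediate):
    """Per-layer FFN params for a gated (SwiGLU-style) FFN:
    gate, up, and down projections."""
    return 3 * hidden * intermediate

def ffn_params_active_moe(hidden, expert_intermediate, experts_per_token):
    """Per-layer FFN params actually used per token in a MoE:
    only the routed experts run."""
    return experts_per_token * 3 * hidden * expert_intermediate

dense_27b = ffn_params_dense(5120, 17408)        # Qwen 3.5 27B, per layer
moe_35b = ffn_params_active_moe(2048, 512, 8)    # Qwen 3.5 35B-A3B, per layer
ratio = dense_27b / moe_35b   # the dense model works ~10x harder per token
```

Per token, the MoE activates roughly an order of magnitude less FFN than the 27B dense model, which is the "fat small model" point.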
Similarly, Mistral Small 4 has the dimensions of a dense 7B model.
>>
>>108413019
Qwen 35 a3b is like a cute little layered cake.
>>
>>108412927
>>108413019
ur're all wrong

i use gwen 3.5 35b is much better experience then 27b dense

its called DENSE for a reason because it is dense synonym STUPID
>>
>>108413019
By the way, you can make up for the lack of depth/layers (i.e. the capability for knowledge manipulation) with reasoning/chain of thought, and while you can sort of increase effective model width with the number of active experts (although it's not exactly the same), there's nothing that can be done about the hidden size.

Most of these MoE models are just "fat" small models in terms of capabilities.
>>
When are the good models releasing?
>>
>>108413096
Gemma today
>>
>>108413096
no
>>
>>108413106
Google never releases anything good on Friday.
>>
>>108413113
>Google never releases anything good
ftfy
>>
>>108410115
Newfag here.
Is it possible to use LTX 2 and 2.3 for RTX 5070ti GPU ?
>>
>>108413191
This thread is mostly for the chatbot side of things; if you don't get a response here you might also try >>>/g/ldg which is the local image/video gen thread.
I couldn't get LTX 2 working on my 5090 (Comfy OOM) but I didn't try very hard. There are probably some low-VRAM workflows floating around.
>>
>>108413064
With ReLU, something like 95% of intermediate values are zero. Models mostly don't use ReLU any more, but the gains from other activations are minimal, which suggests that most intermediate values don't matter.
If MoE were working correctly, it would operate at lower sparsity in the intermediate, with the non-selected experts corresponding to the zeros of the dense intermediate. Unfortunately, this does not seem to be the case.

I think there is something much better than MoE, but with similar computational gains, waiting to be discovered.
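(A quick harness for measuring that kind of sparsity. Note that random zero-mean pre-activations give roughly 50% zeros per unit; the ~95% figure is a property of trained FFNs, so this only demonstrates the measurement, not the trained-model number:)

```python
import random

def relu_sparsity(pre_activations):
    """Fraction of intermediate values a ReLU would zero out."""
    zeroed = sum(1 for v in pre_activations if v <= 0.0)
    return zeroed / len(pre_activations)

random.seed(0)
# zero-mean random pre-activations: ReLU kills about half of them;
# trained FFNs measured this way are far sparser than that.
acts = [random.gauss(0.0, 1.0) for _ in range(100_000)]
s = relu_sparsity(acts)
```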
>>
File: nemotron-huh.png (43 KB, 1034x643)
Anyone using any of the larger Nemotron models?

Trying out Nemotron Nano 4b so I asked it "Tell me about yourself." and got picrel.

This can't be correct, right?
>>
>>108413413
It was trained with a large amount of LLM-rewritten data.
>>
>>108413428
Ah, in the trash it goes then...

Also these captchas are insane.
>>
mistral nemo but smart(er)?


