/g/ - Technology

File: 1751383194547290.jpg (231 KB, 1312x816)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107815785 & >>107803847

►News
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2
>(01/05) Nemotron Speech ASR released: https://hf.co/blog/nvidia/nemotron-speech-asr-scaling-voice-agents
>(01/04) llama.cpp merged backend sampling support (#17004): https://github.com/ggml-org/llama.cpp/pull/17004
>(12/31) HyperCLOVA X SEED 8B Omni released: https://hf.co/naver-hyperclovax/HyperCLOVAX-SEED-Omni-8B
>(12/31) IQuest-Coder-V1 released with loop architecture: https://hf.co/collections/IQuestLab/iquest-coder

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107815785

--Qwen 235B vs newer GLM models: upgrade considerations and performance expectations:
>107821338 >107821349 >107823142 >107823219 >107823230 >107823247 >107823263 >107823265 >107823273 >107823294 >107823283 >107823291 >107823303 >107823327 >107823415 >107823332 >107823360 >107823364 >107823397 >107823642 >107823718 >107823664
--Mad Island mod enables LLM NPC interactions, sparking player nostalgia:
>107820759 >107821087 >107821110 >107821211 >107821320 >107821094 >107822132 >107822930
--DeepSeek coding model launch timeline:
>107824356 >107824413 >107824461 >107824479 >107824486 >107824495 >107824504
--How OpenAI chat systems manage conversation history and prompt caching:
>107822818 >107822876 >107822911 >107823248
--Context size vs speed tradeoffs in Koboldcpp model optimization:
>107821567 >107821938 >107821948 >107822160
--RTX 6000 Ada cost and model compatibility debate:
>107824787 >107824842 >107824852 >107824970 >107825130
--Jamba's uncensored state tied to architectural flaws hindering effective refusal training:
>107824915 >107824997 >107825017
--Optimizing Mistral Small models with DRY sampler and parameter tuning for roleplay:
>107818078 >107818100 >107818123 >107818145 >107818161
--LLaMA model evolution and hardware limitations discussion:
>107821121 >107821141 >107821273 >107821548 >107821573
--Critique of ChatGPT's basic memory implementation in free tier:
>107815963 >107815987 >107816786 >107816032 >107816055
--Jamba model's context handling and performance evaluation:
>107820773 >107820898 >107821422 >107821112
--MoE expert routing complexities and research-driven optimizations:
>107823553 >107823599
--Critique of low-quality datasets and excessive training practices:
>107823952
--Miku (free space):


►Recent Highlight Posts from the Previous Thread: >>107815790

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107826648
>--Miku (free space):
>
What a horrible thread.
>>
>>107826669
I see
>>
File: 1719954801389509.jpg (182 KB, 821x1199)
>>107826669
>>
best programming model for tool calling with 19GB unified memory?
>>
>>107826689
Thank you for blessing this thread.
>>
File: its dead.jpg (119 KB, 2483x458)
>>107826699
>>107826689
>>107826648
tranny still spamming his sona i see
>>
>>107826694
nemotron 30b at q4 or maybe q3. who the fuck has specifically 19gb tho?
>>
I don't know how jeets vibecode llamacpp prs, claude max/opus4.5 barely knows how to write complex linq
>>
>>107826795
maybe something fucked with my system
regular nemotron 30b didn't work but i can try q4
>>
>>107826819
actually it looks like even the smallest quants of nemotron are ever so slightly too big for you to fit with any meaningful context. there really are not any good coding models below 30b, so you might just be out of luck.
https://huggingface.co/unsloth/Nemotron-3-Nano-30B-A3B-GGUF/tree/main
https://huggingface.co/bartowski/nvidia_Nemotron-3-Nano-30B-A3B-GGUF/tree/main
>>
>>107826813
Prompt issue, almost certainly.
>>
>>107826813
>implying you need to know how to code to vibecode
Jeets just do the needful while you complain and try to make a perfect solution. "Barely works" is enough, that's why companies hire them and not (You).
>>
>>107826837
yeah just noticed nemotron-3-nano:30b is literally the same file as nemotron-3-nano:30b-a3b-q4_K_M
i'm stuck with gpt-oss:20b, then
>>
>>107826853
seems so. you've gotta get yourself some better hardware if you want better models.
>>
>>107826861
my setup is good but i cheaped out on ram
>surely if i need ram i can just buy more later
fuck my chud life
there will exist better models on my hardware before i buy more ram
>>
>>107826643
soulless compared to the original image
>>
File: kronii cat maid side view.jpg (187 KB, 2048x1825)
I have 128GB unified RAM. I'm looking for models for
>coding/tool use
>image generation
>research assistant (focus on STEM stuff)
What are my best options? I'll be running these under LocalAI so pretty much any format will work.
>>
What in the fuck did ikawrakow do? Why the fuck did git history get rewritten in the past week or so? The repo is fucked so I can't just pull. Super bad practice. Was this another hissy fit about licensing/attribution?
>>
>>107827160
image gen will have to be separate from this but glm air at q5 or so is the general recommendation for 128gb. will give plenty of space for context and the image gen model.
>>
.
>>
>>107826643
This image is so much more organic than the regular miku spammer autist. This actually engages discussion.
>>
>>107827163
who fucking cares anymore, llama.cpp can do whatever his shit did
prove me wrong or whatever
>>
>>107827217
>48
>attn norm 35,5120
what kind of moesissy or vramlet model is this
>>
>>107827272
48 is the number of layers
35 is the sequence length
5120 is the hidden dimension
the model is llama 4 scout
>>
File: 1762017110878579.jpg (771 KB, 1125x976)
>>107826643
>Jamba2 3B and Mini (52B-A12B) released
Anyone tried this yet? Size would be great for my system.
>>
>>107827325
>llama 4 scout
you might be the only one trying to finetune that thing. also you're wasting compute if you want it for erp.
https://github.com/adobe-research/NoLiMa
>>
>>107827163
A few weeks ago I tried the repo and every time I generated something it had a fixed seed.
I guess it was a new feature, but they didn't say how to change it. I didn't care enough, so I deleted it.
>>
File: 1760539180034592.png (266 KB, 471x521)
>>107827347
>>
>>107827347
Tried it for something like 5 minutes. Wasn't very impressed.
>>
>>107827463
the two more weeks continue
>>
schizo is still desperate for attention
>>
so, another chinkslop year?
>>
>>107827347
So it has 52B params but it's barely better than ministral 14B. OK, good to know.
>>
can we have deepseek v4 mini pls?
>>
>>107827523
There is no incentive to create small models.
>>
>>107827565
small models are inherently communist. Therefore china has all the incentive to make them.
>>
>>107827506
It's... actually more retarded than 14B or even the old Nemo when it comes to actual real conversations. These benchmarks might as well be lies.
>>
>>107827600
China is not a communist country.
>>
File: 1432498179182.png (296 KB, 722x768)
Can you system prompt random or timed events without them being mentioned in the chat between us? E.g. if I make an asthmatic AI and tell it to cough occasionally, I want it to not do "I will cough now" or some retarded shit.
>>
Next Kimi will be 2T-A3B
>>
Best model for dev, 32GB VRAM, 128GB RAM?
>>
>>107827610
yes they are, their inherent pursuit of it is the problem
>>
>>107827653
glm4.7 at iq2m. or sucking sama's dick in the hopes that he will give you some ram.
>>
>>107827620
https://docs.sillytavern.app/usage/core-concepts/macros/#randomization
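e.g. something like this in the system prompt or author's note. rough sketch using the {{random}} macro from those docs; note it only re-rolls when the prompt is built, so it decides once per reply, not mid-reply:
[code]
Hidden direction: {{random:the character has a brief coughing fit at some point mid-dialogue,the character is slightly short of breath,nothing happens; do not mention the asthma at all}}
Never announce the cough in advance, just write it happening.
[/code]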
>>
Is there a version of Nemo that isn't pozzed as fuck?
>>
>>107827931
Extreme skill issue
>>
>>107827620
The issue is always going to be that it can't do the random roll mid-reply. Doesn't matter if you're using the ST-included randomizer or a tool call to a dice tool.
It'll always happen after the reply.
>>
File: 1738508222040753.png (159 KB, 634x868)
>>107827931
Works for me
>>
anyone see this model?
https://huggingface.co/FreedomIntelligence/openPangu-R-72B-2512
>>
>>107827970
Huh. I asked something far tamer and it would not shut up about equality and respect.
>>
>>107827980
>what is a system prompt
>>
>>107827604
>These benchmarks might as well be lies
Many such cases
>>
>>107827490
Every year is chinkslop year until we ban all Chinese nationals from the US and an ITAR compliant version of HuggingFace exists for publicly funded research models.
>>
>>107827980
PLEASE learn how to use an LLM before coming here and complaining about a model
>>
>>107827348
I'm confused, is it the architecture or the training that limits the model? Maybe he can fix it by training it on better data?
>>
Some madlad actually made a merge of GLM 4.6 and 4.7.
https://huggingface.co/shamwowzer/prototype-glmx01
>>
Anyone tried the Nex 8B version?
The full 670b-something model was one of the few good enough ones for high context rp sessions available on OR, roughly equivalent to Gemini Pro 2.5, but they just paywalled it.
It's not available as a gguf file so i'm too much of a retard to get it to work with sillytavern to try it.
>>
>>107827977
>72b
>it's not a benchmaxx'd tune of qwen2.5-72b
color me surprised
>>
>>107828115
Yeah I'm kind of curious about it. We need goofs now. A new MoE size we've never seen before that runs on a fairly modest system. 24T training tokens.
>>
>>107828105
Nevermind, looks like someone made one:
https://huggingface.co/mradermacher/internlm3-8B-Nex-N1-i1-GGUF
>>
>>107828105
>the full 670b-something model
This? That's just deepseek, of course it's good.
https://openrouter.ai/nex-agi/deepseek-v3.1-nex-n1
>>
File: file.png (50 KB, 830x289)
>>107826643
I approve of holocaust denial training. The AI should be smart enough to distinguish the truth from the lies instead of trying to push agendas it's been told, otherwise AI is completely useless.
Pic related, it can't even follow simple fucking rules without occasionally trying to push feminist agendas in my medieval roleplay.
>>
File: 1759248425031664.png (94 KB, 276x405)
>>107828288
The thing is, you have to treat models like a retarded child. You can't just establish a medieval setting and assume that it will conform to that era's politics, you have to explicitly tell it to do so. As much of a meme as it is, 'prompt engineering' is essential to get good outputs, even when your use case is just to write a cohesive story.
>>
>>107828304
I literally have a straightforward rule telling it that women can't hold noble titles. Just like I have a rule explaining that it's a men's world and women have no rights.
It will ignore rules at random no matter what you tell it, that's just how all AIs are.
>>
>>107828324
>>107828304
Hell I even added an additional instruction reminding it that women can't hold noble titles as part of my response inside [] brackets, and it still fucking did it anyway. That's how it knows what I was talking about when I called it out on it.
>>
>>107828324
Which model are you using?
>>
>>107828339
>inside [] brackets
Oh, no. Those are the super important brackets. How could it ignore those?
>>
>>107828343
The only model that allows full blown pedophilia, Nemo 12B. I get similar results on GLM though, so it's definitely not just a 12B problem. All AI tend to randomly ignore parts of your system prompt, and that's my initial argument; that an AI that can't follow all instructions without question or agendas is not something that can ever be relied on or trusted for anything. These little shits will never be real AI, it's an insult to human intelligence to call it AI at all.
>>
>>107828362
You might need to use a lorebook to fix that
>>
>>107828374
You have no idea for how long he's been posting screenshots. May as well recommend him some meme samplers or to add "follow the rules" to the prompt.
>>
>>107828362
No one who knows anything about current models would call them intelligent, they're just token predictors.
Regardless, low active parameter count could be a problem, assuming by 'GLM' you mean Air. Mistral Small for example is perfectly capable of making female characters submissive and take a lower role in society when prompted to do so.
>>
File: 1756853540540388.jpg (1.27 MB, 3610x5208)
>>107828382
That's him?
>>
>>107828288
>mikupad chad
>model uses word I don't like
>stop gen, change word, continue gen
>>
>>107828384
Mistral small has the same problem. It will follow some of your rules most of the time, while breaking some of them. There is no AI that can keep up with a full "world setting" at all times. Some rules it will absolutely consistently violate repeatedly.

>>107828394
Yes, I've been here a while...

>>107828406
I increased the banned strings token limit in koboldcpp just so I could ban more words from ever being used in addition to all the slop. There's a shit ton of modern terms the AI insists on trying to use in a medieval roleplay that I had to ban.
>>
>>107828288
It's the 13th century for heaven's sake!
>>
>>107828362
>Nemo 12B
Not the guy, but you mean just the base model without any tunes?
>>
File: times have changed.png (1.16 MB, 3406x1259)
>>107828483
lol it's been 3 years anon. my hate for women has only grown since that day

>>107828490
Well yes, tunes don't really make a big difference. They can reduce a tiny fraction of the slop, so it partially helps. I specifically use the rocinante v1.1 tune for the chatML template, which helps reduce how much the model tries to write your character instead of its own.
>>
Is there any extension that allows it to eat folders so it's not just dumped here?
>>
>>107828362
>>107828324
>>107828394
it's probably just a context length issue.
context rot is a known problem. models tend to follow instructions just fine. they just forget or are bad at reasoning.
as the context length gets higher, the correlation between distant words gets more and more sparse. that works fine for needle-in-a-haystack problems, which they benchmark for, but not for complex reasoning and logic, which they don't.
>>
>>107828662
the solution to this problem is fairly simple: go agentic.
models can always follow simple prompts. knowing this, you design a series of agents which each apply a few simple rule-following guidelines. rough sketch below.

honestly the biggest failing of these threads is that everyone always insists that they can get everything done within the memory of one model.
this isn't richard sutton's bitter lesson. you don't have the luxury of waiting for a bigger and better model to come out and blow your engineering out of the water.
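the sketch mentioned above, assuming a llama.cpp server on localhost:8080 with its OpenAI-compatible route (the prompts and the rule text are just placeholders):
[code]
import requests

API = "http://localhost:8080/v1/chat/completions"  # llama.cpp server, adjust to taste

def ask(system: str, user: str) -> str:
    # one small prompt per agent; short prompts are what models follow reliably
    r = requests.post(API, json={"messages": [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]})
    return r.json()["choices"][0]["message"]["content"]

# agent 1: write the reply as usual
draft = ask("You are the narrator of a medieval roleplay.", "Continue the scene: ...")

# agent 2: enforce one rule per pass instead of one giant system prompt
fixed = ask("Rewrite the text so that no woman holds a noble title. Change nothing else.", draft)
print(fixed)
[/code]
chain as many single-rule passes as you need; each one stays simple enough that even a small model won't fumble it.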
>>
>>107828662
Yes, and no. The problem does get worse with longer context, but it can still easily ignore a rule in its first response. So there is more to it than that.
>>
>>107828703
agents to the moon sir llm2.0!
>>
>>107828703
none of us have the patience to jury-rig a bunch of models together
>>
>>107828703
sure, as soon as sillytavern supports multiple APIs and assigning different tasks and characters to different APIs
>>
>>107828709
https://www.youtube.com/watch?v=TUjQuC4ugak
https://www.youtube.com/watch?v=8OvIeJUc1N0
it really is just the model being stupid and incapable of reasoning.
engineers deal with this shit every day.
>output in json format. add no additional characters
>ok what about this ```python
the answer is and always has been that you can't do anything about it, just cope.

this is easier in engineering land where we have somewhat expected responses
no clue what you coomers are going to have to do but it may involve running the same prompt 10 times and collecting the average using similarity search
>>
>>107828731
you needs to build these yourself sir
>>
>>107828738
Your example is easily solved using grammars.
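e.g. llama.cpp takes a GBNF file and the sampler literally cannot emit tokens outside the grammar. minimal sketch for the "json only, no extra characters" case (the key name is made up):
[code]
root   ::= "{" ws "\"answer\"" ws ":" ws string ws "}"
string ::= "\"" [^"]* "\""
ws     ::= [ \t\n]*
[/code]
pass it with --grammar-file; koboldcpp exposes the same thing through its API iirc.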
>>
>>107828753
I guess. we don't have access to those when using APIs. it's often just as simple to use regex to grab the contents of the code block and ignore everything else. this is what mem0 does in production.
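the regex version really is a few lines. sketch in python (not mem0's actual code):
[code]
import re

def extract_code_block(text: str) -> str:
    # grab the contents of the first ``` fenced block; fall back to the raw text
    m = re.search(r"```(?:\w+)?\s*\n(.*?)```", text, re.DOTALL)
    return m.group(1).strip() if m else text.strip()
[/code]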
>>
>>107828738
Coomers play to like 20-50 messages tops, they aren't relevant in this discussion.
I'm a cultured gentleman that does medieval roleplays to 300-1000 messages. My focus is majorly on escaping this shitty reality, not on jerking off.
I need a highly intelligent AI capable of following the rules and sophisticated world settings of the worlds I write, like an interactive book where I control the narrative.
I doubt that will ever happen though, we are pretty much on the brink of collapse.
>>
>>107828797
glm 4.6 can do that up to about 64k context. if you can get a corporate model to comply, you might be able to get up to 256k context at most. what you want does not exist and will not for at least 3 or 4 more years.
>>
>cloode killing itself
Funniest 2026 moment so far
>>
>>107828797
>300-1000 messages
by then any model has certainly completely forgotten about the system prompt or can no longer interpret it correctly.
you need to occasionally re-inject important messages into the trajectory.
you will also have to trim the trajectory history and replace it with a summary that gets occasionally regenerated.
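the shape of it in code, for anyone who wants it spelled out (a sketch; summarize() would be another LLM call, every name here is made up):
[code]
def build_prompt(system_prompt, summary, history, keep_last=5):
    # rules stay pinned at the top, old turns are replaced by a rolling summary,
    # and only the last few messages go in verbatim
    return ([{"role": "system", "content": system_prompt},
             {"role": "system", "content": "Story so far: " + summary}]
            + history[-keep_last:])

def maybe_resummarize(summary, history, summarize, every=20):
    # regenerate the summary every N messages so it doesn't go stale
    if history and len(history) % every == 0:
        summary = summarize(summary, history)
    return summary
[/code]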
>>
>>107828797
>300-1000 messages
What does your context get to? Even SOTA api models get noticeably dumber past like 32k.
>>
>>107828816
>>107828817
System prompt is always at the top of context.
I use summaries and lorebook entries to work around context problems like that.
For example if a new character is introduced that is part of the story for a while, I add a lorebook entry for it with all the details of that character, and a summary of how they are relevant to my character.
>>
>>107828841
But your initial complaint is models not conforming to the scenario you've set up, which can be a result of the model just getting dumber, so I'll ask again. What is your context getting to when you're having these problems?
>>
>>107828841
it can't be a manual process. I mean that you have to blow away the conversation history up until like 5 messages ago and replace the whole history with a prompt describing the story arcs.
>>
>'stutters sometimes when flustered'
>EVERY reply begins with a stutter
Makes me want to rip my hair out
>>
>>107828970
Use thinking models and make it think hard
>>
>>107828970
LLMs have no concept of time
>>
File: file.png (12 KB, 593x117)
>>107828776
>we don't have access to those when using APIs
Your local model?
>>
>>107828738
Making LLMs output JSON is a retarded idea. Too many points of failure in that shitty syntax and not easily fixable. Pseudo-XML tags are better suited.
>>
>>107828861
Yes, it shouldn't have to be a manual process, it's annoying, but that is the best we can get these days.
>>
>>107829072
i would sooner make it output markdown than anything resembling XML inshallah
>>
>>107829095
Learn english syntax first, rajeesh.
>>
>>107829104
sounds like you're the rajeesh here grammarlet
>>
File: pr.png (105 KB, 500x523)
Many of you don't realize what we could have right now. It wouldn't be that difficult for a company to make a good creative model. Give it modern knowledge with wikipedia. A bit of coding data so it understands token efficient character/setting formatting. Some back-and-forth D&D, erotica, and forum posts. A couple books on historical attitudes and practices from antiquity through to the modern age. Finally, focus the bulk of training on 1940s-2000s fantasy/sci-fi/historical fiction novels, as well as some japanese light novels.
Sell access to the big model as an ultra-advanced AI dungeon remake. Market by publicly releasing an under 70B model. Millions upon millions of dollars from autists and creative professionals.
>>
>>107829192
Well then start with it, nigga.
>>
>>107829192
I'll edit the README.md
>>
File: 1762257186330714.jpg (54 KB, 522x522)
>>107829192
You retards don't understand the scope at which LLMs are trained, they're already throwing literally everything under the sun at them.
What you want is just a regular model that isn't hammered with RL for benchmarks, which we've had.
Then you'll complain that that model is retarded (shocker, it's not even 100B!).
It's always been about parameter count.
>>
>>107829344
>they're already throwing literally everything under the sun at them
no, they very much aren't, and they're proud of filtering most out because it's ""toxic""
>>
>>107829309
I'll add the LICENSE.md
>>
>>107829344
Distills and crap datasets curated by third worlders + misconfigured training parameters are why we have no good small models
>>
>>107829358
Filtering doesn't change the fact that you can't fit a 'good creative model' inside 30GB or some shit.
"Market by publicly releasing an under 70B model" nigga seriously? You think API models like Claude are good/popular because they got some secret sauce? No it's because it's a fat unquanted model
>>107829405
There are good small models, but relative to larger models they are simply fucking stupid. For their size they are good but people here expect miracles on top of using lobotomy quants.
>>
>>107829425
>You think API models like Claude are good/popular because they got some secret sauce?
for aicg denizens it certainly seemed to be the case when their proxies had it, now they're coping with gemini and whatever else they get their piss soaked hands on
>>
>>107829192
You don't realize how good we already have it. Even a model as small as 24B can be a good DM if you break down tasks into smaller pieces and manage short-term and long-term notes separately. I'm astonished that this shit isn't mainstream yet. I suppose people who love DnD and programmers are two separate groups that don't overlap much
>>
>>107829442
there's a large enough proportion of programmers having a melty, and I think DnD players tend to be amongst that group.
>>
>>107829442
>I suppose people who love DnD and programmers are two separate groups that don't overlap much
I was interested in DnD many many years ago, but I couldn't get anyone else I knew interested enough to get games going so I gave it up.
>>
>>107827160
>>107827197
I have a similar setup and GLM 4.5 air is the best I've found thus far. I use llama.cpp built with vulkan (amd APU).

One thing to call out though is image gen kind of sucks on unified compared to a dedicated GPU. Still works, just a bit slower. You don't need a lot of VRAM for image gen so if you have a GPU lying around somewhere that might be a better option.
>>
>>107829344
>they're already throwing literally everything under the sun at them
Except when you look at some of these models' training data you can see that there isn't a single book in the entire corpus
>>107829425
>you can't fit a 'good creative model' inside 30GB
Yes, you definitely can. You can't fix shit training data with high parameters, see Llama 4
>>
One of the biggest flaws is that the model doesn't have access to previous chats and keeps making the same shit again and again. There isn't enough context to throw all chats into it, but it works if you have a small task, like generating a BBEG. You keep previous outputs in context, and it will start with boring shit like

- The Hollow King – A once-noble ruler reduced to a skeletal figure by his own curse, ruling through fear and necromantic puppets.
- The Fleshweaver – A surgeon who stitches people together into monstrous hybrids, seeking to "improve" humanity against its will.
- The Shadow Puppeteer – A thieves’ guild master who controls others via cursed masks, but his own face is slowly erasing.

And after a while you'll start getting

- The Clockwork Plague – A disease spread by mechanical spiders, turning hosts into ticking bombs.
- The Tidecaller – A leviathan-riding pirate who drowns land to create a new oceanic empire.
- The Glass Prophet – Shatters truth into shards, forcing people to choose which lie to believe or go insane.

That's already huge progress for a braindead 24b
>>
>>107829559
>Except when you look at some of these models' training data you can see that there isn't a single book in the entire corpus
didn't one of the recent releases literally brag about that in their readme, something like "books 0" iirc
>>
File: 1762194898780261.jpg (78 KB, 425x614)
>>107829559
>Yes, you definitely can. You can't fix shit training data with high parameters, see Llama 4
So? Most models aren't Llama 4, a model handled so badly its leads left Meta's AI department. Your collection of amazing fantasy novels isn't going to beat Shannon's theorem and produce a 32B model that is somehow astonishingly better at creative writing than all the other 32Bs. This is literally the same mindset as finetune tards.
>>
>>107829572
Yeah it was Nemotron
>>
>>107829709
What? Why would you ever think that training on math would produce better creativity than books?
>>
>>107829764
Okay Drummer.
>>
Is it possible to dynamically select a -dev device in llama.cpp based on name? When I wake my desktop from sleep my igpu and dgpu switch device names and it totally messes up my llama-swap config file.
I want to either select the device by name somehow or force linux to use Vulkan0 for my dgpu. Disabling the igpu isn't really an option since I use it for other things too and it massively slows down inference if I use both.
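one workaround: resolve the name yourself at launch by parsing --list-devices. rough sketch in python, assuming the output has lines shaped roughly like "Vulkan0: <device name> ..." (check what yours actually prints):
[code]
import re, subprocess

# figure out which VulkanN currently maps to the dGPU by matching its name
p = subprocess.run(["llama-server", "--list-devices"], capture_output=True, text=True)
m = re.search(r"(Vulkan\d+):[^\n]*RTX", p.stdout + p.stderr)  # put your card's name here
dev = m.group(1) if m else "Vulkan0"
subprocess.run(["llama-server", "-dev", dev, "-m", "model.gguf"])
[/code]
doesn't fix a static llama-swap config directly, but you can point llama-swap at a wrapper script like this instead of the raw binary.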
>>
>>107829709
By this logic a 32b trained on only German would be the same as one trained on only French... you're kinda retarded bro...
>>
File: 1746482988344010.jpg (164 KB, 936x936)
>>107829764
>>107829788
Do you really think math/coding being in the dataset is what's holding back the perfect creative writing model from being made? Take that shit out, replace it with whatever skyrim fanfics you've got saved; the end result will still be retarded and people will come here to complain about "no spatial awareness" or "good understanding of anatomy"
>>
This guy has to be baiting right?
>>
File: nemotron 0 books.png (78 KB, 439x944)
>>107829741
that do be it thanks, crazy to see what shit they waste compute on
>>
>>107829794
>"no spatial awareness" or "good understanding of anatomy"
I believe this can only be solved with native multimodality, and you can't change my mind
>>
>>107829804
>we can make the best 12B/32B/70B erp model ever made we just need a really really good dataset of books or some shit
IDK, are you?
>>
>>107829827
So this: >>107829808
Would be the same as an equivalent trained on mostly books? Is this your argument?
>>
>>107829827
>we can make the best 12B/32B/70B erp model ever made we just need a really really good replacement for transformers or some shit
>>
>>107829815
I believe it can be solved with copious amounts of tokens and excessive thinking. And parameters.
>>
>>107829849
bidet will safe us
>>
>>107829776
Drummer trains using synthetic data and ESL RP logs, not books
>>107829794
Nice pilpul you fag. When did I ever say that math/coding shouldn't be in the dataset? Yeah that's right I didn't. How can you compare ZERO books being in the data to finetuning on skyrim fanfics? Deliberately dishonest argumentation, go fuck yourself
>>107829827
>If you train the model on fiction... It won't be better at fiction!
Okay you're just retarded
>>
>>107827347
Tried Q5, holy jesus this thing is dumb as rocks, 13b tier but has some charm to it + low safetyshit. Feels like using old 2023 and earlier models but in a more usable form, so if you're looking for that feel, give it a try.
If you're going to try it, keep the temp low and use strict official ChatML formatting with nothing except user/assistant. Usually models can figure out custom formatting, often with benefits, but this one shits itself.
>>
>>107829841
>>107829875
Go train your AO3 budget model then. If you gimp out on the parameter count it's going to be shit, much like Nemotron, in spite of being trained on a curated dataset. I don't know what's so hard to understand here. Why the fuck are you even bringing up Nemotron when it's dogshit and you know it is? How does that back up your point in the slightest?
>>
>>107829922
and would a 1T model on the nemotron dataset be good according to you?
>>
>>107829943
Hit delete on that post lil nigga, you're genuinely fucking retarded if you believe there's no difference jumping between say 3B-7B, 12B-32B or let alone 1T on the same dataset
>>
File: 1745797389998589.png (1.12 MB, 1080x720)
i feel like im the only one in the world interested in the goon jar
>>
>>107829957
Someone will be the first one to seal it up and do it, might as well be you.
>>
>>107829957
have they actually demoed it in a video yet?
>>
>>107829957
These don't even look 3D. I don't see the point.
>>
It will be a truly wonderful day when we can buy a hologram jar direct from China, AI-generate a 3D character, and hook it up to an LLM via sillytavern.
>>
>>107830005
i probably wont use it much, just think it could be interesting if its cheap enough
>>
>>107829943
if you train a 32B and 350B coding model each on the same dataset the 32B one is going to be somewhat usable while the 350B one will shit all over its brother... this seems like common sense
>>
File: nikke 2b.gif (657 KB, 637x358)
>>107829957
It's a novelty you'll try once and never use again, maybe worth it if you can find a cheap enough clone on Ali.
>>
>>107830056
The one from Razer won't be, that's for sure
>>
>>107830067
are there any alternatives?
>>
>>107830027
I think VR headsets are more interesting. Give me a local Neuro in VRChat.
>>
>>107830077
>VR
nah give me one in AR. i have a quest 3 and have been waiting for one.
>>
>>107830077
VR is still lame and low-poly tbdesu, a roughly laptop sized anime jar would be much more convenient.
>>
>>107830075
Hardware is easy enough to replicate. Actually, I think I'd immediately buy one if it were Deepseek or Qwen-branded just to keep it on my shelf
>>
>>107829441
The logs are utterly wretched there lule >>107829979
>>
>>107829192
https://docs.mistral.ai/models/mistral-small-creative-25-12
Here's your ultra-advanced creative writing AI dungeon remake under 70B (24B) trained on meticulously curated data bro. We are ALL hopping off 300B moes to use this shit
>>
File: 1731086290564808.webm (3.92 MB, 1080x1080)
>>107830082
AR, VR, whatever, same thing in this context. VRChat can use passthrough. The main benefit is that it's an already existing engine with powerful customization and input capabilities, so you can use it as the renderer for your chatbot's avatar in XR headsets. Remember this webm?

>>107830101
PC skill issue. Frame's included dongle will even make it easy for idiots to set up.
>>
>>107830156
>Remember this webm?
first time seeing it, also brb
>>
>>107830148
I don't think a single anon on /g/ has tried that model. It could be good lol, is it a new model or a glorified finetune?
>>
>>107830148
>Model that's literally not available to publicly download
Nice one bro, really showing off your intellect
>>
File: file.png (10 KB, 486x136)
>>107830240
skill issue just email them
>>
>>107830240
It's on the api you dumb chud, or does your "creative writing" involve raping little girls?
>>
>>107830250
LOCAL MODELS GENERAL NIGGER
>>
>>107830259
u serious bro?
>>
File: rock cds.jpg (39 KB, 600x450)
>>107830250
Of course not. I prefer little girls raping me
>>
>>107830268
yeah
>>
>>107830156
You have very feminine hands.
>>
>>107830297
do not the anon
>>
File: ulfric.png (138 KB, 294x311)
A book-based creative 32b denseGOD model would wipe the floor with your estrogen MoE 500b or whatever the fuck you spent $10K to run at q2
>>
File: 1697989269419795.webm (3.93 MB, 1024x1024)
>VRChat with passthrough
>>
>>107830306
>32b
>denseGOD
only thing a 32b is going to be wiping is my ass
>>
>>107830318
because all the 32b shit we've had in like 2+ years is qwencensorshit
>>
>>107830322
all things considered there was command-r
>>
>>107830306
You need minimum 70B for any decent results, whether MoE or dense.
>>
>>107828797
>I'm a cultured gentleman that does medieval roleplays to 300-1000 messages
I'm a 17th level Evoker that built a stone house on a lake in The Shire. By summer the halflings will have the first of my aeroyachts built.
>>
>>107830347
>t. 3090 hoarder
>>
File: 1751489580594712.png (295 KB, 640x640)
>another /lmg/ class warfare has broken out between cpusissies and nvidiacucks
>>
>>107830310
My zuck, what a long tongue you have.
>>
toss-240 when?
>>
Can embedding models also have refusals when they process shit from goon models, or can I just use anything?
>>
still waiting for the day where i can generate live2d models
>>
>Retards ITT are advocating for 1T+ models trained on synthslop that are barely better than 70B from two years ago in creative writing.
>>
How can I tell if the model is using the vectors? If I ask it directly it has no clue.
>>
harmony format for finetuning just werks, had way less issues finetuning oss-20B on tool calling stuff. Wish it was the standard for everything.
>>
>>107829891
yay someone reused my Miku!
>>
>>107830832
nice headcanon, are we reading the same thread tho
>>
https://rentry.org/miqumaxx
>404
Total Miku Death
>>
File: 1733686181b802828.png (532 KB, 857x691)
>>107830832
> 1t params

We need at least 10T.


