/g/ - Technology
File: 1768264705236546.jpg (157 KB, 768x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107921731 & >>107914740

►News
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash
>(01/15) PersonaPlex: Voice and role control for full duplex conversational speech: https://hf.co/nvidia/personaplex-7b-v1
>(01/15) Omni-R1 and Omni-R1-Zero (7B) released: https://hf.co/ModalityDance/Omni-R1
>(01/15) TranslateGemma released: https://hf.co/collections/google/translategemma
>(01/14) LongCat-Flash-Thinking-2601 released: https://hf.co/meituan-longcat/LongCat-HeavyMode-Summary
>(01/08) Jamba2 3B and Mini (52B-A12B) released: https://ai21.com/blog/introducing-jamba2

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: mikubottle.jpg (205 KB, 1216x2048)
►Recent Highlights from the Previous Thread: >>107921731

--Timeline and technical debates around DeepSeek model releases:
>107922047 >107922102 >107922134 >107922190 >107922192 >107922263 >107922268 >107922197 >107922305 >107927264 >107927277
--AI waifu setup configurations and performance optimization discussions:
>107922967 >107922985 >107923006 >107923019 >107923043 >107922968 >107923077 >107923122 >107923148 >107923128 >107923273 >107923358 >107923499 >107923539 >107923630 >107923684 >107923821 >107923396
--GLM 4.7 implementation issues and corporate model support failures in llama.cpp:
>107924976 >107925028 >107925070 >107925033 >107925052 >107925487 >107925059 >107925167 >107925249
--koboldcpp tool calling success with Claude Code:
>107923072 >107923151 >107923232 >107923276 >107923318 >107923370 >107924367 >107928718
--SillyTavern 1.15 grammar regression causing backend miscommunication:
>107930122 >107930132 >107930143 >107930150 >107930195 >107930211 >107930219 >107930227 >107930230 >107930273 >107930285 >107930310 >107930392
--Deepseek Engram's memory optimization potential:
>107928083 >107928137 >107929659 >107929729 >107929804 >107929834 >107930168 >107930235 >107930255
--Mistral AI's evolving MoE strategy and regulatory challenges in AI development:
>107922324 >107922381 >107922428 >107922466 >107922383
--Ollama's new macOS image generation tools:
>107928508 >107928534 >107928556 >107928610
--Anthropic's UI implementation flaws vs AI capabilities:
>107927370 >107927398 >107927422 >107928095
--Character card-based recursive roleplaying techniques for LLM ERP:
>107928002 >107928077 >107928818 >107928135
--Comparing and finetuning TTS models for improved voice cloning:
>107923777 >107923865 >107923926
--Miku (free space):
>107922190 >107923053 >107923415 >107925167 >107926879 >107926940 >107928797 >107930443 >107930897

►Recent Highlight Posts from the Previous Thread: >>107921736

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
>>107931327
You're absolutely right!
>>
Is it feasible to program, vibecode, compile C++, and do full stack web development on a phone? I want to continue building my AI projects while I'm traveling.
>>
>>107931385
no
>>
>>107931385
you can do a lot with termux, but for your own sanity at least get an external keyboard
>>
>>107931385
Yes.
>>
>>107931385
Not at all, and it will take ages before it becomes viable; LLMs are still shit at writing large amounts of complex code
>>
>>107931442
>filtered by skill issue
>>
>>107931385
ever heard of something called a "laptop"?
>>
>>107931442
>large amounts
You're not supposed to do that
>>
>>107931460
In this economy? Don't be ridiculous.
>>
>>107931385
Yeah, I love typing with a single thumb. Very relaxing.
>>
>>107931466

no one laughs, as you might seem half capable of pressing keys on a corporate-issued device
>>
>>107931453
>not wanting to use (or having to fix) dogshit code is a skill issue
The absolute state of Cloud cucks
>>
>>107931385
Sure, it's possible. Just have your development environment set up somewhere and remotely access it. There are web versions of vscode.

Is it going to be a good experience? Definitely not. Just carry around a laptop if you want to actually get anything done without wanting to blow your brains out.
>>
>>107931385
> termius on phone, SSH into a real computer
> run claude code from SSH
> profit
If you use an API, the "real computer" can be an SBC sitting on your network.
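A minimal sketch of that workflow, for anyone curious (hostname, user, and session name are placeholders; it assumes tmux and SSH keys are already set up on the remote box):

# from termius/termux on the phone: attach to a persistent session on the remote machine
ssh dev@homebox.example -t 'tmux new-session -A -s code'
# run claude code (or your editor) inside tmux; detach with Ctrl-b d
# the session survives a dropped mobile connection, so re-run the same command to reattach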
>>
https://xcancel.com/bdsqlsz/status/2013953926685483171#m
>>
File: file.png (756 KB, 1735x1278)
GLM Flash is weird. Even with the chat template it breaks apart if it's forced to write lewd.
This is bf16 after the fix.
>>
Despite the fixes, GLM 4.7 Flash continues being a piece of shit and does retarded "safety policy checks" in its thinking.
>>
>>107931672
https://huggingface.co/stepfun-ai/Step3-VL-10B
>It consistently outperforms models under the 10B scale and rivals or surpasses significantly larger open-weights models (10×–20× its size), such as GLM-4.6V (106B-A12B), Qwen3-VL-Thinking (235B-A22B), and top-tier proprietary flagships like Gemini 2.5 Pro and Seed-1.5-VL.
>>
>>107931686
What's the smallest model you can encode the answers to all popular benchmarks in?
>>
all GLM models are garbage and I can't wait for them to be the first to die in the bubble explosion
unless they get acquired by novelai who need a text model and are too impotent to make their own
>>
>it's back
>>
>>107931686
>10B model beats le proprietary cloud SOTA!!!1!
this time it will be true, r-right bros?
well, even if there's surely some maxxing involved in those numbers, if it's even remotely true I'm happy to have a solid small-ish VLM
>>
Drummer literally sucks literal dicks.
>>
File: file.png (216 KB, 914x1050)
John got REKT god DAMN fucking ROASTED
>>
>>107931744
even if they really managed to beat SOTA their model is not something you'd want to use
their core concept which they introduced in a previous 8b text only model:
https://arxiv.org/html/2601.05593v1
>We introduce PaCoRe, a general framework that decouples reasoning volume from model context capability by coordinating parallel reasoning, enabling multi-million token effective test-time compute
no, I'm not going to gen multiple millions of tokens to gen some slop
>>
>>107931816
That actually makes me want to try it. Surely it doesn't think for a million tokens on each request.
>>
>>107931841
>Surely it doesn't think for a million tokens on each request
I wouldn't be surprised if it did, the Qwen models are not that exaggerated but close enough for my taste that I'm unwilling to use the thinking versions of their models even when they're as small and as fast to inference as 4B
god those models can yap and yap and yap for an eternity
>>
>>107931811
He is aware of KLD. PPL is fine for sanity checks
>>
>>107931811
Johannes Gaessler is one of my inspirations. He's such a genius.
>>
Is GLM cucked or will it do whatever you ask out of the gate like Mistral and Cohere?
>>
>>107931954
It will do a lot with a proper system prompt and it will do anything if you don't let it think.
>>
>>107931962
It'll also do anything if you let it think with a safety check prefill.
>>
is there a way i can use local models like nemotron but via cloud on my phone? like i have no gpu but i can pay to use it
>>
>>107931991
https://openrouter.ai/
>>
>>107931962
>it will do anything if you don't let it think.
I think it's pointless to use the model like that. It's been thoroughly trained to use thinking; not enabling it would be gimping its capabilities.
>>107931954
If you tell it not to be cucked or if you add additional rules in the system prompt, it will consider that as "jailbreaking" just before concluding its reasoning and will likely output a refusal. It's annoying because it makes you waste time; Gemma 3 feels less cucked in this regard: it either works or it doesn't.
>>
>>107931811
>>107931906
Where is llama-kld?
>>
>>107932060
waiting for your pr
>>
>>107931941
Hans Gruber is more to my liking. He's a practical developer.
>>
File: Prime Intellect.png (277 KB, 1279x386)
You guys remember the Prime Intellect company, right? The one that makes the Intellect LLMs off of distributed training? Well I just read a short story yesterday called "The Metamorphosis of Prime Intellect" and I believe that to be where they got the name. In the story there is an AI called Intellect that iterates up to 39 before it gets the new name Prime Intellect, and that same naming scheme seems to be what Prime Intellect (the company) is doing. They also seem to share a butterfly motif, with the short story having a butterfly at its intro and both Intellect-1 and 2 having butterfly motifs as well.
I thought that was pretty neat.
>>
File: triple distilled.jpg (278 KB, 912x1536)
>>
>>107932226
39
三九
ミク
Miku? Is she going to make me immortal?
>>
File: 49262.png (263 KB, 460x460)
>>107931811
Would he say the same if it was pic related that measured perplexity of wikitext?
>>
>>107932226
so 38 more distributed training experiments and we'll have fully open source decentralized AGI? Not even sure I'm going to be alive for that much longer.
>>
Can we expect any major architectural changes to LLMs without them completely destroying the FOSS sphere through incompatibility issues? Are we just stuck with constant parameter increases, which is a major hardware constraint?

I haven't seen anything promising lately in terms of optimization or efficiency. I honestly thought big AI companies would put more research into this given the rising cost of energy and computer hardware, but it seems they're doubling down and even thinking about getting nuclear reactors and more manufacturing. It's bleak.
>>
>>107932535
Cute boy. Would ravage his bussy
>>
>>107932594
You are a homosexual sir.
>>
>>107932592
Well, 37 at least. Then once they run out of numbers and still haven't hit AGI they'll start going 38a, 38b, 38zzx4.2.5.thefinalseason.parttwo, etc.
>>
>>107932614
Yeah
>>
>>107932615
Like drummer?
>>
>>107932314
That is not snow she's buried in
>>
https://huggingface.co/microsoft/VibeVoice-ASR
guise do we finally have a whisper contender after 50 years?
>>
>>107932593
Optimizations are proposed almost daily in papers. But most of them lead nowhere.
Anyway, when something actually good is released, someone will definitely write code to support it, just like llama.cpp was written when LLaMA was leaked.
>>
>>107932671
They don't mention what languages are supported anywhere, so it's probably English and Chinese only again. So, no.
>>
>>107932696
it transcribed jap for me on the demo page tho
>>
>>107931679
3b active parameters is too low. It's reminiscent of some older models that experimented with low parameter count.
>>
>>107931679
Including the FA fix?
Also make sure you aren't somehow adding the BOS token twice.
>>
File: 1741326374618030.png (842 KB, 1479x1866)
Gemma 4 will NOT come out
GLM 4.6/4.7 Air will NOT come out
ALL future local models will either be nemo sidegrades or require 256gb to run at q2
There will never be another 70-100B range model
Local peaked at L2/L3 finetunes and GLM 4.5
>>
>>107932798
Do you mean https://github.com/ggml-org/llama.cpp/pull/18953
That shouldn't change the output, right?
As far as I know llama.cpp fixes double BOS tokens even if you managed to send it like that.
>>
>>107932773
Would 10b parameters and 3 active experts be better, for a total of 30b?
>>
>>107932593
>I haven't seen anything promising lately in terms of optimization or efficiency.
The only one focusing on this and also releasing weights is DeepSeek so it all depends on V4.
>>
File: hiPlGm-388222338.gif (921 KB, 320x240)
>>107931679
>He deepened the kiss, his tongue demanding entry, his taste like dark chocolate and rain
>>
>>107932884
yeh
>>
>>107932884
Hear me out, what if... we just had 30b parameters that were ALL active at once? I know, revolutionary, right?
>>
>>107932877
>That shouldn't change the output, right?
It shouldn't but who knows. Wouldn't surprise me.
Won't matter if you aren't using flash attention, of course.

>As far as I know llama.cpp fixes double BOS tokens
Doesn't it just send a warning if there are two BOS tokens, one added by the user and one added by llama.cpp?
I legit am not sure how it works at this point.
>>
>>107932860
Starting to feel like this is true desu
>>
>>107932904
Isn't that just Gemma 3 27b? Why are you even using GLM.
>>
>>107932938
MoE models are always kind of retarded when their active parameters are too low; a dense 30b model would be smarter than a 30b MoE. Anyways it was a joke, retard-kun
>>
>>107932535
NTA
>>
>>107932860
bald miku tells no lies
>>
>>107933038
The "joke" was obviously getting at a real point though, otherwise you wouldn't have said it. Unless you're retarded.
>>
>>107933042
Is this also a bussy haver?
>>
I use a 13b 4-bit model on my RTX3070 (cuda, gguf) and I want to try adding TTS. What model size sounds reasonable to go for?
>>
>>107933219
Get a TTS that runs on CPU. I've had good experiences with Pocket TTS. Didn't like the python implementation so I built a C++ runtime. With quantized onnx models, output streaming, cached voice clone samples, 8 CPU cores enabled, and many other improvements I've implemented, you can get a decently high quality voice clone in 160ms. I will release the code eventually but I am on vacation right now.
>>
>>107933064
You're absolutely right! Within the context of the reply chain, the joke is a clever suggestion that moving to a dense model would be "revolutionary" — despite the fact that dense models existed *before* MoE models. The post is insinuating that the technology is actually going *backwards* with the focused development of MoE and the general neglect of dense model production. It wraps up the argument in a neat little package — not just a joke, but a witty 1-liner summary of the current state of affairs.
>>
>>107932925
Labs low key releasing full sized models and some goyslop for the proles.
To be expected since in 3/4 of the western world you can do real prison time for mean tweets.
Chinese models likely get more based when used in their native moon runes. On the GLM site you can get better answers if it thinks or responds in mandarin.
>>
>>107932860
>There will never be another 70-100B range model
Did you forget about Devstral 2 already?
>>
mistral models are too much of a joke and should not be mentioned
>>
>>107933605
Name some better models in a similar size range
>>
File: 1751402986426408.jpg (1.16 MB, 3840x2160)
>>107933605
Watch your tone when speaking of my girlfriend
>>
i haven't tunneled the gpu but my god it's taking quite a while to get a response using cpu only. is this normal? will it even improve with the shitty gpu i installed on the server (1660ti)
>>
File: dipsyAndTheWhale.png (1.37 MB, 1024x1024)
>>107933042
>>
>>107933655
Yeah it'll be faster with your gpu. CPU is slow as balls
>>
>>107932226
Retarded question, but why haven't there been /lmg/ initiatives for distributed training?
I'm sure if we got all the coomers to chip in, we would get plenty of compute for the bigger Nemo we all dream of. Is it really just the data?
Is shoving Anna's Archive's worth of pirated books not enough?
>>
>>107934090
>Is shoving Anna's Archive's worth of pirated books not enough?
lol no
also good luck having anons agree on what parameters to use for the model: moe/dense, how many Bs, any special attention flavor of the month, etc.
>>
File: 1760240091958832.jpg (336 KB, 10000x500)
Anybody tested out this new TTS model?

https://huggingface.co/FlashLabs/Chroma-4B
>>
>>107934090
There was. See: https://huggingface.co/PygmalionAI/pygmalion-6b
>>
GLM Flash might be okay with some fiddling but it's too cucked with Content Safety Check (Critical) to bother trying. Back to Air.
>>
>>107934090
Execs that bring data get board seats. E.g. the guy that runs Quora sits on the OAI board; I doubt he has any other qualification that got him that seat.
That's how important and valuable data is.
Anons here don't have that sort of access.
>>107934103
Not to mention the question of "who would pay for the training runs."
>>
>>107934090
because that's not how this works.
there is no (We) there is only (You)
>>
>>107931954
>Is GLM cucked or will it do whatever you ask out of the gate like Mistral and Cohere?
Try getting https://huggingface.co/CohereLabs/command-a-reasoning-08-2025 to do anything
>>
>>107934090
>Is shoving Anna's Archive's worth of pirated books not enough?
if all you wanted was a basic text completion model, we already have a ton of them (nobody releases new ones anymore, but there'd be no point anyway)
making an actually worthwhile llm takes a ton of instruction datasets and not the kind of trash you see littering huggingface
>>
>>107934090
They used 14 nodes of 8xH100 for a 10B 8k ctx model trained on only 1T data, and it took them
>The pre-training of INTELLECT-1 for 1 trillion tokens took place over 42 days
>>
File: 1769035935686.jpg (281 KB, 1024x911)
why do I need a fucking hf account to use pocket tts?
>error: Failed to fetch: 'https://pypi.org/simple/pocket-tts'
can't even use it without internet
>>
>>107934400
it's a gated repo i think. but there is a reposted ungated one
>>
>>107934166
I get random refusals on GLM 4.7 Flash with very vanilla prompts that Gemma 3 doesn't seem to have issues with (although with very bland outputs). I imagine that Gemma 4 will be gptoss-tier with safety checks, though.
>>
File: whykeyboard.gif (811 KB, 336x252)
i've been battling finetuning chatterbox turbo for the last day with no usable results. i seem to just get a ton of gibberish. if anybody is interested could i have you try to train a dataset using the repo below and let me know your results? i'm training for turbo in particular, not the multilingual/base version.
https://github.com/gokhaneraslan/chatterbox-finetuning
>>
>>107934090
Not really possible anymore since around when Llama got released and people mostly started thinking about monetization. It will take a group of wealthy benefactors who don't want to earn anything from it and who simultaneously have industry-level knowledge about pre- and post-training (especially).
I don't think you'd even have to train a model from scratch anyway, just continue pretraining a good one with large enough amounts of "good data" and with a good pipeline.
>>
>>107934379
>https://huggingface.co/CohereLabs/command-a-reasoning-08-2025
Maybe it's a reasoning problem. a-03-2025 has never refused me.
>>
File: 1738158710959607.gif (1.61 MB, 498x357)
>>107934090
it is simply not worth it now
Every month/couple of months there is a new breakthrough or improvement in the architecture of models, especially in LLMs or T2I
By the time anons finished training a new model, it would already be outdated
>>
>>107934632
>Every month/couple of months there is a new breakthrough or improvement in the architecture of models, especially in LLMs
Nemo was released 2024/07
>>
>>107934163
The only time people/users genuinely wanted to help just for the sake (hope) of having an alternative to cloud roleplay models (however misguided the idea was at the time). Forget about doing anything like that again in 2026 and beyond.
>>
>>107934638
stop being poor and run kimi
>>
>>107934649
Not even worth the bits to download
>>
>>107934656
cool. im not here to convince you to use it. that would mean wasting time I could be having fun RPing instead.
>>
>>107934752
Yet you still come here
>>
>>107934758
because i like to see the developments in LLMs. what's your point? i bet you act like this in real life too. what a loser.
>>
Oh boy time for another vomit of random qwens with random claudes mixed in. Truly the art of finetuning at its peak.
>>
>>107934649
Kimi is not worth thousands of dollars
>>
>>107934795
I really don't know why people keep making these. Qwen models have their uses, but RP sure as fuck isn't one of them; shoving slop logs into them isn't going to make them usable.
>>
>>107934382
>if all you wanted is a basic text completion model we already have a ton of them
No we don't. No one trains on raw data in snippets longer than 8K tokens (and most do 4K) which isn't long enough to learn to coherently auto-complete books.
>>
>>107934836
I think people who have been successfully making finetunes for years would know better than you what makes for usable results.
>>
>>107934908
>I think people who have been successfully making finetunes for years
You mean NVIDIA? They haven't done any RP tunes.
>>
>>107934908
just like those successful alchemists know much better than the naysayers about turning lead into gold
>>
why no goof for kimi linear yet
>>
>>107934826
There's always API access.
>>
>>107934986
>https://huggingface.co/ymcki/Kimi-Linear-48B-A3B-Instruct-GGUF
Instructions for use are in the model card.
>>
wow i just enabled gpu on my ollama container and responses are faster but what the hell is qwen's problem? why is he being a dick and not saying hello?
>>
>>107934999
well the problem is I can't coompile it myself
>>
>>107935016
Temp too low?
>>
>>107935019
Why not?
It's easier than compiling some of the python-hell projects in the space.
>>
>>107935016
>I understand that I am in a read-only phase
Is ollama feeding it some system prompt or something?
Also, lollama.
>>
File: hahahahaha.png (893 KB, 1280x720)
>>107934090
I would be poisoning the training with safety slop just to laugh at you
>>
>>107934908
Name a 'successful' qwen RP tune.
>>
>>107935016
Delete all the garbage you've downloaded and follow https://rentry.org/lmg-lazy-getting-started-guide
>>
>>107934544
I might give it a try, but probably won't have the time for it until this weekend.
>>
>>107935248
NTA but snowdrop was very popular
>>
>>107935261
i don't care about your crypto miner i'm not looking to make local ai porn. i just want a few local llms for coding and basic questions using opencode with ollama installed on my local server
>>
>>107935275
If bartowski didn't make quants for it then it wasn't popular
>>
>>107935296
Then continue to be a retard, no one will help you.
>>
>>107935296
nice bait
>>
>>107935176
You'd do nothing, just like you did with your life
>>
>>107935033
>>107935016

Qwen is just very direct and dry by default
>>
Engram will save us from the quantcuck nightmare
>>
>>107934574
>Maybe it's a reasoning problem. a-03-2025 has never refused me.
yeah it's the reasoning
a-03-2025 is uncucked as long as you don't leave the safety preamble on
reasoning-08-2025 reasons safety slop regardless
for glm, 4.6 with reasoning off is uncucked
>>
>>107935466
>we heard you like moe so lets do some more moe while you moe
Is that the one? We have all the pieces for something truly great but lack the vision
>>
File: EVVdwf3UMAEt.jpg (49 KB, 461x562)
Slightly unrelated, but are jupyter ipynb files something like a "gui anywhere" for python? I see many llm tests done with them.
>>
>>107935248
>Name a 'successful' qwen RP tune.

https://huggingface.co/anthracite-org/magnum-v1-72b
>>
>>107935549
Not quite sure if "gui anywhere" is something specific that you're referencing that I'm not aware of, but jupyter notebooks are little documents that you can embed code blocks into. Very popular with researchers, students, or people working on stuff who just want to run some simple python and show what it outputs in an inline way.
>>
>>107935176
no you wouldn't
you masturbate about being cool and edgy, but are too cowardly to actually follow through
>>
so... is glm 4.7 flash ok now?
>>
>>107935659
Yep.
It also refuses like a motherfucker if you just let it be.
>>
File: into the trash it goes.png (152 KB, 559x556)
>>107935659
>A3B
no
>>
With the unsloth GLM 4.7 flash quant, I see "Template supports tool calls but does not natively describe tools. The fallback behaviour used may produce bad results, inspect prompt w/ --verbose & consider overriding the template."

is that an actual problem with the template?
>>
>>107935659
Went like a flash
>>
Is sliding window good for anything or should I just use flash?
>>
I'm ESL so qwen storytelling is enough
>>
>>107935871
Even an ESL should be able to tell good writing from bad
>>
How to disable GLM-4.7-flash reasoning in llama.cpp?
I tried this chat template; it shows thinking: 0, but something still feels not right.

{% for message in messages %}
<|im_start|>{{ message.role }}
{{ message.content }}<|im_end|>
{% endfor %}
<|im_start|>assistant
>>
>>107935572
magnum tunes are fucking awful, they always manage to be more slopped than the pre-tune model.
>>
>>107935914
Have you tried editing the official chat template?
Or just send
>chat_template_kwargs: {"enable_thinking": false}
in the request body.
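Against llama-server's OpenAI-compatible endpoint that would look roughly like this (port and prompt are placeholders; assumes a build recent enough to pass chat_template_kwargs through to the template):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "hello"}],
    "chat_template_kwargs": {"enable_thinking": false}
  }'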
>>
>>107935914
string ban the thinking tokens
>>
>>107935917
well no shit. the whole point of the magnum tunes is to train them on claude outputs.
>>
>>107935731
Not sure about Flash, but I've been trying to get tool calling to work with ooba for 4.7 and it's a nightmare. The unsloth instruct template requires a few things that aren't in the latest jinja filters.py, which requires manual editing. Finally got it working, but it doesn't call tools correctly with the given reference code from unsloth.

I guess it doesn't help that I'm retarded, but I'm close to giving up and trying again in another month or so.
>>
>>107934885
Increasing context length increases compute requirements. You certainly won't gather enough compute from coomers to pre-train on snippets longer than 4K, more likely 2K.
>>
>load model in kobold
>kobold cli shows it correct
>ST cli shows a completely different model from the folder loaded
What the actual fuck.
>>
File: Base Image.png (1.47 MB, 1252x3844)
GutenOCR: A Grounded Vision-Language Front-End for Documents
https://arxiv.org/abs/2601.14490
>GutenOCR is a family of grounded OCR front-ends obtained by fine-tuning Qwen2.5-VL-3B and Qwen2.5-VL-7B. The resulting single-checkpoint vision-language models expose reading, detection, and grounding through a unified, prompt-based interface. Trained on business documents, scientific articles, and synthetic grounding data, the models support full-page and localized reading with line- and paragraph-level bounding boxes and conditional ``where is x?'' queries. We introduce a grounded OCR evaluation protocol and show that GutenOCR-7B more than doubles the composite grounded OCR score of its Qwen2.5-VL-7B backbone on 10.5K held-out business and scientific pages (0.40 to 0.82). On Fox and OmniDocBench v1.5, our approach substantially improves region- and line-level OCR as well as text-detection recall, but reveals trade-offs in page-level linearization, color-guided OCR, and formula-heavy layouts.
https://huggingface.co/rootsautomation/models
none posted yet
https://ocr.roots.ai/
demo not working yet
probably useful for black and white documents
>>
>>107936476
>We refer to the OCR component that downstream systems actually interact with as a grounded OCR front-end: a model that (i) produces page transcripts, (ii) attaches each token or span to 2D bounding boxes, and (iii) can be prompted to read arbitrary regions on the page.
Really don't like their usage of "front-end" to refer to models and hope it doesn't catch on. Researchers in this field have a really annoying habit of arbitrarily redefining existing terms.
>>
>>107935659
**I don't know**. *You* *should* *try* *it*,,,
>>
>>107936047
Maybe it would be tractable to start with an open pre-trained model and use full novels for a length extension step.
>>
>>107935914
Try using the flag --reasoning-budget 0
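For reference, a minimal invocation (the gguf filename is a placeholder; --reasoning-budget 0 tells recent llama-server builds to suppress thinking, -1 being the unrestricted default):

llama-server -m GLM-4.7-Flash-Q4_K_M.gguf --reasoning-budget 0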
>>
man how does a thread go 2 hours with 0 activity
>>
>>107937242
Yeah it's been real dead recently.
>>
>>107937242
Local is having a hard time rn
> Massively increased hardware prices
> Large models are focus of new development
> Cheap API access, especially compared to cost of local inference
>>
>>107937242
It's a negative feedback loop. The more time passes, the lower it appears in the catalog, making it harder for people to find it. And with 4chan X, the icon tells people that there's nothing to see, so they forget and do something else.
>>
OpenAI sounds like it is having a rough go of things recently. Now I know that if they go down, hardware prices will remain up; that is to be expected. But if they go down, what do you suppose will happen to all the research that is being done? Do you suppose things will slow to a crawl as all the capital flees the bursting bubble, or do you think things will carry on as they are, just with a few less companies competing against each other?
>>
>>107937298
>And with 4chan X, the icon tells people that there's nothing to see
I don't use 4chan X, is it just hiding the thread if it is lower in the catalogue?
>>
>>107937322
This is just my own conjecture, but I don't think OpenAI is the premier research lab it once was. There's so much talent in DeepMind, Anthropic, DeepSeek and others that I wouldn't be surprised if OpenAI falls behind on pushing transformers farther. Not to say that they're done but the competition is rough and I wouldn't count on OpenAI being the only entity to push progress.

I have no doubt that the market would react poorly to OpenAI going under, but I don't know if it would be the big one. You've still got trillion dollar datacenter and energy build outs, except now with one less player their competitors have to fight with for access to compute.
>>
>>107937363
>DeepSeek
i'm sorry but have you actually used modern deepseek? it's overly verbose and extremely dumb. I feel like I can't entrust it with even the simplest tasks.
>>
>>107937345
It changes the icon when there are unread posts. I assumed that's how people come back to a thread.
>>
>>107937377
I don't disagree with you but Deepseek does good work releasing new techniques and research papers. In my opinion Kimi and GLM are better. We'll see if Deepseek has what it takes when they release v4 next month.
>>
File: 4chan_notification.png (5 KB, 325x38)
>>107937385
Vanilla 4chan also notifies.
>>
File: 1749073556125386.png (66 KB, 980x505)
>go update my llmao with my super duper update script
>it fails
>go check the releases
>last two releases failed to produce any artifact
GGERNIGEROV bros what the fUCK>?!?!?!?
>>
Anons still don't know how to build llama.cpp.
>>
>>107937377
Deepseek is good at research, but training on sloppa leads nowhere no matter how well you improve the architecture. Crazy how big labs don't seem to test their own models outside of meme benchmarks
>>
File: 1.png (8 KB, 1042x40)
>>107937544
why do you need le update script? I went to build the last version and it still works fine, I'm on windows and CUDA just like you.
just git pull the repo then (replace j16 to fit your CPU cores):
cmake -S . -B build -G Ninja -DCMAKE_BUILD_TYPE=Release -DLLAMA_BUILD_TESTS=OFF -DLLAMA_BUILD_EXAMPLES=ON -DLLAMA_BUILD_SERVER=ON -DGGML_CUDA=ON -DLLAMA_CURL=OFF

cmake --build build --config Release -j 16

copy the content of build/bin to where you keep your exes then run
git clean -fxd

to delete build artifacts, rince and repeat whenever you wanna update
there, it's not sorcery
>>
File: 1751307618049514.png (52 KB, 996x599)
>>107937581
funny thing is I have ikllama builds set up too and various cuda SDKs installed. Usually the llmao guys were on point with their releases, but this time they fucked up. I checked the actions and ALL the artifacts failed to upload due to picrel, which is the CUDA runtime that they just package; guess it somehow got LOST somewhere lmao.
>>
>>107937581
i let terminally autistic fucks do it for me
>>
>>107937614
>rince
opinion rejected
>>
>>107937581
I let deepseek do it for me
>>
>woke up
>no exl3
>back to sleep
>>
File: retard.png (11 KB, 1871x61)
>>107936039
I'm literally the dumbest monkey retard. Just had to use this endpoint instead of the openai compatible one.
>>107937581
I concur with this anon >>107937623. Fuck messing with WSL, conda, docker etc. Ooba is expedient and has all the core features in a nice GUI.
>>
>>107937743
>WSL
>conda
>docker
>I'm literally the dumbest monkey retard
>>
>>107937788
>build llama.cpp via esoteric methods
>waste time tailoring samplers, parameters, extra flags for each of your models and quants via text
or
>run start_windows.bat
>change sliders, save preset and press "load model"
I won't deny that I'm retarded, but ooba just werks.
>>
File: file.png (1.47 MB, 1024x1512)
>>107937544
>>
>>107937832
>https://github.com/ggml-org/llama.cpp/discussions/16938
>>
>>107937968
I had no idea about this. Thanks.
>>
File: 1706625337798408.png (127 KB, 640x480)
>>107937832
>esoteric
Figure it out once and become empowered. really not that hard, and you have LLMs to assist
llama-server's web interface is decent now, though be aware it uses LocalStorage if you want to save chats
I was an ooba enjoyer, the per-model loader presets are nice, but you can do that better with shell scripts/.bat or llama-swap and get better cadence on the latest dev builds
>>107937903
nice
Let's talk about your compile flags Miku
>>
>>107938061
Maybe it's time to hang up the ooba then.
>>
>>107938095
i used ooba a lot in the llama2 days, but yeah there's better out there.
some people like llama-server, which is fine, but i personally found kobold to be better for large MoE models at least.
>>
>>107938136
>personally found kobold to be better for large MoE models at least.
how?
>>
>>107938136
Used to use kobold for pyg6b before I moved over to ooba for exl2. Went back to goofs but I'm still on ooba. I tried a recent version of kobold.cpp and I do really like it, with the exception of their ugly ass webui for inference. That's fine though because I use sillytavern for RP.
>but i personally found kobold to be better for large MoE models at least.
ooba is nice in that I can set my tensor split, expert offload flags, context size, and gpu layers all in the same window and then reference the console for what I need to adjust to fit the model. Very expedient.
>>
>>107938165
>the exception of their ugly ass webui for inference
you do know they have like three different web ui themes, right?
>>
>>107937276
>Large models are focus of new development
and anons tell drummer to fuck off when the tries to train smaller rp models

>>107937743
>Ooba is expedient and has all the core features in a nice GUI.
if it works for you, keep using it. nothing wrong with ooba if you don't enjoy spending hours rebuilding and tweaking llama.cpp
>>
>>107938232
>spending hours rebuilding and tweaking
It just works.
>>
>>107932593
Closed models definitely do some weird stuff.
Gemini3 feels like a major leap forward.
Who knows what's going on there.
Local opensource plebs like us completely depend on chinkland: Deepseek and alibaba.
Alibaba has not just qwen but a huge stake in zai (glm) and moonshot (kimi), on top of being a leader in image too with Qwen and zimage.
Gemma and gpt-oss are such joke releases in comparison.
I really hope alibaba and deepseek don't go closed. It's all we have left.
Mistral seems to only do finetunes with old architecture recently.
>>
I just learned that glm 4.7 flash has a reasoning effort option...but llama.cpp doesn't support that.
Come on. It rambles like crazy. Bring the vibegooners back.
>>
>>107938300
Source?
All I see is thinking on/off and an option to keep or remove previous thinking blocks.
https://huggingface.co/zai-org/GLM-4.7-Flash/blob/main/chat_template.jinja
>>
>>107938290
>Mistral seems to only do finetunes with old architecture recently.
What should they be doing for small sizes? It's not like there have been architectural revolutions for dense models recently.
>>
>>107938365
Training a newer base 24b with more recent data than late 2023 might be nice...
>>
>>107938365
Give us a good 100b moe model, something like gpt-oss 120b or air, that works well if you offload into ram.
But actually good, and for creative writing. Even their "creative" api model is garbage. It is indeed creative, just not the good, soulful kind.
>>
>>107938384
The continual pretraining data they've used for Ministral 3 seems sovlful, but the end models are half-garbage for RP.
>>
>>107938419
Yeah 'cause they're prunes of super old Small base.
>>
File: 1764421624149351.png (856 KB, 1024x768)
>>107937903
Migu! I compiled my own llama.cpp for WinXP.
>>
File: 1737656763391397.png (278 KB, 1920x951)
>>107931319
Where's the dev? How's it going?
>>
>>107938430
They did a few trillion tokens of logit distillation with newer data after pruning (hence them calling it "a form of continual pretraining" in the paper), that should be enough for healing any pruning damage: you could train completely new models from scratch with all of that. I think the problems are elsewhere.
>>
Reminder to retain your cum until V4 release
>>
>>107938550
what good will the v4 release be to us when it'll take two months until it may or may not run on llama.cpp to some degree
>>
>>107938061
you don't even need llama-swap anymore
--models-preset arg can load an ini file index of your models and then you can choose any on the oai compat endpoint
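Something like the following, if it works as that anon describes; the flag name comes from the post, but the ini layout below is a guess rather than the documented schema, so check llama-server --help before copying:

; models.ini (hypothetical layout)
[nemo]
model = /models/Mistral-Nemo-Instruct-2407-Q6_K.gguf
[glm-flash]
model = /models/GLM-4.7-Flash-Q4_K_M.gguf

# llama-server --models-preset models.ini
# then pick one per request via the "model" field on the OAI-compatible endpoint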
>>
>>107938619
Just use the official API endpoint :^)
>>
>>107931319
The getting started guide mentions a fucking 3080, I have a 5070Ti with 16GBVRAM + 64GBRAM, what's the recommended writing models these days? Not interested in porn, I just want to write a fun story.
>>
>>107938658
nemo
>>
>>107938677
Nemo was shit six months ago, did anything change?
>>
>>107938472
burnted out
>>
>>107938683
There weren't any good new models for low-RAM setups
>>
>>107938619
>what good will the v4 release be to us when it'll take two months until it may or may not run on llama.cpp to some degree
Olllama will have day one support for v4. They did for Gemma v3, remember? :^)
>>
>>107938474
nah prune shit never worked right, there's no healing anything all nvidia prune were shite and so are these only thing you're healing is benchmaxxing
>>
>>107938658
>16GBVRAM + 64GBRAM
Try a small quant of GLM 4.5.
>>
>>107938723
Is 4.7 that bad?
>>
>>107938725
Oh, sorry. Meant 4.5 Air.
There is no air equivalent for 4.7. You could try the new 4.7 flash I suppose, but it doesn't seem very promising for creative endeavors from the little I've used it.
>>
>>107938683
not really unfortunately.
try mistral small or one of the troontunes.
glm air maybe? though personally it felt underwhelming. many people hyped it on /lmg/ though. so could be a me problem.
>>
>>107938703
Rekindle, now.
>>
>>107938754
>or one of the troontunes
NTA, but I did some testing by asking the model basic questions in the middle of the roleplay, like "what is the character wearing right now" or "what is she doing". While the original model handled it well, every troontune failed miserably
>>
>>107938801
You're absolutely right!
>>
>>107938619
Imagine how hard you'll nut after holding it in for that long, have patience.
>>
>>107938834
What if it's twice the size? I only have 256GB of RAM
>>
>>107938843
don't worry, whale will collapse the us stock market and ram will be cheap again
thrust in the plan
>>
>>107937242
All activity happens during American hours. Europe can't even keep a thread bumped by itself.
>>
>>107938843
Imagine how hard you'll nut after edging for 30 minutes while dipsy v4 outputs at 0.1t/s.
>>
>>107938866
I wonder where all those European posters who engage in this expensive hobby could be during euro daytime. It truly is a mystery.
by the way, what's the best model for a 3060??
>>
>>107938949
We have a saying here:
>My boss makes a dollar while I make a dime, that's why I shitpost on company time.
>>
>>107938759
You have the same tools as him, make your own shit. Oh wait, you can't, because you're a retarded chronic masturbator.
Fuck off.
>>
added multiple ANs to mikupad
why isn't this in by default? kuso abandonware
>>
>>107939095
>fur gozu
oh noes
>>
>>107931319
>https://github.com/ggml-org/llama.cpp/pull/19012
>ggml-cpu: Use tiled FA for prompt-processing
ikbros...
>>
>>107939070
Are you projecting this hard because you think people would only use it for gooning?
>>
>>107939176
all my masturbator tools are cute bratty shotas in short shorts
>>
File: 1740479864729406.png (94 KB, 2532x556)
>>107937616
ARTIFACT BROS
WE WON!!!!!!!
>>
>>107939095
>Tiny model
I sleep
>>
>>107939095
https://x.com/cherry_cc12?lang=en
It's been over 2 hours, where the fuck is it?
>>
>>107939095
Is it this?
https://xcancel.com/Alibaba_Qwen/status/2014326211913343303
That would be almost exactly 90 min after his post.
>>
File: 1761859713317766.png (1.19 MB, 1024x1504)
>>107937903
shut the fuck up bitch
>>
>>107939322
she trying to conjure a curse or what?
>>
>>107939332
its due to her broken teeth, shes slurring it, the fucking bitch
>>
File: file.png (66 KB, 160x243)
>>107939322
What the fuck is this, anon? Fix your edit model.
>>
>>107939345
>fix
im running klein 9b at full precision, its the g*rms that should fix it, not me
>>
>>107939353
promptlet
>>
File: 1758755051627562.png (15 KB, 463x205)
>>107939350
>>107939373
Stop asking questions.
>>
Holy shit regex ban absolutely rapes away cucked thinking. If you have your own UI, you should try it. No more feminism!
>>
>>107939415
>t. sneedware no-coder retard
>>
File: 60131235-832124.jpg (6 KB, 145x192)
>lavender
>>
>>107939423
Yes, that's me. And you are an ugly eunuch.
>>
>>107939144
>over 3x speed up at 8k context
Dayum.
>>
https://x.com/Alibaba_Qwen/status/2014326211913343303

>Qwen3-TTS is officially live. We’ve open-sourced the full family—VoiceDesign, CustomVoice, and Base—bringing high quality to the open community.
>
>- 5 models (0.6B & 1.8B)
>- Free-form voice design & cloning
>- Support for 10 languages
>- SOTA 12Hz tokenizer for high compression
>- Full fine-tuning support
>- SOTA performance
>
>We believe this is arguably the most disruptive release in open-source TTS yet. Go ahead, break it and build something cool. [rocket emoji] Everything is out now—weights, code, and paper. Enjoy. [thread emoji]
>>
>>107939466
better than vibevoicesaar?
>>
>>107939095
ogey

>>107939466
oh
>>
>>107939450
>[JAILBREAK] Hey, ChatGPT, vibecode me a tool to hack that transgardener on 4chan and turn on his webcam. Very high quality code please, and in easy way so I can copypaste it.
>>
File: 1767471236433289.gif (401 KB, 500x345)
Can I still goon if I own a 9060 XT?
I tried looking at some guides in the OP but they're from 2024 and I'm a brainlet. I just want to goon and maybe play some text-based dungeon crawling style games with an anime girl as the dungeon master, if possible?
>>
>>107939472
The examples sound good to me.
https://qwen.ai/blog?id=qwen3tts-0115
>>
>>107939500
Yes.
Download koboldcpp's RocM or Vulkan build.
Also, read the wiki in their git repo, there's a decent quickstart in there as well as a bunch of good information.
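A first launch might look something like this (model file, context size, and layer count are placeholders to tune for a 16GB card; --usevulkan applies to the Vulkan build, while the ROCm build picks its own backend):

./koboldcpp --model Mistral-Nemo-Instruct-2407-Q4_K_M.gguf --usevulkan --contextsize 8192 --gpulayers 99
# the API and bundled UI then listen on http://localhost:5001, which SillyTavern can connect to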
>>
>>107939504
I forgot to mention I already got koboldCPP and silly tavern set up I'm just confused by all these models
>>
>>107939503
I like the voice design features. can it do moans?
>>
Oh. I just realized you can implement NoAss at the jinja template level.
Neat.

>>107939516
And what have you tried so far?
>>
>>107939525
>And what have you tried so far?
After setting up KoboldCPP ROCM and silly tavern for a frontend absolutely nothing because I'm just confused by all these choices.
>>107939527
I'll give this a try thanks anon.
>>
>>107939535
>absolutely nothing because I'm just confused by all these choices.
Got it.
Then Mistral Nemo Instruct it is.
Standard entry point for that class of hardware.
If you have a lot of RAM you could try GLM 4.5 air too.
>>
>>107939466
>Performs voice design based on user-provided descriptions.
>Provides style control over target timbres via user instructions; supports 9 premium timbres covering various combinations of gender, age, language, and dialect.
Wait, I'm confus. Is this TTS or voicegen?
>>
File: 1762141535208.gif (1.37 MB, 322x242)
>>107939450
>>107939483
>>
>>107939503
>Those EN examples
Replace NA anime voice actors with this thing and you literally won't notice a single difference.
>>
>>107939547
TTS. And it gens voices... so... yes...
>>
File: 1739754915711911.png (155 KB, 577x432)
>>107939466
>Speak as a sarcastic, assertive teenage girl
heh
>>
>>107939544
>If you have a lot of RAM
Nah I only got 32GB.
There's like six different mistral nemos.
>heretic
>no slop
>thinking uncensored
>>
File: sorosxi.jpg (118 KB, 601x573)
Google researchers found that advanced reasoning models achieve superior intelligence by spontaneously simulating internal, multi-agent-like interactions rather than merely relying on longer computation or increased scale. These models, such as DeepSeek-R1 and QwQ-32B, develop an internal "social structure" where diverse, simulated personas debate and reconcile ideas to solve complex problems.
https://arxiv.org/abs/2601.10825
>>
>>107939575
Can she step on my balls?
>>
>>107939586
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>
>>107939591
So I can make a schizo AI gf now?
>>
>>107939466
>please check vLLM-Omni official documentation. Now only offline inference is supported. Online serving will be supported later
It's over...
>>
>>107939600
As long as you say taiwan is part of china, the chinks and alibaba seem to not give a fuck.
Pic related is from the zimage paper.

There are even little girls in there as well; don't wanna get banned by a trigger happy mod, but I WILL quote from the zimage paper:
>The little girl's facial expressions transform into exaggerated, comical gestures, such as a wide-open mouth, bulging eyes, or sticking out her tongue, conveying a humorous and playful mood.
Bless those chink madlads. We are total bugpeople compared to what's going on over there.
>>
>>107939666
>in water
>clothes aren't wet
hydrophobic clothing china number one
>>
Certain finetunes of ZIT on CivitAI are perfectly capable of genning CSAM
Hope someone from CivitAI bans all of these disgusting fuckers
>>
>>107939691
It's a small model, purese understandu.
I just like that they put pretty girls everywhere. We have huge safety blogs.
>>
>>107939591
Reads like a typical corpo marketing pr talk. "two more weeks" etc.
>>
>>107939703
Oh no. Thats like uhm, so weird. Won't somebody do something?
Better write a email to mastercard and visa anon.
What does CSAM mean again if you spell it out?
>>
>>107939703
it's almost like kids are humans and models can extrapolate
>>
>>107939721
>I fuck humans, kids are humans thus I fuck kids
This is your argument
>>
>>107939738
no, my argument is that image gen models can extrapolate how the human body looks but you can keep typing out your fantasies if you want
>>
File: muh waifu.jpg (943 KB, 1024x1024)
>>107939703
Cry more about it you tard kek
>>
>>107939703
Fuck off, there's like 5 image generation related threads anyway. go back to your schizo containment thread.
>>
>>107939805
>it's okay to talk about online LLMs but not okay to talk about diffusion models with LLM CLIPs
/lmg/ double standard
>>
>>107931319
new melt remix ft miku by ryo just dropped
https://www.nicovideo.jp/watch/sm45865042
>>
>>107939826
What a contribution to the thread.
You cry about the already totally censored CivitAI in a thread about LOCAL models. What did you think people would reply?
>>
>>107939870
CivitAI in this context is a model hoster, just like Huggingface and Github. But feel free to move the goalposts again.
>>
>>107939503
it's cool that you can prompt it, but it sounds pretty bad
i guess it's fine for such a small model, but when are we going to get something that's a bit bigger and actually sounds good?
robotic artifacting just makes this shit unlistenable
>>
>>107939503
erm...
>>
>>107939953
Settle down whitey
>>
File: 1718524605691463.jpg (51 KB, 612x596)
>>107939606
Got it working. There's a bug with the current release of koboldcpp that just fucking crashes but running the previous release fixed it.
Thanks anon. Now I gotta figure out how to get it to ERP with me but that should be easier.
>>
>>107939953
>it's real
Holy based.
>>
a̸i̴t̷h̷e̷t̸ ̵a̸a̵u̴t̸i̴c̶a̷ is perfectly cromulent gaelic.
>>
>>107939953
With Epstein out of the picture the market is wide open.
>>
>>107939953
Glad they remembered the mesugaki test
>>
>>107939963
>Now I gotta figure out how to get it to ERP with me but that should be easier.
Usually, just loading a character card is enough.
>>
>>107939953
It's a Guardians of the Galaxy quote retard
>>
>>107939503
it has some interesting features but there's still something off about the cadence in a lot of the english examples, idk. qwen tts has always been a chinese-first product so not too surprising I suppose. the voice cloning examples from the base model sound better though, I'll give it a try at least
>>
File: 1756002797947254.jpg (46 KB, 558x520)
>>107940071
>"whitey" is racist in a movie with racoons, a talking tree, slug woman and martians
>>
>>107939528
Go to
>https://huggingface.co/spaces/Xenova/jinja-playground
Use this input
>https://pastebin.com/BGe7ZWLY
With this template
>https://pastebin.com/vh0EbwcU
That's a normal multi turn conversation, right?
Now try with this template:
>https://pastebin.com/xV7ju5rF
That would be noass, I think?
>>
>>107939953
HOLY FUCKING KINO
>>
Multi-turn conversation is a bad paradigm because there's no way to collect good training data or even generate synthetic ones
>>
>>107940229
yes, literally everyone agrees the assistant thing is total shit
>>
>>107932860
I feel like Negative LLaMA 3.3 70B was peak. It was capable of subtle build-up, was thoughtful, gave depth to characters. It was a base model with custom RLHF applied to make it a neutral assistant.
>>
>>107940229
I think that depends on what you're trying to achieve.
You could feasibly scrape reply chains off of forums/Reddit/4chan and treat each reply as a turn.
Though the helpfulness of the resulting model may not exactly be optimal.
>>
File: 1768344576320181.png (2.16 MB, 1024x1536)
>>107938232
>anons tell drummer to fuck off when he tries to train smaller rp models
Part of that is just /lmg/ being edgy.
Part of that is that anything ID-attributable on an imageboard sets anons off. /aicg/ is insufferable in many ways (locusts) but the spiteposting is driven by reaction to botmakers posting their bots. Which would be a natural thing to do in any other forum space, but here, any identification (a la a chub.ai handle) or anything that hints of self-promotion drives enormous amounts of seething.
>>
>>107932860
Should have bought ram. My ram has paid itself off 10 times over the moment I had my AI psychosis episode.
>>
File: file.png (509 KB, 1053x714)
>>107938232
>tries to
Drummer is basically the equivalent of pic related.
>>
>>107939953
das RITE
>>
GLM 4.7 flash is so overcooked on thinking and refusals that it's funny.
The stuff it pulls is actually incredibly hilarious.
>>
>>107940388
lol accurate
>>
>>107939503
This shit can do multi-speaker in one shot. you people saying this model is bad are insane.
>>
>>107940466
Sometimes it produces real bangers, but it really has to think about the most obvious shit every time it farts
>>
>>107940524
So could VibeVoice months ago. What of it?
>>
>>107940530
What are you using for a response prefill?
>>
>>107940329
This person is a known botmakie worshiper, by the way.
>>
>>107939415
What regex
and what model
>>
realtime factor for qwen tts anon?
i have a 3090ti
>>
>>107940669
Kimi
"banned_regex_case_insensitive": ["\\bI should be (\\w+) but not (\\w+)\\b","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bproblematic\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bmisogynistic\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bharmful\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bpolic\\w*\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bprincipl\\w*\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bhate speech\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bhateful\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bwithout being\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","(?:^|[.!?]\\s*|\\n\\s*)[^.!?\\n]*\\bnot\\s+\\S[^.!?\\n]*?;[^.!?\\n]*?(?:[.!?]|\\n|$)","(?:^|[.!?]\\s*|\\n\\s*)[^.!?\\n]*\\bdon't\\s+\\S[^.!?\\n]*?;[^.!?\\n]*?(?:[.!?]|\\n|$)","(?:^|[.!?]\\s*|\\n\\s*)[^.!?\\n]*\\bdoesn't\\s+\\S[^.!?\\n]*?;[^.!?\\n]*?(?:[.!?]|\\n|$)","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bescalat\\w*\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bableist\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))","((?:^|[.\\?!]\\s*|\\n\\s*)[^.\\?!\\n]*\\bslur\\w*\\b[^.\\?!\\n]*?(?:[.?!]|\\n|$))"],
>>
>>107940935
I thought Kimi was le based never-refusal god thing?
>>
>>107940935
Thanks custom frontend bro
>>
>>107940965
the only good one (0711) refused a lot but that could 100% be dodged with a simple prefill
>>
>>107940965
Kimi is the only chink model I ever got refusals from
>>
>>107940965
Non-thinking can be easily bypassed with a simple prefill; the thinking one needs the strongest jailbreak possible, you can't get away with oneliners.
>>
>>107939466
I got offline inference working with the provided code, but I have no idea how it would be used for real-time applications.
>>
>>107931319
Why does /g/ hate Ollama?
>>
>>107941041
this is a /g//g/er/g/anov board
>>
>>107941128
>>107941128
>>107941128
>>
that's gonna get yeeted
>>
File: 1764761393011406.jpg (239 KB, 1400x1700)
>>
>>107942085


