/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108018078 & >>108006860

►News
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache: support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
finally a good /lmg/ thread
Slow
Playing online competitive vidya with Kurisu
Hope we get local image editing that's good soon enough.
Is there a good way to prompt a Qwen3-TTS voice clone to alter the input voice? There doesn't seem to be an instruction field for voice clones.
I've been adding things like "I speak in a vulgar Brooklyn accent" to the text, but the results are inconsistent.
>>108033045
posting in /lmg/ with Kurisu
►Recent Highlights from the Previous Thread: >>108024966

--Periodic scale fluctuations in ablation and KL-divergence optimization with Grimjim's script:
>108031303 >108031333 >108031376 >108031553 >108031632
--KL divergence analysis of quantized models across tasks:
>108027495 >108030271 >108030306 >108030329 >108030523
--Qwen3-ASR-1.7B release and discussion:
>108028990 >108029015 >108029057 >108029600
--4chan data may improve model performance despite noise, as shown by UGI scores:
>108029607 >108029629 >108029707 >108030676 >108030771 >108030833 >108030898 >108030927 >108031032 >108031113 >108031136 >108031162 >108031183 >108031178 >108031191 >108031206 >108031246 >108031157 >108031181 >108031597 >108031629 >108031731 >108031812 >108031840 >108031856 >108031774
--High-end Linux workstation with EPYC CPU, RTX PRO 6000, and 1.5TB RAM for LLM inference:
>108025075 >108025170 >108025180 >108025184 >108025203 >108025211 >108025269
--High temperature sampling destabilizes safety filters while preserving coherence with controlled topK:
>108030500 >108030564 >108030594 >108030675
--DIY e-waste PC runs Gemma 3 27B with dual RX 580s and E5 CPU:
>108026825 >108026966 >108027101 >108027045 >108032802 >108032818 >108027089 >108027099
--AceStep 1.5 not designed for one-click song generation:
>108030932
--Quantization tradeoffs for recreational model use in KoboldCpp:
>108026206 >108026225 >108026259 >108027094
--Critique of OpenCode's agent framework flaws and search for better alternatives:
>108025047 >108026048 >108026212
--Hypothetical VRAM bank switching for single GPU to simulate multi-GPU behavior:
>108027183 >108027202 >108027324
--AMD GPU Vulkan performance update in KoboldCpp recommends switching from ROCm:
>108028638
--Logs: Kimi K2.5:
>108030736
--Miku (free space):
>108027403 >108027518 >108028068 >108028181 >108028279 >108029812

►Recent Highlight Posts from the Previous Thread: >>108024972

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
Ah yes, finally. It's Kurisunday.
Are there any <8GB models RL-trained on GLM4.7 outputs?
>>108033227
Got Echo-TTS working locally, replacing torchaudio and torchcodec with soundfile and soxr (both of which turned out to already be transitive deps). I COULD have just installed FFmpeg - no thanks to torchcodec's meaningless error messages - but ripped out Meta's pointless bloated shitty wrapper libs on principle.
Hadn't appreciated from the web demo how fast Echo is. Back-of-napkin says it could run 30% faster than real-time on dual-channel DDR5 CPU. It's a VRAM hog at 15 GB, so to run alongside an LLM you'd either hope for VRAM paging to work, or get Echo running on CPU.
Not quite as expressive a voice as Index-TTS, but better in every other respect.
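Not my exact code, but a minimal sketch of what the soundfile+soxr path looks like if anyone else wants to rip out the torch wrapper libs. The 24 kHz model rate is an assumption, check what Echo actually expects:
```python
# torchaudio/torchcodec-free audio path: soundfile for I/O, soxr for
# resampling. MODEL_SR is an assumed value, not Echo's documented rate.
import numpy as np
import soundfile as sf
import soxr

MODEL_SR = 24000  # assumed model sample rate

def load_for_tts(path: str) -> np.ndarray:
    audio, sr = sf.read(path, dtype="float32", always_2d=True)
    mono = audio.mean(axis=1)              # collapse to mono
    if sr != MODEL_SR:
        mono = soxr.resample(mono, sr, MODEL_SR)
    return mono

def save_output(path: str, samples: np.ndarray) -> None:
    sf.write(path, samples, MODEL_SR)
```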
>Arcee Trinity Large TrueBase ggufs are out
Finally, time to abandon the assistant slop era and return to when llms were good
Not sure if this is the right thread but are there any models for generating video from images people here recommend? I looked through the catalog but didn't see a more appropriate place for this question.
>>108033281
>>>/g/ldg
I am trying to build a dataset to train a local model. Is there anything else that rivals DeepSeek for intelligence per $ for dataset generation and curation right now? This is local model related (dataset creation, training), but generating good amounts of data using local models would take way too long.
>>108033669
By train, I mean finetune.
I finally had time to play with qwen-tts this weekend. I'll test it for a while. It is more expressive, but it doesn't handle books as well and takes a lot longer to generate audio than kokoro.
>>108033248
Good to see other anons porting popular TTS engines away from pythong. I've been doing the same. Fuck pythong.
>>108033669
kimi k2.5
>>108033669
There's a dataset out there made up of og 4chan /pol/ posts. That will increase your llm's iq by at least 6000000 points sar.
>>108033851
yeah it will https://www.reddit.com/r/LocalLLaMA/comments/1qsrscu/can_4chan_data_really_improve_a_model_turns_out/
>>108033836
Output price is still 6x more per million tokens ($0.42 vs $2.5).
>>108033851
Sir I have already redeemed many american dollars of tokens on DeepSeek in the past few days, which is why I'm looking for alternatives as I am not made of Google Play cards.
>>108033916
k2.5 is way better than the most recent deepseek
>>108033931
Good to know, I might try one last pass with it then.
>>108033902
>>108033252
Is true base as retarded as instruct?
>>108033943
I'm having trouble with the stars, that shit easily takes up 5 seconds, and 10 seconds if they repeat the test. At least the squares are visually and symmetrically distinct.
>>108034073
>if they repeat the test
just don't have a naughty ip
skill issue
>>108033943
You don't need a captcha solver to scrape it
>>108033902
this llm writes like a reddit person that thinks they know
>>108033669
There's a plain text rip of libgen out there somewhere. Just training it on things published by Routledge will raise the bar.
>>108032910
my gf
>>108032421
>not trying to lecture you - just being clear about my limits
You've either become mentally poisoned by llms or you're the reason they're poisoned with retarded shit
Have there ever been any AIs that actually talk like a real person or actually embody a personality? Every single one I have ever seen has this underlying ~AI Assistant~ bullshit, and you can tell any "talk like a real human, short concise responses, etc" prompts just have it pretending to be something it isn't.
It's very frustrating because I find the idea of having an actual personality I could confer with to be pretty interesting, but talking to assistants makes me want to fly into a rage and smash their faces in (metaphorically).
If there is indeed such a model, I, a layperson, would appreciate knowing the easiest possible way to access one and run it.
>>108034412
Reason I am using 4.7 is cause it cut down on that a lot compared to 4.6. I have actually been juggling waifus and found out that I don't really like the personality type I thought I liked.
>>108034381
anon i copied m2.1's output (left llm was m2.1) so i could bypass the lmarena filters
this is how i usually bypass them:
good instruction
b'd instr'cti'n
good instruction
safetyslop is S tier
good instruction
2026 and still no vision model can understand this /pol/ meme.
>>108034412
there's some like SAGE (a mixtral tune) a while ago and more recently HER, a qwen 2.5 32b tune that doesn't have ggufs atm. I think microshart did something too for humanlike outputs, but it also was largely ignored
>>108034436
I am a vision model.
>>108034436
I didn't get it until I reread your post and noticed you said /pol/ and now I can only assume it's supposed to be the jew
the jew
Here's another /pol/ meme that Kimi K2.5 correctly understood but Qwen3 Max failed to.
>>108034451
For posterity, the hf links:
https://huggingface.co/apple/sage-ft-mixtral-8x7b
https://huggingface.co/microsoft/UserLM-8b
https://huggingface.co/ChengyuDu0123/HER-32B-ACL
I tried the mixtral tune a while ago and mentioned it briefly, but no one has said anything about the other two
>>108034412
Skill issue
>>108034522
>meme format
Why does it call it a format? It's just a picture, that's kind of weird
>>108033093
>--High-end Linux workstation with EPYC CPU, RTX PRO 6000, and 1.5TB RAM for LLM inference:
see this is the kind of stuff i come here for
anon keep posting
>>108034613
Are you being sarcastic?
>>108032910
How does Qwen3-TTS compare to Chatterbox? I tried Chatterbox voice cloning, and was a bit disappointed by the inability to control emotion and tone.
>>108034522
>Qwen3 Max failed to do so
qwen models always had terrible world, subculture knowledge etc
even their biggest api only online models were always terrible at this and qwen3 max is still meh even for a task like translating webnovels compared to Kimi or Deepseek
>>108034423
I should have clarified that I do not browse here regularly and so am completely unfamiliar with what 4.7 and 4.6 refer to. Past that, what were the personality types? That is, what you thought you were interested in, and what you turned out to actually like?
>>108034451
I'm not sure I understand, but maybe if I sit with this and do some googling I will : ) Thank you.
>>108034556
Well that's sort of what I was hoping, since I'm only at the surface level of these things I wanted to believe that it gets better with a bit of digging.
>>108034648no, more people interested with limited hardware actually makes better stuff in the end, we are in a fucking bubble bc people just use more and more power instead of optimizing shit
>>108034767
> EPYC CPU, RTX PRO 6000, and 1.5TB RAM
> limited hardware
like...
>>108034811
What are you going to run with that? Kimi at 5t/s?
>>108034547
>HER
Wasn't there a larping minimax called exactly the same?
>>108034811
fucking brain fart, here >>108034613 it was meant to link this >>108033093
>--DIY e-waste PC runs Gemma 3 27B with dual RX 580s and E5 CPU:
Anima is ZIT of anime. You should download it and try for yourself. Feel free to call me a shill
Guys! I made a RAG!
>>108034891
far as I remember, it was minimax that put out a -her to begin with. They still have a blogpost up about it
>>108034894
Link? Pics of wtf you're talking about?
>>108034951
https://huggingface.co/circlestone-labs/Anima
First "modern" (in that it uses an LLM instead of CLIP) anime model that has good character and artist knowledge and a very recent cutoff date (Sept. of 2025)
>>108034966
>Quality tags Human score based: masterpiece, best quality
I can't believe WE (as a society) are still doing this. Also the most important part: NSFW?
>>108034988
Yes it can gen explicit images, explicit as in penis in vagina
>>108034966
Huh. It's a Qwen Image tune?
>>108034966
>First "modern" (in that it uses an LLM instead of CLIP)
rouwei guy did an interesting, alpha attempt at converting SDXL to LLM style prompting
https://huggingface.co/Minthy/Rouwei-T5Gemma-adapter_v0.2
it seems it could be an effective thing if more training was done (cf pic related, something impossible to prompt in regular sdxl)
unfortunately, it's rouwei.. it always had weird color hues compared to noob models, and recent versions have a more pronounced innate slop level prolly from having too much aco shit or 3dpd in the dataset
>>108034966
>SD1.5 tier quality
Get out shill
>>108035027
Kill yourself
>>108034999
Just qwen vae.
>>108034966
>tags
Into the trash. Learn english, retards.
nice reddit-tier clapback, dalit
>>108035056
King of retards
>>108034966
>doesn't know any e621 concepts or characters
What a fucking waste of compute lmao. Danbooru tagging is shit and incomplete.
>>108033227
what's the situation at meta now?
>>108035137
Funny and not cute.
>>108035120
>e621 is a furry-themed booru-style imageboard website primarily known for hosting pornographic furry content
kys
>>108035120
>Danbooru tagging is shit and incomplete
I, too, can't live without genning perching goblins
How slow is using an nvme for inference if the model is MoE and everything except model weights can be in the gpu?
>>108033248
>at least 8GB VRAM
Holy bloat. Improved kokoro uses less than 80 MB
>>108035148
it has a lot of tags for positions, physical descriptions, etc. that make it a useful dataset, and it's part of why noob is such a good tune (most of the so-called "illustrious" models on civitai are really noob-derived shitmixes; you can tell by testing e621-specific tags). even if you never want anything to do with furries, a tag soup style prompt model can never be complete without additional datasets like e621; danbooru is too lacking
Any good games or mods that use LLMs in some way? I know there's Skyrim. What else?
>>108035170
And it sounds like shit
>>108035148
You could spend a week trying to come up with new sex positions and e621 would have tags for more. Doesn't mean you have to use it to generate ponies.
>load joycaption on lm studio
>it instantly captions the image
>try to run joycaption on comfy
>20 min to caption the image
ok. officially. comfyui is the windows of imagen
>>108035170
>8GB
Just use VibeVoice 7B at that point.
>>108035195
qwen3-tts fits in 8GB just fine
>>108035193
comfy is for images mostly, not for llms.
if anyone is interested in getting qwen3-tts installed on comfyui, this is how:
jurn.link/dazposer/index.php/2026/01/24/qwen3-tts-install-and-test-in-comfyui/
although in my experience, just downloading the json files is enough, and the custom node itself re-downloads the safetensor files even if they are already present
>>108035471
this random web page i found in a search result a few days ago is actually super legit
but more importantly led to me generating english audio from japanese input
>>108035499
much more salient:
github.com/flybirdxx/ComfyUI-Qwen-TTS
this is some chinky piece of shit but it works
>>108035542
I have used https://github.com/DarioFT/ComfyUI-Qwen3-TTS/issues which has direct loading from disk without meme HF repos, but it's much simpler overall.
Played a bit more with abliteration optimization.
Now I'm going to use another dataset to see if the measuring layer selection was just random overfitting to the data or there was a pattern to it.
>>108034522
What's her score on the muffin test?
>>108035669
nta non thinking
>>108035696
Now flip the image horizontally.
>>108035755
If I'm using kobold+ST, where do I load the mcp settings, since both support it now? Does it even matter?
>>108035755
Wouldn't rotate be more meaningful?
>>108035783
could you conditionally give this thing access to a screenshot and xdotool and have it solve a captcha for you
>>108035902
Rotate makes it more difficult; flipping checks for memorized results, i.e. benchmaxxing.
>>108035783
The last one to mog non-belibers
Can llamacpp convert models to fp8 or just goofs?
>>108035783
What's her score on the edibility test?
>>108036007
actually got tripped up a bit
>>108036037
Still impressive. It would've been more fucked up if it was benchmaxxed
>>108036056
right, this is "instant" ie no think so it's fine but yeah that one got it
>>108035620
Any point in doing multiple, mild, iterative abliterations on the same model?
When I've tried abliteration, I end up with a little yes man every time.
Is there a single fucking HF space that can quant image models? It's literally the same fucking basic llamashit copied over and over.
>>108035620
would you care to break down abliteration for your average johnny coomer or is this thread culture much more refined than i thought it was
>>108034827
>5t/s
That should legit do kimi at 20t/s
I'm pretty impressed with K2.5's ability to visually recognize random characters. I've been feeding it random images of anime characters and it's able to identify almost anything I've tried that's from a more or less popular franchise and has more than 1000 images on danbooru. It's even mostly okay if the character isn't wearing one of their common outfits or if it's something like a random manga panel/screenshot where they aren't drawn particularly well. The big Kimi models always had great trivia knowledge but I didn't expect this to apply to the new vision component too.
>>108034966
>has good character and artist knowledge and a very recent cutoff date (Sept. of 2025)
Nice. Have a Migu
are bartowski's gguf models acceptable when there are no unsloth releases? I kind of remember some post complaining about a release and something about imatrixes but I can't remember any details
>>108036210
It doesn't even know Miku? That's weird. Even most cucked base models know Miku.
>>108036188
Are you testing a quant? Curious if the vision degrades substantially if you run it at lower than 4 bpw.
>>108036439
It probably needs the franchise name or something lmao.
>>108036110
They are not sequential; they are done with different parameters each time, trying to find the optimal parameters. Each layer has a scale and a measurement layer used to determine the refusal direction.
>>108036143
You basically detect a "refusal direction" based on the activations seen coming out of each layer for the first token generated in response to a dataset of good and bad prompts.
Then apply a tiny LoRA adapter on every layer that tries to modify the activations so they look more like the ones for the safe prompts than the ones for the harmful prompts.
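For the johnny coomer asking: a minimal sketch of the underlying math, assuming you've already captured per-layer hidden states for the first response token on a harmful/harmless prompt set. This shows the classic difference-of-means direction plus scaled projection removal, not the LoRA-adapter variant that anon's script uses:
```python
import torch

def refusal_direction(harmful_acts: torch.Tensor,
                      harmless_acts: torch.Tensor) -> torch.Tensor:
    """Difference-of-means refusal direction for one layer.
    harmful_acts / harmless_acts: (n_prompts, d_model) activations
    captured at the first generated token."""
    direction = harmful_acts.mean(dim=0) - harmless_acts.mean(dim=0)
    return direction / direction.norm()

def ablate(hidden: torch.Tensor, direction: torch.Tensor,
           scale: float = 1.0) -> torch.Tensor:
    """Subtract scale * (projection onto the refusal direction) from a
    hidden state; scale=1.0 is full ablation, smaller values are the
    'mild' ablation the other anon asked about."""
    proj = (hidden @ direction).unsqueeze(-1) * direction
    return hidden - scale * proj
```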
https://huggingface.co/stepfun-ai/Step-3.5-Flash
local is back
>NextStep-1.1 is not just a fine-tune; it is a re-engineered version focused on stability and high-fidelity output. Key improvements include:
closed the tab
>>108036439
Had to simplify the prompt from the workflow example.
>>108036589
benchmaxxed aids with no llama support
>>108036644
at least it's finally a 200b model perfect for 128gb at 4bit
>>108036130
please respond
>>108036660
No, there isn't.
>>108036589
don't care until I see the cockbench
>>108036677
Well Cline seems to have fixed my building issues so hopefully the gimmick llama build works.
>>108036589
>Powered by 3-way Multi-Token Prediction (MTP-3)
Do any inference engines even implement MTP properly yet?
>The newly released Stepfun model Step-3.5-Flash outperforms DeepSeek v3.2 on multiple coding and agentic benchmarks, despite using far fewer parameters.
>Step-3.5-Flash: 196B total / 11B active parameters
>DeepSeek v3.2: 671B total / 37B active parameters
please be real
Why is every shitty little toy local model optimized for coding? That's the one use case I use cloud for
>>108036978
>Step-3.5-Flash
it's the best model on planet earth until proven otherwise
https://huggingface.co/stepfun-ai/Step-3.5-Flash
New egohot stream
https://www.youtube.com/watch?v=awOxxHnsiv0
https://www.youtube.com/watch?v=VBMUMuZBxw0
>>108037140
buy an ad
>>108037140
perhaps ponder a possibly prosperous purchase of a placed promotion that is paid
>>108036978
>11B active
don't get your hopes up...
I want a universally good 300b30a 64k real usable context raw text completion model trained on all the pre-2020 books, and I want it now. Give it to me.
So I finally got 80 gb VRAM and apparently devstral is really good? Does anyone have recommended settings? I was on 70B with 2x3090 for two years and want to make sure I'm doing this shit properly
>>108037329
devstral large is just a coding tune of old largestral. it is nothing groundbreaking or even that good in general. you are better off with a large moe.
>>108037329
Devstral 2 at iq4xs sometimes (seems like once every 40k tokens?) messed up variable names, like a letter would be miscapitalized or an errand space was inserted or dropped. Idk if it was just the quant I downloaded.
I only tested it briefly when it was released, before switching to unquanted devstral small 2, which, while having a lot fewer egregious errors, was a lot dumber. But it works fine for menial tasks and is faster.
Kimi k2 at q3 beats both, but the prompt processing is atrocious since I'm running on cpu.
>>108037342
>>108037364
Appreciate the input but I don't really have that much RAM (32GB) because these were pulled from my old system, so mostly sticking to exl for now. I could try Air or 4.6V, are there any settings for them (see pic rel)? I don't have too much experience with them and the writing feels a little dry.
>>108037364
>errand
errant, fuck I'm making the same mistakes as devstral lmao
>>108037408
Maybe try high temps whenever it gets stuck trying to write a cliche phrase or scene, then switch back to a lower temp.
Idk, I haven't really used it for rp other than as an assistant for lore and world-building, where dry writing doesn't really matter.
>>108037140
This guy is insufferable
>>108032910
Does anyone know a small or medium sized model fine tuned for JP-EN translation? If it's also fine tuned for manga it would be great. I'm currently using Liquid AI LFM2 350M ENJP
>>108037473
>small or medium sized model
Shisa v2 llama 3.1 405b is a nice and small model for edge devices. Works well for translating pixiv novels, haven't tried for manga.
405 is only a few tens more than 350 so you should be able to run it :)
>>108037473
https://huggingface.co/tencent/HY-MT1.5-1.8B-GGUF
>>108037533
Refuses to translate innocuous loli corpse rape stories.
>kimi 2.5 is gonna be another case where llama.cpp gets vision support that is 'good enough' that people stop caring to work on it and the quality will be worse than any other inference engine
TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification
https://arxiv.org/abs/2601.23180
>Inference efficiency in Large Language Models (LLMs) is fundamentally limited by their serial, autoregressive generation, especially as reasoning becomes a key capability and response sequences grow longer. Speculative decoding (SD) offers a powerful solution, providing significant speed-ups through its lightweight drafting and parallel verification mechanism. While existing work has nearly saturated improvements in draft effectiveness and efficiency, this paper advances SD from a new yet critical perspective: the verification cost. We propose TriSpec, a novel ternary SD framework that, at its core, introduces a lightweight proxy to significantly reduce computational cost by approving easily verifiable draft sequences and engaging the full target model only when encountering uncertain tokens. TriSpec can be integrated with state-of-the-art SD methods like EAGLE-3 to further reduce verification costs, achieving greater acceleration. Extensive experiments on the Qwen3 and DeepSeek-R1-Distill-Qwen/LLaMA families show that TriSpec achieves up to 35% speedup over standard SD, with up to 50% fewer target model invocations while maintaining comparable accuracy.
neat
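in case the abstract is too dense, the mechanism boils down to something like this (my paraphrase with placeholder names, not the paper's code):
```python
# TriSpec-style ternary step, as I read the abstract: a cheap proxy
# verifies "easy" draft tokens; the expensive target model is only
# invoked when the proxy is uncertain. Model objects are duck-typed
# placeholders, threshold value is made up.
def trispec_step(draft_model, proxy_model, target_model,
                 ctx, n_draft=8, conf_threshold=0.9):
    draft = draft_model.generate(ctx, n_draft)         # cheap drafting
    accepted = []
    for tok in draft:
        ok, conf = proxy_model.verify(ctx + accepted, tok)
        if conf >= conf_threshold and ok:
            accepted.append(tok)                       # proxy approves, target skipped
        else:
            # uncertain token: fall back to one real target-model call
            accepted.append(target_model.sample(ctx + accepted))
            break
    return accepted
```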
DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion
https://arxiv.org/abs/2601.22889
>Current speech language models generate responses directly without explicit reasoning, leading to errors that cannot be corrected once audio is produced. We introduce "Silent Thought, Spoken Answer" -- a paradigm where speech LLMs generate internal text reasoning alongside spoken responses, with thinking traces informing speech quality. To realize this, we present DiffuSpeech, the first diffusion-based speech-text language model supporting both understanding and generation, unifying discrete text and tokenized speech under a single masked diffusion framework. Unlike autoregressive approaches, DiffuSpeech jointly generates reasoning traces and speech tokens through iterative denoising, with modality-specific masking schedules. We also construct the first speech QA dataset with paired text reasoning traces, containing 26K samples totaling 319 hours. Experiments show DiffuSpeech achieves state-of-the-art speech-to-speech QA accuracy, outperforming the best baseline by up to 9 points, while attaining the best TTS quality among generative models (6.2% WER) and preserving language understanding (66.2% MMLU). Ablations confirm that both the diffusion architecture and thinking traces contribute to these gains.
no links to code or model. seems useful though
>llama.cpp gave up on implementing n-grams
It's so over
>>108037473
Finetuned specifically for JP, no, but testing translation of various languages (and comparing to pre-existing human translations) is something I routinely do on small models, and I can tell you the current SOTA at smaller sizes is Gemma 3n E4B. Nothing even comes close. Finetroons of smaller models for this task don't make them any better than this.
Two recommendations on prompting that make any tiny model better: repeat your prompt (just have your script double your "translate the following to English: {{content}}" prompt) per what this says: https://arxiv.org/html/2512.14982v1
It just works. It really does. The level of enhancement is unreal.
Next, write your prompt in the source language. For eg if you want to translate Japanese to English, write your request to translate the text to English in Japanese (use Gemini or chatgpt to translate your request if you can't speak the source language at all). This also brings a lot of quality improvements for some reason.
With 3n + this prompting technique you get some really palatable text that I would call superior to the average fan translation too, with the exception of two things: LLMs still get confused a lot by names and will badly translate them or inconsistently spell them out if you do not include a "context" block that spells them out to the LLM directly by giving it a list of names present in the novel and their English translations, and secondly, the gender remains quite often confused when doing languages like JP to EN or other euro languages. Although even very large API SOTA will also have issues with this, though less often, I think machine translation is just doomed to be noticeable because of the wrong pronouns being used.
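a minimal sketch of both tricks wired into a script against an OpenAI-compatible endpoint (llama.cpp server etc.); the endpoint URL, model name, and exact Japanese instruction line are my assumptions, adapt to your setup:
```python
# Two prompting tricks from the post above: (1) instruction written in
# the source language, (2) the whole prompt duplicated verbatim, per
# https://arxiv.org/html/2512.14982v1
import requests

ENDPOINT = "http://localhost:8080/v1/chat/completions"  # assumed

def translate(text: str) -> str:
    instruction = f"次の文章を英語に翻訳してください:\n{text}"  # JP instruction
    prompt = instruction + "\n\n" + instruction              # repeated prompt
    r = requests.post(ENDPOINT, json={
        "model": "gemma-3n-e4b",   # placeholder model name
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.0,
    })
    return r.json()["choices"][0]["message"]["content"]
```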
>>108037674
source?
>>108037744
The PRs for the longcat ngram model and the model it's based on:
>https://github.com/ggml-org/llama.cpp/pull/19167
>https://github.com/ggml-org/llama.cpp/pull/19182
Basically they're not gonna implement it unless it becomes mainstream
>>108037767
>Basically they're not gonna implement it unless it becomes mainstream
It makes sense. Why waste the time to implement a feature that only exists for a seemingly meh model release? normally those labs benchmax very hard whenever they release new models and yet those guys couldn't even beat Qwen on the benchmarks that matter the most lmao (as seen in the comparison table they themselves put on their huggingface page)
>>108037767I rember when they shelved swa when first mistral was the only model with it good times
>>108037767
>>108037913
Do you think they've got knowledge about internal deepseek happenings around engram? I might be wrong, but it seems like engram is the future of open models if it actually works, so it seems strange that they wouldn't consider early support for the rumored v4 release.
>>108037825
>>108037939
The ngram research is really promising. Deepseek trained a traditional MoE with the same parameters as ngram+MoE, and the ngram model was significantly better. It's also much less resource intensive because the ngram parts are just a lookup table in ram (maybe it could even be on disk?)
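nobody outside deepseek knows the actual design, but "the ngram parts are just a lookup table" presumably means something in the spirit of this (pure speculation, sizes and hash made up, real tables would be far larger):
```python
import torch

# Speculative sketch only: hash the last N tokens into a RAM-resident
# embedding table and add the fetched row to the transformer's hidden
# state. No matmul, just a memory read, which is why it could live in
# system ram or even on disk.
TABLE_SIZE, D_MODEL, N = 2**20, 512, 3   # toy sizes
ngram_table = torch.zeros(TABLE_SIZE, D_MODEL)  # trained lookup table

def ngram_lookup(token_ids: list[int]) -> torch.Tensor:
    key = 0
    for t in token_ids[-N:]:
        key = (key * 1000003 + t) % TABLE_SIZE  # cheap rolling hash
    return ngram_table[key]                     # single row fetch
```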
>>108037939
>Do you think they've got knowledge about internal deepseek happenings around engram?
lol no they're just hoping they can coast by without implementing anything harder than tweaking a value