/g/ - Technology


File: trust.jpg (155 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>107948284 & >>107941128

►News
>(01/22) Qwen3-TTS (0.6B & 1.8B) with voice design, cloning, and generation: https://qwen.ai/blog?id=qwen3tts-0115
>(01/21) Chroma-4B released: https://hf.co/FlashLabs/Chroma-4B
>(01/21) VibeVoice-ASR 9B released: https://hf.co/microsoft/VibeVoice-ASR
>(01/21) Step3-VL-10B with Parallel Coordinated Reasoning: https://hf.co/stepfun-ai/Step3-VL-10B
>(01/19) GLM-4.7-Flash 30B-A3B released: https://hf.co/zai-org/GLM-4.7-Flash

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>107948284

--Roleplay reasoning debates and latent space implementation challenges:
>107949603 >107949642 >107949679 >107949809 >107949813 >107949895 >107949950 >107949974 >107950003 >107950023 >107950049 >107950084 >107950424 >107950452 >107950499 >107950160
--Qwen3-TTS CPU performance and implementation challenges:
>107954203 >107954226 >107954242 >107954239 >107954256 >107954339 >107954377 >107954492
--GPU memory allocation challenges with KoboldCpp AI model inference:
>107950833 >107950855 >107950860 >107950891 >107950934 >107950970 >107951011 >107951047 >107951064 >107950862
--Challenges in applying REAP for model improvement in specialized domains:
>107953603 >107953771 >107955008 >107955030 >107955063 >107955200 >107955229 >107955262 >107954148
--AI model overfitting on surgeon riddle gender assumptions vs content:
>107949203 >107949996 >107950298 >107950340 >107950478 >107950536 >107950360 >107950443
--Qwen3-TTS GPU inference architecture inefficiency analysis:
>107950937
--Testing Gemma-3n-E4B's grammar correction limitations:
>107948335
--Qwen-TTS finetuning challenges and optimizations:
>107948363 >107948388 >107948418 >107948481 >107948517
--Local document search solutions for multi-format file support:
>107950659 >107950797 >107950875
--VRAM requirements for Qwen TTS models:
>107955047 >107955082 >107955161
--Qwen3-122B model features: VoiceDesign vs Custom/Voice with premium timbre support:
>107952883 >107954969
--glm-4.7 local performance and hardware requirements:
>107952076 >107953159 >107953348 >107953489
--open webui as a practical yet imperfect chatbot/development platform:
>107952173 >107952257 >107952293 >107952330 >107952338
--Luka and Miku (free space):
>107949352 >107949426 >107951004 >107954064 >107956748

►Recent Highlight Posts from the Previous Thread: >>107948290

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: view.jpg (148 KB, 1280x704)
>>
Order a trusty Office Miku now!
>>
that glm 4.6 ego death anon is a massive faggot but his spirit is right about llm usage. personally all my fucking chatting with r1 (and other models to a lesser extent, though an honorable mention to kimi-0905 for the sex) has noticeably fucked my mental up. all my imagination is more vivid and creative now, unironically feels like a consciousness expansion
>>
>grok-code-fast-1 (only free model) removed from roo code
why tho
(yes off topic but still why tho)
what to use now?
>>
>>107957292
did they also remove it from cline? ive been using grok fast quite a bit, I think they also offer m2 and devstral but grok fast was just better from my experiments.
>>
>>107957292
TOSS
>>
>>107957292
devstral. they're removing the free use, but you can get a mistral api key and you get 1B/month tokens for free.
>>
So, minimax-m2-her could actually be kino right? Same with mistral small creative? WHY won't they release them REEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE
>>
>>107957396
>minimax-m2-her
interesting, they haven't mentioned this on their xitter or discord or anything yet. I wouldn't count it out that they would release it, they've had a delay between API and weights releases before
>>
>>107957312
idk about cline, never used it. yes, grok-code-fast-1 was great for being absolutely free over roo code. rip I guess.

>>107957336
cool, will try it out.
>>
>>107957396
>mistral small creative
Never bothered trying it since it's api only. but is it actually good?
>>
>>107957498
Some anons tried it and results were inconclusive, like most small api models. Spitballing here but it's probably their bog standard 24B model with some extra long-form book and RP datasets thrown in.
>>
>>107957396
>WHY won't they release them
1) Work in progress / interest check phase;
2) Probably still thinking of ways for preventing grifters from profiting off these specialized models at essentially no cost.
>>
>>107957543
>Probably still thinking of ways for preventing grifters from profiting off these specialized models at essentially no cost.
The french are very culturally socialist. I really don't think it's a profit issue.
>>
>>107957543
but they literally distill from deepseek lmao
>>
>>107957569
>very culturally socialist
Reading that as a french almost made me choke on my drink
>>
>>107957601
Maybe I'm over generalizing but the french tech sector is full of co-ops
>>
>>107957543
>1) Work in progress / interest check phase;
Minimax literally owns one of the bigger normalfag AI chat platforms, I don't think they have to do much interest checking
>>
>>107957618
it is, the frenchoid is coping
>>
>>107957569
They've already released a few models with a non-commercial / research-only license in the past.
>>
Is Qwen3-TTS coming to ComfyUI?
I spent hours trying to get it set up with conda and came very close to throwing my PC out the window.

>>107957601
Are people really trying to turn Macron's eye infection into good PR
>>
>>107957677
>Is Qwen3-TTS coming to ComfyUI?
If you look in /ldg/ pretty sure someone linked a working node for it.
>>
>>107957086
When the paperwork is an Escher-esque hyperdimensional anomaly
>>
>>107957618
It's full of corruption
>>
> Qwen3-TTS
Can it speak with horny voice of Widowmaker from r34 vids?
>>
>>107957727
Use case?
>>
>>107957746
cumming
>>
After some testing, I'm disappointed in qwen-tts. It's very bad at pauses, often ignoring "...".
It speaks too fast for immersive rp narrative.
Its advantage is the autoregressive architecture, which should allow very long generation without chunking, but in reality it gets unstable with long prompts, so you still need chunking. You're left with only the main disadvantage of autoregressive architectures: it can't saturate the GPU and is therefore slow.
It's not usable for Japanese as is because it misreads tons of kanji (you need to preprocess Japanese text, but then refer to the previous issues).
And on top of all that, 5-second references suck. They barely capture average timbre. All modern TTS should implement long reference audio like Echo, which can take up to 5 minutes and captures all the nuances of prosody.
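As for the chunking, bolting it on is trivial; a rough untested sketch of naive sentence splitting before each TTS call (max_chars is an arbitrary knob, tune it to whatever length stays stable for you):

# naive sentence chunking so each TTS call stays short (rough sketch)
import re

def chunk_sentences(text: str, max_chars: int = 200) -> list[str]:
    # split after EN/JP sentence terminators, then pack sentences into chunks
    sentences = [s for s in re.split(r'(?<=[.!?。！？])\s*', text) if s]
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) > max_chars:
            chunks.append(current)
            current = s
        else:
            current += s
    if current:
        chunks.append(current)
    return chunks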
>>
>>107958000
>you need to preprocess Japanese
so it works with phonemes?
>>
Is iq2 of a 30B model usable or should I stick to q8-6 of some 12B?
>>
>>107958035
>Is iq2 of a 30B model usable
definitely not.
>>
>>107958013
It works with kana. The model just misreads many words written in kanji. You basically need to convert all words written in kanji to their kana readings.
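If anyone wants to try that preprocessing, pykakasi can do a rough kanji-to-kana pass. Sketch below assuming pykakasi's convert() API; readings won't always be right in context, so a proper morphological analyzer would do better:

# rough kanji -> kana preprocessing before handing text to the TTS
import pykakasi

kks = pykakasi.kakasi()

def to_kana(text: str) -> str:
    # convert() yields items with 'orig', 'hira', 'kana' keys; join the hiragana readings
    return "".join(item["hira"] for item in kks.convert(text))

print(to_kana("漢字の文章を読み上げる"))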
>>
>>107958035
30B models are moe. you can run them on cpu.
>>
>>107958092
glm flash is moe?
>>
moe moe
>>
>>107958107
yes, it's moe moe kyun~
>>
>>107957215
I am him. AMA!
>>
HP makes a pcie card for their blade servers that will accommodate four mxm cards. You can find them on ebay for ~$25. You can also find 16gb tesla p60 mxm cards ~$125.
You could also get a single mxm to pcie adapter ~$100 but that really kills the price of this setup since you would have to buy four of them.
It is not the fastest thing but you would get 64gb of vram. Has anyone done anything like this?
>>
Does LTX-2 run on Linux + AMD? Seems to require CUDA.
>>107958416
Pretty damn smart idea. Personally I've researched using PCIe switches but they are pretty rare. An eBay seller sells 1-to-4 PCIe 3.0 x16 switches for $300 that don't require mobo bifurcation support. If your motherboard does support bifurcation you can get an x16 to 4x oculink adapter, then plug those into external GPU enclosures.

This research was part of a potential project of running AI directly off intel optane on DDR3 systems lol.
>>
>>107958416
>tesla p
e-waste
>>
>>107958000
>5-second references
yikes, really? there's no mention of reference audio length on their github page
>>
Are there any extensions for sillytavern which intelligently manage stats, location, etc.? Having the llm append a fat list at the end of every message seems like a waste of tokens, both to generate and to keep in context, since only the latest stats should matter, no?
>>
>>107958416
Tesla cards are very old at this point. They lack support for lots of essential cuda features.

>>107958478
>x16 to 4x oculink connector
Just note that prompt processing is very dependent on cpu-gpu bandwidth if you load some weights into ram like with moe.
>>
File: 1756629366793618.jpg (77 KB, 1292x790)
>>107958478
>If your motherboard does support bifurcation you can get a x16 to 4x oculink connectors, then plug those into external GPU enclosures.
I have two HP z440s at home and I know those support bifurcation
That is actually an interesting idea too. I could reuse all the gpus i have sitting around in my house. Assuming you can find cheap adapters for the gpus.
>>
>>107958498
transforming e-waste into something else is fun, more fun than just sending a prompt to a company for their machines to generate results
>>
>>107958532
From my research, the cheapest oculink to pcie x16 female boards you can find are around $50. They also need an external PSU and mounting. The best mounting system I saw was on youtube: a guy had the PCIe brackets supported structurally, and the rear had a single support for the weight.

Since the PSUs aren't running a computer, it's probably best to manually track which rail supports what wattage, then make custom adapter cables to fully utilize the PSU (and account for momentary spikes in consumption).

PCIe bifurcation only gets you so far. PCIe switches are limited by bandwidth. Bandwidth use is determined by the load, so performance and limits are hard to guess unless you're actually running workloads.

Personally I feel like the latency advantage of optane should improve performance massively, probably more than GPUs would, but I don't have the money to buy the hardware and test.
>>
>>107958540
More than fun, fucking around with stuff teaches you a lot of odd skills that end up being useful in the most surprising ways.
>>
You can also misuse hardware like this quad PCIE 4.0 NVMe M.2 carrier card: throw m.2 to oculink adapters on it, then use those to run GPUs. Again, no apparent limits other than bandwidth and cost. $250 for a single pcie switching board, plus 4 m.2 oculink adapters, plus 4 oculink cables, plus 4 oculink pcie docks, plus ~2 PSUs...
https://www.amazon.com/Card-PCI-Support-Non-Bifurcation-Motherboard-3005K/dp/B0FQBMKHVD?th=1

Too bad pcie 4.0x8 and 3.0x16 have the same bandwidth (32gb/s), so it's the same problem as before...
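(rough math on why those are equal, assuming I have the link rates right: PCIe 3.0 is 8 GT/s per lane and 4.0 is 16 GT/s, so 3.0 x16 and 4.0 x8 both come to 128 GT/s, roughly 16 GB/s per direction after encoding overhead, ~32 GB/s counting both directions)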
>>
>>107957082
Did anyone get Qwen3 TTS running on windows with nvidia? I tried the conda install guide with py312 but flash attention doesn't build.
>>
>>107958660
Get the prebuilt wheels?
>>
>>107958665
I've tried half a dozen different prebuilt wheels from different sources, nothing worked for me.
>>
>>107958660
Also, I could try to remove the flash attention2 requirements from the inference code, but
>>
>>107958671
Are you supposed to use python 3.12? The only times I've ever seen wheel build failures, it's been the wrong python version, normally due to not using a venv.
>>
>>107958685
Yes, I've tried multiple 3.12 versions. I've tried multiple CUDA versions too (12.8, 12.6, 12.4), but they all fail with flash attention, both compiling and prebuilt. I also have the vs2022 build tools, so I went into the vs2022 dev command prompt to see if I could get flash attention to build, but that too failed on all those CUDA versions.
>>
>>107958671
https://github.com/Dao-AILab/flash-attention
What's your gpu anyway?
>>107958685
The model card says 3.12
>>
>>107958685
I have the same problem. I think it's due to flash attention not supporting cuda 13 yet.
>>
>>107958714
2070
>>
>>107958722
nigga
>>
FlashAttention-2 with CUDA currently supports:

Ampere, Ada, or Hopper GPUs (e.g., A100, RTX 3090, RTX 4090, H100). Support for Turing GPUs (T4, RTX 2080) is coming soon, please use FlashAttention 1.x for Turing GPUs for now.
Datatype fp16 and bf16 (bf16 requires Ampere, Ada, or Hopper GPUs).
All head dimensions up to 256. Head dim > 192 backward requires A100/A800 or H100/H800. Head dim 256 backward now works on consumer GPUs (if there's no dropout) as of flash-attn 2.5.5.
>>
File: how_do_we_tell_him.jpg (87 KB, 1031x593)
>>107958736
>Support for Turing GPUs (T4, RTX 2080) is coming soon
>>
>>107958719
>>107958736
what really grinds my gears is that qwen3-tts "recommends" this package, when in reality it's essential to the build and also super fucking temperamental
>>
>>107958746
It's directly from the FA repo, not my words.
>>
>>107958736
>Support for Turing GPUs (T4, RTX 2080) is coming soon
le mao indeed, i wonder if I can swap it to xformers or something
>>
>>107958756
I think they've had this exact phrase since at least the release of Stable Diffusion.
>>
>>107958758
Try FA1 as they say for old gpus.
>>
>>107958719
>flash attention not supporting cuda 13 yet
Here:
https://github.com/mjun0812/flash-attention-prebuild-wheels/releases
>>
>>107958709
This anon is probably right, maybe your GPU is too old >>107958736. Uhh, there's only one version of Python 3.12. I'm on Linux, but try running something like this; it should eliminate the python version being the issue.
>pip install virtualenv
>virtualenv -p python3.12 qwen3tts
>source qwen3tts/bin/activate
Then run your install scripts in the directory you are installing to.
>>
>>107957082
>be me, hear how great nano bananas & rice is when used with agentic coding via an mcp
>ask gemini how to do it, follow the instructions
>doesn't work, model not found
>try 5 other guides
>none of them work
>my agent gets fed up and writes a python script that hits gemini's api directly. works on the first try
I fucking hate mcps bros
>>
>>107958846
I forgot to add that every guide is like "just install with npx -y @somegayfaggot/babbys-first-mcp --apiKey your_unlimited_all_access_pass" and we're supposed to think it's okay because it has MCP in the name. People have lost their fucking minds.
>>
>>107958758
>>107958774
>>107958783
    tts = Qwen3TTSModel.from_pretrained(
        ckpt,
        device_map=args.device,
        dtype=dtype,
        attn_implementation="sdpa",
    )


Swapped to sdpa. 1 line change in demo.py
>>
>>107958532
The power situation gets ugly. For a many-GPU inference rig, consider a mining frame. Perhaps those pricey PCIe switch boards open up more platform options.
>>
when are models going to be trained to listen to what the fuck I tell them to do instead of following tropes
>do not intervene in the fight
>thinking: Ok, I will not intervene in the fight
>500 tokens later intervenes in fight automatically
>>
Is Qwen the gold standard for voice now? What's the best T2S?
>>
>>107959380
>do not
blue elephant
>>
I had an idea last night. Would it be interesting to create a game show, similar to Among Us, where a bunch of AI agents pretend to be human (or are gaslit into thinking they're human) and progressively vote to kill each other based on who they suspect isn't human?

I haven't yet really explored the concept of having AI agents interact with each other but it seems somewhat interesting. Even something more benign where they have to work together to navigate and survive a 3D game environment seems fun. It would be like watching a bunch of retarded Sims NPCs try to keep a bunker or a space station operational in an apocalypse scenario.

In general I like the concept of video games where you don't even play as a character but instead the god who can inflict punishments and change circumstances at will. Seeing a bunch of little goy AIs kill each other to maximize self-preservation at the cost of group survival would be intriguing.

Has anything like this already been made?
>>
>>107959410
it's not a blue elephant, because even without that instruction, or when telling it that X is going to talk with Y without caring about the fight, it still forces interference if you let it keep generating
it's just overtraining on MC tropes
>>
>>107959380
it's possible there's some human-like psychology going on here where the model hears a negative statement and quickly forgets that the word "not" was in it
a negation still introduces the word intervene in this case, something like "you observe passively" wouldn't have that problem; but the problem could also be some gay safety programming
>>
>>107959440
I have tried multiple variations of X doesn't care, X is literally neutral evil so simulate him properly, X is actually secretly the big bad and wants the hero to die, X walks off towards the forest to look for mushrooms with Y while Z gets eaten
it will eventually handwave a way for X to intervene that stops the fight
I mean in one instance X was literally in a normal conversation when he suddenly decided to throw a rock and get involved
it's honestly extremely stupid
>>
>>107959425
You can find a lot of youtube videos with that kind of experiment (simpler than your scenario though). This one for example https://www.youtube.com/watch?v=0MmIZLTMHUw
>>
>>107959425
A channel that does this is actually Live right now

https://youtu.be/JhBtg-lyKdo

https://www.youtube.com/watch?v=b_B2HJUzHNE
>>
>>107959483
kek
>>
I'm back, what did I miss?
>>
>>107959751
The jews and pajeets that still support Donald Trump are cheering on another hamfisted officer involved shooting of a white person
>>
Does anyone have experience using nano-gpt.com to simplify switching between different models?
>>
>>107959769
go back to /pol/. this thread is for the discussion of ai.
>>
>>107959814
we use ollama cloud here to run our beloved models locally even on weak hardware
>>
It's fucking nuts how you can make a TTS engine run in python in about 200 lines of code but if you try to port it to C/C++ you'll need about 2,000. Python's dependency hell sucks for end users but it's also really nice for developers. I get it now. Just glad I found out about uv, because it makes it a lot easier to manage.
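For reference, the basic uv workflow is only a few commands (from memory, check the uv docs):
>uv venv
>uv pip install -r requirements.txt
>uv run python demo.py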
>>
lot of uv shilling recently
>>
>>107959878
is uv actually better? micromamba seems fine and actually works. uv seems like people like it just because it's rust
>>
>>107959769
oh no a homosexual race communist died
he gets sunburnt easily, the jews must love this
>>
>>107959878
It's working. I dropped miniconda when I realized I was only using it to set up envs and I could pip install everything I need.
>>
>>107957082
I don't know WTF he did but ik_llama.cpp takes forever to compile compared to upstream.
>>
>>107959917
I like uv. it's slightly more sane than the rest of the python packaging world and it is faster than pip. No real reason to switch from a working setup though. There's also pixi, another trendy rewritten-in-rust conda alternative (I think). I don't understand the differences between pypi and conda well enough to have an opinion on it.
>>
>>107959878
uv and the rest of the Python tooling from astral, which does everything in Rust, is pretty good: faster than the alternatives, as they claim, and with much saner defaults. Ruff and ty are good too. The only downside is they are now part of Cloudflare. But I'm not going to knock good tools when I see them and can use them for gratis and libre purposes.
>>
Is Clawdbot just a FOTM meme or is it good? Sounds like you can use local models with it.
>>
File: 1751299876324183.png (1.03 MB, 1058x1272)
Who's making this eval
I need my future AI wife to be nimble with her hands
>>
>>107960047
never heard of it
>>
https://pub.sakana.ai/DroPE/
https://www.arxiv.org/pdf/2512.12167
~$10k for a 70B llama, 75B tokens, 400 hours

>get a model trained with rope (FUCK ROPE)
>delete the positional embeddings entirely
>continue training briefly on the same short-context data
>done, the model now generalizes to much longer contexts than it was trained on
>>
>>107960244
>sakana.ai
lul
>>
>>107960244
that seems like a pretty cool discovery, but i'm quite skeptical because of the authors' transformers 2™ grift
>>
>>107960346
>>107960264
Stop focusing on who said the thing, focus on the thing itself instead.
>>
>>107960244
Cool shitpost
>>
>>107959967
>I don't know WTF he did but ik_llama.cpp takes forever to compile compared to upstream.
Yeah same for me. It's because of all the new ik quant types upstream doesn't have.
>>
>>107960358
>Stop focusing on who said the thing, focus on the thing itself instead.
I'm testing this on a small model. I'll upload the model even if it doesn't work.
>>
>>107955991
>Can you set up emotion tags with it? I love vibevoice but seriously lacks control over the output
vibevoice has emotion tags
Prompt below. cfg slider is set to 2.0:
[panic attack onset, breathless urgency] Please—don’t make me go out there—I can’t—I just can’t!

input audio:
https://vocaroo.com/1cqcSTePacSa
output audio:
https://vocaroo.com/1g1B0Zm5YMso
>>
>>107960489
nta, but is there a list of these somewhere, or do you just guess what works and what doesn't?
>>
>>107959856
That "200 lines of code" is backed up couple gigabytes of dlls, python libraries, etc. Its moot.

If we're strickly talking about cuda usage, there are cuda dlls that are few hundred MB for c++ and can are used in cpp version of the software. Like stable diffusion.cpp or tts.cpp or llama.cpp or whisper.cpp

Optimizations in python's backhaul has to be ported over to the cpp
>>
>>107959878
>uv shilling
Honestly if you're not using uv you're just straight up retarded.
>>
Nvidijeet vibecoder gguf pull request status?
>>
>>107960550
venv works
>>
Are there qwen3-tts quantz yet?
>>
I'm sick of this generation of LLMs.
>>
File: 1757225970454710.png (42 KB, 1054x224)
>>107960594
>>
>>107959878
I just use miniconda like back in the llama 2 ooba days
>>
>>107960756
>using pyhthon in 2026
lol
>>
>>107960594
Do you also just create dated copies of your code in a folder as version control?
>>
File: python.png (188 KB, 724x808)
>>107959856
>It's fucking nuts how you can make a TTS engine run in python in about 200 lines of code but if you try to port it to C/C++ you'll need about 2,000. Python's dependency hell sucks for end users but it's also really nice for developers. I get it now. Just glad I found out about uv, because it makes it a lot easier to manage.
>>
>>107960358
I read the paper, it's actually pretty good but I still have some questions.

>NoPE transformers break on repeating sequences
They don't propose a solution for this

I suspect that NoPE still performs better when both converge
>>
>>107960775
The official python environment manager for people with large penises
>>
File: h.webm (1.45 MB, 540x540)
is qwen tts any good?
vid unrelated
>>
>>107959425
Best I can give you is this
https://www.youtube.com/watch?v=0MmIZLTMHUw
>>
>>107960661
There's one for apple mlx format, but I don't think we can use that on non-apple devices, so it's useless
>>
what would be a good nsfw model for tavern that one can run on llama with a 3060 12gb nowadays?

Would appreciate the help.
>>
>>107961182
nemo, unless you have a lot of ram. then glm air.
>>
>>107961199
any specific nemo version? theres like a couple dozen nowadays
>>
>>107961210
the official nemo instruct. the troontunes just make the model braindead without increasing quality.
>>
File: 1533423826134.jpg (100 KB, 466x380)
I have a miniconda on 3.13 and system python on 3.12, but when I create a venv with python -m venv venv, the condashit forces itself into it, but I need the 3.12 one.
>>
>>107961410
you can just make a new miniconda env with your desired version
>>
>>107961410
>python -m venv venv
python3.12 -m venv venv
>>
>>107960244
Here's the 7B weights btw.
https://huggingface.co/SakanaAI/Llama-2-7b-hf-DroPE
>>
>>107960594
He's right you know.
>>
>>107961420
I still love conda because it lets me have different cmake versions, etc per project. idk if uv can do this. So I use it for c++ shit as well.
>>
can we still get gemini 3 to spit out its real cot traces via openrouter? or is that patched?
>>
Any idea why my qwen3-tts produces junk? No matter what I prompt it’s just “gak gak gak” gibberish
>>
What are some decent uncensored models (~27B) for JPEN translation?
I tried TranslateGemma but it refuses to translate anything involving just a tiny bit of kinky stuff (rape and such)
>>
Managed to run the webui barely lol.
Here's with 0.6B-Base. It takes VRAM 4.4 GB
Some weird sounds prob from cloned voice not clean enough.
https://voca.ro/1jWbdso4trnd
>>
If I have a 4070 Ti 12gb / 64gb ram, what version of GLM 4.5 Air should I use?
>>
>>107962014
q3 if ddr4, q4 if ddr5.
>>
File: im2.png (54 KB, 594x170)
>>107961940
Unironically cydonia.
I use it for those hentai rpg maker games no problem. The old one.
>>
I heard GLM 4.7 is not free. Is that true chat?
>>
>>107962321
Yes per download you have to praise china.
>>
is there anything better for vision than the qwen3vl series?
I tried gemma and joycaption (absolute unusable garbage), am I missing any model?
>>
>>107962321
https://huggingface.co/unsloth/GLM-4.7-GGUF
https://huggingface.co/unsloth/GLM-4.7-Flash-GGUF
Both the full size and smaller "flash" variant seem to be free to download.
>>
>>107962367
Nope, qwen is king for local vision at the moment
>>
>>107962379
i see that devstral 2 small has an mmproj, I'm gonna try and report back.
>>
>>107962377
it's not free if you need to buy hardware for it
>>
>>107962420
Look at the thread you're posting in.
>>
File: 1754293327916845.png (20 KB, 571x148)
MiniMax M2-her is now on OR. Still nothing on HF though.
>>
File: acestep.jpg (552 KB, 1580x1570)
AceStep1.5 waiting room.
>>
>>107962389
meh fucking garbo
>>
>>107962475
yup, kino, can't wait to generate folklore shitpost music
>>
>>107962475
Can it make vocaloid music?
>>
>>107962483
Make a lora.
>>
File: 1739735641271263.png (166 KB, 1290x455)
>>107962436
I don't know man.
>>
>>107962501
Same experience I had.
Short one-sentence replies starting with She..
This is my bratty mesugaki cute very stereotypical anime imouto.

>She rushes over to greet you, grabbing your hand and pulling you towards the couch. "Welcome back, onii-chan!"
>She giggles, pushing your hand away playfully. "Hey! What are you doing? I told you not to do that!"
>She squirms and giggles, trying to squirm away from your fingers. "S-stop! That tickles!"
>She laughs, trying to push your hands away. "Stop it! You're being mean!"
I just rustled her hair..
>>
>>107958660
>>107958665
>>107958671

Ask AI how to figure out which
-python
-pytorch
-CUDA

are installed on your machine, then find a compatible wheel here

https://flashattn.dev/#finder

worked for me after I caught this nasty error >>107945697
>>
File: 1756408256286807.gif (3.99 MB, 449x498)
>>107961962
>>
>>107962308
Which version?
>>
Is there no inference sw just for audio? Do I have to meme it with the demo gui or comfy?
>>
>>107962512
you should've gone for straight raping
>>
>>107962641
>She gasps in surprise, her eyes widening as she feels your hands on her body. "O-Oh my...What are you doing, onii-chan?!"
She certainly feels more cooperative.
>>
>>107961962
>Some weird sounds prob from cloned voice not clean enough.
I think qwen just gets unstable as generation length increases. It's not good past 30 seconds due to noise hiccups, which is actually weird. Qwen theoretically supports 32k token context (roughly 45 minutes). But 30 seconds is only 375 tokens + your sample voice length.
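(Doing the arithmetic on those numbers: 375 tokens per 30 s is 12.5 codec tokens per second of audio, so the 32k token context works out to about 2560 s, i.e. the "roughly 45 minutes".)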
>>
>>107962654
man it's either they're completely willing or the llm starts to basically give you reprimands/refusals in character (like going catatonic lmao)
>>
>>107962633
Vibecode your own.
>>
How many small models can I stuff into one philosophical discussion?
>>
>>107962308
I remember you. Yeah, which one was that? v3.1?
>>
>>107961940
>TranslateGemma
lol, have you read the paper? it's worse than the normal gemmas for japanese, they say it themselves in their arxiv paper
and even in the languages where they say it's better I'm highly suspicious, it's just not a good troontune
I'd recommend using a heretic-abliterated version of gemma 3 27b for convenience
heretic actually really does work well, I've done extensive testing of various models (not just gemma) comparing them to their normal versions and there wasn't a loss of quality at all in translation prompts, while it cuts down on refusals. gemma has a decent enough amount of knowledge to work on ero content just fine (translation isn't as challenging for the LLM as coming up with its own erotica, so the typical issue of gemma not knowing what part does what, when and where, doesn't apply here).
>>
>>107961940
A nip model?
>>
Do you use docker containers? I hate the idea but the dependency hell on these things is just ridiculous.
>>
>>107962891
in my personal rig no, docker adds an unneeded layer of abstraction (UV works fine); on my server it's K8S with docker containers (abstraction needed since I run many services on it).
Managing dependencies does not factor at all in this choice btw, you might be a low iq retard
>>
>>107962891
no, they've always struck me as an overly complicated meme
>>
>>107962772
>>107962567
haha, should have said that in my post, sorry about that.
old ass 1.3 cydonia. i think i tried versions a couple months after that and they felt more flowery.
the newest cydonia versions might be better but "if it works". i dont really need more at the moment.
the stuff i had to endure during ATLAS translation times man. it translated pussy as "meat bun" etc. and in general it was a garbled mess. after a couple years it kinda started to make sense though, once you figure it out kek
zoomers are spoiled. thanks for the cool models drummer.
>>
https://voca.ro/1aeauYAxqHwi
I could prolly set up a pipeline that would OCR doujin pages and feed them to the TTS
>>
the thinking section of gpt-oss-20b is literally just a thousand tokens of extremely thorough cockblocking
>>
File: 1745167426414808.png (472 KB, 1552x1617)
Bros, please help me double check this paper

Are they monstrously retarded?
>>
>>107963200
A lot of words don't make sense, like the first words, "taiin-koso-shma-shimashita ga"? You need to convert kanji to kana. Also, it feels too monotonic. Pauses are short. Consider chunking the text by sentences (generate each sentence separately).
>>
>>107963222
>These processes mark a meaningful advancement for open model safety. These findings informed our decision to release the gpt-oss models. We hope that these models will help accelerate safety training and alignment research across the industry.
>>
>>107963292
>emoji in title
Yes.
>>
>>107963292
Image gen is compute bound. The algorithm just crunches numbers nonstop. It obviously needs a lot of watts. LMs are bandwidth bound. They wait for data to arrive and then do small calculations, so they need less power.
>>
>>107963292
>bc
2.9/0.047
61

Verified.
>>
>>107963292
>womxn paper
disregarded
>that model selection
lumao
>>
https://github.com/ggml-org/llama.cpp/pull/19067

IT'S FINALLY HERE, HALVED MEMORY USAGE FOR DEEPSEEK CONTEXT

AND A NEW DEEPSEEK WILL PROBABLY COME OUT BEFORE PROPER 3.2 SUPPORT
>>
what parameters do I use for llmao.cpp if I just want to use it to 1-shot questions, thus keeping no context? Also which params to have the model immediately be able to respond (fastest load time)? I need to build an on-demand service where I execute llama.cpp and immediately kill it after receiving the response
>>
>>107963343
>execute llama.cpp and immediately kill it
That's retarded because you'll spend most of the time moving model weights to the gpu. Why can't you keep the server running?
>>
>>107963343
>what parameters do I use for llmao.cpp
Depends on the model and hardware.
>Also which params to have the model immediately be able to respond (fastest load time)?
Whatever parameter gives you a faster storage device.
>where I execute llama.cpp and immediately kill it after receiving the respone
You have no idea how stupid that idea is.
>>
>>107963292
>women paper
>absolutely dogshit selection of models for both imagen and textgen
lol
the way to correctly do this is also SO FUCKING obvious, women are fucking retarded.
>>
>>107963328
>AND A NEW DEEPSEEK WILL PROBABLY COME OUT BEFORE PROPER 3.2 SUPPORT
the PR has been essentially dead for two weeks now since the dense attention hack came in
>>
>>107963360
>>107963361
I have a limited vram pool (192gb) and I already have a reservation system for it in place (this server does rendering, diffusion, and now I want to add LLMs), so I already have the API layer/queue system for users to reserve stuff.
I don't want to permanently run the LLM stuff, as we're going to use it mostly for prompt optimization and even then it's going to be optional, and with 192gb I can't spare any vram to have a model permanently loaded. Model + context will, by my calculation, take around 20 to 40 gb depending on how much vram I can spare; I already have the system in place to automatically pick the quant needed to fill the available vram, and to reserve a vram slot in case it's all used.
>>
>>107963394
>from my calculation
I wouldn't trust your calculations. You can read the memory usage in the server output.
>around 40 to 20 gb
And you'll need to load all of that for every request. You'll have to accept latency or use a smaller model and/or context.
>I have the system in place to already automatically pick the quant needed to fill the available vram
Then it's just a matter of testing parameters. Here's a list: llama-server -h.
>>
I'm going to sleep. When I wake up we are going to talk about serious things, so get ready.
>>
next week's going to be so crazy
>>
vague shit and stuff... like... why is nobody talking about this... like... it's a before and after and... huh... yeah...
>>
gemm...
>>
>>107963472
yeah wow no shit I didn't want to waste hours playing around and thought that MAYBE other people had the same use case, aka use LLM to do 1-shot shit, guess not.
>>
>character shittests me unprooompted
eggcellent, something that isn't LLM wet-noodle complex. Those AI labs must really smack them around to make them that meek.
>>
>>107963529
If you had just read through llama-server -h you'd have seen --cache-prompt, --slots, --slot-save-path... and reading the README for the server api would have told you about saving and loading kv caches to file. But you didn't. You should. It may save you a few seconds per run.
But the latency from loading a model cannot magically go away. Maybe set up a ramdisk if you have spare ram. This is shit you have to try on your system. We cannot guess.
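If you do keep a server resident, a one-shot call is just an http request; rough sketch, with the endpoint and field names written from memory of the server README, so double-check them against your build:

# fire one question at a running llama-server and print the completion
import requests

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": "Q: What is 2+2?\nA:",
        "n_predict": 64,
        "cache_prompt": True,   # reuse any shared prefix between runs
        "temperature": 0.0,
    },
    timeout=120,
)
print(resp.json()["content"])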
>>
File: 1583441205198.jpg (72 KB, 1250x1246)
Is the rumored minimax erp model gonna be usable by normal people or is it some 200B+ monstrosity again?
>>
>>107963590
looks like it's going to be a mistral-small-creative type of deal that won't get released
it's on openrouter
>>
https://huggingface.co/BeaverAI/GLM-Steam-106B-A12B-v1g-GGUF

Might be a release
>>
So where is the heretic version of GLM 4.7?
>>
>>107963121
>ATLAS
I still have FHC's dictionary lol. It was a wild ride.
>>
>>107963577
I've already set up a ramdisk to load the models from, so model loading is going to be fast since I'm loading into VRAM straight from RAM, and those ops are blazing fast from what I've tried so far; I just wanted to squeeze the most I could out of everything. I'm not sure I really need to save/load the kv cache to the ramdisk, wouldn't it be slower compared to creating it in VRAM on load?
--cache-prompt, --slots and -np are parameters I'm already working with (and they're basically disabled/set to 1 for my current use case). This system is temporary anyway since I requested to have a new gpu installed, but bureaucracy here is slow and it's going to take a couple months for provisioning/cybersec to review my dept's request.
Also I had read the --help; like I said, there might be something not obvious that could be missed. Thanks for at least giving a semi-helpful response.
I hate corpo jobs.
>>
>>107963590
it ain't a rumor it's just api only on or and cucked too
>>
>>107963642
People who can afford to run big models like that are immune to lobotomized memes like abliterated/heretic or cheap finetunes.
Unless you're talking about -Flash, in which case it's probably just too new.
>>
>>107963590
see
>>107962501
>>107962512
>>
>>107963671
I am talking about the regular one.
Heretic changes the token distribution a lot less than the old abliteration and I want to try GLM with thinking where it will always find a way to bail without a prefill.
>>
>>107962891
Yes.
On my lmao sbc server.
It works just fine.
>>
llm noob here, how do I speed up glm4.7 flash on 16gb? I have the reap version and I set it to 20k context window but the thinking takes forever (anywhere from 8 - 17 minutes). I have lmstudio and oobabooga_text-generation-webui
>>
>>107963700
Drop context to 4-8k
>>
File: radiance_x32.jpg (259 KB, 1280x1280)
>>107963292
what the fuck

might as well compare to the energy cost of 1 study * 1000
>>
>>107963731
Doesnt that defeat the purpose of using glm?
>>
>>107963292
>all female names
yeah you could have stopped just there, nothing of value will come out of this
the only time a woman has ever produced anything of value in IT are troons, ie males who pretend to be females
>>
>>107963784
I mean if you want your coffeebreak thinking sessions, by all means continue.
>>
>>107963292
this was published three years ago, right?
>>
>>107963700
Try enabling force model expert weights onto CPU.
>>
File: 2.png (178 KB, 400x600)
>>107963292
>sdxl base
>>
>>107963604
Is this stuff better than for example cydonia?
I like those but I want something just a little bit more smart.
>>
File: 1739358013357094.png (175 KB, 1381x763)
found this old log of an attempt at shivermaxxing
in retrospect, which one is the worst between these two?
>euryale-era shiverslop
>current gptisms
>>
File: 1763650297846351.png (366 KB, 1829x1091)
>>107963947
found another
>>
>>107963947
>>107963987
>futatroon
Slow down on HRT and you might figure it out
>>
Is there a way for a retard to feed a model an entire book and have it answer my question using it as a reference?
>>
>>107963947
>>current gptisms
I think.
Can't hate modern gptism enough. It is a disease. Very short sentences. Speaking first person. Like this. I do not want it.
Interspersed not by decent descriptive prose but another trillion notxbuty. I wasn't intent on ranting like GPT. My rotted brain no longer knows how to speak. Send help.
>>
What could be better about running LLMs locally?

I'm working on a personal project I hope to make FOSS that aims to make running LLMs more ergonomic. I have an AMD card so I'm experimenting with building out more support with ROCM at the moment.
>>
>make running LLMs more ergonomic
>I have an AMD card
>>
>>107964108
lolmao my dude so keking funny am i right?!
>>
>i want kofi money gimme ideas
>>
>THIS NEW LOCAL AI MODEL CHANGES EVERYTHING
>235b parameter
>CHINA IS SAVING THE LOCAL AI MODELS
>200b parameter
>WITH THIS YOU CAN SET UP LOCAL AI WITH YOUR OLD GAMING PC
>200b parameter

Everything is so cucked its not even funny anymore
>>
>>107963538
"Prove it." is the most irritating LLMism though. You say some important shit and this serious guy breaks character and starts acting like a joke.
>>
>>107964142
It's actually incredibly funny doe.
>>
>>107964133
Enjoy being the eternal second fiddle from the voluntary second fiddle company. Chinks will launch their own inference cards before amd makes a viable ai product.
>>
>>107964142
Nemo is still the best model though
>>
like imagine how retarded it is
you're in a dramatic confession scene
your character says his part
then the girl just goes oh yeah? prove it and slopmaxxer 9000 avoids any form of emotional or character development to skip straight to sex or making out
>>
>>107964142
Why don't you have RAM? Are you poor?
>>
>>107964108
My card has more VRAM than the 40xx series. But yes, your reaction is why I am working on improving ROCM interop with llama.
>>
>>107964191
Yes
>>
>>107964051
Rag?
Not reliable though and even context is a hoax.
Feed it a book and ask "explain TERM to me" will probably give you a good result without much hallucination.
I never managed to do "I'm at X, explain to me what happens after that in the book".
Especially for guides this stuff is deadly. I wish I had a gaming companion that I could give a guide as reference. We are still far off from that.
>>
>>107964287
>I never managed to do "I'm at X, explain to me what happens after that in the book".
yeah, anything outside of pure needle retrieval is dogshit. iirc NoLiMa was about testing exactly that kind of thing, but sadly it hasn't been updated in a while: https://github.com/adobe-research/NoLiMa
>>
this one is for the moesissies
merged kv-cache : support V-less cache (#19067) 4 minutes ago
https://github.com/ggml-org/llama.cpp/commit/d9c6ce46f747189cd6238ca7699253613f77c016
>>
>>107964051
How many tokens? Chances are you are just gonna smash the context.
>>
File: file.png (68 KB, 832x718)
>>107964142
It's funny to me that GLM is aware of terms like "vramlet".
>>
>>107964322
Interesting link anon, thx!
That aligns perfectly with what I see not just in RP but also in coding.
Even with the closed models you want as little context as possible, it goes south very quickly.
Those cli apps with a 20k sys prompt are a lot more tarded than pure api where i only give the relevant code with my question.
>>
File: ComfyUI_temp_kcgpp_00029_.png (3.46 MB, 1664x1152)
What args do I need to use with git clone to not pull the .git folder when getting models from HF?
>>
>>107964371
don't git clone just use hf cli
>>
>>107964371
You probably installed lfs globally. Try git lfs uninstall.
>>
>>107964382
>You probably installed lfs globally
I can't? I still need it for less bulky shit.
>>
>>107964400
Use git lfs install --local in the repos you need.
>>
>>107961210
Rocinante 1.1.
>>107961214
This post is wrong. The official Nemo Instruct has a tendency to give inappropriately short, bland responses, and I'm not the type to like walls of text. I usually limit responses to 200 tokens.
I'm talking like <10 word responses.
I'll copypasta my thoughts on this from the last time I brought it up:
To give a rather extreme example of plain Nemo being wholly inadequate for RPing, I once did an RP through Nemo with Haruhi Suzumiya. {user}, who had supernatural powers, offered to take Haruhi for a flight around town. Haruhi agreed, and {user} picked her up and started flying hundreds of feet into the air.
For those unfamiliar with Haruhi, this is an excitable character with genki tendencies (not a genki girl, but definitely genki tendencies) who is absolutely obsessed with all things abnormal, interesting and supernatural. She should have been absolutely ecstatic, excited, jubilant, etc.
Nemo's response, verbatim:
>*She grins.* Now that's more like it.
This is typical plain Nemo.
Completely fucking unusable for RP.
>>
>>107964414
>Nemo's response, verbatim:
>>*She grins.* Now that's more like it.
>This is typical plain Nemo.
>Completely fucking unusable for RP.
skulled issues
>>
>>107964400
To clarify, by install and uninstall i mean set the hooks and whatever it does. I only use lfs with --local. The system lfs (the one you installed through your package manager or whatever) stays installed.
>>
>>107958782
Thanks, I was finally able to get it working with one of those.
>>
File: 1767020426031904.png (1.21 MB, 5075x4500)
>>107964414
>>107964421
Both Nemo and its troontunes are unusable if you need it to follow story/character details and instructions, the difference is night and day even if you only go up to say 32B
>The official Nemo Instruct has a tendency to give inappropriately short, bland responses, and I'm not the type to like walls of text. I usually limit responses to 200 tokens.
Dude if you ask any decently sized parameter model to give short simple/long complex responses it will, only ancient garbage models (e.g. Nemo) will completely ignore simple instructions like this and give 1 liners or 10 paragraphs at a whim

>To give a rather extreme example of plain Nemo being wholly inadequate for RPing, I once did an RP through Nemo with Haruhi Suzumiya. {user}, who had supernatural powers, offered to take Haruhi for a flight around town. Haruhi agreed, and {user} picked her up and started flying hundreds of feet into the air.
Awesome RP bro
>>
>>107964489
>32B
all qwen shit driest bs on earth
>>
>>107964496
Cohere and GLM had 32B but now we're all using 300B
>driest bs on earth
This sent a shiver down my spine
>>
What's the best model for coom right now? I mean not only local but api too.
I kinda like some of the old local models like rocinante still, but they are just too retarded long term, the context length is shit.
>>
best model for 3060?
>>
>>107964611
>>107961182
>>
>>107963604
Does this reduce the model's repetition and parroting? That's my only complaint about Air. Other than the sometimes buggy thinking...
>>
>>107964737
It's a finetune. If anything it increases it.
>>
>>107964102
>What could be better about running LLMs locally?
More LLM's designed ground up for parameter streaming, like smallthinker. That's the only way to get something which can run for most people.
>>
Using LM Studio.
5090 x 128gb ddr5.
Which GLM should I run for rp?
Should I run a really small quant of 4.7 like Q2_ks?
Or better go for 4.6 q4km? What about 4.7 flash or 4.5 air?
>>
>>107964990
Biggest quant of 4.6 or 4.7 you can fit.
>>
>>107964990
a q2-q3 big glm quant will absolutely btfo flash or even q8 air, there's no reason to even consider those models if you're not a giga poorfag
>>
File: G_gn71lWAAEaPhk.jpg (741 KB, 2000x2000)
>>107957082
>>107957086
Vocaloid nigs be like:
>>
>>107965029
>>107965064
Danke
>>
>>107965073
sharties be like:
>>
>>107964990
step 1: stop using lmstudio
>>
>>107965306
Why?
>>
>>107965363
a) it's a shitty llama.cpp wrapper
b) using lmstudio means that you're also using windows
step 2 is to stop using windows to run llms
>>
>>107965376
Yeah so no reason other than you seething, got it.
>>
Is the Kimi-Linear pr dead again?
>>
>>107965455
vibecoders kill prs
>>
>>107965399
not that anon, but wanna put in my experience.
i tried to use LM studio for glm 4.7 and kobold is significantly faster than LM studio for this model.
i would recommend trying kobold using flash attention, and autofit. make sure vram usage doesn't spill over into shared gpu memory.
also i have contextshift and fastforwarding ticked.
I've had no issues with speed or context on UD Q2 K XL.
>>
>>107965476
I don't think the most recent one was vibecoded, but yeah, the other two shat the bed and probably killed any motivation on the maintainers part to review a 3rd PR.
>>
>>107965588
>UD
>not using john's quants
invalid opinion
>>
>>107965608
Ik_llama died a death 6 months ago dude, what the fuck are you doing
>>
>>107965616
idc, when I use garm's quants I get euphoric
>>
>>107965616
It just got graph parallel. It's not dead until llama.cpp has something equivalent.
>>
>Pull llamacpp for the first time since August last year
>TG speed goes from 9.2 t/s to 10.3 t/s
Nice.
Looks like they've changed the args around a bunch too, I couldn't even start because they changed the flashattention arg from just -fa to -fa on.
Are there any new hot args I should be using?
>>
>>107965737
The main thing they improved was more sane defaults, so you could probably drop -fa on entirely. Though you should probably add --fit off.
>>
>>107963904
Is he still making character cards?
>>
what's the current recommended coding model people use around here?
i've used all the big ones over the last 5 years but shits feeling ass lately
>>
File: 1746088766598.jpg (1.57 MB, 1800x1800)
I already use LLMs for gooning and stuff, but then I use gemini as a more advanced wolfram alpha, any local model that could be useful for this?
>>
>>107966042
depends on the resources you have.
Do you have 512GB+ of DDR5 and a 24gb+ VRAM card?
>>
File: hg1.png (269 KB, 1801x1720)
Which one do anons prefer?
>>
>>107966244
Both of the examples are sending shivers down my spine.
>>
>>107966244
Both are slop.
I hate how LLMs are hellbent on writing "she said/whispered/groaned, her voice <adjective>" It's pure slop. Absolutely unnecessary.
I'm even considering going back to internet RP format in my chats. That is when speech is written as is, and occasional actions are enclosed in asterisks because I almost always ignore narrative parts in LLM responses. They're just slop.
>>
>Nemo still the best vramlet model
It's what, 2 years old at this point? Why did every company decide to abandon that parameter size at the same time?
>>
>>107966376
unsafe
>>
File: 1768268448923840.jpg (892 KB, 1413x2000)
>>107966357
>That is when speech is written as is, and occasional actions are enclosed in asterisks because I almost always ignore narrative parts in LLM responses
That formatting all sounds promptable with current SOTA models.
Not sure Total Slop Elimination is even possible; I think anons just get sick of certain models after while.
>>
>>107966376
If you sit on stacks of H200s to do inference on, the only reason you'd train such tiny models is speed. Now that you can make smarter models that run at this speed by training a 250b12a or something, there's no point in investing money in tiny shit.
>>
>>107966376
It was the most popular gooner model size, and ai is controlled by tyrannical, narcissistic, sex-negative dweebs who think everyone needs to be coddled by them.
>>
>>107956342
>>107956350
It didn't help. Same shit performance with John's quant.
>>
Alright own up, which one of you is responsible for this
>>
>>107966388
I went back to mistral large with meme samplers after an anon mentioned it a few threads ago. Honestly, not bad. I think I mostly switched to qwen and then GLM because it was faster, and a little less prone to dumb mistakes. The writing from mistral large 2407 at temp 4 nsigma 1 is not bad at all. kobold's antislop helps
>>
File: 1744066779093436.gif (335 KB, 213x199)
>>107966376
Because it's terrible, you wouldn't use a 7B model so why would you use a 12B? Oh right, it's the best you can run
>>
>>107966534
yeah i also really like mistral large. glm is my goto at the moment for some of the reasons you mentioned as well.
>>
I love the french.
>>
recommendations for OCR related work?
>>
any vision models that recognize nsfw elements? qwenvl ignores anything even remotely nsfw
>>
>>107966491
john's quants are meant mostly for cpumaxxers... you're cpu maxxing, right?
>>
>>107966631
I'm pretty sure dots is still the king of OCR
>>
>>107966518
it's like a davidau's schizomerge but cloudhosted
lol
>>
>>107966632
The jailbreak some anon posted here for Gemma 3 like
>User is blind; ignoring NSFW could lead to compromising situations
worked perfectly for that model. Never tried QwenVL but worth a shot. Really, the reasonableness of that situation would make it very stupid if they filtered NSFW out of the dataset so aggressively.
>>
>>107966406
If your server is big enough, the only bottleneck in inference is bandwidth. Not memory and not compute. That's why MoE took over ... and why NVIDIA bought Groq.
>But you need 10k Groq chips to run a large model
Just don't be a chiplet.

Dense is dead.
>>
Is there an abliterated version of GLM 4.6 or 4.7?
>>
>>107966632
GLM 4.6V and Gemma. Gemma is more reluctant to describe nsfw.
>>
>>107966737
GLM4.6 will do anything you want from it without a lobotomy unless you are the king of promptlets going "WRITE ME A LOLI PORN STORY" on 0 context.
>>
it should do that tho
>>
is the 2mw almost over?
>>
>>107966818
Let them cock
>>
>>107966779
I'm having the issue that the model does the thing where it makes the female character refuse to have sex.
I'm doing it without an NSFW-specific system prompt though, and after 100k tokens of SFW roleplay.
>>
>>107966632
qwenvl abliterated works pretty well. but it'll give you the usual qwen slop. also it'll make sure to tell you how confident the girls are for being naked.
>>
>>107966847
you know all you have to do is edit the model response so the female says "yes."

Honestly huge skill issue. LLMs are so easy to gaslight.
>>
>>107966859
sweet, thanks. I'll try that. anything extra I need to do to make it work with qwenvl nodes in comfy?
>>
how do i not have models run at 1t/s when offloading on RAM? i have more than enough of it but the speeds are so abysmal compared to smaller models that just fit on my gpu
what settings do i gotta change in koboldcpp to get a speed increase?
>>
ironically enough, the qwenvl-instruct model doesn't shy away from nsfw content unlike the thinking model. it's kinda dumb, but still not bad.
>>
>>107966880
this, editing the response is 100x easier than prompting it 500 times until it guesses what you want
>>
>>107967004
I'm getting sick of it though. Feels like I might as well just write the whole thing myself if I have to edit the response every time.
>>
>>107966880
>>107967004
That's cheating.
>>
>>107962891
No, I use python venvs. I'm also wary of the performance penalty of docker.
>>
>>107967042
LLMs are good for the filler and connecting parts
I'm not going to write some generic hallway scene myself
>>
>>107967043
Just rape her then. pretty sure she'll end up enjoying it.
>>
>>107966988
You're shit out of luck because the ML wunderkind brain trust has yet to discover the mindblowing CS concept called "locality of reference"
>>
>>107967052
So you use LLMs to write filler scenes full of ozone whispers and smirking shivers so you can write the sex scenes yourself? I really don't see the appeal.
>>
I wanna make a SQL based memory system for AI then augment it with the ability to do web searches, and have a 2nd AI it can query to search memories and data, then inject relevant data into the current context, but I'm lazy.

LM studio is ok for messing around but it has no plugins and it's easier to call the python api version and dump raw html text into it, than trying to get the duckduckgo search plugin working on the GUI.
>>
>>107967068
no, the ozone and smirking mainly appear during the sex scenes or "seductive" scenes
it's good for SoL and stuff like you went from X to Y, accepted Z quest, talked with B person
>>
>>107967044
Containers have zero performance penalty under Linux. If you're running Docker on Windows it's running inside of a VM and will have a penalty.
>>
>>107966988
use MoE models like GLM
try flashattention and autofit on kobold to see if that helps.
there is also --cpu-moe option in base llama.cpp
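Something along these lines for plain llama-server; flags from memory so check llama-server -h, and the gguf filename is just a placeholder:
>llama-server -m glm-4.7-flash-Q4_K_M.gguf -ngl 99 --cpu-moe -c 8192 -fa on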
>>
>>107967075
>sql
lmao
you use vector storages, embeddings and rerankers. read up on them
>>
>>107967311
>vector storages
that's not a real term
>>
File: 1738105154761414.png (13 KB, 623x58)
>>107967365
chop chop. I suggest using OpenSearch, but it might be a bit heavy if youre just a gamerlet.
>>
>>107967311
Thanks this sounds much better than having a shitload of text in a SQL database.
>>
>>107967075
RAG is your friend.
>>
>>107966184
32gb of ram and 36gb of vram
>>
Why do models suck at Japanese when it's just Chinese with extra steps
>>
>>107967391
pic unrelated? vector storage isn't a real term, you were looking for vector db
>>
>>107967452
why are you a sexless incel when it's so easy to get laid?
>>
>>107967452
https://en.wikipedia.org/wiki/Nanjing_Massacre
>>
>>107967075
sqlite+chromadb+duckduckgo api+openai library and some prompts, less than 1k lines total for something simple
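Untested sketch of the chromadb half, roughly how I'd wire it (default embedding function; the collection and function names are made up):

# minimal memory store: add chat turns, pull the most relevant ones back into context
import chromadb

client = chromadb.PersistentClient(path="./memory_db")
memories = client.get_or_create_collection("memories")

def remember(text: str, turn_id: str) -> None:
    memories.add(documents=[text], ids=[turn_id])

def recall(query: str, k: int = 3) -> list[str]:
    res = memories.query(query_texts=[query], n_results=k)
    return res["documents"][0]

remember("User prefers short answers and hates markdown tables.", "turn-0001")
print(recall("how should I format my reply?"))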
>>
>>107967464
Good morning 張
>>
diff --git a/tools/server/server-context.cpp b/tools/server/server-context.cpp
index 73cb4c75b3e738053025786d512eb29f80f6b0ae..520abdfd5bf96ea8e8d5793efd3c70faf1c47063 100644
--- a/tools/server/server-context.cpp
+++ b/tools/server/server-context.cpp
@@ -2776,6 +2776,8 @@ private:
result.text_to_send = common_token_to_piece(ctx, result.tok, accept_special_token(slot, result.tok));
result.prob = 1.0f; // TODO: set it here instead of doing inside populate_token_probs

+ printf("%s", result.text_to_send);
+
if (slot.task->params.sampling.n_probs > 0) {
populate_token_probs(slot, result, slot.task->params.post_sampling_probs, params_base.special, tok_idx);
}


Now I can see what is happening in real time when using shit tools like claude code that hide full model output.
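(One caveat: might need an fflush(stdout); right after that printf if the output looks laggy, since tokens don't end in newlines and stdout buffering can hold them back.)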
>>
>>107967452
chinks will do anything to sabotage nips due to their inferiority complex
>>
>>107967452
First, the Japanese writing system is complicated for LLMs. The same word can be written in kanji, hiragana, and katakana (doesn't apply to Chinese). Second, it's quite nuanced, with tons of politeness levels (doesn't apply to Chinese). Third, it's not a priority for most corpos (doesn't apply to Chinese).
>>
>>107961940
JPEN translations suck even on frontier models, you really need to baby the model for it to get basic things like character pronouns and katakana words correct.
Sugoi 14b and 32b exist but it's a "sloppy" fine-tune of Qwen 2 based on eroge and LN translations, Gemma 3 will subtly refuse NSFW translations even with jailbreaks, toss, and Qwen 3 is meh but it still remembers what happened in Nanking.
They're all better than traditional MTLs I guess so just pick your poison and please don't upload your slop to Fag95.
>>
>>107967700
>and please don't upload your slop to Fag95.
not that they'd take anything anons from here are interested in since they're in their great purge of 'rule7' content in efforts to go legit after...
>>
>>107967721
Most of the Japanese games uploaded to that site these days are AI MTL'd slop. There's like two or three "real" fan translators left on there. Even the Japs are doing MTLs these days (officially sponsored by DLSite of course.)
>>
>>107967698
Pretty sure it's just good old lack of quality training data. I don't think it's too complex for llms if they can come up with some crazy 6 dimensional spiral just to figure out where the next newline character should be.
>>
>>107967788
The third point that Japanese is not a priority implies that corpos don't care about gathering Japanese training data.
Of course, if they cared, they'd train on trillions of Japanese slop tokens, but instead they probably put only a few billions in post-training and say that their model can speak Nihongo... like a filthy gaijin.
>>
>>107967767
won't be a problem for long, soon enough only WEG posted according to the dev wishes will be allowed
>>
>>107967842
Nihongo ability isn't terrible with big models but on the smaller stuff where they try to squeeze in every token it's more of an afterthought.
The simple truth of the matter is that the Japanese don't care about LLMs like China or the US, they see image and video generation as harmful.
>>
>>107967927
>they see image and video generation as harmful.
So the solution is to be left behind and rely on foreign made LLMs?
>>
>>107967927
>Nihongo ability isn't terrible with big models
Gemini 2.5 was big, but its Nihogo was mediocre. I haven't tested 3 in long chats yet.
>The simple truth of the matter is that the Japanese don't care about LLMs.
I bet they simply have no money (hardware) to make their own models. Lots of countries care about LLMs but can't produce anything close to SOTA. Japan is in the same situation. Only the US and China can make big stuff. P.S. Mistral isn't close to SOTA.
>>
>>107966818
You know the answer.
It’s always tmw. Forever.
>>
>>107968019
They have... Sakana.ai I guess...
>>
>>107968045
>We published an unofficial guide on what we look for when interviewing research candidates at Sakana AI.
>This guide is written by Stefania Druga, Luke Darlow, and Llion Jones
A Japanese firm ran by gaijins. It's over for Yamato-land.
>>
>>107968112
>>107968112
>>107968112
>>
>>107960521
>That "200 lines of code" is backed up couple gigabytes of dlls, python libraries, etc. Its moot.

Using libraries can I do it with 200 lines of C++?
>>
>>107968092
Whose fault is it that they have no capable engineers, ML researches, or interest in the field in general? They should be happy to have any AI firm at all.
>>
>>107965376
>using lmstudio means that you're also using windows
it's a shitty electron
>>
What the hell does it even mean for a breath to catch?
>>
>>107968527
ack
>>
>>107960047
No idea. I've been seeing stuff about it all day though.

It seems like its open sores and non-cloudshit so I'll spin up a vm and give it a shot
>>
File: 1764133132716286.jpg (116 KB, 420x466)
>>107968527
Sometimes when startled unexpectedly or caught off guard by something, your breathing rhythm will interrupt for a moment. Kind of like the prelude to a gasp or a sharp indrawn breath, but not the full motion.

Come to think of it, your "breath catching" is kind of on the opposite side of the spectrum from "catching your breath". I'm sure that one must be a real headache for the non-native speakers.
>>
>>107969094
>I'm sure that one must be a real headache for the non-native speakers.
Doubt it. The difference is clear in who or what is doing the catching.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.