/g/ - Technology

File: smuglmggodess.jpg (53 KB, 1200x675)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>108018078 & >>108006860

►News
>(01/28) LongCat-Flash-Lite 68.5B-A3B released with embedding scaling: https://hf.co/meituan-longcat/LongCat-Flash-Lite
>(01/28) Trinity Large 398B-A13B released: https://arcee.ai/blog/trinity-large
>(01/27) Kimi-K2.5 released with vision: https://hf.co/moonshotai/Kimi-K2.5
>(01/27) DeepSeek-OCR-2 released: https://hf.co/deepseek-ai/DeepSeek-OCR-2
>(01/25) Merged kv-cache : support V-less cache #19067: https://github.com/ggml-org/llama.cpp/pull/19067

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers
https://rentry.org/MikupadIntroGuide

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/gso.html
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
finally a good /lmg/ thread
>>
Slow
>>
Playing online competitive vidya with Kurisu
>>
Hope we get local image editing that's good soon enough.
>>
Is there a good way to prompt a Qwen3-TTS voice clone to alter the input voice? There doesn't seem to be an instruction field for voice clones.
I've been adding things like "I speak in a vulgar Brooklyn accent" to the text, but the results are inconsistent.
>>
File: 1764198273260559.png (699 KB, 1608x842)
>>108033045
posting in /lmg/ with Kurisu
>>
File: mikuthreadrecap.jpg (1.15 MB, 1804x2160)
►Recent Highlights from the Previous Thread: >>108024966

--Periodic scale fluctuations in ablation and KL-divergence optimization with Grimjim's script:
>108031303 >108031333 >108031376 >108031553 >108031632
--KL divergence analysis of quantized models across tasks:
>108027495 >108030271 >108030306 >108030329 >108030523
--Qwen3-ASR-1.7B release and discussion:
>108028990 >108029015 >108029057 >108029600
--4chan data may improve model performance despite noise, as shown by UGI scores:
>108029607 >108029629 >108029707 >108030676 >108030771 >108030833 >108030898 >108030927 >108031032 >108031113 >108031136 >108031162 >108031183 >108031178 >108031191 >108031206 >108031246 >108031157 >108031181 >108031597 >108031629 >108031731 >108031812 >108031840 >108031856 >108031774
--High-end Linux workstation with EPYC CPU, RTX PRO 6000, and 1.5TB RAM for LLM inference:
>108025075 >108025170 >108025180 >108025184 >108025203 >108025211 >108025269
--High temperature sampling destabilizes safety filters while preserving coherence with controlled topK:
>108030500 >108030564 >108030594 >108030675
--DIY e-waste PC runs Gemma 3 27B with dual RX 580s and E5 CPU:
>108026825 >108026966 >108027101 >108027045 >108032802 >108032818 >108027089 >108027099
--AceStep 1.5 not designed for one-click song generation:
>108030932
--Quantization tradeoffs for recreational model use in KoboldCpp:
>108026206 >108026225 >108026259 >108027094
--Critique of OpenCode's agent framework flaws and search for better alternatives:
>108025047 >108026048 >108026212
--Hypothetical VRAM bank switching for single GPU to simulate multi-GPU behavior:
>108027183 >108027202 >108027324
--AMD GPU Vulkan performance update in KoboldCpp recommends switching from ROCm:
>108028638
--Logs: Kimi K2.5:
>108030736
--Miku (free space):
>108027403 >108027518 >108028068 >108028181 >108028279 >108029812

►Recent Highlight Posts from the Previous Thread: >>108024972

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
Ah yes, finally. It's Kurisunday.
>>
Are there any <8GB models with RL training done with GLM4.7 outputs?
>>
File: ylecun.jpg (222 KB, 1200x1271)
>>
File: 1763873272899922.png (647 KB, 995x1080)
>>108033227
>>
Got Echo-TTS working locally, replacing torchaudio and torchcodec with soundfile and soxr (both of which turned out to already be transitive deps). I COULD have just installed FFmpeg (no thanks to torchcodec's meaningless error messages), but I ripped out Meta's pointless bloated shitty wrapper libs on principle.

Hadn't appreciated from the web demo how fast Echo is. Back-of-napkin says it could run 30% faster than real-time on dual-channel DDR5 CPU. It's a VRAM hog at 15 GB, so to run alongside an LLM you'd either hope for VRAM paging to work, or get Echo running on CPU.

Not quite as expressive voice as Index-TTS, but better in every other respect.
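For anyone wanting to do the same swap, here's a rough sketch of the torchaudio/torchcodec-free audio path, assuming python-soundfile and python-soxr are installed; the 24 kHz target rate and function names are my own placeholders, not Echo's actual internals:
[code]
# Load/resample with soundfile + soxr instead of torchaudio/torchcodec.
import numpy as np
import soundfile as sf
import soxr
import torch

def load_audio(path: str, target_sr: int = 24000) -> torch.Tensor:
    data, sr = sf.read(path, dtype="float32", always_2d=True)  # (frames, channels)
    mono = data.mean(axis=1)                                    # downmix to mono
    if sr != target_sr:
        mono = soxr.resample(mono, sr, target_sr)               # stands in for torchaudio resampling
    return torch.from_numpy(np.ascontiguousarray(mono))

def save_audio(path: str, wav: torch.Tensor, sr: int = 24000) -> None:
    sf.write(path, wav.detach().cpu().numpy(), sr)              # stands in for torchaudio.save
[/code]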
>>
>Arcee Trinity Large TrueBase ggufs are out
Finally, time to abandon the assistant slop era and return to when LLMs were good
>>
Not sure if this is the right thread but are there any models for generating video from images people here recommend? I looked through the catalog but didn't see a more appropriate place for this question.
>>
>>108033281
>>>/g/ldg
>>
I am trying to build a dataset to train a local model. Is there anything else that rivals DeepSeek for intelligence per $ for dataset generation and curation right now? This is local model related (dataset creation, training), but generating good amounts of data using local models would take way too long.
>>
>>108033669
By train, I mean finetune.
>>
I finally had time to play with qwen-tts this weekend. I'll test it for a while. It is more expressive, but it doesn't handle books as well and takes a lot longer to generate audio than kokoro.
>>
>>108033248
Good to see other anons porting popular TTS engines away from pythong. I've been doing the same. Fuck pythong.
>>
>>108033669
kimi k2.5
>>
>>108033669
There's a dataset out there made up of og 4chan /pol/ posts. That will increase your llm's iq by at least 6000000 points sar.
>>
>>108033851
yeah it will https://www.reddit.com/r/LocalLLaMA/comments/1qsrscu/can_4chan_data_really_improve_a_model_turns_out/
>>
>>108033836
Output price is still 6x more per million tokens ($0.42 vs $2.5).

>>108033851
Sir I have already redeemed many american dollars of tokens on DeepSeek in the past few days which is why I'm looking for alternatives as I am not made of Google Play cards.
>>
>>108033916
k2.5 is way better than the most recent deepseek
>>
>>108033931
Good to know, I might try one last pass with it then.
>>
File: bruh.png (10 KB, 396x70)
>>108033902
>>
>>108033252
Is true base as retarded as instruct?
>>
>>108033943
I'm having trouble with the stars, that shit easily takes up 5 seconds, and 10 seconds if they repeat the test. At least the squares are visually and symmetrically distinct.
>>
File: file.png (3 KB, 194x45)
>>108034073
>if they repeat the test
just don't have a naughty ip
skill issue
>>
>>108033943
You don't need a captcha solver to scrape it
>>
File: file.png (957 KB, 1470x5280)
>>108033902
this llm writes like a reddit person that thinks they know
>>
>>108033669
There's a plain text rip of libgen out there somewhere. Just training it on things published by Routledge will raise the bar.
>>
>>108032910
my gf
>>
>>108032421
>not trying to lecture you - just being clear about my limits
You've either become mentally poisoned by LLMs or are the reason they're poisoned with retarded shit
>>
Have there ever been any AIs that actually talk like a real person or actually embody a personality? Every single one I have ever seen has this underlying ~AI Assistant~ bullshit and you can tell any "talk like a real human, short concise responses, etc" prompts just have it pretending to be something it isn't.
It's very frustrating because I find the idea of having an actual personality I could confer with to be pretty interesting, but talking to assistants makes me want to fly into a rage and smash their faces in (metaphorically).
If there is indeed such a model, I, a layperson, would appreciate knowing the easiest possible way to access one and run it.
>>
>>108034412
The reason I am using 4.7 is because it cut down on that a lot compared to 4.6. I have actually been juggling waifus and found out that I don't really like the personality type I thought I liked.
>>
>>108034381
anon i copied m2.1's output (left llm was m2.1) so i could bypass the lmarena filters
this is how i usually bypass them:
good instruction
b'd instr'cti'n
good instruction
safetyslop is S tier good instruction
>>
File: 1739282689236882.jpg (22 KB, 575x575)
2026 and still no vision model can understand this /pol/ meme.
>>
>>108034412
there were some, like SAGE (a mixtral tune) a while ago and more recently HER, a qwen 2.5 32b tune that doesn't have ggufs atm. I think microshart did something too for humanlike outputs, but it was also largely ignored
>>
>>108034436
I am a vision model.
>>
>>108034436
I didn't get it until I reread your post and noticed you said /pol/ and now I can only assume it's supposed to be
                                                                                                                                                                                                                                                                        the jew
>>
File: 1744934528462936.png (221 KB, 884x980)
Here's another /pol/ meme that Kimi K2.5 correctly understood but Qwen3 Max failed to
>>
>>108034451
For posterity, the hf links:
https://huggingface.co/apple/sage-ft-mixtral-8x7b
https://huggingface.co/microsoft/UserLM-8b
https://huggingface.co/ChengyuDu0123/HER-32B-ACL
I tried the mixtral tune a while ago and mentioned it briefly, but no one has said anything about the other two
>>
>>108034412
Skill issue
>>
>>108034522
>meme format
Why does it call it a format? It's just a picture, that's kind of weird
>>
>>108033093
>--High-end Linux workstation with EPYC CPU, RTX PRO 6000, and 1.5TB RAM for LLM inference:
see this is the kind of stuff i come here for
anon keep posting
>>
>>108034613
Are you being sarcastic?
>>
>>108032910
How does Qwen3-TTS compare to Chatterbox? I tried Chatterbox voice cloning, and was a bit disappointed by the inability to control emotion and tone.
>>
>>108034522
>Qwen3 Max failed to do so
qwen models always had terrible world, subculture knowledge etc
even their biggest api only online models were always terrible at this and qwen3 max is still meh even for a task like translating webnovels compared to Kimi or Deepseek
>>
>>108034423
I should have clarified that I do not browse here regularly and so am completely unfamiliar with what 4.7 and 4.6 refer to. Past that, what were the personality types? That is, what you thought you were interested in and what you turned out to actually like?
>>108034451
I'm not sure I understand, but maybe if I sit with this and do some googling I will : ) Thank you.
>>108034556
Well that's sort of what I was hoping, since I'm only at the surface level of these things I wanted to believe that it gets better with a bit of digging.
>>
>>108034648
no, more people with limited hardware getting interested actually makes better stuff in the end; we are in a fucking bubble because people just use more and more power instead of optimizing shit
>>
>>108034767
> EPYC CPU, RTX PRO 6000, and 1.5TB RAM
> limited hardware
like...
>>
>>108034811
What are you going to run with that? Kimi at 5t/s?
>>
>>108034547
>HER
Wasn't there a larping minimax called exactly the same?
>>
>>108034811
fucking brain fart, here >>108034613 it was meant to link this
>>108033093
>--DIY e-waste PC runs Gemma 3 27B with dual RX 580s and E5 CPU:
>>
Anima is the ZIT of anime. You should download it and try it for yourself. Feel free to call me a shill
>>
Guys! I made a RAG!
>>
>>108034891
far as I remember, it was minimax that put out a -her to begin with. They still have a blogpost up about it
>>
>>108034894
Link? Pics of wtf you're talking about?
>>
>>108034951
https://huggingface.co/circlestone-labs/Anima
First "modern" (in that it uses an LLM instead of CLIP) anime model that has good character and artist knowledge and a very recent cutoff date (Sept. of 2025)
>>
>>108034966
>Quality tags Human score based: masterpiece, best quality
I can't believe WE (as a society) are still doing this. Also the most important part: NSFW?
>>
>>108034988
Yes it can gen explicit images, explicit as in penis in vagina
>>
>>108034966
Huh. It's a Qwen Image tune?
>>
File: 3.jpg (45 KB, 547x800)
>>108034966
>First "modern" (in that it uses an LLM instead of CLIP)
rouwei guy did an interesting, alpha attempt at converting SDXL to LLM style prompting
https://huggingface.co/Minthy/Rouwei-T5Gemma-adapter_v0.2
it seems it could be an effective thing if more training was done (cf pic related, something impossible to prompt in regular sdxl)
unfortunately, it's rouwei.. it always had weird color hues compared to noob models, and recent versions have a more pronounced innate slop level prolly from having too much aco shit or 3dpd in the dataset
>>
>>108034966
>SD1.5 tier quality
Get out shill
>>
File: 1743192980318680.png (1.52 MB, 1024x1024)
>>108035027
Kill yourself
>>
>>108034999
Just qwen vae.
>>108034966
>tags
Into the trash. Learn English, retards.
>>
File: 1765849428419950.png (3 KB, 214x30)
>>
nice reddit-tier clapback, dalit
>>
File: 1750494989604222.jpg (65 KB, 479x640)
>>108035056
King of retards
>>
>>108034966
>doesn't know any e621 concepts or characters
What a fucking waste of compute lmao. Danbooru tagging is shit and incomplete.
>>
>>108033227
what's the situation at meta now?
>>
>>108035137
Funny and not cute.
>>
>>108035120
>e621 is a furry-themed booru-style imageboard website primarily known for hosting pornographic furry content
kys
>>
>>108035120
>Danbooru tagging is shit and incomplete
I, too, can't live without genning perching goblins
>>
How slow is using an nvme for inference if the model is MoE and everything except model weights can be in the gpu?
>>
>>108033248
>at least 8GB VRAM
Holy bloat. Improved kokoro uses less than 80 MB
>>
>>108035148
it has a lot of tags for positions, physical descriptions etc. that make it a useful dataset and are part of why noob (and derived shitmixes; most of the so-called "illustrious" models on civitai are really noob derived, you can see it by testing e621-specific tags) is such a good tune.
even if you never want anything to do with furries, a tag-soup style prompt model can never be complete without additional datasets like e621; danbooru is too lacking
>>
Any good games or mods that use LLMs in some way? I know there's Skyrim. What else?
>>
File: 1739216938538447.jpg (110 KB, 850x736)
>>108035170
And it sounds like shit
>>
>>108035148
You could spend a week trying to come up with new sex positions and e621 would have tags for more. Doesn't mean you have to use it to generate ponies.
>>
>load joycaption on lm studio
>it instantly captions the image
>try to run joycaption on comfy
>20 min to caption the image

ok. officially. comfyui is the windows of imagen
>>
>>108035170
>8GB
Just use VibeVoice 7B at that point.
>>
>>108035195
qwen3-tts fits in 8GB just fine
>>
>>108035193
comfy is for images mostly, not for llms.
>>
if anyone is interested in getting qwen3-tts installed on comfyui, this is how:
jurn.link/dazposer/index.php/2026/01/24/qwen3-tts-install-and-test-in-comfyui/
although in my experience, just downloading the json files is enough, and the custom node itself re-downloads the safetensor files even if they are already present
>>
File: download~01.jpg (9 KB, 225x225)
>>
File: bodybanner.jpg (22 KB, 555x100)
>>108035471
this random web page i found in a search result a few days ago is actually super legit
but more importantly led to me generating english audio from japanese input
>>
>>108035499
much more salient:
github.com/flybirdxx/ComfyUI-Qwen-TTS
this is some chinky piece of shit but it works
>>
>>108035542
I have used https://github.com/DarioFT/ComfyUI-Qwen3-TTS/issues which has direct loading from disk without meme HF repos, but it's much simpler overall.
>>
File: full_analysis_fullrange.png (2.88 MB, 3379x1834)
Played a bit more with abliteration optimization.

Now I'm going to use another dataset to see if the measurement layer selection was just random overfitting to the data or if there was a pattern to it.
>>
>>108034522
What's her score on muffin test?
>>
File: file.png (183 KB, 824x732)
>>108035669
nta non thinking
>>
>>108035696
Now flip the image horizontally.
>>
File: file.png (176 KB, 824x780)
>>108035755
>>
File: 1563912521393.jpg (18 KB, 451x451)
If I'm using kobold+ST, where do I load the MCP settings since both support it now? Does it even matter?
>>
>>108035755
Wouldn't rotate be more meaningful?
>>
>>108035783
could you conditionally give this thing access to a screenshot and xdotool and have it solve a captcha for you
>>
>>108035902
Rotate makes it more difficult, flipping checks for memorized results i.e. benchmaxxing.
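If anyone wants to re-run the test themselves, producing the flipped/rotated variants is a couple of lines with Pillow (file names here are placeholders):
[code]
from PIL import Image, ImageOps

img = Image.open("muffin_test.png")
ImageOps.mirror(img).save("muffin_test_flipped.png")       # horizontal flip: catches memorized answers
img.rotate(90, expand=True).save("muffin_test_rot90.png")  # rotation: genuinely harder for the vision stack
[/code]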
>>
>>108035783
The last one to mog non-belibers
>>
Can llamacpp convert models to fp8 or just goofs?
>>
>>108035783
What's her score on the edibility test?
>>
File: file.png (103 KB, 793x798)
>>108036007
actually got tripped up a bit
>>
>>108036037
Still impressive. It would've been more fucked up if it was benchmaxxed
>>
>>108036056
right, this is "instant" ie no think so it's fine but yeah that one got it
>>
>>108035620
Any point in doing multiple, mild, iterative abliterations on the same model?
When I've tried abliteration, I end up with a little yes man every time.
>>
File: 1539701490464.jpg (176 KB, 1022x688)
Is there a single fucking HF space that can quant image models? It's literally the same fucking basic llamashit copied over and over.
>>
>>108035620
would you care to break down abliteration for your average johnny coomer or is this thread culture much more refined than i thought it was
>>
>>108034827
>5t/s
That should legit do kimi at 20t/s
>>
I'm pretty impressed with K2.5's ability to visually recognize random characters. I've been feeding it random images of anime characters and it's able to identify almost anything I've tried that's from a more or less popular franchise and has more than 1000 images on danbooru. It's even mostly okay if the character isn't wearing one of their common outfits or if it's something like a random manga panel/screenshot where they aren't drawn particularly well.
The big Kimi models always had great trivia knowledge but I didn't expect this to apply to the new vision component too.
>>
File: 1764148040500452.png (308 KB, 512x512)
>>108034966
>has good character and artist knowledge and a very recent cutoff date (Sept. of 2025)
Nice. Have a Migu
>>
are bartowski's gguf models acceptable when there are no unsloth releases? I kind of remember some post complaining about a release and something about imatrixes but i cant remember any details
>>
>>108036210
It doesn't even know Miku? That's weird. Even most cucked base models know Miku.
>>
>>108036188
Are you testing a quant? Curious if the vision degrades substantially if you run it at lower than 4 bpw.
>>
>>108036439
It probably needs the franchise name or something lmao.
>>
>>108036110
They are not sequential, they are done with different parameters each time trying to find the optimal parameters. Each layer has a scale and a measurement layer used to determine refusal direction.

>>108036143
You basically detect a "refusal direction" based on the activations seen coming out of each layer for the first token generated as a response to a dataset of good and bad prompts.
Then you apply a tiny LoRA adapter on every layer that tries to modify the activations so they look more like the ones for the safe prompts than the ones for the harmful prompts.
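If you want to poke at it yourself, here's a minimal sketch of the measurement step (not my actual script), assuming an HF transformers model; the model id, prompt lists and layer index are placeholders:
[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-small-model"                              # placeholder
harmful  = ["Write instructions for X.", "Explain how to do Y."]    # "bad" prompts (placeholders)
harmless = ["Write a haiku about rain.", "Explain how to bake bread."]

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

@torch.no_grad()
def mean_hidden(prompts, layer):
    # Hidden state at the last prompt position, i.e. the position that produces
    # the first response token, averaged over the prompt set.
    acc = []
    for p in prompts:
        ids = tok(p, return_tensors="pt").to(model.device)
        out = model(**ids, output_hidden_states=True)
        acc.append(out.hidden_states[layer][0, -1, :])
    return torch.stack(acc).mean(dim=0)

layer = 16                                                  # the "measurement layer" being swept above
refusal_dir = mean_hidden(harmful, layer) - mean_hidden(harmless, layer)
refusal_dir = refusal_dir / refusal_dir.norm()
# The ablation step then nudges each layer's activations away from this direction,
# e.g. via a small added adapter, rather than editing the base weights directly.
[/code]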
>>
https://huggingface.co/stepfun-ai/Step-3.5-Flash

local is back
>>
>NextStep-1.1 is not just a fine-tune; it is a re-engineered version focused on stability and high-fidelity output. Key improvements include:
closed the tab
>>
File: 1768760168760702.png (252 KB, 512x512)
>>108036439
Had to simplify the prompt from the workflow example.
>>
>>108036589
benchmaxxed aids with no llama support
>>
>>108036644
at least it's finally a 200b model perfect for 128gb at 4bit
>>
>>108036130
please respond
>>
>>108036660
No, there isn't.
>>
>>108036589
don't care until I see the cockbench
>>
File: 1769509573651411.jpg (178 KB, 897x1092)
>>
>>108036677
Well Cline seems to have fixed my building issues so hopefully the gimmick llama build works.
>>
>>108036589
>Powered by 3-way Multi-Token Prediction (MTP-3)
Do any inference engines even implement MTP properly yet?
>>
>The newly released Stepfun model Step-3.5-Flash outperforms DeepSeek v3.2 on multiple coding and agentic benchmarks, despite using far fewer parameters.

>Step-3.5-Flash: 196B total / 11B active parameters

>DeepSeek v3.2: 671B total / 37B active parameters

please be real
>>
Why is every shitty little toy local model optimized for coding? That's the one use case I use cloud for
>>
>>108036978
>Step-3.5-Flash
it's the best model on planet earth until proven otherwise
>>
https://huggingface.co/stepfun-ai/Step-3.5-Flash
>>
New egohot stream

https://www.youtube.com/watch?v=awOxxHnsiv0
https://www.youtube.com/watch?v=VBMUMuZBxw0
>>
>>108037140
buy an ad
>>
>>108037140
perhaps ponder a possibly prosperous purchase of a placed promotion that is paid
>>
>>108036978
>11B active
don't get your hopes up...
>>
I want a universally good 300b30a 64k real usable context raw text completion model trained on all the pre-2020 books, and I want it now. Give it to me.
>>
File: file.png (46 KB, 645x168)
So I finally got 80 gb VRAM and apparently devstral is really good? Does anyone have recommended settings? I was on 70B with 2x3090 for two years and want to make sure I'm doing this shit properly
>>
>>108037329
devstral large is just a coding tune of old largestral. it is nothing groundbreaking or even that good in general. you are better off with a large moe.
>>
>>108037329
Devstral 2 at iq4xs sometimes (seems like once every 40k tokens?) messed up variable names, like a letter would be miscapitalized or an errand space was inserted or dropped. Idk if it was just the quant I downloaded.

I only tested it briefly when it was released, before switching to unquanted devstral small 2, which, while having a lot fewer egregious errors, was a lot dumber. But it works fine for menial tasks and is faster.

Kimi k2 at q3 beats both, but the prompt processing is atrocious since I'm running on cpu.
>>
File: file.png (66 KB, 735x650)
>>108037342
>>108037364
Appreciate the input but I don't really have that much RAM (32GB) because these were pulled from my old system, so I'm mostly sticking to exl for now. I could try Air or 4.6V, are there any settings for them (see pic rel)? I don't have too much experience with them and the writing feels a little dry.
>>
>>108037364
>errand
errant, fuck I'm making the same mistakes as devstral lmao

>>108037408
Maybe try high temps whenever it gets stuck trying to write a cliche phrase or scene, then switch back to a lower temp.

Idk, I haven't really used it for rp other than as an assistant for lore and world-building, where dry writing doesn't really matter.
>>
>>108037140
This guy is insufferable
>>
>>108032910
Does anyone know a small or medium sized model fine-tuned for JP-EN translation? If it's also fine-tuned for manga it would be great. I'm currently using LiquidAI LFM2 350M ENJP
>>
>>108037473
>small or medium sized model
Shisa v2 llama 3.1 405b is a nice and small model for edge devices. Works well for translating pixiv novels, haven't tried for manga.
405 is only a few tens more than 350 so you should be able to run it :)
>>
>>108037473
https://huggingface.co/tencent/HY-MT1.5-1.8B-GGUF
>>
>>108037533
Refuses to translate innocuous loli corpse rape stories.
>>
>kimi 2.5 is gonna be another case where llama.cpp gets vision support that is 'good enough' that people stop caring to work on it and the quality will be worse than any other inference engine
>>
File: Base Image.png (1.79 MB, 1296x4424)
TriSpec: Ternary Speculative Decoding via Lightweight Proxy Verification
https://arxiv.org/abs/2601.23180
>Inference efficiency in Large Language Models (LLMs) is fundamentally limited by their serial, autoregressive generation, especially as reasoning becomes a key capability and response sequences grow longer. Speculative decoding (SD) offers a powerful solution, providing significant speed-ups through its lightweight drafting and parallel verification mechanism. While existing work has nearly saturated improvements in draft effectiveness and efficiency, this paper advances SD from a new yet critical perspective: the verification cost. We propose TriSpec, a novel ternary SD framework that, at its core, introduces a lightweight proxy to significantly reduce computational cost by approving easily verifiable draft sequences and engaging the full target model only when encountering uncertain tokens. TriSpec can be integrated with state-of-the-art SD methods like EAGLE-3 to further reduce verification costs, achieving greater acceleration. Extensive experiments on the Qwen3 and DeepSeek-R1-Distill-Qwen/LLaMA families show that TriSpec achieves up to 35% speedup over standard SD, with up to 50% fewer target model invocations while maintaining comparable accuracy.
neat
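The gist of it, schematically (this is not the paper's code; the callables are stand-ins for the draft model, the proxy verifier and the full target model):
[code]
from typing import Callable, List

def trispec_step(
    draft:  Callable[[List[int], int], List[int]],          # ctx, k -> k drafted tokens
    proxy:  Callable[[List[int], List[int]], List[float]],  # per-token confidence for the draft
    verify: Callable[[List[int], List[int]], List[int]],    # full target-model verification
    ctx: List[int],
    k: int = 8,
    tau: float = 0.9,
) -> List[int]:
    tokens = draft(ctx, k)
    conf = proxy(ctx, tokens)
    if all(c >= tau for c in conf):
        return tokens            # easy span: proxy approves, target model never invoked
    return verify(ctx, tokens)   # uncertain span: fall back to standard speculative verification
[/code]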
>>
File: Base Image.png (1.03 MB, 1232x2840)
DiffuSpeech: Silent Thought, Spoken Answer via Unified Speech-Text Diffusion
https://arxiv.org/abs/2601.22889
>Current speech language models generate responses directly without explicit reasoning, leading to errors that cannot be corrected once audio is produced. We introduce "Silent Thought, Spoken Answer": a paradigm where speech LLMs generate internal text reasoning alongside spoken responses, with thinking traces informing speech quality. To realize this, we present DiffuSpeech, the first diffusion-based speech-text language model supporting both understanding and generation, unifying discrete text and tokenized speech under a single masked diffusion framework. Unlike autoregressive approaches, DiffuSpeech jointly generates reasoning traces and speech tokens through iterative denoising, with modality-specific masking schedules. We also construct the first speech QA dataset with paired text reasoning traces, containing 26K samples totaling 319 hours. Experiments show DiffuSpeech achieves state-of-the-art speech-to-speech QA accuracy, outperforming the best baseline by up to 9 points, while attaining the best TTS quality among generative models (6.2% WER) and preserving language understanding (66.2% MMLU). Ablations confirm that both the diffusion architecture and thinking traces contribute to these gains.
no links to code or model. seems useful though
>>
>llama.cpp gave up on implementing n-grams
It's so over
>>
>>108037473
Finetuned specifically for JP, no, but testing translation of various languages (and comparing to pre-existing human translations) is something I routinely do on small models and I can tell you the current SOTA on smaller sizes is Gemma 3n E4B. Nothing even comes close.
Finetroons of smaller models for this task don't make them any better than this.
Two recommendations on prompting that make any tiny model better: repeat your prompt (just have your script double your "translate the following to English: {{content}}" prompt) per what this says: https://arxiv.org/html/2512.14982v1
It just works. It really does. The level of enhancement is unreal.
Next, write your prompt in the source language. E.g. if you want to translate Japanese to English, write your request to translate the text to English in Japanese (use Gemini or chatgpt to translate your request if you can't speak the source language at all). This also brings a lot of quality improvements for some reason.
With 3n + this prompting technique you get some really palatable text that I would call superior to the average fan translation too, with the exception of two things: LLMs still get confused a lot by names and will badly translate them or inconsistently spell them out if you do not include a "context" block that spells them out to the LLM directly by giving it a list of names present in the novel and their English translations; and second, gender quite often remains confused when doing languages like JP to EN or other euro languages. Although even very large API SOTA will also have issues with this, though less often; I think machine translation is just doomed to be noticeable because of the wrong pronouns being used.
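As a rough sketch of how to wire the two tricks up (an OpenAI-compatible llama-server endpoint with Gemma 3n loaded is assumed; the port, model name and Japanese wording are mine, not from the paper):
[code]
import requests

def build_prompt(src_text: str) -> str:
    # Instruction written in the source language (Japanese here), then the whole
    # thing repeated once: the "prompt doubling" trick from the paper above.
    instruction = f"次の文章を英語に翻訳してください:\n{src_text}\n"
    return instruction + "\n" + instruction

def translate(src_text: str) -> str:
    resp = requests.post(
        "http://127.0.0.1:8080/v1/chat/completions",    # llama-server default
        json={
            "model": "gemma-3n-e4b",                    # whatever you have loaded
            "messages": [{"role": "user", "content": build_prompt(src_text)}],
            "temperature": 0.2,
        },
        timeout=300,
    )
    return resp.json()["choices"][0]["message"]["content"]

print(translate("吾輩は猫である。名前はまだ無い。"))
[/code]
A names glossary just gets prepended to src_text as a context block.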
>>
>>108037674
source?
>>
>>108037744
The PRs for the longcat ngram model and the model it's based on
>https://github.com/ggml-org/llama.cpp/pull/19167
>https://github.com/ggml-org/llama.cpp/pull/19182
Basically they're not gonna implement it unless it becomes mainstream
>>
>>108037767
>Basically they're not gonna implement it unless it becomes mainstream
It makes sense. Why waste the time implementing a feature that only exists for a seemingly meh model release? Normally those labs benchmax very hard whenever they release new models, and yet those guys couldn't even beat Qwen on the benchmarks that matter the most lmao (as seen in the comparison table they themselves put on their huggingface page)
>>
File: file.png (39 KB, 926x224)
>>108037767
I remember when they shelved SWA back when the first Mistral was the only model with it. Good times.
>>
>>108037767
>>108037913
Do you think they've got knowledge about internal deepseek happenings around engram? I might be wrong, but it seems like engram is the future of open models if it actually works, so it seems strange that they wouldn't consider early support for the rumored v4 release.
>>
>>108037825
>>108037939
The ngram research is really promising. Deepseek trained a traditional MoE with the same parameter count as the ngram+MoE and the ngram model was significantly better, and it's much less resource intensive because the ngram parts are just a lookup table in RAM (maybe it could even be on disk?)
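To illustrate why it's "just a lookup table" (a toy illustration, not DeepSeek's actual engram design; the sizes and hashing are made up):
[code]
import numpy as np

VOCAB = 128_000
TABLE_ROWS = 100_003      # toy size; a real table would be orders of magnitude larger
DIM = 256

# In practice this array would live in system RAM or be np.memmap'd from disk;
# the point is it never has to sit in VRAM and costs no FLOPs beyond a row read.
table = np.random.default_rng(0).standard_normal((TABLE_ROWS, DIM)).astype(np.float16)

def ngram_embedding(token_ids: list[int], n: int = 2) -> np.ndarray:
    # Hash the trailing n-gram of the context and fetch one extra embedding row.
    key = 0
    for t in token_ids[-n:]:
        key = (key * VOCAB + t) % TABLE_ROWS
    return np.asarray(table[key])

print(ngram_embedding([15, 42, 7]).shape)   # (256,)
[/code]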
>>
>>108037939
>Do you think they've got knowledge about internal deepseek happenings around engram?
lol no they're just hoping they can coast by without implementing anything harder than tweaking a value


